Data Manipulation in R – Master All Concepts in One Place!


In this guide, we'll explore how to perform data manipulation using the R programming language.

What is Data Manipulation in R?

Data manipulation involves reshaping, cleaning, and transforming data using R’s built-in structures, making it ready for analysis and visualization.

Before starting data manipulation, you should know how to import and export data in R (e.g., CSV, SPSS, text files).

Core Data Structures in R

1. Vectors

One-dimensional, ordered collections of elements that all share one type.

Types: integer, numeric (double), logical, character, complex.

2. Matrices

Rectangular arrays where all elements are of the same type.

Useful for 2D data; for three or more dimensions, use an array (array()) instead.

3. Lists

Flexible containers for elements of any type or structure.

Can store vectors, matrices, data frames, or even other lists.

4. Data Frames

Two-dimensional structures, like database tables or spreadsheets.

Suitable for storing datasets.
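The four structures described above can be sketched in a few lines. The values and the names v, m, l, and df are made up for illustration.

```r
# Vector: every element has the same type
v <- c(2.5, 3.1, 4.8)
class(v)

# Matrix: two dimensions, one type
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)

# List: elements may differ in type and length
l <- list(id = 1L, tags = c("a", "b"), scores = m)
length(l)

# Data frame: columns of equal length, possibly of different types
df <- data.frame(name = c("Ann", "Bob"), age = c(31, 45))
str(df)
```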

Creating Subsets of Data in R

As datasets grow, it is often more efficient to work with a smaller portion of the data. Selecting that portion is called subsetting.
Here are some common subsetting methods:

$ Operator

Accesses a single column of a data frame.

iris$Species

[[ ]] Operator

Returns a single element by position or name. For a data frame, that element is an entire column.

iris[[5]] 

[ ] Operator

Returns multiple elements based on indices or conditions.

iris[1:5, ] 
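A short side-by-side sketch of the three operators on the built-in iris data may help. The variable names are illustrative, and the condition Sepal.Length > 7 is an arbitrary choice.

```r
data(iris)

species <- iris$Species        # one column, returned as a factor
same    <- iris[[5]]           # column 5 is Species, so the same values
by_name <- iris[["Species"]]   # [[ ]] also accepts a column name
one_col <- iris[5]             # single [ ] keeps a one-column data frame

first5  <- iris[1:5, ]         # rows 1 to 5, all columns
long    <- iris[iris$Sepal.Length > 7, c("Sepal.Length", "Species")]  # rows by condition

identical(species, same)       # TRUE
dim(first5)
```

Note the difference between iris[[5]] and iris[5]: the first extracts the column itself, the second returns a data frame containing that column.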

🎲 The sample() Function in R

Used to draw random samples from a dataset:

sample(1:6, 10, replace = TRUE) 

Use set.seed() to ensure reproducible results:

set.seed(100)
sample(1:5, 10, replace = TRUE)
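To see what reproducibility means in practice, the sketch below draws the same sample twice after resetting the seed, then uses sample() to pick random rows from a data frame. The seed value and the sample size of 10 are arbitrary.

```r
set.seed(100)
a <- sample(1:5, 10, replace = TRUE)

set.seed(100)                  # reset the generator to the same state
b <- sample(1:5, 10, replace = TRUE)

identical(a, b)                # TRUE: same seed, same draws

# Draw 10 distinct random rows from a data frame
set.seed(100)
rows <- sample(nrow(iris), 10) # without replacement by default
iris_sample <- iris[rows, ]
```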

🧪 Applications of Subsetting

1. Removing Duplicates

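duplicated() does not remove anything by itself. It returns a logical vector that flags every value already seen earlier, which you can negate inside [ ] to drop the repeats.

duplicated(c(1,2,1,3,1,4))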
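A minimal sketch of actually removing the duplicates, first on a vector and then on a small made-up data frame (the id and city columns are hypothetical):

```r
x <- c(1, 2, 1, 3, 1, 4)

dup <- duplicated(x)      # FALSE FALSE TRUE FALSE TRUE FALSE
x[!dup]                   # 1 2 3 4
unique(x)                 # same result

# For a data frame, duplicated() compares whole rows
df <- data.frame(id = c(1, 2, 2), city = c("Oslo", "Rome", "Rome"))
df[!duplicated(df), ]     # drops the repeated third row
```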

2. Identifying Missing Data

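complete.cases(data)   # logical vector: TRUE for rows with no missing values
na.omit(data)          # copy of data with the incomplete rows removed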

Example:

data <- read.table(header = TRUE, text = '
subject sex size
      1   M    7
      2   F   NA
      3   F    9
      4   M   11
')
write.csv(data, "table.csv", row.names = FALSE)
file <- read.csv("table.csv")
na.omit(file)
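Before dropping rows, it is often worth checking how much data is missing and where. The sketch below rebuilds the same toy table in memory (the name d is illustrative) and counts the gaps.

```r
# Toy data with one missing value
d <- data.frame(subject = 1:4,
                sex     = c("M", "F", "F", "M"),
                size    = c(7, NA, 9, 11))

ok <- complete.cases(d)   # TRUE FALSE TRUE TRUE
sum(!ok)                  # number of incomplete rows: 1
d[ok, ]                   # keeps the same rows as na.omit(d)
colSums(is.na(d))         # count of NAs per column
```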

➕ Adding Calculated Columns

You can compute new fields directly from existing ones:

data(iris)
x <- iris$Sepal.Length / iris$Sepal.Width
head(x)

Using with() for cleaner syntax:

with(iris, Sepal.Length / Sepal.Width) 

Using within() to add a new column to the dataset:

iris <- within(iris, ratio <- Sepal.Length / Sepal.Width) 
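The three approaches can be compared directly. This sketch works on a copy of iris so the built-in data stays unchanged; the names flowers and petal_prod are illustrative only.

```r
data(iris)
flowers <- iris                                    # work on a copy

ratio_a <- iris$Sepal.Length / iris$Sepal.Width    # explicit $ on both columns
ratio_b <- with(iris, Sepal.Length / Sepal.Width)  # columns looked up inside iris
identical(ratio_a, ratio_b)                        # TRUE

# within() returns a modified copy, so the result must be assigned
flowers <- within(flowers, ratio <- Sepal.Length / Sepal.Width)

# Plain assignment with $ also creates a column
flowers$petal_prod <- flowers$Petal.Length * flowers$Petal.Width
names(flowers)
```

with() only evaluates an expression and returns the result, while within() returns the whole data frame with the new column attached.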

📊 Creating Data Bins or Subgroups

cut() Function

Classifies numeric values into intervals:

frost <- c(1, 2, 3)
cut(frost, 3, include.lowest = TRUE, labels = c("Low", "Med", "High"))

table() Function

Counts number of items in each bin:

table(cut(frost, 3, include.lowest=TRUE, labels=c("Low", "Med", "High"))) 
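Passing a single number to cut() asks for that many equal-width bins. When the boundaries matter, explicit break points are usually clearer. In this sketch the sample values and the breaks 0, 50, 120, 200 are arbitrary assumptions.

```r
# Illustrative values (for example, days of frost per year)
frost <- c(20, 12, 152, 65, 20, 166, 139, 103)

# Intervals are [0, 50], (50, 120], (120, 200]
bins <- cut(frost,
            breaks = c(0, 50, 120, 200),
            labels = c("Low", "Med", "High"),
            include.lowest = TRUE)

bins
table(bins)   # Low: 3, Med: 2, High: 3
```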

🔗 Combining and Merging Datasets in R

1. Add Columns with cbind()

cbind(df1, df2) 

2. Add Rows with rbind()

rbind(df1, df2) 
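The calls above assume df1 and df2 already exist. A runnable sketch with hypothetical data frames shows the shape requirements: cbind() needs the same number of rows, and rbind() needs the same column names.

```r
# Hypothetical data frames for illustration
df1 <- data.frame(id = 1:3, score = c(90, 85, 77))
df2 <- data.frame(grade = c("A", "B", "C"))      # same number of rows as df1
df3 <- data.frame(id = 4:5, score = c(60, 95))   # same columns as df1

wide <- cbind(df1, df2)   # columns side by side: 3 rows, 3 columns
long <- rbind(df1, df3)   # rows stacked: 5 rows, 2 columns

dim(wide)
dim(long)
```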

3. Merge with merge() Function

Example:

states <- as.data.frame(state.x77)
states$Name <- rownames(state.x77)
rownames(states) <- NULL
freezing <- states[states$Frost > 150, c("Name", "Frost")]
big <- states[states$Area > 100000, c("Name", "Area")]
merge(freezing, big)

🔄 Types of Joins with merge():

Join Type           Parameter
Natural Join        all = FALSE (default)
Full Outer Join     all = TRUE
Left Outer Join     all.x = TRUE
Right Outer Join    all.y = TRUE
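The effect of each parameter is easiest to see on two tiny tables. The data frames left and right below are hypothetical and share only the key column Name.

```r
# Hypothetical tables sharing the key column `Name`
left  <- data.frame(Name = c("A", "B", "C"), Frost = c(10, 20, 30))
right <- data.frame(Name = c("B", "C", "D"), Area  = c(200, 300, 400))

inner <- merge(left, right)                 # B, C
full  <- merge(left, right, all = TRUE)     # A, B, C, D
l_out <- merge(left, right, all.x = TRUE)   # A, B, C (Area is NA for A)
r_out <- merge(left, right, all.y = TRUE)   # B, C, D (Frost is NA for D)

# Naming the key explicitly avoids surprises when tables share other columns
merge(left, right, by = "Name")
```

Rows without a partner in the other table are filled with NA in the columns that come from that table.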

πŸ” match() Function in R

Finds the position of each element of one vector within another, returning NA where there is no match:

index <- match(freezing$Name, big$Name) 
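A small sketch with made-up name vectors shows what the returned index looks like and how the NA entries are usually handled. The related %in% operator returns only TRUE or FALSE.

```r
# Hypothetical name vectors
freezing_names <- c("Alaska", "Maine", "Nevada")
big_names      <- c("Texas", "Alaska", "Nevada")

index <- match(freezing_names, big_names)   # 2 NA 3
big_names[index[!is.na(index)]]             # "Alaska" "Nevada"

freezing_names %in% big_names               # TRUE FALSE TRUE
```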

Conclusion – Ready to Manipulate Data with R?

These concepts form the foundation for any data science task in R. Whether you're preparing for a data analysis project or enhancing your career in analytics, mastering data manipulation in R is a must.

