+34 616 71 29 85 carsten@dataz4s.com

Subsetting data with square brackets

We use the LungCap dataset (stored in R as LungCapData), which has 725 rows and 6 variables.

 

Read in data to R

We start by reading in the data. Here I read it from Excel with the read_excel() function from the readxl package and then convert three of the variables to factors.
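As a rough sketch of that step: the file name below is hypothetical (the real path is not given here), so the read_excel() call is left as a comment and a few rows copied from the output further down this page stand in for the full dataset. The three factor columns are assumed to be Smoke, Gender and Caesarean, since those are the ones printed as <fct> in the tibble output.

```r
# Hypothetical file name; the actual path is not stated on this page.
# library(readxl)
# LungCapData <- read_excel("LungCapData.xlsx")

# Stand-in: four rows copied from the printed output on this page
LungCapData <- data.frame(
  LungCap   = c(11.5, 10.9, 6.52, 6),
  Age       = c(19, 17, 12, 10),
  Height    = c(76.4, 71.7, 57.5, 61.1),
  Smoke     = c("no", "no", "no", "no"),
  Gender    = c("male", "male", "male", "female"),
  Caesarean = c("yes", "no", "no", "no"),
  stringsAsFactors = FALSE
)

# Convert the three categorical columns from character to factor
factor_cols <- c("Smoke", "Gender", "Caesarean")
LungCapData[factor_cols] <- lapply(LungCapData[factor_cols], factor)

# str() shows the type of every column after the conversion
str(LungCapData)
```

Converting to factors matters later on: summary() on a factor returns counts per level rather than a character summary.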

 

head()

The head() function shows the first rows of a dataset (six by default), which gives a quick overview of its rows and columns.

<script src="https://gist.github.com/DataGrube/4507ed0541e5b45998704e22bcbc75ba.js"></script>
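A minimal sketch of head() on a small stand-in data frame. The object name lung is hypothetical, and the values are a handful of rows copied from the outputs printed on this page, in arbitrary order, not the full 725-row dataset.

```r
# Stand-in data: a few rows copied from the outputs on this page
lung <- data.frame(
  LungCap = c(11.5, 10.9, 6.52, 6, 10.1, 9.55, 6.22, 10.0),
  Age     = c(19, 17, 12, 10, 18, 16, 11, 16),
  Height  = c(76.4, 71.7, 57.5, 61.1, 74.7, 69.7, 58.7, 72.4)
)

first_six   <- head(lung)         # default: the first 6 rows
first_three <- head(lung, n = 3)  # ask for a specific number of rows
last_two    <- tail(lung, n = 2)  # tail() is the counterpart for the last rows

print(first_three)
```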

 

 

dim() and length()

# Dimensions of the dataset
dim(LungCapData)
## [1] 725   6
# Length of a vector or variable
length(Age)
## [1] 725
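The call length(Age) refers to Age by name, which presumably means the data frame was attached with attach(LungCapData) beforehand; that is an assumption, as the step is not shown here. The sketch below uses a hypothetical stand-in data frame called lung and shows the same checks without attaching anything.

```r
# Stand-in data: four rows copied from the outputs on this page
lung <- data.frame(
  LungCap = c(11.5, 10.9, 6.52, 6),
  Age     = c(19, 17, 12, 10),
  Height  = c(76.4, 71.7, 57.5, 61.1)
)

dims   <- dim(lung)   # c(number of rows, number of columns)
n_rows <- nrow(lung)  # rows only
n_cols <- ncol(lung)  # columns only

# length() of a data frame counts its columns, not its rows
n_vars <- length(lung)

# length() of a single column counts its elements (one per row)
n_age <- length(lung$Age)

# with() lets you use column names directly without attach()
n_age_with <- with(lung, length(Age))

print(dims)
```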

 

 

Subsetting with brackets []

# Subsetting elements 11 to 14 of the Age vector
Age[11:14]
## [1] 19 17 12 10
# Brackets on a matrix or data frame take the form [rows, columns]
# Leaving the position after the comma blank keeps all columns
# Rows 11 to 14, all columns
LungCapData[11:14, ]
## # A tibble: 4 x 6
##   LungCap   Age Height Smoke Gender Caesarean
##     <dbl> <dbl>  <dbl> <fct> <fct>  <fct>
## 1   11.5     19   76.4 no    male   yes
## 2   10.9     17   71.7 no    male   no
## 3    6.52    12   57.5 no    male   no
## 4    6       10   61.1 no    female no
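The same [rows, columns] pattern also works for picking columns. This is a small sketch on a hypothetical stand-in data frame named lung (four rows copied from the output above). Note that it is a plain data.frame, whereas LungCapData is a tibble, so single-column results may print differently.

```r
# Stand-in data: the four rows printed above
lung <- data.frame(
  LungCap = c(11.5, 10.9, 6.52, 6),
  Age     = c(19, 17, 12, 10),
  Height  = c(76.4, 71.7, 57.5, 61.1),
  Gender  = c("male", "male", "male", "female")
)

# Rows 2 to 3, all columns (blank after the comma)
rows_2_3 <- lung[2:3, ]

# Rows 2 to 3, only the Age and Height columns
age_height <- lung[2:3, c("Age", "Height")]

# A negative index drops rows: everything except row 1
without_first <- lung[-1, ]

# All rows of one column; a data.frame returns a plain vector here,
# a tibble would return a one-column tibble instead
ages <- lung[, "Age"]

print(age_height)
```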

 

 

Subsetting a step further…

# The double equals sign (==) tests for equality; a single = would assign
# Mean age for females
mean(Age[Gender=="female"])
## [1] 12.44972
# Mean age for males
mean(Age[Gender=="male"])
## [1] 12.20708
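What happens inside the brackets: Gender=="female" produces a vector of TRUE/FALSE values, and Age[...] keeps only the elements where that vector is TRUE. A minimal sketch with seven Age and Gender values copied from the outputs on this page (so the means differ from the full-data results above); the helper names is_female and group_means are hypothetical.

```r
# Stand-in vectors: seven values copied from the outputs on this page
Age    <- c(19, 17, 12, 10, 18, 16, 11)
Gender <- factor(c("male", "male", "male",
                   "female", "female", "female", "female"))

# The comparison returns one TRUE/FALSE per element
is_female <- Gender == "female"
print(is_female)

# Indexing with the logical vector keeps only the TRUE positions
female_ages <- Age[is_female]

mean_female <- mean(female_ages)
mean_male   <- mean(Age[Gender == "male"])

# tapply() computes the same group means in a single call
group_means <- tapply(Age, Gender, mean)
print(group_means)
```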

 

 

Subsetting by gender

# Save the female and male rows into separate objects
FemData <- LungCapData[Gender=="female", ]
MaleData <- LungCapData[Gender=="male", ]

# Checking
dim(FemData)
## [1] 358   6
dim(MaleData)
## [1] 367   6
summary(Gender)
## female   male
##    358    367
FemData[1:4, ]
## # A tibble: 4 x 6
##   LungCap   Age Height Smoke Gender Caesarean
##     <dbl> <dbl>  <dbl> <fct> <fct>  <fct>
## 1   10.1     18   74.7 yes   female no
## 2    9.55    16   69.7 no    female yes
## 3    6.22    11   58.7 no    female no
## 4    6       10   61.1 no    female no
# Subset for males over 15 (& means both conditions must hold)
MaleOver15 <- LungCapData[Gender=="male" & Age>15, ]

 

# Checking
dim(MaleOver15)

## [1] 89  6
MaleOver15[1:4,]
## # A tibble: 4 x 6
##   LungCap   Age Height Smoke Gender Caesarean
##     <dbl> <dbl>  <dbl> <fct> <fct>  <fct>
## 1    11.5    19   76.4 no    male   yes
## 2    10.9    17   71.7 no    male   no
## 3    10.0    16   72.4 no    male   no
## 4    11.3    17   77.7 no    male   no
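The row filters above can be sketched on a small stand-in data frame. The name lung is hypothetical, and the nine rows are copied from the outputs on this page. The sketch also shows base R's subset() as an alternative to brackets, and a quick sanity check that the female and male subsets together account for every row.

```r
# Stand-in data: nine rows copied from the outputs on this page
lung <- data.frame(
  LungCap = c(11.5, 10.9, 6.52, 6, 10.1, 9.55, 6.22, 10.0, 11.3),
  Age     = c(19, 17, 12, 10, 18, 16, 11, 16, 17),
  Gender  = factor(c("male", "male", "male", "female", "female",
                     "female", "female", "male", "male"))
)

# & keeps rows where both conditions are TRUE
male_over_15 <- lung[lung$Gender == "male" & lung$Age > 15, ]

# subset() expresses the same filter using bare column names
male_over_15_alt <- subset(lung, Gender == "male" & Age > 15)

# | keeps rows where at least one condition is TRUE
female_or_under_12 <- lung[lung$Gender == "female" | lung$Age < 12, ]

# Sanity check: the two gender subsets should cover all rows
fem_data  <- lung[lung$Gender == "female", ]
male_data <- lung[lung$Gender == "male", ]
covers_all <- nrow(fem_data) + nrow(male_data) == nrow(lung)

print(male_over_15)
print(covers_all)
```

With brackets the column has to be qualified (lung$Gender) unless the data frame is attached; subset() looks the names up in the data frame itself, which is convenient for interactive work.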

This page

This page is inspired by Mike Marin's MarinStatsLectures video ‘Subsetting data in R…’. An RPubs version of this page is also available.

 

Carsten Grube

Freelance Data Analyst
