Subsetting data with square brackets
We use the LungCap dataset, stored here as LungCapData, which has 725 rows and 6 variables.
Read in data to R
We start by reading in the data. In this case, I read it in from Excel with the read_excel() function and then convert three of the variables to factors.
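A minimal sketch of this step, under stated assumptions: the file name LungCapData.xlsx is hypothetical, and the attach() call is my guess at how the later examples can refer to Age and Gender by bare name. If the file or the readxl package is missing, a small stand-in built from rows printed on this page is used so the sketch still runs.

```r
# Hypothetical file name; adjust to where your copy of the data lives
path <- "LungCapData.xlsx"

if (requireNamespace("readxl", quietly = TRUE) && file.exists(path)) {
  LungCapData <- readxl::read_excel(path)
} else {
  # Stand-in: four rows copied from the (rounded) output shown on this page
  LungCapData <- data.frame(
    LungCap   = c(11.5, 10.9, 6.52, 6),
    Age       = c(19, 17, 12, 10),
    Height    = c(76.4, 71.7, 57.5, 61.1),
    Smoke     = c("no", "no", "no", "no"),
    Gender    = c("male", "male", "male", "female"),
    Caesarean = c("yes", "no", "no", "no"),
    stringsAsFactors = FALSE
  )
}

# Convert the three categorical columns to factors
factor_cols <- c("Smoke", "Gender", "Caesarean")
LungCapData[factor_cols] <- lapply(LungCapData[factor_cols], factor)

# Assumption: attaching makes columns available by bare name (Age, Gender, ...)
attach(LungCapData)
```

Without attach(), the same columns can be reached as LungCapData$Age, LungCapData$Gender and so on.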
head()
The head() function shows the first rows of a dataset (six by default), which gives a quick overview of its rows and columns.
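As a small illustration, head() also takes an n argument, and tail() and str() are handy companions. The data frame toy below is a hypothetical stand-in built from eight rows printed (rounded) on this page, not the full dataset.

```r
# Stand-in data: eight rows copied from the rounded output on this page
toy <- data.frame(
  LungCap   = c(11.5, 10.9, 6.52, 6, 10.1, 9.55, 6.22, 10.0),
  Age       = c(19, 17, 12, 10, 18, 16, 11, 16),
  Height    = c(76.4, 71.7, 57.5, 61.1, 74.7, 69.7, 58.7, 72.4),
  Smoke     = factor(c("no", "no", "no", "no", "yes", "no", "no", "no")),
  Gender    = factor(c("male", "male", "male", "female",
                       "female", "female", "female", "male")),
  Caesarean = factor(c("yes", "no", "no", "no", "no", "yes", "no", "no"))
)

first_default <- head(toy)         # up to six rows by default
first_three   <- head(toy, n = 3)  # only the first three rows
last_two      <- tail(toy, n = 2)  # the last two rows

print(first_three)
str(toy)  # compact overview: column names, types and first values
```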
dim() and length()
# Dimensions of the dataset
dim(LungCapData)
## [1] 725 6
# Length of a vector or a variable
length(Age)
## [1] 725
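One thing worth knowing here: dim() returns rows first and columns second, while length() on a whole data frame counts its columns, not its rows. A short sketch on a hypothetical stand-in (toy, eight rows taken from the rounded output on this page):

```r
# Stand-in data: eight rows copied from the rounded output on this page
toy <- data.frame(
  LungCap   = c(11.5, 10.9, 6.52, 6, 10.1, 9.55, 6.22, 10.0),
  Age       = c(19, 17, 12, 10, 18, 16, 11, 16),
  Height    = c(76.4, 71.7, 57.5, 61.1, 74.7, 69.7, 58.7, 72.4),
  Smoke     = factor(c("no", "no", "no", "no", "yes", "no", "no", "no")),
  Gender    = factor(c("male", "male", "male", "female",
                       "female", "female", "female", "male")),
  Caesarean = factor(c("yes", "no", "no", "no", "no", "yes", "no", "no"))
)

dims   <- dim(toy)         # c(number of rows, number of columns)
n_rows <- nrow(toy)        # same as dims[1]
n_cols <- ncol(toy)        # same as dims[2]

len_df  <- length(toy)     # on a data frame: the number of columns
len_age <- length(toy$Age) # on a single column: the number of rows

print(dims)
print(c(len_df = len_df, len_age = len_age))
```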
Subsetting with brackets []
# Subsetting elements 11 to 14 of the Age vector
Age[11:14]
## [1] 19 17 12 10
# Subsetting a matrix or data frame with brackets: [rows, columns]
# Leave the position after the comma blank to include all columns
# Rows 11 to 14, including all columns
LungCapData[11:14, ]
## # A tibble: 4 x 6
## LungCap Age Height Smoke Gender Caesarean
## <dbl> <dbl> <dbl> <fct> <fct> <fct>
## 1 11.5 19 76.4 no male yes
## 2 10.9 17 71.7 no male no
## 3 6.52 12 57.5 no male no
## 4 6 10 61.1 no female no
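The column position after the comma can be filled in as well. The sketch below, on a hypothetical stand-in data frame toy (eight rows from the rounded output on this page), picks columns by name, drops rows with negative indices, and shows that a plain data frame returns a vector when a single column is selected. A tibble, as used on this page, would return a one-column tibble instead.

```r
# Stand-in data: eight rows copied from the rounded output on this page
toy <- data.frame(
  LungCap   = c(11.5, 10.9, 6.52, 6, 10.1, 9.55, 6.22, 10.0),
  Age       = c(19, 17, 12, 10, 18, 16, 11, 16),
  Height    = c(76.4, 71.7, 57.5, 61.1, 74.7, 69.7, 58.7, 72.4),
  Smoke     = factor(c("no", "no", "no", "no", "yes", "no", "no", "no")),
  Gender    = factor(c("male", "male", "male", "female",
                       "female", "female", "female", "male")),
  Caesarean = factor(c("yes", "no", "no", "no", "no", "yes", "no", "no"))
)

# Rows 1 to 4, only the Age and Height columns
age_height <- toy[1:4, c("Age", "Height")]

# Negative indices drop rows: everything except rows 1 and 2
without_first_two <- toy[-(1:2), ]

# A single column of a plain data frame comes back as a vector ...
age_vec <- toy[, "Age"]
# ... unless drop = FALSE is set, which keeps it a data frame
age_df <- toy[, "Age", drop = FALSE]

print(age_height)
```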
Subsetting a step further…
# A double equal sign (==) tests for equality; a single = assigns a value
# mean age for females
mean(Age[Gender=="female"])
## [1] 12.44972
# mean age for males
mean(Age[Gender=="male"])
## [1] 12.20708
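To see why this works, it helps to look at the comparison on its own: it produces a logical vector (TRUE/FALSE), and the brackets keep only the elements where it is TRUE. A sketch on two short hypothetical vectors (values taken from the rounded output on this page); tapply() is shown as one way to get both group means in a single call.

```r
# Short stand-in vectors; values copied from output shown on this page
Age    <- c(19, 17, 12, 10, 18, 16, 11, 16)
Gender <- factor(c("male", "male", "male", "female",
                   "female", "female", "female", "male"))

# The comparison returns one TRUE/FALSE per element
is_female <- Gender == "female"
print(is_female)

# TRUE counts as 1, so sum() gives the number of females
n_female <- sum(is_female)

# Brackets keep only the ages where the mask is TRUE
female_ages <- Age[is_female]
mean_female <- mean(female_ages)

# Both group means at once
means_by_gender <- tapply(Age, Gender, mean)
print(means_by_gender)
```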
Subset by gender
# Save the data for each gender into separate objects
FemData <- LungCapData[Gender=="female", ]
MaleData <- LungCapData[Gender=="male", ]
# Checking
dim(FemData)
## [1] 358 6
dim(MaleData)
## [1] 367 6
summary(Gender)
## female male
## 358 367
FemData[1:4, ]
## # A tibble: 4 x 6
## LungCap Age Height Smoke Gender Caesarean
## <dbl> <dbl> <dbl> <fct> <fct> <fct>
## 1 10.1 18 74.7 yes female no
## 2 9.55 16 69.7 no female yes
## 3 6.22 11 58.7 no female no
## 4 6 10 61.1 no female no
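An alternative worth knowing, sketched here on a hypothetical stand-in data frame toy (eight rows from the rounded output on this page): split() creates one data frame per factor level in a single step, and a quick row count confirms that no rows were lost or duplicated.

```r
# Stand-in data: eight rows copied from the rounded output on this page
toy <- data.frame(
  LungCap   = c(11.5, 10.9, 6.52, 6, 10.1, 9.55, 6.22, 10.0),
  Age       = c(19, 17, 12, 10, 18, 16, 11, 16),
  Height    = c(76.4, 71.7, 57.5, 61.1, 74.7, 69.7, 58.7, 72.4),
  Smoke     = factor(c("no", "no", "no", "no", "yes", "no", "no", "no")),
  Gender    = factor(c("male", "male", "male", "female",
                       "female", "female", "female", "male")),
  Caesarean = factor(c("yes", "no", "no", "no", "no", "yes", "no", "no"))
)

# One data frame per level of Gender, returned as a named list
by_gender <- split(toy, toy$Gender)
fem  <- by_gender$female
male <- by_gender$male

# Sanity check: the two pieces together cover every row exactly once
rows_match <- nrow(fem) + nrow(male) == nrow(toy)
print(rows_match)
```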
# Subset for males over 15
MaleOver15 <- LungCapData[Gender=="male" & Age>15, ]
# Checking
dim(MaleOver15)
## [1] 89 6
MaleOver15[1:4,]
## # A tibble: 4 x 6
## LungCap Age Height Smoke Gender Caesarean
## <dbl> <dbl> <dbl> <fct> <fct> <fct>
## 1 11.5 19 76.4 no male yes
## 2 10.9 17 71.7 no male no
## 3 10.0 16 72.4 no male no
## 4 11.3 17 77.7 no male no
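Conditions can be combined with & (and) and | (or). The sketch below uses a hypothetical stand-in data frame toy (eight rows from the rounded output on this page) and also shows base R's subset(), which expresses the same filter without repeating the data frame name.

```r
# Stand-in data: eight rows copied from the rounded output on this page
toy <- data.frame(
  LungCap   = c(11.5, 10.9, 6.52, 6, 10.1, 9.55, 6.22, 10.0),
  Age       = c(19, 17, 12, 10, 18, 16, 11, 16),
  Height    = c(76.4, 71.7, 57.5, 61.1, 74.7, 69.7, 58.7, 72.4),
  Smoke     = factor(c("no", "no", "no", "no", "yes", "no", "no", "no")),
  Gender    = factor(c("male", "male", "male", "female",
                       "female", "female", "female", "male")),
  Caesarean = factor(c("yes", "no", "no", "no", "no", "yes", "no", "no"))
)

# & keeps rows where BOTH conditions hold
males_over_15 <- toy[toy$Gender == "male" & toy$Age > 15, ]

# subset() evaluates the condition inside the data frame
same_with_subset <- subset(toy, Gender == "male" & Age > 15)

# | keeps rows where AT LEAST ONE condition holds
young_or_smoker <- toy[toy$Age < 12 | toy$Smoke == "yes", ]

print(males_over_15)
print(young_or_smoker)
```

One design note: if a column contains missing values, bracket subsetting returns NA rows for them, whereas subset() drops those rows.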
This page
This page is inspired by Mike Marin's MarinStatsLectures video ‘Subsetting data in R…’. View my Rpubs for this page here.

Carsten Grube
Freelance Data Analyst