Section 10 Subset Data

10.1 Subset a data.frame

Example data.frame:

x <- c(1:10); y <- rep(c('M','F'), each=5); z <- rep(c(T,F), length=10)

DF <- data.frame(Age=x, Sex=y, Vac=z)

Index             Explanation                                                     Example
$                 Get the named column as a vector                                DF$Age
[i, j]            Single square brackets with a row index i and a column index j  DF[1:2,3]
Positive integer  Select the elements at the given position(s) of that dimension  DF[2,]
Negative integer  Drop the elements at the given position(s) of that dimension    DF[-2,]; DF[,-2]
Zero              Select no elements                                              DF[0]
Blank             Select all elements of that dimension                           DF[2,]; DF[,2]
Logical values    Select the elements whose logical value is TRUE                 DF[,c(T,F,T)]
Names             Select the elements with the given names                        DF[,c('Age','Vac')]
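
To see what each index type actually selects, it can help to check the dimensions or column names of the result. The following is a minimal sketch using the example data.frame from this section; `length.out` is simply the full name of the argument abbreviated as `length` above.

```r
# Rebuild the example data.frame from this section
DF <- data.frame(Age = 1:10,
                 Sex = rep(c("M", "F"), each = 5),
                 Vac = rep(c(TRUE, FALSE), length.out = 10))

dim(DF[2, ])                       # positive integer: 1 row, 3 columns
dim(DF[-2, ])                      # negative integer: row 2 dropped, 9 rows left
dim(DF[0])                         # zero: 10 rows, 0 columns
names(DF[, c(TRUE, FALSE, TRUE)])  # logical values: "Age" "Vac"
names(DF[, c("Age", "Vac")])       # names: "Age" "Vac"
```

Comparing `dim()` before and after a subset is a quick way to confirm that an index did what was intended.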

10.2 Example

x <- c(1:10); y <- rep(c('M','F'), each=5); z <- rep(c(T,F), length=10)
DF <- data.frame(Age=x, Sex=y, Vac=z)

DF[2,]
DF[1:2,3]
DF[c(1,3),]

DF[,2]
DF[-2,]
DF[,-2]
DF[-c(1,3),]

DF[0]

DF[]

DF[c(T,F,T),]
DF[,c(T,F,T)]

DF$Age
DF[,'Age']
DF[,c('Age'), drop=F]

DF[,c('Age','Vac')]

DF[c(1:3),'Age']

DF[2]

DF['Age']

DF[DF['Age']>4,]
DF[DF$Age>4,]
DF[DF$Sex=='M',]
DF[DF$Vac==TRUE,]
DF[DF$Vac==T,]
DF[DF$Vac==1,]
DF[c(DF$Sex=='M' & DF$Vac==1),]
DF[c(DF$Age>=4 & DF$Vac==1),]
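
Several of the calls above select the same column but return different kinds of object. The sketch below, which assumes the same example data.frame, uses `class()` to show when the result stays a data.frame and when it is simplified to a vector.

```r
# Rebuild the example data.frame from this section
DF <- data.frame(Age = 1:10,
                 Sex = rep(c("M", "F"), each = 5),
                 Vac = rep(c(TRUE, FALSE), length.out = 10))

class(DF[2])                      # "data.frame": a single index keeps the data.frame
class(DF["Age"])                  # "data.frame"
class(DF[, "Age"])                # "integer": a single column is simplified to a vector
class(DF[, "Age", drop = FALSE])  # "data.frame": drop = FALSE prevents the simplification
class(DF$Age)                     # "integer"
```

Using `drop = FALSE` is a reasonable habit when later code expects a data.frame regardless of how many columns were selected.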

Note:

  • You can combine different index types to subset the data: DF[c(1:3),'Age']

  • When subset with a single index, a data.frame returns the selected column as a one-column data.frame, not as a vector: DF[2]

  • Some functions for handling NA in a data.frame can be used as follows: complete.cases(DF); order(DF$Age, na.last=FALSE); table(DF, useNA='always')
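
The example data.frame contains no missing values, so the NA functions in the last note have nothing to act on there. The sketch below uses a small hypothetical data.frame, `DF_na`, which is not part of the original example, to show what each call does.

```r
# Hypothetical data with two missing values (not the DF used above)
DF_na <- data.frame(Age = c(1:3, NA, 5),
                    Sex = c("M", "F", NA, "M", "F"))

complete.cases(DF_na)                # TRUE TRUE FALSE FALSE TRUE
DF_na[complete.cases(DF_na), ]       # keep only the rows without any NA
order(DF_na$Age, na.last = FALSE)    # 4 1 2 3 5: the NA position is sorted first
table(DF_na$Sex, useNA = "always")   # counts for F and M plus an NA category
```

Because `complete.cases()` returns a logical vector, it can be used directly as a row index, in the same way as the logical conditions in 10.2.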