Section 43 More Data Management


43.1 Data

set.seed(123)  # make the random draws reproducible

# Five subjects with a randomly drawn sex and three measurements (Ht1 to Ht3)
ID <- paste0('S',1:5)
Sex <- sample(x = c('M','F'), size = length(ID), replace = TRUE)
Ht1 <- sample(45:55, size = length(ID), replace = FALSE)
Ht2 <- sample(60:70, size = length(ID), replace = TRUE)
Ht3 <- sample(75:85, size = length(ID), replace = TRUE)

DF <- data.frame(ID=ID, Sex=Sex, Ht1=Ht1, Ht2=Ht2, Ht3=Ht3)

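# Introduce some missing values, indexed as [row, column]
DF[2,2] <- NA   # Sex of S2
DF[3,3] <- NA   # Ht1 of S3
DF[4,5] <- NA   # Ht3 of S4
DF[5,4] <- NA   # Ht2 of S5

# Append a copy of row 5 so that ID S5 appears twice
DF <- rbind(DF, DF[5,])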

# DF


43.2 duplicated

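# TRUE for every row that repeats an earlier row in full
index <- duplicated(DF)

index

# The duplicated rows
DF[index,]

# The data with the duplicated rows removed
DF[!index,]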
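
Applied to a data frame, duplicated compares whole rows. When duplicates should be judged on a key column only, the same function can be applied to that column. A minimal sketch, assuming a small hypothetical data frame `toy` (not the DF above) in which one ID is repeated with a different measurement:

```r
# Hypothetical data: ID 'S2' appears twice, with different heights
toy <- data.frame(ID = c('S1', 'S2', 'S2', 'S3'),
                  Ht = c(50, 52, 53, 55))

# Whole-row comparison: no row repeats another in full
any_row_dup <- any(duplicated(toy))

# Key-only comparison: flags the second occurrence of 'S2'
id_dup <- duplicated(toy$ID)

# fromLast = TRUE flags the earlier occurrence instead
id_dup_last <- duplicated(toy$ID, fromLast = TRUE)

# Keep the first record for each ID
toy_first <- toy[!id_dup, ]
```

Which record to keep (first or last) is a decision about the data rather than the code, so it is worth inspecting the flagged rows before dropping any of them.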


43.3 unique

# The distinct IDs
unique(DF$ID)

# The number of distinct IDs
length(unique(DF$ID))
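
unique also accepts a whole data frame, in which case it returns the distinct rows. A minimal sketch, assuming a hypothetical data frame `toy` introduced only for illustration:

```r
# Hypothetical data: the third row repeats the second in full
toy <- data.frame(ID = c('S1', 'S2', 'S2'),
                  Ht = c(50, 52, 52))

# Distinct rows; the first occurrence of each row is kept
toy_unique <- unique(toy)

# The same rows obtained by negating duplicated()
toy_nodup <- toy[!duplicated(toy), ]
```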


43.4 is.na

# TRUE where Sex is missing
index <- is.na(DF$Sex)

# Keep only the rows with a recorded Sex
DF[!index,]
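
Because is.na returns logical values, it can also be used to count missing values. A minimal sketch, assuming a hypothetical data frame `toy` that stands in for real data:

```r
# Hypothetical data with three missing values
toy <- data.frame(ID  = c('S1', 'S2', 'S3'),
                  Sex = c('M', NA, 'F'),
                  Ht  = c(50, NA, NA))

# is.na() on a data frame gives a logical matrix; colSums() counts TRUEs per column
na_per_column <- colSums(is.na(toy))

# Total number of missing cells
na_total <- sum(is.na(toy))

# Row positions where Sex is missing
rows_missing_sex <- which(is.na(toy$Sex))
```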


43.5 na.omit

# Drop every row that contains at least one NA
DF1 <- na.omit(DF)
DF1
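
na.omit also records which rows it removed, in the "na.action" attribute of the result. A minimal sketch, assuming a hypothetical data frame `toy`:

```r
# Hypothetical data: rows 2 and 3 each contain a missing value
toy <- data.frame(ID  = c('S1', 'S2', 'S3'),
                  Sex = c('M', NA, 'F'),
                  Ht  = c(50, 51, NA))

# Rows with any NA are removed
clean <- na.omit(toy)

# Positions of the removed rows, stored on the result
dropped <- attr(clean, "na.action")
dropped_rows <- as.integer(dropped)
```

Checking the dropped rows is a quick way to see how much data a listwise deletion costs before relying on the reduced data frame.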


43.6 complete.cases

# TRUE for rows with no missing value in any column
index <- complete.cases(DF)
index
DF[index,]   # the same rows that na.omit(DF) keeps
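
complete.cases can be restricted to the columns that matter for an analysis, so that a missing value elsewhere does not discard the row. A minimal sketch, assuming a hypothetical data frame `toy` and an analysis that only needs the two height columns:

```r
# Hypothetical data: S2 lacks Sex, S3 lacks Ht1
toy <- data.frame(ID  = c('S1', 'S2', 'S3'),
                  Sex = c('M', NA, 'F'),
                  Ht1 = c(50, 51, NA),
                  Ht2 = c(60, 61, 62))

# Completeness judged on the height columns only
ok <- complete.cases(toy[, c('Ht1', 'Ht2')])

# S2 is kept despite the missing Sex; S3 is dropped
toy_ht <- toy[ok, ]
```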


43.7 Exercise

  • In statistics, bootstrapping is a technique that relies on random sampling with replacement. It is used to estimate the sampling distribution of a statistic. Identify how we can implement the bootstrapping strategy in R.

  • The sampling distribution under the null hypothesis can be obtained by a permutation test (or randomisation test), in which we resample the observed data without replacement. Identify how we can implement the permutation test strategy in R.
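
Both ideas in the exercise can be built on sample(): with replace = TRUE it draws a bootstrap resample, and with its default replace = FALSE it shuffles the values for a permutation test. The following is one possible starting point rather than a full solution. The vectors `x`, `g1`, `g2`, the number of resamples `B`, and the choice of the mean as the statistic are all assumptions made for illustration.

```r
set.seed(1)  # reproducible resampling

# Hypothetical data (not the DF used in this section)
x  <- c(48, 50, 51, 53, 55, 47, 52, 54)  # one sample, for the bootstrap
g1 <- c(48, 50, 51, 53)                  # group 1, for the permutation test
g2 <- c(55, 57, 52, 54)                  # group 2
B  <- 2000                               # number of resamples (arbitrary choice)

# Bootstrap: resample x with replacement and recompute the mean each time
boot_means <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))
boot_se <- sd(boot_means)                         # bootstrap standard error
boot_ci <- quantile(boot_means, c(0.025, 0.975))  # percentile interval

# Permutation test: shuffle the pooled values (no replacement),
# split them into groups of the original sizes, and recompute the difference
pooled   <- c(g1, g2)
n1       <- length(g1)
obs_diff <- mean(g1) - mean(g2)
perm_diff <- replicate(B, {
  shuffled <- sample(pooled)
  mean(shuffled[1:n1]) - mean(shuffled[-(1:n1)])
})

# Two-sided p-value: share of shuffles at least as extreme as the observed difference
p_value <- mean(abs(perm_diff) >= abs(obs_diff))
```

The bootstrap resamples within one sample to describe the variability of an estimate, whereas the permutation test reassigns group labels to describe what differences would arise if the groups did not differ.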