Section 43 More Data Management

43.1 Data

# Fix the seed so the simulated data are reproducible
set.seed(123)

# Five subjects, a randomly drawn sex, and three height measurements (Ht1 to Ht3)
ID <- paste0('S', 1:5)
Sex <- sample(x = c('M', 'F'), size = length(ID), replace = TRUE)
Ht1 <- sample(45:55, size = length(ID), replace = FALSE)
Ht2 <- sample(60:70, size = length(ID), replace = TRUE)
Ht3 <- sample(75:85, size = length(ID), replace = TRUE)

DF <- data.frame(ID = ID, Sex = Sex, Ht1 = Ht1, Ht2 = Ht2, Ht3 = Ht3)

# Some missing values (columns referenced by name rather than by position)
DF[2, 'Sex'] <- NA
DF[3, 'Ht1'] <- NA
DF[4, 'Ht3'] <- NA
DF[5, 'Ht2'] <- NA

# A duplicate record: append a copy of row 5, so ID S5 appears twice
DF <- rbind(DF, DF[5,])

# DF

43.2 `duplicated`

# TRUE for every row that exactly repeats an earlier row
index <- duplicated(DF)

index

DF[index,]

DF[!index,]
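Applied to a whole data frame, `duplicated` only flags rows that match an earlier row in every column. In practice a subject is often repeated with slightly different values, in which case it can help to check the key column on its own. The sketch below uses a small hypothetical data frame (`toy`, with made-up heights) rather than `DF`.

```r
# Hypothetical toy data: S5 appears twice, but with different heights
toy <- data.frame(ID  = c("S1", "S2", "S5", "S5"),
                  Ht1 = c(50, 47, 52, 53))

# No row is an exact copy of an earlier row, so nothing is flagged
duplicated(toy)
# FALSE FALSE FALSE FALSE

# Checking the ID column alone flags the second S5
dup_id <- duplicated(toy$ID)
dup_id
# FALSE FALSE FALSE TRUE

# Keep the first record for each ID
toy[!dup_id, ]

# fromLast = TRUE scans from the end, so the last record for each ID is kept
toy[!duplicated(toy$ID, fromLast = TRUE), ]
```

Whether the first or the last record should be kept depends on how the data were collected, so this choice is worth making explicitly.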

43.3 `unique`

# The distinct IDs, in order of first appearance
unique(DF$ID)

length(unique(DF$ID))
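`unique` also accepts a data frame, in which case it drops repeated rows. To see which values are repeated, and not only how many distinct values exist, `table` can be used. A minimal sketch on a hypothetical data frame (`toy`, with made-up values):

```r
# Hypothetical toy data: the S2 record is entered twice
toy <- data.frame(ID  = c("S1", "S2", "S2"),
                  Ht1 = c(50, 47, 47))

# unique() on a data frame keeps one copy of each distinct row
toy_unique <- unique(toy)
nrow(toy_unique)
# 2

# Count how often each ID occurs, then list the IDs seen more than once
id_counts <- table(toy$ID)
repeated_ids <- names(id_counts[id_counts > 1])
repeated_ids
# "S2"
```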

43.4 `is.na`

# TRUE where Sex is missing
index <- is.na(DF$Sex)

DF[!index,]
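`is.na` can also be applied to an entire data frame, which gives a quick count of missing values per column. The sketch below uses a hypothetical data frame (`toy`) and also shows why a comparison with `==` cannot be used to find missing values.

```r
# Hypothetical toy data with one missing Sex and one missing Ht1
toy <- data.frame(ID  = c("S1", "S2", "S3"),
                  Sex = c("M", NA, "F"),
                  Ht1 = c(50, 47, NA))

# is.na() on a data frame returns a logical matrix of the same shape
na_matrix <- is.na(toy)

# Number of missing values in each column (ID: 0, Sex: 1, Ht1: 1)
na_per_col <- colSums(na_matrix)
na_per_col

# Row positions where Sex is missing
which(is.na(toy$Sex))
# 2

# Comparing with NA yields NA for every element, so it cannot locate missing values
toy$Sex == NA
# NA NA NA
```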

43.5 `na.omit`

# Drop every row that contains at least one NA
DF1 <- na.omit(DF)
DF1
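The result of `na.omit` keeps a record of what was removed in its `na.action` attribute, which is useful for reporting how many rows were lost. A small sketch on a hypothetical data frame (`toy`):

```r
# Hypothetical toy data: rows 2 and 3 each contain a missing value
toy <- data.frame(ID  = c("S1", "S2", "S3"),
                  Sex = c("M", NA, "F"),
                  Ht1 = c(50, 47, NA))

clean <- na.omit(toy)
nrow(clean)
# 1

# Positions of the rows that were dropped
dropped <- as.integer(attr(clean, "na.action"))
dropped
# 2 3
```

Because a single `NA` anywhere in a row removes the whole row, `na.omit` can discard a lot of data when missing values are spread across many columns.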

43.6 `complete.cases`

# TRUE for rows with no missing value in any column
index <- complete.cases(DF)
index
DF[index,]
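Unlike `na.omit`, `complete.cases` returns a logical vector, so it can be restricted to the columns that matter for a particular analysis. The sketch below, on a hypothetical data frame (`toy`), keeps rows that are complete in `ID` and `Ht1` even if `Sex` is missing.

```r
# Hypothetical toy data with one missing Sex and one missing Ht1
toy <- data.frame(ID  = c("S1", "S2", "S3"),
                  Sex = c("M", NA, "F"),
                  Ht1 = c(50, 47, NA))

# Only ID and Ht1 are checked, so the missing Sex in row 2 is ignored
keep <- complete.cases(toy[, c("ID", "Ht1")])
keep
# TRUE TRUE FALSE

toy[keep, ]
```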

43.7 Exercise

In statistics, bootstrapping is a technique that relies on random sampling with replacement. It is used to estimate the sampling distribution of a statistic. Identify how we can implement the bootstrapping strategy in R.
The sampling distribution of a test statistic under the null hypothesis can be obtained by a permutation test (or randomisation test), in which we resample the observed data without replacement. Identify how we can implement the permutation test strategy in R.
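As one possible starting point for both exercises, the sketch below builds each procedure on `sample`, which was already used to simulate the data in this section. The two groups `x` and `y`, the number of resamples `B`, and the choice of the mean as the statistic are all assumptions made for illustration; other statistics and designs would need their own treatment.

```r
set.seed(1)

# Hypothetical heights for two groups (made-up values)
x <- c(50, 47, 52, 55, 49, 51)
y <- c(53, 56, 54, 58, 52, 57)

B <- 2000  # number of resamples (an arbitrary choice)

# Bootstrap: resample x WITH replacement and recompute the mean each time
boot_means <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))

sd(boot_means)                         # bootstrap standard error of the mean
quantile(boot_means, c(0.025, 0.975))  # percentile interval

# Permutation test: shuffle the pooled data WITHOUT replacement, which
# reassigns observations to groups as if the group labels were exchangeable
pooled   <- c(x, y)
n_x      <- length(x)
obs_diff <- mean(y) - mean(x)

perm_diff <- replicate(B, {
  shuffled <- sample(pooled)  # a random permutation of the pooled data
  mean(shuffled[-seq_len(n_x)]) - mean(shuffled[seq_len(n_x)])
})

# Two-sided p-value: share of permuted differences at least as extreme as observed
p_value <- mean(abs(perm_diff) >= abs(obs_diff))
p_value
```

The only difference in the resampling step is `replace = TRUE` for the bootstrap versus the default `replace = FALSE` for the permutation test.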