Section 44 Model Selection: Backward Stepwise Selection
44.1 Statistical Model
\[ \large y_{i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + ... + \beta_p x_{pi} + \epsilon_{i} \]
\[ i = 1,...,n; \space p = \space number \space of \space predictors \]
44.2 Steps
- Fit a Full model \(M_p\) which contains all \(p\) predictors. 
- For \(k = p,p-1,...,1\) - Consider all \(k\) models that contain all but one of the predictors in \(M_k\), for a total of \(k-1\) predictors. 
- Choose the best model \(M_{k-1}\) among these \(k\) models based on the smallest RSS, or equivalently, largest \(R^2\). 
 
- Select a single best model from among \(M_0, M_1, ..., M_p\) models using the criteria Adjusted \(R^2\), Mallow’s Cp, AIC or BIC. 
44.4 R implementation
library(leaps)
BP <- read.csv('data/BP.csv')
# Remove NA values from all variables
BP <- na.omit(BP)
BP$DM <- as.factor(BP$DM)
# Subset the data
BP <- BP[,-1]
# Backward stepwise selection
bwd.fm <- regsubsets(SBP ~ ., data=BP, 
                      nbest=1, nvmax=6, intercept=TRUE,
                      method='backward')
bwd.summary <- summary(bwd.fm)
bwd.summarySubset selection object
Call: regsubsets.formula(SBP ~ ., data = BP, nbest = 1, nvmax = 6, 
    intercept = TRUE, method = "backward")
6 Variables  (and intercept)
                Forced in Forced out
EthnicAsian         FALSE      FALSE
EthnicCaucasian     FALSE      FALSE
Age                 FALSE      FALSE
Income              FALSE      FALSE
BMI                 FALSE      FALSE
DM2                 FALSE      FALSE
1 subsets of each size up to 6
Selection Algorithm: backward
         EthnicAsian EthnicCaucasian Age Income BMI DM2
1  ( 1 ) " "         " "             " " " "    "*" " "
2  ( 1 ) " "         " "             " " " "    "*" "*"
3  ( 1 ) " "         " "             "*" " "    "*" "*"
4  ( 1 ) " "         " "             "*" "*"    "*" "*"
5  ( 1 ) " "         "*"             "*" "*"    "*" "*"
6  ( 1 ) "*"         "*"             "*" "*"    "*" "*"    (Intercept)     EthnicAsian EthnicCaucasian             Age          Income             BMI             DM2 
   -3.858060268    -0.002753277    -0.013373729     0.903319577    -0.003045831     2.348833049     4.073927322 
# Customised plot
par(mfrow=c(2,2))
xlab <- 'Number of X-variables'
plot(x = bwd.summary$rss, 
     xlab=xlab, ylab='RSS', 
     type='l', col='blue')
index <- which.min(bwd.summary$rss)
points(x = index, y = bwd.summary$rss[index], 
       col='red', cex=2, pch=20)
plot(x=bwd.summary$adjr2, 
     xlab=xlab, ylab='Adjusted R2', 
     type='l', col='blue')
index <- which.max(bwd.summary$adjr2)
points(x = index, y = bwd.summary$adjr2[index], 
       col='red', cex=2, pch=20)
plot(x=bwd.summary$cp, 
     xlab=xlab, ylab='Mallow Cp', 
     type='l', col='blue')
index <- which.min(bwd.summary$cp)
points(x = index, y = bwd.summary$cp[index], 
       col='red', cex=2, pch=20)
plot(x=bwd.summary$bic, 
     xlab=xlab, ylab='BIC', 
     type='l', col='blue')
index <- which.min(bwd.summary$bic)
points(x = index, y = bwd.summary$bic[index], 
       col='red', cex=2, pch=20)
(Intercept)         Age         BMI         DM2 
 -3.9018887   0.9022749   2.3487465   4.0731999 
44.3 Comments
Like forward stepwise selection, this approach fits \(1 + p(p+1)/2\) models.
The backward stepwise selection also provides an efficient alternative to best subset selection.
All models within backward selection approach are nested.
However, it It is not guaranteed to find the best possible model out of all \(2^p\) models containing subsets of the \(p\) predictors.
Backward selection requires that the number of samples \(n\) is larger than the number of variables \(p\) (\(n >> p\)).