Section 43 Model Selection: Forward Stepwise Selection


43.1 Statistical Model

\[ \large y_{i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + ... + \beta_p x_{pi} + \epsilon_{i} \]

\[ i = 1,...,n; \space p = \space number \space of \space predictors \]


43.2 Steps

  1. Fit a Null model \(M_0\) which contains no predictors. This model simply predicts the sample mean for each observation.

  2. For \(k = 1,2,...,p\)

    1. Fit all \(p-k\) models that that augment the predictors in \(M_k\) with one additional predictor.

    2. Choose the best model \(M_k\) among these \(p-k\) models based on the smallest RSS, or equivalently, largest \(R^2\).

  3. Select a single best model from among \(M_0, M_1, ..., M_p\) models using the criteria Adjusted \(R^2\), Mallow’s Cp, AIC or BIC.

43.3 Comments

  • This approach fits \(1 + p(p+1)/2\) models.

  • This approach is computationally efficient.

  • All models within forward selection approach are nested.

  • However, it It is not guaranteed to find the best possible model out of all \(2^p\) models containing subsets of the \(p\) predictors.

  • Forward selection can be implemented in scenarios when the number of samples \(n\) is smaller than the number of variables \(p\) (\(n < p\)).


43.4 R implementation


library(leaps)

BP <- read.csv('data/BP.csv')
# Remove NA values from all variables
BP <- na.omit(BP)
BP$DM <- as.factor(BP$DM)

# Subset the data
BP <- BP[,-1]

# Forward selection
fwd.fm <- regsubsets(SBP ~ ., data=BP, 
                      nbest=1, nvmax=6, intercept=TRUE,
                      method='forward')
fwd.summary <- summary(fwd.fm)
# names(fwd.summary)
fwd.summary
Subset selection object
Call: regsubsets.formula(SBP ~ ., data = BP, nbest = 1, nvmax = 6, 
    intercept = TRUE, method = "forward")
6 Variables  (and intercept)
                Forced in Forced out
EthnicAsian         FALSE      FALSE
EthnicCaucasian     FALSE      FALSE
Age                 FALSE      FALSE
Income              FALSE      FALSE
BMI                 FALSE      FALSE
DM2                 FALSE      FALSE
1 subsets of each size up to 6
Selection Algorithm: forward
         EthnicAsian EthnicCaucasian Age Income BMI DM2
1  ( 1 ) " "         " "             " " " "    "*" " "
2  ( 1 ) " "         " "             " " " "    "*" "*"
3  ( 1 ) " "         " "             "*" " "    "*" "*"
4  ( 1 ) " "         " "             "*" "*"    "*" "*"
5  ( 1 ) " "         "*"             "*" "*"    "*" "*"
6  ( 1 ) "*"         "*"             "*" "*"    "*" "*"
coef(fwd.fm, id=6)
    (Intercept)     EthnicAsian EthnicCaucasian             Age          Income             BMI             DM2 
   -3.858060268    -0.002753277    -0.013373729     0.903319577    -0.003045831     2.348833049     4.073927322 
# Plot of forward selection
plot(fwd.fm, scale='bic')

# Customised plot

par(mfrow=c(2,2))
xlab <- 'Number of X-variables'

plot(x = fwd.summary$rss, 
     xlab=xlab, ylab='RSS', 
     type='l', col='blue')
index <- which.min(fwd.summary$rss)
points(x = index, y = fwd.summary$rss[index], 
       col='red', cex=2, pch=20)

plot(x=fwd.summary$adjr2, 
     xlab=xlab, ylab='Adjusted R2', 
     type='l', col='blue')
index <- which.max(fwd.summary$adjr2)
points(x = index, y = fwd.summary$adjr2[index], 
       col='red', cex=2, pch=20)

plot(x=fwd.summary$cp, 
     xlab=xlab, ylab='Mallow Cp', 
     type='l', col='blue')
index <- which.min(fwd.summary$cp)
points(x = index, y = fwd.summary$cp[index], 
       col='red', cex=2, pch=20)

plot(x=fwd.summary$bic, 
     xlab=xlab, ylab='BIC', 
     type='l', col='blue')
index <- which.min(fwd.summary$bic)
points(x = index, y = fwd.summary$bic[index], 
       col='red', cex=2, pch=20)

coef(fwd.fm, id=3)
(Intercept)         Age         BMI         DM2 
 -3.9018887   0.9022749   2.3487465   4.0731999