Section 45 Model Selection: Stepwise Selection
45.1 Statistical Model
\[ \large y_{i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + ... + \beta_p x_{pi} + \epsilon_{i} \]
\[ i = 1, \ldots, n; \quad p = \text{number of predictors} \]
45.2 Model fitted by lm
\[ \large fm \leftarrow lm(SBP \sim BMI + Age + Income + DM + Ethnic, \space data=BP) \]
45.3 Stepwise Selection: Both (Forward & Backward)
\[ \large step(fm, \space direction='both', \space trace=TRUE) \]
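The call above can be tried without the original data. The following is a minimal sketch, assuming a simulated stand-in for the `BP` data frame: the coefficients, the noise level, and the `"Reference"` level of `Ethnic` are invented for illustration, so the numbers it prints will not match the output shown in this section.

```r
# Simulated stand-in for the BP data; every value here is made up.
set.seed(1)
n <- 500
BP <- data.frame(
  BMI    = rnorm(n, mean = 27, sd = 4),
  Age    = rnorm(n, mean = 45, sd = 2),
  Income = rnorm(n, mean = 50, sd = 10),
  DM     = factor(sample(1:2, n, replace = TRUE)),
  Ethnic = factor(sample(c("Reference", "Asian", "Caucasian"), n, replace = TRUE),
                  levels = c("Reference", "Asian", "Caucasian"))
)
# SBP depends on BMI, Age and DM only; Income and Ethnic are pure noise.
BP$SBP <- 2.3 * BP$BMI + 0.9 * BP$Age + 4 * (BP$DM == "2") + rnorm(n, sd = 1.7)

# Full model, then stepwise selection in both directions.
fm  <- lm(SBP ~ BMI + Age + Income + DM + Ethnic, data = BP)
sel <- step(fm, direction = "both", trace = FALSE)

extractAIC(fm)   # c(edf, AIC): the criterion step() minimises
formula(sel)     # formula of the selected model
sel$anova        # one row per step taken
```

Setting `trace = TRUE` instead prints the candidate table at each step, in the same layout as the trace shown in this section.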
45.3.2 Fit the full model
Call:
lm(formula = SBP ~ BMI + Age + Income + DM + Ethnic, data = BP)
Residuals:
    Min      1Q  Median      3Q     Max 
-6.3481 -1.0811 -0.0386  1.1131  4.8460 
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -3.858060   4.413360  -0.874    0.382    
BMI              2.348833   0.045416  51.718   <2e-16 ***
Age              0.903320   0.100141   9.021   <2e-16 ***
Income          -0.003046   0.006982  -0.436    0.663    
DM2              4.073927   0.150422  27.083   <2e-16 ***
EthnicAsian     -0.002753   0.187405  -0.015    0.988    
EthnicCaucasian -0.013374   0.185495  -0.072    0.943    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.676 on 493 degrees of freedom
Multiple R-squared:  0.9267,    Adjusted R-squared:  0.9258 
F-statistic:  1038 on 6 and 493 DF,  p-value: < 2.2e-16
45.3.3 Use the step function
Start:  AIC=523.41
SBP ~ BMI + Age + Income + DM + Ethnic
         Df Sum of Sq    RSS     AIC
- Ethnic  2       0.0 1385.0  519.41
- Income  1       0.5 1385.5  521.60
<none>                1385.0  523.41
- Age     1     228.6 1613.5  597.79
- DM      1    2060.6 3445.5  977.11
- BMI     1    7514.0 8898.9 1451.54
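The AIC column can be checked by hand. For a linear model, step() relies on extractAIC(), which computes AIC up to an additive constant from the residual sum of squares, with k the number of estimated coefficients. Taking n = 500 (493 residual degrees of freedom plus 7 coefficients) and the rounded RSS values printed in the table, a rough check gives:

\[ \large AIC = n \ln(RSS/n) + 2k \]
\[ \text{Start } (k = 7): \quad 500 \ln(1385.0/500) + 14 \approx 509.42 + 14 = 523.42 \]
\[ \text{drop Ethnic } (k = 5): \quad 500 \ln(1385.0/500) + 10 \approx 509.42 + 10 = 519.42 \]
\[ \text{drop Income } (k = 6): \quad 500 \ln(1385.5/500) + 12 \approx 509.60 + 12 = 521.60 \]

The small gaps from the printed 523.41 and 519.41 are consistent with the RSS being rounded to one decimal place in the table. Dropping Ethnic saves two coefficients while leaving the RSS essentially unchanged, which is why it gives the lowest AIC and is removed first.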
Step:  AIC=519.41
SBP ~ BMI + Age + Income + DM
         Df Sum of Sq    RSS     AIC
- Income  1       0.5 1385.5  517.61
<none>                1385.0  519.41
+ Ethnic  2       0.0 1385.0  523.41
- Age     1     229.0 1614.0  593.93
- DM      1    2061.8 3446.7  973.29
- BMI     1    7570.2 8955.2 1450.69
Step:  AIC=517.61
SBP ~ BMI + Age + DM
         Df Sum of Sq    RSS     AIC
<none>                1385.5  517.61
+ Income  1       0.5 1385.0  519.41
+ Ethnic  2       0.0 1385.5  521.60
- Age     1     228.6 1614.1  591.96
- DM      1    2061.2 3446.8  971.29
- BMI     1    7570.1 8955.6 1448.71
45.3.4 Summary of the final model
Call:
lm(formula = SBP ~ BMI + Age + DM, data = BP)
Residuals:
    Min      1Q  Median      3Q     Max 
-6.3145 -1.0681 -0.0121  1.0920  4.8730 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.90189    4.39295  -0.888    0.375    
BMI          2.34875    0.04512  52.058   <2e-16 ***
Age          0.90227    0.09974   9.046   <2e-16 ***
DM2          4.07320    0.14995  27.165   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.671 on 496 degrees of freedom
Multiple R-squared:  0.9266,    Adjusted R-squared:  0.9262 
F-statistic:  2088 on 3 and 496 DF,  p-value: < 2.2e-16
45.4 Stepwise Selection: Nested Model
\[ \large fm1 \leftarrow lm(SBP \sim BMI + Age + Income + DM + Ethnic, \space data=BP) \]
\[ \large anova(fm1) \]
\[ \large fm2 \leftarrow update(fm1, \space . \sim . \space -Ethnic, \space data=BP) \]
\[ \large anova(fm1, \space fm2) \]
Analysis of Variance Table
Response: SBP
           Df  Sum Sq Mean Sq   F value Pr(>F)    
BMI         1 15103.0 15103.0 5376.2201 <2e-16 ***
Age         1   335.4   335.4  119.3934 <2e-16 ***
Income      1     0.0     0.0    0.0043 0.9478    
DM          1  2061.8  2061.8  733.9299 <2e-16 ***
Ethnic      2     0.0     0.0    0.0030 0.9970    
Residuals 493  1385.0     2.8                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Variance Table
Model 1: SBP ~ BMI + Age + Income + DM + Ethnic
Model 2: SBP ~ BMI + Age + Income + DM
  Res.Df  RSS Df Sum of Sq     F Pr(>F)
1    493 1385                          
2    495 1385 -2 -0.016782 0.003  0.997
Analysis of Variance Table
Response: SBP
           Df  Sum Sq Mean Sq   F value Pr(>F)    
BMI         1 15103.0 15103.0 5397.9649 <2e-16 ***
Age         1   335.4   335.4  119.8763 <2e-16 ***
Income      1     0.0     0.0    0.0043 0.9477    
DM          1  2061.8  2061.8  736.8983 <2e-16 ***
Residuals 495  1385.0     2.8                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Variance Table
Model 1: SBP ~ BMI + Age + Income + DM
Model 2: SBP ~ BMI + Age + DM
  Res.Df    RSS Df Sum of Sq     F Pr(>F)
1    495 1385.0                          
2    496 1385.5 -1  -0.53993 0.193 0.6606
Analysis of Variance Table
Response: SBP
           Df  Sum Sq Mean Sq F value    Pr(>F)    
BMI         1 15103.0 15103.0 5406.76 < 2.2e-16 ***
Age         1   335.4   335.4  120.07 < 2.2e-16 ***
DM          1  2061.2  2061.2  737.91 < 2.2e-16 ***
Residuals 496  1385.5     2.8                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
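The F statistics in the two model-comparison tables above can be reproduced from the printed residual sums of squares and degrees of freedom. As a worked check, with the larger model labelled "full" and the smaller one "reduced":

\[ \large F = \frac{(RSS_{reduced} - RSS_{full}) / (df_{reduced} - df_{full})}{RSS_{full} / df_{full}} \]
\[ \text{Ethnic:} \quad F = \frac{0.016782 / 2}{1385.0 / 493} \approx \frac{0.00839}{2.809} \approx 0.003 \]
\[ \text{Income:} \quad F = \frac{0.53993 / 1}{1385.0 / 495} \approx \frac{0.53993}{2.798} \approx 0.193 \]

Both values agree with the tables. With p-values of 0.997 and 0.6606, neither Ethnic nor Income improves the fit significantly, which matches the order in which step() removed them.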
Call:
lm(formula = SBP ~ BMI + Age + DM, data = BP)
Residuals:
    Min      1Q  Median      3Q     Max 
-6.3145 -1.0681 -0.0121  1.0920  4.8730 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.90189    4.39295  -0.888    0.375    
BMI          2.34875    0.04512  52.058   <2e-16 ***
Age          0.90227    0.09974   9.046   <2e-16 ***
DM2          4.07320    0.14995  27.165   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.671 on 496 degrees of freedom
Multiple R-squared:  0.9266,    Adjusted R-squared:  0.9262 
F-statistic:  2088 on 3 and 496 DF,  p-value: < 2.2e-16
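The tables in this section also cover a model without Income, for which no command is listed. The following is a minimal sketch of the complete sequence of nested comparisons, assuming a simulated stand-in for `BP` (all values invented, including the `"Reference"` level of `Ethnic`) and the hypothetical name `fm3` for the third model; its printed numbers will differ from those above.

```r
# Simulated stand-in for the BP data; every value here is made up.
set.seed(2)
n <- 500
BP <- data.frame(
  BMI    = rnorm(n, mean = 27, sd = 4),
  Age    = rnorm(n, mean = 45, sd = 2),
  Income = rnorm(n, mean = 50, sd = 10),
  DM     = factor(sample(1:2, n, replace = TRUE)),
  Ethnic = factor(sample(c("Reference", "Asian", "Caucasian"), n, replace = TRUE),
                  levels = c("Reference", "Asian", "Caucasian"))
)
BP$SBP <- 2.3 * BP$BMI + 0.9 * BP$Age + 4 * (BP$DM == "2") + rnorm(n, sd = 1.7)

fm1 <- lm(SBP ~ BMI + Age + Income + DM + Ethnic, data = BP)
fm2 <- update(fm1, . ~ . - Ethnic)   # drop Ethnic
fm3 <- update(fm2, . ~ . - Income)   # drop Income (fm3 is a hypothetical name)

anova(fm1, fm2)   # partial F-test for Ethnic (2 df)
anova(fm2, fm3)   # partial F-test for Income (1 df)
summary(fm3)      # final model: SBP ~ BMI + Age + DM

# Partial F statistic for Ethnic, computed by hand from the two fits.
rss1 <- deviance(fm1)
rss2 <- deviance(fm2)
F_hand <- ((rss2 - rss1) / (df.residual(fm2) - df.residual(fm1))) /
  (rss1 / df.residual(fm1))
F_hand
```

Here `deviance()` returns the residual sum of squares of an `lm` fit, so `F_hand` should agree with the F value reported by `anova(fm1, fm2)`.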