Section 21 MLR: ANOVA

Multiple Linear Regression: Analysis of Variance

21.1 Analysis of Variance Table


\[ \large fm \leftarrow lm(SBP \sim BMI + Age, \space data=BP) \]

\[ \large anova(fm) \]


            Df      Sum Sq     Mean Sq    F value   Pr(>F)
BMI          1  15103.0386  15103.0386   2177.760        0
Age          1    335.4035    335.4035     48.363        0
Residuals  497   3446.7570      6.9351         NA       NA
Total      499  18885.1991          NA         NA       NA
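A minimal sketch of how a table like this, including the Total row, might be produced. `anova()` itself prints no Total row, so the sketch appends one from the column sums. The BP data set is not available here, so the simulated values below are a hypothetical stand-in and will not reproduce the numbers above.

```r
# Simulated stand-in for the BP data (hypothetical values, not the original data)
set.seed(1)
n  <- 500
BP <- data.frame(BMI = rnorm(n, 25, 4), Age = rnorm(n, 50, 10))
BP$SBP <- 90 + 1.2 * BP$BMI + 0.1 * BP$Age + rnorm(n, sd = 2.5)

fm  <- lm(SBP ~ BMI + Age, data = BP)
tab <- as.data.frame(anova(fm))   # drop the "anova" class, keep the columns

# Append a Total row: df and sums of squares add up, the other columns are NA
tab["Total", ] <- c(sum(tab$Df), sum(tab[["Sum Sq"]]), NA, NA, NA)
print(tab)
```

The Total sum of squares obtained this way equals the sum of squared deviations of SBP from its mean, which gives a quick consistency check on the table.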

21.2 Explanation


Degrees of freedom (df)

\(\large n\) = Total number of observations

Regression df for BMI = \(\large 1\)

Regression df for Age = \(\large 1\)

Residual df = \(\large n - 2 - 1 = n - 3\) (two predictors plus the intercept)

Total df = Regression df (BMI & Age) + Residual df = \(\large 2 + (n - 3) = n - 1\)
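As a check against the table in 21.1, where the Total df of 499 implies \(\large n = 500\), the degrees of freedom work out as:

\[ \large \text{Residual df} = 500 - 2 - 1 = 497, \qquad \text{Total df} = 1 + 1 + 497 = 499 \]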


Total Sum of Squares (TSS)

\[ \large TSS = \sum\limits_{i=1}^{n} (y_i-\bar y)^2 = S_{yy}\]


Sum of Squares due to Regression (SSb)

\[ \large SSb = \hat\beta_1S_{x_1y} + \hat\beta_2S_{x_2y}\]


Residual Sum of Squares (RSS)

\[ \large RSS = TSS - SSb = S_{yy} - \hat\beta_1S_{x_1y} - \hat\beta_2S_{x_2y} \]
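The three formulas above can be checked numerically. The sketch below computes \(S_{yy}\), \(S_{x_1y}\) and \(S_{x_2y}\) by hand and compares the results with the `Sum Sq` column of `anova()`. The data are simulated (an assumption, since the BP data set is not available here), with `x1` and `x2` standing in for BMI and Age.

```r
# Simulated stand-in data (hypothetical values)
set.seed(1)
n  <- 500
x1 <- rnorm(n, 25, 4)     # stands in for BMI
x2 <- rnorm(n, 50, 10)    # stands in for Age
y  <- 90 + 1.2 * x1 + 0.1 * x2 + rnorm(n, sd = 2.5)

fm <- lm(y ~ x1 + x2)
b  <- coef(fm)

# Corrected sums of squares and cross-products
Syy  <- sum((y - mean(y))^2)
Sx1y <- sum((x1 - mean(x1)) * (y - mean(y)))
Sx2y <- sum((x2 - mean(x2)) * (y - mean(y)))

TSS <- Syy
SSb <- unname(b["x1"] * Sx1y + b["x2"] * Sx2y)
RSS <- TSS - SSb

# Compare with anova(): rows 1 and 2 are the regression terms, row 3 the residuals
ss <- anova(fm)[["Sum Sq"]]
print(all.equal(SSb, ss[1] + ss[2]))
print(all.equal(RSS, ss[3]))
```

Note that SSb is the regression sum of squares for both predictors together; `anova()` splits it across the BMI and Age rows.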



Mean Squares

Mean square = Sum of squares / degrees of freedom

\(\large MS = SS / df\)


F-value (Variance Ratio)

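F value = Regression MS / Residual MS

Each regression row (BMI, Age) has its own F value: that row's mean square divided by the Residual MS. The sums of squares from anova() are sequential, so the Age row measures what Age adds after BMI is already in the model.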
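As a worked check, the mean squares and F values in the table of 21.1 can be recomputed from its Sum Sq and Df columns. This sketch uses only numbers copied from that table; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table
ss <- c(BMI = 15103.0386, Age = 335.4035, Residuals = 3446.7570)
df <- c(BMI = 1, Age = 1, Residuals = 497)

ms <- ss / df                                 # MS = SS / df
f  <- ms[c("BMI", "Age")] / ms["Residuals"]   # variance ratio for each term

print(round(ms, 4))
print(round(f, 3))
```

Up to rounding, the results match the Mean Sq and F value columns of the table.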


Pr(>F)

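P-value: the probability of obtaining a variance ratio at least this large under the null hypothesis that the corresponding coefficient equals zero. Values printed as 0 in the table are rounded; they are very small, not exactly zero.

Under the null hypothesis the variance ratio has an F distribution with the term's df (here 1) and the residual df (here 497).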
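The p-values can be recovered from the F values with the upper tail of the F distribution. A minimal sketch using `pf()` from base R and the F values and degrees of freedom copied from the table in 21.1:

```r
# F values copied from the ANOVA table; each term has 1 df, residuals have 497
f_val <- c(BMI = 2177.760, Age = 48.363)

# Pr(>F): upper-tail probability of an F(1, 497) distribution
p_val <- pf(f_val, df1 = 1, df2 = 497, lower.tail = FALSE)
print(p_val)
```

Both probabilities are far too small to show at the table's rounding, which is why they appear there as 0.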


Error Variance = Residual Mean Square

\[ \large \hat\sigma^2 = Var(\hat\epsilon) = Residual \space MS = MSE \]
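The identity can be checked on a fitted model: the Residual Mean Sq from `anova()` should equal the residual sum of squares divided by the residual df, and also the square of the residual standard error returned by `sigma()`. The data below are simulated as a hypothetical stand-in for the BP data set.

```r
# Simulated stand-in for the BP data (hypothetical values)
set.seed(1)
n  <- 500
BP <- data.frame(BMI = rnorm(n, 25, 4), Age = rnorm(n, 50, 10))
BP$SBP <- 90 + 1.2 * BP$BMI + 0.1 * BP$Age + rnorm(n, sd = 2.5)

fm  <- lm(SBP ~ BMI + Age, data = BP)
mse <- anova(fm)["Residuals", "Mean Sq"]

# Two equivalent routes to the same error variance estimate
print(all.equal(mse, deviance(fm) / df.residual(fm)))   # RSS / (n - 3)
print(all.equal(mse, sigma(fm)^2))                      # squared residual standard error
```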