Section 30 Estimates: Variances


30.1 Estimates: Variances


\[ \large fm \leftarrow lm(SBP \sim BMI, \space data=BP) \]

\[ \large summary(fm) \]



Call:
lm(formula = SBP ~ BMI, data = BP)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.3636 -2.1681  0.1586  2.1492  6.5777 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 36.20368    1.48011   24.46   <2e-16 ***
BMI          2.63229    0.05903   44.59   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.756 on 498 degrees of freedom
Multiple R-squared:  0.7997,    Adjusted R-squared:  0.7993 
F-statistic:  1989 on 1 and 498 DF,  p-value: < 2.2e-16

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   36.2037      1.4801  24.4601         0
BMI            2.6323      0.0590  44.5940         0


30.2 Explanation

Statistical Model

\[ \large y_{i} = \beta_0 + \beta_1 x_{i} + \epsilon_{i} \]
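In the fitted model above, coef(fm) returns the estimates \(\hat\beta_0\) (intercept) and \(\hat\beta_1\) (BMI slope), and resid(fm) returns the residuals, which estimate the errors \(\epsilon_i\). A minimal sketch using the fm object fitted above:

# Fitted coefficients: intercept (36.204) and BMI slope (2.632)
coef(fm)

# Residuals e_i = y_i - y_hat_i, which estimate the epsilon_i
head(resid(fm))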


Error Variance = Residual Mean Square

\[ \large \hat\sigma^2 = Residual \space MS \]
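As a quick check in R (a minimal sketch using the fm object fitted above; sigma() returns the residual standard error of a fitted lm):

# Residual mean square from the ANOVA table
anova(fm)["Residuals", "Mean Sq"]

# Same quantity as the squared residual standard error (2.756^2)
sigma(fm)^2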


Coefficient of Determination (R2)


\[ \large R^2 = \frac{Treatment \space SS}{Total \space SS} = 1 - \frac{Residual \space SS}{Total \space SS}\]


R-squared quantifies the proportion of the variance in the response that is explained by the explanatory variable(s) in a linear regression model. It is a measure of the model's predictive power.

We can also compute the estimate of R^2 directly from the ANOVA table.
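A minimal sketch of that calculation, using the fm object fitted above:

a <- anova(fm)                     # rows: BMI and Residuals
ss_total    <- sum(a[["Sum Sq"]])  # Total SS = Treatment SS + Residual SS
ss_residual <- a["Residuals", "Sum Sq"]

1 - ss_residual / ss_total         # matches Multiple R-squared = 0.7997
summary(fm)$r.squared              # value reported by summary()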

If R-squared is high (close to one), the predictor variable explains (describes) a large proportion of the variation in the data, i.e. there is a high signal-to-noise ratio.


Adjusted Coefficient of Determination (Adjusted R2)


\[ \large Adj.R^2 = 1 - \frac{Residual \space SS \space / \space df_{residual}}{Total \space SS \space / \space df_{total}}\]

When additional predictors are included in the model, R^2 never decreases. Adjusted R^2 accounts for both the extra parameters in the model and the additional variability explained by the extended model.

Hence the adjustment offsets the tendency of R^2 to increase with additional explanatory variables in multiple regression (i.e. more than one X variable), even when those variables have no explanatory power.
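A minimal sketch of the adjusted R^2 calculation from the ANOVA table, again using the fm object fitted above:

a <- anova(fm)
ss_residual <- a["Residuals", "Sum Sq"]
ss_total    <- sum(a[["Sum Sq"]])
df_residual <- a["Residuals", "Df"]   # n - 2 = 498
df_total    <- sum(a[["Df"]])         # n - 1 = 499

1 - (ss_residual / df_residual) / (ss_total / df_total)  # Adjusted R-squared
summary(fm)$adj.r.squared                                 # 0.7993 above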