Section 34 Summary of Simple Linear Regression
34.1 Estimates: Effects
`fm <- lm(SBP ~ BMI, data = BP)`

`anova(fm)`

`summary(fm)`
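These commands assume a data frame `BP` with columns `SBP` and `BMI`, which is not shown here. A minimal runnable sketch with simulated stand-in data (all numbers below are made up for illustration):

```r
## Hypothetical stand-in for the BP data frame
set.seed(1)
n   <- 100
BMI <- rnorm(n, mean = 25, sd = 4)           # simulated predictor
SBP <- 100 + 1.5 * BMI + rnorm(n, sd = 10)   # simulated response
BP  <- data.frame(SBP, BMI)

fm <- lm(SBP ~ BMI, data = BP)   # fit the simple linear regression
anova(fm)                        # ANOVA table: df, SS, MS, F, Pr(>F)
summary(fm)                      # coefficients, SEs, t-tests, R^2
```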
34.2 Estimates
Statistical Model
\[ \large y_{i} = \beta_0 + \beta_1 x_{i} + \epsilon_{i}, \quad \epsilon_{i} \overset{iid}{\sim} N(0, \sigma^2) \]
\(\beta_1\) Estimate & SE
\[ \large \hat \beta_1 = \frac{\sum\limits_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sum\limits_{i=1}^{n}(x_i-\bar x)^2} = \frac{S_{xy}}{S_{xx}} \]
\[ \large Var(\hat \beta_1) = \frac{\hat\sigma^2}{\sum\limits_{i=1}^{n}(x_i-\bar x)^2} = \frac{\hat\sigma^2}{S_{xx}} \]
\[ \large SE(\hat \beta_1) = \sqrt {Var(\hat \beta_1)} \]
95% Confidence Interval
\[ \large CI_{0.95}(\hat \beta_1) = \left[ \hat\beta_1 \pm t_{0.025, df_{residual}} * SE(\hat\beta_1) \right]\]
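As a check, the slope, its SE, and the 95% CI can be computed directly from these formulas (continuing the simulated `BP` and fitted `fm` above; `confint` is the built-in comparison):

```r
## Slope estimate, SE, and 95% CI from the formulas above
x   <- BP$BMI; y <- BP$SBP; n <- length(y)
Sxx <- sum((x - mean(x))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
b1  <- Sxy / Sxx                              # beta1-hat
s2  <- sum(residuals(fm)^2) / (n - 2)         # sigma2-hat = residual MS
se1 <- sqrt(s2 / Sxx)                         # SE(beta1-hat)
b1 + c(-1, 1) * qt(0.975, df = n - 2) * se1   # matches confint(fm)["BMI", ]
```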
\(\beta_0\) Estimate & SE
\[ \large \hat \beta_0 = \bar y - \hat \beta_1 \bar x \]
\[ \large Var(\hat \beta_0) = \hat\sigma^2 \left[ \frac{1}{n} + \frac{\bar x^2}{\sum\limits_{i=1}^{n}(x_i-\bar x)^2} \right] = \hat\sigma^2 \left[ \frac{1}{n} + \frac{\bar x^2}{S_{xx}} \right] \]
\[ \large SE(\hat \beta_0) = \sqrt {Var(\hat \beta_0)} \]
95% Confidence Interval
\[ \large CI_{0.95}(\hat \beta_0) = \left[ \hat\beta_0 \pm t_{0.025, df_{residual}} * SE(\hat\beta_0) \right]\]
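The same check for the intercept, reusing `b1`, `s2`, and `Sxx` from the slope sketch:

```r
## Intercept estimate, SE, and 95% CI from the formulas above
b0  <- mean(y) - b1 * mean(x)                 # beta0-hat
se0 <- sqrt(s2 * (1 / n + mean(x)^2 / Sxx))   # SE(beta0-hat)
b0 + c(-1, 1) * qt(0.975, df = n - 2) * se0   # matches confint(fm)[1, ]
```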
Here,
\[ \large \bar{x} = \frac{1}{n}\sum\limits_{i=1}^{n} x_{i} \]
\[ \large \bar{y} = \frac{1}{n}\sum\limits_{i=1}^{n} y_{i} \]
\[ \large \hat\sigma^2 = Var(\hat\epsilon) = \frac{\sum\limits_{i=1}^{n} \hat\epsilon_{i}^{\,2}}{n-2} \]
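In R, \(\hat\sigma^2\) can be recovered two equivalent ways from the fitted model (continuing the sketch):

```r
## sigma2-hat two equivalent ways
sigma(fm)^2                      # residual standard error, squared
sum(residuals(fm)^2) / (n - 2)   # RSS / df_residual, same value
```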
34.3 ANOVA
Degrees of freedom (df)
\(\large n\) = Total number of observations
Regression df = BMI df = \(\large 1\)
Residual df = \(\large n - 1 - 1 = n - 2\) (one df each for the intercept and the slope)
Total df = Regression df + Residual df = \(\large n - 1\)
Total Sum of Squares (TSS)
\[ \large TSS = \sum\limits_{i=1}^{n} (y_i-\bar y)^2 = S_{yy}\]
Sum of Squares due to Regression (SSb)
\[ \large SSb = \hat\beta_1\sum\limits_{i=1}^{n} (x_i-\bar x)(y_i-\bar y) = \hat\beta_1S_{xy}\]
Residual Sum of Squares (RSS)
\[ \large RSS = TSS - SSb = S_{yy} - \hat\beta_1S_{xy} \]
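The three sums of squares can be reproduced from \(S_{yy}\) and \(S_{xy}\) (continuing the sketch; compare the `Sum Sq` column of `anova(fm)`):

```r
## ANOVA sums of squares from the formulas above
TSS <- sum((y - mean(y))^2)          # S_yy
SSb <- b1 * Sxy                      # regression SS
RSS <- TSS - SSb                     # residual SS
c(SSb = SSb, RSS = RSS, TSS = TSS)   # compare with anova(fm)
```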
Mean Squares
Mean square = Sum of squares / degrees of freedom
\(\large MS = SS / df\)
F-value (Variance Ratio)
F value = Regression MS / Residual MS
Pr(>F)
P-value: the probability of obtaining a variance ratio at least this large under the null hypothesis that the slope coefficient equals zero (\(\beta_1 = 0\)).
Under the null hypothesis the variance ratio has an F distribution.
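Both quantities follow directly from the sums of squares (continuing the sketch; `pf` gives the upper tail of the F distribution):

```r
## F statistic and p-value from the mean squares
MSreg <- SSb / 1         # regression MS, df = 1
MSres <- RSS / (n - 2)   # residual MS, df = n - 2
Fval  <- MSreg / MSres
pf(Fval, df1 = 1, df2 = n - 2, lower.tail = FALSE)   # Pr(>F)
```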
Error Variance = Residual Mean Square
\(\large \hat\sigma^2 = Residual \space MS\)
Coefficient of Determination (\(R^2\))
\[ \large R^2 = \frac{Regression \space SS}{Total \space SS} = 1 - \frac{Residual \space SS}{Total \space SS}\]
Adjusted Coefficient of Determination (Adjusted \(R^2\))
\[ \large Adj. R^2 = 1 - \frac{Residual \space SS \space / \space df_{residual}}{Total \space SS \space / \space df_{total}}\]
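Both versions can be verified against `summary(fm)` (continuing the sketch):

```r
## R^2 and adjusted R^2 from the sums of squares
R2    <- 1 - RSS / TSS
adjR2 <- 1 - (RSS / (n - 2)) / (TSS / (n - 1))
c(R2, summary(fm)$r.squared)          # should agree
c(adjR2, summary(fm)$adj.r.squared)   # should agree
```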
34.4 Prediction
Point Prediction
\[ \large \hat E(y|X=x^*) = \hat y^* = \hat \beta_0 + \hat \beta_1 x^* \]
Confidence Intervals for the Population Regression Line
\[ \large Var(\hat y^*) = \hat\sigma^2 \left[ \frac{1}{n} + \frac{(x^* -\bar x)^2}{S_{xx}} \right] \]
\[ \large SE(\hat y^*) = \sqrt {Var(\hat y^*)} \]
\[ \large CI_{0.95}(\hat y^*) = \left[ \hat y^* \pm t_{0.025, df_{residual}} * SE(\hat y^*) \right]\]
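In R this is `predict()` with `interval = "confidence"`; the by-hand version uses the variance formula above (continuing the sketch; `xstar` is a hypothetical BMI value):

```r
## 95% CI for the mean response at x* (hypothetical x* = 27)
xstar <- 27
predict(fm, newdata = data.frame(BMI = xstar),
        interval = "confidence", level = 0.95)
## By hand:
yhat <- b0 + b1 * xstar
se_m <- sqrt(s2 * (1 / n + (xstar - mean(x))^2 / Sxx))
yhat + c(-1, 1) * qt(0.975, df = n - 2) * se_m
```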
Prediction Intervals for \(y.^*\), i.e. the Actual Value of \(y\)
The variability in predicting a single new value of \(y\) (denoted \(y.^*\)) exceeds the variability in estimating the expected value of \(y\), because a new observation carries its own random error \(\epsilon\) in addition to the uncertainty in the fitted line.
\[ \large Var(\hat y.^*) = \hat\sigma^2 \left[ 1 + \frac{1}{n} + \frac{(x^* -\bar x)^2}{S_{xx}} \right] \]
\[ \large SE(\hat y.^*) = \sqrt {Var(\hat y.^*)} \]
\[ \large CI_{0.95}(\hat y.^*) = \left[ \hat y.^* \pm t_{0.025, df_{residual}} * SE(\hat y.^*) \right]\]
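The prediction interval adds the extra \(1\) inside the variance; `predict()` with `interval = "prediction"` does the same (continuing the sketch):

```r
## 95% prediction interval for a single new observation at x*
predict(fm, newdata = data.frame(BMI = xstar),
        interval = "prediction", level = 0.95)
## By hand: note the extra "1 +" for the new observation's own error
se_p <- sqrt(s2 * (1 + 1 / n + (xstar - mean(x))^2 / Sxx))
yhat + c(-1, 1) * qt(0.975, df = n - 2) * se_p
```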