Section 35 Simple Linear Regression: Factor


35.1 Blood pressure data


35.1.1 Null Model

\[ \large SBP \sim mean(SBP) \]


35.1.2 Regression Model

\[ \large SBP \sim mean(SBP) + DM \]

Note: In this model formula, mean(SBP) stands for the mean SBP at the first (reference) level of DM, not the overall mean.


35.2 Statistical Model

35.2.1 Null Model

SBP = overall mean + sampling variability

\[ \large y_{i} = \beta_0 + \epsilon_{i} \]


35.2.2 Regression Model

SBP = mean at the first level of DM + coefficient*DM + sampling variability

\[ \large y_{i} = \beta_0 + \beta_1 x_{i} + \epsilon_{i} \]

Concept of dummy variables: the two-level factor DM is coded as a 0/1 indicator \(x_{i}\), with the first level as the reference.

\[ \large x_{i} = 0 \text{ for } DM=1 \]

\[ \large x_{i} = 1 \text{ for } DM=2 \]
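
Substituting the two values of \(x_{i}\) into the regression model gives the expected SBP in each group. This short derivation uses only the symbols defined above and shows why the intercept is the mean of the first level:

\[ \large E(y_{i} | DM=1) = \beta_0 + \beta_1 \cdot 0 = \beta_0 \]

\[ \large E(y_{i} | DM=2) = \beta_0 + \beta_1 \cdot 1 = \beta_0 + \beta_1 \]

So \(\beta_1\) is the difference in mean SBP between DM=2 and DM=1.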


35.3 Syntax

35.3.1 Null Model

\[ \large fm \leftarrow lm(SBP \sim 1, \space data=BP) \]


35.3.2 Regression Model

\[ \large fm \leftarrow lm(SBP \sim 1 + DM, \space data=BP) \]

or equivalently, because R includes the intercept by default:

\[ \large fm \leftarrow lm(SBP \sim DM, \space data=BP) \]
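
As a minimal sketch of both fits, the code below runs the null and regression models on simulated data. The BP data set itself is not shown in this section, so the data frame here is a hypothetical stand-in: the group sizes, means, and standard deviation are assumptions chosen only for illustration, as are the object names `fm0`, `fm1`, and `group_means`.

```r
# Hypothetical stand-in for the BP data (all values below are assumptions)
set.seed(1)
BP <- data.frame(
  DM  = factor(rep(c(1, 2), each = 20)),     # two-level factor
  SBP = c(rnorm(20, mean = 125, sd = 10),    # simulated SBP for DM = 1
          rnorm(20, mean = 135, sd = 10))    # simulated SBP for DM = 2
)

fm0 <- lm(SBP ~ 1,  data = BP)   # null model: intercept only
fm1 <- lm(SBP ~ DM, data = BP)   # regression model: intercept + DM

group_means <- tapply(BP$SBP, BP$DM, mean)

coef(fm0)      # intercept equals the overall mean of SBP
coef(fm1)      # intercept = mean at DM = 1; DM2 = mean(DM = 2) - mean(DM = 1)
group_means    # compare with the coefficients above
```

The coefficient named `DM2` appears because R treats the first factor level as the reference and estimates the second level relative to it.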


35.4 Assumptions

  • \(y\) is related to \(x\) by the simple linear regression model:

\[ \large y_{i} = \beta_0 + \beta_1 x_{i} + \epsilon_{i}, \space i=1,...,n \]

\[ \large E(y_{i} | X=x_{i}) = \beta_0 + \beta_1 x_{i} \]

  • The errors \(\epsilon_1, \epsilon_2, ..., \epsilon_n\) are independent of each other.

  • The errors \(\epsilon_1, \epsilon_2, ..., \epsilon_n\) have a common variance \(\sigma^2\).

  • The errors are normally distributed with a mean of 0 and variance \(\sigma^2\), that is:

\[ \large \epsilon_{i} \sim N(0,\sigma^2) \]
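
The variance and normality assumptions can be examined through the residuals of the fitted model. The sketch below is one possible check, not a prescribed procedure. It again uses simulated data as a hypothetical stand-in for BP, and `res`, `res_sd`, and `sw` are illustrative names. Independence of the errors is usually judged from the study design rather than from the residuals.

```r
# Hypothetical stand-in for the BP data (all values below are assumptions)
set.seed(2)
BP <- data.frame(
  DM  = factor(rep(c(1, 2), each = 25)),
  SBP = c(rnorm(25, mean = 125, sd = 10),
          rnorm(25, mean = 135, sd = 10))
)

fm  <- lm(SBP ~ DM, data = BP)
res <- residuals(fm)

# Common variance: compare the residual spread within each level of DM
res_sd <- tapply(res, BP$DM, sd)
res_sd

# Normality: Shapiro-Wilk test on the residuals (one of several possible checks)
sw <- shapiro.test(res)
sw$p.value
```

Similar standard deviations in the two groups and a large Shapiro-Wilk p-value give no evidence against the assumptions, though neither proves that they hold.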


35.5 Hypothesis

Intercept

\[ \large H_0: \beta_0 = 0 \]

\[ \large H_A: \beta_0 \ne 0 \]


Regression coefficient

\[ \large H_0: \beta_1 = 0 \]

\[ \large H_A: \beta_1 \ne 0 \]
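
Because \(\beta_1\) is the difference between the two group means, testing \(\beta_1 = 0\) for a two-level factor matches a two-sample t-test with equal variances. The sketch below illustrates this on simulated data (a hypothetical stand-in for BP). The names `coefs` and `tt` are illustrative.

```r
# Hypothetical stand-in for the BP data (all values below are assumptions)
set.seed(3)
BP <- data.frame(
  DM  = factor(rep(c(1, 2), each = 20)),
  SBP = c(rnorm(20, mean = 125, sd = 10),
          rnorm(20, mean = 135, sd = 10))
)

fm    <- lm(SBP ~ DM, data = BP)
coefs <- summary(fm)$coefficients   # Estimate, Std. Error, t value, Pr(>|t|)

# Two-sample t-test assuming a common variance in both groups
tt <- t.test(SBP ~ DM, data = BP, var.equal = TRUE)

coefs["DM2", "Pr(>|t|)"]   # p-value for H0: beta_1 = 0
tt$p.value                 # same p-value from the t-test
```

The two t statistics have opposite signs because `t.test` reports the first level minus the second, while the `DM2` coefficient is the second level minus the first.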


35.6 Investigating the fitted lm object

\[ \large anova(fm) \]

\[ \large summary(fm) \]
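
The following sketch shows how the two calls might be used and how their results relate. It runs on simulated data (a hypothetical stand-in for BP), and `a` and `s` are illustrative names.

```r
# Hypothetical stand-in for the BP data (all values below are assumptions)
set.seed(4)
BP <- data.frame(
  DM  = factor(rep(c(1, 2), each = 20)),
  SBP = c(rnorm(20, mean = 125, sd = 10),
          rnorm(20, mean = 135, sd = 10))
)

fm <- lm(SBP ~ DM, data = BP)

a <- anova(fm)     # analysis of variance table: rows "DM" and "Residuals"
s <- summary(fm)   # coefficient table, residual standard error, R-squared

a
s

# With a single two-level factor, the ANOVA F statistic equals the squared t statistic
a["DM", "F value"]
s$coefficients["DM2", "t value"]^2
```

`anova(fm)` tests the factor as a whole, while `summary(fm)` tests each coefficient. With only one two-level factor the two tests coincide.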


35.7 Box Plot
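
A box plot of SBP by level of DM gives a quick visual comparison of the two groups. The sketch below is one way to draw it with base R. The data are simulated (a hypothetical stand-in for BP), and the axis labels and title are assumptions.

```r
# Hypothetical stand-in for the BP data (all values below are assumptions)
set.seed(5)
BP <- data.frame(
  DM  = factor(rep(c(1, 2), each = 20)),
  SBP = c(rnorm(20, mean = 125, sd = 10),
          rnorm(20, mean = 135, sd = 10))
)

# One box per level of DM; boxplot() also returns the plotted statistics invisibly
bx <- boxplot(SBP ~ DM, data = BP,
              xlab = "DM", ylab = "SBP",
              main = "SBP by DM (simulated data)")

bx$stats[3, ]   # the third row holds the group medians drawn in the boxes
```

Boxes of similar height suggest similar spread in the two groups, which relates to the common variance assumption.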