Section 28 Multiple Linear Regression: Matrix Implementation
Note: You may skip this section if you are not familiar with matrix operations.
28.1 Data Structure
\[ \large \left[ {\begin{array}{ccccc} x_{11} & x_{21} & ... & x_{p1} & y_1 \\ x_{12} & x_{22} & ... & x_{p2} & y_2 \\ ... & ... & ... & ... & ... \\ x_{1n} & x_{2n} & ... & x_{pn} & y_n \\ \end{array} } \right] \]
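As a concrete sketch (not from the original text), this layout can be held in a NumPy array; the sample size, seed, and coefficient values below are arbitrary assumptions for illustration:

```python
import numpy as np

# Hypothetical simulated data: n = 50 observations, p = 3 predictors.
rng = np.random.default_rng(seed=1)
n, p = 50, 3
predictors = rng.normal(size=(n, p))           # columns x_1 ... x_p
y = 2 + predictors @ np.array([1.5, -0.5, 0.8]) + rng.normal(size=n)

data = np.column_stack([predictors, y])        # the n x (p+1) layout shown above
X = np.column_stack([np.ones(n), predictors])  # design matrix with intercept column
print(data.shape, X.shape)                     # (50, 4) (50, 4)
```

The design matrix `X`, with its leading column of ones, is the form used in the model below.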
28.2 Statistical Model
\[ \large \left[ {\begin{array}{ccccc} y_1 \\ y_2 \\ ... \\ y_n \end{array} } \right] = \left[ {\begin{array}{ccccc} 1 & x_{11} & x_{21} & ... & x_{p1} \\ 1 & x_{12} & x_{22} & ... & x_{p2} \\ ... & ... & ... & ... & ... \\ 1 & x_{1n} & x_{2n} & ... & x_{pn} \\ \end{array} } \right] \left[ {\begin{array}{ccccc} \beta_0 \\ \beta_1 \\ ... \\ \beta_p \end{array} } \right] + \left[ {\begin{array}{ccccc} \epsilon_1 \\ \epsilon_2 \\ ... \\ \epsilon_n \end{array} } \right] \]
Model: Matrix formulation
\[ \large \textbf y_{n \times 1} = \bf X_{n \times (p+1)} \pmb\beta_{(p+1) \times 1} + \pmb\epsilon_{n \times 1} \]
Residual
\[ \large \pmb\epsilon = \bf y - X \pmb\beta \]
Residual Sum of Squares
\[ \large \pmb\epsilon'\pmb\epsilon = (\bf y - X \pmb\beta)'(\bf y - X \pmb\beta) \]
Least Squares
The least squares principle estimates the parameters by minimising \(\pmb\epsilon'\pmb\epsilon\) with respect to \(\pmb\beta\): take the partial derivative of \(\pmb\epsilon'\pmb\epsilon\) with respect to \(\pmb\beta\), set the first-order derivative to zero, and solve the resulting normal equations to obtain the estimate of \(\pmb\beta\), i.e. \(\pmb{\hat\beta}\).
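Spelling this out, expanding \(\pmb\epsilon'\pmb\epsilon = \bf y'y - 2\pmb\beta' \bf X'y + \pmb\beta' \bf X'X \pmb\beta\) and differentiating gives the normal equations:
\[ \large \frac{\partial \, \pmb\epsilon'\pmb\epsilon}{\partial \pmb\beta} = -2 \bf X'y + 2 \bf X'X \pmb\beta = 0 \quad \Rightarrow \quad \bf X'X \pmb{\hat\beta} = X'y \]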
Regression Coefficients
\[ \large \pmb{\hat\beta} = \bf (X'X)^{-1}(X'y) \]
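A minimal NumPy sketch of this estimator, under the same simulated-data assumptions as above (repeated here so the block runs on its own); solving the normal equations with `np.linalg.solve` is numerically preferable to forming the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([2.0, 1.5, -0.5, 0.8])   # hypothetical true coefficients
y = X @ beta_true + rng.normal(size=n)

# Solve X'X beta_hat = X'y rather than computing (X'X)^{-1} explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                               # should be close to beta_true
```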
Prediction
\[ \large \bf \hat y = X \pmb{\hat\beta} = X(X'X)^{-1}X'y = Hy\]
where \(\bf H = X(X'X)^{-1}X'\) is the hat (projection) matrix.
Residual
\[ \large \bf \hat \epsilon = \pmb y - \hat y = \pmb y - X \pmb{\hat\beta} = \pmb y - Hy \]
Residual Sum of Squares
\[ \large RSS = \bf (y - Hy)'(y-Hy) = y'(I-H)'(I-H)y = y'(I-H)y,\]
since \(\bf I-H\) is symmetric and idempotent.
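The fitted values, residuals, and RSS can be checked numerically; a sketch under the same simulated-data assumptions as above:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([2.0, 1.5, -0.5, 0.8]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix H = X (X'X)^{-1} X'
y_hat = H @ y                            # fitted values
resid = y - y_hat                        # residuals (I - H) y
rss = resid @ resid                      # residual sum of squares
rss_quadratic = y @ (np.eye(n) - H) @ y  # the quadratic form y'(I - H)y
print(np.isclose(rss, rss_quadratic))    # True
```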
Residual Variance
\[ \large \hat\sigma^2 = Var(\hat\epsilon) = RSS/(n-p-1) = MSE \]
Variance of Coefficients
\[ \large Var(\pmb{\hat\beta}) = \bf (X'X)^{-1} {\hat\sigma^2} \]
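Continuing the same hypothetical simulation, the residual variance and the coefficient standard errors follow directly:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([2.0, 1.5, -0.5, 0.8]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p - 1)        # MSE = RSS / (n - p - 1)
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)  # Var(beta_hat)
se_beta = np.sqrt(np.diag(cov_beta))            # SE(beta_hat_k), k = 0..p
print(sigma2_hat, se_beta)
```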
28.3 Hypothesis Testing
For the \(k\)-th coefficient:
\[ \large H_0: \beta_k = 0 \] \[ \large H_A: \beta_k \ne 0 \]
Test Statistic under the Null Hypothesis
Regression coefficient for the \(k\)-th predictor
\[ \large \hat\beta_k / SE(\hat\beta_k) \sim t_{n-p-1} \]
95% Confidence Interval of the \(k\)-th coefficient
\[ \large CI_{0.95}(\hat \beta_k) = \left[ \hat\beta_k \pm t_{0.025, (n-p-1)} SE(\hat\beta_k) \right]\]
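A sketch of the t-statistics, two-sided p-values, and 95% confidence intervals, again under the simulated-data assumptions above (SciPy's `stats.t` supplies the t-distribution):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([2.0, 1.5, -0.5, 0.8]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p - 1)
se_beta = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

t_stat = beta_hat / se_beta                           # t_k = beta_hat_k / SE(beta_hat_k)
p_val = 2 * stats.t.sf(np.abs(t_stat), df=n - p - 1)  # two-sided p-values
t_crit = stats.t.ppf(0.975, df=n - p - 1)             # t_{0.025, n-p-1}
ci_low = beta_hat - t_crit * se_beta
ci_high = beta_hat + t_crit * se_beta                 # 95% CI bounds
print(np.column_stack([beta_hat, se_beta, t_stat, p_val, ci_low, ci_high]))
```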
Prediction conditional on a new value \(x_0\)
\[ \large \hat y_0 = \bf x_0' \pmb{\hat\beta} \]
where \(\bf x_0 = (1, x_{01}, \ldots, x_{0p})'\) includes a leading 1 for the intercept.
Confidence Interval for the mean response for the given \(x_0\)
\[ \large CI_{0.95}(\hat y_0) = \left[ \hat y_0 \pm t_{0.025, (n-p-1)} \, \hat \sigma \sqrt{\bf x_0'(X'X)^{-1}x_0} \right]\]
Prediction Interval of a single future response for the given \(x_0\)
\[ \large PI_{0.95}(y_0) = \left[ \hat y_0 \pm t_{0.025, (n-p-1)} \, \hat \sigma \sqrt{1 + \bf x_0'(X'X)^{-1}x_0} \right]\]
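Both intervals can be sketched in a few lines; the new point `x0` below is an arbitrary assumption, as is the simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([2.0, 1.5, -0.5, 0.8]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - p - 1))
t_crit = stats.t.ppf(0.975, df=n - p - 1)

x0 = np.array([1.0, 0.5, -1.0, 0.2])   # hypothetical new point, leading 1 = intercept
y0_hat = x0 @ beta_hat                 # point prediction x0' beta_hat
q = x0 @ np.linalg.inv(X.T @ X) @ x0   # quadratic form x0'(X'X)^{-1} x0

half_ci = t_crit * sigma_hat * np.sqrt(q)      # CI half-width (mean response)
half_pi = t_crit * sigma_hat * np.sqrt(1 + q)  # PI half-width (single new response)
print((y0_hat - half_ci, y0_hat + half_ci))
print((y0_hat - half_pi, y0_hat + half_pi))
```

The prediction interval is wider than the confidence interval because it also accounts for the noise \(\epsilon\) in a single new observation.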