26 Linear Models in R

  • Base R can fit many standard statistical models, such as linear and generalised linear models

  • For more advanced models, you need to install specialised packages

  • Some popular R packages are: car, nlme, lme4, mgcv, survival


26.1 R: Linear Model

  • The function lm in base R fits linear models

  • The function glm in base R fits generalised linear models using several families of distributions (binomial, gaussian, inverse.gaussian, poisson, Gamma) with appropriate link functions

  • No additional installation is required

  • You can, however, use several additional packages available on CRAN to explore the model outputs
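
As a minimal sketch of the glm interface described above, the following fits a logistic regression with the binomial family. The built-in mtcars data and the object name gm are illustrative assumptions; they are not used elsewhere in this chapter.

```r
# Logistic regression on the built-in mtcars data (an illustrative choice,
# not a dataset from this chapter).
# am: transmission (0 = automatic, 1 = manual); wt: weight in 1000 lbs.
gm <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)

# Coefficients are reported on the log-odds scale
coef(gm)

# Predicted probability of a manual transmission for a car with wt = 3
predict(gm, newdata = data.frame(wt = 3), type = "response")
```

Changing the family argument (for example to poisson or Gamma) switches the error distribution, and each family comes with a default link function that can be overridden through its link argument.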


26.1.1 Steps of model fitting

  • Read the data into a data.frame

  • Use the function lm

  • Include the model formula: y ~ x1 + x2 + x3

  • Assign the fitted model to an R object

  • Explore the fitted model
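
The steps above can be sketched end to end with the generic formula y ~ x1 + x2 + x3. The data here are simulated purely for illustration; the names sim and fit, and the true coefficients used to generate y, are assumptions and not part of the chapter's example.

```r
# Simulated data standing in for a data.frame read from disk
set.seed(1)
n <- 100
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# y depends on x1 and x2 only; x3 is pure noise
sim$y <- 1 + 2 * sim$x1 - 0.5 * sim$x2 + rnorm(n, sd = 0.5)

# Fit the model with lm and a formula, and assign it to an object
fit <- lm(y ~ x1 + x2 + x3, data = sim)

# Explore the fitted model
summary(fit)
```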


26.1.2 Read Data

Set the working directory to the data folder and read the iris dataset as an R object DF.

DF = read.csv('iris.csv')
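
If iris.csv is not at hand, a comparable file can be produced from the iris data that ships with R. This sketch assumes that iris.csv is simply the built-in data with the dots removed from the column names (Sepal.Length becomes SepalLength, and so on); the names DF0 and csv_path are hypothetical.

```r
# Assumption: iris.csv matches the built-in iris data, with the dots
# stripped from the column names.
DF0 <- iris
names(DF0) <- gsub(".", "", names(DF0), fixed = TRUE)

# Write to a temporary folder so nothing in the working directory is overwritten
csv_path <- file.path(tempdir(), "iris.csv")
write.csv(DF0, csv_path, row.names = FALSE)

# Read it back the same way as in the text
DF <- read.csv(csv_path)
str(DF)
```

In R 4.0.0 and later, read.csv returns Species as a character column by default rather than a factor.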


26.1.3 Fit the model

fm = lm(SepalLength ~ PetalLength + as.factor(Species), data = DF)

26.1.4 Class lm components

names(fm)
 [1] "coefficients"  "residuals"     "effects"       "rank"         
 [5] "fitted.values" "assign"        "qr"            "df.residual"  
 [9] "contrasts"     "xlevels"       "call"          "terms"        
[13] "model"        
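
The components can be accessed with $, but base R also provides extractor functions that are usually preferred. The sketch below rebuilds the model from the built-in iris data so that it runs on its own, assuming the CSV columns are the built-in ones with the dots removed.

```r
# Rebuild DF and fm from the built-in iris data (assumed equivalent to iris.csv)
DF <- iris
names(DF) <- gsub(".", "", names(DF), fixed = TRUE)
fm <- lm(SepalLength ~ PetalLength + as.factor(Species), data = DF)

coef(fm)             # same values as fm$coefficients
head(residuals(fm))  # first few residuals
head(fitted(fm))     # first few fitted values
df.residual(fm)      # residual degrees of freedom
formula(fm)          # model formula stored with the fit
```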

26.1.5 Explore the model

anova(fm)
Analysis of Variance Table

Response: SepalLength
                    Df Sum Sq Mean Sq F value    Pr(>F)    
PetalLength          1 77.643  77.643 679.544 < 2.2e-16 ***
as.factor(Species)   2  7.843   3.922  34.323 6.053e-13 ***
Residuals          146 16.682   0.114                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(fm)
Call:
lm(formula = SepalLength ~ PetalLength + as.factor(Species), 
    data = DF)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.75310 -0.23142 -0.00081  0.23085  1.03100 

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   3.68353    0.10610  34.719  < 2e-16 ***
PetalLength                   0.90456    0.06479  13.962  < 2e-16 ***
as.factor(Species)versicolor -1.60097    0.19347  -8.275 7.37e-14 ***
as.factor(Species)virginica  -2.11767    0.27346  -7.744 1.48e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.338 on 146 degrees of freedom
Multiple R-squared:  0.8367,    Adjusted R-squared:  0.8334 
F-statistic: 249.4 on 3 and 146 DF,  p-value: < 2.2e-16
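
Individual quantities from the printed summary can also be extracted for further use. This is a sketch that again rebuilds the model from the built-in iris data (assumed equivalent to iris.csv); the object name sm is hypothetical.

```r
# Rebuild DF and fm from the built-in iris data (assumed equivalent to iris.csv)
DF <- iris
names(DF) <- gsub(".", "", names(DF), fixed = TRUE)
fm <- lm(SepalLength ~ PetalLength + as.factor(Species), data = DF)

sm <- summary(fm)
coef(sm)           # coefficient table as a numeric matrix
sm$r.squared       # multiple R-squared
sm$adj.r.squared   # adjusted R-squared
sm$sigma           # residual standard error

# 95% confidence intervals for the coefficients
confint(fm, level = 0.95)
```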

26.1.6 Check model assumptions

par(mfrow=c(2,2))

plot(fm)
par(mfrow=c(1,1))
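
A variant of the same idea saves the four diagnostic plots to a file and restores whatever graphical settings were in place before, instead of resetting to a fixed layout. The file name and the object names op and pdf_path are assumptions made for this sketch.

```r
# Rebuild DF and fm from the built-in iris data (assumed equivalent to iris.csv)
DF <- iris
names(DF) <- gsub(".", "", names(DF), fixed = TRUE)
fm <- lm(SepalLength ~ PetalLength + as.factor(Species), data = DF)

# Open a PDF device in a temporary folder
pdf_path <- file.path(tempdir(), "lm_diagnostics.pdf")
pdf(pdf_path, width = 8, height = 8)

op <- par(mfrow = c(2, 2))  # par() returns the previous settings
plot(fm)                    # the four default diagnostic plots
par(op)                     # restore the previous settings

dev.off()                   # close the device and write the file
```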

26.2 R: Other Important Packages

  • For advanced models in R, you need to install specialised packages

  • Here we list some popular packages by topic; the list is not comprehensive

  • Check the CRAN Task Views for guidance on packages by topic

  • Linear Mixed Models: nlme, lme4

  • Generalised Linear Mixed Model: lme4, glmmTMB

  • Survival Models: survival, rms, cmprsk

  • Generalised Additive (Mixed) Model: mgcv, gamm4

  • Generalised Estimating Equations: gee

  • Time Series & Forecasting Model: forecast, zoo, fable

  • Quantile Regression Model: quantreg

  • Missing data, Imputation: Hmisc, mice

  • Multivariate Models: FactoMineR

  • Bayesian statistics: arm, bayesforecast, bayesm, boa, coda, mcmc, MCMCpack, rstan, brms

  • Evaluate Model Outputs: tidymodels
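
Before using any of these packages, it can help to check which ones are already installed. The following is a minimal sketch; the three packages in pkgs are an arbitrary selection from the list above, and the names is_installed and missing_pkgs are hypothetical.

```r
# An arbitrary selection of packages from the list above
pkgs <- c("lme4", "mgcv", "survival")

# requireNamespace() returns TRUE if a package is installed and loadable
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]
missing_pkgs

# Uncomment to install the missing ones from CRAN (needs an internet connection)
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

Once installed, a package is attached for the current session with library(), for example library(lme4).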

