Section 29 ANOVA Table: Simple R implementation

29.1 Analysis of Variance Table


fm <- lm(SBP ~ Group, data = BP)

anova(fm)


           Df   Sum Sq  Mean Sq F value Pr(>F)
Group       3 1521.638 507.2128  7.4983  5e-04
Residuals  36 2435.184  67.6440      NA     NA
Total      39 3956.822       NA      NA     NA

29.2 Explanation


Degrees of freedom (df)

\(\large n\) = number of observations per group (the formulas below assume every group has the same \(n\))

\(\large g\) = number of groups

Group df = \(\large (g - 1)\)

Residual df = \(\large g * (n - 1)\)

Total df = Group df + Residual df = \(\large (g * n) - 1\)

g <- nlevels(BP$Group)

n <- nrow(BP)/g

df.g <- g - 1

df.error <- g * (n - 1)

df.total <- g * n - 1
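As a quick check, the formulas can be applied to the table in 29.1. The values g = 4 and n = 10 are not stated in the text; they are inferred here from the printed Group df of 3 and Total df of 39, so treat them as an assumption.

```r
# Assumed design, inferred from the printed table: Group df = 3, Total df = 39
g <- 4    # number of groups
n <- 10   # observations per group

df.g     <- g - 1         # 3
df.error <- g * (n - 1)   # 36
df.total <- g * n - 1     # 39

# The Group and Residual df add up to the Total df
df.g + df.error == df.total
```

The three values match the Df column of the table.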


Sum of Squares due to Treatment (Group)

  1. For each group calculate: n * (Group mean - overall mean)^2

  2. Add the values for the different groups together

\[ \large SST = n\sum\limits_{i=1}^{g} (\bar{y_i}-\bar{y})^2 \]

y.bar <- mean(BP$SBP, na.rm = TRUE)

yi.bar <- tapply(BP$SBP, INDEX = BP$Group, FUN = mean, na.rm = TRUE)

SST <- n * sum((yi.bar - y.bar)^2)
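The two steps can be followed by hand on a very small data set. The sketch below uses made-up toy values (not the BP data) with 3 groups of 2 observations, chosen so that every intermediate number is easy to verify. The names `grp`, `per.group` and `SST.ref` are illustrative only.

```r
# Toy data (hypothetical, not the BP data): g = 3 groups, n = 2 observations each
y   <- c(1, 3, 4, 6, 8, 8)
grp <- factor(rep(c("A", "B", "C"), each = 2))

n      <- 2
y.bar  <- mean(y)                     # overall mean: 5
yi.bar <- tapply(y, grp, mean)        # group means: A = 2, B = 5, C = 8

# Step 1: n * (group mean - overall mean)^2 for each group
per.group <- n * (yi.bar - y.bar)^2   # A = 18, B = 0, C = 18

# Step 2: add the group contributions together
SST <- sum(per.group)                 # 36

# Cross-check against the Sum Sq reported by anova() for the same data
SST.ref <- anova(lm(y ~ grp))[["Sum Sq"]][1]
all.equal(SST, SST.ref)
```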



Sum of Squares due to Error

Residual Sum of Squares

  1. For each observation calculate: (Observed value - group mean)^2

  2. Add the values for the different observations together

\[ \large SSE = \sum\limits_{i=1}^{g} \sum\limits_{j=1}^{n} (y_{ij}-\bar{y_i})^2 \]


Total Sum of Squares = SS due to Treatment + SS due to Error, so the SS due to Error can be obtained by subtraction:

yij <- BP$SBP

TSS <- sum((yij - y.bar)^2)

SSE <- TSS - SST
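The code above gets SSE by subtraction. SSE can also be computed directly from its definition, which is a useful check on the decomposition. The sketch below does this on made-up toy values (not the BP data), using `ave()` to attach each observation's own group mean. The names `group.mean`, `SSE.direct` and `SSE.subtract` are illustrative only.

```r
# Toy data (hypothetical, not the BP data): g = 3 groups, n = 2 observations each
y   <- c(1, 3, 4, 6, 8, 8)
grp <- factor(rep(c("A", "B", "C"), each = 2))

# Direct route: squared deviation of each observation from its own group mean
group.mean <- ave(y, grp)                # 2 2 5 5 8 8
SSE.direct <- sum((y - group.mean)^2)    # 1 + 1 + 1 + 1 + 0 + 0 = 4

# Subtraction route, as in the text
TSS <- sum((y - mean(y))^2)                          # 40
SST <- 2 * sum((tapply(y, grp, mean) - mean(y))^2)   # 36
SSE.subtract <- TSS - SST                            # 4

all.equal(SSE.direct, SSE.subtract)
```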


Mean Squares

Mean square = Sum of squares / degrees of freedom

\(\large MS = SS / df\)

MST <- SST/df.g

MSE <- SSE/df.error

MS <- TSS/df.total
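A small illustration of the three mean squares, again on made-up toy values rather than the BP data. It also shows that the total mean square is simply the sample variance of the response, and that mean squares, unlike sums of squares, do not add up.

```r
# Toy data (hypothetical, not the BP data): g = 3 groups, n = 2 observations each
y   <- c(1, 3, 4, 6, 8, 8)
grp <- factor(rep(c("A", "B", "C"), each = 2))
g   <- nlevels(grp)
n   <- length(y) / g

TSS <- sum((y - mean(y))^2)                          # 40
SST <- n * sum((tapply(y, grp, mean) - mean(y))^2)   # 36
SSE <- TSS - SST                                     # 4

MST <- SST / (g - 1)         # 36 / 2 = 18
MSE <- SSE / (g * (n - 1))   # 4 / 3
MS  <- TSS / (g * n - 1)     # 40 / 5 = 8

all.equal(MS, var(y))        # total mean square equals the sample variance
MST + MSE == MS              # FALSE: mean squares are not additive
```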


F-value (Variance Ratio)

F value = Treatment MS / Residual MS


Pr(>F)

P-value: the probability of obtaining a variance ratio at least this large under the null hypothesis that the treatment means are all equal.

Under the null hypothesis the variance ratio has an F distribution with \(\large (g - 1)\) and \(\large g * (n - 1)\) degrees of freedom.

F.stat <- MST/MSE

pF <- pf(q = F.stat, df1 = df.g, df2 = df.error, lower.tail = FALSE)
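The F value and P-value in the table in 29.1 can be reproduced from the printed sums of squares and degrees of freedom alone. This sketch copies those values as printed, so the results agree with the table only up to rounding. The variable names are illustrative.

```r
# Values copied from the ANOVA table in 29.1 (rounded as printed)
SS.group <- 1521.638; df.group <- 3
SS.resid <- 2435.184; df.resid <- 36

MS.group <- SS.group / df.group
MS.resid <- SS.resid / df.resid

F.value <- MS.group / MS.resid
round(F.value, 4)   # 7.4983, as in the table

# Upper-tail probability of the F distribution with (3, 36) df
p.value <- pf(F.value, df1 = df.group, df2 = df.resid, lower.tail = FALSE)

# lower.tail = FALSE is the same as 1 - P(F <= F.value)
all.equal(p.value, 1 - pf(F.value, df.group, df.resid))
```

Using `lower.tail = FALSE` rather than `1 - pf(...)` is the usual choice because it keeps precision when the P-value is very small.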


DF <- data.frame(df = c(df.g, df.error, df.total), SS = c(SST, SSE, TSS),
    MS = c(MST, MSE, MS), Fstat = c(F.stat, NA, NA), Prob = c(pF, NA, NA))

row.names(DF) <- c("Group", "Error", "Total")

DF

anova(fm)
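The comparison between the hand-built table and `anova()` can also be made programmatically. Because the BP data are not reproduced here, the sketch below uses simulated data with 4 groups of 10 observations. The helper `oneway_table()` and the data frame `sim` are hypothetical names introduced for this illustration, and the helper assumes a balanced design with no missing values.

```r
# Hypothetical helper: one-way ANOVA table by hand (balanced design, no NAs)
oneway_table <- function(y, group) {
  group <- factor(group)
  g <- nlevels(group)
  n <- length(y) / g
  stopifnot(all(table(group) == n))   # equal group sizes are required

  y.bar  <- mean(y)
  yi.bar <- tapply(y, group, mean)
  SST <- n * sum((yi.bar - y.bar)^2)
  TSS <- sum((y - y.bar)^2)
  SSE <- TSS - SST

  df     <- c(g - 1, g * (n - 1))
  MS     <- c(SST, SSE) / df
  F.stat <- MS[1] / MS[2]
  pF     <- pf(F.stat, df[1], df[2], lower.tail = FALSE)

  data.frame(df = df, SS = c(SST, SSE), MS = MS,
             Fstat = c(F.stat, NA), Prob = c(pF, NA),
             row.names = c("Group", "Error"))
}

# Simulated stand-in for the BP data: 4 groups, 10 observations each
set.seed(29)
sim <- data.frame(
  Group = gl(4, 10, labels = c("A", "B", "C", "D")),
  SBP   = rnorm(40, mean = rep(c(120, 125, 130, 135), each = 10), sd = 8)
)

hand <- oneway_table(sim$SBP, sim$Group)
ref  <- anova(lm(SBP ~ Group, data = sim))

all.equal(hand$SS, ref[["Sum Sq"]])
all.equal(hand$Fstat[1], ref[["F value"]][1])
all.equal(hand$Prob[1], ref[["Pr(>F)"]][1])
```

Each `all.equal()` call should return `TRUE`. Note that `anova()` itself reports only the Group and Residuals rows, so the helper omits the Total row.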