Section 16 Single Variable: Summary Statistics

16.1 Descriptive statistics

Function   Explanation
length     Number of elements in a vector
sum        Sum of the values in a vector
min        Minimum of a vector
max        Maximum of a vector
range      Range (min, max) of a vector
mean       Mean of the values in a vector
median     Median of the values in a vector
var        Variance
sd         Standard deviation
cov        Covariance of two vectors
cor        Pearson correlation between two vectors
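
As a quick illustration of the functions in the table, the sketch below applies them to two small vectors. The vectors `a` and `b` are made-up values chosen only for this example; they are not part of the weather data used in the exercise.

```r
# Two small illustrative vectors (made-up values, for demonstration only)
a <- c(2, 4, 6, 8, 10)
b <- c(1, 3, 2, 5, 4)

length(a)     # number of elements
sum(a)        # sum of the values
min(a)
max(a)
range(a)      # same as c(min(a), max(a))
mean(a)
median(a)
var(a)        # sample variance
sd(a)         # same as sqrt(var(a))
cov(a, b)     # covariance of the two vectors
cor(a, b)     # same as cov(a, b) / (sd(a) * sd(b))
```

The comments on `range`, `sd` and `cor` show how these functions relate to the others in the table, which is a handy way to check results by hand.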

16.2 Measures of average

  • Mean
  • Median
  • Mode
  • Quartiles
  • Median Absolute Deviation (MAD)
x <- c(0, 2, 4, 6, NA, 8, 4, 5, 15, 11, 4, 7)

mean(x, na.rm=TRUE)
median(x, na.rm=TRUE)

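# Return the most frequent value(s); if there is a tie, all modes are returned
Mode <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]   # otherwise NA is counted as a value
  ux <- unique(x)
  tab <- tabulate(match(x, ux))  # frequency of each unique value
  ux[tab == max(tab)]
}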

Mode(x)

quantile(x=x, probs=c(0.25,0.50,0.75), na.rm=TRUE)

mad(x, na.rm=TRUE)

fivenum(x, na.rm = TRUE)

summary(x)
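
Note that `mad()` does not return the raw median absolute deviation. By default it multiplies it by `constant = 1.4826`, which makes the result comparable to the standard deviation for normally distributed data. A minimal sketch of the calculation by hand, using the same `x` as above (the names `x_obs`, `raw_mad` and `scaled_mad` are introduced here only for illustration):

```r
x <- c(0, 2, 4, 6, NA, 8, 4, 5, 15, 11, 4, 7)
x_obs <- x[!is.na(x)]                       # drop the missing value

# Raw MAD: median of the absolute deviations from the median
raw_mad <- median(abs(x_obs - median(x_obs)))

# Default mad() output: raw MAD times the scaling constant
scaled_mad <- 1.4826 * raw_mad

all.equal(scaled_mad, mad(x, na.rm = TRUE))
all.equal(raw_mad, mad(x, na.rm = TRUE, constant = 1))
```

Setting `constant = 1` is the way to get the unscaled value directly from `mad()`.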

16.3 Measures of dispersion

  • Range
  • Inter-quartile range
  • Standard Deviation
  • Coefficient of variation
x <- c(0, 2, 4, 6, NA, 8, 4, 5, 15, 11, 4, 7)

range(x, na.rm=TRUE)

diff(range(x, na.rm=TRUE))

var(x, na.rm=TRUE)
sd(x, na.rm=TRUE)

IQR(x=x, na.rm=TRUE)

# IQR by hand: third quartile minus first quartile
qx <- quantile(x=x, probs=c(0.25,0.50,0.75), na.rm=TRUE)
str(qx)
unname(qx[3] - qx[1])

# Coefficient of variation: standard deviation relative to the mean
cvx <- sd(x, na.rm=TRUE) / mean(x, na.rm=TRUE)
cvx

16.4 Measures of shape

  • Skewness
  • Kurtosis
x <- c(0, 2, 4, 6, NA, 8, 4, 5, 15, 11, 4, 7)
x <- x[!is.na(x)]
n <- length(x)

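# Skewness: third central moment divided by the second central moment to the power 3/2
skx <- (sum((x - mean(x))^3)/n)/((sum((x - mean(x))^2)/n)^(3/2))
skx

# Kurtosis: fourth central moment divided by the square of the second central moment
kurtx <- (sum((x - mean(x))^4)/n)/((sum((x - mean(x))^2)/n)^2)
kurtx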
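
Both formulas above are ratios of central moments computed with an n denominator, so they can be written more compactly with a small helper. The sketch below is one possible way to do this; `central_moment` and `excess_kurtx` are names introduced here for illustration and are not base R functions.

```r
x <- c(0, 2, 4, 6, NA, 8, 4, 5, 15, 11, 4, 7)
x <- x[!is.na(x)]

# Hypothetical helper: k-th central moment with an n denominator
central_moment <- function(x, k) mean((x - mean(x))^k)

# Skewness and kurtosis as ratios of central moments
skx   <- central_moment(x, 3) / central_moment(x, 2)^(3/2)
kurtx <- central_moment(x, 4) / central_moment(x, 2)^2

# Excess kurtosis: a normal distribution has kurtosis 3, so subtract 3
excess_kurtx <- kurtx - 3

skx
kurtx
excess_kurtx
```

A positive `skx` indicates a longer right tail, and a positive `excess_kurtx` indicates heavier tails than a normal distribution. These moment-based versions may differ slightly from the bias-adjusted estimates that some add-on packages report.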


16.5 Exercise

  • Calculate the following summary statistics for the temperature and radiation variables in the weather data:

    • Mean, Median, Quartiles
    • Range, IQR, Variance, SD, CV
    • Skewness, Kurtosis
  • Write a function that returns all of the above summary statistics for a vector

  • Calculate the standard deviation using the formula and compare with the R function output