Section 18 Single Continuous Variable: Histogram

18.1 Histogram

  • Karl Pearson introduced this visual representation
  • Continuous variable
  • Approximates the shape of the variable's probability distribution
  • Class width / interval / boundary: equal or unequal
  • Bins / Intervals / Breaks
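  • Frequency (counts) or frequency density (frequency ÷ class width); with unequal class widths, bar area rather than bar height should represent frequency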
  • Comparison with Normal density function
  • Properties of ideal Normally distributed data:
    • Symmetric
    • Unimodal
    • Mean, median and mode are identical
    • Zero skewness
    • Zero excess kurtosis (kurtosis = 3)
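The frequency-density and shape ideas in the list above can be checked numerically. This is a minimal sketch in base R, assuming the built-in iris data. `hist(..., plot = FALSE)` returns the counts and densities without drawing; R's `density` component is count / (n × class width), i.e. frequency density divided by n, so the bar areas sum to 1. The unequal `brks` vector is an arbitrary illustrative choice, and `skewness_sample()` and `excess_kurtosis_sample()` are hypothetical helper names written for this example (simple moment estimators, not a package API). For Normal data both shape measures are expected to be close to 0.

```r
# Frequency vs frequency density for iris$Sepal.Length (base R only)
data(iris)
x <- iris$Sepal.Length
n <- length(x)

# Unequal class widths (illustrative): breaks must cover the data range 4.3 to 7.9
brks <- c(4, 5, 5.5, 6, 6.5, 7, 8)
h <- hist(x, breaks = brks, plot = FALSE)

widths <- diff(h$breaks)
# R's density = count / (n * class width)
dens_manual <- h$counts / (n * widths)
print(cbind(lower = head(h$breaks, -1), upper = h$breaks[-1],
            count = h$counts, density = round(h$density, 4)))

print(all.equal(dens_manual, h$density))
print(sum(h$density * widths))  # total bar area; equals 1 up to floating-point error

# Hypothetical helpers: simple (biased) moment estimators
skewness_sample <- function(v) {
  m <- mean(v)
  s <- sqrt(mean((v - m)^2))
  mean((v - m)^3) / s^3
}
excess_kurtosis_sample <- function(v) {
  m <- mean(v)
  s2 <- mean((v - m)^2)
  mean((v - m)^4) / s2^2 - 3  # subtract 3 so a Normal distribution scores 0
}

print(c(mean = mean(x), median = median(x),
        skewness = skewness_sample(x),
        excess_kurtosis = excess_kurtosis_sample(x)))
```

Comparing mean with median, and the two shape measures with 0, gives a quick numerical counterpart to the visual checks against the ideal Normal properties listed above.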

18.2 Package base

data(iris)

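# freq=FALSE plots densities instead of counts, so the bar areas sum to 1
# breaks=15 is only a suggestion; hist() picks 'pretty' break points near it
hist(x=iris$Sepal.Length, breaks=15,
     xlim=c(4,8), freq=FALSE,
     main='Histogram of Sepal Length',
     xlab='Sepal Length (cm)',
     ylab='Density',
     axes=TRUE,
     col='orange',
     lty=1, border='purple')
# Dashed reference lines: mean (red) and median (blue)
abline(v=mean(iris$Sepal.Length), lty=2, lwd=3, col='red')
abline(v=median(iris$Sepal.Length), lty=2, lwd=3, col='blue')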
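To compare the histogram with a Normal density function, a fitted Normal curve can be drawn over the density-scaled bars. This is a minimal sketch, assuming the Normal parameters are estimated by the sample mean and standard deviation; the curve colour and the legend are illustrative additions. Note that inside `curve()` the symbol `x` is the plotting grid, which is why the data vector is stored under a different name (`sl`).

```r
# Histogram of Sepal Length with a fitted Normal density (base graphics)
data(iris)
sl <- iris$Sepal.Length
mu <- mean(sl)     # assumed Normal mean: sample mean
sigma <- sd(sl)    # assumed Normal SD: sample standard deviation

hist(sl, breaks = 15, xlim = c(4, 8), freq = FALSE,
     main = 'Histogram of Sepal Length with Normal curve',
     xlab = 'Sepal Length (cm)', ylab = 'Density',
     col = 'orange', border = 'purple')

# curve() evaluates the expression over a grid called x; add = TRUE overlays it
curve(dnorm(x, mean = mu, sd = sigma), from = 4, to = 8,
      add = TRUE, lwd = 2, col = 'darkgreen')
abline(v = mu, lty = 2, lwd = 3, col = 'red')
abline(v = median(sl), lty = 2, lwd = 3, col = 'blue')
legend('topright', legend = c('Normal density', 'Mean', 'Median'),
       col = c('darkgreen', 'red', 'blue'), lty = c(1, 2, 2), lwd = c(2, 3, 3))
```

Because `freq = FALSE` puts the bars on the density scale, the bars and the `dnorm()` curve share the same y-axis units, so departures from symmetry or unimodality show up directly as gaps between the two.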

18.3 Package ggplot2

library(ggplot2)

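# Map y to after_stat(density) so the bars match the 'Density' axis label
g <- ggplot(data=iris, mapping=aes(x=Sepal.Length))
g <- g + geom_histogram(aes(y=after_stat(density)), binwidth=0.10,
                        fill='orange', colour='purple')
# Pass the statistics as constants (not inside aes()) so each line is drawn once;
# linewidth replaces the deprecated size for lines in ggplot2 >= 3.4.0
g <- g + geom_vline(xintercept=mean(iris$Sepal.Length, na.rm=TRUE),
                    colour='red', linetype='dashed', linewidth=1.5)
g <- g + geom_vline(xintercept=median(iris$Sepal.Length, na.rm=TRUE),
                    colour='blue', linetype='dashed', linewidth=1.5)
g <- g + labs(title='Histogram of Sepal Length',
              subtitle='Based on Iris data',
              x='Sepal Length (cm)',
              y='Density')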

g + theme_bw()
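The same Normal-curve comparison can be sketched in ggplot2 with `stat_function()`, which evaluates a function such as `dnorm()` over the x range. This assumes ggplot2 >= 3.4.0 (for `linewidth`) and Normal parameters estimated by the sample mean and SD; the `binwidth = 0.25` and the object name `g2` are illustrative choices, not part of the original example.

```r
library(ggplot2)

mu <- mean(iris$Sepal.Length)   # assumed Normal mean: sample mean
sigma <- sd(iris$Sepal.Length)  # assumed Normal SD: sample standard deviation

g2 <- ggplot(data = iris, mapping = aes(x = Sepal.Length)) +
  # after_stat(density) rescales the bars so their total area is 1
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.25,
                 fill = 'orange', colour = 'purple') +
  # stat_function() draws dnorm() with the fitted parameters on a grid of x values
  stat_function(fun = dnorm, args = list(mean = mu, sd = sigma),
                colour = 'darkgreen', linewidth = 1) +
  labs(title = 'Histogram of Sepal Length with Normal curve',
       x = 'Sepal Length (cm)', y = 'Density') +
  theme_bw()

print(g2)
```

Putting the histogram on the density scale is what makes the overlay meaningful: with the default count scale, the Normal curve would be far below the bars.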