Section 18 Single Continuous Variable: Histogram
18.1 Histogram
- Introduced by Karl Pearson, who also coined the term "histogram"
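- Displays the distribution of a single continuous variable
- Approximates the variable's probability distribution
- Class width / interval / boundary: equal or unequal
- Bins / Intervals / Breaks: different names for the classes into which the variable's range is divided
- Y-axis: frequency (count per class) or frequency density (frequency / class width); with unequal class widths, only the density scale makes bar areas proportional to frequency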
- Comparison with the Normal density function
- Ideal Normally distributed data:
  - Symmetric about the mean
  - Unimodal (a single peak)
  - Mean, median and mode are identical
  - Zero skewness
  - Kurtosis of 3, i.e. zero excess kurtosis
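The frequency-density and "ideal Normal" points above can be checked numerically. The sketch below is an illustration, not part of the original notes: the unequal breaks c(4, 5, 5.5, 6, 6.5, 8) are chosen arbitrarily, and skew_m() and exkurt_m() are hypothetical helper names that use population-moment formulas without small-sample correction, so packages such as e1071 or moments may report slightly different values under their default settings.

```r
# Sketch: frequency density with unequal class widths, plus a rough check of
# Sepal.Length against the "ideal Normal" shape (assumed illustration only).
sl <- iris$Sepal.Length
n  <- length(sl)

# Unequal class boundaries, chosen only for illustration
h <- hist(sl, breaks = c(4, 5, 5.5, 6, 6.5, 8), plot = FALSE)
widths <- diff(h$breaks)

# Density = frequency / (n * class width); this reproduces h$density
dens <- h$counts / (n * widths)
stopifnot(isTRUE(all.equal(dens, h$density)))
sum(dens * widths)  # total bar area: 1 (up to rounding)

# Hypothetical helpers: moment-based sample skewness and excess kurtosis
# (population moments, no small-sample correction)
skew_m   <- function(v) { d <- v - mean(v); mean(d^3) / mean(d^2)^1.5 }
exkurt_m <- function(v) { d <- v - mean(v); mean(d^4) / mean(d^2)^2 - 3 }

# For ideal Normal data: mean == median, skewness 0, excess kurtosis 0
c(mean = mean(sl), median = median(sl),
  skewness = skew_m(sl), excess_kurtosis = exkurt_m(sl))
```

Dividing by n as well as by the class width is what R's hist() does for freq=FALSE, which is why the bar areas sum to 1 and the histogram can be compared with a probability density curve.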
18.2 Base R (package graphics)
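data(iris)
# Histogram on the density scale (freq=FALSE): bar areas sum to 1
hist(x=iris$Sepal.Length, breaks=15,   # breaks=15 is only a suggestion; R picks "pretty" cut points
     xlim=c(4,8), freq=FALSE,
     main='Histogram of Sepal Length',
     xlab='Sepal Length (cm)',
     ylab='Density',
     axes=TRUE,
     col='orange',                     # bar fill
     lty=1, border='purple')           # bar outline
# Dashed reference lines: mean (red) and median (blue)
abline(v=mean(iris$Sepal.Length), lty=2, lwd=3, col='red')
abline(v=median(iris$Sepal.Length), lty=2, lwd=3, col='blue')
legend('topright', legend=c('Mean', 'Median'),
       col=c('red', 'blue'), lty=2, lwd=3, bty='n')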
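Because the histogram above is drawn with freq=FALSE, a Normal density curve can be laid over it for the "comparison with Normal density function" mentioned earlier. This is a sketch under one assumption: the Normal curve uses the sample mean and sample standard deviation of Sepal.Length as its parameters; the title and curve colour are arbitrary choices.

```r
# Sketch: overlay a Normal density (sample mean and SD assumed as parameters)
# on the density-scale histogram so the two shapes can be compared directly.
data(iris)
sl <- iris$Sepal.Length
h <- hist(sl, breaks = 15, freq = FALSE, xlim = c(4, 8),
          main = 'Sepal Length vs Normal density',
          xlab = 'Sepal Length (cm)', ylab = 'Density',
          col = 'orange', border = 'purple')
# curve() evaluates its expression over a grid bound to the name x;
# add = TRUE draws on the existing plot instead of opening a new one
curve(dnorm(x, mean = mean(sl), sd = sd(sl)),
      add = TRUE, lwd = 2, col = 'darkgreen')
legend('topright', legend = 'Normal density', col = 'darkgreen',
       lwd = 2, bty = 'n')
```

Gaps between the bars and the curve (for example, a longer right tail in the bars) are the visual counterpart of non-zero skewness or excess kurtosis.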
18.3 Package ggplot2
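library(ggplot2)
# Base plot: Sepal.Length on the x-axis
g <- ggplot(data=iris, mapping=aes(x=Sepal.Length))
# Density-scale bars so the y-axis matches the 'Density' label (default is counts)
g <- g + geom_histogram(aes(y=after_stat(density)), binwidth=0.10,
                        fill='orange', colour='purple')
# Mean (red) and median (blue) lines, computed once outside aes();
# linewidth replaces size for lines in ggplot2 >= 3.4.0
g <- g + geom_vline(xintercept=mean(iris$Sepal.Length, na.rm=TRUE),
                    colour='red', linetype='dashed', linewidth=1.5)
g <- g + geom_vline(xintercept=median(iris$Sepal.Length, na.rm=TRUE),
                    colour='blue', linetype='dashed', linewidth=1.5)
g <- g + labs(title='Histogram of Sepal Length',
              subtitle='Based on Iris data',
              x='Sepal Length (cm)',
              y='Density')
g + theme_bw()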
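The same Normal-density comparison can be made in ggplot2 by adding stat_function() to a density-scale histogram. This is a sketch assuming ggplot2 >= 3.4.0 (for after_stat() and linewidth) and, as before, the sample mean and SD as the Normal parameters; the object names m, s and p are illustrative only.

```r
library(ggplot2)

# Assumed Normal parameters: sample mean and sample SD of Sepal.Length
m <- mean(iris$Sepal.Length)
s <- sd(iris$Sepal.Length)

p <- ggplot(iris, aes(x = Sepal.Length)) +
  # Density-scale bars so they are comparable with a probability density
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.10,
                 fill = 'orange', colour = 'purple') +
  # dnorm evaluated over the x-range with the fixed mean and SD above
  stat_function(fun = dnorm, args = list(mean = m, sd = s),
                colour = 'darkgreen', linewidth = 1) +
  labs(title = 'Sepal Length vs Normal density',
       x = 'Sepal Length (cm)', y = 'Density') +
  theme_bw()
p
```

Passing the parameters through args keeps them fixed for the whole curve; computing them once, outside any aes() call, avoids re-evaluating them for every row of the data.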