Section 15 Measures of Shape
Shape of the data
15.1 Skewness
Measure of asymmetry: A measure of the asymmetry of the probability distribution of a real-valued random variable about its mean.
Skewness can be:
- Zero (symmetric distribution)
- Positive (longer right tail)
- Negative (longer left tail)
For a sample of \(n\) values, a natural method of moments estimator of the population skewness is:
\[ \Huge Skewness = \frac{m_3}{s^3} = \frac {\frac{1}{n}\sum\limits_{i=1}^{n} (x_i-\bar{x})^3}{[\frac{1}{n}\sum\limits_{i=1}^{n} (x_i-\bar{x})^2]^{3/2}} \]
where \(\large \bar{x}\) is the sample mean, \(\large s\) is the sample standard deviation (computed here with divisor \(n\)), and the numerator \(\large m_3\) is the sample third central moment.
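As a minimal sketch of the estimator above, the following Python function divides the third central moment by the 3/2 power of the second central moment, both with divisor \(n\). The function name `sample_skewness` and the two small data sets are illustrative assumptions, not part of the original notes.

```python
def sample_skewness(xs):
    """Method-of-moments skewness: third central moment / (second central moment)^(3/2)."""
    n = len(xs)
    mean = sum(xs) / n
    # Central moments with divisor n, matching the formula in the text
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    # Note: m2 is zero for constant data, in which case skewness is undefined
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]            # hypothetical symmetric sample
right_tailed = [1, 1, 2, 2, 3, 10]     # hypothetical sample with a long right tail

print(sample_skewness(symmetric))      # 0.0
print(sample_skewness(right_tailed))   # a positive value
```

Mirroring the right-tailed sample about zero (negating every value) flips the sign of the result, which matches the positive and negative cases listed earlier.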
15.2 Kurtosis
Measure of tailedness: A measure of the tailedness of the probability distribution of a real-valued random variable.
It indicates whether a given distribution produces more, or fewer, extreme outliers than the Normal distribution does.
- Equal to 3 (mesokurtic): zero excess kurtosis
- Greater than 3 (leptokurtic): positive excess kurtosis, more extreme outliers
- Less than 3 (platykurtic): negative excess kurtosis, fewer extreme outliers
For a sample of \(n\) values, a natural method of moments estimator of the population kurtosis is:
\[ \Huge Kurtosis = \frac{m_4}{s^4} = \frac {\frac{1}{n}\sum\limits_{i=1}^{n} (x_i-\bar{x})^4}{[\frac{1}{n}\sum\limits_{i=1}^{n} (x_i-\bar{x})^2]^{2}} \]
where \(\large \bar{x}\) is the sample mean, \(\large s\) is the sample standard deviation (computed here with divisor \(n\)), and the numerator \(\large m_4\) is the sample fourth central moment.
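The kurtosis estimator can be sketched in the same way: the fourth central moment divided by the square of the second central moment, both with divisor \(n\). The function name `sample_kurtosis` and the example data are assumptions chosen for illustration. Subtracting 3 from the result gives the excess kurtosis.

```python
def sample_kurtosis(xs):
    """Method-of-moments kurtosis: fourth central moment / (second central moment)^2."""
    n = len(xs)
    mean = sum(xs) / n
    # Central moments with divisor n, matching the formula in the text
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    # Note: m2 is zero for constant data, in which case kurtosis is undefined
    return m4 / m2 ** 2

flat = [1, 2, 3, 4, 5]                         # hypothetical evenly spread sample
heavy_tailed = [-10, -1, 0, 0, 0, 0, 1, 10]    # hypothetical sample with two extreme values

print(sample_kurtosis(flat))                   # about 1.7, below 3 (platykurtic)
print(sample_kurtosis(heavy_tailed))           # above 3 (leptokurtic)
print(sample_kurtosis(heavy_tailed) - 3)       # excess kurtosis, positive here
```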
15.3 Transforming data
Many of the analyses we perform in statistics have an inherent assumption that the data are Normally distributed.
A Normal distribution has a bell-shaped histogram that is reasonably symmetric and has no long tails.
Many random variables are Normally distributed, particularly in the biological sciences.
If a variable is not Normally distributed, it is often appropriate to transform it so that the transformed values are closer to a Normal distribution.
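As a small illustration, the sketch below applies a natural log transformation, one common choice for positive, right-skewed data, and compares the skewness before and after. The log transformation, the helper `skewness`, and the data are assumptions made for this example. Other transformations may suit other data better.

```python
import math

def skewness(xs):
    """Method-of-moments skewness with divisor n (hypothetical helper)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical positive data spanning several orders of magnitude
raw = [1, 10, 100, 1000, 10000]

# The log is only defined for strictly positive values
logged = [math.log(x) for x in raw]

print(skewness(raw))      # clearly positive: long right tail
print(skewness(logged))   # close to zero: roughly symmetric after the transform
```

Here the logged values are evenly spaced, so the transformed sample is symmetric. With real data the transformation typically reduces skewness rather than removing it.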