Section 15 Measures of Shape

Shape of the data

15.1 Skewness

  • Measure of asymmetry: A measure of the asymmetry of the probability distribution of a real-valued random variable about its mean.

  • Skewness can be:

    • Zero: no net asymmetry, as for any symmetric distribution
    • Positive: the right tail is longer (right-skewed)
    • Negative: the left tail is longer (left-skewed)
  • For a sample of \(n\) values, a natural method of moments estimator of the population skewness is:

\[ \Huge Skewness = \frac{m_3}{s^3} = \frac {\frac{1}{n}\sum\limits_{i=1}^{n} (x_i-\bar{x})^3}{\left[\frac{1}{n}\sum\limits_{i=1}^{n} (x_i-\bar{x})^2\right]^{3/2}} \]

where \(\large \bar{x}\) is the sample mean, \(\large s\) is the sample standard deviation (computed here with divisor \(n\), not \(n-1\)), and the numerator \(\large m_3\) is the sample third central moment.
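
The estimator above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `sample_skewness` and the example values are hypothetical and are not part of these notes.

```python
def sample_skewness(xs):
    """Method-of-moments skewness: m_3 / s^3, with divisor n throughout."""
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(xs) / n
    # Second and third central moments, both with divisor n.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Made-up data for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 1, 10]

print(sample_skewness(symmetric))     # zero: values are symmetric about the mean
print(sample_skewness(right_tailed))  # positive: one large value stretches the right tail
```

Statistical packages often apply a small-sample bias correction by default, so their output may differ slightly from this divisor-\(n\) version.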

15.2 Kurtosis

  • Measure of tailedness: A measure of the tailedness of the probability distribution of a real-valued random variable.

  • It indicates whether a distribution produces more or fewer extreme outliers than the Normal distribution, whose kurtosis is 3. Excess kurtosis is the kurtosis minus 3.

    • Equal to 3 (mesokurtic): zero excess kurtosis
    • Greater than 3 (leptokurtic): positive excess kurtosis, more extreme outliers
    • Less than 3 (platykurtic): negative excess kurtosis, fewer extreme outliers
  • For a sample of \(n\) values, a natural method of moments estimator of the population kurtosis is:

\[ \Huge Kurtosis = \frac{m_4}{s^4} = \frac {\frac{1}{n}\sum\limits_{i=1}^{n} (x_i-\bar{x})^4}{\left[\frac{1}{n}\sum\limits_{i=1}^{n} (x_i-\bar{x})^2\right]^{2}} \]

where \(\large \bar{x}\) is the sample mean, \(\large s\) is the sample standard deviation (computed here with divisor \(n\), not \(n-1\)), and the numerator \(\large m_4\) is the sample fourth central moment.
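
As a rough sketch, the kurtosis estimator can be computed in Python with the standard library alone. The names `sample_kurtosis` and `excess_kurtosis` and the example values are hypothetical, chosen only to illustrate the formula.

```python
def sample_kurtosis(xs):
    """Method-of-moments kurtosis: m_4 / s^4, with divisor n throughout."""
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(xs) / n
    # Second and fourth central moments, both with divisor n.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2


def excess_kurtosis(xs):
    """Kurtosis relative to the Normal distribution's value of 3."""
    return sample_kurtosis(xs) - 3.0


# Made-up data for illustration only.
flat = [1, 2, 3, 4, 5]          # evenly spread, no extreme values
with_outlier = [1, 1, 1, 1, 10] # one value far from the rest

print(sample_kurtosis(flat))          # below 3 (platykurtic)
print(sample_kurtosis(with_outlier))  # above 3 (leptokurtic)
```

Some software reports excess kurtosis rather than kurtosis by default, so it is worth checking which convention a given function uses before comparing its output with 3.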

15.3 Transforming data

  • Many of the analyses we perform in statistics have an inherent assumption that the data are Normally distributed.

  • A Normal distribution has a bell-shaped histogram that is reasonably symmetric and has no long tails.

  • Many random variables are approximately Normally distributed, particularly in the biological sciences.

  • If a variable is not Normally distributed, it is often appropriate to transform it so that its distribution is closer to Normal.
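
One commonly used option for right-skewed, strictly positive data is a logarithmic transform. The notes do not name a specific transformation, so the choice of the natural log here is an assumption made for illustration, as are the function names and the made-up data. The sketch compares skewness before and after the transform.

```python
import math


def skewness(xs):
    """Moment-based skewness m_3 / m_2^(3/2), with divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def log_transform(xs):
    """Natural-log transform; only defined for strictly positive values."""
    if any(x <= 0 for x in xs):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(x) for x in xs]


# Hypothetical right-skewed data: each value is double the previous one.
raw = [1, 2, 4, 8, 16, 32, 64]
logged = log_transform(raw)

print(skewness(raw))     # positive: long right tail
print(skewness(logged))  # close to zero: the logs are evenly spaced
```

A transform that removes skewness does not by itself make the data Normal, so it is still worth inspecting a histogram of the transformed values.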