Section 26 Correlation

Correlation

  • Correlation measures the strength and direction of association between two continuous variables

  • Karl Pearson developed the correlation coefficient from a similar but slightly different idea by Francis Galton

  • The Pearson correlation coefficient captures only linear relationships; a strong nonlinear relationship can still give a value near zero

  • The estimate of correlation lies between -1 and +1
      - Values less than zero imply negative correlation
      - Values greater than zero imply positive correlation
      - Values close to 1 or -1 imply strong correlation
      - Values close to 0 imply little or no correlation

  • Correlation does NOT imply causation

  • Rank correlation coefficients, such as Spearman’s rank correlation coefficient and Kendall’s rank correlation coefficient, measure the association between two variables without assuming a linear relationship; they are computed from the ranks of the data and capture monotonic association.

  • Note that correlations are symmetric in the sense that r(x,y) = r(y,x).

  • Formula for correlation: uses the covariance of x and y and the variance of each variable

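  • Centering and scaling: the correlation equals the covariance of x and y after each variable has been centered (mean subtracted) and scaled (divided by its standard deviation)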
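
Several of the points above (centering and scaling, symmetry, the -1 to +1 range, and the difference between linear and rank correlation) can be illustrated with a minimal sketch. It uses only the Python standard library; the helper names `standardize`, `pearson`, `ranks`, and `spearman` are hypothetical and written for this example, the data are made up, and the rank helper assumes there are no ties.

```python
import math

def standardize(values):
    """Center (subtract the mean) and scale (divide by the sample SD)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson(x, y):
    """Pearson r: sum of products of standardized values, divided by n - 1."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

def ranks(values):
    """Ranks 1..n of the values (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Illustrative data: y increases with x, but not linearly.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(v) for v in x]

r_xy = pearson(x, y)                    # positive, but below 1
r_yx = pearson(y, x)                    # same value: r(x,y) = r(y,x)
rho = spearman(x, y)                    # 1 (up to rounding): perfectly monotone
r_neg = pearson(x, [-v for v in x])     # -1 (up to rounding): perfect negative line

print(r_xy, r_yx, rho, r_neg)
```

Because y is a monotone but curved function of x, the Pearson coefficient stays below 1 while the rank-based coefficient reaches 1, which is the distinction the rank correlation bullet describes.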



Formula of Correlation Coefficient

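$$r_{xy} = \frac{\mathrm{Cov}(x,y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}} = \frac{\mathrm{Cov}(x,y)}{s_x s_y}$$

$$\mathrm{Cov}(x,y) = s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

$$\mathrm{Var}(x) = s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

$$\mathrm{Var}(y) = s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$$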
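
As a quick check of these formulas, the sketch below computes the sample covariance, the two sample variances, and the correlation coefficient step by step. The five data points are hypothetical and chosen only so the arithmetic is easy to follow by hand; the variable names mirror the symbols in the formulas.

```python
import math

# Hypothetical toy data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Sample means (x-bar and y-bar).
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance and variances, each with the n - 1 divisor.
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
var_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)

# Correlation: covariance divided by the product of the standard deviations.
r_xy = cov_xy / math.sqrt(var_x * var_y)

print(cov_xy, var_x, var_y)  # 1.5 2.5 1.5
print(round(r_xy, 4))        # 0.7746
```

Note that the n - 1 divisors cancel in the ratio, so the same r would result from dividing by n throughout, as long as the choice is applied consistently to the covariance and both variances.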