Section 26 Correlation
Correlation
Correlation is a measure of association between two continuous variables
Karl Pearson developed the correlation coefficient from a similar but slightly different idea by Francis Galton
It measures only the strength of a linear relationship; a strong nonlinear relationship can still yield a correlation near zero
The estimate of correlation lies between -1 and +1
- Values less than zero imply negative correlation
- Values greater than zero imply positive correlation
- Values close to 1 or -1 imply strong correlation
- Values close to 0 imply little or no correlation
Correlation does NOT imply causation
Rank correlation coefficients, such as Spearman’s rank correlation coefficient and Kendall’s rank correlation coefficient, measure the association between two variables without requiring the relationship to be linear; they capture monotonic association instead.
Note that correlations are symmetric in the sense that r(x,y) = r(y,x).
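The properties listed above can be checked on small made-up data sets. The following is a minimal sketch in plain Python, not a reference implementation: the helper names `pearson`, `ranks`, and `spearman` and all data values are assumptions chosen for illustration. Spearman's coefficient is computed here as the Pearson correlation of the ranks, with tied values given their average rank.

```python
from math import exp, sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    # The 1/(n-1) factors of covariance and variances cancel in the ratio.
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Illustrative data (assumed): a perfect but nonlinear (quadratic) relationship.
x_sym = [-3, -2, -1, 0, 1, 2, 3]
y_quad = [v ** 2 for v in x_sym]
r_quad = pearson(x_sym, y_quad)      # no linear trend, so r is zero

# Illustrative data (assumed): a monotonic but nonlinear relationship.
x_pos = [1, 2, 3, 4, 5, 6]
y_exp = [exp(v) for v in x_pos]
r_exp = pearson(x_pos, y_exp)        # positive, but below 1
rho_exp = spearman(x_pos, y_exp)     # ranks agree perfectly
r_swapped = pearson(y_exp, x_pos)    # symmetry: same value as r_exp

print(r_quad, r_exp, rho_exp, r_swapped)
```

In this sketch the quadratic data are perfectly dependent yet have zero Pearson correlation, while the exponential data show a rank correlation of 1 with a Pearson correlation below 1.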
Formula of correlation: uses the covariance and the variances of the two variables x and y
Dividing the covariance by the two standard deviations is equivalent to centering and scaling the data x and y
Formula of Correlation Coefficient
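$$r_{xy} = \frac{\mathrm{Cov}(x,y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}} = \frac{\mathrm{Cov}(x,y)}{s_x s_y}$$

$$\mathrm{Cov}(x,y) = s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$$

$$\mathrm{Var}(x) = s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

$$\mathrm{Var}(y) = s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2$$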
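As a worked illustration of these formulas, here is a minimal sketch in plain Python. The function names and the data values are assumptions made for this example. It computes the correlation once as the covariance divided by the product of the standard deviations, and once from centered and scaled values (z-scores), to show that the two routes agree.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance s_xy with the n - 1 denominator."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)


def variance(x):
    """Sample variance s_x^2, i.e. the covariance of x with itself."""
    return covariance(x, x)


def pearson_r(x, y):
    """r_xy = Cov(x, y) / sqrt(Var(x) * Var(y))."""
    return covariance(x, y) / sqrt(variance(x) * variance(y))


def z_scores(x):
    """Center by the mean and scale by the sample standard deviation."""
    x_bar, s_x = mean(x), sqrt(variance(x))
    return [(xi - x_bar) / s_x for xi in x]


def pearson_r_from_z(x, y):
    """Sum of products of z-scores divided by n - 1."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Illustrative data (assumed values, not from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_xy = covariance(x, y)        # 6 / 4 = 1.5
var_x = variance(x)              # 10 / 4 = 2.5
var_y = variance(y)              # 6 / 4 = 1.5
r = pearson_r(x, y)              # 1.5 / sqrt(3.75), about 0.775
r_z = pearson_r_from_z(x, y)     # same value via centering and scaling

print(cov_xy, var_x, var_y, r, r_z)
```

For real analyses a library routine is usually preferable; for example, Python 3.10 and later provide `statistics.correlation` in the standard library.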