What is a Correlation Coefficient?

April 1, 2018 Author: virendra

Correlation coefficients are used in statistics to measure how strong a relationship is between two variables. Many statistical analyses can be undertaken to examine the relationship between two continuous variables within a group of subjects. Two of the main purposes of such analyses are:

  • To assess whether the two variables are associated. There is no distinction between the two variables and no causation is implied, simply association.
  • To enable the value of one variable to be predicted from any known value of the other variable. One variable is regarded as a response to the other predictor (explanatory) variable and the value of the predictor variable is used to predict what the response would be.





The correlation coefficient is a measure of association between two variables, and it ranges between –1 and 1. If the two variables are in a perfect linear relationship, the correlation coefficient will be either 1 or –1; the sign depends on whether the variables are positively or negatively related. The correlation coefficient is 0 if there is no linear relationship between the variables.

Two different types of correlation coefficient are in use. One is the Pearson product-moment correlation coefficient, and the other is the Spearman rank correlation coefficient, which is based on the rank relationship between variables. The Pearson product-moment correlation coefficient is the more widely used of the two. Given paired measurements \( (X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n) \), the Pearson product-moment correlation coefficient is given by

\[ r = \frac{\sum_{i=1}^{n} (X_i-\overline{X})(Y_i-\overline{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\overline{X})^2 \sum_{i=1}^{n}(Y_i-\overline{Y})^2}} \]

Where,

\( \overline{X} \) and \( \overline{Y} \) are the sample means of \( X_1, X_2, \ldots, X_n \) and \( Y_1, Y_2, \ldots, Y_n \), respectively.
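The definition above can be sketched directly in code. This is a minimal illustration in Python (a language chosen for this example, not one used by the original article); the function name `pearson_r` and the toy data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for paired samples xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the sample means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical toy data: a perfect positive and a perfect negative linear relationship.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```

The two toy calls illustrate the boundary cases described earlier: a perfect linear relationship gives 1 or –1 depending on its direction.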

The correlation coefficient is a measure of the degree of linear association between two continuous variables, i.e. when the two are plotted against each other, how close the scatter of points is to a straight line. No assumptions are made about whether the relationship between the two variables is causal, i.e. whether one variable is influencing the value of the other.
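The Spearman rank correlation coefficient mentioned earlier can be sketched as the Pearson formula applied to the ranks of the observations rather than to their raw values. The following Python sketch assumes tied values share the average of their ranks, which is a common convention; the helper names and the toy data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the sample means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def spearman_r(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical toy data: y increases with x, but not along a straight line.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
print(pearson_r(xs, ys), spearman_r(xs, ys))
```

For this monotonic but curved toy data the rank-based coefficient reaches 1 while the Pearson coefficient stays below 1, which is the practical difference between the two measures.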

EXAMPLE

Consider the heights and weights of 10 elderly men:

(173, 65), (165, 57), (173, 77), (183, 89), (178, 93), (188, 73), (180, 83), (183, 86), (163, 70), (178, 83)

Figure 1: Relationship between height and weight

Plotting these data indicates that, unsurprisingly, there is a positive linear relationship between height and weight (figure 1). The shorter a person is, the lower their weight tends to be and, conversely, the taller a person is, the greater their weight tends to be. In order to examine whether there is an association between these two variables, the correlation coefficient can be calculated (table 1). In calculating the correlation coefficient, no assumptions are made about whether the relationship is causal, i.e. whether one variable is influencing the value of the other variable.




Thus the correlation coefficient for these data is 0.63, indicating a positive association between height and weight. When calculating the correlation coefficient, it is assumed that at least one of the variables is normally distributed.
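As a rough check, the coefficient can be recomputed from the ten (height, weight) pairs listed in the example. The sketch below is in Python and uses illustrative variable names. The listed values are whole numbers, so the intermediate sums it produces need not match those reported in Table 1 exactly, although the coefficient agrees to two decimal places.

```python
import math

# Heights (cm) and weights (kg) of the 10 men, as listed in the text.
pairs = [(173, 65), (165, 57), (173, 77), (183, 89), (178, 93),
         (188, 73), (180, 83), (183, 86), (163, 70), (178, 83)]
heights = [h for h, _ in pairs]
weights = [w for _, w in pairs]

n = len(pairs)
h_bar = sum(heights) / n
w_bar = sum(weights) / n

# Sum of cross-products and the two sums of squared deviations.
s_hw = sum((h - h_bar) * (w - w_bar) for h, w in pairs)
s_hh = sum((h - h_bar) ** 2 for h in heights)
s_ww = sum((w - w_bar) ** 2 for w in weights)

r = s_hw / math.sqrt(s_hh * s_ww)
print(f"r = {r:.2f}")  # prints "r = 0.63"
```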

The square of the correlation coefficient gives the proportion of the variance of one variable explained by the other. For the example above, the square of the correlation coefficient is 0.398, indicating that about 39.8 per cent of the variance of one variable is explained by the other.
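The figure of 0.398 can be reproduced from the sums reported in Table 1 (505.50, 558.50 and 1148.50). The line below is only an arithmetic check using those reported sums:

\[ r^2 = \frac{(505.50)^2}{558.50 \times 1148.50} = \frac{255530.25}{641437.25} \approx 0.398 \]

Squaring the rounded value 0.63 gives about 0.397 instead, so the quoted 0.398 appears to come from the unrounded coefficient.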

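Whether a correlation coefficient is statistically significant depends on the number of observations as well as on its size. For example, with 10 observations a correlation of about 0.63 is needed to reach significance at the 5 per cent level, whereas with 150 observations a correlation of about 0.16 is enough.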
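The dependence on sample size can be illustrated by computing the smallest correlation that reaches significance for a given number of observations. The Python sketch below assumes the usual t test for a correlation coefficient on n − 2 degrees of freedom, rearranged to give r in terms of t. The two-sided 5 per cent critical t values (2.306 for 8 degrees of freedom, 1.976 for 148) are taken from standard t tables and are assumptions of this example, not figures from the article.

```python
import math

def critical_r(n, t_crit):
    """Smallest |r| reaching significance for n pairs, given the critical t on n - 2 df.

    Obtained by solving t = r * sqrt((n - 2) / (1 - r^2)) for r.
    """
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Two-sided 5% critical t values from standard t tables (assumed inputs).
T_CRIT = {10: 2.306, 150: 1.976}

thresholds = {n: critical_r(n, t) for n, t in T_CRIT.items()}
for n, r_min in thresholds.items():
    print(f"n = {n}: |r| must be at least about {r_min:.2f}")
```

Under these assumptions the thresholds come out at roughly 0.63 for 10 observations and 0.16 for 150.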

The standard error of the correlation coefficient is

\[ SE(r) = \sqrt{\frac{1-r^2}{n-2}} \]

For the correlation coefficient above, the standard error is 0.27, the t statistic (the coefficient divided by its standard error) is 2.30 and, on n − 2 = 8 degrees of freedom, the P-value is approximately 0.05.
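These figures can be reproduced from the sums reported in Table 1. The Python sketch below assumes the t statistic is the coefficient divided by its standard error. It stops short of the P-value, which needs the t distribution and is not available in the standard library.

```python
import math

n = 10
# Sums reported in Table 1: cross-products, squared height deviations, squared weight deviations.
s_xy, s_xx, s_yy = 505.50, 558.50, 1148.50

r = s_xy / math.sqrt(s_xx * s_yy)
se = math.sqrt((1 - r ** 2) / (n - 2))  # standard error of r
t_stat = r / se                         # compared against t on n - 2 = 8 df

print(f"r = {r:.2f}, SE = {se:.2f}, t = {t_stat:.2f}")
# prints "r = 0.63, SE = 0.27, t = 2.30"
```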

Table 1: Calculation of correlation coefficient  (r)


\[ \overline{x} = 1765/10 = 176.5 \text{ cm} \]

\[ \overline{y} = 775/10 = 77.5 \text{ kg} \]

\[ r = \frac{\sum_{i=1}^{n} (X_i-\overline{X})(Y_i-\overline{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\overline{X})^2 \sum_{i=1}^{n}(Y_i-\overline{Y})^2}} \]

\[ r = \frac{505.50}{\sqrt{558.50 \times 1148.50}} = 0.63 \]




