+34 616 71 29 85 carsten@dataz4s.com

Correlation coefficient

The correlation coefficient, denoted by r, describes how well the regression line fits the given datapoints for X and Y. It ranges from -1 to +1: the closer r is to +1 or to -1, the better the fit of the line, and the closer it is to 0, the weaker the linear association.

 

 

r expresses the strength of the regression line

The regression line is the best possible fit to the datapoints. But that doesn’t mean that it is a good fit. It’s only the “best possible”. As described in Scatter plots, the fit can be weak or strong, or anywhere in between. The correlation coefficient (r) describes this degree of strength in the line.

 

Here are some eyeballed examples explaining the correlation coefficient (r):

[Figure: Correlation coefficient]
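To put numbers on the eyeballed reading, here is a minimal Python sketch that computes r for three small, made-up datasets: a perfect upward line, a perfect downward line and a scattered cloud. The datapoints are illustrative assumptions, not taken from the figure, and the helper name pearson_r is hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data for illustration only
x = [1, 2, 3, 4, 5, 6]
examples = {
    "perfect positive": [2 * v + 1 for v in x],
    "perfect negative": [10 - v for v in x],
    "scattered": [3, 1, 4, 1, 5, 2],
}

results = {label: pearson_r(x, y) for label, y in examples.items()}
for label, r in results.items():
    print(f"{label:>16}: r = {r:+.2f}")
```

The two perfect lines give r of +1 and -1, while the scattered data lands close to 0 (about +0.13), which matches the idea that r measures how tightly the points follow a straight line.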

 

Calculating the correlation coefficient

The formula for calculating the correlation coefficient:

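$$ r = \frac{1}{n - 1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Here x̄ and ȳ are the sample means, s_x and s_y are the sample standard deviations, and n is the number of datapoints.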

Subtracting the mean from each datapoint and dividing by the standard deviation gives us the Z-score. So, what this formula says is: multiply the Z-score for X by the Z-score for Y for each datapoint, sum the products, and divide by the degrees of freedom (n - 1), which relates the result to the sample size:

[Figure: Correlation coefficient formula explained]

This can also be written as the sum of Z-score_x times Z-score_y, divided by df:

$$ r = \frac{\sum z_{x_i} z_{y_i}}{n - 1} $$

For the sake of learning by doing, I will take a 4-datapoint example:

[Figure: Correlation coefficient mini example]

 

Visualizing the datapoints and the regression line in a scatterplot:

[Figure: Correlation coefficient mini example with regression line]

Now, let’s plug our values into the formula:

[Figure: Correlation coefficient calculation]

Which can also be written:

[Figure: Correlation coefficient calculation]
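The same kind of calculation can be sketched in Python by following the Z-score recipe step by step. The four datapoints below are an illustrative assumption chosen for this sketch, and the helper names sample_sd and z_scores are hypothetical.

```python
from math import sqrt

# Illustrative datapoints (an assumption for this sketch)
xs = [1, 2, 2, 3]
ys = [1, 2, 3, 6]

def sample_sd(values):
    """Sample standard deviation, dividing by n - 1."""
    mean = sum(values) / len(values)
    return sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

def z_scores(values):
    """Z-score of each value: (value - mean) / sample standard deviation."""
    mean = sum(values) / len(values)
    sd = sample_sd(values)
    return [(v - mean) / sd for v in values]

df = len(xs) - 1  # degrees of freedom, n - 1
zx = z_scores(xs)
zy = z_scores(ys)

# r = sum of the Z-score products divided by the degrees of freedom
r = sum(a * b for a, b in zip(zx, zy)) / df
print(f"r = {r:.4f}")
```

For these particular numbers the script prints r = 0.9449. Note that the standard deviation is computed with n - 1 in the denominator, so that it is consistent with dividing the sum of products by the degrees of freedom.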

 

In a spreadsheet it could be set up as follows:

[Figure: Correlation coefficient in spreadsheet]

 

Concluding on the correlation coefficient, r

We get a correlation coefficient of 0.94, which is close to +1, so we conclude that the line has a very strong fit. As described in Scatter plots, the association can also be characterized as positive and linear.

 

 

Correlation coefficient (r) vs coefficient of determination (r²)

The correlation coefficient (r) and the coefficient of determination (r²) are closely related: as the notation indicates, r² is simply r squared. Whereas r expresses the strength and direction of the linear association between X and Y, r² expresses the percentage, or proportion, of the variation in Y that can be explained by the variation in X.

In our 4 datapoint mini example above we had the following results for r and r²:

  • r = 0.94
  • r² ≈ 0.89 (r squared, taken before rounding r to 0.94; squaring the rounded 0.94 gives 0.88)

As mentioned in the conclusion, r = 0.94 tells us that the regression line in our example has a very strong fit, and r² = 0.89 means that 89% of the variation in Y can be explained by the variation in X.
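The link between r² and "explained variation" can be checked numerically. The sketch below fits a least-squares line, measures how much of the variation in Y the line leaves unexplained, and compares the explained share with r squared. The datapoints are an illustrative assumption for this sketch.

```python
# Illustrative datapoints (an assumption for this sketch)
xs = [1, 2, 2, 3]
ys = [1, 2, 3, 6]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in Y

# Least-squares regression line: y_hat = intercept + slope * x
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Variation in Y left unexplained by the line (sum of squared residuals)
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r = sxy / (sxx * syy) ** 0.5
explained_share = 1 - sse / syy

print(f"r               = {r:.4f}")
print(f"r squared       = {r ** 2:.4f}")
print(f"explained share = {explained_share:.4f}")
```

For this data both r squared and the explained share come out at 0.8929, which illustrates why r² can be read as the proportion of the variation in Y accounted for by the regression line.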

More on r² in Coefficient of determination, r².

 

 

Correlation coefficient in MS Excel

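To calculate the correlation coefficient in Excel you can use =CORREL (or =PEARSON) on the two data ranges. Alternatively, you can take the square root (=SQRT) of the value calculated with the formula =RSQ, but note that a square root is never negative, so this gives only the size of r and the sign has to be taken from the slope of the line. Another option is to run the regression analysis via Data >> Data Analysis >> Regression.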
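One caveat with the square-root route is that it cannot return a negative r. The Python sketch below stands in for the spreadsheet functions (it is not Excel code) and uses made-up, downward-sloping datapoints to show the effect and one way to restore the sign from the slope.

```python
from math import copysign, sqrt

# Illustrative downward-sloping datapoints (an assumption for this sketch)
xs = [1, 2, 3, 4]
ys = [8, 6, 5, 1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

r = sxy / sqrt(sxx * syy)           # signed correlation coefficient
r_squared = sxy ** 2 / (sxx * syy)  # the quantity an r-squared function reports
root = sqrt(r_squared)              # square root of r squared, never negative

# The slope sxy / sxx has the same sign as sxy, so use it to restore the sign
r_restored = copysign(root, sxy)

print(f"r                  = {r:+.4f}")
print(f"sqrt(r squared)    = {root:+.4f}")
print(f"with sign restored = {r_restored:+.4f}")
```

Here r is about -0.96 while the square root of r² is about +0.96, so the square-root shortcut is only safe as-is when the line slopes upward.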

 


 

Carsten Grube


Freelance Data Analyst

