The correlation coefficient, denoted by r, describes how well the regression line fits the given datapoints for X and Y. The closer r is to 1 or to -1, the better the fit of the line; the closer r is to 0, the weaker the fit.
r expresses the strength of the regression line
The regression line is the best possible fit to the datapoints. But that doesn’t mean that it is a good fit. It’s only the “best possible”. As described in Scatter plots, the fit can be weak or strong, or anywhere in between. The correlation coefficient (r) describes this degree of strength in the line.
Here are some eyeballed examples illustrating the correlation coefficient (r):
Calculating the correlation coefficient
The formula for calculating the correlation coefficient:
Subtracting the mean from each datapoint and dividing by the standard deviation gives us the Z-score. So, what this formula says is: multiply the Z-score for X by the Z-score for Y for each datapoint, add up the products, and divide the sum by the degrees of freedom (n - 1), which ties the result to the sample size.
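Written out, a standard form of this formula is shown below. The notation is an assumption, since the text does not define its symbols: n is the number of datapoints, x̄ and ȳ are the sample means of X and Y, and s_x and s_y are the sample standard deviations.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Each bracket is one Z-score, so every datapoint contributes the product of its two Z-scores to the sum.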
Visualizing the datapoints and the regression line in a scatterplot:
Now, let’s plug our values into the formula:
In a spreadsheet it could be set up as follows:
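The same column-by-column calculation can also be sketched in Python. This is a minimal sketch, not the article's own worksheet: the function name `correlation_coefficient` and the four datapoints are hypothetical and chosen only for illustration.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Pearson's r as the sum of Z-score products divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (n - 1 in the denominator).
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Z-score of each point: its distance from the mean in standard deviations.
    z_x = [(x - mean_x) / s_x for x in xs]
    z_y = [(y - mean_y) / s_y for y in ys]
    # Sum the products of the paired Z-scores and divide by n - 1.
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

# Hypothetical datapoints, chosen only for illustration.
xs = [1, 2, 2, 3]
ys = [1, 2, 3, 6]
r = correlation_coefficient(xs, ys)
print(round(r, 2))  # 0.94
```

The intermediate lists `z_x` and `z_y` correspond to the helper columns you would typically build in a spreadsheet before summing the products.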
Concluding on the correlation coefficient, r
So, we get a correlation coefficient of 0.94, which is very close to +1, so we conclude that the line has a very strong fit. As we recall from Scatter plots, the relationship can also be described as positive and linear.
Correlation coefficient (r) vs coefficient of determination (r²)
The correlation coefficient (r) and the coefficient of determination (r²) are closely related: as the notation suggests, r² is simply r squared. Whereas r expresses the strength of the linear association between X and Y, r² expresses the proportion, or percentage, of the variation in Y that can be explained by the variation in X.
In our mini example with four datapoints above, we had the following results for r and r²:
- r = 0.94
- r² = 0.89 (r squared, using the unrounded r; squaring the rounded 0.94 gives 0.88)
As mentioned in the conclusion, r = 0.94 expresses that we have a line with a very strong fit, and r² = 0.89 means that 89% of the variation in Y can be explained by the variation in X.
In other words, the regression line in our example has a very strong fit, with only about 11% of the variation in Y left unexplained by X.
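To make "variation explained" concrete, here is a minimal Python sketch that computes r² as one minus the share of the variation in Y that the least-squares line leaves unexplained. The function name `r_squared_from_line` and the datapoints are hypothetical, chosen only for illustration.

```python
def r_squared_from_line(xs, ys):
    """1 - SS_residual / SS_total for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope and intercept.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # Variation in Y that the line leaves unexplained.
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation in Y around its mean.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

# Hypothetical datapoints, chosen only for illustration.
xs = [1, 2, 2, 3]
ys = [1, 2, 3, 6]
r2 = r_squared_from_line(xs, ys)
print(round(r2, 2))  # 0.89
```

For a simple linear regression with one X variable, this value equals the square of the correlation coefficient.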
More on r² in Coefficient of determination, r²
Correlation coefficient in MS Excel
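To calculate the correlation coefficient in Excel you can use =CORREL, which returns r directly, sign included. Alternatively, you can take the square root (=SQRT) of the value calculated with =RSQ, but since a square root is never negative, this gives only the magnitude of r; check the slope of the line to determine the sign. Another option is to run the regression analysis via Data >> Data Analysis >> Regression.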
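A minimal Python sketch of why the square-root route needs care: for a downward-sloping relationship r is negative, but the square root of r² comes out positive. The helper `pearson_r` and the datapoints are hypothetical and only mimic what the spreadsheet functions would return.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from sums of squared and cross deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical downward-sloping data: Y falls as X rises.
xs = [1, 2, 3, 4]
ys = [8, 6, 5, 1]

r = pearson_r(xs, ys)        # signed correlation coefficient
magnitude = sqrt(r ** 2)     # what the square root of r squared yields

print(r < 0, magnitude > 0)  # True True
```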
Learning resources on correlation coefficient
I have found these resources helpful for learning about the correlation coefficient:
- Khan Academy (video 7:20): Correlation coefficient intuition
- Khan Academy (video 12:21): Calculating correlation coefficient
- ThoughtCo (text page): Calculating the Correlation Coefficient