+34 616 71 29 85 carsten@dataz4s.com

Precautions in simple linear regression

Precautions in simple linear regression matter whenever we interpret the regression equation, read the plot of the regression line, or make inferences about the correlation between X and Y.


Anscombe’s Quartet: Always start by plotting

Looking at the regression model alone is not enough, because the same regression line can describe completely different relationships. Anscombe’s Quartet is the classic example: four data sets with nearly identical regression lines and summary statistics (means, variances and correlation) that look completely different when plotted:


[Figure: Anscombe’s Quartet]
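To make the point concrete, here is a minimal sketch that fits a least-squares line to each of the four data sets. The numbers are the values published by Anscombe (1973), not data taken from this page, and the helper name `fit_line` is only illustrative.

```python
# Anscombe's quartet: published values (Anscombe, 1973), typed in by hand.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


results = {name: fit_line(x, y) for name, (x, y) in QUARTET.items()}
for name, (slope, intercept, r) in results.items():
    print(f"{name}: y = {intercept:.2f} + {slope:.2f}x, r = {r:.2f}")
```

All four data sets give roughly y = 3.00 + 0.50x with r close to 0.82, so the fitted numbers alone cannot tell them apart. Only a scatter plot reveals that one is roughly linear, one is curved, one has a single outlier, and one is driven by a single extreme X value.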


Avoid extrapolating

It can be tempting to extend a regression line beyond the observed X values. Take this example:

[Figure: Part of a model]



The line seems to be a very good fit, so it is tempting to continue the model beyond the observed X values. Predicting beyond the observed X values is called extrapolation, and it should be avoided, because we do not know how the relationship behaves outside the observed range. In the plot above, the rest of the model turns out to be completely different:


[Figure: Extrapolation]


As illustrated, we were shown only a part of the total model. Had we extrapolated from that part, our predictions would have been completely wrong.
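The same effect can be reproduced with a few lines of code. The sketch below is purely hypothetical: `true_relationship` is an invented curve standing in for the "total model", and we pretend that only X values from 0 to 4 were observed.

```python
def true_relationship(x):
    """Hypothetical full model: rises, peaks at x = 10, then falls again."""
    return 20 * x - x ** 2


# Assumption for illustration: we only get to see X = 0..4.
observed_x = [0, 1, 2, 3, 4]
observed_y = [true_relationship(x) for x in observed_x]

# Ordinary least-squares fit on the observed part only.
n = len(observed_x)
mean_x = sum(observed_x) / n
mean_y = sum(observed_y) / n
sxx = sum((x - mean_x) ** 2 for x in observed_x)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed_x, observed_y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def predict(x):
    """Prediction from the line fitted to the observed range."""
    return intercept + slope * x


for x in (3, 20):
    inside = min(observed_x) <= x <= max(observed_x)
    label = "inside observed range" if inside else "EXTRAPOLATION"
    print(f"x = {x:>2} ({label}): predicted {predict(x):.0f}, "
          f"actual {true_relationship(x)}")
```

Inside the observed range the line is off by only about one unit, which is why the fit looks so convincing. At x = 20 the line keeps climbing while the hypothetical curve has dropped back to zero, so the prediction misses by more than 300.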



Correlation does not imply causation

This is a made-up example of the relationship between households with at least two cars (X) and life expectancy for women (Y):

[Figure: Correlation does not imply causation]

Clearly, we cannot increase a population’s life expectancy by shipping loads of cars to people’s homes. There is an underlying effect: at the time the data was collected, the number of cars per household was correlated with wealth, and life expectancy was higher in wealthier countries. Wealth, not cars, was the underlying effect.

Establishing causation requires well-designed experiments; correlation on its own does not imply causation.
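A small simulation can illustrate how an underlying effect produces a correlation without any causal link. Everything in this sketch is invented for illustration: `wealth`, `cars` and `life` are simulated standardized scores, and cars are given no influence on life expectancy at all. The partial correlation formula used to "hold wealth fixed" is the standard first-order one.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical data: wealth drives both variables, cars do NOT affect life.
wealth = [rng.gauss(0, 1) for _ in range(n)]
cars = [w + rng.gauss(0, 0.5) for w in wealth]
life = [w + rng.gauss(0, 0.5) for w in wealth]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


r_raw = pearson(cars, life)
r_given_wealth = partial_corr(cars, life, wealth)
print(f"corr(cars, life)          = {r_raw:.2f}")
print(f"corr(cars, life | wealth) = {r_given_wealth:.2f}")
```

The raw correlation between cars and life expectancy comes out strong, yet it shrinks to roughly zero once wealth is held fixed. With real data the underlying effect is usually not measured, or not even known, which is why a well-designed experiment is needed before claiming causation.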


Precautions in simple linear regression – summarizing

The main precautions in simple linear regression are:

  • plotting the data, including all observations (X), before trusting the model
  • not extrapolating beyond the observed X values
  • keeping underlying effects in mind, and remembering that correlation does not imply causation





Carsten Grube

Freelance Data Analyst

