+34 616 71 29 85 carsten@dataz4s.com

# Precautions in simple linear regression

Precautions matter in simple linear regression when, for example, interpreting the regression equation, plotting the regression line, or making inferences about the correlation between X and Y.

## Anscombe’s Quartet: Always start by plotting

We need to go further than just looking at the regression model, because the same regression model can describe completely different relationships. Anscombe’s Quartet is a set of four data sets whose regression lines, and summary statistics such as means, variances and correlation, are nearly identical, yet whose scatter plots look completely different.
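The point can be checked numerically. The sketch below fits a least-squares line to each of the four data sets published by Anscombe (1973). The helper `fit_line` and the dictionary `quartet` are names chosen for this illustration, and only the Python standard library is assumed.

```python
# Anscombe's Quartet: data sets I-III share the same X values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Return (intercept, slope, r) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation coefficient
    return intercept, slope, r


results = {name: fit_line(x, y) for name, (x, y) in quartet.items()}
for name, (a, b, r) in results.items():
    print(f"{name}: y = {a:.2f} + {b:.2f}x, r = {r:.3f}")
```

All four fits come out at roughly y = 3.00 + 0.50x with r close to 0.82, so the numbers alone cannot tell the data sets apart. Only a scatter plot of each set reveals the curve in II, the outlier in III and the single influential point in IV.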

## Avoid extrapolating

It can be tempting to extend a regression line beyond the observed X values. Take this example:

Because the line seems to fit very well, continuing it past the observed X values looks harmless. But going beyond the observed X values is called extrapolation and should be avoided, because we do not know how the relationship behaves outside the observed range. In the plot above, the rest of the data look completely different:

As illustrated, we were shown only part of the full picture, and had we extrapolated from that part, we would have gone completely wrong.
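A small numerical sketch makes the same point. It assumes a hypothetical true relationship, y = 20x - x², that is observed only for x between 0 and 5. The function `true_relationship` and all values are invented for illustration and are not the data behind the plots above.

```python
def true_relationship(x):
    """Hypothetical curve; in practice this would be unknown to the analyst."""
    return 20 * x - x ** 2


# Observed range: x = 0..5, where the curve looks almost linear.
x_obs = [0, 1, 2, 3, 4, 5]
y_obs = [true_relationship(x) for x in x_obs]

# Least-squares fit of y = intercept + slope * x on the observed range.
n = len(x_obs)
mean_x = sum(x_obs) / n
mean_y = sum(y_obs) / n
sxx = sum((x - mean_x) ** 2 for x in x_obs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_obs, y_obs))
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def predict(x):
    return intercept + slope * x


# Largest error inside the observed range versus error far outside it.
max_error_in_range = max(abs(predict(x) - y) for x, y in zip(x_obs, y_obs))
x_far = 20
error_extrapolated = abs(predict(x_far) - true_relationship(x_far))

print(f"max error within x = 0..5: {max_error_in_range:.2f}")
print(f"error when extrapolating to x = {x_far}: {error_extrapolated:.2f}")
```

Within the observed range the line misses by only a few units, while at x = 20 it is off by several hundred, even though nothing in the fitted data hinted at the problem.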

## Correlation does not imply causation

This is a made-up example of the relationship between households with at least two cars (X) and life expectancy for women (Y):

It seems obvious that we cannot increase a population's life expectancy by shipping loads of cars to people's homes. There is an underlying effect: at the time these data were collected, the number of cars per household was correlated with wealth, and wealth was the underlying driver. Life expectancy was higher in wealthier countries.

Correlation does not imply causation. Establishing causation requires well-designed experiments.
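The underlying-effect idea can be sketched with simulated data. In the example below, a hypothetical `wealth` variable drives both `cars` and `life`, and cars have no effect on life expectancy by construction. All coefficients, the sample size and the seed are assumptions made up for this illustration.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def residuals(y, x):
    """Residuals of y after a least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


random.seed(0)  # fixed seed so the sketch is reproducible
n = 500
wealth = [random.gauss(0, 1) for _ in range(n)]
# Both variables depend on wealth only; cars never enter the formula for life.
cars = [0.8 * w + random.gauss(0, 0.5) for w in wealth]
life = [75 + 5 * w + random.gauss(0, 2) for w in wealth]

# Raw correlation between cars and life expectancy.
r_raw = pearson_r(cars, life)
# Partial correlation: correlate what is left after removing wealth from both.
r_partial = pearson_r(residuals(cars, wealth), residuals(life, wealth))

print(f"correlation(cars, life)               = {r_raw:.2f}")
print(f"correlation after removing wealth     = {r_partial:.2f}")
```

The raw correlation is strong, yet once the effect of wealth is removed from both variables almost nothing remains. In real data the underlying variable is often unmeasured, which is why a designed experiment is needed to support a causal claim.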

## Precautions in simple linear regression – summarizing

Precautions in simple linear regression include:

• plotting the data, including all observations, rather than relying on the regression line alone
• not extrapolating beyond the observed X values
• keeping underlying effects in mind, and remembering that correlation does not imply causation

## Learning statistics

#### Carsten Grube

Freelance Data Analyst
