# Precautions in simple linear regression

Precautions matter in simple linear regression, for example when focusing on the regression **equation**, when plotting the **regression line**, and when making inferences about the **correlation** between X and Y.

## Anscombe’s Quartet: Always start by plotting

We need to go further than just looking at the regression model, because the same regression model can describe completely different relationships. Anscombe’s Quartet is the classic example: four datasets with nearly identical regression lines, and nearly identical summary statistics, that look completely different once they are plotted.
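To make this concrete, here is a minimal sketch in plain Python that fits a least-squares line to each of the four Anscombe datasets. The helper `fit_line` and the variable names are my own choices for illustration. Each dataset gives roughly the same line (intercept about 3.0, slope about 0.5) and roughly the same correlation (about 0.82), so only a scatter plot reveals how different they are.

```python
# Anscombe's quartet: four datasets with 11 observations each.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Return (slope, intercept, r) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation coefficient
    return slope, intercept, r


results = {name: fit_line(x, y) for name, (x, y) in quartet.items()}
for name, (slope, intercept, r) in results.items():
    print(f"Dataset {name}: y = {intercept:.2f} + {slope:.2f} x, r = {r:.3f}")
```

The numbers alone cannot distinguish a linear trend from a curve, a single outlier, or one influential point, which is why plotting comes first.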

## Avoid extrapolating

It can be tempting to continue a regression line beyond its observed X values.

When the line seems to be a very good fit, it is **tempting to continue the model beyond the observed X values**. But **going beyond the observed X values is called extrapolation**, and it should be avoided. The reason is that we do not know how the relationship behaves outside the observed X values. It may be completely different from the part we observed.

In that case, **we were only shown a part of the total model**, and had we extrapolated from the part we were given, **we would have gone completely wrong**.
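The danger can be sketched with a small made-up example. The "true" relationship below, `y = 10x - x²`, is purely an assumption for illustration and is not taken from any data in this article. We only observe X between 0 and 3, where the curve looks almost linear, fit a line to that part, and then extrapolate to X = 10.

```python
def true_relationship(x):
    """Hypothetical full relationship (an assumption for this sketch): a parabola."""
    return 10 * x - x ** 2


# We only get to observe X in the range 0..3.
x_obs = [0, 0.5, 1, 1.5, 2, 2.5, 3]
y_obs = [true_relationship(x) for x in x_obs]

# Least-squares fit on the observed range.
n = len(x_obs)
mean_x = sum(x_obs) / n
mean_y = sum(y_obs) / n
sxx = sum((x - mean_x) ** 2 for x in x_obs)
syy = sum((y - mean_y) ** 2 for y in y_obs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_obs, y_obs))
slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / (sxx * syy) ** 0.5

# Extrapolate far outside the observed range.
x_new = 10
predicted = intercept + slope * x_new
actual = true_relationship(x_new)

print(f"Fit on observed range: y = {intercept:.2f} + {slope:.2f} x, r = {r:.3f}")
print(f"At x = {x_new}: line predicts {predicted:.2f}, true value is {actual:.2f}")
```

Within the observed range the correlation is above 0.99, so the line looks like an excellent fit. At X = 10 the line predicts 71.25 while the assumed true value is 0.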

## Correlation does not imply causation

Consider a made-up example of a positive relationship between households with at least two cars (X) and life expectancy for women (Y).

It seems logical that we cannot help populations increase their life expectancy by shipping loads of cars to their homes. There is an **underlying effect**: at the time the data were collected, the **number of cars per household was correlated with wealth, and wealth was the underlying effect**. Life expectancy was higher in wealthier countries.

Establishing causation requires well-designed experiments. Correlation alone does not imply causation.
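A small simulation can illustrate how an underlying effect creates a correlation. Everything here is made up: the standardized variables `wealth`, `cars` and `life`, the noise levels, and the sample size are assumptions chosen for the sketch. Wealth drives both cars and life expectancy, and cars have no effect on life expectancy at all. The partial correlation, which removes the linear effect of wealth from both variables, then shows what is left between cars and life expectancy.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


random.seed(0)  # fixed seed so the sketch is reproducible
n = 2000

# Simulated, standardized values: wealth is the underlying effect.
wealth = [random.gauss(0, 1) for _ in range(n)]
cars = [w + random.gauss(0, 0.5) for w in wealth]  # depends on wealth only
life = [w + random.gauss(0, 0.5) for w in wealth]  # depends on wealth only

r_cars_life = pearson(cars, life)
r_cars_wealth = pearson(cars, wealth)
r_life_wealth = pearson(life, wealth)

# Partial correlation of cars and life expectancy, controlling for wealth.
partial = (r_cars_life - r_cars_wealth * r_life_wealth) / (
    ((1 - r_cars_wealth ** 2) * (1 - r_life_wealth ** 2)) ** 0.5
)

print(f"Correlation cars vs. life expectancy: {r_cars_life:.2f}")
print(f"Same, controlling for wealth:         {partial:.2f}")
```

The raw correlation is strong even though cars do nothing for life expectancy in this simulation, and it largely disappears once wealth is accounted for. With real data the underlying effect is often unmeasured, which is why a designed experiment is needed to support a causal claim.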

## Precautions in simple linear regression – summarizing

Precautions in simple linear regression can be taken by:

- **plotting the whole model**, including all observations (X)
- **not extrapolating**
- keeping underlying effects in mind, and remembering that **correlation does not imply causation**

## Learning statistics

- jbstatistics (video 5:24): Simple linear regression: Always plot your data
- Rafael Irizarry, Professor of Biostatistics at Dana-Farber Cancer Institute (video 7:17): Data Science Linear Regression in R | Anscombe’s Quartet Stratification
- Udacity (video 1:07): Anscombe’s Quartet

#### Carsten Grube

Freelance Data Analyst

