# Residual plots

Residual plots can reveal conditions that are hard to see from the regression line alone. At a glance, a residual plot gives an overall picture of the errors in the model and thus shows whether the conditions for inference seem to be met. In essence, residual plots graph the conditions listed in the LINER model.


## Key points on residual plots

- Residual plots can, at a glance, **reveal what is not so obvious** when viewing the regression line
- Residual plots can give a **visual picture of the overall error situation** in a regression model
- Residual plots **enable visual assessment** of the error scenario in a regression model
- Residual plots **graph the conditions** described in the LINER model


## Reflection on the residuals

As described in Regression line, the true regression model can be denoted:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

where **Y** has a linear relationship with $\beta_0 + \beta_1 X$, and **epsilon (ε)** is the random error component, which indicates that the observed datapoints have some variability in Y around the regression line. They are randomly distributed around the line.

The estimate of this line can be denoted:

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$$

No epsilon (ε) is needed in the estimated regression model, because it is an estimated line and therefore, by definition, includes errors: an estimate is not the “exact thing”. So, the estimated regression line is composed of the estimated points, or, we can say that **the estimated points are the line**.

The errors in the estimated line are the vertical **distances** from each observed datapoint to the regression line, as I describe in Squared errors of line, and can be denoted $e_i = Y_i - \hat{Y}_i$.


## Residual plots scenarios

In residual plots the errors are displayed around their mean of 0. The residuals sum to zero: $\sum e_i = 0$. The following examples display the two scenarios: 1) that **inference is possible** and 2) that **inference is not possible**:
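As a side note, the zero sum of the residuals follows from how the least-squares line is fitted, provided the model includes an intercept. The short derivation below is a sketch; the symbols $b_0$ and $b_1$ for candidate intercept and slope values and $\hat{\beta}_0$, $\hat{\beta}_1$ for the resulting estimates are notation assumed here. Least squares minimizes the sum of squared errors, and setting the derivative with respect to the intercept to zero at the estimates gives:

$$
\frac{\partial}{\partial b_0}\sum_{i=1}^{n}\left(Y_i - b_0 - b_1 X_i\right)^2
= -2\sum_{i=1}^{n}\left(Y_i - b_0 - b_1 X_i\right) = 0
\quad\Longrightarrow\quad
\sum_{i=1}^{n}\left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right) = \sum_{i=1}^{n} e_i = 0
$$

This is why the residuals in the plots are always centered on 0, whether or not the other conditions hold.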

### Residual plots => inference possible

**Residual plot 1**: The **variability** for each observed X value is more or less equal, and there is **linearity** and thus no curvature. There is **no indication of non-normality**. So, this example indicates that inference can be valid:

**Residual plot 2:** This residual plot also seems ‘fairly reasonable’, as the variability in Y values seems to be approximately equal for all observed X values:

### Residual plots => inference not possible

**Residual plot 3**: This residual plot would lead to the conclusion that inference is not possible, because the variability in the Y values differs across the observed X values. The greater X, the greater the variability in Y:

**Residual plot 4:** This residual plot shows a clear curvature:


**Residual plot 5:** This residual plot shows a pattern of non-linearity and non-normality:


## Example: A model that complies with the conditions

Here is an example of a scatter plot with its estimated regression line that **seemingly** is ‘ok’ and **seemingly** allows for statistical inference:

The residual plot shows no marked systematic change in variability and no curvature:

The **quantile-quantile** plot can be used to check for **normality**, which also seems to hold, as the datapoints show a “reasonable” fit to the line:

So, for this model, we would accept the conditions and proceed with the statistical analysis for inference.

## Example: “Revealed” by the residual plot

This example returns a regression line with a **strong fit** that also **seemingly** is ok, but where the residual plot reveals a different picture:

### About the dataset

The following example is based on the dataset of Janka hardness vs. density for 36 **Australian trees**. I was inspired by the **JBstatistics** video Checking Assumptions with Residual Plots and obtained the dataset via the **PASWR2 package for R** statistical programming.

It covers 36 Australian trees for which density is relatively **easy to measure** and hardness is **difficult to measure**.

### Regression line and r²

If we assess that the conditions for inference are met, we can predict the hardness by simply measuring the density. **We can predict the values that are difficult to measure by observing the values that are easy to measure.**

Based on the 36 datapoints, the regression model gives a very strong fit with a coefficient of determination ($r^2$) of about 95%:


Visualizing the regression line, **it seems reasonable to think that this model complies** with the conditions for statistical inference. **However**, the residual plot reveals a **different picture**:


It shows that there is both a **difference in the variability** across the X values and **curvature**:

And if we take an extra look at the regression line, we can perhaps discern these patterns:


So, what we might not initially perceive from the scatter plot with the regression line was **revealed** by the residual plot.

In cases like this, where the model is not appropriate, we can improve the model by adding an $x^2$ term that could help fit a curve through the datapoints, or transform the data in other ways to achieve linearity.
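The effect of adding a squared term can be sketched as follows. This is an illustration under stated assumptions, not the analysis of the tree dataset: the data are synthetic, and `least_squares` and `sse` are small helper functions written for this example (in practice a statistics package would do the fitting).

```python
import random

random.seed(6)

# Synthetic curved data (assumption), NOT the tree dataset
x = [i / 5 for i in range(1, 51)]           # 0.2 .. 10.0
y = [5.0 + 2.0 * xi + 0.8 * xi ** 2 + random.gauss(0, 1.5) for xi in x]


def least_squares(columns, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination."""
    k = len(columns)
    # Augmented matrix [X'X | X'y]
    a = [[sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(ci * yi for ci, yi in zip(columns[i], y))] for i in range(k)]
    for p in range(k):
        # Partial pivoting for numerical stability
        best = max(range(p, k), key=lambda row: abs(a[row][p]))
        a[p], a[best] = a[best], a[p]
        for r in range(p + 1, k):
            f = a[r][p] / a[p][p]
            a[r] = [arv - f * apv for arv, apv in zip(a[r], a[p])]
    # Back substitution
    b = [0.0] * k
    for p in reversed(range(k)):
        b[p] = (a[p][k] - sum(a[p][j] * b[j] for j in range(p + 1, k))) / a[p][p]
    return b


def sse(columns, coefs, y):
    """Sum of squared residuals for a fitted model."""
    fitted = [sum(c * col[i] for c, col in zip(coefs, columns)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


ones = [1.0] * len(x)
x_sq = [xi ** 2 for xi in x]

linear_cols = [ones, x]             # model: b0 + b1*x
quad_cols = [ones, x, x_sq]         # model: b0 + b1*x + b2*x^2

b_lin = least_squares(linear_cols, y)
b_quad = least_squares(quad_cols, y)

print("linear    SSE:", round(sse(linear_cols, b_lin, y), 1))
print("quadratic SSE:", round(sse(quad_cols, b_quad, y), 1))
print("quadratic coefficients:", [round(b, 2) for b in b_quad])
```

If the squared term captures the curvature, the sum of squared residuals drops markedly, and the residual plot of the quadratic model should no longer show a curved pattern.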

## Residual plots in Excel

Ticking the options under Residuals in Data >> Data Analysis >> Regression produces the residual plots:

## Learning resources

I find these resources helpful for learning about residual plots:

- JBstatistics (video 8:03): Checking Assumptions with Residual Plots
- Stat Trek (text page with embedded video): Residual Analysis in Regression
- Khan Academy (video 6:11): Residual plots

#### Carsten Grube

Freelance Data Analyst
