+34 616 71 29 85 carsten@dataz4s.com

# Residual plots

Residual plots can reveal conditions that are hard to see from the regression line alone. At a glance, a residual plot gives an overall picture of the errors in the model and thus of whether the conditions for inference seem to be met. In essence, residual plots graph the conditions listed in the LINER model.

## Key points on residual plots

• Residual plots can reveal, at a glance, what is not obvious from viewing the regression line
• Residual plots give a visual overview of the errors in a regression model
• Residual plots enable a visual assessment of whether the conditions for inference seem to be met
• Residual plots graph the conditions described in the LINER model

## Reflection on the residuals

As described in Regression line, the true regression model can be denoted:

Y = β₀ + β₁X + ε

where Y has a linear relationship with β₀ + β₁X, and epsilon (ε) is the random error component, which indicates that the observed datapoints have some variability in Y around the regression line. The datapoints are randomly distributed around the line.

The estimate of this line can be denoted:

Ŷ = b₀ + b₁X

where b₀ and b₁ are the estimates of β₀ and β₁. No epsilon (ε) is needed in the estimated regression model because it is an estimated line and therefore, by definition, includes errors: an estimate is not the “exact thing”. The estimated regression line is composed of the estimated points or, put differently, the estimated points are the line.

The errors in the estimated line are the distances from each observed datapoint to the regression line, as I describe in Squared errors of line, and can be denoted: eᵢ = Yᵢ − Ŷᵢ.

## Residual plots scenarios

In residual plots, the errors are displayed around their mean of 0. The residuals sum to zero: ∑eᵢ = 0. The subsections below display two scenarios: 1) inference is possible and 2) inference is not possible.
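As a minimal sketch of these definitions, the Python snippet below fits a least-squares line to a handful of made-up datapoints, computes each residual as observed minus fitted, and checks that the residuals sum to zero up to floating-point rounding. The data and the helper names `fit_line` and `residuals` are assumptions for illustration and do not come from the examples in this post.

```python
from statistics import mean


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(x, y, b0, b1):
    """Return e_i = y_i - y_hat_i for every observation."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Made-up illustration data (an assumption, not taken from the post).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

b0, b1 = fit_line(x, y)
e = residuals(x, y, b0, b1)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print("residuals:", [round(ei, 3) for ei in e])
print("sum of residuals is numerically zero:", abs(sum(e)) < 1e-9)
```

Plotting `e` against `x` (or against the fitted values) gives the residual plot; the zero sum is why the points scatter around the horizontal line at 0.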

### Residual plots => inference possible

Residual plot 1: The variability for each observed X value is more or less equal, and there is linearity and thus no curvature. There is no indication of non-normality. So, this example indicates that inference can be valid.

Residual plot 2: This residual plot also seems ‘fairly reasonable’, as the variability in Y values seems to be approximately equal for all observed X values.

### Residual plots => inference not possible

Residual plot 3: This residual plot would lead to the conclusion that inference is not possible, as the variability in the Y values differs for the observed X values. The greater X, the greater the variability in Y.

Residual plot 4: This residual plot shows a clear curvature.

Residual plot 5: This residual plot shows a pattern of non-linearity and non-normality.

## Example: A model that complies with the conditions

Here is an example of a scatter plot with its estimated regression line that seemingly is ‘ok’ and seemingly allows for statistical inference. The residual plot shows no major systematic variability or curvature. The quantile-quantile plot can be applied to check for normality, which also seems to hold, as the datapoints indicate a “reasonable” fit to the line. So, for this model, we would accept the conditions and proceed with the statistical analysis for inference.
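The normality check with a quantile-quantile plot can be sketched in a few lines of Python. The snippet below pairs each sorted residual with a standard normal quantile and reports the correlation between the two as a rough numeric summary of how straight the Q-Q line is. The residual values are made up, the plotting positions (i − 0.5)/n are one common convention among several, and the helper names `qq_points` and `pearson` are assumptions for illustration.

```python
from statistics import NormalDist, mean


def qq_points(values):
    """Pair each sorted value with the matching standard normal quantile."""
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n: one common convention, assumed here.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    a_bar, b_bar = mean(a), mean(b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5


# Made-up residuals for illustration only (not from the post's examples).
resid = [-1.8, -1.1, -0.7, -0.4, -0.1, 0.0, 0.2, 0.5, 0.9, 1.2, 1.3]

points = qq_points(resid)
r = pearson([t for t, _ in points], [s for _, s in points])

for t, s in points:
    print(f"theoretical {t:6.3f}   sample {s:5.2f}")
print(f"Q-Q correlation: {r:.3f}")
```

A correlation close to 1 corresponds to points lying near a straight line in the Q-Q plot. It is only a summary, so the plot itself is still worth inspecting.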

## Example: “Revealed” by the residual plot

This example returns a regression line with a strong fit that also seems ok, but where the residual plot reveals a different picture.

The example is based on the dataset of Janka hardness vs. density for 36 Australian trees. I was inspired by the JBstatistics video Checking Assumptions with Residual Plots and obtained the dataset via the PASWR2 package for the R statistical programming language.

For these 36 Australian trees, density is relatively easy to measure and hardness is difficult to measure.

### Regression line and r²

If we assess that the conditions for inference are met, we can predict the hardness by simply measuring the density. We can predict the values that are difficult to measure by observing the values that are easy to measure.

Based on the 36 datapoints, the regression model gives a very strong fit, with a coefficient of determination (r²) of some 95%. Visualizing the regression line, it seems reasonable to think that this model complies with the conditions for statistical inference.

However, the residual plot reveals a different picture. It shows both that the variability differs across X values and that there is curvature. And if we take an extra look at the regression line, we can perhaps discern these patterns there as well.

So, what we might not initially perceive from the scatter plot with the regression line was revealed by the residual plot.
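The combination of a high r² and a clearly patterned residual plot is easy to reproduce. The sketch below does not use the tree dataset. It fits a straight line to synthetic, deliberately curved data (y = x² for x = 1, …, 10, an assumption chosen for illustration) and prints r² together with the sign of each residual in order of increasing x.

```python
from statistics import mean


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def r_squared(y, fitted):
    """Coefficient of determination: 1 - SSE / SST."""
    y_bar = mean(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


# Synthetic, deliberately curved data (not the tree dataset from the post).
x = list(range(1, 11))
y = [xi ** 2 for xi in x]

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

r2 = r_squared(y, fitted)
signs = "".join("+" if e > 0 else "-" for e in resid)

print(f"r^2 = {r2:.3f}")                          # r^2 = 0.950
print("residual signs by increasing x:", signs)   # ++------++
```

The r² of about 95% looks reassuring, yet the residuals are positive at both ends and negative in the middle. That systematic run of signs is the curvature that a residual plot makes visible.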

In cases like this, where the model is not appropriate, we can improve the model by adding an x² term that could help fit a curve through the datapoints, or transform the data in other ways to achieve linearity.
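To illustrate the effect of adding an x² term, the sketch below fits both a straight line and a quadratic to synthetic curved data and compares their sums of squared residuals. The data, the noise values, and the helper names `solve`, `fit_polynomial`, and `sse` are assumptions for illustration. The fit uses the normal equations with plain Gaussian elimination, which is adequate for a small example like this but is not how a statistics package would necessarily do it.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef


def fit_polynomial(x, y, degree):
    """Least-squares coefficients [b0, b1, ..., b_degree] via the normal equations."""
    k = degree + 1
    xtx = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


def sse(x, y, coef):
    """Sum of squared residuals for a polynomial with the given coefficients."""
    def predict(xi):
        return sum(c * xi ** p for p, c in enumerate(coef))
    return sum((yi - predict(xi)) ** 2 for xi, yi in zip(x, y))


# Synthetic curved data with small made-up noise (not the tree dataset).
x = [float(v) for v in range(1, 11)]
noise = [0.4, -0.3, 0.1, -0.5, 0.2, 0.3, -0.2, 0.5, -0.4, -0.1]
y = [2.0 + 0.5 * xi + 1.5 * xi ** 2 + ni for xi, ni in zip(x, noise)]

linear = fit_polynomial(x, y, 1)
quadratic = fit_polynomial(x, y, 2)
sse_linear = sse(x, y, linear)
sse_quadratic = sse(x, y, quadratic)

print("linear coefficients:   ", [round(c, 3) for c in linear])
print("quadratic coefficients:", [round(c, 3) for c in quadratic])
print(f"SSE linear = {sse_linear:.2f}, SSE quadratic = {sse_quadratic:.2f}")
```

With the x² term included, the sum of squared residuals drops sharply and the residuals no longer follow the curved pattern, which is the improvement the residual plot would show.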

## Residual plots in Excel

By ticking the options under Residuals in Data >> Data Analysis >> Regression, Excel produces the residual plots.

## Learning resources

I find these learning resources helpful for learning about residual plots:

#### Carsten Grube

Freelance Data Analyst
