Residual plots can reveal conditions that are hard to see from the regression line itself. At a glance, a residual plot gives an overall picture of the errors in the model and thus shows whether the conditions for inference seem to be met. In essence, residual plots graph the conditions listed in the LINER model.
Key points on residual plots
- Residual plots can reveal at a glance what is not obvious from looking at the regression line
- Residual plots give a visual overview of the errors in a regression model, so the error conditions can be assessed by eye
- Residual plots graph the conditions described in the LINER model
Reflection on the residuals
As described in Regression line, the true regression model can be denoted:
Y = β0 + β1X + ε
where Y depends linearly on X through β0 + β1X, and epsilon (ε) is the random error component. It expresses that the observed datapoints have some variability in Y around the regression line: they are randomly distributed around the line.
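The estimate of this line can be denoted:
ŷ = b0 + b1x
where b0 and b1 stand for the estimates of β0 and β1 computed from the sample. No epsilon (ε) term is needed in the estimated regression model, because the estimated line only gives the predicted value of Y for each X. An estimate is not the “exact thing”, and its errors show up as the residuals instead: the differences ei = yi − ŷi between the observed and the predicted values. So the estimated regression line is composed of the estimated points, or, put differently, the estimated points are the line.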
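The relation between the true model, the estimated line and the residuals can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "true" values β0 = 2 and β1 = 0.5, the error standard deviation of 1, and the helper names fit_line and residuals are made up for the example and are not taken from any dataset in this article.

```python
import random

def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(x, y, b0, b1):
    """Observed value minus fitted value for every data point."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Simulate from an assumed "true" model Y = 2 + 0.5*X + eps, eps ~ N(0, 1).
rng = random.Random(1)
x = [float(i) for i in range(1, 41)]
y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

b0, b1 = fit_line(x, y)
e = residuals(x, y, b0, b1)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"sum of residuals = {sum(e):.2e}")
```

The estimates land near, but not exactly on, the assumed true values, and the residuals of a least-squares line with an intercept sum to zero up to floating-point rounding.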
Residual plots scenarios
In residual plots the residuals are displayed around their mean of 0. For a least-squares line with an intercept, the residuals sum to zero: ∑ei = 0. The following examples display two scenarios: 1) inference is possible and 2) inference is not possible:
Residual plots => inference possible
Residual plot 1: The variability is more or less equal for each observed X value, and there is linearity and thus no curvature. There is no indication of non-normality. So, this example indicates that inference can be valid:
Residual plot 2: This residual plot also seems ‘fairly reasonable’, as the variability in Y values seems to be approximately equal for all observed X values:
Residual plots => inference not possible
Residual plot 3: This residual plot would lead to the conclusion that inference is not possible as variability in the Y values differs for the observed X values. The greater X, the greater the variability in Y:
Residual plot 4: This residual plot shows a clear curvature:
Residual plot 5: This residual plot shows a pattern of non-linearity and non-normality:
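The fan shape described for residual plot 3 can also be checked with a rough number. The sketch below compares the standard deviation of the residuals in the upper half of the X values with that in the lower half. It is a crude screening aid and not a formal test; the simulated data, the error standard deviation that grows as 0.05·X, and the helper names line_residuals and spread_ratio are assumptions made for this example.

```python
import random
import statistics

def line_residuals(x, y):
    """Fit y-hat = b0 + b1*x by least squares and return the residuals."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def spread_ratio(x, e):
    """Std. dev. of residuals in the larger-X half over that in the smaller-X half."""
    pairs = sorted(zip(x, e))          # order the residuals by their X value
    half = len(pairs) // 2
    low = [ei for _, ei in pairs[:half]]
    high = [ei for _, ei in pairs[half:]]
    return statistics.pstdev(high) / statistics.pstdev(low)

rng = random.Random(2)
x = [float(i) for i in range(1, 201)]

# Scenario A (assumed): equal error spread for all X.
y_equal = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
# Scenario B (assumed): error spread grows with X, giving a fan shape.
y_fan = [2.0 + 0.5 * xi + rng.gauss(0.0, 0.05 * xi) for xi in x]

ratio_equal = spread_ratio(x, line_residuals(x, y_equal))
ratio_fan = spread_ratio(x, line_residuals(x, y_fan))

print(f"equal spread: ratio = {ratio_equal:.2f}")
print(f"fan shape:    ratio = {ratio_fan:.2f}")
```

A ratio close to 1 is what we would hope to see when the variability is roughly equal, while a ratio well above 1 points in the direction of residual plot 3. The plot itself remains the primary tool; the number only supports what the eye sees.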
Example: A model that complies with the conditions
Our residual plot shows no marked systematic change in variability and no curvature:
The quantile-quantile (Q-Q) plot can be used to check for normality, which also seems to hold, as the datapoints show a “reasonable” fit to the line:
So, for this model, we would accept the conditions and proceed with the statistical analysis for inference.
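For readers who want to see what goes into a quantile-quantile plot, here is a minimal sketch using only the Python standard library. It pairs the ordered residuals with the quantiles of a standard normal distribution and reports how straight the resulting point cloud is. The plotting positions (i − 0.5)/n are one common convention among several, and the two simulated sets of stand-in "residuals" as well as the helper names qq_points and correlation are assumptions for illustration.

```python
import random
from statistics import NormalDist, mean

def qq_points(e):
    """Pairs (theoretical normal quantile, ordered residual) for a normal Q-Q plot."""
    n = len(e)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n: one common convention, assumed here.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(e)))

def correlation(pairs):
    """Pearson correlation of the Q-Q points; values near 1 mean a straight line."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in pairs)
    sxx = sum((a - x_bar) ** 2 for a in xs)
    syy = sum((b - y_bar) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(3)
# Stand-in residuals (assumed data): one normal set, one clearly skewed set.
normal_like = [rng.gauss(0.0, 1.0) for _ in range(200)]
skewed = [rng.expovariate(1.0) - 1.0 for _ in range(200)]

r_normal = correlation(qq_points(normal_like))
r_skewed = correlation(qq_points(skewed))

print(f"normal-like residuals: r = {r_normal:.3f}")
print(f"skewed residuals:      r = {r_skewed:.3f}")
```

Plotting the pairs returned by qq_points gives the Q-Q plot. The closer the points are to a straight line, the more reasonable the normality condition looks.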
Example: “Revealed” by the residual plot
This example returns a regression line with a strong fit that seems fine at first sight, but where the residual plot reveals a different picture:
About the dataset
The following example is based on the dataset of Janka hardness vs. density for 36 Australian trees. I was inspired by the JBstatistics video Checking Assumptions with Residual Plots and got the dataset via the PASWR2 package for the R statistical programming language.
For these 36 trees, density is relatively easy to measure, whereas hardness is difficult to measure.
Regression line and r²
If we assess that the conditions for inference are met, we can predict the hardness by simply measuring the density. In other words, we can predict the values that are difficult to measure by observing the values that are easy to measure.
Based on the 36 datapoints, the regression model gives a very strong fit with a coefficient of determination (r²) of about 95%:
Looking at the regression line, it seems reasonable to think that this model complies with the conditions for statistical inference. However, the residual plot reveals a different picture:
It shows both that the variability differs across the X values and that there is curvature:
And if we take another look at the regression line, we can perhaps discern these patterns there as well:
In cases like this, where the model is not appropriate, we can improve the model by adding an x² term that could help fit a curve through the datapoints, or by transforming the data in other ways to achieve linearity.
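As a sketch of what adding an x² term does, the following Python example fits both a straight line and a quadratic to simulated, curved data and compares the sums of squared residuals. Nothing here uses the hardness data: the simulated model y = 1 + 0.5x + 0.3x² + error and the helper names solve, fit_polynomial and sse are assumptions for the example. The normal equations are solved directly, which is fine for a tiny problem like this; in practice a statistics package would do the fitting.

```python
import random

def solve(a, b):
    """Solve the square linear system a * coef = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the row with the largest entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef

def fit_polynomial(x, y, degree):
    """Least-squares coefficients [b0, b1, ..., b_degree] via the normal equations."""
    k = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(k)]
    return solve(a, b)

def sse(x, y, coef):
    """Sum of squared residuals for a fitted polynomial."""
    fitted = [sum(c * xi ** p for p, c in enumerate(coef)) for xi in x]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Simulated curved data (assumed model): y = 1 + 0.5x + 0.3x^2 + N(0, 1) error.
rng = random.Random(4)
x = [i * 0.25 for i in range(41)]  # 0.0, 0.25, ..., 10.0
y = [1.0 + 0.5 * xi + 0.3 * xi ** 2 + rng.gauss(0.0, 1.0) for xi in x]

line = fit_polynomial(x, y, 1)
curve = fit_polynomial(x, y, 2)

print(f"straight line: SSE = {sse(x, y, line):.1f}")
print(f"with x^2 term: SSE = {sse(x, y, curve):.1f}, b2 = {curve[2]:.3f}")
```

A large drop in the sum of squared residuals after adding the x² term, together with a residual plot that no longer shows curvature, suggests that the curved model describes the data better than the straight line.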
Residual plots in Excel
By ticking the options under Residuals in Data >> Data Analysis >> Regression, you get Excel to output the residual plots: