Squared errors of line
The squared errors of line are the squared vertical distances from the estimated regression line to each datapoint in the line fit plot. These distances are the errors of the fit, and they are crucial for making inferences in regression analysis. The unsquared errors are known as residuals; for inference they are squared, which gives the squared errors of line.
The errors = distance from sampled points to the line
The regression line is only a “best possible fit” to the actual observed datapoints. It will not go through every datapoint and it might not go through any of the datapoints.
The regression line goes through (X,Y)-coordinates that are generally not equal to the (X,Y)-coordinates of the sampled observations. The distance between the two sets of coordinates is called the errors, or the residuals.
In other words, the errors are what the regression line predicted “erroneously”: the distance from reality (the observed datapoint) to the predicted line.
Calculating the squared errors of line
Let’s use the 4-datapoint mini example from the chapter Regression line to calculate the squared errors of line.
To calculate the total of all errors, we sum the distances for each datapoint to the line. And the distance from the datapoint to the line is the y-value of the datapoint minus the y-value of the line for the given X-value. So, the error is the vertical distance from the point to the line:
As mentioned, each error is the vertical distance from the estimated line to the datapoint: the observed value (the datapoint) minus the value on the line. This can also be expressed as yi – (mxi + b), where yi is the observed datapoint and mxi + b is the value of the estimated line at xi.
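As a minimal sketch of the formula yi – (mxi + b), the snippet below computes the error for each datapoint. The function name `residuals`, the three datapoints, and the line y = 0.5x + 1.5 are hypothetical values chosen for illustration; they are not the 4-datapoint example from this page.

```python
def residuals(xs, ys, m, b):
    """Vertical distance from each observed point to the line y = m*x + b."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]

# Hypothetical data and line, for illustration only
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 2.0]

errors = residuals(xs, ys, m=0.5, b=1.5)
print(errors)  # [-0.5, 1.0, -0.5]
```

A positive error means the datapoint lies above the line, and a negative error means it lies below.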
Visualized in graph:
Finally, we add up all the squared errors of line. This total is known as the sum of the squared errors of line, often shortened to SELine.
Sum of the squared errors of line (SELine)
The sum of the squared errors of line (SELine) plays an important role in statistical inference in regression analysis and is used, among other things, to calculate the coefficient of determination (r²). SELine is the sum of the squares of the individual errors, which in our example becomes: (0.6)² + (-0.8)² + (0.4)² + (-0.2)² = 0.36 + 0.64 + 0.16 + 0.04 = 1.2.
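The same arithmetic can be checked in a few lines of Python. This is only a sketch: the four error values are taken from the example above, while the variable names `errors` and `se_line` are my own.

```python
# The four errors (residuals) from the example on this page
errors = [0.6, -0.8, 0.4, -0.2]

# Square each error and add the squares together
se_line = sum(e ** 2 for e in errors)

# Rounded to hide floating-point noise in the last digits
print(round(se_line, 4))  # 1.2
```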
Sum of the squared errors of the mean (SEӯ)
Another component that is central to making inferences in regression is the squared error from the mean of y (SEӯ). For each datapoint, the distance from ӯ to the datapoint (yi) is calculated and squared.
The squared distances are then added up to give the sum of the squared errors of the mean of y (SEӯ).
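The sketch below shows how SEӯ could be computed, and how it combines with SELine in the usual formula for the coefficient of determination, r² = 1 – SELine / SEӯ. The datapoints, the line y = 0.5x + 1.5, and the function name `sum_squared_errors_of_mean` are hypothetical and chosen only for illustration; they are not the 4-datapoint example from this page.

```python
def sum_squared_errors_of_mean(ys):
    """Sum of squared distances from each y-value to the mean of y."""
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)

# Hypothetical data and line, for illustration only
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 2.0]
m, b = 0.5, 1.5

se_mean = sum_squared_errors_of_mean(ys)
se_line = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Share of the variation in y that the line accounts for
r_squared = 1 - se_line / se_mean

print(se_mean, se_line, r_squared)  # 2.0 1.5 0.25
```

The smaller SELine is relative to SEӯ, the closer r² gets to 1, meaning the line describes the data better than the mean of y alone.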
Squared errors of line in MS Excel
Regression analysis in Excel can be run following the path Data >> Data Analysis >> Regression:
From the resulting output table, you can add columns with the squared values, and so on. Another option is to build the spreadsheet tables yourself, as in the examples above on this page.
Some of my preferred material on squared errors of line:
- Khan Academy (video 6:47): Squared error of regression line
- JBstatistics (video 7:23): The least squares regression line
- Freecodecamp (text page): Machine learning: An introduction to mean squared error and regression lines