+34 616 71 29 85 carsten@dataz4s.com

Squared errors of line

The squared errors of line are the squared vertical distances between the estimated regression line and each datapoint in the line fit plot. Each unsquared distance is an error, also known as a residual, and these errors are crucial for making inferences in regression analysis. For the inference process the errors are squared, which gives the squared errors of line.

 

The errors = distance from sampled points to the line

The regression line is only a “best possible fit” to the actual observed datapoints. It will not go through every datapoint and it might not go through any of the datapoints.

For a given X-value, the Y-value on the regression line is generally not equal to the Y-value of the sampled observation. The vertical distance between the two is called the error, or the residual.

So, the errors are what the regression line predicted “erroneously”: the distance from reality (the observed datapoint) to the predicted line.

 

Calculating the squared errors of line

Let’s use the 4-datapoint mini example from the chapter Regression line to calculate the squared errors of line.

To calculate the total error, we first find the distance from each datapoint to the line. That distance is the y-value of the datapoint minus the y-value of the line at the same X-value. So, the error is the vertical distance from the point to the line:

[Figure: Squared errors of line visually]

 

As mentioned, each error is the vertical distance from the estimated line to the datapoint: the observed value (the datapoint) minus the value on the line. This can be expressed as yi – (mxi + b), where yi is the observed datapoint and mxi + b is the value of the estimated line at xi.

[Figure: Calculating errors]
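The error calculation yi – (mxi + b) can be sketched in a few lines of Python. The four datapoints and the slope and intercept below are made-up values for illustration only; they are not the article's mini example, and the function name `residuals` is hypothetical.

```python
def residuals(xs, ys, m, b):
    """Return the vertical distance y_i - (m*x_i + b) for each datapoint."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]

# Hypothetical data and line, chosen only to illustrate the calculation.
xs = [1, 2, 3, 4]
ys = [4, 4, 6, 10]
m, b = 2.0, 1.0  # assumed slope and intercept of the estimated line

errors = residuals(xs, ys, m, b)
print(errors)  # [1.0, -1.0, -1.0, 1.0]
```

A positive error means the datapoint lies above the line, a negative error that it lies below.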

 

 

Visualized in graph:

[Figure: Squared errors of line calculation]

 

Finally, we add up all the squared errors of line. This total is known as the sum of the squared errors of line, often shortened to SELine.

 

Sum of the squared errors of line (SELine)

The sum of the squared errors of line (SELine) plays an important role in statistical inference in regression analysis and is used, among other things, to calculate the coefficient of determination (r²). SELine is the sum of the squares of the individual errors, which in our example becomes: (0.6)² + (-0.8)² + (0.4)² + (-0.2)² = 1.2:

[Figure: Sum of squared errors of line]
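As a quick check of the arithmetic, the sum can be reproduced in Python using the four errors quoted in the text. The variable names are only illustrative.

```python
# The four errors from the example in the text.
errors = [0.6, -0.8, 0.4, -0.2]

# Square each error, then add the squares together.
se_line = sum(e ** 2 for e in errors)

# Rounded to hide floating-point noise in the last digits.
print(round(se_line, 4))  # 1.2
```

Squaring makes every term positive, so errors above and below the line cannot cancel each other out.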

 

 

Sum of the squared errors of the mean (SEӯ)

Another component that is central in making inferences in regression is the error from the mean of y (SEӯ). For each datapoint, this error is the distance from ӯ to the datapoint (yi), which is then squared:

[Figure: Squared errors of the mean of y]

 

Adding up these squared errors gives the sum of the squared errors of the mean of y (SEӯ):

[Figure: Sum of squared errors of the mean of y]
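The same calculation can be sketched in Python. The y-values below are hypothetical and serve only to show the steps: find the mean of y, take each datapoint's distance from that mean, square it, and add the squares.

```python
# Hypothetical observed y-values (not the article's example data).
ys = [4, 4, 6, 10]

# Mean of y.
y_mean = sum(ys) / len(ys)

# Squared distance from the mean for each datapoint, summed.
se_mean = sum((y - y_mean) ** 2 for y in ys)

print(y_mean, se_mean)  # 6.0 24.0
```

Note that this sum depends only on the y-values; the regression line plays no part in it.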

  

 

Squared errors of line in MS Excel

Regression analysis in Excel can be run by following the path Data >> Data Analysis >> Regression (the Data Analysis option requires the Analysis ToolPak add-in to be enabled):

[Figure: Squared errors of line in Excel]

You can extend the resulting output table with columns for the squared values, etc. Another option is to build the spreadsheet tables yourself, as in the examples above on this page.
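For readers who prefer code to a spreadsheet, the same kind of table can be built in Python. This is a minimal sketch, not the Excel procedure: the data is hypothetical, the helper `fit_line` is an illustrative name, and the line is fitted with the standard closed-form least-squares formulas. The last line also shows how the two sums are commonly combined into the coefficient of determination, r² = 1 − SELine / SEӯ.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b

# Hypothetical data (not the article's example).
xs = [1, 2, 3, 4]
ys = [4, 4, 6, 10]
m, b = fit_line(xs, ys)

# One row per datapoint: x, y, value on the line, error, squared error.
print(f"{'x':>4}{'y':>6}{'line':>8}{'error':>8}{'error^2':>10}")
se_line = 0.0
for x, y in zip(xs, ys):
    predicted = m * x + b
    error = y - predicted
    se_line += error ** 2
    print(f"{x:>4}{y:>6}{predicted:>8.2f}{error:>8.2f}{error ** 2:>10.2f}")

# Squared errors of the mean of y, and r-squared from the two sums.
y_mean = sum(ys) / len(ys)
se_mean = sum((y - y_mean) ** 2 for y in ys)
r_squared = 1 - se_line / se_mean
print(f"SELine = {se_line:.2f}, SE mean y = {se_mean:.2f}, r^2 = {r_squared:.4f}")
```

With this made-up data the fitted line is y = 2x + 1, SELine is 4 and the sum of squared errors of the mean is 24, so r² is about 0.83.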

 

 

Carsten Grube

Freelance Data Analyst
