# Squared errors of line

The squared errors of line are the **squared vertical distances from the estimated regression line to each datapoint** in the line fit plot. These distances are the errors of the fit, also known as **residuals**, and once squared they are called the **squared errors of line**, a quantity that is crucial for making inferences in regression analysis.

## The errors = distance from sampled points to the line

The regression line is only a “best possible fit” to the actual observed datapoints. It will generally not go through every datapoint, and it might not go through any of them.

The regression line passes through (X,Y)-coordinates that generally differ from the (X,Y)-coordinates of the sampled observations. The vertical distance between the two sets of coordinates is called the error, or the **residual**.

So, the errors are what the regression line gets “wrong”: the distance from the observed reality to the predicted line.

## Calculating the squared errors of line

Let’s apply the **4-datapoint mini example** from the chapter Regression line for calculating the squared errors of line.

To calculate the **total of all errors**, we sum the distances from each datapoint to the line. The distance from a datapoint to the line is the y-value of the datapoint minus the y-value of the line at the same X-value. So, the error is the vertical distance from the point to the line:

As mentioned, **each error is the vertical distance** from the estimated line to the datapoint: the observed value (the datapoint) minus the line. This can also be expressed as **Y _{i} – (mx_{i} + b)**, where Y_{i} is the observed datapoint and mx_{i} + b the estimated line.
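As a small sketch of that formula (the slope, intercept, and datapoints below are hypothetical, not the chapter’s actual example), each error is the observed y minus the line’s y at the same x:

```python
# Hypothetical fitted line and datapoints (assumed for illustration only).
m, b = 0.5, 1.0                          # assumed slope and intercept
points = [(1, 2.1), (2, 1.7), (3, 2.9)]  # assumed (x, y) observations

# Error for each datapoint: observed y minus the line's y at the same x,
# i.e. y_i - (m*x_i + b).
errors = [y - (m * x + b) for x, y in points]
```

Positive errors mean the datapoint lies above the line; negative errors mean it lies below.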

Visualized in a graph:

Finally, we sum up all the squared errors of line. This sum is known as the **sum of the squared errors of line** and is often shortened to **SE_{Line}**.

## Sum of the squared errors of line (SE_{Line})

The sum of the squared errors of line (SE_{Line}) plays an important role in statistical inference in regression analysis and is, among other things, used to calculate the coefficient of determination (r²). SE_{Line} is the sum of all the squared errors, which in our example becomes: (0.6)² + (-0.8)² + (0.4)² + (-0.2)² = 0.36 + 0.64 + 0.16 + 0.04 = 1.2:
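Summing the four squared errors from the example above can be written out in a few lines of code (the four error values are taken directly from the text; only the code itself is new):

```python
# The four errors from the worked example in the text.
errors = [0.6, -0.8, 0.4, -0.2]

# SE_Line: square each error, then sum the squares.
se_line = sum(e ** 2 for e in errors)
print(se_line)  # ≈ 1.2 (floating-point rounding may show a tiny deviation)
```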


## Sum of the squared errors of the mean (SE_{ӯ})

Another component that is central in making inferences in regression is the **error from the mean of y (SE_{ӯ})**. **SE_{ӯ}** is the **distance from ӯ to each datapoint (y_{i})**:

So, the squared errors from the mean of y are **added up** to give SE_{ӯ}:
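As a sketch with hypothetical y-values (assumed here, not the chapter’s data), SE_{ӯ} is computed the same way as SE_{Line}, but with each distance measured from the mean ӯ instead of from the line:

```python
# Hypothetical observed y-values (assumed for illustration only).
ys = [2.0, 1.0, 3.0, 4.0]

# Mean of y (y-bar).
y_bar = sum(ys) / len(ys)

# SE_ybar: squared distance from the mean to each datapoint, summed.
se_ybar = sum((y - y_bar) ** 2 for y in ys)
```

SE_{ӯ} and SE_{Line} together are what r² compares: r² = 1 − SE_{Line}/SE_{ӯ}.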


## Squared errors of line in MS Excel

Regression analysis in Excel can be run following the path Data >> Data Analysis >> Regression:

From this output table, you can expand with the squared values, etc. Another option is to build spreadsheet tables yourself, like those in the examples above on this page.
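Outside Excel, the same numbers can be reproduced with the standard least-squares formulas. The sketch below fits the line and then sums the squared errors; the datapoints are hypothetical, not the chapter’s example:

```python
# Hypothetical datapoints (assumed for illustration only).
xs = [1, 2, 3, 4]
ys = [2, 1, 3, 4]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Least-squares slope: sum of (x-dev * y-dev) over sum of squared x-devs.
m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
# Intercept: the line must pass through (x_bar, y_bar).
b = y_bar - m * x_bar

# Squared errors of the line, and their sum (SE_Line).
errors = [y - (m * x + b) for x, y in zip(xs, ys)]
se_line = sum(e ** 2 for e in errors)
```

This mirrors the spreadsheet-table approach: one column of predictions, one column of errors, one column of squared errors, then a sum.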


## Learning statistics

Some of my preferred material on squared errors of line:

- Khan Academy (video 6:47): Squared error of regression line
- JBstatistics (video 7:23): The least squares regression line
- freeCodeCamp (text page): Machine learning: An introduction to mean squared error and regression lines

#### Carsten Grube

Freelance Data Analyst
