# Inference about regression

Inference about regression helps us **understand the relationships within data**. *How, and how much, does Y depend on X? Is our model precise enough to be used for forecasting?* We can explore these questions by constructing confidence intervals and running hypothesis tests for the estimated regression line.

## What is inference about regression?

In the section *‘Simple linear regression, fundamentals’*, I work through the concepts and calculations of:

- Regression model
- Correlation coefficient (r)
- Squared error of line (residuals)
- Coefficient of determination (r²)

With these instruments we can build our model and interpret it to some extent. In inference about regression **we now check and test the model** through confidence intervals and hypothesis testing.

## Inferential vs predictive

Whether an analysis is inferential or predictive **depends on which side of the regression equation** we focus on. Say we have the equation ŷ=m+bX. If we focus on the right side (m+bX), it is **inferential** analysis; if we focus on ŷ, it is **predictive** analysis.

So, **inferential analysis is** when we focus on the terms in m+bX and thereby on the relationship between X and Y. **Predictive analysis is** when we focus on providing the best possible prediction of the outcome of the model (ŷ).
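The difference in focus can be sketched with one fitted line used in two ways. This is only an illustration: the data are invented, and `fit_line`, `glove_size`, `height_cm` and `new_glove_size` are hypothetical names that do not come from the article.

```python
# Hypothetical data, invented for illustration only:
# glove size (X) and height in cm (Y).
glove_size = [7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0]
height_cm = [160.0, 166.0, 170.0, 174.0, 179.0, 183.0, 190.0]


def fit_line(x, y):
    """Least-squares fit of y-hat = m + b*x; returns (m, b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx              # slope
    m = y_mean - b * x_mean    # intercept
    return m, b


m, b = fit_line(glove_size, height_cm)

# Inferential focus: what does the right-hand side (m + bX) say about X and Y?
print(f"Estimated change in height per unit of glove size: {b:.2f} cm")

# Predictive focus: what is y-hat for a new value of X?
new_glove_size = 8.25
predicted_height = m + b * new_glove_size
print(f"Predicted height for glove size {new_glove_size}: {predicted_height:.1f} cm")
```

The same two numbers, m and b, serve both purposes; only the question we ask of them changes.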

Before making inferences, we check that the **prerequisites** are in place. This includes checking the **conditions** and **testing** the estimators and the relationship between them.

## Why test?

As described in *Regression line*, the estimated regression equation is usually written in a form such as ŷ=m+bX, the notation used above.

The true line, in contrast, includes an **epsilon (ε)** term expressing the errors between the line and the data points.
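For reference, the two lines can be written side by side. The estimated line uses m and b as earlier in this text; the Greek letters α and β for the true intercept and slope are an assumed notation, chosen here only to keep the two lines apart.

```latex
\begin{align*}
\hat{y} &= m + bX && \text{(estimated line, computed from a sample)} \\
Y &= \alpha + \beta X + \varepsilon && \text{(true line, for the whole population)}
\end{align*}
```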

We cannot go out and measure the real population, as we cannot measure every living human being. Instead, we estimate through sample statistics, and as such our estimated regression line is only the “best possible” fit based on samples. But **different samples return different results and will practically never coincide exactly with the real world**.

## Samples don’t tell the truth

Samples will come out with different results and will never be a 100% match of the true population. The following example is based on a **self-made dataset**, which I find adequate for illustration, although it might not reflect a real-world situation:

Say we take two different samples, each measuring **15 persons’ height (Y) and glove size (X)**, as we wish to know whether people’s heights are correlated with their glove sizes. The two samples will give different results.

As we keep drawing samples, each one will give different results, and none of them will tell the exact truth, however many samples we run. The true line will always differ from the sample lines.
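A small simulation can make this concrete. The sketch below assumes a "true" line and noise level that are invented for the purpose (`TRUE_M`, `TRUE_B`, `NOISE_SD`), draws five samples of 15 persons each, and fits a line to every sample. None of these values come from the dataset used in this article.

```python
import random

# Assumed "true world" for the simulation (invented values).
TRUE_M = 100.0   # true intercept, cm
TRUE_B = 9.0     # true slope, cm per glove size
NOISE_SD = 5.0   # standard deviation of the error term (epsilon)


def fit_line(x, y):
    """Least-squares fit of y-hat = m + b*x; returns (m, b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx
    return y_mean - b * x_mean, b


def draw_sample(rng, n=15):
    """Draw n (glove size, height) pairs scattered around the true line."""
    x = [rng.uniform(6.0, 11.0) for _ in range(n)]
    y = [TRUE_M + TRUE_B * xi + rng.gauss(0.0, NOISE_SD) for xi in x]
    return x, y


rng = random.Random(42)  # fixed seed so the run is repeatable
slopes = []
for i in range(5):
    x, y = draw_sample(rng)
    m, b = fit_line(x, y)
    slopes.append(b)
    print(f"Sample {i + 1}: y-hat = {m:.1f} + {b:.2f} X")

print(f"True line:  Y     = {TRUE_M:.1f} + {TRUE_B:.2f} X + error")
```

Each run of the loop stands in for going out and measuring 15 new persons; the fitted slopes scatter around the true slope without hitting it exactly.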

*So, what is the real-world situation? If none of our samples tells us the true story, how can we get an exact answer?* **We cannot**. But we can come up with a “qualified suggestion”: we can calculate an estimate and test our regression line.
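As a minimal sketch of what calculating an estimate and testing the line can look like, the code below computes the standard error of the slope, a t statistic for the null hypothesis that the slope is zero, and a 95% confidence interval. The sample is simulated from an assumed line (invented values), all variable names are hypothetical, and the critical value 2.160 is the tabulated two-sided 95% t value for 13 degrees of freedom (n − 2 with n = 15).

```python
import math
import random

# Simulated sample of 15 persons; the generating line is an assumption.
rng = random.Random(7)
n = 15
x = [rng.uniform(6.0, 11.0) for _ in range(n)]              # glove size
y = [100.0 + 9.0 * xi + rng.gauss(0.0, 5.0) for xi in x]    # height, cm

# Least-squares estimates of intercept (m) and slope (b).
x_mean = sum(x) / n
y_mean = sum(y) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
b = sxy / sxx
m = y_mean - b * x_mean

# Residuals and the standard error of the slope.
residuals = [yi - (m + b * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (n - 2))      # residual standard error
se_b = s / math.sqrt(sxx)         # standard error of the slope

# Hypothesis test, H0: true slope = 0.
t_stat = b / se_b
T_CRIT = 2.160                    # t quantile 0.975 with df = 13
reject_h0 = abs(t_stat) > T_CRIT

# 95% confidence interval for the true slope.
ci_low = b - T_CRIT * se_b
ci_high = b + T_CRIT * se_b

print(f"slope b = {b:.2f}, SE = {se_b:.2f}, t = {t_stat:.2f}")
print(f"95% CI for the slope: [{ci_low:.2f}, {ci_high:.2f}]")
print(f"Reject H0 (slope = 0) at the 5% level: {reject_h0}")
```

The interval is the “qualified suggestion”: a range of plausible values for the true slope rather than a single exact answer.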

## Summarizing

We are working with populations that cannot be measured in full, which we intend to estimate through sample statistics. And **as estimates are not the “sure thing”, it makes sense to test them prior to making inferences.**

## Learnings on inference about regression

- Jbstatistics: YouTube playlist ‘Simple Linear Regression’. Here, Jeremy Balka offers some 12 short and well-structured videos explaining the theory through examples.
- Khan Academy (video 7:12): Introduction to inference about slope in linear regression

#### Carsten Grube

Freelance Data Analyst
