+34 616 71 29 85 carsten@dataz4s.com

Inference about regression helps us understand the relationships within data. How, and how much, does Y depend on X? Is our model precise enough to be used for forecasting? These questions can be explored through inference about regression, for example by constructing confidence intervals and running hypothesis tests for the estimated regression line.

## What is inference about regression?

In the section ‘Simple linear regression, fundamentals’, I work through the concepts and calculations behind the simple linear regression model.

With these instruments we can build our model and, to some extent, interpret it. In inference about regression we go one step further and check and test the model through confidence intervals and hypothesis testing.
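As a rough illustration of what "checking and testing the model" can look like, here is a minimal sketch that computes a 95% confidence interval and a t-test for the slope of ŷ=m+bX. The data are hypothetical (15 invented glove-size and height pairs), the variable names are my own, and the critical value is taken from a t-table rather than computed. The null hypothesis assumed here is that the true slope is zero.

```python
import math
import statistics

# Hypothetical data: glove size (X) and height in cm (Y) for 15 people.
glove = [6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5,
         7.0, 8.0, 9.0, 10.0, 8.5]
height = [150, 151, 158, 159, 166, 167, 171, 178, 180, 183,
          155, 163, 173, 181, 170]

n = len(glove)
x_bar = statistics.fmean(glove)
y_bar = statistics.fmean(height)

# Least-squares estimates for y-hat = m + b*x
sxx = sum((x - x_bar) ** 2 for x in glove)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(glove, height))
b = sxy / sxx            # estimated slope
m = y_bar - b * x_bar    # estimated intercept

# Residual standard error (n - 2 degrees of freedom) and standard error of b
residuals = [y - (m + b * x) for x, y in zip(glove, height)]
s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
se_b = s / math.sqrt(sxx)

# Two-sided 95% critical value of the t-distribution with 13 degrees of
# freedom, read from a t-table (hard-coded to keep the sketch dependency-free).
T_CRIT = 2.160

# Hypothesis test of H0: true slope = 0
t_stat = b / se_b
reject_h0 = abs(t_stat) > T_CRIT

# 95% confidence interval for the true slope
ci_low = b - T_CRIT * se_b
ci_high = b + T_CRIT * se_b

print(f"slope b = {b:.3f}, standard error = {se_b:.3f}")
print(f"t statistic = {t_stat:.2f}, reject H0 at 5%: {reject_h0}")
print(f"95% CI for slope: [{ci_low:.3f}, {ci_high:.3f}]")
```

If the interval excludes zero, the test rejects the null hypothesis at the 5% level; both conclusions come from the same standard error, so they always agree.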

## Inferential vs predictive

Whether an analysis is inferential or predictive depends on which side of the regression equation we focus on. Say we have the equation ŷ=m+bX. If we focus on the right side (m+bX), it is inferential analysis; if we focus on ŷ, it is predictive analysis.

So, inferential analysis is when we focus on m+bX, and thereby on the relationship between X and Y. Predictive analysis is when we focus on providing the best possible prediction of the outcome of the model (ŷ).
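The two viewpoints can be sketched with the same fitted line. In this small example, the five data points and the helper names `fit_line` and `predict` are invented for illustration: the inferential view reads the slope b, while the predictive view uses the line to produce ŷ for a new X.

```python
import statistics

def fit_line(xs, ys):
    """Return least-squares estimates (m, b) for y-hat = m + b*x."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    m = y_bar - b * x_bar
    return m, b

def predict(m, b, x_new):
    """Predictive view: the fitted value y-hat for a new x."""
    return m + b * x_new

# Hypothetical data: glove size (X) and height in cm (Y)
xs = [6, 7, 8, 9, 10]
ys = [151, 157, 167, 173, 182]

m, b = fit_line(xs, ys)

# Inferential focus: how much does Y change per unit of X?
print(f"Estimated change in height per glove size: {b:.2f} cm")

# Predictive focus: what height do we expect for glove size 8.5?
y_hat = predict(m, b, 8.5)
print(f"Predicted height for glove size 8.5: {y_hat:.1f} cm")
```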

Prior to making inferences, we check that the prerequisites are in place. This includes checking the conditions and testing the estimators and the relationship between them.

## Why test?

As described in Regression line, the estimated regression equation is usually expressed in a form such as ŷ=m+bX, whereas the true line includes an epsilon (ε) expressing the errors between the line and the data points. We cannot go out and measure the real population, as we cannot measure every living human being. We therefore estimate through sample statistics, and as such, our estimated regression line is only the “best possible” fit based on samples. Different samples will return different results and will never coincide exactly with the true world.
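Written side by side, the contrast between the true line and the estimated line might look as follows. The symbols α and β for the unknown population intercept and slope are my own assumption, chosen only to keep them apart from the sample estimates m and b used above.

```latex
% True (population) line: unknown parameters plus an error term
Y = \alpha + \beta X + \varepsilon

% Estimated line from one sample: m estimates \alpha, b estimates \beta
\hat{y} = m + bX
```

The estimated line has no ε because it describes the fitted values only; the differences between the observed Y and ŷ are the residuals.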

## Samples don’t tell the truth

Samples will always come out with different results and will never be a 100% match of the true population. The following example is based on a self-made dataset, which I find sufficient for illustration, although it might not reflect a real-world situation:

Say we draw two different samples, each consisting of 15 persons’ height (Y) and glove size (X), as we wish to know whether people’s heights are correlated with their glove sizes. The two samples will give different results.

As we keep drawing samples, each one will give different results. None of them will ever tell the exact truth, however many samples we run. The true line will always differ from the samples.

So, what is the true-world situation? If none of our samples tell us the true story, how can we get an exact answer? We cannot. But we can come up with a “qualified suggestion”. We can calculate an estimate and test our regression line.
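The idea that every sample gives a different line can be sketched with a small simulation. Everything in it is an assumption made for illustration: the "true" intercept of 100 and slope of 8, the noise level, the range of glove sizes, and the function names. In real work the true line is exactly what we do not know.

```python
import random
import statistics

def draw_sample(rng, n=15, intercept=100.0, slope=8.0, noise_sd=5.0):
    """Draw n (glove size, height) pairs from an assumed true line plus noise."""
    xs = [rng.uniform(6.0, 11.0) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Return least-squares estimates (m, b) for y-hat = m + b*x."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    m = y_bar - b * x_bar
    return m, b

rng = random.Random(42)  # fixed seed so the run is repeatable

# Fit a line to each of 1000 independent samples of 15 people
fits = [fit_line(*draw_sample(rng)) for _ in range(1000)]
slopes = [b for _, b in fits]

print("First five estimated slopes:", [round(b, 2) for b in slopes[:5]])
print(f"Mean of estimated slopes: {statistics.fmean(slopes):.2f}")
print(f"Spread (standard deviation) of slopes: {statistics.stdev(slopes):.2f}")
```

No single sample recovers the assumed slope of 8 exactly, but the estimates scatter around it. That scatter is what confidence intervals and hypothesis tests quantify.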

## Summarizing

We are working with immeasurable populations, which we intend to estimate through sample statistics. And as estimates are not the “sure thing”, it makes sense to test them prior to making inferences.

## Learnings on inference about regression

#### Carsten Grube

Freelance Data Analyst
