+34 616 71 29 85 carsten@dataz4s.com

Inference about regression

Inference about regression helps us understand the relationships within the data. How, and how much, does Y depend on X? Is our model precise enough to be used for forecasting? These questions can be explored through inference about regression, for example by constructing confidence intervals and conducting hypothesis tests for the estimated regression line.

 

What is inference about regression?

In the section ‘Simple linear regression, fundamentals’, I work through concepts and calculations of

With these instruments we can build our model and interpret it to some extent. In inference about regression we go on to check and test the model through confidence intervals and hypothesis testing.

 

Inferential vs predictive

Whether an analysis is inferential or predictive depends on which side of the regression equation we focus on. Say we have the equation ŷ = m + bX. If we focus on the right side (m + bX), it is inferential analysis; if we focus on ŷ, it is predictive analysis.

So, inferential analysis is when we focus on the relationship expressed by m + bX, and thereby on the relationship between X and Y. Predictive analysis is when we focus on providing the best possible prediction of the outcome of the model (ŷ).
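The two viewpoints can be illustrated with a minimal Python sketch. The numbers below are made up for illustration (glove size as X, height in cm as Y), and the use of NumPy's `polyfit` for the least-squares fit is my choice here, not something prescribed by the method itself.

```python
import numpy as np

# Hypothetical, made-up data: X = glove size, Y = height in cm
x = np.array([6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0])
y = np.array([158.0, 163.0, 167.0, 172.0, 175.0, 181.0, 184.0, 190.0])

# Least-squares fit of y-hat = m + b*X (m = intercept, b = slope).
# polyfit returns the coefficients from the highest degree down: [b, m].
b, m = np.polyfit(x, y, deg=1)

# Inferential focus: the right-hand side, i.e. what m and b say about X and Y
print(f"intercept m = {m:.2f}, slope b = {b:.2f}")

# Predictive focus: the left-hand side, i.e. y-hat for a new value of X
x_new = 8.25
y_hat = m + b * x_new
print(f"predicted height for glove size {x_new}: {y_hat:.1f} cm")
```

The same fitted line serves both purposes; what changes is whether we ask "how does Y change with X?" (the slope b) or "what Y do we expect for this X?" (the value ŷ).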

Before making inferences, we check that the prerequisites are in place. This includes checking the conditions and testing the estimators and the relationship between them.

 

Why test?

As described in Regression line, the estimated regression equation is usually expressed with one of these notations:

[Figure: estimated regression line formula]

 

The true line, by contrast, includes an error term epsilon (ε), which expresses the deviations between the line and the data points.

[Figure: true regression line formula]

 

We cannot go out and measure the real population, since we cannot measure every living human being. We therefore estimate through sample statistics, so our estimated regression line is only the “best possible” fit based on a sample. Different samples will return different results, and none will coincide exactly with the true world.

 

Samples don’t tell the truth

Samples will always produce different results and will never be a 100% match of the true population. The following example is based on a made-up dataset that I find adequate for illustration, although it might not reflect a real-world situation:

Say we draw two different samples, each consisting of measurements of 15 persons’ height (Y) and glove size (X), because we wish to know whether people's heights are correlated with their glove sizes. The two samples will give different results:

[Figure: different samples, different results]

 

As we keep drawing samples, each one gives a different result. None of them tells the exact truth, however many samples we run. The true line will always differ from the sample lines:

 

[Figure: sample results vs truth]
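A small simulation can make this concrete. In the sketch below I assume a "true" population line with made-up values (`TRUE_M`, `TRUE_B` and the error spread `SIGMA` are hypothetical, as is the helper `draw_sample_and_fit`), draw many samples of 15 persons from it, and fit a line to each sample. In real life the true line is unknown; here it is fixed only so that the sample lines can be compared against it.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Assumed "true" population line (hypothetical values):
# Y = TRUE_M + TRUE_B * X + epsilon, with epsilon ~ Normal(0, SIGMA)
TRUE_M, TRUE_B, SIGMA = 100.0, 9.0, 4.0

def draw_sample_and_fit(n=15):
    """Draw n (X, Y) pairs from the assumed population and fit y-hat = m + b*X."""
    x = rng.uniform(6.0, 11.0, size=n)      # glove sizes
    eps = rng.normal(0.0, SIGMA, size=n)    # random errors around the true line
    y = TRUE_M + TRUE_B * x + eps           # heights in cm
    b, m = np.polyfit(x, y, deg=1)          # polyfit returns [slope, intercept]
    return m, b

# Repeat the sampling 1000 times and collect the estimated (m, b) pairs
fits = np.array([draw_sample_and_fit() for _ in range(1000)])
slopes = fits[:, 1]

print("first two samples (m, b):")
print(fits[:2].round(2))
print("true slope:              ", TRUE_B)
print("mean of estimated slopes:", round(float(slopes.mean()), 2))
print("spread (std) of slopes:  ", round(float(slopes.std(ddof=1)), 2))
```

Each individual sample gives its own intercept and slope, and none of them hits the true values exactly. The estimates do, however, scatter around the true line, and the size of that scatter is what confidence intervals and hypothesis tests quantify.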

 

So, what is the true world situation? If none of our samples tell us the true story, how can we get an exact answer? We cannot. But we can come up with a “qualified suggestion”. We can calculate an estimate and test our regression line.
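As a rough sketch of what such a “qualified suggestion” can look like, the Python example below computes a 95% confidence interval for the slope and tests the hypothesis that the slope is zero. The dataset is made up (15 persons, glove size as X, height in cm as Y), the 95% level is my choice, and the critical value 2.160 is the two-sided t-table value for 13 degrees of freedom. The calculation also assumes that the usual conditions for regression inference hold, which should be checked first.

```python
import numpy as np

# Hypothetical sample of n = 15 persons: X = glove size, Y = height in cm
x = np.array([6.5, 7.0, 7.0, 7.5, 7.5, 8.0, 8.0, 8.5,
              8.5, 9.0, 9.0, 9.5, 9.5, 10.0, 10.5])
y = np.array([160, 162, 166, 165, 170, 171, 168, 176,
              174, 179, 183, 182, 186, 188, 193], dtype=float)
n = len(x)

# Least-squares estimates for y-hat = m + b*X
sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / sxx
m = y.mean() - b * x.mean()

# Residual standard error with n - 2 degrees of freedom
residuals = y - (m + b * x)
s = np.sqrt(np.sum(residuals ** 2) / (n - 2))

# Standard error of the slope
se_b = s / np.sqrt(sxx)

# Two-sided critical value t(0.975, df = 13), taken from a standard t-table
T_CRIT = 2.160

# 95% confidence interval for the slope
ci_low, ci_high = b - T_CRIT * se_b, b + T_CRIT * se_b

# Hypothesis test: H0 slope = 0 versus H1 slope != 0
t_stat = b / se_b
reject_h0 = abs(t_stat) > T_CRIT

print(f"estimated line: y-hat = {m:.2f} + {b:.2f} * X")
print(f"95% CI for the slope: ({ci_low:.2f}, {ci_high:.2f})")
print(f"t statistic = {t_stat:.2f}, reject H0 at the 5% level: {reject_h0}")
```

If the interval for the slope excludes zero, the sample supports a linear relationship between X and Y at the chosen level. The interval still does not give the exact true slope, only a range of plausible values.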

 

Summarizing

We are working with populations that cannot be measured in full, which we intend to estimate through sample statistics. And since estimates are not the “sure thing”, it makes sense to test them before making inferences.

 

Learnings on inference about regression

 

Carsten Grube

Freelance Data Analyst
