Inference about regression
Inference about regression helps us understand the relationships within data. How, and how much, does Y depend on X? Is our model precise enough to be used for forecasting? These questions can be explored through inference about regression, e.g. by constructing confidence intervals and conducting hypothesis tests for the estimated regression line.
What is inference about regression?
In the section ‘Simple linear regression, fundamentals’, I work through concepts and calculations of
- Regression model
- Correlation coefficient (r)
- Squared error of line (residuals)
- Coefficient of determination (r²)
With these instruments we can work out our model and interpret it to some extent. In inference about regression we now check and test the model through confidence intervals and hypothesis testing.
Inferential vs predictive
Whether an analysis is inferential or predictive depends on which side of the regression equation we focus on. Say we have the equation ŷ = m + bX. If we focus on the right-hand side (m + bX), it is inferential analysis; if we focus on ŷ, it is predictive analysis.
So, inferential analysis is when we focus on the relationship expressed by m + bX, and thereby on the relationship between X and Y. Predictive analysis is when we focus on producing the best possible prediction of the outcome of the model (ŷ).
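The two focuses can be sketched in a few lines of code. This is a minimal sketch with invented glove-size and height numbers (not a real dataset): the least-squares estimates m and b are the inferential side, while the fitted value ŷ for a new X is the predictive side.

```python
import numpy as np

# Hypothetical data: glove size (X) and height in cm (Y)
x = np.array([7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0])
y = np.array([160.0, 165.0, 168.0, 172.0, 175.0, 181.0, 184.0])

# Least-squares estimates of the slope (b) and intercept (m)
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
m = y.mean() - b * x.mean()

# Inferential focus: the estimated relationship m + bX
print(f"intercept m = {m:.2f}, slope b = {b:.2f}")

# Predictive focus: the fitted value ŷ for a new observation
x_new = 8.75
y_hat = m + b * x_new
print(f"predicted height for glove size {x_new}: {y_hat:.1f} cm")
```

The same numbers m and b serve both purposes; what changes is whether we ask "what does b tell us about the X–Y relationship?" or "what is ŷ for this X?".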
Before making inferences we check that the prerequisites are in place. This includes checking the model conditions and testing the estimators and the relationship between them.
As described in Regression line, the estimated regression equation is usually written as ŷ = m + bX, whereas the true population line includes an epsilon (ε) term, Y = m + bX + ε, expressing the errors between the line and the data points.
We cannot go out and measure the real population, as we cannot measure every living human being. We therefore estimate through sample statistics, and as such our estimated regression line is only the "best possible" fit based on a sample. Different samples will always return different results and will never coincide exactly with the true population.
Samples don’t tell the truth
Samples will always come out with different results and will never be a 100% match of the true population. The following example is based on a self-made dataset which I find suitable for illustration, although it might not reflect a real-world situation:
Say we draw two different samples, each measuring 15 persons' height (Y) and glove size (X), because we wish to know whether people's heights are correlated with their glove sizes. The two samples will give different results:
As we keep drawing samples, each sample gives a different result, and none of them tells the exact truth, however many samples we run. The true line will always differ from the sample lines:
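This sampling variability is easy to demonstrate with a small simulation. The "true" population line, noise level, and glove-size range below are all invented for the sketch; the point is only that repeated samples of 15 persons produce different estimated slopes, none equal to the true one.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed "true" population line: Y = 100 + 8*X + epsilon
true_intercept, true_slope = 100.0, 8.0

def sample_slope(n=15):
    """Draw one sample of n persons and return the estimated slope."""
    x = rng.uniform(6.5, 10.5, size=n)                      # glove sizes
    y = true_intercept + true_slope * x + rng.normal(0, 4, size=n)
    return np.polyfit(x, y, deg=1)[0]                       # fitted slope

slopes = [sample_slope() for _ in range(5)]
print([round(s, 2) for s in slopes])  # five samples, five different slopes
```

Every run of `sample_slope` scatters around the true slope of 8 without ever hitting it exactly, which is precisely why we need estimates and tests rather than a single "true" answer from one sample.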
So, what is the true situation? If none of our samples tells the true story, how can we get an exact answer? We cannot. But we can come up with a "qualified suggestion": we can calculate an estimate and test our regression line.
We are working with immeasurable populations which we intend to estimate through sampling statistics. And since estimates are not a "sure thing", it makes sense to test them before making inferences.
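A standard such test is the t-test for the slope (H₀: true slope β = 0, i.e. no relationship between X and Y), together with a confidence interval for the slope. Below is a hedged sketch with invented glove-size and height data for n = 15 persons; the critical value 2.160 is the textbook t-value for 95% confidence with df = n − 2 = 13.

```python
import numpy as np

# Hypothetical sample: glove size (X) and height in cm (Y), n = 15
x = np.array([6.5, 7, 7, 7.5, 7.5, 8, 8, 8.5, 8.5, 9, 9, 9.5, 9.5, 10, 10.5])
y = np.array([158, 163, 161, 166, 168, 170, 169, 173, 175,
              177, 176, 181, 183, 185, 190], dtype=float)

n = len(x)
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)    # estimated slope
m = y.mean() - b * x.mean()                            # estimated intercept

residuals = y - (m + b * x)
s = np.sqrt(np.sum(residuals**2) / (n - 2))            # residual std. error
se_b = s / np.sqrt(np.sum((x - x.mean())**2))          # std. error of slope

t_stat = b / se_b          # test statistic for H0: beta = 0
t_crit = 2.160             # t-table value, 95% confidence, df = 13
ci = (b - t_crit * se_b, b + t_crit * se_b)

print(f"slope b = {b:.2f}, SE = {se_b:.3f}, t = {t_stat:.1f}")
print(f"95% CI for the slope: ({ci[0]:.2f}, {ci[1]:.2f})")
```

If the confidence interval excludes 0 (equivalently, |t| exceeds the critical value), we reject H₀ and conclude there is evidence of a linear relationship between glove size and height in this invented sample.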
Learnings on inference about regression
- Jbstatistics: YouTube playlist 'Simple Linear Regression'. Here, Jeremy Balka offers some 12 short, well-structured videos explaining the theory through examples.
- Khan Academy (video 7:12): Introduction to inference about slope in linear regression