Mean and single response intervals
Key points for mean and single response intervals
- The mean response interval is the confidence interval for the mean of all Y values at a given X value. It can be denoted µ̂(Y|X).
- The single response interval is the prediction interval for one single Y value at a given X value. It can be denoted Y(pred).
- Y(pred) has a wider interval than µ̂(Y|X).
Overall on µ̂(Y|X) and Y(pred)
Jeremy Balka, in his video Intervals for the mean response…, gives the following example, which I find helps in understanding the two concepts and in distinguishing between them:
Say a power plant wishes to estimate the mean daily power consumption (Y) for a given temperature (X). Or it wishes to predict the power consumption (Y) for tomorrow, as the weather forecast says, “hot day tomorrow”. Say the temperature for tomorrow is expected to be X degrees. These would be the questions that we would work to answer:
- What consumption can the power plant expect to have for tomorrow (single Y value)?
- What is the mean consumption on a day with X degrees (mean of all Y’s)?
Calculating the point estimates
Let’s take another example. Here, I will work through the values by hand, with a few pauses to reflect on and analyze the general concepts and formulas. The values are from Rick Vaughn’s video ‘Prediction Interval…’.
We have the following observational data:
The regression model for this relationship is:
We assume that there is sufficient evidence that the LINER model’s conditions are met and that we can therefore proceed with inferential statistics. First, let’s see what the predicted temperature is at 6327 ft:
The predicted temperature at 6327 feet is Y(Pred) = 48 degrees Fahrenheit.
This result is the same whether we calculate the mean of Y at this X value or the single predicted Y value:
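In symbols, the reason the two point estimates coincide can be sketched as below. The names b_0 and b_1 (the fitted intercept and slope) are notation I assume here, since the regression equation itself is not reproduced in this text. Both quantities are simply read off the fitted line at the given X*:

```latex
\hat{\mu}_{Y \mid X^*} \;=\; \hat{Y}_{\text{pred}} \;=\; b_0 + b_1 X^*
```

The difference between the two only shows up in the uncertainty around this common value.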
But these are only point estimates. Let’s see what happens when calculating the corresponding intervals:
Estimating and predicting the intervals
We should be suspicious about point estimates and ask what degree of uncertainty is associated with them. How sure can we be that 48 is a good estimate, and within what range can we expect our value to lie?
In statistics we wish to attach some level of uncertainty to point estimates. For example, can we be only 30% confident that 48 is the true mean, or can we be 90% confident? What is the uncertainty associated with the point estimate? Below, we will calculate this, but let’s first understand why the names differ: confidence vs. prediction.
Why the names ‘confidence’ and ‘prediction’ intervals?
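Confidence intervals are created for parameters. The mean of Y at a given X is a parameter, so its interval is a confidence interval. A single Y value is not a parameter but one individual observation that has yet to occur, so its interval is called a prediction interval. Both formulas have the same structure as the confidence interval formulas that we know from other statistical disciplines: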
Standard error of the estimated mean of Y:
As in confidence interval calculations, the standard error (SE) is one of the components of these two formulas. Strictly speaking, however, standard error is the term used for estimators of parameters and, as mentioned above, a single Y (the Y(pred)) is an individual value and not a parameter.
Nevertheless, both formulas are typically presented as standard errors (SE), and this helps to show how similar they are to the “usual” confidence interval formulas.
Let’s see the two SE formulas:
Where X* is the given X for which we are to estimate the Y. The component (X* – X̄)² means that the further the given X value is from the mean of X (X̄), the larger the value of this formula, and so the further the interval limits lie from the regression line.
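For reference, the standard textbook form of this standard error in simple linear regression is sketched below. I am assuming the usual notation: s is the residual standard error of the regression (with n − 2 degrees of freedom), n is the number of observations, and the sum runs over the observed X values.

```latex
SE\!\left(\hat{\mu}_{Y \mid X^*}\right)
  = s \sqrt{\frac{1}{n} + \frac{(X^* - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}},
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}}
```

At X* = X̄ the second term under the square root is zero and the SE is at its smallest, s/√n.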
Standard error for the single predicted value of Y:
The SE calculation for a single predicted value of Y is similar to the one above for the SE calculation of estimated mean Y:
Like the SE of the mean Y, the SE(Ypred) for a single predicted Y value becomes greater the further the given X (X*) is from the mean of X (X̄).
The formula differs from the one for the mean of Y by an added σ², the variance of a single observation around the line. Since σ² is estimated by s², this added term becomes the “1” inside the SE formula. It gives a greater margin of error (ME) when we predict a single Y value than when we estimate the mean of Y for that given X.
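Written out in the standard textbook form, and assuming the usual notation (s is the residual standard error, n the number of observations), the SE for a single predicted value and its link to the SE of the mean response look like this:

```latex
SE\!\left(\hat{Y}_{\text{pred}}\right)
  = s \sqrt{1 + \frac{1}{n} + \frac{(X^* - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}
\quad\Longrightarrow\quad
SE\!\left(\hat{Y}_{\text{pred}}\right)^2 = s^2 + SE\!\left(\hat{\mu}_{Y \mid X^*}\right)^2
```

The s² on the right is the estimate of σ². Factoring s² out of the sum is what leaves the “1” under the square root.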
Comparing the two interval formulas
When comparing the confidence interval formula for µ̂(Y|X) with the prediction interval formula for Ŷ(Pred), we see that the only difference between them, as mentioned above, is the extra “1” in the formula for Ŷ(Pred). This “1” leads to a wider interval for the single Y value than for the mean of all Y values at that given X:
Let’s see this difference in the example listed above, where our point estimate was 48 for both µ̂(Y|X) and Ŷ(Pred). We plug the values into the formulas and get a wider interval for (Ypred) than for µ̂(Y|X):
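The same comparison can be sketched in a few lines of Python. Please note that the data below are hypothetical numbers I made up for illustration; they are not the values from the video, and the function names are my own. The sketch computes the residual standard error s and both standard errors at a few values of X*, using the standard textbook formulas.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def standard_errors(xs, ys, x_star):
    """Return (s, SE of the mean response, SE of a single prediction) at x_star."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b0, b1 = fit_line(xs, ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # residual standard error, df = n - 2
    leverage = 1 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(leverage)      # mean response
    se_pred = s * math.sqrt(1 + leverage)  # single response: note the extra "1"
    return s, se_mean, se_pred

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [59.0, 57.5, 53.0, 52.0, 47.5, 46.0, 41.0]

for x_star in (4.0, 6.0, 9.0):
    s, se_mean, se_pred = standard_errors(xs, ys, x_star)
    print(f"X*={x_star}: s={s:.3f}  SE(mean)={se_mean:.3f}  SE(pred)={se_pred:.3f}")
```

Whatever data you feed it, SE(pred) comes out larger than SE(mean), and both grow as X* moves away from the mean of X.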
The final interval calculations
Let’s see what happens when we apply the example from above with the temperatures at the different heights during flights:
As we recall from above, the point estimates are the same:
As in confidence intervals of other statistical disciplines, we add the margin of error (ME = t-crit × SE) to the point estimate and subtract it from the point estimate. Here we will run 95% intervals at df = 5 (df = n − 2 in simple linear regression):
So, we see that (Ypred) has a wider interval than µ̂(Y|X). The confidence interval for the mean of Y values at X = 6327 ft is [43 to 54], whereas the prediction interval for a single Y value at X = 6327 ft is [38 to 59].
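As a rough sanity check, we can work backwards from the two reported intervals to the margins of error and standard errors behind them. This is only a sketch: the interval endpoints are rounded, so the results are approximate, and the critical value 2.571 is the two-sided 95% t value for df = 5 taken from a t table. The function name is my own.

```python
import math

T_CRIT = 2.571  # two-sided 95% t critical value for df = 5 (from a t table)

def implied_se(lower, upper, t_crit):
    """Back out the interval centre, margin of error and SE from an interval."""
    center = (lower + upper) / 2
    margin = (upper - lower) / 2
    return center, margin, margin / t_crit

# Rounded interval endpoints as reported in the text.
center_m, me_mean, se_mean = implied_se(43, 54, T_CRIT)
center_p, me_pred, se_pred = implied_se(38, 59, T_CRIT)

# Because SE(pred)^2 = s^2 + SE(mean)^2, the two SEs imply a value for s.
s_implied = math.sqrt(se_pred ** 2 - se_mean ** 2)

print(f"mean response:   centre={center_m}, ME={me_mean}, SE approx {se_mean:.2f}")
print(f"single response: centre={center_p}, ME={me_pred}, SE approx {se_pred:.2f}")
print(f"implied s approx {s_implied:.2f}")
```

Both intervals are centred on the same value, as they should be. That the centre lands slightly above the reported point estimate of 48 is presumably down to rounding of the endpoints.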
Conditions for inference
Visualizing µ̂(Y|X) and (Ypred)
Due to the “1” in the formula, (Ypred) has a wider interval than µ̂(Y|X). This is also intuitive: there is more uncertainty in predicting one single Y value than in estimating the mean of all the Y values at the given X level.
Mean and single response intervals – shorter example
One more example, just in short:
Say that Supplier A, which markets Service 1, suspects that Supplier B is following their prices with its competing service, Service 2. Supplier A wishes to see the relationship across the past 12 price changes they have made. Has the competitor changed the price of Service 2 in response to each new price on Service 1? Supplier A checks the movements of the Service 2 price up to 24 hours after each price change in Service 1. Below are the 12 observations.
First, Supplier A calculates the regression line and the coefficient of determination (r²), displayed below. She checks the LINER conditions and the residual plots and finds that the conditions for inference are met.
She could then test the slope with a confidence interval and a hypothesis test. In this case she finds an extremely low p-value (0.000008) for the slope, meaning that there is very strong evidence of a linear relationship between X and Y.
Question 1: What is the predicted Service 2 price (Y) if the Service 1 price (X) is set to 13.0, at a 95% confidence level? This is the prediction interval for the single Y value: Y(pred)
Question 2: What is the confidence interval for the mean Service 2 price (Y) if the Service 1 price (X) is set to 13.0, at a 95% confidence level? This is the interval for the mean of all Y’s: µ̂(Y|X)
Answer to Question 1
The predicted price interval for Service 2 (Y), at a 95% confidence level, if the Service 1 price (X) is set to 13.0 is 11.98 to 12.99. In other words, the prediction interval for the single Y value at a 95% confidence level = [11.98; 12.99].
Answer to Question 2
The confidence interval for the Service 2 mean price (Y), at a 95% confidence level, if the Service 1 price (X) is set to 13.0 is [12.33; 12.63].
Mean and single response intervals in Excel
There is no Excel function for calculating the SE values, nor for the intervals themselves. That means that the SE formulas must be typed in by hand, which is a source of errors. As such, I believe that these calculations should be done with statistical software, or with a statistical add-in for Excel.
However, I have run the examples above in Excel. First, I calculated the sum, mean, stdev, etc. in the data columns, as you can see. Second, I ran Data >> Data Analysis >> Regression, mainly to use its ‘Standard Error’ as the ‘s’ in the SE formulas, but it is also nice to have the regression equation, the coefficient of determination (r²) and the p-value at hand.
In the example above, we calculated the confidence interval and the prediction interval for X = 13. In Excel we might as well calculate them for the whole range of X values, at whatever spacing you find suitable. From that we get the regression line together with the four interval lines.
I might do a short video to show how, but in short: you create a Line chart, right-click the chart and click ‘Select data’. Here, you add the series under Legend Entries (Series) and make sure that the horizontal axis uses your X values. Maybe this screenshot can be of use to you:
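Outside Excel, the same idea of calculating the intervals across a whole range of X values can be sketched in plain Python. The 12 price pairs below are hypothetical numbers I invented for illustration (not the observations from the example), the function name is my own, and 2.228 is the two-sided 95% t critical value for df = 10 from a t table.

```python
import math

def interval_bands(xs, ys, x_grid, t_crit):
    """Return rows (x, fit, ci_low, ci_high, pi_low, pi_high) for each x in x_grid."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # the 'Standard Error' of the regression
    rows = []
    for x in x_grid:
        fit = b0 + b1 * x
        leverage = 1 / n + (x - x_bar) ** 2 / sxx
        me_mean = t_crit * s * math.sqrt(leverage)      # confidence interval ME
        me_pred = t_crit * s * math.sqrt(1 + leverage)  # prediction interval ME
        rows.append((x, fit, fit - me_mean, fit + me_mean,
                     fit - me_pred, fit + me_pred))
    return rows

# Hypothetical Service 1 (X) and Service 2 (Y) prices, invented for illustration.
xs = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5, 14.0, 14.5, 15.0, 15.5]
ys = [9.7, 10.1, 10.4, 11.2, 11.5, 12.1, 12.4, 13.1, 13.3, 14.0, 14.3, 15.1]
T_CRIT = 2.228  # two-sided 95% t critical value for df = 12 - 2 = 10

x_grid = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
rows = interval_bands(xs, ys, x_grid, T_CRIT)
for x, fit, ci_lo, ci_hi, pi_lo, pi_hi in rows:
    print(f"{x:5.1f} {fit:7.3f} {ci_lo:7.3f} {ci_hi:7.3f} {pi_lo:7.3f} {pi_hi:7.3f}")
```

The four interval columns correspond to the four interval lines in the chart: the two inner lines are the confidence limits for the mean, and the two outer lines are the prediction limits for a single value. Both bands are narrowest near the mean of X.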
Some of my preferred materials for learning about mean and single response intervals (I find JBstatistics great!):
- JBstatistics (video 12:26): Great way of explaining theory through examples: Intervals (for the Mean Response and a Single Response) in Simple Linear Regression
- Rick Vaughn (video 8:40): Prediction Interval in Excel
- Real Statistics Using Excel, by Charles Zaiontz (text page): Confidence and prediction intervals for forecasted values