+34 616 71 29 85 carsten@dataz4s.com

Confidence intervals for means

A confidence interval for a mean is an interval that, with a stated degree of confidence (often 90%, 95% or 99%), contains the true population mean.

 

 

Conditions for valid t intervals

  1. Simple random sample
  2. Normal: The sampling distribution of the sample mean is roughly normal, which holds under one of the following conditions:
    1. n ≥ 30
    2. The population is normally distributed
    3. The sample data are approximately symmetric around the mean
  3. Independence: Sample either with replacement or, when sampling without replacement, keep n ≤ 0.10N (the sample is at most 10% of the population)
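
As a rough illustration, the normality and independence conditions can be written as a small Python check. This is only a sketch: the function name, its parameters and the population size of 10,000 are made up for the example, the independence check assumes the usual 10% rule (the sample is at most 10% of the population), and whether the sample is truly random cannot be verified in code.

```python
def check_t_interval_conditions(n, population_size=None, with_replacement=False,
                                population_normal=False, roughly_symmetric=False):
    """Return which of the checkable conditions hold (hypothetical helper)."""
    # Normal condition: any one of the three sub-conditions is enough.
    normal_ok = n >= 30 or population_normal or roughly_symmetric
    # Independence: sampling with replacement, or the 10% rule.
    independent_ok = with_replacement or (
        population_size is not None and n <= 0.10 * population_size
    )
    return {"normal": normal_ok, "independent": independent_ok}

# Hypothetical case: 21 observations from a population of 10,000,
# with sample data that look roughly symmetric.
result = check_t_interval_conditions(n=21, population_size=10_000,
                                     roughly_symmetric=True)
print(result)
```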

 

Point estimate vs confidence interval

Why not just rely on a point estimate, like the sample mean? The sample mean is a single value, or a point estimate. It gives no sense of how certain, or uncertain, we can be about it. We would still be asking: How “certain” can we be that the true population mean is anywhere close to this sample mean?

In inferential statistics we add an interval around this point estimate. A confidence interval for a mean gives a range of values for which we can say that we are, for example, 95% confident that the true mean lies within it.

Example: Say that we, from a sample of some population, calculate an average height (length) for 4-month-old babies. On its own, that average is only a point estimate. Compare this to an interval between 74 and 82 cm, together with the statement that we are 95% confident that the true mean lies within this interval.

As described in Confidence intervals, the greater the confidence level, the wider the interval:

[Figure: confidence levels and interval width]
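
To get a feel for how the width grows with the confidence level, here is a minimal Python sketch. The critical values are copied from a standard t-table for df = 20, and s = 5.72 and n = 21 are illustrative values assumed for the sketch.

```python
import math

# Two-sided t critical values for df = 20, copied from a standard t-table.
# They are only valid for a sample size of n = 21.
T_CRITICAL_DF20 = {0.90: 1.725, 0.95: 2.086, 0.99: 2.845}

def interval_width(confidence, s, n):
    """Full width of the t interval: 2 * t* * s / sqrt(n)."""
    t_star = T_CRITICAL_DF20[confidence]
    return 2 * t_star * s / math.sqrt(n)

# Illustrative summary statistics (assumed): s = 5.72, n = 21.
widths = {level: interval_width(level, s=5.72, n=21)
          for level in (0.90, 0.95, 0.99)}
for level, width in widths.items():
    print(f"{level:.0%} confidence -> interval width {width:.2f}")
```

The same data give a wider interval at 99% than at 90%: more confidence is bought with less precision.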

 

σ unknown => t-statistics

When calculating confidence intervals for means, it is very unusual that the population standard deviation is known. If it were known, the whole population would most probably also be known, and we wouldn’t need to calculate these estimates in the first place, as we could just read the mean directly from the population data.

In Confidence intervals for proportions, we calculate the standard error of the statistic and apply the standard normal table, as also described in Z-table for proportions.

When calculating the confidence interval for means, we apply the t-table, because our standard error is based on our sample standard deviation and not on the true population standard deviation.

The t-table returns higher critical values than the z-table. This makes sense: the sample standard deviation is itself only an estimate, especially in small samples, so the margin of error has to be larger. The t-curve is therefore more spread out than the normal curve, with heavier tails:

[Figure: normal curve and t-curve]
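
The difference between the two tables can be sketched in a few lines of Python. The z critical value comes from the standard library, while the t critical values are copied from a standard t-table for a handful of degrees of freedom chosen for illustration.

```python
from statistics import NormalDist

# z critical value for a 95% two-sided interval (area 0.975 to the left).
z_star = NormalDist().inv_cdf(0.975)

# t critical values for 95% two-sided intervals, copied from a standard t-table.
t_star_by_df = {5: 2.571, 10: 2.228, 20: 2.086, 30: 2.042, 100: 1.984}

print(f"z* = {z_star:.3f}")
for df, t_star in t_star_by_df.items():
    print(f"df = {df:>3}: t* = {t_star:.3f} (t*/z* = {t_star / z_star:.3f})")
```

Every t value is larger than the z value, and the gap shrinks as the degrees of freedom increase.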

 

How to use the t-table

The t-table is embedded in all statistical software, and statisticians rarely do lookups in the tables, but for the sake of the exercise:

We divide the alpha level by 2 because we have two critical values: the upper and the lower. Say we need to find the t-score for a sample size of 21 at a 95% confidence level:

[Figure: looking up in the t-table]

The left column represents degrees of freedom (df), which is n − 1 = 20. As we are looking for a 95% confidence level, and thus an alpha of 0.05, we look down the 0.025 column and find the value 2.086, which is our t-score, or our critical value:

[Figure: t-curve with critical values]
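
For readers who want to see what the software does instead of a table lookup, here is a rough sketch using only the Python standard library. It integrates the t-density numerically (Simpson's rule) and searches for the critical value by bisection. This is not how statistical packages implement it, and the function names are made up for the example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and the curve's symmetry."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_critical(confidence, df):
    """Upper critical value t* for a two-sided interval, found by bisection."""
    target = 1 - (1 - confidence) / 2  # e.g. 0.975 for 95% confidence
    lo, hi = 0.0, 100.0  # assumes t* < 100, true for the usual table range
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_star = t_critical(0.95, df=20)
print(f"t* for 95% confidence, df = 20: {t_star:.3f}")
```

The result should agree with the table value of 2.086 to three decimals.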

Margin of error (ME)

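The margin of error (ME) is the product of the t-score and the standard error. The standard error (SE) is the sample standard deviation (s) divided by the square root of the sample size (n):

$$ SE = \frac{s}{\sqrt{n}}, \qquad ME = t^{*} \cdot SE = t^{*} \cdot \frac{s}{\sqrt{n}} $$

Here t* is the critical value from the t-table.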
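
As a small sketch in Python, assuming the usual definition of the standard error as s divided by the square root of n. The function names are made up for the example, and the inputs (s = 5.72, n = 21, t-score 2.086 for df = 20) are illustrative.

```python
import math

def standard_error(s, n):
    """Standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def margin_of_error(t_star, s, n):
    """Margin of error: critical value times the standard error."""
    return t_star * standard_error(s, n)

# Illustrative inputs: s = 5.72, n = 21, t-score for df = 20 at 95%.
se = standard_error(5.72, 21)
me = margin_of_error(2.086, 5.72, 21)
print(f"SE = {se:.3f}, ME = {me:.2f}")
```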

 

The formula

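The formula for calculating a confidence interval for a population mean with unknown σ is therefore:

$$ \bar{x} \pm t^{*} \cdot \frac{s}{\sqrt{n}} $$

The components of the formula are:

  • x̄ = the sample mean (the point estimate)
  • t* = the critical value from the t-table for the chosen confidence level, with n − 1 degrees of freedom
  • s = the sample standard deviation
  • n = the sample size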

 

Worked example of confidence intervals for means

Parents of 21 girls born in Sweden report the exact height (length) of their 2-year-old girls, measured on each child’s birthday. The sample has a mean height of 79.00 cm and a standard deviation of 5.72 cm. We want a 95% confidence interval, so with df = 20 and a critical value of 2.086 the calculation becomes:

$$ 79.00 \pm 2.086 \cdot \frac{5.72}{\sqrt{21}} = 79.00 \pm 2.60 \quad\Rightarrow\quad (76.40,\ 81.60) $$

 

Based on our sample statistics, we can be 95% confident that the true mean height of two-year-old girls born in Sweden is between 76.4 and 81.6 cm.
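
The same calculation can be sketched in Python. The sketch uses s = 5.72, the standard deviation that reproduces the interval above and that is also used in the Excel section, together with the table value 2.086 for df = 20. The function name is made up for the example.

```python
import math

def t_confidence_interval(mean, s, n, t_star):
    """Return (lower, upper) for mean ± t* * s / sqrt(n)."""
    me = t_star * s / math.sqrt(n)
    return mean - me, mean + me

# Summary statistics of the worked example; t_star taken from the t-table.
lower, upper = t_confidence_interval(mean=79.00, s=5.72, n=21, t_star=2.086)
print(f"95% CI: {lower:.1f} to {upper:.1f} cm")
```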

 

Confidence intervals for means in Excel

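The CONFIDENCE.T function in Excel uses Student’s t-distribution to calculate the margin of error (the half-width of the confidence interval) for a population mean. The interval itself is the sample mean plus and minus the returned value.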

Syntax: CONFIDENCE.T(alpha,standard_dev,size)

For our example above:

  • alpha=0.05
  • standard deviation (s) = 5.72
  • sample size (n) = 21

=CONFIDENCE.T(0.05,5.72,21) = 2.6

 

In the following screenshot we can see that the t-distribution gives a wider interval than the z-distribution:

[Figure: confidence intervals for means in Excel]
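
The comparison can also be sketched outside Excel. The Python functions below are rough analogues of Excel's CONFIDENCE.NORM and CONFIDENCE.T, written for this example and not part of any library. The t version takes the critical value as an argument (2.086 for df = 20, from the t-table) because the Python standard library has no t-distribution.

```python
import math
from statistics import NormalDist

def confidence_norm(alpha, standard_dev, size):
    """Margin of error based on the z-distribution."""
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    return z_star * standard_dev / math.sqrt(size)

def confidence_t(standard_dev, size, t_star):
    """Margin of error based on a t critical value supplied by the caller."""
    return t_star * standard_dev / math.sqrt(size)

z_margin = confidence_norm(0.05, 5.72, 21)
t_margin = confidence_t(5.72, 21, t_star=2.086)  # t* for df = 20, from the table
print(f"z-based margin: {z_margin:.2f}")
print(f"t-based margin: {t_margin:.2f}")
```

With the numbers from the example, the t-based margin is the larger of the two, so the t interval is the wider one.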

 

Learning statistics

Some of my preferred tutorials for learning on confidence intervals for means:

Carsten Grube


Freelance Data Analyst
