+34 616 71 29 85 carsten@dataz4s.com

Pooled variance t-procedure

The pooled variance t-procedure compares two means using a pooled estimate of the variance. It assumes that the two population variances are equal. Because this is rarely known in practice, the pooled variance t-procedure is not commonly applied in statistics.

 

When to use pooled variance t-procedure?

The advantage of the pooled variance t-procedure is that it follows an exact t-distribution. Its disadvantage is that the population variances are assumed to be equal. As we do not know the population variances, we do not know whether they really are equal either.

If we expect the population variances to be equal, the pooled variance t-procedure might be adequate. As a check, we can calculate the ratio of the sample standard deviations: s1/s2.

The closer the ratio is to 1, the more alike the sample standard deviations are, and the more plausible it is that the pooled variance t-procedure is adequate. A general guideline is that the pooled variance procedure can be applied when the ratio (s1/s2) is between 0.5 and 2, which means that neither standard deviation is more than twice the other.
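The guideline above can be sketched as a small check. This is only an illustration: the function name and the example standard deviations are hypothetical and not taken from the article.

```python
def ratio_within_guideline(s1: float, s2: float) -> bool:
    """Rule of thumb: the ratio s1/s2 should lie between 0.5 and 2."""
    if s1 <= 0 or s2 <= 0:
        raise ValueError("sample standard deviations must be positive")
    ratio = s1 / s2
    return 0.5 <= ratio <= 2.0


# Hypothetical sample standard deviations, for illustration only.
print(ratio_within_guideline(4.0, 3.0))  # ratio about 1.33 -> True
print(ratio_within_guideline(9.0, 4.0))  # ratio 2.25 -> False
```

A ratio inside the range does not prove that the population variances are equal; it only suggests that the samples do not contradict the assumption.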

One of the alternatives to the pooled variance t-procedure is the Welch unpooled t-procedure. The advantage of this procedure is that it does not assume equal population variances. The disadvantage is that it does not follow an exact t-distribution.

 

Assumption for the pooled variance t-procedure

 

Main purposes

The main purposes of estimating and testing with the pooled variance t-procedure are:

  • Confidence interval: Estimating a confidence interval for the difference of mean-1 and mean-2 returns a range of values within which we can be (for example) 95% confident that the true difference of means lies. If this range doesn’t include zero, we can be 95% confident that the true population mean difference is different from zero and thereby that there is a difference between the two means.
  • Hypothesis test: The same question can be expressed mathematically with a hypothesis test. Typically, we test if there is evidence that mean-1 is different from mean-2. Is there a significant difference between the populations? The p-value is the probability of getting a result at least as extreme as the one we get from our sample, assuming that there is no difference between the two means.

 

Pooled sample variance

As we assume the two population variances to be equal, we can pool the two sample variances by taking a weighted average of the two. The result ends up somewhere between the two sample variances, tending to be closer to the one with the larger sample size, as can be deduced from the formula:

$$ s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2} $$

The pooled sample variance is the estimator of the common population variance (σ²).
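As a minimal sketch, the weighted average can be written in a few lines of Python. The function name and the example numbers are hypothetical; they are chosen only to show that the pooled value lands closer to the variance of the larger sample.

```python
def pooled_variance(var1: float, n1: int, var2: float, n2: int) -> float:
    """Weighted average of two sample variances, weighted by degrees of freedom."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


# Hypothetical example: a small sample with variance 10 and a large one with variance 20.
sp2 = pooled_variance(10.0, 5, 20.0, 50)
print(round(sp2, 2))  # lies between 10 and 20, much closer to 20
```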

 

 

The standard error (SE) of the difference

The standard error (SE) of the difference in sample means is applied when comparing the two means through confidence intervals and hypothesis testing. The SE formula:

$$ SE = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$

SE is the estimator of the standard deviation of the sampling distribution of the difference.
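A short sketch of the standard error calculation, assuming the pooled variance has already been computed. The function name and the input values are hypothetical.

```python
import math


def se_of_difference(pooled_var: float, n1: int, n2: int) -> float:
    """Standard error of the difference in sample means under equal variances."""
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))


# Hypothetical inputs: pooled variance 50 with 25 observations in each sample.
se = se_of_difference(50.0, 25, 25)
print(round(se, 3))  # about 2.0
```

Larger samples shrink the standard error, which is why the same observed difference can become significant when more data are collected.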

 

Confidence interval for the difference

Having the pooled sample variance and the SE, we can complete the formula for the confidence interval of the difference:

 

$$ (\bar{x}_1 - \bar{x}_2) \pm t^{*} \cdot SE $$

Here t* is the critical value of the t-distribution with n1 + n2 – 2 degrees of freedom.
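The interval can be sketched as follows. The function name and the inputs are hypothetical, and the critical value is passed in as a number looked up in a t-table (about 2.01 for 48 degrees of freedom at 95% confidence) rather than computed.

```python
import math


def pooled_confidence_interval(mean1, mean2, pooled_var, n1, n2, t_crit):
    """Return (lower, upper) for the difference mean1 - mean2."""
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    margin = t_crit * se
    return diff - margin, diff + margin


# Hypothetical summary statistics, for illustration only.
lower, upper = pooled_confidence_interval(50.0, 45.0, 50.0, 25, 25, t_crit=2.01)
print(round(lower, 2), round(upper, 2))  # roughly 0.98 and 9.02
```

In this made-up case the interval excludes zero, so a two-tailed test at the matching significance level would reject the hypothesis of no difference.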

 

  

Hypothesis test for the difference

A hypothesis test for the difference tests if there is evidence to support a rejection of H0, as we know it from hypothesis testing in other statistical procedures. Usually, the null hypothesis states that the difference is equal to zero and the alternative hypothesis states that it is different from zero, as we have seen in Comparing two means and Comparing two proportions.

The p-value expresses how likely we are to get a result at least as extreme as the one we got from our sample, assuming that there is no difference in the means.

Typically, the hypotheses are expressed as follows:

$$ H_0: \mu_1 - \mu_2 = 0 \qquad H_a: \mu_1 - \mu_2 \neq 0 $$

 

As we are usually interested in knowing whether there is a difference or not, we look at both the lower and the upper tail of the distribution, which makes this a two-tailed test.

The test statistic is calculated similarly to what we know from other statistical procedures. We compare the difference to the SE of the difference: 

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{SE} = \frac{\bar{x}_1 - \bar{x}_2}{SE} $$

 

 Concluding on the hypothesis test: As in other kinds of hypothesis tests, a ‘very low’ p-value expresses that there is ‘very little’ chance of getting as extreme a result as the one we got from our sample, assuming that the means are equal.

Therefore, a ‘very low’ p-value gives ‘very strong’ evidence against the null hypothesis and thus against the claim that the two means are equal. We reject the null hypothesis when the p-value is lower than the pre-set significance level (α).
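The test statistic and the two-tailed decision can be sketched like this. The function names and numbers are hypothetical. The decision is made by comparing |t| with a critical value taken from a t-table, which is equivalent to comparing the p-value with α.

```python
import math


def pooled_t_statistic(mean1, mean2, pooled_var, n1, n2):
    """t = (mean1 - mean2) / SE, under H0 that the population means are equal."""
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se


def reject_null(t_stat: float, t_crit: float) -> bool:
    """Two-tailed decision: reject H0 when |t| exceeds the critical value."""
    return abs(t_stat) > t_crit


# Hypothetical summary statistics; 2.01 is the table value for 48 degrees of freedom.
t_stat = pooled_t_statistic(50.0, 45.0, pooled_var=50.0, n1=25, n2=25)
print(round(t_stat, 2), reject_null(t_stat, t_crit=2.01))
```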

 

 

Worked example

In the following, we will run through an example, making inference through a confidence interval and a hypothesis test.

 

The story

Say that a curious teacher who works in a large international education organization wishes to test the difference in test scores between students who attend courses run during morning hours and students who attend courses run in the afternoon.

 

The sample

She takes two randomly selected samples, one from the population of “morning-students” and one from the “afternoon-students”. Each sample consists of 10 students. She assumes that the population variances are equal and therefore runs a pooled variance t-procedure.

The sample outcome for the morning-students is a mean score of 72.8 with a sample standard deviation of 15.43. The sample outcome for the afternoon-students is a mean score of 64.7 with a sample standard deviation of 12.29.

 

Confidence interval

To make inference on this difference of (72.8 – 64.7 =) 8.1, she constructs a 95% confidence interval. First, she calculates the pooled variance, which is then applied to calculate the standard error of the difference (SE), and the SE finally serves to calculate the confidence interval. The critical value for a 95% confidence interval with 10 + 10 – 2 = 18 degrees of freedom can be looked up in a t-table or in statistical software. It is 2.10.

$$ s_p^2 = \frac{9 \cdot 15.43^2 + 9 \cdot 12.29^2}{18} \approx 194.56 $$

$$ SE = \sqrt{194.56 \left( \frac{1}{10} + \frac{1}{10} \right)} \approx 6.24 $$

$$ 8.1 \pm 2.10 \cdot 6.24 = 8.1 \pm 13.1 \quad \Rightarrow \quad (-5.0,\ 21.2) $$

The 95% confidence interval spans from about -5 to 21. This means that, based on her samples of 10 each, she can be 95% confident that the true population mean difference lies between -5 and 21.

This confidence interval includes 0, which means that a null hypothesis stating that there is no difference in means cannot be rejected at the 5% significance level.
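The interval can be reproduced with a few lines of Python. This is a sketch under one assumption: the spread figures 15.43 and 12.29 are treated as sample standard deviations, since that reading reproduces the reported interval of about -5 to 21. The variable names are illustrative.

```python
import math

n1 = n2 = 10
mean1, mean2 = 72.8, 64.7
sd1, sd2 = 15.43, 12.29  # assumption: these are standard deviations
t_crit = 2.101           # t-table value for 18 degrees of freedom, 95% confidence

# Step 1: pooled variance (weighted average of the squared standard deviations).
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)

# Step 2: standard error of the difference in means.
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Step 3: confidence interval around the observed difference.
diff = mean1 - mean2
lower, upper = diff - t_crit * se, diff + t_crit * se

print(f"pooled variance = {pooled_var:.2f}, SE = {se:.2f}")
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```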

  

Hypothesis test

The teacher wishes to state a mathematical expression for what her confidence interval, above, already tells her: that there is no evidence of a difference. She conducts a hypothesis test, dividing the observed difference by the standard error of the difference (about 6.24):

$$ t = \frac{72.8 - 64.7}{6.24} = \frac{8.1}{6.24} \approx 1.30 $$

 

The test statistic is not in the rejection region, as it lies between the critical values of -2.10 and 2.10. The teacher therefore fails to reject the null hypothesis. There is no evidence to conclude that there is a difference between the test scores of the two groups.

The p-value is 0.211, which means that there is a 21.1% chance that she would get a result at least this extreme, assuming that there is no difference in the means. That is a ‘fairly large’ chance, and she therefore cannot reject the null hypothesis.
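The p-value can be approximated without statistical software by integrating the t-density numerically. The sketch below uses only the Python standard library and Simpson's rule; it is one possible approach, not necessarily how the article's figure was obtained. The function names are hypothetical, and 15.43 and 12.29 are again treated as sample standard deviations.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t_stat: float, df: int, steps: int = 2000) -> float:
    """1 minus the area between -|t| and |t|, via Simpson's rule (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1.0 - 2.0 * area_zero_to_t


# Worked example: two samples of 10, spreads read as standard deviations (assumption).
pooled_var = (9 * 15.43**2 + 9 * 12.29**2) / 18
se = math.sqrt(pooled_var * (1 / 10 + 1 / 10))
t_stat = (72.8 - 64.7) / se
p_value = two_tailed_p(t_stat, df=18)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # about 1.30 and 0.21
```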

 

New and larger samples

The teacher, who really feels that she must be right in her intuition, suspects that the sample size simply has been “too small”. She now takes two samples, each of size 28. For the sake of this exercise, say she gets almost the same result (sample mean difference = 9.0 and a pooled sample variance of 215.5). She therefore rejects the null hypothesis, with a test statistic of 2.29 compared to a critical value (at α = 0.05) of 2.00.

The p-value is 0.026, which means that there is a 2.6% probability that she would get a result at least this extreme, assuming that there is no difference in the means. At a significance level of α = 0.05, she rejects the null hypothesis, because a result with a probability of only 2.6% is too unlikely under the assumption that the two means are equal.
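The test statistic for the larger samples can be checked by hand, assuming 28 observations in each sample and the rounded figures quoted above:

$$ SE = \sqrt{215.5 \left( \frac{1}{28} + \frac{1}{28} \right)} \approx 3.92, \qquad t = \frac{9.0}{3.92} \approx 2.29 $$

With 28 + 28 – 2 = 54 degrees of freedom, the two-tailed critical value at α = 0.05 is about 2.00, so the statistic falls in the rejection region.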

We recall that this t-procedure assumes that the population variances are equal, which we didn’t know in this example. Therefore, this pooled variance t-procedure might not be the most adequate for the situation.

 

 

Pooled variance t-procedure in MS Excel

In Excel we run a ‘t-Test: Two-Sample Assuming Equal Variances’ from the Data >> Data Analysis menu. This does not calculate the confidence interval which I’m calculating by

Pooled variance t-procedure

 

 

 

Pooled variance t-procedure in R statistical programming

Coming

 

 

 

Learnings on pooled variance t-procedure

 

 

Carsten Grube

Freelance Data Analyst
