# Pooled variance t-procedure

The pooled variance t-procedure compares two means using a pooled estimate of the variance. It assumes that the two population variances are equal. Because this is rarely known in practice, the pooled variance t-procedure is **not commonly applied in statistics**.

## When to use the pooled variance t-procedure?

The **advantage** of the pooled variance t-procedure is that, when its assumptions hold, the test statistic follows an exact t-distribution. Its **disadvantage** is that the population variances are assumed to be equal. As **we do not know the population variances**, we cannot know whether they really are equal.

If we expect the population variances to be equal, the pooled variance t-procedure might be adequate. As a rough check, we can calculate the ratio of the sample standard deviations, $s_1/s_2$.

**The closer the ratio is to 1**, the more “alike” the sample standard deviations are and the more plausible it is that the pooled variance t-procedure is adequate. **A general guideline** is that the pooled variance procedure **can be applied when the ratio ($s_1/s_2$) is between 0.5 and 2**, which means that neither standard deviation is more than twice the other.
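The guideline can be sketched as a small check. This is only an illustration: the function name `ratio_within_guideline` and the input values are hypothetical, and the 0.5 to 2 bounds are the rule of thumb stated above, not a formal test.

```python
def ratio_within_guideline(s1: float, s2: float,
                           low: float = 0.5, high: float = 2.0) -> bool:
    """Return True when the ratio s1/s2 lies within [low, high]."""
    if s1 <= 0 or s2 <= 0:
        raise ValueError("standard deviations must be positive")
    ratio = s1 / s2
    return low <= ratio <= high


# Hypothetical sample standard deviations, for illustration only.
print(ratio_within_guideline(4.0, 3.0))  # ratio about 1.33, inside the guideline
print(ratio_within_guideline(9.0, 4.0))  # ratio 2.25, outside the guideline
```

A ratio inside the bounds does not prove that the population variances are equal; it only indicates that the samples do not contradict the assumption strongly.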

One alternative to the pooled variance t-procedure is the **Welch unpooled t-procedure**. Its advantage is that it does not assume equal population variances. Its disadvantage is that its test statistic follows a t-distribution only approximately.

## Assumptions for the pooled variance t-procedure

- Independent simple random samples, or randomized experiments
- The two populations follow a normal distribution
- Population variances are equal

## Main purposes

The **main purposes** of estimating and testing with the pooled variance t-procedure are:

- Estimating a **confidence interval** for the difference of mean-1 and mean-2 returns a range of values within which we can be (for example) 95% confident that the true difference of means lies. If this range doesn’t include zero, we can be 95% confident that the true population mean difference is different from zero and thereby that there is a difference between the two means. This is mathematically expressed with a hypothesis test:
- Typically, we test if there is **evidence** that mean-1 is different from mean-2. *Is there a* **significant difference** *between the populations?* The **p-value** is the probability of getting a result at least as extreme as the one we get from our sample, assuming that there is no difference between the two means.

## Pooled sample variance

As we assume the two population variances to be equal, we can pool the two sample variances by taking a **weighted average** of the two. The result lies somewhere between the two sample variances and tends to be closer to the one from the larger sample, as can be deduced from the formula:

$$s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}$$

The pooled sample variance is the **estimator** of the common population variance ($\sigma^2$).
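As a minimal sketch of the weighted average described above, the function below weights each sample variance by its degrees of freedom ($n - 1$). The function name and the input values are hypothetical and chosen only to show that the result is pulled towards the variance of the larger sample.

```python
def pooled_variance(var1: float, n1: int, var2: float, n2: int) -> float:
    """Pool two sample variances, weighting each by its degrees of freedom."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


# Hypothetical inputs: a small sample with variance 4 and a larger one with variance 10.
sp2 = pooled_variance(var1=4.0, n1=5, var2=10.0, n2=20)
print(f"pooled variance = {sp2:.2f}")  # lies between 4 and 10, closer to 10
```

With equal sample sizes, the pooled variance reduces to the plain average of the two sample variances.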

## The standard error (SE) of the difference

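The standard error (SE) of the difference in sample means is applied when comparing the two means through **confidence intervals** and **hypothesis testing**. The SE formula, where $s_p$ is the square root of the pooled sample variance and $n_1$, $n_2$ are the sample sizes:

$$SE = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$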

SE is the **estimator of the standard deviation** of the sampling distribution of the difference.
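As a brief sketch of where the SE comes from: assuming independent samples and a common population variance $\sigma^2$ (the assumptions listed earlier), the variance of the difference in sample means is the sum of the variances of the two sample means:

$$\operatorname{Var}(\bar{x}_1 - \bar{x}_2) = \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2} = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$$

Replacing the unknown $\sigma^2$ with the pooled sample variance $s_p^2$ and taking the square root gives $SE = s_p\sqrt{1/n_1 + 1/n_2}$.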

## Confidence interval for the difference

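Having the pooled sample variance and the SE, we can complete the formula for the confidence interval of the difference:

$$(\bar{x}_1 - \bar{x}_2) \pm t^{*} \cdot SE$$

Here $t^{*}$ is the critical value of the t-distribution with $n_1 + n_2 - 2$ degrees of freedom for the chosen confidence level.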

## Hypothesis test for the difference

A hypothesis test for the difference tests whether there is **evidence** to support rejecting H0, just as in other statistical procedures. Usually, the null hypothesis states that the difference is equal to zero and the alternative that it is different from zero, as we have seen in Comparing two means and Comparing two proportions.

The **p-value** expresses **how likely** we are to get a result at least as extreme as the one we got from our sample, assuming that there is no difference in the means.

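Typically, **the hypotheses are expressed** as follows:

$$H_0: \mu_1 - \mu_2 = 0 \qquad H_a: \mu_1 - \mu_2 \neq 0$$

As we are usually interested in knowing whether there is a difference or not, we look at both the lower and the upper tail of the distribution, which makes this a two-tailed test.

The test statistic is calculated similarly to what we know from other statistical procedures. We compare the difference to the SE of the difference:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{SE}$$

Under the null hypothesis, this statistic follows a t-distribution with $n_1 + n_2 - 2$ degrees of freedom.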

**Concluding on the hypothesis test:** As in other kinds of hypothesis tests, a **‘very low’ p-value** expresses that there is ‘very little’ chance of getting a result as extreme as the one we got from our sample if the means really were equal.

Therefore, a **‘very low’ p-value** gives **‘very strong’ evidence against the null hypothesis** and thus against the claim that the two means are equal. We **reject** the null hypothesis when the p-value is lower than the pre-set significance level (α).

## Worked example

In the following, we will run through an example, making inference through a confidence interval and a hypothesis test.

### The story

**Say that a curious teacher**, who works in a large international education organization, wishes to compare the test scores of students who attend courses run during morning hours with those of students who attend courses run in the evening.

### The sample

She takes two randomly selected samples, one from the population of “**morning-students**” and one from the “**evening-students**”. Each sample consists of 10 students. She assumes that the population variances are equal and therefore runs a pooled variance t-procedure.

The morning-students have a **mean score of 72.8** with a sample standard deviation of 15.43. The evening-students have a **mean score of 64.7** with a sample standard deviation of 12.29.

### Confidence interval

To make inference on this difference of $72.8 - 64.7 = 8.1$, she calculates a **95% confidence interval**. First, she calculates a **pooled variance**, which is then used to calculate the **standard error** of the difference (SE), and the SE finally serves to calculate the confidence interval. The **critical value** for a 95% confidence interval can be looked up in a t-table or in statistical software. With $10 + 10 - 2 = 18$ degrees of freedom, it is 2.10.

The 95% confidence interval spans from **-5 to 21**. This means that, based on her samples of 10 each, she can be 95% confident that the true population mean difference lies between -5 and 21.

This confidence interval includes 0, which means that a null hypothesis stating that there is no difference in means **cannot be rejected** at the 5% significance level.
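The interval can be retraced with a few lines of Python. This is a sketch under one stated assumption: 15.43 and 12.29 are treated as sample standard deviations, because that reading reproduces the reported interval of roughly -5 to 21. The critical value 2.10 is taken from the text rather than computed, and the variable names are illustrative.

```python
import math

# Summary statistics from the worked example.
# Assumption: 15.43 and 12.29 are read as sample standard deviations.
mean_morning, sd_morning, n_morning = 72.8, 15.43, 10
mean_evening, sd_evening, n_evening = 64.7, 12.29, 10
t_critical = 2.10  # 95% critical value quoted in the text (18 degrees of freedom)

# Pooled variance: sample variances weighted by their degrees of freedom.
df = n_morning + n_evening - 2
pooled_var = ((n_morning - 1) * sd_morning**2
              + (n_evening - 1) * sd_evening**2) / df

# Standard error of the difference in sample means.
se = math.sqrt(pooled_var * (1 / n_morning + 1 / n_evening))

# Confidence interval: difference plus/minus critical value times SE.
diff = mean_morning - mean_evening
lower = diff - t_critical * se
upper = diff + t_critical * se
print(f"difference = {diff:.1f}, SE = {se:.2f}")
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```

The printed bounds should land close to the interval reported above, and zero falls inside them.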

### Hypothesis test

The teacher wishes to state formally what her confidence interval above already tells her: that there is no evidence of a difference. She conducts a hypothesis test at a significance level of α = 0.05.

The **test statistic is not in the rejection region**, as it lies between the critical values of -2.10 and 2.10. The teacher therefore **fails to reject the null hypothesis**. There is no evidence to conclude that there is a difference between the test scores of the two groups.

The **p-value is 0.211**, which means that there is a 21.1% chance that she would get a result at least this extreme, assuming that there is no difference in the means. That is a ‘fairly large’ chance and **she can therefore not reject the null hypothesis**.
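The test statistic and p-value can be checked with the standard library alone. The sketch below again assumes that 15.43 and 12.29 are sample standard deviations. The helper names `t_pdf` and `two_sided_p_value` are hypothetical, and the p-value is obtained by integrating the t density numerically with Simpson's rule; statistical software would normally be used instead.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value via Simpson's rule on the t density (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3  # probability mass between 0 and |t|
    return 1 - 2 * area_zero_to_t


# Worked example; assumption: 15.43 and 12.29 are sample standard deviations.
n1 = n2 = 10
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * 15.43**2 + (n2 - 1) * 12.29**2) / df
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (72.8 - 64.7) / se
p_value = two_sided_p_value(t_stat, df)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

The test statistic comes out at about 1.30, inside the band from -2.10 to 2.10, and the p-value is close to the 0.211 reported above.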

### New and larger samples

The teacher, who really feels that her intuition must be right, **suspects that the sample size simply was “too small”**. She now takes two new samples, each of size 28. For the sake of this exercise, say she gets almost the same result (sample mean **difference = 9.0** and a pooled sample variance of 215.5). **This time she rejects the null hypothesis**, with a test statistic of 2.29 compared to a critical value of 2.00 (at α = 0.05).

The **p-value is 0.026**, which means that there is a 2.6% probability that she would get a result at least this extreme, assuming that there is no difference in the means. At a significance level of α = 0.05, she will **reject** the null hypothesis, because a result this extreme is **too unlikely** if the two means really were equal.
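The reported test statistic can be checked by hand from the figures given above (difference 9.0, pooled sample variance 215.5, 28 students per sample), writing $s_p^2$ for the pooled sample variance and $n_1$, $n_2$ for the sample sizes:

$$SE = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{215.5\left(\frac{1}{28} + \frac{1}{28}\right)} \approx 3.92, \qquad t = \frac{9.0}{3.92} \approx 2.29$$

The degrees of freedom are $28 + 28 - 2 = 54$, for which the two-tailed 5% critical value is about 2.00. The larger samples shrink the SE, which is why a similar difference now falls in the rejection region.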

We recall that **this t-procedure assumes that the population variances are equal**, which we didn’t know in this example. Therefore **this pooled variance t-procedure might not be the most adequate for the situation**.

## Pooled variance t-procedure in MS Excel

In Excel we run a ‘**t-Test: Two-Sample Assuming Equal Variances**’ from the Data >> Data Analysis menu. This does not calculate the confidence interval which I’m calculating by

## Pooled variance t-procedure in R statistical programming

Coming

## Learnings on pooled variance t-procedure

- Penn State Eberly College of Science:
  - Text page. Short step-by-step examples: Comparing Two Independent Means – Unpooled and Pooled
  - Text page: Pooled Variances

- JBstatistics: Video (11:03): Pooled variance T tests and confidence intervals: Introduction

#### Carsten Grube

Freelance Data Analyst
