+34 616 71 29 85 carsten@dataz4s.com

Comparing two proportions

Comparing two proportions is often seen during election periods, where it can be applied to compare how two groups of people vote for a party. For example, are women more likely than men to vote for a certain party?

 

Voting example

Except for a few twists, I will use the example that Salman Khan gives us in this video: Comparing population proportions 1:

Say we are in the election period and we wish to know if men are more likely to vote for a certain party than women.

Out of the 900 men that we sample, 584 (=65%) express that they will vote for the party.

Out of the 1100 women that we sample, 651 (=59%) express that they will vote for the party.

 

The distributions of sampled men and women

This is a Bernoulli distribution where the mean equals the sample proportion. The variance equals the success rate (= the sample proportion) times the failure rate (the proportion not voting for the party). This is p(1-p), which can loosely be expressed as the ‘yes-proportion’ times the ‘no-proportion’:

 

Bernoulli distributions for the voting example
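
As a minimal sketch, the mean and variance of the two Bernoulli distributions can be reproduced in Python from the counts in the example. The function name `bernoulli_summary` and the variable names are my own choices for illustration, not part of the original example.

```python
# Sample counts from the voting example
men_n, men_yes = 900, 584
women_n, women_yes = 1100, 651


def bernoulli_summary(successes, n):
    """Return (mean, variance) of a Bernoulli variable with p = successes / n."""
    p = successes / n
    return p, p * (1 - p)  # variance = 'yes-proportion' times 'no-proportion'


p1, var1 = bernoulli_summary(men_yes, men_n)
p2, var2 = bernoulli_summary(women_yes, women_n)

print(f"men:   p = {p1:.3f}, variance = {var1:.3f}")
print(f"women: p = {p2:.3f}, variance = {var2:.3f}")
```

The rounded proportions match the 65% and 59% quoted above.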

 

The sampling distribution of proportion means

As we have two relatively large sample sizes and proportions that are relatively far from 0 and from 1, the sampling distributions become approximately normally distributed:

The mean of the sampling distribution of the sample proportion (µp̂) equals the population proportion (p).

 

Sampling distributions of the two sample proportions
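
The following sketch checks whether the normal approximation looks reasonable and computes the standard deviation of each sampling distribution. The threshold of at least 10 'yes' and 10 'no' answers per sample is a common rule of thumb that I am assuming here; it is not stated in the example itself.

```python
import math

# Sample counts from the voting example: (label, sample size, 'yes' answers)
samples = [("men", 900, 584), ("women", 1100, 651)]

MIN_COUNT = 10  # assumed rule-of-thumb threshold, not taken from the example

standard_errors = {}
for label, n, yes in samples:
    p_hat = yes / n
    no = n - yes
    # The normal approximation is usually considered reasonable when both
    # the number of successes and the number of failures are large enough.
    approx_ok = yes >= MIN_COUNT and no >= MIN_COUNT
    # Standard deviation of the sampling distribution of the sample proportion
    standard_errors[label] = math.sqrt(p_hat * (1 - p_hat) / n)
    print(f"{label}: p_hat = {p_hat:.3f}, SE = {standard_errors[label]:.4f}, "
          f"normal approximation ok: {approx_ok}")
```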

 

 

The sampling distribution of the difference

The sampling distribution of the difference has a mean equal to the difference between the means of the two sampling distributions, which we estimate with the difference between our sample proportions: 0.649 - 0.592 = 0.057.

The variance of the sampling distribution of the difference equals the variance for p1 plus the variance for p2, because the two samples are independent.

 

Sampling distribution of the difference
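
A small sketch of this step in Python, assuming the two samples are independent so that the variances add. The variable names are illustrative only.

```python
import math

# Counts from the voting example
n1, x1 = 900, 584    # men: sample size, 'yes' answers
n2, x2 = 1100, 651   # women: sample size, 'yes' answers

p1 = x1 / n1
p2 = x2 / n2

# Point estimate: difference between the two sample proportions
diff = p1 - p2

# Variance of the difference = variance for p1 + variance for p2
var_diff = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
sd_diff = math.sqrt(var_diff)

print(f"difference = {diff:.3f}")
print(f"variance of the difference = {var_diff:.6f}")
print(f"standard deviation of the difference = {sd_diff:.3f}")
```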

 

Using a confidence interval to compare

Now that we have identified and described the sampling distributions with their means, variances and standard deviations, we can start working on our confidence interval for the difference between the proportions.

A confidence interval attaches a degree of uncertainty to our point estimate which, in our case, is 0.057. Often, one of our main interests when building a confidence interval for the difference is to see whether zero is included in the interval.

If zero is included in our 95% confidence interval, we will not have evidence in our hypothesis test to reject H0, and there is therefore no evidence of a real difference. The difference might be zero.

The confidence interval formula follows the “usual” confidence interval structure, where we add the margin of error (ME) to, and subtract it from, the estimated difference:

Confidence interval for comparing two proportions

In the figure above we saw the formula for the variance and the standard deviation of the difference, and we can therefore calculate the standard deviation for the difference like this:

Standard deviation of the difference

 

Sorry for using different notations! Here I denote the standard deviation ‘σd’. The ‘d’ is the difference, which above, I denoted p1-p2.

The critical value for a 95% confidence interval is 1.96, and having calculated the estimated standard deviation of the difference, we can now plug this 0.022 into the confidence interval formula:

Confidence interval for the difference: calculations

We get a confidence interval that spans from 0.014 to 0.100. This means that we can be 95% confident that the true difference between the population proportions P1 and P2 lies between 0.014 and 0.100. Zero is not included in the interval, so there is most likely sufficient evidence to support our alternative hypothesis that there is a difference between the proportions. P1 minus P2 does not seem to be zero.
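
The interval can be reproduced with a few lines of Python. This is only a sketch: the function name `diff_confidence_interval` is hypothetical, and the critical value is taken from the standard library's `statistics.NormalDist` rather than hard-coded as 1.96.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard deviation."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    sd_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Two-sided critical value, about 1.96 for a 95% interval
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sd_diff
    return diff - margin, diff + margin


lower, upper = diff_confidence_interval(584, 900, 651, 1100)
print(f"95% CI for the difference: {lower:.3f} to {upper:.3f}")
print("zero inside the interval:", lower <= 0 <= upper)
```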

 

Hypothesis test for the sample mean difference

The test answers the following question: How likely are we to get a result as extreme as the one we got in our sample, assuming that there really is no difference between the proportions? Can we reject the null hypothesis stating that there is no difference between the two population proportions?

 

Hypotheses for the difference between two proportions

We have large sample sizes and sample proportions relatively far from 0 and from 1, so we will assume normality and apply the z-statistic:

z-test statistic formula for two proportions

Where p1 and p2 are assumed to be equal under the null hypothesis, so that the parenthesis (p1 - p2) is zero, and q = 1 - p, which in our example is the proportion of persons who say ‘No’. It is the proportion that does not say ‘Yes’.

The estimated p-hat, pooled across both samples, and q-hat are:

q calculations

 

We can now calculate our z-statistic:

z statistic calculation for two proportions

 

 

The critical z-value at a significance level (α) of 0.05 is 1.96 for a two-sided test, so with our test statistic of 2.613 we reject the null hypothesis. There is evidence that the two proportions are not equal.
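
The test can be sketched in Python as follows. I am assuming a two-sided test (consistent with the critical value of 1.96) and a pooled p-hat across both samples. The function name `two_proportion_z_test` is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for H0: p1 - p2 = 0, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both groups share one proportion, so we pool the samples
    p_pool = (x1 + x2) / (n1 + n2)
    q_pool = 1 - p_pool
    se = math.sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = two_proportion_z_test(584, 900, 651, 1100)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0 at alpha = 0.05:", p_value < 0.05)
```

Pooling is used here because the null hypothesis assumes one common proportion. The confidence interval, by contrast, uses the separate sample proportions, which is why the two standard deviations differ slightly.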

 

Comparing two proportions with MS Excel

Below is a screenshot of how two proportions can be compared in Excel. The z-test tool under Data >> Data Analysis is another option, although it needs the ranges of the individual observations.

Comparing two proportions with Excel

 

 

Learnings on comparing two proportions

My preferred material for learning theory on comparing two proportions:

 

Carsten Grube

Freelance Data Analyst
