+34 616 71 29 85 carsten@dataz4s.com

Hypergeometric distribution

What is the probability of getting 2 aces when dealt 4 cards without replacement from a standard deck of 52 cards? This can be answered through the hypergeometric distribution.

Replacement

An example of an experiment with replacement is one in which each of the 4 cards is dealt and then put back into the deck. The deck will still have 52 cards for every draw, as each card is replaced. If we do not replace the cards, the remaining deck will consist of 48 cards once all 4 cards have been dealt.

Probabilities consequently differ depending on whether the experiment is run with or without replacement. More on replacement in Dependent event.

Not independent, not binomial

The hypergeometric distribution is closely related to the binomial distribution. The difference is that the binomial distribution applies to experiments with replacement, whereas the hypergeometric distribution applies to experiments without replacement.

Back to the example in which we are dealt 4 cards without replacement from a standard deck of 52 cards:

The probability of getting an ace changes from one card dealt to the next. For the first card, we have a 4/52 = 1/13 chance of getting an ace. Say we do not get an ace. Now, for the second card, we have a 4/51 chance of getting an ace. But if we had been dealt an ace as the first card, the probability would have been 3/51 in the second draw, and so on.

So, without replacement, the probability for each event depends on 1) the sample space left after the previous trials, and 2) the outcome of the previous trials.

Thus, the trials (each card being dealt) are not independent, and the number of aces dealt therefore does not follow a binomial distribution.
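
The dependence between draws can be made concrete with exact fractions. The following is a minimal sketch of the card example; the helper name `p_ace_next` is an illustrative assumption, not part of any library.

```python
from fractions import Fraction


def p_ace_next(aces_left: int, cards_left: int) -> Fraction:
    """Probability that the next card dealt is an ace."""
    return Fraction(aces_left, cards_left)


# First card: 4 aces among 52 cards.
first = p_ace_next(4, 52)

# Second card: one card is gone, so 51 cards remain.
second_after_ace = p_ace_next(3, 51)    # the first card was an ace
second_after_other = p_ace_next(4, 51)  # the first card was not an ace

print(first, second_after_ace, second_after_other)
```

The two second-draw probabilities differ, which is exactly what makes the trials dependent.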

Combination formula

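The hypergeometric distribution is a discrete probability distribution with similarities to the binomial distribution and, as such, it also applies the combination formula, which counts the number of ways to choose k items from n when order does not matter:

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$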
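
As an illustration, the number of combinations (n choose k, that is n! / (k! (n-k)!)) can be computed directly from factorials and checked against Python's built-in `math.comb` (available from Python 3.8). The function name `combinations_count` is chosen for this sketch only.

```python
import math


def combinations_count(n: int, k: int) -> int:
    """Number of ways to choose k items from n, ignoring order."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


# Number of distinct 4-card hands from a 52-card deck.
hands = combinations_count(52, 4)
print(hands)                      # 270725
print(hands == math.comb(52, 4))  # True
```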

Approximation: Hypergeometric to binomial

In statistics the hypergeometric distribution is applied for testing proportions of successes in a sample.

Hypergeometric experiments consist of dependent events, as they are carried out without replacement, as opposed to binomial experiments, which are carried out with replacement.

However, for large populations, the hypergeometric distribution is closely approximated by the binomial distribution, even though the experiment is run without replacement. When one unit is drawn from a large population of, say, 10,000, removing it hardly changes the probabilities for the next trial: the chance of picking any particular remaining unit goes from 1/10,000 to 1/9,999.

Theoretically, the hypergeometric distribution works with dependent events, as there is no replacement, but for large populations these can in practice be treated as independent events.

As a rule of thumb, the hypergeometric distribution is applied only when the number of trials (n) is larger than 5% of the population size (N). When n < 5% of N, the hypergeometric distribution can be approximated by the binomial distribution.

As sample sizes rarely exceed 5% of the population size, the binomial approximation usually suffices, and the hypergeometric distribution is therefore not very commonly applied in statistics.
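
The 5% rule of thumb can be written as a small check. This is a sketch only: the function name `binomial_approximation_ok` and the `threshold` parameter are assumptions made for illustration, and the 5% cut-off is a convention rather than a strict boundary.

```python
def binomial_approximation_ok(n: int, N: int, threshold: float = 0.05) -> bool:
    """Return True when the sample size n is below the given share of the
    population size N, i.e. when the binomial approximation is reasonable."""
    if N <= 0 or not 0 <= n <= N:
        raise ValueError("need N > 0 and 0 <= n <= N")
    return n / N < threshold


print(binomial_approximation_ok(4, 52))     # False: 4/52 is about 7.7%
print(binomial_approximation_ok(10, 1000))  # True: 10/1000 is 1%
```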

Properties of the hypergeometric distribution

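The hypergeometric distribution is a discrete probability distribution applied in statistics to calculate the proportion of successes in a sample drawn from a finite population. Its properties are:

• Finite population (N); as a rule of thumb, applied when the number of trials (n) is larger than 5% of N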
• Fixed number of trials
• 2 possible outcomes: Success or failure
• Dependent probabilities (without replacement)

Formulas and notations

A random variable X that follows the hypergeometric distribution has the probability formula:

$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

Where:

• N = Size of the total population
• K = Number of successes in the population
• N-K = Number of failures in the population
• n = number of trials
• k = number of successes observed
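
With the notation above, the hypergeometric probability can be sketched in a few lines of Python using `math.comb`. The function name `hypergeom_pmf` and the small population used to exercise it (N = 20, K = 7, n = 5) are made up for this illustration.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): exactly k successes in n draws without replacement
    from a population of N units that contains K successes."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    # Outcomes outside the possible range have probability zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Sanity check: the probabilities over all possible k add up to 1.
total = sum(hypergeom_pmf(k, N=20, K=7, n=5) for k in range(0, 6))
print(round(total, 12))
```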

Examples with the hypergeometric distribution

2 aces when dealt 4 cards (small N: No approximation)

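Let’s apply the formula to the example above, where we calculate the probability of getting 2 aces when dealt 4 cards from a standard deck of 52. Here N = 52, K = 4, n = 4 and k = 2:

$$P(X = 2) = \frac{\binom{4}{2}\binom{48}{2}}{\binom{52}{4}} = \frac{6 \times 1128}{270725} = \frac{6768}{270725} \approx 0.025$$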

There is a 0.025 probability, or a 2.5% chance, of getting two aces when dealt 4 cards from a standard deck of 52.
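
As a cross-check that does not rely on the formula, the same probability can be obtained by brute force: list every possible 4-card hand and count those with exactly 2 aces. This is a sketch; labeling cards 0 to 3 as the aces is an arbitrary choice made here.

```python
from itertools import combinations

# Model the deck as 52 card indices; indices 0-3 stand for the four aces
# (an arbitrary labeling chosen for this sketch).
ACES = {0, 1, 2, 3}
deck = range(52)

total_hands = 0
two_ace_hands = 0
for hand in combinations(deck, 4):
    total_hands += 1
    if sum(card in ACES for card in hand) == 2:
        two_ace_hands += 1

probability = two_ace_hands / total_hands
print(two_ace_hands, total_hands, round(probability, 4))
```

The counts match the numerator and denominator of the formula: 6768 favourable hands out of 270725.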

k=3; n=10; K=450; N=1,000 (Large N: Approximation to binomial)

What’s the probability of getting exactly 3 blue marbles when we randomly pick 10 marbles without replacement from a bag that contains 450 blue and 550 green marbles?

With the hypergeometric distribution we would say:

$$P(X = 3) = \frac{\binom{450}{3}\binom{550}{7}}{\binom{1000}{10}} \approx 0.166500$$

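For comparison, let’s apply the binomial formula to the same calculation, with success probability p = 450/1,000 = 0.45:

$$P(X = 3) = \binom{10}{3}\,(0.45)^{3}\,(0.55)^{7} \approx 0.166478$$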

The result when applying the binomial distribution (0.166478) is extremely close to the one we get by applying the hypergeometric formula (0.166500). The reason is that the total population (N) in this example is large relative to the sample, so even though we do not replace the marbles, the probability of the next event is nearly unaffected.
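
The comparison can be reproduced with a short script. This is a sketch using only `math.comb`; the variable names are chosen for illustration, and the printed values should agree with the two figures quoted above to six decimals.

```python
from math import comb

# Population size, blue marbles in the bag, marbles drawn, blue marbles wanted.
N, K, n, k = 1000, 450, 10, 3

# Without replacement: hypergeometric probability.
hypergeometric = comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Approximation that treats the draws as independent: binomial with p = K / N.
p = K / N
binomial = comb(n, k) * p**k * (1 - p) ** (n - k)

print(f"hypergeometric: {hypergeometric:.6f}")
print(f"binomial:       {binomial:.6f}")
print(f"difference:     {abs(hypergeometric - binomial):.6f}")
```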

The hypergeometric distribution with MS Excel

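The Excel function =HYPERGEOM.DIST returns the probability given:

• number of sample successes (k)
• sample size (n)
• population successes (K)
• population size (N)
• cumulative flag: FALSE for the probability of exactly k successes, TRUE for the probability of at most k successes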

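The ‘2 aces example’ from above: =HYPERGEOM.DIST(2, 4, 4, 52, FALSE) returns approximately 0.025.

The ‘3 blue marbles example’ from above, where we approximate to the binomial distribution: =HYPERGEOM.DIST(3, 10, 450, 1000, FALSE) returns approximately 0.166500, while =BINOM.DIST(3, 10, 0.45, FALSE) returns approximately 0.166478.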

Learning statistics

Carsten Grube

Freelance Data Analyst

