What is the probability of getting 2 aces when dealt 4 cards without replacement from a standard deck of 52 cards? This can be answered through the hypergeometric distribution.
An example of an experiment with replacement is one in which each of the 4 cards is put back into the deck after it has been dealt. The deck then still has 52 cards for every draw. If we do not replace the cards, the deck shrinks by one card per draw, and 48 cards remain once all 4 cards have been dealt.
The probabilities therefore differ depending on whether the experiment is run with or without replacement. More on replacement in Dependent event.
Not independent, not binomial
The hypergeometric distribution is closely related to the binomial distribution. The difference is that the binomial distribution applies to experiments with replacement, while the hypergeometric distribution applies to experiments without replacement.
Back to the example that we are given 4 cards with no replacement from a standard deck of 52 cards:
The probability of getting an ace changes from one card to the next. For the first card, we have a 4/52 = 1/13 chance of getting an ace. Say we do not get an ace. Then, for the second card, we have a 4/51 chance of getting an ace. But if we had been dealt an ace as the first card, the probability would have been 3/51 on the second draw, and so on.
So, without replacement, the probability of each event depends on 1) the sample space left after the previous trials, and 2) the outcomes of the previous trials.
Thus, the probabilities of each trial (each card being dealt) are not independent, and therefore do not follow a binomial distribution.
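The changing probabilities described above can be checked with exact fractions. The following is a minimal sketch; the helper name `ace_probability` is made up for this illustration and is not part of any library.

```python
from fractions import Fraction

def ace_probability(aces_left: int, cards_left: int) -> Fraction:
    """Chance that the next card dealt is an ace, given what is left in the deck."""
    return Fraction(aces_left, cards_left)

# First card: 4 aces among 52 cards.
p_first = ace_probability(4, 52)

# Second card when the first card was NOT an ace: still 4 aces, but only 51 cards.
p_second_after_miss = ace_probability(4, 51)

# Second card when the first card WAS an ace: 3 aces left among 51 cards.
p_second_after_ace = ace_probability(3, 51)

# With replacement the deck is restored, so the chance stays the same as for the first card.
p_second_with_replacement = ace_probability(4, 52)

print(p_first, p_second_after_miss, p_second_after_ace, p_second_with_replacement)
```

Without replacement, the second probability depends on what happened on the first draw. With replacement it does not, which is the independence the binomial distribution requires.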
Approximation: Hypergeometric to binomial
In statistics the hypergeometric distribution is applied for testing proportions of successes in a sample.
Hypergeometric experiments consist of dependent events because they are carried out without replacement, as opposed to binomial experiments, which are carried out with replacement.
However, for large populations the hypergeometric distribution is well approximated by the binomial distribution, even though the experiment is run without replacement. The reason is that drawing one unit from a large population of, say, 10,000 units barely changes the probability for the next trial: the chance of picking one particular unit goes from 1/10,000 to 1/9,999.
As a rule of thumb, the hypergeometric distribution is applied only when the sample size (n) is larger than 5% of the population size (N). When n < 5% of N, the hypergeometric distribution can be approximated by the binomial distribution.
As sample sizes rarely exceed 5% of the population size, the hypergeometric distribution is not very commonly applied in statistics; the binomial approximation is used instead.
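To see the approximation at work, the sketch below compares the two distributions for a fixed sample of 10 while the population grows. The 45% share of successes, the population sizes and the function names `hypergeom_pmf` and `binom_pmf` are assumptions chosen for this illustration.

```python
from math import comb

def hypergeom_pmf(k: int, n: int, K: int, N: int) -> float:
    """P(k successes in n draws without replacement; K successes among N units)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, k = 10, 3   # sample size and number of successes we ask about
gaps = []      # absolute difference between the two probabilities, per population size
for N in (20, 100, 1_000, 10_000):
    K = N * 45 // 100   # keep the share of successes at 45%
    gap = abs(hypergeom_pmf(k, n, K, N) - binom_pmf(k, n, K / N))
    gaps.append(gap)
    print(f"N={N:>6}  n/N={n / N:7.2%}  gap={gap:.6f}")
```

With N = 20 the sample is half the population and the two results differ visibly. With N = 10,000 the sample is 0.1% of the population and the difference is negligible.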
Properties of the hypergeometric distribution
The hypergeometric distribution is a discrete probability distribution used in statistics to calculate the probability of a given number of successes in a sample drawn from a finite population. It has the following properties:
- Finite population (N), with a sample size (n) larger than 5% of N
- Fixed number of trials
- 2 possible outcomes: Success or failure
- Dependent probabilities (without replacement)
Formulas and notations
A random variable X that follows the hypergeometric distribution has the probability formula:

$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

where:
- N = Size of the total population
- K = Number of successes in the population
- N-K = Number of failures in the population
- n = number of trials
- k = number of successes observed
Examples with the hypergeometric distribution
2 aces when dealt 4 cards (small N: No approximation)
Let’s apply the formula with the example above where we are to calculate the probability of getting 2 aces when dealt 4 cards from a standard deck of 52:
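Written out as a reconstruction from the notation listed above, with N = 52 cards, K = 4 aces in the deck, n = 4 cards dealt and k = 2 aces observed, the calculation would read:

$$P(X = 2) = \frac{\binom{4}{2}\binom{48}{2}}{\binom{52}{4}} = \frac{6 \cdot 1128}{270725} \approx 0.025$$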
There is a 0.025 probability, or a 2.5% chance, of getting two aces when dealt 4 cards from a standard deck of 52.
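The same figure can be reproduced with a few lines of Python. This is only a sketch using the standard library's `math.comb`; the variable names are chosen for this illustration.

```python
from math import comb

N = 52  # cards in the deck
K = 4   # aces in the deck
n = 4   # cards dealt
k = 2   # aces we want in the hand

ways_aces = comb(K, k)            # ways to pick 2 of the 4 aces
ways_others = comb(N - K, n - k)  # ways to pick the 2 remaining cards from the 48 non-aces
ways_total = comb(N, n)           # ways to pick any 4 cards from the deck

p_two_aces = ways_aces * ways_others / ways_total
print(round(p_two_aces, 3))  # 0.025
```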
k=3; n=10; K=450; N=1,000 (Large N: Approximation to binomial)
What is the probability of picking exactly 3 blue marbles when we randomly pick 10 marbles without replacement from a bag that contains 450 blue and 550 green marbles?
With the hypergeometric distribution we would say:
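Reconstructed from the hypergeometric formula, with N = 1,000 marbles, K = 450 blue marbles in the bag, n = 10 marbles picked and k = 3 blue marbles observed, this would be:

$$P(X = 3) = \frac{\binom{450}{3}\binom{550}{7}}{\binom{1000}{10}} \approx 0.166500$$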
Let’s now apply the binomial formula to the same calculation for comparison:
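Assuming the success probability is taken as the share of blue marbles in the bag, p = 450/1,000 = 0.45, and treating the 10 picks as independent, the binomial calculation would read:

$$P(X = 3) = \binom{10}{3}\, 0.45^{3}\, 0.55^{7} = 120 \cdot 0.091125 \cdot 0.55^{7} \approx 0.166478$$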
The result when applying the binomial distribution (0.166478) is extremely close to the one we get by applying the hypergeometric formula (0.166500). The reason is that the total population (N) in this example is relatively large compared to the sample: even though we do not replace the marbles, the probability of the next event is nearly unaffected.
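Both results can be checked side by side. The sketch below uses only the standard library; the variable names are chosen for this illustration.

```python
from math import comb

N = 1000  # marbles in the bag
K = 450   # blue marbles in the bag
n = 10    # marbles picked
k = 3     # blue marbles we want among the picked ones

# Hypergeometric: picks are made without replacement.
p_hyper = comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Binomial approximation: treat every pick as an independent trial with p = K / N.
p = K / N
p_binom = comb(n, k) * p**k * (1 - p) ** (n - k)

print(f"hypergeometric: {p_hyper:.6f}")
print(f"binomial:       {p_binom:.6f}")
print(f"difference:     {abs(p_hyper - p_binom):.6f}")
```

Here the sample is 1% of the population, well below the 5% rule of thumb, so the two values only start to differ around the fifth decimal place.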
The hypergeometric distribution with MS Excel
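The Excel function =HYPERGEOM.DIST returns the probability when given the following arguments in this order, followed by a final TRUE/FALSE argument that selects either the cumulative probability (TRUE) or the probability of exactly x successes (FALSE):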
- number of sample successes (x)
- sample size (n)
- population successes (k)
- population size (N)
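The ‘2 aces example’ from above: =HYPERGEOM.DIST(2, 4, 4, 52, FALSE)
The ‘3 blue marbles example’ from above, where we approximate to the binomial distribution: =HYPERGEOM.DIST(3, 10, 450, 1000, FALSE), compared with =BINOM.DIST(3, 10, 0.45, FALSE)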