How many cars pass by every hour? How many claims does a company receive every month? How many calls does a telemarketing center receive during a day? Or how many asteroids of a certain volume collide with the Earth every million years? These questions can be answered using the Poisson distribution.
Properties of the Poisson distribution
The properties of the Poisson distribution are related to those of the binomial distribution:
- k is the count of events that occur during an interval, usually an interval of time, but it can also be a distance, a volume or an area.
- The average rate at which events occur is constant.
- The occurrence of one event does not affect the other events. They are independent events, and they must occur at random.
- Two events cannot occur at exactly the same time. Each event has its own sub-interval (for example, minutes, seconds, fractions of a second, etc.).
- Each sub-interval has 2 possible outcomes: the event occurs, or it does not occur.
Example: How many visits per hour?
Say we count the number of persons crossing the doorstep of our shop per hour. Since the Poisson distribution assumes that the average rate at which events occur is constant, we must assume that the number of visits is the same on a Saturday afternoon as on a Monday morning, which might not be the case.
Why not the binomial distribution?
Let’s see what happens if we use the binomial distribution for this example. We let each minute of the hour be one trial, so n = 60. Say that we have previously counted the visitors to our shop in this way and found an average of 11 visitors per hour. With the binomial distribution, the probability of a success in any one minute would then be:

$$p = \frac{11}{60} \approx 0.183$$

A probability of 0.183 of one person passing in each minute corresponds to 11 persons per hour, and we could calculate the probability of k successes with the binomial point probability formula:

$$P(X = k) = \binom{60}{k} \left(\frac{11}{60}\right)^k \left(1 - \frac{11}{60}\right)^{60 - k}$$

Each trial is an observation of one minute, and a person stepping over the doorstep during that minute counts as a success. But what if 3 persons step over the doorstep during the same minute? This would still be registered as only 1 success.
To solve this, we could divide each minute into seconds, and we would then “catch” all 3 persons, now with 60 minutes × 60 seconds = 3600 trials per hour. Plugging n = 3600, and correspondingly p = 11/3600, into our formula gives:

$$P(X = k) = \binom{3600}{k} \left(\frac{11}{3600}\right)^k \left(1 - \frac{11}{3600}\right)^{3600 - k}$$

But what if a couple, or any other two persons, cross the doorstep within the same second? To give each person their own half-second, we would split the seconds into half-seconds, doubling n from 3600 to 7200:

$$P(X = k) = \binom{7200}{k} \left(\frac{11}{7200}\right)^k \left(1 - \frac{11}{7200}\right)^{7200 - k}$$

We can keep dividing the trials into quarter-seconds, eighths of a second and so on. Splitting the hour into ever smaller intervals leaves room for more persons/events in each hour, and in the limit we end up with the Poisson distribution.
In short, the binomial distribution converges to the Poisson distribution as n approaches infinity and p approaches 0, while the expected number of events np = λ stays constant (here λ = 11 visitors per hour). In other words, with a large n and a small p, the binomial distribution can be approximated by the Poisson distribution with λ = np.
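The convergence can be checked numerically. The R sketch below, an illustration rather than part of the original analysis, compares binomial probabilities for n = 60, 3600 and 7200 trials (with p = 11/n, so that np = 11) against Poisson probabilities for λ = 11, using base R's dbinom and dpois. The range k = 0:30 and the helper max_gap are choices made only for this illustration.

```r
# Shop example: 11 visitors per hour on average
lambda <- 11
k <- 0:30  # visitor counts to compare (covers almost all of the probability mass)

# Hypothetical helper: largest pointwise gap between Binomial(n, lambda/n) and Poisson(lambda)
max_gap <- function(n) {
  binom_probs <- dbinom(k, size = n, prob = lambda / n)
  pois_probs  <- dpois(k, lambda = lambda)
  max(abs(binom_probs - pois_probs))
}

# n = 60 (minutes), 3600 (seconds), 7200 (half-seconds); np stays equal to 11
n_values <- c(60, 3600, 7200)
gaps <- sapply(n_values, max_gap)
print(data.frame(n = n_values, p = lambda / n_values, max_gap = gaps))
```

The largest gap shrinks as n grows, which is the binomial-to-Poisson limit at work.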
Poisson distribution formulas
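The Poisson distribution is given by these formulas:

$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad E(X) = \lambda, \qquad \operatorname{Var}(X) = \lambda$$

where x can take any non-negative integer value 0, 1, 2, 3, … and λ (lambda) is the mean. The mean and the variance are equal.
To express that X follows the Poisson distribution, we can write either of these notations:

$$X \sim \text{Poisson}(\lambda) \qquad \text{or} \qquad X \sim \text{Po}(\lambda)$$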
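As a check on the formula P(X = x) = λ^x e^(−λ) / x!, the sketch below codes it by hand and compares it with R's built-in dpois for λ = 2.1, the rate used in the shop example further down. The helper poisson_pmf is hypothetical, written only for this illustration. The mean and variance are computed by summing over x = 0 to 100, assuming the probability beyond 100 is negligible, which holds comfortably for λ = 2.1.

```r
# Hypothetical helper: the Poisson point probability written out from the formula
# P(X = x) = lambda^x * exp(-lambda) / x!
poisson_pmf <- function(x, lambda) {
  lambda^x * exp(-lambda) / factorial(x)
}

lambda <- 2.1
x <- 0:4
manual <- poisson_pmf(x, lambda)
print(manual)

# Mean and variance from the pmf, summing over 0..100
# (the probability mass beyond 100 is negligible for lambda = 2.1)
support <- 0:100
p <- poisson_pmf(support, lambda)
pmf_mean <- sum(support * p)
pmf_var  <- sum((support - pmf_mean)^2 * p)
cat("mean:", pmf_mean, " variance:", pmf_var, "\n")
```

Both the mean and the variance come out equal to λ, as the formulas state.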
Statistical problems with the Poisson distribution
Let’s see how to apply the Poisson distribution to solve statistical problems, staying with our shop example from above:
Say we are now interested in the number of sales per opening hour. We have observed an average of 2.1 sales per opening hour, and we wish to find the probabilities of making 0, 1, 2, 3 and 4 sales during a given hour of the day. For the next opening hour, we have:

$$X \sim \text{Poisson}(\lambda = 2.1), \qquad P(X = x) = \frac{2.1^x e^{-2.1}}{x!}$$

The probabilities of 0, 1, 2, 3 and 4 sales during the next opening hour:

| Sales (x) | P(X = x) | Cumulative P(X ≤ x) |
|---|---|---|
| 0 | 0.1225 | 0.1225 |
| 1 | 0.2572 | 0.3796 |
| 2 | 0.2700 | 0.6496 |
| 3 | 0.1890 | 0.8386 |
| 4 | 0.0992 | 0.9379 |

As the cumulative column shows, and as can be calculated by adding up the five probabilities p(0) + p(1) + p(2) + p(3) + p(4), there is a 93.8% probability of making at most 4 sales during the next opening hour.
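One way to build the probability and cumulative columns in R is to take dpois for the point probabilities and cumsum for the running total. This is a sketch; the data frame and its column names are arbitrary choices.

```r
lambda <- 2.1  # observed average number of sales per opening hour
sales <- 0:4

# Point probabilities P(X = x) and their running total P(X <= x)
prob <- dpois(sales, lambda = lambda)
shop_table <- data.frame(
  sales      = sales,
  prob       = round(prob, 4),
  cumulative = round(cumsum(prob), 4)
)
print(shop_table)
```

The running total from cumsum is the same quantity that ppois returns directly.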
What if we are not sure about our observed rate of 2.1 sales per hour? Maybe we are selling a little more now. What if our true λ is 2.5, 3 or 3.5? The calculations are the same as above; we only substitute λ with the respective values, and the results would show as follows:
For example, if our true λ is 3.0, we would have a 16.8% probability of making exactly 4 sales during the next opening hour.
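The comparison across candidate rates can be produced in one step in R by applying dpois to each λ. A sketch, with the λ values (2.1, 2.5, 3.0, 3.5) taken from the text and the layout (rows for sales, columns for λ) chosen for illustration:

```r
# Candidate rates: the observed 2.1 plus the alternatives 2.5, 3.0 and 3.5
lambdas <- c(2.1, 2.5, 3.0, 3.5)
sales <- 0:4

# sapply returns a 5 x 4 matrix: one row per number of sales, one column per lambda
prob_table <- sapply(lambdas, function(l) dpois(sales, lambda = l))
rownames(prob_table) <- paste0("sales=", sales)
colnames(prob_table) <- paste0("lambda=", lambdas)
print(round(prob_table, 3))
```

Reading the λ = 3 column at 4 sales gives the 16.8% quoted above.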
The Poisson distribution with MS Excel
The Excel function POISSON.DIST(x, mean, cumulative) calculates point probabilities for different λ values when the cumulative argument is set to FALSE. Setting it to TRUE returns the cumulative probability P(X ≤ x) instead; for example, =POISSON.DIST(4, 2.1, TRUE) returns about 0.938, the 93.8% found above.
The Poisson distribution with R statistical programming
Let’s look at the R functions:
- dpois returns values of the probability mass function of X, f(x) = P(X = x)
- ppois returns values of the cumulative distribution function, F(x) = P(X ≤ x)
- rpois generates a random sample
- qpois finds quantiles of the Poisson distribution
The dpois function returns values of the probability mass function of X, f(x) = P(X = x).
Let’s take the example of X following a Poisson distribution with a known rate of lambda = 7:
dpois(x=4, lambda = 7)
## [1] 0.09122619
# P(X=0) & P(X=1) & … & P(X=4)
dpois(x=0:4, lambda = 7)
## [1] 0.000911882 0.006383174 0.022341108 0.052129252 0.091226192
# P(X <= 4)
# this can be done with dpois:
sum( dpois(x=0:4, lambda=7) )
## [1] 0.1729916
It can also be done with the ppois function, which returns probabilities from the cumulative distribution function, F(x) = P(X ≤ x):
# P(X <= 4)
ppois(q=4, lambda = 7, lower.tail = TRUE)
## [1] 0.1729916
# P(X > 12), i.e. P(X >= 13): lower.tail = FALSE returns P(X > q)
ppois(q=12, lambda = 7, lower.tail = FALSE)
## [1] 0.02699977
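Since ppois with lower.tail = FALSE returns P(X > q), the strict inequality is easy to trip over. The sketch below computes P(X ≥ 12) for λ = 7 in two equivalent ways and contrasts it with P(X > 12); the variable names are chosen for this illustration.

```r
lambda <- 7
# ppois(q, lower.tail = FALSE) returns P(X > q), so P(X >= 12) needs q = 11
p_ge_12 <- ppois(q = 11, lambda = lambda, lower.tail = FALSE)
# The same quantity via the complement of P(X <= 11)
p_ge_12_complement <- 1 - ppois(q = 11, lambda = lambda)
# P(X > 12), which is what q = 12 gives
p_gt_12 <- ppois(q = 12, lambda = lambda, lower.tail = FALSE)
print(c(p_ge_12 = p_ge_12, p_gt_12 = p_gt_12))
```

The two results differ by exactly P(X = 12), which is dpois(12, lambda = 7).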
The rpois function draws a random sample from a Poisson distribution and can be used to simulate the number of events occurring within a given time interval:
# Say that we wish to take a random sample of 10 from a Poisson distribution with a known rate of lambda = 7.
# We may have observed 7 customers entering a shop per minute on average and wish to simulate
# the number of customers per minute for the next 10 minutes:
rpois(n = 10, lambda = 7)
## [1] 4 7 7 4 7 9 9 9 8 9
Our sample shows 4 customers in the first minute, 7 in the second, 7 in the third, 4 in the fourth and so on. The variation in these counts is modeled by the Poisson distribution. Because the sample is random, each run returns different numbers unless the random seed is fixed with set.seed().
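To make a simulation reproducible, fix the random seed before calling rpois. The sketch below uses set.seed(42), an arbitrary seed chosen for illustration, so its 10 draws will not match the sample shown above. It also draws a large sample to show that the sample mean and the sample variance both come out close to λ = 7, as the Poisson distribution implies.

```r
set.seed(42)  # arbitrary seed: fixes the random stream so the draws are reproducible
minutes <- rpois(n = 10, lambda = 7)  # simulated customers per minute for 10 minutes
print(minutes)

# A large simulated sample: mean and variance should both be close to lambda = 7
big_sample <- rpois(n = 100000, lambda = 7)
cat("sample mean:", mean(big_sample), " sample variance:", var(big_sample), "\n")
```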
The qpois function finds quantiles of the Poisson distribution and, as such, is the inverse of the operation performed by ppois. We supply a cumulative probability (a percentile), and it returns the smallest number of events whose cumulative probability is at least that value:
## [1] 5
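For example, with λ = 7, qpois returns 5 for a probability of 0.25, because P(X ≤ 4) ≈ 0.173 < 0.25 ≤ P(X ≤ 5) ≈ 0.301. The sketch below shows this, the median, and the round trip through ppois; the probabilities 0.25 and 0.5 are chosen for illustration.

```r
lambda <- 7
# qpois(p) returns the smallest x with P(X <= x) >= p
q25 <- qpois(p = 0.25, lambda = lambda)  # P(X <= 4) = 0.173 < 0.25 <= P(X <= 5) = 0.301, so 5
q50 <- qpois(p = 0.5, lambda = lambda)   # the median number of events

# Round trip: feeding a ppois result back into qpois recovers the count
round_trip <- qpois(ppois(4, lambda = lambda), lambda = lambda)
print(c(q25 = q25, q50 = q50, round_trip = round_trip))
```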
Back to my shop example: we have observed an average of 2.1 sales per opening hour, X ~ Poisson(λ = 2.1).
Question 1: What are the probabilities of making zero, one, two, three or four sales during the next opening hour?
# Answer to Q1: P(X = 0), P(X = 1), …, P(X = 4):
dpois(x=0:4, lambda = 2.1)
## [1] 0.12245643 0.25715850 0.27001642 0.18901150 0.09923104
Hence, assuming that λ = 2.1 is the true rate, the probability of 0 sales during the next opening hour is 12%, of 1 sale 26%, and so on.
Question 2: What is the probability of doing between 0 and 4 sales during the next opening hour?
# Answer to Q2: Selling between 0 and 4:
ppois(q=4, lambda = 2.1, lower.tail = TRUE)
## [1] 0.9378739
sum( dpois(x=0:4, lambda=2.1) )
## [1] 0.9378739
So, there is a 93.8% probability of making 4 or fewer sales during the next opening hour.
I created my web page with the WordPress platform using the Divi theme from Elegant Themes. To publish the mathematical and statistical expressions and the run-through of the R functions, I use R Markdown, which I then publish on RPubs. This is the link to the RPubs page for this article on the Poisson distribution: https://rpubs.com/CarstenGrube/575354
I find two Khan Academy videos very helpful: Salman Khan explains this “road” from the binomial distribution to the Poisson distribution, and my example above is inspired by these videos, although with a different story and different values.
Freelance Data Analyst
+34 616 71 29 85
Spain: Ctra. 404, km 2, 29100 Coín, Malaga
Denmark: c/o Musvitvej 4, 3660 Stenløse
Learning statistics. Doing statistics. Freelance since 2005. Dane. Living in Spain. With my Spanish wife and two children.
20 years in sales, analysis, journalism and startups.