+34 616 71 29 85 carsten@dataz4s.com

The binomial distribution in R

The binomial distribution in R can be applied with the functions: dbinom, pbinom, rbinom and qbinom.

 

Intro to the functions   

Let’s go through the four functions, dbinom, pbinom, rbinom and qbinom via the example of X being binomially distributed with n=15 trials and p=0.8 prob of success:

 

dbinom

dbinom can be used to find values for the probability density function of X, f(x)

# P(X=9)
dbinom(x=9, size = 15, prob = 0.8)

## [1] 0.04299262

Multiple probailities:

# P(X=6) & P(X=7) & P(X=8) & P(X=9)
dbinom(x=6:9, size = 15, prob=0.8)

## [1] 0.0006717597 0.0034547643 0.0138190573 0.0429926226

# The probability of equal to or less than 9: P( <= 9)
sum( dbinom(x=0:9, size = 15, prob=0.8) )

## [1] 0.06105143

The above sum calculation can also be done with the pbinom function:

 

pbinom

pbinom function returns values for the probability distribution function of X, F(x)

# P( <= 9)
pbinom(q=9, size = 15, prob = 0.8, lower.tail = T)

## [1] 0.06105143

 

rbinom

The ‘rbinom’ function can be used to take a random sample from a binomial distribution. For example, if we have a production line and we produce 200 of ProductX per day. Defective products are returned for re-production. There is a 7% error rate. The number of ProductX of defectives per 5-days working week can be estimated like this:

rbinom(5,200,0.07)

## [1] 14 10 19 19 12

?rbinom

## starting httpd help server … done

The rbinom can model Bernoulli trials by setting the ‘size’ (number of trials) equal to one. For example, for the outcome of 10 coin flips:

# 10 coin flips
rbinom(10, 1,.5)

##  [1] 0 0 0 1 1 0 0 0 1 1

#H or for 20 flips of 150 coins:
# 20 flips of 150 coins
rbinom(20,150,.5)

##  [1] 78 70 68 81 62 75 77 76 71 71 72 77 74 71 79 71 76 71 67 81

 

qbinom

The qbinom function calculates the inverse binomial distribution inverting the operation performed by pbinom. We provide the function specifying the percentile we want to be at or below and it will generate the number of successes associated with just that cumulative probability, for example:

qbinom(0.8,10,.5)

## [1] 6

 

 

Simulations with rbinom

The rbinom() function allows to run simulations. Simulations can, for example, be used to check our estimators. Say, we calculate a probability of 5. We can then run a number of simulations of e.g. 10.000 trials each, These simulations should approximate our calculated probability if this is correctly calculated.

 

Simulating coin flips

Let’s do a simulation of 10 coin flips with one loaded coin that has 30% chance of “head”.

# 10 flips with loaded coin
# 30% of head
# Let’s see what our simulation returns:
rbinom(10,1,.3)

##  [1] 1 0 1 0 0 1 0 0 0 0

 

Flipping 10 coins 100 times

Flipping 10 coins 100 times with probability of “heads” = 0.3

# 100 flips with 10 coins
rbinom(100,10,0.3)

##   [1] 3 5 3 1 1 2 4 4 5 3 3 2 3 4 3 2 4 3 1 5 3 1 4 1 4 3 5 7 6 6 5 4 2 1 4 2 3
##  [38] 3 3 4 3 5 3 2 2 3 5 3 4 2 2 0 2 5 2 3 2 3 3 2 3 1 1 2 3 3 3 3 4 2 3 3 3 4
##  [75] 3 4 2 2 3 2 4 4 4 3 4 4 3 3 3 3 2 4 3 5 6 3 5 3 1 3

# Checking with the mean of the simulation
# It approximates 3
mean(rbinom(100,10,.3))

## [1] 2.81

 

Calculating point estimate and cumulative density

Calculating the density point estimate is when calculate for one exact value. For example, what is the probability of getting exactly 4. The 4 is one exact value as opposed to calculating for 4 or less. The 4 or less is the cumulative density as it cumulates the probability for 0; 1; 2; 3 and 4. Likewise, it is the cumulative density when calculating for the probability of 4 or more.

 

Multiple choice test example

A multiple choice test has 10 questions with each 5 posible answers. Say we answer the 10 questions purely by guessing.

Question 1: What is the probability of getting exactly 4 correct answers?

We are being asked about the probability of getting exactly 4 correct answers, so this is the point probability which can be calculated with the point probability formula:

# Calculating point probability with dbinom
dbinom(x = 4, size = 10, prob = 0.2)

## [1] 0.08808038

# Checking with a simulation of 10,000 trials with rbinom
mean(rbinom(10000,10,.2)==4)

## [1] 0.0897

Answer 1: The probability of getting exactly 4 correct answers by answering purely by guessing is approximatly 0.0881 = 8.81%

 

Question 2: What is the probability of getting 4 or less correct answers in the above described multiple choise test?

This is equal to getting 0; 1; 2; 3 and 4. It is cumulating these four probabilities and is therefore a the cumulative density.

X follows a binomial distribution with sample size = 10 and a 1/5 probability:

# The probability of getting 4 or less, sample size 10 with a prob = 1/5 = 0.2
pbinom(4,10,.2)

## [1] 0.9672065

# or by using the dbinom
sum(dbinom(x=0:4,size = 10,prob = .2))

## [1] 0.9672065

Answer 2: The probability of getting maximum 4 correct answers is 0.9672 = 96.72%.

 

 

Sales probabilities examples

A salesperson has a success rate of 15% meaning closing a sale to 1 out of 5 potential clients with whom she contacts.

Question 1: What are the probabilities that, in 10 randomly selected client interactions, she will do:
a: Exactly 3 sales
b: Maximum 1 sale
c: At least 2 sales
d: More than 3 sales

# a: Exactly 3 sales
dbinom(x = 3, size = 10, prob = 0.15)

## [1] 0.1298337

# b: Max 1 sale
pbinom(q = 1, size = 10, prob = 0.15)

## [1] 0.5442998

# c: At least 2 sales
1-pbinom(1,10,0.15)

## [1] 0.4557002

# d: More than 3 sales
1-pbinom(3,10,0.15)

## [1] 0.0499698

 

 

Point probabilities for variables in a range

Suppose X is a binomial random variable with n=4 and p=0.2:

Question: What is the p(x) for each of the following values of X: 0,1,2,3,4?

These are the point probabilities for each of the 5 values: 0-4

# The point probabilities of 0; 1; 2; 3 and 4
dbinom(0:4, 4, 0.2)

## [1] 0.4096 0.4096 0.1536 0.0256 0.0016

 

 

Probabilities for developing antibody – example

A researcher has found that one out of every ten patients receiving a certain medication develops antibodies against this very medication.

Question 1: What is the probability that in the next 10 patients the researcher treats, none will develop antibodies against medication?

This is a point probability as it refers to one point or to one value, in this case to the point probability of 0.

# The probability of 0
# P(X=0)
dbinom(0,10,0.1)

## [1] 0.3486784

Question 2: What is the probability that at least 2 will develop antibodies?

# The probability of at least 2
# P(X => 2)
1-pbinom(1,10,0.1)

## [1] 0.2639011

 

 

Income probabilities example

Say that Statistics of Denmark announces that a median annual income of a Danish family is DKR 610K. Consider a random sample of 20 Danish households.

Question 1: What is the expected number of households with annual incomes less than DKR 610K?
Question 2: What is the probability of getting at least 16 out of the 20 households with annual incomes less than the median?
Question 3: What is the probability of getting no more than 9 with greater income than the median?
Question 4: What is the probability of getting no more than 7 with greater income than the median?

# Q1: The expected value, E(x) is np
# The median means that the probability is set to 0.5
n <- 20
p <- 0.5
n*p

## [1] 10

# Q2: P(X => 16)
1-pbinom(15,n,p)

## [1] 0.005908966

# Q3: P(X =< 9)
pbinom(9,n,p)

## [1] 0.4119015

# Q4: P(X < 7)
pbinom(6,n,p)

## [1] 0.05765915

 

 

Loaded coin flip

Let’s get back to the loaded coin that has 30% of coming up heads.

Question 1 What is the probability of flipping exactly 2 heads on 10 flips?

# P(X=2)
dbinom(2,10,0.3)

## [1] 0.2334744

# Checking with simulation of 10,000 flips with the 10 loaded coins
# Wrapping in mean() function
mean(rbinom(10000, 10, 0.3)==2)

## [1] 0.2222

Answer 1: The probability of exactly 2 heads in 10 flips is 0.2335 which is confirmed by the simulation of 10,000 trials as it approximates the 0.23

Question 2: What is the probability of minimum 5 heads?

# P(X =< 5)
1-pbinom(4, 10, 0.3)

## [1] 0.1502683

# Checking with simulation of 10,000 flips with the 10 loaded coins
# Wrapping in mean() function
mean(rbinom(10000,10,0.3)>=5)

## [1] 0.1472

 

 

Calculating expected value and variance

In the binomial distribution the expected value, E(x), is the sample size times the probability (np) and the variance is npq, where q is the probability of failure which is 1-p. 

 

 

Point probabilities, E(x) and variance

Suppose X is a binomial random variable with n=5 and p=0.5.

Question 1: Compute first p(x) for all values 0:5

# p(x)
dbinom(x=0:5, size=5, prob=0.5)

## [1] 0.03125 0.15625 0.31250 0.31250 0.15625 0.03125

Question 2: What are the expected value and thevariance?

# Expected value E(x)
n <- 5
p <- 0.5
q <- 1-p

n

## [1] 5

p

## [1] 0.5

q

## [1] 0.5

mu <- n*p
mu

## [1] 2.5

# Variance
Var <- mu*q
Var

## [1] 1.25

 

 

25 flips with loaded coin

The loaded coin with 30% chance of heads:

Question What is the expected value of 25 flips?

# Calculating the expected value E(x)
n <- 25
p <- 0.3
n*p

## [1] 7.5

# Checking with a simulation
mean(rbinom(10000, 25, .3))

## [1] 7.5269

 

Question 2 What is the variance?

# Calculating the expected value E(x)
n <- 25
p <- 0.3
q <- 1-p

n*p*q

## [1] 5.25

# Checking with a simulation
var(rbinom(10000, 25, .3))

## [1] 5.255521

 

 

Test for two proportions

We can also use pbinom and dbinom to test the probability that two proportions should be equal, or if one proportion can be expected to be greater or less than the other.

Let’s run these examples:

 

One-tailed test

22 persons participate in a simple randomly selected sample. Blindfolded they take one zip of one Glass 1 and one zip of Glass 2. Glass 1 contains the Soft drink A and glass 2 contains the Soft drink B. 17 of the participants vote that they like Soft dring A the better.

Question: Does this test provide evidence that Soft drink A is preferred over Soft drink B, or has our test result occurred by chance?

As we are testing if the proportion of preference A over B (17/20) can be expected to be significantly greater, we are running a one-talied, or right-tailed test, as we are looking to the right of the proportion mean of 0.5. In case there is no difference we would have expected Soft drink A to be binomially distributed with probability of 0.50.

To answer the question, we will test the probability of obtaining the result that we have obtained in our test:

X follows a binomal distribution with sample size 22 and a probability = 17/22:

# The probability, or the p-value, of 17/22 assuming that the mean is 0.5
# P(X => 17)
1-pbinom(q=16, size = 22, prob = .5)

## [1] 0.00845027

# The same result can be obtained with the dbinom function
sum( dbinom(x=17:22, size = 22, prob=0.5) )

## [1] 0.00845027

Answer: The probability of obtaining our observed value of 17 out of 22 or a more extreme result is 0.00845 = 0.845%. We would therefore reject the null hypothesis even at a significance level of 0.01. Our test provides evidence that A is not 0.5 and thus that there is a preference for A over B.

 

 Two-tailed test

Let’s take this example and convert it into a two-sided test:

Say we have no prior idea about if there is a preference of either A or B, or if there is no preference.

Question: Is there a preference? Or, in other words: Is the probability of A different from the probability of B?*

We will thus look both to the right and to the left of the mean proportion of 0.50 and it is therefore a two-sided test. We solve this by simply adding the two probabilities: ‘16 or less’ + ‘5 or more’:

# P-value for two-sided test
# The probability of (17-1) 16 + the prob of (22-17) 5:
1-pbinom(16,22,.5) + pbinom(5,22,.5)

## [1] 0.01690054

Answer: There is approximately 1.69% probability that we would obtain a result as extrme or more than the one we observed in our test. At a significance level greater than 1.69 we would therefore reject the H0 that there should be no difference in preferences.

The result of the two-sided test is exactly the double of the probability of the one-sided that we calculated above. We could therefore have used the dbinom multiplying with 2:

# Using dbinom for two-sided would return the same result
sum( dbinom(x=17:22, size = 22, prob=0.5) )*2

## [1] 0.01690054

 

Learning R programming

The following are the main sources for these learnings:

Carsten Grube

Carsten Grube

Freelance Data Analyst

0 Comments

+34 616 71 29 85

Call me

Spain: Ctra. 404, km 2, 29100 Coín, Malaga

...........

Denmark: c/o Musvitvej 4, 3660 Stenløse

Drop me a line

What are you working on just now? Can I help you, and can you help me? 

About me

Learning statistics. Doing statistics. Freelance since 2005. Dane. Living in Spain. With my Spanish wife and two children. 

What they say

20 years in sales, analysis, journalism and startups. See what my customers and partners say about me.