+34 616 71 29 85 carsten@dataz4s.com

Population variance and standard deviation

Variance and standard deviation describe the dispersion in data. They are calculated for both populations and samples. For populations they are denoted σ² and σ. For samples they are typically denoted s² and s; the subscript n-1 (as in sn-1) is also used for the sample versions.

This chapter is based on a normally distributed population

This chapter is about population variance and population standard deviation for a normally distributed population. Variance and standard deviations are also calculated and used for inference in samples: Sample variance and standard deviation.

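The definitions of population variance and standard deviation given here do not depend on the shape of the distribution, so they also apply to populations that are not normally distributed; what changes is how the values are interpreted. In every case, variance and standard deviation are about the variability in data.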

 

Mean & median vs Population variance and standard deviation

Central tendencies in datasets can be identified through the mean, median and mode. However, these do not reveal the actual dispersion in data. Let’s take these two datasets as an example:

[Figure: Dataset A and Dataset B]

Although the two datasets differ considerably in spread, they share the same mean and median (6). The mean and median therefore do not give a complete picture of the differences between the datasets.
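To make the point concrete, here is a minimal Python sketch. The values of the two datasets are not reproduced in the text, so the lists below are stand-ins: `dataset_a` and `dataset_b` are hypothetical values chosen only to match the mean, median, minimum and maximum quoted in this chapter.

```python
from statistics import mean, median

# Hypothetical stand-in values (assumption): chosen to match the mean,
# median, minimum and maximum quoted in the text, not the original figure.
dataset_a = [4, 5, 6, 7, 8]
dataset_b = [-15, 0, 6, 9, 30]

for name, data in (("A", dataset_a), ("B", dataset_b)):
    # Both datasets report the same centre despite very different spreads.
    print(f"Dataset {name}: mean = {mean(data)}, median = {median(data)}")
```

Both lines print a mean and a median of 6, although the second list is spread far more widely than the first.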

 

Range

The range of Dataset A is 4 (8 - 4) and the range of Dataset B is 45 (30 - (-15)). Still, datasets that are very different can have similar ranges, because the range only considers the lowest and the highest values and ignores the values in between.
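This limitation can be illustrated with a short sketch. The two lists below are hypothetical and are not Dataset A or B: they share the same minimum and maximum, and therefore the same range, even though the in-between values are distributed very differently.

```python
def value_range(data):
    """Return the range: highest value minus lowest value."""
    return max(data) - min(data)

# Hypothetical datasets (assumption), used only for illustration.
clustered = [0, 5, 5, 5, 10]   # most values sit at the centre
polarised = [0, 0, 5, 10, 10]  # most values sit at the extremes

print(value_range(clustered))  # 10
print(value_range(polarised))  # 10
```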

 

Variance

The variance addresses the limitation of the range described above. It expresses the spread in data as the average squared distance from the mean.

The variance is calculated by taking the average of the squared differences between each datapoint and the mean of the dataset. In other words, the mean is subtracted from each datapoint, each of these differences is squared, and the squared differences are then added up and divided by the population size:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Where:

xi = each datapoint

µ = the mean of the dataset

N = the population size

The reason for squaring the differences between each datapoint and the mean is essentially to keep the values positive, so that differences above and below the mean do not cancel each other out.
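A quick check shows why squaring matters. In this sketch the list `data` is a hypothetical dataset (an assumption for illustration): without squaring, the deviations from the mean cancel out and sum to zero, whatever the spread.

```python
from statistics import mean

# Hypothetical dataset (assumption), used only for illustration.
data = [4, 5, 6, 7, 8]
mu = mean(data)

raw_deviations = [x - mu for x in data]
squared_deviations = [(x - mu) ** 2 for x in data]

print(sum(raw_deviations))      # 0: positive and negative deviations cancel
print(sum(squared_deviations))  # 10: squaring keeps every term non-negative
```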

Variance of dataset A:

[Figure: variance calculation for Dataset A]

The variance of Dataset A is 2. Doing the same calculation for Dataset B, we find that the variance is 240. This difference in spread is not revealed through the mean, median or mode.
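The calculation can be sketched in Python. The values of Dataset A are not listed in the text, so `dataset_a` below is an assumption: five values chosen to be consistent with the range (8 - 4), the mean (6) and the variance (2) stated in this chapter. The function name `population_variance` is likewise just illustrative; the standard library's `statistics.pvariance` serves as a cross-check.

```python
from statistics import pvariance

def population_variance(data):
    """Average squared deviation from the mean, dividing by N."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n

# Assumed values for Dataset A (not taken from the original figure).
dataset_a = [4, 5, 6, 7, 8]

# The hand-written function and the library call agree: both give 2.
print(population_variance(dataset_a), pvariance(dataset_a))
```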

 

Standard deviation of a population

We often want a measure in the actual unit of the data, but the variance returns a squared value. For example, if we are analyzing a length in meters, the variance is expressed in square meters. To handle this, we take the square root of the variance, which gives the standard deviation, expressed in the original unit, in this case meters.
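As a sketch of this step, the snippet below takes the square root of a population variance. The values in `lengths_m` are hypothetical lengths in meters (an assumption for illustration); `statistics.pvariance` and `statistics.pstdev` are the standard library's population versions.

```python
import math
from statistics import pstdev, pvariance

# Hypothetical lengths in meters (assumption), used only for illustration.
lengths_m = [4.0, 5.0, 6.0, 7.0, 8.0]

variance_m2 = pvariance(lengths_m)   # expressed in square meters
std_dev_m = math.sqrt(variance_m2)   # back in meters

print(f"variance = {variance_m2} m^2")
print(f"standard deviation = {std_dev_m:.3f} m")

# statistics.pstdev gives the same result in one call.
assert math.isclose(std_dev_m, pstdev(lengths_m))
```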

 

 

Van example

I would love to have a camper van, and in the search for the right van, one of the parameters that I wish to look at is the length of the different van types. For this example, I’ll assume that there are only four different types of vans on the market, so these four van types make up the population of van types available on the market:

 

[Figure: the four van types and their lengths]

 

Listing and calculating the measures in a spreadsheet:

[Figure: spreadsheet listing the van lengths and the calculated measures]

I wish to get an overall picture of the variation in lengths between these vans, so I calculate the variance with the formula:

 

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Where:

xi = meters of each van

µ = the mean length of all the vans

N = the population size: the 4 vans

 

I plug data into the variance formula:

[Figure: the van lengths plugged into the variance formula]

 

The variance is 0.288 square meters and the standard deviation, its square root, is approximately 0.537 meters.
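The same steps can be sketched in Python. The lengths of the four vans are not reproduced in the text, so the values in `van_lengths_m` are hypothetical: they were chosen only so that the population variance comes out at the 0.288 quoted above, and they are not the author's data.

```python
import math

# Hypothetical van lengths in meters (assumption, not the original data).
van_lengths_m = [4.80, 5.28, 5.76, 6.24]

n = len(van_lengths_m)                      # N: the population size
mu = sum(van_lengths_m) / n                 # mean length
variance = sum((x - mu) ** 2 for x in van_lengths_m) / n
std_dev = math.sqrt(variance)               # square root of the variance

print(f"mean = {mu:.2f} m")
print(f"variance = {variance:.3f} m^2")
print(f"standard deviation = {std_dev:.3f} m")
```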

 

The σ formula

To get the population standard deviation we take the square root of the variance:

 

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

 

The formula says: “Square the distance that each datapoint has from the mean, take the average of these squared distances, and then take the square root of that average”.

 

Population variance and standard deviation conclusion

The population variance and standard deviation indicate the spread in data, which is not revealed by central tendency indicators such as the mean, median and mode. They are denoted σ² and σ respectively.

 

Learning resources

Khan Academy video: Range, variance & standard deviation

Carsten Grube

Freelance Data Analyst
