Population variance and standard deviation
Variance and standard deviation describe the dispersion in data. They are calculated for both populations and samples. For populations they are denoted σ² and σ; for samples they are typically denoted s² and s, or s²ₙ₋₁ and sₙ₋₁.
This chapter is based on a normally distributed population
This chapter covers the population variance and population standard deviation for a normally distributed population. Variance and standard deviation are also calculated for samples and used for inference; see Sample variance and standard deviation.
The formulas for variance and standard deviation do not require the population to be normally distributed, and the essence is always the same: variance and standard deviation are about variety in data.
Mean & median vs Population variance and standard deviation
The central tendency of a dataset can be described by its mean, median and mode. However, these measures do not reveal how dispersed the data are. Take these two datasets as an example:
Although these two datasets are clearly different, they share the same mean and median (6). Their spread, however, differs considerably, so the mean and median alone do not give a realistic or complete picture of how the datasets differ.
The range of Dataset A is 4 (8 − 4) and the range of Dataset B is 45 (30 − (−15)). Still, datasets that are very different can have similar ranges, because the range considers only the lowest and the highest values and ignores all the values in between.
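A small illustration of that limitation, using two hypothetical datasets that are not from this article. Both share the same lowest and highest values, so their ranges are identical, even though the values in between are distributed very differently.

```python
# Hypothetical datasets (not the article's tables): same lowest and highest
# values, very different values in between.
clustered = [0, 10, 10, 10, 20]
spread_out = [0, 0, 10, 20, 20]


def data_range(values):
    """Range = highest value minus lowest value; nothing in between is used."""
    return max(values) - min(values)


print(data_range(clustered))   # 20
print(data_range(spread_out))  # 20
```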
The variance addresses this limitation of the range. It takes every datapoint into account and expresses the spread in data as the average squared distance from the mean.
The variance is the average of the squared differences between each datapoint and the mean of the dataset. In other words, the mean is subtracted from each datapoint, each difference is squared, and the squared differences are added up and divided by the population size:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where:
xi = each datapoint
µ = the mean of the dataset
N = the population size
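A minimal Python sketch of the population variance σ² = Σ(xi − µ)² / N, written out step by step. It is checked against `statistics.pvariance` from the standard library. The data are hypothetical values chosen for illustration, not the article's dataset.

```python
from statistics import pvariance


def population_variance(values):
    """sigma^2 = sum((x_i - mu)^2) / N for a whole population."""
    n = len(values)                                   # N, the population size
    mu = sum(values) / n                              # mu, the mean of the dataset
    squared_diffs = [(x - mu) ** 2 for x in values]   # (x_i - mu)^2 for each datapoint
    return sum(squared_diffs) / n                     # average of the squared differences


# Hypothetical population (illustrative values only).
data = [4, 5, 6, 7, 8]
print(population_variance(data))  # 2.0
print(pvariance(data))            # standard-library equivalent
```

`pvariance` divides by N, like the formula above. Its sibling `statistics.variance` divides by n − 1 and is meant for samples.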
The differences are squared essentially to keep them positive. Without squaring, the positive and negative differences would cancel each other out, and their sum would always be zero.
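To see the cancellation concretely, this sketch uses a hypothetical population (not from the article). It sums the raw differences from the mean, and then sums the squared differences.

```python
# Hypothetical population (illustrative values only).
data = [4, 5, 6, 7, 8]
mu = sum(data) / len(data)                # mean = 6.0

deviations = [x - mu for x in data]       # [-2.0, -1.0, 0.0, 1.0, 2.0]
squared = [d ** 2 for d in deviations]    # [4.0, 1.0, 0.0, 1.0, 4.0]

print(sum(deviations))  # 0.0: positive and negative differences cancel out
print(sum(squared))     # 10.0: squaring keeps every term non-negative
```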
Variance of Dataset A:
The variance of Dataset A is 2. Doing the same calculation for Dataset B gives a variance of 240. This difference in spread is not revealed by the mean, median or mode.
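The same comparison can be sketched with Python's `statistics` module. The two datasets below are hypothetical. The second has the same range (45) as Dataset B, but it is not the article's Dataset B: its variance is 212.4, not 240. Both datasets have a mean and median of 6, while their variances come out as 2 and 212.4.

```python
from statistics import mean, median, pvariance

# Two hypothetical datasets (not the article's tables): same mean and median,
# very different spread.
narrow = [4, 5, 6, 7, 8]
wide = [-15, 0, 6, 9, 30]

for name, data in [("narrow", narrow), ("wide", wide)]:
    # mean and median agree across datasets; only the variance tells them apart
    print(name, mean(data), median(data), pvariance(data))
```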
Standard deviation of a population
The variance is expressed in squared units, but we usually want a measure of spread in the same units as the data. For example, if we are analyzing lengths in meters, the variance is expressed in square meters. Taking the square root of the variance gives the standard deviation, which is expressed in the original units, in this case meters.
I would love to have a camper van, and in my search for the right van, one of the parameters I want to look at is the length of the different van types. For this example, I'll assume that there are only four types of vans on the market, so these four van types make up the population of van types available:
Listing and calculating the measures in a spreadsheet:
I want an overall picture of the variation in length between these vans, so I calculate the variance with the formula:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where:
xi = the length of each van in meters
µ = the mean length of all the vans
N = the population size: the 4 vans
I plug the data into the variance formula:
The variance is 0.288 m², and the standard deviation, its square root, is approximately 0.537 m.
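A quick check of how the two figures relate: taking the square root of the reported variance of 0.288 m² brings the measure back to meters.

```python
import math

variance_m2 = 0.288                   # population variance of the van lengths, in m²
std_dev_m = math.sqrt(variance_m2)    # sigma = sqrt(sigma^2), back in meters
print(round(std_dev_m, 3))            # 0.537
```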
The σ formula
To get the population standard deviation, we take the square root of the variance:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
The formula says: "Square the distance of each datapoint from the mean, take the average of these squared distances, and then take the square root of that average."
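A minimal sketch of the σ formula in Python, following the three steps literally and compared with the standard library's `statistics.pstdev`. The van lengths are hypothetical values for illustration, not the lengths in the article's table.

```python
import math
from statistics import pstdev


def population_std_dev(values):
    """sigma = sqrt(sum((x_i - mu)^2) / N)."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n  # average squared distance (sigma^2)
    return math.sqrt(variance)                          # sigma, in the data's own units


# Hypothetical van lengths in meters (illustrative values, not the article's table).
lengths_m = [5.0, 5.5, 6.0, 6.5]
print(population_std_dev(lengths_m))  # about 0.559
print(pstdev(lengths_m))              # standard-library equivalent
```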
Population variance and standard deviation conclusion
The population variance and standard deviation indicate the spread in data, which is not revealed by measures of central tendency such as the mean, median and mode. The population variance and standard deviation are denoted σ² and σ respectively.
Khan Academy video: Range, variance & standard deviation