Sample variance and standard deviation
How much time do European adults spend reading and watching news? We don’t know, nor will we ever know: the population is simply too large to get an answer from every citizen. But we can make a “qualified guess”. We run a sample, and through sample statistics such as the sample variance and the sample standard deviation, we can estimate the true population parameters.
To estimate the population variance and the population standard deviation we calculate the sample variance and the sample standard deviation.
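To make the idea concrete, here is a minimal Python sketch that simulates the situation. Everything in it is hypothetical: a made-up “population” of 50,000 daily news-consumption times (in minutes), drawn from a normal distribution with mean 60 and standard deviation 15, from which we draw a random sample of 200 people. In real life only the sample would be observable; in a simulation we can also compute the population parameter and compare.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population: daily minutes spent on news by 50,000 people,
# drawn from a normal distribution (mean 60, standard deviation 15)
population = [random.gauss(60, 15) for _ in range(50_000)]

# In practice we only get to observe a random sample
sample = random.sample(population, 200)

sigma = statistics.pstdev(population)  # population standard deviation (normally unknown)
s = statistics.stdev(sample)           # sample standard deviation (our estimate)

print(f"Population standard deviation: {sigma:.2f}")
print(f"Sample standard deviation:     {s:.2f}")
```

The two printed values will be close but not identical: the sample standard deviation is an estimate, and a different random sample would give a slightly different number.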
This chapter is based on a normally distributed population
This chapter is on sample variance and sample standard deviation for a normally distributed population. Variance and standard deviation are also calculated for populations in the rare cases where the true population parameters are available; see the chapter Population variance and standard deviation.
For populations that are not normally distributed, variance and standard deviation are calculated in different ways, but the core stays the same: it’s about the variety in the data.
When the population is too large, or in other ways inaccessible, we will not know the exact population parameters. But we can run a sample in order to come up with a qualified answer to our question. A sample is a survey of a smaller part of the population.
Sample variance vs Population variance
We sample when we cannot measure everyone. In other words, when the population is too large or in other ways inaccessible, we sample in an attempt to make a “qualified guess” about the population. The statistical framework takes into account that the sample is not the “sure thing”. It is only an estimate of the “sure thing”, an approximation of the true parameters.
Thus, there is a higher degree of uncertainty associated with the sample results. This uncertainty is reflected in the results we get from statistical software, or simply from Excel: using our three small datasets, we see that the sample variance is greater than the population variance (the same sum of squared deviations is divided by a smaller number, n − 1 instead of N). See the three small datasets below in the Excel section.
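A minimal Python sketch of this comparison, using the standard-library `statistics` module. The dataset is hypothetical (it is not one of the three datasets mentioned in the text): `statistics.pvariance` divides the sum of squared deviations by N, while `statistics.variance` divides it by n − 1.

```python
import statistics

# Hypothetical example data (not one of the article's datasets)
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: sum of squared deviations divided by N
population_var = statistics.pvariance(data)

# Sample variance: the same sum divided by n - 1
sample_var = statistics.variance(data)

print(f"Population variance: {population_var:.4f}")  # Population variance: 4.0000
print(f"Sample variance:     {sample_var:.4f}")      # Sample variance:     4.5714
```

Here the mean is 5 and the squared deviations sum to 32, so the population variance is 32 / 8 = 4, while the sample variance is 32 / 7 ≈ 4.5714.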
Purpose of sample variance and standard deviation
As we saw in Population variance and standard deviation, the variance and the standard deviation illustrate the spread in data.
If we look only at the mean and the median to identify a central tendency, we might miss the differences between datasets. The example in Population variance and standard deviation shows two very different datasets with the same mean and median:
Mean and median are the same for the two datasets, but the spreads are very different.
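The same point can be sketched in a few lines of Python with two hypothetical datasets (not the ones from the other chapter): both have mean 5 and median 5, but their sample standard deviations differ by a factor of five.

```python
import statistics

# Two hypothetical datasets with the same mean and median
narrow = [4, 5, 6]
wide = [0, 5, 10]

for name, data in [("narrow", narrow), ("wide", wide)]:
    print(
        f"{name}: mean={statistics.mean(data):.1f}, "
        f"median={statistics.median(data):.1f}, "
        f"sample sd={statistics.stdev(data):.1f}"
    )
# narrow: mean=5.0, median=5.0, sample sd=1.0
# wide: mean=5.0, median=5.0, sample sd=5.0
```

Looking only at the first two numbers, the datasets seem identical; the standard deviation reveals the difference in spread.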
Formula of the sample variance
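To find the variance in our sample, we might intuitively do the same as when we calculate the population variance, where we take the average squared distance from each data point to the population mean (μ), with N being the population size:
$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$$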
When calculating the sample variance, we have to take into account that we don’t know the population mean, so we use the mean of our sample instead: the sample mean (x̄).
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
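Also, we don’t divide by the sample size (n) but by the sample size minus one (n − 1). Dividing by n − 1 gives an unbiased estimate of the population variance. Dividing by n would, on average, underestimate it, because the deviations are measured from the sample mean x̄, and the sum of squared deviations from x̄ is never larger than the sum of squared deviations from the true population mean μ. I find Salman Khan’s explanation very good on this subject (see Learning resources below).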
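The effect of dividing by n − 1 instead of n can be checked with a small simulation. This is a minimal sketch under stated assumptions: a standard normal population (true variance 1), a hypothetical sample size of 5 and a hypothetical 20,000 repetitions. Averaged over many samples, the n − 1 version should land close to 1, while the n version is systematically too small, by a factor of (n − 1)/n.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

TRUE_VARIANCE = 1.0   # standard normal population: sigma = 1
SAMPLE_SIZE = 5       # hypothetical small sample size n
REPETITIONS = 20_000  # hypothetical number of simulated samples

divide_by_n = []
divide_by_n_minus_1 = []

for _ in range(REPETITIONS):
    sample = [random.gauss(0.0, 1.0) for _ in range(SAMPLE_SIZE)]
    x_bar = statistics.fmean(sample)  # sample mean
    squared_deviations = sum((x - x_bar) ** 2 for x in sample)
    divide_by_n.append(squared_deviations / SAMPLE_SIZE)
    divide_by_n_minus_1.append(squared_deviations / (SAMPLE_SIZE - 1))

print(f"Average when dividing by n:     {statistics.fmean(divide_by_n):.3f}")
print(f"Average when dividing by n - 1: {statistics.fmean(divide_by_n_minus_1):.3f}")
print(f"True population variance:       {TRUE_VARIANCE:.3f}")
```

With n = 5, the divide-by-n average should come out near 0.8 (that is, 4/5 of the true variance), while the divide-by-(n − 1) average should come out near 1.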
That leaves us with the following formula for the sample variance:
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$$
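The sample variance formula can be written out directly in Python. A minimal sketch, with a hypothetical function name `sample_variance` and a hypothetical sample of five observations; the result is cross-checked against the standard library’s `statistics.variance`. Worked by hand: x̄ = 40 / 5 = 8, the squared deviations are 25, 1, 1, 121 and 16, they sum to 164, and 164 / (5 − 1) = 41.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    x_bar = sum(values) / n                       # sample mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)      # divide by n - 1, not n


# Hypothetical sample of five observations
sample = [3, 7, 7, 19, 4]

print(f"By the formula:      {sample_variance(sample):.2f}")      # By the formula:      41.00
print(f"statistics.variance: {statistics.variance(sample):.2f}")  # statistics.variance: 41.00
```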
Formula of the sample standard deviation
As we know from the calculation of the population standard deviation, the standard deviation is obtained by taking the square root of the variance. We therefore get the following formula for the sample standard deviation:
$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$$
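Taking the square root of the sample variance gives the sample standard deviation. A minimal Python sketch with a hypothetical sample whose sample variance is 41, so the sample standard deviation is √41 ≈ 6.4031; the result is compared with the standard library’s `statistics.stdev`.

```python
import math
import statistics

# Hypothetical sample of five observations
sample = [3, 7, 7, 19, 4]

n = len(sample)
x_bar = sum(sample) / n                                        # sample mean
s_squared = sum((x - x_bar) ** 2 for x in sample) / (n - 1)  # sample variance
s = math.sqrt(s_squared)                                       # sample standard deviation

print(f"s^2 = {s_squared:.2f}, s = {s:.4f}")               # s^2 = 41.00, s = 6.4031
print(f"statistics.stdev: {statistics.stdev(sample):.4f}")  # statistics.stdev: 6.4031
```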
In Excel we can use the following formulas for the calculation of sample variance and sample standard deviation of a normally distributed population:
- Sample variance =VAR.S
- Sample standard deviation =STDEV.S
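For readers working in Python rather than Excel, the standard-library `statistics` module has counterparts: `statistics.variance` and `statistics.stdev` divide by n − 1, like VAR.S and STDEV.S, while `statistics.pvariance` and `statistics.pstdev` divide by N, like Excel’s VAR.P and STDEV.P. A minimal sketch with a hypothetical column of values; note that the n − 1 versions need at least two values, and `statistics` raises `StatisticsError` otherwise.

```python
import statistics

# Hypothetical column of values, as it might appear in a spreadsheet range
values = [12.5, 14.0, 9.5, 11.0, 13.0]

print(f"Like =VAR.S():   {statistics.variance(values):.4f}")   # 3.1250
print(f"Like =STDEV.S(): {statistics.stdev(values):.4f}")      # 1.7678
print(f"Like =VAR.P():   {statistics.pvariance(values):.4f}")  # 2.5000
print(f"Like =STDEV.P(): {statistics.pstdev(values):.4f}")     # 1.5811

# The n - 1 versions cannot be computed from a single value
try:
    statistics.variance([12.5])
except statistics.StatisticsError as err:
    print("Single value:", err)
```

By hand: the mean is 12, the squared deviations sum to 12.5, so the sample variance is 12.5 / 4 = 3.125 and the population variance is 12.5 / 5 = 2.5.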
Khan Academy videos related to sample variance and standard deviation:
Freelance Data Analyst
+34 616 71 29 85
Spain: Ctra. 404, km 2, 29100 Coín, Malaga
Denmark: c/o Musvitvej 4, 3660 Stenløse
Learning statistics. Doing statistics. Freelance since 2005. Dane. Living in Spain. With my Spanish wife and two children.