+34 616 71 29 85 carsten@dataz4s.com

Sample variance and standard deviation

How much time do European adults spend reading and watching the news? We don’t know, nor will we ever know exactly. But we can make a “qualified guess”. We can take a sample, and through the sample variance and standard deviation we can estimate the true population variance and standard deviation.

 

 

The population is simply too large for us to get an answer from every citizen. But we can make a “qualified guess”: we take a sample, and through the sample statistics we estimate the true population parameters.

To estimate the population variance and the population standard deviation we calculate the sample variance and the sample standard deviation.

 

This chapter is based on a normally distributed population

This chapter covers the sample variance and the sample standard deviation for a normally distributed population. Variance and standard deviation can also be calculated for a whole population, in the rare cases where the true population parameters are available; see Population variance and standard deviation.

For populations that are not normally distributed, variances and standard deviations are calculated in different ways, but the core stays the same: it’s about the variability in the data.

 

Sample

When the population is too large or otherwise inaccessible, we cannot know the exact population parameters. But we can take a sample in order to arrive at a qualified answer to our question. A sample is a survey of a smaller part of the population.

 

 

[Figure: Sample variance and standard deviation]

 

 

Sample variance vs Population variance

We sample when we cannot measure. In other words, when the population is too large or otherwise inaccessible, we sample in an attempt to make a “qualified guess” about the population. The statistical framework takes into account that the sample is not the “sure thing”. It is only an estimate of the “sure thing”, an approximation of the true parameters.

Thus, there is a higher degree of uncertainty associated with the sample results. This uncertainty is reflected in the calculation, whether we use statistical software or simply Excel: for the same data, the sample variance comes out greater than the population variance. See the three small datasets in the Excel section below.
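As a rough illustration of this point, the sketch below uses Python's standard `statistics` module on a small made-up dataset (these are not the article's three datasets). `pvariance` divides by n, while `variance` divides by n-1, so the sample variance is the larger of the two.

```python
import statistics

# Hypothetical data, invented for illustration only.
data = [2.0, 4.0, 6.0, 8.0, 10.0]

# Treating the data as the whole population: divide by n.
population_variance = statistics.pvariance(data)

# Treating the data as a sample from a larger population: divide by n - 1.
sample_variance = statistics.variance(data)

print(population_variance)  # 8.0
print(sample_variance)      # 10.0
```

The sum of squared deviations is 40 in both cases; only the divisor changes (5 versus 4).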

 

Purpose of sample variance and standard deviation

As we saw in Population variance and standard deviation, the variance and the standard deviation illustrate the spread in data.

If we look only at the mean and the median to identify a central tendency, we might miss how different two datasets can be. We saw this in the example in Population variance and standard deviation, where the mean and the median are the same for two very different datasets:

[Figure: Compare to mean and median]

Mean and median are the same for the two datasets, but the spreads are highly different.
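A minimal sketch of the same idea, using two made-up datasets (not the ones from the original figure): both have a mean and a median of 50, yet their sample variances and standard deviations differ widely.

```python
import statistics

# Two hypothetical datasets, invented for illustration.
narrow = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, values in (("narrow", narrow), ("wide", wide)):
    print(
        f"{name}: mean={statistics.mean(values):.1f}, "
        f"median={statistics.median(values):.1f}, "
        f"sample variance={statistics.variance(values):.1f}, "
        f"sample sd={statistics.stdev(values):.2f}"
    )
# narrow: mean=50.0, median=50.0, sample variance=2.5, sample sd=1.58
# wide: mean=50.0, median=50.0, sample variance=1000.0, sample sd=31.62
```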

 

Formula of the sample variance

To find the variance in our sample, we might intuitively do the same as when we calculate the population variance, where we take the average squared distance from each data point to the population mean:

Population variance formula: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

When calculating the sample variance, we need to take into account that we don’t know the population mean. We therefore use the mean that we have from our sample: the sample mean (x̄).

Also, we don’t divide by the sample size (n) but by the sample size minus one (n-1). Dividing by n-1 gives an unbiased estimate of the population variance. I find Salman Khan’s explanation very good on this subject (see Learning resources below).
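One way to get a feel for why n-1 is used is a small simulation. The sketch below is only an illustration under assumed values: a hypothetical normal population with mean 60 and standard deviation 2 (so a true variance of 4), samples of size 5, and a fixed random seed. Averaged over many samples, dividing by n should land near 3.2, systematically below the true variance, while dividing by n-1 should land near 4.

```python
import random
import statistics

rng = random.Random(42)          # fixed seed so the run is repeatable
TRUE_MEAN, TRUE_SD = 60.0, 2.0   # hypothetical population; true variance = 4.0
n, trials = 5, 20_000

divided_by_n = []
divided_by_n_minus_1 = []
for _ in range(trials):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    x_bar = sum(sample) / n
    sum_of_squares = sum((x - x_bar) ** 2 for x in sample)
    divided_by_n.append(sum_of_squares / n)
    divided_by_n_minus_1.append(sum_of_squares / (n - 1))

mean_biased = statistics.fmean(divided_by_n)
mean_unbiased = statistics.fmean(divided_by_n_minus_1)
print(f"average when dividing by n:   {mean_biased:.2f}")
print(f"average when dividing by n-1: {mean_unbiased:.2f}")
```

The shortfall when dividing by n comes from measuring distances to the sample mean, which by construction sits closer to the sample's own data points than the unknown population mean does.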

That leaves us with the following formula of the sample variance: 

Sample variance formula: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
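The sample variance calculation can be followed step by step in a few lines of Python. The data are hypothetical: five made-up answers, in minutes, to the question of how much time someone spends on news per day.

```python
import math
import statistics

# Hypothetical sample: minutes spent on news per day (invented values).
minutes = [30, 45, 60, 20, 45]

n = len(minutes)
x_bar = sum(minutes) / n                                   # sample mean: 40.0
squared_deviations = [(x - x_bar) ** 2 for x in minutes]   # [100.0, 25.0, 400.0, 400.0, 25.0]
sample_variance = sum(squared_deviations) / (n - 1)        # 950 / 4

print(sample_variance)  # 237.5

# Cross-check against the standard library, which also divides by n - 1.
assert math.isclose(sample_variance, statistics.variance(minutes))
```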

Formula of the sample standard deviation

As we know from the calculation of the population standard deviation, the standard deviation is obtained by taking the square root of the variance. We therefore get the following formula for the sample standard deviation:

Sample standard deviation formula: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
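A short sketch of the square-root step, on a made-up sample of minutes spent on news per day. The variance is expressed in squared minutes; taking the square root brings the measure of spread back to minutes.

```python
import math
import statistics

# Hypothetical sample: minutes spent on news per day (invented values).
minutes = [30, 45, 60, 20, 45]

sample_variance = statistics.variance(minutes)  # in squared minutes
sample_sd = math.sqrt(sample_variance)          # back in minutes

print(f"{sample_sd:.2f}")  # 15.41
```

`statistics.stdev(minutes)` returns the same value in one call.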

 

MS Excel

In Excel we can use the following functions to calculate the sample variance and the sample standard deviation of a normally distributed population:

  • Sample variance =VAR.S
  • Sample standard deviation =STDEV.S

[Figure: Sample variance and standard deviation in Excel]

 

Learning resources

Khan Academy videos related to sample variance and standard deviation:

Carsten Grube


Freelance Data Analyst

