In statistics, density curves are used to visualize how data is distributed. A density curve illustrates the probability distribution of the data.
Area under the density curve = 1.0
The area under the density curve is the sum of all probabilities in the given probability distribution, so this area always equals 1.0. The density curve can represent the relative frequency of a dataset and can be used to visualize datasets of any size.
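The claim that the area equals 1.0 can be checked numerically. The following is a minimal sketch, assuming a normal density curve with the article's example mean of 70 and standard deviation of 10. It approximates the area with the trapezoid rule; the helper name `area_under_curve` is hypothetical, and Python's standard `statistics.NormalDist` provides the density.

```python
from statistics import NormalDist

# Assumption for illustration: a normal density curve with mean 70 and SD 10.
curve = NormalDist(mu=70, sigma=10)

def area_under_curve(dist, lo, hi, steps=10_000):
    """Approximate the area under dist's density between lo and hi (trapezoid rule)."""
    width = (hi - lo) / steps
    total = 0.5 * (dist.pdf(lo) + dist.pdf(hi))
    for i in range(1, steps):
        total += dist.pdf(lo + i * width)
    return total * width

# Integrate over the mean +/- 7 standard deviations, which covers essentially all of the area.
area = area_under_curve(curve, 0, 140)
print(f"Area under the density curve: {area:.6f}")
```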
The density curve visualized in relation to mean, standard deviation and proportion of the area:
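The figure itself is not reproduced here, but for a normal curve the proportion of the area within 1, 2 and 3 standard deviations of the mean can be computed directly. A hedged sketch, assuming a normal distribution with mean 70 and standard deviation 10, using the cumulative distribution function `cdf` from `statistics.NormalDist`:

```python
from statistics import NormalDist

curve = NormalDist(mu=70, sigma=10)  # assumption: normal curve, mean 70, SD 10

proportions = {}
for k in (1, 2, 3):
    lo = curve.mean - k * curve.stdev
    hi = curve.mean + k * curve.stdev
    # Area between lo and hi = cumulative probability up to hi minus up to lo.
    proportions[k] = curve.cdf(hi) - curve.cdf(lo)
    print(f"Within {k} SD ({lo:.0f} to {hi:.0f}): {proportions[k]:.1%}")
```

These are the familiar 68-95-99.7 proportions of the normal distribution; for a non-normal density curve the proportions would differ.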
From histogram to density curve
A density curve can be derived from a histogram:
20 datapoints: Say we have a dataset of 20 observations (datapoints) with a mean of 70 and a standard deviation of 10:
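The original dataset is not given, so the sketch below uses hypothetical, simulated data: 20 draws from a normal distribution with mean 70 and SD 10 (fixed seed for reproducibility). It counts how many datapoints fall into each 10-unit bin, which is what the bars of a frequency histogram show. `BIN_WIDTH` and `bin_start` are illustrative names, not from the article.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the simulated data is reproducible
# Hypothetical data: 20 simulated observations, mean 70, SD 10 (not the article's dataset).
data = [random.gauss(70, 10) for _ in range(20)]

BIN_WIDTH = 10

def bin_start(x, width=BIN_WIDTH):
    """Lower edge of the bin that x falls into, e.g. 67.3 -> 60."""
    return int(x // width) * width

counts = Counter(bin_start(x) for x in data)
for start in sorted(counts):
    # One '#' per datapoint gives a rough text histogram.
    print(f"{start:>3}-{start + BIN_WIDTH:<3} {'#' * counts[start]} ({counts[start]})")
```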
500 datapoints: If the dataset above had 500 datapoints instead of 20, it might make more sense to express the count in each interval as a percentage rather than as an absolute number. Therefore, we would make a histogram with relative frequencies.
The relative frequency expresses each frequency as a percentage of the sum of all frequencies (the frequency divided by the sum of all frequencies):
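The calculation can be sketched as follows, again on hypothetical simulated data (500 draws with mean 70 and SD 10). Each bin's frequency is divided by the sum of all frequencies, so the relative frequencies add up to 1, or 100%.

```python
import random
from collections import Counter

random.seed(2)  # fixed seed so the simulated data is reproducible
# Hypothetical data: 500 simulated observations with mean 70 and SD 10.
data = [random.gauss(70, 10) for _ in range(500)]

BIN_WIDTH = 10
counts = Counter(int(x // BIN_WIDTH) * BIN_WIDTH for x in data)
total = sum(counts.values())  # sum of all frequencies

# Relative frequency = frequency / sum of all frequencies
relative = {start: n / total for start, n in sorted(counts.items())}
for start, share in relative.items():
    print(f"{start:>3}-{start + BIN_WIDTH:<3} {share:6.1%}")
```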
2 million datapoints: Say that the dataset above, with mean 70 and standard deviation 10, has 2 million datapoints and that each datapoint can have any number of decimals.
For example, two different datapoints could be 52.157952649001 and 75.0265977465998. With 2 million different datapoints squeezed into the interval between the minimum and the maximum, the datapoints start to form a mass rather than individual points.
So we start seeing the data as a mass, or an area, rather than as values boxed into a limited number of bins. An upper line can therefore be a better way of bounding this area than bins with vertical limits.
We only need to bound the area from above, because it is already bounded on the left by the minimum value, on the right by the maximum value and from below by the x-axis. This upper line is our density curve:
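The step from bins to a curve can be sketched by simulation. This is a minimal sketch, assuming the data come from a normal distribution with mean 70 and SD 10; the simulated data and the names `N` and `BIN_WIDTH` are illustrative, and 200,000 points are used instead of 2 million to keep it quick. Each bar's height is its relative frequency divided by its bin width, so the total area of the bars is 1.0, just like the area under the density curve. With many points and narrow bins, the bar tops lie very close to the curve.

```python
import random
from collections import Counter
from statistics import NormalDist

random.seed(3)              # fixed seed so the simulation is reproducible
N = 200_000                 # fewer than 2 million points, to keep the sketch fast
BIN_WIDTH = 1.0             # narrow bins
curve = NormalDist(70, 10)  # assumed underlying distribution

data = [random.gauss(70, 10) for _ in range(N)]
counts = Counter(int(x // BIN_WIDTH) for x in data)  # key = bin index

# Density height of each bar = relative frequency / bin width, so the bar areas sum to 1.
max_gap = 0.0
for idx, n in counts.items():
    height = n / (N * BIN_WIDTH)
    midpoint = idx * BIN_WIDTH + BIN_WIDTH / 2
    max_gap = max(max_gap, abs(height - curve.pdf(midpoint)))

print(f"Largest gap between the bar tops and the curve: {max_gap:.4f}")
```

Increasing `N` while narrowing the bins makes the gap smaller still, which is the sense in which the density curve is what the histogram turns into.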
You might have noticed that the curve above should have been a little more open to the left, leaving a small gap between the curve and the x-axis, because a “fair proportion” of the data falls to the left of 55, where the curve reaches the x-axis.
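How large is that “fair proportion”? A hedged sketch, assuming the data are normally distributed with mean 70 and SD 10, reads it off the cumulative distribution function:

```python
from statistics import NormalDist

curve = NormalDist(mu=70, sigma=10)  # assumption: normal, mean 70, SD 10

# Share of the area (and so of the data) to the left of 55, i.e. 1.5 SD below the mean.
below_55 = curve.cdf(55)
print(f"Share of the data below 55: {below_55:.1%}")
```

Under this assumption roughly 6.7% of the data lies below 55, which is why the curve should not yet touch the x-axis there.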
Another way of understanding the density curve: As the graph is bounded on the right (max-value), on the left (min-value) and from below (x-axis), we only need an upper limit to enclose the area. This upper line is the density curve.
Reading density curves
What percentage of the data falls between 70 and 100? What percentage of the datapoints are above 90? Or below 60? And so on. Such questions can be estimated at a glance from the density curve, and statistical software, even Excel, can give the exact values:
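As a sketch of the exact calculation, assuming the data follow a normal distribution with mean 70 and SD 10, each question becomes a difference of cumulative probabilities:

```python
from statistics import NormalDist

curve = NormalDist(mu=70, sigma=10)  # assumption: the data is normally distributed

between_70_and_100 = curve.cdf(100) - curve.cdf(70)  # area between 70 and 100
above_90 = 1 - curve.cdf(90)                         # area to the right of 90
below_60 = curve.cdf(60)                             # area to the left of 60

print(f"Between 70 and 100: {between_70_and_100:.1%}")
print(f"More than 90:       {above_90:.1%}")
print(f"Less than 60:       {below_60:.1%}")
```

In Excel, the cumulative function `NORM.DIST(x, mean, standard_dev, TRUE)` plays the same role, e.g. `=NORM.DIST(100,70,10,TRUE)-NORM.DIST(70,70,10,TRUE)` for the first question.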
Density curves in Excel
As with other statistical graphics and procedures, you might find that Excel is not the right tool for the job; you would usually use R, perhaps with the ggplot2 package. However, Travis’ Blog, thydzik.com, has a step-by-step tutorial on how to make a histogram with a normal distribution overlay in Excel.
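Whatever the tool, the core of a normal overlay on a count histogram is a scaling step: the density has total area 1, while the bars have total area n × bin width, so the curve is multiplied by that factor. The sketch below is not the tutorial's method. It uses hypothetical simulated data and fits the normal curve from the sample mean and SD with `NormalDist.from_samples`; `BIN_WIDTH`, `scale` and `overlay` are illustrative names.

```python
import random
from statistics import NormalDist

random.seed(4)  # fixed seed so the simulated data is reproducible
# Hypothetical data: 500 simulated observations.
data = [random.gauss(70, 10) for _ in range(500)]

BIN_WIDTH = 5
fitted = NormalDist.from_samples(data)  # normal curve from the sample mean and SD

# Scale the density by n * bin width so that the area under the curve
# matches the total area of the count histogram's bars.
scale = len(data) * BIN_WIDTH
overlay = [(x, scale * fitted.pdf(x)) for x in range(40, 101)]  # x = 40..100

for x, y in overlay[::10]:
    print(f"x = {x:>3}: curve height {y:5.1f}")
```

The `overlay` points would then be drawn as a line on top of the histogram bars. In Excel the same heights could be computed with `NORM.DIST(x, mean, standard_dev, FALSE)` multiplied by n and the bin width.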
Some of my preferred learning materials on density curves:
- Khan Academy video: Density curves
Freelance Data Analyst
Learning statistics. Doing statistics. Freelance since 2005. Dane. Living in Spain. With my Spanish wife and two children.