The Student’s t-distribution, often shortened to ‘t-distribution’, is closely related to the standard normal distribution.
Often, the population we wish to analyze is too large to measure in full. During an election, for example, the population of all voters is simply too large to poll. So we estimate from a sample, and an estimate is, of course, more uncertain than a measurement of the whole population. The t-distribution accounts for this higher degree of uncertainty.
Standard normal distribution vs t-distribution
The Student’s t-distribution is a continuous probability distribution that is widely used in inferential statistics when population parameters are unknown and have to be estimated from a sample.
Say we are about to draw a sample from a normally distributed population with mean µ and standard deviation σ. We draw n independent observations and apply the test statistic formula, also known as the z-score formula:
z = (x̄ − µ) / (σ / √n)
But σ is unknown, so…
One problem often occurs with the z-score formula: σ is the standard deviation of the actual population, which we often don’t know.
Take the example of an election in a country. We will never get to know the exact votes of all voters in that country before the count. There are too many of them, and we can’t ask that many people and handle that amount of data in such a short period. The population is immeasurable. So we do the next best thing, which is to take a sample.
The sample standard deviation (s) has its own sampling distribution: it varies from sample to sample. So once s stands in for σ, the statistic no longer follows the standard normal distribution. It follows a Student’s t-distribution.
By using the sample standard deviation (s) calculated from the sample, we replace σ with s, and thereby we get the t-statistic formula:
t = (x̄ − µ) / (s / √n)
- x̄ = sample mean
- s = sample standard deviation: s = √( Σ(xᵢ − x̄)² / (n − 1) )
The t-statistic follows the t-distribution with n-1 degrees of freedom (df).
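To make the calculation concrete, here is a minimal sketch in R that computes the t-statistic and its degrees of freedom by hand and cross-checks them against the built-in t.test function. The sample values and the hypothesized mean mu0 are made up for illustration and are not from this article.

```r
# Hypothetical sample of n = 8 measurements (made-up values for illustration)
x   <- c(5.1, 4.9, 5.6, 5.8, 4.7, 5.3, 5.9, 5.2)
mu0 <- 5                        # hypothesized population mean (an assumption)

n     <- length(x)
x_bar <- mean(x)                # sample mean
s     <- sd(x)                  # sample standard deviation (divides by n - 1)

# t-statistic: distance of the sample mean from mu0 in standard-error units
t_stat <- (x_bar - mu0) / (s / sqrt(n))
df     <- n - 1                 # degrees of freedom

# Cross-check against R's one-sample t-test
res <- t.test(x, mu = mu0)
print(c(by_hand = t_stat, t_test = unname(res$statistic), df = df))
```

The hand-computed value and the statistic reported by t.test should agree, since t.test applies the same formula for a one-sample test.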
From normal distribution to t-distribution
The t-distribution is very similar to the standard normal distribution, but as it works with estimators and not with constant population parameters, it has a greater variance. The critical values in a t-table are consequently greater than the corresponding values in the z-table.
The t-distribution has n-1 degrees of freedom and can be compared visually with the standard normal distribution like this:
The t-distribution has heavier tails and a lower peak than the standard normal distribution, because estimating σ with s adds uncertainty to the statistic. It has more area in the tails as a result. The larger the sample size (n), and thereby the higher the degrees of freedom (df), the more closely the t-distribution approaches the standard normal distribution:
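The heavier tails and lower peak can also be checked numerically. The sketch below compares the upper-tail area beyond 2 and the density at 0 for a few degrees of freedom; the cut-off of 2 and the chosen df values are arbitrary picks for illustration.

```r
# Degrees of freedom to compare (arbitrary choices for illustration)
df_values <- c(2, 5, 10, 30, 100)

# Area in the upper tail beyond 2: t-distribution vs standard normal
tail_t <- pt(2, df = df_values, lower.tail = FALSE)
tail_z <- pnorm(2, lower.tail = FALSE)

# Height of the density at the centre (the "peak")
peak_t <- dt(0, df = df_values)
peak_z <- dnorm(0)

print(data.frame(df = df_values,
                 tail_t = tail_t, tail_z = tail_z,
                 peak_t = peak_t, peak_z = peak_z))
```

For every df in the table, the t tail area should sit above the normal one and the t peak below it, with both gaps shrinking as df grows.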
Example of difference between t and normal distribution
Let’s see some specific results that illustrate the different outcomes between the normal distribution and the t-distribution:
The values in the t-table decrease as n increases and reach the z-value as n approaches ∞ (infinity). This is an example for a two-tailed test with a significance level (α) = 0.05:
As shown, the z0.025 value is 1.96, and thus we can deduce that the corresponding t0.025 value must be greater than 1.96 for any finite sample size. Visualized with curves, it could look like this:
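A comparison of this kind can be generated in R with the qt and qnorm functions. The sketch below lists two-tailed critical values at α = 0.05 for a handful of sample sizes; the sample sizes are arbitrary picks and do not necessarily match the original table.

```r
# Sample sizes to tabulate (arbitrary choices for illustration)
n  <- c(2, 5, 10, 21, 31, 101, 1001)
df <- n - 1

# Two-tailed test at alpha = 0.05 puts 0.025 in each tail,
# so the upper critical value is the 0.975 quantile
t_crit <- qt(0.975, df = df)
z_crit <- qnorm(0.975)

print(data.frame(n = n, df = df,
                 t_crit = round(t_crit, 3),
                 z_crit = round(z_crit, 3)))

# With infinite degrees of freedom the t quantile equals the z quantile
t_inf <- qt(0.975, df = Inf)
print(t_inf)
```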
Normal distribution when n > 30?
The table shows that the greater the sample size, the more closely the t-distribution approximates the standard normal distribution. Or in other words, the greater the sample size, the lower the margin of error.
Statistical textbooks typically state that for n > 30, the normal distribution can be applied instead of the t-distribution. But still, at n = 31 (df = 30), the t-value is 2.042 compared to the corresponding z-value of 1.96. This gap will produce somewhat different results depending on whether we use the t-distribution or the normal distribution.
Therefore, statisticians are often heard to say that the t-distribution should be applied whenever the sample standard deviation is used. They recommend that we always apply the t-distribution, regardless of the sample size, whenever the population standard deviation (σ) is unknown.
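To see what the gap between 2.042 and 1.96 means in practice, the sketch below compares the margin of error of a 95% confidence interval at n = 31 using the t-value and the z-value. The sample standard deviation of 10 is a made-up number for illustration.

```r
n <- 31
s <- 10                          # hypothetical sample standard deviation
se <- s / sqrt(n)                # standard error of the mean

# Margin of error for a 95% confidence interval
me_t <- qt(0.975, df = n - 1) * se   # using the t-distribution
me_z <- qnorm(0.975) * se            # using the normal distribution

ratio <- me_t / me_z
print(c(me_t = me_t, me_z = me_z, ratio = ratio))
```

The ratio does not depend on s. At n = 31 the t-based interval is about 4 percent wider (2.042 / 1.96 ≈ 1.04), which is the price paid for not knowing σ.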
t-distribution with MS Excel
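The T.INV function (or T.INV.2T for a two-tailed test) can be used to find critical t-values, and the T.DIST function (or T.DIST.RT and T.DIST.2T for right-tailed and two-tailed probabilities) to calculate the p-value.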
t-distribution with R programming
In R, we can use the pt and the qt functions to find probabilities and percentiles for the Student’s t-distribution:
- pt = the distribution function
- qt = the quantile function
Let’s run a few examples that follow a t-distribution with µ=0, s=1, degrees of freedom (df)=20 and a t-statistic of 2.5.
We wish to determine whether this is a statistically significant finding. For this we will find the p-value and also illustrate how to find the critical value with R. To solve for this, we might usually prefer using the t.test function, but for the sake of the exercise…
pt-function for p-value in one-tailed test
# t-stat=2.5, df=20
# one-sided p-value
# P(t > 2.5)
pt(q = 2.5, df = 20, lower.tail = FALSE)
## [1] 0.01061677
The p-value is 0.0106, so at a 5% significance level we will reject H0.
pt-function for p-value in two-tailed test
Say we wish to explore whether our sample statistic of 2.5 is significantly different from H0, which claims that µ=0. As it says ‘different from’, we will be looking both below and above the mean, that is, in both tails, and it is therefore a two-tailed test:
# p-value for two-tailed test
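pt(q = 2.5, df = 20, lower.tail = FALSE) + pt(q = -2.5, df = 20, lower.tail = TRUE)
## [1] 0.02123355
# The t-distribution is symmetric, so doubling one tail gives the same result
pt(q = 2.5, df = 20, lower.tail = FALSE) * 2
## [1] 0.02123355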
Critical value with the qt function
# Finding the critical t-value at a significance level (α) 0.05
# α=0.05 => 0.025 in each tail
qt(p = 0.025, df = 20, lower.tail = TRUE)
## [1] -2.085963
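The lower critical value has a mirror image in the upper tail, and the critical-value rule and the p-value rule should lead to the same decision. The sketch below checks this for the example values used in this section (t-statistic 2.5, df = 20, α = 0.05); the variable names are my own.

```r
t_stat <- 2.5
df     <- 20
alpha  <- 0.05

# Upper critical value: mirror image of qt(alpha / 2, df)
t_crit <- qt(1 - alpha / 2, df = df)

# Two-tailed p-value
p_two <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# Both decision rules: reject H0 if |t| exceeds the critical value,
# or equivalently if the p-value is below alpha
reject_by_crit <- abs(t_stat) > t_crit
reject_by_p    <- p_two < alpha

print(c(t_crit = t_crit, p_two = p_two))
print(c(reject_by_crit = reject_by_crit, reject_by_p = reject_by_p))
```

Since 2.5 lies beyond the critical value of about 2.086, both rules reject H0 at the 5% level, in line with the two-tailed p-value of about 0.0212 found earlier.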