+34 616 71 29 85 carsten@dataz4s.com

Student’s t-distribution

The Student’s t-distribution, often shortened to ‘t-distribution’, is closely related to the standard normal distribution.

Often, the population that we wish to analyze is too large to measure in full. During an election period, for example, we may wish to analyze the votes, but the population of all voters is simply too large. So we estimate from a sample, and an estimate is, of course, more uncertain than a measurement of the actual population. The t-distribution accounts for this higher degree of uncertainty.

 

 

Standard normal distribution vs t-distribution

The Student’s t-distribution is widely applied in inferential statistics, where population parameters are unknown and must be estimated from a sample. It is a continuous probability distribution.

Say we are about to draw a sample from a normally distributed population with mean μ and standard deviation σ. We draw n independent observations and apply the test statistic formula, also known as the z-score formula:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

 

But σ is unknown, so…

One problem often arises with the z-score formula: σ is the standard deviation of the actual population, which we often don’t know.

Take the example of an election in a country. We will never get to know the exact votes of all voters in the given country. There are too many of them, and we can’t ask that many people and handle that amount of data in so short a period. The population is, in practice, immeasurable. So we do the next best thing, which is to take a sample.

The estimator for the population standard deviation (σ) is the sample standard deviation (s).

As s has a sampling distribution, it will vary from sample to sample. So, once σ is replaced by s, the statistic no longer follows the standard normal distribution. It follows a Student’s t-distribution instead.

By using the sample standard deviation (s) calculated from the sample, we replace σ with s, and thereby we get the t-statistic formula:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

 

Where:

  • x̄ = sample mean
  • s = sample standard deviation

The t-statistic follows the t-distribution with n-1 degrees of freedom (df).
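As a minimal sketch of the t-statistic formula, the following R code computes the statistic by hand and compares it with the built-in t.test function. The sample values and the hypothesized mean mu0 are made up for illustration; they do not come from the article.

```r
# Hypothetical sample of n = 8 observations (illustrative values only)
x   <- c(5.1, 4.9, 5.6, 5.8, 5.2, 4.7, 5.5, 5.3)
mu0 <- 5            # hypothesized population mean (assumption)

n     <- length(x)
x_bar <- mean(x)    # sample mean
s     <- sd(x)      # sample standard deviation (divides by n - 1)

# t-statistic: (x_bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom
t_stat <- (x_bar - mu0) / (s / sqrt(n))
df     <- n - 1

# The built-in one-sample t-test should report the same statistic and df
res <- t.test(x, mu = mu0)
c(by_hand = t_stat, t.test = unname(res$statistic), df = df)
```

The two statistics should agree, since t.test applies the same formula to a one-sample problem.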

 

From normal distribution to t-distribution

The t-distribution is very similar to the standard normal distribution, but as it works with estimators and not with known population parameters, it has a greater variance. The critical values in the t-table are consequently greater than the corresponding values in the z-table.

The t-distribution with n-1 degrees of freedom compares visually with the standard normal distribution like this:

[Figure: t-distribution compared with the standard normal distribution]

 

The t-distribution has heavier tails and a lower peak than the standard normal distribution, because estimating σ with s adds variability to the statistic. It has more area in the tails as a result. The larger the sample size (n), and thereby the higher the degrees of freedom (df), the more closely the t-distribution approximates the standard normal distribution:

[Figure: t-distributions with increasing degrees of freedom approaching the standard normal distribution]
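The heavier tails can be checked with a small simulation. This is only a sketch under assumed settings (sample size 5, 20,000 simulated samples from N(0, 1), a fixed seed); none of these choices come from the article. It compares how often the statistic lands beyond ±1.96 when the true σ is used and when s is used instead.

```r
set.seed(1)                      # fixed seed so the run is repeatable
n    <- 5                        # small sample size (assumption for the demo)
reps <- 20000                    # number of simulated samples

# Each column is one sample of size n from N(0, 1)
samples <- matrix(rnorm(n * reps), nrow = n)
x_bar   <- colMeans(samples)
s       <- apply(samples, 2, sd)

z_stat <- x_bar / (1 / sqrt(n))  # uses the true sigma = 1
t_stat <- x_bar / (s / sqrt(n))  # uses the estimate s

# Share of statistics beyond +/-1.96
tail_z <- mean(abs(z_stat) > 1.96)
tail_t <- mean(abs(t_stat) > 1.96)
c(z = tail_z, t = tail_t, theory_t = 2 * pt(-1.96, df = n - 1))
```

With σ known, roughly 5% of the statistics should fall beyond ±1.96. With s, the share should be close to the theoretical tail area of the t-distribution with 4 degrees of freedom, which is about 12%.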

 

Example of difference between t and normal distribution

Let’s see some specific results that illustrate the different outcomes between the normal distribution and the t-distribution:

The values in the t-table decrease as n increases and reach the z-value as n approaches ∞ (infinity). This is an example for a two-tailed test with a significance level (α) of 0.05:

[Table: critical t-values by sample size compared with the z-value, two-tailed, α = 0.05]
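A comparison of this kind can be generated in R with the qt and qnorm functions. The sample sizes below are chosen for illustration and are not necessarily the ones used in the article.

```r
alpha <- 0.05
n     <- c(2, 6, 11, 21, 31, 101, 1001)  # illustrative sample sizes (assumption)
df    <- n - 1

# Upper critical value t_{alpha/2, df} for a two-tailed test
t_crit <- qt(1 - alpha / 2, df = df)
z_crit <- qnorm(1 - alpha / 2)           # about 1.96

data.frame(n = n, df = df,
           t_crit = round(t_crit, 3),
           z_crit = round(z_crit, 3))
```

Every critical t-value is larger than the z-value, and the gap shrinks as the degrees of freedom grow.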

 

As shown, the z0.025 value is 1.96, and thus we can deduce that the corresponding t0.025 value must be greater than 1.96 for any finite sample size. Visualized with curves, it could look like this:

[Figure: critical values t0.025 and z0.025 on the two curves]

 

Normal distribution when n > 30?

The table shows that the greater the sample size, the more the t-distribution approximates the standard normal distribution. In other words, the greater the sample size, the lower the margin of error.

Statistical textbooks typically state that for n > 30, the normal distribution can be applied instead of the t-distribution. But still, if we look at n = 31, the t-value is 2.042 compared to the corresponding z-value of 1.96. This gap will produce somewhat different results from the t-distribution and the normal distribution.

Therefore, statisticians often say that the t-distribution should be applied whenever the sample standard deviation is used. They recommend that we always apply the t-distribution, regardless of the sample size, whenever the population standard deviation (σ) is unknown.
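To put a number on the difference at n = 31, here is a small sketch comparing the margin of error of a 95% confidence interval for the mean under the two distributions. The sample standard deviation of 10 is a hypothetical value picked for the example.

```r
n     <- 31
s     <- 10      # hypothetical sample standard deviation (assumption)
alpha <- 0.05

se   <- s / sqrt(n)                          # standard error of the mean
me_t <- qt(1 - alpha / 2, df = n - 1) * se   # margin of error using t
me_z <- qnorm(1 - alpha / 2) * se            # margin of error using z

c(t = me_t, z = me_z, ratio = me_t / me_z)
```

The ratio does not depend on s: the t-based interval is about 4% wider than the z-based one at this sample size.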

 

t-distribution with MS Excel

The T.INV function can be used to find critical t-values in tests and the T.DIST to calculate the p-value.

[Figure: Student's t-distribution in Excel]

 

t-distribution with R programming

In R, we can use the pt and the qt functions to find probabilities and percentiles for the Student’s t-distribution:

  • pt = the cumulative distribution function
  • qt = the quantile function

The pt and the qt functions can be used to find p-values and critical values for statistics that follow a Student’s t-distribution.
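The two functions are inverses of each other, which a short sketch can illustrate. The values 2.5 and df = 20 are simply the ones used in the examples of this article.

```r
df <- 20

# pt: probability to the left of a t-value, P(T <= 2.5)
p <- pt(2.5, df = df)

# qt: the t-value that has probability p to its left
q <- qt(p, df = df)

c(p = p, q = q)   # q recovers the original 2.5
```

By default both functions work with the lower tail; setting lower.tail = FALSE switches them to the upper tail.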

Let’s run a few examples with a statistic that follows a t-distribution with µ=0, s=1 and degrees of freedom (df)=20, and an observed t-statistic of 2.5.

We wish to determine whether this is a statistically significant finding. For this we will find the p-value and also illustrate how to find the critical value with R. To solve for this, we might usually prefer using the t.test function, but for the sake of the exercise…

 

pt-function for p-value in one-tailed test

 
# t-stat=2.5, df=20
# one-sided p-value
# P(t > 2.5)
pt(q=2.5, df=20, lower.tail = FALSE)
## [1] 0.01061677
 
The p-value is 0.0106, so at a 5% significance level we will reject H0.

 

pt-function for p-value in two-tailed test 

Say we wish to explore whether our test statistic of 2.5 is significantly different from H0, which claims that µ=0. As it says ‘different from’, we look both below and above the mean, in both tails, and it is therefore a two-tailed test:

# p-value for two-tailed test
pt(q=2.5, df=20, lower.tail = FALSE) + pt(q=-2.5, df=20, lower.tail = TRUE)
## [1] 0.02123355

# equivalent by symmetry: double the upper-tail probability
pt(q=2.5, df=20, lower.tail = FALSE)*2
## [1] 0.02123355

 

The two-tailed test also leads to rejection of H0 at a 5% significance level, as our p-value (0.02) < α (0.05).

 

Critical value with the qt function

 
# Finding the critical t-value at a significance level (α) of 0.05
# α=0.05 => 0.025 in each tail
qt(p=0.025, df = 20, lower.tail = TRUE)
## [1] -2.085963
 

Our critical values are -2.086 and 2.086, which again shows that our finding of 2.5 is significant at a 0.05 significance level, leading to a rejection of H0.
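The three steps above can be gathered in one small helper. This is only a sketch: t_summary is a hypothetical function written for this example, not part of base R, and it assumes a test of H0: µ=0 with a symmetric two-tailed rejection region.

```r
# Hypothetical helper (not a base R function): summarises a t-test decision
t_summary <- function(t_stat, df, alpha = 0.05) {
  # one-tailed p-value in the direction of the observed statistic
  p_one  <- pt(abs(t_stat), df = df, lower.tail = FALSE)
  p_two  <- 2 * p_one                       # two-tailed p-value
  t_crit <- qt(1 - alpha / 2, df = df)      # two-tailed critical value
  list(
    p_one_tailed = p_one,
    p_two_tailed = p_two,
    critical     = c(-t_crit, t_crit),
    reject_H0    = abs(t_stat) > t_crit     # two-tailed decision
  )
}

res <- t_summary(t_stat = 2.5, df = 20)
str(res)
```

For the example of this article, the helper returns the same p-values and critical values as the separate pt and qt calls, together with the two-tailed decision.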

  

Learning statistics

Carsten Grube

Freelance Data Analyst
