# Student’s t-distribution

The Student’s t-distribution, also shortened to the **t-distribution**, is **strongly related to the standard normal distribution**.

Often, the population we wish to analyze is too large to measure directly, as with the votes of an entire electorate during an election. So we estimate from a sample, and estimating is, of course, **more uncertain than measuring the actual population. The t-distribution accounts for this higher degree of uncertainty.**


## Standard normal distribution vs t-distribution

The Student’s t-distribution is **widely applied in inferential statistics**, as it is used when estimating unknown population parameters. It is a continuous probability distribution.

**Say** we draw a sample of *n* independent observations from a normally distributed population and apply the test statistic formula, also known as the **z-score formula:**

z = (x̄ − µ) / (σ / √n)

## But σ is unknown, so…

**One problem often occurs with the z-score formula**: sigma (σ) is the standard deviation of the actual population, which we usually don’t know.

As in the **example of the election in a country**: we will never know the exact votes of all voters in the given country. There are too many people to ask, and too much data to handle, in that short a period. **The population is immeasurable**. **So, we do the next best thing, which is to take a sample**.

The estimator for the population standard deviation (σ) is the sample standard deviation (s).

As *s* has a sampling distribution of its own, it varies from sample to sample. The resulting statistic therefore no longer follows the standard normal distribution; it follows a Student’s t-distribution.
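This sampling variability is easy to see by simulation. A minimal sketch (hypothetical data, with the true σ set to 1):

```r
# s varies from sample to sample: draw 1000 samples of size 10
# from N(0, 1) and compute the sample standard deviation of each
set.seed(42)  # for reproducibility
s_values <- replicate(1000, sd(rnorm(10, mean = 0, sd = 1)))
range(s_values)  # the sample SDs scatter widely around the true sigma = 1
mean(s_values)   # close to, but not exactly, 1
```

Each of the 1000 samples yields a different *s*, which is exactly the extra uncertainty the t-distribution has to absorb.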

By applying the sample standard deviation (s) calculated from the sample, we replace *σ* with *s*, and thereby we get the t-statistic formula:

t = (x̄ − µ) / (s / √n)

Where:

- x̄ = sample mean
- µ = hypothesized population mean
- s = sample standard deviation
- n = sample size

The t-statistic follows the t-distribution with *n* − 1 **degrees of freedom (df)**.
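As a sketch of this formula in R (with simulated data and a hypothetical H0 of µ = 5), the hand-computed t-statistic matches the one reported by R’s built-in *t.test*:

```r
set.seed(1)
x   <- rnorm(15, mean = 5, sd = 2)  # a sample of n = 15 observations
mu0 <- 5                            # hypothesized population mean (H0)
n   <- length(x)

# t = (xbar - mu0) / (s / sqrt(n))
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
t_stat

# cross-check against the built-in t.test (df = n - 1 = 14)
unname(t.test(x, mu = mu0)$statistic)
```

Both lines print the same value, confirming that *t.test* applies exactly this formula.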

## From normal distribution to t-distribution

The t-distribution is very similar to the standard normal distribution, but as it works with estimates rather than known population parameters, it has a **greater variance**. The critical values in the t-table are consequently greater than the corresponding values in the z-table.

The t-distribution works with n − 1 degrees of freedom and can be compared with the standard normal distribution visually.

The t-distribution has heavier tails and a lower peak than the standard normal distribution, reflecting the extra variability introduced by estimating σ with *s*; it therefore has more area in the tails. The higher the sample size (n), and thereby the degrees of freedom (df), the more closely the t-distribution approximates the standard normal distribution.
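The heavier tails can be checked directly in R. With df = 5 (a value chosen purely for illustration), the area beyond 2 is noticeably larger under the t-distribution than under the standard normal:

```r
pt(2, df = 5, lower.tail = FALSE)    # tail area under t with 5 df, approx. 0.051
pnorm(2, lower.tail = FALSE)         # tail area under N(0, 1), approx. 0.023
pt(2, df = 100, lower.tail = FALSE)  # with more df, close to the normal value
```

More degrees of freedom shrink the tail area toward the normal one, matching the convergence described above.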

## Example of difference between t and normal distribution

Let’s see some specific results that illustrate how outcomes differ between the normal distribution and the t-distribution:

The critical values in the t-table decrease as *n* increases and reach the z-value at *n* = ∞ (infinity). This is an example for a two-tailed test with a significance level (α) = 0.05:

As shown, the *Z*_{0.025} value is 1.96, and thus we can deduce that the corresponding *t*_{0.025} values must be greater than 1.96. **Visualized** with curves, it could look like this:
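This convergence is easy to verify with R’s *qt* function (the df values below are just illustrative):

```r
# two-tailed critical values at alpha = 0.05 for increasing df
qt(0.975, df = c(10, 30, 100, 1000))  # decreases toward the z-value
qnorm(0.975)                          # z critical value: 1.959964
```

Every t critical value stays above 1.96, but the gap shrinks as df grows.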

## Normal distribution when n > 30?

The table shows that the greater the sample size, the more the t-distribution approximates the standard normal distribution. Or in other words: the greater the sample size, the lower the margin of error.

It is typically seen in statistical textbooks that for n > 30, the normal distribution is applied instead of the t-distribution. But if we look at n = 31 (df = 30), the t-value is 2.042 compared to the corresponding z-value of 1.96. This difference will produce somewhat different results for the t- and the normal distribution.

Therefore, statisticians often say that the t-distribution should be applied whenever the sample standard deviation is used. They recommend that we always apply the t-distribution, regardless of the sample size, whenever the population standard deviation (σ) is unknown.

## t-distribution with MS Excel

The **T.INV** function can be used to find critical t-values in tests, and **T.DIST** to calculate p-values.
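For example, for the df = 20, t = 2.5 scenario used below, the formulas could look like this (this sketch assumes the right-tailed and two-tailed variants **T.DIST.RT** and **T.DIST.2T**, available in Excel 2010 and later):

```
=T.INV(0.025, 20)       lower critical t-value: -2.085963
=T.DIST.RT(2.5, 20)     one-tailed p-value, P(t > 2.5): 0.0106
=T.DIST.2T(2.5, 20)     two-tailed p-value: 0.0212
```

These match the values computed with R below.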

## t-distribution with R programming

In R, we can use the *pt* and *qt* functions to find **probabilities** and **percentiles** for the Student’s t-distribution:

- pt = the distribution function
- qt = the quantile function

The *pt* and *qt* functions can be used to find **p-values** and **critical values** for statistics that follow a Student’s t-distribution.

Let’s run a few **examples** for a t-distribution **with µ = 0, s = 1, degrees of freedom (df) = 20 and a t-statistic of 2.5.**

We wish to determine whether this is a statistically significant finding. For this we will find the p-value and also illustrate how to find the critical value with R. In practice we might prefer the *t.test* function, but for the sake of the exercise we use *pt* and *qt* directly.

### pt-function for p-value in one-tailed test

```r
# t-stat = 2.5, df = 20
# one-sided p-value
# P(t > 2.5)
pt(q = 2.5, df = 20, lower.tail = FALSE)
## [1] 0.01061677
```

The p-value is 0.0106, so at a 5% significance level we reject H0.

### pt-function for p-value in two-tailed test

Say we wish to explore if our sample statistics of 2.5 is significantly different from H0 claiming that µ=0. As it says ‘different from’ we will be looking below and above the mean – in both tails and it is therefore a two-tailed test:

```r
# p-value for two-tailed test
pt(q = 2.5, df = 20, lower.tail = FALSE) + pt(q = -2.5, df = 20, lower.tail = TRUE)
## [1] 0.02123355

# or equivalently, by symmetry of the t-distribution:
pt(q = 2.5, df = 20, lower.tail = FALSE) * 2
## [1] 0.02123355
```

The two-tailed test also leads to rejection of H0 at a 5% significance level, as our p-value (0.02) < α (0.05).

### Critical value with the qt function

```r
# Finding the critical t-value at a significance level (alpha) of 0.05
# alpha = 0.05 => 0.025 in each tail
qt(p = 0.025, df = 20, lower.tail = TRUE)
## [1] -2.085963
```

Our critical values are -2.086 and 2.086, which again shows that our statistic of 2.5 is significant at the 0.05 significance level, leading to a rejection of H0.


## Learning statistics

- Penn State Eberly College of Science (video): Student’s t Distribution
- Jbstatistics: Introduction to the t Distribution (non-technical)
- Jbstatistics: An Introduction to the t Distribution (Includes some mathematical details)

#### Carsten Grube

Freelance Data Analyst
