Hypothesis testing, also known as significance testing, is often applied to hold a new finding against an existing assumption. Typically, the question is whether a new sample mean can be considered significantly different from the assumed mean.
The null hypothesis is the conservative statement: “There are no changes!” The alternative hypothesis states the opposite: “The new finding does give evidence that there is a change in the assumed mean”. Hypothesis testing is about rejecting or failing to reject this null hypothesis. Are the new findings significant or not?
Key points on hypothesis testing
- Hypothesis testing can be applied to test some value, often a new finding, against the existing assumed value.
- The test leads to rejecting or failing to reject the null hypothesis.
- Hypothesis testing answers the questions: Is the new finding significant? Does it give reason to believe that the true mean is different from the originally assumed mean?
- There is an ongoing debate about whether the p-value should be reported rather than only whether the finding falls inside or outside the rejection region set by the significance level (α).
Say that a statistician studies the weight of a certain African parrot that lives freely in nature. The assumed mean weight is 420 grams. But she now draws a new sample of 15 parrots that comes out with a sample mean (x̄) of 412.05 grams.
She wonders if this sample mean of 412.05 grams gives reason to state that the true mean is lower than the assumed 420 grams. Hence, she conducts a hypothesis test to find an answer to a question that can be formulated in different ways:
- Is the 412.05 a “significant” finding?
- Does it provide evidence enough to support the alternative hypothesis that the mean is lower than 420?
- Does the new finding fall beyond the critical value given by our significance level (α), and can we thereby reject the null hypothesis?
Visualized in a normal bell curve:
Stating the hypotheses
This is how the null hypothesis and the alternative hypothesis are stated and expressed:
The null hypothesis
The null hypothesis represents the “conservative” posture: “Things are as they have always been”. In our parrot example: The mean is still 420, despite the new findings.
The null hypothesis must always contain an equality. It can be stated as:
- equal to (=)
- less than or equal to (≤)
- greater than or equal to (≥)
The alternative hypothesis (HA)
The hypothesis test is expressed with the null hypothesis and the alternative hypothesis at a given significance level (α). For the parrot example, that is H0: μ ≥ 420 against HA: μ < 420.
The two hypotheses are mutually exclusive, and one of them will always be true.
The alternative hypothesis will always state the alternative to the null hypothesis and thus always state the opposite.
Considerations when testing
When conducting a hypothesis test, we will consider:
- What probability distribution does the data follow?
- Are we conducting a hypothesis test for a mean or a proportion?
- Should the test be one-tailed or two-tailed?
In our parrot example the population variance (σ²) is unknown, as we cannot go out and measure each and every parrot that exists. We don’t know the true population parameters, which is why the test below relies on the sample standard deviation (s) and the t-distribution.
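The one-tailed versus two-tailed consideration can be sketched in a few lines of Python. This is only an illustration: the function name `reject_null` and its `tail` argument are made up for this example, and the critical values are assumed to have been looked up in a t-table beforehand (1.761 is the one-tailed value used later in this article for df = 14 and α = 0.05, and 2.145 is the standard two-tailed table value for the same df and α).

```python
def reject_null(t_score, t_crit, tail):
    """Return True if the t-score falls in the rejection region.

    t_crit is the positive critical value looked up in a t-table;
    tail is "left", "right" or "two" (hypothetical labels for this sketch).
    """
    if tail == "left":    # HA: the mean is lower than the assumed mean
        return t_score < -t_crit
    if tail == "right":   # HA: the mean is higher than the assumed mean
        return t_score > t_crit
    if tail == "two":     # HA: the mean is different from the assumed mean
        return abs(t_score) > t_crit
    raise ValueError("tail must be 'left', 'right' or 'two'")


# Parrot example: t-score of -2.523 with df = 14 at alpha = 0.05.
print(reject_null(-2.523, 1.761, "left"))   # True
print(reject_null(-2.523, 1.761, "right"))  # False
print(reject_null(-2.523, 2.145, "two"))    # True
```

The parrot example is a left-tailed test, because the alternative hypothesis only claims that the mean is lower than 420.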
Procedure for hypothesis testing
The statistician will typically follow these steps when running the hypothesis test:
- State null hypothesis (H0)
- Specify alternative hypothesis (HA)
- Set significance level (α)
- Calculate the test statistic
- Draw conclusion (H0 rejected or not)
Step 1+2: State H0 & HA
The null hypothesis represents the “conservative” posture, claiming that our existing value is still correct and that nothing has changed. The alternative hypothesis represents the change, claiming that our new finding indicates that the population mean (μ) has changed.
Our parrot example would be denoted with a null hypothesis saying that the mean weight is unchanged, that is, 420 or greater. The alternative hypothesis would say: “No, we have a finding significant enough to show that there is a change in the mean weight; it is less than 420”. In symbols, H0: μ ≥ 420 and HA: μ < 420.
Step 3: Significance level (α)
The significance level (α) is defined before the test is run. It is usually set to 0.01, 0.05 or 0.10 and determines the critical value. If the test statistic falls beyond this limit, the null hypothesis is rejected.
Say our parrot analyst sets the significance level (α) to 0.05. This gives a 5% risk that she commits the error of rejecting a true null hypothesis (a so-called ‘Type I Error’). More on Type I and II Errors in Power of test, Type I & II Errors.
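The 5% risk can be made tangible with a small simulation. The sketch below is an illustration under stated assumptions, not part of the original analysis: it pretends that the population really is normal with mean 420 and standard deviation 12.21 (the sample value is borrowed as a stand-in), draws many samples of 15, and counts how often the left-tailed test rejects at the table value -1.761. The function name, the number of trials and the seed are arbitrary choices.

```python
import math
import random


def type_i_error_rate(mu0=420.0, sigma=12.21, n=15, t_crit=-1.761,
                      trials=20000, seed=1):
    """Share of samples that reject H0 although H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population whose mean really is mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # Sample standard deviation with n - 1 in the denominator.
        s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        t_score = (x_bar - mu0) / (s / math.sqrt(n))
        if t_score < t_crit:  # left-tailed rejection rule
            rejections += 1
    return rejections / trials


print(type_i_error_rate())  # should land close to 0.05
```

Because the null hypothesis is true in every simulated sample, each rejection is a Type I Error, and the share of rejections should settle near the chosen α.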
Step 4: Calculate the test statistic
Continuing our parrot example, let’s add a few assumptions:
- The true population is too large to measure and can therefore only be estimated through sample statistics
- Data is normally distributed
- Sample size (n) < 30
The sample mean is 412.05 with a sample standard deviation (s) of 12.21. To calculate the test score, the statistician will hold the new finding up against the “old” assumption.
The 412.05 against the 420. The difference between these two values is seen in relation to the standard error, that is, the sample standard deviation of 12.21 grams divided by the square root of the sample size of 15: t = (x̄ - μ0) / (s / √n) = (412.05 - 420) / (12.21 / √15), where μ0 is the assumed mean.
Our t-score is -2.523. Looking up df = 14 at α = 0.05 in a t-table, we find a critical t-score of -1.761. Visualizing our scenario:
For the calculation of the critical value, please see ‘Finding the critical value’ (below).
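The Step 4 calculation can be sketched in Python as follows. The variable names are chosen for this example, and the critical t-score is simply the table value quoted above rather than something the code derives. With the rounded inputs given in the text the result comes out at about -2.52, marginally different from the quoted -2.523, possibly because the published inputs are rounded.

```python
import math

mu0 = 420.0      # assumed mean under H0 (grams)
x_bar = 412.05   # sample mean (grams)
s = 12.21        # sample standard deviation (grams)
n = 15           # sample size
t_crit = -1.761  # table value for df = 14, alpha = 0.05, left tail

df = n - 1                           # degrees of freedom
standard_error = s / math.sqrt(n)    # spread of the sample mean
t_score = (x_bar - mu0) / standard_error

print(round(standard_error, 2))  # 3.15
print(round(t_score, 2))         # -2.52
print(t_score < t_crit)          # True: beyond the critical t-score
```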
Step 5: Conclusion (H0 rejected or not)
Our -2.523 is beyond the threshold of -1.761, so we reject the null hypothesis, as we have sufficient evidence to support our alternative hypothesis stating that the mean is less than 420 based on our new findings. Therefore, the new findings are determined as ‘significant’.
The p-value is 0.012, which means that there is only a 1.2% chance of finding a result at least as extreme as the one we found (412.05), assuming that the null hypothesis is true. We had initially set the significance level to 0.05, meaning that we would reject H0 for any p-value lower than 0.05.
Had the findings returned a p-value greater than 0.05, our analyst would have failed to reject the null hypothesis and we could not have concluded that there was a change in the assumed mean.
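For readers who want to see where a p-value like 0.012 comes from without a statistics package, here is a minimal sketch that integrates the density of the t-distribution numerically. The helper names `t_pdf` and `left_tail_p_value` and the choice of Simpson's rule are assumptions of this example; in practice a library routine or the Excel functions described further down would normally be used.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def left_tail_p_value(t_score, df, steps=2000):
    """P(T <= t_score), using Simpson's rule between t_score and 0."""
    a, b = min(t_score, 0.0), max(t_score, 0.0)
    h = (b - a) / steps                  # steps must be an even number
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area = total * h / 3                 # area between t_score and 0
    # The distribution is symmetric, so half of the mass lies below 0.
    return 0.5 - area if t_score < 0 else 0.5 + area


p_value = left_tail_p_value(-2.523, df=14)
print(round(p_value, 3))  # 0.012
print(p_value < 0.05)     # True: reject H0 at alpha = 0.05
```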
Finding the critical value
To add nuance to the overall picture, we can also calculate from which sample mean we would have failed to reject the null hypothesis. In other words, what is the lowest sample mean that would lead to not rejecting the null hypothesis?
This is called the critical value. As we have looked up the critical t-score to be -1.761, we can, with some simple algebra, find the critical value: 420 + (-1.761 × 12.21 / √15) ≈ 414.45.
Our sample mean was 412.05, and we would reject the null hypothesis for any sample mean below 414.45.
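The same algebra as a short Python sketch, using the figures from the parrot example (the variable names are chosen for this illustration):

```python
import math

mu0, s, n = 420.0, 12.21, 15   # assumed mean, sample SD, sample size
t_crit = -1.761                # table value for df = 14, alpha = 0.05

# Rearranging t = (x_bar - mu0) / (s / sqrt(n)) for x_bar at t = t_crit.
critical_mean = mu0 + t_crit * s / math.sqrt(n)

print(round(critical_mean, 2))  # 414.45
print(412.05 < critical_mean)   # True: the observed mean leads to rejection
```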
Hypothesis testing with Excel
The =T.TEST and =T.DIST functions both calculate the p-value, which you can compare to the alpha level to decide whether to reject or fail to reject the null hypothesis.
The =T.TEST function is for when you have an array of data. Array 1 is the array of your sample observations. Array 2 is the null hypothesis. You will have to write the value for your null hypothesis twice in order for it to be an array.
The =T.DIST function is for when you already have single values, that is, your t-score and the degrees of freedom. For the parrot example, =T.DIST(-2.523, 14, TRUE) returns the left-tail probability of about 0.012.
Hypothesis testing in R
Here I run through hypothesis testing in R:
- One-sample t-test in R
- Two-sample t-test in R
Some of my preferred pages and videos for learning about hypothesis testing:
- Stat Trek (text): How to test hypotheses
- Statistics How To (text): Hypothesis testing
- Statistics How To (video 4:53 min): Hypothesis testing Example # 1 z test
- Dr Nic’s Maths and Stats (video 7:37 min): Understanding Hypothesis testing, p-value, t-test – Statistics Help