When is a new finding significant? Hypothesis testing can help answer this question.

 

A hypothesis test is composed of a null hypothesis and an alternative hypothesis. It helps us determine whether a new finding is significant, and thus whether a parameter differs from what we originally assumed.

 

Example: Say we assume a mean height of 180 cm for Scandinavian men aged 65-75, but a new sample gives a mean height of 183 cm. Is there reason to think that the mean height is now more than 180 cm? We can test this new finding against the originally assumed mean, expressed by the null hypothesis \(H_0\), against the alternative hypothesis \(H_1\). We test \(H_0\) vs. \(H_1\).

 

Start by defining our alternative hypothesis (\( H_1\)): Suggesting a change

You might find it easiest to start by defining the alternative hypothesis, which expresses: “Hey, we just found a new result, which indicates that the mean is actually different from what we originally assumed”, or “We have found that the mean might have changed”. In our example, the alternative hypothesis would thus say: We have found that the mean height could be more than 180 cm, which can be expressed as \(H_1: \mu > 180\).

 

The null hypothesis (\( H_0\)): The “conservative”

\(H_0\) expresses the “conservative” part of the hypothesis: there are no changes, and if there are any, they are in the opposite direction of the new findings. Things are as they have always been. Also, the \(H_0\) hypothesis must contain an “equal to” sign: it must say “equal to” (=), “less than or equal to” (\(\leq\)) or “greater than or equal to” (\(\geq\)). So, back to the Scandinavian men, our \(H_0\) would be: the mean height is 180 cm or less, rather than the higher mean that we have found in the new sample. So \( H_0: \mu\leq 180\), and therefore our hypothesis can be expressed like this:

 

\(\displaystyle H_0: \mu\leq 180\qquad   vs. \qquad  H_1: \mu > 180 \)

 

 

The standard deviation related to the sample size

Now, we will test which of these two hypotheses to accept and which to reject. Is the new finding significant, and does it thus suggest a change?

 

We must see the new finding in relation to the sample it came from. How large was the sample, and how was it collected? How spread out were the data: were there relatively large differences between the heights? The spread is usually measured by the standard deviation, and the formula for the sample standard deviation is:

 

\(\displaystyle s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2}\)

 

 

Where:

  • \(n \)= sample size
  • \(x_i \)= each individual observation
  • \(\bar{x}\) = sample mean
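
The formula above can be sketched in a few lines of Python. This is only an illustration: the `heights` list is invented for the example and is not the sample discussed in this post, and `sample_std` is a hypothetical helper name. The standard library function `statistics.stdev` uses the same n - 1 denominator, so it serves as a cross-check.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squared deviations from the sample mean.
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical heights in cm, invented for illustration only.
heights = [172.0, 178.5, 181.0, 183.5, 186.0, 190.0]

s_manual = sample_std(heights)
s_library = statistics.stdev(heights)  # also divides by n - 1
print(round(s_manual, 3), round(s_library, 3))
```

Both numbers printed should agree, since the hand-written function and `statistics.stdev` implement the same formula.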

 

Let’s say that our sample gives a standard deviation of 8 cm and that it consists of 50 randomly selected Scandinavian men aged 65-75.

 

The difference between our “original” mean and the new finding is seen in relation to the standard deviation, and the standard deviation in turn is seen in relation to the sample size. This quantity is called the standard error of the mean, and is expressed as:

 

\(\displaystyle \frac{s}{\sqrt {n}}\)
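
With the numbers from the example (a standard deviation of 8 cm and a sample of 50 men), this quantity can be computed directly. A minimal sketch, where `standard_error` is a hypothetical helper name:

```python
import math


def standard_error(s, n):
    """Standard deviation of the sample divided by the square root of n."""
    return s / math.sqrt(n)


se = standard_error(8.0, 50)
print(round(se, 3))  # 1.131
```

So although individual heights vary by about 8 cm, the mean of 50 heights varies by only a little more than 1 cm.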

 

 

Looking for a Khan Academy: “Proof…….”

 

z-score = how many standard deviations is our finding from our assumed mean

We can now compare the difference between the original mean and the new finding with the standard deviation related to the sample size. This value is called the z-score, and it expresses how many standard deviations (of the sample mean) our new finding is from the assumed mean. With it, we can determine whether the finding is far enough from the mean to conclude that there is a significant difference between the original and the new, that is, whether there has been a change. Should we reject the \(H_0\) hypothesis and conclude that there is a change, or vice versa? That is what our z-score indicates. Since our sample size (n) is larger than 30, we use a z-statistic, which approximately follows the standard normal distribution.

 

The z-score calculation:

 

\(\displaystyle z = \frac{\bar x - \mu}{s/\sqrt{n}} \sim\mathrm{Normal}(0,1)\)

 

The population standard deviation \(\sigma\) is unknown, so we use the sample standard deviation \(s = 8\) as an estimate. We can now write:

 

\(\displaystyle z = \frac{183 - 180}{8/\sqrt{50}} \approx 2.65\)
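
The same calculation as a small Python sketch, using the numbers from the example (sample mean 183, assumed mean 180, standard deviation 8, sample size 50). The function name `z_score` is a hypothetical helper, not part of any library.

```python
import math


def z_score(sample_mean, assumed_mean, s, n):
    """Distance from the assumed mean, measured in standard errors."""
    return (sample_mean - assumed_mean) / (s / math.sqrt(n))


z = z_score(183.0, 180.0, 8.0, 50)
print(round(z, 2))  # 2.65
```

Our sample mean thus lies roughly 2.65 standard errors above the assumed mean of 180 cm.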

 

 

Is the z-score significant? Is our finding significant?

The z-score can be looked up in the z-score table, and the resulting probability is then compared with the level that we have set as our level of acceptance, also called the significance level, denoted alpha, \(\alpha\). The significance level \(\alpha\) is set at the same time as we define our hypotheses, since we test our finding against it. We could, for example, set it to 5%. In our case, this would be written:

 

\(\displaystyle H_{0}\qquad vs \qquad H_{1}\qquad \alpha = 5\%\)

 

Shown on the density curve, in our case the bell curve for the normal distribution:

 

In the pharma industry, a usual significance level is 1%, and in other settings a 5% significance level is customary.
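
For the height example, the comparison against a 5% significance level can be sketched in Python instead of using a printed z-table. This is a minimal sketch under the assumptions of this post (one-sided alternative \(\mu > 180\), normal approximation); it uses `NormalDist` from Python's standard `statistics` module, and the variable names are chosen for illustration.

```python
import math
from statistics import NormalDist

alpha = 0.05  # significance level chosen together with the hypotheses
z = (183.0 - 180.0) / (8.0 / math.sqrt(50))

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Upper-tail test: reject H0 when z exceeds the (1 - alpha) quantile.
critical_value = standard_normal.inv_cdf(1 - alpha)
# Probability of a z-score at least this large if H0 were true.
p_value = 1 - standard_normal.cdf(z)

reject_h0 = z > critical_value
print(round(critical_value, 3), round(p_value, 4), reject_h0)
```

Comparing `z` with the critical value and comparing the p-value with `alpha` are two ways of making the same decision. With these numbers the z-score lies well above the critical value of about 1.645, so \(H_0\) is rejected at the 5% level.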

 

One-sided test

In our case, with the heights of Scandinavian men, we are working with a one-sided test, as we are testing “higher than”: whether the new finding is sufficiently significant to state that the true mean is higher than 180 cm. We are not saying “different from”, which would lead to a two-sided test.

 

Two-sided test

If the alternative is “different from”, we are testing whether the new findings show that there is a difference from our original mean, whether higher or lower. This is a two-sided test, and the significance level is split between the two tails of the bell curve, which can be shown like this:

 

Figure: bell curve, two-sided.
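
To see how the two-sided version differs in practice, here is a hedged Python sketch that reuses the numbers from the height example but tests “different from 180” instead of “higher than 180”. The 5% significance level is split into 2.5% in each tail; the variable names are again only illustrative.

```python
import math
from statistics import NormalDist

alpha = 0.05
z = (183.0 - 180.0) / (8.0 / math.sqrt(50))

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Two tails: each tail gets alpha / 2, so the cut-off moves outward.
critical_value = standard_normal.inv_cdf(1 - alpha / 2)
# Probability of a z-score at least this extreme in either direction.
p_value = 2 * (1 - standard_normal.cdf(abs(z)))

reject_h0 = abs(z) > critical_value
print(round(critical_value, 2), round(p_value, 4), reject_h0)
```

The two-sided critical value (about 1.96) is larger than the one-sided one, and the p-value is twice as large, so a two-sided test demands a more extreme finding before \(H_0\) is rejected.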

 

 

Carsten Grube

HPd (Highly Persistent & devotional) & MMSD (Mad Math Stat Dad) approaching Master's level in mathematical statistics through self-study alongside my promotion as full to halftime dad and freelance whatever analysis and writings

Contract me. Love to help you. Love to learn