The Chi-square test comes in two common forms: the “Goodness of fit” test, which compares the fit of the observed sample data with the expected data, and the test of independence, which analyzes the dependence between different categorical datasets. An example of the latter: Do more women than men vote for a certain political party?
Key points for chi-square test
- The Chi-square distribution is applied when testing for dependence between categorical datasets
- It works with discrete and mutually exclusive data
Chi-square test worked example
Say that an HR department, as part of an ongoing training program, runs a periodic test of the employees’ product know-how. A multiple-choice test with four possible answers, A, B, C and D, is used for the purpose. The test producers claim that the correct answer is equally likely to be any one of the four.
Some of the HR staff get curious and wish to test the truth of this claim: Is the probability of a correct answer really equal for A, B, C and D?
We would define the following chi-square hypotheses:
H0: The correct choices are equally distributed (A: 25%, B: 25%, C: 25%, D: 25%)
H1: The correct choices are not equally distributed
Let’s set a significance level (α) of 0.05.
Purpose of the chi-square hypothesis test
The Chi-square test is a hypothesis test and follows the same procedure and concepts as other hypothesis tests, so we will reject the null hypothesis if our p-value is lower than our significance level (α). Rejecting the null hypothesis, in our case, means rejecting that the probabilities are equally distributed between the four options A, B, C and D.
Expected vs observed data
HR takes a sample of 100 tests randomly selected from past years of testing with this method. As the null hypothesis says that the correct answers are equally distributed, we expect 25 correct answers for each of the four choices. These expected values are compared to the ones observed in the sample, and we can make the following contingency table:
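The original table is not reproduced here. As an illustrative sketch, the Python snippet below builds a table of the same shape using assumed observed counts (35, 20, 25 and 20); these are hypothetical values chosen only so that they sum to 100, give D = 20 as discussed later, and reproduce the test statistic of 6.0 used in this chapter. The actual sample counts may have differed.

```python
# Expected vs. observed counts for the four answer choices.
# NOTE: the observed counts are ASSUMED for illustration; they sum to 100
# and reproduce the chi-square statistic of 6.0 used in the text.
choices = ["A", "B", "C", "D"]
observed = [35, 20, 25, 20]   # assumed sample counts (n = 100)
expected = [25, 25, 25, 25]   # H0: equal probability, 25% of 100 each

print(f"{'Choice':<8}{'Observed':>10}{'Expected':>10}")
for c, o, e in zip(choices, observed, expected):
    print(f"{c:<8}{o:>10}{e:>10}")
print(f"{'Total':<8}{sum(observed):>10}{sum(expected):>10}")
```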
Now, how can we calculate whether the result we got from our sample is more extreme than what our significance level allows for? How can we know if our sample result is “statistically significant”? We apply the Chi-square distribution:
With the Chi-square test we can test for dependence or independence between different categories, in our case between A, B, C and D. It works with mutually exclusive data: if the correct choice for a question is A, it cannot be D at the same time.
Data in the Chi-square distribution is countable and therefore discrete: we can count each question and choice as a whole integer.
The Chi-square distribution is denoted with the Greek letter chi, squared: χ². To calculate the Chi-square statistic, we sum, over all categories, the squared difference between the observed and the expected value, divided by the expected value. Thus, we get the formula for the Chi-square statistic:

χ² = Σ (O − E)² / E

where O is the observed count and E is the expected count.
The Chi-square formula explained
To find the distance between the observed and the expected, we subtract the expected value from the observed. This difference is also called the “residual”.
The differences are squared in order to obtain only positive values, and are divided by the expected value in order to normalize independently of the number of counts. Otherwise, the Chi-square statistic would grow with the number of counts, so large datasets would give large statistics regardless of fit. This is the idea behind normalizing or standardizing (ref. the Z-score chapter).
The operation is carried out for each count, or in our situation, for each row, and the results are then added up. This summation is expressed by the large sigma (Σ) in front of the formula.
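The row-by-row calculation can be sketched in Python. The observed counts here (35, 20, 25 and 20) are assumed for illustration; they are hypothetical values chosen to reproduce the test statistic of 6.0 reported in this chapter.

```python
# Chi-square statistic: sum over rows of (observed - expected)^2 / expected.
# Observed counts are ASSUMED for illustration (they reproduce the 6.0 in the text).
observed = [35, 20, 25, 20]
expected = [25, 25, 25, 25]   # 100 tests, equal 25% probability under H0

chi_square = 0.0
for o, e in zip(observed, expected):
    term = (o - e) ** 2 / e   # squared residual, normalized by the expected count
    print(f"O={o}, E={e}: (O-E)^2/E = {term}")
    chi_square += term

print("chi-square statistic:", chi_square)
```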
Our calculated value, or test statistic, is 6.0. This is now tested against the corresponding critical value from the Chi-square table, which is found by looking up the degrees of freedom:
Degrees of freedom
The degrees of freedom (df) is the number of category values, or cells in our table, that are free to vary independently. If we have the total, and four values that add up to it, filling in three cells tells us what the fourth value must be.
For example, in our table of observed data, knowing the values for A, B and C, and that A + B + C + D = 100, tells us that D must be 20. So D, in this case, is not free to vary: a degree of freedom is lost.
The values for A, B and C can be any values, but D must be the missing piece that makes all four add up to 100. In mathematical terms, A, B and C are free to vary; they are independent, and they make up the degrees of freedom.
For a goodness-of-fit test like ours, the degrees of freedom is the number of categories minus one: with four categories, df = 4 − 1 = 3. (For two-way Chi-square tables, the degrees of freedom is (rows − 1) × (columns − 1); applied to our four-row, two-column observed/expected layout this also gives (4 − 1) × (2 − 1) = 3.)
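As a minimal check, both ways of counting the degrees of freedom give the same answer for this example:

```python
# Degrees of freedom for the worked example.
categories = 4
df_goodness_of_fit = categories - 1          # goodness-of-fit: k - 1
df_table = (4 - 1) * (2 - 1)                 # (rows - 1) * (columns - 1)
print(df_goodness_of_fit, df_table)
```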
Looking up in the Chi-square table
To look up the critical value in the Chi-square distribution table, we go to the row for degrees of freedom (df) = 3 and follow it to the column that corresponds to our significance level (α) of 0.05.
At df=3 and α=0.05, we find a critical value of 7.81. Visualizing this with the Chi-square probability density curve for df=3 compared to our test statistic of 6.0:
Chi-square test conclusion
We find a critical value of 7.81, which is greater than our test statistic of 6.00. So we fail to reject the null hypothesis, concluding that, based on our sample results, we cannot reject that the choices are equally distributed.
We recall that we do not conclude that H0 is the actual truth. Failing to reject H0 only means that we cannot rule out that it could be true; we do not conclude that it is, in fact, true. In fact, our 6.00 is fairly close to the critical value of 7.81. And from the table, we can read that the p-value for 6.00 is a little more than 0.10, because the 0.10 column at df = 3 returns 6.25.
So we get a p-value a little greater than 10%. This means that, if H0 is true, there is a little more than a 10% probability of getting a result as extreme as the one we got; in other words, just over a 10% chance of getting a statistic of 6.00 or more.
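The exact p-value can also be computed without a table. For df = 3 specifically, the Chi-square cumulative distribution function has a closed form in terms of the error function; the sketch below uses it to verify both the critical value and the roughly 11% p-value (note this formula applies only to df = 3):

```python
import math

def chi2_cdf_df3(x):
    # Closed-form CDF of the chi-square distribution with 3 degrees of freedom:
    # P(X <= x) = erf(sqrt(x/2)) - sqrt(2x/pi) * exp(-x/2)
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

statistic = 6.0
p_value = 1 - chi2_cdf_df3(statistic)
print(f"p-value for 6.0 at df=3: {p_value:.4f}")       # a little over 0.10

critical = 7.81
print(f"area right of 7.81:     {1 - chi2_cdf_df3(critical):.4f}")  # about 0.05
```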
Visualizing multiple chi-square distributions
The following graph shows multiple chi-square distributions with each of their different degree of freedom:
Chi-square test with MS Excel
The CHISQ.TEST, CHISQ.INV and CHISQ.DIST functions in Excel return values in the Chi-square distribution and are available in Excel 2010 and later.
The Excel function CHISQ.TEST conducts a Chi-square test on an array of observed values against an array of expected frequencies. It returns the p-value: the probability of getting a result at least as extreme as ours by chance or sampling error alone.
The CHISQ.INV returns the critical value or the inverse of the left-tailed probability:
The Excel function CHISQ.DIST with the arguments (x, df, cumulative=TRUE) returns the cumulative distribution function.
With cumulative set to FALSE, (x, df, cumulative=FALSE), it returns the probability density function. Here x is the calculated test statistic, which for the Chi-square statistic is Σ(O − E)²/E.
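As an illustration, assuming the observed counts sit in cells B2:B5 and the expected counts (25 each) in C2:C5 (these cell ranges are hypothetical), the functions could be used like this:

```
=CHISQ.TEST(B2:B5, C2:C5)      returns the p-value from the observed and expected ranges
=CHISQ.INV(0.95, 3)            left-tailed inverse: the critical value 7.81 at α = 0.05
=CHISQ.DIST(6, 3, TRUE)        cumulative probability P(X ≤ 6) at df = 3
=1 - CHISQ.DIST(6, 3, TRUE)    right-tail p-value for the test statistic 6.0
```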
- Khan Academy (video 8:25): Chi-square statistic for hypothesis testing
- Khan Academy (video 11:45): Pearson’s chi square test (goodness of fit)
- jbstatistics (video 9:53): Chi-square Tests of Independence (Chi-square Tests for Two-Way Tables)
- Penn State Eberly College of Science (text page): Goodness-of-Fit test