+34 616 71 29 85 carsten@dataz4s.com

# Chi-square test

The Chi-square test compares observed sample data with the data we would expect under a hypothesis. Used on a single categorical variable, it is called a “goodness of fit” test, because it measures how well the observed counts fit the expected counts. Used on two categorical variables, it analyzes whether they are dependent. Example: Do more women than men vote for a given political party?

## Key points for chi-square test

• The Chi-square distribution is applied when testing goodness of fit or dependence between categorical variables
• It works with discrete and mutually exclusive data

## Chi-square test worked example

Say that an HR department, as part of an ongoing training program, runs a periodic test of the employees' product know-how. A multiple-choice test is used for the purpose. Each question has four possible answers: A, B, C and D. The test producers claim that the correct answer is equally likely to be any of these four.

Some of the HR staff get curious and wish to test this claim: Is the probability of being the correct answer really equal for A, B, C and D?

We would define the following chi-square hypotheses:

H0: The correct choices are equally distributed (A: 25%, B: 25%, C: 25%, D: 25%)

H1: The correct choices are not equally distributed

Let’s set a significance level (α) of 0.05.

## Purpose of the chi-square hypothesis test

The Chi-square test is a hypothesis test and follows the same procedure and concepts as other hypothesis tests, so we reject the null hypothesis if our p-value is lower than our significance level (α). Rejecting the null hypothesis, in our case, means rejecting the claim that the correct answers are equally distributed between the four options A, B, C and D.

## Expected vs observed data

HR takes a sample of 100 tests randomly selected from the past years of testing with this method. As the null hypothesis says that the correct answers are equally distributed, we expect 25 correct answers for each of the four options. These expected values are compared to the ones observed in the sample, and we can make the following contingency table:

[Table: observed and expected counts for A, B, C and D]

Now, how can we calculate whether the result from our sample is more extreme than our significance level allows for? How can we know if our sample result is “statistically significant”? We apply the Chi-square distribution:

## Chi-square distribution

With the Chi-square test we can test for dependence or independence between categories or, as in our case, whether the counts for A, B, C and D fit an expected distribution. It works with mutually exclusive data, meaning that if the correct choice for a question is A, it cannot be D at the same time.

Data in the Chi-square test is countable and therefore discrete: we can count each question and choice as a whole number.

The Chi-square statistic is denoted with the Greek letter Chi squared: χ². To calculate it, we take the squared difference between each observed and expected value, divide it by the expected value, and sum the results. Thus, we get the formula for the Chi-square statistic:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is an observed count and E is the corresponding expected count.

## The Chi-square formula explained

To find the distance between the observed and the expected, we subtract the expected value from the observed. This difference is also called the “residual”.

The differences are squared in order to obtain only positive values, and are divided by the expected value in order to normalize them independently of the size of the counts. Otherwise, the Chi-square statistic would grow with the size of the counts, so large datasets would give large statistics. This is the idea behind normalizing or standardizing (ref. the Z-score chapter).

The operation is carried out for each category, or in our situation, for each row, and the results are then added up. The adding up is expressed by the large sigma (Σ) in the formula.

Our calculated value, our test statistic, is 6.0. This is now tested against the corresponding critical value from the Chi-square table, which is found by looking up under the degrees of freedom.
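The calculation can be sketched in a few lines of Python. The observed counts below are hypothetical, since the table itself is not reproduced here: they were chosen only so that they total 100, give D = 20, and reproduce the stated statistic of 6.0. The function name `chi_square_statistic` is likewise just illustrative.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical observed counts (NOT the article's table): chosen only so that
# they total 100, give D = 20 and reproduce the stated statistic of 6.0.
observed = {"A": 35, "B": 20, "C": 25, "D": 20}
# Under H0 each of the four options is expected 25 times out of 100.
expected = {option: 25 for option in observed}

statistic = chi_square_statistic(list(observed.values()), list(expected.values()))
print(statistic)  # 6.0
```

Each row contributes its own term (here 4.0, 1.0, 0.0 and 1.0), and the sigma in the formula corresponds to the `sum` call.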

## Degrees of freedom

The degrees of freedom (df) is the number of category values, or cells in our table, that are independent. If we have the totals, and four values that add up to the total, filling in 3 cells will let us know what the fourth value is.

For example, in our table of observed data, we know that A+B+C+D = 100. Knowing the values for A, B and C therefore tells us that D must be 20. So D, in this case, is not free to vary: a degree of freedom is lost. The values for A, B and C can be any values, but D must be the missing puzzle piece that makes all four add up to 100. In mathematical terms, A, B and C are free to vary, and they therefore make up the degrees of freedom.

For a goodness-of-fit test like ours, the degrees of freedom is the number of categories minus one: 4 − 1 = 3. For a contingency table used to test dependence, it is (Rows − 1) × (Columns − 1), written (r − 1) × (c − 1). Reading our table as four rows and two columns (observed and expected) gives the same result: (4 − 1) × (2 − 1) = 3.

With the degrees of freedom and the significance level, you can look up the critical value in the Chi-square table, and from the table's columns read off the approximate p-value.
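As a small sketch of the counting rules, the two helper functions below (their names are illustrative, not from any library) return the degrees of freedom for a single set of categories and for a rows-by-columns table. The last lines show why the fourth count is not free to vary; the values for A, B and C are hypothetical, only D = 20 and the total of 100 come from the text.

```python
def df_goodness_of_fit(n_categories):
    """Degrees of freedom when one set of counts is compared with expected counts."""
    return n_categories - 1


def df_contingency_table(n_rows, n_columns):
    """Degrees of freedom for an r x c table: (r - 1) * (c - 1)."""
    return (n_rows - 1) * (n_columns - 1)


print(df_goodness_of_fit(4))       # 3
print(df_contingency_table(4, 2))  # 3

# Once the total and three counts are known, the fourth is not free to vary.
# The counts for A, B and C are hypothetical; only D = 20 is stated in the text.
total = 100
a, b, c = 35, 20, 25
d = total - (a + b + c)
print(d)  # 20
```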

## Looking up in the Chi-square table

To look up the critical value in the Chi-square distribution table, we look at the row for degrees of freedom (df) = 3 and follow it to the column that corresponds to our significance level (α) of 5%. At df=3 and α=0.05, we find a critical value of 7.81. Visualizing this with the Chi-square probability density curve for df=3 compared to our test statistic of 6.0:

[Figure: Chi-square density curve for df=3 with the test statistic 6.0 and the critical value 7.81]

## Chi-square test conclusion

We find a critical value of 7.81, which is greater than our test statistic of 6.00. So we fail to reject the null hypothesis: based on our sample results, we cannot reject that the choices are equally distributed.

We recall that we do not conclude that H0 is true. Failing to reject H0 only means that we cannot rule out that it could be true. In fact, our 6.00 is “pretty” close to the critical value of 7.81. And from the table, we can read that the p-value for 6.00 is a little more than 10%, because the 0.1 column at df=3 returns 6.25.

So, we get a p-value a little greater than 10%. This means that, if the null hypothesis is true, there is “a little” more than a 10% probability of getting a result at least as extreme as our 6.0. Or expressed as: “There is a 10%+ chance of getting 6.00 or more”.
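The table lookup can also be reproduced numerically. The sketch below uses only the Python standard library and is one possible approach, not the method behind the printed table: `chi2_cdf` evaluates the left-tail probability with a power series, and `chi2_critical_value` inverts it by bisection. Both function names are illustrative.

```python
import math


def chi2_cdf(x, df):
    """Left-tail probability P(X <= x) for a chi-square variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function,
    which is adequate for the small x values used here.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_critical_value(alpha, df):
    """Find x such that the right tail beyond x has probability alpha (bisection)."""
    target = 1.0 - alpha
    low, high = 0.0, 1.0
    while chi2_cdf(high, df) < target:  # grow the bracket until it contains the answer
        high *= 2.0
    for _ in range(100):
        mid = (low + high) / 2.0
        if chi2_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


critical = chi2_critical_value(alpha=0.05, df=3)
statistic = 6.0  # test statistic from the worked example
print(round(critical, 2))    # 7.81
print(statistic > critical)  # False, so we fail to reject H0
```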
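To go beyond “a little more than 10%”, the right-tail probability can be computed directly. The sketch below relies on a closed form that holds only for 3 degrees of freedom, so it is a shortcut for this particular example rather than a general routine; the function name `chi2_sf_df3` is illustrative.

```python
import math


def chi2_sf_df3(x):
    """Right-tail probability P(X >= x) for a chi-square variable with 3 degrees of freedom.

    Closed form valid for df = 3 only:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


alpha = 0.05
p_value = chi2_sf_df3(6.0)  # test statistic from the worked example
print(round(p_value, 4))    # 0.1116
print(p_value < alpha)      # False, so we fail to reject H0

# The two table entries quoted in the text sit at right-tail areas of about 10% and 5%.
print(round(chi2_sf_df3(6.25), 2))  # 0.1
print(round(chi2_sf_df3(7.81), 2))  # 0.05
```

The p-value of roughly 0.11 agrees with the table reading: 6.0 lies just below the 6.25 found in the 0.1 column.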

## Visualizing multiple chi-square distributions
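
The shape of the Chi-square curve for different degrees of freedom can also be tabulated numerically. The sketch below evaluates the standard Chi-square density at a few points for a handful of df values; the function name `chi2_pdf` and the chosen x and df values are illustrative, and the function is only defined here for x > 0.

```python
import math


def chi2_pdf(x, df):
    """Chi-square probability density at x > 0 for df degrees of freedom."""
    if x <= 0:
        raise ValueError("this sketch only evaluates the density for x > 0")
    half_df = df / 2.0
    # Work in logs to avoid overflow in the gamma function for large df.
    log_density = (
        (half_df - 1.0) * math.log(x)
        - x / 2.0
        - half_df * math.log(2.0)
        - math.lgamma(half_df)
    )
    return math.exp(log_density)


dfs = (1, 2, 3, 5)
xs = [0.5, 1, 2, 4, 6, 8]
print("x     " + "".join(f"df={df:<5}" for df in dfs))
for x in xs:
    row = "".join(f"{chi2_pdf(x, df):<8.3f}" for df in dfs)
    print(f"{x:<6}{row}")
```

Reading down a column shows how the curve moves: for df=1 and df=2 the density falls steadily, while for higher df it rises to a peak first and the peak shifts to the right.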

The following graph shows multiple Chi-square distributions, each with a different number of degrees of freedom:

[Figure: Chi-square density curves for several degrees of freedom]

## Chi-square test with MS Excel

The CHISQ.TEST, CHISQ.INV and CHISQ.DIST functions in Excel return values from the Chi-square distribution and are available in Excel 2010 and later.

### CHISQ.TEST

The Excel function CHISQ.TEST conducts a Chi-square test on the array of observed values and the array of expected frequencies. It returns the p-value, that is, the probability of getting a result at least as extreme as ours through chance or sampling error alone if the null hypothesis is true.

### CHISQ.INV

CHISQ.INV returns the inverse of the left-tailed probability. To get the critical value, pass it 1 − α: CHISQ.INV(0.95, 3) returns approximately 7.81.

### CHISQ.DIST(x,df,TRUE)

The Excel function CHISQ.DIST with the arguments (x,df,cumulative=TRUE) returns the cumulative distribution function.

### CHISQ.DIST(x,df,FALSE)

When cumulative is set to FALSE, (x,df,cumulative=FALSE), it returns the probability density function. ‘x’ is the calculated test statistic, which for Chi-square statistics is ∑(O−E)²/E.

#### Carsten Grube

Freelance Data Analyst


