# Chi-square test

The Chi-square test is also called a **“goodness of fit”** test, as it compares how well the observed sample data fit the expected data. The chi-square test also analyzes the **dependence between different categorical datasets**. Example: *Do more women than men vote for some political party?*


## Key points for chi-square test

- The Chi-square distribution is applied when testing for **dependence between categorical datasets**
- It works with discrete and mutually exclusive data

## Chi-square test worked example

Say that an **HR department**, as part of an ongoing training program, is running a periodic test among the employees on their product know-how. A multiple-choice test is used for the purpose. Each question has four possible answers: A, B, C and D. The test producers claim that the correct answer is equally likely to be any of these four.

Some of the HR staff get curious and **wish to test the truth of this claim**: *Is the probability of a correct answer really equal between A, B, C and D?*

We would define the following chi-square hypotheses:

H0: The correct choices are equally distributed (A: 25%, B: 25%, C: 25%, D: 25%)

H1: The correct choices are not equally distributed

Let’s set a significance level (α) of 0.05.

## Purpose of the chi-square hypothesis test

The Chi-square test is a hypothesis test and follows the same procedure and concepts as other hypothesis tests, so we reject the null hypothesis if our p-value is lower than our significance level (α). Rejecting the null hypothesis, in our case, means rejecting the claim that the correct answers are equally distributed between the four options A, B, C and D.

## Expected vs observed data

HR takes a **sample of 100** tests randomly selected from the past years of testing with this method. As the null hypothesis says that the correct answers are equally distributed, we expect 25 correct answers for each of the four choices. These expected values are compared to the ones observed in the sample in a **contingency table**.
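As a minimal sketch, the expected counts under the null hypothesis can be derived from the sample size and the claimed probabilities. The variable names are illustrative assumptions; the sample size of 100 and the 25% probabilities come from the example.

```python
# Expected counts under H0: every choice is equally likely to be correct.
sample_size = 100  # number of sampled tests (from the example)
h0_probabilities = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}

# Expected count = sample size x probability claimed by H0.
expected = {choice: sample_size * p for choice, p in h0_probabilities.items()}

for choice, count in expected.items():
    print(f"{choice}: expected {count:.0f} correct answers")
```

The same calculation works for unequal claimed probabilities, as long as they add up to 1.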

*Now, how can we calculate whether the result that we got from our sample is more extreme than what our significance level allows for? How can we know if our sample result is “statistically significant”?* We apply the Chi-square distribution:

## Chi-square distribution

With the Chi-square test we can test for dependence or independence between different categories, in our case between A, B, C and D. It works with **mutually exclusive** data, meaning that **if the correct choice for a question is A, then it cannot be D at the same time**.

Data in the Chi-square test is **countable** and therefore **discrete**. We can count each question and choice as a **whole integer**.

The Chi-square distribution is **denoted** with the Greek letter chi squared: $\chi^2$. To calculate the Chi-square statistic, we take the squared difference between the observed value (O) and the expected value (E) for each category, divide it by the expected value, and sum the results. Thus, we get the **formula for the Chi-square statistic**:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

## The Chi-square formula explained

**To find the distance** between the observed and the expected, we subtract the expected value from the observed. This difference is also called the “residual”.

The differences are **squared in order to obtain only positive values** and are divided by the expected value in order to normalize independently of the number of counts. **Otherwise**, the Chi-square statistic would increase with the number of counts, so for large datasets we would get large statistics. This is the idea behind normalizing or **standardizing** (ref. the Z-score chapter).

The operation is carried out for each category, in our situation for each row of the table, and the results are then added up. This adding up is expressed by the large sigma (∑) in front of the formula.
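The calculation can be sketched in a few lines. The observed counts for A, B and C are not given in the text, so the values below are assumptions chosen to be consistent with what the example does state: a total of 100, D = 20 and a test statistic of 6.0. The function name is also illustrative.

```python
# Observed counts: only D = 20 and the total of 100 are stated in the example.
# The counts for A, B and C are ASSUMED so that the statistic comes out at 6.0.
observed = {"A": 35, "B": 20, "C": 25, "D": 20}
expected = {"A": 25, "B": 25, "C": 25, "D": 25}

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in expected)

# Show the residual and the contribution of each row before summing.
for k in expected:
    residual = observed[k] - expected[k]
    print(f"{k}: residual {residual:+d}, contribution {residual ** 2 / expected[k]:.2f}")

statistic = chi_square_statistic(observed, expected)
print(f"chi-square statistic = {statistic:.2f}")
```

Printing the per-row contributions makes it easy to see which category drives the statistic.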

Our calculated value, **our test statistic, is 6.0**. This is now tested against the critical value from the chi-square table, which is found by looking up the degrees of freedom:

## Degrees of freedom

The degrees of freedom (df) is the **number of category values, or cells in our table, that are independent**. If we have the totals, and four values that add up to the total, filling in 3 cells will let us know what the fourth value is.

For **example**, in our table, for the observed data, we would know that the value for D must be equal to 20, knowing that A+B+C+D = 100. Knowing the values for A, B and C, and that they add up to 100 together with D, tells us **that D must be 20. So, D, in this case, is not free to vary. A degree of freedom is lost**.

So, the values for A, B and C can be any values, but D must be the missing puzzle piece that makes all four add up to 100. In mathematical terms, **A, B and C are free to vary**. They are independent and therefore express the degrees of freedom.

For a goodness-of-fit test like ours, the degrees of freedom are the number of categories minus one: 4 − 1 = 3. For two-way Chi-square tables, the general rule is (rows − 1) × (columns − 1), written as *(r − 1)(c − 1)*. Reading our table as four rows and two columns gives the same result: (4 − 1) × (2 − 1) = 3.
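A small sketch of the same reasoning: once three counts and the total are known, the fourth count is fixed. The counts for A, B and C are again assumed for illustration; only the total of 100 and D = 20 come from the example.

```python
# Counts for A, B and C are ASSUMED; the example only states total = 100 and D = 20.
total = 100
free_counts = {"A": 35, "B": 20, "C": 25}

# Once A, B and C are known, D is fixed by the total and cannot vary.
d_count = total - sum(free_counts.values())
print(f"D must be {d_count}")

# Degrees of freedom as the number of categories that are free to vary.
categories = len(free_counts) + 1
df_categories = categories - 1

# The table rule, reading the example table as 4 rows and 2 columns.
rows, columns = 4, 2
df_table_rule = (rows - 1) * (columns - 1)

print(f"df = {df_categories} (categories - 1), {df_table_rule} (table rule)")
```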

With the degrees of freedom and the significance level, you can look up the critical value in the Chi-square table and compare it with the test statistic.


## Looking up in the Chi-square table

To look up the critical value in the Chi-square distribution table, we look at the row for degrees of freedom (df) = 3 and follow the line to the column that corresponds with our significance level (α) of 5%.

At df=3 and α=0.05, we find a critical value of **7.81**. On the Chi-square probability density curve for df=3, our test statistic of 6.0 lies to the left of this critical value, outside the rejection region.
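The table lookup can be cross-checked numerically. The sketch below is not the table method used here: it uses the closed-form right-tail probability that exists for the special case df = 3 and searches for the critical value by bisection. The function names are illustrative assumptions.

```python
import math

def chi2_sf_df3(x):
    """Right-tail probability P(X >= x) for a chi-square distribution with 3 df.

    This closed form is only valid for df = 3.
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def critical_value_df3(alpha, lo=0.0, hi=100.0, tol=1e-10):
    """Find x with P(X >= x) = alpha by bisection (the tail shrinks as x grows)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf_df3(mid) > alpha:
            lo = mid  # tail still larger than alpha: move right
        else:
            hi = mid
    return (lo + hi) / 2

critical = critical_value_df3(0.05)
statistic = 6.0  # test statistic from the example

print(f"critical value = {critical:.2f}")
print("reject H0" if statistic > critical else "fail to reject H0")
```

For other degrees of freedom the closed form no longer applies and a general chi-square routine would be needed.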

## Chi-square test conclusion

We find a critical value of 7.81, which is greater than our test statistic of 6.00. So, **we fail to reject** the null hypothesis: based on our sample results, we cannot reject that the choices are equally distributed.

We recall that **we do not conclude that H0 is actually true**. Failing to reject H0 only means that we cannot rule out that it could be true. In fact, our 6.00 is “pretty” close to the critical value of 7.81. And from the table, we can read that the p-value for 6.00 is a little more than 10%, because the 0.1 column at df=3 returns 6.25.

So, we get a p-value of a little greater than 10%. This means that, if the null hypothesis is true, there is “a little” more than 10% probability of getting a result at least as extreme as our 6.0. Or expressed as: “There is a 10%+ chance of getting 6.00 or more”.
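One way to make this reading of the p-value concrete is a Monte Carlo simulation, which is a different technique from the table lookup used here. The sketch below draws many samples of 100 answers under H0 and counts how often the statistic reaches 6.0 or more. The seed, the number of simulations and all names are assumptions for illustration.

```python
import random

random.seed(42)           # fixed seed so the run is repeatable
CHOICES = "ABCD"
SAMPLE_SIZE = 100
EXPECTED = 25             # expected count per choice under H0
OBSERVED_SUM_SQ = 150     # 6.0 * 25: sum of squared residuals behind the statistic 6.0
N_SIMULATIONS = 20_000

def simulated_sum_of_squares():
    """Draw 100 correct answers under H0 and return sum((O - E)^2)."""
    answers = random.choices(CHOICES, k=SAMPLE_SIZE)
    return sum((answers.count(c) - EXPECTED) ** 2 for c in CHOICES)

# All expected counts are 25, so statistic >= 6.0 is the same as sum of squares >= 150.
# Comparing integers avoids floating-point ties at exactly 6.0.
hits = sum(simulated_sum_of_squares() >= OBSERVED_SUM_SQ for _ in range(N_SIMULATIONS))
p_value = hits / N_SIMULATIONS

print(f"simulated p-value = {p_value:.3f}")
```

The simulated share should land in the neighbourhood of 10%, in line with the table reading, although the exact figure varies with the seed and the number of simulations.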

## Visualizing multiple chi-square distributions

*Figure: multiple chi-square distributions, each with a different degree of freedom.*
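The shape of these curves can be explored with a short sketch of the chi-square probability density function. The choice of degrees of freedom and x values below is an assumption for illustration, and the function name is hypothetical.

```python
import math

def chi2_pdf(x, df):
    """Chi-square probability density at x > 0 for df degrees of freedom."""
    half = df / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))

dfs = (1, 2, 3, 5, 8)
xs = (0.5, 1, 2, 4, 6, 8, 12)

# Print one row per x value and one column per degree of freedom.
print("    x" + "".join(f"   df={df:<3}" for df in dfs))
for x in xs:
    print(f"{x:>5}" + "".join(f"  {chi2_pdf(x, df):.4f}" for df in dfs))
```

Reading down a column shows where each curve peaks: for df of 2 or more the peak sits at x = df − 2, so the curves move to the right and flatten as the degrees of freedom increase.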

## Chi-square test with MS Excel

The CHISQ.TEST, CHISQ.INV and CHISQ.DIST functions in Excel return values from the Chi-square distribution and are available in Excel 2010 and later versions.

### CHISQ.TEST

The Excel function CHISQ.TEST conducts a Chi-square test on the array of observed values and the array of expected frequencies. It returns the **p-value**: the probability of getting a result at least as extreme as ours through chance or sampling error alone, if the null hypothesis is true.

### CHISQ.INV

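The CHISQ.INV function returns the inverse of the left-tailed probability, which gives the **critical value**. For our example, CHISQ.INV(0.95, 3) returns 7.81, the critical value at α=0.05 and df=3.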

### CHISQ.DIST(x,df,TRUE)

The Excel function CHISQ.DIST with the arguments (x,df,cumulative=TRUE) returns the **cumulative distribution function**.
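For readers without Excel, the cumulative distribution function can be approximated in plain Python. This is a sketch under assumptions, not Excel's implementation: it uses the standard series expansion of the regularized lower incomplete gamma function, and the function name and the number of series terms are illustrative.

```python
import math

def chi2_cdf(x, df, terms=200):
    """Left-tail probability P(X <= x) for df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function
    P(a, t) with a = df / 2 and t = x / 2.
    """
    if x <= 0:
        return 0.0
    a, t = df / 2, x / 2
    term = 1 / a
    total = term
    for n in range(1, terms):
        term *= t / (a + n)  # next series term
        total += term
    return total * math.exp(-t) * t ** a / math.gamma(a)

cdf = chi2_cdf(6.0, 3)
print(f"P(X <= 6.0) = {cdf:.4f}")      # left tail, the cumulative value
print(f"P(X >= 6.0) = {1 - cdf:.4f}")  # right tail, the p-value of the example
```

Subtracting the cumulative value from 1 gives the right-tail probability, which for the test statistic of 6.0 at df=3 is a little above 10%.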

### CHISQ.DIST(x,df,FALSE)

When cumulative is set to ‘FALSE’ (x,df,cumulative=FALSE), it returns the **probability density function**. ‘x’ is the calculated test statistic, which for the Chi-square statistic is $\sum (O - E)^2 / E$.

## Learning statistics

- Khan Academy (video 8:25): Chi-square statistic for hypothesis testing
- Khan Academy (video 11:45): Pearson’s chi square test (goodness of fit)
- jbstatistics (video 9:53): Chi-square Tests of Independence (Chi-square Tests for Two-Way Tables)
- Penn State Eberly College of Science (text page): Goodness-of-Fit test

#### Carsten Grube

Freelance Data Analyst

