# Chi Square Test of Independence – Guide & Examples

02.04.23

The chi square test of independence is a statistical test used to determine whether two categorical variables are associated or independent. It is used in fields such as marketing, social science, and medical research to assess whether two categorical variables are related.

## Chi Square Test of Independence – In a Nutshell

• The chi square test of independence is a statistical tool used to determine if there is a relationship between two categorical variables.
• A contingency table is created to perform the chi square test of independence, and the expected and observed frequencies are compared to determine if there is a significant association between the two variables.
• The null hypothesis in the chi square test of independence assumes no relationship between the two variables, and the alternative hypothesis assumes a relationship.
• The chi square test of independence is conducted by calculating the chi-square statistic, which measures the difference between the observed and expected frequencies. The p-value is then calculated, and the null hypothesis is either rejected or not rejected based on the p-value.
• The chi square test of independence is a useful tool for analyzing data in fields such as social sciences, biology, and marketing, where researchers may be interested in understanding the relationship between two categorical variables.

## Definition: Chi square test of independence

The chi square test of independence is a statistical test used to determine the association between two categorical variables.

The chi square test of independence is one of Pearson’s chi-square tests. These are widely used nonparametric tests because they do not rely on the assumptions of parametric tests, particularly the assumption of a normal distribution.

The chi square test of independence is calculated by comparing the observed frequencies of categories in a contingency table with the frequencies that would be expected if the variables were independent. The components needed for the test are the observed frequencies, expected frequencies, and degrees of freedom.

### Contingency tables

Contingency tables summarize and display the relationship between two categorical variables. They are also known as cross-tabulation tables, two-way frequency tables, or crosstabs, and they form the basis for statistical tests such as the chi square test of independence.

Example:

A contingency table could show the number of males and females who study psychology and those who take history.

The rows would represent gender (male and female), and the columns would represent field of study (psychology and history).1

| Gender | Psychology | History |
| --- | --- | --- |
| Male | 67 | 130 |
| Female | 124 | 50 |
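As a minimal sketch of how such a table arises from raw data, the snippet below rebuilds one record per student from the counts above and cross-tabulates them again. The variable names (`records`, `observed`, and so on) are illustrative choices, not part of the original guide.

```python
from collections import Counter

# Rebuild one (gender, subject) record per student from the table's counts.
records = (
    [("Male", "Psychology")] * 67
    + [("Male", "History")] * 130
    + [("Female", "Psychology")] * 124
    + [("Female", "History")] * 50
)

# Cross-tabulate: count how often each (gender, subject) pair occurs.
counts = Counter(records)

genders = ["Male", "Female"]
subjects = ["Psychology", "History"]

# Observed frequencies as a list of rows, one row per gender.
observed = [[counts[(g, s)] for s in subjects] for g in genders]

# Marginal totals, which the expected frequencies are later built from.
row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

print(observed)       # [[67, 130], [124, 50]]
print(row_totals)     # [197, 174]
print(column_totals)  # [191, 180]
print(grand_total)    # 371
```

The row and column totals are not part of the test itself, but every expected frequency is computed from them.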

## The chi square test of independence hypotheses

The chi square test of independence is used to test whether the observed frequencies of the categories in a contingency table differ significantly from those expected if the variables were independent.

Example:

For the contingency table of gender and field of study above, the hypotheses for the chi square test of independence are:

Null hypothesis (H0): Gender and field of study are independent, so the proportion of students choosing psychology is the same for males and females.
Alternative hypothesis (Ha): Gender and field of study are associated, so the proportion of students choosing psychology differs between males and females.

These hypotheses differ from those of the chi-square goodness of fit test, which involves only one categorical variable. For example, you can collect data on blood types from a sample of 500 individuals and compare the observed frequencies with the frequencies expected from the ABO blood type distribution in the general population:

Null hypothesis (H0): The distribution of blood types in the population is consistent with the expected distribution.
Alternative hypothesis (Ha): The distribution of blood types in the population differs from the expected distribution.
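For a test of independence between two categorical variables, the hypotheses can also be written formally. The notation is an assumption introduced here for illustration: A and B denote the two variables, and i and j index their categories.

$$H_0: P(A = i,\, B = j) = P(A = i)\,P(B = j) \quad \text{for all } i, j$$

$$H_a: P(A = i,\, B = j) \neq P(A = i)\,P(B = j) \quad \text{for at least one pair } (i, j)$$

In words, independence means that knowing the category of one variable does not change the probabilities of the categories of the other.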

### Expected values

Expected values in the context of the chi square test of independence refer to the frequencies that would be expected if the two categorical variables were independent.

The formula for calculating the expected frequency for each cell of a contingency table is:

$$E = \frac{\text{row total} \times \text{column total}}{\text{sample size}}$$

Example:

Consider a study on the relationship between education level and voting behavior. A researcher collects data from a sample of 500 individuals and records their education level (high school, college, graduate school) and voting behavior (voted, did not vote).2
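The guide gives no counts for this study, so the following sketch reuses the gender and field of study counts from the contingency table example. It computes each expected frequency as the row total times the column total, divided by the sample size. The function name `expected_frequencies` is a hypothetical helper, and the same code would apply to a table with three education levels and two voting outcomes.

```python
def expected_frequencies(observed):
    """Return the expected count of every cell under independence."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count = row total * column total / sample size.
    return [
        [r * c / grand_total for c in column_totals]
        for r in row_totals
    ]


# Rows: male, female. Columns: psychology, history.
observed = [[67, 130], [124, 50]]
expected = expected_frequencies(observed)

for row in expected:
    print([round(value, 2) for value in row])
# [101.42, 95.58]
# [89.58, 84.42]
```

The expected counts always add up to the same row totals, column totals, and sample size as the observed counts.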

## When is the chi square test of independence used?

The chi square test of independence can be used when certain criteria and circumstances are met:

• The variables under investigation are categorical (nominal or ordinal)
• The observations are independent of each other, so each individual is counted in exactly one cell
• The expected frequency count for each cell in a contingency table is at least 5

If these criteria are met, the chi square test of independence can be used to test whether there is a significant association between the two categorical variables.
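The expected-frequency criterion can be checked before running the test. The sketch below is one possible check: the helper names and the `minimum` parameter are assumptions for illustration, and the second table is a hypothetical small sample, not data from the guide.

```python
def smallest_expected_count(observed):
    """Return the smallest expected cell count under independence."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return min(
        r * c / grand_total
        for r in row_totals
        for c in column_totals
    )


def expected_counts_ok(observed, minimum=5):
    """True if every expected cell count is at least `minimum`."""
    return smallest_expected_count(observed) >= minimum


# Gender and field of study counts from the contingency table example.
print(expected_counts_ok([[67, 130], [124, 50]]))  # True

# Hypothetical small sample: the smallest expected count is 5 * 7 / 15.
print(expected_counts_ok([[3, 2], [4, 6]]))        # False
```

Note that the check uses the expected counts, not the observed ones, so a table with a small observed cell can still pass.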

Example:

You can use the chi square test of independence to investigate the relationship between gender and religion.3

## Calculating the test statistic of the chi square test of independence

The formula for calculating the test statistic of the chi square test of independence is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

| Symbol | Meaning |
| --- | --- |
| O | Observed frequency in each cell |
| E | Expected frequency in each cell |

The chi-square test statistic measures the difference between the observed and expected frequencies in a contingency table.

To calculate the test statistic for the chi square test of independence, follow these five steps:

1. Create a contingency table with the observed frequencies for the two categorical variables.
2. Calculate the expected frequencies for each cell in the contingency table.
3. Calculate the difference between each cell’s observed and expected frequencies, and square the difference.
4. Divide the squared difference by the expected frequency for each cell.
5. Sum the values obtained in step 4 to get the chi-square test statistic.
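The five steps above can be sketched in a few lines of Python. The function name `chi_square_statistic` is a hypothetical helper, and the input is the gender and field of study table from the contingency table example, used here only as illustration.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a table of observed counts."""
    # Step 1: the observed table is the input; collect its totals.
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Step 2: expected frequency of this cell under independence.
            e = row_totals[i] * column_totals[j] / grand_total
            # Steps 3 and 4: squared difference, scaled by the expected count.
            contribution = (o - e) ** 2 / e
            # Step 5: add the cell's contribution to the running sum.
            statistic += contribution
    return statistic


# Rows: male, female. Columns: psychology, history.
chi2 = chi_square_statistic([[67, 130], [124, 50]])
print(round(chi2, 2))  # 51.34
```

Keeping the per-cell contribution as its own variable makes it easy to print and see which cells drive the statistic.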

### 1. Table of frequencies

To conduct the chi square test of independence, the first step is to establish a contingency table containing the counts or frequencies of each category of one variable for each category of the other variable.

Example:

We want to investigate the relationship between a new medical intervention and patient outcome. We collect data from 235 patients and record whether they received the intervention (yes or no) and whether they had a positive outcome (yes or no). We create a contingency table for the chi square test of independence with the observed frequencies:

| Intervention | Outcome | Observed frequency |
| --- | --- | --- |
| Yes | No | 60 |
| No | No | 40 |
| No | Yes | 30 |
| Yes | Yes | 105 |

### 2. Calculating O – E

This step of the chi square test of independence quantifies the extent to which the observed frequencies differ from what would be expected under the assumption of independence between the two variables.

Each expected frequency is the row total multiplied by the column total, divided by the sample size. In this example, 165 patients received the intervention and 70 did not, while 135 patients had a positive outcome and 100 did not. For the first row, E = 165 × 100 / 235 ≈ 70.21.

To calculate O – E, columns for the expected frequency and for the difference between the observed and expected frequencies are added to the table:

| Intervention | Outcome | Observed (O) | Expected (E) | O – E |
| --- | --- | --- | --- | --- |
| Yes | No | 60 | 70.21 | -10.21 |
| No | No | 40 | 29.79 | 10.21 |
| No | Yes | 30 | 40.21 | -10.21 |
| Yes | Yes | 105 | 94.79 | 10.21 |

### 3. Calculating (O – E)²

To calculate (O – E)², an additional column is added to the table. This third step squares the difference between each cell’s observed and expected frequencies, so that positive and negative differences do not cancel out. All values are calculated from the unrounded figures and then rounded to two decimal places.

| Intervention | Outcome | Observed (O) | Expected (E) | O – E | (O – E)² |
| --- | --- | --- | --- | --- | --- |
| Yes | No | 60 | 70.21 | -10.21 | 104.30 |
| No | No | 40 | 29.79 | 10.21 | 104.30 |
| No | Yes | 30 | 40.21 | -10.21 | 104.30 |
| Yes | Yes | 105 | 94.79 | 10.21 | 104.30 |

### 4. Calculating (O – E)² / E

To calculate (O – E)² / E, an additional column is added to the table to represent the result of dividing the squared difference between the observed frequency and the expected frequency by the expected frequency for each cell.

| Intervention | Outcome | Observed (O) | Expected (E) | O – E | (O – E)² | (O – E)² / E |
| --- | --- | --- | --- | --- | --- | --- |
| Yes | No | 60 | 70.21 | -10.21 | 104.30 | 1.49 |
| No | No | 40 | 29.79 | 10.21 | 104.30 | 3.50 |
| No | Yes | 30 | 40.21 | -10.21 | 104.30 | 2.59 |
| Yes | Yes | 105 | 94.79 | 10.21 | 104.30 | 1.10 |

This step scales the contribution of each cell to the overall chi-square test statistic.

### 5. Calculating χ²

The last step in the chi square test of independence is to sum the values in the (O – E)² / E column to obtain the overall chi-square test statistic. The larger this statistic, the more the observed frequencies deviate from those expected under independence.
Continuing with the same example of the medical intervention and patient outcome, we can sum the values in the column as follows:

χ² = 1.49 + 3.50 + 2.59 + 1.10
χ² = 8.68

## Performing the chi square test of independence

When performing the chi square test of independence, a large value of the chi-square test statistic indicates that the observed frequencies in the contingency table differ strongly from the frequencies expected under independence between the two categorical variables. Whether the difference is statistically significant is decided by comparing the statistic with a critical value or by calculating a p-value.

The six steps to perform the chi square test of independence are:

1. State the null and alternative hypotheses.
2. Create a contingency table.
3. Calculate the expected frequencies.
4. Calculate the chi-square statistic.
5. Determine the degrees of freedom and the critical value or p-value.
6. Interpret the results.

### 1. Calculating the expected frequencies

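The first step in using the chi square test of independence is to calculate the expected frequencies for each cell in the contingency table. The expected frequency for a cell is its row total multiplied by its column total, divided by the sample size:

$$E = \frac{\text{row total} \times \text{column total}}{\text{sample size}}$$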

### 2. Calculating the chi-square

The second step of the chi square test of independence is to calculate the test statistic (χ²) using the formula $\chi^2 = \sum (O - E)^2 / E$, where O is the observed frequency and E is the expected frequency of a cell, and the sum runs over all cells.

### 3. The critical chi-square value

The critical chi-square value can be found in a chi-square distribution table or with software, based on the chosen significance level (α) and the degrees of freedom (df). The degrees of freedom for the chi square test of independence are df = (r − 1) × (c − 1), where r is the number of rows and c is the number of columns in the contingency table.

The significance level is typically set at 0.05 or 0.01.

For example, for a 2×2 contingency table, the critical chi-square value with df = 1 and α = 0.05 is 3.84.
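The following sketch computes the degrees of freedom from the shape of a table and checks the critical value quoted above. It relies on the fact that a chi-square variable with one degree of freedom is a squared standard normal variable, so the p-value helper is valid for df = 1 only. Both function names are hypothetical.

```python
import math


def degrees_of_freedom(observed):
    """df = (rows - 1) * (columns - 1) for a contingency table."""
    rows = len(observed)
    columns = len(observed[0])
    return (rows - 1) * (columns - 1)


def p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with df = 1.

    Valid for one degree of freedom only, i.e. for 2x2 tables.
    """
    return math.erfc(math.sqrt(statistic / 2))


print(degrees_of_freedom([[67, 130], [124, 50]]))  # 1
print(round(p_value_df1(3.84), 3))                 # 0.05
```

For tables with more than one degree of freedom, a distribution table or statistical software is needed instead of this shortcut.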

### 4. Comparing the chi-square value to the critical value

The next step in the chi square test of independence is to compare the calculated chi-square test statistic to the critical value obtained from the chi-square distribution table or software. If the calculated chi-square test statistic is greater than the critical value, the null hypothesis is rejected and it is concluded that there is a significant association between the two categorical variables.

### 5. Should the null hypothesis be rejected?

If the calculated chi-square test statistic is greater than the critical value, the null hypothesis is rejected, indicating a significant association between the two categorical variables. If the calculated chi-square test statistic is less than or equal to the critical value, the null hypothesis is not rejected, meaning the data do not provide sufficient evidence of an association between the two categorical variables.

For example, if the calculated chi-square test statistic is 10.26 and the critical chi-square value is 3.84, we would reject the null hypothesis and conclude that there is a significant association between the two variables.
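The decision rule can be written as a short function. This is a minimal sketch: the function name `decide` and the returned strings are illustrative, and the value 2.50 in the second call is a hypothetical statistic, not a result from the guide.

```python
def decide(statistic, critical_value):
    """Apply the decision rule of the chi square test of independence."""
    if statistic > critical_value:
        return "reject H0"
    # A statistic equal to the critical value does not lead to rejection.
    return "fail to reject H0"


print(decide(10.26, 3.84))  # reject H0
print(decide(2.50, 3.84))   # fail to reject H0
```

Failing to reject the null hypothesis is not the same as showing that the variables are independent. It only means the sample gives too little evidence of an association.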

## Chi Square Test of Independence vs. Other Tests

Other tests are better suited to scenarios in which the chi square test of independence does not apply:

| Test | When to use it |
| --- | --- |
| Chi-square goodness of fit | When there is only one categorical variable and we want to test whether the observed frequencies fit a known or expected distribution. |
| Fisher’s exact test | When the sample size is small (typically less than 20) and the expected frequency for one or more cells is less than 5. |
| McNemar’s test | When the data are paired or matched, such as in a before-and-after study or a matched case-control study. |
| G test | As a likelihood-ratio alternative to the chi-square test. Like the chi-square test, it is an approximation that works best with large samples, so Fisher’s exact test is preferable when expected frequencies are below 5. |

## FAQs

#### How do you perform a chi square test of independence in R?

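To perform a chi square test of independence in R, you can use the chisq.test() function, passing either a contingency table or the two categorical variables you want to test for independence. The function returns the test statistic, degrees of freedom, and p-value for the test. For 2×2 tables, chisq.test() applies Yates’ continuity correction by default, so set correct = FALSE to obtain the uncorrected statistic described in this guide.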

#### What is the chi square test of independence?

A chi square test of independence is a statistical method used to determine if there is a significant association between two categorical variables.

#### How is the chi square test of independence performed?

To perform a chi square test of independence, the researcher creates a contingency table and calculates the chi-square statistic by comparing observed and expected frequencies.

The p-value is then calculated to determine whether the null hypothesis is rejected or not rejected in the chi square test of independence.

#### What is the interpretation of the results of the chi square test of independence?

If the p-value is less than the chosen significance level (typically 0.05), the association between the two variables is statistically significant. If the p-value is greater than the significance level, the data do not provide sufficient evidence of an association. The effect size can also be calculated to determine the strength of the association.
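The guide does not name a specific effect size. One common choice, assumed here for illustration, is Cramér’s V, which divides the chi-square statistic by the sample size and by the smaller of (rows − 1) and (columns − 1), then takes the square root. The function name `cramers_v` is a hypothetical helper, and the input is the gender and field of study table from the contingency table example.

```python
import math


def cramers_v(observed):
    """Cramér's V effect size for a contingency table (0 = no association)."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Pearson chi-square statistic, summed over all cells.
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * column_totals[j] / grand_total
            statistic += (o - e) ** 2 / e

    # Scale by the sample size and the smaller table dimension minus one.
    smaller_dimension = min(len(observed), len(observed[0])) - 1
    return math.sqrt(statistic / (grand_total * smaller_dimension))


v = cramers_v([[67, 130], [124, 50]])
print(round(v, 2))  # 0.37
```

Unlike the p-value, this measure does not grow just because the sample is larger, which is why it is reported alongside the test result.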

## Sources

1 Stover, Christopher. “Contingency Table.” MathWorld – A Wolfram Web Resource. Accessed February 28, 2023. https://mathworld.wolfram.com/ContingencyTable.html.

2 Mindrila, Diana and Phoebe Balentyne. “Chi Square Test Lecture Notes.” Carrollton, GA: University of West Georgia, n.d. Accessed February 28, 2023. https://www.westga.edu/academics/research/vrc/assets/docs/ChiSquareTest_LectureNotes.pdf.

3 Statistics Solutions. “Chi-Square.” Accessed February 28, 2023. https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/chi-square/.

4 Glen, Stephanie. “Chi-Square Test / Goodness of Fit.” StatisticsHowTo.com. Accessed February 28, 2023. https://www.statisticshowto.com/probability-and-statistics/chi-square/.
