The *chi-square* (chi, the Greek letter pronounced "kye") statistic is a nonparametric statistical technique used to determine whether a distribution of observed frequencies differs from the theoretical expected frequencies. Chi-square statistics use nominal (categorical) or ordinal level data; thus, instead of means and variances, this test uses frequencies.

The value of the chi-square statistic is given by

*X*² = Σ [(*O* − *E*)² / *E*] (1)

where *X*² is the chi-square statistic, *O* is the observed frequency, and *E* is the expected frequency.

Generally, the *chi-square statistic* summarizes the discrepancies between the expected number of times each outcome occurs (assuming that the model is true) and the observed number of times each outcome occurs, by summing the squares of the discrepancies, normalized by the expected numbers, over all the categories (Dorak, 2006).

Data used in a chi-square analysis have to satisfy the following conditions:

- randomly drawn from the population,
- reported as raw frequency counts,
- the measured variables must be independent,
- the observed frequencies must not be too small, and
- the values of the independent and dependent variables must be mutually exclusive.

There are two types of chi-square test:

- *The chi-square test for goodness of fit*, which compares the expected and observed values to determine how well an experimenter's predictions fit the data.
- *The chi-square test for independence*, which compares two sets of categories to determine whether the two groups are distributed differently among the categories. (McGibbon, 2006)

**1. Chi-square test for Goodness of Fit**

*Goodness of fit* means how well a statistical model fits a set of observations. A measure of *goodness of fit* typically summarizes the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g., to test for normality of residuals or to test whether two samples are drawn from identical distributions.

Suppose a coin is tossed 100 times; the expected outcome is 50 heads and 50 tails. If 47 heads and 53 tails are observed instead, does this deviation occur because the coin is biased, or simply by chance?

**1.1 Establish Hypotheses**

The null hypothesis for the above experiment is that the observed values are close to the predicted values. The alternative hypothesis is that they are not close to the predicted values. These hypotheses hold for all chi-square *goodness of fit* tests. Thus, in this case, the null and alternative hypotheses correspond to:

Null hypothesis: The coin is fair.

Alternative hypothesis: The coin is biased.

|          | Heads | Tails |
|----------|-------|-------|
| Observed | 47    | 53    |
| Expected | 50    | 50    |

**1.2 Calculate the chi-square statistic**

We calculate chi-square by substituting the values of *O* and *E* into Equation 1:

For heads: (47 − 50)²/50 = 0.18

For tails: (53 − 50)²/50 = 0.18

The sum over the two categories is *X*² = 0.18 + 0.18 = 0.36
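The calculation above can be sketched in a few lines of Python; the `chi_square` helper name is ours, not from the text:

```python
# Chi-square goodness-of-fit statistic (Equation 1) for the coin example.
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [47, 53]   # heads, tails observed in 100 tosses
expected = [50, 50]   # expected counts for a fair coin

stat = chi_square(observed, expected)
print(round(stat, 2))  # 0.36
```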

**1.3 Assessing significance levels**

Significance of the *chi-square test for goodness of fit* value is established by calculating the *degrees of freedom* *v* (the Greek letter nu) and by using the chi-square distribution table (Bissonnette, 2006). In a *chi-square goodness of fit test*, *v* equals the number of categories, c, minus one (*v* = c − 1). To check whether the null hypothesis is valid, the calculated statistic is compared with the critical chi-square value from the table corresponding to the calculated *v*. If the calculated chi-square is greater than the value in the table, the null hypothesis is rejected and it is concluded that the predictions made were incorrect.

In the above experiment, *v* = (2 − 1) = 1. The critical chi-square value at *α* = 0.05 and *v* = 1 is 3.84, which is greater than *X*² = 0.36. Therefore the null hypothesis is not rejected; the coin toss was fair.
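As a sketch of this significance check (the 3.84 critical value is taken from the text): for *v* = 1 the chi-square variable is the square of a standard normal, so the tail probability reduces to a closed form and no table is needed.

```python
import math

chi2_stat = 0.36       # statistic for the coin example
critical_value = 3.84  # table value for alpha = 0.05 and v = 1 (from the text)

# For v = 1, chi-square is the square of a standard normal variable,
# so the tail probability is p = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_stat / 2))

print(chi2_stat > critical_value)  # False -> do not reject the null hypothesis
print(round(p_value, 2))           # 0.55
```

A p-value of about 0.55 is far above 0.05, matching the table-based conclusion that the deviation is consistent with chance.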

**2. Chi-square test for Independence**

The chi-square test for independence is used to determine the relationship between two variables of a sample. In this context independence means that the two factors are not related. Typically in social science research, we're interested in finding factors which are related, e.g. education and income, occupation and prestige, age and voting behaviour.

Example: We want to know whether boys or girls get into trouble more often in school. Table 2 documents the observed frequencies of boys and girls who got into trouble in school, with the expected frequencies (calculated in Section 2.2) shown in parentheses.

|       | Got into trouble | Not in trouble | Total |
|-------|------------------|----------------|-------|
| Boys  | 46 (40.97)       | 71 (76.03)     | 117   |
| Girls | 37 (42.03)       | 83 (77.97)     | 120   |
| Total | 83               | 154            | 237   |

To examine statistically whether boys got in trouble more often in school, we need to establish hypotheses for the question.

**2.1 Establish Hypotheses**

The *null hypothesis* is that the two variables
are independent or in this particular case is that the likelihood
of getting in trouble is the same for boys and girls. The
alternative hypothesis to be tested is that the likelihood of
getting in trouble is not the same for boys and girls.

Cautionary Note

It is important to keep in mind that the chi-square test for independence only tests whether two variables are independent or not; it cannot address questions of which is greater or less. Using the chi-square test for independence, we cannot evaluate directly from the hypothesis whether boys or girls get into trouble more often.

**2.2 Calculate the expected value for each cell of the
table**

As with the *goodness of fit* example described earlier, the key idea of the chi-square test for independence is a comparison of observed and expected values. In the case of tabular data, however, we usually do not know in advance what the distribution should look like (as we did with tossing the coin). Rather, expected values are calculated based on the row and column totals of the table.

The expected value for each cell of the table can be calculated using the following equation:

expected value = (row total × column total) / table total (2)

The expected values (in parentheses) for each cell are also presented in Table 2.
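Equation 2 applied to every cell of Table 2 can be sketched as follows (the variable names are ours):

```python
# Expected cell counts from the marginal totals (Equation 2).
# Rows are boys/girls; columns are got into trouble / not in trouble.
observed = [[46, 71],
            [37, 83]]

row_totals = [sum(row) for row in observed]        # [117, 120]
col_totals = [sum(col) for col in zip(*observed)]  # [83, 154]
grand_total = sum(row_totals)                      # 237

expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
# [40.97, 76.03]
# [42.03, 77.97]
```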

**2.3 Calculate Chi-square statistic**

With the values in Table 2, the chi-square statistic can be calculated using Equation 1 as follows:

*X*² = (46 − 40.97)²/40.97 + (37 − 42.03)²/42.03 + (71 − 76.03)²/76.03 + (83 − 77.97)²/77.97 = 1.87

**2.4 Assessing significance levels**

In the *chi-square test for independence*, the degrees of freedom equal the number of rows in the table minus one, multiplied by the number of columns in the table minus one:

dof = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1.

The value calculated from the formula above is then compared with the values in the chi-square distribution table (Bissonnette, 2006). For *X*² = 1.87 and one degree of freedom, the table gives p < 20%, which is above the usual 0.05 significance level. Therefore the null hypothesis is not rejected; boys are not significantly more likely than girls to get into trouble in school.
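The whole test for independence, from the raw counts of Table 2 through the p-value, can be sketched in one pass (again using the df = 1 closed form for the tail probability):

```python
import math

observed = [[46, 71], [37, 83]]  # Table 2: boys/girls x trouble/not in trouble

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Equation 1, with the expected counts of Equation 2 computed inline.
chi2_stat = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2) for j in range(2)
)

# dof = (2 - 1)(2 - 1) = 1, so p = erfc(sqrt(x / 2)) as before.
p_value = math.erfc(math.sqrt(chi2_stat / 2))

print(round(chi2_stat, 2))  # 1.87
print(p_value > 0.05)       # True -> do not reject independence
```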

Exercise: In a certain city there are about one million eligible voters. A simple random sample of 10,000 eligible voters was chosen to study the relationship between sex and participation in the previous election.

|              | Men  | Women |
|--------------|------|-------|
| Voted        | 2792 | 3591  |
| Did not vote | 1486 | 2131  |

Establish whether being a man or a woman is independent of having voted in the previous election. In other words, are sex and voting independent?
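One way to check your work on this exercise is to reuse the independence-test recipe from Section 2; this sketch prints the statistic and the decision rather than stating them here:

```python
import math

# Exercise table: rows = voted / did not vote, columns = men / women.
observed = [[2792, 3591],
            [1486, 2131]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Equation 1 with expected counts from Equation 2.
chi2_stat = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2) for j in range(2)
)
p_value = math.erfc(math.sqrt(chi2_stat / 2))  # dof = (2 - 1)(2 - 1) = 1

print(round(chi2_stat, 2))
print("independent" if p_value > 0.05 else "not independent")
```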

References:

Bissonnette V L, *Statistical Tables*,
http://fsweb.berry.edu/academic/education/vbissonnette/tables/chisqr.pdf,
Department of Psychology, Berry College, last accessed 19 February
2006.

Dorak MT, *Common Concepts in Statistics*,
http://dorakmt.tripod.com/mtd/glosstat.html, last accessed
23 February 2006.

QMSS (Quantitative Methods in Social Sciences), *The Chi-Square Test*,
http://ccnmtl.columbia.edu/projects/qmss/chi_test.html, last
accessed 21 February 2006.

Rodríguez C, *Chi-Square Test for Independence*,
http://omega.albany.edu:8008/mat108dir/chi2independence/chi2in-m2h.html,
last accessed 21 February 2006.

McGibbon CA, *Statistical Resources*,
http://www.stats-consult.com/tutorial-10/tutorial-10.htm,
Statistical Consulting Services, last accessed 17 February
2006.

Wikipedia, *Goodness of fit*, http://en.wikipedia.org/wiki/Goodness_of_fit, last accessed 22 February 2006.

Coauthor: Jones Kalunga