What is the chi-square statistic?

Module by: Mphekwane Mamahlodi


The chi-square (chi, the Greek letter χ, pronounced "kye") statistic is a nonparametric statistical technique used to determine whether a distribution of observed frequencies differs from the theoretically expected frequencies. Chi-square statistics use nominal (categorical) or ordinal level data; thus, instead of means and variances, this test uses frequencies.

The value of the chi-square statistic is given by

χ² = Σ [ (O − E)² / E ]    (1)

where χ² is the chi-square statistic, O is the observed frequency and E is the expected frequency.

Generally, the chi-square statistic summarizes the discrepancies between the expected number of times each outcome occurs (assuming that the model is true) and the observed number of times each outcome occurs, by summing the squares of the discrepancies, normalized by the expected numbers, over all the categories (Dorak, 2006).
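
The module itself contains no code, but Equation (1) translates directly into a few lines of Python (using Python at all is an assumption of this sketch, not part of the original module; the function name chi_square is illustrative only):

    # Minimal sketch of Equation (1): chi-square from observed and expected counts.
    def chi_square(observed, expected):
        # Sum (O - E)^2 / E over all categories.
        return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Coin-toss data used later in this module: 47 heads and 53 tails observed,
    # 50 of each expected.
    print(chi_square([47, 53], [50, 50]))  # prints 0.36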

Data used in a chi-square analysis must satisfy the following conditions:

  1. the data are randomly drawn from the population,
  2. the data are reported in raw counts of frequency,
  3. the measured variables must be independent,
  4. the observed frequencies cannot be too small, and
  5. the values of the independent and dependent variables must be mutually exclusive.

There are two types of chi-square tests:

  • The chi-square test for goodness of fit, which compares expected and observed values to determine how well an experimenter's predictions fit the data.
  • The chi-square test for independence, which compares two sets of categories to determine whether the two groups are distributed differently among the categories (McGibbon, 2006).

1. Chi-square test for Goodness of Fit

Goodness of fit refers to how well a statistical model fits a set of observations. A measure of goodness of fit typically summarizes the discrepancy between the observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals or to test whether two samples are drawn from identical distributions.

Suppose a coin is tossed 100 times; the expected outcome is 50 heads and 50 tails. If 47 heads and 53 tails are observed instead, does this deviation occur because the coin is biased, or is it due to chance?

1.1 Establish Hypotheses

The null hypothesis for the above experiment is that the observed values are close to the predicted values. The alternative hypothesis is that they are not. These hypotheses hold for all chi-square goodness of fit tests. In this case, the null and alternative hypotheses correspond to:

Null hypothesis: the coin is fair.

Alternative hypothesis: the coin is biased.

Table 1: Tabulated results of observed and expected frequencies

           Heads   Tails
Observed     47      53
Expected     50      50

1.2 Calculate the chi-square statistic

We calculate chi-square by substituting the values of O and E for each category:

For heads: (47 − 50)²/50 = 0.18

For tails: (53 − 50)²/50 = 0.18

Summing over the two categories gives χ² = 0.18 + 0.18 = 0.36.

1.3 Assessing significance levels

The significance of the chi-square goodness of fit value is established by calculating the degrees of freedom, ν (the Greek letter nu), and consulting a chi-square distribution table (Bissonnette, 2006). In a chi-square goodness of fit test, ν equals the number of categories, c, minus one (ν = c − 1). The calculated chi-square is compared with the critical value in the table corresponding to ν: if the calculated chi-square is greater than the tabled value, the null hypothesis is rejected and it is concluded that the predictions made were incorrect.

In the above experiment, ν = 2 − 1 = 1. The critical chi-square value at α = 0.05 and ν = 1 is 3.84, which is greater than χ² = 0.36. The null hypothesis is therefore not rejected, and we conclude that the coin toss was fair.
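
For readers who want to check this example by machine, the same test is available in SciPy (the original module does not use SciPy; relying on it here is an assumption of this sketch):

    from scipy.stats import chisquare, chi2

    observed = [47, 53]   # heads, tails
    expected = [50, 50]

    # Chi-square goodness of fit test; returns the statistic and the p-value.
    stat, p_value = chisquare(f_obs=observed, f_exp=expected)
    print(stat)      # 0.36
    print(p_value)   # about 0.55, well above 0.05

    # Critical value for alpha = 0.05 and v = 1 degree of freedom.
    print(chi2.ppf(0.95, df=1))   # about 3.84; 0.36 < 3.84, so H0 is not rejected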

2. Chi-square test for Independence

The chi-square test for independence is used to determine the relationship between two variables of a sample. In this context, independence means that the two factors are not related. Typically in social science research, we are interested in finding factors that are related, e.g. education and income, occupation and prestige, or age and voting behaviour.

Example: We want to know whether boys or girls get into trouble more often in school. Table 2 documents the observed frequencies of boys and girls who got into trouble in school; the expected frequencies, calculated in Section 2.2, are shown in parentheses.

Table 2: Tabulated observed and (expected) frequencies [QMSS, 2006]

          Got into trouble   Not in trouble   Total
Boys          46 (40.97)        71 (76.03)     117
Girls         37 (42.03)        83 (77.97)     120
Total         83               154             237

To examine statistically whether boys got in trouble more often in school, we need to establish hypotheses for the question.

2.1 Establish Hypotheses

The null hypothesis is that the two variables are independent, or, in this particular case, that the likelihood of getting into trouble is the same for boys and girls. The alternative hypothesis to be tested is that the likelihood of getting into trouble is not the same for boys and girls.

Cautionary Note

It is important to keep in mind that the chi-square test for independence only tests whether two variables are independent or not; it cannot address questions of which is greater or less. Using the chi-square test for independence, we cannot evaluate directly from this hypothesis whether boys or girls get into trouble more often.

2.2 Calculate the expected value for each cell of the table

As with the goodness of fit example described earlier, the key idea of the chi-square test for independence is a comparison of observed and expected values. In the case of tabular data, however, we usually do not know in advance what the distribution should look like (as we did when tossing the coin). Rather, expected values are calculated from the row and column totals of the table.

The expected value for each cell of the table can be calculated using the following equation:

expected value = (row total × column total) / table total    (2)

The expected values (shown in parentheses) for each cell are also presented in Table 2.
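
As an illustration of Equation (2), the expected values in Table 2 can be reproduced from the row and column totals; the Python below is an assumption of this sketch rather than part of the original module:

    # Observed counts from Table 2.
    observed = [[46, 71],    # boys:  got into trouble, not in trouble
                [37, 83]]    # girls: got into trouble, not in trouble

    row_totals = [sum(row) for row in observed]          # [117, 120]
    col_totals = [sum(col) for col in zip(*observed)]    # [83, 154]
    grand_total = sum(row_totals)                        # 237

    # Equation (2): expected = row total * column total / table total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    for row in expected:
        print([round(e, 2) for e in row])   # [40.97, 76.03] then [42.03, 77.97]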

2.3 Calculate Chi-square statistic

With the values in Table 2, the chi-square statistic can be calculated using Equation 1 as follows:

χ² = (46 − 40.97)²/40.97 + (37 − 42.03)²/42.03 + (71 − 76.03)²/76.03 + (83 − 77.97)²/77.97 = 1.87

2.4 Assessing significance levels

In the chi-square test for independence, the degrees of freedom equal the number of rows in the table minus one, multiplied by the number of columns in the table minus one:

ν = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1.

The calculated value is then compared with the values in the chi-square distribution table (Bissonnette, 2006). The critical value at α = 0.05 and ν = 1 is 3.84, which is greater than χ² = 1.87; the table gives p < 20%, i.e. p > 0.05. Therefore the null hypothesis is not rejected, and we conclude that boys are not significantly more likely to get into trouble in school than girls.
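
The whole test can also be run in one step with SciPy's chi2_contingency (again an assumption of this sketch; the original module performs the calculation by hand). Yates' continuity correction is switched off so that the result matches Section 2.3:

    from scipy.stats import chi2_contingency

    observed = [[46, 71],   # boys
                [37, 83]]   # girls

    # correction=False disables Yates' continuity correction so that the
    # statistic matches the hand calculation in Section 2.3.
    stat, p_value, dof, expected = chi2_contingency(observed, correction=False)
    print(round(stat, 2))      # 1.87
    print(dof)                 # 1
    print(round(p_value, 2))   # about 0.17, i.e. greater than 0.05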

Exercise: In a certain city there are about one million eligible voters. A simple random sample of 10,000 eligible voters was chosen to study the relationship between sex and participation in the previous elections.

Table 3: Tabulated observed frequencies [Rodríguez, 2006]

                 Men    Women
Voted           2792     3591
Did not vote    1486     2131

Establish whether being a man or a woman is independent of having voted in the previous elections. In other words, are sex and voting independent?

References:

Bissonnette V L, Statistical Tables, http://fsweb.berry.edu/academic/education/vbissonnette/tables/chisqr.pdf, Department of Psychology, Berry College, last accessed 19 February 2006.

Dorak MT, Common Concepts in Statistics, http://dorakmt.tripod.com/mtd/glosstat.html, last accessed 23 February 2006.

QMSS (Quantitative Methods in social sciences ), The Chi-Square Test, http://ccnmtl.columbia.edu/projects/qmss/chi_test.html, last accessed 21 February 2006.

Rodríguez C, Chi-Square Test for Independence, http://omega.albany.edu:8008/mat108dir/chi2independence/chi2in-m2h.html, last accessed 21 February 2006.

McGibbon CA, Statistical Resources, http://www.stats-consult.com/tutorial-10/tutorial-10.htm, Statistical Consulting Services, last accessed 17 February 2006.

Wikipedia, http://en.wikipedia.org/wiki/Goodness_of_fit, last accessed 22 February 2006.

Coauthor: Jones Kalunga
