Part of the collection "Collaborative Statistics".

The Chi-Square Distribution: Test of Independence

Module by: Dr. Barbara Illowsky, Susan Dean

Summary: This module describes how the chi-square distribution can be used to test for independence.

Tests of independence involve using a contingency table of observed (data) values. You first saw a contingency table when you studied probability in the Probability Topics chapter.

The test statistic for a test of independence is similar to that of a goodness-of-fit test:

$$\sum_{(i \cdot j)} \frac{(O - E)^2}{E} \qquad (1)$$

where:

  • $O$ = observed values
  • $E$ = expected values
  • $i$ = the number of rows in the table
  • $j$ = the number of columns in the table

There are $i \cdot j$ terms of the form $\frac{(O - E)^2}{E}$.
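The sum in formula (1) can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original module: the function name `chi_square_statistic` and the small 2×2 tables are made up for the example.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of two same-shaped tables."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total


# Hypothetical 2x2 tables, used only to exercise the function.
observed = [[10, 20], [30, 40]]
expected = [[12, 18], [28, 42]]

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 4))
```

With 2 rows and 2 columns the loop adds up 2 · 2 = 4 terms, one per cell.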

A test of independence determines whether two factors are independent or not. You first encountered the term independence in Chapter 3. As a review, consider the following example.

Example 1

Suppose $A$ = a speeding violation in the last year and $B$ = a car phone user. If $A$ and $B$ are independent, then $P(A \text{ AND } B) = P(A)P(B)$. $A \text{ AND } B$ is the event that a driver received a speeding violation last year and is also a car phone user. Suppose, in a study of drivers who received speeding violations in the last year and who use car phones, that 755 people were surveyed. Out of the 755, 70 had a speeding violation and 685 did not; 305 were car phone users and 450 were not.

Let $y$ = expected number of car phone users who received speeding violations.

If $A$ and $B$ are independent, then $P(A \text{ AND } B) = P(A)P(B)$. By substitution,

$$\frac{y}{755} = \frac{70}{755} \cdot \frac{305}{755}$$

Solve for $y$: $y = \frac{70 \cdot 305}{755} = 28.3$

About 28 people from the sample are expected to be car phone users and to receive speeding violations.
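The arithmetic in this example can be checked with a short Python sketch. The counts come from the example; the variable names are chosen here for illustration only.

```python
# Counts taken from Example 1.
n_surveyed = 755
n_speeding = 70       # event A: speeding violation in the last year
n_phone_users = 305   # event B: car phone user

p_a = n_speeding / n_surveyed
p_b = n_phone_users / n_surveyed

# Under independence, P(A AND B) = P(A) * P(B).
p_a_and_b = p_a * p_b

# Expected number of people in both groups: y = P(A AND B) * 755.
expected_both = p_a_and_b * n_surveyed

print(round(expected_both, 1))  # 28.3
```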

In a test of independence, we state the null and alternate hypotheses in words. Since the contingency table consists of two factors, the null hypothesis states that the factors are independent and the alternate hypothesis states that they are not independent (dependent). If we do a test of independence using the example above, then the null hypothesis is:

$H_o$: Being a car phone user and receiving a speeding violation are independent events.

If the null hypothesis were true, we would expect about 28 people to be car phone users and to receive a speeding violation.

The test of independence is always right-tailed because of the way the test statistic is calculated. If the expected and observed values are not close together, the test statistic is large and falls far out in the right tail of the chi-square curve, just as in a goodness-of-fit test.

The degrees of freedom for the test of independence are:

df = (number of columns - 1)(number of rows - 1)

The following formula calculates the expected number ($E$):

$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}}$$
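As a rough sketch, both formulas can be applied to a whole contingency table at once. The helper names `expected_counts` and `degrees_of_freedom` and the 2×3 table below are hypothetical, chosen only to illustrate the calculation.

```python
def expected_counts(observed):
    """Return E = (row total)(column total) / (grand total) for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(observed):
    """Return (number of columns - 1)(number of rows - 1)."""
    return (len(observed[0]) - 1) * (len(observed) - 1)


# Hypothetical 2x3 table of counts, used only for illustration.
table = [[20, 30, 50],
         [30, 20, 50]]

print(expected_counts(table))     # [[25.0, 25.0, 50.0], [25.0, 25.0, 50.0]]
print(degrees_of_freedom(table))  # 2
```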

Example 2

In a volunteer group, adults 21 and older volunteer from one to nine hours each week to spend time with a disabled senior citizen. The program recruits among community college students, four-year college students, and nonstudents. The following table is a sample of the adult volunteers and the number of hours they volunteer per week.

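Number of Hours Worked Per Week by Volunteer Type (Observed)

| Type of Volunteer | 1-3 Hours | 4-6 Hours | 7-9 Hours | Row Total |
|---|---|---|---|---|
| Community College Students | 111 | 96 | 48 | 255 |
| Four-Year College Students | 96 | 133 | 61 | 290 |
| Nonstudents | 91 | 150 | 53 | 294 |
| Column Total | 298 | 379 | 162 | 839 |

The table contains observed ($O$) values (data).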

Problem 1

Are the number of hours volunteered independent of the type of volunteer?

Solution 1

The observed table and the question at the end of the problem, "Are the number of hours volunteered independent of the type of volunteer?" tell you this is a test of independence. The two factors are number of hours volunteered and type of volunteer. This test is always right-tailed.

$H_o$: The number of hours volunteered is independent of the type of volunteer.

$H_a$: The number of hours volunteered is dependent on the type of volunteer.

The expected table is:

Number of Hours Worked Per Week by Volunteer Type (Expected)

| Type of Volunteer | 1-3 Hours | 4-6 Hours | 7-9 Hours |
|---|---|---|---|
| Community College Students | 90.57 | 115.19 | 49.24 |
| Four-Year College Students | 103.00 | 131.00 | 56.00 |
| Nonstudents | 104.42 | 132.81 | 56.77 |

The table contains expected ($E$) values (data).

For example, the calculation for the expected frequency for the top left cell is

$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}} = \frac{255 \cdot 298}{839} = 90.57$$

Calculate the test statistic: $\chi^2 = 12.99$ (calculator or computer)

Distribution for the test: $\chi^2_4$

df = (3 columns - 1)(3 rows - 1) = (2)(2) = 4
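For readers working on a computer rather than a calculator, the expected table, the test statistic, and the degrees of freedom can be reproduced with a short Python sketch. The observed counts are those of this example; the variable names are illustrative and the code is not the procedure the module itself uses.

```python
observed = [
    [111, 96, 48],   # community college students
    [96, 133, 61],   # four-year college students
    [91, 150, 53],   # nonstudents
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# E = (row total)(column total) / (total number surveyed) for each cell.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all nine cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(col_totals) - 1) * (len(row_totals) - 1)

print(round(expected[0][0], 2))   # 90.57
print(round(chi_square, 2), df)   # 12.99 4
```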

Graph:

Nonsymmetrical chi-square curve with values of 0 and 12.99 on the x-axis representing the test statistic of number of hours worked by volunteers of different types. A vertical upward line extends from 12.99 to the curve and the area to the right of this is equal to the p-value.

Probability statement: p-value = $P(\chi^2 > 12.99) = 0.0113$

Compare $\alpha$ and the p-value: Since no $\alpha$ is given, assume $\alpha = 0.05$. p-value = 0.0113. $\alpha$ > p-value.

Make a decision: Since $\alpha$ > p-value, reject $H_o$. This means that the factors are not independent.

Conclusion: At a 5% level of significance, from the data, there is sufficient evidence to conclude that the number of hours volunteered and the type of volunteer are dependent on one another.
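The p-value and the decision can also be checked without a calculator. The sketch below relies on a closed form for the right-tail area of a chi-square distribution that holds only when the degrees of freedom are even, which is the case here (df = 4). This is a substitute for the calculator's built-in routine, not the method the module uses, and the function name `chi_square_right_tail` is made up for this example. For general df, a statistics library would normally be used instead.

```python
import math


def chi_square_right_tail(x, df):
    """P(chi-square with df degrees of freedom > x); valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    # For even df: P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    series = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series


alpha = 0.05  # assumed level of significance, as in the example
p_value = chi_square_right_tail(12.99, 4)

print(round(p_value, 4))  # 0.0113
print("reject Ho" if alpha > p_value else "do not reject Ho")
```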

For the above example, if there had been another type of volunteer, teenagers, what would the degrees of freedom be?

Note:

Calculator instructions follow.

TI-83+ and TI-84 calculator: Press the MATRX key and arrow over to EDIT. Press 1:[A]. Press 3 ENTER 3 ENTER. Enter the observed table values by row from Example 2. Press ENTER after each. Press 2nd QUIT. Press STAT and arrow over to TESTS. Arrow down to C:χ2-TEST. Press ENTER. You should see Observed:[A] and Expected:[B]. Arrow down to Calculate. Press ENTER. The test statistic is 12.9909 and the p-value = 0.0113. Do the procedure a second time but arrow down to Draw instead of Calculate.

Example 3

De Anza College is interested in the relationship between anxiety level and the need to succeed in school. A random sample of 400 students took a test that measured anxiety level and need to succeed in school. The table shows the results. De Anza College wants to know if anxiety level and need to succeed in school are independent events.

Need to Succeed in School vs. Anxiety Level

| Need to Succeed in School | High Anxiety | Med-high Anxiety | Medium Anxiety | Med-low Anxiety | Low Anxiety | Row Total |
|---|---|---|---|---|---|---|
| High Need | 35 | 42 | 53 | 15 | 10 | 155 |
| Medium Need | 18 | 48 | 63 | 33 | 31 | 193 |
| Low Need | 4 | 5 | 11 | 15 | 17 | 52 |
| Column Total | 57 | 95 | 127 | 63 | 58 | 400 |

Problem 1

How many high anxiety level students are expected to have a high need to succeed in school?

Solution 1

The column total for a high anxiety level is 57. The row total for high need to succeed in school is 155. The sample size or total surveyed is 400.

$$E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}} = \frac{155 \cdot 57}{400} = 22.09$$

The expected number of students who have a high anxiety level and a high need to succeed in school is about 22.
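The same lookup of row and column totals can be sketched in Python. The counts are those of the table in this example; the variable names are illustrative only.

```python
# Rows: high, medium, low need to succeed.
# Columns: high, med-high, medium, med-low, low anxiety.
observed = [
    [35, 42, 53, 15, 10],
    [18, 48, 63, 33, 31],
    [4, 5, 11, 15, 17],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for high need (row 0) and high anxiety (column 0).
high_need_high_anxiety = row_totals[0] * col_totals[0] / n

print(round(high_need_high_anxiety, 2))  # 22.09
```

Changing the row and column indices gives the expected count for any other cell of the table.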

Problem 2

How many students do you expect to have a low need to succeed in school and a med-low level of anxiety?

Solution 2

The column total for a med-low anxiety level is 63. The row total for a low need to succeed in school is 52. The sample size or total surveyed is 400.

Exercise 1

  • a. $E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}}$ =
  • b. The expected number of students who have a med-low anxiety level and a low need to succeed in school is about:

Solution 1

  • a. $E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}} = \frac{52 \cdot 63}{400} = 8.19$
  • b. 8

Glossary

Contingency Table:
A method of displaying a frequency distribution as a table for two dependent (contingent) variables; the table provides an easy way to calculate conditional probabilities.
