The Chi-Square Distribution: Teacher's Guide

Module by: Dr. Barbara Illowsky, Susan Dean

Summary: This module is the complementary teacher's guide for the "The Chi-Square Distribution" chapter of the Collaborative Statistics collection (col10522) by Barbara Illowsky and Susan Dean.

This chapter is concerned with three chi-square applications: goodness-of-fit; independence; and single variance. We rely on technology to do the calculations, especially for goodness-of-fit and for independence. However, the first example in the chapter (the number of absences in the days of the week) has the student calculate the chi-square statistic in steps. The same could be done for the chi-square statistic in a test of independence.

The chi-square distribution is generally skewed to the right. There is a different chi-square curve for each number of degrees of freedom (df). When df is 90 or more, the chi-square distribution is very closely approximated by the normal distribution. For the chi-square distribution, $\mu = df$ and $\sigma = \sqrt{2 \cdot df}$.
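As a rough illustration of the mean, the standard deviation, and the normal approximation, the sketch below compares chi-square right-tail areas for df = 90 with the tail areas of a normal curve that has the same mean and standard deviation. The function names are invented for this example, and the closed form used for the chi-square tail is valid only for even df.

```python
import math

def chi2_mean_sd(df):
    """Mean and standard deviation of a chi-square distribution with df degrees of freedom."""
    return df, math.sqrt(2 * df)

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for chi-square with an even df, using the closed form
    exp(-x/2) * sum of (x/2)^i / i! for i = 0 .. df/2 - 1."""
    if df % 2 != 0:
        raise ValueError("this closed form only holds for even df")
    if x <= 0:
        return 1.0
    half = x / 2
    term = math.exp(-half)      # i = 0 term
    total = term
    for i in range(1, df // 2):
        term *= half / i        # builds (x/2)^i / i! incrementally
        total += term
    return total

def normal_upper_tail(x, mu, sigma):
    """P(Y > x) for a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2)))

df = 90
mu, sigma = chi2_mean_sd(df)
for x in (80, 90, 100, 110):
    exact = chi2_upper_tail_even_df(x, df)
    approx = normal_upper_tail(x, mu, sigma)
    print(f"x = {x}: chi-square tail = {exact:.4f}, normal tail = {approx:.4f}")
```

The two columns should be close but not identical, because the chi-square curve keeps a slight right skew even at df = 90.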

Goodness-of-Fit Test

A goodness-of-fit hypothesis test is used to determine whether or not data "fit" a particular distribution.

Example 1

In a past issue of the magazine GEICO Direct, there was an article concerning the percentage of teenage motor vehicle deaths and time of day. The following percentages were given from a sample:

Time of Day       | Death Rate (Percentage of Motor Vehicle Deaths)
12 a.m. to 3 a.m. | 17%
3 a.m. to 6 a.m.  | 8%
6 a.m. to 9 a.m.  | 8%
9 a.m. to 12 noon | 6%
12 noon to 3 p.m. | 10%
3 p.m. to 6 p.m.  | 16%
6 p.m. to 9 p.m.  | 15%
9 p.m. to 12 a.m. | 19%

Suppose we hypothesize that the data fit a uniform distribution. The level of significance is 1% ($\alpha = 0.01$).

  • $H_o$: The percent of teenage motor vehicle deaths fits a uniform distribution.
  • $H_a$: The percent of teenage motor vehicle deaths does not fit a uniform distribution.

The distribution for the hypothesis test is $\chi^2_7$ (df = number of categories - 1 = 8 - 1 = 7).

The table contains the observed (O) percentages. The expected (E) percentages are each 12.5 for a uniform distribution. The chi-square test statistic is calculated using

$$\sum_{8} \frac{(O - E)^2}{E}$$

$$= \frac{(17 - 12.5)^2}{12.5} + \frac{(8 - 12.5)^2}{12.5} + \frac{(8 - 12.5)^2}{12.5} + \frac{(6 - 12.5)^2}{12.5} + \frac{(10 - 12.5)^2}{12.5} + \frac{(16 - 12.5)^2}{12.5} + \frac{(15 - 12.5)^2}{12.5} + \frac{(19 - 12.5)^2}{12.5}$$

$$= 13.6$$
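For classes that have a computer available, the same sum can be checked with a few lines of Python. This is only a sketch of the arithmetic above (observed and expected percentages in two lists, as in the calculator procedure); the variable names are chosen for illustration.

```python
# Observed percentages from the table, in time-of-day order
observed = [17, 8, 8, 6, 10, 16, 15, 19]

# Uniform hypothesis: each of the 8 time periods gets 100 / 8 = 12.5 percent
expected = [100 / len(observed)] * len(observed)

# Sum of (O - E)^2 / E over all 8 categories
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: number of categories minus 1
degrees_of_freedom = len(observed) - 1

print(round(chi_square, 2), degrees_of_freedom)
```

The printed statistic should agree with the 13.6 computed by hand.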

Some TI-84 series graphing calculators have a function in STAT TESTS called $\chi^2$ GOF-Test that does the goodness-of-fit test. First enter the observed percentages from the article in one list (as whole numbers) and the expected percentages in a second list (uniform implies that each is 100 divided by 8 = 12.5, so enter 12.5 for each entry). Then run the test by going to $\chi^2$ GOF-Test.

If you are using the TI-83 series, enter the observed percentages in list1 and the expected percentages in list2. In list3 (go to the list name), enter (list1-list2)^2/list2 and press ENTER. Add the values in list3; this sum is the test statistic. Then go to 2nd DISTR $\chi^2$cdf and enter the test statistic (13.6) as the lower value, 10^99 as the upper value of the area, and the degrees of freedom (7).

Probability Statement: $P(\chi^2 > 13.6) = 0.0588$

(Always a right-tailed test)
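The right-tail area can also be approximated without a calculator. The sketch below is one possible way to do it using only Python's standard library; `chi2_upper_tail` is a helper written for this example (not a library function), and it handles whole-number df only. It plays the same role as the calculator's $\chi^2$cdf with a very large upper bound.

```python
import math

def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square distribution with a positive integer df.

    Starts from the df = 2 tail, exp(-x/2), for even df or the df = 1 tail,
    erfc(sqrt(x/2)), for odd df, then adds one term each time df grows by 2.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        total = math.exp(-half)              # tail area for df = 2
        term = half * math.exp(-half)        # term added to reach df = 4
        steps = df // 2 - 1
        divisor = 2.0
    else:
        total = math.erfc(math.sqrt(half))   # tail area for df = 1
        term = math.sqrt(half) * math.exp(-half) / math.gamma(1.5)  # to reach df = 3
        steps = (df - 1) // 2
        divisor = 1.5
    for _ in range(steps):
        total += term
        term *= half / divisor
        divisor += 1.0
    return total

# Test statistic and degrees of freedom from Example 1
p_value = chi2_upper_tail(13.6, 7)
print(f"p-value = {p_value:.4f}")
```

The result should be close to the 0.0588 in the probability statement.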

Figure 1: $p$-value $= 0.0588$
A skewed distribution graph with bottom right corner shaded

Since $\alpha < p$-value ($0.01 < 0.0588$), we do not reject $H_o$.

We conclude that there is not sufficient evidence to reject the null hypothesis. At the 1% level of significance, the data are consistent with the percent of teenage motor vehicle deaths fitting a uniform distribution; that is, with teenagers dying from motor vehicle accidents equally at any time of the day or night. However, if the level of significance were 10%, we would reject the null hypothesis and conclude that the distribution of deaths does not fit a uniform distribution.
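The way the conclusion changes with the level of significance can be shown to students with a very small sketch. The `decision` function below is a hypothetical helper that applies the usual rule (reject only when the p-value is less than $\alpha$); the 5% level is included only for comparison and is not part of the example.

```python
p_value = 0.0588  # from the probability statement for Example 1

def decision(p_value, alpha):
    """Reject the null hypothesis only when the p-value is below alpha."""
    return "reject Ho" if p_value < alpha else "do not reject Ho"

for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}: {decision(p_value, alpha)}")
```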

Test of Independence

A test of independence compares two factors to determine whether they are independent (i.e., whether the occurrence of one factor does not affect the occurrence of the other).

Example 2

The following table shows a random sample of 100 hikers and the area of hiking preferred.

Hiking Preference Area

Gender | The Coastline | Near Lakes and Streams | On Mountain Peaks
Female | 18            | 16                     | 11
Male   | 16            | 25                     | 14

The two factors are gender and preferred hiking area.

  • $H_o$: Gender and preferred hiking area are independent.
  • $H_a$: Gender and preferred hiking area are not independent.

The distribution for the hypothesis test is $\chi^2_2$.

The df's are equal to: $(\text{rows} - 1)(\text{columns} - 1) = (2 - 1)(3 - 1) = 2$

The chi-square statistic is calculated using $\sum_{(2 \cdot 3)} \frac{(O - E)^2}{E}$, where the sum runs over all $2 \cdot 3 = 6$ cells of the table.

Each expected (E) value is calculated using $\frac{(\text{row total})(\text{column total})}{\text{total surveyed}}$

The first expected value (female, the coastline) is $\frac{45 \cdot 34}{100} = 15.3$

The expected values are: 15.3, 18.45, 11.25, 18.7, 22.55, 13.75
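The expected values can be reproduced from the row and column totals of the table. The following is a minimal sketch of that calculation; the variable names are illustrative.

```python
# Observed counts from the hiking table
observed = [
    [18, 16, 11],  # Female: coastline, lakes and streams, mountain peaks
    [16, 25, 14],  # Male
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
total_surveyed = sum(row_totals)

# E = (row total)(column total) / (total surveyed) for every cell
expected = [
    [r * c / total_surveyed for c in column_totals]
    for r in row_totals
]

for row in expected:
    print([round(value, 2) for value in row])
```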

The chi-square statistic is:

$$\sum_{(2 \cdot 3)} \frac{(O - E)^2}{E} = \frac{(18 - 15.3)^2}{15.3} + \frac{(16 - 18.45)^2}{18.45} + \frac{(11 - 11.25)^2}{11.25} + \frac{(16 - 18.7)^2}{18.7} + \frac{(25 - 22.55)^2}{22.55} + \frac{(14 - 13.75)^2}{13.75}$$

$$= 1.47$$
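As a check on the hand calculation, the six terms can be summed in Python. This sketch simply pairs each observed count with its expected value from the list above; the variable names are illustrative.

```python
observed = [
    [18, 16, 11],  # Female
    [16, 25, 14],  # Male
]
expected = [
    [15.3, 18.45, 11.25],
    [18.7, 22.55, 13.75],
]

# Sum of (O - E)^2 / E over all 2 * 3 = 6 cells
chi_square = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)

# df = (rows - 1)(columns - 1)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_square, 2), degrees_of_freedom)
```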

Calculator Instructions

The TI-83/84 series have the function $\chi^2$-Test in STAT TESTS to perform this test. First, enter the observed values from the table into a matrix by using 2nd MATRIX and EDIT [A]. Enter the values, then go to $\chi^2$-Test. Matrix [B], which holds the expected values, is calculated automatically when you run the test.

Probability Statement: $p$-value $= 0.4800$ (a right-tailed test)
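For 2 degrees of freedom the right-tail area has a simple closed form, $e^{-x/2}$, so the p-value can be checked in one line. The sketch below uses the test statistic before rounding (about 1.4679); using the rounded 1.47 gives a slightly smaller value. The shortcut applies only when df = 2.

```python
import math

# Test statistic before rounding (the six terms sum to about 1.4679,
# which the text rounds to 1.47)
chi_square = 1.4679
degrees_of_freedom = 2

# For df = 2 only, the right-tail area is exp(-x / 2)
p_value = math.exp(-chi_square / 2)

print(f"p-value = {p_value:.4f}")
```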

Figure 2: $p$-value $= 0.4800$
A right-skewed distribution graph with the test statistic 1.47 marked on the horizontal axis and all values to its right shaded

Since $\alpha$ is less than the $p$-value (for example, at the 5% level of significance, $0.05 < 0.4800$), we do not reject the null hypothesis.

There is not sufficient evidence to conclude that gender and hiking preference are not independent.

Test of a Single Variance

Sometimes you might be interested in how much something varies. A test of a single variance is the type of hypothesis test you could run to test a claim about variability.

Example 3

Problem 1

A company that produces coffee vending machines claims that its machine pours an 8-ounce cup of coffee, on average, with a standard deviation of 0.3 ounces. A college that uses the vending machines claims that the standard deviation is more than 0.3 ounces, causing the coffee to spill out of the cup. The college sampled 30 cups of coffee and found that the standard deviation was 1 ounce. At the 1% level of significance, test the claim made by the vending machine company.

Solution 1

$H_o: \sigma^2 = (0.3)^2$

$H_a: \sigma^2 > (0.3)^2$

The distribution for the hypothesis test is $\chi^2_{29}$ where $df = 30 - 1 = 29$.

The test statistic is $\chi^2 = \frac{(n - 1) \cdot s^2}{\sigma^2} = \frac{(30 - 1) \cdot 1^2}{0.3^2} = 322.22$

Probability Statement: $P(\chi^2 > 322.22) \approx 0$
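The test statistic and its right-tail area can be checked with the sketch below. It uses only the standard library; `chi2_upper_tail_odd_df` is a helper written for this example and works only for odd whole-number df (here 29). The point is simply that the p-value is far smaller than any usual level of significance.

```python
import math

n = 30          # cups sampled by the college
s = 1.0         # sample standard deviation, in ounces
sigma_0 = 0.3   # standard deviation claimed by the company, in ounces

degrees_of_freedom = n - 1
chi_square = degrees_of_freedom * s ** 2 / sigma_0 ** 2

def chi2_upper_tail_odd_df(x, df):
    """P(X > x) for a chi-square distribution with odd integer df, x > 0.

    Starts from the df = 1 tail, erfc(sqrt(x/2)), and adds one term
    for each increase of df by 2.
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("this helper only handles odd df")
    half = x / 2
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) * math.exp(-half) / math.gamma(1.5)
    divisor = 1.5
    for _ in range((df - 1) // 2):
        total += term
        term *= half / divisor
        divisor += 1.0
    return total

p_value = chi2_upper_tail_odd_df(chi_square, degrees_of_freedom)
print(f"chi-square = {chi_square:.2f}, p-value = {p_value:.3g}")
```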

Figure 3: $p$-value $\approx 0$
A distribution graph with what appears to be an asymptote near the horizontal axis

Since $\alpha > p$-value ($0.01 > 0$), reject $H_o$.

There is sufficient evidence to conclude that the standard deviation is more than 0.3 ounces of coffee. The vending machine company needs to adjust their machines to prevent spillage.

Assign Practice

Have the students do Practice 1, Practice 2, and Practice 3 in class collaboratively.

Assign Homework

Suggested homework: 3, 5, 7 (GOF), 9, 13, 15 (Test of Indep.), 17, 19, 23 (Variance), 24 - 37 (General)
