The Chi-Square Distribution: Teacher's Guide

Module by: Susan Dean and Barbara Illowsky, Ph.D.

Summary: This module is the complementary teacher's guide for the "The Chi-Square Distribution" chapter of the Collaborative Statistics collection (col10522) by Barbara Illowsky and Susan Dean.

This chapter is concerned with three chi-square applications: goodness-of-fit, independence, and single variance. We rely on technology to do the calculations, especially for goodness-of-fit and for independence. However, the first example in the chapter (the number of absences on the days of the week) has the student calculate the chi-square statistic in steps. The same could be done for the chi-square statistic in a test of independence.

The chi-square distribution is generally skewed to the right. There is a different chi-square curve for each number of degrees of freedom (df). When the df are 90 or more, the chi-square distribution is a very good approximation to the normal. For the chi-square distribution, μ = df and σ = √(2·df).
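
A short Python sketch (assuming the scipy library is available) can be used to check these facts numerically; the value 100 below is just an arbitrary point at which to compare probabilities:

    from math import sqrt
    from scipy.stats import chi2, norm

    df = 90
    print(chi2.mean(df), df)               # the mean of the chi-square distribution equals df
    print(chi2.std(df), sqrt(2 * df))      # the standard deviation equals the square root of 2*df

    # With df = 90, chi-square probabilities are close to those of a normal
    # distribution with the same mean and standard deviation.
    print(chi2.cdf(100, df), norm.cdf(100, loc=df, scale=sqrt(2 * df)))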

Goodness-of-Fit Test

A goodness-of-fit hypothesis test is used to determine whether or not data "fit" a particular distribution.

Example 1

In a past issue of the magazine GEICO Direct, there was an article concerning the percentage of teenage motor vehicle deaths and time of day. The following percentages were given from a sample.

Table 1: Percentage of Teenage Motor Vehicle Deaths by Time of Day
Time of Day Percentage of Deaths
12 a.m. to 3 a.m. 17%
3 a.m. to 6 a.m. 8%
6 a.m. to 9 a.m. 8%
9 a.m. to 12 noon 6%
12 noon to 3 p.m. 10%
3 p.m. to 6 p.m. 16%
6 p.m. to 9 p.m. 15%
9 p.m. to 12 a.m. 19%

For the purpose of this example, suppose another sample of 100 produced the same percentages. We hypothesize that the data from this new sample fits a uniform distribution. The level of significance is 1% (α = 0.01).

  • Ho: The number of teenage motor vehicle deaths fits a uniform distribution.
  • Ha: The number of teenage motor vehicle deaths does not fit a uniform distribution.

The distribution for the hypothesis test is χ²₇, the chi-square distribution with 7 degrees of freedom.

The table contains the observed percentages. For the sample of 100, the observed (O) numbers are 17, 8, 8, 6, 10, 16, 15 and 19. The expected (E) numbers are each 12.5 for a uniform distribution (100 divided by 8 cells). The chi-square test statistic is calculated using

Σ over the 8 cells of (O − E)²/E

= (17 − 12.5)²/12.5 + (8 − 12.5)²/12.5 + (8 − 12.5)²/12.5 + (6 − 12.5)²/12.5 + (10 − 12.5)²/12.5 + (16 − 12.5)²/12.5 + (15 − 12.5)²/12.5 + (19 − 12.5)²/12.5

= 13.6

If you are using the TI-84 series graphing calculators, some models have a function in STAT TESTS called χ² GOF-Test that does the goodness-of-fit test. You first have to enter the observed numbers in one list (enter them as whole numbers) and the expected numbers in a second list (uniform implies they are each 12.5; enter 12.5 for each entry, since 100 divided by 8 = 12.5). Then run χ² GOF-Test.

If you are using the TI-83 series, enter the observed numbers in list1 and the expected numbers in list2. In list3 (go to the list name), enter (list1-list2)^2/list2 and press ENTER. Add the values in list3; this sum is the test statistic. Then go to 2nd DISTR χ²cdf and enter the test statistic (13.6), the upper value of the area (10^99), and the degrees of freedom (7).
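
If statistical software is available instead of a calculator, the same goodness-of-fit calculation can be reproduced with a short Python sketch (this assumes the scipy library is installed):

    from scipy.stats import chisquare

    observed = [17, 8, 8, 6, 10, 16, 15, 19]   # observed counts from the sample of 100
    expected = [12.5] * 8                      # uniform: 100 divided by 8 cells

    statistic, p_value = chisquare(f_obs=observed, f_exp=expected)
    print(statistic, p_value)                  # approximately 13.6 and 0.0588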

Probability Statement: P(χ² > 13.6) = 0.0588

(Always a right-tailed test)

Figure 1: p-value = 0.0588 (a right-skewed chi-square curve with the right tail, beyond 13.6, shaded)

Since α < p-value (0.01 < 0.0588), we do not reject Ho.

We conclude that there is not sufficient evidence to reject the null hypothesis. It appears that the number of teenage motor vehicle deaths fits a uniform distribution: teenagers die from motor vehicle accidents at about the same rate at any time of the day or night. However, if the level of significance were 10%, we would reject the null hypothesis and conclude that the distribution of deaths does not fit a uniform distribution.

Test of Independence

A test of independence compares two factors to determine whether they are independent, that is, whether one factor has no effect on the occurrence of the other.

Example 2

The following table shows a random sample of 100 hikers and the area of hiking preferred.

Table 2: Hiking Preference Area
The two factors are gender and preferred hiking area.
Gender The Coastline Near Lakes and Streams On Mountain Peaks
Female 18 16 11
Male 16 25 14
  • Ho: Gender and preferred hiking area are independent.
  • Ha: Gender and preferred hiking area are not independent.

The distribution for the hypothesis test is χ²₂, the chi-square distribution with 2 degrees of freedom.

The degrees of freedom are (rows − 1)(columns − 1) = (2 − 1)(3 − 1) = 2.

The chi-square statistic is calculated using Σ over the 2·3 cells of (O − E)²/E.

Each expected (E) value is calculated using (row total)(column total)/(total surveyed).

The first expected value (female, the coastline) is (45 · 34)/100 = 15.3.

The expected values are: 15.3, 18.45, 11.25, 18.7, 22.55, 13.75
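
As a check (assuming the numpy library is installed), all six expected counts can be computed at once from the row and column totals:

    import numpy as np

    observed = np.array([[18, 16, 11],     # Female: coastline, lakes/streams, mountain peaks
                         [16, 25, 14]])    # Male
    row_totals = observed.sum(axis=1)      # 45, 55
    column_totals = observed.sum(axis=0)   # 34, 41, 25
    expected = np.outer(row_totals, column_totals) / observed.sum()
    print(expected)                        # [[15.3, 18.45, 11.25], [18.7, 22.55, 13.75]]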

The chi-square statistic is:

Σ over the 2·3 cells of (O − E)²/E =

(18 − 15.3)²/15.3 + (16 − 18.45)²/18.45 + (11 − 11.25)²/11.25 + (16 − 18.7)²/18.7 + (25 − 22.55)²/22.55 + (14 − 13.75)²/13.75

= 1.47

Calculator Instructions

The TI-83/84 series have the function χ²-Test in STAT TESTS to perform this test. First, enter the observed values from the table into a matrix using 2nd MATRIX and EDIT [A]. Enter the values and go to χ²-Test. Matrix [B], which holds the expected values, is calculated automatically when you run the test.
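
For students working in software rather than on a calculator, a short Python sketch (assuming the scipy library is installed) performs the same test of independence and also returns the matrix of expected counts:

    from scipy.stats import chi2_contingency

    observed = [[18, 16, 11],    # Female: coastline, lakes/streams, mountain peaks
                [16, 25, 14]]    # Male

    statistic, p_value, df, expected = chi2_contingency(observed)
    print(statistic, p_value, df)   # approximately 1.47, 0.48, and 2
    print(expected)                 # 15.3, 18.45, 11.25, 18.7, 22.55, 13.75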

Probability Statement: p-value = 0.4800 (a right-tailed test)

Figure 2: p-value = 0.4800 (a chi-square curve with the area to the right of the test statistic, 1.47, shaded)

Since α is less than the p-value of 0.4800, we do not reject the null hypothesis.

There is not sufficient evidence to conclude that gender and hiking preference are dependent; the data are consistent with gender and preferred hiking area being independent.

Test of a Single Variance

Sometimes you might be interested in how much something varies. A test of a single variance is the hypothesis test you could run in order to examine variability.

Example 3

Problem 1

A company that produces coffee vending machines claims that its machines pour, on average, an 8 ounce cup of coffee with a standard deviation of 0.3 ounces. A college that uses the vending machines claims that the standard deviation is more than 0.3 ounces, causing coffee to spill out of the cup. The college sampled 30 cups of coffee and found that the standard deviation was 1 ounce. At the 1% level of significance, test the claim made by the vending machine company.

Solution

  • Ho: σ² = (0.3)²
  • Ha: σ² > (0.3)²

The distribution for the hypothesis test is χ²₂₉, where df = 30 − 1 = 29.

The test statistic is χ² = (n − 1)·s²/σ² = (30 − 1)·1²/(0.3)² = 322.22.
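
As an illustration (assuming the scipy library is installed), the test statistic and its right-tail p-value can be computed directly from this formula:

    from scipy.stats import chi2

    n, s, sigma = 30, 1.0, 0.3
    statistic = (n - 1) * s**2 / sigma**2   # (30 - 1)(1)^2 / (0.3)^2 = 322.22
    p_value = chi2.sf(statistic, n - 1)     # right-tail area with 29 degrees of freedom
    print(statistic, p_value)               # approximately 322.22 and essentially 0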

Probability Statement: P(χ² > 322.22) = 0

Figure 3: p-value = 0 (the chi-square curve is essentially on the horizontal axis near 322.22, so the shaded right-tail area is approximately 0)

Since α > p-value (0.01 > 0), reject Ho.

There is sufficient evidence to conclude that the standard deviation is more than 0.3 ounces of coffee. The vending machine company needs to adjust its machines to prevent spillage.

Assign Practice

Have the students do Practice 1, Practice 2, and Practice 3 in class collaboratively.

Assign Homework

Suggested homework: 3, 5, 7 (GOF); 9, 13, 15 (Test of Indep.); 17, 19, 23 (Variance); 24-37 (General).
