This chapter is concerned with three chi-square applications: goodness-of-fit, independence, and single variance. We rely on technology to do the calculations, especially for goodness-of-fit and for independence. However, the first example in the chapter (the number of absences in the days of the week) has the student calculate the chi-square statistic in steps. The same could be done for the chi-square statistic in a test of independence.
The chi-square distribution generally is skewed to the right. There is a different chi-square curve for each df. When the df's are 90 or more, the chi-square distribution is closely approximated by a normal distribution. For the chi-square distribution, $\mu$ = the number of df's and $\sigma$ = the square root of twice the number of df's.
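The mean and standard deviation rule above can be checked with a few lines of Python. This is only a minimal sketch; the function name `chi_square_mean_sd` is a hypothetical helper chosen for illustration, not something from the textbook.

```python
import math


def chi_square_mean_sd(df):
    """Return (mean, standard deviation) for a chi-square distribution.

    Uses the rule stated in the text: mean = df, sd = sqrt(2 * df).
    The function name is a hypothetical helper for illustration.
    """
    if df <= 0:
        raise ValueError("df must be positive")
    return df, math.sqrt(2 * df)


# The df values used in this chapter's examples, plus the df = 90 threshold.
for df in (2, 7, 29, 90):
    mean, sd = chi_square_mean_sd(df)
    print(f"df = {df}: mean = {mean}, sd = {sd:.3f}")
```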
A goodness-of-fit hypothesis test is used to determine whether or not data "fit" a particular distribution.
In a past issue of the magazine GEICO Direct, there was an article concerning the percentage of teenage motor vehicle deaths and time of day. The following percentages were given from a sample:
Time of Day Percentage of Motor Vehicle Deaths
| Time of Day | Death Rate |
|---|---|
| 12 a.m. to 3 a.m. | 17% |
| 3 a.m. to 6 a.m. | 8% |
| 6 a.m. to 9 a.m. | 8% |
| 9 a.m. to 12 noon | 6% |
| 12 noon to 3 p.m. | 10% |
| 3 p.m. to 6 p.m. | 16% |
| 6 p.m. to 9 p.m. | 15% |
| 9 p.m. to 12 a.m. | 19% |
Suppose we hypothesize that the data fits a uniform distribution. The level of significance is 1% ($\alpha = 0.01$).
- $H_o$: The percent of teenage motor vehicle deaths fits a uniform distribution.
- $H_a$: The percent of teenage motor vehicle deaths does not fit a uniform distribution.
The distribution for the hypothesis test is $\chi^2_7$, where df = number of categories $- 1 = 8 - 1 = 7$.
The table contains the observed (O) percentages. The expected (E) percentages are each 12.5 for a uniform distribution. The chi-square test statistic is calculated using
$$\sum_{8} \frac{(O - E)^2}{E} = \frac{(17 - 12.5)^2}{12.5} + \frac{(8 - 12.5)^2}{12.5} + \frac{(8 - 12.5)^2}{12.5} + \frac{(6 - 12.5)^2}{12.5} + \frac{(10 - 12.5)^2}{12.5} + \frac{(16 - 12.5)^2}{12.5} + \frac{(15 - 12.5)^2}{12.5} + \frac{(19 - 12.5)^2}{12.5} = 13.6$$
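For instructors who prefer software to a calculator, the same sum can be sketched in Python. The variable and function names here (`observed`, `expected`, `chi_square_statistic`, `gof_statistic`, `gof_df`) are assumptions made for this illustration; the numbers are the percentages from the table.

```python
# Observed percentages from the time-of-day table, in table order.
observed = [17, 8, 8, 6, 10, 16, 15, 19]

# Under a uniform distribution each of the 8 periods is expected
# to hold 100 / 8 = 12.5 percent of the deaths.
expected = [100 / len(observed)] * len(observed)


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


gof_statistic = chi_square_statistic(observed, expected)
gof_df = len(observed) - 1  # number of categories minus 1

print(f"chi-square = {gof_statistic:.2f}, df = {gof_df}")
```

Students can compare the printed statistic with the hand calculation above, one term at a time if needed.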
If you are using a TI-84 series graphing calculator, some models have a function in STAT TESTS called $\chi^2$ GOF-Test that does the goodness-of-fit test. First enter the observed percentages from the article in one list (as whole numbers) and the expected percentages in a second list (uniform implies 12.5 for each entry: 100 divided by 8 = 12.5). Then run the test from $\chi^2$ GOF-Test.
If you are using the TI-83 series, enter the observed percentages in list1 and the expected percentages in list2. In list3 (go to the list name), enter (list1-list2)^2/list2 and press enter. Add the values in list3 (this is the test statistic). Then go to 2nd DISTR $\chi^2$cdf and enter the test statistic (13.6) as the lower value, 10^99 as the upper value of the area, and the degrees of freedom (7).
Probability Statement: $P(\chi^2 > 13.6) = 0.0588$
(Always a right-tailed test)
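The right-tail area can also be sketched without a calculator. The function below, `chi_square_right_tail`, is a hypothetical helper written for this guide, not a library call. It relies on the standard finite-series form of the chi-square tail probability for whole-number degrees of freedom, and it plays the role of the calculator's $\chi^2$cdf with an upper value of 10^99.

```python
import math


def chi_square_right_tail(x, df):
    """P(chi-square with df degrees of freedom > x).

    Hypothetical helper for illustration. Assumes df is a positive
    whole number and x >= 0, and uses the finite-series form of the
    tail probability (one form for odd df, one for even df).
    """
    if df < 1 or int(df) != df or x < 0:
        raise ValueError("df must be a positive whole number and x >= 0")
    if x == 0:
        return 1.0
    root = math.sqrt(x)
    # Standard normal density evaluated at sqrt(x).
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    if df % 2 == 1:
        # Odd df: two-tailed normal area plus a finite sum.
        total = math.erfc(root / math.sqrt(2))
        term = root
        for r in range(1, (df - 1) // 2 + 1):
            total += 2 * density * term
            term *= x / (2 * r + 1)
        return total
    # Even df: exp(-x/2) times a finite sum.
    series = 1.0
    term = 1.0
    for r in range(1, (df - 2) // 2 + 1):
        term *= x / (2 * r)
        series += term
    return math.sqrt(2 * math.pi) * density * series


p_value = chi_square_right_tail(13.6, 7)
print(f"P(chi-square > 13.6) = {p_value:.4f}")
```

A statistics library would normally be used for this step; the hand-written version is shown only so the example runs with nothing beyond the Python standard library.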
Since $\alpha < p$-value ($0.01 < 0.0588$), we do not reject $H_o$.
We conclude that there is not sufficient evidence to reject the null hypothesis. At the 1% level, the data are consistent with the percent of teenage motor vehicle deaths fitting a uniform distribution; that is, the sample does not show that the time of day or night matters. However, if the level of significance were 10%, we would reject the null hypothesis ($0.0588 < 0.10$) and conclude that the distribution of deaths does not fit a uniform distribution.
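The way the decision changes with the level of significance can be shown in a short sketch. The function name `decide` is a hypothetical helper; the p-value 0.0588 is the one from the probability statement.

```python
def decide(p_value, alpha):
    """Apply the decision rule: reject H_o only when the p-value is below alpha."""
    return "reject H_o" if p_value < alpha else "do not reject H_o"


p_value = 0.0588  # from the goodness-of-fit example

for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}: {decide(p_value, alpha)}")
```

This makes a useful class discussion point: the data are the same in every row, and only the chosen level of significance changes the decision.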
A test of independence compares two factors to determine if they are independent (i.e., the outcome of one factor does not affect the outcome of the other).
The following table shows a random sample of 100 hikers and the area of hiking preferred.
Hiking Preference Area
| Gender | The Coastline | Near Lakes and Streams | On Mountain Peaks |
|---|---|---|---|
| Female | 18 | 16 | 11 |
| Male | 16 | 25 | 14 |
The two factors are gender and preferred hiking area.
- $H_o$: Gender and preferred hiking area are independent.
- $H_a$: Gender and preferred hiking area are not independent.
The distribution for the hypothesis test is $\chi^2_2$.
The df's are equal to:
$(\text{rows} - 1)(\text{columns} - 1) = (2 - 1)(3 - 1) = 2$
The chi-square statistic is calculated using
$$\sum_{(2 \cdot 3)} \frac{(O - E)^2}{E}$$
where the sum runs over all $2 \cdot 3 = 6$ cells of the table.
Each expected (E) value is calculated using
$$\frac{(\text{row total})(\text{column total})}{\text{total surveyed}}$$
The first expected value (female, the coastline) is
$$\frac{45 \cdot 34}{100} = 15.3$$
The expected values are: 15.3, 18.45, 11.25, 18.7, 22.55, 13.75
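All six expected values come from the same rule, so they can be generated in one pass. The following is a minimal sketch; the variable names are assumptions for this illustration, and the counts are those in the hiking table.

```python
# Observed counts from the hiking table.
observed = [
    [18, 16, 11],  # Female: coastline, lakes and streams, mountain peaks
    [16, 25, 14],  # Male
]

row_totals = [sum(row) for row in observed]              # 45 and 55
column_totals = [sum(col) for col in zip(*observed)]     # 34, 41 and 25
total_surveyed = sum(row_totals)                         # 100

# E = (row total)(column total) / (total surveyed) for every cell.
expected = [
    [r * c / total_surveyed for c in column_totals]
    for r in row_totals
]

for row in expected:
    print([round(e, 2) for e in row])
```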
The chi-square statistic is:
$$\sum_{(2 \cdot 3)} \frac{(O - E)^2}{E} = \frac{(18 - 15.3)^2}{15.3} + \frac{(16 - 18.45)^2}{18.45} + \frac{(11 - 11.25)^2}{11.25} + \frac{(16 - 18.7)^2}{18.7} + \frac{(25 - 22.55)^2}{22.55} + \frac{(14 - 13.75)^2}{13.75} = 1.47$$
The TI-83/84 series have the function $\chi^2$-Test in STAT TESTS to perform this test. First, enter the observed values from the table into a matrix by using 2nd MATRIX and EDIT [A]. Enter the values and go to $\chi^2$-Test. Matrix [B] is calculated automatically when you run the test.
Probability Statement: $p$-value $= 0.4800$ (A right-tailed test)
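The whole test of independence can be sketched in Python as a check on the calculator output. The names below are assumptions for this illustration. The p-value line uses a shortcut that holds only when df = 2, where the right-tail area of the chi-square distribution equals $e^{-x/2}$; for other degrees of freedom a statistics library or calculator would be needed.

```python
import math

# Observed counts from the hiking table.
observed = [
    [18, 16, 11],  # Female
    [16, 25, 14],  # Male
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
total_surveyed = sum(row_totals)

# Add (O - E)^2 / E over all 2 * 3 = 6 cells.
statistic = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * column_totals[j] / total_surveyed
        statistic += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Shortcut valid only for df = 2: P(chi-square > x) = exp(-x / 2).
assert df == 2
p_value = math.exp(-statistic / 2)

print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.4f}")
```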
Since the $p$-value is greater than $\alpha$ at any conventional level of significance (for example, $0.05 < 0.4800$), we do not reject the null.
There is not sufficient evidence to conclude that gender and hiking preference are not independent.
Sometimes you might be interested in how something varies. A test of a single variance is the type of hypothesis test you could run in order to determine variability.
A vending machine company that produces coffee vending machines claims that its machine pours an 8-ounce cup of coffee, on the average, with a standard deviation of 0.3 ounces. A college that uses the vending machines claims that the standard deviation is more than 0.3 ounces, causing the coffee to spill out of a cup. The college sampled 30 cups of coffee and found that the standard deviation was 1 ounce. At the 1% level of significance, test the claim made by the vending machine company.
$H_o: \sigma^2 = (0.3)^2$
$H_a: \sigma^2 > (0.3)^2$
The distribution for the hypothesis test is $\chi^2_{29}$ where $df = 30 - 1 = 29$.
The test statistic is
$$\chi^2 = \frac{(n - 1) \cdot s^2}{\sigma^2} = \frac{(30 - 1) \cdot 1^2}{0.3^2} = 322.22$$
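The single-variance statistic is a one-line calculation, sketched here in Python. The names `n`, `sample_sd`, `claimed_sd`, and `variance_statistic` are assumptions for this illustration; the values come from the vending machine example.

```python
n = 30            # cups of coffee sampled
sample_sd = 1.0   # sample standard deviation s, in ounces
claimed_sd = 0.3  # standard deviation claimed under H_o, in ounces

# Test statistic: (n - 1) * s^2 / sigma^2
variance_statistic = (n - 1) * sample_sd ** 2 / claimed_sd ** 2
variance_df = n - 1

print(f"chi-square = {variance_statistic:.2f}, df = {variance_df}")
```

Because the statistic is more than ten times the mean of a chi-square distribution with 29 df (which is 29), students can anticipate a very small right-tail area before computing it.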
Probability Statement: $P(\chi^2 > 322.22) \approx 0$
Since $\alpha > p$-value ($0.01 > 0$), reject $H_o$.
There is sufficient evidence to conclude that the standard deviation is more than 0.3 ounces of coffee. The vending machine company needs to adjust its machines to prevent spillage.
Have the students do the Practice 1, Practice 2, and Practice 3 in class collaboratively.
Assign Homework. Suggested homework: 3, 5, 7 (GOF), 9, 13, 15 (Test of Indep.), 17, 19, 23 (Variance), 24 - 37 (General)
"The teacher's guide is a companion to Collaborative Statistics -- http://cnx.org/content/col10522."