<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/technology/cnxml/schema/dtd/0.5/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/" xmlns:m="http://www.w3.org/1998/Math/MathML" id="new">
  <name>The Chi-Square Distribution: Goodness-of-Fit Test</name>
  <metadata>
  <md:version>1.6</md:version>
  <md:created>2008/07/05 14:16:12 GMT-5</md:created>
  <md:revised>2008/10/27 20:41:11.182 GMT-5</md:revised>
  <md:authorlist>
      <md:author id="sdean">
      <md:firstname>Susan</md:firstname>
      
      <md:surname>Dean</md:surname>
      <md:email>deansusan@deanza.edu</md:email>
    </md:author>
      <md:author id="billowsky">
      <md:firstname>Barbara</md:firstname>
      
      <md:surname>Illowsky</md:surname>
      <md:email>illowskybarbara@deanza.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="sdean">
      <md:firstname>Susan</md:firstname>
      
      <md:surname>Dean</md:surname>
      <md:email>deansusan@deanza.edu</md:email>
    </md:maintainer>
    <md:maintainer id="billowsky">
      <md:firstname>Barbara</md:firstname>
      
      <md:surname>Illowsky</md:surname>
      <md:email>illowskybarbara@deanza.edu</md:email>
    </md:maintainer>
    <md:maintainer id="cnxorg">
      <md:firstname/>
      
      <md:surname>Connexions</md:surname>
      <md:email>cnx@cnx.org</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>chi</md:keyword>
    <md:keyword>elementary</md:keyword>
    <md:keyword>fit</md:keyword>
    <md:keyword>good</md:keyword>
    <md:keyword>square</md:keyword>
    <md:keyword>statistics</md:keyword>
    <md:keyword>test</md:keyword>
  </md:keywordlist>

  <md:abstract>This module describes how the chi-square distribution is used to conduct a goodness-of-fit test.</md:abstract>
</metadata>
  <content>





<para id="element-112">In this type of hypothesis test, you determine whether the data <emphasis>"fit"</emphasis> a particular
distribution or not. For example, you may suspect your unknown data fit a binomial
distribution. You use a chi-square test (meaning the distribution for the hypothesis test is
chi-square) to determine if there is a fit or not. <emphasis>The null and the alternate hypotheses
for this test may be written in sentences or may be stated as equations or
inequalities.</emphasis>
    </para><para id="element-908">The test statistic for a goodness-of-fit test is:

</para><equation id="element-252"><m:math>
<m:munder>
<m:mo>Σ</m:mo>
<m:mi>n</m:mi>
</m:munder>

<m:mfrac>

<m:mrow>
<m:mo>(</m:mo>
<m:mi>O</m:mi>
<m:mo>−</m:mo>
<m:mi>E</m:mi>
<m:msup>
<m:mo>)</m:mo>
<m:mn>2</m:mn>
</m:msup>
</m:mrow>


<m:mrow>
<m:mi>E</m:mi>
</m:mrow>


</m:mfrac>
</m:math>

</equation><para id="element-248">where:</para>
<list id="element-645" type="bulleted"><item><m:math><m:mi>O</m:mi></m:math> = observed values (data)
</item><item id="element-481"><m:math><m:mi>E</m:mi></m:math> = expected values (from theory)
</item><item id="element-402"><m:math><m:mi>n</m:mi></m:math> = the number of different data cells
or categories</item>
</list>
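<para id="element-sketch-1">The formula above can be sketched in a few lines of Python. This is only an illustration and not part of the original module: the function name <code>chi_square_statistic</code> and the three-cell counts are hypothetical, chosen to show how the terms are summed over the cells.
<code type="block"><![CDATA[
```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three cells (not data from this module).
observed = [18, 22, 20]
expected = [20, 20, 20]

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 4))  # (4 + 4 + 0) / 20 = 0.4
```
]]></code>
Each cell contributes one term, so a single cell whose observed value is far from its expected value can dominate the sum.</para>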
<para id="element-512"><emphasis>The observed values are the data values and the expected values are the
values you would expect to get if the null hypothesis were true.</emphasis> There are <m:math><m:mi>n</m:mi></m:math> terms of the form


<m:math>
<m:mfrac>

<m:mrow>
<m:mo>(</m:mo>
<m:mi>O</m:mi>
<m:mo>−</m:mo>
<m:mi>E</m:mi>
<m:msup>
<m:mo>)</m:mo>
<m:mn>2</m:mn>
</m:msup>
</m:mrow>


<m:mrow>
<m:mi>E</m:mi>
</m:mrow>


</m:mfrac>
</m:math>.</para><para id="element-304">The degrees of freedom are <m:math><m:mtext>df = (number of categories - 1)</m:mtext></m:math>.</para><para id="element-838"><emphasis>The goodness-of-fit test is almost always right-tailed.</emphasis> If the observed values and
the corresponding expected values are not close to each other, then the test statistic
can get very large and will be way out in the right tail of the chi-square curve.</para><example id="element-719"><para id="element-615">
 Absenteeism of college students from math classes is a major concern to
math instructors because missing class appears to increase the drop rate. Three
statistics instructors wondered whether the absentee rate was the <emphasis>same</emphasis> for every
day of the school week. They took a sample of absent students from three of their
statistics classes during one week of the term. The results of the survey appear in the
table.
</para>
<table id="element-235325">
<?table-summary This table presents the number of students absent by days of the week. Monday is in the second column, Tuesday is in the third column, Wednesday is in the fourth column, Thursday is in the fifth column, and Friday is in the sixth column. There is only one row of data for the number of students absent.?>
<tgroup cols="6"><colspec colnum="1" colname="header_c1"/>
<colspec colnum="2" colname="c2"/>
<colspec colnum="3" colname="c3"/>
<colspec colnum="4" colname="c4"/>
<colspec colnum="5" colname="c5"/>
<colspec colnum="6" colname="c6"/>
<thead>
<row>
<entry/>
<entry>Monday</entry>
<entry>Tuesday</entry>
<entry>Wednesday</entry>
<entry>Thursday</entry>
<entry>Friday</entry>
</row>
</thead>
<tbody>
<row>
<entry># of students absent</entry>
<entry>28</entry>
<entry>22</entry>
<entry>18</entry>
<entry>20</entry>
<entry>32</entry>
</row>
</tbody>



</tgroup>
</table>

<para id="element-32252626">
Determine the null and alternate hypotheses needed to run a goodness-of-fit test.
</para><para id="element-859">Since the instructors wonder whether the absentee rate is the same for every school
day, we could say in the null hypothesis that the data <emphasis>"fit"</emphasis> a uniform distribution.</para><para id="element-846"><emphasis> <m:math>
<m:msub>
<m:mi>H</m:mi>
<m:mi>o</m:mi>
</m:msub>
</m:math>:</emphasis> The rate at which college students are absent from their statistics class fits a
uniform distribution.</para><para id="element-82">The alternate hypothesis is the opposite of the null hypothesis.</para><para id="element-421"><emphasis> <m:math>
<m:msub>
<m:mi>H</m:mi>
<m:mi>a</m:mi>
</m:msub>
</m:math>:</emphasis> The rate at which college students are absent from their statistics class does
not fit a uniform distribution.</para><exercise id="element-217"><problem>
  <para id="element-451">
    How many students do you <emphasis>expect</emphasis> to be absent on any given school day?
  </para>
</problem>

<solution>
  <para id="element-600">
 The total number of students in the sample is 120. <emphasis>If the null
hypothesis were true,</emphasis> you would divide 120 by 5 to get 24 absences expected
per day. <emphasis>The expected number is based on a true null hypothesis.</emphasis>
  </para>
</solution>
</exercise><exercise id="element-612"><problem>
  <para id="element-496">What are the degrees of freedom (<m:math><m:mi>df</m:mi></m:math>)?</para>
</problem>

<solution>
  <para id="element-601">
There are 5 days of the week or 5 "cells" or categories.</para>
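<para id="element-6012"><m:math><m:mi>df</m:mi><m:mo>=</m:mo><m:mtext>number of cells</m:mtext><m:mo>-</m:mo><m:mn>1</m:mn><m:mo>=</m:mo><m:mn>5</m:mn><m:mo>-</m:mo><m:mn>1</m:mn><m:mo>=</m:mo><m:mn>4</m:mn></m:math>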
  </para>
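<para id="element-sketch-2">The two answers above can be checked with a short Python sketch. It is an illustration only; the variable names are assumptions, and the observed counts are copied from the table of absences.
<code type="block"><![CDATA[
```python
# Observed absences from the table, Monday through Friday.
observed = [28, 22, 18, 20, 32]

total = sum(observed)              # students in the sample
cells = len(observed)              # one cell per school day
expected_per_day = total / cells   # expected count if the null hypothesis is true
degrees_of_freedom = cells - 1

print(total, expected_per_day, degrees_of_freedom)  # 120 24.0 4
```
]]></code>
The expected count is the same for every day because the null hypothesis says the absences fit a uniform distribution.</para>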
</solution>
</exercise></example><example id="element-962"><para id="element-733">Employers particularly want to know which days
of the week employees are absent in a five day work week. Most employers would
like to believe that employees are absent equally during the week. That is, the average
number of times an employee is absent is the same on Monday, Tuesday, Wednesday,
Thursday, or Friday. Suppose a sample of 20 absent days was taken and the days
absent were distributed as follows:
<table id="table-235674346">
<?table-summary This table presents the number of employee absences by days of the week. Monday is in the second column, Tuesday is in the third column, Wednesday is in the fourth column, Thursday is in the fifth column, and Friday is in the sixth column. There is only one row of data for the number of absences.?>
<name>Day of the Week Absent</name>
<tgroup cols="6"><colspec colnum="1" colname="header_c1"/>
	<colspec colnum="2" colname="c2"/>
	<colspec colnum="3" colname="c3"/>
	<colspec colnum="4" colname="c4"/>
	<colspec colnum="5" colname="c5"/>
	<colspec colnum="6" colname="c6"/>
	<thead valign="top">
		<row>
			<entry/>
			<entry>Monday</entry>
			<entry>Tuesday</entry>
			<entry>Wednesday</entry>
			<entry>Thursday</entry>
			<entry>Friday</entry>
		</row>
	</thead>
	<tbody valign="top">
		<row>
			<entry>Number of Absences</entry>
			<entry>5</entry>
			<entry>4</entry>
			<entry>2</entry>
			<entry>3</entry>
			<entry>6</entry>
		</row>
	</tbody>


</tgroup>
</table>
</para><exercise id="element-811"><problem>
  <para id="element-215">
   For the population of employees, do the absent days occur with equal frequencies
during a five day work week? Test at a 5% significance level.
  </para>
</problem>

<solution>
  <para id="element-13">The null and alternate hypotheses are:
  </para><list id="element-741" type="bulleted">
<item><m:math>
<m:msub>
<m:mi>H</m:mi>
<m:mi>o</m:mi>
</m:msub>
</m:math>: The absent days occur with equal frequencies, that is, they fit a uniform distribution.</item>
<item>
 <m:math>
<m:msub>
<m:mi>H</m:mi>
<m:mi>a</m:mi>
</m:msub>
</m:math>: The absent days occur with unequal frequencies, that is, they do not fit a uniform
distribution.</item>
</list><para id="element-71">If the absent days occur with equal frequencies, then, out of 20 absent days, there
would be 4 absences on Monday, 4 on Tuesday, 4 on Wednesday, 4 on Thursday,
and 4 on Friday. These numbers are the <emphasis>expected</emphasis> (<m:math><m:mi>E</m:mi></m:math>) values. The values in the
table are the <emphasis>observed</emphasis> (<m:math><m:mi>O</m:mi></m:math>) values or data.</para><para id="element-489">This time, calculate the <m:math><m:msup>
<m:mi>χ</m:mi>
<m:mn>2</m:mn>
</m:msup></m:math> test statistic by hand. Make a chart with the following headings:</para><list id="element-120" type="bulleted"><item>Expected (<m:math><m:mi>E</m:mi></m:math>) values</item>
<item>Observed (<m:math><m:mi>O</m:mi></m:math>) values</item>
<item><m:math><m:mo>(</m:mo> <m:mi>O</m:mi> <m:mo> - </m:mo>  
               <m:mi>E</m:mi><m:mo>)</m:mo></m:math>
</item>
<item><m:math><m:msup><m:mrow><m:mo>(</m:mo> <m:mi>O</m:mi> <m:mo> - </m:mo>  
               <m:mi>E</m:mi><m:mo>)</m:mo></m:mrow><m:mn>2</m:mn></m:msup></m:math>
</item>

<item><m:math><m:mfrac><m:mrow><m:msup><m:mrow><m:mo>(</m:mo> <m:mi>O</m:mi> <m:mo> - </m:mo>  
               <m:mi>E</m:mi><m:mo>)</m:mo></m:mrow><m:mn>2</m:mn></m:msup></m:mrow>
<m:mi>E</m:mi>
</m:mfrac>
</m:math>
</item></list><para id="element-231">Now add (sum) the last column. Verify that the sum is 2.5. This is the 
<m:math>
<m:msup>
<m:mi>χ</m:mi>
<m:mn>2</m:mn>
</m:msup>
</m:math> test statistic.</para><para id="element-290">To find the p-value, calculate
<m:math>
<m:mi>P</m:mi>
<m:mo>(</m:mo>
<m:msup>
<m:mi>χ</m:mi>
<m:mn>2</m:mn>
</m:msup>
<m:mo>&gt;</m:mo>
<m:mn>2.5</m:mn>
<m:mo>)</m:mo>
</m:math>. This test is right-tailed.</para><para id="element-894">The <m:math><m:mi>dfs</m:mi></m:math> are the <m:math><m:mtext>number of cells</m:mtext><m:mo>-</m:mo>
<m:mn>1</m:mn>
<m:mo>=</m:mo>
<m:mn>4</m:mn>
</m:math>.</para><para id="element-724">Next, complete a graph like the one below with the proper labeling and shading. (You
should shade the right tail. It will be a "large" right tail for this example because the
p-value is "large.")</para>
 <media type="image/png" src="chisq_uses1.png">
 <param name="alt" value="Blank nonsymmetrical chi-square curve for the test statistic of the days of the week absent."/>
 
 <param name="print-width" value="3in"/>
 </media>
<para id="element-967">Use a computer or calculator to find the p-value. You should get <m:math><m:mtext>p-value</m:mtext><m:mo>=</m:mo><m:mn>0.6446</m:mn></m:math>.</para><para id="element-545">The decision is to not reject the null hypothesis.</para><para id="element-1000"><emphasis>Conclusion:</emphasis> At a 5% level of significance, from the sample data, there is not sufficient
evidence to conclude that the absent days do not occur with equal frequencies.</para><para id="element-822"><emphasis>TI-83+ and TI-84:</emphasis> Press <code>2nd DISTR</code>. Arrow down to <code><m:math><m:msup><m:mi>χ</m:mi><m:mn>2</m:mn></m:msup></m:math>cdf</code>. Press <code>ENTER</code>.
Enter <code>(2.5,1E99,4)</code>. Rounded to 4 places, you should see 0.6446 which is the
p-value.</para><note>TI-83+ and some TI-84 calculators do not have a special program for
the test statistic for the goodness-of-fit test. The next example (Example 11-3) has
the calculator instructions.
The newer TI-84 calculators have in <code>STAT TESTS</code> the test <code>Chi2 GOF</code>. To run the
test, put the observed values (the data) into a first list and the expected values (the
values you expect if the null hypothesis is true) into a second list. Press <code>STAT</code>
<code>TESTS</code> and <code>Chi2 GOF</code>. Enter the list names for the Observed list and the
Expected list. Enter whatever else is asked and press <code>calculate</code> or <code>draw</code>. Make
sure you clear any lists before you start. See below.</note><note><emphasis>To Clear Lists in the calculators:</emphasis> Go into <code>STAT EDIT</code> and arrow up to the list
name area of the particular list. Press <code>CLEAR</code> and then arrow down. The list will
be cleared. Or, you can press <code>STAT</code> and press 4 (for <code>ClrList</code>). Enter the list name
and press <code>ENTER</code>.</note>
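<para id="element-sketch-3">For readers without a TI calculator, the hand calculation can also be sketched in Python. This is an illustration under stated assumptions, not part of the original module: the variable names are made up, and the right-tail formula used for the p-value is the closed form that holds only when there are 4 degrees of freedom.
<code type="block"><![CDATA[
```python
import math

observed = [5, 4, 2, 3, 6]           # Monday through Friday, from the table
cells = len(observed)
expected = [sum(observed) / cells] * cells   # 4 absences per day under the null

# One chart row per day: E, O, O - E, (O - E)^2, (O - E)^2 / E
chart = [(e, o, o - e, (o - e) ** 2, (o - e) ** 2 / e)
         for o, e in zip(observed, expected)]
for row in chart:
    print(row)

statistic = sum(row[-1] for row in chart)    # sum of the last column
df = cells - 1

# Assumption: df = 4, where P(chi-square > x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-statistic / 2) * (1 + statistic / 2)

alpha = 0.05
print(statistic, round(p_value, 4))          # 2.5 0.6446
print("reject Ho" if p_value < alpha else "do not reject Ho")
```
]]></code>
The printed rows match the chart headings listed earlier, and the last line reproduces the decision not to reject the null hypothesis.</para>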
</solution>
</exercise>
</example><example id="element-258"><para id="element-59">
  One study indicates that the number of televisions
that American families have is distributed (this is the <emphasis>given</emphasis> distribution for the American
population) as follows:
</para>
<table id="table-234567">
<?table-summary This table presents the number of televisions that American families have is in the first column, and the expected percentage is in the second column.?>
<tgroup cols="2"><colspec colnum="1" colname="c1"/>
<colspec colnum="2" colname="c2"/>
<thead valign="top">
<row>
<entry>Number of Televisions</entry>
<entry>Percent</entry>
</row>
</thead>
<tbody>
<row>
<entry>0</entry>
<entry>10</entry>
</row>
<row>
<entry>1</entry>
<entry>16</entry>
</row>
<row>
<entry>2</entry>
<entry>55</entry>
</row>
<row>
<entry>3</entry>
<entry>11</entry>
</row>
<row>
<entry>over 3</entry>
<entry>8</entry>
</row>
</tbody>


</tgroup>
</table>
<para id="element-23563">The table contains
expected (<m:math><m:mi>E</m:mi></m:math>)
percents.
</para>
<para id="element-236">A random sample of 600 families in the far western United States resulted in the following
data:
</para>
<table id="table-234599867">
<?table-summary This table presents the number of televisions that American families have is in the first column, and the observed frequency is in the second column.?>
<tgroup cols="2"><colspec colnum="1" colname="c1"/>
<colspec colnum="2" colname="c2"/>
<thead valign="top">
<row>
<entry>Number of Televisions</entry>
<entry>Frequency</entry>
</row>
</thead>
<tfoot>
<row>
<entry/>
<entry>Total = 600</entry>
</row>
</tfoot>
<tbody>
<row>
<entry>0</entry>
<entry>66</entry>
</row>
<row>
<entry>1</entry>
<entry>119</entry>
</row>
<row>
<entry>2</entry>
<entry>340</entry>
</row>
<row>
<entry>3</entry>
<entry>60</entry>
</row>
<row>
<entry>over 3</entry>
<entry>15</entry>
</row>


</tbody>


</tgroup>
</table><para id="element-560">The table contains observed (<m:math><m:mi>O</m:mi></m:math>) frequency values.</para><exercise id="element-911"><problem>
  <para id="element-649">
   At the 1% significance level, does it appear that the distribution "number of televisions" of
far western United States families is different from the distribution for the American
population as a whole?
  </para>
</problem>

<solution>
  <para id="element-538">
   This problem asks you to test whether the far western United States families distribution fits
the distribution of the American families. This test is always right-tailed.
  </para><para id="element-788">The first table contains expected percentages. To get expected (<m:math><m:mi>E</m:mi></m:math>) frequencies,
multiply the percentage by 600. The expected frequencies are:</para>

<table id="table-2345671">
<?table-summary This table is the same as above except for an additional third column listing the expected frequencies.?>
<tgroup cols="3"><colspec colnum="1" colname="c1"/>
<colspec colnum="2" colname="c2"/>
<colspec colnum="3" colname="c3"/>
<thead valign="top">
<row>
<entry>Number of Televisions</entry>
<entry>Percent</entry>
<entry>Expected Frequency</entry>
</row>
</thead>
<tbody>
<row>
<entry>0</entry>
<entry>10</entry>
<entry>
<m:math>
<m:mo>(</m:mo>
<m:mn>0.10</m:mn>
<m:mo>)</m:mo>
<m:mo>⋅</m:mo>
<m:mo>(</m:mo>
<m:mn>600</m:mn>
<m:mo>)</m:mo>
<m:mo>=</m:mo>
<m:mn>60</m:mn>
</m:math>
</entry>
</row>
<row>
<entry>1</entry>
<entry>16</entry>
<entry>
<m:math>
<m:mo>(</m:mo>
<m:mn>0.16</m:mn>
<m:mo>)</m:mo>
<m:mo>⋅</m:mo>
<m:mo>(</m:mo>
<m:mn>600</m:mn>
<m:mo>)</m:mo>
<m:mo>=</m:mo>
<m:mn>96</m:mn>
</m:math>
</entry>
</row>
<row>
<entry>2</entry>
<entry>55</entry>
<entry>
<m:math>
<m:mo>(</m:mo>
<m:mn>0.55</m:mn>
<m:mo>)</m:mo>
<m:mo>⋅</m:mo>
<m:mo>(</m:mo>
<m:mn>600</m:mn>
<m:mo>)</m:mo>
<m:mo>=</m:mo>
<m:mn>330</m:mn>
</m:math>
</entry>
</row>
<row>
<entry>3</entry>
<entry>11</entry>
<entry>
<m:math>
<m:mo>(</m:mo>
<m:mn>0.11</m:mn>
<m:mo>)</m:mo>
<m:mo>⋅</m:mo>
<m:mo>(</m:mo>
<m:mn>600</m:mn>
<m:mo>)</m:mo>
<m:mo>=</m:mo>
<m:mn>66</m:mn>
</m:math>
</entry>
</row>
<row>
<entry>over 3</entry>
<entry>8</entry>
<entry>
<m:math>
<m:mo>(</m:mo>
<m:mn>0.08</m:mn>
<m:mo>)</m:mo>
<m:mo>⋅</m:mo>
<m:mo>(</m:mo>
<m:mn>600</m:mn>
<m:mo>)</m:mo>
<m:mo>=</m:mo>
<m:mn>48</m:mn>
</m:math>
</entry>
</row>
</tbody>







</tgroup>
</table>
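<para id="element-sketch-4">The conversion from percentages to expected frequencies can be sketched in Python as follows. This is an illustration only; the names <code>percents</code> and <code>expected</code> are assumptions, and the percentages are copied from the given distribution.
<code type="block"><![CDATA[
```python
sample_size = 600

# Given percentages for the American population, keyed by number of televisions.
percents = {"0": 10, "1": 16, "2": 55, "3": 11, "over 3": 8}
assert sum(percents.values()) == 100   # the categories cover every family

# Expected frequency = (percent / 100) * sample size.
expected = {tvs: pct * sample_size / 100 for tvs, pct in percents.items()}

for tvs, count in expected.items():
    print(tvs, count)
```
]]></code>
Because the percentages add to 100, the expected frequencies add to the sample size of 600.</para>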


<para id="element-whatever">
Therefore, the expected frequencies are 60, 96, 330, 66, and 48. In the TI
calculators, you can let the calculator do the math. For example, instead of 60,
enter .10*600.</para><para id="element-239"><m:math>
<m:msub>
<m:mi>H</m:mi>
<m:mi>o</m:mi>
</m:msub>
</m:math>: The "number of televisions" distribution of far western United States families
is the same as the "number of televisions" distribution of the American population.</para><para id="element-2"><m:math>
<m:msub>
<m:mi>H</m:mi>
<m:mi>a</m:mi>
</m:msub>
</m:math>: The "number of televisions" distribution of far western United States families
is different from the "number of televisions" distribution of the American population.</para><para id="element-228">Distribution for the test: 
<m:math>
<m:msubsup>
<m:mi>χ</m:mi>
<m:mn>4</m:mn>
<m:mn>2</m:mn>
</m:msubsup>
</m:math> where  
<m:math><m:mi>df</m:mi>
<m:mo>=</m:mo> 
<m:mtext>(the number of cells)</m:mtext>
<m:mo>-</m:mo>
<m:mn>1</m:mn>
<m:mo>=</m:mo>
<m:mn>5</m:mn>
<m:mo>-</m:mo>
<m:mn>1</m:mn>
<m:mo>=</m:mo>
<m:mn>4</m:mn>
</m:math>.</para><note><m:math>
<m:mi>df</m:mi><m:mo>≠</m:mo><m:mn>600</m:mn><m:mo>−</m:mo><m:mn>1</m:mn>
</m:math></note><para id="element-65"><emphasis>Calculate the test statistic:</emphasis> <m:math>
<m:msup>
<m:mi>χ</m:mi>
<m:mn>2</m:mn>
</m:msup>
<m:mo>=</m:mo>
<m:mn>29.65</m:mn>
</m:math></para><para id="element-280"><emphasis>Graph:</emphasis></para>
<media type="image/png" src="chisq_uses2.png">
<param name="alt" value="Non-symmetric chi-square curve with values of 0, 4, and 29.65 on the x-axis representing the test statistic of the comparison of the number of televisions in America. A vertical upward line extends from 29.65 to the curve, and the area to the right of this line is equal to the p-value."/>

<param name="print-width" value="3in"/>
</media>
<para id="element-635"><emphasis>Probability statement:</emphasis>
<m:math>
<m:mtext>p-value</m:mtext>
<m:mo>=</m:mo>
<m:mi>P</m:mi>
<m:mo>(</m:mo>
<m:msup>
<m:mi>χ</m:mi>
<m:mn>2</m:mn>
</m:msup>
<m:mo>&gt;</m:mo>
<m:mn>29.65</m:mn>
<m:mo>)</m:mo>
<m:mo>=</m:mo>
<m:mn>0.000006</m:mn>
</m:math>.</para><para id="element-42"><emphasis>Compare α and the p-value:</emphasis>
<list id="list-whatever3" type="bulleted">
<item><m:math>
<m:mi>α</m:mi>
<m:mo>=</m:mo>
<m:mn>0.01</m:mn></m:math>
</item><item><m:math>
<m:mtext>p-value</m:mtext>
<m:mo>=</m:mo>
<m:mn>0.000006</m:mn></m:math></item>
</list>
So, <m:math>
<m:mi>α</m:mi>
<m:mo> &gt; </m:mo>
<m:mtext>p-value</m:mtext>
</m:math>.</para><para id="element-251"><emphasis>Make a decision:</emphasis> Since 
<m:math>
<m:mi>α</m:mi>
<m:mo>&gt;</m:mo>
<m:mtext>p-value</m:mtext>
</m:math>, reject 
<m:math>
<m:msub>
<m:mi>H</m:mi>
<m:mi>o</m:mi>
</m:msub>
</m:math>.</para><para id="element-717">This means you reject the belief that the distribution for the far western states is the same
as that of the American population as a whole.</para><para id="element-387"><emphasis>Conclusion:</emphasis> At the 1% significance level, from the data, there is sufficient evidence to
conclude that the "number of televisions" distribution for the far western United States
is different from the "number of televisions" distribution for the American population as
a whole.</para><note>TI-83+ and some TI-84 calculators: Press <code>STAT</code> and <code>ENTER</code>. Make sure to
clear lists <code>L1</code>, <code>L2</code>, and <code>L3</code> if they have data in them (see the note at the end of
Example 11-2). Into <code>L1</code>, put the observed frequencies <code>66</code>, <code>119</code>, <code>340</code>, <code>60</code>, <code>15</code>. Into
<code>L2</code>, put the expected frequencies <code>.10*600</code>, <code>.16*600</code>, <code>.55*600</code>, <code>.11*600</code>, <code>.08*600</code>.
Arrow over to list <code>L3</code> and up to the name area <code>"L3"</code>. Enter <code>(L1-L2)^2/L2</code> and
<code>ENTER</code>. Press <code>2nd QUIT</code>. Press <code>2nd LIST</code> and arrow over to <code>MATH</code>. Press <code>5</code>.
You should see <code>"sum" (Enter L3)</code>. Rounded to 2 decimal places, you should
see <code>29.65</code>. Press <code>2nd DISTR</code>. Press <code>7</code> or Arrow down to <code>7:χ2cdf</code> and press
<code>ENTER</code>. Enter <code>(29.65,1E99,4)</code>. You should see <code>5.77E-6</code>, which is <code>.000006</code> rounded to 6 decimal places. This is the p-value.</note>
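<para id="element-sketch-5">The same test can be sketched in Python as a check on the calculator work. This is an illustration, not part of the original module: the variable names are assumptions, and the p-value uses the closed form for the right tail that holds only when there are 4 degrees of freedom.
<code type="block"><![CDATA[
```python
import math

observed = [66, 119, 340, 60, 15]    # sample of 600 far western families
expected = [60, 96, 330, 66, 48]     # given percents times 600

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1               # number of cells minus 1, not 600 - 1

# Assumption: df = 4, where P(chi-square > x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-statistic / 2) * (1 + statistic / 2)

alpha = 0.01
print(round(statistic, 2))           # 29.65
print(f"{p_value:.6f}")              # 0.000006
print("reject Ho" if p_value < alpha else "do not reject Ho")
```
]]></code>
The sketch keeps the unrounded test statistic, so its p-value can differ slightly in the third significant digit from the calculator result obtained by entering 29.65; both round to 0.000006.</para>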
</solution>
</exercise></example><example id="element-866"><exercise id="element-221"><problem>
  <para id="element-95">
   Suppose you flip two coins 100 times. The
results are 20 HH, 27 HT, 30 TH, and 23 TT. Are the coins fair? Test at a 5%
significance level.
  </para>
</problem>

<solution>
  <para id="element-929">
   This problem can be set up as a goodness-of-fit problem. The sample space for flipping
two fair coins is {HH, HT, TH, TT}. Out of 100 flips, you would expect 25 HH, 25 HT,
25 TH, and 25 TT. This is the expected distribution. The question, "Are the coins fair?"
is the same as saying, "Does the distribution of the coins (20 HH, 27 HT, 30 TH, 23 TT)
fit the expected distribution?"
  </para><para id="element-570"><emphasis>Random Variable:</emphasis> Let <m:math><m:mi>X</m:mi></m:math> = the number of heads in one flip of the two coins. <m:math><m:mi>X</m:mi></m:math>
takes on the values 0, 1, or 2. (There are 0, 1, or 2 heads in the flip of 2 coins.) Therefore,
the <emphasis>number of cells is 3</emphasis>. Since <m:math><m:mi>X</m:mi></m:math> = the number of heads, the observed frequencies are
20 (for 2 heads), 57 (for 1 head), and 23 (for 0 heads or both tails). The expected
frequencies are 25 (for 2 heads), 50 (for 1 head), and 25 (for 0 heads or both tails). This
test is right-tailed.</para><para id="element-774"><m:math>
<m:msub>
<m:mi>H</m:mi>
<m:mi>o</m:mi>
</m:msub>
</m:math>: The coins are fair. 
 </para><para id="element-734"><m:math>
<m:msub>
<m:mi>H</m:mi>
<m:mi>a</m:mi>
</m:msub>
</m:math>: The coins are not fair.</para><para id="element-136"><emphasis>Distribution for the test:</emphasis>
<m:math>
<m:msubsup>
<m:mi>χ</m:mi>
<m:mn>2</m:mn>
<m:mn>2</m:mn>
</m:msubsup>
</m:math>

where 
<m:math>
<m:mi>df</m:mi>
<m:mo>=</m:mo>
<m:mn>3</m:mn>
<m:mo>-</m:mo>
<m:mn>1</m:mn>
<m:mo>=</m:mo>
<m:mn>2</m:mn>
</m:math>.</para><para id="element-380"><emphasis>Calculate the test statistic:</emphasis> 
<m:math>
<m:msup>
<m:mi>χ</m:mi>
<m:mn>2</m:mn>
</m:msup>
<m:mo>=</m:mo> 
<m:mn>2.14</m:mn>
</m:math></para><para id="element-973"><emphasis>Graph:</emphasis></para>
  <media type="image/png" src="chisq_uses3.png">
  <param name="alt" value="Nonsymmetrical chi-square curve with values of 0 and 2.14 on the x-axis representing the test statistic of results from flipping two coins. A vertical upward line extends from 2.14 to the curve and the area to the right of this is equal to the p-value."/>

  <param name="print-width" value="3in"/>
  </media>
<para id="element-490"><emphasis>Probability statement:</emphasis>
<m:math>
<m:mtext>p-value</m:mtext>
<m:mo>=</m:mo>
<m:mi>P</m:mi>
<m:mo>(</m:mo>
<m:msup>
<m:mi>χ</m:mi>
<m:mn>2</m:mn>
</m:msup>
<m:mo>&gt;</m:mo>
<m:mn>2.14</m:mn>
<m:mo>)</m:mo>
<m:mo>=</m:mo> 
<m:mn>0.3430</m:mn>
</m:math>
</para><para id="element-817"><emphasis>Compare 
<m:math>
<m:mi>α</m:mi>
</m:math> 
and the p-value:
</emphasis> 

<list id="element-xyz">
<item>

<m:math>
<m:mi>α</m:mi>
<m:mo>=</m:mo>
<m:mn>0.05</m:mn></m:math></item>
<item><m:math>
<m:mtext>p-value</m:mtext>
<m:mo>=</m:mo>
<m:mn>0.3430</m:mn></m:math></item>
</list>

So, <m:math>
<m:mi>α</m:mi>
<m:mo>&lt;</m:mo>
<m:mtext>p-value</m:mtext>
</m:math>.

</para><para id="element-58"><emphasis>Make a decision:</emphasis> Since 
<m:math>
<m:mi>α</m:mi>
<m:mo>&lt;</m:mo>
<m:mtext>p-value</m:mtext>
</m:math>, do not reject 
<m:math>
<m:msub>
<m:mi>H</m:mi>
<m:mi>o</m:mi>
</m:msub>
</m:math>.
</para><para id="element-995"><emphasis>Conclusion:</emphasis> At the 5% significance level, there is insufficient evidence to conclude that the coins are not fair.</para><note>TI-83+ and some TI-84 calculators: Press <code>STAT</code> and <code>ENTER</code>. Make sure you
clear lists <code>L1</code>, <code>L2</code>, and <code>L3</code> if they have data in them. Into <code>L1</code>, put the observed
frequencies <code>20</code>, <code>57</code>, <code>23</code>. Into <code>L2</code>, put the expected frequencies <code>25</code>, <code>50</code>, <code>25</code>. Arrow
over to list <code>L3</code> and up to the name area <code>"L3"</code>. Enter <code>(L1-L2)^2/L2</code> and
<code>ENTER</code>. Press <code>2nd QUIT</code>. Press <code>2nd LIST</code> and arrow over to <code>MATH</code>. Press
<code>5</code>. You should see <code>"sum"</code>. Enter <code>L3</code>. Rounded to 2 decimal places, you
should see <code>2.14</code>. Press <code>2nd DISTR</code>. Arrow down to <code>7:χ2cdf</code> (or press <code>7</code>). Press
<code>ENTER</code>. Enter <code>(2.14,1E99,2)</code>. Rounded to 4 places, you should see <code>.3430</code> which
is the p-value.</note><note>For the newer TI-84 calculators, check <code>STAT TESTS</code> to see if you have <code>Chi2 GOF</code>.
If you do, see the calculator instructions (a NOTE) before Example 11-3.</note>
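<para id="element-sketch-6">The coin example can also be sketched in Python. This is an illustration only, not part of the original module: the variable names are assumptions, and the p-value uses the closed form for the right tail that holds only when there are 2 degrees of freedom.
<code type="block"><![CDATA[
```python
import math

# Observed outcomes from 100 flips of the two coins.
outcomes = {"HH": 20, "HT": 27, "TH": 30, "TT": 23}
flips = sum(outcomes.values())

# Collapse the outcomes to X = number of heads: 2, 1, 0.
observed = [outcomes["HH"], outcomes["HT"] + outcomes["TH"], outcomes["TT"]]

# Fair coins give probabilities 1/4, 1/2, 1/4 for 2, 1, 0 heads.
expected = [flips * p for p in (0.25, 0.5, 0.25)]

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Assumption: df = 2, where P(chi-square > x) = exp(-x/2).
p_value = math.exp(-statistic / 2)

alpha = 0.05
print(f"{statistic:.2f} {p_value:.4f}")   # 2.14 0.3430
print("reject Ho" if p_value < alpha else "do not reject Ho")
```
]]></code>
Combining HT and TH into a single cell is what reduces the four outcomes to three cells and gives 2 degrees of freedom.</para>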
</solution>
</exercise>
</example>



  </content>
  
</document>
