<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/technology/cnxml/schema/dtd/0.5/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/" xmlns:m="http://www.w3.org/1998/Math/MathML" id="new">
  <name>Linear Regression and Correlation: Outliers</name>
  <metadata>
  <md:version>1.6</md:version>
  <md:created>2008/06/23 16:55:41 GMT-5</md:created>
  <md:revised>2008/10/27 18:20:14.664 GMT-5</md:revised>
  <md:authorlist>
      <md:author id="sdean">
      <md:firstname>Susan</md:firstname>
      
      <md:surname>Dean</md:surname>
      <md:email>deansusan@deanza.edu</md:email>
    </md:author>
      <md:author id="billowsky">
      <md:firstname>Barbara</md:firstname>
      
      <md:surname>Illowsky</md:surname>
      <md:email>illowskybarbara@deanza.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="sdean">
      <md:firstname>Susan</md:firstname>
      
      <md:surname>Dean</md:surname>
      <md:email>deansusan@deanza.edu</md:email>
    </md:maintainer>
    <md:maintainer id="billowsky">
      <md:firstname>Barbara</md:firstname>
      
      <md:surname>Illowsky</md:surname>
      <md:email>illowskybarbara@deanza.edu</md:email>
    </md:maintainer>
    <md:maintainer id="cnxorg">
      <md:firstname/>
      
      <md:surname>Connexions</md:surname>
      <md:email>cnx@cnx.org</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>elementary</md:keyword>
    <md:keyword>statistics</md:keyword>
  </md:keywordlist>

  <md:abstract>This module provides an overview of Linear Regression and Correlation: Outliers as part of the Collaborative Statistics collection (col10522) by Barbara Illowsky and Susan Dean.</md:abstract>
</metadata>
  <content>
    <para id="delete_me">In some data sets, there are values <emphasis>(points)</emphasis> called <term src="#outlier">outliers</term>. <emphasis>Outliers are points that are
far from the least squares line.</emphasis> They have large "errors" (residuals). Outliers need to be examined
closely. Sometimes they should not be included in the analysis of
the data, for example when an outlier is the result of erroneous data. Other times, an outlier may
hold valuable information about the population under study. The key is to carefully examine
what causes a data point to be an outlier.</para><example id="element-971"><exercise id="element-12987"><problem>

<para id="element-631">In the <cnxn document="m17092" target="element-22">third exam/final exam example</cnxn>, you can determine if there is an outlier
or not. If there is one, as an exercise, delete it and fit the remaining data to a new line. For this
example, the new line ought to fit the remaining data better. This means the <emphasis>SSE</emphasis> should be
smaller and the correlation coefficient ought to be closer to 1 or -1.
</para>

</problem>
<solution>

<para id="element-176">Computers and many calculators can identify outliers from the data. However, as an
exercise, we will go through the steps needed to identify an outlier by hand. In the table below, the first two
columns are the third exam and final exam data. The third
column shows the predicted <m:math><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover></m:math> values calculated from the line of best fit.</para>

<table id="element-466">
<tgroup cols="3"><colspec colnum="1" colname="header_c1"/>
 <colspec colnum="2" colname="c2"/>
 <colspec colnum="3" colname="c3"/>
<thead>
  <row>
    <entry><m:math><m:mi>x</m:mi></m:math></entry>
    <entry><m:math><m:mi>y</m:mi></m:math></entry>
    <entry><m:math><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover></m:math></entry>
  </row>
</thead>
<tbody>
  <row>
    <entry>65   </entry>
    <entry>175  </entry>
    <entry>140  </entry>
  </row>
  <row>
    <entry>67</entry>
    <entry>133</entry>
    <entry>150</entry>
  </row>
  <row>
    <entry>71</entry>
    <entry>185</entry>
    <entry>169</entry>
  </row>
  <row>
    <entry>71</entry>
    <entry>163</entry>
    <entry>169</entry>
  </row>
  <row>
    <entry>66</entry>
    <entry>126</entry>
    <entry>145</entry>
  </row>
  <row>
    <entry>75</entry>
    <entry>198</entry>
    <entry>189</entry>
  </row>
  <row>
    <entry>67</entry>
    <entry>153</entry>
    <entry>150</entry>
  </row>
  <row>
    <entry>70</entry>
    <entry>163</entry>
    <entry>164</entry>
  </row>
  <row>
    <entry>71</entry>
    <entry>159</entry>
    <entry>169</entry>
  </row>
  <row>
    <entry>69</entry>
    <entry>151</entry>
    <entry>160</entry>
  </row>
  <row>
    <entry>69</entry>
    <entry>159</entry>
    <entry>160</entry>
  </row>
</tbody>







</tgroup>
</table>

<para id="element-973">A <emphasis>residual</emphasis> is the difference <m:math><m:mtext>actual y value</m:mtext><m:mo>−</m:mo><m:mtext>predicted y value</m:mtext><m:mo>=</m:mo><m:mi>y</m:mi><m:mo>−</m:mo><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover></m:math>.</para><para id="element-797"><emphasis>Calculate the absolute value of each residual.</emphasis></para>

<para id="element-42"><emphasis>Calculate each <m:math><m:mo>|</m:mo><m:mi>y</m:mi><m:mo>-</m:mo><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover><m:mo>|</m:mo></m:math>:</emphasis>


<table id="element-4662">
<tgroup cols="4"><colspec colnum="1" colname="header_c1"/>
 <colspec colnum="2" colname="c2"/>
 <colspec colnum="3" colname="c3"/>
 <colspec colnum="4" colname="c4"/>
<thead>
  <row>
    <entry><m:math><m:mi>x</m:mi></m:math></entry>
    <entry><m:math><m:mi>y</m:mi></m:math></entry>
    <entry><m:math><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover></m:math></entry>
    <entry><m:math><m:mo>|</m:mo><m:mi>y</m:mi><m:mo>-</m:mo><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover><m:mo>|</m:mo></m:math></entry>
  </row>
</thead>
<tbody>
  <row>
    <entry>65   </entry>
    <entry>175  </entry>
    <entry>140  </entry>
<entry><m:math><m:mo>|</m:mo><m:mn>175</m:mn><m:mo>-</m:mo><m:mn>140</m:mn><m:mo>|</m:mo><m:mo>=</m:mo><m:mn>35</m:mn></m:math></entry>
  </row>
  <row>
    <entry>67</entry>
    <entry>133</entry>
    <entry>150</entry>
    <entry><m:math><m:mo>|</m:mo><m:mn>133</m:mn><m:mo>-</m:mo><m:mn>150</m:mn><m:mo>|</m:mo><m:mo>=</m:mo><m:mn>17</m:mn></m:math></entry>
  </row>
  <row>
    <entry>71</entry>
    <entry>185</entry>
    <entry>169</entry>
    <entry><m:math><m:mo>|</m:mo><m:mn>185</m:mn><m:mo>-</m:mo><m:mn>169</m:mn><m:mo>|</m:mo><m:mo>=</m:mo><m:mn>16</m:mn></m:math></entry>
  </row>
  <row>
    <entry>71</entry>
    <entry>163</entry>
    <entry>169</entry>

    <entry><m:math><m:mo>|</m:mo><m:mn>163</m:mn><m:mo>-</m:mo><m:mn>169</m:mn><m:mo>|</m:mo><m:mo>=</m:mo><m:mn>6</m:mn></m:math></entry>
  </row>
  <row>
    <entry>66</entry>
    <entry>126</entry>
    <entry>145</entry>
    <entry><m:math><m:mo>|</m:mo><m:mn>126</m:mn><m:mo>-</m:mo><m:mn>145</m:mn><m:mo>|</m:mo><m:mo>=</m:mo><m:mn>19</m:mn></m:math></entry>
  </row>
  <row>
    <entry>75</entry>
    <entry>198</entry>
    <entry>189</entry>


    <entry><m:math><m:mo>|</m:mo><m:mn>198</m:mn><m:mo>-</m:mo><m:mn>189</m:mn><m:mo>|</m:mo><m:mo>=</m:mo><m:mn>9</m:mn></m:math></entry>
  </row>
  <row>
    <entry>67</entry>
    <entry>153</entry>
    <entry>150</entry>

    <entry><m:math><m:mo>|</m:mo><m:mn>153</m:mn><m:mo>-</m:mo><m:mn>150</m:mn><m:mo>|</m:mo><m:mo>=</m:mo><m:mn>3</m:mn></m:math></entry>
  </row>
  <row>
    <entry>70</entry>
    <entry>163</entry>
    <entry>164</entry>
    <entry><m:math><m:mo>|</m:mo><m:mn>163</m:mn><m:mo>-</m:mo><m:mn>164</m:mn><m:mo>|</m:mo><m:mo>=</m:mo><m:mn>1</m:mn></m:math></entry>
  </row>
  <row>
    <entry>71</entry>
    <entry>159</entry>
    <entry>169</entry>
    <entry><m:math><m:mo>|</m:mo><m:mn>159</m:mn><m:mo>-</m:mo><m:mn>169</m:mn><m:mo>|</m:mo><m:mo>=</m:mo><m:mn>10</m:mn></m:math></entry>
  </row>
  <row>
    <entry>69</entry>
    <entry>151</entry>
    <entry>160</entry>
    <entry><m:math><m:mo>|</m:mo><m:mn>151</m:mn><m:mo>-</m:mo><m:mn>160</m:mn><m:mo>|</m:mo><m:mo>=</m:mo><m:mn>9</m:mn></m:math></entry>
  </row>
  <row>
    <entry>69</entry>
    <entry>159</entry>
    <entry>160</entry>
    <entry><m:math><m:mo>|</m:mo><m:mn>159</m:mn><m:mo>-</m:mo><m:mn>160</m:mn><m:mo>|</m:mo><m:mo>=</m:mo><m:mn>1</m:mn></m:math></entry>
  </row>
</tbody>
</tgroup>
</table>


</para><para id="element-508"><emphasis>Square each <m:math><m:mo>|</m:mo><m:mi>y</m:mi><m:mo>-</m:mo><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover><m:mo>|</m:mo></m:math>:</emphasis> 
</para>

<para id="element-list1"><list id="list1" type="inline"><item><m:math><m:msup><m:mn>35</m:mn><m:mn>2</m:mn></m:msup></m:math></item>
<item><m:math><m:msup><m:mn>17</m:mn><m:mn>2</m:mn></m:msup></m:math></item>
<item><m:math><m:msup><m:mn>16</m:mn><m:mn>2</m:mn></m:msup></m:math></item>
<item><m:math><m:msup><m:mn>6</m:mn><m:mn>2</m:mn></m:msup></m:math></item>
<item><m:math><m:msup><m:mn>19</m:mn><m:mn>2</m:mn></m:msup></m:math></item>
<item><m:math><m:msup><m:mn>9</m:mn><m:mn>2</m:mn></m:msup></m:math></item>
<item><m:math><m:msup><m:mn>3</m:mn><m:mn>2</m:mn></m:msup></m:math></item>
<item><m:math><m:msup><m:mn>1</m:mn><m:mn>2</m:mn></m:msup></m:math></item>
<item><m:math><m:msup><m:mn>10</m:mn><m:mn>2</m:mn></m:msup></m:math></item>
<item><m:math><m:msup><m:mn>9</m:mn><m:mn>2</m:mn></m:msup></m:math></item>
<item><m:math><m:msup><m:mn>1</m:mn><m:mn>2</m:mn></m:msup></m:math></item>
</list></para>

<para id="element-85"><emphasis>Then, add (sum) all the squared <m:math><m:mo>|</m:mo><m:mi>y</m:mi><m:mo>-</m:mo><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover><m:mo>|</m:mo></m:math> terms:</emphasis></para><para id="element-883"><m:math>
<m:munderover>
<m:mo>Σ</m:mo>
<m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow>
<m:mn>11</m:mn>
</m:munderover>
<m:msup><m:mrow><m:mo>(</m:mo><m:mo>|</m:mo><m:msub><m:mi>y</m:mi><m:mi>i</m:mi></m:msub><m:mo>-</m:mo><m:msub><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover><m:mi>i</m:mi></m:msub><m:mo>|</m:mo><m:mo>)</m:mo></m:mrow><m:mn>2</m:mn></m:msup>
<m:mo>=</m:mo>
<m:munderover>
<m:mo>Σ</m:mo>
<m:mrow><m:mi>i</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow>
<m:mn>11</m:mn>
</m:munderover>
<m:msubsup><m:mi>ε</m:mi><m:mi>i</m:mi><m:mn>2</m:mn></m:msubsup>
<m:mspace width="20pt"/>
</m:math> (Recall that <m:math><m:msub><m:mi>y</m:mi><m:mi>i</m:mi></m:msub><m:mo>-</m:mo><m:msub><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover><m:mi>i</m:mi></m:msub><m:mo>=</m:mo><m:msub><m:mi>ε</m:mi><m:mi>i</m:mi></m:msub></m:math>.)</para><para id="element-761"><m:math><m:mo>=</m:mo><m:msup><m:mn>35</m:mn><m:mn>2</m:mn></m:msup>
<m:mo>+</m:mo><m:msup><m:mn>17</m:mn><m:mn>2</m:mn></m:msup>
<m:mo>+</m:mo><m:msup><m:mn>16</m:mn><m:mn>2</m:mn></m:msup>
<m:mo>+</m:mo><m:msup><m:mn>6</m:mn><m:mn>2</m:mn></m:msup>
<m:mo>+</m:mo><m:msup><m:mn>19</m:mn><m:mn>2</m:mn></m:msup>
<m:mo>+</m:mo><m:msup><m:mn>9</m:mn><m:mn>2</m:mn></m:msup>
<m:mo>+</m:mo><m:msup><m:mn>3</m:mn><m:mn>2</m:mn></m:msup>
<m:mo>+</m:mo><m:msup><m:mn>1</m:mn><m:mn>2</m:mn></m:msup>
<m:mo>+</m:mo><m:msup><m:mn>10</m:mn><m:mn>2</m:mn></m:msup>
<m:mo>+</m:mo><m:msup><m:mn>9</m:mn><m:mn>2</m:mn></m:msup>
<m:mo>+</m:mo><m:msup><m:mn>1</m:mn><m:mn>2</m:mn></m:msup></m:math>
</para><para id="element-435"><m:math><m:mo>=</m:mo><m:mn>2440</m:mn><m:mo>=</m:mo></m:math> <emphasis>SSE</emphasis></para><para id="element-870"><emphasis>Next, calculate <m:math><m:mi>s</m:mi></m:math>, the standard deviation of the residuals <m:math><m:mi>y</m:mi><m:mo>-</m:mo><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover><m:mo>=</m:mo><m:mi>ε</m:mi></m:math>, where <m:math><m:mi>n</m:mi></m:math> = the total number of data points.</emphasis> (The absolute values of the residuals are 
<list id="set-list2" type="inline">
<item>35</item>
<item>17</item>
<item>16</item>
<item>6</item>
<item>19</item>
<item>9</item>
<item>3</item>
<item>1</item>
<item>10</item>
<item>9</item>
<item>1</item>
</list>.)</para><para id="element-208"><m:math><m:mi>s</m:mi><m:mo>=</m:mo>
<m:msqrt>
<m:mfrac>
<m:mrow><m:mtext>SSE</m:mtext></m:mrow>
<m:mrow><m:mi>n</m:mi><m:mo>-</m:mo><m:mn>2</m:mn></m:mrow>
</m:mfrac>
</m:msqrt>

</m:math></para>

<para id="element-129853">For the third exam/final exam problem,
<m:math><m:mi>s</m:mi><m:mo>=</m:mo>
<m:msqrt>
<m:mfrac>
<m:mrow><m:mn>2440</m:mn></m:mrow>
<m:mrow><m:mn>11</m:mn><m:mo>-</m:mo><m:mn>2</m:mn></m:mrow>
</m:mfrac>
</m:msqrt>
<m:mo>=</m:mo><m:mn>16.47</m:mn>
</m:math></para><para id="element-664">Next, multiply <m:math><m:mi>s</m:mi></m:math> by <m:math><m:mn>1.9</m:mn></m:math> to get
<m:math><m:mo>(</m:mo><m:mn>1.9</m:mn><m:mo>)</m:mo><m:mo>⋅</m:mo><m:mo>(</m:mo><m:mn>16.47</m:mn><m:mo>)</m:mo><m:mo>=</m:mo><m:mn>31.29</m:mn></m:math>.
The value 31.29 is almost 2 standard deviations of the residuals.</para><note>The number <m:math><m:mn>1.9</m:mn><m:mi>s</m:mi></m:math> is equal to <emphasis>1.9 standard deviations</emphasis>, which is almost 2 standard deviations. If the vertical distance from a data point to the corresponding point on the line of best fit is <m:math><m:mn>1.9</m:mn><m:mi>s</m:mi></m:math> or greater, we consider the data point to be "too far" from the line of best fit and call it a <emphasis>potential outlier</emphasis>.</note><para id="element-420">For the example, if any of the <m:math><m:mo>|</m:mo><m:mi>y</m:mi><m:mo>-</m:mo><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover><m:mo>|</m:mo></m:math> values is <emphasis>at least</emphasis> 31.29, the corresponding
<m:math><m:mo>(</m:mo><m:mi>x</m:mi><m:mo>,</m:mo><m:mi>y</m:mi><m:mo>)</m:mo></m:math> point (data point) is a potential outlier.</para><para id="element-227">Mathematically, we say that if <m:math><m:mo>|</m:mo><m:mi>y</m:mi><m:mo>-</m:mo><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover><m:mo>|</m:mo><m:mo>≥</m:mo><m:mo>(</m:mo><m:mn>1.9</m:mn><m:mo>)</m:mo><m:mo>⋅</m:mo><m:mo>(</m:mo><m:mi>s</m:mi><m:mo>)</m:mo></m:math>, then the corresponding point is a potential outlier.</para><para id="element-682">For the third exam/final exam problem, all the <m:math><m:mo>|</m:mo><m:mi>y</m:mi><m:mo>-</m:mo><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover><m:mo>|</m:mo></m:math> values are less than 31.29 except for the
first one, which is 35.</para><para id="element-11"><m:math><m:mn>35</m:mn><m:mo>&gt;</m:mo><m:mn>31.29</m:mn><m:mspace width="20pt"/></m:math>
That is, <m:math><m:mo>|</m:mo><m:mi>y</m:mi><m:mo>-</m:mo><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover><m:mo>|</m:mo><m:mo>≥</m:mo><m:mo>(</m:mo><m:mn>1.9</m:mn><m:mo>)</m:mo><m:mo>⋅</m:mo><m:mo>(</m:mo><m:mi>s</m:mi><m:mo>)</m:mo></m:math></para><para id="element-860">The point which corresponds to <m:math><m:mo>|</m:mo><m:mi>y</m:mi><m:mo>-</m:mo><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover><m:mo>|</m:mo><m:mo>=</m:mo><m:mn>35 </m:mn></m:math> is <m:math><m:mo>(</m:mo><m:mn>65</m:mn><m:mo>,</m:mo><m:mn>175</m:mn><m:mo>)</m:mo></m:math>. <emphasis>Therefore, the point
<m:math><m:mo>(</m:mo><m:mn>65</m:mn><m:mo>,</m:mo><m:mn>175</m:mn><m:mo>)</m:mo></m:math> is an outlier.</emphasis> For this example, we will delete it. (Remember, we do not always
delete an outlier.) The next step is to compute a new best-fit line using the 10 remaining
points. The new line of best fit and the correlation coefficient are:</para><para id="element-669"><m:math>
<m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover>
<m:mo>=</m:mo><m:mn>-355.19</m:mn><m:mo>+</m:mo><m:mn>7.39</m:mn><m:mi>x</m:mi>
</m:math> and
<m:math>
<m:mi>r</m:mi><m:mo>=</m:mo><m:mn>0.9121</m:mn>
</m:math></para><para id="element-101">With 10 data points (<m:math><m:mi>n</m:mi><m:mo>-</m:mo><m:mn>2</m:mn><m:mo>=</m:mo><m:mn>8</m:mn></m:math>), the critical value is 0.632. Since <m:math><m:mn>0.9121</m:mn><m:mo>&gt;</m:mo><m:mn>0.632</m:mn></m:math>, <m:math><m:mi>r</m:mi></m:math>
is significant. In fact, <m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>0.9121</m:mn></m:math> is a better <m:math><m:mi>r</m:mi></m:math> than the original (0.6631) because it is closer to 1. This means that the new line fits the 10 remaining points better than the original line fit all 11 points. The new line can better
predict the final exam score given the third exam score.</para>
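<para id="element-sketch-outlier">The outlier check worked through above can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the original hand calculation: the variable names are assumptions, and the predicted values are copied from the table rather than recomputed from the line of best fit. Because <m:math><m:mi>s</m:mi></m:math> is not rounded before it is multiplied by 1.9, the cutoff comes out as about 31.28 rather than 31.29; the flagged point is the same.
<code type="block">
```python
import math

# (x, y, y_hat) rows copied from the table:
# third exam score, final exam score, predicted final exam score
rows = [
    (65, 175, 140), (67, 133, 150), (71, 185, 169), (71, 163, 169),
    (66, 126, 145), (75, 198, 189), (67, 153, 150), (70, 163, 164),
    (71, 159, 169), (69, 151, 160), (69, 159, 160),
]

# Absolute residual |y - y_hat| for each data point
abs_residuals = [abs(y - y_hat) for _, y, y_hat in rows]

# SSE is the sum of the squared residuals
sse = sum(r ** 2 for r in abs_residuals)

# s = sqrt(SSE / (n - 2)), where n is the number of data points
n = len(rows)
s = math.sqrt(sse / (n - 2))

# A point is a potential outlier when |y - y_hat| is at least 1.9 * s
cutoff = 1.9 * s
potential_outliers = [
    (x, y) for (x, y, _), r in zip(rows, abs_residuals) if r >= cutoff
]

print(sse, round(s, 2), round(cutoff, 2), potential_outliers)
```
</code>
Only the first row, the point (65, 175), has an absolute residual at or above the cutoff.</para>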
</solution>
</exercise></example><example id="element-824"><exercise id="element-12623">
<?solution_in_back?>
<problem>

<para id="element-802">
Using the new line of best fit (calculated with 10 points), what would a
student who receives a 73 on the third exam expect to receive on the final exam?
</para>
</problem>

<solution>
 <para id="element-12623s">
  184.28
 </para>
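<para id="element-sketch-predict">As a quick check of this answer, the prediction can be reproduced by substituting 73 into the new line of best fit. The function name below is a hypothetical one chosen for illustration; the coefficients are the rounded values quoted in the previous example.
<code type="block">
```python
def predict_final(third_exam):
    """Predicted final exam score from the refitted line y_hat = -355.19 + 7.39x."""
    return -355.19 + 7.39 * third_exam

# Student who scored 73 on the third exam
print(round(predict_final(73), 2))
```
</code>
</para>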
</solution>
</exercise>
</example><example id="element-986"><para id="element-719">(<cite>From The Consumer Price Indexes Web site</cite>) The Consumer Price Index
(CPI) measures the average change over time in the prices paid by urban consumers for
consumer goods and services. The CPI affects nearly all Americans because of the many ways
it is used. One of its biggest uses is as a measure of inflation. By providing information about
price changes in the Nation's economy to government, business, and labor, the CPI helps them
to make economic decisions. The President, Congress, and the Federal Reserve Board use the
CPI's trends to formulate monetary and fiscal policies.
</para>

<table id="element-127">
<name>Data:</name>
<tgroup cols="2"><thead>
  <row>
    <entry align="center">Year (<m:math><m:mi>x</m:mi></m:math>)</entry>
    <entry align="center">CPI (<m:math><m:mi>y</m:mi></m:math>)</entry>
  </row>
</thead>
<tbody>
  <row>
    <entry>1915  </entry>
    <entry>10.1  </entry>
  </row>
  <row>
    <entry>1926</entry>
    <entry>17.7</entry>
  </row>
  <row>
    <entry>1935</entry>
    <entry>13.7</entry>
  </row>
  <row>
    <entry>1940</entry>
    <entry>14.7</entry>
  </row>
  <row>
    <entry>1947</entry>
    <entry>24.1</entry>
  </row>
  <row>
    <entry>1952</entry>
    <entry>26.5</entry>
  </row>
  <row>
    <entry>1964</entry>
    <entry>31.0</entry>
  </row>
  <row>
    <entry>1969</entry>
    <entry>36.7</entry>
  </row>
  <row>
    <entry>1975</entry>
    <entry>49.3</entry>
  </row>
  <row>
    <entry>1979</entry>
    <entry>72.6</entry>
  </row>
  <row>
    <entry>1980</entry>
    <entry>82.4</entry>
  </row>
  <row>
    <entry>1986</entry>
    <entry>109.6</entry>
  </row>
  <row>
    <entry>1991</entry>
    <entry>130.7</entry>
  </row>
  <row>
    <entry>1999</entry>
    <entry>166.6</entry>
  </row>
</tbody>



</tgroup>
</table>


<exercise id="element-603"><problem>

    <list id="list3" type="bulleted">
<item>Make a scatterplot of the data.</item>
<item>Calculate the least squares line. Write the equation in the form <m:math><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover><m:mo>=</m:mo><m:mi>a</m:mi><m:mo>+</m:mo><m:mi>b</m:mi><m:mi>x</m:mi></m:math>.</item>
<item>Draw the line on the scatterplot.</item>
<item>Find the correlation coefficient. Is it significant?</item>
<item>According to the line of best fit, what is the average CPI for the year 1990?</item>
</list>

</problem>

<solution>
  <list id="element-298" type="bulleted"><item>Scatter plot and line of best fit.</item>
<item><m:math><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover><m:mo>=</m:mo><m:mn>-3204</m:mn><m:mo>+</m:mo><m:mn>1.662</m:mn><m:mi>x</m:mi></m:math> is the equation of the line of best fit.</item>
<item><m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>0.8694</m:mn></m:math></item>
<item>The number of data points is <m:math><m:mi>n</m:mi><m:mo>=</m:mo><m:mn>14</m:mn></m:math>. Use the 95% Critical Values of the Sample Correlation Coefficient table at the end of Chapter 12. <m:math><m:mi>n</m:mi><m:mo>-</m:mo><m:mn>2</m:mn><m:mo>=</m:mo><m:mn>12</m:mn></m:math>. The
corresponding critical value is 0.532. Since <m:math><m:mn>0.8694</m:mn><m:mo>&gt;</m:mo><m:mn>0.532</m:mn></m:math>, <m:math><m:mi>r</m:mi></m:math> is significant.</item>
<item><m:math><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover><m:mo>=</m:mo><m:mn>-3204</m:mn><m:mo>+</m:mo><m:mn>1.662</m:mn><m:mo>(</m:mo><m:mn>1990</m:mn><m:mo>)</m:mo><m:mo>=</m:mo><m:mn>103.4</m:mn></m:math> CPI</item></list><figure id="linrgs_out1"><media type="image/png" src="linrgs_out1.png">
<param name="alt" value="Scatter plot and line of best fit of the consumer price index data, on the y-axis, and year data, on the x-axis."/>

<param name="print-width" value="3.5in"/>
</media></figure>
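<para id="element-sketch-cpi">The least squares line and correlation coefficient in this solution would normally come from a calculator or statistics package. As a rough cross-check, the sketch below computes them from the standard sums-of-squares formulas in plain Python. The variable names are assumptions made for illustration, and the 1990 prediction deliberately uses the rounded coefficients quoted in the solution, so it may differ slightly from a prediction made with unrounded coefficients.
<code type="block">
```python
import math

years = [1915, 1926, 1935, 1940, 1947, 1952, 1964,
         1969, 1975, 1979, 1980, 1986, 1991, 1999]
cpi = [10.1, 17.7, 13.7, 14.7, 24.1, 26.5, 31.0,
       36.7, 49.3, 72.6, 82.4, 109.6, 130.7, 166.6]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(cpi) / n

# Sums of squares and cross-products about the means
sxx = sum((x - mean_x) ** 2 for x in years)
syy = sum((y - mean_y) ** 2 for y in cpi)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, cpi))

b = sxy / sxx                    # slope of the least squares line
a = mean_y - b * mean_x          # y-intercept
r = sxy / math.sqrt(sxx * syy)   # correlation coefficient

# Prediction for 1990 using the rounded coefficients from the solution
cpi_1990 = -3204 + 1.662 * 1990

print(round(a, 1), round(b, 3), round(r, 4), round(cpi_1990, 1))
```
</code>
</para>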
</solution>
</exercise>
</example>   
  </content>
  <glossary>
<definition id="outlier">
    <term>Outlier</term>
    <meaning>
   An observation that does not fit the rest of the data.
    </meaning>
  </definition>
</glossary>
</document>
