<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/technology/cnxml/schema/dtd/0.5/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/" xmlns:m="http://www.w3.org/1998/Math/MathML" id="new">
  <name>Linear Regression and Correlation: Facts About the Correlation Coefficient for Linear Regression</name>
  <metadata>
  <md:version>1.5</md:version>
  <md:created>2008/06/23 16:21:17 GMT-5</md:created>
  <md:revised>2008/07/16 12:18:30.861 GMT-5</md:revised>
  <md:authorlist>
      <md:author id="billowsky">
      <md:firstname>Barbara</md:firstname>
      
      <md:surname>Illowsky</md:surname>
      <md:email>illowskybarbara@deanza.edu</md:email>
    </md:author>
      <md:author id="sdean">
      <md:firstname>Susan</md:firstname>
      
      <md:surname>Dean</md:surname>
      <md:email>deansusan@deanza.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="cnxorg">
      <md:firstname/>
      
      <md:surname>Connexions</md:surname>
      <md:email>cnx@cnx.org</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>elementary</md:keyword>
    <md:keyword>statistics</md:keyword>
  </md:keywordlist>

  <md:abstract>This module provides an overview of Facts About the Correlation Coefficient for Linear Regression as part of the Collaborative Statistics collection (col10522) by Barbara Illowsky and Susan Dean.</md:abstract>
</metadata>
  <content>
    <list id="element-759" type="bulleted"><item>A positive <m:math><m:mi>r</m:mi></m:math> means that when <m:math><m:mi>x</m:mi></m:math> increases, <m:math><m:mi>y</m:mi></m:math> tends to increase, and when <m:math><m:mi>x</m:mi></m:math> decreases, <m:math><m:mi>y</m:mi></m:math> tends to decrease <emphasis>(positive correlation)</emphasis>.</item>
<item>A negative <m:math><m:mi>r</m:mi></m:math> means that when <m:math><m:mi>x</m:mi></m:math> increases, <m:math><m:mi>y</m:mi></m:math> tends to decrease, and when <m:math><m:mi>x</m:mi></m:math> decreases, <m:math><m:mi>y</m:mi></m:math> tends to increase <emphasis>(negative correlation)</emphasis>.</item>
<item>An <m:math><m:mi>r</m:mi></m:math> of zero means there is absolutely no linear relationship between <m:math><m:mi>x</m:mi></m:math> and <m:math><m:mi>y</m:mi></m:math> <emphasis>(no correlation)</emphasis>.</item>
<item>High correlation does not suggest that <m:math><m:mi>x</m:mi></m:math> causes <m:math><m:mi>y</m:mi></m:math> or <m:math><m:mi>y</m:mi></m:math> causes <m:math><m:mi>x</m:mi></m:math>. We say <emphasis>"correlation does not imply causation."</emphasis> For example, every person who learned
math in the 17th century is dead. However, learning math does not necessarily cause
death!</item></list><para id="element-227"><figure id="linrgs_facts_pics"><subfigure id="linrgs_facts1">
      <name>Positive Correlation</name>
      <media type="image/png" src="linrgs_facts1.png">
  <param name="alt" value="Scatterplot of points ascending from the lower left to the upper right."/>

  <param name="print-width" value="2in"/>
  </media>
  <caption>A scatter plot showing data with a positive correlation.</caption>
  </subfigure>
<subfigure id="linrgs_facts2">
      <name>Negative Correlation</name>
      <media type="image/png" src="linrgs_facts2.png">
  <param name="alt" value="Scatterplot of points descending from the upper left to the lower right."/>
  <param name="print-width" value="2in"/>
  </media>
<caption>A scatter plot showing data with a negative correlation.</caption>
  </subfigure>
<subfigure id="linrgs_facts3">
      <name>Zero Correlation</name>
      <media type="image/png" src="linrgs_facts3.png">
  <param name="alt" value="Scatterplot of points in a horizontal configuration."/>
 
  <param name="print-width" value="2in"/>
  </media>
<caption>A scatter plot showing data with zero correlation.</caption>
  </subfigure></figure></para><para id="element-884"><emphasis>The <cnxn document="m17098">95% Critical Values of the Sample Correlation Coefficient Table</cnxn> at the end of
this chapter (before the <cnxn document="m17081">Summary</cnxn>)</emphasis> may be used to give you a good idea of whether the
computed value of <emphasis><m:math><m:mi>r</m:mi></m:math> is significant or not</emphasis>. Compare <m:math><m:mi>r</m:mi></m:math> to the appropriate critical value in
the table. If <m:math><m:mi>r</m:mi></m:math> is significant, then you may want to use the line for prediction.</para><example id="element-684"><para id="element-798">Suppose you computed <m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>0.801</m:mn></m:math> using <m:math><m:mi>n</m:mi><m:mo>=</m:mo><m:mn>10</m:mn></m:math> data points.
<m:math><m:mtext>df</m:mtext><m:mo>=</m:mo><m:mi>n</m:mi><m:mo>-</m:mo><m:mn>2</m:mn><m:mo>=</m:mo><m:mn>10</m:mn><m:mo>-</m:mo><m:mn>2</m:mn><m:mo>=</m:mo><m:mn>8</m:mn></m:math>. The critical values associated with <m:math><m:mtext>df</m:mtext><m:mo>=</m:mo><m:mn>8</m:mn></m:math> are -0.632 and
+0.632. If <m:math><m:mi>r</m:mi><m:mo>&lt;</m:mo><m:mtext>negative critical value</m:mtext></m:math> or <m:math><m:mi>r</m:mi><m:mo>&gt;</m:mo><m:mtext>positive critical value</m:mtext></m:math>, then <m:math><m:mi>r</m:mi></m:math> is
significant. Since <m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>0.801</m:mn></m:math> and <m:math><m:mn>0.801</m:mn><m:mo>&gt;</m:mo><m:mn>0.632</m:mn></m:math>, <m:math><m:mi>r</m:mi></m:math> is significant and the line may be used
for prediction. Viewing this example on a number line may help.
</para><para id="element-393"><figure><media type="image/png" src="linrgs_facts4.png">
<param name="alt" value="Horizontal number line with values of -1, -0.632, 0, 0.632, 0.801, and 1. A dashed line above values -0.632, 0, and 0.632 indicates not significant values."/>

<param name="print-width" value="3.5in"/>
</media><caption><m:math><m:mi>r</m:mi></m:math> is not significant between -0.632 and +0.632.  <m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>0.801</m:mn><m:mo>&gt;</m:mo><m:mn>+0.632</m:mn></m:math>. Therefore, <m:math><m:mi>r</m:mi></m:math> is significant.</caption></figure></para>
</example><example id="element-358"><para id="element-206">
 Suppose you computed <m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>-0.624</m:mn></m:math> with 14 data points. <m:math><m:mtext>df</m:mtext><m:mo>=</m:mo><m:mn>14</m:mn><m:mo>-</m:mo><m:mn>2</m:mn><m:mo>=</m:mo><m:mn>12</m:mn></m:math>. The critical values are -0.532 and 0.532. Since <m:math><m:mn>-0.624</m:mn><m:mo>&lt;</m:mo><m:mn>-0.532</m:mn></m:math>, <m:math><m:mi>r</m:mi></m:math> is significant and
the line may be used for prediction.
</para><para id="element-674"><figure><media type="image/png" src="linrgs_facts5.png">
<param name="alt" value="Horizontal number line with values of -0.624, -0.532, and 0.532."/>

<param name="print-width" value="3.5in"/>
</media><caption><m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>-0.624</m:mn><m:mo>&lt;</m:mo><m:mn>-0.532</m:mn></m:math>. Therefore, <m:math><m:mi>r</m:mi></m:math> is significant.</caption></figure></para>
</example><example id="element-719"><para id="element-446">
 Suppose you computed <m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>0.776</m:mn></m:math> and <m:math><m:mi>n</m:mi><m:mo>=</m:mo><m:mn>6</m:mn></m:math>. <m:math><m:mtext>df</m:mtext><m:mo>=</m:mo><m:mn>6</m:mn><m:mo>-</m:mo><m:mn>2</m:mn><m:mo>=</m:mo><m:mn>4</m:mn></m:math>. The
critical values are -0.811 and 0.811. Since <m:math><m:mn>-0.811</m:mn><m:mo>&lt;</m:mo><m:mn>0.776</m:mn><m:mo>&lt;</m:mo><m:mn>0.811</m:mn></m:math>, <m:math><m:mi>r</m:mi></m:math> is not significant
and the line should not be used for prediction.
</para><para id="element-845"><figure id="linrgs_facts5"><media type="image/png" src="linrgs_facts6.png">
<param name="alt" value="Horizontal number line with values of -0.811, 0.776, and 0.811."/>

<param name="print-width" value="3.5in"/>
</media><caption><m:math><m:mn>-0.811</m:mn><m:mo>&lt;</m:mo><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>0.776</m:mn><m:mo>&lt;</m:mo><m:mn>0.811</m:mn></m:math>. Therefore, <m:math><m:mi>r</m:mi></m:math> is not significant.</caption></figure></para>
</example><note>If <m:math><m:mi>r</m:mi></m:math> is -1 or <m:math><m:mi>r</m:mi></m:math> is +1, then all the data points lie exactly on a straight line. If <m:math><m:mi>r</m:mi></m:math> is significant, then <emphasis>within the range of the x-values,</emphasis> the line can be used to predict a <m:math><m:mi>y</m:mi></m:math> value. As an illustration, consider the <cnxn document="m17092" target="element-22">third exam/final exam example</cnxn>.
The line of best fit is: <m:math><m:mover><m:mi>y</m:mi><m:mo>^</m:mo></m:mover><m:mo>=</m:mo><m:mn>-173.51</m:mn><m:mo>+</m:mo><m:mn>4.83</m:mn><m:mi>x</m:mi></m:math> with <m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>0.6631</m:mn></m:math>.
</note><para id="element-217">Can the line be used for prediction? <emphasis>Given a third exam score (<m:math><m:mi>x</m:mi></m:math> value), can we
successfully predict the final exam score (predicted <m:math><m:mi>y</m:mi></m:math> value)?</emphasis> Test <m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>0.6631</m:mn></m:math>
with its appropriate critical value.</para><para id="element-968">Using the table with <m:math><m:mtext>df</m:mtext><m:mo>=</m:mo><m:mn>11</m:mn><m:mo>-</m:mo><m:mn>2</m:mn><m:mo>=</m:mo><m:mn>9</m:mn></m:math>, the critical values are -0.602 and +0.602. Since
<m:math><m:mn>0.6631</m:mn><m:mo>&gt;</m:mo><m:mn>0.602</m:mn></m:math>, <m:math><m:mi>r</m:mi></m:math> is significant. <emphasis>Because <m:math><m:mi>r</m:mi></m:math> is significant and the scatter plot shows a reasonable linear trend, the line can be used to predict final exam scores.</emphasis></para><example id="element-433"><para id="element-294">
Suppose you computed the following correlation coefficients. Using the
table at the end of the chapter, determine whether each <m:math><m:mi>r</m:mi></m:math> is significant and whether the line of best fit associated
with it can be used to predict a <m:math><m:mi>y</m:mi></m:math> value. If it helps, draw a number line.
</para><list id="element-467" type="bulleted"><item><m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>-0.567</m:mn></m:math> and the sample size, <m:math><m:mi>n</m:mi></m:math>, is 19. The <m:math><m:mtext>df</m:mtext><m:mo>=</m:mo><m:mi>n</m:mi><m:mo>-</m:mo><m:mn>2</m:mn><m:mo>=</m:mo><m:mn>17</m:mn></m:math>. The critical value is -0.456. <m:math><m:mn>-0.567</m:mn><m:mo>&lt;</m:mo><m:mn>-0.456</m:mn></m:math>, so <m:math><m:mi>r</m:mi></m:math> is significant.</item>
<item><m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>0.708</m:mn></m:math> and the sample size, <m:math><m:mi>n</m:mi></m:math>, is 9. The <m:math><m:mtext>df</m:mtext><m:mo>=</m:mo><m:mi>n</m:mi><m:mo>-</m:mo><m:mn>2</m:mn><m:mo>=</m:mo><m:mn>7</m:mn></m:math>. The critical value is 0.666. <m:math><m:mn>0.708</m:mn><m:mo>&gt;</m:mo><m:mn>0.666</m:mn></m:math>, so <m:math><m:mi>r</m:mi></m:math> is significant.</item>
<item><m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>0.134</m:mn></m:math> and the sample size, <m:math><m:mi>n</m:mi></m:math>, is 14. The <m:math><m:mtext>df</m:mtext><m:mo>=</m:mo><m:mn>14</m:mn><m:mo>-</m:mo><m:mn>2</m:mn><m:mo>=</m:mo><m:mn>12</m:mn></m:math>. The critical values are -0.532 and 0.532. Since 0.134 is between -0.532 and 0.532, <m:math><m:mi>r</m:mi></m:math> is not significant.</item>
<item><m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>0</m:mn></m:math> and the sample size, <m:math><m:mi>n</m:mi></m:math>, is 5. No matter what the df is, <m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>0</m:mn></m:math> is between the two critical values, so <m:math><m:mi>r</m:mi></m:math> is not significant.</item></list>
</example>   
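<para id="element-901">The comparison used in the examples above can also be written as a short program. The following Python sketch is an illustration only and is not part of the original module. The names <code>CRITICAL_VALUES</code> and <code>is_significant</code> are hypothetical, and the dictionary holds only the 95% critical values quoted in this module; for any other df, the full table would have to be consulted.</para>
<code type="block" id="element-902">
```python
# 95% critical values quoted in this module, keyed by degrees of freedom (df = n - 2).
# Assumption: this is only an excerpt of the full table, not the whole table.
CRITICAL_VALUES = {4: 0.811, 7: 0.666, 8: 0.632, 9: 0.602, 12: 0.532, 17: 0.456}


def is_significant(r, n):
    """Return True if r lies outside the interval between the two critical values."""
    df = n - 2
    critical = CRITICAL_VALUES[df]  # raises KeyError if df is not in the excerpt
    # r is significant when it is below -critical or above +critical
    return abs(r) > critical


# The (r, n) pairs worked through in this module
examples = [(0.801, 10), (-0.624, 14), (0.776, 6), (0.6631, 11),
            (-0.567, 19), (0.708, 9), (0.134, 14)]

for r, n in examples:
    verdict = "significant" if is_significant(r, n) else "not significant"
    print(f"r = {r}, n = {n}, df = {n - 2}: {verdict}")
```
</code>
<para id="element-903">Testing the absolute value of <m:math><m:mi>r</m:mi></m:math> against the positive critical value is equivalent to the two separate comparisons in the examples, because the two critical values differ only in sign.</para>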
  </content>
  
</document>
