<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/technology/cnxml/schema/dtd/0.5/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/" xmlns:m="http://www.w3.org/1998/Math/MathML" id="new">
  <name>Linear Regression and Correlation: The Correlation Coefficient</name>
  <metadata>
  <md:version>1.5</md:version>
  <md:created>2008/06/23 15:59:41 GMT-5</md:created>
  <md:revised>2008/10/27 18:13:32.044 GMT-5</md:revised>
  <md:authorlist>
      <md:author id="sdean">
      <md:firstname>Susan</md:firstname>
      
      <md:surname>Dean</md:surname>
      <md:email>deansusan@deanza.edu</md:email>
    </md:author>
      <md:author id="billowsky">
      <md:firstname>Barbara</md:firstname>
      
      <md:surname>Illowsky</md:surname>
      <md:email>illowskybarbara@deanza.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="sdean">
      <md:firstname>Susan</md:firstname>
      
      <md:surname>Dean</md:surname>
      <md:email>deansusan@deanza.edu</md:email>
    </md:maintainer>
    <md:maintainer id="billowsky">
      <md:firstname>Barbara</md:firstname>
      
      <md:surname>Illowsky</md:surname>
      <md:email>illowskybarbara@deanza.edu</md:email>
    </md:maintainer>
    <md:maintainer id="cnxorg">
      <md:firstname/>
      
      <md:surname>Connexions</md:surname>
      <md:email>cnx@cnx.org</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>elementary</md:keyword>
    <md:keyword>statistics</md:keyword>
  </md:keywordlist>

  <md:abstract>This module provides an overview of Linear Regression and Correlation: The Correlation Coefficient as part of the Collaborative Statistics collection (col10522) by Barbara Illowsky and Susan Dean.</md:abstract>
</metadata>
  <content>
    <para id="delete_me">Besides looking at the scatter plot and seeing that a line seems reasonable, how can you
tell if the line is a good predictor? Use the correlation coefficient as another indicator
(besides the scatter plot) of the strength of the linear relationship between <m:math><m:mi>x</m:mi></m:math> and <m:math><m:mi>y</m:mi></m:math>. The
correlation coefficient, <m:math><m:mi>r</m:mi></m:math>, is defined as:

</para><para id="element-48"><m:math>
<m:mi>r</m:mi>
<m:mo>=</m:mo>
<m:mfrac>
<m:mrow>
<m:mi>n</m:mi>
<m:mo>⋅</m:mo>
<m:mi>Σ</m:mi>
<m:mi>x</m:mi>
<m:mo>⋅</m:mo>
<m:mi>y</m:mi>
<m:mo>-</m:mo>
<m:mo>(</m:mo>
<m:mi>Σ</m:mi>
<m:mi>x</m:mi>
<m:mo>)</m:mo>
<m:mo>⋅</m:mo>
<m:mo>(</m:mo>
<m:mi>Σ</m:mi>
<m:mi>y</m:mi>
<m:mo>)</m:mo>
</m:mrow>
<m:mrow>
<m:msqrt>
<m:mo>[</m:mo>
<m:mi>n</m:mi>
<m:mo>⋅</m:mo>
<m:mi>Σ</m:mi>
<m:msup>
<m:mi>x</m:mi>
<m:mn>2</m:mn>
</m:msup>
<m:mo>-</m:mo>
<m:mo>(</m:mo>
<m:mi>Σ</m:mi>
<m:mi>x</m:mi>
<m:msup>
<m:mo>)</m:mo>
<m:mn>2</m:mn>
</m:msup>
<m:mo>]</m:mo>
<m:mo>⋅</m:mo>
<m:mo>[</m:mo>
<m:mi>n</m:mi>
<m:mo>⋅</m:mo>
<m:mi>Σ</m:mi>
<m:msup>
<m:mi>y</m:mi>
<m:mn>2</m:mn>
</m:msup>
<m:mo>-</m:mo>
<m:mo>(</m:mo>
<m:mi>Σ</m:mi>
<m:mi>y</m:mi>
<m:msup>
<m:mo>)</m:mo>
<m:mn>2</m:mn>
</m:msup>
<m:mo>]</m:mo>
</m:msqrt>
</m:mrow>
</m:mfrac>
</m:math>
</para><para id="element-363">where: <list id="element-12351"><item><m:math><m:mn>-1</m:mn><m:mo>≤</m:mo><m:mi>r</m:mi><m:mo>≤</m:mo><m:mn>1</m:mn></m:math></item><item>
<m:math><m:mi>n</m:mi></m:math> = the number of data points</item></list>
</para><para id="element-447">If you suspect a linear relationship between <m:math><m:mi>x</m:mi></m:math> and <m:math><m:mi>y</m:mi></m:math>, then <m:math><m:mi>r</m:mi></m:math> measures how strong that relationship is.</para><para id="element-262">If <m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:math>, there is perfect positive correlation. If <m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>-1</m:mn></m:math>, there is perfect negative
correlation. In both of these cases, all of the original data points lie on a straight line. Of course,
real-world data rarely behave this way.</para><para id="element-996">The formula for <m:math><m:mi>r</m:mi></m:math> looks formidable. However, many calculators and any regression and
correlation computer program can calculate <m:math><m:mi>r</m:mi></m:math>. The sign of <m:math><m:mi>r</m:mi></m:math> is the same as the sign of the slope, <m:math><m:mi>b</m:mi></m:math>,
of the best fit line.</para>   
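<para id="element-code-r">To see how the formula for <m:math><m:mi>r</m:mi></m:math> turns into a calculation, here is a minimal Python sketch that builds each sum in the formula and then combines them. The function name, the variable names, and the three small data sets are made up for illustration and are not part of this module. A calculator or statistics program may organize the arithmetic differently.
<code type="block">
```python
import math


def correlation_coefficient(xs, ys):
    """Compute r from the raw sums that appear in the formula for r."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")

    # The five sums that appear in the formula.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        # Happens when every x (or every y) is the same value.
        raise ValueError("r is undefined when all x or all y values are equal")
    return numerator / denominator


# Hypothetical data, chosen only to illustrate the three situations.
xs = [1, 2, 3, 4, 5]
r_up = correlation_coefficient(xs, [3, 5, 7, 9, 11])   # points on y = 2x + 1
r_down = correlation_coefficient(xs, [9, 7, 5, 3, 1])  # points on y = -2x + 11
r_noisy = correlation_coefficient(xs, [2, 4, 5, 4, 5])  # roughly increasing

print(r_up, r_down, round(r_noisy, 4))
```
</code>
The first two data sets lie exactly on a line, so the sketch should give <m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:math> for the line with positive slope and <m:math><m:mi>r</m:mi><m:mo>=</m:mo><m:mn>-1</m:mn></m:math> for the line with negative slope. The third data set only roughly follows an increasing line, so its <m:math><m:mi>r</m:mi></m:math> is positive but less than 1.</para>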
  </content>
  <glossary>
<definition id="coeffcorr">
    <term>Coefficient of Correlation</term>
    <meaning>
A measure developed by Karl Pearson (early 1900s) that gives the strength of association between the independent variable and the dependent variable. The formula is:
    <equation id="id5499555">
      <m:math>
        <m:semantics>
          <m:mrow>
            <m:mstyle fontsize="12pt">
              <m:mrow>
                <m:mrow>
                  <m:mrow>
                    <m:mi>r</m:mi>
                    <m:mo stretchy="false">=</m:mo>
                    <m:mfrac>
                      <m:mrow>
                        <m:mi>n</m:mi>
                        <m:mrow>
                          <m:mrow>
                            <m:mo stretchy="false">∑</m:mo>
                            <m:mstyle fontstyle="italic">
                              <m:mrow>
                                <m:mtext>XY</m:mtext>
                              </m:mrow>
                            </m:mstyle>
                          </m:mrow>
                          <m:mo stretchy="false">−</m:mo>
                          <m:mo stretchy="false">(</m:mo>
                        </m:mrow>
                        <m:mrow>
                          <m:mo stretchy="false">∑</m:mo>
                          <m:mrow>
                            <m:mi>X</m:mi>
                            <m:mo stretchy="false">)</m:mo>
                            <m:mo stretchy="false">(</m:mo>
                            <m:mrow>
                              <m:mo stretchy="false">∑</m:mo>
                              <m:mrow>
                                <m:mi>Y</m:mi>
                                <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                            </m:mrow>
                          </m:mrow>
                        </m:mrow>
                      </m:mrow>
                      <m:msqrt>
                        <m:mrow>
                          <m:mo stretchy="false">[</m:mo>
                          <m:mi>n</m:mi>
                          <m:mrow>
                            <m:mo stretchy="false">∑</m:mo>
                            <m:mrow>
                              <m:mrow>
                                <m:msup>
                                  <m:mi>X</m:mi>
                                  <m:mstyle fontsize="8pt">
                                    <m:mrow>
                                      <m:mn>2</m:mn>
                                    </m:mrow>
                                  </m:mstyle>
                                </m:msup>
                                <m:mo stretchy="false">−</m:mo>
                                <m:mo stretchy="false">(</m:mo>
                              </m:mrow>
                              <m:mrow>
                                <m:mo stretchy="false">∑</m:mo>
                                <m:mrow>
                                  <m:mi>X</m:mi>
                                  <m:msup>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mstyle fontsize="8pt">
                                      <m:mrow>
                                        <m:mn>2</m:mn>
                                      </m:mrow>
                                    </m:mstyle>
                                  </m:msup>
                                  <m:mo stretchy="false">]</m:mo>
                                  <m:mo stretchy="false">[</m:mo>
                                  <m:mi>n</m:mi>
                                  <m:mrow>
                                    <m:mo stretchy="false">∑</m:mo>
                                    <m:mrow>
                                      <m:mrow>
                                        <m:msup>
                                          <m:mi>Y</m:mi>
                                          <m:mstyle fontsize="8pt">
                                            <m:mrow>
                                              <m:mn>2</m:mn>
                                            </m:mrow>
                                          </m:mstyle>
                                        </m:msup>
                                        <m:mo stretchy="false">−</m:mo>
                                        <m:mo stretchy="false">(</m:mo>
                                      </m:mrow>
                                      <m:mrow>
                                        <m:mo stretchy="false">∑</m:mo>
                                        <m:mrow>
                                          <m:mi>Y</m:mi>
                                          <m:msup>
                                            <m:mo stretchy="false">)</m:mo>
                                            <m:mstyle fontsize="8pt">
                                              <m:mrow>
                                                <m:mn>2</m:mn>
                                              </m:mrow>
                                            </m:mstyle>
                                          </m:msup>
                                          <m:mo stretchy="false">]</m:mo>
                                        </m:mrow>
                                      </m:mrow>
                                    </m:mrow>
                                  </m:mrow>
                                </m:mrow>
                              </m:mrow>
                            </m:mrow>
                          </m:mrow>
                        </m:mrow>
                      </m:msqrt>
                    </m:mfrac>
                  </m:mrow>
                  <m:mi>,</m:mi>
                </m:mrow>
              </m:mrow>
            </m:mstyle>
            <m:mrow/>
          </m:mrow>
          <m:annotation encoding="StarMath 5.0"> size 12{r= {  {n Sum { ital "XY"}  -  \(  Sum {X \)  \(  Sum {Y \) } } }  over  { sqrt { \[ n Sum {X rSup { size 8{2} }  -  \(  Sum {X \)  rSup { size 8{2} }  \]  \[ n Sum {Y rSup { size 8{2} }  -  \(  Sum {Y \)  rSup { size 8{2} }  \] } } } } } } } ,} {}</m:annotation>
        </m:semantics>
      </m:math>
    </equation>
    where n is the number of data points. 
    The coefficient cannot be greater than 1 or less than -1. The closer the coefficient is to 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mrow><m:mo stretchy="false">±</m:mo><m:mn>1</m:mn></m:mrow></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{ +- 1} {}</m:annotation></m:semantics></m:math>, the stronger the evidence of a significant linear relationship between 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mi>X</m:mi></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{X} {}</m:annotation></m:semantics></m:math> and 
<m:math><m:semantics><m:mrow><m:mstyle fontsize="12pt"><m:mrow><m:mi>Y</m:mi></m:mrow></m:mstyle><m:mrow/></m:mrow><m:annotation encoding="StarMath 5.0"> size 12{Y} {}</m:annotation></m:semantics></m:math>.
    </meaning>
  </definition>


</glossary>
</document>
