<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/cnxml/0.5/DTD/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="m11133">
  
  <name>Sampling Distribution of Pearson's r</name>
  
  <metadata>
  <md:version>2.3</md:version>
  <md:created>2003/04/28</md:created>
  <md:revised>2003/06/19 12:10:47.764 GMT-5</md:revised>
  <md:authorlist>
    <md:author id="dmlane">
      <md:firstname>David</md:firstname>
      
      <md:surname>Lane</md:surname>
      <md:email>lane@rice.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="dmlane">
      <md:firstname>David</md:firstname>
      
      <md:surname>Lane</md:surname>
      <md:email>lane@rice.edu</md:email>
    </md:maintainer>
    <md:maintainer id="liqun">
      <md:firstname>Liqun</md:firstname>
      
      <md:surname>Wang</md:surname>
      <md:email>liqun@rice.edu</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>sampling distribution</md:keyword>
    <md:keyword>Pearson correlation</md:keyword>
  </md:keywordlist>

  <md:abstract>This module discusses the sampling distribution of Pearson's r.</md:abstract>
</metadata>


  <content>
    <para id="para1">
      Assume that the correlation between quantitative and verbal SAT
      scores in a given population is 0.60. In other words, the
      population correlation is
      <m:math>
	<m:apply>
	  <m:eq/>
	  <m:ci>ρ</m:ci>
	  <m:cn>0.60</m:cn>
	</m:apply>
      </m:math>. If 12
      students were sampled randomly, the sample correlation, 
      <m:math><m:ci>r</m:ci></m:math>, would almost certainly
      not be exactly equal to 0.60. Naturally, different samples of 12
      students would yield different values of
      <m:math><m:ci>r</m:ci></m:math>. The distribution of
      values of <m:math><m:ci>r</m:ci></m:math> after repeated samples
      of 12 students is the sampling distribution of
      <m:math><m:ci>r</m:ci></m:math>. 
    </para>

    <para id="para2">
      The shape of the sampling distribution of
      <m:math><m:ci>r</m:ci></m:math> for the above example is shown
      in <cnxn target="figure1" strength="7"/>. You can see that the
      sampling distribution is not symmetric: it is
      <emphasis>negatively skewed</emphasis>. The reason for the skew
      is that <m:math><m:ci>r</m:ci></m:math> cannot take on values
      greater than 1.0, and therefore the distribution cannot extend as
      far in the positive direction as it can in the negative
      direction. The closer the population correlation is to 1.0,
      the more pronounced the skew.
    </para>

    <figure id="figure1">
      <media type="image/gif" src="figure1.gif"/>
      <caption>
	Sampling distribution of <m:math><m:ci>r</m:ci></m:math> for 
	<m:math>
	  <m:apply>
	    <m:eq/>
	    <m:ci>N</m:ci>
	    <m:cn>12</m:cn>
	  </m:apply>
	</m:math> and 
	<m:math>
	  <m:apply>
	    <m:eq/>
	    <m:ci>ρ</m:ci>
	    <m:cn>0.60</m:cn>
	  </m:apply>
	</m:math>.
      </caption>
    </figure>

    <para id="para3">
      <cnxn target="figure2" strength="7"/> shows the sampling
      distribution for a population correlation of
      <m:math>
	<m:apply>
	  <m:eq/>
	  <m:ci>ρ</m:ci>
	  <m:cn>0.90</m:cn>
	</m:apply>
      </m:math>. This distribution has a very short positive tail and
      a long negative tail.
    </para>
    
    <figure id="figure2">
      <media type="image/gif" src="figure2.gif"/>
      <caption>
	The sampling distribution of <m:math><m:ci>r</m:ci></m:math>
	for 
	<m:math>
	  <m:apply>
	    <m:eq/>
	    <m:ci>N</m:ci>
	    <m:cn>12</m:cn>
	  </m:apply>
	</m:math> and 
	<m:math>
	  <m:apply>
	    <m:eq/>
	    <m:ci>ρ</m:ci>
	    <m:cn>0.90</m:cn>
	  </m:apply>
	</m:math>.
      </caption>
    </figure>
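
    <para id="para3-sim">
      The skew shown in the two figures can be explored with a small
      simulation. The sketch below is illustrative only: it assumes the
      two scores follow a bivariate normal distribution (an assumption
      not stated in this module), and the function names
      (pearson_r, sample_r, skewness, simulate) are hypothetical helpers
      introduced here. It repeatedly draws samples of 12 pairs, computes
      the sample correlation for each, and summarizes the skewness of
      the resulting values.
      <code type="block"><![CDATA[
```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sample_r(rho, n, rng):
    """Draw n pairs with population correlation rho and return r.

    Assumption: standard bivariate normal scores.
    """
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        xs.append(x)
        # y has unit variance and correlation rho with x
        ys.append(rho * x + math.sqrt(1.0 - rho ** 2) * e)
    return pearson_r(xs, ys)


def skewness(values):
    """Moment-based skewness: third central moment over sd cubed."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def simulate(rho, n=12, reps=20000, seed=1):
    """Return reps simulated sample correlations for samples of size n."""
    rng = random.Random(seed)
    return [sample_r(rho, n, rng) for _ in range(reps)]


if __name__ == "__main__":
    for rho in (0.60, 0.90):
        rs = simulate(rho)
        print(rho, round(sum(rs) / len(rs), 3), round(skewness(rs), 3))
```
]]></code>
      Under these assumptions, the skewness should come out negative
      for both population correlations, and more strongly negative for
      0.90 than for 0.60.
    </para>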

    <para id="para4">
      Referring back to the SAT example, suppose you wanted to know
      the probability that in a sample of 12 students, the sample
      value of <m:math><m:ci>r</m:ci></m:math> would be 0.75 or
      higher. You might think that all you would 
      need to know to compute this probability is the mean and
      standard error of the sampling distribution of
      <m:math><m:ci>r</m:ci></m:math>. However, since
      the sampling distribution is not normal, you would still not be
      able to solve the problem. Fortunately, the statistician Fisher
      developed a way to transform <m:math><m:ci>r</m:ci></m:math> to
      a variable that is approximately normally distributed with a
      known standard error. The variable is called 
      <m:math>
	<m:ci>
	  <m:msup><m:mi>z</m:mi><m:mi>′</m:mi></m:msup>
	</m:ci>
      </m:math> and the formula for the transformation is given below. 

      <m:math display="block">
	<m:apply>
	  <m:eq/>
	  <m:ci>
	    <m:msup><m:mi>z</m:mi><m:mi>′</m:mi></m:msup>
	  </m:ci>
	  <m:apply>
	    <m:times/>
	    <m:cn>0.5</m:cn>
	    <m:apply>
	      <m:ln/>
	      <m:apply>
		<m:divide/>
		<m:apply>
		  <m:plus/>
		  <m:cn>1</m:cn>
		  <m:ci>r</m:ci>
		</m:apply>
		<m:apply>
		  <m:minus/>
		  <m:cn>1</m:cn>
		  <m:ci>r</m:ci>
		</m:apply>
	      </m:apply>
	    </m:apply>
	  </m:apply>
	</m:apply>
      </m:math>

      The details of the formula are not important here since normally
      you will use either a <link src="r2z.html">table</link> or a
      computer program to do the transformation. What is important is
      that
      <m:math>
	<m:ci>
	  <m:msup><m:mi>z</m:mi><m:mi>′</m:mi></m:msup>
	</m:ci>
      </m:math> is approximately normally distributed and has a standard error of  
      
      <m:math display="block">
	<m:apply>
	  <m:divide/>
	  <m:cn>1</m:cn>
	  <m:apply>
	    <m:root/>
	    <m:apply>
	      <m:minus/>
	      <m:ci>N</m:ci>
	      <m:cn>3</m:cn>
	    </m:apply>
	  </m:apply>
	</m:apply>
      </m:math>
      where <m:math><m:ci>N</m:ci></m:math> is the number of pairs of scores.  
    </para>
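
    <para id="para4-code">
      For readers who prefer a program to a table, the transformation
      and its standard error can be sketched in a few lines. The
      function names fisher_z and z_standard_error are hypothetical,
      chosen here for illustration; the formulas are the two given
      above.
      <code type="block"><![CDATA[
```python
import math


def fisher_z(r):
    """Fisher's z' transformation: 0.5 * ln((1 + r) / (1 - r))."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def z_standard_error(n):
    """Standard error of z' for n pairs of scores: 1 / sqrt(n - 3)."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    return 1.0 / math.sqrt(n - 3)


if __name__ == "__main__":
    print(round(fisher_z(0.75), 4))
    print(round(z_standard_error(12), 3))
```
]]></code>
      The same transformation is available in Python's standard
      library as math.atanh, which can serve as a cross-check.
    </para>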

    <para id="para5">
      Let's return to the question of determining the probability of
      getting a sample correlation of 0.75 or above in a sample of 12
      from a population with a correlation of 0.60. The first step is
      to convert both 0.60 and 0.75 to 
      <m:math>
	<m:ci>
	  <m:msup><m:mi>z</m:mi><m:mi>′</m:mi></m:msup>
	</m:ci>
      </m:math>s. From <link src="r2z.html">a table</link>, the values
      are 0.6931 and 0.9730, respectively. The standard error of
      <m:math>
	<m:ci>
	  <m:msup><m:mi>z</m:mi><m:mi>′</m:mi></m:msup>
	</m:ci>
      </m:math> for 
      <m:math>
	<m:apply>
	  <m:eq/>
	  <m:ci>N</m:ci>
	  <m:cn>12</m:cn>
	</m:apply>
      </m:math> is 0.333. Therefore the question is reduced to the
      following: given a normal distribution with a mean of 0.6931 and
      a standard deviation of 0.333, what is the probability of
      obtaining a value of 0.9730 or higher? The answer can be found
      directly from the applet <cnxn document="m11328">Calculate Area
      for a given X</cnxn> to be 0.20. Alternatively, you could use
      the formula:
      
      <m:math display="block">
	<m:apply>
	  <m:eq/>
	  <m:ci>Z</m:ci>
	  <m:apply>
	    <m:divide/>
	    <m:apply>
	      <m:minus/>
	      <m:ci>X</m:ci>
	      <m:ci>m</m:ci>
	    </m:apply>
	    <m:ci>s</m:ci>
	  </m:apply>
	  <m:apply>
	    <m:divide/>
	    <m:apply>
	      <m:minus/>
	      <m:cn>0.9730</m:cn>
	      <m:cn>0.6931</m:cn>
	    </m:apply>
	    <m:cn>0.333</m:cn>
	  </m:apply>
	  <m:cn>0.8405</m:cn>
	</m:apply>
      </m:math>
      and then use a standard normal table to find that the area above 0.8405 is approximately 0.20. 
    </para>
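
    <para id="para5-code">
      The whole calculation can also be checked in code. The sketch
      below is one possible way to do it, not the method used by the
      applet: prob_r_at_least is a hypothetical helper that uses
      math.atanh for the transformation and statistics.NormalDist
      (Python 3.8 or later) in place of a normal table.
      <code type="block"><![CDATA[
```python
import math
from statistics import NormalDist


def prob_r_at_least(r_cut, rho, n):
    """Approximate P(sample r >= r_cut) for n pairs, population rho.

    Uses Fisher's z' (math.atanh) and the normal approximation with
    standard error 1 / sqrt(n - 3).
    """
    z_cut = math.atanh(r_cut)        # z' for the cutoff, e.g. 0.75
    z_mean = math.atanh(rho)         # z' for the population value
    se = 1.0 / math.sqrt(n - 3)      # standard error of z'
    return 1.0 - NormalDist(mu=z_mean, sigma=se).cdf(z_cut)


if __name__ == "__main__":
    print(round(prob_r_at_least(0.75, 0.60, 12), 2))
```
]]></code>
      Because the code does not round the intermediate values to four
      decimal places, its result may differ from the hand calculation
      in the third decimal place, but it should agree with 0.20 after
      rounding.
    </para>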

  </content>  
</document>
