<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/cnxml/0.5/DTD/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="m10805">

  <name>Percentiles</name>

  <metadata>
  <md:version>2.9</md:version>
  <md:created>2002/08/13</md:created>
  <md:revised>2008/04/20 14:57:21.844 GMT-5</md:revised>
  <md:authorlist>
      <md:author id="dmlane">
      <md:firstname>David</md:firstname>
      
      <md:surname>Lane</md:surname>
      <md:email>lane@rice.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="dmlane">
      <md:firstname>David</md:firstname>
      
      <md:surname>Lane</md:surname>
      <md:email>lane@rice.edu</md:email>
    </md:maintainer>
    <md:maintainer id="meyer">
      <md:firstname>Eileen</md:firstname>
      
      <md:surname>Meyer</md:surname>
      <md:email>meyer@rice.edu</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>percentile</md:keyword>
  </md:keywordlist>

  <md:abstract>This module gives the definition of percentile along with several examples.</md:abstract>
</metadata>

  <content>
    <para id="p1">
      A test score in and of itself is usually difficult to
      interpret.  For example, if you learned that your score
      on a measure of shyness was 35 out of a possible 50,
      you would have little idea how shy you are compared to
      other people.  More relevant is the
      <emphasis>percentage</emphasis> of people with lower shyness
      scores than yours.  
      <definition id="percentdef">
	<term>Percentile</term>
	<meaning>
	  The percentage of scores that are lower than a given score;
	  in the shyness example, the percentage of people with lower
	  shyness scores than yours.
	</meaning>
	<example id="perxpl">
	  <para id="perp1">
	    If 65% of the scores were below yours, then your
	    score would be the 65th percentile.
	  </para>
	</example>
      </definition>
    </para>
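    <para id="p1code">
      As a minimal sketch (not part of the original module), this
      definition can be written in Python.  It assumes that
      "lower" means strictly lower, so scores tied with yours are not
      counted; the function name <code>percent_below</code> is
      illustrative, and the sample values are the eight numbers from
      the Test Scores table used later in this module.
      <code type="block">
```python
def percent_below(score, scores):
    """Percentage of scores strictly lower than score (assumption: ties are not counted)."""
    lower = sum(1 for s in scores if score > s)  # how many scores are below
    return 100 * lower / len(scores)


test_scores = [3, 5, 7, 8, 9, 11, 13, 15]  # the eight numbers from the Test Scores table
print(percent_below(9, test_scores))  # 4 of the 8 scores are below 9: prints 50.0
```
      </code>
    </para>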

    <section id="sec1">
      <name>Two Simple Definitions of Percentile</name>
      <para id="s1p1">There is no universally accepted definition of a percentile.
	Taking the 65th percentile as an example, it can be defined
	as the lowest score that is greater than 65% of the scores.
	This is the way we defined it above, and we will call this
	"Definition 1".  The 65th percentile can also be defined as
	the smallest score that is greater than or equal to 65% of
	the scores.  This we will call "Definition 2".
	Unfortunately, these two definitions can lead to dramatically
	different results, especially when there is relatively little
	data.  Moreover, neither definition is explicit about how to
	handle rounding.  For instance, what score is required to be
	higher than 65% of the scores when the total number of scores
	is 50?  This is tricky because 65% of 50 is 32.5.  How do we
	find the lowest number that is higher than 32.5 of the scores?
	A third way to compute percentiles (presented below) is a
	weighted average of the percentiles computed according to the
	first two definitions.  This third definition handles rounding
	more gracefully than the other two and has the advantage that
	it allows the <link src="http://psych.rice.edu/online_stat/glossary/median.html">median</link>
	(discussed <cnxn document="m11165" strength="8">later</cnxn>) to be
	defined conveniently as the 50th percentile.
      </para>
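      <para id="s1code">
	The first two definitions can be sketched in Python.  Because
	neither definition says how to round, this sketch makes one
	assumption explicit: a score qualifies when the number of scores
	it exceeds (Definition 1), or is greater than or equal to
	(Definition 2), is at least P% of N, with no rounding of that
	threshold.  With 50 scores and P = 65, for instance, a score
	would have to exceed at least 33 of them.  The function names
	are illustrative, and the sample data are the eight numbers
	from the Test Scores table in the next section.
	<code type="block">
```python
def percentile_def1(scores, p):
    """Definition 1: the lowest score that is greater than p% of the scores."""
    data = sorted(scores)
    n = len(data)
    for x in data:
        below = sum(1 for s in data if x > s)  # scores strictly below x
        # Assumption: "greater than p% of the scores" means below >= p/100 * n.
        # Comparing 100 * below with p * n avoids float rounding in p/100.
        if 100 * below >= p * n:
            return x
    return None  # no score qualifies (for example, p = 100)


def percentile_def2(scores, p):
    """Definition 2: the smallest score that is greater than or equal to p% of the scores."""
    data = sorted(scores)
    n = len(data)
    for x in data:
        at_or_below = sum(1 for s in data if x >= s)  # scores at or below x, x included
        if 100 * at_or_below >= p * n:  # same threshold assumption as Definition 1
            return x
    return None  # only reached if p exceeds 100


test_scores = [3, 5, 7, 8, 9, 11, 13, 15]
print(percentile_def1(test_scores, 25), percentile_def2(test_scores, 25))  # 7 5
```
	</code>
	These values match the results for Definitions 1 and 2 given in
	the first worked example of the next section.
      </para>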
    </section>

    <section id="sec2">
      <name>A Third Definition</name>
      <para id="s2p1">
	Unless otherwise specified, when we refer to "percentile", we
	will be referring to this third definition of
	percentiles.  Let's begin with an example. 
      </para>
      <example id="example1">
	<para id="xpl1">
	  Consider the 25th percentile for the 8 numbers in the <cnxn target="testScore" strength="9">table</cnxn>.  Notice the
	  numbers are given ranks ranging from 1 for the lowest number
	  to 8 for the highest number.
	</para>
	<table frame="all" id="testScore">
	  <name>Test Scores</name>
	  <tgroup cols="2" align="center" colsep="1" rowsep="1">
	    <thead>
	      <row>
		<entry align="center">Number</entry>
		<entry align="center">Rank</entry>
	      </row>
	    </thead>
	    <tbody>
	      <row>
		<entry align="right">3</entry>
		<entry align="right">1</entry>
	      </row>
	      <row>
		<entry align="right">5</entry>
		<entry align="right">2</entry>
	      </row>	
	      <row>
		<entry align="right">7</entry>
		<entry align="right">3</entry>
	      </row>	
	      <row>
		<entry align="right">8</entry>
		<entry align="right">4</entry>
	      </row>	
	      <row>
		<entry align="right">9</entry>
		<entry align="right">5</entry>
	      </row>	
	      <row>
		<entry align="right">11</entry>
		<entry align="right">6</entry>
	      </row>	
	      <row>
		<entry align="right">13</entry>
		<entry align="right">7</entry>
	      </row>	
	      <row>
		<entry align="right">15</entry>
		<entry align="right">8</entry>
	      </row>	
	    </tbody>
	  </tgroup>
	</table>

	<para id="s2p2">The first step is to compute the rank
	  (<m:math><m:ci>R</m:ci></m:math>) of the 25th percentile.
	  This is done using the following formula:
	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:ci>R</m:ci>
	      <m:apply>
		<m:times/>
		<m:apply>
		  <m:divide/>
		  <m:ci>P</m:ci>
		  <m:cn>100</m:cn>
		</m:apply>
		<m:apply>
		  <m:plus/>
		  <m:ci>N</m:ci>
		  <m:cn>1</m:cn>
		</m:apply>
	      </m:apply>
	    </m:apply>
	  </m:math>
	  where <m:math><m:ci>P</m:ci></m:math> is the desired
	  percentile (<m:math><m:cn>25</m:cn></m:math> in this case)
	  and <m:math><m:ci>N</m:ci></m:math> is the number of scores
	  (<m:math><m:cn>8</m:cn></m:math> in this case).  Therefore,
	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:ci>R</m:ci>
	      <m:apply>
		<m:times/>
		<m:apply>
		  <m:divide/>
		  <m:cn>25</m:cn>
		  <m:cn>100</m:cn>
		</m:apply>
		<m:apply>
		  <m:plus/>
		  <m:cn>8</m:cn>
		  <m:cn>1</m:cn>
		</m:apply>
	      </m:apply>
	      <m:apply>
		<m:divide/>
		<m:cn>9</m:cn>
		<m:cn>4</m:cn>
	      </m:apply>
	      <m:cn type="real">2.25</m:cn>
	    </m:apply>
	  </m:math>
	  If <m:math><m:ci>R</m:ci></m:math> were an integer, the
	  <m:math><m:ci>P</m:ci></m:math>th percentile would be the
	  number with rank <m:math><m:ci>R</m:ci></m:math>.  When
	  <m:math><m:ci>R</m:ci></m:math> is not an integer, we
	  compute the <m:math><m:ci>P</m:ci></m:math>th percentile by
	  interpolation as follows:
	  
	  <list id="list1" type="enumerated"><item>Define 
	      <m:math>
		<m:ci>IR</m:ci>
	      </m:math> as the integer portion of 
	      <m:math>
		<m:ci>R</m:ci>
	      </m:math> (the number to the left 
	      of the decimal point).  For this example, 
	      <m:math>
		<m:apply>
		  <m:eq/>
		  <m:ci>IR</m:ci>
		  <m:cn>2</m:cn>
		</m:apply>
	      </m:math>.
	    </item>              
	    <item>Define 
	      <m:math>
		<m:ci>FR</m:ci>
	      </m:math> as the fractional portion of 
	      <m:math>
		<m:ci>R</m:ci>
	      </m:math>.  For this example, 
	      <m:math>
		<m:apply>
		  <m:eq/>
		  <m:ci>FR</m:ci>
		  <m:cn type="real">0.25</m:cn>
		</m:apply>
	      </m:math>.
	    </item>
	    <item>Find the scores with rank 
	      <m:math>
		<m:ci>IR</m:ci>
	      </m:math> and with rank
	      <m:math>
		<m:apply>
		  <m:plus/>
		  <m:ci>IR</m:ci>
		  <m:cn>1</m:cn>
		</m:apply>
	      </m:math>.
	      For this example, these are the scores with rank 2 and
	      rank 3, which are 5 and 7.
	    </item>
	    <item>Interpolate by multiplying the difference between the
	      scores by 
	      <m:math>
		<m:ci>FR</m:ci>
	      </m:math> and adding the result to the lower score.  For
	      these data, this is 
	      <m:math>
		<m:apply>
		  <m:eq/>
		  <m:apply>
		    <m:plus/>
		    <m:apply>
		      <m:times/>
		      <m:cn type="real">0.25</m:cn>
		      <m:apply>
			<m:minus/>
			<m:cn>7</m:cn>
			<m:cn>5</m:cn>
		      </m:apply>
		    </m:apply>
		    <m:cn>5</m:cn>
		  </m:apply>
		  <m:cn type="real">5.5</m:cn>
		</m:apply>
	      </m:math>.
	    </item>
	  </list>
	</para>
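	<para id="s2code1">
	  The four steps can be collected into a single Python function.
	  This is a sketch rather than the module's own code: it uses the
	  standard-library <code>fractions</code> module so that
	  <m:math><m:ci>R</m:ci></m:math> is computed exactly (a binary
	  floating-point number cannot represent a value such as 0.85
	  exactly), and it assumes that ranks below 1 or above
	  <m:math><m:ci>N</m:ci></m:math>, a case the steps do not
	  address, are clamped to the lowest or highest score.  The
	  function name <code>percentile</code> is illustrative.
	  <code type="block">
```python
import math
from fractions import Fraction


def percentile(scores, p):
    """Pth percentile by the third (interpolation) definition."""
    data = sorted(scores)
    n = len(data)
    r = Fraction(p) * (n + 1) / 100  # R = P/100 * (N + 1), kept as an exact fraction
    ir = math.floor(r)               # IR: integer portion of R
    fr = r - ir                      # FR: fractional portion of R
    # Assumption (not covered in the text): clamp ranks outside 1..N to the extremes.
    if 1 > ir:
        return float(data[0])
    if ir >= n:
        return float(data[-1])
    lower, upper = data[ir - 1], data[ir]  # scores with ranks IR and IR + 1
    return float(lower + fr * (upper - lower))  # FR = 0 simply returns the lower score


print(percentile([3, 5, 7, 8, 9, 11, 13, 15], 25))  # 5.5
```
	  </code>
	  Applied to the 20 quiz scores in the next example, the same
	  function returns 5.0 for the 25th percentile and 9.85 for the
	  85th.
	</para>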

	<para id="s2p3">
	  Therefore, the 25th percentile is 5.5.  If we had used the
	  first definition (the smallest score greater than 25% of the
	  scores), the 25th percentile would have been 7.  If we had
	  used the second definition (the smallest score greater than
	  or equal to 25% of the scores), the 25th percentile would
	  have been 5.
	</para>
      </example>

      <example id="example2">
	<para id="s2p4">
	  For a second example, consider the 20 quiz scores in the
	  <cnxn target="quiz20" strength="9">table</cnxn>.
	</para>
	<table frame="all" id="quiz20">
	  <name>20 Quiz Scores</name>
	  <tgroup cols="2" align="center" colsep="1" rowsep="1">
	    <thead>
	      <row>
		<entry align="center">Score</entry>
		<entry align="center">Rank</entry>
	      </row>
	    </thead>
	    <tbody valign="top">
	      <row>
		<entry align="right">4</entry>
		<entry align="right">1</entry>
	      </row>
	      <row>
		<entry align="right">4</entry>
		<entry align="right">2</entry>
	      </row>	
	      <row>
		<entry align="right">5</entry>
		<entry align="right">3</entry>
	      </row>	
	      <row>
		<entry align="right">5</entry>
		<entry align="right">4</entry>
	      </row>	
	      <row>
		<entry align="right">5</entry>
		<entry align="right">5</entry>
	      </row>	
	      <row>
		<entry align="right">5</entry>
		<entry align="right">6</entry>
	      </row>	
	      <row>
		<entry align="right">6</entry>
		<entry align="right">7</entry>
	      </row>	
	      <row>
		<entry align="right">6</entry>
		<entry align="right">8</entry>
	      </row>	
	      <row>
		<entry align="right">6</entry>
		<entry align="right">9</entry>
	      </row>	
	      <row>
		<entry align="right">7</entry>
		<entry align="right">10</entry>
	      </row>	
	      <row>
		<entry align="right">7</entry>
		<entry align="right">11</entry>
	      </row>	
	      <row>
		<entry align="right">7</entry>
		<entry align="right">12</entry>
	      </row>	
	      <row>
		<entry align="right">8</entry>
		<entry align="right">13</entry>
	      </row>	
	      <row>
		<entry align="right">8</entry>
		<entry align="right">14</entry>
	      </row>	
	      <row>
		<entry align="right">9</entry>
		<entry align="right">15</entry>
	      </row>	
	      <row>
		<entry align="right">9</entry>
		<entry align="right">16</entry>
	      </row>	
	      <row>
		<entry align="right">9</entry>
		<entry align="right">17</entry>
	      </row>	
	      <row>
		<entry align="right">10</entry>
		<entry align="right">18</entry>
	      </row>	
	      <row>
		<entry align="right">10</entry>
		<entry align="right">19</entry>
	      </row>	
	      <row>
		<entry align="right">10</entry>
		<entry align="right">20</entry>
	      </row>	
	    </tbody>
	  </tgroup>
	</table>
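	<para id="s2code2">
	  In this table, tied scores receive consecutive ranks (the two
	  4s have ranks 1 and 2) rather than a shared rank, so "the score
	  with rank <m:math><m:ci>k</m:ci></m:math>" is simply the
	  <m:math><m:ci>k</m:ci></m:math>th value in the sorted list.  A
	  short Python sketch of that lookup (the variable names are
	  illustrative):
	  <code type="block">
```python
quiz_scores = [4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 9, 9, 9, 10, 10, 10]

# Rank = position in the sorted list, starting at 1; tied scores get consecutive ranks.
score_at_rank = {rank: score for rank, score in enumerate(sorted(quiz_scores), start=1)}

print(score_at_rank[5], score_at_rank[6])    # 5 5
print(score_at_rank[17], score_at_rank[18])  # 9 10
```
	  </code>
	  These are the ranks used in the computations that follow.
	</para>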
	<para id="s2p5">
	  We will compute the 25th and the 85th percentiles.  For the
	  25th,
	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:ci>R</m:ci>
	      <m:apply>
		<m:times/>
		<m:apply>
		  <m:divide/>
		  <m:cn>25</m:cn>
		  <m:cn>100</m:cn>
		</m:apply>
		<m:apply>
		  <m:plus/>
		  <m:cn>20</m:cn>
		  <m:cn>1</m:cn>
		</m:apply>
	      </m:apply>
	      <m:apply>
		<m:divide/>
		<m:cn>21</m:cn>
		<m:cn>4</m:cn>
	      </m:apply>
	      <m:cn type="real">5.25</m:cn>
	    </m:apply>
	  </m:math>

	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:ci>IR</m:ci>
	      <m:cn>5</m:cn>
	    </m:apply>
	  </m:math>

	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:ci>FR</m:ci>
	      <m:cn type="real">0.25</m:cn>
	    </m:apply>
	  </m:math>
	  Since the score with a rank of
	  <m:math><m:ci>IR</m:ci></m:math> (which is 5) and the score
	  with a rank of
	  <m:math>
	    <m:apply>
	      <m:plus/>
	      <m:ci>IR</m:ci>
	      <m:cn>1</m:cn>
	    </m:apply>
	  </m:math> (which is 6) are both equal to 5, the 25th
	  percentile is 5.  In terms of the formula:
	</para>
	<para id="s2p6">
	  The 25th percentile equals
	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:apply>
		<m:plus/>
		<m:apply>
		  <m:times/>
		  <m:cn type="real">0.25</m:cn>
		  <m:apply>
		    <m:minus/>
		    <m:cn>5</m:cn>
		    <m:cn>5</m:cn>
		  </m:apply>
		</m:apply>
		<m:cn>5</m:cn>
	      </m:apply>
	      <m:cn>5</m:cn>
	    </m:apply>
	  </m:math>
	  For the 85th percentile,
	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:ci>R</m:ci>
	      <m:apply>
		<m:times/>
		<m:apply>
		  <m:divide/>
		  <m:cn>85</m:cn>
		  <m:cn>100</m:cn>
		</m:apply>
		<m:apply>
		  <m:plus/>
		  <m:cn>20</m:cn>
		  <m:cn>1</m:cn>
		</m:apply>
	      </m:apply>
	      <m:cn type="real">17.85</m:cn>
	    </m:apply>
	  </m:math>

	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:ci>IR</m:ci>
	      <m:cn>17</m:cn>
	    </m:apply>
	  </m:math>
	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:ci>FR</m:ci>
	      <m:cn type="real">0.85</m:cn>
	    </m:apply>
	  </m:math>
	  <note type="caution">
	    Here <m:math>
	      <m:ci>FR</m:ci>
	    </m:math> happens to equal the percentile expressed as a
	    proportion (0.85), but in general it does not.
	  </note>
	  The score with a rank of 17 is 9 and the score with a rank of
	  18 is 10.  Therefore, the 85th percentile is:
	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:apply>
		<m:plus/>
		<m:apply>
		  <m:times/>
		  <m:cn type="real">0.85</m:cn>
		  <m:apply>
		    <m:minus/>
		    <m:cn>10</m:cn>
		    <m:cn>9</m:cn>
		  </m:apply>
		</m:apply>
		<m:cn>9</m:cn>
	      </m:apply>
	      <m:cn type="real">9.85</m:cn>
	    </m:apply>
	  </m:math>
	  Let's consider the 50th percentile of the numbers 2, 3, 5, 9.
	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:ci>R</m:ci>
	      <m:apply>
		<m:times/>
		<m:apply>
		  <m:divide/>
		    <m:cn>50</m:cn>
		    <m:cn>100</m:cn>
		</m:apply>
		<m:apply>
		  <m:plus/>
		  <m:cn>4</m:cn>
		  <m:cn>1</m:cn>
		</m:apply>
	      </m:apply>
	      <m:cn type="real">2.5</m:cn>
	    </m:apply>
	  </m:math>
	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:ci>IR</m:ci>
	      <m:cn>2</m:cn>
	    </m:apply>
	  </m:math>
	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:ci>FR</m:ci>
	      <m:cn type="real">0.5</m:cn>
	    </m:apply>
	  </m:math>
	  The score with a rank of 
	  <m:math>
	    <m:ci>IR</m:ci>
	  </m:math> is 3 and the score with a rank of 
	  <m:math>
	    <m:apply>
	      <m:plus/>
	      <m:ci>IR</m:ci>
	      <m:cn>1</m:cn>
	    </m:apply>
	  </m:math> is 5.  Therefore, the 50th percentile is:
	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:apply>
		<m:plus/>
		<m:apply>
		  <m:times/>
		  <m:cn type="real">0.5</m:cn>
		  <m:apply>
		    <m:minus/>
		    <m:cn>5</m:cn>
		    <m:cn>3</m:cn>
		  </m:apply>
		</m:apply>
		<m:cn>3</m:cn>
	      </m:apply>
	      <m:cn>4</m:cn>
	    </m:apply>
	  </m:math>
	</para>
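	<para id="s2code3">
	  These results can be cross-checked with Python's standard
	  library (Python 3.8 or later).  The default
	  <code>"exclusive"</code> method of
	  <code>statistics.quantiles</code> places the
	  <m:math><m:ci>P</m:ci></m:math>th percentile at rank
	  P(N + 1)/100 and interpolates linearly between adjacent ranks,
	  which is the same rule as the third definition, at least when
	  that rank falls between 1 and <m:math><m:ci>N</m:ci></m:math>.
	  With <code>n=100</code> it returns the 99 cut points for
	  percentiles 1 through 99, so the
	  <m:math><m:ci>P</m:ci></m:math>th percentile is at index
	  P - 1.  This is only a check, not part of the module's
	  procedure.
	  <code type="block">
```python
import statistics

quiz_scores = [4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 9, 9, 9, 10, 10, 10]

# n=100 yields the cut points for percentiles 1..99, so the Pth percentile is at index P - 1.
pct = statistics.quantiles(quiz_scores, n=100, method="exclusive")
print(pct[25 - 1], pct[85 - 1])  # 5.0 9.85

small = statistics.quantiles([2, 3, 5, 9], n=100, method="exclusive")
print(small[50 - 1])  # 4.0
```
	  </code>
	</para>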
      </example>
      <example id="example3">
	<para id="parexpl3">
	  Finally, consider the 50th percentile of the numbers 2, 3, 5,
	  9, 11.
	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:ci>R</m:ci>
	      <m:apply>
		<m:times/>
		<m:apply>
		  <m:divide/>
		  <m:cn>50</m:cn>
		  <m:cn>100</m:cn>
		</m:apply>
		<m:apply>
		  <m:plus/>
		  <m:cn>5</m:cn>
		  <m:cn>1</m:cn>
		</m:apply>
	      </m:apply>
	      <m:cn>3</m:cn>
	    </m:apply>
	  </m:math>
	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:ci>IR</m:ci>
	      <m:cn>3</m:cn>
	    </m:apply>
	  </m:math>
	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:ci>FR</m:ci>
	      <m:cn>0</m:cn>
	    </m:apply>
	  </m:math>
	  Whenever 
	  <m:math>
	    <m:apply>
	      <m:eq/>
	      <m:ci>FR</m:ci>
	      <m:cn>0</m:cn>
	    </m:apply>
	  </m:math>, you simply find the number with rank 
	  <m:math>
	    <m:ci>IR</m:ci>
	  </m:math>.  In this case, the third number is equal to 5, so
	  the 50th percentile is 5.  You will also get the right answer
	  if you apply the general formula:  
	</para>
	<para id="s2p8">
	  The 50th percentile equals
	  <m:math display="block">
	    <m:apply>
	      <m:eq/>
	      <m:apply>
		<m:plus/>
		<m:apply>
		  <m:times/>
		  <m:cn type="real">0.00</m:cn>
		  <m:apply>
		    <m:minus/>
		    <m:cn>9</m:cn>
		    <m:cn>5</m:cn>
		  </m:apply>
		</m:apply>
		<m:cn>5</m:cn>
	      </m:apply>
	      <m:cn>5</m:cn>
	    </m:apply>
	  </m:math>
	</para>
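	<para id="s2code4">
	  When <m:math><m:ci>P</m:ci></m:math> is a whole number, the
	  test for <m:math><m:ci>FR</m:ci></m:math> = 0 can be done
	  exactly with integer arithmetic: dividing P(N + 1) by 100 with
	  <code>divmod</code> gives <m:math><m:ci>IR</m:ci></m:math> and
	  a remainder, and <m:math><m:ci>FR</m:ci></m:math> is zero
	  exactly when the remainder is zero.  The Python sketch below
	  assumes an integer <m:math><m:ci>P</m:ci></m:math> and a rank
	  between 1 and <m:math><m:ci>N</m:ci></m:math>; the function
	  name is illustrative.  It also compares the result with
	  <code>statistics.median</code>: with P = 50, R = (N + 1)/2,
	  which is the middle rank when N is odd and halfway between the
	  two middle ranks when N is even, so the 50th percentile
	  coincides with the median.
	  <code type="block">
```python
import statistics


def percentile_exact_rank(scores, p):
    """Third-definition percentile; assumes integer p and a rank R between 1 and N."""
    data = sorted(scores)
    n = len(data)
    # R = p * (n + 1) / 100, so divmod gives IR and FR * 100 as exact integers.
    ir, rem = divmod(p * (n + 1), 100)
    if rem == 0:
        return data[ir - 1]  # FR = 0: take the score with rank IR
    lower, upper = data[ir - 1], data[ir]  # scores with ranks IR and IR + 1
    return lower + rem * (upper - lower) / 100


print(percentile_exact_rank([2, 3, 5, 9, 11], 50))  # 5
print(statistics.median([2, 3, 5, 9, 11]), statistics.median([2, 3, 5, 9]))  # 5 4.0
```
	  </code>
	</para>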
      </example>
    </section>

  </content>  
</document>
