<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/cnxml/0.5/DTD/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="m10185">

  <name xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">Inferential Statistics</name>


  <metadata xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
  <md:version xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">2.5</md:version>
  <md:created xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">2001/07/11</md:created>
  <md:revised xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">2003/06/20 11:05:27.759 GMT-5</md:revised>
  <md:authorlist xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
    <md:author xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="dmlane">
      <md:firstname xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">David</md:firstname>
      
      <md:surname xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">Lane</md:surname>
      <md:email xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">lane@rice.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
    <md:maintainer xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="dmlane">
      <md:firstname xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">David</md:firstname>
      
      <md:surname xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">Lane</md:surname>
      <md:email xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">lane@rice.edu</md:email>
    </md:maintainer>
    <md:maintainer xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="meyer">
      <md:firstname xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">Eileen</md:firstname>
      
      <md:surname xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">Meyer</md:surname>
      <md:email xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">meyer@rice.edu</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
    <md:keyword xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">inferential statistics</md:keyword>
  </md:keywordlist>

  <md:abstract xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">Introduction to inferential statistics.</md:abstract>
</metadata>

  <content xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
    <section xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="sec1">
      <name xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">Populations and samples</name>
      <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="intro">
	In statistics, we often rely on a <term xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">sample</term> <note xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" type="footnote">A sample is a subset of a population, often
	  taken for the purpose of statistical inference.  Generally,
	  one tries to use a random sample.  See also: bias, stratified
	  random sample.</note> - that is, a small subset of a larger
	  set of data - to draw inferences about the larger set.  The
	  larger set is known as the <term xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">population</term>. <note xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" type="footnote">A population is the complete set of
	  observations a researcher is interested in.  Contrast this
	  with a sample which is a subset of a population.  A
	  population can be defined in a manner convenient for a
	  researcher.  For example, one could define a population as
	  all girls in fourth grade in Houston, Texas.  Or, a
	  different population is the set of all girls in fourth grade
	  in the United States.  Inferential statistics are computed
	  from sample data in order to make inferences about the
	  population.</note>
      </para>

      <example xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="ex1">
	<para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="example1">
	  You have been hired by the National Election Commission to
	  examine how the American people feel about the fairness of the
	  voting procedures in the U.S.  How will you do it?  Who will
	  you ask?
	</para>
      </example>

      <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="sol1">
	It is not practical to ask every single American how he or she
	feels about the fairness of the voting procedures.  Instead,
	we query a relatively small number of Americans, and draw
	inferences about the entire country from their responses.  The
	Americans actually queried constitute our sample of the larger
	population of all Americans.  The mathematical procedures
	whereby we convert information about the sample into
	intelligent guesses about the population fall under the rubric
	of <term xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">inferential statistics</term> <note xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" type="footnote">
	  The branch of statistics concerned with drawing conclusions
	  about a population from a smaller sample.  This is generally
	  done through random sampling, followed by inferences made
	  about central tendency, or any of a number of other aspects of
	  a distribution. </note>.
      </para>

      <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="sol2">
	A sample is typically a small subset of the population.  In
	the case of voting attitudes, we would sample a few thousand
	Americans, drawn from the hundreds of millions that make up
	the country.  In choosing a sample, it is therefore crucial
	that it be representative.  It must not overrepresent one kind
	of citizen at the expense of others.  For example, something
	would be wrong with our sample if it happened to be made up
	entirely of Florida residents.  (Recall the controversy
	surrounding presidential voting in Florida in 2000.)  If the
	sample held only Floridians, it could not be used to infer the
	attitudes of other Americans.  The same problem would arise if
	the sample consisted only of Republicans.  Inferential
	statistics are based on the assumption that sampling is
	random.  We trust a random sample to represent different
	segments of society in close to the appropriate proportions
	(provided the sample is large enough; see below).
      </para>

      <example xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="ex2">
	<para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="ex2.1">
	  We are interested in examining how many math classes have
	  been taken on the average by current graduating seniors at
	  American colleges and universities during their four years
	  in school.  Whereas our population in the last example
	  included all US citizens, now it involves just the
	  graduating seniors throughout the country.  This is still a
	  large set since there are thousands of colleges and
	  universities, each enrolling many students. (New York
	  University, for example, enrolls 48,000 students.)  It would
	  be costly to examine the transcript of every college senior.
	  We therefore take a sample of college seniors and then make
	  inferences to the entire population based on what we find.
	  To make the sample, we might first choose some public and
	  private colleges and universities across the United States.
	  Then we might sample 50 students from each of these
	  institutions.  Suppose that the average number of math
	  classes taken by the people in our sample is 3.2.  Then we
	  might speculate that 3.2 approximates the number we would
	  find if we had the resources to examine every senior in the
	  entire population.  But we must be careful about the
	  possibility that our sample is non-representative of the
	  population.  Perhaps we chose an overabundance of math
	  majors, or chose too many technical institutions that have
	  heavy math requirements.  Such bad sampling makes our sample
	  unrepresentative of the population of all seniors. To
	  solidify your understanding of sampling <term xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">bias</term>
	  <note xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" type="footnote">(i) A sampling method is biased if
	  each individual does not have an equal chance of being
	  selected.  A sample of internet users found reading an
	  online statistics book would be a biased sample of all
	  internet users.  It would give a distorted view of what the
	  average internet user is like.  (ii) An estimator is biased
	  if it systematically overestimates or underestimates the
	  parameter it is estimating.  The average squared deviation
	  of sample scores from their mean is a biased estimate of the
	  variance since it tends to underestimate the population
	  variance.</note>, consider the following example.  Try to
	  identify the population and the sample, and then reflect on
	  whether the sample is likely to yield the information
	  desired.
	</para>
      </example>

      <exercise xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="rand1">
	<problem xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
	  <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="randprob">
	    A substitute teacher wants to know how students in the
	    class did on their last test.  He asks only the 10
	    students sitting in the front row to report how they did
	    on their last test and he concludes from them that the
	    class did extremely well.  What is the sample?  What is the
	    population?  Can you identify any problems with choosing
	    the sample in the way that the teacher
	    did?</para>
	</problem> 
	<solution xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/"> 
	  <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="randsol">
	    <list xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="sollist1" type="bulleted">
	      <item xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
		The population consists of all students in the class.
	      </item>
	      <item xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
		The sample includes the 10 students sitting in the
		front row.
	      </item> 
	      <item xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
		The sample is not likely to be representative of the
		population.  Those who sit in the front row tend to be
		more interested in the class and tend to score higher
		on tests.  Hence, the sample may perform at a higher
		level than the population.
	      </item>
	    </list>
	  </para>
	</solution>
      </exercise>

      <exercise xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="exerc">
	<problem xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
	  <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="randprob2">
	    A coach is interested in how many cartwheels the average
	    college freshman at his university can do.  Eight
	    volunteers from the freshman class step forward.  After
	    observing their performance, the coach concludes that
	    college freshmen can do an average of 16 cartwheels in a
	    row without stopping.
	  </para>
	</problem>
	<solution xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
	  <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="randsol2">
	    <list xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="sollist2" type="bulleted">
	      <item xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
		The population is the freshmen at the coach's
		university.
	      </item>
	      <item xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
		The sample is poorly chosen because volunteers are
		more likely to be able to do cartwheels than the
		average freshman; people who can't do cartwheels
		probably did not volunteer!
	      </item>
	      <item xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
		In the example, we are also not told of the gender of
		the volunteers.  Were they all women, for example?  That
		might affect the outcome, contributing to the
		nonrepresentative nature of the sample (if the school is
		co-ed).
	      </item>
	    </list>
	  </para>
	</solution>
      </exercise>
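      <para id="biassketch">
	The front-row exercise above can be illustrated with a small
	simulation.  The sketch below is not based on real data: the
	class size and the score distributions (front-row scores centred
	on 85, the rest on 70) are assumptions invented for illustration,
	and the helper name <code type="inline">make_class</code> is
	hypothetical.  It compares the mean of the front row with the
	mean of a simple random sample of the same size.
	<code type="block">
```python
import random
import statistics


def make_class(rng, n_front=10, n_rest=30):
    """Build a hypothetical class as a list of (seat_row, test_score) pairs."""
    # Assumption for illustration only: front-row scores are centred higher.
    front = [("front", rng.gauss(85, 5)) for _ in range(n_front)]
    rest = [("other", rng.gauss(70, 10)) for _ in range(n_rest)]
    return front + rest


rng = random.Random(0)  # fixed seed so the sketch is repeatable
students = make_class(rng)

# The population is every student in the class.
population_mean = statistics.mean(score for _, score in students)

# Convenience sample: only the front row, as the substitute teacher did.
front_row_mean = statistics.mean(
    score for row, score in students if row == "front"
)

# Simple random sample of the same size, drawn from the whole class.
random_sample = rng.sample(students, 10)
random_sample_mean = statistics.mean(score for _, score in random_sample)

print(f"population mean:    {population_mean:.1f}")
print(f"front-row mean:     {front_row_mean:.1f}")
print(f"random-sample mean: {random_sample_mean:.1f}")
```
	</code>
	Under these assumptions the front-row mean sits above the
	class mean on essentially every run, while the random-sample
	mean is sometimes too high and sometimes too low, with no
	systematic direction.
      </para>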
    </section>

    <section xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="sect3">
      <name xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">Simple Random Sampling</name>

      <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="sect3p1">
	Researchers adopt a variety of sampling strategies.  The most
	straightforward is <term xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">simple random sampling</term> <note xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" type="footnote">The process of selecting a subset of a
	population for the purposes of statistical inference.  Random
	sampling means that every member of the population is equally
	likely to be chosen.  When this rule is violated, the sample
	is said to be biased.  See also: stratified random
	sampling.</note>.  Such sampling requires every member of the
	population to have an equal chance of being selected into the
	sample.  In addition, the selection of one member must be
	independent of the selection of every other member.  That is,
	picking one member from the population must not increase or
	decrease the probability of picking any other member (relative
	to the others).  In this sense, we can say that simple random
	sampling chooses a sample by <emphasis xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">pure chance</emphasis>.
	To check your understanding of simple random sampling,
	consider the following example.  What is the population?  What
	is the sample?  Was the sample picked by simple random
	sampling?  Is it biased?
      </para>

      <exercise xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="simpexer3">
	<problem xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
	  <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="randprob3">
	    A research scientist is interested in studying the
	    experiences of twins raised together versus those raised
	    apart.  She obtains a list of twins from the National Twin
	    Registry, and selects two subsets of individuals for her
	    study.  First, she chooses all those in the registry whose
	    last name begins with Z.  Then she turns to all those
	    whose last name begins with B.  Because there are so many
	    names that start with B, however, our researcher decides
	    to incorporate only every other name into her sample.
	    Finally, she mails out a survey and compares
	    characteristics of twins raised apart versus together.
	  </para>
	</problem>
	<solution xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
	  <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="randsol3">
	    <list xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="solist3" type="bulleted">
	      <item xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
		The population consists of all twins recorded in the
		National Twin Registry.
	      </item>
	      <item xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">
		It is important that the researcher only make
		statistical generalizations to the twins on this list,
		not to all twins in the nation or world.  That is, the
		National Twin Registry may not be representative of
		all twins.  Even if inferences are limited to the
		Registry, a number of problems affect the sampling
		procedure we described.  For instance, choosing only
		twins whose last names begin with Z does not give
		every individual an equal chance of being selected
		into the sample.  Moreover, such a procedure risks
		over-representing ethnic groups with many surnames
		that begin with Z.  There are other reasons why
		choosing just the Z's may bias the sample.  Perhaps
		such people are more patient than average because they
		often find themselves at the end of the line! The same
		problem occurs with choosing twins whose last name
		begins with B.  An additional problem for the B's is
		that the <emphasis xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">every-other-one</emphasis>
		procedure prevents adjacent names on the B part of
		the list from both being selected.  This defect
		alone means the sample was not formed through simple
		random sampling.
	      </item>
	    </list>
	  </para>
	</solution>
      </exercise>
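      <para id="srssketch">
	As a minimal sketch of how simple random sampling might be
	carried out on a list such as a registry, the Python standard
	library's <code type="inline">random.sample</code> draws a fixed
	number of distinct members so that every subset of that size is
	equally likely.  The registry entries and the function name
	<code type="inline">simple_random_sample</code> below are
	hypothetical placeholders, not part of the original exercise.
	<code type="block">
```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k distinct members; every subset of size k is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)


# Hypothetical registry entries; the names are placeholders, not real data.
registry = [f"twin_{i:04d}" for i in range(1000)]

sample = simple_random_sample(registry, k=50, seed=42)
print(sample[:5])

# Contrast: taking every other name can never select two adjacent entries,
# so some subsets are impossible and this is not simple random sampling.
every_other = registry[::2]
print(every_other[:5])
```
	</code>
	The fixed seed is only there to make the sketch repeatable; a
	real study would not reuse a seed chosen after looking at the
	data.
      </para>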
    </section>

    <section xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="sec2">
      <name xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">Sample size matters</name>
      <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="p1sec2">
	Recall that the definition of a random sample is a sample in
	which every member of the population has an equal chance of
	being selected.  This means that the sampling
	<emphasis xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">procedure</emphasis> rather than the
	<emphasis xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">results</emphasis> of the procedure defines what it
	means for a sample to be random.  Random samples, especially
	if the sample size is small, are not necessarily
	representative of the entire population.  For example, if a
	random sample of 20 subjects were taken from a population with
	an equal number of males and females, there would be a
	nontrivial probability (0.06) that 70% or more of the sample
	would be female.  (This probability follows from the binomial
	distribution.)  Such a sample
	would not be representative, although it would be drawn
	randomly.  Only a large sample size makes it likely that our
	sample is representative of the population.  For this reason,
	inferential statistics needs to take into account the sample
	size when it attempts to generalize results from samples to
	populations.  In later chapters, you'll see what kinds of
	mathematical techniques ensure this sensitivity to sample
	size.
      </para>
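      <para id="binomsketch">
	The probability of 0.06 quoted above can be checked with a
	short calculation.  The sketch below assumes the population is
	large enough that each of the 20 subjects is female with
	probability 0.5 independently of the others, so the number of
	females in the sample is binomial; the function name
	<code type="inline">prob_at_least</code> is hypothetical.  Since
	70% of 20 is 14, the quantity needed is the probability of 14 or
	more females.
	<code type="block">
```python
from math import comb


def prob_at_least(n, k_min, p=0.5):
    """Return P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
    )


# 70% of 20 subjects is 14, so we need P(X >= 14) with n = 20.
p_small = prob_at_least(20, 14)
print(round(p_small, 3))  # 0.058, which rounds to the 0.06 in the text

# The same 70% threshold with a sample ten times larger (140 of 200).
p_large = prob_at_least(200, 140)
print(p_large)
```
	</code>
	With 200 subjects the same imbalance becomes far less likely
	than with 20, which is the sense in which a larger sample makes
	a representative result more probable.
      </para>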
    </section>

    <section xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="sec3">
      <name xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">More sophisticated sampling</name>
      <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="conclusion1">
	Sometimes it is not feasible to build a sample using simple
	random sampling.  To see the problem, consider the fact that
	both Dallas and Houston are competing to be hosts of the 2012
	Olympics.  Imagine that you are hired to assess whether most
	Texans prefer Houston to Dallas as the host, or the reverse.
	Given the impracticality of obtaining the opinion of every
	single Texan, you must construct a sample of the Texas
	population.  But now notice how difficult it would be to
	proceed by simple random sampling.  For example, how will you
	contact those individuals who don't vote and don't have a
	phone?  Even among people you find in the telephone book, how
	can you identify those who have just relocated to California
	(and had no reason to inform you of their move)?  What do you
	do about the fact that since the beginning of the study, an
	additional 4,212 people took up residence in the state of
	Texas?  As you can see, it is sometimes very difficult to
	develop a truly random procedure.  For this reason, other
	kinds of sampling techniques have been devised.  We now
	discuss two of them.
      </para>
    </section>

    <section xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="sec4">
      <name xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">Random Assignment</name>
      <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="para7">


	In experimental research, populations are often hypothetical.
	For example, in an experiment comparing the effectiveness of a
	new anti-depressant drug with a <term xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">placebo</term> <note xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" type="footnote">A device used in clinical trials, the placebo
	is visually indistinguishable from the study medication, but
	in reality has no medical effect (often, a sugar pill).  A
	group of subjects chosen randomly takes the placebo, the
	others take one or another type of medication.  This is done
	to prevent confounding the medical and psychological effects
	of the drug.  Even a sugar pill can lead some patients to
	report improvement and side effects.</note>, there is no
	actual population of individuals taking the drug.  In this
	case, a specified population of people with some degree of
	depression is defined and a random sample is taken from this
	population.  The sample is then randomly divided into two
	groups; one group is assigned to the treatment condition
	(drug) and the other group is assigned to the control
	condition (placebo).  This random division of the sample into
	two groups is called <term xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">random assignment</term>.  Random
	assignment is critical for the validity of an experiment.  For
	example, consider the bias that could be introduced if the
	first 20 subjects to show up at the experiment were assigned
	to the experimental group and the second 20 subjects were
	assigned to the control group.  It is possible that subjects
	who show up late tend to be more depressed than those who show
	up early, thus making the experimental group less depressed
	than the control group even before the treatment was
	administered.
      </para>

      <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="para8">
	In experimental research of this kind, failure to assign
	subjects randomly to groups is generally more serious than
	having a non-random sample.  Failure to randomize (the former
	error) invalidates the experimental findings.  A non-random
	sample (the latter error) simply restricts the
	generalizability of the results.
      </para>
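      <para id="assignsketch">
	One simple way to carry out random assignment is to shuffle the
	list of subjects and split it in half, so that order of arrival
	has no influence on group membership.  The following is only a
	sketch under that assumption: the subject identifiers and the
	function name <code type="inline">randomly_assign</code> are
	hypothetical, and the group size of 20 mirrors the example
	above.
	<code type="block">
```python
import random


def randomly_assign(subjects, seed=None):
    """Shuffle a copy of the subjects and split it into two halves."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy, so the arrival order is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subject IDs, listed in order of arrival at the experiment.
arrivals = [f"subject_{i:02d}" for i in range(1, 41)]

treatment, control = randomly_assign(arrivals, seed=7)
print("treatment (drug):   ", sorted(treatment)[:5], "...")
print("control (placebo):  ", sorted(control)[:5], "...")
```
	</code>
	Because the split is made after shuffling, early and late
	arrivals are equally likely to end up in either group.
      </para>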
    </section>

    <section xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="sec5">
      <name xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">Stratified Sampling</name>
      <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="conc2">
	Since simple random sampling often does not ensure a
	representative sample, a sampling method called
	<term xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/">stratified random sampling</term> <note xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" type="footnote">In stratified random sampling, the population
	is divided into a number of subgroups (or strata).  Random
	samples are then taken from each subgroup with sample sizes
	proportional to the size of the subgroup in the
	population.  For instance, if a population contained equal
	numbers of men and women, and the variable of interest is
	suspected to vary by gender, one might conduct stratified
	random sampling to ensure a representative sample.</note> is
	sometimes used to make the sample more representative of the
	population.  This method can be used if the population has a
	number of distinct "strata" or groups.  In stratified
	sampling, you first identify members of the population who belong
	to each group.  Then you randomly sample from each of those
	subgroups in such a way that the sizes of the subgroups in the
	sample are proportional to their sizes in the population.
      </para>

      <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="para6">
	Let's take an example: Suppose you were interested in views of
	capital punishment at an urban university.  You have the time
	and resources to interview 200 students.  The student body is
	diverse with respect to age; many older people work during the
	day and enroll in night courses (average age of 39), while
	younger students generally enroll in day classes (average age
	of 19).  It is possible that night students have different
	views about capital punishment than day students.  If 70% of
	the students are day students, it makes sense to ensure that
	70% of the sample consists of day students.  Thus, your sample
	of 200 students would consist of 140 day students and 60 night
	students.  The proportion of day students in the sample and in
	the population (the entire university) would be the
	same.  Inferences to the entire population of students at the
	university would therefore be more secure.
      </para>
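      <para id="stratsketch">
	The allocation in this example can be sketched in a few lines.
	The code below assumes a hypothetical enrolment of 10,000
	students, 70% of them day students, and samples from each
	stratum in proportion to its share of the population; the
	function name <code type="inline">stratified_sample</code> and
	the student identifiers are invented for illustration.
	<code type="block">
```python
import random


def stratified_sample(strata, total, seed=None):
    """Sample from each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding works for this example; in general the rounded sizes
        # may not add up exactly to the requested total.
        k = round(total * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical enrolment of 10,000 students: 70% day, 30% night.
strata = {
    "day": [f"day_{i}" for i in range(7000)],
    "night": [f"night_{i}" for i in range(3000)],
}

sample = stratified_sample(strata, total=200, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
# {'day': 140, 'night': 60}
```
	</code>
	Within each stratum the selection is still a simple random
	sample; only the number drawn from each stratum is fixed in
	advance.
      </para>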
    </section>

  </content>
</document>
