<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/cnxml/0.5/DTD/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="id12985772">
  <name>Sampling and Data: Sampling</name>
  <metadata>
  <md:version>1.7</md:version>
  <md:created>2008/03/31 14:35:43 GMT-5</md:created>
  <md:revised>2008/07/03 11:40:02.685 GMT-5</md:revised>
  <md:authorlist>
      <md:author id="billowsky">
      <md:firstname>Barbara</md:firstname>
      
      <md:surname>Illowsky</md:surname>
      <md:email>cnx@cnx.org</md:email>
    </md:author>
      <md:author id="sdean">
      <md:firstname>Susan</md:firstname>
      
      <md:surname>Dean</md:surname>
      <md:email>cnx@cnx.org</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="cnxorg">
      <md:firstname/>
      
      <md:surname>Connexions</md:surname>
      <md:email>cnx@cnx.org</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>cluster sample</md:keyword>
    <md:keyword>Convenience sampling</md:keyword>
    <md:keyword>nonsampling errors</md:keyword>
    <md:keyword>random sampling</md:keyword>
    <md:keyword>sample</md:keyword>
    <md:keyword>sampling</md:keyword>
    <md:keyword>sampling errors</md:keyword>
    <md:keyword>simple random sampling</md:keyword>
    <md:keyword>statistics</md:keyword>
    <md:keyword>stratified sample</md:keyword>
    <md:keyword>systematic sample</md:keyword>
    <md:keyword>without replacement</md:keyword>
    <md:keyword>with replacement</md:keyword>
  </md:keywordlist>

  <md:abstract>This module introduces the concept of statistical sampling.  Students are taught the difference between a simple random sample, stratified sample, cluster sample, systematic sample, and convenience sample.  Example problems are provided, including an optional classroom activity.

Note: This module is currently under revision, and its content is subject to change.  This module is being prepared as part of a statistics textbook that will be available for the Fall 2008 semester.</md:abstract>
</metadata>
  <content>
    

      <para id="id12361474">Gathering information about an entire population often costs too much or is virtually impossible. Instead, we use a sample of the population. <emphasis>A sample should have the same characteristics as the population it is representing.</emphasis></para>
      <para id="id13092136">Two common methods of sampling are <emphasis>with replacement</emphasis> and <emphasis>without replacement</emphasis>. If each member of a population may be chosen more than once, then the sampling is with replacement. If each member may be chosen only once, then the sampling is without replacement.</para>
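      <para id="element-add-replacement">The distinction can be illustrated with a short Python sketch. The population of ten numbered members and the fixed seed are assumptions made only for this illustration; <code>random.choices</code> draws with replacement and <code>random.sample</code> draws without replacement.</para>
      <code type="block">
```python
import random

rng = random.Random(2008)        # fixed seed (an assumption) so the run is repeatable
population = list(range(1, 11))  # hypothetical population: members numbered 1 to 10

# With replacement: a member may be chosen more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: each member may be chosen only once.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```
      </code>
      <para id="element-add-replacement-note">A repeated value can appear only in the first list; the second list never contains the same member twice.</para>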
      <para id="id12311519">One of the most important methods of obtaining samples is called <emphasis>random sampling</emphasis>. In a <emphasis>simple random sample</emphasis>, each member of the population has an equal chance of being selected. Moreover, every possible sample of the same size has an equal chance of being chosen, so any two simple random samples are equally representative of the entire population. For example, suppose Lisa wants to form a four-person study group (herself and three other people) from her pre-calculus class, which has 32 members including Lisa. To choose a simple random sample of size 3 from the other members of her class, Lisa first lists the last names of the other 31 members of her class, each with a two-digit number, as shown below.</para>
      <table id="element-621">
<name>Class Roster</name>
<tgroup cols="2"><colspec colnum="1" colname="id"/>
    <colspec colnum="2" colname="name"/>
    <thead>
      <row>
        <entry>ID</entry>
        <entry>Name</entry>
      </row>
    </thead>
<tbody>
  <row>
    <entry>00</entry>
    <entry>Anselmo</entry>
  </row>
  <row>
    <entry>01</entry>
    <entry>Bautista</entry>
  </row>
  <row>
    <entry>02</entry>
    <entry>Bayani</entry>
  </row>
  <row>
    <entry>03</entry>
    <entry>Cheng</entry>
  </row>
  <row>
    <entry>04</entry>
    <entry>Cuarismo</entry>
  </row>
  <row>
    <entry>05</entry>
    <entry>Cunningham</entry>
  </row>
  <row>
    <entry>06</entry>
    <entry>Fontecha</entry>
  </row>
  <row>
    <entry>07</entry>
    <entry>Hong</entry>
  </row>
  <row>
    <entry>08</entry>
    <entry>Hoobler</entry>
  </row>
  <row>
    <entry>09</entry>
    <entry>Jiao</entry>
  </row>
  <row>
    <entry>10</entry>
    <entry>Khan</entry>
  </row>
  <row>
    <entry>11</entry>
    <entry>King</entry>
  </row>
  <row>
    <entry>12</entry>
    <entry>Legeny</entry>
  </row>
  <row>
    <entry>13</entry>
    <entry>Lundquist</entry>
  </row>
  <row>
    <entry>14</entry>
    <entry>Macierz</entry>
  </row>
  <row>
    <entry>15</entry>
    <entry>Motogawa</entry>
  </row>
  <row>
    <entry>16</entry>
    <entry>Okimoto</entry>
  </row>
  <row>
    <entry>17</entry>
    <entry>Patel</entry>
  </row>
  <row>
    <entry>18</entry>
    <entry>Price</entry>
  </row>
  <row>
    <entry>19</entry>
    <entry>Quizon</entry>
  </row>
  <row>
    <entry>20</entry>
    <entry>Reyes</entry>
  </row>
  <row>
    <entry>21</entry>
    <entry>Roquero</entry>
  </row>
  <row>
    <entry>22</entry>
    <entry>Roth</entry>
  </row>
  <row>
    <entry>23</entry>
    <entry>Rowell</entry>
  </row>
  <row>
    <entry>24</entry>
    <entry>Salangsang</entry>
  </row>
  <row>
    <entry>25</entry>
    <entry>Slade</entry>
  </row>
  <row>
    <entry>26</entry>
    <entry>Stracher</entry>
  </row>
  <row>
    <entry>27</entry>
    <entry>Tallai</entry>
  </row>
  <row>
    <entry>28</entry>
    <entry>Tran</entry>
  </row>
  <row>
    <entry>29</entry>
    <entry>Wai</entry>
  </row>
  <row>
    <entry>30</entry>
    <entry>Wood</entry>
  </row>
</tbody>

</tgroup>
</table>
      <para id="id10904793">Lisa can use either a table of random numbers (found in many statistics books as well as mathematical handbooks) or a calculator or computer that generates random numbers. For this example, suppose Lisa chooses to generate random numbers with a calculator. The numbers generated are:</para>
      <para id="element-250"><list id="set-element-428" type="inline"><item>.94360</item>
<item>.99832</item>
<item>.14669</item>
<item>.51470</item>
<item>.40581</item>
<item>.73381</item>
<item>.04399</item></list></para>
      <para id="id12648586">Lisa reads two-digit groups until she has chosen three class members (for example, she reads .94360 as the groups 94, 43, 36, and 60). Each random number may contribute only one class member. If she needed to, Lisa could have generated more random numbers.</para>
      <para id="id12688561">The random numbers .94360 and .99832 do not contain appropriate two-digit numbers. However, the third random number, .14669, contains 14 (the fourth random number also contains 14, which has already been used), the fifth random number contains 05, and the seventh random number contains 04. The two-digit number 14 corresponds to Macierz, 05 corresponds to Cunningham, and 04 corresponds to Cuarismo. Besides herself, Lisa's group will consist of Macierz, Cunningham, and Cuarismo.</para>
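      <para id="element-add-lisa">Lisa's procedure can be sketched in Python as follows. The function names <code>two_digit_groups</code> and <code>pick_members</code> are hypothetical helpers introduced for this illustration; the roster and the seven random numbers are the ones given above.</para>
      <code type="block">
```python
# Roster IDs 00-30 correspond to list positions 0-30 (names from the Class Roster table).
ROSTER = [
    "Anselmo", "Bautista", "Bayani", "Cheng", "Cuarismo", "Cunningham",
    "Fontecha", "Hong", "Hoobler", "Jiao", "Khan", "King", "Legeny",
    "Lundquist", "Macierz", "Motogawa", "Okimoto", "Patel", "Price",
    "Quizon", "Reyes", "Roquero", "Roth", "Rowell", "Salangsang", "Slade",
    "Stracher", "Tallai", "Tran", "Wai", "Wood",
]

# Digits of the calculator output .94360, .99832, and so on.
RANDOM_DIGITS = ["94360", "99832", "14669", "51470", "40581", "73381", "04399"]


def two_digit_groups(digits):
    """Return the overlapping two-digit groups, e.g. '94360' -> [94, 43, 36, 60]."""
    return [int(digits[i:i + 2]) for i in range(len(digits) - 1)]


def pick_members(random_digits, roster, needed):
    """Choose roster IDs, taking at most one ID from each random number."""
    chosen = []
    for digits in random_digits:
        if len(chosen) == needed:
            break
        for group in two_digit_groups(digits):
            # A group is usable if it is a roster ID that has not been chosen yet.
            if group in range(len(roster)) and group not in chosen:
                chosen.append(group)
                break  # this random number has contributed its one member
    return chosen


chosen_ids = pick_members(RANDOM_DIGITS, ROSTER, needed=3)
study_group = [ROSTER[i] for i in chosen_ids]
print(chosen_ids)   # [14, 5, 4]
print(study_group)  # ['Macierz', 'Cunningham', 'Cuarismo']
```
      </code>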
      <para id="id11076554">Sometimes, it is difficult or impossible to obtain a simple random sample because populations are too large. Then we choose other forms of sampling methods that involve a chance process for getting the sample. <emphasis>Other well-known random sampling methods are the stratified sample, the cluster sample, and the systematic sample.</emphasis></para>
      <para id="id12511076">To choose a <emphasis>stratified sample</emphasis>, divide the population into groups called strata and then take a sample from each stratum. For example, you could stratify (group) your college population by department and then choose a simple random sample from each stratum to get a stratified random sample.</para>
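      <para id="element-add-stratified">A minimal Python sketch of stratified sampling follows. The three departments, their sizes, the student labels, and the function name <code>stratified_sample</code> are all assumptions made for illustration.</para>
      <code type="block">
```python
import random

rng = random.Random(1)  # fixed seed (an assumption) so the run is repeatable

# Hypothetical strata: each department lists its own students.
strata = {
    "Chemistry": [f"C{i:02d}" for i in range(40)],
    "English": [f"E{i:02d}" for i in range(60)],
    "History": [f"H{i:02d}" for i in range(25)],
}


def stratified_sample(strata, per_stratum, rng):
    """Take a simple random sample of size per_stratum from every stratum."""
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}


sample = stratified_sample(strata, per_stratum=5, rng=rng)
for department, students in sample.items():
    print(department, students)
```
      </code>
      <para id="element-add-stratified-note">Every stratum contributes members to the sample, which is what distinguishes this method from cluster sampling.</para>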
      <para id="id13017093">To choose a <emphasis>cluster sample</emphasis>, divide the population into sections and then randomly select some of the sections. All the members from these sections are in the cluster sample. For example, if you randomly choose four departments from all of the departments in your college, all the members of those four departments make up the cluster sample.</para>
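      <para id="element-add-cluster">Cluster sampling can be sketched in the same style. The eight departments, their sizes, and the function name <code>cluster_sample</code> are hypothetical; the choice of four departments follows the example above.</para>
      <code type="block">
```python
import random

rng = random.Random(4)  # fixed seed (an assumption) so the run is repeatable

# Hypothetical sections: department name -> list of its members.
sizes = {"Art": 12, "Biology": 30, "Chemistry": 25, "English": 40,
         "History": 18, "Math": 35, "Nursing": 22, "Psychology": 28}
departments = {name: [f"{name}-{i:02d}" for i in range(n)]
               for name, n in sizes.items()}


def cluster_sample(sections, n_sections, rng):
    """Randomly choose whole sections; every member of a chosen section is sampled."""
    chosen = rng.sample(sorted(sections), n_sections)
    members = [m for name in chosen for m in sections[name]]
    return chosen, members


chosen_departments, sample = cluster_sample(departments, n_sections=4, rng=rng)
print(chosen_departments)
print(len(sample), "members in the cluster sample")
```
      </code>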
      <para id="id12769433">To choose a <emphasis>systematic sample</emphasis>, randomly select a starting point and take every nth piece of data from a listing of the population. For example, suppose you have to do a phone survey. Your phone book contains 20,000 residence listings. You must choose 400 names for the sample. You start by randomly picking one of the first 50 names and then choose every 50th name thereafter. Systematic sampling is frequently chosen because it is a simple method.</para>
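      <para id="element-add-systematic">The phone survey example can be sketched in Python as follows. The listing labels and the fixed seed are assumptions; the sizes (20,000 listings, 400 names, every 50th name) come from the example above.</para>
      <code type="block">
```python
import random

rng = random.Random(50)  # fixed seed (an assumption) so the run is repeatable

# Hypothetical phone book of 20,000 residence listings.
listings = [f"listing-{i:05d}" for i in range(1, 20001)]

sample_size = 400
step = len(listings) // sample_size  # 20,000 / 400 = 50
start = rng.randrange(step)          # random position among the first 50 names

# Take the randomly chosen starting name and every 50th name thereafter.
sample = listings[start::step]
print(len(sample))  # 400
print(sample[:3])
```
      </code>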
      <para id="id12385449">A type of sampling that is nonrandom is convenience sampling. <emphasis>Convenience sampling</emphasis> involves using results that are readily available. For example, a computer software store conducts a marketing study by interviewing potential customers who happen to be in the store browsing through the available software. The results of convenience sampling may be very good in some cases and highly biased (favoring certain outcomes) in others.</para>
      <para id="id10814456">Sampling data should be done very carefully. Collecting data carelessly can have devastating results. Surveys mailed to households and then returned may be very biased (for example, they may favor a certain group). It is better for the person conducting the survey to select the sample respondents.</para>
      <para id="id11715554">When you analyze data, it is important to be aware of <emphasis>sampling errors</emphasis> and nonsampling errors. The actual process of sampling causes sampling errors. For example, the sample may not be large enough or representative of the population. Factors not related to the sampling process cause <emphasis>nonsampling errors</emphasis>. A defective counting device can cause a nonsampling error.</para>
      <exercise id="element-3770"><problem>
<para id="element-669">
		Determine the type of sampling used (simple random, stratified, systematic, cluster, or convenience).
	</para>
		<list type="enumerated" id="element-187">
			<item>A soccer coach selects 6 players from a group of boys aged 8 to 10, 7 players from a group of boys aged 11 to 12, and 3 players from a group of boys aged 13 to 14 to form a recreational soccer team.
		</item>
			
	<item>A pollster interviews all human resource personnel in five different high tech companies.</item>
	
		<item>
			An engineering researcher interviews 50 women engineers and 50 men engineers.</item>
		<item>
		A medical researcher interviews every third cancer patient from a list of cancer patients at a local hospital.</item>
	
		<item>
		 A high school counselor uses a computer to generate 50 random numbers and then picks students whose names correspond to the numbers.</item>
		<item>A student interviews classmates in his algebra class to determine how many pairs of jeans a student owns, on the average.</item> </list>
	</problem>
<solution>
<para id="exercise-solution-1">
<list type="enumerated" id="solution-list-1">
<item>stratified</item>
<item>cluster</item>
<item>stratified</item>
<item>systematic</item>
<item>simple random</item>
<item>convenience</item>
</list>
</para>
</solution></exercise>
      
      
      
<para id="id7645179">If we were to examine two samples representing the same population, they would, more than likely, not be the same. Just as there is variation in data, there is variation in samples. As you become accustomed to sampling, the variability will seem natural. </para>
      
      <example id="element-575"><para id="element-62">
		Suppose ABC College has 10,000 part-time students (the population). We are interested in the average amount of money a part-time student spends on books in the fall term. Asking all 10,000 students is an almost impossible task.
	</para><para id="element-199">Suppose we take two different samples.</para><para id="element-499">First, we use convenience sampling and survey 10 students from a first-term organic chemistry class. Many of these students are taking first-term calculus in addition to the organic chemistry class. The amounts of money they spend are as follows:</para>
<para id="element-25001"><list id="set-element-195" type="inline"><item>$128</item>
<item>$87</item>
<item>$173</item>
<item>$116</item>
<item>$130</item>
<item>$204</item>
<item>$147</item>
<item>$189</item>
<item>$93</item>
<item>$153</item></list></para><para id="element-849">The second sample is taken by using a list from the P.E. department of senior citizens who take P.E. classes and taking every 5th senior citizen on the list, for a total of 10 senior citizens. They spend:</para><para id="element-25002"><list id="set-element-865" type="inline"><item>$50</item>
<item>$40</item>
<item>$36</item>
<item>$15</item>
<item>$50</item>
<item>$100</item>
<item>$40</item>
<item>$53</item>
<item>$22</item>
<item>$22</item></list></para><exercise id="element-536"><problem>
		<para id="element-377">
			Do you think that either of these samples is representative of (or is characteristic of) the entire 10,000 part-time student population?
		</para>
	</problem>

	<solution>
		<para id="element-335"><emphasis>No</emphasis>. The first sample probably consists of science-oriented students.  Besides the chemistry course, some of them are taking first-term calculus.  Books for these classes tend to be expensive.  Most of these students are, more than likely, paying more than the average part-time student for their books.  The second sample is a group of senior citizens who are, more than likely, taking courses for health and interest.  The amount of money they spend on books is probably much less than the average part-time student.  Both samples are biased.  Also, in both cases, not all students have a chance to be in either sample.
		</para>
	</solution>
</exercise><exercise id="element-179"><problem>
		<para id="element-546">
			Since these samples are not representative of the entire population, is it wise to use the results to describe the entire population?
		</para>
	</problem>

	<solution>
		<para id="element-480"><emphasis>No.</emphasis> Never use a sample that is not representative or does not have the characteristics of the population.
		</para>
	</solution>
</exercise><para id="element-513">Now, suppose we take a third sample. We choose ten different part-time students from the disciplines of chemistry, math, English, psychology, sociology, history, nursing, physical education, art, and early childhood development. Each student is chosen using simple random sampling. Using a calculator, random numbers are generated and a student from a particular discipline is selected if he/she has a corresponding number. The students spend:</para><para id="element-25003"><list id="set-element-651" type="inline"><item>$180</item>
<item>$50</item>
<item>$150</item>
<item>$85</item>
<item>$260</item>
<item>$75</item>
<item>$180</item>
<item>$200</item>
<item>$200</item>
<item>$150</item></list></para><exercise id="element-887"><problem>
		<para id="element-666">
			Do you think this sample is representative of the population?
		</para>
	</problem>

	<solution>
		<para id="element-971"><emphasis>Yes.</emphasis> It is chosen from different disciplines across the population.
		</para>
	</solution>
</exercise><para id="element-577">Students often ask if it is "good enough" to take a sample, instead of surveying the entire population. If the survey is done well, the answer is yes. </para>
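<para id="element-add-means">The three samples in this example can also be compared numerically. The following Python sketch computes the average amount spent in each sample; the variable names and sample labels are assumptions, and the dollar amounts are those listed above.</para>
<code type="block">
```python
# Dollar amounts spent on books, copied from the three samples in this example.
samples = {
    "organic chemistry class": [128, 87, 173, 116, 130, 204, 147, 189, 93, 153],
    "senior citizens in P.E.": [50, 40, 36, 15, 50, 100, 40, 53, 22, 22],
    "ten disciplines": [180, 50, 150, 85, 260, 75, 180, 200, 200, 150],
}

# Sample mean = sum of the amounts divided by the number of students.
means = {label: sum(amounts) / len(amounts) for label, amounts in samples.items()}

for label, mean in means.items():
    print(f"{label}: ${mean:.2f}")
```
</code>
<para id="element-add-means-note">With these figures the sample means are $142.00, $42.80, and $153.00, which shows how strongly the result depends on the way a sample is chosen.</para>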
</example>
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      <section id="id-425122345235">
        <name>Optional Collaborative Classroom Exercise</name>
<exercise id="lastex"><problem>
        <para id="id7878578">As a class, determine whether or not the following samples are representative. If they are not, discuss the reasons.</para>
        
        
        
        <list id="element-785" type="enumerated"><item>To find the average GPA of all students in a university, use all honor students at the university as the sample.</item>
	<item>To find out the most popular cereal among young people under the age of 10, stand outside a large supermarket for three hours and speak to every 20th child under age 10 who enters the supermarket.</item>
	<item>To find the average annual income of all adults in the United States, sample U.S. congressmen. Create a cluster sample by considering each state as a stratum (group).  By using simple random sampling, select states to be part of the cluster.  Then survey every U.S. congressman in the cluster.</item>
<item>To determine the proportion of people taking public transportation to work, survey 20 people in New York City. Conduct the survey by sitting in Central Park on a bench and interviewing every person who sits next to you.</item>
<item>To determine the average cost of a two day stay in a hospital in Massachusetts, survey 100 hospitals across the state using simple random sampling.</item></list> </problem></exercise>
        
      </section>

  </content>
</document>
