Basics of Data Collection

Module by: David Lane.

Most statistical analyses require that your data be in numerical rather than verbal form (you can't punch letters into your calculator). Therefore, data collected in verbal form must be coded so that it is represented by numbers. To illustrate, consider the data in Table 1.

Table 1: Example Data
Student Name | Hair Color | Gender | Major           | Height | Computer Experience
Norma        | Brown      | Female | Psychology      | 5'4"   | Lots
Amber        | Blonde     | Female | Social Sciences | 5'7"   | Very Little
Paul         | Blonde     | Male   | History         | 6'1"   | Moderate
Christopher  | Black      | Male   | Biology         | 5'10"  | Lots
Sonya        | Brown      | Female | Psychology      | 5'4"   | Little

Can you conduct statistical analyses on the above data, or must you re-code it in some way? For example, how would you go about computing the average height of the five students?

Definition 1: average
1. The arithmetic mean
2. Any measure of central tendency
You cannot enter students' heights in their current form into a statistical program; the computer would probably give you an error message because it does not understand notation such as 5'4". One solution is to convert all the heights to inches. So, 5'4" becomes 5 × 12 + 4 = 64 inches, and 6'1" becomes 6 × 12 + 1 = 73 inches, and so forth. In this way, you are converting height in feet and inches to simply height in inches. From there, it is very easy to ask a statistical program to calculate the mean height in inches for the five students.
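The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original module: it assumes heights are always written as feet'inches" (for example 5'4"), and the names HEIGHT_PATTERN, height_to_inches, and HEIGHTS are hypothetical. It converts each height in Table 1 with feet × 12 + inches and then takes the mean.

```python
import re
from statistics import mean

# Assumed input format: feet'inches", e.g. 5'4" or 6'1".
HEIGHT_PATTERN = re.compile(r"(\d+)'\s*(\d+)\"")

def height_to_inches(height: str) -> int:
    """Convert a height such as 5'4" to total inches: 5 * 12 + 4 = 64."""
    match = HEIGHT_PATTERN.fullmatch(height.strip())
    if match is None:
        raise ValueError(f"unrecognized height: {height!r}")
    feet, inches = int(match.group(1)), int(match.group(2))
    if inches >= 12:
        raise ValueError(f"inches must be below 12: {height!r}")
    return feet * 12 + inches

# Heights exactly as recorded in Table 1.
HEIGHTS = {
    "Norma": "5'4\"",
    "Amber": "5'7\"",
    "Paul": "6'1\"",
    "Christopher": "5'10\"",
    "Sonya": "5'4\"",
}

heights_in_inches = [height_to_inches(h) for h in HEIGHTS.values()]
print(heights_in_inches)        # [64, 67, 73, 70, 64]
print(mean(heights_in_inches))  # 67.6
```

Rejecting anything that does not match the expected pattern, rather than guessing, keeps a mistyped height from silently entering the data set.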

You may ask, "Why not simply ask subjects to write their height in inches in the first place?" Well, the number one rule of data collection is to ask for information in such a way that it will be most accurately reported. Most people know their height in feet and inches and cannot quickly and accurately convert it into inches "on the fly." So, to preserve data accuracy, it is best for researchers to make the necessary conversions.

Let's take another example. Suppose you wanted to calculate the mean amount of computer experience for the five students shown in Table 1. One way would be to convert the verbal descriptions to numbers as shown in Table 2. Thus, "Very Little" would be converted to "1" and "Little" would be converted to "2".

Table 2: Conversion of verbal descriptions to numbers
Verbal description | Very Little | Little | Moderate | Lots | Very Lots
Numeric code       | 1           | 2      | 3        | 4    | 5
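Applying the Table 2 coding to the computer-experience column of Table 1 might look like the following minimal sketch. The names EXPERIENCE_CODES, EXPERIENCE, and code_experience are hypothetical; the codes themselves are the ones in Table 2.

```python
from statistics import mean

# Coding scheme from Table 2.
EXPERIENCE_CODES = {"Very Little": 1, "Little": 2, "Moderate": 3, "Lots": 4, "Very Lots": 5}

# Computer experience exactly as recorded in Table 1.
EXPERIENCE = {
    "Norma": "Lots",
    "Amber": "Very Little",
    "Paul": "Moderate",
    "Christopher": "Lots",
    "Sonya": "Little",
}

def code_experience(answer: str) -> int:
    """Map a verbal answer to its numeric code; an unknown answer raises KeyError."""
    return EXPERIENCE_CODES[answer]

codes = [code_experience(a) for a in EXPERIENCE.values()]
print(codes)        # [4, 1, 3, 4, 2]
print(mean(codes))  # 2.8
```

Note that averaging these codes treats the step from "Very Little" to "Little" as the same size as the step from "Lots" to "Very Lots," which is an assumption built into the coding.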

Measurement Examples

Example 1: How much information should I record?

Say you are volunteering at a track meet at your college, and your job is to record each runner's time as they pass the finish line in each race. Their times are shown in large red numbers on a digital clock with eight digits to the right of the decimal point, and you are told to record the entire number in your tablet. Thinking eight decimal places is a bit excessive, you record runners' times to only one decimal place. The track meet begins, and runner number one finishes with a time of 22.93219780 seconds. You dutifully record her time in your tablet, but only to one decimal place, that is, 22.9. Race number two finishes, and you record 32.7 for the winning runner. The fastest time in Race number three is 25.6. The winning time in Race number four is 22.9, in Race number five is... But wait! You suddenly realize your mistake: you now have a tie between runner one and runner four for the title of Fastest Overall Runner! You should have recorded more information from the digital clock; that information is now lost, and you cannot go back in time and record running times to more decimal places.

The point is that you should think very carefully about the scales and specificity of information needed in your research before you begin collecting data. If you believe you might need additional information later but are not sure, measure it; you can always decide not to use some of the data, or "collapse" your data down to lower scales if you wish, but you cannot expand your data set to include more information after the fact. In this example, you probably would not need to record eight digits to the right of the decimal point, but recording only one decimal digit is clearly too few.
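The one-way nature of collapsing data can be illustrated with a small sketch. Runner one's full time comes from the example; runner four's full clock reading is a hypothetical value invented for illustration, as are the names FULL_TIMES, record, and fastest. Rounding to one decimal place produces the tie, while keeping a second decimal place is enough to break it.

```python
from decimal import Decimal, ROUND_HALF_UP

# Full clock readings. Runner one's time is from the example;
# runner four's full reading is HYPOTHETICAL, chosen only for illustration.
FULL_TIMES = {
    "runner one": Decimal("22.93219780"),
    "runner four": Decimal("22.91504433"),
}

def record(time: Decimal, places: int) -> Decimal:
    """Keep `places` digits after the decimal point, rounding halves up."""
    return time.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)

def fastest(times: dict) -> list:
    """Return the sorted names of every runner tied for the lowest time."""
    best = min(times.values())
    return sorted(name for name, t in times.items() if t == best)

one_place = {name: record(t, 1) for name, t in FULL_TIMES.items()}
two_places = {name: record(t, 2) for name, t in FULL_TIMES.items()}

print(fastest(one_place))   # ['runner four', 'runner one']  (a tie at 22.9)
print(fastest(two_places))  # ['runner four']  (22.92 beats 22.93)
print(fastest(FULL_TIMES))  # ['runner four']
```

From FULL_TIMES you can always compute one_place later, but no function can turn the recorded 22.9 back into 22.93219780.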

Example 2

Pretend for a moment that you are teaching five children in middle school (yikes!), and you are trying to convince them that they must study more in order to earn better grades. To prove your point, you decide to collect actual data from their recent math exams, and, toward this end, you develop a questionnaire to measure their study time and subsequent grades. You might develop a questionnaire which looks like the following:

  1. Please write your name: ____________________________
  2. Please indicate how much you studied for this math exam: a lot_________moderate________little
  3. Please circle the grade you received on the math exam: A  B  C  D  F
Given the above questionnaire, your obtained data might look like that in Table 3:

Table 3
Name      | Amount Studied | Grade
John      | Little         | C
Sally     | Moderate       | B
Alexander | Lots           | A
Linda     | Moderate       | A
Thomas    | Little         | B
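One way to look at Table 3 numerically is to code it the way Table 2 codes computer experience and then average the grades within each study level. This is a minimal sketch under stated assumptions: the study codes (Little = 1, Moderate = 2, Lots = 3) and the 4-point grade scale are hypothetical choices, not part of the module, and so are the names STUDY_CODES, GRADE_POINTS, TABLE_3, and mean_grade_by_study.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL codes: study amount on a 1-3 scale, letter grades on a 4-point scale.
STUDY_CODES = {"Little": 1, "Moderate": 2, "Lots": 3}
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

# Table 3: (name, amount studied, grade)
TABLE_3 = [
    ("John", "Little", "C"),
    ("Sally", "Moderate", "B"),
    ("Alexander", "Lots", "A"),
    ("Linda", "Moderate", "A"),
    ("Thomas", "Little", "B"),
]

def mean_grade_by_study(rows):
    """Average grade points for each coded study level, ordered by level."""
    groups = defaultdict(list)
    for _name, studied, grade in rows:
        groups[STUDY_CODES[studied]].append(GRADE_POINTS[grade])
    return {level: mean(points) for level, points in sorted(groups.items())}

print(mean_grade_by_study(TABLE_3))  # {1: 2.5, 2: 3.5, 3: 4}
```

The averages rise with study level, but each group holds only one or two students, and the coarse categories hide how much each student actually studied.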

Eyeballing the data, it seems as if the children who studied more received better grades, but it's difficult to tell. "Little", "Lots", and "B" are imprecise, qualitative terms. You could get more precise information by asking specifically how many hours they studied and what their exact score on the exam was. The data then might look as follows:

Table 4
Name      | Hours Studied | % Correct
John      | 5             | 71
Sally     | 9             | 83
Alexander | 13            | 97
Linda     | 12            | 91
Thomas    | 7             | 85
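With numbers in place of verbal categories, the relationship in Table 4 can be summarized by a single statistic. The module does not compute one here; the Pearson correlation below is one common choice, sketched by hand so it needs only the standard library (the names TABLE_4 and pearson_r are hypothetical). A value near 1 indicates a strong positive linear relationship.

```python
from math import sqrt
from statistics import mean

# Table 4: (name, hours studied, % correct)
TABLE_4 = [
    ("John", 5, 71),
    ("Sally", 9, 83),
    ("Alexander", 13, 97),
    ("Linda", 12, 91),
    ("Thomas", 7, 85),
]

def pearson_r(xs, ys):
    """Sum of cross-products of deviations over sqrt(product of sums of squares)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [h for _, h, _ in TABLE_4]
scores = [s for _, _, s in TABLE_4]
print(round(pearson_r(hours, scores), 2))  # 0.93
```

Five students is a very small sample, and a correlation on its own does not show that the extra studying caused the higher scores.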

Of course, this assumes the students would know how many hours they studied. Rather than trust the students' memories, you might ask them to keep a log of their study time as they study.
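A study log turns the "Hours Studied" column into a sum of recorded sessions instead of a recollection. The sketch below assumes each entry stores a start and end time as text; the log entries, the date format, and the names LOG and total_hours are all hypothetical.

```python
from datetime import datetime

# A HYPOTHETICAL log for one student: (start, end) of each study session,
# written down while studying rather than recalled afterwards.
LOG = [
    ("2024-03-04 18:00", "2024-03-04 19:30"),
    ("2024-03-06 17:15", "2024-03-06 18:45"),
    ("2024-03-07 19:00", "2024-03-07 21:00"),
]

def total_hours(log, fmt="%Y-%m-%d %H:%M"):
    """Add up session lengths in hours, rejecting sessions that end before they start."""
    total_seconds = 0.0
    for start, end in log:
        delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
        if delta.total_seconds() < 0:
            raise ValueError(f"session ends before it starts: {start} -> {end}")
        total_seconds += delta.total_seconds()
    return total_seconds / 3600

print(total_hours(LOG))  # 5.0
```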
