Connexions

Descriptive Statistics: Measuring the Location of the Data

Module by: Dr. Barbara Illowsky, Susan Dean

Summary: This module describes a number of statistical measures used to describe data, such as percentiles, spread, and skewness.

The common measures of location are quartiles and percentiles (%iles). Quartiles are special percentiles. The first quartile, $Q_1$, is the same as the 25th percentile (25th %ile), and the third quartile, $Q_3$, is the same as the 75th percentile (75th %ile). The median, $M$, is called both the second quartile and the 50th percentile (50th %ile).

To calculate quartiles and percentiles, the data must be ordered from smallest to largest. Recall that quartiles divide ordered data into quarters, and percentiles divide ordered data into hundredths. Scoring in the 90th percentile on an exam does not necessarily mean that you received 90% on the test. It means that your score was higher than the scores of 90% of the people who took the test and lower than the scores of the remaining 10%. Percentiles are useful for comparing values, which is why universities and colleges use them extensively.

The interquartile range (IQR) is a number that indicates the spread of the middle half, or middle 50%, of the data. It is the difference between the third quartile ($Q_3$) and the first quartile ($Q_1$).

$$\mathrm{IQR} = Q_3 - Q_1 \tag{1}$$

The IQR can help to determine potential outliers. A value is suspected to be a potential outlier if it is more than $(1.5)(\mathrm{IQR})$ below the first quartile or more than $(1.5)(\mathrm{IQR})$ above the third quartile. Potential outliers always need further investigation.
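The outlier rule can be sketched in Python as follows. This is a minimal illustration that assumes the quartiles have already been found; `outlier_fences` and `potential_outliers` are hypothetical helper names, not part of any library. The quartiles plugged in are the ones computed in Example 1.

```python
def outlier_fences(q1, q3, multiplier=1.5):
    """Return the (lower, upper) fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    step = multiplier * iqr
    return q1 - step, q3 + step


def potential_outliers(data, q1, q3, multiplier=1.5):
    """List the values strictly outside the fences, in their original order."""
    lower, upper = outlier_fences(q1, q3, multiplier)
    return [x for x in data if x < lower or x > upper]


# Real estate prices and quartiles from Example 1.
prices = [389950, 230500, 158000, 479000, 639000, 114950, 5500000,
          387000, 659000, 529000, 575000, 488000, 1095000]
print(outlier_fences(308750, 649000))              # (-201625.0, 1159375.0)
print(potential_outliers(prices, 308750, 649000))  # [5500000]
```

The `multiplier` parameter is only there to make the 1.5 visible; the rule in the text always uses 1.5.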

Example 1

Problem 1

For the following 13 real estate prices, calculate the IQR and determine if any prices are outliers. Prices are in dollars. (Source: San Jose Mercury News)

389,950; 230,500; 158,000; 479,000; 639,000; 114,950; 5,500,000; 387,000; 659,000; 529,000; 575,000; 488,000; 1,095,000

Solution 1

Order the data from smallest to largest.

114,950; 158,000; 230,500; 387,000; 389,950; 479,000; 488,000; 529,000; 575,000; 639,000; 659,000; 1,095,000; 5,500,000

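The median is the 7th of the 13 ordered values, so $M = 488{,}000$. $Q_1$ is the median of the six values below $M$, and $Q_3$ is the median of the six values above $M$.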

$$Q_1 = \frac{230500 + 387000}{2} = 308750$$

$$Q_3 = \frac{639000 + 659000}{2} = 649000$$

$$\mathrm{IQR} = 649000 - 308750 = 340250$$

$$(1.5)(\mathrm{IQR}) = (1.5)(340250) = 510375$$

$$Q_1 - (1.5)(\mathrm{IQR}) = 308750 - 510375 = -201625$$

$$Q_3 + (1.5)(\mathrm{IQR}) = 649000 + 510375 = 1159375$$

No house price is less than -201625. However, 5,500,000 is more than 1,159,375. Therefore, 5,500,000 is a potential outlier.
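The whole of Example 1 can be reproduced with Python's standard `statistics.median`. This sketch assumes the quartile convention the solution appears to use: $Q_1$ and $Q_3$ are the medians of the lower and upper halves of the ordered data, with the overall median left out of both halves when the number of values is odd. Software packages often use other quartile conventions and can give slightly different values. The helper name `quartiles_median_of_halves` is hypothetical.

```python
from statistics import median


def quartiles_median_of_halves(data):
    """Return (Q1, M, Q3) using the median-of-halves convention.

    For an odd number of values, the middle value (the median) is excluded
    from both halves. Needs at least two values.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


prices = [389950, 230500, 158000, 479000, 639000, 114950, 5500000,
          387000, 659000, 529000, 575000, 488000, 1095000]

q1, m, q3 = quartiles_median_of_halves(prices)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [p for p in prices if p < lower_fence or p > upper_fence]

print(m, q1, q3, iqr)            # 488000 308750.0 649000.0 340250.0
print(lower_fence, upper_fence)  # -201625.0 1159375.0
print(outliers)                  # [5500000]
```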

Example 2

Problem 1

For the two data sets in the test scores example, find the following:

  • a. The interquartile range. Compare the two interquartile ranges.
  • b. Any outliers in either set.
  • c. The 30th percentile and the 80th percentile for each set. How much data falls below the 30th percentile? Above the 80th percentile?

Solution 1

For the IQRs, see the answer to the test scores example. The first data set has the larger IQR, so the scores between $Q_1$ and $Q_3$ (the middle 50%) for the first data set are more spread out and less tightly clustered about the median.

First Data Set

  • $\frac{3}{2} \cdot \mathrm{IQR} = \frac{3}{2} \cdot 26.5 = 39.75$
  • $X_{\max} - Q_3 = 99 - 82.5 = 16.5$
  • $Q_1 - X_{\min} = 56 - 32 = 24$

$\frac{3}{2} \cdot \mathrm{IQR} = 39.75$ is larger than 16.5 and larger than 24, so the first set has no outliers.

Second Data Set

  • $\frac{3}{2} \cdot \mathrm{IQR} = \frac{3}{2} \cdot 11 = 16.5$
  • $X_{\max} - Q_3 = 98 - 89 = 9$
  • $Q_1 - X_{\min} = 78 - 25.5 = 52.5$

$\frac{3}{2} \cdot \mathrm{IQR} = 16.5$ is larger than 9 but smaller than 52.5, so for the second set 45 and 25.5 are outliers.
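The comparison above needs only $Q_1$, $Q_3$, $X_{\min}$, and $X_{\max}$, so it can be written as a small function of those four numbers. This is a sketch with a hypothetical function name. It only tells you whether the smallest or largest value is an outlier; when one is, the other values at that end still have to be compared with the fence itself. For the second data set the lower fence is $78 - 16.5 = 61.5$, and 45 lies below it, which is why 45 is flagged as well.

```python
def check_extremes(q1, q3, x_min, x_max, multiplier=1.5):
    """Compare the distance from each quartile to the nearest extreme with 1.5 * IQR.

    Returns both fences and whether the minimum / maximum lies beyond them.
    """
    step = multiplier * (q3 - q1)
    return {
        "lower_fence": q1 - step,
        "upper_fence": q3 + step,
        "min_is_outlier": q1 - x_min > step,
        "max_is_outlier": x_max - q3 > step,
    }


# Summary values quoted in Example 2.
first = check_extremes(q1=56, q3=82.5, x_min=32, x_max=99)
second = check_extremes(q1=78, q3=89, x_min=25.5, x_max=98)
print(first)   # fences 16.25 and 122.25, no outliers
print(second)  # fences 61.5 and 105.5, the minimum is an outlier
```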

To find the percentiles, create a frequency, relative frequency, and cumulative relative frequency table (see "Frequency" from the Sampling and Data chapter). Read the percentiles from that table.

First Data Set

  • 30th %ile (between the 6th and 7th values) $= \frac{56 + 59}{2} = 57.5$
  • 80th %ile (between the 16th and 17th values) $= \frac{84 + 84.5}{2} = 84.25$

Second Data Set

  • 30th %ile (7th value) = 78
  • 80th %ile (18th value) = 90

30% of the data falls below the 30th %ile, and 20% falls above the 80th %ile.

Example 3: Finding Quartiles and Percentiles Using a Table

Fifty statistics students were asked how much sleep they get per school night (rounded to the nearest hour). The results were (student data):

| Amount of sleep per school night (hours) | Frequency | Relative frequency | Cumulative relative frequency |
|---|---|---|---|
| 4 | 2 | 0.04 | 0.04 |
| 5 | 5 | 0.10 | 0.14 |
| 6 | 7 | 0.14 | 0.28 |
| 7 | 12 | 0.24 | 0.52 |
| 8 | 14 | 0.28 | 0.80 |
| 9 | 7 | 0.14 | 0.94 |
| 10 | 3 | 0.06 | 1.00 |
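The relative and cumulative relative frequency columns follow directly from the frequencies: divide each frequency by the total number of students, and divide the running total of frequencies by that same total. A short Python sketch that rebuilds the table from the frequency column (the variable names are illustrative):

```python
from itertools import accumulate

hours = [4, 5, 6, 7, 8, 9, 10]
frequency = [2, 5, 7, 12, 14, 7, 3]

n = sum(frequency)  # 50 students
relative = [f / n for f in frequency]
# Divide the running count by n (instead of adding up rounded proportions)
# so that the last cumulative value is exactly 1.0.
cumulative = [c / n for c in accumulate(frequency)]

for h, f, r, c in zip(hours, frequency, relative, cumulative):
    print(f"{h:>2} {f:>3} {r:.2f} {c:.2f}")
```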

Find the 28th percentile: Notice the 0.28 in the "cumulative relative frequency" column. 28% of 50 data values is 14, so there are 14 values less than the 28th %ile: the two 4s, the five 5s, and the seven 6s. The 28th %ile lies between the last 6 and the first 7, so the 28th %ile is $\frac{6 + 7}{2} = 6.5$.

Find the median: Look again at the "cumulative relative frequency" column and find 0.52. The median is the 50th %ile, or the second quartile. 50% of 50 is 25, so there are 25 values less than the median: the two 4s, the five 5s, the seven 6s, and eleven of the 7s. The median (50th %ile) lies between the 25th value (7) and the 26th value (7), so the median is 7.

Find the third quartile: The third quartile is the same as the 75th percentile, and you can "eyeball" this answer. In the "cumulative relative frequency" column you find 0.52 and 0.80: all the 4s, 5s, 6s, and 7s make up 52% of the data, and including all the 8s brings the total to 80%. The 75th %ile, then, must be an 8. Another way to look at the problem is to find 75% of 50 (= 37.5) and round up to 38. The third quartile, $Q_3$, is the 38th value, which is an 8. You can check this answer by counting the values: there are 37 values below the third quartile and 12 values above it.
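The counting procedure in this example can be stated as a rule: compute $k = \frac{p}{100} \cdot n$; if $k$ is a whole number, the $p$th percentile is the average of the $k$th and $(k+1)$th ordered values, and otherwise it is the value at position $k$ rounded up. That is what the worked answers above do (14 gives the average of the 14th and 15th values; 37.5 gives the 38th value). Other textbooks and software use other interpolation rules, so treat this Python sketch as one convention rather than the only one. The function name is hypothetical, and `p` is assumed to be a whole number strictly between 0 and 100.

```python
def percentile_by_position(ordered, p):
    """p-th percentile of sorted data, using the 'k = p% of n' counting rule.

    p must be an integer with 0 < p < 100. Positions are 1-based in the text
    and 0-based in the Python list.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    # k = p * n / 100, kept in integer arithmetic to avoid rounding error.
    k, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # k is a whole number: average the k-th and (k+1)-th values.
        return (ordered[k - 1] + ordered[k]) / 2
    # Otherwise round k up to k + 1; that position is index k in the list.
    return ordered[k]


hours = [4, 5, 6, 7, 8, 9, 10]
frequency = [2, 5, 7, 12, 14, 7, 3]
# Expand the frequency table into the 50 ordered observations.
sleep = [h for h, f in zip(hours, frequency) for _ in range(f)]

print(percentile_by_position(sleep, 28))  # 6.5
print(percentile_by_position(sleep, 50))  # 7.0 (the median)
print(percentile_by_position(sleep, 75))  # 8 (Q3, the 38th value)
```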

Example 4

Problem 1

Using the table in Example 3:

  1. Find the 80th percentile.
  2. Find the 90th percentile.
  3. Find the first quartile. What is another name for the first quartile?
  4. Construct a box plot of the data.

Solution 1

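  1. $\frac{8 + 9}{2} = 8.5$
  2. 9
  3. 6. Another name for the first quartile is the 25th percentile (25th %ile).
  4. The box plot is drawn from the minimum 4, $Q_1 = 6$, the median 7, $Q_3 = 8$, and the maximum 10.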

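For larger tables there is no need to list every observation: the running (cumulative) counts tell you which value sits at each position. The sketch below applies the same counting rule as the worked examples directly to the frequency table and reproduces the answers to Example 4, including the five values needed for the box plot in part 4. The function names are hypothetical, `p` is assumed to be an integer strictly between 0 and 100, and the first and last rows of the table are assumed to have nonzero frequency.

```python
from itertools import accumulate


def value_at_position(values, freqs, position):
    """Value at a 1-based position in the ordered data described by a frequency table."""
    for value, running_count in zip(values, accumulate(freqs)):
        if position <= running_count:
            return value
    raise ValueError("position exceeds the number of observations")


def percentile_from_table(values, freqs, p):
    """p-th percentile (integer, 0 < p < 100) using the 'k = p% of n' rule."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    k, remainder = divmod(p * sum(freqs), 100)
    if remainder == 0:
        # Average the k-th and (k+1)-th ordered values.
        return (value_at_position(values, freqs, k)
                + value_at_position(values, freqs, k + 1)) / 2
    # Round k up and take the value at that position.
    return value_at_position(values, freqs, k + 1)


hours = [4, 5, 6, 7, 8, 9, 10]
frequency = [2, 5, 7, 12, 14, 7, 3]

print(percentile_from_table(hours, frequency, 80))  # 8.5
print(percentile_from_table(hours, frequency, 90))  # 9.0
print(percentile_from_table(hours, frequency, 25))  # 6 (first quartile)

# Box plot values: minimum, Q1, median, Q3, maximum.
summary = (hours[0],
           percentile_from_table(hours, frequency, 25),
           percentile_from_table(hours, frequency, 50),
           percentile_from_table(hours, frequency, 75),
           hours[-1])
print(summary)  # (4, 6, 7.0, 8, 10)
```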
Collaborative Classroom Exercise: Your instructor or a member of the class will ask everyone in class how many sweaters they own. Answer the following questions.

  1. How many students were surveyed?
  2. What kind of sampling did you do?
  3. Find the mean and standard deviation.
  4. Find the mode.
  5. Construct 2 different histograms. For each, starting value = _____ ending value = ____.
  6. Find the median, first quartile, and third quartile.
  7. Construct a box plot.
  8. Construct a table of the data to find the following:
    • The 10th percentile
    • The 70th percentile
    • The percent of students who own fewer than 4 sweaters

Glossary

Interquartile Range (IQR):
The distance between the third quartile and the first quartile.
Outlier:
An observation that does not fit the rest of the data.
Percentile:
A number that divides ordered data into hundredths: the $p$th percentile separates the lowest $p$% ($\frac{p}{100}$) of the data from the rest.

Example:

Let a data set contain 200 ordered observations starting with $\{2.3, 2.7, 2.8, 2.9, 2.9, 3.0, \dots\}$. Then the first percentile is $\frac{2.7 + 2.8}{2} = 2.75$, because 1% of the data (2 of the 200 values) is to the left of this point on the number line and 99% of the data is to its right. The second percentile is $\frac{2.9 + 2.9}{2} = 2.9$, separating 2% of the data. Percentiles may or may not be part of the data. (In this example, the first percentile is not in the data, but the second percentile is.) The median of the data is the second quartile and, at the same time, the 50th percentile. The first and third quartiles are the 25th and 75th percentiles, respectively.

Quartiles:
The numbers that separate the data into quarters. Quartiles may or may not be part of the data. The second quartile is the median of the data.
