Measuring the Location of the Data

Module by: Ananda Mahto.

Based on: Descriptive Statistics: Measuring the Location of the Data by Susan Dean, Barbara Illowsky, Ph.D.

Summary: Descriptive Statistics: Measuring the Location of Data explains percentiles and quartiles and is part of the collection col10555 written by Barbara Illowsky and Susan Dean. Roberta Bloom contributed the section "Interpreting Percentiles, Quartile and the Median."

The common measures of location are quartiles and percentiles (%iles). Quartiles are special percentiles. The first quartile, Q1, is the same as the 25th percentile (25th %ile) and the third quartile, Q3, is the same as the 75th percentile (75th %ile). The median, M, is called both the second quartile and the 50th percentile (50th %ile).

To calculate quartiles and percentiles, the data must be ordered from smallest to largest. Recall that quartiles divide ordered data into quarters. Percentiles divide ordered data into hundredths. To score in the 90th percentile of an exam does not necessarily mean that you received 90% on the test. It means that 90% of test scores are the same as or less than your score and 10% of the test scores are the same as or greater than your score.

Percentiles are useful for comparing values. For this reason, universities and colleges use percentiles extensively.

Percentiles are mostly used with very large populations. Therefore, if you were to say that 90% of the test scores are less than your score (rather than the same as or less than your score), it would be acceptable, because removing one particular data value is not significant.
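The difference between "the same as or less than" and "less than" can be checked directly in R. The sketch below uses ten made-up exam scores (hypothetical data, not from the text) and a hypothetical score of 88. With only ten values, each score is 10% of the data, so the two definitions differ noticeably; each value is 1/n of the data, so the gap shrinks as the number of values grows.

```r
# Hypothetical exam scores (illustrative only, not from the text)
scores <- c(55, 62, 68, 70, 74, 77, 81, 85, 88, 93)
my_score <- 88

# Share of scores that are the same as or less than my score
at_or_below <- mean(scores <= my_score)

# Share of scores that are strictly less than my score
below <- mean(scores < my_score)

at_or_below  # 0.9
below        # 0.8
```

A score of 88 sits at the 90th percentile of these ten scores even though it is not a score of 90%.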

The interquartile range is a number that indicates the spread of the middle half or the middle 50% of the data. It is the difference between the third quartile (Q3) and the first quartile (Q1).

IQR = Q3 - Q1    (1)

The IQR can help to determine potential outliers. A value is suspected to be a potential outlier if it is more than (1.5)(IQR) below the first quartile or more than (1.5)(IQR) above the third quartile. Potential outliers always need further investigation.

Example 1

Problem 1

For the following 13 real estate prices, calculate the IQR and determine if any prices are outliers. Prices are in dollars. (Source: San Jose Mercury News)

389950, 230500, 158000, 479000, 639000, 114950, 5500000, 387000, 659000, 529000, 575000, 488800, 1095000

Solution

Order the data from smallest to largest.

114950, 158000, 230500, 387000, 389950, 479000, 488800, 529000, 575000, 639000, 659000, 1095000, 5500000

M = 488800

Q1 = (230500 + 387000) / 2 = 308750

Q3 = (639000 + 659000) / 2 = 649000

IQR = 649000 - 308750 = 340250

(1.5)(IQR) = (1.5)(340250) = 510375

Q1 - (1.5)(IQR) = 308750 - 510375 = -201625

Q3 + (1.5)(IQR) = 649000 + 510375 = 1159375

No house price is less than -201625. However, 5,500,000 is more than 1,159,375. Therefore, 5,500,000 is a potential outlier.
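The arithmetic in this solution can be reproduced in R. This is a minimal sketch: half_quartiles() is a hypothetical helper (not a built-in R function) that takes Q1 and Q3 as the medians of the lower and upper halves of the sorted data, leaving the overall median out of both halves when the number of values is odd, which is how the solution above arrives at 308750 and 649000.

```r
# Real estate prices from Example 1 (dollars)
prices <- c(389950, 230500, 158000, 479000, 639000, 114950, 5500000,
            387000, 659000, 529000, 575000, 488800, 1095000)

# Hypothetical helper: quartiles as medians of the lower and upper halves
# of the sorted data (the overall median is excluded when n is odd)
half_quartiles <- function(x) {
  x <- sort(x)
  n <- length(x)
  half <- floor(n / 2)
  c(Q1 = median(x[1:half]),
    M  = median(x),
    Q3 = median(x[(n - half + 1):n]))
}

q <- half_quartiles(prices)
iqr <- q[["Q3"]] - q[["Q1"]]

# Values beyond these fences are potential outliers
lower_fence <- q[["Q1"]] - 1.5 * iqr
upper_fence <- q[["Q3"]] + 1.5 * iqr
outliers <- prices[prices < lower_fence | prices > upper_fence]

q            # Q1 = 308750, M = 488800, Q3 = 649000
iqr          # 340250
lower_fence  # -201625
upper_fence  # 1159375
outliers     # 5500000
```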

Example 2

Problem 1

For the two data sets in the test scores example, find the following:

  • a. The interquartile range. Compare the two interquartile ranges.
  • b. Any outliers in either set.
  • c. The 30th percentile and the 80th percentile for each set. How much data falls below the 30th percentile? Above the 80th percentile?

Solution

For the IQRs, see the answer to the test scores example. The first data set has the larger IQR, so the scores between Q1 and Q3 (the middle 50%) for the first data set are more spread out and not clustered about the median.

First Data Set

  • (3/2)(IQR) = (3/2)(26.5) = 39.75
  • Xmax - Q3 = 99 - 82.5 = 16.5
  • Q1 - Xmin = 56 - 32 = 24
(3/2)(IQR) = 39.75 is larger than 16.5 and larger than 24, so the first set has no outliers.

Second Data Set

  • (3/2)(IQR) = (3/2)(11) = 16.5
  • Xmax - Q3 = 98 - 89 = 9
  • Q1 - Xmin = 78 - 25.5 = 52.5
(3/2)(IQR) = 16.5 is larger than 9 but smaller than 52.5, so for the second set 45 and 25.5 are outliers.
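The same check can be written as a comparison of the smallest and largest values against the fences Q1 - (1.5)(IQR) and Q3 + (1.5)(IQR). The sketch below uses only the summary values quoted in this solution (the full test score data is not repeated here), and check_fences() is a hypothetical helper written for this illustration.

```r
# Summary values quoted in the solution
first  <- c(xmin = 32,   q1 = 56, q3 = 82.5, xmax = 99)
second <- c(xmin = 25.5, q1 = 78, q3 = 89,   xmax = 98)

# Hypothetical helper: compare the extremes with the 1.5 * IQR fences
check_fences <- function(s) {
  step <- 1.5 * (s[["q3"]] - s[["q1"]])
  lower <- s[["q1"]] - step
  upper <- s[["q3"]] + step
  list(lower_fence  = lower,
       upper_fence  = upper,
       low_outlier  = s[["xmin"]] < lower,   # is the minimum below the lower fence?
       high_outlier = s[["xmax"]] > upper)   # is the maximum above the upper fence?
}

res_first  <- check_fences(first)
res_second <- check_fences(second)

str(res_first)   # fences 16.25 and 122.25, no outliers at either end
str(res_second)  # fences 61.5 and 105.5, at least one low outlier
```

Any score in the second set below 61.5 is a potential outlier, which is why both 45 and 25.5 are flagged.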

To find the percentiles, create a frequency, relative frequency, and cumulative relative frequency chart. Get the percentiles from that chart.

First Data Set
  • 30th %ile (between the 6th and 7th values) = (56 + 59) / 2 = 57.5
  • 80th %ile (between the 16th and 17th values) = (84 + 84.5) / 2 = 84.25
Second Data Set
  • 30th %ile (7th value) = 78
  • 80th %ile (18th value) = 90

30% of the data falls below the 30th %ile, and 20% falls above the 80th %ile.

Example 3: Finding Quartiles and Percentiles Using a Table

Fifty statistics students were asked how much sleep they get per school night (rounded to the nearest hour). The results were (student data):

Table 1
AMOUNT OF SLEEP PER SCHOOL NIGHT (HOURS) | FREQUENCY | RELATIVE FREQUENCY | CUMULATIVE RELATIVE FREQUENCY
4  | 2  | 0.04 | 0.04
5  | 5  | 0.10 | 0.14
6  | 7  | 0.14 | 0.28
7  | 12 | 0.24 | 0.52
8  | 14 | 0.28 | 0.80
9  | 7  | 0.14 | 0.94
10 | 3  | 0.06 | 1.00

Find the 28th percentile: Notice the 0.28 in the "cumulative relative frequency" column. 28% of 50 data values = 14. There are 14 values less than the 28th %ile. They include the two 4s, the five 5s, and the seven 6s. The 28th %ile is between the last 6 and the first 7. The 28th %ile is 6.5.

Find the median: Look again at the "cumulative relative frequency" column and find 0.52. The median is the 50th %ile or the second quartile. 50% of 50 = 25. There are 25 values less than the median. They include the two 4s, the five 5s, the seven 6s, and eleven of the 7s. The median or 50th %ile is between the 25th (7) and 26th (7) values. The median is 7.

Find the third quartile: The third quartile is the same as the 75th percentile. You can "eyeball" this answer. If you look at the "cumulative relative frequency" column, you find 0.52 and 0.80. When you have all the 4s, 5s, 6s and 7s, you have 52% of the data. When you include all the 8s, you have 80% of the data. The 75th %ile, then, must be an 8. Another way to look at the problem is to find 75% of 50 (= 37.5) and round up to 38. The third quartile, Q3, is the 38th value, which is an 8. You can check this answer by counting the values. (There are 37 values below the third quartile and 12 values above.)
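The counting in this example can be sketched in R. The code rebuilds the relative and cumulative relative frequency columns from the frequencies, expands the table into the 50 ordered values, and applies the position rule used above: multiply the number of values by the percent; if the result is a whole number, average that value and the next one, otherwise round the position up. table_percentile() is a hypothetical helper written for this illustration. It is not how R's quantile() works by default, and it assumes a whole-number percent strictly between 0 and 100.

```r
# Table 1: hours of sleep and the number of students reporting each value
hours <- c(4, 5, 6, 7, 8, 9, 10)
freq  <- c(2, 5, 7, 12, 14, 7, 3)

# Rebuild the relative and cumulative relative frequency columns
n        <- sum(freq)
rel_freq <- freq / n
cum_rel  <- cumsum(freq) / n
print(data.frame(hours, freq, rel_freq, cum_rel))

# Expand the table into the 50 ordered data values
sleep <- rep(hours, times = freq)

# Hypothetical helper implementing the position rule from this example;
# p is a whole-number percent strictly between 0 and 100
table_percentile <- function(x, p) {
  x <- sort(x)
  pos <- length(x) * p / 100
  if (pos == floor(pos)) {
    mean(x[c(pos, pos + 1)])  # whole position: average that value and the next
  } else {
    x[ceiling(pos)]           # otherwise round the position up
  }
}

table_percentile(sleep, 28)  # 6.5
table_percentile(sleep, 50)  # 7
table_percentile(sleep, 75)  # 8
```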

Example 4

Problem 1

Using the table:

  1. Find the 80th percentile.
  2. Find the 90th percentile.
  3. Find the first quartile. What is another name for the first quartile?
  4. Construct a box plot of the data.

Solution

  1. (8 + 9) / 2 = 8.5
  2. 9
  3. 6. Another name for the first quartile is the 25th %ile.
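For the box plot, one possible approach in R is sketched below, using the frequencies from the sleep table (Table 1). Note the assumption: fivenum() and boxplot() use Tukey's hinges rather than the counting method of this module. For this data the hinges agree with the quartiles found above (6 and 8), but for other data sets they may differ.

```r
# Expand the sleep table (Table 1) into the 50 individual values
hours <- c(4, 5, 6, 7, 8, 9, 10)
freq  <- c(2, 5, 7, 12, 14, 7, 3)
sleep <- rep(hours, times = freq)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(sleep)
five  # 4 6 7 8 10

# Draw the box plot horizontally with a labeled axis
boxplot(sleep, horizontal = TRUE,
        xlab = "Amount of sleep per school night (hours)")
```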

Collaborative Classroom Exercise: Your instructor or a member of the class will ask everyone in class how many shirts they own. Answer the following questions.

  1. How many students were surveyed?
  2. What kind of sampling did you do?
  3. Find the mean and standard deviation.
  4. Find the mode.
  5. Construct 2 different histograms. For each, starting value = _____ ending value = ____.
  6. Find the median, first quartile, and third quartile.
  7. Construct a box plot.
  8. Construct a table of the data to find the following:
    • The 10th percentile
    • The 70th percentile
    • The percent of students who own fewer than 4 shirts

Interpreting Percentiles, Quartiles, and Median

A percentile indicates the relative standing of a data value when data are sorted into numerical order, from smallest to largest. p% of data values are less than or equal to the pth percentile. For example, 15% of data values are less than or equal to the 15th percentile.

  • Low percentiles always correspond to lower data values.
  • High percentiles always correspond to higher data values.
A percentile may or may not correspond to a value judgment about whether it is "good" or "bad". The interpretation of whether a certain percentile is good or bad depends on the context of the situation to which the data applies. In some situations, a low percentile would be considered "good"; in other contexts a high percentile might be considered "good". In many situations, there is no value judgment that applies.

Understanding how to properly interpret percentiles is important not only when describing data, but also in later chapters of this textbook when calculating probabilities.

Guideline:

When writing the interpretation of a percentile in the context of the given data, the sentence should contain the following information:
  • information about the context of the situation being considered,
  • the data value (value of the variable) that represents the percentile,
  • the percent of individuals or items with data values below the percentile.
  • Additionally, you may also choose to state the percent of individuals or items with data values above the percentile.

Example 5

On a timed math test, the first quartile for times for finishing the exam was 35 minutes. Interpret the first quartile in the context of this situation.

  • 25% of students finished the exam in 35 minutes or less.
  • 75% of students finished the exam in 35 minutes or more.
  • A low percentile could be considered good, as finishing more quickly on a timed exam is desirable. (If you take too long, you might not be able to finish.)

Example 6

On a 20 question math test, the 70th percentile for number of correct answers was 16. Interpret the 70th percentile in the context of this situation.

  • 70% of students answered 16 or fewer questions correctly.
  • 30% of students answered 16 or more questions correctly.
  • Note: A high percentile could be considered good, as answering more questions correctly is desirable.

Example 7

At a certain community college, it was found that the 30th percentile of credit units that students are enrolled for is 7 units. Interpret the 30th percentile in the context of this situation.

  • 30% of students are enrolled in 7 or fewer credit units.
  • 70% of students are enrolled in 7 or more credit units.
  • In this example, there is no "good" or "bad" value judgment associated with a higher or lower percentile. Students attend community college for varied reasons and needs, and their course load varies according to their needs.


Do the following Practice Problems for Interpreting Percentiles

Exercise 1

  • a. For runners in a race, a low time means a faster run. The winners in a race have the shortest running times. Is it more desirable to have a finish time with a high or a low percentile when running a race?
  • b. The 20th percentile of run times in a particular race is 5.2 minutes. Write a sentence interpreting the 20th percentile in the context of the situation.
  • c. A bicyclist in the 90th percentile of a bicycle race between two towns completed the race in 1 hour and 12 minutes. Is he among the fastest or slowest cyclists in the race? Write a sentence interpreting the 90th percentile in the context of the situation.

Solution

  • a. For runners in a race it is more desirable to have a low percentile for finish time. A low percentile means a short time, which is faster.
  • b. INTERPRETATION: 20% of runners finished the race in 5.2 minutes or less. 80% of runners finished the race in 5.2 minutes or longer.
  • c. He is among the slowest cyclists (90% of cyclists were faster than him). INTERPRETATION: 90% of cyclists had a finish time of 1 hour, 12 minutes or less. Only 10% of cyclists had a finish time of 1 hour, 12 minutes or longer.

Exercise 2

  • a. For runners in a race, a higher speed means a faster run. Is it more desirable to have a speed with a high or a low percentile when running a race?
  • b. The 40th percentile of speeds in a particular race is 7.5 miles per hour. Write a sentence interpreting the 40th percentile in the context of the situation.

Solution

  • a. For runners in a race it is more desirable to have a high percentile for speed. A high percentile means a higher speed, which is faster.
  • b. INTERPRETATION: 40% of runners ran at speeds of 7.5 miles per hour or less (slower). 60% of runners ran at speeds of 7.5 miles per hour or more (faster).

Exercise 3

On an exam, would it be more desirable to earn a grade with a high or low percentile? Explain.

Solution

On an exam you would prefer a high percentile; higher percentiles correspond to higher grades on the exam.

**With contributions from Roberta Bloom

Calculating IQR, Quantiles and Percentiles in R

R has functions for calculating the interquartile range (IQR()) and different percentiles (quantile()).

Note:

There are several methods for calculating the IQR. The method R uses by default will not match the result of the real estate example demonstrated earlier; however, by using one of the alternative methods, it is possible to match the output. Both are demonstrated below. For more information, type ?quantile at the R prompt and read the section titled "Types".


# Enter the real estate data
real.estate <- c(389950, 230500, 158000, 479000, 639000,
    114950, 5500000, 387000, 659000, 529000, 575000,
    488800, 1095000)
# R's default quantile type does not match the real
# estate example, where IQR = 340250
IQR(real.estate)
## [1] 252000
# The R help file mentions that 'type = 6' is the
# method used by SPSS and Minitab. That method
# matches the real estate example.
IQR(real.estate, type = 6)
## [1] 340250

# Quantiles. The default is the 0th, 25th, 50th,
# 75th, and 100th %iles
quantile(real.estate)
##      0%     25%     50%     75%    100% 
##  114950  387000  488800  639000 5500000
# What about by 10s instead?
quantile(real.estate, probs = seq(0, 1, 0.1))
##      0%     10%     20%     30%     40%     50% 
##  114950  172500  293100  388770  461190  488800 
##     60%     70%     80%     90%    100% 
##  538200  600600  651000 1007800 5500000

As can be seen, R is quite flexible at calculating percentiles. The key is to use the quantile() function in conjunction with specifying the probabilities that you're interested in. This is most easily done using seq(): the "sequence" function in R. Just be sure that your probabilities are all between "0" and "1"!
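The probs and type arguments can also be combined. As a small sketch, the code below asks quantile() for only the first and third quartiles using type = 6 and confirms that their difference is the same number that IQR() returns with that type. The variable name q is chosen here for illustration.

```r
# Real estate data from the example above
real.estate <- c(389950, 230500, 158000, 479000, 639000,
                 114950, 5500000, 387000, 659000, 529000, 575000,
                 488800, 1095000)

# First and third quartiles only, using the 'type = 6' method
q <- quantile(real.estate, probs = c(0.25, 0.75), type = 6)
q
##    25%    75% 
## 308750 649000

# Their difference is the interquartile range for the same type
unname(diff(q))
## [1] 340250
IQR(real.estate, type = 6)
## [1] 340250
```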

Glossary

Interquartile Range (IQR):
The distance between the third quartile (Q3) and the first quartile (Q1). IQR = Q3 - Q1.
Outlier:
An observation that does not fit the rest of the data.
Percentile:
A number that divides ordered data into hundredths.

Example:

Let a data set contain 200 ordered observations starting with {2.3, 2.7, 2.8, 2.9, 2.9, 3.0, ...}. Then the first percentile is (2.7 + 2.8) / 2 = 2.75, because 1% of the data is to the left of this point on the number line and 99% of the data is on its right. The second percentile is (2.9 + 2.9) / 2 = 2.9. Percentiles may or may not be part of the data. In this example, the first percentile is not in the data, but the second percentile is. The median of the data is the second quartile and the 50th percentile. The first and third quartiles are the 25th and the 75th percentiles, respectively.

Quartiles:
The numbers that separate the data into quarters. Quartiles may or may not be part of the data. The second quartile is the median of the data.
