The common measures of location are quartiles and percentiles (%iles). Quartiles are special percentiles. The first quartile, $Q_1$, is the same as the 25th percentile (25th %ile), and the third quartile, $Q_3$, is the same as the 75th percentile (75th %ile). The median, $M$, is called both the second quartile and the 50th percentile (50th %ile).
To calculate quartiles and percentiles, the data must be ordered from smallest to largest. Recall that quartiles divide ordered data into quarters. Percentiles divide ordered data into hundredths. Scoring in the 90th percentile on an exam does not necessarily mean that you received 90% on the test. It means that your score was higher than the scores of 90% of the people who took the test and lower than the scores of the remaining 10%. Percentiles are useful for comparing values. For this reason, universities and colleges use percentiles extensively.
The interquartile range is a number that indicates the spread of the middle half, or the middle 50%, of the data. It is the difference between the third quartile ($Q_3$) and the first quartile ($Q_1$).

$$\mathrm{IQR} = Q_3 - Q_1 \tag{1}$$
The IQR can help to determine potential outliers. A value is suspected to be a potential outlier if it is more than $(1.5)(\mathrm{IQR})$ below the first quartile or more than $(1.5)(\mathrm{IQR})$ above the third quartile. Potential outliers always need further investigation.
For the following 13 real estate prices, calculate the IQR and determine if any prices are outliers. Prices are in dollars. (Source: San Jose Mercury News)
389,950; 230,500; 158,000; 479,000; 639,000; 114,950; 5,500,000; 387,000; 659,000; 529,000; 575,000; 488,000; 1,095,000
Order the data from smallest to largest.
114,950; 158,000; 230,500; 387,000; 389,950; 479,000; 488,000; 529,000; 575,000; 639,000; 659,000; 1,095,000; 5,500,000
$$M = 488{,}000$$
$$Q_1 = \frac{230{,}500 + 387{,}000}{2} = 308{,}750$$

$$Q_3 = \frac{639{,}000 + 659{,}000}{2} = 649{,}000$$

$$\mathrm{IQR} = 649{,}000 - 308{,}750 = 340{,}250$$

$$(1.5)(\mathrm{IQR}) = (1.5)(340{,}250) = 510{,}375$$

$$Q_1 - (1.5)(\mathrm{IQR}) = 308{,}750 - 510{,}375 = -201{,}625$$

$$Q_3 + (1.5)(\mathrm{IQR}) = 649{,}000 + 510{,}375 = 1{,}159{,}375$$
No house price is less than -201,625. However, 5,500,000 is more than 1,159,375. Therefore, 5,500,000 is a potential outlier.
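The arithmetic in this example can be checked with a short Python sketch. It assumes the quartile convention used here: $Q_1$ and $Q_3$ are the medians of the lower and upper halves of the ordered data, leaving out the overall median when the number of values is odd. Other software may use a different convention and return slightly different quartiles. The function name `quartiles` is illustrative and not part of any library.

```python
from statistics import median

# The 13 real estate prices from the example, in dollars.
prices = [389950, 230500, 158000, 479000, 639000, 114950, 5500000,
          387000, 659000, 529000, 575000, 488000, 1095000]


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Assumes at least two values. For an odd count, the overall median
    is excluded from both halves.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]                  # values below the median
    upper = ordered[len(ordered) - half:]   # values above the median
    return median(lower), median(ordered), median(upper)


q1, m, q3 = quartiles(prices)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr    # values below this are potential outliers
high_fence = q3 + 1.5 * iqr   # values above this are potential outliers
outliers = [x for x in prices if x < low_fence or x > high_fence]

print("M =", m, " Q1 =", q1, " Q3 =", q3, " IQR =", iqr)
print("fences:", low_fence, high_fence)
print("potential outliers:", outliers)
```

Flagging a value this way only marks it for further investigation; it does not by itself justify removing the value from the data.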
For the two data sets in the test scores example, find the following:
- a. The interquartile range. Compare the two interquartile ranges.
- b. Any outliers in either set.
- c. The 30th percentile and the 80th percentile for each set. How much data falls below the 30th percentile? Above the 80th percentile?
For the IQRs, see the answer to the test scores example. The first data set has the larger IQR, so the scores between $Q_1$ and $Q_3$ (the middle 50%) for the first data set are more spread out and not clustered about the median.
First data set:
- $\left(\frac{3}{2}\right) \cdot (\mathrm{IQR}) = \left(\frac{3}{2}\right) \cdot (26.5) = 39.75$
- $X_{\max} - Q_3 = 99 - 82.5 = 16.5$
- $Q_1 - X_{\min} = 56 - 32 = 24$

$\left(\frac{3}{2}\right) \cdot (\mathrm{IQR}) = 39.75$ is larger than 16.5 and larger than 24, so the first set has no outliers.
Second data set:
- $\left(\frac{3}{2}\right) \cdot (\mathrm{IQR}) = \left(\frac{3}{2}\right) \cdot (11) = 16.5$
- $X_{\max} - Q_3 = 98 - 89 = 9$
- $Q_1 - X_{\min} = 78 - 25.5 = 52.5$

$\left(\frac{3}{2}\right) \cdot (\mathrm{IQR}) = 16.5$ is larger than 9 but smaller than 52.5, so for the second set 45 and 25.5 are outliers.
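The comparison of distances used in this answer is equivalent to checking the smallest and largest values against the two fences $Q_1 - (1.5)(\mathrm{IQR})$ and $Q_3 + (1.5)(\mathrm{IQR})$. The following Python sketch uses only the summary values quoted in this answer (minimum, $Q_1$, $Q_3$, maximum for each set). The helper name `outlier_check` is hypothetical, and the sketch reports only whether outliers exist on each side, since the full data sets are not repeated here.

```python
def outlier_check(x_min, q1, q3, x_max):
    """Report the fences and whether the extremes fall outside them."""
    iqr = q3 - q1
    step = 1.5 * iqr
    lower_fence = q1 - step
    upper_fence = q3 + step
    return {
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "has_low_outlier": x_min < lower_fence,
        "has_high_outlier": x_max > upper_fence,
    }


# Summary values taken from the answer for each test-score data set.
first = outlier_check(x_min=32, q1=56, q3=82.5, x_max=99)
second = outlier_check(x_min=25.5, q1=78, q3=89, x_max=98)

print("first set: ", first)
print("second set:", second)
```

For the second set the lower fence is 61.5, which is consistent with both 45 and 25.5 being flagged as outliers.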
To find the percentiles, create a frequency, relative frequency, and cumulative relative frequency chart (see "Frequency" from the Sampling and Data Chapter). Get the percentiles from that chart.
First data set:
- 30th %ile (between the 6th and 7th values) $= \frac{56 + 59}{2} = 57.5$
- 80th %ile (between the 16th and 17th values) $= \frac{84 + 84.5}{2} = 84.25$

Second data set:
- 30th %ile (7th value) $= 78$
- 80th %ile (18th value) $= 90$
30% of the data falls below the 30th %ile, and 20% falls above the 80th %ile.
Fifty statistics students were asked how much sleep they get per school night (rounded to the nearest hour). The results were (student data):
| Amount of sleep per school night (hours) | Frequency | Relative frequency | Cumulative relative frequency |
|---|---|---|---|
| 4 | 2 | 0.04 | 0.04 |
| 5 | 5 | 0.10 | 0.14 |
| 6 | 7 | 0.14 | 0.28 |
| 7 | 12 | 0.24 | 0.52 |
| 8 | 14 | 0.28 | 0.80 |
| 9 | 7 | 0.14 | 0.94 |
| 10 | 3 | 0.06 | 1.00 |
Find the 28th percentile: Notice the 0.28 in the "cumulative relative frequency" column. 28% of 50 data values = 14. There are 14 values less than the 28th %ile. They include the two 4s, the five 5s, and the seven 6s. The 28th %ile is between the last 6 and the first 7. The 28th %ile is 6.5.
Find the median: Look again at the "cumulative relative frequency" column and find 0.52. The median is the 50th %ile or the second quartile. 50% of 50 = 25. There are 25 values less than the median. They include the two 4s, the five 5s, the seven 6s, and eleven of the 7s. The median or 50th %ile is between the 25th (7) and 26th (7) values. The median is 7.
Find the third quartile: The third quartile is the same as the 75th percentile. You can "eyeball" this answer. If you look at the "cumulative relative frequency" column, you find 0.52 and 0.80. When you have all the 4s, 5s, 6s and 7s, you have 52% of the data. When you include all the 8s, you have 80% of the data. The 75th %ile, then, must be an 8. Another way to look at the problem is to find 75% of 50 (= 37.5) and round up to 38. The third quartile, $Q_3$, is the 38th value, which is an 8. You can check this answer by counting the values. (There are 37 values below the third quartile and 12 values above.)
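The three lookups above follow one rule, which can be sketched in Python. The sketch assumes the convention used in this text: if $p\%$ of the $n$ values is a whole number $k$, the percentile is the average of the $k$th and $(k+1)$th ordered values; otherwise, round up and take that value. Statistical packages often interpolate differently, so their results may not match. The function name `percentile_from_table` is hypothetical.

```python
# Frequency table from the sleep survey: hours of sleep -> number of students.
sleep = {4: 2, 5: 5, 6: 7, 7: 12, 8: 14, 9: 7, 10: 3}


def percentile_from_table(freq, p):
    """Return the p-th percentile (whole number p, 0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    # Expand the table into the ordered list of individual values.
    values = [v for v, count in sorted(freq.items()) for _ in range(count)]
    n = len(values)
    scaled = p * n  # 100 times the position; integers avoid rounding error
    if scaled % 100 == 0:
        k = scaled // 100
        # Whole-number position: average the k-th and (k+1)-th values.
        return (values[k - 1] + values[k]) / 2
    k = -(-scaled // 100)  # ceiling division: round the position up
    return values[k - 1]


for p in (28, 50, 75):
    print(p, "th %ile:", percentile_from_table(sleep, p))
```

Working with `p * n` as an integer, instead of multiplying by a decimal fraction such as 0.28, keeps the whole-number test exact.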
Using the table:
- Find the 80th percentile.
- Find the 90th percentile.
- Find the first quartile. What is another name for the first quartile?
- Construct a box plot of the data.
- $\frac{8 + 9}{2} = 8.5$
- 9
- 6
- First Quartile = 25th %ile
Collaborative Classroom Exercise: Your instructor or a member of the class will ask everyone in class how many sweaters they own. Answer the following questions.
- How many students were surveyed?
- What kind of sampling did you do?
- Find the mean and standard deviation.
- Find the mode.
- Construct 2 different histograms. For each, starting value = _____ ending value = ____.
- Find the median, first quartile, and third quartile.
- Construct a box plot.
- Construct a table of the data to find the following:
- The 10th percentile
- The 70th percentile
- The percent of students who own fewer than 4 sweaters
- Interquartile Range (IQR):
The distance between the third quartile and the first quartile.
- Outlier:
An observation that does not fit the rest of the data.
- Percentile:
A number that separates $\frac{1}{100}$ of the data.
Let a data set contain 200 ordered observations starting with $\{2.3, 2.7, 2.8, 2.9, 2.9, 3.0, \ldots\}$. Then the first percentile is $\frac{2.7 + 2.8}{2} = 2.75$, because 1% of the data is to the left of this point on the number line and 99% of the data is on its right. The second percentile is $\frac{2.9 + 2.9}{2} = 2.9$, separating 2% of the data. Percentiles may or may not be part of the data. (In this example, the first percentile is not in the data, but the second percentile is.) The median of the data is the second quartile and is the 50th percentile at the same time. The first and third quartiles are the 25th and 75th percentiles, respectively.
- Quartiles:
The numbers that separate the data into quarters. Quartiles may or may not be part of the data. The second quartile is the median of the data.
"This is the course textbook for Biology 502 at CSU Dominguez Hills"