Box Plots

Module by: Ananda Mahto.

Based on: Descriptive Statistics: Box Plot by Susan Dean, Barbara Illowsky, Ph.D.

Summary: Boxplots are useful for getting a sense of the distribution of a variable.

Box plots or box-whisker plots give a good graphical image of the concentration of the data. They also show how far from most of the data the extreme values are. The box plot is constructed from five values: the smallest value, the first quartile, the median, the third quartile, and the largest value. The median, the first quartile, and the third quartile will be discussed here, and then again in the section on measuring data in this chapter. We use these values to compare how close other data values are to them.

The median, a number, is a way of measuring the "center" of the data. You can think of the median as the "middle value," although it does not actually have to be one of the observed values. It is a number that separates ordered data into halves. Half the values are the same number or smaller than the median and half the values are the same number or larger. For example, consider the following data:

1, 11.5, 6, 7.2, 4, 8, 9, 10, 6.8, 8.3, 2, 2, 10, 1

Ordered from smallest to largest:

1, 1, 2, 2, 4, 6, 6.8, 7.2, 8, 8.3, 9, 10, 10, 11.5

The median is between the 7th value, 6.8, and the 8th value 7.2. To find the median, add the two values together and divide by 2.

$$\frac{6.8 + 7.2}{2} = 7 \tag{1}$$

The median is 7. Half of the values are smaller than 7 and half of the values are larger than 7.
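The same arithmetic can be checked in R. This is a minimal sketch: the variable names (`a`, `sorted`, `middle`, and so on) are illustrative choices, and the indexing step assumes an even number of values, as in this data set.

```r
# Data from the start of this section
a <- c(1, 11.5, 6, 7.2, 4, 8, 9,
       10, 6.8, 8.3, 2, 2, 10, 1)

sorted <- sort(a)      # order from smallest to largest
n <- length(sorted)    # 14 values

# With an even count, the median is the mean of the two middle values
# (here the 7th and 8th ordered values).
middle <- sorted[c(n / 2, n / 2 + 1)]
manual_median <- mean(middle)

# R's built-in median() should agree with the manual calculation.
builtin_median <- median(a)

print(middle)
print(manual_median)
print(builtin_median)

# Proportion of values strictly below the median
print(mean(a < manual_median))
```

Sorting first and averaging the two middle values mirrors the hand calculation; in practice `median(a)` alone is enough.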

Quartiles are numbers that separate the data into quarters. Quartiles may or may not be part of the data. To find the quartiles, first find the median or second quartile. The first quartile is the middle value of the lower half of the data and the third quartile is the middle value of the upper half of the data. To get the idea, consider the same data set shown above.

The median or second quartile is 7. The lower half of the data is 1, 1, 2, 2, 4, 6, 6.8. The middle value of the lower half is 2.

1, 1, 2, 2, 4, 6, 6.8

The number 2, which is part of the data, is the first quartile. One-fourth of the values are the same as or less than 2 and three-fourths of the values are more than 2.

The upper half of the data is 7.2, 8, 8.3, 9, 10, 10, 11.5. The middle value of the upper half is 9.

7.2, 8, 8.3, 9, 10, 10, 11.5

The number 9, which is part of the data, is the third quartile. Three-fourths of the values are less than 9 and one-fourth of the values are more than 9.
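The half-splitting method described above can be sketched in R as follows. The variable names are illustrative, and the handling of an odd number of values (leaving the median out of both halves) is an assumption of this sketch, since the text only works through an even-sized data set.

```r
# Data from the start of this section
a <- c(1, 11.5, 6, 7.2, 4, 8, 9,
       10, 6.8, 8.3, 2, 2, 10, 1)

sorted <- sort(a)
n <- length(sorted)
half <- n %/% 2                        # size of each half (7 here)

# Assumption: for an odd n this leaves the median out of both halves.
lower_half <- sorted[1:half]
upper_half <- sorted[(n - half + 1):n]

q1 <- median(lower_half)               # middle value of the lower half
q3 <- median(upper_half)               # middle value of the upper half
print(c(Q1 = q1, Q3 = q3))

# fivenum() returns the minimum, lower hinge, median, upper hinge, and
# maximum. For this data set the hinges match the quartiles found above.
print(fivenum(a))
```

Note that R's `quantile()` function interpolates between data values by default, so it can report slightly different quartiles than the method used in this section.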

To construct a box plot, use a horizontal number line and a rectangular box. The smallest and largest data values label the endpoints of the axis. The first quartile marks one end of the box and the third quartile marks the other end of the box. The middle fifty percent of the data fall inside the box. The "whiskers" extend from the ends of the box to the smallest and largest data values. The box plot gives a good quick picture of the data.

Using the data from the start of this section, recall that the first quartile is 2, the median is 7, and the third quartile is 9. The smallest value is 1 and the largest value is 11.5. The box plot is constructed as follows:

Figure 1
Horizontal box plot. The first whisker extends from the smallest value, 1, to the first quartile, 2. The box begins at the first quartile and extends to the third quartile, 9, with a vertical dashed line drawn at the median, 7. The second whisker extends from the third quartile to the largest value, 11.5.

The two whiskers extend from the first quartile to the smallest value and from the third quartile to the largest value. The median is shown with a dashed line.

Making a Box Plot in R

As with histograms, box plots are very easy to make in R. They are made using the boxplot() function. By default, R orients the box plot vertically; to change this, simply add the argument horizontal = TRUE to the function. For most simple plots, no additional arguments are required. You may also want to change the size of the R graph window to view the box plot in a more aesthetic scale.

Here is how you would create the box plot in R using the same data from the start of this section.



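# Store the data from the start of this section in a vector named a
a <- c(1, 11.5, 6, 7.2, 4, 8, 9,
       10, 6.8, 8.3, 2, 2, 10, 1)
# Draw the box plot horizontally instead of R's default vertical orientation
boxplot(a, horizontal = TRUE)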

Figure 2
Horizontal boxplot
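To see which five numbers R uses for the plot, one option is to capture the value that boxplot() returns. The following is a sketch under default settings; `bp` is just an illustrative name.

```r
# Data from the start of this section
a <- c(1, 11.5, 6, 7.2, 4, 8, 9,
       10, 6.8, 8.3, 2, 2, 10, 1)

# plot = FALSE asks boxplot() to return its summary without drawing.
bp <- boxplot(a, horizontal = TRUE, plot = FALSE)

# stats holds, in order: lower whisker end, lower hinge (box edge),
# median, upper hinge (box edge), upper whisker end.
print(bp$stats)

# out holds any points drawn individually beyond the whiskers.
print(bp$out)
```

By default, R stops each whisker at the most extreme data value within 1.5 times the box length of the box and draws any farther values as separate points. This data set has no such values, so the whiskers reach the smallest and largest values. Adding the argument range = 0 makes the whiskers always extend to the data extremes, which matches the construction described in this section.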

Example 1

The following data are the heights of 40 students in a statistics class.

59, 60, 61, 62, 62, 63, 63, 64, 64, 64, 65, 65, 65, 65, 65, 65, 65, 65, 65, 66, 66, 67, 67, 68, 68, 69, 70, 70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 77

Construct a box plot with the following properties:

  • Smallest value = 59
  • Largest value = 77
  • Q1: First quartile = 64.5
  • Q2: Second quartile or median = 66
  • Q3: Third quartile = 70

Figure 3
Horizontal boxplot with first whisker extending from smallest value, 59, to Q1, 64.5, box beginning from Q1 to Q3, 70, median dashed line at Q2, 66, and second whisker extending from Q3 to largest value, 77.

  • a. Each quarter has 25% of the data.
  • b. The spreads of the four quarters are 64.5 - 59 = 5.5 (first quarter), 66 - 64.5 = 1.5 (second quarter), 70 - 66 = 4 (3rd quarter), and 77 - 70 = 7 (fourth quarter). So, the second quarter has the smallest spread and the fourth quarter has the largest spread.
  • c. Interquartile Range: IQR = Q3 - Q1 = 70 - 64.5 = 5.5.
  • d. The interval 59 through 65 has more than 25% of the data (it extends past Q1 = 64.5), so it has more data in it than the interval from the median, 66, through Q3, 70, which the box plot shows as one quarter of the data.
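The values in this example can be reproduced in R. This is a sketch: the names `heights`, `f`, `spreads`, and `iqr` are illustrative, and the quartiles are taken from fivenum(), whose hinges match the quartiles listed above for this data set.

```r
# Heights of the 40 students
heights <- c(59, 60, 61, 62, 62, 63, 63, 64, 64, 64,
             65, 65, 65, 65, 65, 65, 65, 65, 65, 66,
             66, 67, 67, 68, 68, 69, 70, 70, 70, 70,
             70, 71, 71, 72, 72, 73, 74, 74, 75, 77)

# Five-number summary: smallest value, Q1, median, Q3, largest value
f <- fivenum(heights)
names(f) <- c("min", "Q1", "median", "Q3", "max")
print(f)

# Spread of each quarter: differences between consecutive summary values
spreads <- diff(f)
print(spreads)

# Interquartile range computed from the hinges
iqr <- f["Q3"] - f["Q1"]
print(iqr)

# Proportion of the data in the interval 59 through 65 (19 of 40 values)
print(mean(heights >= 59 & heights <= 65))

# Draw the box plot
boxplot(heights, horizontal = TRUE)
```

R also has a built-in IQR() function, but it relies on the interpolating quantile() method, so its result can differ slightly from the hinge-based value computed here.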

For some sets of data, some of the largest value, smallest value, first quartile, median, and third quartile may be the same. For instance, you might have a data set in which the median and the third quartile are the same. In this case, the diagram would not have a dashed line inside the box displaying the median. The right side of the box would display both the third quartile and the median. For example, if the smallest value and the first quartile were both 1, the median and the third quartile were both 5, and the largest value was 7, the box plot would look as follows:

Figure 4
Horizontal box plot. The box begins at 1, which is both the smallest value and Q1, and ends at 5, which is both the median and Q3. No separate median line is drawn, and the lone whisker extends from Q3 to the largest value, 7.
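A small data set with these properties can be tried in R. The values below are hypothetical, chosen only so that the smallest value equals the first quartile and the median equals the third quartile; they do not come from the original text.

```r
# Hypothetical data: min = Q1 = 1, median = Q3 = 5, max = 7
x <- c(1, 1, 1, 5, 5, 5, 5, 7)

# Five-number summary: several of the five values coincide
print(fivenum(x))

# The median line falls on the right edge of the box,
# and there is no left whisker.
boxplot(x, horizontal = TRUE)
```

In R's drawing the median line is still present, but it overlaps the edge of the box, so the plot looks the same as the figure described above.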

Example 2

Test scores for a college statistics class held during the day are:

99, 56, 78, 55.5, 32, 90, 80, 81, 56, 59, 45, 77, 84.5, 84, 70, 72, 68, 32, 79, 90

Test scores for a college statistics class held during the evening are:

98, 78, 68, 83, 81, 89, 88, 76, 65, 45, 98, 90, 80, 84.5, 85, 79, 78, 98, 90, 79, 81, 25.5

Problem 1

  • What are the smallest and largest data values for each data set?
  • What are the median, the first quartile, and the third quartile for each data set?
  • Create a boxplot for each set of data.
  • Which boxplot has the widest spread for the middle 50% of the data (the data between the first and third quartiles)? What does this mean for that set of data in comparison to the other set of data?
  • For each data set, what percent of the data is between the smallest value and the first quartile? (Answer: 25%) The first quartile and the median? (Answer: 25%) The median and the third quartile? The third quartile and the largest value? What percent of the data is between the first quartile and the largest value? (Answer: 75%)

Solution

First Data Set
  • Xmin = 32
  • Q1 = 56
  • M = 74.5
  • Q3 = 82.5
  • Xmax = 99
Second Data Set
  • Xmin = 25.5
  • Q1 = 78
  • M = 81
  • Q3 = 89
  • Xmax = 98

Figure 5
Two box plots over a number line from 0 to 100.  The top plot shows a whisker from 32 to 56, a solid line at 56, a dashed line at 74.5, a solid line at 82.5, and a whisker from 82.5 to 99.  The lower plot shows a whisker from 25.5 to 78, solid line at 78, dashed line at 81, solid line at 89, and a whisker from 89 to 98.

The first data set (the top box plot) has the widest spread for the middle 50% of the data. IQR = Q3 - Q1 is 82.5 - 56 = 26.5 for the first data set and 89 - 78 = 11 for the second data set. So, the first set of data has its middle 50% of scores more spread out.

25% of the data is between M and Q3 and 25% is between Q3 and Xmax.
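The solution can be checked in R with a short sketch. The names `day`, `evening`, `summaries`, and `iqr` are illustrative. The argument range = 0 is used so that the whiskers reach the smallest and largest values as in the figure; with R's default settings, the two lowest evening scores would be drawn as separate points instead.

```r
# Test scores for the day and evening classes
day <- c(99, 56, 78, 55.5, 32, 90, 80, 81, 56, 59,
         45, 77, 84.5, 84, 70, 72, 68, 32, 79, 90)
evening <- c(98, 78, 68, 83, 81, 89, 88, 76, 65, 45, 98,
             90, 80, 84.5, 85, 79, 78, 98, 90, 79, 81, 25.5)

# Five-number summary for each class, one column per data set
summaries <- sapply(list(day = day, evening = evening), fivenum)
rownames(summaries) <- c("Xmin", "Q1", "M", "Q3", "Xmax")
print(summaries)

# Spread of the middle 50% of each data set
iqr <- summaries["Q3", ] - summaries["Q1", ]
print(iqr)

# Side-by-side box plots; range = 0 extends whiskers to the extremes
boxplot(list(Day = day, Evening = evening),
        horizontal = TRUE, range = 0)
```

Passing a list to boxplot() draws one box per element on a shared axis, which makes the difference in spread between the two classes easy to compare.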

Glossary

Median:
A number that separates ordered data into halves: half the values are the same number or smaller than the median and half the values are the same number or larger than the median. The median may or may not be part of the data.
Quartiles:
The numbers that separate the data into quarters. Quartiles may or may not be part of the data. The second quartile is the median of the data.
