Distributions

Module by: David Lane

Distributions of Discrete Variables

I recently purchased a bag of Plain M&M's. The M&M's were in six different colors. A quick count showed that there were 55 M&M's: 17 brown, 18 red, 7 yellow, 7 green, 2 blue, and 4 orange. These counts are shown below in Table 1.

Table 1: Distributions of Colors
Color Frequency
Brown 17
Red 18
Yellow 7
Green 7
Blue 2
Orange 4

This table is called a frequency table and it describes the distribution of M&M color frequencies. Not surprisingly, this kind of table is called a frequency distribution. Often a frequency distribution is shown graphically as in Figure 1.

Figure 1: Distribution of 55 M&Ms
Figure 1 (mm_dist.gif)
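
As a small illustration, the counts in Table 1 can be tabulated and converted to proportions in a few lines of Python. This is only a sketch; the variable names (`counts`, `proportions`) are our own and not part of the original module.

```python
# Counts copied from Table 1 (one bag of Plain M&M's).
counts = {"Brown": 17, "Red": 18, "Yellow": 7, "Green": 7, "Blue": 2, "Orange": 4}

total = sum(counts.values())  # number of M&M's in the bag

# Proportion of each color in this one bag (frequency divided by total).
proportions = {color: n / total for color, n in counts.items()}

# Print the frequency table with an extra column of proportions.
for color, n in counts.items():
    print(f"{color:<7}{n:>3}  {proportions[color]:.3f}")
print(f"{'Total':<7}{total:>3}")
```

The proportions describe this bag only; they need not match the proportions for all M&M's.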

The distribution shown in Figure 1 concerns just my one bag of M&M's. You might be wondering about the distribution of colors for all M&M's. The manufacturer of M&M's provides some information about this matter, but they do not tell us exactly how many M&M's of each color they have ever produced. Instead, they report proportions rather than frequencies. Figure 2 shows these proportions. Since every M&M is one of the six familiar colors, the six proportions shown in the figure add to one. We call Figure 2 a probability distribution because if you chose an M&M at random, the probability of getting, say, a brown M&M is equal to the proportion of M&M's that are brown (0.30).

Figure 2: Distribution of all M&Ms
Figure 2 (mm_dist2.gif)

Notice that the distributions in Figure 1 and Figure 2 are not identical. Figure 1 portrays the distribution in a sample of 55 M&M's. Figure 2 shows the proportions for all M&M's. Chance factors involving the machines used by the manufacturer introduce random variation into the different bags produced. Some bags will have a distribution of colors that is close to Figure 2; others will be farther away.
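
The bag-to-bag variation can be imitated with a short simulation. The sketch below is only illustrative: the text gives the proportion for brown (0.30), but the other five proportions in `population` are placeholder values assumed here so that the six add to one, and the function `draw_bag` is a hypothetical helper.

```python
import random
from collections import Counter

# Brown = 0.30 is stated in the text. The other five values are ASSUMED
# placeholders, chosen only so that the six proportions sum to one.
population = {"Brown": 0.30, "Red": 0.20, "Yellow": 0.20,
              "Green": 0.10, "Orange": 0.10, "Blue": 0.10}

def draw_bag(rng, size=55):
    """Draw `size` colors at random according to the population proportions."""
    colors = list(population)
    weights = list(population.values())
    return Counter(rng.choices(colors, weights=weights, k=size))

rng = random.Random(0)  # fixed seed so that a run can be repeated
bags = [draw_bag(rng) for _ in range(3)]

# Each simulated bag has 55 M&M's, but the color counts differ from bag to bag.
for i, bag in enumerate(bags, start=1):
    print(i, {color: bag[color] for color in population})
```

Each simulated bag plays the role of Figure 1, while `population` plays the role of Figure 2.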

Continuous Variables

The variable "color of M&M" used in this example is a discrete variable, and its distribution is also called discrete. Let us now extend the concept of a distribution to continuous variables.

The data shown in Table 2 are the times it took one of us (DL) to move the mouse over a small target in a series of 20 trials. The times are sorted from fastest to slowest. The variable "time to respond" is a continuous variable. With time measured accurately (to many decimal places), no two response times would be expected to be the same. Measuring time in milliseconds (thousandths of a second) is often precise enough to approximate a continuous variable in Psychology. As you can see in Table 2, measuring DL's responses this way produced times no two of which were the same. As a result, a frequency distribution would be uninformative: it would consist of the 20 times in the experiment, each with a frequency of 1.

The solution to this problem is to create a grouped frequency distribution. In a grouped frequency distribution, scores falling within various ranges are tabulated. Table 3 shows a grouped frequency distribution for these 20 times.

Table 2: Response Times (in milliseconds)
568 720
577 728
581 729
640 777
641 808
645 824
657 825
673 865
696 875
703 1007

Table 3: Grouped frequency distribution
Range Frequency
500-600 3
600-700 6
700-800 5
800-900 5
900-1000 0
1000-1100 1
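
The grouping step can be sketched in Python as follows. The times are those listed in Table 2. The rule for a time that falls exactly on a boundary (here it goes into the upper range) is an assumption made for this sketch, since none of the 20 times lies on a boundary and the text does not state a rule.

```python
# The 20 response times (in milliseconds) from Table 2.
times = [568, 577, 581, 640, 641, 645, 657, 673, 696, 703,
         720, 728, 729, 777, 808, 824, 825, 865, 875, 1007]

start = 500   # lower limit of the first range
width = 100   # width of every range
n_bins = 6    # ranges 500-600 up to 1000-1100

freq = [0] * n_bins
for t in times:
    # Integer division picks the range; a time of exactly 600 would be
    # counted in 600-700 (an assumed convention, see the note above).
    freq[(t - start) // width] += 1

for i, f in enumerate(freq):
    low = start + i * width
    print(f"{low}-{low + width} {f}")
```

The printed frequencies should reproduce the Frequency column of Table 3.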

Grouped frequency distributions may be portrayed graphically. Figure 4 shows a graphical representation of the frequency distribution in Table 3. This kind of graph is called a histogram. Chapter 2 contains an entire section devoted to histograms.

Figure 4: A histogram of the grouped frequency distribution shown in Table 3. The labels on the X-axis are the middle values of the range they represent.
Figure 4 (histo.gif)

Probability Densities

The histogram in Figure 4 portrays just DL's 20 times in the one experiment he performed. To represent the probability associated with an arbitrary movement (which can take any positive amount of time), we must represent all these potential times at once. For this purpose, we plot the distribution for the continuous variable of time. Distributions for continuous variables are called continuous distributions. They also carry the fancier name probability density. Some probability densities have particular importance in Statistics. A very important one is shaped like a bell, and called the normal distribution. Many naturally occurring phenomena can be approximated surprisingly well by this distribution. It will serve to illustrate some features of all continuous distributions.

An example of a normal distribution is shown in Figure 5. Do you see the "bell"? The normal distribution doesn't represent a real bell, however, since the left and right tips extend indefinitely (we can't draw them any further so they look like they've stopped in our diagram). The Y axis in the normal distribution represents the "density of probability." Intuitively, it shows the chance of obtaining values near corresponding points on the X axis. In Figure 5, for example, the probability of an observation with value near 40 is about half of the probability of an observation with value near 50. Although this text does not discuss the concept of probability density in detail, you should keep the following ideas in mind about the curve that describes a continuous distribution (like the normal distribution). First, the area under the curve equals 1. Second, the probability of any exact value of X is 0. Finally, the area under the curve and bounded between two given points on the X axis is the probability that a number chosen at random will fall between the two points.

Let us illustrate with DL's hand movements. First, the probability that his movement takes some amount of time is one! (We exclude the possibility of him never finishing his gesture.) Second, the probability that his movement takes exactly 598.956432342346576 milliseconds is essentially zero. (We can make the probability as close as we like to zero by making the time measurement more and more precise.) Finally, suppose that the probability of DL's movement taking between 600 and 700 milliseconds is one tenth. Then the continuous distribution for DL's possible times would have a shape that places 10% of the area below the curve in the region bounded by 600 and 700 on the X axis.

Figure 5: A Normal Distribution
Figure 5 (normal_example.gif)
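
The three properties of a continuous distribution listed above can be checked numerically for a normal curve. The sketch below assumes a mean of 50 and a standard deviation of 10; these values are chosen only for illustration and are not taken from Figure 5. The helper names (`density`, `cdf`, `area_between`, `midpoint_area`) are our own.

```python
import math

MEAN, SD = 50.0, 10.0  # ASSUMED values, for illustration only

def density(x):
    """Height of the normal curve at x."""
    z = (x - MEAN) / SD
    return math.exp(-0.5 * z * z) / (SD * math.sqrt(2 * math.pi))

def cdf(x):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1 + math.erf((x - MEAN) / (SD * math.sqrt(2))))

def area_between(a, b):
    """Probability that a randomly chosen value falls between a and b."""
    return cdf(b) - cdf(a)

def midpoint_area(a, b, steps=20_000):
    """Approximate the area under the curve from a to b (midpoint rule)."""
    h = (b - a) / steps
    return h * sum(density(a + (i + 0.5) * h) for i in range(steps))

# First: the total area under the curve is (very nearly) 1.
total_area = midpoint_area(MEAN - 10 * SD, MEAN + 10 * SD)
print(round(total_area, 6))

# Second: an exact value corresponds to an interval of zero width.
print(area_between(50, 50))

# Finally: the area between two points is a probability (about 0.68 here).
print(round(area_between(40, 60), 4))
```

The numerical integration stops ten standard deviations from the mean, which is far enough that the omitted tails are negligible.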

Shapes of Distributions

Distributions have different shapes; they don't all look like the normal distribution in Figure 5. For example, the normal probability density is higher in the middle compared to its two tails. Other distributions need not have this feature. There is even variation among the distributions that we call "normal." For example, some normal distributions are more spread out than the one shown in Figure 5 (their tails begin to hit the X axis further from the middle of the curve; for example, at 10 and 90 if drawn in place of Figure 5). Others are less spread out (their tails might approach the X axis at 30 and 70). More information on the normal distribution can be found in a later chapter completely devoted to it.

The normal distribution shown in Figure 5 is symmetric; if you folded it in the middle, the two sides would match perfectly. Figure 6 shows the discrete distribution of scores on a psychology test. This distribution is not symmetric: the tail in the positive direction extends further than the tail in the negative direction. A distribution with the longer tail extending in the positive direction is said to have a positive skew. It is also described as "skewed to the right."

Figure 6: A distribution with a positive skew
Figure 6 (image001.gif)

Figure 7 shows the salaries of major league baseball players in 1974 (in thousands of dollars). This distribution has an extreme positive skew.

Figure 7: A distribution with a very large positive skew. This histogram shows the salaries of major league baseball players.
Figure 7 (histo2.gif)

Although less common, some distributions have negative skew. Figure 8 shows the scores on a 20-point problem on a statistics exam. Since the tail of the distribution extends to the left, this distribution is skewed to the left.

Figure 8: A distribution with negative skew. This histogram shows the frequencies of various scores on a 20-point question on a statistics test.
Figure 8 (midterm11.gif)
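
The direction of skew can also be checked with a number. One common measure, which this module does not define, is the average cubed deviation from the mean divided by the cube of the standard deviation: it is positive when the longer tail points right, negative when it points left, and zero for a symmetric set of scores. The sketch below uses made-up scores, not the data behind Figures 6 to 8.

```python
def skewness(values):
    """Third standardized moment: mean cubed deviation / (std. deviation)^3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance (divide by n)
    m3 = sum((v - mean) ** 3 for v in values) / n  # mean cubed deviation
    return m3 / m2 ** 1.5

# Made-up scores for illustration only.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]    # one score far to the right
left_tailed = [-v for v in right_tailed]    # mirror image: tail to the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0)   # positive skew
print(skewness(left_tailed) < 0)    # negative skew
print(skewness(symmetric))          # zero for a symmetric set of scores
```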

The distributions shown so far all have one distinct high point or peak. The distribution in Figure 9 has two distinct peaks. A distribution with two peaks is called a bimodal distribution.

Figure 9: Frequencies of times between eruptions of the Old Faithful geyser. Notice the two distinct peaks: one at 1.85 and the other at 3.85.
Figure 9 (faithful.gif)

Distributions also differ from each other in terms of how large or "fat" their tails are. Figure 10 shows two distributions that differ in this respect. The upper distribution has relatively more scores in its tails; its shape is called leptokurtic. The lower distribution has relatively fewer scores in its tails; its shape is called platykurtic.

Figure 10: Distributions differing in kurtosis. The top distribution has long tails. It is called "leptokurtic." The bottom distribution has short tails. It is called "platykurtic."
Figure 10 (kurtosis.gif)
