This is a demonstration of a very complex issue. Experts in the field disagree on how to interpret differences on an ordinal scale, so do not be discouraged if it takes you a while to catch on. In this demonstration you will explore the relationship between interval and ordinal scales. The demonstration is based on two brands of baked goods. The data on the left side labeled "interval scores" shows the amount of sugar in each of 12 products. There are two columns of data. The column labeled "Brand 1" contains the sugar content of each of 12 brand-one products. The second column ("Brand 2") shows the sugar content of the brand-two products. The amount of sugar is measured on an interval scale.
A rater tastes each of the products and rates them on a 5-point "sweetness" scale. Rating scales are typically ordinal rather than interval.
The first objective of this demonstration is to let you see concretely what it means for a scale to be an ordinal scale. The scale at the bottom shows the "mapping" of sugar content onto the ratings. Sugar content between 37 and 43 is rated as 1, between 43 and 49 as 2, etc. Therefore, the difference between a rating of 1 and a rating of 2 represents, on average, a "sugar difference" of 6. A difference between a rating of 2 and a rating of 3 also represents, on average, a "sugar difference" of 6. Therefore, the original ratings displayed are on an interval scale. They are rounded off, but they are on an interval scale. It is likely that real ratings would not be on an interval scale. You can change the cutoff points between ratings by moving the vertical lines with the mouse. As you change these cutoffs, the ratings change automatically. For example, you might see what the ratings would look like if people did not consider something very sweet (rating of 5) unless it was very, very sweet.
A second objective of this demonstration is to let you investigate whether an investigator can be misled by computing means of ordinal data. The mean amount of sugar in Data Set 1 is 50 for the first brand and 55 for the second brand. The obvious conclusion is that, on average, the second brand is sweeter than the first. However, pretend that you only had the ratings to go by and were not aware of the actual amounts of sugar. Would you reach the correct decision if you compared the mean ratings of the two brands? Change the cutoffs for mapping the interval sugar scale onto the ordinal rating scale. Do any mappings lead to incorrect interpretations? Try this with Data Set 1 and with Data Set 2. Try to find a situation where the mean sweetness rating is higher for Brand 1 even though the mean amount of sugar is greater for Brand 2. If you find such a situation, then you have found an instance in which using the means of ordinal data leads to incorrect conclusions. It is possible to find this situation, so look hard.
Keep in mind that in realistic situations, you only know the ratings and not the "true" interval scale that underlies them. If you knew the interval scale, you would use it.
The first product in brand one has 38 units of sugar. Take a look at the scale at the bottom of the window. It shows that any value between 37 and 43 would be rated 1. That's why the rating for the first product is a 1. Now look at the 5th product from Brand 1. It has 45 units of sugar. Since this is more than 43, it is rated 2. Examine other products and make sure you understand how the sugar contents combined with the scale produce the ratings.
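The lookup described above can be sketched in a few lines of Python. This is only an illustration, not the demonstration's own code: the function name `rate` is hypothetical, the cutoffs 43, 49, 56, and 62 are the default boundaries mentioned in the text, and the rule that a value exactly on a cutoff receives the higher rating is an assumption.

```python
from bisect import bisect_right

# Default cutoffs between ratings 1|2, 2|3, 3|4, and 4|5, as given in the text.
DEFAULT_CUTOFFS = [43, 49, 56, 62]

def rate(sugar, cutoffs=DEFAULT_CUTOFFS):
    """Return the 1-5 sweetness rating for a sugar content.

    Assumption: a value exactly equal to a cutoff gets the higher rating.
    """
    # Count how many cutoffs the value has reached, then shift to a 1-based rating.
    return bisect_right(cutoffs, sugar) + 1

print(rate(38))  # first Brand 1 product: 1
print(rate(45))  # fifth Brand 1 product: 2
```

Passing a different `cutoffs` list plays the role of dragging the vertical lines in the demonstration.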
This demonstration allows you to change the way sugar units are transformed into ratings. Let's suppose that our rater would not give a brand a sweetness rating of 1 unless it was truly not sweet at all. For example, the rater might only give ratings of 1 if the sugar content was less than 40. To see what would happen, move the vertical line above 43 to the left. Notice that as you move it, its label reflects its current value. Keep moving it to the left until it equals 39. Now look at the ratings of the products. With our original "mapping," the first 4 Brand 1 products were rated 1. Now only the first product has little enough sugar to get a rating of 1. Now let's suppose that our rater was pretty generous in awarding 3's. Let's say that all a brand needed to get a 3 was a 43. So move the divider between 2 and 3 from 49 to 43. And, just for the sake of the example, let's assume that the rater required a product to be very sweet to get a rating of 4. Specifically, let's say that it needed a sugar content of 60. Move the divider between 3 and 4 from 56 to 60. Notice how the ratings are automatically updated. Finally, let's assume that our rater does not require much more sweetness in order to give a rating of 5. So let's leave the cutoff between 4 and 5 at 62.
Our rater is generating very "non-interval" ratings. A difference between a 4 and a 5 could represent, at most, 2 units. In contrast, a difference between a 2 and a 3 could represent as much as 17 units.
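One rough way to see how uneven the modified mapping is: compute the width of the sugar range covered by each rating. The sketch below is an illustration under stated assumptions. The helper name `rating_widths` is hypothetical, and only ratings 2, 3, and 4 are included because the upper end of rating 5 is not stated in the text.

```python
def rating_widths(cutoffs):
    """Width of the sugar range assigned to each interior rating (2, 3, 4)."""
    # Each interior rating spans the gap between two neighboring cutoffs.
    return [upper - lower for lower, upper in zip(cutoffs, cutoffs[1:])]

original = [43, 49, 56, 62]  # default cutoffs from the text
modified = [39, 43, 60, 62]  # cutoffs after the changes described above

print(rating_widths(original))  # [6, 7, 6]
print(rating_widths(modified))  # [4, 17, 2]
```

With the original cutoffs the three ranges are nearly equal, while with the modified cutoffs rating 3 covers 17 units and rating 4 covers only 2.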
Now consider how the mapping of the sugar content onto the sweetness rating affects our interpretation of the difference between Brand 1 and Brand 2. The mean difference in sugar content is 5.0. With the original mapping, the mean difference in ratings is 0.69. Let's see what happens when we change the mappings. For example, change the boundary between ratings of 1 and 2 from 43 to 39. The difference in ratings is now 3.31-2.77 = 0.54. You can see that the mappings do make a difference. But qualitatively, whether you were looking at the sugar content or the ratings, you would conclude that Brand 2 is somewhat sweeter than Brand 1. Experiment by changing the various boundaries. You will find that conclusions based on the mean ratings are valid.
Now choose Data Set 2. Just as with Data Set 1, the mean difference in sugar content is 5.0. The data are quite different, though. Brand 2 has the three lowest sweetness levels as well as the three highest. Brand 1 is in the middle. The initial difference in ratings is 3.08-2.62 = 0.46. If you change the boundaries you get slightly different results, but you will probably find that the mean difference on the ratings is not misleading. However, there are ways of getting a misleading result. Notice that there are three 43's for Brand 1 and that these are associated with sweetness ratings of 2. Move the boundary between ratings of 2 and 3 to 43. Then you will see that the ratings for these products change from 2 to 3. Since the value of 43 is rounded off, you may have to move the boundary slightly to the left after hitting 43 for the ratings to change. With all the sugar contents of 43 receiving a rating of 3, Brand 1 now has a higher mean sweetness rating than Brand 2 even though the mean sugar content for Brand 2 is higher. This effect can be made even larger by moving the boundary between 4 and 5 to 75. This will lower the ratings for the Brand 2 products with 71 and 72 from 5 to 4, thus lowering the Brand 2 mean without affecting the Brand 1 mean. The mean for Brand 1 will be 3.23 compared to a mean for Brand 2 of 2.92. Again, the important point is that even though Brand 1 has a lower mean sugar content than Brand 2, it has a higher mean rated sweetness score.
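The reversal can be reproduced in miniature with a short Python sketch. The two lists below are made-up values chosen only to mimic the pattern described (Brand 1 in the middle with several 43's, Brand 2 at both extremes). They are not the demonstration's Data Set 2, so the means differ from the figures quoted above. The function names are hypothetical, and a value exactly on a cutoff is assumed to receive the higher rating.

```python
from bisect import bisect_right
from statistics import mean

def rate(sugar, cutoffs):
    """Map a sugar content to a 1-5 rating (value on a cutoff gets the higher rating)."""
    return bisect_right(cutoffs, sugar) + 1

def mean_rating(sugars, cutoffs):
    """Mean of the ordinal ratings produced by the given cutoffs."""
    return mean(rate(s, cutoffs) for s in sugars)

# Hypothetical data for illustration only, NOT the demonstration's Data Set 2.
brand1 = [43, 43, 43, 50, 50, 50]
brand2 = [38, 38, 38, 71, 72, 73]

default = [43, 49, 56, 62]  # default cutoffs from the text
shifted = [43, 43, 56, 75]  # cutoff below 3 moved down to 43, cutoff below 5 moved up to 75

print(mean(brand1), mean(brand2))  # Brand 2 has more sugar on average
print(mean_rating(brand1, default), mean_rating(brand2, default))  # same direction as sugar
print(mean_rating(brand1, shifted), mean_rating(brand2, shifted))  # direction reversed
```

With the default cutoffs the mean ratings point the same way as the mean sugar contents. With the shifted cutoffs Brand 1 gets the higher mean rating even though Brand 2 still has the higher mean sugar content.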
When an interval scale such as sugar content is mapped onto a rating scale such as judgment of sweetness, the resulting rating scale is probably not an interval scale. For most real-world situations, the means of ordinal-level rating scales allow valid conclusions about the direction of the means on the interval scale. However, it is theoretically possible for means on the ordinal scale to be in the opposite direction from means on the interval scale. Experts disagree on the importance of this in real-world data analysis. We believe that the chances of misinterpretation with real data are extremely low, and that it is only with contrived artificial data and mappings of the interval to the ordinal scale that these problems occur.