Summary: This module provides an overview of Linear Regression and Correlation: Outliers as part of the Collaborative Statistics collection (col10522) by Barbara Illowsky and Susan Dean.
In some data sets, there are values (points) called outliers. Outliers are points that are far from the least squares line. They have large "errors." Outliers need to be examined closely. Sometimes, for some reason or another, they should not be included in the analysis of the data. It is possible that an outlier is a result of erroneous data. Other times, an outlier may hold valuable information about the population under study. The key is to carefully examine what causes a data point to be an outlier.
In the third exam/final exam example, you can determine if there is an outlier or not. If there is one, as an exercise, delete it and fit the remaining data to a new line. For this example, the new line ought to fit the remaining data better. This means the SSE should be smaller and the correlation coefficient ought to be closer to 1 or -1.
Computers and many calculators can identify outliers from the data. However, as an exercise, we will go through the steps needed to identify an outlier by hand. In the table below, the first two columns are the third exam ($x$) and the final exam ($y$) data. The third column shows the predicted values ($\hat{y}$) calculated from the line of best fit.
| $x$ (third exam score) | $y$ (final exam score) | $\hat{y}$ (predicted final exam score) |
|---|---|---|
| 65 | 175 | 140 |
| 67 | 133 | 150 |
| 71 | 185 | 169 |
| 71 | 163 | 169 |
| 66 | 126 | 145 |
| 75 | 198 | 189 |
| 67 | 153 | 150 |
| 70 | 163 | 164 |
| 71 | 159 | 169 |
| 69 | 151 | 160 |
| 69 | 159 | 160 |
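The third column can be checked with a short sketch. It assumes the line of best fit is the ordinary least squares line for the 11 points and that the tabulated predictions are rounded to whole numbers. The helper name `least_squares` is illustrative and not from the original module.

```python
# Third exam (x) and final exam (y) scores from the table above.
x = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
y = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]


def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


a, b = least_squares(x, y)

# Assumption: the table shows predictions rounded to the nearest integer.
y_hat = [round(a + b * xi) for xi in x]

print(f"y-hat = {a:.2f} + {b:.2f}x")
print(y_hat)
```

Under these assumptions the rounded predictions agree with the third column of the table.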
A residual is the actual $y$ value minus the predicted $y$ value: $y - \hat{y}$.
Calculate the absolute value of each residual.
Calculate each $|y - \hat{y}|$:
| $x$ | $y$ | $\hat{y}$ | $\lvert y - \hat{y} \rvert$ |
|---|---|---|---|
| 65 | 175 | 140 | 35 |
| 67 | 133 | 150 | 17 |
| 71 | 185 | 169 | 16 |
| 71 | 163 | 169 | 6 |
| 66 | 126 | 145 | 19 |
| 75 | 198 | 189 | 9 |
| 67 | 153 | 150 | 3 |
| 70 | 163 | 164 | 1 |
| 71 | 159 | 169 | 10 |
| 69 | 151 | 160 | 9 |
| 69 | 159 | 160 | 1 |
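The absolute residuals follow directly from the second and third columns. A minimal sketch, using the rounded predicted values exactly as they appear in the table:

```python
# Final exam scores (y) and rounded predicted values (y_hat) from the table.
y = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]
y_hat = [140, 150, 169, 169, 145, 189, 150, 164, 169, 160, 160]

# Absolute residual |y - y_hat| for each data point, in table order.
abs_residuals = [abs(yi - yh) for yi, yh in zip(y, y_hat)]

print(abs_residuals)  # [35, 17, 16, 6, 19, 9, 3, 1, 10, 9, 1]
```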
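Square each $|y - \hat{y}|$:
• $35^2$; $17^2$; $16^2$; $6^2$; $19^2$; $9^2$; $3^2$; $1^2$; $10^2$; $9^2$; $1^2$
Then, add (sum) all the squared $|y - \hat{y}|$ terms to get the sum of squared errors (SSE):
$$\text{SSE} = \sum_{i=1}^{11} (y_i - \hat{y}_i)^2 = 1225 + 289 + 256 + 36 + 361 + 81 + 9 + 1 + 100 + 81 + 1 = 2440$$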
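Next, calculate $s$, the standard deviation of all the $y - \hat{y}$ values, where $n$ is the total number of data points: $s = \sqrt{\frac{\text{SSE}}{n - 2}}$
For the third exam/final exam problem, $\text{SSE} = 2440$ and $n = 11$, so $s = \sqrt{\frac{2440}{11 - 2}} \approx 16.47$.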
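As a check on the arithmetic, the following sketch computes the SSE and then the standard deviation of the residuals. It takes $s = \sqrt{\text{SSE}/(n-2)}$ and uses absolute residuals computed from the rounded predicted values in the table.

```python
import math

# Absolute residuals |y - y_hat| computed from the rounded table values.
abs_residuals = [35, 17, 16, 6, 19, 9, 3, 1, 10, 9, 1]

# Sum of squared errors: square each residual, then add them up.
sse = sum(r ** 2 for r in abs_residuals)

# Standard deviation of the residuals, with n - 2 in the denominator.
n = len(abs_residuals)
s = math.sqrt(sse / (n - 2))

print(sse)          # 2440
print(round(s, 2))  # 16.47
```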
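Next, multiply $s$ by 1.9. For this example, $s \approx 16.47$, so $(1.9)(16.47) \approx 31.29$. If the vertical distance from a data point to the line of best fit is at least $1.9s$, the point is considered too far from the line, and we call it a potential outlier.
For the example, if any of the $|y - \hat{y}|$ values is at least 31.29, the corresponding $(x, y)$ data point is a potential outlier.
Mathematically, we say that if $|y - \hat{y}| \geq (1.9)(s)$, then the corresponding point is a potential outlier.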
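The rule can be sketched as a small filter over the data. The multiplier 1.9 is treated here as the cutoff (an assumption of this sketch, held in the illustrative constant `CUTOFF_MULTIPLIER`), and the residuals use the rounded predicted values from the table.

```python
import math

# Data and rounded predicted values from the table.
x = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
y = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]
y_hat = [140, 150, 169, 169, 145, 189, 150, 164, 169, 160, 160]

CUTOFF_MULTIPLIER = 1.9  # assumed cutoff, in units of s

abs_residuals = [abs(yi - yh) for yi, yh in zip(y, y_hat)]
sse = sum(r ** 2 for r in abs_residuals)
s = math.sqrt(sse / (len(y) - 2))
threshold = CUTOFF_MULTIPLIER * s

# Keep every point whose absolute residual is at least the threshold.
potential_outliers = [
    (xi, yi)
    for xi, yi, r in zip(x, y, abs_residuals)
    if r >= threshold
]

print(round(threshold, 2))
print(potential_outliers)
```

With unrounded $s$ the threshold is about 31.28. Rounding $s$ to 16.47 first gives 31.29. Either way, only $(65, 175)$ is flagged.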
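For the third exam/final exam problem, all the $|y - \hat{y}|$ values are less than 31.29 except the first one, which is 35. Since $35 > 31.29$, this point satisfies $|y - \hat{y}| \geq (1.9)(s)$.
The point which corresponds to $|y - \hat{y}| = 35$ is $(65, 175)$. Therefore, the data point $(65, 175)$ is a potential outlier. For this example, we will delete it.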
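Refitting the line to the 10 remaining points gives $\hat{y} = -355.19 + 7.39x$ with correlation coefficient $r = 0.9121$. If you compare $r = 0.9121$ to its critical value of 0.632 (for $n - 2 = 8$ degrees of freedom), $0.9121 > 0.632$, so $r$ is significant. Because $r = 0.9121$ is closer to 1 than the original $r = 0.6631$, the new line fits the 10 remaining data values better and can better predict the final exam score from the third exam score.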
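The refit on the 10 remaining points can be reproduced with a short sketch. It assumes the point $(65, 175)$ has been deleted. The critical value 0.632 is taken from a standard table of critical values of $r$ for 8 degrees of freedom at the 5% level. The variable names are illustrative.

```python
import math

# The 10 points left after deleting the potential outlier (65, 175).
x = [67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
y = [133, 185, 163, 126, 198, 153, 163, 159, 151, 159]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = y_bar - slope * x_bar
r = sxy / math.sqrt(sxx * syy)

CRITICAL_R = 0.632  # assumed table value for n - 2 = 8, 5% level

print(f"y-hat = {intercept:.2f} + {slope:.2f}x, r = {r:.4f}")
print("significant" if abs(r) > CRITICAL_R else "not significant")
```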
Using the new line of best fit (calculated with 10 points), what would a student who receives a 73 on the third exam expect to receive on the final exam?
A final exam score of 184.28: $\hat{y} = -355.19 + 7.39(73) = 184.28$.
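The prediction is a single evaluation of the line. In this sketch the default coefficients are the rounded values obtained by refitting the 10 remaining points, and `predict_final` is an illustrative helper name.

```python
def predict_final(third_exam_score, intercept=-355.19, slope=7.39):
    """Predicted final exam score from the (rounded) 10-point line of best fit."""
    return intercept + slope * third_exam_score


print(round(predict_final(73), 2))  # 184.28
```

Using unrounded coefficients would shift the result slightly, so the value 184.28 depends on rounding the slope to 7.39 and the intercept to -355.19.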
(From The Consumer Price Indexes Web site) The Consumer Price Index (CPI) measures the average change over time in the prices paid by urban consumers for consumer goods and services. The CPI affects nearly all Americans because of the many ways it is used. One of its biggest uses is as a measure of inflation. By providing information about price changes in the Nation's economy to government, business, and labor, the CPI helps them to make economic decisions. The President, Congress, and the Federal Reserve Board use the CPI's trends to formulate monetary and fiscal policies. In the following table, $x$ is the year and $y$ is the CPI.
| $x$ (year) | $y$ (CPI) |
|---|---|
| 1915 | 10.1 |
| 1926 | 17.7 |
| 1935 | 13.7 |
| 1940 | 14.7 |
| 1947 | 24.1 |
| 1952 | 26.5 |
| 1964 | 31.0 |
| 1969 | 36.7 |
| 1975 | 49.3 |
| 1979 | 72.6 |
| 1980 | 82.4 |
| 1986 | 109.6 |
| 1991 | 130.7 |
| 1999 | 166.6 |