Summary: This module provides an overview of Linear Regression and Correlation: The Regression Equation as part of the Collaborative Statistics collection (col10522) by Barbara Illowsky and Susan Dean.
Data rarely fit a straight line exactly. Usually, you must be satisfied with rough predictions. Typically, you have a set of data whose scatter plot appears to "fit" a straight line. The line that best fits such data is called a Line of Best Fit or Least Squares Line.
If you know a person's pinky (smallest) finger length, do you think you could predict that person's height? Collect data from your class (pinky finger length, in inches). The independent variable, $x$, is pinky finger length and the dependent variable, $y$, is height.
For each set of data, plot the points on graph paper. Make your graph big enough and use a ruler. Then "by eye" draw a line that appears to "fit" the data. For your line, pick two convenient points and use them to find the slope of the line. Find the y-intercept of the line by extending your line so that it crosses the y-axis. Using the slope and the y-intercept, write your equation of "best fit". Do you think everyone will have the same equation? Why or why not?
Using your equation, what is the predicted height for a pinky length of 2.5 inches?
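The by-eye procedure above comes down to a little arithmetic. The following is a minimal sketch, assuming two hypothetical points read off a hand-drawn line: the values (2.0, 60.0) and (3.0, 72.0) are invented for illustration and are not class data, heights are assumed to be in inches, and `line_through` is a hypothetical helper name.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope")
    slope = (y2 - y1) / (x2 - x1)      # rise over run
    intercept = y1 - slope * x1        # where the line crosses the y-axis
    return slope, intercept


# Hypothetical points (pinky length in inches, height in inches), not real data.
slope, intercept = line_through((2.0, 60.0), (3.0, 72.0))

# Use the by-eye equation to predict height for a pinky length of 2.5 inches.
predicted_height = intercept + slope * 2.5

print(f"height = {intercept:.1f} + {slope:.1f} * pinky")
print(f"predicted height at 2.5 inches: {predicted_height:.1f}")
```

Two people who pick different points, or draw slightly different lines, will get different slopes and intercepts, which is why by-eye equations rarely agree.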
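A random sample of 11 statistics students produced the following data, where $x$ is the third exam score and $y$ is the final exam score.
[Table: $x$ (third exam score) and $y$ (final exam score) for the 11 students]
The third exam score, $x$, is the independent variable and the final exam score, $y$, is the dependent variable.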
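Consider the diagram shown. Each data point is of the form $(x, y)$, and each point on the line of best fit is of the form $(x, \hat{y})$.
The $\hat{y}$ is read "y hat" and is the estimated value of $y$: the value obtained from the regression line. It is generally not equal to the $y$ from the data.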
![Scatter plot of data points with the line of best fit]()
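The term $y_0 - \hat{y}_0 = \epsilon_0$ is called the "error" or residual, where $y_0$ is an observed value and $\hat{y}_0$ is the value the line predicts for the same $x$. It is not an error in the sense of a mistake. The absolute value of a residual measures the vertical distance between the data point and the corresponding point on the line.
For each data point, you can calculate the residual, $y_i - \hat{y}_i = \epsilon_i$ for $i = 1, 2, 3, \ldots, 11$.
Each $|\epsilon_i|$ is a vertical distance.
For the example about the third exam scores and the final exam scores for the 11 statistics students, there are 11 data points. Therefore, there are 11 $\epsilon$ values. If you square each $\epsilon$ and add them, you get
$$\epsilon_1^2 + \epsilon_2^2 + \cdots + \epsilon_{11}^2 = \sum_{i=1}^{11} \epsilon_i^2$$
This is called the Sum of Squared Errors (SSE).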
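To make the SSE concrete, here is a minimal sketch that computes each residual (observed $y$ minus the value predicted by a line) and sums their squares. The four data points and the candidate line are hypothetical, chosen only so the arithmetic is easy to follow. Writing the line as intercept `a` plus slope `b` times `x` is a naming assumption, as are the function names.

```python
def residuals(xs, ys, a, b):
    """Residual for each point: observed y minus predicted y-hat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sse(xs, ys, a, b):
    """Sum of Squared Errors of the line y-hat = a + b*x on the data."""
    return sum(e ** 2 for e in residuals(xs, ys, a, b))


# Hypothetical data (not the exam scores) and a hand-picked candidate line.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 4.0, 8.0, 9.0]
a, b = 1.0, 2.0  # candidate line: y-hat = 1 + 2x

errors = residuals(xs, ys, a, b)
total = sse(xs, ys, a, b)

for x, y, e in zip(xs, ys, errors):
    # A positive residual means the point lies above the line.
    position = "above" if e > 0 else "below" if e < 0 else "on"
    print(f"x={x}: y={y}, residual={e:+.1f} ({position} the line)")
print(f"SSE = {total:.1f}")
```

Each residual can be positive or negative, but every squared residual is nonnegative, so points above and below the line both add to the SSE.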
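Using calculus, you can determine the values of $a$ and $b$ that make the SSE a minimum. When you make the SSE a minimum, you have determined the points that are on the line of best fit. It turns out that the line of best fit has the equation:
$$\hat{y} = a + bx$$
where $a = \bar{y} - b\bar{x}$ and $b = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$. Here $\bar{x}$ and $\bar{y}$ are the sample means of the $x$ values and the $y$ values, respectively, so the line of best fit always passes through the point $(\bar{x}, \bar{y})$.
The slope $b$ can also be written as $b = r\left(\dfrac{s_y}{s_x}\right)$, where $s_y$ is the standard deviation of the $y$ values, $s_x$ is the standard deviation of the $x$ values, and $r$ is the correlation coefficient.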
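The closed-form least-squares result can be checked numerically. The sketch below uses the standard formulas (slope equal to the sum of products of deviations divided by the sum of squared $x$ deviations, intercept equal to $\bar{y} - b\bar{x}$) on a small hypothetical data set. The data, the function names, and the hand-picked comparison line are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


def correlation(xs, ys):
    """Sample correlation coefficient r, computed from deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sse(xs, ys, a, b):
    """Sum of Squared Errors of the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, not the exam scores from the example.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 4.0, 8.0, 9.0]

a, b = least_squares(xs, ys)
r = correlation(xs, ys)
slope_from_r = r * stdev(ys) / stdev(xs)   # alternative form of the slope

print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"slope from r * (s_y / s_x): {slope_from_r:.2f}")
print(f"SSE of fitted line:       {sse(xs, ys, a, b):.2f}")
print(f"SSE of hand-picked 1 + 2x: {sse(xs, ys, 1.0, 2.0):.2f}")
```

On this data the fitted line has a smaller SSE than the hand-picked line, and the two expressions for the slope agree up to floating-point rounding.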
The graph of the line of best fit for the third exam/final exam example is shown below:
![Line of best fit for the third exam/final exam example]()
Remember, the best fit line is called the least squares regression line (sometimes abbreviated LSL, for least squares line). The best fit line for the third exam/final exam example has the equation:
The idea behind finding the best fit line is based on the assumption that the data are actually scattered about a straight line. Remember, it is always important to plot a scatter diagram first (which many calculators and computer programs can do) to see if it is worth calculating the line of best fit.
"This book was purchased from the authors by the Maxfield Foundation and provided to the community as an open textbook available freely online and in PDF format. Bound copies of the book can also […]"