Summary: This module briefly introduces linear regression. It also includes an example and an exercise.
Linear regression is a method of estimating the conditional expected value of one variable, y, given the values of some other variable, x. The variable of interest, y, is called the dependent variable. The other variable, x, is called the independent variable. [Wikipedia2006L] The term linear is used because the relation between the dependent and the independent variable is assumed to be a linear function with two parameters, an intercept and a slope. If this assumption does not hold, non-linear regression must be performed instead.
For example, a modeller may relate the weights of individuals to their heights using a linear regression model.
Before attempting to fit a linear model to observed data, a modeller should first determine whether or not there is a relationship between the variables of interest. A scatterplot can be a helpful tool for judging the strength of the relationship between two variables. A valuable numerical measure of association between two variables is the correlation coefficient, a value between -1 and 1. Its sign indicates the direction of the association in the observed data, and its magnitude indicates the strength of the linear association: values near -1 or 1 indicate a strong linear relationship, and values near 0 indicate a weak one.
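As an illustration of the correlation coefficient described above, the following is a minimal sketch that computes the sample (Pearson) correlation coefficient in plain Python. The function name pearson_r and the height and weight values are hypothetical, chosen only for illustration; they are not data from this module.

```python
import math


def pearson_r(xs, ys):
    """Return the sample Pearson correlation coefficient of two sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustrative data: heights (cm) and weights (kg).
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 84]

r = pearson_r(heights, weights)
print(f"correlation coefficient: {r:.3f}")
```

A value of r close to 1, as for this made-up data, suggests that fitting a straight line is reasonable. A value close to 0 would suggest that a linear model is unlikely to be useful.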
How can a linear regression be modelled?
A linear regression line has an equation of the form Y = a + bX, where X is the independent variable and Y is the dependent variable. The slope of the line is b, and a is the intercept: the value of Y when X = 0.
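The text above gives the form of the line but not how a and b are chosen. One common choice, assumed here rather than stated in this module, is ordinary least squares, which picks the a and b that minimise the sum of squared vertical distances between the observed points and the line. The sketch below uses hypothetical names (fit_line, predict) and made-up height and weight data.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all X values are equal")
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: value of Y when X = 0
    return a, b


def predict(a, b, x):
    """Evaluate the fitted line at x."""
    return a + b * x


# Hypothetical illustrative data: heights (cm) as X, weights (kg) as Y.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 84]

a, b = fit_line(heights, weights)
print(f"Y = {a:.2f} + {b:.2f} X")
print(f"estimated weight at 175 cm: {predict(a, b, 175):.2f} kg")
```

Note that the intercept a is the value of Y at X = 0, which may lie far outside the range of the observed data (a height of 0 cm here), so it is not always meaningful on its own.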
References: