Besides looking at the scatter plot and seeing that a line seems reasonable, how can you
tell if the line is a good predictor? Use the correlation coefficient as another indicator
(besides the scatter plot) of the strength of the relationship between x and y. The
correlation coefficient, r, is defined as:
$$r = \frac{n \cdot \sum xy - \left(\sum x\right) \cdot \left(\sum y\right)}{\sqrt{\left[n \cdot \sum x^2 - \left(\sum x\right)^2\right] \cdot \left[n \cdot \sum y^2 - \left(\sum y\right)^2\right]}}$$
where:
- -1 ≤ r ≤ 1
- n = the number of data points
If you suspect a linear relationship between x and y, then r can measure how strong it is.
If r = 1, there is perfect positive correlation. If r = -1, there is perfect negative
correlation. In both these cases, the original data points lie on a straight line. Of course,
in the real world, this will not generally happen.
The formula for r looks formidable. However, many calculators and any regression and
correlation computer program can calculate r. The sign of r is the same as the sign of the
slope, b, of the best fit line.
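As a rough illustration of how a program might carry out this calculation, the Python sketch below applies the sums formula for r directly. The function names `pearson_r` and `slope` and the five data points are hypothetical, chosen only for this example; the slope is computed with the usual least-squares formula so that its sign can be compared with the sign of r.

```python
import math


def _sums(xs, ys):
    """Return n and the five sums used by the formulas for r and b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return n, sum_x, sum_y, sum_xy, sum_x2, sum_y2


def pearson_r(xs, ys):
    """Correlation coefficient r computed from the sums formula."""
    n, sum_x, sum_y, sum_xy, sum_x2, sum_y2 = _sums(xs, ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        # All x values (or all y values) are identical, so r is undefined.
        raise ValueError("r is undefined when x or y is constant")
    return numerator / denominator


def slope(xs, ys):
    """Least-squares slope b of the best fit line."""
    n, sum_x, sum_y, sum_xy, sum_x2, _ = _sums(xs, ys)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
b = slope(xs, ys)
print(round(r, 4), round(b, 4))  # 0.7746 0.6
```

Both r and b share the numerator n·Σxy − (Σx)(Σy), and their denominators are positive, which is why the two values always have the same sign.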
- Coefficient of Correlation:
A measure developed by Karl Pearson (early 1900s) that gives the strength of association between the independent variable and the dependent variable. The formula is:
$$r = \frac{n \sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n \sum X^2 - \left(\sum X\right)^2\right]\left[n \sum Y^2 - \left(\sum Y\right)^2\right]}}, \qquad (1)$$
where n is the number of data points.
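To see how the pieces of the formula fit together, here is a short worked calculation. The five data points (1, 2), (2, 4), (3, 5), (4, 4), (5, 5) are hypothetical, invented only to keep the arithmetic small.

```latex
\begin{align*}
n &= 5, \quad \sum X = 15, \quad \sum Y = 20, \quad
     \sum XY = 66, \quad \sum X^2 = 55, \quad \sum Y^2 = 86 \\[4pt]
r &= \frac{5(66) - (15)(20)}
          {\sqrt{\left[5(55) - 15^2\right]\left[5(86) - 20^2\right]}} \\[4pt]
  &= \frac{30}{\sqrt{(50)(30)}}
   = \frac{30}{\sqrt{1500}}
   \approx 0.775
\end{align*}
```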
The coefficient cannot be greater than 1 or less than -1. The closer the coefficient is to
±1, the stronger the evidence of a significant linear relationship between X and Y.
"Part of the Books featured on Community College Open Textbook Project"