When we take the expected value, or average, of a random
process, we measure several important characteristics of how
the process behaves in general. However, suppose we have
several random processes measuring different aspects of a
system. The relationship between these processes is also an
important observation. The covariance and correlation are two
important tools for finding these relationships. Below we go
into more detail about what these terms mean and how the tools
are helpful. Note that much of the following discussion refers
only to random variables, but keep in mind that these variables
can represent random signals or random processes.
To begin with, when dealing with more than one random process,
it is useful to have a single number that quickly gives us an
idea of how similar the processes are. For this we use the
covariance, which is analogous to the variance of
a single variable.
- Definition 1: Covariance
A measure of how much the deviations of two or more
variables or processes match.
For two processes,
$X$ and $Y$, if they are
not closely related then the covariance
will be small, and if they are similar then the covariance
will be large. Let us clarify this statement by describing
what we mean by "related" and "similar." Two processes are
"closely related" if their distribution spreads are almost
equal and they are around the same, or a very slightly
different, mean.
Mathematically, covariance is often written as $\sigma_{xy}$
and is defined as
$$\operatorname{cov}(X,Y) = \sigma_{xy} = E\left[(X-\bar{X})(Y-\bar{Y})\right] \tag{1}$$
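As a rough numerical illustration of Equation (1), the MATLAB sketch below estimates the covariance of simulated signals by averaging the product of their deviations from the mean. The signals, the noise level, the seed, and all variable names are assumptions chosen for illustration and do not come from the original text.

```matlab
rng(0);                          % fix the seed so the run is repeatable (assumed choice)
N = 1e5;                         % number of samples (assumed)
x = randn(1, N);                 % zero-mean, unit-variance signal
y = x + 0.1*randn(1, N);         % signal that closely follows x
z = randn(1, N);                 % signal generated independently of x

% Sample version of Equation (1): average product of deviations from the mean
cov_xy = mean((x - mean(x)) .* (y - mean(y)));
cov_xz = mean((x - mean(x)) .* (z - mean(z)));

fprintf('cov(x,y) = %.3f, cov(x,z) = %.3f\n', cov_xy, cov_xz);
```

Under these assumptions, cov_xy should come out close to the variance of x (about 1), while cov_xz should come out close to 0.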
This can also be reduced and rewritten in the following two
forms:
$$\sigma_{xy} = \overline{xy} - \bar{x}\,\bar{y} \tag{2}$$
$$\sigma_{xy} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (X-\bar{X})(Y-\bar{Y})\, f(x,y)\, dx\, dy \tag{3}$$
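Equation (2) can be obtained from Equation (1) by expanding the product and using the linearity of expectation. The steps below are a sketch that treats $\bar{X} = E[X]$ and $\bar{Y} = E[Y]$ as constants and writes $\overline{xy}$ as $E[XY]$:

$$
\begin{aligned}
\sigma_{xy} &= E\left[(X-\bar{X})(Y-\bar{Y})\right] \\
            &= E[XY] - \bar{X}\,E[Y] - \bar{Y}\,E[X] + \bar{X}\,\bar{Y} \\
            &= E[XY] - \bar{X}\,\bar{Y}
\end{aligned}
$$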
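- If $X$ and $Y$ are independent, or more generally
  uncorrelated, then $\sigma_{xy} = 0$. (A zero mean for one
  of the variables does not by itself make the covariance
  zero; it only reduces the covariance to $E[XY]$.)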
- If $X$ and $Y$ are orthogonal, then
  $\sigma_{xy} = -E[X]\,E[Y]$
- The covariance is symmetric:
  $\operatorname{cov}(X,Y) = \operatorname{cov}(Y,X)$
- Basic covariance identity:
  $\operatorname{cov}(X+Y,Z) = \operatorname{cov}(X,Z) + \operatorname{cov}(Y,Z)$
- Covariance of equal variables:
  $\operatorname{cov}(X,X) = \operatorname{Var}(X)$
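The symmetry, additivity, and equal-variable properties listed above can be checked numerically. The MATLAB sketch below does so on small data vectors; the helper name popcov and the vector z are hypothetical and were made up for this illustration.

```matlab
% Population covariance computed as mean(ab) - mean(a)*mean(b);
% popcov is a hypothetical helper name
popcov = @(a, b) mean(a .* b) - mean(a) * mean(b);

x = [3 1 6 3 4];                 % illustrative data (assumed)
y = [1 5 3 4 3];                 % illustrative data (assumed)
z = [2 2 5 1 4];                 % illustrative data (assumed)

sym_gap = popcov(x, y) - popcov(y, x);                       % symmetry
add_gap = popcov(x + y, z) - (popcov(x, z) + popcov(y, z));  % cov(X+Y,Z) identity
var_gap = popcov(x, x) - var(x, 1);                          % cov(X,X) = Var(X)

fprintf('gaps: %.2e %.2e %.2e\n', sym_gap, add_gap, var_gap);
```

Each gap should be zero up to floating-point rounding. The second argument of 1 in var selects normalization by N, which matches the population form used by popcov.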
Anyone with a statistical background should see that the idea
of dependence/independence among variables and signals plays
an important role when dealing with random processes. Because
of this, the correlation of two variables provides us with a
measure of how the two variables affect one another.
- Definition 2: Correlation
A measure of how much one random variable depends upon the
other.
This measure of association between the variables will provide
us with a clue as to how well the value of one variable can be
predicted from the value of the other. The correlation is
equal to the average of the product of two random variables
and is defined as
$$\operatorname{cor}(X,Y) = E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\,y\, f(x,y)\, dx\, dy \tag{4}$$
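For discrete random variables, the double integral in Equation (4) becomes a double sum of $x_i\, y_j\, p(x_i, y_j)$ over all value pairs. The MATLAB sketch below evaluates that sum for a small joint probability table; the table P and the value vectors xs and ys are made-up numbers used only for illustration.

```matlab
xs = [0 1 2];                    % values X can take (assumed)
ys = [1 3];                      % values Y can take (assumed)

% P(i,j) = probability that X = xs(i) and Y = ys(j); the entries sum to 1
P  = [0.10 0.20;
      0.30 0.10;
      0.05 0.25];

% Double sum of x*y*p(x,y), written as row vector * matrix * column vector
corXY = xs * P * ys';

fprintf('E[XY] = %.2f\n', corXY);
```

Writing the double sum as a matrix product keeps the code short; an explicit pair of nested loops over xs and ys would give the same number.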
It is often useful to express the correlation of random
variables with a range of numbers, like a percentage. For a
given set of variables, we use the correlation
coefficient to give us the linear relationship
between our variables. The correlation coefficient of two
variables is defined in terms of their covariance and standard
deviations, denoted by $\sigma_x$ and $\sigma_y$, as seen below
$$\rho = \frac{\operatorname{cov}(X,Y)}{\sigma_x \sigma_y} \tag{5}$$
where we will always have
$$-1 \le \rho \le 1$$
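One way to see where this bound comes from is the Cauchy-Schwarz inequality applied to the deviations of the two variables. The sketch below assumes both standard deviations are nonzero:

$$
\operatorname{cov}(X,Y)^2 = \left(E\left[(X-\bar{X})(Y-\bar{Y})\right]\right)^2 \le E\left[(X-\bar{X})^2\right] E\left[(Y-\bar{Y})^2\right] = \sigma_x^2\, \sigma_y^2
$$

Dividing both sides by $\sigma_x^2 \sigma_y^2$ gives $\rho^2 \le 1$, which is the same statement as $-1 \le \rho \le 1$.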
This provides us with a quick and easy way to view the
correlation between our variables. If there is no linear
relationship between the variables then the correlation
coefficient will be zero, and if there is a perfect positive
linear match it will be one. If there is a perfect inverse
relationship, where one variable increases while the other
decreases, then the correlation coefficient will be negative
one. This type of correlation is often referred to more
specifically as Pearson's Correlation Coefficient, or
Pearson's Product Moment Correlation.
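The limiting values of the correlation coefficient can be seen on simple data. The MATLAB sketch below uses an exactly increasing line, an exactly decreasing line, and a symmetric parabola, which is strongly related to its input but not linearly; all data and variable names are assumptions chosen for illustration.

```matlab
x = 1:10;                        % illustrative data (assumed)
R_pos = corrcoef(x,  2*x + 1);   % y rises exactly linearly with x
R_neg = corrcoef(x, -3*x + 5);   % y falls exactly linearly with x

t = -5:5;                        % values symmetric about zero (assumed)
R_sq = corrcoef(t, t.^2);        % y depends on t, but not linearly

% corrcoef returns a 2-by-2 matrix; the off-diagonal entry is rho
rho_pos = R_pos(1, 2);
rho_neg = R_neg(1, 2);
rho_sq  = R_sq(1, 2);

fprintf('rho_pos = %.4f, rho_neg = %.4f, rho_sq = %.4f\n', ...
        rho_pos, rho_neg, rho_sq);
```

The parabola case is a reminder that a coefficient near zero only rules out a linear relationship, not every kind of dependence.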
So far we have dealt with correlation simply as a number
describing the relationship between any two variables.
However, since our goal will be to relate random processes
to each other, which deals with signals as a function of
time, we will want to continue this study by looking at
correlation functions.
Now let us take just a second to look at a simple example that
involves calculating the covariance and correlation of two
sets of random numbers. We are given the following data sets:
$$x = \{3, 1, 6, 3, 4\}$$
$$y = \{1, 5, 3, 4, 3\}$$
To begin with, for the covariance we will need to find the
expected value, or mean, of $x$ and $y$.
$$\bar{x} = \frac{1}{5}(3+1+6+3+4) = 3.4$$
$$\bar{y} = \frac{1}{5}(1+5+3+4+3) = 3.2$$
$$\overline{xy} = \frac{1}{5}(3+5+18+12+12) = 10$$
Next we will solve for the standard deviations of our two sets
using the formula below.
$$\sigma = \sqrt{E\left[(X - E[X])^2\right]}$$
$$\sigma_x = \sqrt{\frac{1}{5}(0.16+5.76+6.76+0.16+0.36)} = 1.625$$
$$\sigma_y = \sqrt{\frac{1}{5}(4.84+3.24+0.04+0.64+0.04)} = 1.327$$
Now we can finally calculate the covariance using one of the
formulas found above. Since we have already calculated the
three means, we will use Equation (2), which is the simplest
to apply here.
$$\sigma_{xy} = 10 - 3.4 \times 3.2 = -0.88$$
And for our last calculation, we will solve for the
correlation coefficient, $\rho$.
$$\rho = \frac{-0.88}{1.625 \times 1.327} = -0.408$$
The above example can be easily calculated using Matlab.
Below I have included the code to find all of the values
above.
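x = [3 1 6 3 4];
y = [1 5 3 4 3];
% Means of x, y, and the element-wise product x.*y
mx = mean(x)
my = mean(y)
mxy = mean(x.*y)
% Standard Dev. from built-in Matlab Functions (the 1 normalizes by N)
std(x,1)
std(y,1)
% Standard Dev. from Equation Above (same result as std(?,1))
sqrt( 1/5 * sum((x-mx).^2))
sqrt( 1/5 * sum((y-my).^2))
% Covariance matrix (normalized by N) and correlation coefficient matrix;
% the off-diagonal entries are the covariance and rho of x and y
cov(x,y,1)
corrcoef(x,y)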
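Note that cov and corrcoef each return a 2-by-2 matrix when given two vectors, not a single number. The short sketch below shows one way to pull out the scalar values computed by hand in the example; the variable names C, R, sigma_xy, and rho are chosen here for illustration.

```matlab
x = [3 1 6 3 4];
y = [1 5 3 4 3];

C = cov(x, y, 1);        % 2-by-2 matrix, normalized by N because of the trailing 1
R = corrcoef(x, y);      % 2-by-2 matrix of correlation coefficients

sigma_xy = C(1, 2);      % covariance of x and y (off-diagonal entry)
rho      = R(1, 2);      % correlation coefficient of x and y

fprintf('sigma_xy = %.2f, rho = %.3f\n', sigma_xy, rho);
```

The diagonal entries of C hold the variances of x and y, and the diagonal entries of R are always one.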