The Kalman filter is an important generalization
of the Wiener filter. Unlike Wiener filters, which are designed
under the assumption that the signal and noise are
stationary, the Kalman filter has the ability
to adapt itself to non-stationary
environments.
The Kalman filter can be viewed as a
sequential minimum MSE estimator of a
signal in additive noise. If the signal and noise are jointly
Gaussian, the then Kalman filter is optimal in a minimum MSE
sense (minimizes expected quadratic loss).
If the signal and/or noise are non-Gaussian, then
the Kalman filter is the best linear estimator (linear estimator that
minimizes MSE among all possible linear estimators).
Recall the simple DC signal estimation problem.
x
n
=A+
w
n
,
n=0…N−1
n
n
0
…
N
1
x
n
A
w
n
(1)
Where
AA is the unknown DC level and
w
n
w
n
is the white Gaussian
noise.
AA could represent the
voltage of a DC power supply. We know how to find several good
estimators of
AA given the
measurements
x
0
…
x
N
-
1
x
0
…
x
N
-
1
.
In practical situations this model may be too
simplistic. the load on the power supply may charge over time
and there will be other variations due to temperature and
component aging.
To account for these variations we can employ a
more accurate measurement model:
x
n
=
A
n
+
w
n
,
n=0…N−1
n
n
0
…
N
1
x
n
A
n
w
n
(2)
where the voltage
A
n
A
n
is the
true voltage at time
nn.
Now the estimation problem is significantly more
complicated since we must estimate
A
0
…
A
N
-
1
A
0
…
A
N
-
1
. Suppose that the true voltage
A
n
A
n
does not vary too rapidly over time. Then successive
samples of
A
n
A
n
will not be too different, suggesting that the voltage
signal displays a high degree of correlation.
This reasoning suggests that it may be reasonable
to regard the sequence
A
0
…
A
N
-
1
A
0
…
A
N
-
1
, as a realization of a correlated (not white) random
process. Adopting a random process model for
A
n
A
n
allows us to pursue a Bayesian approach to the
estimation problem (Figure 1).
Using the model in
Equation 2, it is easy to verify
that the maximum likelihood and MVUB esitmators are given by
Our estimate is simply the noisy measurements! No averaging
takes place, so there is no noise reduction.
Let's look at the example again, Figure 2.
The voltage
A
n
A
n
is varying about an average value of 10V. Assume this
average value is known and write
A
n
=10+
y
n
A
n
10
y
n
(4)
Where
y
n
y
n
is a zero-mean random process. Now a simple model for
y
n
y
n
which allows us to specify the correlation between samples is
the
first-order Gauss-Markov prcoess model:
y
n
=a
y
n
-
1
+
u
n
,
n=12…
n
n
1
2
…
y
n
a
y
n
-
1
u
n
(5)
Where
u
n
∼𝒩0
σ
u
2
u
n
0
σ
u
2
iid (white Gaussian noise process). To initialize the
process we take
y
0
y
0
to be the realization of a Gaussian random variable:
y
0
∼𝒩0
σ
y
2
y
0
0
σ
y
2
.
u
n
u
n
is called the
driving or
excitation noise. The model in
Equation 5 is called the
dynamical or
state model. The current output
y
n
y
n
depends only on the
state of the
system at the previous time, or
y
n
-
1
y
n
-
1
, and the current input
u
n
u
n
(
Figure 3).
y
1
=a
y
0
+
u
0
y
1
a
y
0
u
0
y
2
=a
y
1
+
u
1
=a(a
y
0
+
u
0
)+
u
1
=a2
y
0
+a
u
1
+
u
2
y
2
a
y
1
u
1
a
a
y
0
u
0
u
1
a
2
y
0
a
u
1
u
2
(6)
⋮
⋮
y
n
=an+1
y
0
+∑
k
=1nak
u
n
-
k
y
n
a
n
1
y
0
k
1
n
a
k
u
n
-
k
E
y
n
=an+1E
y
0
+∑
k
=1nakE
u
n
-
k
=0
y
n
a
n
1
y
0
k
1
n
a
k
u
n
-
k
0
(7)
Correlation:
E
y
m
y
n
=E(am+1
y
0
+∑
k
=1mak
u
m
-
k
)(an+1
y
0
+∑
l
=1nal
u
n
-
l
)=Eam+n+2
y
0
2+E∑
k
=1m∑
l
=1nak+l
u
m
-
k
u
n
-
l
y
m
y
n
a
m
1
y
0
k
1
m
a
k
u
m
-
k
a
n
1
y
0
l
1
n
a
l
u
n
-
l
a
m
n
2
y
0
2
k
1
m
l
1
n
a
k
l
u
m
-
k
u
n
-
l
(8)
E
u
m
-
k
u
n
-
l
={
σ
n
2 if m−k=n−l0 otherwise
u
m
-
k
u
n
-
l
σ
n
2
m
k
n
l
0
(9)
If
m>n
m
n
, then
E
y
m
y
n
=am+n+2
σ
y
2+am−n
σ
u
2∑
k
=1na2k
y
m
y
n
a
m
n
2
σ
y
2
a
m
n
σ
u
2
k
1
n
a
2
k
(10)
If
|a|>1
a
1
, then it's obvious that the process diverges (
variance→∞
variance
). This is equivalent to having a pole outside the unit
circle shown in
Figure 4.
So, let's assume
|a|<1
a
1
and hence a stable system. Thus as
mm and
nn get large
am+n+2
σ
y
2→0
a
m
n
2
σ
y
2
0
Now let
m−n=τ
m
n
τ
. Then for
mm and
nn large we have
E
y
m
y
n
=aτ
σ
u
2∑
k
=1na2k=aτ+2
σ
u
21−a2
y
m
y
n
a
τ
σ
u
2
k
1
n
a
2
k
a
τ
2
σ
u
2
1
a
2
(11)
This shows us how correlated the process is:
|a|→1⇒heavily correlated (or anticorrelated)
a
1
heavily correlated (or anticorrelated)
|a|→0⇒weakly correlated
a
0
weakly correlated
How can we use this model to help us in our estimation problem?
Let's look at a more general formulation of the
problem at hand. Suppose that we have a vector-valued dynamical
equation
y
n+1
=A
y
n
+b
u
n
y
n
1
A
y
n
b
u
n
(12)
Where
y
n
y
n
is
p×1
p
1
dimensional,
A
A is
p×p
p
p
,
and
b
b is
p×1
p
1
.
The initial
state vector is
Y
0
∼𝒩0
R
0
Y
0
0
R
0
, where
R
0
R
0
is the covariance matrix and
u
n
∼𝒩0
σ
u
2
u
n
0
σ
u
2
iid (white Gaussian
excitation noise).
This reduces to the case we just looked at when
p=1
p
1
. This model could represent a
p
th
p
th
order Gauss-Markov process:
y
n
-
1
=
a
1
y
n
+
a
2
y
n
-
1
+…+
a
p
y
n
-
p
+
1
+
u
n
y
n
-
1
a
1
y
n
a
2
y
n
-
1
…
a
p
y
n
-
p
+
1
u
n
(13)
Define
y
n
=
y
n
-
p
+
1
y
n
-
p
+
2
⋮
y
n
-
1
y
n
y
n
y
n
-
p
+
1
y
n
-
p
+
2
⋮
y
n
-
1
y
n
(14)
Then,
y
n+1
=A
y
n
+b
u
n
=(
010……0
0010…0
⋮⋮⋱⋱⋱⋮
⋮⋮⋱⋱⋱0
00……01
a
1
a
2
……
a
p
-
1
a
p
)
y
n
-
p
+
1
y
n
-
p
+
2
⋮⋮
y
n
-
1
y
n
+0⋮⋮⋮01+
u
n
y
n
1
A
y
n
b
u
n
0
1
0
…
…
0
0
0
1
0
…
0
⋮
⋮
⋱
⋱
⋱
⋮
⋮
⋮
⋱
⋱
⋱
0
0
0
…
…
0
1
a
1
a
2
…
…
a
p
-
1
a
p
y
n
-
p
+
1
y
n
-
p
+
2
⋮
⋮
y
n
-
1
y
n
0
⋮
⋮
⋮
0
1
u
n
(15)
Here
A
A is the
state transition matrix. Since
y
n
y
n
is a linear combination of Gaussian vectors:
y
n
=A2
y
0
+∑
k
=1nAk−1b
u
n
-
k
y
n
A
2
y
0
k
1
n
A
k
1
b
u
n
-
k
(16)
We know that
y
n
y
n
is also Gaussian distributed with mean and covariance
R
n
=E
y
n
y
n
T
R
n
y
n
y
n
,
Y
n
∼𝒩
R
n
Y
n
R
n
. The covariance can be recursively computed from the
basic state equation:
R
n+1
=A
R
n
AT+
σ
u
2bbT
R
n
1
A
R
n
A
σ
u
2
b
b
(17)
Assume that measurements of the state are available:
x
n
=CT
y
n
+
w
n
x
n
C
y
n
w
n
(18)
Where
w
n
∼𝒩0
σ
w
2
w
n
0
σ
w
2
iid independant of
u
n
u
n
(white Gaussian
observation noise).
For example, if
C=0…01T
C
0
…
0
1
, then
x
n
=
y
n
+
w
n
x
n
y
n
w
n
(19)
Where
x
n
x
n
is the observation,
y
n
y
n
is the signal, and
w
n
w
n
is the noise. Since our model for the signal is
Gaussian as well as the observation noise, it follows that
x
n
∼𝒩0
σ
n
2
x
n
0
σ
n
2
,
where
σ
n
2=CT
R
n
C+
σ
w
2
σ
n
2
C
R
n
C
σ
w
2
(
Figure 5).
Kalman first posed the problem of estimating the
state of
y
n
y
n
from the sequence of measurements
x
n
=
x
0
⋮
x
n
x
n
x
0
⋮
x
n
To derive the Kalman filter we will call upon the Gauss-Markov
Theorem.
First note that the conditional distribution of
y
n
y
n
given
x
n
x
n
is Gaussian:
y
n
|
x
n
∼𝒩y^
n
|
n
P
n
|
n
y
n
|
x
n
y
n
|
n
P
n
|
n
Where
y^
n
|
n
y
n
|
n
is the conditional mean and
P
n
|
n
P
n
|
n
is the covariance.
We know that this is the
form of the conditional distribution because
y
n
y
n
and
x
n
x
n
are jointly Gaussian
distributed.
y
n
|
x
n
∼𝒩y^
n
|
n
P
n
|
n
y
n
|
x
n
y
n
|
n
P
n
|
n
where
y
n
y
n
is the signal samples
y
n
,
…
,
y
n
-
p
+
1
y
n
,
…
,
y
n
-
p
+
1
,
x
n
x
n
is the observations/measurements
x
n
,
…
,
x
n
-
p
+
1
x
n
,
…
,
x
n
-
p
+
1
, and
y^
n
|
n
y
n
|
n
is the best (minimum MSE) estimator of
y
n
y
n
given
x
n
x
n
.
This is all well and good, but we need to know what
the conditional mean and covariance are explicitly. So the
problem is now to find/compute
y^
n
|
n
y
n
|
n
and
P
n
|
n
P
n
|
n
. We can take advantage of the recursive state equation
to obtain a recursive procedure for this calculation. To begin,
consider the "predictor"
y^
n
|
n
-
1
y
n
|
n
-
1
:
y
n
|
x
n
-
1
∼𝒩y^
n
|
n
-
1
P
n
|
n
-
1
y
n
|
x
n
-
1
y
n
|
n
-
1
P
n
|
n
-
1
Where
y
n
y
n
is the signal samples,
y
n
…
y
n
-
p
+
1
y
n
…
y
n
-
p
+
1
,
x
n
-
1
x
n
-
1
is the observations
x
n
-
1
…
x
n
-
p
x
n
-
1
…
x
n
-
p
, and
y^
n
|
n
-
1
y
n
|
n
-
1
is the best min MSE estimator of
y
n
y
n
given
x
n
-
1
x
n
-
1
. Although we don't know what forms
y^
n
|
n
-
1
y
n
|
n
-
1
and
P
n
|
n
-
1
P
n
|
n
-
1
have, we do know two important facts:
- The predictor
y^
n
|
n
-
1
y
n
|
n
-
1
acts as a sufficient statistic for
y
n
y
n
. That is, we can replace
x
n
-
1
x
n
-
1
(the data) with
y^
n
|
n
-
1
y
n
|
n
-
1
(the predictor). In other words, all the
relevant information in
x
n
-
1
x
n
-
1
pertaining to
y
n
y
n
is summarized by the predictor
y^
n
|
n
-
1
y
n
|
n
-
1
, which is, of course, a function of
x
n
-
1
x
n
-
1
.
-
The predictor
y^
n
|
n
-
1
y
n
|
n
-
1
and the prediction error
e
n
|
n
-
1
=
y
n
−y^
n
|
n
-
1
e
n
|
n
-
1
y
n
y
n
|
n
-
1
are orthogonal (the orthogonality
principle of minimum MSE estimators
⇒
(Ey^
n
|
n
-
1
e
n
|
n
-
1
T=0)⇒
y
n
|
n
-
1
e
n
|
n
-
1
0
error is orthogonal to estimator).
Moreover,
y
n
=y^
n
|
n
-
1
+
e
n
|
n
-
1
y
n
y
n
|
n
-
1
e
n
|
n
-
1
(20)
Since all quantities are zero-mean,
e
n
|
n
-
1
∼𝒩0
P
n
|
n
-
1
e
n
|
n
-
1
0
P
n
|
n
-
1
where
P
n
|
n
-
1
P
n
|
n
-
1
is the covariance of
y
n
|
x
n
-
1
y
n
|
x
n
-
1
and "variability" of
y
n
y
n
about the predictor
y^
n
|
n
-
1
y
n
|
n
-
1
. Therefore,
y^
n
|
n
-
1
∼𝒩0
R
n
−
P
n
|
n
-
1
y
n
|
n
-
1
0
R
n
P
n
|
n
-
1
Where
R
n
R
n
is the covariance of
Y
n
Y
n
. Now suppose that we have the predictor
y^
n
|
n
-
1
y
n
|
n
-
1
computed and a new measurement is made:
x
n
=CT
y
n
+
w
n
=CT(y^
n
|
n
-
1
+
e
n
|
n
-
1
)+
w
n
x
n
C
y
n
w
n
C
y
n
|
n
-
1
e
n
|
n
-
1
w
n
(21)
y^
n
|
n
-
1
y
n
|
n
-
1
,
e
n
|
n
-
1
e
n
|
n
-
1
, and
w
n
w
n
are all orthogonal.
We can express all relevant quantities in the
matrix equation
y
n
y^
n
|
n
-
1
x
n
=(
II0
0I0
CTCT1
)
e
n
|
n
-
1
y^
n
|
n
-
1
w
n
y
n
y
n
|
n
-
1
x
n
I
I
0
0
I
0
C
C
1
e
n
|
n
-
1
y
n
|
n
-
1
w
n
(22)
Now because of the orthogonality, the covariance
E
e
n
|
n
-
1
y^
n
|
n
-
1
w
n
e
n
|
n
-
1
y^
n
|
n
-
1
w
n
T=(
P
n
|
n
-
1
00
0
R
n
−
P
n
|
n
-
1
0
00
σ
w
2
)
e
n
|
n
-
1
y
n
|
n
-
1
w
n
e
n
|
n
-
1
y
n
|
n
-
1
w
n
P
n
|
n
-
1
0
0
0
R
n
P
n
|
n
-
1
0
0
0
σ
w
2
(23)
Combining this with the matrix
Equation 22 shows that
E
y
n
y^
n
|
n
-
1
x
n
y
n
y^
n
|
n
-
1
x
n
T=(
R
n
P^
n
|
n
-
1
R
n
C
P^
n
|
n
-
1
⋮…
CT
R
n
⋮
S
n
)
y
n
y
n
|
n
-
1
x
n
y
n
y
n
|
n
-
1
x
n
R
n
P
n
|
n
-
1
R
n
C
P
n
|
n
-
1
⋮
…
C
R
n
⋮
S
n
(24)
y
n
y^
n
|
n
-
1
x
n
T
y
n
y
n
|
n
-
1
x
n
are jointly Gaussian with the covariance in
Equation 24 and means zero.
Where
P^
n
|
n
-
1
=
R
n
−P^
n
|
n
-
1
P
n
|
n
-
1
R
n
P
n
|
n
-
1
(25)
S
n
=(
P^
n
|
n
-
1
P^
n
|
n
-
1
C
CTP^
n
|
n
-
1
σ
w
2
)
S
n
P
n
|
n
-
1
P
n
|
n
-
1
C
C
P
n
|
n
-
1
σ
w
2
(26)
We now have all the quantities necessary to compute our
recursive estimator using the Gauss-Markov Theorem.
We will now derive a recursion for conditional
distribution of
y
n
y
n
given
y^
n
|
n
-
1
y
n
|
n
-
1
(best estimate based on past observations) and
x
n
x
n
(current observation). We know that
y
n
|
(
y^
n
|
n
-
1
,
x
n
)
y
n
|
(
y
n
|
n
-
1
,
x
n
)
is Gaussian (since all quantities are jointly
Gaussian). Let's denote this conditional distribution by
y
n
|
(
y^
n
|
n
-
1
,
x
n
)
∼𝒩y^
n
|
n
P
n
|
n
y
n
|
(
y
n
|
n
-
1
,
x
n
)
y
n
|
n
P
n
|
n
Applying the Gauss-Markov Theorem we find
y^
n
|
n
=P^
n
|
n
-
1
R
n
CT
S
n
-1y^
n
|
n
-
1
x
n
y
n
|
n
P
n
|
n
-
1
R
n
C
S
n
y
n
|
n
-
1
x
n
(27)
which is the best estimator of
y
n
y
n
given
y^
n
|
n
-
1
y
n
|
n
-
1
and
x
n
x
n
. The inverse of
S
n
S
n
is given by
S
n
-1=(
P^
n
|
n
-
1
-10
00
)+
γ
n
-1−C1−CT1T
S
n
P
n
|
n
-
1
0
0
0
γ
n
C
1
C
1
(28)
where
γ
n
-1=
σ
n
2−CT(
R
n
−
P
n
|
n
-
1
)C=CT
P
n
|
n
-
1
C+
σ
w
2
γ
n
σ
n
2
C
R
n
P
n
|
n
-
1
C
C
P
n
|
n
-
1
C
σ
w
2
(29)
σ
n
2=CT
R
n
C+
σ
w
2
σ
n
2
C
R
n
C
σ
w
2
Substituting this inverse formula into
Equation 27 yields
y^
n
|
n
=y^
n
|
n
-
1
+
P
n
|
n
-
1
C
γ
n
-1(
x
n
−CTy^
n
|
n
-
1
)
y
n
|
n
y
n
|
n
-
1
P
n
|
n
-
1
C
γ
n
x
n
C
y
n
|
n
-
1
(30)
The Gauss-Markov Theorem also gives us an expression for
P
n
|
n
P
n
|
n
:
P
n
|
n
=
R
n
−P^
n
|
n
-
1
R
n
CT
S
n
-1P^
n
|
n
-
1
CT
R
n
P
n
|
n
R
n
P
n
|
n
-
1
R
n
C
S
n
P
n
|
n
-
1
C
R
n
(31)
and upon substituting
Equation 27 for
S
n
-1
S
n
we get
P
n
|
n
=
P
n
|
n
-
1
−
γ
n
-1
P
n
|
n
-
1
CCT
P
n
|
n
-
1
P
n
|
n
P
n
|
n
-
1
γ
n
P
n
|
n
-
1
C
C
P
n
|
n
-
1
(32)
Note that both expressions contain the quantity
K
n
=
P
n
|
n
-
1
C
γ
n
-1
K
n
P
n
|
n
-
1
C
γ
n
(33)
which is the so-called
Kalman gain.
Using the Kalman gain, the Kalman
recursions are given by
y^
n
|
n
=y^
n
|
n
-
1
+
K
n
(
x
n
−CTy^
n
|
n
-
1
)
y
n
|
n
y
n
|
n
-
1
K
n
x
n
C
y
n
|
n
-
1
(34)
P
n
|
n
=
P
n
|
n
-
1
−
γ
n
K
n
K
n
T
P
n
|
n
P
n
|
n
-
1
γ
n
K
n
K
n
(35)
The recursions are complete except for definitions of
y^
n
|
n
-
1
y
n
|
n
-
1
and
P
n
|
n
-
1
P
n
|
n
-
1
.
y^
n
|
n
-
1
=E
y
n
|
x
n−1
=E
A
y
n−1
+b
u
n
-
1
|
x
n−1
=Ay^
n
-
1
|
n
-
1
y
n
|
n
-
1
y
n
|
x
n
1
A
y
n
1
b
u
n
-
1
|
x
n
1
A
y
n
-
1
|
n
-
1
(36)
P
n
|
n
-
1
=E(
y
n
−y^
n
|
n
-
1
)(
y
n
−y^
n
|
n
-
1
)T=A
P
n
-
1
|
n
-
1
AT+
σ
n
2bbT
P
n
|
n
-
1
y
n
y
n
|
n
-
1
y
n
y
n
|
n
-
1
A
P
n
-
1
|
n
-
1
A
σ
n
2
b
b
(37)
Now we can summarize the
Kalman filter:
Measurements/observation model:
x
n
=
y
n
+
w
n
x
n
y
n
w
n
w
n
∼𝒩0
σ
w
2
w
n
0
σ
w
2
(
C=1
C
1
).
y
n
+
1
=a
y
n
+
u
n
y
n
+
1
a
y
n
u
n
Where
y
n
+
1
y
n
+
1
is a time-varying voltage,
u
n
∼𝒩0
σ
n
2
u
n
0
σ
n
2
, and
σ
u
=0.1
σ
u
0.1
.
(a=0.99)⇒highly correlated process
a
0.99
highly correlated process
. (
A=a
A
a
,
b=1
b
1
).
-
P
n
|
n
-
1
=a2
P
n
-
1
|
n
-
1
+
σ
n
2
P
n
|
n
-
1
a
2
P
n
-
1
|
n
-
1
σ
n
2
(
qn
q
n
in MATLAB code)
-
γ
n
-1=
P
n
|
n
-
1
+
σ
w
2
γ
n
P
n
|
n
-
1
σ
w
2
(
gn
g
n
in MATLAB code)
-
P
n
|
n
=
P
n
|
n
-
1
−
γ
n
-1
P
n
|
n
-
1
2
P
n
|
n
P
n
|
n
-
1
γ
n
P
n
|
n
-
1
2
(
pn
p
n
in MATLAB code)
-
K
n
=
P
n
|
n
-
1
γ
n
-1
K
n
P
n
|
n
-
1
γ
n
(
kn
k
n
in MATLAB code)
-
y^
n
|
n
-
1
=ay^
n
-
1
|
n
-
1
y
n
|
n
-
1
a
y
n
-
1
|
n
-
1
(
pyn
py
n
in MATLAB code)
-
y^
n
|
n
=y^
n
|
n
-
1
+
K
n
(
x
n
−y^
n
|
n
-
1
)
y
n
|
n
y
n
|
n
-
1
K
n
x
n
y
n
|
n
-
1
(
eyn
ey
n
in MATLAB code)
Initialization:
ey1=0
ey
1
0
,
q1=
σ
u
2
q
1
σ
u
2