The mean-squared error for any estimate of
a nonrandom parameter has a lower bound, the
Cramér-Rao Bound (Cramér (1946) pp. 474-477), which
defines the ultimate accuracy of any
estimation procedure. This lower bound, as shown later, is
intimately related to the maximum likelihood estimator.
We seek a "bound" on the mean-squared error matrix MM defined to be
M=E
θ
̂-θ
θ
̂-θT=EεεT
M
θ
θ
θ
θ
ε
ε
A matrix is "lower bounded" by a second matrix if the difference
between the two is a non-negative definite matrix. Define the
column matrix xx to be
x=
θ
̂-θ-bθ∇θlnpr|θr
x
θ
θ
b
θ
θ
p
r
θ
r
where
bθ
b
θ
denotes the column matrix of estimator biases. To derive the
Cramér-Rao bound, evaluate
ExxT
x
x
.
ExxT=M-bbTI+∇θbI+∇θbTF
x
x
M
b
b
I
θ
b
I
θ
b
F
where
∇θb
θ
b
represents the matrix of partial derivatives of the bias
∂∂
θ
j
b
i
θ
j
b
i
and the matrix FF is
the Fisher information matrix
F=E∇θlnpr|θr∇θlnpr|θrT
F
θ
p
r
θ
r
θ
p
r
θ
r
(1)
Note that this matrix can alternatively be expressed as
F=-E
∇
θ
∇
θ
T
lnpr|θr
F
∇
θ
∇
θ
T
p
r
θ
r
The notation
∇
θ
∇
θ
T
∇
θ
∇
θ
T
means the matrix of all second partials of the quantity it
operates on (the gradient of the gradient). The matrix is known
as the
Hessian. Demonstrating the equivalence of
these two forms for the Fisher information is quite
easy. Because
∫pr|θrdr=1
r
p
r
θ
r
1
for all choices of the parameter vector, the gradient of the
expression equals zero. Furthermore,
∇θlnpr|θr=∇θpr|θrpr|θr
θ
p
r
θ
r
θ
p
r
θ
r
p
r
θ
r
. Combining these results yields
∫∇θlnpr|θrpr|θrdr=0
r
θ
p
r
θ
r
p
r
θ
r
0
Evaluating the gradient for this quantity (using the chain rule)
also yields zero.
∫
∇
θ
∇
θ
T
lnpr|θrpr|θr+∇θlnpr|θr∇θlnpr|θrTpr|θrdr=0
r
∇
θ
∇
θ
T
p
r
θ
r
p
r
θ
r
θ
p
r
θ
r
θ
p
r
θ
r
p
r
θ
r
0
or
E∇θlnpr|θr∇θlnpr|θrT=-E
∇
θ
∇
θ
T
lnpr|θr
θ
p
r
θ
r
θ
p
r
θ
r
∇
θ
∇
θ
T
p
r
θ
r
Calculating the expected value for the Hessian for is somewhat
easier than finding the expected value of the outer product of
the gradient with itself. In the scalar case, we have
E∂∂θlnpr|θr2=-E∂2∂θ2lnpr|θr
θ
p
r
θ
r
2
θ
2
2
p
r
θ
r
Returning to the derivation, the matrix
ExxT
x
x
is non-negative definite because it is a correlation
matrix. Thus, for any column matrix,
αα, the quadratic form
αTExxTα
α
x
x
α
is non-negative. Choose a form for
αα that simplifies the
quadratic form. A convenient choice is
α=β-F-1I+∇θbTβ
α
β
F
I
θ
b
β
where ββ is an arbitrary
column matrix. The quadratic form becomes in this case
αTExxTα=βTM-bbT-I+∇θbF-1I+∇θbTβ
α
x
x
α
β
M
b
b
I
θ
b
F
I
θ
b
β
As this quadratic form must be non-negative, the
matrix expression enclosed in brackets must be non-negative
definite. We thus obtain the well-known Cramér-Rao bound
on the mean-square error matrix.
EεεT≥bθbθT+I+∇θbF-1I+∇θbT
ε
ε
b
θ
b
θ
I
θ
b
F
I
θ
b
(2)
This form for the Cramér-Rao Bound does
not mean that each term in the matrix of
squared errors is greater than the corresponding term in the
bounding matrix. As stated earlier, this expression means that
the difference between these matrices is non-negative
definite. For a matrix to be non-negative definite, each term on
the main diagonal must be non-negative. The elements of the main
diagonal of
EεεT
ε
ε
are the squared errors of the estimate of the individual
parameters. Thus, for each parameter, the mean-squared
estimation error can be no smaller than
Eθ̂i-
θ
i
2≥
b
i
2θ+I+∇θbF-1I+∇θbTii
θ
i
θ
i
2
b
i
θ
2
I
θ
b
F
I
θ
b
i
i
This bound simplifies greatly if the estimator is unbiased (
b=0
b
0
).
In this case, the Cramér-Rao bound becomes
Eθ̂i-
θ
i
2≥F-1ii
θ
i
θ
i
2
F
i
i
Thus, the mean-squared error for each parameter in a
multiple-parameter, unbiased-estimator problem can be no smaller
than the corresponding diagonal term in the
inverse for the Fisher information
matrix. In such problems, the estimate's error characteristics
of any parameter become intertwined with the other parameters in
a complicated way. Any estimator satisfying the Cramér-Rao
bound with equality is said to be
efficient.
Let's evaluate the Cramér-Rao bound for the example we
have been discussing: the estimation of the mean and variance
of a length LL sequence of
statistically independent Gaussian random variables. Let the
estimate of the mean
θ
1
θ
1
be the sample average
θ̂1=∑rlL
θ
1
r
l
L
; as shown in the last example, this estimate is
unbiased. Let the estimate of the variance
θ
2
θ
2
be the unbiased estimate
θ̂2=∑rl-θ̂12L-1
θ
2
r
l
θ
1
2
L
1
. Each term in the Fisher information matrix
FF is given by the
expected value of the paired products of derivatives of the
logarithm of the likelihood function.
Fij=E∂∂
θ
i
lnpr|θr∂∂
θ
j
lnpr|θr
F
i
j
θ
i
p
r
θ
r
θ
j
p
r
θ
r
The logarithm of the likelihood function is
lnpr|θr=-L2ln2π
θ
2
-12
θ
2
∑l=0L-1rl-
θ
1
2
p
r
θ
r
L
2
2
θ
2
1
2
θ
2
l
0
L
1
r
l
θ
1
2
its partial derivatives are
∂∂
θ
1
lnpr|θr=1
θ
2
∑l=0L-1rl-
θ
1
θ
1
p
r
θ
r
1
θ
2
l
0
L
1
r
l
θ
1
(3)
∂∂
θ
2
lnpr|θr=-L2
θ
2
+12
θ
2
2∑l=0L-1rl-
θ
1
2
θ
2
p
r
θ
r
L
2
θ
2
1
2
θ
2
2
l
0
L
1
r
l
θ
1
2
(4)
and its second partials are
∂2∂
θ
1
2lnpr|θr=-L
θ
2
θ
1
2
2
p
r
θ
r
L
θ
2
∂2∂
θ
1
∂
θ
2
lnpr|θr=-1
θ
2
2∑l=0L-1rl-
θ
1
θ
1
θ
2
2
p
r
θ
r
1
θ
2
2
l
0
L
1
r
l
θ
1
∂2∂
θ
2
∂
θ
1
lnpr|θr=-1
θ
2
2∑l=0L-1rl-
θ
1
θ
2
θ
1
2
p
r
θ
r
1
θ
2
2
l
0
L
1
r
l
θ
1
∂2∂
θ
2
2lnpr|θr=L2
θ
2
2-1
θ
2
3∑l=0L-1rl-
θ
1
2
θ
2
2
2
p
r
θ
r
L
2
θ
2
2
1
θ
2
3
l
0
L
1
r
l
θ
1
2
The Fisher information matrix has the surprisingly simple form
F=L
θ
2
00L2
θ
2
2
F
L
θ
2
0
0
L
2
θ
2
2
its inverse is also a diagonal matrix with the elements on the
main diagonal equalling the reciprocal of those in the
original matrix. Because of the zero-values off-diagonal
entries in the Fisher information matrix, the errors between
the corresponding estimates are not inter-dependent. In this
problem, the mean-square estimation error can be no smaller
than
Eθ̂1-
θ
1
2≥
θ
2
L
θ
1
θ
1
2
θ
2
L
Eθ̂2-
θ
2
2≥2
θ
2
2L
θ
2
θ
2
2
2
θ
2
2
L
Note that nowhere in the preceding example
did the form of the estimator enter into the computation of the
bound. The only quantity used in the computation of the
Cramér-Rao bound is the logarithm of the likelihood
function, which is a consequence of the problem statement, not
how it is solved. Only in the case of unbiased
estimators is the bound independent of the estimators
used. Because of this property, the Cramér-Rao
bound is frequently used to assess the performance limits that
can be obtained with an unbiased estimator in a particular
problem. When bias is present, the exact form of the estimator's
bias explicitly enters the computation of the bound. All too
frequently, the unbiased form is used in situations where the
existence of an unbiased estimator can be
questioned. As we shall see, one such problem is time delay
estimation, presumably of some importance to the reader. This
misapplication of the unbiased Cramér-Rao arises from
desperation: the estimator is so complicated and nonlinear that
computing the bias is nearly impossible. As shown in this problem, biased
estimators can yield mean-squared error smaller as well as
larger than the unbiased version of the Cramér-Rao
bound. Consequently, desperation can yield misinterpretation
when a general result is misapplied.
In the single-parameter estimation problem, the
Cramér-Rao bound incorporating bias has the well-known
form
Eε2≥b2+1+ddθb2E∂∂θpr|θr2
ε
2
b
2
1
θ
b
2
θ
p
r
θ
r
2
(5)
Note that the sign of the bias's derivative determines whether
this bound is larger or potentially smaller than the unbiased
version, which is obtained by setting the bias term to zero.
An interesting question arises: when, if ever, is the bound
satisfied with equality? Recalling the details of the
derivation of the bound, equality results when the quantity
EαTxxTα
α
x
x
α
equals zero. As this quantity is the expected value of the
square of
αTx
α
x
, it can only equal zero if
αTx=0
α
x
0
. Substituting in the form of the column matrices
αα and xx, equality in the
Cramér-Rao bound results whenever
∇θlnpr|θr=I+∇θbT-1F
θ
̂r-θ-b
θ
p
r
θ
r
I
θ
b
F
θ
r
θ
b
(6)
This complicated expression means that only if estimation
problems (as expressed by the
a priori
density have the form of the right side of this equation can
the mean-squared error equal the Cramér-Rao bound. In
particular, the gradient of the log likelihood function can
only depend on the observations through
the estimator. In all other problems, the Cramér-Rao
bound is a lower bound but not a tight one
no estimator can have error
characteristics that equal it. In such cases, we have limited
insight into ultimate limitations on estimation error size
with the Cramér-Rao bound. However, consider the case
where the estimator is unbiased (
b=0
b
0
). In addition, note the maximum likelihood estimate
occurs when the gradient of the logarithm of the likelihood
function equals zero:
∇θlnpr|θr=0
θ
p
r
θ
r
0
when
θ=θ̂ML
θ
θ
ML
. In this case, the condition for equality in the
Cramér-Rao bound becomes
F
θ
̂-θ̂ML=0
F
θ
θ
ML
0
As the Fisher information matrix is positive-definite, we
conclude that if the estimator equals the maximum likelihood
estimator, equality in the Cramér-Rao bound
can be satisfied with equality,
only the maximum likelihood estimate will
achieve it. To use estimation theoretic terminology,
if an efficient estimate exists, it is the maximum
likelihood estimate. This result stresses the
importance of maximum likelihood estimates, despite the
seemingly
ad hoc manner by which they are
defined.
Consider the Gaussian example being examined so frequently
in this section. The components of the gradient of the
logarithm of the likelihood function were given earlier by
Equation 3 and Equation 4. These
expressions can be rearranged to reveal
∂∂
θ
1
lnpr|θr∂∂
θ
2
lnpr|θr=L
θ
2
1L∑lrl-
θ
1
-L