In terms of the densities involved in scalar random-parameter
problems, the mean-squared error is given by
$$E[\epsilon^2] = \iint (\theta - \hat{\theta})^2 \, p_{\mathbf{r},\theta}(\mathbf{r},\theta) \, d\mathbf{r} \, d\theta \tag{1}$$
where
$p_{\mathbf{r},\theta}(\mathbf{r},\theta)$
is the joint density of the observations and the parameter. To
minimize this integral with respect to
$\hat{\theta}$, we rewrite it using the laws of conditional
probability as
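$$E[\epsilon^2] = \int p_{\mathbf{r}}(\mathbf{r}) \left( \int [\theta - \hat{\theta}(\mathbf{r})]^2 \, p_{\theta|\mathbf{r}}(\theta|\mathbf{r}) \, d\theta \right) d\mathbf{r} \tag{2}$$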
The density
$p_{\mathbf{r}}(\cdot)$
is nonnegative. To minimize the mean-squared error, we must
minimize the inner integral for each value of
$\mathbf{r}$ because the integral is weighted
by a positive quantity. We focus attention on the inner
integral, which is the conditional expected value of the squared
estimation error. The condition, a fixed value of
$\mathbf{r}$, implies that we seek the
constant $\hat{\theta}(\mathbf{r})$, derived from
$\mathbf{r}$, that minimizes the second moment
of the random parameter $\theta$ about that constant. A
well-known result from probability theory states that the
minimum of
$E[(x - c)^2]$ occurs when the constant $c$
equals the expected value of the random variable $x$
(see
Expected Values of Probability
Functions). The inner integral and thereby the
mean-squared error is minimized by choosing the estimator to be
the conditional expected value of the parameter given the
observations.
$$\hat{\theta}_{\mathrm{MMSE}}(\mathbf{r}) = E[\theta \mid \mathbf{r}] \tag{3}$$
Thus, a parameter's minimum mean-squared error (MMSE) estimate
is the parameter's
a posteriori (after the
observations have been obtained) expected value.
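The probability result quoted above, that $E[(x - c)^2]$ is smallest when $c$ equals the mean of $x$, can be checked numerically. The following is a minimal sketch; the gamma distribution, the sample size, and the search grid are arbitrary choices made for illustration and do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples standing in for a random variable x (distribution chosen arbitrarily).
x = rng.gamma(shape=2.0, scale=1.5, size=100_000)

def mean_squared_deviation(c):
    """Sample estimate of E[(x - c)^2] for a constant c."""
    return np.mean((x - c) ** 2)

# Evaluate the criterion on a grid of candidate constants (spacing 0.01).
candidates = np.linspace(0.0, 8.0, 801)
errors = np.array([mean_squared_deviation(c) for c in candidates])
best_c = candidates[np.argmin(errors)]

print("sample mean:   ", x.mean())
print("grid minimizer:", best_c)
```

The grid minimizer should agree with the sample mean to within the grid spacing, because the criterion is a parabola in $c$ centered on the mean.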
The associated conditional probability density
$p_{\theta|\mathbf{r}}(\theta|\mathbf{r})$
is not often directly stated in a problem definition and must
somehow be derived. In many applications, the likelihood
function
$p_{\mathbf{r}|\theta}(\mathbf{r}|\theta)$
and the a priori density of the parameter are
a direct consequence of the problem statement. These densities
can be used to find the joint density of the observations and
the parameter, enabling us to use Bayes's Rule to find the
a posteriori density if
we knew the unconditional probability density of the
observations.
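$$p_{\theta|\mathbf{r}}(\theta|\mathbf{r}) = \frac{p_{\mathbf{r}|\theta}(\mathbf{r}|\theta) \, p_{\theta}(\theta)}{p_{\mathbf{r}}(\mathbf{r})} \tag{4}$$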
This density
$p_{\mathbf{r}}(\mathbf{r})$
is often difficult to determine. Be that as it may, to find the
a posteriori conditional expected value, it
need not be known. The numerator entirely expresses the
a posteriori density's dependence on
$\theta$; the denominator serves only
as the scaling factor that yields a unit-area quantity. The expected
value is the center of mass of the probability density and does
not depend directly on the "weight" of the
density, bypassing calculation of the scaling factor. When this
shortcut cannot be exploited, the
MMSE estimate can be exceedingly difficult to compute.
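The center-of-mass remark can be illustrated numerically: the conditional mean computed from a function that is only proportional to the a posteriori density is unchanged by any rescaling of that function. The sketch below is an illustration under stated assumptions, not a problem from the text: it uses a single observation $r = \theta + n$ with unit-variance Gaussian noise and a Laplacian prior on $\theta$, and the function names are hypothetical.

```python
import numpy as np

def posterior_mean(unnormalized_density, theta_grid):
    """Center of mass of a density known only up to a scale factor.

    On a uniform grid the spacing cancels in the ratio of the two sums.
    """
    w = unnormalized_density(theta_grid)
    return np.sum(theta_grid * w) / np.sum(w)

# Assumed example: one observation with Gaussian noise, Laplacian prior.
r = 1.3

def numerator(theta):
    likelihood = np.exp(-0.5 * (r - theta) ** 2)   # p(r | theta), constants dropped
    prior = np.exp(-np.abs(theta))                 # p(theta), constants dropped
    return likelihood * prior

theta = np.linspace(-10.0, 10.0, 20001)
m1 = posterior_mean(numerator, theta)
m2 = posterior_mean(lambda t: 1e6 * numerator(t), theta)  # arbitrary rescaling

print(m1, m2)
```

Both calls return the same value, since the scale factor appears in the numerator and the denominator of the ratio.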
Let $L$ statistically independent
observations be obtained, each of which is expressed by
$r_l = \theta + n_l$.
Each $n_l$ is a Gaussian random variable having zero mean and variance
$\sigma_n^2$. Thus, the unknown parameter in this problem is the
mean of the observations. Assume it to be a Gaussian random
variable a priori (mean $m_\theta$ and variance $\sigma_\theta^2$).
The likelihood function is easily found to be
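$$p_{\mathbf{r}|\theta}(\mathbf{r}|\theta) = \prod_{l=0}^{L-1} \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left\{ -\frac{1}{2} \left( \frac{r_l - \theta}{\sigma_n} \right)^2 \right\} \tag{5}$$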
so that the
a posteriori density is given by
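$$p_{\theta|\mathbf{r}}(\theta|\mathbf{r}) = \frac{\dfrac{1}{\sqrt{2\pi\sigma_\theta^2}} \exp\left\{ -\dfrac{1}{2} \left( \dfrac{\theta - m_\theta}{\sigma_\theta} \right)^2 \right\} \displaystyle\prod_{l=0}^{L-1} \dfrac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left\{ -\dfrac{1}{2} \left( \dfrac{r_l - \theta}{\sigma_n} \right)^2 \right\}}{p_{\mathbf{r}}(\mathbf{r})} \tag{6}$$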
In an attempt to find the expected value of this distribution,
lump all terms that do not depend
explicitly on the quantity
$\theta$
into a proportionality term.
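$$p_{\theta|\mathbf{r}}(\theta|\mathbf{r}) \propto \exp\left\{ -\frac{1}{2} \left[ \frac{\sum_{l} (r_l - \theta)^2}{\sigma_n^2} + \frac{(\theta - m_\theta)^2}{\sigma_\theta^2} \right] \right\} \tag{7}$$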
After some manipulation, this expression can be written as
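$$p_{\theta|\mathbf{r}}(\theta|\mathbf{r}) \propto \exp\left\{ -\frac{1}{2\sigma^2} \left[ \theta - \sigma^2 \left( \frac{m_\theta}{\sigma_\theta^2} + \frac{\sum_{l} r_l}{\sigma_n^2} \right) \right]^2 \right\} \tag{8}$$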
where
$\sigma^2$ is a quantity that succinctly expresses the ratio
$\frac{\sigma_n^2 \sigma_\theta^2}{\sigma_n^2 + L \sigma_\theta^2}$. The form of the
a posteriori
density suggests that it too is Gaussian; its mean, and
therefore the MMSE estimate of
$\theta$, is given by
$$\hat{\theta}_{\mathrm{MMSE}}(\mathbf{r}) = \sigma^2 \left( \frac{m_\theta}{\sigma_\theta^2} + \frac{\sum_{l} r_l}{\sigma_n^2} \right) \tag{9}$$
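The manipulation that leads from equation (7) to equation (8) is a completion of the square in $\theta$. One way to carry it out is sketched below; terms that do not depend on $\theta$ are absorbed into the proportionality constant, and all sums run over $l = 0, \dots, L-1$.

```latex
\begin{align*}
\frac{\sum_{l} (r_l - \theta)^2}{\sigma_n^2} + \frac{(\theta - m_\theta)^2}{\sigma_\theta^2}
  &= \theta^2 \left( \frac{L}{\sigma_n^2} + \frac{1}{\sigma_\theta^2} \right)
     - 2\theta \left( \frac{\sum_{l} r_l}{\sigma_n^2} + \frac{m_\theta}{\sigma_\theta^2} \right)
     + \text{(terms independent of } \theta\text{)} \\
  &= \frac{1}{\sigma^2} \left[ \theta - \sigma^2 \left( \frac{m_\theta}{\sigma_\theta^2}
     + \frac{\sum_{l} r_l}{\sigma_n^2} \right) \right]^2
     + \text{(terms independent of } \theta\text{)},
\end{align*}
\qquad \text{where} \quad
\frac{1}{\sigma^2} = \frac{L}{\sigma_n^2} + \frac{1}{\sigma_\theta^2}
\;\Longleftrightarrow\;
\sigma^2 = \frac{\sigma_n^2 \sigma_\theta^2}{\sigma_n^2 + L \sigma_\theta^2}.
```

Multiplying by $-\frac{1}{2}$ and exponentiating gives a Gaussian form in $\theta$ with variance $\sigma^2$ and the mean stated in equation (9).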
More insight into the nature of this estimate is gained by
rewriting it as
$$\hat{\theta}_{\mathrm{MMSE}}(\mathbf{r}) = \frac{\sigma_n^2 / L}{\sigma_\theta^2 + \sigma_n^2 / L} \, m_\theta + \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_n^2 / L} \cdot \frac{1}{L} \sum_{l=0}^{L-1} r_l \tag{10}$$
The term
$\sigma_n^2 / L$
is the variance of the averaged observations for a given value
of $\theta$; it expresses the
squared error encountered in estimating the mean by simple
averaging. If this error is much greater than the
a priori variance of $\theta$
($\sigma_n^2 / L \gg \sigma_\theta^2$), implying that the observations are noisier than
the variation of the parameter, the MMSE estimate ignores the
observations and tends to yield the
a priori mean $m_\theta$
as its value. If the averaged observations are less variable
than the parameter, the second term dominates, and the average
of the observations is the estimate's value. The estimate's
behavior between these extremes is intuitive. The
detailed form of the estimate indicates how the squared error
can be minimized by a linear combination of these extreme
estimates.
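The two forms of the estimate, equations (9) and (10), and the two limiting behaviors described above can be sketched in a few lines. The function names and all numerical values here are illustrative assumptions, not part of the text.

```python
import numpy as np

def mmse_estimate(r, m_theta, var_theta, var_n):
    """Equation (9): posterior mean for r_l = theta + n_l, Gaussian prior and noise."""
    L = len(r)
    var_post = var_n * var_theta / (var_n + L * var_theta)  # sigma^2
    return var_post * (m_theta / var_theta + np.sum(r) / var_n)

def mmse_estimate_weighted(r, m_theta, var_theta, var_n):
    """Equation (10): weighted combination of prior mean and sample average."""
    L = len(r)
    avg_var = var_n / L                      # variance of the averaged observations
    w_prior = avg_var / (var_theta + avg_var)
    w_data = var_theta / (var_theta + avg_var)
    return w_prior * m_theta + w_data * np.mean(r)

rng = np.random.default_rng(1)
theta_true = 2.0                             # assumed value for the demonstration
r = theta_true + rng.normal(0.0, 1.0, size=20)

a = mmse_estimate(r, m_theta=0.0, var_theta=4.0, var_n=1.0)
b = mmse_estimate_weighted(r, m_theta=0.0, var_theta=4.0, var_n=1.0)

# Extremes: very noisy observations versus nearly noise-free observations.
noisy = mmse_estimate_weighted(r, m_theta=0.0, var_theta=4.0, var_n=1e8)
clean = mmse_estimate_weighted(r, m_theta=0.0, var_theta=4.0, var_n=1e-8)

print("forms (9) and (10):", a, b)
print("noisy limit:", noisy, " clean limit:", clean, " sample mean:", np.mean(r))
```

With a very large assumed noise variance the estimate sits near the a priori mean; with a very small one it sits near the sample average.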
The conditional expected value of the estimate equals
$$E[\hat{\theta}_{\mathrm{MMSE}} \mid \theta] = \frac{\sigma_n^2 / L}{\sigma_\theta^2 + \sigma_n^2 / L} \, m_\theta + \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_n^2 / L} \, \theta \tag{11}$$
This estimate is biased because its expected value does not
equal the value of the sought-after parameter. It is
asymptotically unbiased as the squared measurement error
$\sigma_n^2 / L$
tends to zero as $L$ becomes
large. The consistency of the estimator is determined by
investigating the expected value of the squared error. Note
that the variance of the
a posteriori
density is the quantity
$\sigma^2$; as this quantity does not depend on
$\mathbf{r}$, it also equals the
unconditional variance. As the number of observations
increases, this variance tends to zero. In concert with the
estimate being asymptotically unbiased, the expected value of
the squared estimation error thus tends to zero, implying that we have
a consistent estimate.
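The claim that the mean-squared error equals $\sigma^2$ and shrinks as $L$ grows can be checked with a small Monte Carlo experiment. This is a sketch under assumed parameter values (prior mean 1, prior variance 2, noise variance 4, 50,000 trials); none of these numbers come from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
m_theta, var_theta, var_n = 1.0, 2.0, 4.0    # illustrative values only
trials = 50_000

def simulate(L):
    """Return (empirical mean-squared error, sigma^2) for L observations."""
    theta = rng.normal(m_theta, np.sqrt(var_theta), size=trials)
    noise = rng.normal(0.0, np.sqrt(var_n), size=(trials, L))
    r = theta[:, None] + noise                # r_l = theta + n_l, one row per trial
    var_post = var_n * var_theta / (var_n + L * var_theta)
    estimate = var_post * (m_theta / var_theta + r.sum(axis=1) / var_n)
    return np.mean((estimate - theta) ** 2), var_post

results = {L: simulate(L) for L in (1, 10, 100)}
for L, (mse, var_post) in results.items():
    print(f"L={L:4d}  empirical MSE={mse:.4f}  sigma^2={var_post:.4f}")
```

The empirical error should track $\sigma^2$ for each $L$, up to Monte Carlo fluctuation, and both should decrease toward zero as $L$ increases.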