When the a priori density of a parameter is
not known or the parameter itself is inconveniently described as
a random variable, techniques must be developed that make no
presumption about the relative possibilities of parameter
values. Lacking this knowledge, we can expect the error
characteristics of the resulting estimates to be worse than
those of estimates that can use it.
The maximum likelihood estimate $\hat{\theta}_{\mathrm{ML}}(\mathbf{r})$
of a nonrandom parameter is, simply, that value which
maximizes the likelihood function (the a
priori density of the observations). Assuming that the
maximum can be found by evaluating a derivative,
$\hat{\theta}_{\mathrm{ML}}(\mathbf{r})$ is defined by
$$\left.\frac{\partial}{\partial\theta}\,p(\mathbf{r}\mid\theta)\right|_{\theta=\hat{\theta}_{\mathrm{ML}}} = 0 \tag{1}$$
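Because the logarithm is monotonically increasing, the logarithm
of the likelihood function has its maximum at the same location
and may be used in this maximization instead; this is often more
convenient when the likelihood is a product of densities.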
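When the derivative in (1) cannot be set to zero in closed form, the root can be located numerically. The following is a minimal sketch, assuming the caller supplies the score (the derivative of the log-likelihood) and an interval on which it changes sign from positive to negative; the helper name `mle_by_bisection` is hypothetical, and bisection is just one simple root finder. For illustration it uses the score of the Gaussian-mean example developed below.

```python
from typing import Callable


def mle_by_bisection(score: Callable[[float], float], lo: float, hi: float,
                     tol: float = 1e-10, max_iter: int = 200) -> float:
    """Solve score(theta) = 0 on [lo, hi] by bisection.

    `score` is the derivative of the log-likelihood with respect to theta.
    Assumes score(lo) > 0 > score(hi), so the bracketed root marks a maximum.
    """
    if not (score(lo) > 0 > score(hi)):
        raise ValueError("score must be positive at lo and negative at hi")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid  # log-likelihood still increasing: maximum lies to the right
        else:
            hi = mid  # log-likelihood not increasing: maximum is at or to the left
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Illustration: score of the Gaussian-mean example below, with sigma_n^2 = 1.
data = [1.2, 0.7, 1.9, 1.4]
theta_hat = mle_by_bisection(lambda th: sum(r - th for r in data), -10.0, 10.0)
print(round(theta_hat, 6))  # 1.3 (the sample average)
```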
Let $r_l$, $l = 0, \dots, L-1$, be a sequence of independent, identically distributed Gaussian
random variables having an unknown mean
$\theta$ but a known variance $\sigma_n^2$. Often, we cannot assign a probability density to a
parameter of a random variable's density; we simply do not know
what the parameter's value is. Maximum likelihood estimates are
often used in such problems. In the specific case here, the
derivative of the logarithm of the likelihood function equals
$$\frac{\partial}{\partial\theta}\ln p(\mathbf{r}\mid\theta) = \frac{1}{\sigma_n^2}\sum_{l=0}^{L-1}\left(r_l-\theta\right)$$
The solution of this equation is the maximum likelihood
estimate, which equals the sample average.
$$\hat{\theta}_{\mathrm{ML}} = \frac{1}{L}\sum_{l=0}^{L-1} r_l$$
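As a quick sanity check of the two formulas above, the sketch below codes the score and the sample average and confirms that the score vanishes at the sample average, is positive below it, and is negative above it. The function names `gaussian_mean_score` and `ml_mean` are hypothetical, and the data are made up for illustration.

```python
from typing import Sequence


def gaussian_mean_score(r: Sequence[float], theta: float, sigma2: float) -> float:
    """d/dtheta ln p(r | theta) = (1 / sigma_n^2) * sum_l (r_l - theta)."""
    return sum(r_l - theta for r_l in r) / sigma2


def ml_mean(r: Sequence[float]) -> float:
    """Maximum likelihood estimate of the mean: the sample average."""
    return sum(r) / len(r)


r = [2.0, 3.5, 1.0, 4.5]
theta_ml = ml_mean(r)
print(theta_ml)                               # 2.75
print(gaussian_mean_score(r, theta_ml, 0.5))  # 0.0
```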
The expected value of this estimate, $E[\hat{\theta}_{\mathrm{ML}}\mid\theta]$,
equals the actual value $\theta$,
showing that the maximum likelihood estimate is unbiased. The
mean-squared error equals $\sigma_n^2/L$, which tends to zero as
$L \to \infty$; we infer that this estimate is consistent.
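The unbiasedness and the $\sigma_n^2/L$ mean-squared error can be checked empirically. This is a minimal Monte Carlo sketch, assuming Python's `random.Random.gauss` (which takes a standard deviation, so $\sigma_n$ rather than $\sigma_n^2$ is passed); the function `simulate_ml_mean` and the parameter values are hypothetical choices for illustration.

```python
import random
import statistics
from typing import Tuple


def simulate_ml_mean(theta: float, sigma_n: float, L: int,
                     trials: int, seed: int = 0) -> Tuple[float, float]:
    """Return (empirical mean, empirical MSE) of the sample-average estimate."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        r = [rng.gauss(theta, sigma_n) for _ in range(L)]  # L i.i.d. Gaussian samples
        estimates.append(sum(r) / L)                       # ML estimate = sample average
    mean_est = statistics.fmean(estimates)
    mse = statistics.fmean((e - theta) ** 2 for e in estimates)
    return mean_est, mse


theta, sigma_n = 1.5, 2.0
for L in (4, 16, 64):
    mean_est, mse = simulate_ml_mean(theta, sigma_n, L, trials=10000)
    print(f"L={L:3d}  mean={mean_est:.3f}  MSE={mse:.4f}  sigma_n^2/L={sigma_n**2 / L:.4f}")
```

The empirical mean should stay near $\theta$ for every $L$, while the empirical MSE should track $\sigma_n^2/L$ and shrink as $L$ grows.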
The maximum likelihood procedure (as well as the others being
discussed) can easily be generalized to situations where more
than one parameter must be estimated. Letting
$\boldsymbol{\theta}$ denote the parameter
vector, the likelihood function is now expressed as
$p(\mathbf{r}\mid\boldsymbol{\theta})$. The maximum likelihood estimate
$\hat{\boldsymbol{\theta}}_{\mathrm{ML}}$ of the parameter vector is given by the location of
the maximum of the likelihood function (or equivalently of its
logarithm). Using derivatives, the calculation of the maximum
likelihood estimate becomes
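$$\left.\nabla_{\boldsymbol{\theta}}\ln p(\mathbf{r}\mid\boldsymbol{\theta})\right|_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}_{\mathrm{ML}}} = \mathbf{0} \tag{2}$$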
where $\nabla_{\boldsymbol{\theta}}$ denotes the gradient with respect to the parameter
vector. This equation means that we must estimate all of the
parameters simultaneously by setting the partial derivative of
the log-likelihood function with respect to each parameter to
zero. Given $P$ parameters, we must in most cases solve a set of
$P$ nonlinear, simultaneous equations to find the maximum
likelihood estimates.
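One common way to solve the $P$ simultaneous equations in (2) numerically is Newton's method applied to the gradient, which needs the matrix of second partial derivatives (the Hessian) of the log-likelihood. The text does not prescribe a solver, so treat this as a sketch under that assumption: `solve_linear` and `newton_mle` are hypothetical helpers, the caller must supply both the gradient and the Hessian, and convergence is only expected from a starting point reasonably close to the maximum. As a test case it borrows the two-parameter Gaussian model treated next.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def solve_linear(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented copy of [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def newton_mle(grad: Callable[[Vector], Vector], hess: Callable[[Vector], Matrix],
               theta0: Vector, tol: float = 1e-10, max_iter: int = 50) -> Vector:
    """Solve grad(theta) = 0 by Newton's method: theta <- theta - H^{-1} grad."""
    theta = list(theta0)
    for _ in range(max_iter):
        step = solve_linear(hess(theta), grad(theta))
        theta = [t - s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta


# Test case: Gaussian with unknown mean th[0] and unknown variance th[1].
r = [1.0, 2.0, 3.0, 4.0, 5.0]
L = len(r)


def grad(th: Vector) -> Vector:
    s1 = sum(x - th[0] for x in r)
    s2 = sum((x - th[0]) ** 2 for x in r)
    return [s1 / th[1], -L / (2 * th[1]) + s2 / (2 * th[1] ** 2)]


def hess(th: Vector) -> Matrix:
    s1 = sum(x - th[0] for x in r)
    s2 = sum((x - th[0]) ** 2 for x in r)
    return [[-L / th[1], -s1 / th[1] ** 2],
            [-s1 / th[1] ** 2, L / (2 * th[1] ** 2) - s2 / th[1] ** 3]]


theta_hat = newton_mle(grad, hess, [2.5, 1.5])
print(theta_hat)  # approximately [3.0, 2.0]
```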
Let's extend the previous example to the situation where
neither the mean nor the variance of a sequence of independent
Gaussian random variables is known. The likelihood function
is, in this case,
$$p(\mathbf{r}\mid\boldsymbol{\theta}) = \prod_{l=0}^{L-1}\frac{1}{\sqrt{2\pi\theta_2}}\exp\left\{-\frac{1}{2\theta_2}\left(r_l-\theta_1\right)^2\right\}$$
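Taking the logarithm turns the product into a sum, which makes the partial derivatives straightforward: the first term depends only on $\theta_2$, and $\theta_1$ appears only inside the squared deviations.

$$\ln p(\mathbf{r}\mid\boldsymbol{\theta}) = -\frac{L}{2}\ln\left(2\pi\theta_2\right) - \frac{1}{2\theta_2}\sum_{l=0}^{L-1}\left(r_l-\theta_1\right)^2$$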
Evaluating the partial derivatives of the logarithm of this
quantity, we find the following set of two equations to solve
for $\theta_1$, representing the mean, and $\theta_2$, representing the variance.
$$\frac{1}{\theta_2}\sum_{l=0}^{L-1}\left(r_l-\theta_1\right) = 0$$

$$-\frac{L}{2\theta_2} + \frac{1}{2\theta_2^2}\sum_{l=0}^{L-1}\left(r_l-\theta_1\right)^2 = 0$$
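Because $\theta_2 > 0$, the first equation holds only when the deviations sum to zero, and multiplying the second equation by $2\theta_2^2$ leaves an expression linear in $\theta_2$. Substituting the value of $\theta_1$ obtained from the first equation into the second then gives the solution.

$$\sum_{l=0}^{L-1}\left(r_l-\theta_1\right) = 0, \qquad L\,\theta_2 = \sum_{l=0}^{L-1}\left(r_l-\theta_1\right)^2$$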
The solution of this set of equations is easily found to be
$$\hat{\theta}_{1,\mathrm{ML}} = \frac{1}{L}\sum_{l=0}^{L-1} r_l$$

$$\hat{\theta}_{2,\mathrm{ML}} = \frac{1}{L}\sum_{l=0}^{L-1}\left(r_l-\hat{\theta}_{1,\mathrm{ML}}\right)^2$$
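A minimal sketch of these closed-form estimates, together with a check that both partial derivatives of the log-likelihood vanish at them. The function names `gaussian_ml` and `gaussian_score` are hypothetical, and the data are made up for illustration.

```python
from typing import Sequence, Tuple


def gaussian_ml(r: Sequence[float]) -> Tuple[float, float]:
    """Closed-form ML estimates (theta_1 = mean, theta_2 = variance)."""
    L = len(r)
    mean = sum(r) / L
    var = sum((x - mean) ** 2 for x in r) / L  # normalized by L, not L - 1
    return mean, var


def gaussian_score(r: Sequence[float], th1: float, th2: float) -> Tuple[float, float]:
    """Partial derivatives of ln p(r | theta) w.r.t. theta_1 and theta_2."""
    L = len(r)
    s1 = sum(x - th1 for x in r)
    s2 = sum((x - th1) ** 2 for x in r)
    return s1 / th2, -L / (2 * th2) + s2 / (2 * th2 ** 2)


r = [1.0, 2.0, 3.0, 4.0, 5.0]
th1, th2 = gaussian_ml(r)
print(th1, th2)                     # 3.0 2.0
print(gaussian_score(r, th1, th2))  # (0.0, 0.0)
```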
The expected value of $\hat{\theta}_{1,\mathrm{ML}}$ equals the actual value of
$\theta_1$; thus, this estimate is unbiased. However, the
expected value of the estimate of the variance equals
$\theta_2\,\frac{L-1}{L}$. The estimate of the variance is therefore biased, but
asymptotically unbiased, since $(L-1)/L \to 1$ as $L \to \infty$. This bias can be removed by replacing
the normalization by $L$ in the
averaging computation for $\hat{\theta}_{2,\mathrm{ML}}$
with normalization by $L-1$.
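The bias factor $(L-1)/L$ and its removal by normalizing with $L-1$ can be seen in simulation. This is a minimal Monte Carlo sketch, assuming Python's `random.Random.gauss` (which takes a standard deviation, so $\sqrt{\theta_2}$ is passed); `variance_estimates` and `average_variance_estimates` are hypothetical helpers, and the parameter values are arbitrary.

```python
import random
import statistics
from typing import Sequence, Tuple


def variance_estimates(r: Sequence[float]) -> Tuple[float, float]:
    """Return (ML variance estimate, bias-corrected estimate)."""
    L = len(r)
    mean = sum(r) / L
    ss = sum((x - mean) ** 2 for x in r)
    return ss / L, ss / (L - 1)  # normalize by L (ML) versus by L - 1


def average_variance_estimates(theta1: float, theta2: float, L: int,
                               trials: int, seed: int = 0) -> Tuple[float, float]:
    """Average both variance estimates over many simulated data sets."""
    rng = random.Random(seed)
    sd = theta2 ** 0.5  # random.gauss expects a standard deviation
    ml, unbiased = [], []
    for _ in range(trials):
        r = [rng.gauss(theta1, sd) for _ in range(L)]
        a, b = variance_estimates(r)
        ml.append(a)
        unbiased.append(b)
    return statistics.fmean(ml), statistics.fmean(unbiased)


theta2 = 4.0
for L in (2, 5, 20):
    ml_avg, ub_avg = average_variance_estimates(0.0, theta2, L, trials=20000)
    print(f"L={L:2d}  E[ML]~{ml_avg:.3f} (theory {theta2 * (L - 1) / L:.3f})"
          f"  E[corrected]~{ub_avg:.3f}")
```

The ML average should sit near $\theta_2 (L-1)/L$, with the gap closing as $L$ grows, while the $L-1$ version should sit near $\theta_2$ for every $L$.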