Connexions

Cramer-Rao Bound

Module by: Don Johnson

The mean-squared error for any estimate of a nonrandom parameter has a lower bound, the Cramér-Rao Bound (Cramér (1946) pp. 474-477), which defines the ultimate accuracy of any estimation procedure. This lower bound, as shown later, is intimately related to the maximum likelihood estimator.

We seek a "bound" on the mean-squared error matrix $M$ defined to be
$$M = E\left[(\hat{\theta}-\theta)(\hat{\theta}-\theta)^T\right] = E\left[\epsilon\epsilon^T\right]$$
A matrix is "lower bounded" by a second matrix if the difference between the two is a non-negative definite matrix. Define the column matrix $x$ to be
$$x = \begin{pmatrix} \hat{\theta} - \theta - b(\theta) \\ \nabla_\theta \ln p_{r|\theta}(r) \end{pmatrix}$$
where $b(\theta)$ denotes the column matrix of estimator biases. To derive the Cramér-Rao bound, evaluate $E[xx^T]$:
$$E\left[xx^T\right] = \begin{pmatrix} M - bb^T & I + \nabla_\theta b \\ \left(I + \nabla_\theta b\right)^T & F \end{pmatrix}$$
where $\nabla_\theta b$ represents the matrix of partial derivatives of the bias, $\left[\partial b_i/\partial\theta_j\right]$, and the matrix $F$ is the Fisher information matrix

$$F = E\left[\nabla_\theta \ln p_{r|\theta}(r)\,\left(\nabla_\theta \ln p_{r|\theta}(r)\right)^T\right]$$
(1)
Note that this matrix can alternatively be expressed as
$$F = -E\left[\nabla_\theta\nabla_\theta^T \ln p_{r|\theta}(r)\right]$$
The notation $\nabla_\theta\nabla_\theta^T$ means the matrix of all second partials of the quantity it operates on (the gradient of the gradient); this matrix is known as the Hessian. Demonstrating the equivalence of these two forms for the Fisher information is quite easy. Because $\int p_{r|\theta}(r)\,dr = 1$ for all choices of the parameter vector, the gradient of this expression equals zero. Furthermore, $\nabla_\theta \ln p_{r|\theta}(r) = \nabla_\theta p_{r|\theta}(r)/p_{r|\theta}(r)$. Combining these results yields
$$\int \nabla_\theta \ln p_{r|\theta}(r)\, p_{r|\theta}(r)\,dr = 0$$
Evaluating the gradient of this quantity (using the chain rule) also yields zero:
$$\int \left[\nabla_\theta\nabla_\theta^T \ln p_{r|\theta}(r)\, p_{r|\theta}(r) + \nabla_\theta \ln p_{r|\theta}(r)\left(\nabla_\theta \ln p_{r|\theta}(r)\right)^T p_{r|\theta}(r)\right] dr = 0$$
or
$$E\left[\nabla_\theta \ln p_{r|\theta}(r)\left(\nabla_\theta \ln p_{r|\theta}(r)\right)^T\right] = -E\left[\nabla_\theta\nabla_\theta^T \ln p_{r|\theta}(r)\right]$$
Calculating the expected value of the Hessian is somewhat easier than finding the expected value of the outer product of the gradient with itself. In the scalar case, we have
$$E\left[\left(\frac{\partial \ln p_{r|\theta}(r)}{\partial\theta}\right)^2\right] = -E\left[\frac{\partial^2 \ln p_{r|\theta}(r)}{\partial\theta^2}\right]$$
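The equivalence of the outer-product and Hessian forms of the Fisher information can be spot-checked by simulation. The following Python sketch (an illustrative addition, not part of the original module) uses a single Gaussian observation $r \sim N(\theta, 1)$, for which the score is $r - \theta$ and the Hessian is the constant $-1$, so both forms should equal 1:

```python
import numpy as np

# Spot-check of the two Fisher information forms for r ~ N(theta, 1):
# the score is d/dtheta ln p = r - theta, the Hessian is -1, and both
# E[score^2] and -E[Hessian] should equal 1 (the Fisher information).
rng = np.random.default_rng(0)
theta = 2.0
r = rng.normal(theta, 1.0, size=100_000)

score = r - theta                    # gradient of the log-likelihood
fisher_outer = np.mean(score**2)     # E[(d ln p / d theta)^2]
fisher_hessian = -np.mean(np.full_like(r, -1.0))  # -E[d^2 ln p / d theta^2]

print(fisher_outer, fisher_hessian)  # both close to 1
```

The Hessian form requires no Monte Carlo averaging at all here, illustrating the text's remark that the expected Hessian is often easier to evaluate than the outer product of the gradient.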

Returning to the derivation, the matrix $E[xx^T]$ is non-negative definite because it is a correlation matrix. Thus, for any column matrix $\alpha$, the quadratic form $\alpha^T E[xx^T]\alpha$ is non-negative. Choose a form for $\alpha$ that simplifies the quadratic form. A convenient choice is
$$\alpha = \begin{pmatrix} \beta \\ -F^{-1}\left(I + \nabla_\theta b\right)^T \beta \end{pmatrix}$$
where $\beta$ is an arbitrary column matrix. The quadratic form becomes in this case
$$\alpha^T E\left[xx^T\right]\alpha = \beta^T\left(M - bb^T - \left(I + \nabla_\theta b\right)F^{-1}\left(I + \nabla_\theta b\right)^T\right)\beta$$
As this quadratic form must be non-negative, the matrix expression enclosed in brackets must be non-negative definite. We thus obtain the well-known Cramér-Rao bound on the mean-square error matrix.

$$E\left[\epsilon\epsilon^T\right] \geq b(\theta)b(\theta)^T + \left(I + \nabla_\theta b\right)F^{-1}\left(I + \nabla_\theta b\right)^T$$
(2)

This form for the Cramér-Rao Bound does not mean that each term in the matrix of squared errors is greater than the corresponding term in the bounding matrix. As stated earlier, this expression means that the difference between these matrices is non-negative definite. For a matrix to be non-negative definite, each term on the main diagonal must be non-negative. The elements of the main diagonal of $E[\epsilon\epsilon^T]$ are the squared errors of the estimates of the individual parameters. Thus, for each parameter, the mean-squared estimation error can be no smaller than
$$E\left[\left(\hat{\theta}_i - \theta_i\right)^2\right] \geq b_i^2(\theta) + \left[\left(I + \nabla_\theta b\right)F^{-1}\left(I + \nabla_\theta b\right)^T\right]_{ii}$$

This bound simplifies greatly if the estimator is unbiased ($b = 0$). In this case, the Cramér-Rao bound becomes
$$E\left[\left(\hat{\theta}_i - \theta_i\right)^2\right] \geq \left[F^{-1}\right]_{ii}$$
Thus, the mean-squared error for each parameter in a multiple-parameter, unbiased-estimator problem can be no smaller than the corresponding diagonal term in the inverse of the Fisher information matrix. In such problems, the error characteristics of the estimate of any one parameter become intertwined with those of the other parameters in a complicated way. Any estimator satisfying the Cramér-Rao bound with equality is said to be efficient.

Example 1

Let's evaluate the Cramér-Rao bound for the example we have been discussing: the estimation of the mean and variance of a length-$L$ sequence of statistically independent Gaussian random variables. Let the estimate of the mean $\theta_1$ be the sample average, $\hat{\theta}_1 = \frac{1}{L}\sum_l r_l$; as shown in the last example, this estimate is unbiased. Let the estimate of the variance $\theta_2$ be the unbiased estimate
$$\hat{\theta}_2 = \frac{1}{L-1}\sum_l \left(r_l - \hat{\theta}_1\right)^2$$
Each term in the Fisher information matrix $F$ is given by the expected value of the paired products of derivatives of the logarithm of the likelihood function:
$$F_{ij} = E\left[\frac{\partial \ln p_{r|\theta}(r)}{\partial\theta_i}\,\frac{\partial \ln p_{r|\theta}(r)}{\partial\theta_j}\right]$$
The logarithm of the likelihood function is
$$\ln p_{r|\theta}(r) = -\frac{L}{2}\ln 2\pi\theta_2 - \frac{1}{2\theta_2}\sum_{l=0}^{L-1}\left(r_l - \theta_1\right)^2$$
its partial derivatives are

$$\frac{\partial \ln p_{r|\theta}(r)}{\partial\theta_1} = \frac{1}{\theta_2}\sum_{l=0}^{L-1}\left(r_l - \theta_1\right)$$
(3)
$$\frac{\partial \ln p_{r|\theta}(r)}{\partial\theta_2} = -\frac{L}{2\theta_2} + \frac{1}{2\theta_2^2}\sum_{l=0}^{L-1}\left(r_l - \theta_1\right)^2$$
(4)
and its second partials are
$$\frac{\partial^2 \ln p_{r|\theta}(r)}{\partial\theta_1^2} = -\frac{L}{\theta_2}$$
$$\frac{\partial^2 \ln p_{r|\theta}(r)}{\partial\theta_1\,\partial\theta_2} = -\frac{1}{\theta_2^2}\sum_{l=0}^{L-1}\left(r_l - \theta_1\right)$$
$$\frac{\partial^2 \ln p_{r|\theta}(r)}{\partial\theta_2\,\partial\theta_1} = -\frac{1}{\theta_2^2}\sum_{l=0}^{L-1}\left(r_l - \theta_1\right)$$
$$\frac{\partial^2 \ln p_{r|\theta}(r)}{\partial\theta_2^2} = \frac{L}{2\theta_2^2} - \frac{1}{\theta_2^3}\sum_{l=0}^{L-1}\left(r_l - \theta_1\right)^2$$
The Fisher information matrix has the surprisingly simple form
$$F = \begin{pmatrix} \dfrac{L}{\theta_2} & 0 \\ 0 & \dfrac{L}{2\theta_2^2} \end{pmatrix}$$
its inverse is also a diagonal matrix, with the elements on the main diagonal equalling the reciprocals of those in the original matrix. Because of the zero-valued off-diagonal entries in the Fisher information matrix, the errors in the corresponding estimates are not interdependent. In this problem, the mean-squared estimation errors can be no smaller than
$$E\left[\left(\hat{\theta}_1 - \theta_1\right)^2\right] \geq \frac{\theta_2}{L}$$
$$E\left[\left(\hat{\theta}_2 - \theta_2\right)^2\right] \geq \frac{2\theta_2^2}{L}$$
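These two bounds can be checked by simulation. The following Python sketch (an addition to the module; the parameter values $\theta_1 = 1$, $\theta_2 = 4$, $L = 20$ are illustrative) estimates the mean-squared errors of the sample average and the unbiased variance estimate over many trials and compares them with $\theta_2/L$ and $2\theta_2^2/L$:

```python
import numpy as np

# Monte Carlo check of the Cramer-Rao bounds of Example 1
# (illustrative parameter values, not from the module).
rng = np.random.default_rng(1)
theta1, theta2, L, trials = 1.0, 4.0, 20, 200_000

r = rng.normal(theta1, np.sqrt(theta2), size=(trials, L))
mean_hat = r.mean(axis=1)            # sample average (unbiased)
var_hat = r.var(axis=1, ddof=1)      # unbiased variance estimate

mse_mean = np.mean((mean_hat - theta1) ** 2)
mse_var = np.mean((var_hat - theta2) ** 2)

crb_mean = theta2 / L                # [F^{-1}]_{11}
crb_var = 2 * theta2**2 / L          # [F^{-1}]_{22}

print(mse_mean, crb_mean)  # mean estimate essentially attains its bound
print(mse_var, crb_var)    # variance estimate exceeds its bound
```

The sample average's error sits right at its bound, while the variance estimate's error is strictly above its bound, foreshadowing the efficiency discussion below.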

Note that nowhere in the preceding example did the form of the estimator enter into the computation of the bound. The only quantity used in the computation of the Cramér-Rao bound is the logarithm of the likelihood function, which is a consequence of the problem statement, not of how it is solved. Only in the case of unbiased estimators is the bound independent of the estimator used.1 Because of this property, the Cramér-Rao bound is frequently used to assess the performance limits that can be obtained with an unbiased estimator in a particular problem. When bias is present, the exact form of the estimator's bias explicitly enters the computation of the bound. All too frequently, the unbiased form is used in situations where the existence of an unbiased estimator can be questioned. As we shall see, one such problem is time delay estimation, presumably of some importance to the reader. This misapplication of the unbiased Cramér-Rao bound arises from desperation: the estimator is so complicated and nonlinear that computing the bias is nearly impossible. As shown in this problem, biased estimators can yield mean-squared errors smaller as well as larger than the unbiased version of the Cramér-Rao bound. Consequently, desperation can yield misinterpretation when a general result is misapplied.

In the single-parameter estimation problem, the Cramér-Rao bound incorporating bias has the well-known form2

$$E\left[\epsilon^2\right] \geq b^2(\theta) + \frac{\left(1 + \dfrac{db}{d\theta}\right)^2}{E\left[\left(\dfrac{\partial \ln p_{r|\theta}(r)}{\partial\theta}\right)^2\right]}$$
(5)
Note that the sign of the bias's derivative determines whether this bound is larger or potentially smaller than the unbiased version, which is obtained by setting the bias term to zero.
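A small numerical sketch makes this concrete. Suppose (an illustrative choice, not from the module) we use the maximum likelihood variance estimate, which divides by $L$ rather than $L-1$ and so has bias $b(\theta_2) = -\theta_2/L$ and $db/d\theta_2 = -1/L$. Treating the variance as the parameter of interest with Fisher information $L/(2\theta_2^2)$, Equation 5 gives:

```python
# Illustrative evaluation of Equation 5 for the biased (ML) variance
# estimate, with assumed values theta_2 = 4, L = 20.  The bias is
# b = -theta_2/L, so db/dtheta_2 = -1/L is negative and the biased
# bound comes out *smaller* than the unbiased bound 2*theta_2^2/L.
theta2, L = 4.0, 20
fisher = L / (2 * theta2**2)         # Fisher information for theta_2

bias = -theta2 / L
dbias = -1.0 / L
biased_bound = bias**2 + (1 + dbias) ** 2 / fisher   # Equation 5
unbiased_bound = 1 / fisher                          # 2*theta_2^2/L

print(biased_bound, unbiased_bound)  # the biased bound is the smaller one
```

Here the negative bias derivative pulls the bound below the unbiased version, exactly the possibility the text warns about.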

Efficiency

An interesting question arises: when, if ever, is the bound satisfied with equality? Recalling the details of the derivation of the bound, equality results when the quantity $E\left[\alpha^T x x^T \alpha\right]$ equals zero. As this quantity is the expected value of the square of $\alpha^T x$, it can only equal zero if $\alpha^T x = 0$. Substituting in the forms of the column matrices $\alpha$ and $x$, equality in the Cramér-Rao bound results whenever

$$\nabla_\theta \ln p_{r|\theta}(r) = \left[\left(I + \nabla_\theta b\right)^T\right]^{-1} F\left(\hat{\theta}(r) - \theta - b(\theta)\right)$$
(6)
This complicated expression means that only if the estimation problem (as expressed by the conditional density $p_{r|\theta}(r)$) has the form of the right side of this equation can the mean-squared error equal the Cramér-Rao bound. In particular, the gradient of the log-likelihood function can depend on the observations only through the estimator. In all other problems, the Cramér-Rao bound is a lower bound but not a tight one: no estimator can have error characteristics that equal it. In such cases, the Cramér-Rao bound gives us only limited insight into the ultimate limitations on estimation error size. However, consider the case where the estimator is unbiased ($b = 0$). In addition, note that the maximum likelihood estimate occurs where the gradient of the logarithm of the likelihood function equals zero: $\nabla_\theta \ln p_{r|\theta}(r) = 0$ when $\theta = \hat{\theta}_{ML}$. In this case, the condition for equality in the Cramér-Rao bound becomes
$$F\left(\hat{\theta} - \hat{\theta}_{ML}\right) = 0$$
As the Fisher information matrix is positive-definite, we conclude that the Cramér-Rao bound can be satisfied with equality only if the estimator equals the maximum likelihood estimate; only the maximum likelihood estimate will achieve it. To use estimation-theoretic terminology, if an efficient estimate exists, it is the maximum likelihood estimate. This result stresses the importance of maximum likelihood estimates, despite the seemingly ad hoc manner by which they are defined.
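For the Gaussian mean with the variance treated as known, the equality condition can be verified directly: the score $\frac{1}{\theta_2}\sum_l (r_l - \theta_1)$ factors exactly as $F(\hat{\theta}_1 - \theta_1)$ with $\hat{\theta}_1$ the sample average and $F = L/\theta_2$. The following Python sketch (an illustrative addition with assumed parameter values) confirms the identity numerically:

```python
import numpy as np

# Numeric spot-check of the efficiency condition for the Gaussian
# mean with known variance (illustrative values): the score
# (1/theta_2) * sum(r_l - theta_1) equals F * (theta_hat - theta_1)
# with F = L/theta_2, so the sample average is efficient.
rng = np.random.default_rng(4)
theta1, theta2, L = 1.0, 4.0, 10
r = rng.normal(theta1, np.sqrt(theta2), size=L)

score = np.sum(r - theta1) / theta2     # gradient of the log-likelihood
F = L / theta2                          # Fisher information for theta_1
theta_hat = r.mean()                    # maximum likelihood estimate

print(score, F * (theta_hat - theta1))  # identical up to rounding
```

The identity holds for every realization $r$, not just on average, which is what the equality condition demands.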

Example 2

Consider the Gaussian example being examined so frequently in this section. The components of the gradient of the logarithm of the likelihood function were given earlier by Equation 3 and Equation 4. These expressions can be rearranged to reveal

$$\begin{pmatrix} \dfrac{\partial \ln p_{r|\theta}(r)}{\partial\theta_1} \\[2ex] \dfrac{\partial \ln p_{r|\theta}(r)}{\partial\theta_2} \end{pmatrix} = \begin{pmatrix} \dfrac{L}{\theta_2}\left(\dfrac{1}{L}\displaystyle\sum_l r_l - \theta_1\right) \\[2ex] -\dfrac{L}{2\theta_2} + \dfrac{1}{2\theta_2^2}\displaystyle\sum_l \left(r_l - \theta_1\right)^2 \end{pmatrix}$$
(7)
The first component, which corresponds to the estimate of the mean, is expressed in the form required for the existence of an efficient estimate. The second component, the partial with respect to the variance $\theta_2$, cannot be rewritten in a similar fashion: no unbiased, efficient estimate of the variance exists in this problem. The mean-squared error of the variance's unbiased estimate (but not the maximum likelihood estimate) is lower-bounded by $\frac{2\theta_2^2}{L-1}$. This error is strictly greater than the Cramér-Rao bound of $\frac{2\theta_2^2}{L}$. As no unbiased estimate of the variance can have a mean-squared error equal to the Cramér-Rao bound (no efficient estimate exists for the variance in the Gaussian problem), one presumes that the closeness of the error of our unbiased estimator to the bound implies that it possesses the smallest squared error of any estimate. This presumption may, of course, be incorrect.
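A simulation also illustrates the earlier warning about biased estimators: the ML variance estimate (divide by $L$) is biased, yet its mean-squared error falls *below* the unbiased Cramér-Rao bound, while the unbiased estimate's error sits above it. The following Python sketch (an illustrative addition; $\theta_2 = 4$, $L = 20$ are assumed values) compares the two:

```python
import numpy as np

# The biased ML variance estimate (ddof=0) can beat the *unbiased*
# Cramer-Rao bound 2*theta_2^2/L, while the unbiased estimate
# (ddof=1) always exceeds it.  Illustrative values, not from module.
rng = np.random.default_rng(2)
theta2, L, trials = 4.0, 20, 200_000
r = rng.normal(0.0, np.sqrt(theta2), size=(trials, L))

mse_unbiased = np.mean((r.var(axis=1, ddof=1) - theta2) ** 2)
mse_ml = np.mean((r.var(axis=1, ddof=0) - theta2) ** 2)
crb = 2 * theta2**2 / L

print(mse_unbiased, mse_ml, crb)  # unbiased MSE > bound > ML MSE
```

This shows concretely that the unbiased bound says nothing about what a biased estimator can achieve.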

Properties of the Maximum Likelihood Estimator

The maximum likelihood estimate is the most used estimation technique for nonrandom parameters, not only because of its close linkage to the Cramér-Rao bound, but also because it has desirable asymptotic properties in the context of any problem (Cramér (1946) pp. 500-506).

1. The maximum likelihood estimate is at least asymptotically unbiased. For some problems, it may be unbiased for any number of observations (as in the estimation of the mean of a sequence of independent random variables).
2. The maximum likelihood estimate is consistent.
3. The maximum likelihood estimate is asymptotically efficient. As more and more data are incorporated into an estimate, the Cramér-Rao bound accurately projects the best attainable error, and the maximum likelihood estimate has those optimal characteristics.
4. Asymptotically, the maximum likelihood estimate is distributed as a Gaussian random variable. Because of the previous properties, the mean asymptotically equals the parameter and the covariance matrix is $\left(LF(\theta)\right)^{-1}$.
Most would agree that a "good" estimator should have these properties. What these results do not provide is assessment of how many observations are needed for the asymptotic results to apply to some specified degree of precision. Consequently, they should be used with caution; for instance, some other estimator may have a smaller mean-square error than the maximum likelihood for a modest number of observations.
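Asymptotic efficiency can be watched numerically. The following Python sketch (an illustrative addition; the Gaussian model with $\theta_2 = 4$ and the chosen values of $L$ are assumptions) scales the ML variance estimate's mean-squared error by $L$; per property 3, the product should approach the Cramér-Rao constant $2\theta_2^2 = 32$ as $L$ grows:

```python
import numpy as np

# Asymptotic efficiency of the ML variance estimate (divide by L):
# L * MSE should approach 2 * theta_2^2 = 32 as L grows.
# Illustrative parameter values, not from the module.
rng = np.random.default_rng(3)
theta2, trials = 4.0, 10_000

scaled_mse = {}
for L in (5, 50, 500):
    r = rng.normal(0.0, np.sqrt(theta2), size=(trials, L))
    theta2_ml = r.var(axis=1, ddof=0)   # ML estimate (sample mean removed)
    scaled_mse[L] = L * np.mean((theta2_ml - theta2) ** 2)

print(scaled_mse)  # values climb toward 32 as L grows
```

At small $L$ the scaled error is visibly below the asymptotic constant, which is the kind of finite-sample discrepancy the caution above refers to.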

Footnotes

1. That's why we assumed in the example that we used an unbiased estimator for the variance.
2. Note that this bound differs somewhat from that originally given by Cramér (1946) p. 480; his derivation ignores the additive bias term $bb^T$.

References

1. H. Cramér. (1946). Mathematical Methods of Statistics. Princeton, NJ: Princeton University Press.
