Robust Hypothesis Testing

Module by: Don Johnson

"Robust" is a technical word that implies insensitivity to modeling assumptions. As we have seen, some algorithms are robust while others are not. The intent of robust signal processing is to derive algorithms that are explicitly insensitive to the underlying signal and/or noise models. The way in which modeling incertainties are described is typified by the approach we shall use in the following discussion of robust model evaluation.

We assume that two nominal models of the generation of the statistically independent observations are known; the "actual" conditional probability density that describes the data under the assumptions of each model is not known exactly, but is "close" to the nominal. Letting $p(\cdot)$ be the actual probability density for each observation and $p^o(\cdot)$ the nominal, we say that (Huber; 1981)
$$p(x) = (1-\epsilon)\,p^o(x) + \epsilon\,p^d(x)$$
where $p^d$ is the unknown disturbance density and $\epsilon$ is the uncertainty variable ($0 \le \epsilon < 1$). The uncertainty variable specifies how accurate the nominal model is thought to be: the smaller $\epsilon$, the smaller the contribution of the disturbance. It is assumed that some value for $\epsilon$ can be rationally assigned. The disturbance density is entirely unknown and is assumed to be any valid probability density function. The expression given above is normalized so that $p(\cdot)$ has unit area. Examples of densities described this way are shown in Figure 1.

Figure 1: The nominal density, a Gaussian, is shown as a dashed line along with example densities derived from it having an uncertainty of 10% ($\epsilon = 0.1$). The left plot illustrates a symmetric contamination and the right an asymmetric one.
Figure 1 (contam.jpg)
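The contamination model can be sketched numerically. The following Python fragment is only an illustration: the particular disturbance density (a narrow Gaussian centered at 3) and the helper names `gaussian_pdf`, `contaminated_pdf` and `integrate` are assumptions made for this example, not part of the module. It builds the mixture $(1-\epsilon)p^o + \epsilon p^d$ and checks that it still integrates to one.

```python
import math

def gaussian_pdf(x, mean=0.0, var=1.0):
    """Gaussian probability density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def contaminated_pdf(x, eps, nominal, disturbance):
    """Evaluate p(x) = (1 - eps) * p_o(x) + eps * p_d(x)."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must satisfy 0 <= eps < 1")
    return (1.0 - eps) * nominal(x) + eps * disturbance(x)

def integrate(f, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

def nominal(x):
    # Nominal density: zero-mean, unit-variance Gaussian.
    return gaussian_pdf(x, 0.0, 1.0)

def disturbance(x):
    # Hypothetical asymmetric disturbance, chosen only for illustration.
    return gaussian_pdf(x, 3.0, 0.25)

def actual(x):
    # "Actual" density with 10% uncertainty.
    return contaminated_pdf(x, 0.1, nominal, disturbance)

area = integrate(actual, -12.0, 12.0)
print(f"area under contaminated density: {area:.6f}")
```

Any valid density could be substituted for `disturbance`; the unit-area property holds regardless, which is the point of the normalization.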

The robust model evaluation problem is formally stated as
$$\mathcal{M}_0: \; p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r}) = \prod_{l=0}^{L-1} \left[ (1-\epsilon)\,p^o_{r_l|\mathcal{M}_0}(r_l) + \epsilon\,p^d_{r_l|\mathcal{M}_0}(r_l) \right]$$
$$\mathcal{M}_1: \; p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r}) = \prod_{l=0}^{L-1} \left[ (1-\epsilon)\,p^o_{r_l|\mathcal{M}_1}(r_l) + \epsilon\,p^d_{r_l|\mathcal{M}_1}(r_l) \right]$$
The nominal densities under each model correspond to the conditional densities that we have been using until now. The disturbance densities are intended to model imprecision of both descriptions; hence, they are assumed to be different in the context of each model. Note that the measure of imprecision $\epsilon$ is assumed to be the same under either model.

To solve this problem, we take what is known as a minimax approach: find the worst-case combinations of a priori densities (max), then minimize the consequences of this situation (mini) according to some criterion. In this way, bad situations are handled as well as can be expected while the more tolerable ones are (hopefully) processed well also. The "mini" phase of the minimax solution corresponds to the likelihood ratio for many criteria. Thus, the "max" phase amounts to finding the worst-case probability distributions for the likelihood ratio test as described in the previous section: find the disturbance densities that can result in a constant value for the ratio over large domains of functions. When the two nominal distributions scaled by $1-\epsilon$ can be brought together so that they are equal for some disturbance, then the likelihood ratio will be constant in that domain. Of most interest here is the case where the models differ only in the value of the mean, as shown in Figure 2. "Bringing the distributions together" means, in this case, scaling the distribution for $\mathcal{M}_0$ by $1-\epsilon$ while adding the constant $\epsilon$ to the scaled distribution for $\mathcal{M}_1$. One can show in general that if the ratio of the nominal densities is monotonic, this procedure finds the worst-case distribution (Huber; 1965). The distributions overlap for small and for large values of the data with no overlap in a central region. As we shall see, the size of this central region depends greatly on the choice of $\epsilon$. The tails of the worst-case distributions under each model are equal; conceptually, we consider that the worst-case densities have exponential tails in the model evaluation problem.

Figure 2: Nominal probability distributions for each model are shown. The worst-case distributions corresponding to these are also shown for the uncertainty variable $\epsilon$ equaling 0.1.
Figure 2 (worstcase.jpg)

Letting $p^w$ denote the worst-case density, our minimax procedure results in the following densities for each model in the likelihood ratio test.
$$p^w_{r_l|\mathcal{M}_i}(r_l) = \begin{cases} p^o_{r_l|\mathcal{M}_0}(r_l')\, C_i'\, e^{-K'|r_l - r_l'|} & \text{if } r_l < r_l' \\ p^o_{r_l|\mathcal{M}_i}(r_l) & \text{if } r_l' < r_l < r_l'' \\ p^o_{r_l|\mathcal{M}_0}(r_l'')\, C_i''\, e^{-K''|r_l - r_l''|} & \text{if } r_l > r_l'' \end{cases}$$
The constants $K'$ and $K''$ determine the rate of decay of the exponential tails of these worst-case distributions. Their specific values have not yet been determined, but since they are not needed to compute the likelihood ratio, we don't need them. The constants $C_i'$ and $C_i''$ are required so that a unit-area density results. The likelihood ratio for each observation in the robust model evaluation problem becomes

$$\Lambda(r_l) = \begin{cases} \dfrac{C_1'}{C_0'} & \text{if } r_l < r_l' \\[1ex] \dfrac{p^o_{r_l|\mathcal{M}_1}(r_l)}{p^o_{r_l|\mathcal{M}_0}(r_l)} & \text{if } r_l' < r_l < r_l'' \\[1ex] \dfrac{C_1''}{C_0''} & \text{if } r_l'' < r_l \end{cases} \tag{1}$$

The evaluation of the likelihood ratio depends entirely on determining values for $r_l'$ and $r_l''$. The ratios $C_1'/C_0' = c'$ and $C_1''/C_0'' = c''$ are easily found; in the tails, the value of the likelihood ratio equals that at the edges of the central region for continuous densities.
$$c' = \frac{p^o_{r_l|\mathcal{M}_1}(r_l')}{p^o_{r_l|\mathcal{M}_0}(r_l')} \qquad c'' = \frac{p^o_{r_l|\mathcal{M}_1}(r_l'')}{p^o_{r_l|\mathcal{M}_0}(r_l'')}$$
At the left boundary, for example, the nominal distribution functions must satisfy $(1-\epsilon)\,P^o_{r_l|\mathcal{M}_0}(r_l') = (1-\epsilon)\,P^o_{r_l|\mathcal{M}_1}(r_l') + \epsilon$. In terms of the nominal densities, we have
$$\int_{-\infty}^{r_l'} \left[ p^o_{r_l|\mathcal{M}_0}(x) - p^o_{r_l|\mathcal{M}_1}(x) \right] dx = \frac{\epsilon}{1-\epsilon}$$
This equation also applies to the right edge $r_l''$. Thus, for a given value of $\epsilon$, the integral of the difference between the nominal densities should equal the ratio $\epsilon/(1-\epsilon)$ for two values of the upper limit. Figure 3 illustrates this effect for a Gaussian example. The bi-valued nature of this integral may not be valid for some values of $\epsilon$; the value chosen for $\epsilon$ can be too large, making it impossible to distinguish the models! This unfortunate circumstance means that the uncertainties, as described by the value of $\epsilon$, swamp the characteristics that distinguish the models. Thus, the models must be made more precise (more must be known about the data) so that smaller deviations from the nominal models can describe the observations.

Figure 3: The quantity used to determine the thresholds in the robust decision rule is shown when $m = 1$ and $\sigma^2 = 5$. Given a value of $\epsilon$, a value on the vertical axis is selected and the corresponding values on the horizontal axis yield the thresholds.
Figure 3 (robthresh.png)

Returning to the likelihood ratio, the "robust" decision rule consists of computing a clipped function of each observed value, multiplying them together, and comparing the product computed over the observations with a threshold value. We assume that the nominal distributions of each of the $L$ observations are equal; the values of the boundaries $r_l'$ and $r_l''$ then do not depend on the observation index $l$ in this case. More simply, evaluating the logarithm of the quantities involved results in the decision rule
$$\sum_{l=0}^{L-1} f(r_l) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \gamma$$
where the function $f(\cdot)$ is the clipping function given by
$$f(r_l) = \begin{cases} \ln c' & \text{if } r_l < r' \\[1ex] \ln \dfrac{p^o_{r_l|\mathcal{M}_1}(r_l)}{p^o_{r_l|\mathcal{M}_0}(r_l)} & \text{if } r' < r_l < r'' \\[1ex] \ln c'' & \text{if } r'' < r_l \end{cases}$$
If the observations were not identically distributed, then the clipping function would depend on the observation index (see footnote 1).
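The clipped decision rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, the nominal densities are taken to be the Gaussian mean-shift pair used in the example later in this module ($m = 1$, $\sigma^2 = 5$), and the thresholds $-1.675$ and $2.675$ are the values quoted there. Because $\ln c'$ and $\ln c''$ equal the log-likelihood ratio at the edges of the central region, clipping the observation and then evaluating the log-likelihood ratio gives $f(r_l)$.

```python
def gaussian_llr(r, m, var):
    """Log-likelihood ratio for Gaussian models with means m and 0, common variance var."""
    return (2.0 * m * r - m * m) / (2.0 * var)

def clipped_llr(r, llr, r_lo, r_hi):
    """Clipping function f: the log-likelihood ratio, held constant outside [r_lo, r_hi]."""
    return llr(min(max(r, r_lo), r_hi))

def robust_decision(observations, llr, r_lo, r_hi, gamma):
    """Sum the clipped log-likelihood ratios and compare with the threshold gamma.

    Returns (model_index, statistic), where model_index is 1 if the sum exceeds gamma.
    """
    statistic = sum(clipped_llr(r, llr, r_lo, r_hi) for r in observations)
    return (1 if statistic > gamma else 0), statistic

def llr(r):
    # Nominal parameters assumed from the Gaussian example: m = 1, variance 5.
    return gaussian_llr(r, 1.0, 5.0)

# Two large positive outliers and one large negative outlier (made-up data).
decision, statistic = robust_decision([10.0, 10.0, -10.0], llr, -1.675, 2.675, gamma=0.0)
print(decision, statistic)
```

Each outlier contributes at most the clipped value, so a single wild observation cannot dominate the sum the way it would in the unclipped likelihood ratio test.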

Determining the threshold $\gamma$ that meets a specific performance criterion is difficult in the context of robust model evaluation. By the very nature of the problem formulation, some degree of uncertainty in the a priori densities exists. A specific false-alarm probability can be guaranteed by using the worst-case distribution under $\mathcal{M}_0$. This density has the disturbance term being an impulse at infinity. Thus, the expected value $m_c$ of a clipped observation $f(r_l)$ with respect to the worst-case density is
$$m_c = (1-\epsilon)\,E[f(r_l)] + \epsilon \ln c''$$
where the expected value in this expression is evaluated with respect to the nominal density under $\mathcal{M}_0$. Similarly, an expression for the variance $\sigma_c^2$ of the clipped observation can be derived. As the decision rule computes the sum of the clipped, statistically independent observations, the Central Limit Theorem can be applied to the sum, with the result that the worst-case false-alarm probability will approximately equal $Q\!\left(\frac{\gamma - L m_c}{\sqrt{L}\,\sigma_c}\right)$. The threshold $\gamma$ can then be found which will guarantee a specified performance level. Usually, the worst-case situation does not occur and the threshold set by this method is conservative. We can assess the degree of conservatism by evaluating these quantities under the nominal density rather than the worst-case density.
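Inverting the Central Limit Theorem approximation gives $\gamma = L m_c + \sqrt{L}\,\sigma_c\,Q^{-1}(P_F)$. The Python sketch below illustrates this under several assumptions that are not stated in the module: the Gaussian example parameters ($m = 1$, $\sigma^2 = 5$, $\epsilon = 0.1$, thresholds $-1.675$ and $2.675$) are reused, the number of observations (100) and the false-alarm probability (0.01) are made up, the moments are found by simple numerical integration, and the variance is computed as the second moment of the same worst-case mixture minus $m_c^2$. All function names are hypothetical.

```python
import math
from statistics import NormalDist

M, VAR, EPS = 1.0, 5.0, 0.1      # nominal mean shift, variance, uncertainty
R_LO, R_HI = -1.675, 2.675       # clipping thresholds quoted in the Gaussian example

def nominal_pdf_m0(r):
    """Nominal density under model 0: zero-mean Gaussian with variance VAR."""
    return math.exp(-r * r / (2.0 * VAR)) / math.sqrt(2.0 * math.pi * VAR)

def clipped_llr(r):
    """Clipped log-likelihood ratio f(r) for the Gaussian mean-shift problem."""
    r = min(max(r, R_LO), R_HI)
    return (2.0 * M * r - M * M) / (2.0 * VAR)

def worst_case_moments(f, pdf, eps, f_tail, lo, hi, n=20000):
    """Mean and variance of f under (1 - eps) * pdf plus eps times an impulse where f = f_tail."""
    h = (hi - lo) / n
    first = second = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h       # midpoint rule
        weight = pdf(x) * h
        fx = f(x)
        first += fx * weight
        second += fx * fx * weight
    mean = (1.0 - eps) * first + eps * f_tail
    mean_square = (1.0 - eps) * second + eps * f_tail ** 2
    return mean, mean_square - mean ** 2

def clt_threshold(mean, var, num_obs, p_false_alarm):
    """Threshold gamma solving Q((gamma - L*mean) / sqrt(L*var)) = p_false_alarm."""
    q_inverse = NormalDist().inv_cdf(1.0 - p_false_alarm)
    return num_obs * mean + math.sqrt(num_obs * var) * q_inverse

ln_c_hi = clipped_llr(R_HI)          # value of f at the impulse "at infinity"
m_c, var_c = worst_case_moments(clipped_llr, nominal_pdf_m0, EPS, ln_c_hi, -30.0, 30.0)
gamma = clt_threshold(m_c, var_c, num_obs=100, p_false_alarm=0.01)
print(f"m_c = {m_c:.4f}, var_c = {var_c:.4f}, gamma = {gamma:.4f}")
```

Calling `worst_case_moments` with `eps=0.0` gives the nominal moments, which is one way to gauge how conservative the worst-case threshold is.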

Example 1

Let's consider the Gaussian model evaluation problem we have been using so extensively. The individual observations are statistically independent and identically distributed with variance five: $\sigma^2 = 5$. For model $\mathcal{M}_0$, the mean is zero; for $\mathcal{M}_1$, the mean is one. These nominal densities describe our best models for the observations, but we seek to allow slight deviations (10%) from them. The equation to be solved for the boundaries is the implicit equation
$$Q\!\left(\frac{z-m}{\sigma}\right) - Q\!\left(\frac{z}{\sigma}\right) = \frac{\epsilon}{1-\epsilon}$$
The quantity on the left side of the equation is shown in Figure 3. If the uncertainty in the Gaussian model, as expressed by the parameter $\epsilon$, is larger than 0.15 (for the example values of $m$ and $\sigma$), no solution exists. Assuming that $\epsilon$ equals 0.1, the quantity $\epsilon/(1-\epsilon) \approx 0.11$ and the clipping thresholds are $r' = -1.675$ and $r'' = 2.675$. Between these values, the clipping function is given by the logarithm of the likelihood ratio, which is given by $\frac{2 m r_l - m^2}{2\sigma^2}$.
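The implicit equation has no closed-form solution, but it is easy to solve numerically. The sketch below uses plain bisection from the Python standard library; the function names are hypothetical, and the code relies on the fact that, for the Gaussian mean-shift case, the left side peaks at $z = m/2$ and falls off monotonically on either side, so one root lies on each side of the peak.

```python
import math

def Q(x):
    """Gaussian tail probability: area under the unit normal density beyond x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def boundary_function(z, m, sigma):
    """Left side of the threshold equation: Q((z - m)/sigma) - Q(z/sigma)."""
    return Q((z - m) / sigma) - Q(z / sigma)

def bisect(g, lo, hi, tol=1e-12):
    """Root of g in [lo, hi] by bisection; g(lo) and g(hi) must differ in sign."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0.0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0.0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)

def clipping_thresholds(eps, m, sigma):
    """Return the two solutions (r', r'') of the threshold equation."""
    target = eps / (1.0 - eps)
    peak = boundary_function(0.5 * m, m, sigma)   # maximum of the left side
    if target >= peak:
        raise ValueError("eps is too large: no pair of thresholds exists")
    def g(z):
        return boundary_function(z, m, sigma) - target
    span = 10.0 * sigma + abs(m)                  # far enough out that g < 0
    return bisect(g, 0.5 * m - span, 0.5 * m), bisect(g, 0.5 * m, 0.5 * m + span)

r_lo, r_hi = clipping_thresholds(eps=0.1, m=1.0, sigma=math.sqrt(5.0))
print(f"r' = {r_lo:.3f}, r'' = {r_hi:.3f}")
```

For the example parameters the two roots should land close to the thresholds quoted above, and they are placed symmetrically about $m/2$. Requesting $\epsilon = 0.2$ raises the `ValueError`, consistent with the statement that no solution exists beyond roughly 0.15.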

We can decompose the clipping operation into a cascade of two operations: a linear scaling and shifting (as described by the previous expression) followed by a clipper having unit slope (see Figure 4).

Figure 4: The robust decision rule for the case of a Gaussian nominal density is shown. The observations are first scaled and shifted by quantities that depend on the mean $m$ and the variance $\sigma^2$. The resulting quantity is then passed through a symmetric unit-slope clipping function whose clipping thresholds also depend on the parameters of the distributions.
Figure 4 (robdet1.png)
Let $\tilde{r}_l$ denote the result of the scaling and shifting operation. This quantity has mean $\frac{m^2}{2\sigma^2}$ and variance $\frac{m^2}{\sigma^2}$ under $\mathcal{M}_1$ and the opposite signed mean and the same variance under $\mathcal{M}_0$. The threshold values of the unit-clipping function are thus given by the solution of the equation
$$Q\!\left(\frac{\tilde{z} - \frac{m^2}{2\sigma^2}}{m/\sigma}\right) - Q\!\left(\frac{\tilde{z} + \frac{m^2}{2\sigma^2}}{m/\sigma}\right) = \frac{\epsilon}{1-\epsilon}$$
By substituting $-\tilde{z}$ for $\tilde{z}$ in this equation, we find that the two solutions are negatives of each other. We have now placed the unit-clipper's threshold values symmetrically about the origin; however, they do depend on the value of the mean $m$. In this example, the threshold is numerically given by $\tilde{z} = 0.435$. The expected value of the result of the clipping function with respect to the worst-case density is given by the complicated expression
$$E[f(r_l)] = (1-\epsilon)\left[ r'\,Q\!\left(-\frac{r'}{\sigma}\right) + r''\,Q\!\left(\frac{r''}{\sigma}\right) + \sqrt{\frac{\sigma^2}{2\pi}}\left( e^{-\frac{r'^2}{2\sigma^2}} - e^{-\frac{r''^2}{2\sigma^2}} \right) \right] + \epsilon\, r''$$
The variance is found in a similar fashion and can be used to find the threshold $\gamma$ on the sum of clipped observation values.
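As a sanity check on the bracketed term of the expected-value expression, the sketch below compares it with a direct numerical integration. It rests on an interpretation that is an assumption here: that the bracketed term is the mean of a zero-mean Gaussian variable with variance $\sigma^2$ passed through a unit-slope clipper with thresholds $r'$ and $r''$. The function names are hypothetical.

```python
import math

def Q(x):
    """Gaussian tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def clipped_mean_closed_form(r_lo, r_hi, sigma):
    """Bracketed term: r' Q(-r'/s) + r'' Q(r''/s) + s/sqrt(2 pi) (e^{-r'^2/2s^2} - e^{-r''^2/2s^2})."""
    tails = r_lo * Q(-r_lo / sigma) + r_hi * Q(r_hi / sigma)
    center = sigma / math.sqrt(2.0 * math.pi) * (
        math.exp(-r_lo ** 2 / (2.0 * sigma ** 2))
        - math.exp(-r_hi ** 2 / (2.0 * sigma ** 2))
    )
    return tails + center

def clipped_mean_numeric(r_lo, r_hi, sigma, n=100000):
    """Midpoint-rule integral of clip(x) times the zero-mean Gaussian density."""
    lo, hi = -12.0 * sigma, 12.0 * sigma
    h = (hi - lo) / n
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        density = math.exp(-x * x / (2.0 * sigma ** 2)) / norm
        total += min(max(x, r_lo), r_hi) * density * h
    return total

sigma = math.sqrt(5.0)
closed = clipped_mean_closed_form(-1.675, 2.675, sigma)
numeric = clipped_mean_numeric(-1.675, 2.675, sigma)
print(closed, numeric)
```

With symmetric thresholds the two tail terms cancel and the exponentials are equal, so the clipped mean is zero, as expected for a symmetric density.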

Footnotes

  1. Note that we only need to require that the nominal density remain constant throughout the observations. The disturbance density, and through it the density of each observation, could vary without disturbing the validity of this result! Such generality is typical when one loosens modeling restrictions, but, as we have said, this generality is bought with diminished performance.

References

  1. P.J. Huber. (1965). A robust version of the probability ratio test. Ann. Math. Stat., 36, 1753-1758.
  2. P.J. Huber. (1981). Robust Statistics. New York: John Wiley and Sons.
