# Criteria in Hypothesis Testing

Module by: Don Johnson

The criterion used in the previous section - minimize the average cost of an incorrect decision - may seem to be a contrived way of quantifying decisions. Well, often it is. For example, the Bayesian decision rule depends explicitly on the a priori probabilities; a rational method of assigning values to these - either by experiment or through true knowledge of the relative likelihood of each model - may be unreasonable. In this section, we develop alternative decision rules that try to answer such objections. One essential point will emerge from these considerations: the fundamental nature of the decision rule does not change with choice of optimization criterion. Even criteria remote from error measures can result in the likelihood ratio test (see this problem). Such results do not occur often in signal processing and underline the likelihood ratio test's significance.

## Maximum Probability of a Correct Decision

As only one model can describe any given set of data (the models are mutually exclusive), the probability of being correct $P_c$ for distinguishing two models is given by
$$P_c = \Pr[\text{say } \mathcal{M}_0 \text{ when } \mathcal{M}_0 \text{ true}] + \Pr[\text{say } \mathcal{M}_1 \text{ when } \mathcal{M}_1 \text{ true}]$$
We wish to determine the optimum placement of the decision regions. Expressing the probability correct in terms of the likelihood functions $p_{\mathbf{r}|\mathcal{M}_i}(\mathbf{r})$, the a priori probabilities, and the decision regions,
$$P_c = \int_{\mathcal{R}_0} \pi_0 p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})\,d\mathbf{r} + \int_{\mathcal{R}_1} \pi_1 p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r})\,d\mathbf{r}$$
We want to maximize $P_c$ by selecting the decision regions $\mathcal{R}_0$ and $\mathcal{R}_1$. The probability correct is maximized by associating each value of $\mathbf{r}$ with the largest term in the expression for $P_c$. Decision region $\mathcal{R}_0$, for example, is defined by the collection of values of $\mathbf{r}$ for which the first term is largest. As all of the quantities involved are non-negative, the decision rule maximizing the probability of a correct decision is

### Correct decision:

Given $\mathbf{r}$, choose $\mathcal{M}_i$ for which the product $\pi_i p_{\mathbf{r}|\mathcal{M}_i}(\mathbf{r})$ is largest.

Simple manipulations lead to the likelihood ratio test.
$$\frac{p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r})}{p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})} \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \frac{\pi_0}{\pi_1}$$
Note that if the Bayes' costs were chosen so that $C_{ii} = 0$ and $C_{ij} = C$ ($i \ne j$), we would have the same threshold as in the previous section.
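The equivalence of these two forms can be sketched numerically. The code below is a minimal illustration (not part of the module): scalar Gaussian models with assumed means, variance, and priors, showing that choosing the largest $\pi_i p_{\mathbf{r}|\mathcal{M}_i}(\mathbf{r})$ and applying the likelihood ratio test produce the same decisions.

```python
# A minimal numerical sketch of the rule above (illustrative models):
# scalar Gaussian likelihoods with assumed means, variance, and priors.
from math import exp, pi, sqrt

def gauss_pdf(r, mean, var):
    """Scalar Gaussian likelihood p(r | M_i)."""
    return exp(-(r - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

def map_decide(r, priors, means, var):
    """Choose the model index i for which pi_i * p(r | M_i) is largest."""
    scores = [p * gauss_pdf(r, m, var) for p, m in zip(priors, means)]
    return max(range(len(scores)), key=scores.__getitem__)

def lrt_decide(r, priors, means, var):
    """Equivalent likelihood ratio test: Lambda(r) against pi_0 / pi_1."""
    lam = gauss_pdf(r, means[1], var) / gauss_pdf(r, means[0], var)
    return 1 if lam > priors[0] / priors[1] else 0
```

Both functions implement the same comparison, so they agree for every observation; only the algebraic arrangement differs.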

To evaluate the quality of the decision rule, we usually compute the probability of error P e P e rather than the probability of being correct. This quantity can be expressed in terms of the observations, the likelihood ratio, and the sufficient statistic.

$$P_e = \pi_0 \int_{\mathcal{R}_1} p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})\,d\mathbf{r} + \pi_1 \int_{\mathcal{R}_0} p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r})\,d\mathbf{r} = \pi_0 \int_{\eta}^{\infty} p_{\Lambda|\mathcal{M}_0}(\Lambda)\,d\Lambda + \pi_1 \int_{0}^{\eta} p_{\Lambda|\mathcal{M}_1}(\Lambda)\,d\Lambda = \pi_0 \int_{\gamma}^{\infty} p_{\Upsilon|\mathcal{M}_0}(\Upsilon)\,d\Upsilon + \pi_1 \int_{-\infty}^{\gamma} p_{\Upsilon|\mathcal{M}_1}(\Upsilon)\,d\Upsilon$$
(1)
When the likelihood ratio is non-monotonic, the first expression is the most difficult to evaluate; when it is monotonic, the middle expression proves the most difficult. Furthermore, these expressions point out that the likelihood ratio and the sufficient statistic can be considered functions of the observations $\mathbf{r}$; hence, they are random variables and have probability densities for each model. Another aspect of the resulting probability of error is that no other decision rule can yield a lower probability of error. This statement is obvious, as we maximized the probability of a correct decision (equivalently, minimized the probability of error) in deriving the likelihood ratio test. The point is that these expressions represent a lower bound on performance (as assessed by the probability of error). This probability will be non-zero if the conditional densities overlap over some range of values of $\mathbf{r}$, such as occurred in the previous example. In this region of overlap, the observed values are ambiguous: either model is consistent with the observations. Our "optimum" decision rule operates in such regions by selecting the model most likely to have generated any particular value.

## Neyman-Pearson Criterion

Situations occur frequently where assigning or measuring the a priori probabilities $\pi_i$ is unreasonable. For example, just what is the a priori probability of a supernova occurring in any particular region of the sky? We clearly need a model evaluation procedure which can function without a priori probabilities. This kind of test results when the so-called Neyman-Pearson criterion is used to derive the decision rule. The ideas behind and decision rules derived with the Neyman-Pearson criterion (Neyman and Pearson) will serve us well in the sequel; their result is important!

Using nomenclature from radar, where model $\mathcal{M}_1$ represents the presence of a target and $\mathcal{M}_0$ its absence, the various types of correct and incorrect decisions have the following names (Woodward, pp. 127-129).¹

• Detection: we say it's there when it is; $P_D = \Pr[\text{say } \mathcal{M}_1 \mid \mathcal{M}_1 \text{ true}]$
• False alarm: we say it's there when it's not; $P_F = \Pr[\text{say } \mathcal{M}_1 \mid \mathcal{M}_0 \text{ true}]$
• Miss: we say it's not there when it is; $P_M = \Pr[\text{say } \mathcal{M}_0 \mid \mathcal{M}_1 \text{ true}]$

The remaining probability $\Pr[\text{say } \mathcal{M}_0 \mid \mathcal{M}_0 \text{ true}]$ has historically been left nameless and equals $1 - P_F$. We should also note that the detection and miss probabilities are related by $P_M = 1 - P_D$. As these are conditional probabilities, they do not depend on the a priori probabilities, and the two probabilities $P_F$ and $P_D$ characterize the errors when any decision rule is used.

These two probabilities are related to each other in an interesting way. Expressing these quantities in terms of the decision regions and the likelihood functions, we have
$$P_F = \int_{\mathcal{R}_1} p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})\,d\mathbf{r} \qquad P_D = \int_{\mathcal{R}_1} p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r})\,d\mathbf{r}$$
As the region $\mathcal{R}_1$ shrinks, both of these probabilities tend toward zero; as $\mathcal{R}_1$ expands to engulf the entire range of observation values, they both tend toward unity. This rather direct relationship between $P_D$ and $P_F$ does not mean that they equal each other; in most cases, as $\mathcal{R}_1$ expands, $P_D$ increases more rapidly than $P_F$ (we had better be right more often than we are wrong!). However, the "ultimate" situation where a rule is always right and never wrong ($P_D = 1$, $P_F = 0$) cannot occur when the conditional distributions overlap. Thus, to increase the detection probability we must also allow the false-alarm probability to increase. This behavior represents the fundamental tradeoff in hypothesis testing and detection theory.
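This tradeoff can be seen in a small numerical sketch. The models below are an illustration (scalar Gaussians $\mathcal{N}(0,1)$ and $\mathcal{N}(1,1)$ with $\mathcal{R}_1 = \{r : r > \gamma\}$, none of which comes from the module): lowering the threshold expands $\mathcal{R}_1$ and drives both probabilities toward one.

```python
# Illustrative sketch: P_F and P_D for scalar Gaussian models N(0,1) and
# N(1,1) with decision region R_1 = {r > gamma}. Both probabilities are
# Gaussian tail integrals and rise toward 1 as gamma decreases.
from math import erfc, sqrt

def Q(x):
    """Pr[standard Gaussian > x], the Gaussian tail integral."""
    return 0.5 * erfc(x / sqrt(2.0))

def pf_pd(gamma, m=1.0, sigma=1.0):
    """False-alarm and detection probabilities for threshold gamma."""
    return Q(gamma / sigma), Q((gamma - m) / sigma)

# Sweeping the threshold downward: P_D stays above P_F here (the mean
# under M_1 is positive), but both inevitably grow together.
curve = [pf_pd(g) for g in (2.0, 1.0, 0.0, -2.0)]
```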

One can attempt to impose a performance criterion that depends only on these probabilities, with the consequent decision rule not depending on the a priori probabilities. The Neyman-Pearson criterion assumes that the false-alarm probability is constrained to be less than or equal to a specified value $\alpha$ while we attempt to maximize the detection probability $P_D$.
$$\max_{\mathcal{R}_1} \{ P_D : P_F \le \alpha \}$$
A subtlety of the succeeding solution is that the underlying probability distribution functions may not be continuous, with the result that $P_F$ can never equal the constraining value $\alpha$. Furthermore, an (unlikely) possibility is that the optimum value for the false-alarm probability is somewhat less than the criterion value. Assume, therefore, that we rephrase the optimization problem by requiring that the false-alarm probability equal a value $\alpha'$ that is less than or equal to $\alpha$.

This optimization problem can be solved using Lagrange multipliers (see Constrained Optimization); we seek to find the decision rule that maximizes
$$F = P_D + \lambda (P_F - \alpha')$$
where $\lambda$ is the Lagrange multiplier. This optimization technique amounts to finding the decision rule that maximizes $F$, then finding the value of the multiplier that allows the criterion to be satisfied. As is usual in the derivation of optimum decision rules, we maximize these quantities with respect to the decision regions. Expressing $P_D$ and $P_F$ in terms of them, we have

$$F = \int_{\mathcal{R}_1} p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r})\,d\mathbf{r} + \lambda \left( \int_{\mathcal{R}_1} p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})\,d\mathbf{r} - \alpha' \right) = -\lambda \alpha' + \int_{\mathcal{R}_1} \left[ p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r}) + \lambda p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r}) \right] d\mathbf{r}$$
(2)
To maximize this quantity with respect to $\mathcal{R}_1$, we need only to integrate over those regions of $\mathbf{r}$ where the integrand is positive. The region $\mathcal{R}_1$ thus corresponds to those values of $\mathbf{r}$ where $p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r}) > -\lambda p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})$, and the resulting decision rule is
$$\frac{p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r})}{p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r})} \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} -\lambda$$
The ubiquitous likelihood ratio test again appears; it is indeed the fundamental quantity in hypothesis testing. Using the logarithm of the likelihood ratio or the sufficient statistic, this result can be expressed as either
$$\ln \Lambda(\mathbf{r}) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \ln(-\lambda)$$
or
$$\Upsilon(\mathbf{r}) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \gamma$$

We have not as yet found a value for the threshold. The false-alarm probability can be expressed in terms of the Neyman-Pearson threshold in two (useful) ways.

$$P_F = \int_{-\lambda}^{\infty} p_{\Lambda|\mathcal{M}_0}(\Lambda)\,d\Lambda = \int_{\gamma}^{\infty} p_{\Upsilon|\mathcal{M}_0}(\Upsilon)\,d\Upsilon$$
(3)
One of these implicit equations must be solved for the threshold by setting $P_F$ equal to $\alpha'$. The selection of which to use is usually based on pragmatic considerations: the easiest to compute. From the previous discussion of the relationship between the detection and false-alarm probabilities, we find that to maximize $P_D$ we must allow $\alpha'$ to be as large as possible while remaining less than or equal to $\alpha$. Thus, we want to find the smallest value of $-\lambda$ (note the minus sign) consistent with the constraint. Computation of the threshold is problem-dependent, but a solution always exists.
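When no closed form is available, such an implicit equation can be solved numerically whenever $P_F$ decreases monotonically in the threshold. A minimal bisection sketch; the standard Gaussian statistic at the bottom is an assumed illustration, not the only case:

```python
# Bisection on the monotonically decreasing map gamma -> P_F(gamma) to
# find the threshold satisfying P_F(gamma) = alpha. The Gaussian P_F
# used at the bottom is illustrative; any monotonic P_F works.
from math import erfc, sqrt

def Q(x):
    """Pr[standard Gaussian > x]."""
    return 0.5 * erfc(x / sqrt(2.0))

def solve_threshold(pf_of_gamma, alpha, lo=-50.0, hi=50.0, iters=200):
    """Find gamma with pf_of_gamma(gamma) = alpha (pf decreasing in gamma)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pf_of_gamma(mid) > alpha:
            lo = mid  # P_F still too large: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)

gamma = solve_threshold(Q, 0.01)  # e.g. a standard-Gaussian statistic
```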

### Example 1

An important application of the likelihood ratio test occurs when $\mathbf{r}$ is a Gaussian random vector for each model. Suppose the models correspond to Gaussian random vectors having different mean values but sharing the same covariance $\sigma^2 \mathbf{I}$.

• $\mathcal{M}_0$: $\mathbf{r} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$
• $\mathcal{M}_1$: $\mathbf{r} \sim \mathcal{N}(\mathbf{m}, \sigma^2 \mathbf{I})$

Thus, $\mathbf{r}$ is of dimension $L$ and has statistically independent, equal-variance components. The vector of means $\mathbf{m} = (m_0, \dots, m_{L-1})^T$ distinguishes the two models. The likelihood functions associated with this problem are
$$p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r}) = \prod_{l=0}^{L-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2}\left(\frac{r_l}{\sigma}\right)^2\right)$$
$$p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r}) = \prod_{l=0}^{L-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2}\left(\frac{r_l - m_l}{\sigma}\right)^2\right)$$
The likelihood ratio $\Lambda(\mathbf{r})$ becomes
$$\Lambda(\mathbf{r}) = \frac{\prod_{l=0}^{L-1} \exp\!\left(-\frac{1}{2}\left(\frac{r_l - m_l}{\sigma}\right)^2\right)}{\prod_{l=0}^{L-1} \exp\!\left(-\frac{1}{2}\left(\frac{r_l}{\sigma}\right)^2\right)}$$
This expression for the likelihood ratio is complicated. In the Gaussian case (and many others), we use the logarithm to reduce the complexity of the likelihood ratio and form a sufficient statistic.
$$\ln \Lambda(\mathbf{r}) = \sum_{l=0}^{L-1} \left[ -\frac{1}{2}\left(\frac{r_l - m_l}{\sigma}\right)^2 + \frac{1}{2}\left(\frac{r_l}{\sigma}\right)^2 \right] = \frac{1}{\sigma^2} \sum_{l=0}^{L-1} m_l r_l - \frac{1}{2\sigma^2} \sum_{l=0}^{L-1} m_l^2$$
(4)
The likelihood ratio test then has the much simpler, but equivalent form
$$\sum_{l=0}^{L-1} m_l r_l \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \sigma^2 \ln \eta + \frac{1}{2} \sum_{l=0}^{L-1} m_l^2$$
To focus on the model evaluation aspects of this problem, let's assume the means are equal to a positive constant: $m_l = m$ ($m > 0$).²
$$\sum_{l=0}^{L-1} r_l \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \frac{\sigma^2 \ln \eta}{m} + \frac{Lm}{2}$$
Note that all that need be known about the observations $r_l$ is their sum. This quantity is the sufficient statistic for the Gaussian problem: $\Upsilon(\mathbf{r}) = \sum_l r_l$ and $\gamma = \frac{\sigma^2 \ln \eta}{m} + \frac{Lm}{2}$.
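As a check on the algebra, the simplified test can be written out directly. This is a minimal sketch; the function name and default arguments are illustrative:

```python
# Sketch of the simplified Gaussian likelihood ratio test: only sum(r_l)
# matters, compared against gamma = sigma^2 ln(eta)/m + L m / 2.
from math import log

def gaussian_mean_test(r, m, sigma, eta=1.0):
    """Decide between M_0 (zero mean) and M_1 (mean m > 0) from samples r."""
    L = len(r)
    upsilon = sum(r)  # the sufficient statistic
    gamma = sigma ** 2 * log(eta) / m + L * m / 2  # the threshold
    return 1 if upsilon > gamma else 0
```

For example, with $\eta = 1$, $m = \sigma = 1$, and $L = 4$, the threshold is $\gamma = 2$, so any observation vector summing above 2 is assigned to $\mathcal{M}_1$.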

When trying to compute the probability of error or the threshold in the Neyman-Pearson criterion, we must find the conditional probability density of one of the decision statistics: the likelihood ratio, the log-likelihood, or the sufficient statistic. The log-likelihood and the sufficient statistic are quite similar in this problem, but clearly we should use the latter. One practical property of the sufficient statistic is that it usually simplifies computations. For this Gaussian example, the sufficient statistic is a Gaussian random variable under each model.

• $\mathcal{M}_0$: $\Upsilon(\mathbf{r}) \sim \mathcal{N}(0, L\sigma^2)$
• $\mathcal{M}_1$: $\Upsilon(\mathbf{r}) \sim \mathcal{N}(Lm, L\sigma^2)$

To find the probability of error from Equation 1, we must evaluate the area under a Gaussian probability density function. These integrals are succinctly expressed in terms of $Q(x)$, which denotes the probability that a unit-variance, zero-mean Gaussian random variable exceeds $x$ (see Probability and Stochastic Processes). As $1 - Q(x) = Q(-x)$, the probability of error can be written as
$$P_e = \pi_1 Q\!\left(\frac{Lm - \gamma}{\sqrt{L}\,\sigma}\right) + \pi_0 Q\!\left(\frac{\gamma}{\sqrt{L}\,\sigma}\right)$$
An interesting special case occurs when $\pi_0 = \frac{1}{2} = \pi_1$. In this case, $\gamma = \frac{Lm}{2}$ and the probability of error becomes
$$P_e = Q\!\left(\frac{\sqrt{L}\,m}{2\sigma}\right)$$
As $Q(\cdot)$ is a monotonically decreasing function, the probability of error decreases with increasing values of the ratio $\frac{\sqrt{L}\,m}{2\sigma}$. However, as shown in this figure, $Q(\cdot)$ decreases in a nonlinear fashion. Thus, increasing $m$ by a factor of two may decrease the probability of error by a larger or a smaller factor; the amount of change depends on the initial value of the ratio.
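The error expression can be verified by simulation. A minimal Monte Carlo sketch with illustrative parameter values ($L = 10$, $m = 1$, $\sigma = 1$, equal priors):

```python
# Monte Carlo check of P_e = Q(sqrt(L) m / (2 sigma)) for equal priors.
# Parameter values are illustrative.
import random
from math import erfc, sqrt

def Q(x):
    """Pr[standard Gaussian > x]."""
    return 0.5 * erfc(x / sqrt(2.0))

def simulate_pe(L=10, m=1.0, sigma=1.0, trials=200_000, seed=1):
    """Empirical error rate of the sum test with threshold L m / 2."""
    rng = random.Random(seed)
    gamma = L * m / 2.0  # threshold for pi_0 = pi_1 = 1/2
    errors = 0
    for _ in range(trials):
        truth = rng.randrange(2)  # draw the true model with equal priors
        mean = m if truth else 0.0
        upsilon = sum(rng.gauss(mean, sigma) for _ in range(L))
        errors += ((1 if upsilon > gamma else 0) != truth)
    return errors / trials
```

For these values the empirical rate should approach $Q(\sqrt{10}/2) \approx 0.057$.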

To find the threshold for the Neyman-Pearson test from the expressions given in Equation 3, we need the area under a Gaussian density.

$$P_F = Q\!\left(\frac{\gamma}{\sqrt{L\sigma^2}}\right) = \alpha'$$
(5)
As $Q(\cdot)$ is a monotonic and continuous function, we can now set $\alpha'$ equal to the criterion value $\alpha$, with the result
$$\gamma = \sqrt{L}\,\sigma\, Q^{-1}(\alpha)$$
where $Q^{-1}(\cdot)$ denotes the inverse function of $Q(\cdot)$. The solution of this equation cannot be performed analytically as no closed-form expression exists for $Q(\cdot)$ (much less its inverse function); the threshold must be found from tables or numerical routines. Because Gaussian problems arise frequently, the accompanying table (Table 1) provides numeric values for this quantity at the decade points.
Table 1: Values of $Q^{-1}(\cdot)$ that can be used to determine thresholds in the Neyman-Pearson variant of the likelihood ratio test. Note how little the inverse function changes for decade changes in its argument; $Q(\cdot)$ is indeed very nonlinear.

| $x$ | $Q^{-1}(x)$ |
|---|---|
| $10^{-1}$ | 1.281 |
| $10^{-2}$ | 2.326 |
| $10^{-3}$ | 3.090 |
| $10^{-4}$ | 3.719 |
| $10^{-5}$ | 4.265 |
| $10^{-6}$ | 4.754 |

The detection probability is given by
$$P_D = Q\!\left(Q^{-1}(\alpha) - \frac{\sqrt{L}\,m}{\sigma}\right)$$
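Both the threshold $\gamma = \sqrt{L}\,\sigma\,Q^{-1}(\alpha)$ and this detection probability can be evaluated numerically; the sketch below uses Python's standard-library `NormalDist` (the parameter values are illustrative).

```python
# Neyman-Pearson threshold gamma = sqrt(L) sigma Q^{-1}(alpha) and the
# detection probability P_D = Q(Q^{-1}(alpha) - sqrt(L) m / sigma).
from math import sqrt
from statistics import NormalDist

_std = NormalDist()  # zero-mean, unit-variance Gaussian

def Qinv(x):
    """Inverse of Q(x) = Pr[standard Gaussian > x]."""
    return _std.inv_cdf(1.0 - x)

def np_threshold_and_pd(alpha, L, m, sigma):
    """Threshold and detection probability for the Gaussian sum test."""
    gamma = sqrt(L) * sigma * Qinv(alpha)
    pd = 1.0 - _std.cdf(Qinv(alpha) - sqrt(L) * m / sigma)  # Q(...)
    return gamma, pd
```

For instance, `Qinv(1e-3)` returns about 3.090, matching the decade values of $Q^{-1}(\cdot)$ to rounding.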

## Footnotes

1. In hypothesis testing, a false-alarm is known as a type I error and a miss a type II error.
2. Why did the authors assume that the mean was positive? What would happen if it were negative?

## References

1. J. Neyman and E.S. Pearson. (1933). On the problem of the most efficient tests of statistical hypotheses. Phil. Trans. Roy. Soc. Ser. A, 231, 289-337.
2. P.M. Woodward. (1964). Probability and Information Theory, with Applications to Radar. (second edition). Oxford: Pergamon Press.
