# The Neyman-Pearson Criterion

Module by: Clayton Scott.

In hypothesis testing, as in all other areas of statistical inference, there are two major schools of thought on designing good tests: Bayesian and frequentist (or classical). Consider the simple binary hypothesis testing problem

$$\mathcal{H}_0 : x \sim f_0(x)$$
$$\mathcal{H}_1 : x \sim f_1(x)$$

In the Bayesian setup, the prior probability $\pi_i = \Pr(\mathcal{H}_i)$ of each hypothesis occurring is assumed known. This approach to hypothesis testing is represented by the minimum Bayes risk criterion and the minimum probability of error criterion.

In some applications, however, it may not be reasonable to assign an a priori probability to a hypothesis. For example, what is the a priori probability of a supernova occurring in any particular region of the sky? What is the prior probability of being attacked by a ballistic missile? In such cases we need a decision rule that does not depend on making assumptions about the a priori probability of each hypothesis. Here the Neyman-Pearson criterion offers an alternative to the Bayesian framework.

The Neyman-Pearson criterion is stated in terms of certain probabilities associated with a particular hypothesis test. The relevant quantities are summarized in Table 1. Depending on the setting, different terminology is used.

Table 1

| Probability | Name (Statistics) | Notation (Statistics) | Name (Signal Processing) | Notation (Signal Processing) |
|---|---|---|---|---|
| $P_0(\text{declare } \mathcal{H}_1)$ | size | $\alpha$ | false-alarm probability | $P_F$ |
| $P_1(\text{declare } \mathcal{H}_1)$ | power | $\beta$ | detection probability | $P_D$ |

Here $P_i(\text{declare } \mathcal{H}_j)$ denotes the probability that we declare hypothesis $\mathcal{H}_j$ to be in effect when $\mathcal{H}_i$ is actually in effect. The probabilities $P_0(\text{declare } \mathcal{H}_0)$ and $P_1(\text{declare } \mathcal{H}_0)$ (the latter sometimes called the miss probability) are equal to $1 - P_F$ and $1 - P_D$, respectively. Thus, $P_F$ and $P_D$ represent the two degrees of freedom in a binary hypothesis test. Note that $P_F$ and $P_D$ do not involve a priori probabilities of the hypotheses.

These two probabilities are related to each other through the decision regions. If $R_1$ is the decision region for $\mathcal{H}_1$, we have

$$P_F = \int_{R_1} f_0(x)\,dx$$
$$P_D = \int_{R_1} f_1(x)\,dx$$

The densities $f_i(x)$ are nonnegative, so as $R_1$ shrinks, both probabilities tend to zero; as $R_1$ expands, both tend to one. The ideal case, where $P_D = 1$ and $P_F = 0$, cannot occur unless the distributions do not overlap (i.e., $\int f_0(x) f_1(x)\,dx = 0$). Therefore, in order to increase $P_D$, we must also increase $P_F$. This represents the fundamental tradeoff in hypothesis testing and detection theory.

## Example 1

Consider the simple binary hypothesis test of a scalar measurement $x$:

$$\mathcal{H}_0 : x \sim \mathcal{N}(0, 1)$$
$$\mathcal{H}_1 : x \sim \mathcal{N}(1, 1)$$

Suppose we use a threshold test

$$x \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \gamma$$

where $\gamma \in \mathbb{R}$ is a free parameter. Then the false alarm and detection probabilities are

$$P_F = \int_\gamma^\infty \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt = Q(\gamma)$$
$$P_D = \int_\gamma^\infty \frac{1}{\sqrt{2\pi}} e^{-(t-1)^2/2}\,dt = Q(\gamma - 1)$$

where $Q$ denotes the Q-function. These quantities are depicted in Figure 1.

Since the $Q$-function is monotonically decreasing, it is evident that both $P_D$ and $P_F$ decay to zero as $\gamma$ increases. There is also an explicit relationship

$$P_D = Q(Q^{-1}(P_F) - 1)$$

A common means of displaying this relationship is with a receiver operating characteristic (ROC) curve, which is nothing more than a plot of $P_D$ versus $P_F$ (Figure 2).
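These relationships are easy to check numerically. The following sketch (assuming Python with SciPy, where `scipy.stats.norm.sf` serves as the Q-function and `norm.isf` as its inverse) evaluates a few points on the ROC curve for this example.

```python
from scipy.stats import norm

def Q(x):
    """Q-function: upper-tail probability of the standard normal."""
    return norm.sf(x)

def Q_inv(p):
    """Inverse Q-function."""
    return norm.isf(p)

# Threshold test for H0: N(0,1) vs H1: N(1,1)
for gamma in [-1.0, 0.0, 1.0, 2.0]:
    P_F = Q(gamma)        # false-alarm probability
    P_D = Q(gamma - 1.0)  # detection probability
    # Check the explicit ROC relationship P_D = Q(Q^{-1}(P_F) - 1)
    assert abs(P_D - Q(Q_inv(P_F) - 1.0)) < 1e-9
    print(f"gamma = {gamma:5.2f}:  P_F = {P_F:.4f},  P_D = {P_D:.4f}")
```

Sweeping `gamma` over a fine grid and plotting `P_D` against `P_F` produces the ROC curve of Figure 2.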

## The Neyman-Pearson Lemma: A First Look

The Neyman-Pearson criterion says that we should construct our decision rule to have maximum probability of detection while not allowing the probability of false alarm to exceed a certain value $\alpha$. In other words, the optimal detector according to the Neyman-Pearson criterion is the solution to the following constrained optimization problem:

### Neyman-Pearson Criterion

$$\max P_D \quad \text{such that} \quad P_F \le \alpha$$
(1)

The maximization is over all decision rules (equivalently, over all decision regions $R_0$, $R_1$). Using different terminology, the Neyman-Pearson criterion selects the most powerful test of size (not exceeding) $\alpha$.

Fortunately, the above optimization problem has an explicit solution. This is given by the celebrated Neyman-Pearson lemma, which we now state. To ease the exposition, our initial statement of this result only applies to continuous random variables, and places a technical condition on the densities. A more general statement is given later in the module.

### Theorem 1: Neyman-Pearson Lemma: initial statement

Consider the test

$$\mathcal{H}_0 : x \sim f_0(x)$$
$$\mathcal{H}_1 : x \sim f_1(x)$$

where $f_i(x)$ is a density. Define

$$\Lambda(x) = \frac{f_1(x)}{f_0(x)}$$

and assume that $\Lambda(x)$ satisfies the condition that for each $\gamma \in \mathbb{R}$, $\Lambda(x)$ takes on the value $\gamma$ with probability zero under hypothesis $\mathcal{H}_0$. The solution to the optimization problem in Equation 1 is given by

$$\Lambda(x) = \frac{f_1(x)}{f_0(x)} \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \eta$$

where $\eta$ is such that

$$P_F = \int_{\{x : \Lambda(x) > \eta\}} f_0(x)\,dx = \alpha$$

If $\alpha = 0$, then $\eta = \infty$. The optimal test is unique up to a set of probability zero under $\mathcal{H}_0$ and $\mathcal{H}_1$.

The optimal decision rule is called the likelihood ratio test (LRT). $\Lambda(x)$ is the likelihood ratio, and $\eta$ is a threshold. Observe that neither the likelihood ratio nor the threshold depends on the a priori probabilities $\Pr(\mathcal{H}_i)$; they depend only on the conditional densities $f_i$ and the size constraint $\alpha$. The threshold can often be solved for as a function of $\alpha$, as the next example shows.

## Example 2

Continuing with Example 1, suppose we wish to design a Neyman-Pearson decision rule with size constraint $\alpha$. We have

$$\Lambda(x) = \frac{\frac{1}{\sqrt{2\pi}} e^{-(x-1)^2/2}}{\frac{1}{\sqrt{2\pi}} e^{-x^2/2}} = e^{x - \frac{1}{2}}$$
(2)
By taking the natural logarithm of both sides of the LRT and rearranging terms, the decision rule is not changed, and we obtain

$$x \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \ln\eta + \frac{1}{2} \equiv \gamma$$

Thus, the optimal rule is in fact a thresholding rule like we considered in Example 1. The false-alarm probability was seen to be

$$P_F = Q(\gamma)$$

Thus, we may express the value of $\gamma$ required by the Neyman-Pearson lemma in terms of $\alpha$:

$$\gamma = Q^{-1}(\alpha)$$
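As a concrete check, this design recipe can be carried out in a few lines (a sketch assuming Python with SciPy; the size $\alpha = 0.05$ is an arbitrary illustrative choice):

```python
import math
from scipy.stats import norm

alpha = 0.05                 # size constraint (illustrative value)
gamma = norm.isf(alpha)      # threshold on x: gamma = Q^{-1}(alpha)
eta = math.exp(gamma - 0.5)  # corresponding LRT threshold, from gamma = ln(eta) + 1/2
P_D = norm.sf(gamma - 1.0)   # resulting power: P_D = Q(gamma - 1)

print(f"gamma = {gamma:.4f}, eta = {eta:.4f}, P_D = {P_D:.4f}")
assert abs(norm.sf(gamma) - alpha) < 1e-12  # P_F = Q(gamma) = alpha by construction
```

Note that the power is fully determined once the size is fixed, reflecting the tradeoff discussed above.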

## Sufficient Statistics and Monotonic Transformations

For hypothesis testing involving multiple or vector-valued data, direct evaluation of the size ($P_F$) and power ($P_D$) of a Neyman-Pearson decision rule would require integration over multi-dimensional, and potentially complicated, decision regions. In many cases, however, this can be avoided by simplifying the LRT to a test of the form

$$t \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \gamma$$

where the test statistic $t = T(x)$ is a sufficient statistic for the data. Such a simplified form is arrived at by modifying both sides of the LRT with monotonically increasing transformations, and by algebraic simplifications. Since the modifications do not change the decision rule, we may calculate $P_F$ and $P_D$ in terms of the sufficient statistic. For example, the false-alarm probability may be written

$$P_F = \Pr(\text{declare } \mathcal{H}_1) = \int_{\{t : t > \gamma\}} f_0(t)\,dt$$
(3)
where $f_0(t)$ denotes the density of $t$ under $\mathcal{H}_0$. Since $t$ is typically of lower dimension than $x$, evaluation of $P_F$ and $P_D$ can be greatly simplified. The key is being able to reduce the LRT to a threshold test involving a sufficient statistic for which we know the distribution.

### Example 3

#### Common Variances, Uncommon Means

Let's design a Neyman-Pearson decision rule of size $\alpha$ for the problem

$$\mathcal{H}_0 : x \sim \mathcal{N}(\mathbf{0}, \sigma^2 I)$$
$$\mathcal{H}_1 : x \sim \mathcal{N}(\mu\mathbf{1}, \sigma^2 I)$$

where $\mu > 0$ and $\sigma^2 > 0$ are known, $\mathbf{0} = (0, \ldots, 0)^T$ and $\mathbf{1} = (1, \ldots, 1)^T$ are $N$-dimensional vectors, and $I$ is the $N \times N$ identity matrix. The likelihood ratio is

$$\Lambda(x) = \frac{\prod_{n=1}^N \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x_n - \mu)^2}{2\sigma^2}}}{\prod_{n=1}^N \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x_n^2}{2\sigma^2}}} = \frac{e^{-\sum_{n=1}^N \frac{(x_n - \mu)^2}{2\sigma^2}}}{e^{-\sum_{n=1}^N \frac{x_n^2}{2\sigma^2}}} = e^{-\frac{1}{2\sigma^2}\sum_{n=1}^N \left(-2 x_n \mu + \mu^2\right)} = e^{\frac{1}{\sigma^2}\left(-\frac{N\mu^2}{2} + \mu\sum_{n=1}^N x_n\right)}$$
(4)
To simplify the test further we may apply the natural logarithm and rearrange terms to obtain

$$t \equiv \sum_{n=1}^N x_n \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \frac{\sigma^2}{\mu}\ln\eta + \frac{N\mu}{2} \equiv \gamma$$
##### Note:
We have used the assumption $\mu > 0$. If $\mu < 0$, then division by $\mu$ is not a monotonically increasing operation, and the inequalities would be reversed.
The test statistic $t$ is sufficient for the unknown mean. To set the threshold $\gamma$, we write the false-alarm probability (size) as

$$P_F = \Pr(t > \gamma) = \int_{\{t : t > \gamma\}} f_0(t)\,dt$$

To evaluate $P_F$, we need to know the density of $t$ under $\mathcal{H}_0$. Fortunately, $t$ is the sum of normal variates, so it is again normally distributed. In particular, we have $t = Ax$, where $A = \mathbf{1}^T$, so under $\mathcal{H}_0$

$$t \sim \mathcal{N}(A\mathbf{0}, A(\sigma^2 I)A^T) = \mathcal{N}(0, N\sigma^2)$$

Therefore, we may write $P_F$ in terms of the Q-function as

$$P_F = Q\!\left(\frac{\gamma}{\sqrt{N}\sigma}\right)$$

The threshold is thus determined by

$$\gamma = \sqrt{N}\sigma\, Q^{-1}(\alpha)$$

Under $\mathcal{H}_1$, we have

$$t \sim \mathcal{N}(A(\mu\mathbf{1}), A(\sigma^2 I)A^T) = \mathcal{N}(N\mu, N\sigma^2)$$

and so the detection probability (power) is

$$P_D = \Pr(t > \gamma) = Q\!\left(\frac{\gamma - N\mu}{\sqrt{N}\sigma}\right)$$

Writing $P_D$ as a function of $P_F$, the ROC curve is given by

$$P_D = Q\!\left(Q^{-1}(P_F) - \frac{\sqrt{N}\mu}{\sigma}\right)$$

The quantity $\frac{\sqrt{N}\mu}{\sigma}$ is called the signal-to-noise ratio (SNR). As its name suggests, a larger SNR corresponds to improved performance of the Neyman-Pearson decision rule.
##### remark:
In the context of signal processing, the foregoing problem may be viewed as the problem of detecting a constant (DC) signal in additive white Gaussian noise:

$$\mathcal{H}_0 : x_n = w_n, \quad n = 1, \ldots, N$$
$$\mathcal{H}_1 : x_n = A + w_n, \quad n = 1, \ldots, N$$

where $A$ is a known, fixed amplitude, and $w_n \sim \mathcal{N}(0, \sigma^2)$. Here $A$ corresponds to the mean $\mu$ in the example.
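The closed-form expressions in this example can be sanity-checked by Monte Carlo simulation. The sketch below (assuming Python with NumPy and SciPy; the values of $N$, $\mu$, $\sigma$, and $\alpha$ are arbitrary illustrative choices) draws data under both hypotheses and compares empirical rates with the formulas.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N, mu, sigma, alpha = 10, 1.0, 2.0, 0.1
trials = 200_000

# Threshold on the sufficient statistic t = sum(x): gamma = sqrt(N) sigma Q^{-1}(alpha)
gamma = np.sqrt(N) * sigma * norm.isf(alpha)

# Simulate t under H0 (zero mean) and H1 (mean mu per sample)
t0 = rng.normal(0.0, sigma, (trials, N)).sum(axis=1)
t1 = rng.normal(mu, sigma, (trials, N)).sum(axis=1)

P_F_mc = np.mean(t0 > gamma)
P_D_mc = np.mean(t1 > gamma)
P_D_th = norm.sf((gamma - N * mu) / (np.sqrt(N) * sigma))  # Q((gamma - N mu)/(sqrt(N) sigma))

print(f"P_F: empirical {P_F_mc:.4f} vs design {alpha}")
print(f"P_D: empirical {P_D_mc:.4f} vs theory {P_D_th:.4f}")
```

With this many trials the empirical rates should agree with the analytic size and power to within roughly two decimal places.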

## The Neyman-Pearson Lemma: General Case

In our initial statement of the Neyman-Pearson lemma, we assumed that for all $\eta$, the set $\{x : \Lambda(x) = \eta\}$ had probability zero under $\mathcal{H}_0$. This eliminated many important problems from consideration, including tests of discrete data. In this section we remove this restriction.

It is helpful to introduce a more general way of writing decision rules. Let $\varphi$ be a function of the data $x$ with $\varphi(x) \in [0, 1]$. $\varphi$ defines the decision rule "declare $\mathcal{H}_1$ with probability $\varphi(x)$." In other words, upon observing $x$, we flip a "$\varphi(x)$ coin." If it turns up heads, we declare $\mathcal{H}_1$; otherwise we declare $\mathcal{H}_0$. Thus far, we have only considered rules with $\varphi(x) \in \{0, 1\}$.

### Theorem 2: Neyman-Pearson Lemma

Consider the hypothesis testing problem

$$\mathcal{H}_0 : x \sim f_0(x)$$
$$\mathcal{H}_1 : x \sim f_1(x)$$

where $f_0$ and $f_1$ are both pdfs or both pmfs. Let $\alpha \in [0, 1]$ be the size (false-alarm probability) constraint. The decision rule

$$\varphi(x) = \begin{cases} 1 & \text{if } \Lambda(x) > \eta \\ \rho & \text{if } \Lambda(x) = \eta \\ 0 & \text{if } \Lambda(x) < \eta \end{cases}$$

is the most powerful test of size $\alpha$, where $\eta$ and $\rho$ are uniquely determined by requiring $P_F = \alpha$. If $\alpha = 0$, we take $\eta = \infty$, $\rho = 0$. This test is unique up to sets of probability zero under $\mathcal{H}_0$ and $\mathcal{H}_1$.

When $\Pr(\Lambda(x) = \eta) > 0$ for certain $\eta$, we choose $\eta$ and $\rho$ as follows. Write

$$P_F = \Pr(\Lambda(x) > \eta) + \rho\Pr(\Lambda(x) = \eta)$$

Choose $\eta$ such that

$$\Pr(\Lambda(x) > \eta) \le \alpha \le \Pr(\Lambda(x) \ge \eta)$$

Then choose $\rho$ such that

$$\rho\Pr(\Lambda(x) = \eta) = \alpha - \Pr(\Lambda(x) > \eta)$$
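When $\Lambda(x)$ has a known discrete distribution under $\mathcal{H}_0$, this recipe can be implemented directly. Below is a minimal sketch (the function name `np_threshold` is our own, not from any library): it takes the distinct values of $\Lambda$ and their $\mathcal{H}_0$-probabilities and returns $\eta$ and $\rho$.

```python
def np_threshold(lam_vals, h0_probs, alpha):
    """Find (eta, rho) so that Pr(Lambda > eta) + rho * Pr(Lambda = eta) = alpha,
    given the distinct values of Lambda and their probabilities under H0."""
    pairs = sorted(zip(lam_vals, h0_probs), reverse=True)  # largest Lambda first
    tail_gt = 0.0                                          # Pr(Lambda > candidate eta)
    for eta, p in pairs:
        tail_ge = tail_gt + p                              # Pr(Lambda >= candidate eta)
        if tail_gt <= alpha <= tail_ge:
            rho = 0.0 if p == 0 else (alpha - tail_gt) / p
            return eta, rho
        tail_gt = tail_ge
    return float("-inf"), 0.0  # alpha = 1: always declare H1

# Example: Lambda takes values 4, 1, 1/4 with H0-probabilities 0.2, 0.5, 0.3
eta, rho = np_threshold([4.0, 1.0, 0.25], [0.2, 0.5, 0.3], alpha=0.3)
print(eta, rho)  # eta = 1.0, rho = 0.2, giving P_F = 0.2 + 0.2 * 0.5 = 0.3
```

Without the randomization term $\rho$, only sizes equal to one of the tail probabilities (here 0.2 or 0.7) could be achieved exactly.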

### Example 4

#### Repetition Code

Suppose we have a friend who is trying to transmit a bit (0 or 1) to us over a noisy channel. The channel causes an error in the transmission (that is, the bit is flipped) with probability $p$, where $0 \le p < \frac{1}{2}$, and $p$ is known. In order to increase the chance of a successful transmission, our friend sends the same bit $N$ times. Assume the $N$ transmissions are statistically independent. Under these assumptions, the bits we receive are Bernoulli random variables: $x_n \sim \text{Bernoulli}(\theta)$. We are faced with the following hypothesis test:

$$\mathcal{H}_0 : \theta = p \quad \text{(0 sent)}$$
$$\mathcal{H}_1 : \theta = 1 - p \quad \text{(1 sent)}$$

We decide to decode the received sequence $x = (x_1, \ldots, x_N)^T$ by designing a Neyman-Pearson rule. The likelihood ratio is

$$\Lambda(x) = \frac{\prod_{n=1}^N (1-p)^{x_n}\, p^{1 - x_n}}{\prod_{n=1}^N p^{x_n}\, (1-p)^{1 - x_n}} = \frac{(1-p)^k\, p^{N-k}}{p^k\, (1-p)^{N-k}} = \left(\frac{1-p}{p}\right)^{2k - N}$$
(5)
where $k = \sum_{n=1}^N x_n$ is the number of 1s received.
##### Note:
$k$ is a sufficient statistic for $\theta$.
The LRT is

$$\left(\frac{1-p}{p}\right)^{2k - N} \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \eta$$

Taking the natural logarithm of both sides and rearranging (note $\ln\frac{1-p}{p} > 0$ since $p < \frac{1}{2}$), we have

$$k \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \frac{N}{2} + \frac{1}{2}\frac{\ln\eta}{\ln\frac{1-p}{p}} \equiv \gamma$$

The false alarm probability is
$$P_F = \Pr(k > \gamma) + \rho\Pr(k = \gamma) = \sum_{k=\gamma+1}^N \binom{N}{k} p^k (1-p)^{N-k} + \rho\binom{N}{\gamma} p^\gamma (1-p)^{N-\gamma}$$
(6)
$\gamma$ and $\rho$ are chosen so that $P_F = \alpha$, as described above.

The corresponding detection probability is

$$P_D = \Pr(k > \gamma) + \rho\Pr(k = \gamma) = \sum_{k=\gamma+1}^N \binom{N}{k} (1-p)^k p^{N-k} + \rho\binom{N}{\gamma} (1-p)^\gamma p^{N-\gamma}$$
(7)
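Putting Equation 6 and Equation 7 together, the design of the randomized test can be sketched numerically (assuming Python with SciPy; `repetition_code_test` is a hypothetical helper name, not a library function):

```python
from scipy.stats import binom

def repetition_code_test(N, p, alpha):
    """Find the integer threshold gamma and randomization rho so that
    P_F = alpha exactly, and return the resulting power P_D."""
    for gamma in range(N + 1):
        tail_gt = binom.sf(gamma, N, p)  # Pr(k > gamma) under H0: k ~ Binomial(N, p)
        p_eq = binom.pmf(gamma, N, p)    # Pr(k = gamma) under H0
        if tail_gt <= alpha <= tail_gt + p_eq:
            rho = (alpha - tail_gt) / p_eq
            # Under H1, k ~ Binomial(N, 1 - p)
            P_D = binom.sf(gamma, N, 1 - p) + rho * binom.pmf(gamma, N, 1 - p)
            return gamma, rho, P_D
    raise ValueError("no feasible threshold")

gamma, rho, P_D = repetition_code_test(N=5, p=0.1, alpha=0.01)
print(f"gamma = {gamma}, rho = {rho:.4f}, P_D = {P_D:.4f}")
```

The loop scans the integer thresholds until it finds the one bracketing $\alpha$, exactly as in the $\eta$, $\rho$ selection rule of the general lemma.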

## Problems

### Exercise 1

Design a hypothesis testing problem involving continuous random variables such that $\Pr(\Lambda(x) = \eta) > 0$ for certain values of $\eta$. Write down the false-alarm probability as a function of the threshold. Make as general a statement as possible about when the technical condition is satisfied.

### Exercise 2

Consider the scalar hypothesis testing problem

$$\mathcal{H}_0 : x \sim f_0(x)$$
$$\mathcal{H}_1 : x \sim f_1(x)$$

where

$$f_i(x) = \frac{1}{\pi(1 + (x - i)^2)}, \quad i = 0, 1$$

#### 2.a)

Write down the likelihood ratio test.

#### 2.b)

Determine the decision regions as a function of $\eta$ for all $\eta > 0$. Draw a representative of each. What are the "critical" values of $\eta$?

##### Hint:
There are five distinct cases.

#### 2.c)

Compute the size and power ($P_F$ and $P_D$) in terms of the threshold $\eta$ and plot the ROC.

##### Hint:
$$\int \frac{1}{1+x^2}\,dx = \arctan x$$

#### 2.d)

Suppose we decide to use a simple threshold test

$$x \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \eta$$

instead of the Neyman-Pearson rule. Does our performance suffer much? Plot the ROC for this decision rule on the same graph as the previous ROC.

### Exercise 3

Suppose we observe $N$ independent realizations of a Poisson random variable $k$ with intensity parameter $\lambda$:

$$f(k) = \frac{e^{-\lambda}\lambda^k}{k!}$$

We must decide which of two intensities is in effect:

$$\mathcal{H}_0 : \lambda = \lambda_0$$
$$\mathcal{H}_1 : \lambda = \lambda_1$$

where $\lambda_0 < \lambda_1$.

#### 3.a)

Write down the likelihood ratio test.

#### 3.b)

Simplify the LRT to a test involving only a sufficient statistic. Apply a monotonically increasing transformation to simplify further.

#### 3.c)

Determine the distribution of the sufficient statistic under both hypotheses.

##### Hint:
Use the characteristic function to show that a sum of IID Poisson variates is again Poisson distributed.

#### 3.d)

Derive an expression for the probability of error.

#### 3.e)

Assuming the two hypotheses are equally likely, and $\lambda_0 = 5$ and $\lambda_1 = 6$, what is the minimum number $N$ of observations needed to attain a false-alarm probability no greater than 0.01?

##### Hint:
If you have numerical trouble, try rewriting the log-factorial so as to avoid evaluating the factorial of large integers.

### Exercise 4

In Example 4, suppose $p = 0.1$. What is the smallest value of $N$ needed to ensure $P_F \le 0.01$? What is $P_D$ in this case?
