Skip to content Skip to navigation

Connexions

You are here: Home » Content » Minimum Probability of Error Decision Rule

Navigation

Content Actions

  • Download module PDF
  • Add to ...
    Add the module to:
    • My Favorites
    • A lens
    • An external social bookmarking service
    • My Favorites (What is 'My Favorites'?)
      'My Favorites' is a special kind of lens which you can use to bookmark modules and collections directly in Connexions. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need a Connexions account to use 'My Favorites'.
    • A lens (What is a lens?)

      Definition of a lens

      Lenses

      A lens is a custom view of Connexions content. You can think of it as a fancy kind of list that will let you see Connexions through the eyes of organizations and people you trust.

      What is in a lens?

      Lens makers point to Connexions materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

      Who can create a lens?

      Any individual Connexions member, a community, or a respected organization.

    • External bookmarks
  • E-mail the authors

Recently Viewed

This feature requires Javascript to be enabled.

Minimum Probability of Error Decision Rule

Module by: Clayton Scott, Robert Nowak

Consider the binary hypothesis test 0 : x f 0 x 0 : x f 0 x 1 : x f 1 x 1 : x f 1 x Let π i π i , denote the a priori probability of hypothesis i i . Suppose our decision rule declares " 0 0 is the true model" when x R 0 x R 0 , and it selects 1 1 when x R 1 x R 1 , where R 1 = R 0 R 1 R 0 . The probability of making an error, denoted P e P e , is

P e =Pr declare 0 and 1 true +Pr declare 1 and 0 true =Pr 1 Pr 0 | 1 +Pr 0 Pr 1 | 0 = R 0 π 1 f 1 xdx+ R 1 π 0 f 0 xdx P e declare 0 and 1 true declare 1 and 0 true 1 1 0 0 0 1 x R 0 π 1 f 1 x x R 1 π 0 f 0 x (1)
In this module, we study the minimum probability of error decision rule, which selects R 0 R 0 and R 1 R 1 so as to minimize the above expression.

Since an observation xx falls into one and only one of the decision regions R i R i , in order to minimize P e P e , we assign xx to the region for which the corresponding integrand in Equation 1 is smaller. Thus, we select x R 0 x R 0 if π 1 f 1 x< π 0 f 0 x π 1 f 1 x π 0 f 0 x , and x R 1 x R 1 if the inequality is reversed. This decision rule may be summarized concisely as Λx f 1 x f 0 x 0 1 π 0 π 1 η Λ x f 1 x f 0 x 0 1 π 0 π 1 η Here, Λx Λ x is called the likelihood ratio, ηη is called a threshold, and the overall decision rule is called the Likelihood Ratio Test.

Example 1

Normal with Common Variance, Uncommon Means

Consider the binary hypothesis test of a scalar xx 0 : x0σ2 0 : x 0 σ 2 1 : xμσ2 1 : x μ σ 2 where μμ and σ2 σ 2 are known, positive quantities. Suppose we observe a single measurement xx. The likelihood ratio is

Λx=12πσ2-x-μ22σ212πσ2-x22σ2=1σ2μx-μ22 Λ x 1 2 σ 2 x μ 2 2 σ 2 1 2 σ 2 x 2 2 σ 2 1 σ 2 μ x μ 2 2 (2)
and so the minimum probability of error decision rule is 1σ2μx-μ22 0 1 π 0 π 1 =η 1 σ 2 μ x μ 2 2 0 1 π 0 π 1 η The expression for Λx Λ x is somewhat complicated. By applying a sequence of monotonically increasing functions to both sides, we can obtain a simplified expression for the optimal decision rule without changing the rule. In this example, we apply the natural logarithm and rearrange terms to arrive at x 0 1 σ2μlnη+μ2γ x 0 1 σ 2 μ η μ 2 γ Here we have used the assumption μ>0 μ 0 . If μ<0 μ 0 , then dividing by μ μ would reverse the inequalities.

This form of the decision rule is much simpler: we just compare the observed value xx to a threshold γγ. Figure 1 depicts the two candidate densities and a possible value of γγ. If each hypothesis is a priori equally likely ( π 0 = π 1 =12 π 0 π 1 1 2 ), then γ=μ2 γ μ 2 . Figure 1 illustrates the case where π 0 > π 1 π 0 π 1 ( γ>μ2 γ μ 2 ).

Figure 1: The two candidate densities, and a threshold corresponding to π 0 > π 1 π 0 π 1
Subfigure 1.1
Subfigure 1.1 (GaussUncMeanComVarPd.png)
Subfigure 1.2
Subfigure 1.2 (GaussUncMeanComVarPf.png)
If we plot the two densities so that each is weighted by its a priori probability of occuring, the two curves will intersect at the threshold γγ (see Figure 2). (Can you explain why this is? Think back to our derivation of the LRT). This plot also offers a way to visualize the probability of error. Recall
P e = R 0 π 1 f 1 xdx+ R 1 π 0 f 0 xdx=x<γ π 1 f 1 xdx+x>γ π 0 f 0 xdx= π 1 P M + π 0 P F P e x R 0 π 1 f 1 x x R 1 π 0 f 0 x x x γ π 1 f 1 x x x γ π 0 f 0 x π 1 P M π 0 P F (3)
where P M P M and P F P F denote the miss and false alarm probabilities, respectively. These quantities are depicted in Figure 2.
Figure 2: The candidate densities weighted by their a priori probabilities. The shaded region is the probability of error for the optimal decision rule.
Figure 2 (GaussUncMeanComVarPe.png)
We can express P M P M and P F P F in terms of the Q-function as P e = π 1 Qμ-γσ+ π 0 Qγσ P e π 1 Q μ γ σ π 0 Q γ σ When π 0 = π 1 =12 π 0 π 1 1 2 , we have γ=μ2 γ μ 2 , and the error probability is P e =Qμ2σ P e Q μ 2 σ Since Qx Q x is monotonically decreasing, this says that the "difficulty" of the detection problem decreases with decreasing σσ and increasing μμ.

In the preceding example, computation of the probability of error involved a one-dimensional integral. If we had multiple observations, or vector-valued data, generalizing this procedure would involve multi-dimensional integrals over potentially complicated decision regions. Fortunately, in many cases, we can avoid this problem through the use of sufficient statistics.

Example 2

Suppose we have the same test as in the previous example, but now we have NN independent observations: 0 : x n 0σ2 , n = 1 , , N 0 : x n 0 σ 2 , n = 1 , , N 1 : x n μσ2 , n = 1 , , N 1 : x n μ σ 2 , n = 1 , , N where μ>0 μ 0 and σ2>0 σ 2 0 and both are known. The likelihood ratio is

Λx=n=1N12πσ2- x n -μ22σ2n=1N12πσ2- x n 22σ2=-12σ2n=1N x n -μ2-12σ2n=1N x n 2=12σ2n=1N2 x n μ-μ2=1σ2μn=1N x n -Nμ22 Λ x n 1 N 1 2 σ 2 x n μ 2 2 σ 2 n 1 N 1 2 σ 2 x n 2 2 σ 2 -1 2 σ 2 n 1 N x n μ 2 -1 2 σ 2 n 1 N x n 2 1 2 σ 2 n 1 N 2 x n μ μ 2 1 σ 2 μ n 1 N x n N μ 2 2 (4)
As in the previous example, we may apply the natural logarithm and rearrange terms to obtain an equivalent form of the LRT: tn=1N x n 0 1 σ2μlnη+Nμ2γ t n 1 N x n 0 1 σ 2 μ η N μ 2 γ The scalar quantity tt is a sufficient statistic for the mean. In order to evaluate the probability of error without resorting to a multi-dimensional integral, we can express P e P e in terms of tt as P e = π 1 Prt<γ| 1 true + π 0 Prt>γ| 0 true P e π 1 1 true t γ π 0 0 true t γ Now tt is a linear combination of normal variates, so it is itself normal. In particular, we have t=Ax t A x , where 11 1 1 is an NN-dimensional row vector of 1's, and xx is multivariate normal with mean 00 or μ=μμT μ μ μ , and covariance σ2I σ 2 I . Thus we have t | 0 A0Aσ2IAT=0Nσ2 t | 0 A 0 A σ 2 I A 0 N σ 2 t | 1 AμAσ2IAT=NμNσ2 t | 1 A μ A σ 2 I A N μ N σ 2 Therefore, we may write P e P e in terms of the Q-function as P e = π 1 QNμ-γNσ+ π 0 QγNσ P e π 1 Q N μ γ N σ π 0 Q γ N σ In the special case π 0 = π 1 =12 π 0 π 1 1 2 , P e =QNμσ P e Q N μ σ Since QQ is monotonically decreasing, this result provides mathematical support for something that is intuitively obvious: The performance of our decision rule improves with increasing NN and μμ, and decreasing σσ.

remark:

In the context of signal processing, the foregoing problem may be viewed as the problem of detecting a constant (DC) signal in additive white Gaussian noise: 0 : x n = w n , n = 1 , , N 0 : x n w n , n = 1 , , N 1 : x n =A+ w n , n = 1 , , N 1 : x n A w n , n = 1 , , N where AA is a known, fixed amplitude, and w n 0σ2 w n 0 σ 2 . Here AA corresponds to the mean μμ in the example.

The next example explores the minimum probability of error decision rule in a discrete setting.

Example 3

Repetition Code

Suppose we have a friend who is trying to transmit a bit (0 or 1) to us over a noisy channel. The channel causes an error in the transmission (that is, the bit is flipped) with probability pp, where 0p<12 0 p 1 2 , and pp is known. In order to increase the chance of a successful transmission, our friend sends the same bit NN times. Assume the NN transmissions are statistically independent. Under these assumptions, the bits you receive are Bernoulli random variables: x n Bernoulliθ x n Bernoulli θ . We are faced with the following hypothesis test:

0 0 θ=p θ p 0 sent
1 1 θ=1-p θ 1 p 1 sent
We decide to decode the received sequence x= x 1 x N T x x 1 x N by minimizing the probability of error. The likelihood ratio is
Λx=n=1N1-p x n p1- x n n=1Np x n 1-p1- x n =1-pkpN-kpk1-pN-k=1-pp2k-N Λ x n 1 N 1 p x n p 1 x n n 1 N p x n 1 p 1 x n 1 p k p N k p k 1 p N k 1 p p 2 k N (5)
where k=n=1N x n k n 1 N x n is the number of 1s received.

Note:

kk is a sufficient statistic for θθ.
The LRT is 1-pp2k-N 0 1 π 0 π 1 =η 1 p p 2 k N 0 1 π 0 π 1 η Taking the natural logarithm of both sides and rearranging, we have k 0 1 N2+12lnηln1-pp=γ k 0 1 N 2 1 2 η 1 p p γ In the case that both hypotheses are equally likely, the minimum probability of error decision is the "majority-vote" rule: Declare 1 1 if there are more 1s than 0s, declare 0 0 otherwise. In the event k=γ k γ , we may decide arbitrarily; the probability of error is the same either way. Let's adopt the convention that 0 0 is declared in this case.

To compute the probability of error of the optimal rule, write

P e = π 0 Pr declare 1 | 0 true + π 1 Pr declare 0 | 1 true = π 0 Prk>γ| 0 true + π 1 Prkγ| 1 true P e π 0 0 true declare 1 π 1 1 true declare 0 π 0 0