Consider the binary hypothesis test
ℋ
0
:
x∼
f
0
x
ℋ
0
:
x
f
0
x
ℋ
1
:
x∼
f
1
x
ℋ
1
:
x
f
1
x
Let
π
i
π
i
, denote the a priori probability of
hypothesis
ℋ
i
ℋ
i
. Suppose our decision rule declares
"
ℋ
0
ℋ
0
is the true model" when
x∈
R
0
x
R
0
, and it selects
ℋ
1
ℋ
1
when
x∈
R
1
x
R
1
, where
R
1
=
R
0
′
R
1
R
0
. The probability of making an error, denoted
P
e
P
e
, is
P
e
=Pr
declare
ℋ
0
and
ℋ
1
true
+Pr
declare
ℋ
1
and
ℋ
0
true
=Pr
ℋ
1
Pr
ℋ
0
|
ℋ
1
+Pr
ℋ
0
Pr
ℋ
1
|
ℋ
0
=∫
R
0
π
1
f
1
xdx+∫
R
1
π
0
f
0
xdx
P
e
declare
ℋ
0
and
ℋ
1
true
declare
ℋ
1
and
ℋ
0
true
ℋ
1
ℋ
1
ℋ
0
ℋ
0
ℋ
0
ℋ
1
x
R
0
π
1
f
1
x
x
R
1
π
0
f
0
x
(1)
In this module, we study the minimum probability of error
decision rule, which selects
R
0
R
0
and
R
1
R
1
so as to minimize the above expression.
Since an observation xx falls into one and only one of the
decision regions
R
i
R
i
, in order to minimize
P
e
P
e
,
we assign xx to the
region for which the corresponding integrand in Equation 1
is smaller. Thus, we select
x∈
R
0
x
R
0
if
π
1
f
1
x<
π
0
f
0
x
π
1
f
1
x
π
0
f
0
x
, and
x∈
R
1
x
R
1
if the inequality is reversed. This decision rule may
be summarized concisely as
Λx≡
f
1
x
f
0
x
≷
ℋ
0
ℋ
1
π
0
π
1
≡η
Λ
x
f
1
x
f
0
x
≷
ℋ
0
ℋ
1
π
0
π
1
η
Here,
Λx
Λ
x
is called the likelihood ratio,
ηη is called a
threshold, and the overall decision rule is called
the Likelihood Ratio Test.
Consider the binary hypothesis test of a scalar
xx
ℋ
0
:
x∼0σ2
ℋ
0
:
x
0
σ
2
ℋ
1
:
x∼μσ2
ℋ
1
:
x
μ
σ
2
where μμ and
σ2
σ
2
are known, positive quantities. Suppose we observe
a single measurement xx. The
likelihood ratio is
Λx=12πσ2ⅇ-x-μ22σ212πσ2ⅇ-x22σ2=ⅇ1σ2μx-μ22
Λ
x
1
2
σ
2
x
μ
2
2
σ
2
1
2
σ
2
x
2
2
σ
2
1
σ
2
μ
x
μ
2
2
(2)
and so the minimum probability of error decision rule is
ⅇ1σ2μx-μ22
≷
ℋ
0
ℋ
1
π
0
π
1
=η
1
σ
2
μ
x
μ
2
2
≷
ℋ
0
ℋ
1
π
0
π
1
η
The expression for
Λx
Λ
x
is somewhat complicated. By applying a sequence of
monotonically increasing functions to both sides, we can
obtain a simplified expression for the optimal decision rule
without changing the rule. In this example, we apply the
natural logarithm and rearrange terms to arrive at
x
≷
ℋ
0
ℋ
1
σ2μlnη+μ2≡γ
x
≷
ℋ
0
ℋ
1
σ
2
μ
η
μ
2
γ
Here we have used the assumption
μ>0
μ
0
. If
μ<0
μ
0
, then dividing by
μ
μ would reverse the inequalities.
This form of the decision rule is much
simpler: we just compare the observed value
xx to a threshold
γγ. Figure 1
depicts the two candidate densities and a possible value of
γγ. If each hypothesis is
a priori equally likely
(
π
0
=
π
1
=12
π
0
π
1
1
2
), then
γ=μ2
γ
μ
2
. Figure 1 illustrates the case where
π
0
>
π
1
π
0
π
1
(
γ>μ2
γ
μ
2
).
If we plot the two densities so that each is weighted by its
a priori probability of occuring, the two
curves will intersect at the threshold
γγ (see
Figure 2). (Can you explain why this is? Think back
to our derivation of the LRT). This plot also offers a way
to visualize the probability of error. Recall
P
e
=∫
R
0
π
1
f
1
xdx+∫
R
1
π
0
f
0
xdx=∫x<γ
π
1
f
1
xdx+∫x>γ
π
0
f
0
xdx=
π
1
P
M
+
π
0
P
F
P
e
x
R
0
π
1
f
1
x
x
R
1
π
0
f
0
x
x
x
γ
π
1
f
1
x
x
x
γ
π
0
f
0
x
π
1
P
M
π
0
P
F
(3)
where
P
M
P
M
and
P
F
P
F
denote the miss and false alarm probabilities,
respectively. These quantities are depicted in
Figure 2.
We can express
P
M
P
M
and
P
F
P
F
in terms of the
Q-function as
P
e
=
π
1
Qμ-γσ+
π
0
Qγσ
P
e
π
1
Q
μ
γ
σ
π
0
Q
γ
σ
When
π
0
=
π
1
=12
π
0
π
1
1
2
, we have
γ=μ2
γ
μ
2
, and the error probability is
P
e
=Qμ2σ
P
e
Q
μ
2
σ
Since
Qx
Q
x
is monotonically decreasing, this says that the
"difficulty" of the detection problem decreases with
decreasing
σσ and
increasing
μμ.
In the preceding example, computation of the
probability of error involved a one-dimensional integral. If we
had multiple observations, or vector-valued data, generalizing
this procedure would involve multi-dimensional integrals over
potentially complicated decision regions. Fortunately, in many
cases, we can avoid this problem through the use of sufficient statistics.
Suppose we have the same test as in the
previous example, but now we have
NN independent observations:
ℋ
0
:
x
n
∼0σ2
,
n
=
1
,
…
,
N
ℋ
0
:
x
n
0
σ
2
,
n
=
1
,
…
,
N
ℋ
1
:
x
n
∼μσ2
,
n
=
1
,
…
,
N
ℋ
1
:
x
n
μ
σ
2
,
n
=
1
,
…
,
N
where
μ>0
μ
0
and
σ2>0
σ
2
0
and both are known. The likelihood ratio is
Λx=∏n=1N12πσ2ⅇ-
x
n
-μ22σ2∏n=1N12πσ2ⅇ-
x
n
22σ2=ⅇ-12σ2∑n=1N
x
n
-μ2ⅇ-12σ2∑n=1N
x
n
2=ⅇ12σ2∑n=1N2
x
n
μ-μ2=ⅇ1σ2μ∑n=1N
x
n
-Nμ22
Λ
x
n
1
N
1
2
σ
2
x
n
μ
2
2
σ
2
n
1
N
1
2
σ
2
x
n
2
2
σ
2
-1
2
σ
2
n
1
N
x
n
μ
2
-1
2
σ
2
n
1
N
x
n
2
1
2
σ
2
n
1
N
2
x
n
μ
μ
2
1
σ
2
μ
n
1
N
x
n
N
μ
2
2
(4)
As in the
previous example,
we may apply the natural logarithm and rearrange terms to
obtain an equivalent form of the LRT:
t≡∑n=1N
x
n
≷
ℋ
0
ℋ
1
σ2μlnη+Nμ2≡γ
t
n
1
N
x
n
≷
ℋ
0
ℋ
1
σ
2
μ
η
N
μ
2
γ
The scalar quantity
tt is a
sufficient statistic for the mean. In order to evaluate the
probability of error without resorting to a multi-dimensional
integral, we can express
P
e
P
e
in terms of
tt as
P
e
=
π
1
Prt<γ|
ℋ
1
true
+
π
0
Prt>γ|
ℋ
0
true
P
e
π
1
ℋ
1
true
t
γ
π
0
ℋ
0
true
t
γ
Now
tt is a linear combination of
normal variates, so it is itself normal. In particular, we
have
t=Ax
t
A
x
, where
1…1
1
…
1
is an
NN-dimensional
row vector of 1's, and
xx is multivariate normal with
mean
00 or
μ=μ…μT
μ
μ
…
μ
, and covariance
σ2I
σ
2
I
. Thus we have
t
|
ℋ
0
∼A0Aσ2IAT=0Nσ2
t
|
ℋ
0
A
0
A
σ
2
I
A
0
N
σ
2
t
|
ℋ
1
∼AμAσ2IAT=NμNσ2
t
|
ℋ
1
A
μ
A
σ
2
I
A
N
μ
N
σ
2
Therefore, we may write
P
e
P
e
in terms of the
Q-function as
P
e
=
π
1
QNμ-γNσ+
π
0
QγNσ
P
e
π
1
Q
N
μ
γ
N
σ
π
0
Q
γ
N
σ
In the special case
π
0
=
π
1
=12
π
0
π
1
1
2
,
P
e
=QNμσ
P
e
Q
N
μ
σ
Since
QQ is monotonically
decreasing, this result provides mathematical support for
something that is intuitively obvious: The performance of our
decision rule improves with increasing
NN and
μμ, and decreasing
σσ.
In the context of signal processing, the
foregoing problem may be viewed as the problem of detecting a
constant (DC) signal in
additive white
Gaussian noise:
ℋ
0
:
x
n
=
w
n
,
n
=
1
,
…
,
N
ℋ
0
:
x
n
w
n
,
n
=
1
,
…
,
N
ℋ
1
:
x
n
=A+
w
n
,
n
=
1
,
…
,
N
ℋ
1
:
x
n
A
w
n
,
n
=
1
,
…
,
N
where
AA is a known, fixed
amplitude, and
w
n
∼0σ2
w
n
0
σ
2
. Here
AA corresponds
to the mean
μμ in the
example.
The next example explores the minimum
probability of error decision rule in a
discrete setting.
Suppose we have a friend who is trying to
transmit a bit (0 or 1) to us over a noisy channel. The
channel causes an error in the transmission (that is, the bit
is flipped) with probability pp,
where
0≤p<12
0
p
1
2
, and pp is known. In
order to increase the chance of a successful transmission,
our friend sends the same bit
NN times. Assume the
NN transmissions are
statistically independent. Under these assumptions, the bits
you receive are Bernoulli random variables:
x
n
∼Bernoulliθ
x
n
Bernoulli
θ
. We are faced with the following hypothesis test:
|
ℋ
0
ℋ
0
|
θ=p
θ
p
|
0 sent |
|
ℋ
1
ℋ
1
|
θ=1-p
θ
1
p
|
1 sent |
We decide to decode the received sequence
x=
x
1
…
x
N
T
x
x
1
…
x
N
by minimizing the probability of error. The likelihood ratio is
Λx=∏n=1N1-p
x
n
p1-
x
n
∏n=1Np
x
n
1-p1-
x
n
=1-pkpN-kpk1-pN-k=1-pp2k-N
Λ
x
n
1
N
1
p
x
n
p
1
x
n
n
1
N
p
x
n
1
p
1
x
n
1
p
k
p
N
k
p
k
1
p
N
k
1
p
p
2
k
N
(5)
where
k=∑n=1N
x
n
k
n
1
N
x
n
is the number of 1s received.
The LRT is
1-pp2k-N
≷
ℋ
0
ℋ
1
π
0
π
1
=η
1
p
p
2
k
N
≷
ℋ
0
ℋ
1
π
0
π
1
η
Taking the natural logarithm of both sides and rearranging,
we have
k
≷
ℋ
0
ℋ
1
N2+12lnηln1-pp=γ
k
≷
ℋ
0
ℋ
1
N
2
1
2
η
1
p
p
γ
In the case that both hypotheses are equally likely, the
minimum probability of error decision is the "majority-vote"
rule: Declare
ℋ
1
ℋ
1
if there are more 1s than 0s, declare
ℋ
0
ℋ
0
otherwise. In the event
k=γ
k
γ
, we may decide arbitrarily; the probability of
error is the same either way. Let's adopt the convention
that
ℋ
0
ℋ
0
is declared in this case.
To compute the probability of error of the
optimal rule, write
P
e
=
π
0
Pr
declare
ℋ
1
|
ℋ
0
true
+
π
1
Pr
declare
ℋ
0
|
ℋ
1
true
=
π
0
Prk>γ|
ℋ
0
true
+
π
1
Prk≤γ|
ℋ
1
true
P
e
π
0
ℋ
0
true
declare
ℋ
1
π
1
ℋ
1
true
declare
ℋ
0
π
0
ℋ
0