The design of a hypothesis test/detector often
involves constructing the solution to an optimization
problem. The optimality criteria used fall into two classes:
Bayesian and frequentist.
In the Bayesian setup, it is assumed that the
a priori probability of
each hypothesis occurring, $\pi_i$, is known. A cost $C_{ij}$
is assigned to each possible outcome:
$$C_{ij} = \text{cost of saying } H_i \text{ when } H_j \text{ is true}$$
The optimal test/detector is the one that minimizes the Bayes
risk, which is defined to be the expected cost of an
experiment:
$$\bar{C} = \sum_{i,j} C_{ij}\, \pi_j \Pr(\text{say } H_i \text{ when } H_j \text{ true})$$
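The definition of the Bayes risk can be sketched numerically. The snippet below is only an illustration: the function name `bayes_risk`, the argument layout, and the decision probabilities are hypothetical choices, not part of the original text. Each probability is taken to be conditional on the true hypothesis and is weighted by that hypothesis's prior.

```python
def bayes_risk(costs, priors, decision_probs):
    """Expected cost of an experiment.

    costs[i][j]          -- cost of saying H_i when H_j is true
    priors[j]            -- a priori probability of H_j
    decision_probs[i][j] -- Pr(say H_i | H_j true); each column sums to 1
    """
    n = len(priors)
    return sum(
        costs[i][j] * priors[j] * decision_probs[i][j]
        for i in range(n)
        for j in range(n)
    )


# Hypothetical numbers chosen only to exercise the formula.
costs = [[0.0, 1.0],
         [1.0, 0.0]]            # errors cost 1, correct decisions cost 0
priors = [0.25, 0.75]
decision_probs = [[0.9, 0.2],
                  [0.1, 0.8]]   # column j: behaviour of the test under H_j

risk = bayes_risk(costs, priors, decision_probs)
print(risk)  # approximately 0.175 = 0.75 * 0.2 + 0.25 * 0.1
```

With these 0/1 costs the Bayes risk reduces to the overall probability of error.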
In the event that we have a binary problem, and both
hypotheses are simple, the decision rule that
minimizes the Bayes risk can be constructed explicitly. Let us
assume that the data is continuous (i.e.,
it has a density) under each hypothesis:
$$H_0 : x \sim f_0(x)$$
$$H_1 : x \sim f_1(x)$$
Let $R_0$ and $R_1$ denote the decision regions corresponding to the optimal test. Clearly,
the optimal test is specified once we specify $R_0$ and $R_1 = R_0'$
(the complement of $R_0$).
The Bayes risk may be written
$$\bar{C} = \sum_{i,j=0}^{1} C_{ij}\, \pi_j \int_{R_i} f_j(x)\, dx
= \int_{R_0} \left( C_{00} \pi_0 f_0(x) + C_{01} \pi_1 f_1(x) \right) dx
+ \int_{R_1} \left( C_{10} \pi_0 f_0(x) + C_{11} \pi_1 f_1(x) \right) dx \tag{1}$$
Recall that $R_0$ and $R_1$ partition the input space: they are
disjoint and their union is the full input space. Thus, every
possible input $x$
belongs to precisely one of these regions. In order to minimize
the Bayes risk, a measurement $x$ should belong to the decision
region $R_i$
for which the corresponding integrand in the preceding equation
is smaller. Therefore, the Bayes risk is minimized by assigning
$x$ to $R_0$ whenever
$$\pi_0 C_{00} f_0(x) + \pi_1 C_{01} f_1(x) < \pi_0 C_{10} f_0(x) + \pi_1 C_{11} f_1(x)$$
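and assigning $x$ to $R_1$
whenever this inequality is reversed. Collecting the $f_0$ and $f_1$ terms,
and assuming that each error costs more than the corresponding correct decision
($C_{10} > C_{00}$ and $C_{01} > C_{11}$), the resulting rule may be
expressed concisely as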
$$\Lambda(x) \equiv \frac{f_1(x)}{f_0(x)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\pi_0 (C_{10} - C_{00})}{\pi_1 (C_{01} - C_{11})} \equiv \eta$$
Here, $\Lambda(x)$ is called the
likelihood ratio,
$\eta$ (the expression on the right) is called the threshold, and
the overall decision rule is called the
Likelihood Ratio Test
(LRT).
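As a rough illustration of the rule, the sketch below computes the threshold from priors and costs and applies the test to a single observation. The helper names (`lrt_threshold`, `lrt_decide`, `gaussian_pdf`) and the two unit-variance Gaussian densities are assumptions made for this example only; the text does not specify any particular densities. The comparison is written as $f_1(x) > \eta f_0(x)$ to avoid dividing by a density that may be zero, and a tie is arbitrarily sent to $H_0$.

```python
import math


def lrt_threshold(priors, costs):
    """eta = pi_0 (C_10 - C_00) / (pi_1 (C_01 - C_11)).

    costs[i][j] is the cost of saying H_i when H_j is true.
    Assumes C_01 > C_11 so the denominator is positive.
    """
    pi0, pi1 = priors
    return pi0 * (costs[1][0] - costs[0][0]) / (pi1 * (costs[0][1] - costs[1][1]))


def gaussian_pdf(x, mean, std=1.0):
    """Density of a normal distribution (hypothetical model for this sketch)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def lrt_decide(x, f0, f1, eta):
    """Return 1 (say H_1) if Lambda(x) > eta, else 0 (say H_0)."""
    return 1 if f1(x) > eta * f0(x) else 0


# Assumed setup: H_0: x ~ N(0, 1), H_1: x ~ N(1, 1), equal priors, 0/1 costs.
f0 = lambda x: gaussian_pdf(x, 0.0)
f1 = lambda x: gaussian_pdf(x, 1.0)
eta = lrt_threshold((0.5, 0.5), [[0.0, 1.0], [1.0, 0.0]])

print(eta, lrt_decide(0.0, f0, f1, eta), lrt_decide(1.0, f0, f1, eta))
```

In this assumed setup the likelihood ratio is $e^{x - 1/2}$, so with $\eta = 1$ the test reduces to comparing $x$ with $1/2$.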
An instructor in a course in detection theory wants to
determine if a particular student studied for his last test.
The observed quantity is the student's grade, which we
denote by $r$. A failing grade does not by itself show that the
student did not study:
conscientious students may also fail the test. Define the models
as
- $\mathcal{M}_0$: did not study
- $\mathcal{M}_1$: did study
The conditional densities of the grade are shown in
Figure 1.
Based on knowledge of student behavior, the instructor
assigns
a priori probabilities of
$\pi_0 = 1/4$ and $\pi_1 = 3/4$. The costs $C_{ij}$
are chosen to reflect the instructor's sensitivity
to student feelings:
$C_{01} = 1 = C_{10}$
(an erroneous decision either way is given the
same cost) and
$C_{00} = 0 = C_{11}$. The likelihood ratio is plotted in
Figure 1 and the threshold value
$\eta$, which is computed from the
a priori probabilities and the costs to be
$1/3$, is indicated. The comparison of the likelihood ratio with this
threshold simplifies in an obvious way:
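$$\frac{r}{50} \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \frac{1}{3}$$
or
$$r \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \frac{50}{3} \approx 16.7$$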
The multiplication by the factor of 50 is a simple
illustration of the reduction of the likelihood ratio to a
sufficient statistic. Based on the assigned costs and
a priori probabilities, the optimum
decision rule says the instructor must assume that the
student did not study if the student's grade is less than
16.7; if greater, the student is assumed to have studied
despite receiving an abysmally low grade such as 20. Note
that because the densities given by each model overlap entirely,
the possibility of making the wrong interpretation
always haunts the instructor. However,
no other procedure will be better!
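The arithmetic of this example can be checked with exact fractions. The sketch below takes the likelihood ratio $\Lambda(r) = r/50$ from the comparison stated above; the variable names and the `decide` helper are illustrative choices, not part of the original example.

```python
from fractions import Fraction

# A priori probabilities and costs from the example.
pi0, pi1 = Fraction(1, 4), Fraction(3, 4)
C00, C01, C10, C11 = 0, 1, 1, 0

# Threshold eta = pi_0 (C_10 - C_00) / (pi_1 (C_01 - C_11)).
eta = pi0 * (C10 - C00) / (pi1 * (C01 - C11))

# Lambda(r) = r / 50, so multiplying both sides by 50 gives a threshold on r.
grade_threshold = 50 * eta


def decide(r):
    """Return 1 (M_1: did study) if r / 50 > eta, else 0 (M_0: did not study)."""
    return 1 if Fraction(r) / 50 > eta else 0


print(eta)                               # 1/3
print(round(float(grade_threshold), 1))  # 16.7
print(decide(20), decide(10))            # 1 0
```

A grade of 20 lies above the threshold of about 16.7, so the rule declares that the student studied, while a grade of 10 leads to the opposite decision.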
A special case of the minimum Bayes risk rule, the minimum probability of error rule, is
used extensively in practice, and is discussed at length in
another module.
Denote
$\alpha = \Pr(\text{declare } H_1 \text{ when } H_0 \text{ true})$
and
$\beta = \Pr(\text{declare } H_1 \text{ when } H_1 \text{ true})$.
Express the Bayes risk $\bar{C}$
in terms of $\alpha$ and $\beta$, $C_{ij}$, and $\pi_i$.
Argue that the optimal decision rule is not
altered by setting
$C_{00} = C_{11} = 0$.
Suppose we observe $x$ such that
$\Lambda(x) = \eta$.
Argue that it doesn't matter whether we assign
$x$ to $R_0$ or $R_1$.