Frequently, more than two viable models for data generation can
be defined for a given situation. The
classification problem is to determine which of
several models best "fits" a set of measurements. For example,
determining the type of airplane from its radar returns forms a
classification problem. The model evaluation framework has the
right structure if we can allow more than two models. We happily
note that in deriving the likelihood ratio test we did not need
to assume that only two possible descriptions exist. Go back
and examine the expression for the maximum probability correct decision
rule. If $K$ models seem
appropriate for a specific problem, the decision rule maximizing
the probability of making a correct choice is
$$\max_{i \in \{1, \dots, K\}} \; \pi_i \, p_{\mathbf{r}|\mathcal{M}_i}(\mathbf{r})$$
To determine the largest of $K$ quantities, exactly $K-1$
numeric comparisons need be made. When we have two
possible models ($K=2$), this decision rule reduces to the
computation of the likelihood ratio and its comparison to a
threshold. In general, $K-1$ likelihood ratios need to be
computed and compared to a threshold. Thus the likelihood ratio
test can be viewed as a specific method for determining the
largest of the decision statistics
$\pi_i \, p_{\mathbf{r}|\mathcal{M}_i}(\mathbf{r})$.
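
As a minimal sketch of the rule above, the following Python function picks the model whose decision statistic $\pi_i \, p_{\mathbf{r}|\mathcal{M}_i}(\mathbf{r})$ is largest and counts the comparisons it makes. The function name `max_probability_correct` and its arguments are hypothetical, the likelihood values are assumed to have been evaluated at the observed $\mathbf{r}$ already, and the code indexes models from 0 rather than 1.

```python
def max_probability_correct(priors, likelihoods):
    """Return (index of the largest priors[i] * likelihoods[i], comparisons used).

    priors[i]      : a priori probability pi_i of model i (assumed given)
    likelihoods[i] : value of p(r | M_i) at the observed r (assumed given)
    """
    if not priors or len(priors) != len(likelihoods):
        raise ValueError("need one prior and one likelihood per model")

    best_index = 0
    best_value = priors[0] * likelihoods[0]
    comparisons = 0
    # Scan the remaining K-1 decision statistics, one comparison each.
    for i in range(1, len(priors)):
        value = priors[i] * likelihoods[i]
        comparisons += 1
        if value > best_value:
            best_index, best_value = i, value
    return best_index, comparisons
```

For three models the scan makes two comparisons, in line with the $K-1$ count stated above. Ties are resolved in favor of the lower index, which is an arbitrary choice of this sketch.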
Since we need only the relative ordering of the
$K$ decision statistics to make a decision, we can apply
any transformation $T(\cdot)$
to them that does not affect ordering. In general,
admissible transformations must be monotonically increasing to
satisfy this condition. For example, the needless common additive
components in the decision statistics can be eliminated, even if
they depend on the observations. Mathematically, "common" means
that the quantity does not depend on the model index
$i$. The transformation in this case would be of the form
$T(z_i) = z_i - a$, clearly a monotonic transformation. A
positive multiplicative factor can also be
"canceled"; if negative, the ordering would be reversed and
that cannot be allowed. The simplest resulting expression
becomes the sufficient statistic
$\Upsilon_i(\mathbf{r})$
for the model. Expressed in terms of the sufficient
statistic, the maximum probability correct or the Bayesian
decision rule becomes
$$\max_{i \in \{1, \dots, K\}} \; \left[ C_i + \Upsilon_i(\mathbf{r}) \right]$$
where $C_i$
summarizes all additive terms that do not depend on
the observation vector $\mathbf{r}$. The quantity
$\Upsilon_i(\mathbf{r})$
is termed the sufficient statistic associated
with model $i$. In many cases, the functional form of the
sufficient statistic varies little from one model to another and
expresses the necessary operations that summarize the
observations. The constants $C_i$
are usually lumped together to yield the threshold
against which we compare the sufficient statistic. For example,
in the binary model situation, the decision rule becomes
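$$\Upsilon_1(\mathbf{r}) + C_1 \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \Upsilon_0(\mathbf{r}) + C_0$$
or
$$\Upsilon_1(\mathbf{r}) - \Upsilon_0(\mathbf{r}) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} C_0 - C_1$$
Thus, the sufficient statistic for the decision rule
is $\Upsilon_1(\mathbf{r}) - \Upsilon_0(\mathbf{r})$
and the threshold $\gamma$ is $C_0 - C_1$.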
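
The equivalence between the two forms of the binary rule can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the statistics and constants are passed in as plain numbers, and ties are assigned to model 0 by assumption.

```python
def threshold_rule(upsilon0, upsilon1, c0, c1):
    """Decide model 1 when Upsilon_1 - Upsilon_0 exceeds gamma = C_0 - C_1."""
    gamma = c0 - c1
    return 1 if (upsilon1 - upsilon0) > gamma else 0


def largest_statistic_rule(upsilon0, upsilon1, c0, c1):
    """Decide the model with the larger C_i + Upsilon_i (ties go to model 0)."""
    return 1 if (c1 + upsilon1) > (c0 + upsilon0) else 0
```

Both functions return the same decision for any inputs, because subtracting $\Upsilon_0(\mathbf{r}) + C_1$ from each side of the first inequality leaves the ordering unchanged.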
In the Gaussian problem just discussed, the logarithm of the
likelihood function is
$$\ln p_{\mathbf{r}|\mathcal{M}_i}(\mathbf{r}) = -\frac{L}{2} \ln 2\pi\sigma^2 - \frac{1}{2\sigma^2} \sum_{l=0}^{L-1} \left( r_l - m^{(i)} \right)^2$$
where $m^{(i)}$
is the mean under model
$i$. After appropriate simplification that retains the
ordering, we have
$$\Upsilon_i(\mathbf{r}) = \frac{m^{(i)}}{\sigma^2} \sum_{l=0}^{L-1} r_l$$
$$C_i = -\frac{1}{2} \frac{L \left( m^{(i)} \right)^2}{\sigma^2} + c_i$$
The term $c_i$
is a constant defined by the error criterion; for
the maximum probability correct criterion, this constant is
$\ln \pi_i$.
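
To illustrate the simplification, the following sketch compares two ways of making the decision in the Gaussian case: maximizing $\ln \pi_i$ plus the full log-likelihood, and maximizing $C_i + \Upsilon_i(\mathbf{r})$. Expanding the square shows that the two scores differ only by terms that do not depend on $i$ (the $\ln 2\pi\sigma^2$ term and $\sum_l r_l^2 / 2\sigma^2$), so the selected model is the same. The function names are hypothetical, the maximum probability correct criterion ($c_i = \ln \pi_i$) is assumed, and models are indexed from 0 in the code.

```python
import math


def log_likelihood(r, mean, sigma2):
    """ln p(r | M_i) for L independent Gaussian samples with common variance."""
    L = len(r)
    squared_error = sum((x - mean) ** 2 for x in r)
    return -0.5 * L * math.log(2 * math.pi * sigma2) - squared_error / (2 * sigma2)


def decide_direct(r, means, sigma2, priors):
    """Maximize ln(pi_i) + ln p(r | M_i) over the model index."""
    scores = [math.log(p) + log_likelihood(r, m, sigma2)
              for m, p in zip(means, priors)]
    return max(range(len(scores)), key=scores.__getitem__)


def decide_sufficient(r, means, sigma2, priors):
    """Maximize C_i + Upsilon_i(r), using only the sum of the observations."""
    L = len(r)
    total = sum(r)  # the only function of the data that the rule needs
    scores = []
    for m, p in zip(means, priors):
        upsilon = m / sigma2 * total
        C = -0.5 * L * m ** 2 / sigma2 + math.log(p)  # c_i = ln(pi_i) assumed
        scores.append(C + upsilon)
    return max(range(len(scores)), key=scores.__getitem__)
```

With observations `[0.9, 1.2, 1.1, 0.8]`, candidate means `[0.0, 1.0, 2.0]`, unit variance, and equal priors, both functions select the model with mean 1.0 (index 1).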
When employing the Neyman-Pearson test, we need to specify the
various error probabilities
$\Pr[\text{say } \mathcal{M}_i \mid \mathcal{M}_j \text{ true}]$.
These specifications amount to determining the
constants $c_i$
when the sufficient statistic is used. Since
$K-1$ comparisons will be used to home in on the optimal
decision, only $K-1$
error probabilities need be specified. Typically, the
quantities
$\Pr[\text{say } \mathcal{M}_i \mid \mathcal{M}_0 \text{ true}]$,
$i \in \{1, \dots, K-1\}$,
are used, particularly when the model $\mathcal{M}_0$
represents the situation when no signal is present
(see this
problem).