"Robust" is a technical word that implies insensitivity to
modeling assumptions. As we have seen, some algorithms are
robust while others are not. The intent of robust signal
processing is to derive algorithms that are
explicitly insensitive to the underlying
signal and/or noise models. The way in which modeling
uncertainties are described is typified by the approach we shall
use in the following discussion of robust model evaluation.
We assume that two nominal models of the
generation of the statistically independent observations are
known; the "actual" conditional probability density that
describes the data under the assumptions of each model is not
known exactly, but is "close" to the nominal. Letting
$p(\cdot)$ be the actual probability density for each observation
and $p^o(\cdot)$ the nominal, we say that (Huber; 1981)
$$p(x) = (1-\epsilon)\,p^o(x) + \epsilon\,p^d(x)$$
where $p^d$ is the unknown disturbance density and
$\epsilon$ is the uncertainty variable
($0 \le \epsilon < 1$). The uncertainty variable specifies how accurate the
nominal model is thought to be: the smaller
$\epsilon$, the smaller the
contribution of the disturbance. It is assumed that some value
for $\epsilon$ can be rationally
assigned. The disturbance density is entirely unknown and is
assumed to be any valid probability density
function. The expression given above is normalized so that
$p(\cdot)$ has unit area. An example of
densities described this way is shown in Figure 1.
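The contamination model can be sketched numerically. The following is a minimal illustration, not part of the original development: the helper names (`gaussian_pdf`, `contaminated_pdf`, `trapezoid`) and the particular nominal and disturbance densities are assumptions chosen only to show that the mixture remains a unit-area density for any valid disturbance.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Gaussian density with the given mean and standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def contaminated_pdf(x, eps, nominal_pdf, disturbance_pdf):
    """p(x) = (1 - eps) * p_o(x) + eps * p_d(x), with 0 <= eps < 1."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must satisfy 0 <= eps < 1")
    return (1.0 - eps) * nominal_pdf(x) + eps * disturbance_pdf(x)

def trapezoid(f, lo, hi, n=20000):
    """Composite trapezoidal rule on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h

# Hypothetical densities, for illustration only: a unit-variance nominal
# and a broad, shifted disturbance.
def nominal(x):
    return gaussian_pdf(x, 0.0, 1.0)

def disturbance(x):
    return gaussian_pdf(x, 4.0, 3.0)

def actual(x):
    return contaminated_pdf(x, 0.1, nominal, disturbance)

area = trapezoid(actual, -40.0, 40.0)
print(f"area under the contaminated density: {area:.6f}")
```

Setting `eps` to zero recovers the nominal density exactly; any other admissible value mixes in the disturbance without changing the total area.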
The robust model evaluation problem is formally stated as
$$\mathcal{M}_0:\; p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r}) = \prod_{l=0}^{L-1}\left[(1-\epsilon)\,p^o_{r_l|\mathcal{M}_0}(r_l) + \epsilon\,p^d_{r_l|\mathcal{M}_0}(r_l)\right]$$
$$\mathcal{M}_1:\; p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r}) = \prod_{l=0}^{L-1}\left[(1-\epsilon)\,p^o_{r_l|\mathcal{M}_1}(r_l) + \epsilon\,p^d_{r_l|\mathcal{M}_1}(r_l)\right]$$
The nominal densities under each model correspond to the
conditional densities that we have been using until now. The
disturbance densities are intended to model imprecision of both
descriptions; hence, they are assumed to be different in the
context of each model. Note that the measure of imprecision
$\epsilon$ is assumed to be the same
under either model.
To solve this problem, we take what is known as a
minimax approach: find the
worst-case combinations of a priori densities
(max), then minimize the consequences of this situation (mini)
according to some criterion. In this way, bad situations are
handled as well as can be expected while the more tolerable ones
are (hopefully) processed well also. The "mini" phase of the
minimax solution corresponds to the likelihood ratio for many
criteria. Thus, the "max" phase amounts to finding the
worst-case probability distributions for the likelihood ratio
test as described in the previous section: find the disturbance
densities that can result in a constant value for the ratio over
large domains of functions. When the two nominal distributions
scaled by $1-\epsilon$
can be brought together so that they are equal for
some disturbance, then the likelihood ratio will be constant in
that domain. Of most interest here is the case where the models
differ only in the value of the mean, as shown in Figure 2. "Bringing the distributions together" means,
in this case, scaling the distribution for
$\mathcal{M}_0$ by $1-\epsilon$
while adding the constant
$\epsilon$ to the scaled distribution
for $\mathcal{M}_1$. One can show in general that if the ratio of the
nominal densities is monotonic, this procedure finds the
worst-case distribution (Huber;
1965). The distributions overlap for small and for large
values of the data with no overlap in a central region. As we
shall see, the size of this central region depends greatly on
the choice of $\epsilon$. The
tails of the worst-case distributions under
each model are equal; conceptually, we consider that the
worst-case densities have exponential tails in the model
evaluation problem.
Letting $p^{\omega}$
denote the worst-case density, our minimax procedure
results in the following densities for each model in the
likelihood ratio test.
$$p^{\omega}_{r_l|\mathcal{M}_i}(r_l) = \begin{cases}
p^o_{r_l|\mathcal{M}_0}(r_l')\,C_i'\,e^{-K'|r_l - r_l'|} & \text{if } r_l < r_l' \\
p^o_{r_l|\mathcal{M}_i}(r_l) & \text{if } r_l' < r_l < r_l'' \\
p^o_{r_l|\mathcal{M}_0}(r_l'')\,C_i''\,e^{-K''|r_l - r_l''|} & \text{if } r_l > r_l''
\end{cases}$$
The constants $K'$ and $K''$
determine the rate of decay of the exponential tails
of these worst-case distributions. Their specific values are
left undetermined here because they are not needed to
compute the likelihood ratio. The constants
$C_i'$ and $C_i''$
are required so that a unit-area density results. The likelihood
ratio for each observation in the robust model evaluation
problem becomes
$$\Lambda(r_l) = \begin{cases}
\dfrac{C_1'}{C_0'} & \text{if } r_l < r_l' \\[1ex]
\dfrac{p^o_{r_l|\mathcal{M}_1}(r_l)}{p^o_{r_l|\mathcal{M}_0}(r_l)} & \text{if } r_l' < r_l < r_l'' \\[1ex]
\dfrac{C_1''}{C_0''} & \text{if } r_l'' < r_l
\end{cases} \tag{1}$$
The evaluation of the likelihood ratio depends entirely on
determining values for $r_l'$ and $r_l''$. The ratios
$\frac{C_1'}{C_0'} = c'$ and $\frac{C_1''}{C_0''} = c''$
are easily found; in the tails, the value of the
likelihood ratio equals that at the edges of the central region
for continuous densities.
$$c' = \frac{p^o_{r_l|\mathcal{M}_1}(r_l')}{p^o_{r_l|\mathcal{M}_0}(r_l')}$$
$$c'' = \frac{p^o_{r_l|\mathcal{M}_1}(r_l'')}{p^o_{r_l|\mathcal{M}_0}(r_l'')}$$
At the left boundary, for example, the distribution functions
$P_{r_l|\mathcal{M}_i}(\cdot)$ of the nominal densities must satisfy
$(1-\epsilon)\,P_{r_l|\mathcal{M}_0}(r_l') = (1-\epsilon)\,P_{r_l|\mathcal{M}_1}(r_l') + \epsilon$.
In terms of the nominal densities, we have
$$\int_{-\infty}^{r_l'} \left[p^o_{r_l|\mathcal{M}_0}(x) - p^o_{r_l|\mathcal{M}_1}(x)\right] dx = \frac{\epsilon}{1-\epsilon}$$
This equation also applies at the right edge $r_l''$.
Thus, for a given value of
$\epsilon$, the integral of the
difference between the nominal densities should equal the ratio
$\frac{\epsilon}{1-\epsilon}$
for two values of its upper limit. Figure 3 illustrates
this effect for a Gaussian example. The bi-valued nature of this
integral may not be valid for some values of
$\epsilon$; the value chosen for
$\epsilon$ can be too large, making it
impossible to distinguish the models! This unfortunate
circumstance means that the uncertainties, as described by the
value of $\epsilon$, swamp the
characteristics that distinguish the models. Thus, the models
must be made more precise (more must be known about the data) so
that smaller deviations from the nominal models can describe the
observations.
Returning to the likelihood ratio, the "robust" decision rule
consists of computing a clipped function of
each observed value, multiplying them together, and comparing
the product computed over the observations with a threshold
value. We assume that the nominal distributions of each of the
$L$ observations are equal; the
values of the boundaries $r_l'$ and $r_l''$
then do not depend on the observation index
$l$ in this case. More simply,
evaluating the logarithm of the quantities involved results in
the decision rule
$$\sum_{l=0}^{L-1} f(r_l) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \gamma$$
where the function $f(\cdot)$
is the clipping function given by
$$f(r_l) = \begin{cases}
\ln c' & \text{if } r_l < r' \\[1ex]
\ln \dfrac{p^o_{r_l|\mathcal{M}_1}(r_l)}{p^o_{r_l|\mathcal{M}_0}(r_l)} & \text{if } r' < r_l < r'' \\[1ex]
\ln c'' & \text{if } r'' < r_l
\end{cases}$$
If the observations were not identically distributed, then the
clipping function would depend on the observation index.
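The decision rule can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than a definitive implementation: the function names are hypothetical, the nominal log-likelihood ratio is assumed to be continuous so that $\ln c'$ and $\ln c''$ equal its values at the boundaries, and the boundaries in the usage lines are placeholder numbers, not solutions of the boundary equation.

```python
def clipped_llr(r, log_lr, r_lo, r_hi):
    """Clipping function f(r): the nominal log-likelihood ratio inside
    (r_lo, r_hi), held at its boundary values ln c' and ln c'' outside."""
    if r <= r_lo:
        return log_lr(r_lo)   # ln c'
    if r >= r_hi:
        return log_lr(r_hi)   # ln c''
    return log_lr(r)

def robust_decision(observations, log_lr, r_lo, r_hi, gamma):
    """Sum the clipped values and compare with the threshold gamma.
    Returns (1, statistic) for model 1 and (0, statistic) for model 0."""
    statistic = sum(clipped_llr(r, log_lr, r_lo, r_hi) for r in observations)
    return (1 if statistic > gamma else 0), statistic

# Hypothetical example: unit-variance Gaussian nominals with means 0 and 1,
# for which the log-likelihood ratio is r - 1/2. Boundaries are placeholders.
def log_lr(r):
    return r - 0.5

decision, statistic = robust_decision([0.5, 100.0, -50.0], log_lr, -1.0, 2.0, 0.0)
print(decision, statistic)
```

The two gross outliers in the usage example contribute only the bounded values $\ln c''$ and $\ln c'$ to the sum, which is the point of clipping: no single observation can dominate the statistic.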
Determining the threshold $\gamma$
that meets a specific performance criterion is difficult in the
context of robust model evaluation. By the very nature of the
problem formulation, some degree of uncertainty in the
a priori densities exists. A specific
false-alarm probability can be guaranteed by using the
worst-case distribution under
$\mathcal{M}_0$. This density has the disturbance term being an impulse at infinity. Thus, the expected value
$m_c$ of a clipped observation $f(r_l)$
with respect to the worst-case density is
$$m_c = (1-\epsilon)\,E[f(r_l)] + \epsilon \ln c''$$
where the expected value in this expression is
evaluated with respect to the nominal density under
$\mathcal{M}_0$. Similarly, an expression for the variance
$\sigma_c^2$
of the clipped observation can be derived. As the
decision rule computes the sum of the clipped, statistically
independent observations, the Central Limit Theorem can be
applied to the sum, with the result that the worst-case
false-alarm probability will approximately equal
$Q\left(\frac{\gamma - L m_c}{\sqrt{L}\,\sigma_c}\right)$. The threshold $\gamma$
can then be found which will guarantee a specified performance
level. Usually, the worst-case situation does not occur and the
threshold set by this method is conservative. We can assess the
degree of conservatism by evaluating these quantities under the
nominal density rather than the worst-case density.
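One way the variance and the threshold might be written out is sketched here. It assumes the same worst-case density used for $m_c$ (the nominal density under $\mathcal{M}_0$ scaled by $1-\epsilon$, plus an impulse of area $\epsilon$ located where $f = \ln c''$). The symbols $P_F$ (the specified false-alarm probability) and $Q^{-1}$ (the inverse of $Q(\cdot)$) are introduced for this illustration; expectations are taken with respect to the nominal density under $\mathcal{M}_0$.

```latex
\begin{align*}
\sigma_c^2 &= (1-\epsilon)\,E\!\left[f^2(r_l)\right] + \epsilon\,(\ln c'')^2 - m_c^2 \\
\gamma &\approx L\,m_c + \sqrt{L}\,\sigma_c\,Q^{-1}(P_F)
\end{align*}
```

The second line simply inverts the Central Limit Theorem approximation for the worst-case false-alarm probability, so it inherits that approximation's accuracy.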
Let's consider the Gaussian model evaluation problem we have
been using so extensively. The individual observations are
statistically independent and identically distributed with
variance five:
$\sigma^2 = 5$. For model $\mathcal{M}_0$, the mean is zero; for
$\mathcal{M}_1$, the mean is one. These nominal densities describe
our best models for the observations, but we seek to allow
slight deviations (10%) from them. The equation to be solved
for the boundaries is the implicit equation
$$Q\left(\frac{z-m}{\sigma}\right) - Q\left(\frac{z}{\sigma}\right) = \frac{\epsilon}{1-\epsilon}$$
The quantity on the left side of the equation is shown in
Figure 3. If the uncertainty in the Gaussian
model, as expressed by the parameter
$\epsilon$, is larger than 0.15 (for
the example values of $m$ and
$\sigma$), no solution
exists. Assuming that $\epsilon$
equals 0.1, the quantity
$\frac{\epsilon}{1-\epsilon} = 0.11$
and the clipping thresholds are
$r' = -1.675$ and $r'' = 2.675$.
Between these values, the clipping function is
given by the logarithm of the likelihood ratio, which is given
by $\frac{2 m r_l - m^2}{2\sigma^2}$.
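The numbers quoted in this example can be checked with a short root-finding sketch. The helper names are hypothetical, $Q(\cdot)$ is computed from the standard-library `math.erfc`, and the code assumes $m > 0$, so that the left side of the boundary equation peaks at $z = m/2$ and has one root on each side of that point.

```python
import math

def Q(x):
    """Gaussian tail probability: P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def boundary_gap(z, m, sigma):
    """Left side of the boundary equation: Q((z - m)/sigma) - Q(z/sigma)."""
    return Q((z - m) / sigma) - Q(z / sigma)

def bisect(g, lo, hi, tol=1e-10):
    """Root of g on [lo, hi], assuming g(lo) and g(hi) differ in sign."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0.0) == (g_lo > 0.0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def clipping_thresholds(m, sigma, eps):
    """Solve the boundary equation for (r', r''), assuming m > 0."""
    target = eps / (1.0 - eps)
    center = 0.5 * m                      # location of the peak
    if boundary_gap(center, m, sigma) <= target:
        raise ValueError("eps is too large: the boundary equation has no solution")
    g = lambda z: boundary_gap(z, m, sigma) - target
    reach = 20.0 * sigma                  # far enough that the gap is negligible
    return bisect(g, center - reach, center), bisect(g, center, center + reach)

def largest_eps(m, sigma):
    """Supremum of eps for which a solution exists: peak / (1 + peak)."""
    peak = boundary_gap(0.5 * m, m, sigma)
    return peak / (1.0 + peak)

m, sigma, eps = 1.0, math.sqrt(5.0), 0.1
r_lo, r_hi = clipping_thresholds(m, sigma, eps)
eps_max = largest_eps(m, sigma)
print(f"r' = {r_lo:.3f}, r'' = {r_hi:.3f}, largest eps = {eps_max:.3f}")
```

Bisection is used here only because it needs nothing beyond the sign change on each side of the peak; any bracketing root finder would serve equally well.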
We can decompose the clipping operation into a cascade of two
operations: a linear scaling and shifting (as described by the
previous expression) followed by a clipper having unit slope
(see Figure 4).
Let $\tilde{r}_l$
denote the result of the scaling and shifting
operation. This quantity has mean
$\frac{m^2}{2\sigma^2}$
and variance
$\frac{m^2}{\sigma^2}$
under $\mathcal{M}_1$
and the opposite signed mean and the same variance
under $\mathcal{M}_0$. The threshold values of the unit-clipping function
are thus given by the solution of the equation
$$Q\left(\frac{\tilde{z} - \frac{m^2}{2\sigma^2}}{m/\sigma}\right) - Q\left(\frac{\tilde{z} + \frac{m^2}{2\sigma^2}}{m/\sigma}\right) = \frac{\epsilon}{1-\epsilon}$$
By substituting
$-\tilde{z}$ for $\tilde{z}$
in this equation, we find that the two solutions are
negatives of each other. We have now placed the unit-clipper's
threshold values symmetrically about the origin; however, they
do depend on the value of the mean
$m$. In this example, the
threshold is numerically given by
$\tilde{z} = 0.435$. The expected value of the result of the clipping
function with respect to the worst-case density is given by the
complicated expression
$$E[f(r_l)] = (1-\epsilon)\left[r'\,Q\left(-\frac{r'}{\sigma}\right) + r''\,Q\left(\frac{r''}{\sigma}\right) + \sqrt{\frac{\sigma^2}{2\pi}}\left(e^{-\frac{r'^2}{2\sigma^2}} - e^{-\frac{r''^2}{2\sigma^2}}\right)\right] + \epsilon\,r''$$
The variance is found in a similar fashion and can
be used to find the threshold
$\gamma$ on the sum of clipped
observation values.
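As a numerical companion to this example, the sketch below estimates the worst-case mean and variance of the clipped log-likelihood ratio by direct integration and then sets a threshold from the Central Limit Theorem approximation. Several things here are assumptions made for illustration: the function and constant names, the choice of $L = 20$ observations and a false-alarm probability of 0.01, and the variance formula, which treats the worst-case density as the nominal density under $\mathcal{M}_0$ scaled by $1-\epsilon$ plus an impulse of area $\epsilon$ where $f = \ln c''$.

```python
import math
from statistics import NormalDist

M, VAR, EPS = 1.0, 5.0, 0.1        # mean under model 1, variance, uncertainty
R_LO, R_HI = -1.675, 2.675         # clipping thresholds quoted in the text
SIGMA = math.sqrt(VAR)

def log_lr(r):
    """Nominal log-likelihood ratio (2 m r - m^2) / (2 sigma^2)."""
    return (2.0 * M * r - M * M) / (2.0 * VAR)

def f(r):
    """Clipped log-likelihood ratio; log_lr is increasing, so clipping the
    argument to [R_LO, R_HI] clips the value to [ln c', ln c'']."""
    return log_lr(min(max(r, R_LO), R_HI))

def nominal_pdf_m0(r):
    """Nominal density under model 0: zero mean, variance VAR."""
    return math.exp(-r * r / (2.0 * VAR)) / math.sqrt(2.0 * math.pi * VAR)

def nominal_moments(n=40000, reach=12.0):
    """E[f] and E[f^2] under the nominal model-0 density (trapezoidal rule)."""
    lo, hi = -reach * SIGMA, reach * SIGMA
    h = (hi - lo) / n
    e1 = e2 = 0.0
    for k in range(n + 1):
        r = lo + k * h
        w = 0.5 if k in (0, n) else 1.0
        fr, pr = f(r), nominal_pdf_m0(r)
        e1 += w * fr * pr
        e2 += w * fr * fr * pr
    return e1 * h, e2 * h

ln_c_hi = log_lr(R_HI)             # equals the unit-clipper threshold
e1, e2 = nominal_moments()
m_c = (1.0 - EPS) * e1 + EPS * ln_c_hi
var_c = (1.0 - EPS) * e2 + EPS * ln_c_hi ** 2 - m_c ** 2   # assumed form

def clt_threshold(L, p_fa):
    """gamma such that Q((gamma - L m_c) / (sqrt(L) sigma_c)) = p_fa."""
    return L * m_c + math.sqrt(L * var_c) * NormalDist().inv_cdf(1.0 - p_fa)

gamma = clt_threshold(20, 0.01)
print(f"ln c'' = {ln_c_hi:.3f}, m_c = {m_c:.4f}, var_c = {var_c:.4f}, gamma = {gamma:.3f}")
```

Evaluating `log_lr` at the two clipping thresholds also reproduces the symmetric unit-clipper thresholds of magnitude 0.435 quoted above.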
- P.J. Huber. (1965). A robust version of the probability ratio test. Ann. Math. Stat., 36, 1753-1758.
- P.J. Huber. (1981). Robust Statistics. New York: John Wiley and Sons.