In previous chapters, we assumed that we knew the mathematical form of the probability distribution of the observations under each model; some of these distributions' parameters were not known, and we developed decision rules to deal with this uncertainty. A more difficult problem occurs when the mathematical form is not known precisely. For example, the data may be approximately Gaussian, containing slight departures from the ideal. More radically, so little may be known about an accurate model for the data that we are only willing to assume that they are distributed symmetrically about some value. We develop model evaluation algorithms in this section that tackle both kinds of problems. However, be forewarned that solutions to such general models come at a price: the more specifically a model can accurately describe a given problem, the more closely the signal processing algorithms can be tailored to fit it, and the better the resulting performance. If our specific model is in error, however, our neatly tailored algorithms can lead us drastically astray. Thus, the best approach is to relax those aspects of the model that seem doubtful and to develop algorithms that cope well with worst-case situations should they arise ("And they usually do," echoes every person experienced in the vagaries of data). These considerations lead us to consider nonparametric variations in the probability densities compatible with our assessment of model accuracy and to derive decision rules that minimize the impact of the worst-case situation.
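To make the danger of a mistaken model concrete, the following minimal sketch evaluates a log-likelihood ratio derived under a Gaussian assumption, first on data that fit that assumption and then on the same data with a single gross outlier appended. The function name `gaussian_llr`, the data values, the mean `m = 1`, the unit variance, and the zero threshold are all assumptions chosen for illustration; they do not come from the text.

```python
import numpy as np

def gaussian_llr(x, m=1.0, sigma=1.0):
    """Log-likelihood ratio of 'mean m' versus 'mean 0',
    assuming i.i.d. Gaussian observations with standard deviation sigma."""
    x = np.asarray(x, dtype=float)
    return np.sum(m * x - 0.5 * m**2) / sigma**2

# Nine observations consistent with the zero-mean model (hypothetical values).
clean = np.array([-0.3, 0.1, -0.2, 0.4, -0.1, 0.0, 0.2, -0.4, 0.3])

# The same data plus one outlier that the Gaussian model does not anticipate.
contaminated = np.append(clean, 20.0)

llr_clean = gaussian_llr(clean)
llr_contaminated = gaussian_llr(contaminated)

# With a threshold of zero, a positive value favors the 'mean m' model.
print("clean data:       ", llr_clean)
print("with one outlier: ", llr_contaminated)
```

Because each observation enters the Gaussian log-likelihood ratio linearly and without bound, one aberrant value is enough to reverse the decision in this example.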
In model evaluation problems, there are "optimally" hard problems: those in which the models are the most difficult to distinguish. The impossible problem is to distinguish models that are identical. In this situation, the conditional densities of the observed data are equal and the likelihood ratio is constant for all possible values of the observations. It is obvious that identical models are indistinguishable; this elaboration suggests that, in terms of the likelihood ratio, hard problems are those in which the likelihood ratio is constant. Thus, "hard problems" are those in which the ratio of the conditional probability densities is constant over wide ranges of observed data values.
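As an illustration of a likelihood ratio that is constant over wide ranges, the sketch below compares the log-likelihood ratio of a single observation for two models that differ only by a shift `m` in the mean. It does so for a Gaussian density and for a Laplacian density. Both densities, the unit scale, the shift `m = 1`, and the helper names are assumptions made for this example rather than choices taken from the text.

```python
import numpy as np

def log_likelihood_ratio(x, m, log_density):
    """log p(x - m) - log p(x) for a density p and a mean shift m."""
    x = np.asarray(x, dtype=float)
    return log_density(x - m) - log_density(x)

def gaussian_log_density(x):
    # Zero-mean, unit-variance Gaussian.
    return -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)

def laplacian_log_density(x):
    # Zero-mean Laplacian with unit scale: p(x) = exp(-|x|) / 2.
    return -np.abs(x) - np.log(2.0)

m = 1.0
x = np.linspace(-5.0, 5.0, 11)  # the integers -5, -4, ..., 5

llr_gauss = log_likelihood_ratio(x, m, gaussian_log_density)
llr_laplace = log_likelihood_ratio(x, m, laplacian_log_density)

for xi, g, l in zip(x, llr_gauss, llr_laplace):
    print(f"x = {xi:5.1f}   Gaussian: {g:6.2f}   Laplacian: {l:6.2f}")
```

For the Gaussian density the log-likelihood ratio grows linearly with the observation, so extreme values are highly informative. For the Laplacian density it equals `-m` for every `x <= 0` and `+m` for every `x >= m`, so observations outside the interval between the two means all carry the same evidence. In the sense described above, that makes the Laplacian pair the harder one to distinguish.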
The most relevant model evaluation problem for us is the discrimination between two models that differ only in the means of statistically independent observations: the conditional densities of each observation are related as
From the functional equation, we see that the quantity