Instead of directly bounding the expected performance, we are going to prove stronger probability
bounds of the form
These bounds can be readily converted to expected performance bounds using Markov's
inequality:
Reduce the original problem to an easier one by replacing the larger class FF with
a smaller finite class {f0,⋯,fM}⊆F{f0,⋯,fM}⊆F. Observe that
inf
f
^
n
sup
f
∈
F
P
f
(
d
(
f
^
n
,
f
)
≥
s
n
)
≥
inf
f
^
n
sup
f
∈
{
f
0
,
⋯
,
f
M
}
P
f
(
d
(
f
^
n
,
f
)
≥
s
n
)
inf
f
^
n
sup
f
∈
F
P
f
(
d
(
f
^
n
,
f
)
≥
s
n
)
≥
inf
f
^
n
sup
f
∈
{
f
0
,
⋯
,
f
M
}
P
f
(
d
(
f
^
n
,
f
)
≥
s
n
)
(14)
The key idea is to choose a finite collection of models such that the resulting problem is as
hard as the original, otherwise the lower bound will not be tight.
Next, we reduce the problem to a hypotheses test. Ideally, we would like to have something like
inf
f
^
n
sup
f
∈
F
P
f
(
d
(
f
^
n
,
f
)
≥
s
n
)
≥
inf
f
^
n
sup
j
∈
{
0
,
⋯
,
M
}
P
f
j
(
h
^
n
(
Z
)
≠
j
)
inf
f
^
n
sup
f
∈
F
P
f
(
d
(
f
^
n
,
f
)
≥
s
n
)
≥
inf
f
^
n
sup
j
∈
{
0
,
⋯
,
M
}
P
f
j
(
h
^
n
(
Z
)
≠
j
)
(15)
The infinf is over all measurable test functions
h
^
n
:
Z
→
{
0
,
⋯
,
M
}
h
^
n
:
Z
→
{
0
,
⋯
,
M
}
(16)
and Pfj(h^n(Z)≠j)Pfj(h^n(Z)≠j) denotes the probability that after observing the
data, the test infers the wrong hypothesis.
This might not always be true or easy to show, but in certain scenarios it can be done.
Suppose d(.,.)d(.,.) is a semi-distance, i.e. it satisfies
E.g. with f,g:Rd→R,d(f,g)=Δ||f-g||2f,g:Rd→R,d(f,g)=Δ||f-g||2.
Lemma 1
Suppose d(.,.)d(.,.) is a semi-distance. Also suppose that we have constructed f0,⋯,fMf0,⋯,fM
s.t. d(fj,fk)≥2snd(fj,fk)≥2sn, ∀j≠k∀j≠k. Take any estimator f^nf^n and
define the test: Ψ*∘f^n:Z→{0,⋯,M}Ψ*∘f^n:Z→{0,⋯,M} as
Ψ
*
(
f
^
n
)
=
arg
min
j
d
(
f
^
n
,
f
j
)
Ψ
*
(
f
^
n
)
=
arg
min
j
d
(
f
^
n
,
f
j
)
(18)
Then Ψ*(f^n)≠jΨ*(f^n)≠j, implies d(f^n,fj)≥snd(f^n,fj)≥sn.
Suppose Ψ*(f^n)≠j⟺∃k≠j:d(f^n,fk)≤d(f^n,fj)Ψ*(f^n)≠j⟺∃k≠j:d(f^n,fk)≤d(f^n,fj). Now
2
s
n
≤
d
(
f
j
,
f
k
)
≤
d
(
f
^
n
,
f
j
)
+
d
(
f
^
n
,
f
k
)
≤
2
d
(
f
^
n
,
f
j
)
2
s
n
≤
d
(
f
j
,
f
k
)
≤
d
(
f
^
n
,
f
j
)
+
d
(
f
^
n
,
f
k
)
≤
2
d
(
f
^
n
,
f
j
)
(19)
⟹
d
(
f
^
n
,
f
j
)
≥
s
n
⟹
d
(
f
^
n
,
f
j
)
≥
s
n
(20)
The previous lemma implies that
P
f
j
(
d
(
f
^
n
,
f
j
)
≥
s
n
)
≥
P
f
j
(
Ψ
*
(
f
^
n
)
≠
j
)
P
f
j
(
d
(
f
^
n
,
f
j
)
≥
s
n
)
≥
P
f
j
(
Ψ
*
(
f
^
n
)
≠
j
)
(21)
Therefore,
inf
f
^
n
sup
f
∈
F
P
f
j
(
d
(
f
^
n
,
f
j
)
≥
s
n
)
≥
inf
f
^
n
max
f
∈
{
f
0
,
⋯
,
f
M
}
P
f
j
(
d
(
f
^
n
,
f
j
)
≥
s
n
)
≥
inf
f
^
n
max
j
∈
{
0
,
⋯
,
M
}
P
f
j
(
Ψ
*
(
f
^
n
)
≠
j
)
≥
inf
h
^
n
max
j
∈
{
0
,
⋯
,
M
}
P
j
(
h
^
n
≠
j
)
=
Δ
P
e
,
M
inf
f
^
n
sup
f
∈
F
P
f
j
(
d
(
f
^
n
,
f
j
)
≥
s
n
)
≥
inf
f
^
n
max
f
∈
{
f
0
,
⋯
,
f
M
}
P
f
j
(
d
(
f
^
n
,
f
j
)
≥
s
n
)
≥
inf
f
^
n
max
j
∈
{
0
,
⋯
,
M
}
P
f
j
(
Ψ
*
(
f
^
n
)
≠
j
)
≥