Connexions


Lower Performance Bounds for Estimators

Module by: Robert Nowak, Aarti Singh, Rui Castro

Lower Performance Bounds

In other modules, estimators and predictors are analyzed in order to obtain upper bounds on their performance. These bounds are of the form

$$\sup_{f \in \mathcal{F}} \mathbb{E}_f\big[d(\hat f_n, f)\big] \le C\, n^{-\gamma} \tag{1}$$

where $\gamma > 0$. We would like to know whether these bounds are tight, in the sense that no other estimator is significantly better. To answer this, we need lower bounds of the form

$$\inf_{\hat f_n} \sup_{f \in \mathcal{F}} \mathbb{E}_f\big[d(\hat f_n, f)\big] \ge c\, n^{-\gamma} \tag{2}$$
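To make the form $C n^{-\gamma}$ concrete, here is a purely illustrative sketch built on a toy problem of our own choosing (not from the module): $n$ i.i.d. $N(\theta, 1)$ observations, the sample mean as estimator, and $d(\hat\theta, \theta) = |\hat\theta - \theta|$. The exact risk is $\sqrt{2/(\pi n)}$ for every $\theta$, so a least-squares fit of $\log(\text{risk})$ against $\log n$ should return $\gamma \approx 1/2$. The helpers `empirical_risk` and `fit_rate_exponent` are hypothetical names.

```python
import math
import random

def empirical_risk(n, theta=0.0, reps=4000, seed=0):
    """Monte Carlo estimate of E|theta_hat - theta| for the sample mean of n N(theta, 1) draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # The sample mean of n i.i.d. N(theta, 1) variables is exactly N(theta, 1/n),
        # so we draw it directly instead of averaging n samples.
        theta_hat = rng.gauss(theta, 1.0 / math.sqrt(n))
        total += abs(theta_hat - theta)
    return total / reps

def fit_rate_exponent(ns, risks):
    """Least-squares slope of log(risk) against log(n); returns gamma = -slope."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(r) for r in risks]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return -num / den

ns = [16, 64, 256, 1024, 4096]
# Independent random draws for each n (seed depends on n).
risks = [empirical_risk(n, seed=n) for n in ns]
gamma_hat = fit_rate_exponent(ns, risks)
print(f"estimated gamma: {gamma_hat:.3f}")  # should be close to 0.5
```

Since the risk here does not depend on $\theta$, it is also the maximal risk over any class of means.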

We assume we have the following ingredients:

  • Class of models, $\mathcal{F} \subseteq \mathcal{S}$. $\mathcal{F}$ is a class of models containing the "true" model and is a subset of some bigger class $\mathcal{S}$. E.g. $\mathcal{F}$ could be the class of Lipschitz density functions, or the class of distributions $P_{XY}$ satisfying the box-counting condition.
  • An observation model, $P_f$, indexed by $f \in \mathcal{F}$. $P_f$ denotes the distribution of the data under model $f$. E.g. in regression and classification, this is the distribution of $Z = (X_1, Y_1, \dots, X_n, Y_n) \in \mathcal{Z}$. We will assume that $P_f$ is a probability measure on the measurable space $(\mathcal{Z}, \mathcal{B})$.
  • A performance metric $d(\cdot, \cdot) \ge 0$. If $\hat f_n$ is a model estimate, then its performance relative to the true model $f$ is $d(\hat f_n, f)$. E.g.
    $$\text{Regression:}\quad d(\hat f_n, f) = \|\hat f_n - f\|_2 = \left(\int \big(\hat f_n(x) - f(x)\big)^2\, dx\right)^{1/2} \tag{3}$$
    $$\text{Classification:}\quad d(\hat f_n, f) = R(\hat G_n) - R^* = \int_{\hat G_n \,\Delta\, G^*} |2\eta(x) - 1|\, dP_X(x) \tag{4}$$
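The two example metrics can be approximated numerically. The sketch below is only an illustration under assumptions of our own: functions live on $[0, 1]$, $P_X$ has a density `p_x` there, the sets $\hat G_n$ and $G^*$ are passed as membership predicates, and integrals are computed with the midpoint rule. The function names are hypothetical.

```python
import math

def l2_distance(f_hat, f, a=0.0, b=1.0, num_points=10_000):
    """Midpoint-rule approximation of ||f_hat - f||_2 on [a, b] (regression metric, eq. (3))."""
    h = (b - a) / num_points
    total = 0.0
    for i in range(num_points):
        x = a + (i + 0.5) * h
        total += (f_hat(x) - f(x)) ** 2
    return math.sqrt(total * h)

def excess_risk(in_g_hat, in_g_star, eta, p_x, a=0.0, b=1.0, num_points=10_000):
    """Midpoint-rule approximation of the classification metric, eq. (4).

    Integrates |2*eta(x) - 1| * p_x(x) over the symmetric difference of the
    sets G_hat and G_star, which are given as membership predicates.
    """
    h = (b - a) / num_points
    total = 0.0
    for i in range(num_points):
        x = a + (i + 0.5) * h
        if in_g_hat(x) != in_g_star(x):  # x lies in G_hat Δ G_star
            total += abs(2.0 * eta(x) - 1.0) * p_x(x)
    return total * h

# Regression: an estimate that is off by a constant 0.1 everywhere on [0, 1].
reg = l2_distance(lambda x: x + 0.1, lambda x: x)
print(reg)  # approximately 0.1

# Classification: X ~ Uniform[0, 1] and eta(x) = x, so the Bayes set is
# G* = {x >= 0.5}; the classifier uses G_hat = {x >= 0.6}.
cls = excess_risk(lambda x: x >= 0.6, lambda x: x >= 0.5,
                  eta=lambda x: x, p_x=lambda x: 1.0)
print(cls)  # approximately 0.01, i.e. the integral of (2x - 1) over [0.5, 0.6]
```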

As before, we are interested in the risk of a learning rule, in particular the maximal risk given as:

$$\sup_{f \in \mathcal{F}} \mathbb{E}_f\big[d(\hat f_n, f)\big] = \sup_{f \in \mathcal{F}} \int d\big(\hat f_n(Z), f\big)\, dP_f(Z) \tag{5}$$

where $\hat f_n$ is a function of the observations $Z$ and $\mathbb{E}_f$ denotes the expectation with respect to $P_f$.
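The maximal risk (5) can be estimated by simulation when the model class is replaced by a finite grid of models. The following is a sketch under assumptions of our own (Gaussian observations $Z = (Z_1, \dots, Z_n)$ with $Z_i \sim N(\theta, 1)$, the sample mean as $\hat f_n$, $d = |\hat\theta - \theta|$, and a hand-picked grid of $\theta$ values). Ignoring Monte Carlo error, a maximum over a finite grid can only under-approximate the supremum over the whole class. `sample_mean_estimator` and `max_risk_over_grid` are hypothetical names.

```python
import math
import random

def sample_mean_estimator(z):
    """Estimator f_hat_n: maps the observations Z to an estimate of theta."""
    return sum(z) / len(z)

def max_risk_over_grid(estimator, thetas, n, reps=4000, seed=1):
    """Monte Carlo estimate of max over thetas of E_theta[|estimator(Z) - theta|].

    Z consists of n i.i.d. N(theta, 1) observations. Because thetas is a finite
    subset of the model class, this approximates the supremum from below.
    """
    rng = random.Random(seed)
    worst = 0.0
    for theta in thetas:
        total = 0.0
        for _ in range(reps):
            z = [rng.gauss(theta, 1.0) for _ in range(n)]
            total += abs(estimator(z) - theta)
        worst = max(worst, total / reps)
    return worst

n = 25
estimate = max_risk_over_grid(sample_mean_estimator,
                              thetas=[-2.0, -1.0, 0.0, 1.0, 2.0], n=n)
exact = math.sqrt(2.0 / (math.pi * n))  # E|N(0, 1/n)|, the same for every theta
print(estimate, exact)  # estimate should be close to exact (about 0.16)
```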

The main goal is to get results of the form

$$R_n^* \triangleq \inf_{\hat f_n} \sup_{f \in \mathcal{F}} \mathbb{E}_f\big[d(\hat f_n, f)\big] \ge c\, s_n \tag{6}$$

where $c > 0$ and $s_n \to 0$ as $n \to \infty$. The infimum is taken over all estimators, i.e. all measurable functions $\hat f_n : \mathcal{Z} \to \mathcal{S}$.

Suppose we have shown that

$$\liminf_{n \to \infty} s_n^{-1} R_n^* \ge c > 0 \qquad \text{(a lower bound)} \tag{7}$$

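and also that, for a particular estimator $\bar f_n$,

$$\limsup_{n \to \infty} s_n^{-1} \sup_{f \in \mathcal{F}} \mathbb{E}_f\big[d(\bar f_n, f)\big] \le C \tag{8}$$

which, since $R_n^* \le \sup_{f \in \mathcal{F}} \mathbb{E}_f[d(\bar f_n, f)]$, implies the upper bound

$$\limsup_{n \to \infty} s_n^{-1} R_n^* \le C. \tag{9}$$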

We say that $s_n$ is the optimal rate of convergence for this problem and that $\bar f_n$ attains that rate.

Note: two rates of convergence $\Psi_n$ and $\Psi_n'$ are equivalent, written $\Psi_n \asymp \Psi_n'$, iff

$$0 < \liminf_{n \to \infty} \frac{\Psi_n}{\Psi_n'} \le \limsup_{n \to \infty} \frac{\Psi_n}{\Psi_n'} < \infty \tag{10}$$
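Condition (10) allows the ratio to oscillate as long as it stays bounded away from $0$ and $\infty$. The sketch below uses two example rate pairs of our own choosing and only inspects the ratio over a finite range of $n$, so it illustrates the definition rather than proving anything about $\liminf$ or $\limsup$. The helper `ratio_range` is hypothetical.

```python
import math

def ratio_range(psi, psi_prime, ns):
    """Smallest and largest value of psi(n) / psi_prime(n) over the given n."""
    ratios = [psi(n) / psi_prime(n) for n in ns]
    return min(ratios), max(ratios)

ns = range(10, 100_001)

# Psi'_n = n^{-1/2} (2 + sin n): the ratio 1 / (2 + sin n) never converges,
# but it stays inside [1/3, 1], so the two rates are equivalent.
lo, hi = ratio_range(lambda n: n ** -0.5,
                     lambda n: n ** -0.5 * (2 + math.sin(n)), ns)
print(lo, hi)

# Psi'_n = n^{-1/2} log n: the ratio 1 / log n drifts to 0, so the
# lim inf in (10) is 0 and the rates are NOT equivalent.
lo2, hi2 = ratio_range(lambda n: n ** -0.5,
                       lambda n: n ** -0.5 * math.log(n), ns)
print(lo2, hi2)
```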

General Reduction Scheme

Instead of directly bounding the expected performance, we are going to prove stronger probability bounds of the form

$$\inf_{\hat f_n} \sup_{f \in \mathcal{F}} P_f\big(d(\hat f_n, f) \ge s_n\big) \ge c > 0 \tag{11}$$

These bounds can be readily converted to expected performance bounds using Markov's inequality:

$$P_f\big(d(\hat f_n, f) \ge s_n\big) \le \frac{\mathbb{E}_f\big[d(\hat f_n, f)\big]}{s_n} \tag{12}$$

Therefore it follows:

$$\inf_{\hat f_n} \sup_{f \in \mathcal{F}} \mathbb{E}_f\big[d(\hat f_n, f)\big] \ge \inf_{\hat f_n} \sup_{f \in \mathcal{F}} s_n\, P_f\big(d(\hat f_n, f) \ge s_n\big) \ge c\, s_n \tag{13}$$
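Markov's inequality (12) also holds for the empirical distribution of any sample of nonnegative losses, which gives a quick sanity check of the conversion in (13). In the sketch below the loss values are made up (exponential draws and a small hand-written list) and `markov_check` is a hypothetical helper.

```python
import random

def markov_check(losses, s):
    """Return (empirical mean, s * empirical P(loss >= s)) for nonnegative losses.

    Markov's inequality (12), applied to the empirical distribution, guarantees
    that the first value is at least the second.
    """
    mean = sum(losses) / len(losses)
    tail = sum(1 for x in losses if x >= s) / len(losses)
    return mean, s * tail

# Hypothetical loss values d(f_hat_n, f) collected over repeated experiments.
rng = random.Random(0)
losses = [rng.expovariate(5.0) for _ in range(10_000)]
mean, bound = markov_check(losses, s=0.1)
print(mean, bound)  # mean >= bound

# The same conversion as in (13): if P_f(d >= s_n) >= c, then E_f[d] >= c * s_n.
small_mean, small_bound = markov_check([0.0, 0.1, 0.5, 0.7, 2.0], s=0.5)
print(small_mean, small_bound)  # about 0.66 and 0.3
```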

First Reduction Step

Reduce the original problem to an easier one by replacing the larger class $\mathcal{F}$ with a smaller finite class $\{f_0, \dots, f_M\} \subseteq \mathcal{F}$. Observe that

$$\inf_{\hat f_n} \sup_{f \in \mathcal{F}} P_f\big(d(\hat f_n, f) \ge s_n\big) \ge \inf_{\hat f_n} \max_{f \in \{f_0, \dots, f_M\}} P_f\big(d(\hat f_n, f) \ge s_n\big) \tag{14}$$

The key idea is to choose the finite collection of models so that the resulting problem is (up to constants) as hard as the original one; otherwise the lower bound will not be tight.

Second Reduction Step

Next, we reduce the problem to a hypothesis test. Ideally, we would like to have something like

$$\inf_{\hat f_n} \sup_{f \in \mathcal{F}} P_f\big(d(\hat f_n, f) \ge s_n\big) \ge \inf_{\hat h_n} \max_{j \in \{0, \dots, M\}} P_{f_j}\big(\hat h_n(Z) \ne j\big) \tag{15}$$

The infimum is over all measurable test functions

$$\hat h_n : \mathcal{Z} \to \{0, \dots, M\} \tag{16}$$

and $P_{f_j}(\hat h_n(Z) \ne j)$ denotes the probability that, when $f_j$ is the true model, the test infers the wrong hypothesis after observing the data.
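For intuition about the quantity on the right of (15), the sketch below takes $M = 1$ with two Gaussian hypotheses of our own choosing, $Z_i \sim N(0, 1)$ versus $Z_i \sim N(\mu, 1)$, and one particular (hypothetical) test that thresholds the sample mean at $\mu/2$. It estimates $\max_j P_{f_j}(\hat h_n(Z) \ne j)$ by simulation and compares it with the exact value $\Phi(-\sqrt{n}\,\mu/2)$. Because this is a single test, its maximal error only upper-bounds the infimum over all tests.

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def threshold_test(z, mu):
    """Hypothetical test h_n: decide 1 if the sample mean exceeds mu / 2, else 0."""
    return 1 if sum(z) / len(z) > mu / 2 else 0

def max_error_probability(test, means, n, reps=5000, seed=2):
    """Monte Carlo estimate of max_j P_j(test(Z) != j), with Z ~ N(means[j], 1)^n."""
    rng = random.Random(seed)
    worst = 0.0
    for j, mean in enumerate(means):
        errors = 0
        for _ in range(reps):
            z = [rng.gauss(mean, 1.0) for _ in range(n)]
            if test(z) != j:
                errors += 1
        worst = max(worst, errors / reps)
    return worst

n, mu = 16, 0.5
estimate = max_error_probability(lambda z: threshold_test(z, mu), [0.0, mu], n)
# Under either hypothesis the error probability is Phi(-sqrt(n) * mu / 2).
exact = normal_cdf(-math.sqrt(n) * mu / 2)
print(estimate, exact)  # exact is about 0.159
```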

This might not always be true or easy to show, but in certain scenarios it can be done. Suppose $d(\cdot, \cdot)$ is a semi-distance, i.e. it satisfies

  • (i) $d(f, g) = d(g, f) \ge 0$ (symmetry)
  • (ii) $d(f, f) = 0$
  • (iii) $d(f, g) \le d(h, f) + d(h, g)$ (triangle inequality)

E.g. with $f, g : \mathbb{R}^d \to \mathbb{R}$, $d(f, g) \triangleq \|f - g\|_2$.
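A semi-distance need not separate points: $d(f, g) = 0$ is allowed for $f \ne g$. The sketch below checks axioms (i) to (iii) on a finite set of candidates. As assumptions of our own, functions on $[0, 1]$ are represented by their values on a grid (so the $L_2$ norm becomes a finite sum), and `at_zero` is a toy semi-distance that compares functions only at $x = 0$. All names are hypothetical.

```python
import itertools
import math

def is_semi_distance(d, points, tol=1e-12):
    """Check axioms (i)-(iii) of a semi-distance on every pair/triple of points."""
    for f in points:
        if abs(d(f, f)) > tol:  # (ii) d(f, f) = 0
            return False
    for f, g in itertools.product(points, repeat=2):
        if d(f, g) < -tol or abs(d(f, g) - d(g, f)) > tol:  # (i) symmetry, nonnegativity
            return False
    for f, g, h in itertools.product(points, repeat=3):
        if d(f, g) > d(h, f) + d(h, g) + tol:  # (iii) triangle inequality
            return False
    return True

# Functions on [0, 1] represented by their values on a midpoint grid.
GRID = [(i + 0.5) / 200 for i in range(200)]

def l2(f, g):
    """Discretized L2 distance between two grid-valued functions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)) / len(GRID))

funcs = [[math.sin(k * x) for x in GRID] for k in range(1, 5)]
print(is_semi_distance(l2, funcs))  # True

def at_zero(f, g):
    """Toy semi-distance that only looks at the value at x = 0."""
    return abs(f(0.0) - g(0.0))

callables = [math.sin, math.tan, lambda x: x ** 2, math.cos]
print(is_semi_distance(at_zero, callables))  # True
print(at_zero(math.sin, math.tan))           # 0.0, although sin != tan
```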

Lemma 1. Suppose $d(\cdot, \cdot)$ is a semi-distance. Also suppose that we have constructed $f_0, \dots, f_M$ such that $d(f_j, f_k) \ge 2 s_n$ for all $j \ne k$. Take any estimator $\hat f_n$ and define the test $\Psi^*(\hat f_n) : \mathcal{Z} \to \{0, \dots, M\}$ as

$$\Psi^*(\hat f_n) = \arg\min_{j} d(\hat f_n, f_j) \tag{18}$$

Then $\Psi^*(\hat f_n) \ne j$ implies $d(\hat f_n, f_j) \ge s_n$.

Proof. Suppose $\Psi^*(\hat f_n) \ne j$. Then there exists $k \ne j$ such that $d(\hat f_n, f_k) \le d(\hat f_n, f_j)$. Using the triangle inequality (iii) with $h = \hat f_n$,

$$2 s_n \le d(f_j, f_k) \le d(\hat f_n, f_j) + d(\hat f_n, f_k) \le 2\, d(\hat f_n, f_j) \tag{19}$$

$$\Rightarrow \quad d(\hat f_n, f_j) \ge s_n \qquad \blacksquare \tag{20}$$
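Lemma 1 is a deterministic statement, so it can be checked on arbitrary points. In the sketch below, as an assumption for illustration, vectors in $\mathbb{R}^2$ with the Euclidean distance stand in for models, the hypotheses are placed $2 s_n$ apart on a line, and random points play the role of estimates. `min_distance_test` is a hypothetical implementation of $\Psi^*$ that breaks ties toward the smallest index (the lemma holds for any tie-breaking rule).

```python
import math
import random

def min_distance_test(f_hat, hypotheses, d):
    """Psi*(f_hat) = argmin_j d(f_hat, f_j); ties go to the smallest index."""
    return min(range(len(hypotheses)), key=lambda j: d(f_hat, hypotheses[j]))

s_n = 0.5
# Hypotheses f_j = (2 s_n j, 0): pairwise distances are 2 s_n |j - k| >= 2 s_n.
hypotheses = [(2 * s_n * j, 0.0) for j in range(5)]

rng = random.Random(3)
violations = 0
for _ in range(10_000):
    f_hat = (rng.uniform(-1, 5), rng.uniform(-1, 1))  # arbitrary "estimate"
    psi = min_distance_test(f_hat, hypotheses, math.dist)
    for j, f_j in enumerate(hypotheses):
        # Lemma 1: psi != j must imply d(f_hat, f_j) >= s_n.
        if psi != j and math.dist(f_hat, f_j) < s_n:
            violations += 1
print(violations)  # expected: 0
```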

The previous lemma implies that

$$P_{f_j}\big(d(\hat f_n, f_j) \ge s_n\big) \ge P_{f_j}\big(\Psi^*(\hat f_n) \ne j\big) \tag{21}$$

Therefore,

$$
\begin{aligned}
\inf_{\hat f_n} \sup_{f \in \mathcal{F}} P_f\big(d(\hat f_n, f) \ge s_n\big)
&\ge \inf_{\hat f_n} \max_{f \in \{f_0, \dots, f_M\}} P_f\big(d(\hat f_n, f) \ge s_n\big) \\
&\ge \inf_{\hat f_n} \max_{j \in \{0, \dots, M\}} P_{f_j}\big(\Psi^*(\hat f_n) \ne j\big) \\
&\ge \inf_{\hat h_n} \max_{j \in \{0, \dots, M\}} P_{f_j}\big(\hat h_n \ne j\big) \triangleq P_{e,M}
\end{aligned}
$$

where the second inequality uses (21) and the last holds because, for each estimator, $\Psi^*(\hat f_n)$ is itself a test.
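The final quantity $P_{e,M}$ involves an infimum over all tests and cannot be computed by simulation, but the middle inequality can be observed empirically. The sketch below assumes a toy setting of our own: two hypotheses $\theta_0 = 0$ and $\theta_1 = 2 s_n$ (so $d(f_0, f_1) = 2 s_n$ with $d = |\cdot|$), $n$ i.i.d. $N(\theta_j, 1)$ observations, and the sample mean as $\hat f_n$. Both frequencies are computed on the same simulated data, so the event inclusion behind (21) makes the first at least the second in every run. `reduction_frequencies` is a hypothetical name.

```python
import random

def reduction_frequencies(s_n, n, reps=5000, seed=4):
    """Compare max_j P_j(d(f_hat, f_j) >= s_n) with max_j P_j(Psi*(f_hat) != j).

    Hypotheses theta_0 = 0 and theta_1 = 2 s_n; data are n i.i.d. N(theta_j, 1)
    draws; f_hat is the sample mean and Psi* is the minimum-distance test.
    """
    rng = random.Random(seed)
    thetas = [0.0, 2.0 * s_n]
    worst_far, worst_wrong = 0.0, 0.0
    for j, theta in enumerate(thetas):
        far = wrong = 0
        for _ in range(reps):
            f_hat = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
            psi = min(range(2), key=lambda k: abs(f_hat - thetas[k]))
            far += int(abs(f_hat - theta) >= s_n)  # event {d(f_hat, f_j) >= s_n}
            wrong += int(psi != j)                  # event {Psi*(f_hat) != j}
        worst_far = max(worst_far, far / reps)
        worst_wrong = max(worst_wrong, wrong / reps)
    return worst_far, worst_wrong

far, wrong = reduction_frequencies(s_n=0.25, n=16)
print(far, wrong)  # far >= wrong, as inequality (21) predicts
```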