Skip to content Skip to navigation

Connexions

You are here: Home » Content » An Example of the Use of Sieves for Complexity Regularization in Denoising

Navigation

Content Actions

  • Download module PDF
  • Add to ...
    Add the module to:
    • My Favorites
    • A lens
    • An external social bookmarking service
    • My Favorites (What is 'My Favorites'?)
      'My Favorites' is a special kind of lens which you can use to bookmark modules and collections directly in Connexions. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need a Connexions account to use 'My Favorites'.
    • A lens (What is a lens?)

      Definition of a lens

      Lenses

      A lens is a custom view of Connexions content. You can think of it as a fancy kind of list that will let you see Connexions through the eyes of organizations and people you trust.

      What is in a lens?

      Lens makers point to Connexions materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

      Who can create a lens?

      Any individual Connexions member, a community, or a respected organization.

    • External bookmarks
  • E-mail the author

Recently Viewed

This feature requires Javascript to be enabled.

An Example of the Use of Sieves for Complexity Regularization in Denoising

Module by: Robert Nowak

An example of the use of Sieves for complexity regularization in denoising

Consider the following setting. Let

Y = f * ( X ) + W , Y = f * ( X ) + W , (1)

where X is a random variable (r.v.) on X=[0,1]X=[0,1], WW is a r.v. on Y=R,Y=R, independent of XX and satisfying

E [ W ] = 0 and E [ W 2 ] = σ 2 < . E [ W ] = 0 and E [ W 2 ] = σ 2 < . (2)

Finally let f*:[0,1]Rf*:[0,1]R be a function satisfying

| f * ( t ) - f * ( s ) | L | t - s | , t , s [ 0 , 1 ] , | f * ( t ) - f * ( s ) | L | t - s | , t , s [ 0 , 1 ] , (3)

where L>0L>0 is a constant. A function satisfying condition (Equation 3) is said to be Lipschitz on [0,1][0,1]. Notice that such a function must be continuous, but it is not necessarily differentiable. An example of such a function is depicted in Figure 1(a).

Figure 1: Example of a Lipschitz function, and our observations setting. (a) random sampling of f*f*, the points correspond to (Xi,Yi),i=1,...,n(Xi,Yi),i=1,...,n; (b) deterministic sampling of f*f*, the points correspond to (i/n,Yi),i=1,...,n(i/n,Yi),i=1,...,n.
Subfigure 1.1
lipschitz.png
Subfigure 1.2
lipschitz2.png

Note that

E [ Y | X = x ] = E [ f * ( X ) + W | X = x ] = E [ f * ( x ) + W | X = x ] = f * ( x ) + E [ W ] = f * ( x ) . E [ Y | X = x ] = E [ f * ( X ) + W | X = x ] = E [ f * ( x ) + W | X = x ] = f * ( x ) + E [ W ] = f * ( x ) . (4)

Consider our usual setup: Estimate f*f* using nn training examples

{ X i , Y i } i = 1 n i . i . d . P X Y , Y i = f * ( X i ) + W i , i = { 1 , ... , n } , { X i , Y i } i = 1 n i . i . d . P X Y , Y i = f * ( X i ) + W i , i = { 1 , ... , n } , (5)

where i.i.d.i.i.d. means independently and identically distributed. Figure 1(a) illustrates this setup.

In many applications we can sample X=[0,1]X=[0,1] as we like, and not necessarily at random. For example we can take nn samples uniformly on [0,1]

x i = i n , i = 1 , ... , n , Y i = f ( x i ) + W i = f i n + W i . x i = i n , i = 1 , ... , n , Y i = f ( x i ) + W i = f i n + W i . (6)

We will proceed with this setup (as in Figure 1(b)) in the rest of the lecture.

Our goal is to find f^nf^n such that E[f*-f^n2]0,asn0E[f*-f^n2]0,asn0 (here ·· is the usual L2L2-norm; i.e., f*-f^n2=01|f*(t)-f^n(t)|2dtf*-f^n2=01|f*(t)-f^n(t)|2dt).

Let

F = { f : f is Lipschitz with constant L } . F = { f : f is Lipschitz with constant L } . (7)

The Risk is defined as

R ( f ) = f * - f 2 = 0 1 | f * ( t ) - f ( t ) | 2 d t . R ( f ) = f * - f 2 = 0 1 | f * ( t ) - f ( t ) | 2 d t . (8)

The Expected Risk (recall that our estimator f^nf^n is based on {xi,Yi}{xi,Yi} and hence is a r.v.) is defined as

E [ R ( f ^ n ) ] = E [ f * - f ^ n 2 ] . E [ R ( f ^ n ) ] = E [ f * - f ^ n 2 ] . (9)

Finally the Empirical Risk is defined as

R ^ n ( f ) = 1 n i = 1 n f i n - Y i 2 . R ^ n ( f ) = 1 n i = 1 n f i n - Y i 2 . (10)

Let 0<m1m2m30<m1m2m3 be a sequence of integers satisfying mnmn as nn, and knmn=nknmn=n for some integer kn>0kn>0. That is, for each value of nn there is an associated integer value mnmn. Define the Sieve F1,F2,F3,...,F1,F2,F3,...,

F n = f : f ( t ) = j = 1 m n c j 1 { j - 1 m n t < j m n } , c j R . F n = f : f ( t ) = j = 1 m n c j 1 { j - 1 m n t < j m n } , c j R . (11)

FnFn is the space of functions that are constant on intervals

I j , m n j - 1 m n , j m n , j = 1 , ... , m n . I j , m n j - 1 m n , j m n , j = 1 , ... , m n . (12)

From here on we will use mm and kk instead of mnmn and knkn (dropping the subscript nn) for notational ease. Define

f n ( t ) = j = 1 m c j * 1 { t I j , m } , where c j * = 1 k i : i n I j , m f * i n f n ( t ) = j = 1 m c j * 1 { t I j , m } , where c j * = 1 k i : i n I j , m f * i n (13)

Note that fnFnfnFn.

Exercise 1 Upper bound f*-fn2f*-fn2.

f * - f 2 = 0 1 | f * ( t ) - f n ( t ) | 2 d t = j = 1 m I j , m | f * ( t ) - f n ( t ) | 2 d t = j = 1 m I j , m | f * ( t ) - c j * | 2 d t = j = 1 m I j , m f * ( t ) - 1 k i : i n I j , m f * i n 2 d t = j = 1 m I j , m 1 k i : i n I j , m f * ( t ) - f * i n 2 d t j = 1 m I j , m 1 k i : i n I j , m f * ( t ) - f * i n 2 d t j = 1 m I j , m 1 k i : i n I j , m L m 2 d t = j = 1 m I j , m L m 2 d t = j = 1 m 1 m L m 2 = L m 2 . f * - f 2 = 0 1 | f * ( t ) - f n ( t ) | 2 d t = j = 1 m I j , m | f * ( t ) - f n ( t ) | 2 d t = j = 1 m I j , m | f * ( t ) - c j * | 2 d t = j = 1 m I j , m f * ( t ) - 1 k i : i n I j , m f * i n 2 d t = j = 1 m I j , m 1 k i : i n I j , m f * ( t ) - f * i n 2 d t j = 1 m I j , m 1 k i : i n I j , m f * ( t ) - f * i n 2 d t j = 1 m I j , m 1 k i : i n I j , m L m 2 d t = j = 1 m I j , m L m 2 d t = j = 1 m 1 m L m 2 = L m 2 . (14)

The above implies that f*-fn20asn,f*-fn20asn, since m=mnasnm=mnasn. In words, with nn sufficiently large we can approximate f*f* to arbitrary accuracy using models in FnFn (even if the functions we are using to approximate f*f* are not Lipschitz!).

For any fFn,fFn,f=j=1mcj1{tIj,m},f=j=1mcj1{tIj,m}, we have

R ^ n ( f ) = 1 n j = 1 m i : i n I j , m ( c j - Y i ) 2 . R ^ n ( f ) = 1 n j = 1 m i : i n I j , m ( c j - Y i ) 2 . (15)

Let f^n=argminfFnR^n(f).f^n=argminfFnR^n(f). Then

f ^ n ( t ) = j = 1 m c ^ j 1 { t I j , m } , where c ^ j = 1 k i : i n I j , m Y i f ^ n ( t ) = j = 1 m c ^ j 1 { t I j , m } , where c ^ j = 1 k i : i n I j , m Y i (16)

Exercise 2 Show (Equation 16).

Note that E[c^j]=cj*E[c^j]=cj* and therefore E[f^n(t)]=fn(t)E[f^n(t)]=fn