Skip to content Skip to navigation

Connexions

You are here: Home » Content » NonLinear Approximation and Wavelet Analysis

Navigation

Content Actions

  • Download module PDF
  • Add to ...
    Add the module to:
    • My Favorites
    • A lens
    • An external social bookmarking service
    • My Favorites (What is 'My Favorites'?)
      'My Favorites' is a special kind of lens which you can use to bookmark modules and collections directly in Connexions. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need a Connexions account to use 'My Favorites'.
    • A lens (What is a lens?)

      Definition of a lens

      Lenses

      A lens is a custom view of Connexions content. You can think of it as a fancy kind of list that will let you see Connexions through the eyes of organizations and people you trust.

      What is in a lens?

      Lens makers point to Connexions materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

      Who can create a lens?

      Any individual Connexions member, a community, or a respected organization.

    • External bookmarks
  • E-mail the author

Recently Viewed

NonLinear Approximation and Wavelet Analysis

Module by: Robert Nowak

Review

In Lecture 4 and 15, we investigated the problem of denoising a smooth signal in additive white noise. In Lecture 4, we considered Lipschitz functions and showed that by filling constants on a uniform partition of width n-1/3n-1/3 we can achieve an n-2/3n-2/3 rate of MSE convergence.

In Lecture 15, we considered Holder-αα smooth functions, and we demonstrated that by automatically selecting partition width and using polynomial fits we can obtain a MSE convergence rate of n-2α/2α+1n-2α/2α+1, substantially better when α>1α>1. Also important is the fact that we don't need to know the value of αα a priori. The estimator f^nf^n is fundamentally different than its counterpart in Lecture 4.

In both cases f^n(t)f^n(t) is a linear function (polynomial on constant fit) of the data in each interval of the underlying partition. In Lecture 4, the partition was independent of the data, and so the overall estimator is a linear function of the data .

However, in Lecture 15 the partition itself was selected based on the data. Consequently, f^n(t)f^n(t) is a non-linear function of the data . Linear estimators (linear functions of the data) cannot adapt to unknown degrees of smoothness. In this lecture, we lay the groundwork for one more important extension in the denoising application - spatial adaptivity. That is, we would like to construct estimators that not only adapt to unknown degrees of global smoothness, but that also adapt to spatially varying degrees of smoothness.

We will focus on the approximation theoretic aspects of the problem in this lecture, considering tree-based approximations and wavelet expansions. In the next lecture, we will apply these results to the denoising problem, this will bring us up to date with the current state-of-the-art in denoising and non-parametric estimation.

Recall that Holder spaces contain smooth functions that are well approximated with polynomials or piecewise polynomial functions. Holder spaces are quite large and contain many interesting signals. However, Holder spaces are still inadequate in many applications. Often, we encounter functions that are not smooth everywhere; they contain discontinuities, jumps, spikes, etc. Indeed, the "singularities" (or non-smooth points) can be the most interesting and informative aspects of the functions.

Example 1 Functions not smooth everywhere.

Figure 1: Example of functions not smooth everywhere. (a) 1-D Case (b) 2-D Case
Subfigure 1.1
f1.png
Subfigure 1.2
f2.png

Furthermore, functions of interest may possess different degrees of smoothness in different regions.

Example 2 Functions with different degrees of smoothness.

Figure 2: Example of functions having different degrees of smoothness. (a) 1-D Case (b) 2-D Case
Subfigure 2.1
f3.png
Subfigure 2.2
f4.png

NonLinear Approximation via Trees

Let Bα(Cα)Bα(Cα) denote the set of all functions that are Hα(Cα)Hα(Cα) everywhere except on a set of measure zero. To simplify the notation, we won't explicitly identify the domain (e.g., [0,1][0,1] or [0,1]d[0,1]d); that will be clear from the context.

Example 3 Sets of measure zero.

Figure 3: Sets of measure zero. (a) 1-D Case (b) 2-D Case
Subfigure 3.1
f5.png
Subfigure 3.2
f6.png

Let's consider a 1-D case first.

Let fBα(Cα)fBα(Cα) and consider approximating ff by a piecewise polynomial function on a uniform partition.

If ff is Holder-αα smooth everywhere, then by using an appropriate partition width k-1k-1 and fitting degree αα polynomials on each interval we have an approximation fkfk satisfying

| f ( t ) - f k ( t ) | C α k - α | f ( t ) - f k ( t ) | C α k - α (1)

and

| | f - f k | | L 2 2 = O ( k - 2 α ) | | f - f k | | L 2 2 = O ( k - 2 α ) (2)

.

Figure 4: Smooth curve with a discontinuity.
f7.png

However, if there is a discontinuity then for tt in the interval containing the discontinuity the difference

| f ( t ) - f k ( t ) | | f ( t ) - f k ( t ) | (3)

will not be small.

Example 4 Suppose ff is piecewise Lipschitz and fkfk ia a piecewise constant.

Figure 5
f8_try.png
| f ( t ) - f k ( t ) | Δ | f ( t ) - f k ( t ) | Δ (4)

where ΔΔ is a constant equal to average of ff on right and left side of discontinuity in this interval.

| | f - f k | | L 2 2 = O ( k - 1 | | f - f k | | L 2 2 = O ( k - 1 (5)

where k-1k-1 is the width of the interval. Notice this rate is quite slow.

This problem naturally suggests the following remedy: use very small intervals near discontinuities and larger intervals in smooth regions. Specifically, suppose we use intervals of width k-2αk-2α to contain the discontinuities and the intervals of width k-1k-1 elsewhere. Then accordingly piecewise polynomial approximation f˜kf˜k satisfies

| | f - f ˜ k | | L 2 2 = O ( k - 2 α ) | | f - f ˜ k | | L 2 2 = O ( k - 2 α ) (6)

.

We can accomplish this need for "adaptive resolution" or "multiresolution" using recursive partitions and trees.

Recursive Dyadic Partitions

We discussed this idea already in our examination of classification trees. Here is the basic idea again, graphically.

Figure 6: Complete and pruned RDP along with their correspnding tree structures.
f9.png

Consider a function fBα(Cα)fBα(Cα) that contains no more than m points of discontinuity, and is Hα(Cα)Hα(Cα) away from these points.

Lemma 1 Consider a complete RDP with n intervals, then there exists an associated pruned RDP with O(klogn)O(klogn) intervals, such that an associated piecewise degree αα polynomial approximation (˜f)k(˜f)k, has a squared approximation error of O(min(k-2α,n-1))O(min(k-2α,n-1)).

Proof:Proof: Assume n>k>mn>k>m. Divide [0,1][0,1] into kk intervals. If ff is smooth on a particular interval II, then

| f ( t ) - f ˜ k ( t ) | = O ( k - 2 α ) t I | f ( t ) - f ˜ k ( t ) | = O ( k - 2 α ) t I (7)

.

In intervals that contain a discontinuity, recursively subdivide into two until the discontinuity is contained in an interval of width n-1n-1. This process results in at most log2nlog2n addition subintervals per discontinuity, and the squared approximation error is O(k-2α)O(k-2α) on all of them accept the mm intervals of width n-1n-1 containing the discontinuities where the error is O(1)O(1) at each point.

Thus, the overall squared L2L2 norm is

| | f - f ˜ k | | L 2 2 = O ( m i n ( k - 2 α , n - 1 ) ) | | f - f ˜ k | | L 2 2 = O ( m i n ( k - 2 α , n - 1 ) ) (8)

and there are at most k+log2nk+log2n intervals in the partition. Since k>m, we can upperbound the number of intervals by 2klog2n2klog2n.

Note that if the initial complete RDP has nk2αnk2α intervals, then the squared error is O(k-2α)O(k-2α).

Thus, we only incur a factor of 2αlogk2αlogk additional leafs and achieve the same overall approximation error as in the Hα(Cα)Hα(Cα) case. We will see that this is a small price to pay in order to handle not only smooth functions, but also piecewise smooth functions.

Wavelet Approximations

Let fL2([0,1])fL2([0,1]); f2(t)dt<f2(t)dt<.

A wavelet approximation is a series of the form

f = c o + j 0 k = 1 2 j < f , ψ j , k > ψ j , k f = c o + j 0 k = 1 2 j < f , ψ j , k > ψ j , k (9)

where coco is a constant (co=01f(t)dt)(co=01f(t)dt),

< f , ψ j , k > = 0 1 f ( t ) ψ j , k ( t ) d t < f , ψ j , k > = 0 1 f ( t ) ψ j , k ( t ) d t (10)

and the basis functions ψj,kψj,k are orthonormal, oscillatory signals, each with an associated scale 2-j2-j and position k2-jk2-j. ψj,kψj,k is called the wavelet at scale 2-j2-j and position k2-jk2-j.

Example 5 Haar Wavelets

ψ j , k ( t ) = 2 j / 2 ( 1 { t [ 2 - j ( k - 1 ) , 2 - j ( k - 1 / 2 ) ] } - 1 { t [ 2 - j ( k - 1 / 2 ) , 2 - j k ] } ) ψ j , k ( t ) = 2 j / 2 ( 1 { t [ 2 - j ( k - 1 ) , 2 - j ( k - 1 / 2 ) ] } - 1 { t [ 2 - j ( k - 1 / 2 ) , 2 - j k ] } ) (11)
Figure 7: Haar Wavelet
f10.png
0 1 ψ j , k ( t ) d t = 0 0 1 ψ j , k ( t ) d t = 0 (12)
0 1 ψ j , k 2 ( t ) d t = ( k - 1 ) 2 - j k 2 - j 2 j d t = 1 0 1 ψ j , k 2 ( t ) d t = ( k - 1 ) 2 - j k 2 - j 2 j d t = 1 (13)
0 1 ψ j , k ( t ) ψ l , m ( t ) d t = δ j , l . δ k , m 0 1 ψ j , k ( t ) ψ l , m ( t ) d t = δ j , l . δ k , m (14)

Note: If ff is constant on [2-j(k-1),2-jk][2-j(k-1),2-jk], then

f ψ j , k ( t ) = 0 f ψ j , k ( t ) = 0 (15)

.

Suppose ff is piecewise constant with at most mm discontinuities. Let

f J = c o + j = 0 J - 1 k = 1 2 j < f , ψ j , k > ψ j , k f J = c o + j = 0 J - 1 k = 1 2 j < f , ψ j , k > ψ j , k (16)

. Then, fJfJ has at most mJmJ non-zero wavelet coefficients; i.e., <f,ψj,k>=0<f,ψj,k>=0 for all but mJmJ terms, since at most one Haar Wavelet at each scale senses each point of discontinuity. Said another way, all but at most mm of the wavelets at each scale have support over constant regions of ff.

fJfJ itself will be piecewise constant with discontinuities only possible occurring at end points of the intervals [2-J(k-1),2-Jk][2-J(k-1),2-Jk]. Therefore, in this case

| | f - f J | | L 2 2 = O ( 2 - J ) . | | f - f J | | L 2 2 = O ( 2 - J ) . (17)

Daubechies wavelets are the extension of the Haar wavelet idea. Haar wavelets have one "vanishing moment":

0 1 ψ j , k = 0 . 0 1 ψ j , k = 0 . (18)

Daubechies wavelets are "smoother" basis functions with extra vanishing moments. The Daubechies-NN wavelet has NN vanishing moments.

0 1 t l ψ j , k d t = 0 f o r l = 0 , 1 , . . . , N - 1 . 0 1 t l ψ j , k d t = 0 f o r l = 0 , 1 , . . . , N - 1 . (19)

The Daubechies-1 wavelet is just the Haar case.

If ff is a piecewise degree NN polynomial with at most m pieces, then using the Daubechies-NN wavelet system.

| | f - f J | | L 2 2 = O ( 2 - J ) ; | | f - f J | | L 2 2 = O ( 2 - J ) ; (20)

and

f J ( t ) = c o + j = 0 J - 1 k = 1 2 j < f , ψ j , k > ψ j , k ( t ) f J ( t ) = c o + j = 0 J - 1 k = 1 2 j < f , ψ j , k > ψ j , k ( t ) (21)

has at most O(mJ)O(mJ) non-zero wavelet coefficients. fJfJ is called the Discrete Wavelet Transform (DWT) approximation of ff. The key idea is the same as we saw with trees.

Sampled Data

We can also use DWT's to analyze and represent discrete, sampled functions. Suppose,

f ̲ = [ f ( 1 / n ) , f ( 2 / n ) , . . . , f ( n / n ) ] f ̲ = [ f ( 1 / n ) , f ( 2 / n ) , . . . , f ( n / n ) ] (22)

then we can write f̲f̲ as

f ̲ = c o + j = 0 l o g 2 n - 1 k = 1 2 j < f ̲ , ψ ̲ j , k > ψ ̲ j , k f ̲ = c o + j = 0 l o g 2 n - 1 k = 1 2 j < f ̲ , ψ ̲ j , k > ψ ̲ j , k (23)

where

ψ ̲ j , k = [ ψ j , k ( 1 ) , ψ j , k ( 2 ) , . . . , ψ j , k ( n ) ] ψ ̲ j , k = [ ψ j , k ( 1 ) , ψ j , k ( 2 ) , . . . , ψ j , k ( n ) ] (24)

is a discrete time analog of the continuous time wavelets we considered before. In particular,

i = 1 n i l ψ j , k ( i ) = 0 , l = 0 , 1 , . . . , N - 1 i = 1 n i l ψ j , k ( i ) = 0 , l = 0 , 1 , . . . , N - 1 (25)

for the Daubechies-NN discrete wavelets.

< f ̲ , ψ ̲ j , k > = f ̲ T ψ ̲ j , k < f ̲ , ψ ̲ j , k > = f ̲ T ψ ̲ j , k (26)

Thus, we also have an analogous approximation result: If f̲f̲ are samples from a piecewise degree NN polynomial function with a finite number mm of discontinuities, then f̲f̲ has O(mJ)O(mJ) non-zero wavelet coefficients.

Approximating functions with wavelets

Suppose fBα(Cα)fBα(Cα) and has a finite number of discontinuities. Let fpfp denote piecewise degree-N(N=α)N(N=α) polynomial approximation to ff with O(k)O(k) pieces; a uniform partition into kk equal length intervals followed by addition splits at the points of discontinuity.

Figure 8
f11.png

Then