Skip to content Skip to navigation

Connexions

You are here: Home » Content » Approximation and Processing in Bases

Navigation

Lenses

What is a lens?

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

This content is ...

Affiliated with (What does "Affiliated with" mean?)

This content is either by members of the organizations listed or about topics related to the organizations listed. Click each link to see a list of all content affiliated with the organization.
  • NSF Partnership display tagshide tags

    This module is included inLens: NSF Partnership in Signal Processing
    By: Sidney BurrusAs a part of collection: "A Wavelet Tour of Signal Processing, The Sparse Way"

    Click the "NSF Partnership" link to see all content affiliated with them.

    Click the tag icon tag icon to display tags associated with this content.

  • Featured Content display tagshide tags

    This module is included inLens: Connexions Featured Content
    By: ConnexionsAs a part of collection: "A Wavelet Tour of Signal Processing, The Sparse Way"

    Click the "Featured Content" link to see all content affiliated with them.

    Click the tag icon tag icon to display tags associated with this content.

Also in these lenses

  • UniqU content

    This module is included inLens: UniqU's lens
    By: UniqU, LLCAs a part of collection: "A Wavelet Tour of Signal Processing, The Sparse Way"

    Click the "UniqU content" link to see all content selected in this lens.

  • Lens for Engineering

    This module is included inLens: Lens for Engineering
    By: Sidney Burrus

    Click the "Lens for Engineering" link to see all content selected in this lens.

Recently Viewed

This feature requires Javascript to be enabled.

Tags

(What is a tag?)

These tags come from the endorsement, affiliation, and other lenses that include this content.
 

Approximation and Processing in Bases

Module by: Stephane Mallat. E-mail the author

Summary: This collection comprises Chapter 1 of the book A Wavelet Tour of Signal Processing, The Sparse Way (third edition, 2009) by Stéphane Mallat. The book's website at Academic Press is http://www.elsevier.com/wps/find/bookdescription.cws_home/714561/description#description The book's complementary materials are available at http://wavelet-tour.com

Analog-to-digital signal conversion is the first step of digital signal processing. Chapter 3 explains that it amounts to projecting the signal over a basis of an approximation space. Most often, the resulting digital representation remains much too large and needs to be further reduced. A digital image typically includes more than 106106 samples and a CD music recording has 40×10340×103 samples per second. Sparse representations that reduce the number of parameters can be obtained by thresholding coefficients in an appropriate orthogonal basis. Efficient compression and noise-reduction algorithms are then implemented with simple operators in this basis.

Stochastic versus Deterministic Signal Models

A representation is optimized relative to a signal class, corresponding to all potential signals encountered in an application. This requires building signal models that carry available prior information.

A signal ff can be modeled as a realization of a random process FF, the probability distribution of which is known a priori. A Bayesian approach then tries to minimize the expected approximation error. Linear approximations are simpler because they only depend on the covariance. Chapter 9 shows that optimal linear approximations are obtained on the basis of principal components that are the eigenvectors of the covariance matrix. However, the expected error of nonlinear approximations depends on the full probability distribution of FF. This distribution is most often not known for complex signals, such as images or sounds, because their transient structures are not adequately modeled as realizations of known processes such as Gaussian ones.

To optimize nonlinear representations, weaker but sufficiently powerful deterministic models can be elaborated. A deterministic model specifies a set Θ, where the signal belongs. This set is defined by any prior information—for example, on the time-frequency localization of transients in musical recordings or on the geometric regularity of edges in images. Simple models can also define Θ as a ball in a functional space, with a specific regularity norm such as a total variation norm. A stochastic model is richer because it provides the probability distribution in Θ. When this distribution is not available, the average error cannot be calculated and is replaced by the maximum error over Θ. Optimizing the representation then amounts to mini-mizing this maximum error, which is called a minimax optimization.

Sampling with Linear Approximations

Analog-to-digital signal conversion is most often implemented with a linear approximation operator that filters and samples the input analog signal. From these samples, a linear digital-to-analog converter recovers a projection of the original analog signal over an approximation space whose dimension depends on the sampling density. Linear approximations project signals in spaces of lowest possible dimensions to reduce computations and storage cost, while controlling the resulting error.

Sampling Theorems

Let us consider finite energy signals f¯2=|f¯(x)|2dxf¯2=|f¯(x)|2dx of finite support, which is normalized to [0,1][0,1] or [0,1]2[0,1]2 for images. A sampling process implements a filtering of f¯(x)f¯(x) with a low-pass impulse response φ¯s(x)φ¯s(x) and a uniform sampling to output a discrete signal:

f [ n ] = f ¯ φ ¯ s ( n s ) for 0 n < N . f [ n ] = f ¯ φ ¯ s ( n s ) for 0 n < N .
(1)

In two dimensions, n=(n1,n2)n=(n1,n2) and x=(x1,x2)x=(x1,x2). These filtered samples can also be written as inner products:

f ¯ φ ¯ s ( n s ) = f ( u ) φ ¯ s ( n s - u ) d u = f ( x ) , φ s ( x - n s ) f ¯ φ ¯ s ( n s ) = f ( u ) φ ¯ s ( n s - u ) d u = f ( x ) , φ s ( x - n s )
(2)

with φs(x)=φ¯s(-x)φs(x)=φ¯s(-x). Chapter 3 explains that φs is chosen, like in the classic Shannon–Whittaker sampling theorem, so that a family of functions {φs(x-ns)}1nN{φs(x-ns)}1nN is a basis of an appropriate approximation space UN. The best linear approximation of f¯f¯ in UN recovered from these samples is the orthogonal projection f¯Nf¯N of ff in UN, and if the basis is orthonormal, then

f ¯ N ( x ) = n = 0 N - 1 f [ n ] φ s ( x - n s ) . f ¯ N ( x ) = n = 0 N - 1 f [ n ] φ s ( x - n s ) .
(3)

A sampling theorem states that if f¯UNf¯UN then f¯=f¯Nf¯=f¯N so recovers f¯(x)f¯(x) from the measured samples. Most often, f¯f¯ does not belong to this approximation space. It is called aliasing in the context of Shannon–Whittaker sampling, where UN is the space of functions having a frequency support restricted to the N lower frequencies. The approximation error f¯-f¯N2f¯-f¯N2 must then be controlled.

Linear Approximation Error

The approximation error is computed by finding an orthogonal basis B={g¯m(x)}0m<+B={g¯m(x)}0m<+ of the whole analog signal space L2(R)[0,1]2L2(R)[0,1]2, with the first N vector {g¯m(x)}0m<N{g¯m(x)}0m<N that defines an orthogonal basis of UN. Thus, the orthogonal projection on UN can be rewritten as

f ¯ N ( x ) = m = 0 N - 1 f ¯ , g ¯ m g ¯ m ( x ) . f ¯ N ( x ) = m = 0 N - 1 f ¯ , g ¯ m g ¯ m ( x ) .
(4)

Since f¯=m=0+f¯,g¯mg¯mf¯=m=0+f¯,g¯mg¯m, the approximation error is the energy of the removed inner products:

ϵ l ( N , f ) = f ¯ - f ¯ N 2 = m = N + | f ¯ , g ¯ m | 2 . ϵ l ( N , f ) = f ¯ - f ¯ N 2 = m = N + | f ¯ , g ¯ m | 2 .
(5)

This error decreases quickly when N increases if the coefficient amplitudes |f¯,g¯m||f¯,g¯m| have a fast decay when the index m increases. The dimension N is adjusted to the desired approximation error.

Figure (a) shows a discrete image f[n]f[n] approximated with N=2562N=2562 pixels. Figure (c) displays a lower-resolution image fN/16fN/16 projected on a space UN/16UN/16 of dimension N/16N/16, generated by N/16N/16 large-scale wavelets. It is calculated by setting all the wavelet coefficients to zero at the first two smaller scales. The approximation error is f-fN/162/f2=14×10-3f-fN/162/f2=14×10-3. Reducing the resolution introduces more blur and errors. A linear approximation space UN corresponds to a uniform grid that approximates precisely uniform regular signals. Since images f¯f¯ are often not uniformly regular, it is necessary to measure it at a high-resolution N. This is why digital cameras have a resolution that increases as technology improves.

Sparse Nonlinear Approximations

Linear approximations reduce the space dimensionality but can introduce important errors when reducing the resolution if the signal is not uniformly regular, as shown by Figure (c). To improve such approximations, more coefficients should be kept where needed—not in regular regions but near sharp transitions and edges.This requires defining an irregular sampling adapted to the local signal regularity. This optimized irregular sampling has a simple equivalent solution through nonlinear approximations in wavelet bases.

Nonlinear approximations operate in two stages. First, a linear operator approximates the analog signal f¯f¯ with N samples written f[n]=f¯φ¯s(ns)f[n]=f¯φ¯s(ns). Then, a nonlinear approximation of f[n]f[n] is computed to reduce the N coefficients f[n]f[n] to MNMN coefficients in a sparse representation.

The discrete signal ff can be considered as a vector of CNCN. Inner products and norms in CNCN are written

f , g = n = 0 N - 1 f [ n ] g * [ n ] and f 2 = n = 0 N - 1 | f [ n ] | 2 . f , g = n = 0 N - 1 f [ n ] g * [ n ] and f 2 = n = 0 N - 1 | f [ n ] | 2 .
(6)

To obtain a sparse representation with a nonlinear approximation, we choose a new orthonormal basis B={gm[n]}mΓB={gm[n]}mΓ of CNCN, which concentrates the signal energy as much as possible over few coefficients. Signal coefficients {f,gm}mΓ{f,gm}mΓ are computed from the N input sample values f[n]f[n] with an orthogonal change of basis that takes N2 operations in nonstructured bases. In a wavelet or Fourier bases, fast algorithms require, respectively, O(N)O(N) and O(Nlog2N)O(Nlog2N) operations.

Approximation by Thresholding

For M<NM<N, an approximation fMfM is computed by selecting the “best” M<NM<N vectors within BB. The orthogonal projection of ff on the space Vλ generated by M vectors {gm}mΛ{gm}mΛ in BB is

f λ = m λ f , g m g m . f λ = m λ f , g m g m .
(7)

Since f=mγf,gmgmf=mγf,gmgm, the resulting error is

f - f λ 2 = m / λ | f , g m | 2 . f - f λ 2 = m / λ | f , g m | 2 .
(8)

We write |λ||λ| the size of the set λ. The best M=|λ|M=|λ| term approximation, which minimizes f-fλ2f-fλ2,  is thus obtained by selecting the M coefficients of largest amplitude. These coefficients are above a threshold T that depends on M:

f M = f λ T = m λ T f , g m g m with λ T = { m γ : | f , g m | T } . f M = f λ T = m λ T f , g m g m with λ T = { m γ : | f , g m | T } .
(9)

This approximation is nonlinear because the approximation set λT changes with ff. The resulting approximation error is:

ϵ n ( M , f ) = f - f M 2 = m / Λ T | f , g m | 2 . ϵ n ( M , f ) = f - f M 2 = m / Λ T | f , g m | 2 .
(10)

(Reference)(b) shows that the approximation support λT of an image in a wavelet orthonormal basis depends on the geometry of edges and textures. Keeping large wavelet coefficients is equivalent to constructing an adaptive approximation grid specified by the scale–space support λT. It increases the approximation resolution where the signal is irregular. The geometry of λT gives the spatial distribution of sharp image transitions and edges, and their propagation across scales. Chapter 6 proves that wavelet coefficients give important information about singularities and local Lipschitz regularity. This example illustrates how approximation support provides “geometric” information on ff, relative to a dictionary, that is a wavelet basis in this example.

(Reference)(d) gives the nonlinear wavelet approximation fMfM recovered from the M=N/16M=N/16 large-amplitude wavelet coefficients, with an error f-fM2/f2=5×10-3f-fM2/f2=5×10-3. This error is nearly three times smaller than the linear approximation error obtained with the same number of wavelet coefficients, and the image quality is much better.

An analog signal can be recovered from the discrete nonlinear approxima-tion fMfM:

f ¯ M ( x ) = n = 0 N - 1 f M [ n ] φ s ( x - n s ) . f ¯ M ( x ) = n = 0 N - 1 f M [ n ] φ s ( x - n s ) .
(11)

Since all projections are orthogonal, the overall approximation error on the original analog signal f¯(x)f¯(x) is the sum of the analog sampling error and the discrete nonlinear error:

f ¯ - f ¯ M 2 = f ¯ - f ¯ N 2 + f - f M 2 = ϵ l ( N , f ) + ϵ n ( M , f ) . f ¯ - f ¯ M 2 = f ¯ - f ¯ N 2 + f - f M 2 = ϵ l ( N , f ) + ϵ n ( M , f ) .
(12)

In practice, N is imposed by the resolution of the signal-acquisition hardware, and M is typically adjusted so that ϵn(M,f)ϵl(N,f)ϵn(M,f)ϵl(N,f).

Sparsity with Regularity

Sparse representations are obtained in a basis that takes advantage of some form of regularity of the input signals, creating many small-amplitude coefficients. Since wavelets have localized support, functions with isolated singularities produce few large-amplitude wavelet coefficients in the neighborhood of these singularities. Nonlinear wavelet approximation produces a small error over spaces of functions that do not have “too many” sharp transitions and singularities. Chapter 9 shows that functions having a bounded total variation norm are useful models for images with nonfractal (finite length) edges.

Edges often define regular geometric curves. Wavelets detect the location of edges but their square support cannot take advantage of their potential geometric regularity. More sparse representations are defined in dictionaries of curvelets or bandlets, which have elongated support in multiple directions, that can be adapted to this geometrical regularity. In such dictionaries, the approximation support λT is smaller but provides explicit information about edges' local geometrical properties such as their orientation. In this context, geometry does not just apply to multidimensional signals. Audio signals, such as musical recordings, also have a complex geometric regularity in time-frequency dictionaries.

Compression

Storage limitations and fast transmission through narrow bandwidth channels require compression of signals while minimizing degradation. Transform codes compress signals by coding a sparse representation. Chapter 10 introduces the information theory needed to understand these codes and to optimize their performance.

In a compression framework, the analog signal has already been discretized into a signal f[n]f[n] of size N. This discrete signal is decomposed in an orthonormal basis B={gm}mΓB={gm}mΓ of CNCN:

f = m Γ f , g m g m . f = m Γ f , g m g m .
(13)

Coefficients f,gmf,gm are approximated by quantized values Q(f,gm)Q(f,gm). If Q is auniform quantizer of step Δ, then |x-Q(x)|Δ/2|x-Q(x)|Δ/2; and if |x|<Δ/2|x|<Δ/2, then Q(x)=0Q(x)=0. The signal f˜f˜ restored from quantized coefficients is

f ˜ = m Γ Q ( f , g m ) g m . f ˜ = m Γ Q ( f , g m ) g m .
(14)

An entropy code records these coefficients with R bits. The goal is to minimize the signal-distortion rate d(R,f)=f˜-f2d(R,f)=f˜-f2.

The coefficients not quantized to zero correspond to the set λT={mγ:|f,gm|T}λT={mγ:|f,gm|T} with T=Δ/2T=Δ/2. For sparse signals, Chapter 10 shows that the bit budget R is dominated by the number of bits to code λT in γ, which is nearly proportional to its size |λT||λT|. This means that the “information” about a sparse representation ismostly geometric. Moreover, the distortion is dominated by the nonlinear approximation error f-fΛT2f-fΛT2, for fΛT=mλTf,gmgmfΛT=mλTf,gmgm. Compression is thus a sparse approximation problem. For a given distortion d(R,f)d(R,f), minimizing R requires reducing |λT||λT| and thus optimizing the sparsity.

The number of bits to code ΛT can take advantage of any prior information on the geometry. (Reference)(b) shows that large wavelet coefficients are not randomly distributed. They have a tendency to be aggregated toward larger scales, and at fine scales they are regrouped along edge curves or in texture regions. Using such prior geometric models is a source of gain in coders such as JPEG-2000.

Chapter 10 describes the implementation of audio transform codes. Image transform codes in block cosine bases and wavelet bases are introduced, together with the JPEG and JPEG-2000 compression standards.

Denoising

Signal-acquisition devices add noise that can be reduced by estimators using prior information on signal properties. Signal processing has long remained mostly Bayesian and linear. Nonlinear smoothing algorithms existed in statistics, but these procedures were often ad hoc and complex. Two statisticians, Donoho andJohnstone (DonohoJ:94), changed the “game” by proving that simple thresholding in sparse representations can yield nearly optimal nonlinear estimators. This was the beginning of a considerable refinement of nonlinear estimation algorithms that is still ongoing.

Let us consider digital measurements that add a random noise W[n]W[n] to the original signal f[n]f[n]:

X [ n ] = f [ n ] + W [ n ] for 0 n < N . X [ n ] = f [ n ] + W [ n ] for 0 n < N .
(15)

The signal ff is estimated by transforming the noisy data X with an operator D:

F ˜ = D X . F ˜ = D X .
(16)

The risk of the estimator F˜F˜ of ff is the average error, calculated with respect to the probability distribution of noise W:

r ( D , f ) = E { f - D X 2 } . r ( D , f ) = E { f - D X 2 } .
(17)

Bayes versus Minimax

To optimize the estimation operator D, one must take advantage of prior information available about signal ff. In a Bayes framework, ff is considered a realization of a random vector FF and the Bayes risk is the expected risk calculated with respect to the prior probability distribution π of the random signal model FF:

r ( D , π ) = E π { r ( D , F ) } . r ( D , π ) = E π { r ( D , F ) } .
(18)

Optimizing D among all possible operators yields the minimum Bayes risk:

r n ( π ) = inf a l l D r ( D , π ) . r n ( π ) = inf a l l D r ( D , π ) .
(19)

In the 1940s, Wald brought in a new perspective on statistics with a decision theory partly imported from the theory of games. This point of view uses deterministic models, where signals are elements of a set Θ, without specifying their probability distribution in this set. To control the risk for any fΘfΘ, we compute the maximum risk:

r ( D , Θ ) = sup f Θ r ( D , f ) . r ( D , Θ ) = sup f Θ r ( D , f ) .
(20)

The minimax risk is the lower bound computed over all operators D:

r n ( Θ ) = inf a l l D r ( D , Θ ) . r n ( Θ ) = inf a l l D r ( D , Θ ) .
(21)

In practice, the goal is to find an operator D that is simple to implement and yields a risk close to the minimax lower bound.

Figure 1
Figure 1 (newfig2.png)

Thresholding Estimators

It is tempting to restrict calculations to linear operators D because of their simplicity. Optimal linear Wiener estimators are introduced in Chapter 11. Figure (a) is an image contaminated by Gaussian white noise. Figure (b) shows an optimized linear filtering estimation F˜=Xh[n]F˜=Xh[n], which is therefore diagonal in a Fourier basis BB. This convolution operator averages the noise but also blurs the image and keeps low-frequency noise by retaining the image's low frequencies.

If ff has a sparse representation in a dictionary, then projecting X on the vectors of this sparse support can considerably improve linear estimators. The difficulty is identifying the sparse support of ff from the noisy data X. Donoho and Johnstone (DonohoJ:94) proved that, in an orthonormal basis, a simple thresholding of noisy coefficients does the trick. Noisy signal coefficients in an orthonormal basisB={gm}mΓB={gm}mΓ are

X , g m = f , g m + W , g m for m γ . X , g m = f , g m + W , g m for m γ .
(22)

Thresholding these noisy coefficients yields an orthogonal projection estimator

F ˜ = X Λ ˜ T = m Λ ˜ T X , g m g m with Λ ˜ T = { m γ : | X , g m | T } . F ˜ = X Λ ˜ T = m Λ ˜ T X , g m g m with Λ ˜ T = { m γ : | X , g m | T } .
(23)

The set Λ˜TΛ˜T is an estimate of an approximation support of ff. It is hopefully close to the optimal approximation support λT={mγ:|f,gm|T}λT={mγ:|f,gm|T}.

Figure 1(b) shows the estimated approximation set λ˜Tλ˜T of noisy-wavelet coefficients, |X,ψj,n|T|X,ψj,n|T, that can be compared to the optimal approximation support ΛT shown in (Reference)(b). The estimation in Figure 1(d) from wavelet coefficients in λ˜Tλ˜T has considerably reduced the noise in regular regions while keeping the sharpness of edges by preserving large-wavelet coefficients. This estimation is improved with a translation-invariant procedure that averages this estimator over several translated wavelet bases. Thresholding wavelet coefficients implements an adaptive smoothing, which averages the data X with a kernel that depends on the estimated regularity of the original signal ff.

Donoho and Johnstone proved that for Gaussian white noise of variance σ2, choosing T=σ2logeNT=σ2logeN yields a risk E{f-F˜2}E{f-F˜2} of the order of f-fΛT2f-fΛT2, up to a logeNlogeN factor. This spectacular result shows that the estimated support λ˜Tλ˜T does nearly as well as the optimal unknown support λT. The resulting risk is small if the representation is sparse and precise.

The set λ˜Tλ˜T in Figure 1(b) “looks” different from the λT in (Reference)(b) because it has more isolated points. This indicates that some prior information on the geometry of λT could be used to improve the estimation. For audio noise-reduction, thresholding estimators are applied in sparse representations provided by time-frequency bases. Similar isolated time-frequency coefficients produce a highly annoying “musical noise.” Musical noise is removed with a block thresholding that regularizes the geometry of the estimated support λ˜Tλ˜T and avoids leaving isolated points. Block thresholding also improves wavelet estimators.

If W is a Gaussian noise and signals in Θ have a sparse representation in BB, then Chapter 11 proves that thresholding estimators can produce a nearly minimax risk. In particular, wavelet thresholding estimators have a nearly minimax risk for large classes of piecewise smooth signals, including bounded variation images.

Content actions

Download module as:

PDF | EPUB (?)

What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

Downloading to a reading device

For detailed instructions on how to download this content's EPUB to your specific device, click the "(?)" link.

| More downloads ...

Add module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks