Approximation

Module by: Michael Wakin. E-mail the author

Summary: This collection reviews fundamental concepts underlying the use of concise models for signal processing. Topics are presented from a geometric perspective and include low-dimensional linear, sparse, and manifold-based signal models, approximation, compression, dimensionality reduction, and Compressed Sensing.

To this point, we have discussed signal representations and models as basic tools for signal processing. In the following modules, we discuss the actual application of these tools to tasks such as approximation and compression, and we continue to discuss the geometric implications.

Linear approximation

A prototypical problem in signal processing is to find the best linear approximation to a signal $x$. By "best linear approximation," we mean the best approximation to $x$ from among a class of signals comprising a linear (or affine) subspace. This situation may arise, for example, when we have a noisy observation of a signal believed to obey a linear model. If we choose an $\ell_2$ error criterion, the solution to this optimization problem has a particularly strong geometric interpretation.

To be more concrete, suppose $\mathcal{S}$ is a $K$-dimensional linear subspace of $\mathbb{R}^N$. (The case of an affine subspace follows similarly.) If we seek

$$ s^* := \arg\min_{s \in \mathcal{S}} \| s - x \|_2, $$
(1)

standard linear algebra results state that the minimizer is given by

$$ s^* = A^T A x, $$
(2)

where $A$ is a $K \times N$ matrix whose rows form an orthonormal basis for $\mathcal{S}$. Geometrically, one can easily see that this solution corresponds to an orthogonal projection of $x$ onto the subspace $\mathcal{S}$ (see Figure 1(a)).
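
As a quick sanity check, the projection in Equation 2 takes only a few lines of NumPy. This is an illustrative sketch: the subspace and signal are randomly generated for the example, not taken from the module.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 8, 3

# A: K x N matrix whose rows form an orthonormal basis for a
# (randomly chosen, purely illustrative) K-dimensional subspace S.
A = np.linalg.qr(rng.standard_normal((N, K)))[0].T   # shape (K, N)

x = rng.standard_normal(N)
s_star = A.T @ (A @ x)   # orthogonal projection of x onto S, Equation 2

# The residual x - s* is orthogonal to S, confirming the geometric picture.
print(np.allclose(A @ (x - s_star), np.zeros(K)))    # True
```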

Figure 1: Approximating a signal $x \in \mathbb{R}^2$ with an $\ell_2$ error criterion. (a) Linear approximation using one element of the dictionary $\Psi$ corresponds to orthogonal projection of the signal onto the linear subspace. (b) Nonlinear approximation corresponds to orthogonal projection of the signal onto the nearest candidate subspace; in this case, we choose the best 1-sparse signal that can be built using $\Psi$. (c) Manifold-based approximation, finding the nearest point on $\mathcal{M}$.
[Figure 1 panels: (a) mbFrameLAdists.png, (b) mbFrameNLAdists.png, (c) mbFrameManifolddists.png]

The linear approximation problem arises frequently in settings involving signal dictionaries. In some settings, such as the case of an oversampled bandlimited signal, certain coefficients in the vector $\alpha$ may be assumed to be fixed at zero. In the case where the dictionary $\Psi$ forms an orthonormal basis, the linear approximation estimate of the unknown coefficients has a particularly simple form: the rows of the matrix $A$ in Equation 2 are obtained by selecting and transposing the columns of $\Psi$ whose expansion coefficients are unknown, and consequently the unknown coefficients can be estimated simply by taking inner products of $x$ against the appropriate columns of $\Psi$.

For example, in choosing a fixed subset of the Fourier or wavelet dictionaries, one may rightfully choose the lowest-frequency (coarsest-scale) basis functions for the set $\mathcal{S}$ because, as discussed in Linear Models from Low-Dimensional Signal Models, the coefficients generally tend to decay at higher frequencies (finer scales). For smooth functions, this strategy is appropriate and effective; functions in Sobolev smoothness spaces are well approximated using linear approximations from the Fourier or wavelet dictionaries [11]. For piecewise smooth functions, however, even the wavelet-domain linear approximation strategy would miss significant coefficients at fine scales. Since the locations of such coefficients are unknown a priori, it is impossible to propose a linear wavelet-domain approximation scheme that could simultaneously capture all piecewise smooth signals. As an example, Figure 2(a) shows the linear approximation of the Cameraman test image obtained by keeping only the lowest-frequency scaling and wavelet coefficients. No high-frequency information is available to clearly represent features such as edges.
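
The coarse-scale strategy is easy to try on a 1-D piecewise smooth signal. Below is a minimal sketch using PyWavelets (assuming the pywt package is available; the test signal and parameter choices are illustrative, not from the module):

```python
import numpy as np
import pywt  # PyWavelets, assumed installed

# Illustrative piecewise-smooth signal: a smooth oscillation plus a jump.
t = np.linspace(0, 1, 1024)
f = np.sin(4 * np.pi * t) + (t > 0.5)

# Five-level wavelet decomposition, flattened with coarsest scales first.
coeffs = pywt.wavedec(f, 'db4', level=5)
arr, slices = pywt.coeffs_to_array(coeffs)

# Linear approximation: keep a FIXED set, the K coarsest coefficients.
K = 64
lin = arr.copy()
lin[K:] = 0
f_lin = pywt.waverec(pywt.array_to_coeffs(lin, slices, output_format='wavedec'), 'db4')
print('linear approximation error:', np.linalg.norm(f - f_lin))
```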

Figure 2: Linear versus nonlinear approximation in the wavelet domain. (a) Linear approximation of the Cameraman test image obtained by keeping the $K = 4096$ lowest-frequency wavelet coefficients from the five-level wavelet decomposition. The MSE with respect to the original image is 353. (b) Nonlinear approximation of the Cameraman test image obtained by keeping the $K = 4096$ largest wavelet coefficients from the five-level wavelet decomposition. The MSE with respect to the original image is 72. Compared with linear approximation, more high-frequency coefficients are included, which allows better representation of features such as edges.
[Figure 2 panels: (a) cameraLinear.png, (b) cameraNonlinear.png]

Nonlinear approximation

A related question often arises in settings involving signal dictionaries. Rather than finding the best approximation to a signal $f$ using a fixed collection of $K$ elements from the dictionary $\Psi$, one may seek the best $K$-term representation of $f$ among all possible expansions that use $K$ terms from the dictionary. Compared to linear approximation, this type of nonlinear approximation [8], [4] utilizes the ability of the dictionary to adapt: different elements may be important for representing different signals.

The $K$-term nonlinear approximation problem corresponds to the optimization

$$ s_{K,p}^* := \arg\min_{s \in \Sigma_K} \| s - f \|_p. $$
(3)

(For the sake of generality, we consider general $L_p$ and $\ell_p$ norms in this section.) Due to the nonlinearity of the set $\Sigma_K$ for a given dictionary, solving this problem can be difficult. Supposing $\Psi$ is an orthonormal basis and $p = 2$, the solution to Equation 3 is easily obtained by thresholding: one simply computes the coefficients $\alpha$ and keeps the $K$ largest (setting the remaining coefficients to zero). The approximation error is then given simply by the coefficients that are discarded:

$$ \| s_{K,2}^* - f \|_2 = \left( \sum_{k > K} |\tilde{\alpha}_k|^2 \right)^{1/2}, $$
(4)

where $\tilde{\alpha}_k$ denotes the coefficients sorted in order of decreasing magnitude. When $\Psi$ is a redundant dictionary, however, the situation is much more complicated. We say more on this below (see also Figure 1(b)).
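
The thresholding rule and the error formula in Equation 4 are easy to verify numerically. Here is a small sketch with a random orthonormal basis (the basis and signal are illustrative stand-ins, not from the module):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 256, 16

# Psi: columns form a random orthonormal basis (via QR factorization).
Psi = np.linalg.qr(rng.standard_normal((N, N)))[0]
f = rng.standard_normal(N)

alpha = Psi.T @ f                        # expansion coefficients of f
order = np.argsort(np.abs(alpha))[::-1]  # indices sorted by decreasing magnitude

kept = np.zeros(N)
kept[order[:K]] = alpha[order[:K]]       # thresholding: keep the K largest
s_K = Psi @ kept                         # the best K-term approximation

# Equation 4: the error equals the l2 norm of the discarded coefficients.
print(np.allclose(np.linalg.norm(s_K - f),
                  np.linalg.norm(alpha[order[K:]])))  # True
```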

Measuring approximation quality

One common measure for the quality of a dictionary $\Psi$ in approximating a signal class is the fidelity of its $K$-term representations. Often one examines the asymptotic rate of decay of the $K$-term approximation error as $K$ grows large. Defining

$$ \sigma_K(f)_p := \| s_{K,p}^* - f \|_p, $$
(5)

for a given signal $f$ we may consider the asymptotic decay of $\sigma_K(f)_p$ as $K \to \infty$. (We recall the dependence of Equation 3, and hence Equation 5, on the dictionary $\Psi$.) In many cases, the function $\sigma_K(f)_p$ will decay as $K^{-r}$ for some $r$, and when $\Psi$ represents a harmonic dictionary, faster decay rates tend to correspond to smoother functions. Indeed, one can show that when $\Psi$ is an orthonormal basis, $\sigma_K(f)_2$ will decay as $K^{-r}$ if and only if $|\tilde{\alpha}_k|$ decays as $k^{-(r+1/2)}$ [7].
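
One direction of this equivalence follows from a quick computation (a sketch added here for intuition, not part of the original module): if $|\tilde{\alpha}_k| \le C k^{-(r+1/2)}$ for some constant $C$, then by Equation 4,

$$ \sigma_K(f)_2^2 = \sum_{k > K} |\tilde{\alpha}_k|^2 \le C^2 \sum_{k > K} k^{-2r-1} \le C^2 \int_K^\infty x^{-2r-1} \, dx = \frac{C^2}{2r} K^{-2r}, $$

so that $\sigma_K(f)_2 \lesssim K^{-r}$.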

Nonlinear approximation of piecewise smooth functions

Let $f \in C^H$ be a 1-D function. Supposing the wavelet dictionary has more than $H$ vanishing moments, then $f$ can be well approximated using its $K$ largest coefficients (most of which are at coarse scales). As $K$ grows large, the nonlinear approximation error will decay¹ as $\sigma_K(f)_2 \lesssim K^{-H}$.

Supposing instead that $f$ is piecewise smooth with a finite number of discontinuities, then (as discussed in Sparse (Nonlinear) Models from Low-Dimensional Signal Models) $f$ will have a limited number of significant wavelet coefficients at fine scales. Because of the concentration of these significant coefficients within each scale, the nonlinear approximation rate remains $\sigma_K(f)_2 \lesssim K^{-H}$, as if there were no discontinuities present [11].

Unfortunately, this resilience of wavelets to discontinuities does not extend to higher dimensions. Suppose, for example, that $f$ is a $C^H$ smooth 2-D signal. Assuming the proper number of vanishing moments, a wavelet representation will achieve the optimal nonlinear approximation rate $\sigma_K(f)_2 \lesssim K^{-H/2}$ [3], [11]. As in the 1-D case, this approximation rate is maintained when a finite number of point discontinuities are introduced into $f$. However, when $f$ contains 1-D discontinuities (edges separating the smooth regions), the approximation rate falls to $\sigma_K(f)_2 \lesssim K^{-1/2}$ [11]. The problem arises from the isotropic, dyadic supports of the wavelets: instead of $O(1)$ significant wavelets at each scale $j$, there are now $O(2^j)$ wavelets overlapping the discontinuity. We revisit this important issue in Compression.

Despite these limited approximation capabilities for images with edges, nonlinear approximation in the wavelet domain typically offers a superior approximation to an image than does linear approximation in the wavelet domain. As an example, Figure 2(b) shows the nonlinear approximation of the Cameraman test image obtained by keeping the largest scaling and wavelet coefficients. In this case, a number of high-frequency coefficients are selected, which gives an improved ability to represent features such as edges. Better concise transforms, which capture the image information in even fewer coefficients, would offer further improvements in nonlinear approximation quality; a 1-D illustration follows below.
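
Continuing the 1-D sketch from the linear approximation section above (again assuming PyWavelets, with illustrative parameter choices), nonlinear approximation keeps the $K$ largest coefficients wherever they occur, and its error on a piecewise smooth signal is markedly smaller:

```python
import numpy as np
import pywt  # PyWavelets, assumed installed

# Same illustrative piecewise-smooth signal as in the linear sketch above.
t = np.linspace(0, 1, 1024)
f = np.sin(4 * np.pi * t) + (t > 0.5)

arr, slices = pywt.coeffs_to_array(pywt.wavedec(f, 'db4', level=5))

# Nonlinear approximation: keep the K LARGEST coefficients, wherever they lie.
K = 64
nl = arr.copy()
nl[np.argsort(np.abs(arr))[:-K]] = 0   # zero all but the K largest
f_nl = pywt.waverec(pywt.array_to_coeffs(nl, slices, output_format='wavedec'), 'db4')
print('nonlinear approximation error:', np.linalg.norm(f - f_nl))
```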

Finding approximations

As mentioned above, in the case where $\Psi$ is an orthonormal basis and $p = 2$, the solution to Equation 3 is easily obtained by thresholding: one simply computes the coefficients $\alpha$ and keeps the $K$ largest (setting the remaining coefficients to zero). Thresholding can also be shown to be optimal for arbitrary $\ell_p$ norms in the special case where $\Psi$ is the canonical basis. While the optimality of thresholding does not generalize to arbitrary norms and bases, thresholding can be shown to be a near-optimal approximation strategy for wavelet bases with arbitrary $L_p$ norms [7].

In the case where $\Psi$ is a redundant dictionary, however, the expansion coefficients $\alpha$ are not unique, and the optimization problem Equation 3 can be much more difficult to solve. Indeed, even supposing that an exact $K$-term representation exists for $f$ in the dictionary $\Psi$, finding that $K$-term approximation is NP-hard in general, requiring a combinatorial enumeration of the $\binom{Z}{K}$ possible sparse subspaces [6]. This search can be recast as the optimization problem

$$ \hat{\alpha} = \arg\min \| \alpha \|_0 \;\; \text{s.t.} \;\; f = \Psi \alpha. $$
(6)

While solving Equation 6 is prohibitively complex, a variety of algorithms have been proposed as alternatives. One approach convexifies the optimization problem by replacing the $\ell_0$ sparsity criterion with an $\ell_1$ criterion:

$$ \hat{\alpha} = \arg\min \| \alpha \|_1 \;\; \text{s.t.} \;\; f = \Psi \alpha. $$
(7)
This problem, known as Basis Pursuit [5], is significantly more approachable and can be solved with traditional linear programming techniques whose computational complexity is polynomial in $Z$. The $\ell_1$ criterion has the advantage of yielding a convex optimization problem while still encouraging sparse solutions, thanks to the polytope geometry of the $\ell_1$ unit ball (see, for example, [9] and [10]). Iterative greedy algorithms such as Matching Pursuit (MP) and Orthogonal Matching Pursuit (OMP) [11] have also been suggested for finding sparse representations $\alpha$ of a signal $f$. Both MP and OMP iteratively select the column of $\Psi$ most correlated with the current residual and subtract its contribution, updating the residual. OMP includes an additional step at each iteration in which the residual is orthogonalized against all previously selected columns.
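
A bare-bones OMP loop fits in a dozen lines of NumPy. This is a sketch for intuition only (the random dictionary, sparsity level, and sizes are illustrative assumptions), without the stopping criteria or numerical safeguards a production implementation would need:

```python
import numpy as np

def omp(Psi, f, K):
    """Orthogonal Matching Pursuit: greedily select K columns of Psi
    to approximate f, re-fitting (orthogonalizing) at every step."""
    residual, support = f.copy(), []
    for _ in range(K):
        corr = np.abs(Psi.T @ residual)        # correlate columns with residual
        corr[support] = 0                      # never reselect a column
        support.append(int(np.argmax(corr)))
        sub = Psi[:, support]                  # least-squares fit on the support
        coef = np.linalg.lstsq(sub, f, rcond=None)[0]
        residual = f - sub @ coef              # orthogonalized residual
    alpha = np.zeros(Psi.shape[1])
    alpha[support] = coef
    return alpha

# Illustrative test: recover a 5-sparse vector from a random dictionary.
rng = np.random.default_rng(2)
N, Z, K = 64, 128, 5
Psi = rng.standard_normal((N, Z)) / np.sqrt(N)
alpha_true = np.zeros(Z)
alpha_true[rng.choice(Z, K, replace=False)] = rng.standard_normal(K)
f = Psi @ alpha_true
print(np.allclose(omp(Psi, f, K), alpha_true, atol=1e-8))  # typically True
```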

Manifold approximation

We also consider the problem of finding the best manifold-based approximation to a signal (see Figure 1(c)). Suppose that $\mathcal{F} = \{ f_\theta : \theta \in \Theta \}$ is a parametrized $K$-dimensional manifold and that we are given a signal $I$ that is believed to approximate $f_\theta$ for an unknown $\theta \in \Theta$. From $I$ we wish to recover an estimate of $\theta$. Again, we may formulate this parameter estimation problem as an optimization, writing the objective function (here we concentrate solely on the $L_2$ or $\ell_2$ case)

$$ D(\theta) = \| f_\theta - I \|_2^2 $$
(8)

and solving for

$$ \theta^* = \arg\min_{\theta \in \Theta} D(\theta). $$
(9)

We suppose that the minimum is uniquely defined.

Standard nonlinear parameter estimation [2] tells us that, if $D$ is differentiable, we can use Newton's method to iteratively refine a sequence of guesses $\theta^{(0)}, \theta^{(1)}, \theta^{(2)}, \dots$ that converges rapidly to $\theta^*$. Supposing that $\mathcal{F}$ is a differentiable manifold, we would let

$$ J = \left[ \partial D / \partial \theta_0 \;\; \partial D / \partial \theta_1 \;\; \cdots \;\; \partial D / \partial \theta_{K-1} \right]^T $$
(10)

be the gradient of $D$, and let $H$ be the $K \times K$ Hessian, $H_{ij} = \frac{\partial^2 D}{\partial \theta_i \partial \theta_j}$. Newton's method then specifies the following update step:

$$ \theta^{(k+1)} \leftarrow \theta^{(k)} - [H(\theta^{(k)})]^{-1} J(\theta^{(k)}). $$
(11)
To relate this method to the structure of the manifold, we can express the gradient and Hessian in terms of signals, writing

$$ D(\theta) = \| f_\theta - I \|_2^2 = \int (f_\theta - I)^2 \, dx = \int f_\theta^2 - 2 I f_\theta + I^2 \, dx. $$
(12)

Differentiating with respect to the component $\theta_i$, we obtain

$$ \frac{\partial D}{\partial \theta_i} = J_i = \frac{\partial}{\partial \theta_i} \int f_\theta^2 - 2 I f_\theta + I^2 \, dx = \int \frac{\partial}{\partial \theta_i}(f_\theta^2) - 2 I \frac{\partial f_\theta}{\partial \theta_i} \, dx = \int 2 f_\theta \tau_{\theta_i} - 2 I \tau_{\theta_i} \, dx = 2 \langle f_\theta - I, \tau_{\theta_i} \rangle, $$
(13)

where $\tau_{\theta_i} = \frac{\partial f_\theta}{\partial \theta_i}$ is a tangent signal. Continuing, we examine the Hessian,

$$ \frac{\partial^2 D}{\partial \theta_i \partial \theta_j} = H_{ij} = \frac{\partial}{\partial \theta_j} \frac{\partial D}{\partial \theta_i} = \frac{\partial}{\partial \theta_j} \int 2 f_\theta \tau_{\theta_i} - 2 I \tau_{\theta_i} \, dx = \int 2 \tau_{\theta_i} \tau_{\theta_j} + 2 f_\theta \tau_{\theta_{ij}} - 2 I \tau_{\theta_{ij}} \, dx = 2 \langle \tau_{\theta_i}, \tau_{\theta_j} \rangle + 2 \langle f_\theta - I, \tau_{\theta_{ij}} \rangle, $$
(14)

where $\tau_{\theta_{ij}} = \frac{\partial^2 f_\theta}{\partial \theta_i \partial \theta_j}$ denotes a second-derivative signal. Thus, we can interpret Newton's method geometrically as (essentially) a sequence of successive projections onto tangent spaces of the manifold.
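
The iteration in Equation 11 is simple to demonstrate on a toy 1-D manifold. In this sketch (an illustrative assumption, not the module's example) the manifold consists of Gaussian bumps parametrized by their center, and the gradient and Hessian of $D$ are approximated by finite differences rather than the tangent-signal inner products of Equations 13 and 14:

```python
import numpy as np

x = np.linspace(-5, 5, 512)

def f(theta):
    # Toy 1-D manifold: a Gaussian bump centered at theta.
    return np.exp(-(x - theta) ** 2 / 2)

def D(theta, I):
    return np.sum((f(theta) - I) ** 2)   # squared l2 objective, Equation 8

def newton_estimate(I, theta0, steps=10, h=1e-4):
    theta = theta0
    for _ in range(steps):
        # Finite-difference stand-ins for the gradient J and Hessian H.
        J = (D(theta + h, I) - D(theta - h, I)) / (2 * h)
        H = (D(theta + h, I) - 2 * D(theta, I) + D(theta - h, I)) / h ** 2
        theta = theta - J / H            # Newton update, Equation 11
    return theta

rng = np.random.default_rng(3)
I = f(0.7) + 0.01 * rng.standard_normal(x.size)  # noisy observation, true theta = 0.7
print(newton_estimate(I, theta0=0.0))            # converges near 0.7
```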

Again, the above discussion assumes the manifold to be differentiable. Many interesting parametric signal manifolds are in fact nowhere differentiable — the tangent spaces demanded by Newton's method do not exist. However, in [1] we have identified a type of multiscale tangent structure to the manifold that permits a coarse-to-fine technique for parameter estimation.

Footnotes

  1. We use the notation $f(\alpha) \lesssim g(\alpha)$, or $f(\alpha) = O(g(\alpha))$, if there exists a constant $C$, possibly large but not dependent on the argument $\alpha$, such that $f(\alpha) \le C g(\alpha)$.

References

  1. Wakin, M. B., Donoho, D. L., Choi, H., and Baraniuk, R. G. (2005). High-resolution navigation on non-differentiable image manifolds. In Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP). IEEE.
  2. Bates, D. M. and Watts, D. G. (1988). Nonlinear Regression Analysis and Its Applications. New York: John Wiley and Sons.
  3. Cohen, A., Dahmen, W., Daubechies, I., and DeVore, R. (2001). Tree approximation and optimal encoding. Appl. Comput. Harmon. Anal., 11, 192-226.
  4. Cohen, A., Daubechies, I., Guleryuz, O. G., and Orchard, M. T. (2002, July). On the importance of combining wavelet-based nonlinear approximation with coding strategies. IEEE Trans. Inform. Theory, 48(7), 1895-1921.
  5. Chen, S., Donoho, D., and Saunders, M. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comp., 20(1), 33-61.
  6. Candès, E. and Tao, T. (2005). Error correction via linear programming. [Preprint]. Found. of Comp. Math.
  7. DeVore, R. A. (Spring 2006). Lecture notes on Compressed Sensing. Rice University ELEC 631 Course Notes.
  8. DeVore, R. A. (1998). Nonlinear approximation. Acta Numerica, 7, 51-150.
  9. Donoho, D. and Tanner, J. (2005). Neighborliness of randomly-projected simplices in high dimensions. [Preprint].
  10. Donoho, D. L. and Tanner, J. (2006). Counting faces of randomly-projected polytopes when the projection radically lowers dimension. Technical report 2006-11. Stanford University Department of Statistics.
  11. Mallat, S. (1999). A Wavelet Tour of Signal Processing. San Diego, CA: Academic Press.
