# Greedy algorithms

Module by: Chinmay Hegde

Summary: In this module we provide an overview of some of the most common greedy algorithms and their application to the problem of sparse recovery.

## Setup

As opposed to solving a (possibly computationally expensive) convex optimization program, an alternative flavor of sparse recovery is to apply methods of sparse approximation. Recall that the goal of sparse recovery is to recover the sparsest vector $x$ which explains the linear measurements $y$. In other words, we aim to solve the (nonconvex) problem:

$$\min_{I} \; |I| \quad \text{subject to} \quad y = \sum_{i \in I} \varphi_i x_i,$$
(1)

where $I$ denotes a particular subset of the indices $i = 1, \ldots, N$, and $\varphi_i$ denotes the $i$th column of $\Phi$. It is well known that searching over the power set formed by the columns of $\Phi$ for the optimal subset $I^*$ with smallest cardinality is NP-hard. Instead, classical sparse approximation methods tackle this problem by greedily selecting columns of $\Phi$ and forming successively better approximations to $y$.
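To see why Eq. (1) is combinatorial, consider what a direct attack looks like: an exhaustive search over supports of increasing cardinality. The sketch below (the function name and the `max_card` cap are our own illustrative choices, not part of the original module) enumerates every candidate subset of columns, which is exponential in $N$:

```python
import itertools
import numpy as np

def l0_brute_force(Phi, y, max_card=3, tol=1e-10):
    """Exhaustive search for the smallest support I with y = Phi_I x_I.

    Illustrates why Eq. (1) is intractable: the inner loop ranges over
    every subset of columns of a given size, so the total work grows
    exponentially in the number of columns N.
    """
    M, N = Phi.shape
    for k in range(max_card + 1):
        for I in itertools.combinations(range(N), k):
            if k == 0:
                # The empty support works only if y is (numerically) zero.
                if np.linalg.norm(y) < tol:
                    return ()
                continue
            cols = Phi[:, list(I)]
            coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
            if np.linalg.norm(y - cols @ coef) < tol:
                return I          # smallest support found so far
    return None                   # no support of size <= max_card explains y
```

Because subsets are visited in order of increasing size, the first exact fit returned is a minimizer of Eq. (1) (up to the `max_card` cap).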

## Matching Pursuit

Matching Pursuit (MP), named and introduced to the signal processing community by Mallat and Zhang [9], [10], is an iterative greedy algorithm that decomposes a signal into a linear combination of elements from a dictionary. In sparse recovery, this dictionary is merely the sampling matrix $\Phi \in \mathbb{R}^{M \times N}$; we seek a sparse representation $x$ of our “signal” $y$.

MP is conceptually very simple. A key quantity in MP is the residual $r \in \mathbb{R}^M$; the residual represents the as-yet “unexplained” portion of the measurements. At each iteration of the algorithm, we select a vector from the dictionary that is maximally correlated with the residual $r$:

$$\lambda_k = \arg\max_{\lambda} \left\langle r_k, \frac{\varphi_\lambda}{\|\varphi_\lambda\|_2} \right\rangle.$$
(2)

Once this column is selected, we possess a “better” representation of the signal, since a new coefficient indexed by $\lambda_k$ has been added to our signal approximation. Thus, we update both the residual and the approximation as follows:

$$r_k = r_{k-1} - \frac{\langle r_{k-1}, \varphi_{\lambda_k} \rangle}{\|\varphi_{\lambda_k}\|_2^2}\, \varphi_{\lambda_k}, \qquad \hat{x}_{\lambda_k} = \hat{x}_{\lambda_k} + \langle r_{k-1}, \varphi_{\lambda_k} \rangle,$$
(3)

and repeat the iteration. A suitable stopping criterion is for the norm of $r$ to fall below a given threshold. MP is described in pseudocode form below.

Inputs: Measurement matrix $\Phi$, signal measurements $y$
Outputs: Sparse signal $\hat{x}$
Initialize: $\hat{x}_0 = 0$, $r = y$, $i = 0$.
while halting criterion false do
1. $i \leftarrow i + 1$
2. $b \leftarrow \Phi^T r$ {form residual signal estimate}
3. $\hat{x}_i \leftarrow \hat{x}_{i-1} + T(b, 1)$ {update largest magnitude coefficient}
4. $r \leftarrow y - \Phi \hat{x}_i$ {update measurement residual}
end while
return $\hat{x} \leftarrow \hat{x}_i$


Although MP is intuitive and can find an accurate approximation of the signal, it possesses two major drawbacks: (i) it offers no guarantees in terms of recovery error; indeed, it does not exploit the special structure present in the dictionary $\Phi$; (ii) the number of iterations required can be quite large. The complexity of MP is $O(MNT)$ [7], where $T$ is the number of MP iterations.
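One MP iteration selects the column most correlated with the residual, updates the corresponding coefficient, and subtracts the explained component, exactly as in Eqs. (2) and (3). A minimal NumPy sketch follows; the function name, the magnitude-based selection, and the stopping tolerance are our own choices:

```python
import numpy as np

def matching_pursuit(Phi, y, max_iter=100, tol=1e-6):
    """Matching Pursuit: greedily build a sparse representation of y.

    Selects, at each step, the column of Phi with the largest (in
    magnitude) normalized correlation with the residual, then peels
    off the component of the residual explained by that column.
    """
    M, N = Phi.shape
    norms = np.linalg.norm(Phi, axis=0)   # column norms, for normalization
    x_hat = np.zeros(N)
    r = y.astype(float).copy()
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:       # residual small enough: stop
            break
        corr = Phi.T @ r / norms          # normalized correlations, Eq. (2)
        k = int(np.argmax(np.abs(corr)))  # best-matching column
        coeff = (Phi[:, k] @ r) / norms[k] ** 2
        x_hat[k] += coeff                 # coefficient update, Eq. (3)
        r = r - coeff * Phi[:, k]         # residual update, Eq. (3)
    return x_hat
```

For non-unit-norm columns the coefficient is scaled by $1/\|\varphi_k\|_2^2$ so that $r$ always equals $y - \Phi \hat{x}$.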

## Orthogonal Matching Pursuit (OMP)

Matching Pursuit (MP) can prove to be computationally infeasible for many problems, since the complexity of MP grows linearly with the number of iterations $T$. By employing a simple modification of MP, the maximum number of iterations can be upper bounded as follows. At any iteration $k$, instead of subtracting the contribution of the dictionary element with which the residual $r$ is maximally correlated, we compute the projection of $r$ onto the subspace orthogonal to the linear span of the currently selected dictionary elements. This quantity better represents the “unexplained” portion of the residual; it is subtracted from $r$ to form a new residual, and the process is repeated. If $\Phi_\Omega$ is the submatrix formed by the columns of $\Phi$ selected at iteration $k$, the following operations are performed:

$$x_k = \arg\min_{x} \|y - \Phi_\Omega x\|_2, \qquad \hat{\alpha}_k = \Phi_\Omega x_k, \qquad r_k = y - \hat{\alpha}_k.$$
(4)

These steps are repeated until convergence. This procedure is known as Orthogonal Matching Pursuit (OMP) [14]. Tropp and Gilbert [15] proved that OMP can be used to recover a sparse signal with high probability using compressive measurements. The algorithm converges in at most $K$ iterations, where $K$ is the sparsity, but requires the added computational cost of orthogonalization at each iteration. Indeed, the total complexity of OMP can be shown to be $O(MNK)$.

While OMP is provably fast and can be shown to lead to exact recovery, the guarantees accompanying OMP for sparse recovery are weaker than those associated with optimization techniques. In particular, the reconstruction guarantees are not uniform, i.e., it cannot be shown that a single measurement matrix with $M = CK \log N$ rows can be used to recover every possible $K$-sparse signal. (It is, however, possible to obtain such uniform guarantees when it is acceptable to take more measurements; for example, see [6].) Another issue with OMP is robustness to noise: it is unknown whether the solution obtained by OMP will be perturbed only slightly by the addition of a small amount of noise to the measurements. Nevertheless, OMP is an efficient method for CS recovery, especially when the signal sparsity $K$ is low. A pseudocode representation of OMP is shown below.

Inputs: Measurement matrix $\Phi$, signal measurements $y$
Outputs: Sparse representation $\hat{x}$
Initialize: $\hat{x}_0 = 0$, $r = y$, $\Omega = \emptyset$, $i = 0$.
while halting criterion false do
1. $i \leftarrow i + 1$
2. $b \leftarrow \Phi^T r$ {form residual signal estimate}
3. $\Omega \leftarrow \Omega \cup \operatorname{supp}(T(b, 1))$ {add index of residual's largest magnitude entry to signal support}
4. $\hat{x}_i|_\Omega \leftarrow \Phi_\Omega^\dagger y$, $\hat{x}_i|_{\Omega^C} \leftarrow 0$ {form signal estimate}
5. $r \leftarrow y - \Phi \hat{x}_i$ {update measurement residual}
end while
return $\hat{x} \leftarrow \hat{x}_i$
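The OMP iteration can be sketched in NumPy as follows. A least-squares solve plays the role of the pseudoinverse $\Phi_\Omega^\dagger y$ over the selected support; the function name and the stopping tolerance are our own illustrative choices:

```python
import numpy as np

def omp(Phi, y, K, tol=1e-6):
    """Orthogonal Matching Pursuit: at most K greedy selections, each
    followed by a least-squares re-fit over the whole selected support.

    Re-solving the least-squares problem at every step is what makes
    the new residual orthogonal to all previously chosen columns.
    """
    M, N = Phi.shape
    r = y.astype(float).copy()
    support = []
    x_hat = np.zeros(N)
    for _ in range(K):
        if np.linalg.norm(r) < tol:              # measurements explained
            break
        corr = np.abs(Phi.T @ r)                 # correlations with residual
        k = int(np.argmax(corr))                 # largest magnitude entry
        if k not in support:
            support.append(k)                    # grow the support set
        # Least-squares estimate over the current support (Phi_Omega^+ y).
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x_hat = np.zeros(N)
        x_hat[support] = coef
        r = y - Phi @ x_hat                      # orthogonalized residual
    return x_hat
```

Note that unlike MP, each coefficient is re-estimated jointly at every iteration, so previously selected columns never contribute to the residual again.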


## Stagewise Orthogonal Matching Pursuit (StOMP)

Orthogonal Matching Pursuit becomes ineffective when the signal is not very sparse, since its computational cost increases quadratically with the number of nonzeros $K$. In this setting, Stagewise Orthogonal Matching Pursuit (StOMP) [3] is a better choice for approximately sparse signals in a large-scale setting.

StOMP offers considerable computational advantages over $\ell_1$ minimization and Orthogonal Matching Pursuit for large-scale problems with sparse solutions. The algorithm starts with an initial residual $r_0 = y$ and calculates the set of all projections $\Phi^T r_{k-1}$ at the $k$th stage (as in OMP). However, instead of picking a single dictionary element, it uses a threshold parameter $\tau$ to determine the next best set of columns of $\Phi$, namely those whose correlations with the current residual exceed $\tau$. The new residual is calculated using a least squares estimate of the signal over this expanded set of columns, just as before.

Unlike OMP, the number of iterations in StOMP is fixed and chosen beforehand; $S = 10$ is recommended in [3]. In general, the complexity of StOMP is $O(KN \log N)$, a significant improvement over OMP. However, StOMP offers no reconstruction guarantees. StOMP also has moderate memory requirements compared to OMP, where the orthogonalization requires the maintenance of a Cholesky factorization of the selected dictionary elements.
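A rough NumPy sketch of the stagewise selection is below. The threshold rule $t \, \|r\|_2 / \sqrt{M}$ (with $t$ between 2 and 3) follows the formal-noise scaling suggested in [3], but the function name, defaults, and early-exit behavior here are our own assumptions, not a faithful reimplementation:

```python
import numpy as np

def stomp(Phi, y, S=10, t=2.0):
    """Stagewise OMP sketch: at each of S stages, admit EVERY column
    whose correlation with the residual exceeds a threshold, then
    re-fit all admitted columns by least squares.

    The threshold t * ||r|| / sqrt(M) mimics the formal-noise level
    used in the StOMP paper; t in [2, 3] is the suggested range.
    """
    M, N = Phi.shape
    r = y.astype(float).copy()
    support = np.zeros(N, dtype=bool)            # admitted columns so far
    x_hat = np.zeros(N)
    for _ in range(S):
        corr = Phi.T @ r
        thresh = t * np.linalg.norm(r) / np.sqrt(M)
        new = np.abs(corr) > thresh              # all columns above threshold
        if not new.any():
            break                                # nothing left to admit
        support |= new
        idx = np.flatnonzero(support)
        # Least-squares estimate over the expanded support.
        coef, *_ = np.linalg.lstsq(Phi[:, idx], y, rcond=None)
        x_hat = np.zeros(N)
        x_hat[idx] = coef
        r = y - Phi @ x_hat
    return x_hat
```

Admitting many columns per stage is what collapses the iteration count from $K$ (as in OMP) down to the small fixed number of stages $S$.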

## Compressive Sampling Matching Pursuit (CoSaMP)

Greedy pursuit algorithms (such as MP and OMP) alleviate the issue of computational complexity encountered in optimization-based sparse recovery, but lose the associated strong guarantees for uniform signal recovery, given a requisite number of measurements of the signal. In addition, it is unknown whether these greedy algorithms are robust to signal and/or measurement noise.

There have been some recent attempts to develop greedy algorithms (Regularized OMP [12], [13], Compressive Sampling Matching Pursuit (CoSaMP) [11], and Subspace Pursuit [4]) that bridge this gap between uniformity and complexity. Intriguingly, the restricted isometry property (RIP), developed in the context of analyzing $\ell_1$ minimization, plays a central role in such algorithms. Indeed, if the matrix $\Phi$ satisfies the RIP of order $K$, then every subset of $K$ columns of the matrix is approximately orthonormal. This property is used to prove strong convergence results for these greedy-like methods.

One variant of such an approach is employed by the CoSaMP algorithm. An interesting feature of CoSaMP is that, unlike in MP, OMP, and StOMP, indices can be deleted from as well as added to the current set of chosen indices. In contrast, the other greedy pursuit algorithms suffer from the fact that a chosen index (or equivalently, a chosen atom from the dictionary $\Phi$) remains in the signal representation until the end. A brief description of CoSaMP is as follows: at the start of a given iteration $i$, suppose the signal estimate is $\hat{x}_{i-1}$.

• Form signal residual estimate: $e \leftarrow \Phi^T r$.
• Find the largest $2K$ coefficients of the signal residual $e$; call this set of indices $\Omega$.
• Merge supports: $T \leftarrow \Omega \cup \operatorname{supp}(\hat{x}_{i-1})$.
• Form signal estimate $b$ by subspace projection: $b|_T \leftarrow \Phi_T^\dagger y$, $b|_{T^C} \leftarrow 0$.
• Prune $b$ by retaining its $K$ largest coefficients. Call this new estimate $\hat{x}_i$.
• Update measurement residual: $r \leftarrow y - \Phi \hat{x}_i$.

This procedure is summarized in pseudocode form below.

Inputs: Measurement matrix $\Phi$, measurements $y$, signal sparsity $K$
Output: $K$-sparse approximation $\hat{x}$ to true signal representation $x$
Initialize: $\hat{x}_0 = 0$, $r = y$, $i = 0$
while halting criterion false do
1. $i \leftarrow i + 1$
2. $e \leftarrow \Phi^T r$ {form signal residual estimate}
3. $\Omega \leftarrow \operatorname{supp}(T(e, 2K))$ {prune signal residual estimate}
4. $T \leftarrow \Omega \cup \operatorname{supp}(\hat{x}_{i-1})$ {merge supports}
5. $b|_T \leftarrow \Phi_T^\dagger y$, $b|_{T^C} \leftarrow 0$ {form signal estimate}
6. $\hat{x}_i \leftarrow T(b, K)$ {prune signal estimate}
7. $r \leftarrow y - \Phi \hat{x}_i$ {update measurement residual}
end while
return $\hat{x} \leftarrow \hat{x}_i$


As discussed in [11], the key computational issues for CoSaMP are the formation of the signal residual and the method used for subspace projection in the signal estimation step. Under certain general assumptions, the computational cost of CoSaMP can be shown to be $O(MN)$, which is independent of the sparsity of the original signal. This represents an improvement over both greedy algorithms and convex methods.

While CoSaMP arguably represents the state of the art in sparse recovery algorithm performance, it possesses one drawback: the algorithm requires prior knowledge of the sparsity $K$ of the target signal. An incorrect choice of input sparsity may lead to a worse guarantee than the actual error incurred by a weaker algorithm such as OMP. The stability bounds accompanying CoSaMP ensure that the error due to an incorrect parameter choice is bounded, but it is not yet known how these bounds translate into practice.
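The identify–merge–project–prune steps above can be sketched as follows. This is a simplified version: we use a dense least-squares solve for the subspace projection, whereas [11] recommends an iterative solver for large problems, and the names and defaults here are illustrative assumptions:

```python
import numpy as np

def cosamp(Phi, y, K, max_iter=30, tol=1e-8):
    """CoSaMP sketch: identify 2K promising indices, merge them with the
    current support, solve least squares on the merged support, and
    prune back to the K largest coefficients.

    The pruning step is what allows previously chosen indices to be
    deleted, unlike MP/OMP/StOMP.
    """
    M, N = Phi.shape
    x_hat = np.zeros(N)
    r = y.astype(float).copy()
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        e = Phi.T @ r                                  # signal residual estimate
        Omega = np.argsort(np.abs(e))[-2 * K:]         # 2K largest entries of e
        T = np.union1d(Omega, np.flatnonzero(x_hat))   # merge supports
        b = np.zeros(N)
        coef, *_ = np.linalg.lstsq(Phi[:, T], y, rcond=None)
        b[T] = coef                                    # subspace projection
        keep = np.argsort(np.abs(b))[-K:]              # prune to K largest
        x_hat = np.zeros(N)
        x_hat[keep] = b[keep]
        r = y - Phi @ x_hat                            # update residual
    return x_hat
```
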

## Iterative Hard Thresholding

Iterative Hard Thresholding (IHT) is a well-known algorithm for solving nonlinear inverse problems. The structure of IHT is simple: starting with an initial estimate $\hat{x}_0$, the algorithm obtains a sequence of estimates using the iteration:

$$\hat{x}_{i+1} = T\!\left(\hat{x}_i + \Phi^T (y - \Phi \hat{x}_i),\; K\right).$$
(5)

In [1], Blumensath and Davies proved that this sequence of iterations converges to a fixed point $\hat{x}$; further, if the matrix $\Phi$ possesses the RIP, they showed that the recovered signal $\hat{x}$ satisfies an instance-optimality guarantee of the type described earlier. The guarantees (as well as the proof technique) are reminiscent of those derived in the development of other algorithms such as ROMP and CoSaMP.
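The iteration in Eq. (5) is a gradient step on $\|y - \Phi x\|_2^2$ followed by hard thresholding to the $K$ largest coefficients. A minimal sketch is below; the step size parameter `mu` is our own addition for generality (`mu = 1` recovers Eq. (5) exactly, and the analysis in [1] assumes the operator norm of $\Phi$ is suitably bounded):

```python
import numpy as np

def iht(Phi, y, K, mu=1.0, max_iter=100, tol=1e-8):
    """Iterative Hard Thresholding: gradient step on ||y - Phi x||^2,
    then keep only the K largest-magnitude coefficients (Eq. (5))."""
    N = Phi.shape[1]
    x_hat = np.zeros(N)
    for _ in range(max_iter):
        g = x_hat + mu * (Phi.T @ (y - Phi @ x_hat))  # gradient step
        keep = np.argsort(np.abs(g))[-K:]             # K largest magnitudes
        x_new = np.zeros(N)
        x_new[keep] = g[keep]                         # hard thresholding T(., K)
        if np.linalg.norm(x_new - x_hat) < tol:       # fixed point reached
            return x_new
        x_hat = x_new
    return x_hat
```

Unlike the pursuit algorithms above, every coefficient is updated simultaneously at each step, which makes IHT especially easy to implement with fast operators for $\Phi$ and $\Phi^T$.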

## Discussion

While convex optimization techniques are powerful methods for computing sparse representations, there is also a variety of greedy/iterative methods for solving such problems. Greedy algorithms rely on iterative approximation of the signal coefficients and support, either by iteratively identifying the support of the signal until a convergence criterion is met, or alternatively by obtaining an improved estimate of the sparse signal at each iteration that accounts for the mismatch to the measured data. Some greedy methods can actually be shown to have performance guarantees that match those obtained for convex optimization approaches. In fact, some of the more sophisticated greedy algorithms are remarkably similar to those used for $\ell_1$ minimization described previously. However, the techniques required to prove performance guarantees are substantially different. There also exist iterative techniques for sparse recovery based on message passing schemes for sparse graphical models. In fact, some greedy algorithms (such as those in [2], [8]) can be directly interpreted as message passing methods [5].

## References

1. Blumensath, T. and Davies, M. (2009). Iterative hard thresholding for compressive sensing. Appl. Comput. Harmon. Anal., 27(3), 265–274.
2. Berinde, R. and Indyk, P. and Ruzic, M. (2008, Sept.). Practical near-optimal sparse recovery in the L1 norm. In Proc. Allerton Conf. Communication, Control, and Computing. Monticello, IL
3. Donoho, D. and Drori, I. and Tsaig, Y. and Starck, J.-L. (2006). Sparse Solution of Underdetermined Linear Equations by Stagewise Orthogonal Matching Pursuit. [Preprint].
4. Dai, W. and Milenkovic, O. (2009). Subspace pursuit for compressive sensing signal reconstruction. IEEE Trans. Inform. Theory, 55(5), 2230–2249.
5. Donoho, D. and Maleki, A. and Montanari, A. (2009). Message passing algorithms for compressed sensing. Proc. Natl. Acad. Sci., 106(45), 18914–18919.
6. Davenport, M. and Wakin, M. (2010). Analysis of orthogonal matching pursuit using the restricted isometry property. IEEE Trans. Inform. Theory, 56(9), 4395–4401.
7. Duarte, M. and Wakin, M. and Baraniuk, R. (2005, Nov.). Fast reconstruction of piecewise smooth signals from random projections. In Proc. Work. Struc. Parc. Rep. Adap. Signaux (SPARS). Rennes, France
8. Indyk, P. and Ruzic, M. (2008, Oct.). Near-optimal sparse recovery in the L1 norm. In Proc. IEEE Symp. Found. Comp. Science (FOCS). Philadelphia, PA
9. Mallat, S. (1999). A Wavelet Tour of Signal Processing. San Diego, CA: Academic Press.
10. Mallat, S. and Zhang, Z. (1993). Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Processing, 41(12), 3397–3415.
11. Needell, D. and Tropp, J. (2009). CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal., 26(3), 301–321.
12. Needell, D. and Vershynin, R. (2009). Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit. Found. Comput. Math., 9(3), 317–334.
13. Needell, D. and Vershynin, R. (2010). Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit. IEEE J. Select. Top. Signal Processing, 4(2), 310–316.
14. Pati, Y. and Rezaiifar, R. and Krishnaprasad, P. (1993, Nov.). Orthogonal Matching Pursuit: Recursive function approximation with applications to wavelet decomposition. In Proc. Asilomar Conf. Signals, Systems, and Computers. Pacific Grove, CA
15. Tropp, J. and Gilbert, A. (2007). Signal recovery from partial information via orthogonal matching pursuit. IEEE Trans. Inform. Theory, 53(12), 4655–4666.
