Convex optimization-based methods

Module by: Wotao Yin

Summary: This module provides an overview of convex optimization approaches to sparse signal recovery.

An important class of sparse recovery algorithms falls under the purview of convex optimization. Algorithms in this category seek to optimize a convex function $f(\cdot)$ of the unknown variable $x$ over a (possibly unbounded) convex subset of $\mathbb{R}^N$.


Let $J(x)$ be a convex sparsity-promoting cost function (i.e., $J(x)$ is small for sparse $x$). To recover a sparse signal representation $\hat{x}$ from measurements $y = \Phi x$, $\Phi \in \mathbb{R}^{M \times N}$, we may either solve

$$\min_x \, \{ J(x) \;:\; y = \Phi x \}, \tag{1}$$

when there is no noise, or solve

$$\min_x \, \{ J(x) \;:\; H(\Phi x, y) \le \epsilon \} \tag{2}$$

when there is noise in the measurements. Here, $H$ is a cost function that penalizes the distance between the vectors $\Phi x$ and $y$. For an appropriate penalty parameter $\mu$, Equation 2 is equivalent to the unconstrained formulation:

$$\min_x \, J(x) + \mu H(\Phi x, y) \tag{3}$$

for some $\mu > 0$. The parameter $\mu$ may be chosen by trial and error, or by statistical techniques such as cross-validation [2].

For convex programming algorithms, the most common choices of $J$ and $H$ are $J(x) = \|x\|_1$, the $\ell_1$ norm of $x$, and $H(\Phi x, y) = \frac{1}{2}\|\Phi x - y\|_2^2$, half the squared $\ell_2$ norm of the error between the observed measurements and the linear projections of the target vector $x$. In statistics, minimizing this $H$ subject to $\|x\|_1 \le \delta$ is known as the Lasso problem. More generally, $J(\cdot)$ acts as a regularization term and can be replaced by other, more complex functions; for example, the desired signal may be piecewise constant and simultaneously have a sparse representation under a known basis transform $\Psi$. In this case, we may use a mixed regularization term:
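As a concrete illustration, these standard choices of $J$ and $H$ are straightforward to express in code. The following sketch (in Python with NumPy; the toy dimensions and signal below are invented for illustration) evaluates both costs for a noiseless measurement of a sparse signal:

```python
import numpy as np

def J(x):
    """Sparsity-promoting cost: the l1 norm of x."""
    return float(np.sum(np.abs(x)))

def H(Phi, x, y):
    """Data-fidelity cost: half the squared l2 norm of the residual Phi x - y."""
    r = Phi @ x - y
    return 0.5 * float(r @ r)

# Toy example: a 2-sparse signal measured without noise.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((10, 30))
x = np.zeros(30)
x[[3, 17]] = [1.0, -2.0]
y = Phi @ x
print(J(x))          # l1 norm of the sparse signal: 3.0
print(H(Phi, x, y))  # zero residual for noiseless measurements: 0.0
```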

$$J(x) = TV(x) + \lambda \|x\|_1 \tag{4}$$
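For a 1-D signal, this mixed regularizer can be sketched in a few lines. The anisotropic discrete total variation used here (sum of absolute differences of neighboring samples) is one common choice, assumed for illustration:

```python
import numpy as np

def tv(x):
    """Discrete (anisotropic) total variation of a 1-D signal:
    the sum of absolute differences between neighboring samples."""
    return float(np.sum(np.abs(np.diff(x))))

def J_mixed(x, lam):
    """Mixed regularizer TV(x) + lam * ||x||_1 from the equation above."""
    return tv(x) + lam * float(np.sum(np.abs(x)))

# A piecewise-constant, sparse toy signal: TV = 2 (two jumps), ||x||_1 = 2.
x = np.array([0.0, 1.0, 1.0, 0.0])
print(J_mixed(x, lam=0.5))  # 2 + 0.5 * 2 = 3.0
```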

It might be tempting to use conventional convex optimization packages for the above formulations (Equations 1, 2, and 3). However, these problems pose two key challenges specific to practical CS applications: (i) real-world problems are invariably large-scale (an image with a resolution of $1024 \times 1024$ pixels leads to optimization over a million variables, well beyond the reach of any standard optimization software package); (ii) the objective function is nonsmooth, and standard smoothing techniques do not yield very good results. Hence, for these problems, conventional algorithms (typically involving matrix factorizations) are not effective or even applicable. These unique challenges have generated considerable interest in the optimization community in developing improved sparse recovery algorithms.

Linear programming

In the noiseless case, the $\ell_1$-minimization problem (obtained by substituting $J(x) = \|x\|_1$ in Equation 1) can be recast as a linear program (LP) with equality constraints. These can be solved in polynomial time ($O(N^3)$) using standard interior-point methods [3]. This was the first feasible reconstruction algorithm used for CS recovery and has strong theoretical guarantees, as shown earlier in this course. In the noisy case, the problem can be recast as a second-order cone program (SOCP) with quadratic constraints. Solving LPs and SOCPs is a principal thrust of optimization research; nevertheless, their application to practical CS problems is limited by the fact that both the signal dimension $N$ and the number of constraints $M$ can be very large in many scenarios. Note that both LPs and SOCPs correspond to the constrained formulations in Equations 1 and 2 and are solved using interior-point methods.
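The LP recast can be sketched directly: splitting $x = u - v$ with $u, v \ge 0$ turns $\min \|x\|_1$ subject to $\Phi x = y$ into a standard-form LP with objective $\mathbf{1}^T(u+v)$. The sketch below assumes SciPy's `linprog` is available; it is illustrative only, not a practical large-scale CS solver:

```python
import numpy as np
from scipy.optimize import linprog

def l1_min_lp(Phi, y):
    """Solve min ||x||_1 s.t. Phi x = y as a linear program.

    Split x = u - v with u, v >= 0; at the optimum ||x||_1 = 1^T (u + v),
    so minimize c^T [u; v] subject to [Phi, -Phi] [u; v] = y.
    """
    M, N = Phi.shape
    c = np.ones(2 * N)
    A_eq = np.hstack([Phi, -Phi])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * N))
    u, v = res.x[:N], res.x[N:]
    return u - v

# Underdetermined toy system: x1 = 1 and x2 = 2 are fixed, x3 is free,
# and the l1 objective drives x3 to zero.
Phi = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])
y = np.array([1.0, 2.0])
print(l1_min_lp(Phi, y))  # approximately [1, 2, 0]
```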

A newer algorithm called "l1_ls" [12] is based on an interior-point algorithm that uses a preconditioned conjugate gradient (PCG) method to approximately solve linear systems in a truncated-Newton framework. The algorithm exploits the structure of the Hessian to construct its preconditioner; thus, this is a second-order method. Computational results show that about a hundred PCG steps suffice for accurate reconstruction. This method is typically slower than first-order methods, but can be faster when the true target signal is highly sparse.

Fixed-point continuation

As opposed to solving the constrained formulations, an alternative approach is to solve the unconstrained formulation in Equation 3. A widely used method for solving $\ell_1$-minimization problems of the form

$$\min_x \, \mu \|x\|_1 + H(x), \tag{5}$$

for a convex and differentiable $H$, is an iterative procedure based on shrinkage (also called soft thresholding; see Equation 6 below). In the context of solving Equation 5 with a quadratic $H$, this method was independently proposed and analyzed in [1], [9], [13], [14], and then further studied or extended in [4], [5], [6], [7], [11], [16]. Shrinkage is a classic method used in wavelet-based image denoising. The shrinkage operator applied to any scalar component is defined as follows:

$$\mathrm{shrink}(t, \alpha) = \begin{cases} t - \alpha & \text{if } t > \alpha, \\ 0 & \text{if } -\alpha \le t \le \alpha, \\ t + \alpha & \text{if } t < -\alpha. \end{cases} \tag{6}$$
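The scalar definition in Equation 6 extends elementwise to vectors; a minimal NumPy sketch:

```python
import numpy as np

def shrink(t, alpha):
    """Soft-thresholding (shrinkage) operator of Equation 6, elementwise:
    moves each entry of t toward zero by alpha, and clamps entries inside
    the interval [-alpha, alpha] to exactly zero."""
    return np.sign(t) * np.maximum(np.abs(t) - alpha, 0.0)

print(shrink(np.array([3.0, 0.5, -3.0]), 1.0))  # -> [2., 0., -2.]
```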

This concept can be used effectively to solve Equation 5. In particular, the basic algorithm can be written as the fixed-point iteration: for $i = 1, \dots, N$, the $i$th coefficient of $x$ at the $(k+1)$th time step is given by

$$x_i^{k+1} = \mathrm{shrink}\!\left( \left( x^k - \tau \nabla H(x^k) \right)_i, \; \mu \tau \right) \tag{7}$$

where $\tau > 0$ serves as a step length for gradient descent (and may vary with $k$) and $\mu$ is specified by the user. It is easy to see that the larger $\mu$ is, the larger the allowable distance between $x^{k+1}$ and $x^k$. For a quadratic penalty term $H(\cdot)$, the gradient $\nabla H$ can be easily computed as a linear function of $x^k$; thus each iteration of Equation 7 essentially boils down to a small number of matrix-vector multiplications.

The simplicity of the iterative approach is quite appealing, both from a computational and a code-design standpoint. Various modifications, enhancements, and generalizations to this approach have been proposed, both to improve the efficiency of the basic iteration in Equation 7 and to extend its applicability to various kinds of $J$ [8], [10], [16]. In practice, the basic iteration in Equation 7 would not be effective without a continuation (or path-following) strategy [11], [16], in which a gradually decreasing sequence of values of the parameter $\mu$ guides the intermediate iterates toward the final optimal solution.

This procedure is known as continuation; in [11], the performance of an algorithm known as Fixed-Point Continuation (FPC) compared favorably with similar methods such as Gradient Projection for Sparse Reconstruction (GPSR) [10] and l1_ls [12]. A key aspect of solving the unconstrained optimization problem is the choice of the parameter $\mu$. As discussed above, for CS recovery $\mu$ may be chosen by trial and error; to approximate the noiseless constrained formulation, we may solve the corresponding unconstrained minimization with a large value of $\mu$.

In the case of recovery from noisy compressive measurements, a commonly used choice for the convex cost function $H(x)$ is the squared norm of the residual. Thus we have:

$$H(x) = \|y - \Phi x\|_2^2, \qquad \nabla H(x) = 2 \Phi^T (\Phi x - y). \tag{8}$$

For this particular choice of penalty function, Equation 7 reduces to the following iteration:

$$x_i^{k+1} = \mathrm{shrink}\!\left( \left( x^k + 2\tau \Phi^T (y - \Phi x^k) \right)_i, \; \mu \tau \right) \tag{9}$$

which is run until convergence to a fixed point. The algorithm is detailed in pseudocode form below.

Inputs: CS matrix $\Phi$, signal measurements $y$, parameter sequence $\mu_k$
Outputs: Signal estimate $\hat{x}$
Initialize: $\hat{x}_0 = 0$, $r = y$, $k = 0$.
while halting criterion false do
  1. $k \leftarrow k + 1$
  2. $x \leftarrow \hat{x} + 2\tau \Phi^T r$ {take a gradient step}
  3. $\hat{x} \leftarrow \mathrm{shrink}(x, \mu_k \tau)$ {perform soft thresholding}
  4. $r \leftarrow y - \Phi \hat{x}$ {update measurement residual}
end while
return $\hat{x}$
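The pseudocode above translates almost line for line into NumPy. The sketch below is an illustrative implementation, not the authors' FPC code; for simplicity it uses a fixed $\mu$ rather than a continuation sequence $\mu_k$, and a conservative fixed step length $\tau$:

```python
import numpy as np

def shrink(t, alpha):
    # soft thresholding, elementwise (Equation 6)
    return np.sign(t) * np.maximum(np.abs(t) - alpha, 0.0)

def fpc(Phi, y, mu, tau=None, max_iter=1000, tol=1e-10):
    """Fixed-point (shrinkage) iteration for
    min_x  mu * ||x||_1 + ||y - Phi x||_2^2."""
    M, N = Phi.shape
    if tau is None:
        # Conservative step length; tau < 1/||Phi||_2^2 keeps the
        # gradient step on H(x) = ||y - Phi x||^2 stable.
        tau = 0.5 / np.linalg.norm(Phi, 2) ** 2
    xhat = np.zeros(N)
    r = y.copy()                              # measurement residual
    for _ in range(max_iter):
        x = xhat + 2.0 * tau * (Phi.T @ r)    # gradient step
        x_new = shrink(x, mu * tau)           # soft thresholding
        if np.linalg.norm(x_new - xhat) <= tol:
            return x_new                      # reached a fixed point
        xhat = x_new
        r = y - Phi @ xhat                    # update residual
    return xhat
```

As a sanity check, with $\Phi = I$ the fixed point is the exact minimizer $\mathrm{shrink}(y, \mu/2)$ of $\mu\|x\|_1 + \|y - x\|_2^2$.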

Bregman iteration methods

It turns out that an efficient method for the constrained optimization problem in Equation 1 can be devised by solving a small number of unconstrained problems of the form of Equation 3. These repeated solves are commonly referred to as Bregman iterations. A simple version can be written as follows:

$$y^{k+1} = y^k + y - \Phi x^k, \qquad x^{k+1} = \arg\min_x \, J(x) + \frac{\mu}{2} \left\| \Phi x - y^{k+1} \right\|^2. \tag{10}$$

The problem in the second step can be solved by the algorithms reviewed above. Bregman iterations were introduced in [15] for constrained total variation minimization problems, and were proved to converge for closed, convex functions $J(x)$. In [17], this technique is applied to Equation 1 with $J(x) = \|x\|_1$ and shown to converge in a finite number of steps for any $\mu > 0$. For moderate $\mu$, the number of iterations needed is typically fewer than five. Compared with the alternative of solving Equation 1 by directly solving the unconstrained problem in Equation 3 with a very large $\mu$, Bregman iterations are often more stable and sometimes much faster.
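A minimal sketch of the Bregman loop, with the subproblem solver left pluggable. In the toy case below ($\Phi = I$, $J = \|\cdot\|_1$, an assumption made purely for illustration), the subproblem has the closed-form solution $\mathrm{shrink}(y^{k+1}, 1/\mu)$, so the loop can be tested exactly:

```python
import numpy as np

def shrink(t, alpha):
    # soft thresholding, elementwise
    return np.sign(t) * np.maximum(np.abs(t) - alpha, 0.0)

def bregman(Phi, y, mu, solve_subproblem, n_iter=5):
    """Bregman iterations: y^{k+1} = y^k + (y - Phi x^k), then solve
    x^{k+1} = argmin_x J(x) + (mu/2) ||Phi x - y^{k+1}||^2."""
    y_k = np.zeros_like(y)
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        y_k = y_k + (y - Phi @ x)        # add the residual back
        x = solve_subproblem(Phi, y_k, mu)
    return x

# Toy case Phi = I: argmin ||x||_1 + (mu/2)||x - y_k||^2 = shrink(y_k, 1/mu).
Phi = np.eye(3)
y = np.array([2.0, 0.0, -1.0])
x = bregman(Phi, y, mu=1.0,
            solve_subproblem=lambda P, yk, m: shrink(yk, 1.0 / m))
print(x)  # recovers y exactly after a few iterations
```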


All the methods discussed in this section optimize a convex function (usually the $\ell_1$ norm) over a convex (possibly unbounded) set. This implies guaranteed convergence to the global optimum. In other words, provided the sampling matrix $\Phi$ satisfies the conditions specified in "Signal recovery via $\ell_1$ minimization", convex optimization methods will recover the underlying signal $x$. In addition, convex relaxation methods guarantee stable recovery by reformulating the recovery problem as an SOCP or as the unconstrained formulation.


  1. Bect, J. and Blanc-Feraud, L. and Aubert, G. and Chambolle, A. (2004, May). A $\ell_1$-Unified Variational Framework for Image Restoration. In Proc. European Conf. Comp. Vision (ECCV). Prague, Czech Republic
  2. Boufounos, P. and Duarte, M. and Baraniuk, R. (2007, Aug.). Sparse signal reconstruction from noisy compressive measurements using cross validation. In Proc. IEEE Work. Stat. Signal Processing. Madison, WI
  3. Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge, England: Cambridge Univ. Press.
  4. Combettes, P. and Pesquet, J. (2007). Proximal thresholding algorithm for minimization over orthonormal bases. SIAM J. Optimization, 18(4), 1351–1376.
  5. Daubechies, I. and Defrise, M. and Mol, C. De. (2004). An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math., 57(11), 1413–1457.
  6. Elad, M. (2006). Why simple shrinkage is still relevant for redundant representations? IEEE Trans. Inform. Theory, 52(12), 5559–5569.
  7. Elad, M. and Matalon, B. and Shtok, J. and Zibulevsky, M. (2007, Apr.). A wide-angle view at iterated shrinkage algorithms. In Proc. SPIE Optics Photonics: Wavelets. San Diego, CA
  8. Elad, M. and Matalon, B. and Zibulevsky, M. (2007). Coordinate and subspace optimization methods for linear least squares with non-quadratic regularization. Appl. Comput. Harmon. Anal., 23(3), 346–367.
  9. Figueiredo, M. and Nowak, R. (2003). An EM algorithm for wavelet-based image restoration. IEEE Trans. Image Processing, 12(8), 906–916.
  10. Figueiredo, M. and Nowak, R. and Wright, S. (2007). Gradient Projections For Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems. IEEE J. Select. Top. Signal Processing, 1(4), 586–597.
  11. Hale, E. and Yin, W. and Zhang, Y. (2007). A Fixed-Point Continuation Method for $\ell_1$-Regularized Minimization with Applications to Compressed Sensing. (TR07-07). Technical report. Rice Univ., CAAM Dept.
  12. Kim, S.-J. and Koh, K. and Lustig, M. and Boyd, S. and Gorinevsky, D. (2007). An interior point method for large-scale $\ell_1$-regularized least squares. IEEE J. Select. Top. Signal Processing, 1(4), 606–617.
  13. Mol, C. De and Defrise, M. (2002). A note on wavelet-based inversion algorithms. Contemporary Math., 313, 85–96.
  14. Nowak, R. and Figueiredo, M. (2001, Nov.). Fast wavelet-based image deconvolution using the EM algorithm. In Proc. Asilomar Conf. Signals, Systems, and Computers. Pacific Grove, CA
  15. Osher, S. and Burger, M. and Goldfarb, D. and Xu, J. and Yin, W. (2005). An iterative regularization method for total variation-based image restoration. SIAM J. Multiscale Modeling and Simulation, 4(2), 460–489.
  16. Wright, S. and Nowak, R. and Figueiredo, M. (2009). Sparse reconstruction by separable approximation. IEEE Trans. Signal Processing, 57(7), 2479–2493.
  17. Yin, W. and Osher, S. and Goldfarb, D. and Darbon, J. (2008). Bregman iterative algorithms for $\ell_1$-minimization with applications to compressed sensing. SIAM J. Imag. Sci., 1(1), 143–168.
