Skip to content Skip to navigation


You are here: Home » Content » Sparse Deconvolution (An MM Algorithm)


Recently Viewed

This feature requires Javascript to be enabled.

Sparse Deconvolution (An MM Algorithm)

Module by: Ivan Selesnick. E-mail the author

Summary: This tutorial describes a computationally efficient method for deconvolution of 1-D signals. It is assumed that the convolution system is described by a difference equation. The algorithm is derived using the majorization-minimization optimization approach and exploits the computational efficiency of fast solvers for banded systems of linear equations.


Deconvolution refers to the problem of estimating the unknown input to an LTI system when the output signal and system response is known. In practice, the available output signal is also noisy. For some systems, the deconvolution problem is quite straight forward; however, for systems that are non-invertible or nearly non-invertible (e.g. narrow-band or with frequency response nulls), the problem is more difficult. The use of an exact inverse system can greatly amplify the noise rendering the result useless. In such cases, it is important to utilize prior knowledge regarding the input signal so as to obtain a more accurate estimate of the input signal, even when the system is nearly non-invertible and the observed output signal is noisy.

In some applications of deconvolution, it is known that the input signal is sparse (i.e. a spike train, etc.) or approximately sparse. Applications of `sparse deconvolution' include geophysics, ultrasonic non-destructive evaluation, speech processing, and astronomy [11]. One approach to sparse deconvolution involves the minimization of a cost function defined in terms of the 11 norm [6], [16], [4]. The minimization of cost functions defined in terms of the 11 norm is useful not just for deconvolution, but for sparse signal processing more generally. Indeed, since its early application in geophysics, the 11 norm and sparsity have become important tools in signal processing [2]. The tutorial [14] compares least squares and 11 norm solutions for several signal processing problems, illustrating the advantages of a sparse signal model (when valid).

This tutorial aims to illustrate some of the principles and algorithms of sparse signal processing, by way of considering the sparse deconvolution problem. A computationally efficient iterative algorithm for sparse deconvolution is derived using the majorization-minimization (MM) optimization method. The MM method is a simple, yet effective and widely applicable, method that replaces a difficult minimization problem with a sequence of simpler ones [8]. Other algorithms, developed for general 11 norm minimization, can also be used here [5], [17], [13]. However, the MM-derived algorithm takes advantage of the banded structure of the matrices arising in the sparse deconvolution problem. The resulting algorithm uses fast solvers for banded linear systems [1], [12].

The conditions that characterize the optimal solution are described and illustrated in "Optimality conditions". With these simple conditions, the optimality of the result computed by a numerical algorithm can be readily verified. Moreover, as described in "Setting the regularization parameter, λ", the optimality conditions provide a straight forward way to set the regularization parameter λλ based on the noise variance (if it is known).


We assume the noisy data y(n)y(n) is of the form

y ( n ) = ( h * x ) ( n ) + w ( n ) , y ( n ) = ( h * x ) ( n ) + w ( n ) ,

where h(n)h(n) is the impulse response of an LTI system, x(n)x(n) is a sparse signal, w(n)w(n) is white Gaussian noise, and `**' denotes convolution. We assume here that the LTI system can be described by a recursive difference equation,

y ( n ) = k b ( k ) x ( n - k ) - k a ( k ) y ( n - k ) y ( n ) = k b ( k ) x ( n - k ) - k a ( k ) y ( n - k )

where x(n)x(n) is the input signal and y(n)y(n) is the output signal.

x ( n ) h ( n ) y ( n ) = ( h * x ) ( n ) x ( n ) h ( n ) y ( n ) = ( h * x ) ( n )

Note that the difference equation in Equation 2 can be written in matrix form as

A y = B x A y = B x

where AA and BB are banded matrices. For example, if the difference equation is first order,

y ( n ) = b ( 0 ) x ( n ) + b ( 1 ) x ( n - 1 ) - a ( 1 ) y ( n - 1 ) , y ( n ) = b ( 0 ) x ( n ) + b ( 1 ) x ( n - 1 ) - a ( 1 ) y ( n - 1 ) ,

then AA and BB have the form

A = 1 a 1 1 a 1 1 a 1 1 , B = b 0 b 1 b 0 b 1 b 0 b 1 b 0 . A = 1 a 1 1 a 1 1 a 1 1 , B = b 0 b 1 b 0 b 1 b 0 b 1 b 0 .

From Equation 4, the output yy of the system can be written as

y = A - 1 B x = H x y = A - 1 B x = H x

where the system matrix HH is

H = A - 1 B H = A - 1 B

Note that, even though AA and BB are banded matrices, HH is not. (The inverse of a banded matrix is not banded in general.) The data model Equation 1 is written in matrix form as

y = H x + w . y = H x + w .

Example. An example is illustrated in Figure 1. The sparse signal x(n)x(n)

x ( n ) = δ ( n - 50 ) + 0 . 5 δ ( n - 80 ) - 0 . 3 δ ( n - 100 ) x ( n ) = δ ( n - 50 ) + 0 . 5 δ ( n - 80 ) - 0 . 3 δ ( n - 100 )

is the input to a convolution system. The system is second order, defined by the difference equation coefficients:

b 0 = 1 , b 1 = 0 . 8 , a 0 = 1 , a 1 = - 2 r cos ( ω ) , a 2 = r 2 where ω = 0 . 95 , r = 0 . 9 b 0 = 1 , b 1 = 0 . 8 , a 0 = 1 , a 1 = - 2 r cos ( ω ) , a 2 = r 2 where ω = 0 . 95 , r = 0 . 9

The system has complex poles at re±jω.re±jω. The output (h*x)(n)(h*x)(n) exhibits decaying oscillations. The output signal is then corrupted by additive white Gaussian noise with standard deviation σ=0.2σ=0.2.

Figure 1: Sparse signal x(n)x(n), output of convolution system, and observed data y(n)y(n).
Figure 1(a) (Example1_original.png)
Figure 1(b) (Example1_data.png)

Optimization formulation

In order to estimate x(n)x(n) from the data y(n)y(n), we consider the optimization problem

arg min x 1 2 y - H x 2 2 + λ x 1 arg min x 1 2 y - H x 2 2 + λ x 1

The use of the 11 norm as the regularization term captures the prior knowledge (or model) that x(n)x(n) is sparse. The 11 norm is not the only way to measure sparsity, however, among convex functionals, it is the most natural. The regularization parameter λ>0λ>0 should be chosen according to the level of the noise w(n)w(n). A method for setting λλ is described in "Setting the regularization parameter, λ".

The problem in Equation 12 is useful for many sparse signal processing problems, not just deconvolution. In the general case, Equation 12 is called `basis pursuit denoising' (BPD) [5] or the `lasso' [17]. References [5], [17] give examples (other than deconvolution) of problems posed in this form, and motivations for using the 11 norm.


The NN-point signal xx is represented by the vector

x = [ x ( 0 ) , , x ( N - 1 ) ] t . x = [ x ( 0 ) , , x ( N - 1 ) ] t .

The 11 norm of a vector vv is defined as

v 1 = n | v ( n ) | . v 1 = n | v ( n ) | .

The 22 norm of a vector vv is defined as

v 2 = n | v ( n ) | 2 1 2 . v 2 = n | v ( n ) | 2 1 2 .

Minimizing the cost function

The optimization problem in Equation 12 can not be solved directly — it is not differentiable. However, the cost function is convex and therefore the theory and practice of convex optimization can be brought to the problem. In these notes we describe an approach for solving Equation 12, suitable whenever HH has the form A-1BA-1B where AA and BB are both banded. The algorithm described below converges in practice in few iterations and is computationally efficient. It is based on the majoriziation-minimization optimization method and exploits the availability of fast algorithms for solving banded systems of linear equations [12]. The derivation below closely follows that in Ref. [8].


This section briefly describes the majorization-minimization (MM) approach [8] for minimizing a convex cost function F(x)F(x). To minimize F(x)F(x), the majorization-minimization approach solves a sequence of simpler optimization problems:

x k + 1 = arg min x G k ( x ) x k + 1 = arg min x G k ( x )

where kk is the iteration counter, k=0,1,2,k=0,1,2,. The function Gk(x)Gk(x) must be chosen as a majorizor of F(x)F(x), i.e.

G k ( x ) F ( x ) , x G k ( x k ) = F ( x k ) G k ( x ) F ( x ) , x G k ( x k ) = F ( x k )

The majorizor should be chosen so as to be relatively easy to minimize. With initialization x0x0, the update in Equation 16 produces a sequence of vectors xkxk, k1k1, converging to the minimizer of F(x)F(x). For more details see Ref. [8] and references therein.

In these notes, we will use a majorizor for the 11 norm. Note that

1 2 x * Λ k - 1 x + 1 2 x k 1 x 1 , Λ k = diag ( | x k | ) 1 2 x * Λ k - 1 x + 1 2 x k 1 x 1 , Λ k = diag ( | x k | )

with equality when x=xkx=xk. The notation diag (|xk|) diag (|xk|) denotes the diagonal matrix with vector |xk||xk| along its diagonal,

Λ k = diag ( | x k | ) = | x k ( 0 ) | | x k ( N - 1 ) | . Λ k = diag ( | x k | ) = | x k ( 0 ) | | x k ( N - 1 ) | .

Therefore, the left-hand-side of Equation 18 is a majorizor of x1x1. The derivation of this majorizor is illustrated in more detail in Ref. [15] where it is used to derive an algorithm for total variation denoising.

Iterative algorithm

We will minimize the cost function

F ( x ) = 1 2 y - H x 2 2 + λ x 1 F ( x ) = 1 2 y - H x 2 2 + λ x 1

using majorization-minimization (MM). A majorizor of F(x)F(x) can be obtained by majorizing just the 11 norm using Equation 18. In this way, a majorizor of F(x)F(x) is given by

G k ( x ) = 1 2 y - H x 2 2 + λ 2 x * Λ k - 1 x + λ 2 x k 1 G k ( x ) = 1 2 y - H x 2 2 + λ 2 x * Λ k - 1 x + λ 2 x k 1


Λ k = diag ( | x k | ) . Λ k = diag ( | x k | ) .

Therefore, the MM update in Equation 16 for xkxk is

x k + 1 = arg min x 1 2 y - H x 2 2 + λ 2 x * Λ k - 1 x . x k + 1 = arg min x 1 2 y - H x 2 2 + λ 2 x * Λ k - 1 x .

The last term of Gk(x)Gk(x) has been omitted because it does not depend on xx. Note that Equation 23 is quadratic in xx so the minimizer can be written using linear algebra. The solution to Equation 23 can be written explicitly as

x k + 1 = H t H + λ Λ k - 1 - 1 H t y x k + 1 = H t H + λ Λ k - 1 - 1 H t y

or in terms of AA and BB, as

x k + 1 = B t ( A A t ) - 1 B + λ Λ k - 1 - 1 B t A - t y . x k + 1 = B t ( A A t ) - 1 B + λ Λ k - 1 - 1 B t A - t y .

Although Equation 25 is mathematically valid, there are two problems with this update. First, as elements of xx go to zero, elements of Λk-1Λk-1 go to infinity; so the update in Equation 25 may become numerically inaccurate. (Note that, due to sparsity of xx, it is expected and intended that elements of xx do in fact go to zero!) Second, the update in Equation 25 calls for the solution to a large system of linear equations which has a high computational cost. Moreover, the system matrix is not banded, because (AAt)-1(AAt)-1 is not banded; so fast solvers for banded systems can not be used here. Both issues are by-passed by using the matrix inverse lemma as suggested in Ref. [9].

Using the matrix inverse lemma, the matrix inverse in Equation 25 can be rewritten as

B t ( A A t ) - 1 B + λ Λ k - 1 - 1 = 1 λ Λ k - 1 λ Λ k B t λ A A t + B Λ k B t - 1 B Λ k B t ( A A t ) - 1 B + λ Λ k - 1 - 1 = 1 λ Λ k - 1 λ Λ k B t λ A A t + B Λ k B t - 1 B Λ k

The matrix on the right-hand-side to be inverted is banded because AA, BB, and ΛΛ are all banded. Using Equation 26, the update Equation 25 can be implemented as

g 1 λ B t A - t y Λ k diag ( | x k | ) x k + 1 Λ k g - B t λ A A t + B Λ k B t - 1 B Λ k g g 1 λ B t A - t y Λ k diag ( | x k | ) x k + 1 Λ k g - B t λ A A t + B Λ k B t - 1 B Λ k g

Note that all matrices arising in the update equations are banded, hence the matrix computations can be implemented with fast high efficiency low-memory algorithms [1]. Also note that ΛkΛk, not Λk-1Λk-1, appears in the update equations, so the problem of division by zero (or small numbers) does not arise.

The update equations Equation 27 constitute an algorithm for solving Equation 12. It is only necessary to initialize x0x0 and set k=1k=1. Suitable initializations for x0x0 include either x0=yx0=y or x0=Htyx0=Hty.

A MATLAB program deconvL1 implementing sparse deconvolution using this algorithm is given in "MATLAB program". The program uses sparse matrix structures so that MATLAB will call fast solvers for banded systems [1].


Figure 2: Sparse deconvolution of the data shown in Figure 1. The deconvolution is performed by 11 norm minimization using the MM algorithm.
Figure 2(a) (Example1_cost.png)
Figure 2(b) (Example1_deconv_L1.png)

For the data shown in Figure 1, the sparse deconvolution solution is shown in Figure 2. The solution was obtained using the MM algorithm described above. Due to the properties of MM algorithms, the cost function is monotonically decreasing, as reflected by the cost function history. For this example, 100 iterations of the MATLAB program deconvL1 took 54 msec on a MacBook (2.4 GHz Intel Core 2 Duo) running MATLAB version 7.8.

Note that the solution depends on λλ. Figure 3 is an animation that shows the sparse deconvolution as a function of λλ. Each frame of the animation shows the solution for a different value of λλ. The animation illustrates how the solution depends on the regularization parameter λλ. For small λλ, the solution is very noisy; for large λλ, the solution is overly attenuated. The question of how to set λλ will be discussed in "Setting the regularization parameter, λ".

Figure 3: Animation of sparse deconvolution as a function of the regularization parameter λλ. The animation shows the effect of λλ on the solution.
Figure 3 (Example1_deconv_L1.png)

Least squares. It is informative to compare sparse deconvolution with least squares deconvolution. A least squares version of Equation 12 is:

arg min x 1 2 y - H x 2 2 + λ 2 x 2 2 arg min x 1 2 y - H x 2 2 + λ 2 x 2 2

which has the solution

x LS = H t H + λ I - 1 H t y . x LS = H t H + λ I - 1 H t y .

Like the 11 norm solution, this can be rewritten using the matrix inverse lemma so as to exploit the computational efficiency of fast solvers for banded systems.

Figure 4 shows the least square solution for a particular value of λλ. Compared to the sparse deconvolution solution, the least square solution is both noisier and attenuated. Reducing λλ leads to an even noisier solution. Increasing λλ leads to further attenuation of the solution. The sparse solution is superior to the least square solution here, because in this example, the sparse signal model is valid.

Figure 4: Least squares deconvolution.
Figure 4 (Example1_deconv_LS.png)

Optimality conditions

It turns out that the solution to the 11 norm optimization problem Equation 12 must satisfy conditions that can be readily checked [3]. Using these conditions, it can be verified if the obtained result really is the optimal solution. In addition, the optimality conditions lead to a simple approach for setting the regularization parameter, λλ, to be discussed in "Setting the regularization parameter, λ".

If xx minimizes Equation 12, then it must satisfy:

| H t ( y - H x ) | λ | H t ( y - H x ) | λ

where the absolute value |·||·| is element-wise.

For the sparse signal xx illustrated above in Figure 2, the vector Ht(y-Hx)Ht(y-Hx) is illustrated in Figure 5a. The dashed line in Figure 5a is λλ. The figure shows that xx does indeed satisfy Equation 30.

Condition Equation 30 by itself does not guarantee that xx is the solution to Equation 12. Define

g = H t ( y - H x ) . g = H t ( y - H x ) .

Then xx minimizes Equation 12 if and only if

g ( n ) = λ , for x ( n ) > 0 g ( n ) = - λ , for x ( n ) < 0 | g ( n ) | λ , for x ( n ) = 0 . g ( n ) = λ , for x ( n ) > 0 g ( n ) = - λ , for x ( n ) < 0 | g ( n ) | λ , for x ( n ) = 0 .

This condition is illustrated in Figure 5b as a scatter plot. Each point in the plot represents a pair (x(n),g(n))(x(n),g(n)). It can be seen that each time sample (x(n),g(n))(x(n),g(n)) must lie on one of the line segments indicated by dashes. The sparsity of xx can be recognized as most of the points lie on the x=0x=0 line.

Figure 5: Optimality conditions. The solution to Equation 12 must satisfy the constraints Equation 32.
Figure 5(a) (Example1_optim.png)
Figure 5(b) (Example1_scatter.png)

The convergence of the MM algorithm to the solution of Equation 12 can be observed by monitoring how closely xkxk satisfies the optimality conditions Equation 32. Figure 6 is an animation of the scatter plot (x(n),g(n))(x(n),g(n)) as the algorithm progresses through its iterations. Each frame shows a different iteration of the algorithm. As the algorithm progresses, the points in the scatter plot approach the line segments representing the optimality constraints.

Figure 6: Animation of convergence of the MM algorithm. The points in the scatter plot approach the optimality constraints.
Figure 6 (Example1_scatter.png)

Setting the regularization parameter, λλ

Note that setting λλ larger (more regularization) yields an estimate with less noise, but more signal distortion (attenuation). Setting λλ smaller (less regularization) yields an estimate with more noise. How can a suitable value of λλ be found based on the known convolution system and noise statistics?

The optimality conditions Equation 32 suggest a simple procedure to set the regularization parameter, λλ. First, note that if λλ is sufficiently large, i.e.

λ max ( | H t y | ) , λ max ( | H t y | ) ,

then the solution to Equation 12 will be be identically zero, x0x0. This follows from Equation 32. For x0x0 to be a solution, according to Equation 32, λ|g(n)|λ|g(n)|. When x0x0, then g=Htyg=Hty.

One approach to set λλ is based on considering the case x0x0. When x0x0, then the data yy is simply the noise signal ww. To ensure that the solution in this case is indeed identically zero, λλ should be chosen so that

λ max | H t w | . λ max | H t w | .

In practice, the noise ww is not known, but its statistics may be known. In many cases, it is modeled as zero-mean white Gaussian with known variance σ2σ2. In this case HtwHtw will be zero-mean Gaussian with variance σ2·n|h(n)|2σ2·n|h(n)|2 where h(n)h(n) is the impulse response of the convolution system. While Equation 34 asks that λλ be greater than the largest value of the sequence HtwHtw, this is not directly applicable when ww is a Gaussian random vector, because its pdf is not finitely supported, i.e. the maximum is not finite. However, a practical implementation of Equation 34 is obtained by replacing max|Htw|max|Htw| by, say, three times the standard deviation of HtwHtw. According to the `3-sigma rule', a zero-mean Gaussian random variable rarely exceeds three times its standard deviation. Hence, we get the rule

λ 3 σ n | h ( n ) | 2 . λ 3 σ n | h ( n ) | 2 .

Note that using a value λλ larger than necessary will lead to more attenuation of the signal, therefore, we should use the smallest value λλ that effectively eliminates the noise. Hence, we should set λλequal to the value shown in Equation 35.

Figure 7: Noise analysis for setting λλ.
Figure 7 (Example1_HTnoise.png)

For the system in Equation 11 and σ=0.2σ=0.2 as used in the example illustrated in Figure 1, the rule Equation 35 gives λ=2.01λ=2.01. In the sparse deconvolution result shown in Figure 2, λ=2λ=2 was used. It is informative to note that the signal HtwHtw, shown in Figure 7, has a maximum absolute value of 1.851.85. The value of λλ, 2.012.01, provided by the `3-sigma rule', is quite close to this value. This illustrates the effectiveness of the rule.

If the noise ww is not white, or is non-Gaussian, then the same concept can be adapted to obtain a suitable value for λλ. The result will depend on the autcorrelation function or other properties of the noise, depending on what is known.

Example: Sparse deconvolution of speech

As a second example, we apply sparse deconvolution to a segment of voiced speech, illustrated in Figure 8. The speech waveform is eight milliseconds in duration, with a sampling rate of Fs=7418Fs=7418 samples/second.1

It is common to model voiced speech y(n)y(n) as the output of a filter driven by a spiky pulse-train x(n)x(n),

x ( n ) H ( z ) y ( n ) x ( n ) H ( z ) y ( n )

where the filter H(z)H(z) is an all-pole filter (AR filter). The filter H(z)H(z) is usually obtained by using AR modeling applied to a short segment of speech. Each segment of speech is modeled using a different filter; so the filter is effectively time-varying.

Figure 8: AR modeling and sparse deconvolution applied to a segment of voiced speech.
Figure 8(a) (Example2_y.png)
Figure 8(b) (Example2_xf.png)
Figure 8(c) (Example2_exact.png)

Here we use an AR model of order 12 to model the speech waveform. The all-pole filter, obtained from the speech waveform, is shown in Figure 9. Both the impulse response and pole-zero diagram are shown. Instead of conventional AR modeling, sparse linear prediction can be used to obtain the filter with improved results for speech analysis and coding [10]. Here we have used conventional AR modeling to obtain the filter.

Figure 9: Filter H(z)H(z) obtained from speech waveform via AR modeling.
Figure 9(a) (Example2_h.png)
Figure 9(b) (Example2_zplane.png)

Using sparse deconvolution, with the output signal y(n)y(n) being the speech waveform, and with impulse response hh obtained from AR modeling, we obtain the input signal x(n)x(n), shown in Figure 8. To do the sparse deconvolution, we minimized Equation 12 using the program deconvL1 below. The signal x(n)x(n) is quite spiky and sparse. Note that the convolution r(n)=(h*x)(n)r(n)=(h*x)(n) is not equal to y(n)y(n) exactly, because the cost function Equation 12 does not enforce the equality. The cost function is intended for the case where the observed signal is noisy, which is not the case here. Therefore, the signal r(n)=(h*x)(n)r(n)=(h*x)(n), shown in Figure 8, is only an approximate reconstruction of the original speech waveform. The parameter λλ can be used to trade-off between reconstruction error and sparsity of x(n)x(n).

The exact deconvolution of the signal x(n)x(n) can be easily obtained because the filter h(n)h(n) is a minimum-phase all-pole filter. The exact inverse system is therefore an FIR filter. Applying the FIR filter 1/H(z)1/H(z) to the speech waveform yields the exact deconvolution, shown in Figure 8. When this signal is used as input to the filter H(z)H(z), then the output will be exactly the speech waveform x(n)x(n). However, as clear in the figure, the exact deconvolution yields a signal that is not sparse (compare with signal x(n)x(n)).

For the purpose of speech coding, the sparse input signal x(n)x(n) has an advantage compared to the non-sparse signal: it is more readily compressed.


If it is known or expected that the unknown input signal xx to an LTI system is sparse (e.g. `spiky'), then sparse deconvolution is an appropriate approach for estimating xx. It is assumed in these notes that the system is known and is described by a recursive (or non-recursive) difference equation, and that the available output signal is corrupted by additive white Gaussian noise.

Sparse deconvolution can be formulated as the minimization of a sparsity-regularized inverse problem. Although not the only choice, the 11 norm is a common choice of regularizer due to its being convex.

Sparse deconvolution based on 11 norm minimization can be implemented efficiently using:

  1. A recursive difference equation as a model for the convolution system.
  2. The majorization-minimization algorithm (with MM applied to the 11 norm).
  3. Fast algorithms for the solution of banded systems of linear equations.

The regularization parameter λλ can be set according to the noise variance (assuming the noise is stationary and its variance is known).

When both the filter and the input signal are unknown, then the problem is more difficult. This is the blind deconvolution problem. Sparsity and related non-Gaussian-based approaches can be effective for blind deconvolution; for example, see Refs. [10], [7], [18].

MATLAB program

function [x, cost] = deconvL1(y, lam, b, a, Nit)
% [x, cost] = deconvL1(y, lam, b, a, Nit)
% Sparse deconvolution by L1 norm minimization
% Cost function : 0.5 * sum(abs(y-H(x)).^2) + lam * sum(abs(x));
%   y - noisy data
%   b, a - filter coefficients for LTI convolution system H
%   lam - regularization parameter
%   Nit - number of iterations
%   x - deconvolved sparse signal
%   cost - cost function history
% Ivan Selesnick,, 2012
% Algorithm: majorization-minimization with banded system solver.
y = y(:);                                                 % convert column vector
cost = zeros(1, Nit);                                     % cost function history
N = length(y);
Nb = length(b);
Na = length(a);
B = spdiags(b(ones(N,1), :), 0:-1:1-Nb, N, N);            % create sparse matrices
A = spdiags(a(ones(N,1), :), 0:-1:1-Na, N, N);
H = @(x) A\(B*x);                                         % filter
AAT = A*A';                                               % A*A' : sparse matrix
x = y;                                                    % initialization
g = (1/lam) * (A'\(B'*y));                                % (1/lam) H'*y
for k = 1:Nit
    Lam = spdiags(abs(x), 0, N, N);                       % Lam : diag(abs(x)) (sparse)
    F = lam * AAT + B*Lam*B';                             % F : banded matrix (sparse)
    x = Lam * (g - (B'*(F\(B*(Lam*g)))));                 % update
    cost(k) = 0.5*sum(abs(y-H(x)).^2) + lam*sum(abs(x));  % cost function value


  1. The speech waveform is an extract of a person saying `Matlab'. In MATLAB: load mtlb; [b,a] = butter(2, 0.05, 'high'); y = filtfilt(b, a, mtlb); y = y(1000:1600); The high-pass filter removes baseline drift.


  1. Anderson, E. and Bai, Z. and Bischof, C. and Demmel, J. and Dongarra, J. and Croz, J. Du and Greenbaum, A. and Hammarling, S. and McKenney, A. and Ostrouchov, S. and Sorensen, D. (1992). LAPACK's user's guide. []. SIAM.
  2. Baraniuk, R. G. and Candes, E. and Elad, M. and Y. Ma, editors,. (2010, June). Special issue on Applications of Sparse Representation and Compressive Sensing. Proc. IEEE, 98(6),
  3. Bach, F. and Jenatton, R. and Mairal, J. and Obozinski, G. (2012). Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1), 1-106.
  4. Brien, M. S. O' and Sinclair, A. N. and Kramer, S. M. (1994, December). Recovery of a sparse spike time series by L1 norm deconvolution. IEEE Trans. Signal Process., 42(12), 3353-3365.
  5. Chen, S. and Donoho, D. L. and Saunders, M. A. (1998). Atomic Decomposition by Basis Pursuit. SIAM J. Sci. Comput., 20(1), 33-61.
  6. Claerbout, J. F. and Muir, F. (1973). Robust modeling of erratic data. Geophysics, 38(5), 826–844.
  7. Donoho, D. (1981). On Minimum Entropy Deconvolution. In Applied Time Series Analysis II. (pp. 565-609). Academic Press.
  8. Figueiredo, M. and Bioucas-Dias, J. and Nowak, R. (2007, December). Majorization-minimization algorithms for wavelet-based image restoration. IEEE Trans. Image Process., 16(12), 2980-2991.
  9. Figueiredo, M. and Bioucas-Dias, J. and Oliveira, J. P. and Nowak, R. D. (2006). On total-variation denoising: A new majorization-minimization algorithm and an experimental comparison with wavelet denoising. In Proc. IEEE Int. Conf. Image Processing.
  10. Giacobello, D. and Christensen, M. G. and Murthi, M. N. and Jensen, S. H. and Moonen, M. (2012, July). Sparse Linear Prediction and Its Applications to Speech Processing. IEEE Trans. on Audio, Speech, and Lang. Proc., 20(5), 1644-1657.
  11. Kaaresen, K. F. (1997, May). Deconvolution of Sparse Spike Trains by Iterated Window Maximization. IEEE Trans. Signal Process., 45(5), 1173-1183.
  12. Press, W. H. and Teukolsky, S. A. and Vetterling, W. T. and Flannery, B. P. (1992). Numerical recipes in C: the art of scientific computing (2nd ed.). Cambridge University Press.
  13. Rao, B. D. and Engan, K. and Cotter, S. F. and Palmer, J. and Kreutz-Delgado, K. (2003, March). Subset selection in noise based on diversity measure minimization. IEEE Trans. Signal Process., 51(3), 760-770.
  14. Selesnick, I. (2012). Introduction to Sparsity in Signal Processing. []. Connexions Web site.
  15. Selesnick, I. (2012). Total Variation Denoising (An MM Algorithm). []. Connexions Web site.
  16. Taylor, H. L. and Banks, S. C. and McCoy, J. F. (1979). Deconvolution with the l1 norm. Geophysics, 44(1), 39-52.
  17. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc., Ser. B, 58(1), 267–288.
  18. Wiggins, R. A. (1978). Minimum entropy deconvolution. Geoexploration, 16(1-2), 21-35.

Content actions

Download module as:

PDF | EPUB (?)

What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

Downloading to a reading device

For detailed instructions on how to download this content's EPUB to your specific device, click the "(?)" link.

| More downloads ...

Add module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens


A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks