Skip to content Skip to navigation

Connexions

You are here: Home » Content » Total Variation Denoising (An MM Algorithm)

Navigation

Recently Viewed

This feature requires Javascript to be enabled.
 

Total Variation Denoising (An MM Algorithm)

Module by: Ivan Selesnick. E-mail the author

Summary: Total variation denoising (TVD) is an approach for noise reduction developed so as to preserve sharp edges in the underlying signal. Unlike a conventional low-pass filter, TV denoising is defined in terms of an optimization problem. This module describes an algorithm for TV denoising derived using the majorization-minimization (MM) approach, developed by Figueiredo et al. [ICIP 2006]. To keep it simple, this module addresses TV denoising of 1-D signals only. For computational efficiency, the algorithm may use a solver for sparse banded systems.

Introduction

Total variation denoising (TVD) is an approach for noise reduction developed so as to preserve sharp edges in the underlying signal [12]. Unlike a conventional low-pass filter, TV denoising is defined in terms of an optimization problem. The output of the TV denoising `filter' is obtained by minimizing a particular cost function. Any algorithm that solves the optimization problem can be used to implement TV denoising. However, it is not trivial because the TVD cost function is non-differentiable. Numerous algorithms have been developed to solve the TVD problem, e.g. [5], [7], [4], [15], [16].

Total variation is used not just for denoising, but for more general signal restoration problems, including deconvolution, interpolation, in-painting, compressed sensing, etc. [2]. In addition, the concept of total variation has been generalized and extended in various ways [13], [10], [3].

These notes describe an algorithm for TV denoising derived using the majorization-minimization (MM) approach, developed by Figueiredo et al. [9]. To keep it simple, these notes address TV denoising of 1-D signals only (ref. [9] considers 2D TV denoising for images). Interestingly, it is possible to obtain the exact solution to the TV denoising problem (for the 1-D case) without optimization, but instead using a direct algorithm based on a characterization of the solution. Recently, a fast algorithm has been developed and C code made available [6].

Total variation denoising assumes that the noisy data y(n)y(n) is of the form

y ( n ) = x ( n ) + w ( n ) , n = 0 , , N - 1 y ( n ) = x ( n ) + w ( n ) , n = 0 , , N - 1
(1)

where x(n)x(n) is a (approximately) piecewise constant signal and w(n)w(n) is white Gaussian noise. TV denoising estimates the signal x(n)x(n) by solving the optimization problem:

arg min x n = 0 N - 1 | y ( n ) - x ( n ) | 2 + λ n = 1 N - 1 | x ( n ) - x ( n - 1 ) | . arg min x n = 0 N - 1 | y ( n ) - x ( n ) | 2 + λ n = 1 N - 1 | x ( n ) - x ( n - 1 ) | .
(2)

The regularization parameter λ>0λ>0 controls the degree of smoothing. Increasing λλ gives more weight to the second term which measures the fluctuation of the signal x(n)x(n).

MATLAB software is available online at:

http://eeweb.poly.edu/iselesni/lecture_notes/TVDmm/index.html

Notation

The NN-point signal xx is represented by the vector

x = [ x ( 0 ) , , x ( N - 1 ) ] t . x = [ x ( 0 ) , , x ( N - 1 ) ] t .
(3)

The 11 norm of a vector vv is defined as

v 1 = n | v ( n ) | . v 1 = n | v ( n ) | .
(4)

The 22 norm of a vector vv is defined as

v 2 = n | v ( n ) | 2 1 2 . v 2 = n | v ( n ) | 2 1 2 .
(5)

The matrix DD is defined as

D = - 1 1 - 1 1 - 1 1 . D = - 1 1 - 1 1 - 1 1 .
(6)

The first-order difference of an NN-point signal xx is given by DxDx where DD is of size (N-1)×N(N-1)×N.

Note, for later, that DDtDDt is a tridiagonal matrix of the form:

D D t = 2 - 1 - 1 2 - 1 - 1 2 - 1 - 1 2 . D D t = 2 - 1 - 1 2 - 1 - 1 2 - 1 - 1 2 .
(7)

The total variation of the NN-point signal x(n)x(n) is given by

TV ( x ) : = D x 1 = n = 1 N - 1 | x ( n ) - x ( n - 1 ) | . TV ( x ) : = D x 1 = n = 1 N - 1 | x ( n ) - x ( n - 1 ) | .
(8)

With this notation, the TV denoising problem Equation 2 can be written compactly as

arg min x R N y - x 2 2 + λ D x 1 . arg min x R N y - x 2 2 + λ D x 1 .
(9)

Majorization-Minimization

Majorization-minimization (MM) is an approach to solve optimization problems that are too difficult to solve directly. Instead of minimizing the cost function F(x)F(x) directly, the MM approach solves a sequence of optimization problems, Gk(x)Gk(x), k=0,1,2,.k=0,1,2,. The idea is that each Gk(x)Gk(x) is easier to solve that F(x)F(x). The MM approach produces a sequence xkxk, each being obtained by minimizing Gk-1(x)Gk-1(x). To use MM, one must specify the functions Gk(x)Gk(x). The trick is to choose the Gk(x)Gk(x) so that they are easy to solve, but they should also each approximate F(x)F(x).

The MM approach requires that each function Gk(x)Gk(x) is a majorizor of F(x)F(x), i.e.,

G k ( x ) F ( x ) , x G k ( x ) F ( x ) , x
(10)

and that it agrees with F(x)F(x) at xkxk,

G k ( x k ) = F ( x k ) . G k ( x k ) = F ( x k ) .
(11)

In addition, Gk(x)Gk(x) should be convex functions. The MM approach then obtains xk+1xk+1 by minimizing Gk(x)Gk(x).

Figure 1 illustrates the MM procedure with a simple example. For clarity, the figure illustrates the minimization of a univariate function. However, the MM procedure works in the same way for the minimization of multivariate functions, and it is in the multivariate case where the MM procedure is especially useful.

Figure 1: Illustration of majorization-minimization (MM) procedure. (a) Cost function F(x)F(x) to be minimized; and initialization, x0x0. (b) Iteration 1. Majorizor G0(x)G0(x) coincides with F(x)F(x) at x0x0. Minimize G0(x)G0(x) to get x1x1. (c) Iteration 2. Majorizor G1(x)G1(x) coincides with F(x)F(x) at x1x1. Minimize G1(x)G1(x) to get x2x2. The xkxk converge to the minimizer of F(x)F(x).
(a)
Figure 1(a) (MM_fig1.png)
(b)
Figure 1(b) (MM_fig2.png)
(c)
Figure 1(c) (MM_fig3.png)

The majorization-minimization approach to minimize the function F(x)F(x) can be summarized as:

  1. Set k=0k=0. Initialize x0x0.
  2. Choose Gk(x)Gk(x) such that
    1. Gk(x)F(x)Gk(x)F(x) for all xx
    2. Gk(xk)=F(xk)Gk(xk)=F(xk)
  3. Set xk+1xk+1 as the minimizer of Gk(x)Gk(x).
    xk+1= arg minxGk(x)xk+1= arg minxGk(x)
    (12)
  4. Set k=k+1k=k+1 and go to step (2.)

When F(x)F(x) is convex, then under mild conditions, the sequence xkxk produced by MM converges to the minimizer of F(x)F(x). More details about the majorization-minimization procedure can be found in [8] and references therein.

Example majorizor. An upper bound (majorizor) of f(t)=|t|f(t)=|t| that agrees with f(t)f(t) at t=tkt=tk is

g ( t ) = 1 2 | t k | t 2 + 1 2 | t k | g ( t ) = 1 2 | t k | t 2 + 1 2 | t k |
(13)

as illustrated in Figure 2. The figure makes clear that

g ( t ) f ( t ) , t g ( t ) f ( t ) , t
(14)
g ( t k ) = f ( t k ) g ( t k ) = f ( t k )
(15)

The derivation of the majorizor in Equation 13 is left as exercise Item 11.

It is convenient to use second-order polynomials as majorizors because they are easy to minimize. Setting the derivatives to zero gives linear equations. A higher order polynomial could be used to give a closer fit to the function f(t)f(t) to be minimized, however, then the minimization will be more difficult (involving polynomial root finding, etc.)

Figure 2: Majorization of f(t)=|t|f(t)=|t| by g(t)=at2+bg(t)=at2+b.
Figure 2 (MM_figure.png)

TV Denoising Algorithm

One way to apply MM to TV denoising is to majorize TV (x) TV (x) by a quadratic function of xx, as described in ref. [9]. Then the TVD cost function F(x)F(x) can be majorized by a quadratic function, which can in turn be minimized by solving a system of linear equations.

To that end, using Equation 13, we can write

1 2 | t k | t 2 + 1 2 | t k | | t | t R 1 2 | t k | t 2 + 1 2 | t k | | t | t R
(16)

Using v(n)v(n) for tt and summing over nn gives

n 1 2 | v k ( n ) | v 2 ( n ) + 1 2 | v k ( n ) | n | v ( n ) | n 1 2 | v k ( n ) | v 2 ( n ) + 1 2 | v k ( n ) | n | v ( n ) |
(17)

which can be written compactly as

1 2 v t Λ k - 1 v + 1 2 v k 1 v 1 1 2 v t Λ k - 1 v + 1 2 v k 1 v 1
(18)

where ΛkΛk is the diagonal matrix

Λ k : = | v k ( 1 ) | | v k ( 2 ) | | v k ( N ) | = diag ( | v k | ) . Λ k : = | v k ( 1 ) | | v k ( 2 ) | | v k ( N ) | = diag ( | v k | ) .
(19)

In the notation, diag (|v|) diag (|v|), the absolute value is applied element-wise to the vector vv.

Using DxDx for vv, we can write

1 2 x t D t Λ k - 1 D x + 1 2 D x k 1 D x 1 1 2 x t D t Λ k - 1 D x + 1 2 D x k 1 D x 1
(20)

where

Λ k : = diag ( | D x k | ) . Λ k : = diag ( | D x k | ) .
(21)

Note in Equation 20 that the majorizor of Dx1Dx1 is a quadratic function of xx. Also note that the term Dxk1Dxk1 in Equation 20 should be considered a constant — it is fixed as xkxk is the value of xx at the previous iteration. Similarly, ΛkΛk in Equation 20 is also not a function of xx.

A majorizor of the TV cost function Equation 9 can be obtained from Equation 20 by adding y-x22y-x22 to both sides,

y - x 2 2 + λ 1 2 x t D t Λ k - 1 D x + λ 1 2 D x k 1 y - x 2 2 + λ D x 1 . y - x 2 2 + λ 1 2 x t D t Λ k - 1 D x + λ 1 2 D x k 1 y - x 2 2 + λ D x 1 .
(22)

Therefore a majorizor Gk(x)Gk(x) for the TV cost function is given by

G k ( x ) = y - x 2 2 + λ 1 2 x t D t Λ k - 1 D x + λ 1 2 D x k 1 , Λ k = diag ( | D x k | ) G k ( x ) = y - x 2 2 + λ 1 2 x t D t Λ k - 1 D x + λ 1 2 D x k 1 , Λ k = diag ( | D x k | )
(23)

which satisfies Gk(xk)=F(xk)Gk(xk)=F(xk) by design. Using MM, we obtain xkxk by minimizing Gk(x)Gk(x),

x k + 1 = arg min x y - x 2 2 + λ 1 2 x t D t Λ k - 1 D x + λ 1 2 D x k 1 . x k + 1 = arg min x y - x 2 2 + λ 1 2 x t D t Λ k - 1 D x + λ 1 2 D x k 1 .
(24)

An explicit solution to Equation 24 is given by

x k + 1 = I + λ 2 D t Λ k - 1 D - 1 y . x k + 1 = I + λ 2 D t Λ k - 1 D - 1 y .
(25)

A problem with update Equation 25 is that as the iterations progress, some values of DxkDxk will generally go to zero, and therefore some entries of Λk-1Λk-1 in Equation 25 will go to infinity. This issue is addressed in Ref. [9] by rewriting the equation using the matrix inverse lemma. By the matrix inverse lemma,

I + λ 2 D t Λ k - 1 D - 1 = I - D t 2 λ Λ k + D D t - 1 D I + λ 2 D t Λ k - 1 D - 1 = I - D t 2 λ Λ k + D D t - 1 D
(26)

where

Λ k = diag ( | D x k | ) . Λ k = diag ( | D x k | ) .
(27)

Now the update equation Equation 25 becomes

x k + 1 = y - D t 2 λ diag ( | D x k | ) + D D t - 1 D y . x k + 1 = y - D t 2 λ diag ( | D x k | ) + D D t - 1 D y .
(28)

Observe that even if some elements of DxkDxk are zero, no division by zero arises in Equation 28.

The update Equation 28 calls for the solution to a linear system of equations. In general, it is desirable to avoid such a computation in an iterative filtering algorithm due to the high computational cost of solving linear systems (especially when the signal yy is very long and the system is very large). However, the matrix [2λ diag (|Dxk|)+DDt][2λ diag (|Dxk|)+DDt] in Equation 28 is a sparse banded matrix; it consists of only three diagonals — the main diagonal, one upper diagonal, and one lower diagonal. This is because DDtDDt is tridiagonal as shown in Equation 7. Therefore, the linear system in Equation 28 can be solved very efficiently [11]. Further, the whole matrix need not be stored in memory, only the three diagonals.

The MATLAB function TVD_mm implements TV denoising based on the update Equation 28. The function uses the sparse matrix structure in MATLAB so as to avoid high memory requirements and so as to invoke sparse banded system solvers. MATLAB uses LAPACK [1] to solve the sparse banded system in the program TVD_mm. The algorithm used by MATLAB to solve a sparse linear system can be monitored using the command spparms('spumoni',3).

Listing 1: MATLAB program for TV denoising using majorization-minimization. The program is based on update Equation 28.

function [x, cost] = TVD_mm(y, lam, Nit)
% [x, cost] = TVD_mm(y, lam, Nit)
% Total variation denoising using majorization-minimization
% and a banded linear systems.
%
% INPUT
%   y - noisy signal
%   lam - regularization parameter
%   Nit - number of iterations
%
% OUTPUT
%   x - denoised signal
%   cost - cost function history
%
% Reference
% 'On total-variation denoising: A new majorization-minimization
% algorithm and an experimental comparison with wavalet denoising.'
% M. Figueiredo, J. Bioucas-Dias, J. P. Oliveira, and R. D. Nowak.
% Proc. IEEE Int. Conf. Image Processing, 2006.
 
% Ivan Selesnick, selesi@poly.edu, 2011
 
y = y(:);                                               % Ensure column vector
cost = zeros(1, Nit);                                   % Cost function history
N = length(y);
 
e = ones(N-1, 1);
DDT = spdiags([-e 2*e -e], [-1 0 1], N-1, N-1);         % D*D' (sparse matrix)
D = @(x) diff(x);                                       % D (operator)
DT = @(x) [-x(1); -diff(x); x(end)];                    % D'
 
x = y;                                                  % Initialization
Dx = D(x);
 
for k = 1:Nit
    F = 2/lam * spdiags(abs(Dx), 0, N-1, N-1) + DDT;    % F : Sparse matrix structure
    % F = 2/lam * diag(abs(D(x))) + DDT;                % Not stored as sparse matrix
 
    x = y - DT(F\D(y));                                 % Solve sparse linear system
    Dx = D(x);
 
    cost(k) = sum(abs(x-y).^2) + lam * sum(abs(Dx));    % Save cost function history
end

Figure 3: TV denoising example.
(a)
Figure 3(a) (Example1_fig1.png)
(b)
Figure 3(b) (Example1_fig2.png)
Figure 4: Comparison of convergence behavior of two TV denoising algorithms.
Figure 4 (Example1_fig3.png)

An example of TV denoising is shown in Figure 3. The history of the cost function through the progression of the algorithm is shown in the figure. It can be seen that after 20 iterations the cost function has leveled out, suggesting that the algorithm has practically converged.

Another algorithm for 1-D TV denoising is Chambolle's algorithm [5], a variant of which is the `iterative clipping' algorithm [14]. This algorithm is computationally simpler than the MM algorithm because it does not call for the solution to a linear system at each iteration. However, it may converge slowly. For the denoising problem illustrated in Figure 3, the convergence of both the iterative clipping and MM algorithms are shown in Figure 4. It can be seen that the MM algorithm converges in fewer iterations.

Optimality Condition

It turns out that the solution to the TV denoising problem can be concisely characterized [6]. Suppose the noisy data is yy and the regularization parameter is λλ. If xx is the solution to the TV denoising problem, then it must satisfy

| s ( n ) | λ 2 , n = 0 , , N - 1 | s ( n ) | λ 2 , n = 0 , , N - 1
(29)

where s(n)s(n) is the `cumulative sum' of the residual, i.e.

s ( n ) : = k = 0 n y ( k ) - x ( k ) . s ( n ) : = k = 0 n y ( k ) - x ( k ) .
(30)

The condition Equation 29 is illustrated in Figure 5a for the TV denoising example of Figure 3.

The condition Equation 29 by itself is not sufficient for x(n)x(n) to be the solution to the TV denoising problem. It is further necessary that x(n)x(n) satisfy

d ( n ) > 0 , s ( n ) = λ / 2 d ( n ) < 0 , s ( n ) = - λ / 2 d ( n ) = 0 , | s ( n ) | < λ / 2 d ( n ) > 0 , s ( n ) = λ / 2 d ( n ) < 0 , s ( n ) = - λ / 2 d ( n ) = 0 , | s ( n ) | < λ / 2
(31)

where d(n)d(n) is the first-order difference function of x(n)x(n), i.e.

d ( n ) = x ( n + 1 ) - x ( n ) . d ( n ) = x ( n + 1 ) - x ( n ) .
(32)

The condition Equation 31 is illustrated in Figure 5b. The figure shows (d(n),s(n))(d(n),s(n)) as a scatter plot. It can be seen that this condition requires the points to lie on a curve consisting of three line segments (a `double-L' shape). Notice in the figure that d(n)d(n) is mostly zero, reflecting the sparsity of the derivative of x(n)x(n).

Figure 5: Optimality condition for TV denoising. (a) The cumulative sum s(n)s(n) is bounded by λ/2λ/2. (b) Scatter plot of d(n)d(n) versus s(n)s(n). The points lie on a `double-L' curve.
(a)
Figure 5(a) (Example1_optim.png)
(b)
Figure 5(b) (Example1_optim_2.png)

Conclusion

Total variation (TV) denoising is a method to smooth signals based on a sparse-deriviative signal model. TV denoising is formulated as the minimization of a non-differentiable cost function. Unlike a conventional low-pass filter, the output of the TV denoising `filter' can only be obtained through a numerical algorithm. Total variation denoising is most appropriate for piecewise constant signals, however, it has been modified and extended so as to be effective for more general signals.

Exercises

  1. Reproduce figures like those of the example (using a `blocky' signal). Try different values of λλ. How does the solution change as λλ is increased or decreased?
  2. Compare TV denoising with low-pass filtering (e.g. a Butterworth or FIR filter, etc). Apply each method to the same signal. Plot the denoised/filtered signals using each method and discuss the differences you observe.
  3. Perform TV denoising on a signal that is not `blocky' (which has slopes or oscillatory behavior). You should see `stair-case' artifacts in the denoised signal. Show these artifacts in a figure and explain why they arise.
  4. Is TV denoising linear? (Conventional low-pass filters are linear, e.g. Butterworth filter.) Illustrate that TV denoising satisfies (or does not) the superposition property by performing TV denoising on each of two signals and their sum.
  5. Find a majorizor of the function f(t)=|t|f(t)=|t| of the form
    g(t)=at2+bt+c,g(t)=at2+bt+c,
    (33)
    that coincides with f(t)f(t) at t=tkt=tk. As illustrated in Figure 2, the function g(t)g(t) should satisfy
    g(t)f(t)tR,g(t)f(t)tR,
    (34)
    g(tk)=f(tk).g(tk)=f(tk).
    (35)
  6. Show the solution to problem Equation 24 is given by Equation 25.
  7. For denoising a noisy signal using TV denoising, devise a method or formula to set the regularization parameter λλ. You can assume that the variance σ2σ2 of the noise is known. Show examples of your method.
  8. Explain why DtDt can be implemented in MATLAB by the command
    	DT = @(x) [-x(1); -diff(x); x(end)];
    
  9. Modify the TV denoising MATLAB program so that the matrix FF is not sparse. Measure the run-times of the original and modified programs. Is the sparse version faster? Use a long signal and many iterations to see the difference more clearly.
  10. Second-order TV denoising is based the second-order difference instead of the first-order difference. Modify the algorithm and MATLAB program so that it performs second-order TV denoising. Compare first and second order TV denoising using `blocky' and non-`blocky' signals, and comment on your observations.

References

  1. Anderson, E. and Bai, Z. and Bischof, C. and Demmel, J. and Dongarra, J. and Croz, J. Du and Greenbaum, A. and Hammarling, S. and McKenney, A. and Ostrouchov, S. and Sorensen, D. (1992). LAPACK's user's guide. [http://www.netlib.org/lapack]. SIAM.
  2. Bect, J. and Blanc-Feaud, L. and Aubert, G. and Chambolle, A. (2004). A L1-Unified Variational Framework for Image Restoration. In European Conference on Computer Vision, Lecture Notes in Computer Sciences. (Vol. 3024, pp. 1-13).
  3. Bredies, K. and Kunisch, K. and Pock, T. (2010). Total generalized variation. SIAM J. Imag. Sci., 3(3), 492-526.
  4. Chambolle, A. (2004). An algorithm for total variation minimization and applications. J. of Math. Imaging and Vision, 20, 89-97.
  5. Chambolle, A. and Lions, P.-L. (1997). Image recovery via total variation minimization and related problems. Numerische Mathematik, 76, 167–188.
  6. Condat, L. (2012). A direct algorithm for 1D total variation denoising. [http://hal.archives-ouvertes.fr/]. Technical report. Hal-00675043.
  7. Chan, T. F. and Osher, S. and Shen, J. (2001, February). The Digital TV Filter and Nonlinear Denoising. IEEE Trans. Image Process., 10(2), 231-241.
  8. Figueiredo, M. and Bioucas-Dias, J. and Nowak, R. (2007, December). Majorization-minimization algorithms for wavelet-based image restoration. IEEE Trans. Image Process., 16(12), 2980-2991.
  9. Figueiredo, M. and Bioucas-Dias, J. and Oliveira, J. P. and Nowak, R. D. (2006). On total-variation denoising: A new majorization-minimization algorithm and an experimental comparison with wavalet denoising. In Proc. IEEE Int. Conf. Image Processing.
  10. Hu, Y. and Jacob, M. (2012, May). Higher Degree Total Variation (HDTV) Regularization for Image Recovery. IEEE Trans. Image Process., 21(5), 2559-2571.
  11. Press, W. H. and Teukolsky, S. A. and Vetterling, W. T. and Flannery, B. P. (1992). Numerical recipes in C: the art of scientific computing (2nd ed.). Cambridge University Press.
  12. Rudin, L. and Osher, S. and Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Physica D, 60, 259-268.
  13. Rodriguez, P. and Wohlberg, B. (2009, February). Efficient Minimization Method for a Generalized Total Variation Functional. IEEE Trans. Image Process., 18(2), 322-332.
  14. Selesnick, I. and Bayram, I. (2009). Total Variation Filtering. [http://cnx.org/content/m31292/1.1/]. Connexions Web site.
  15. Wang, Y. and Yang, J. and Yin, W. and Zhang, Y. (2008). A new alternating minimization algorithm for total variation image reconstruction. SIAM J. on Imaging Sciences, 1(3), 248-272.
  16. Zhu, M. and Wright, S. J. and Chan, T. F. (2008). Duality-based algorithms for total-variation-regularized image restoration. J. Comp. Optimization and Applications.

Content actions

Download module as:

PDF | EPUB (?)

What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

Downloading to a reading device

For detailed instructions on how to download this content's EPUB to your specific device, click the "(?)" link.

| More downloads ...

Add module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks