Transforming a signal to a new basis or frame may allow us to represent it more concisely. The resulting compression is useful for reducing data storage and data transmission, which can be quite expensive in some applications. Hence, one might wish to simply transmit the analysis coefficients obtained in our basis or frame expansion instead of the high-dimensional signal itself. In cases where the number of nonzero coefficients is small, we say that we have a sparse representation. Sparse signal models allow us to achieve high rates of compression, and in the case of compressive sensing, we may use the knowledge that our signal is sparse in a known basis or frame to recover the original signal from a small number of measurements. For sparse data, in many cases only the nonzero coefficients need to be stored or transmitted; the rest can be assumed to be zero.
Mathematically, we say that a signal $x$ is $K$-sparse when it has at most $K$ nonzeros, i.e., $\|x\|_0 \le K$. We let

$$\Sigma_K = \{ x : \|x\|_0 \le K \} \qquad (1)$$

denote the set of all $K$-sparse signals. Typically, we will be dealing with signals that are not themselves sparse, but which admit a sparse representation in some basis $\Psi$. In this case we will still refer to $x$ as being $K$-sparse, with the understanding that we can express $x$ as $x = \Psi \alpha$ where $\|\alpha\|_0 \le K$.
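As a minimal sketch of this definition (the vectors, basis, and tolerance below are illustrative assumptions, not from the text), we can check $K$-sparsity by counting entries above a small tolerance, and verify that a signal that is dense in the standard basis can still be $K$-sparse in another orthonormal basis $\Psi$:

```python
import numpy as np

def is_k_sparse(x, K, tol=1e-12):
    """True when ||x||_0 <= K, counting entries with magnitude above tol."""
    return np.count_nonzero(np.abs(x) > tol) <= K

alpha = np.array([0.0, 2.0, 0.0, 0.0, 0.0, -1.0])   # 2-sparse coefficient vector
sparse2 = is_k_sparse(alpha, 2)    # True: two nonzeros
sparse1 = is_k_sparse(alpha, 1)    # False

# A dense x that is nevertheless 2-sparse in a (random orthonormal) basis Psi.
rng = np.random.default_rng(0)
Psi, _ = np.linalg.qr(rng.standard_normal((6, 6)))  # orthonormal columns
x = Psi @ alpha                    # synthesis: x = Psi alpha, dense in pixel/standard basis
alpha_rec = Psi.T @ x              # analysis: recover the sparse coefficients
dense_but_sparse = is_k_sparse(alpha_rec, 2)        # True
```

The tolerance matters in practice: floating-point analysis coefficients are rarely exactly zero, so "sparse" is judged up to a small numerical threshold.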
Sparsity has long been exploited in signal processing and approximation theory for tasks such as compression [1], [7], [9] and denoising [2], and in statistics and learning theory as a method for avoiding overfitting [10]. Sparsity also figures prominently in the theory of statistical estimation and model selection [4], [8], in the study of the human visual system [6], and has been exploited heavily in image processing tasks, since the multiscale wavelet transform [5] provides nearly sparse representations for natural images. Below, we briefly describe some one-dimensional (1D) and two-dimensional (2D) examples.
We will now present an example of three basis expansions that yield different levels of sparsity for the same signal. A simple periodic signal is sampled and represented as a periodic train of weighted impulses (see Figure 1). One can interpret sampling as a basis expansion where the elements of our basis are impulses placed at periodic points along the time axis. We know that in this case, our dual basis consists of sinc functions used to reconstruct our signal from its discrete-time samples. This representation contains many nonzero coefficients, and due to the signal's periodicity, there are many redundant measurements. Representing the signal in the Fourier basis, on the other hand, requires only two nonzero basis vectors, scaled appropriately at the positive and negative frequencies (see Figure 1). Driving the number of coefficients needed even lower, we may apply the discrete cosine transform (DCT) to our signal, thereby requiring only a single nonzero coefficient in our expansion (see Figure 1). The DCT is given by

$$X_k = \sum_{n=0}^{N-1} x_n \cos\!\left( \frac{\pi}{N} \left( n + \frac{1}{2} \right) k \right), \qquad k = 0, \dots, N-1,$$

where $x_n$ is the input signal and $N$ is the length of the transform.
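To make the comparison concrete, the sketch below (an illustration of the idea, not code from the text; the signal lengths, frequency index, and threshold are assumptions) computes the unnormalized DCT-II above for a cosine matched to a single DCT basis vector, and counts the significant coefficients; it also shows that a plain cosine has exactly two nonzero Fourier (FFT) coefficients:

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II: X_k = sum_n x_n cos(pi/N (n + 1/2) k)."""
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.cos(np.pi / N * (n + 0.5) * k))
                     for k in range(N)])

N, k0 = 64, 5
n = np.arange(N)

# A signal aligned with a single DCT basis vector: one nonzero DCT coefficient.
x = np.cos(np.pi / N * (n + 0.5) * k0)
X = dct_ii(x)
dct_nonzeros = np.count_nonzero(np.abs(X) > 1e-8)   # -> 1

# A plain cosine: two nonzero Fourier bins (positive and negative frequency).
y = np.cos(2 * np.pi * k0 * n / N)
Y = np.fft.fft(y)
fft_nonzeros = np.count_nonzero(np.abs(Y) > 1e-8)   # -> 2
```

By the orthogonality of the DCT basis vectors, the single surviving coefficient is $X_{k_0} = N/2$; every other coefficient is zero up to floating-point error.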
This same concept can be extended to 2D signals as well. For instance, a binary picture of a nighttime sky is sparse in the standard pixel domain because most of the pixels are zero-valued black pixels. Likewise, natural images are characterized by large smooth or textured regions and relatively few sharp edges. Signals with this structure are known to be very nearly sparse when represented using a multiscale wavelet transform [5]. The wavelet transform consists of recursively dividing the image into its low- and high-frequency components. The lowest frequency components provide a coarse-scale approximation of the image, while the higher frequency components fill in the detail and resolve edges. What we see when we compute a wavelet transform of a typical natural image, as shown in Figure 2, is that most coefficients are very small. Hence, we can obtain a good approximation of the signal by setting the small coefficients to zero, or thresholding the coefficients, to obtain a $K$-sparse representation. When measuring the approximation error using an $\ell_p$ norm, this procedure yields the best $K$-term approximation of the original signal, i.e., the best approximation of the signal using only $K$ basis elements.
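A hedged sketch of this thresholding step (the power-law coefficient model and the choice $K = 50$ are illustrative assumptions standing in for the wavelet coefficients of a natural image): in an orthonormal basis, keeping the $K$ largest-magnitude coefficients and zeroing the rest gives the best $K$-term approximation in the $\ell_2$ sense, with error equal to the norm of the discarded coefficients.

```python
import numpy as np

def best_k_term(coeffs, K):
    """Keep the K largest-magnitude coefficients; zero out the rest."""
    out = np.zeros_like(coeffs)
    idx = np.argsort(np.abs(coeffs))[-K:]   # indices of the K largest magnitudes
    out[idx] = coeffs[idx]
    return out

rng = np.random.default_rng(0)
# Synthetic "compressible" coefficients with power-law decay (hypothetical
# stand-in for the sorted wavelet coefficients of a natural image).
alpha = rng.standard_normal(1024) * (np.arange(1, 1025) ** -1.0)

approx = best_k_term(alpha, K=50)
err = np.linalg.norm(alpha - approx)    # l2 error of the best 50-term approximation
```

Because the basis is orthonormal, `err` is exactly the norm of the coefficients that were zeroed out, which is small when the coefficients decay quickly.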
Sparsity results from this decomposition because in most natural images most pixel values vary little from their neighbors. Regions with little contrast variation can be represented with low-frequency wavelets, which are created by stretching a mother wavelet and thus expanding it in space. Discontinuities, or edges in the picture, on the other hand, require high-frequency wavelets, which are created by compressing a mother wavelet. At the same time, such transitions are generally limited in space, matching the spatial localization of the compressed high-frequency wavelet. See "Compressible signals" for an example.

[1] DeVore, R. (1998). Nonlinear approximation. Acta Numerica, 7, 51–150.

[2] Donoho, D. (1995). De-noising by soft-thresholding. IEEE Trans. Inform. Theory, 41(3), 613–627.

[3] Elad, M. (2010). Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. New York, NY: Springer.

[4] Hastie, T., Tibshirani, R., and Friedman, J. (2001). The Elements of Statistical Learning. New York, NY: Springer.

[5] Mallat, S. (1999). A Wavelet Tour of Signal Processing. San Diego, CA: Academic Press.

[6] Olshausen, B. and Field, D. (1996). Emergence of simple-cell receptive field properties by learning a sparse representation. Nature, 381, 607–609.

[7] Pennebaker, W. and Mitchell, J. (1993). JPEG Still Image Data Compression Standard. Van Nostrand Reinhold.

[8] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Royal Statist. Soc. B, 58(1), 267–288.

[9] Taubman, D. and Marcellin, M. (2001). JPEG 2000: Image Compression Fundamentals, Standards and Practice. Kluwer.

[10] Vapnik, V. (1999). The Nature of Statistical Learning Theory. New York, NY: Springer-Verlag.