Connexions

You are here: Home » Content » Introduction to Compressive Sensing » Instance-optimal guarantees revisited

Recently Viewed

This feature requires Javascript to be enabled.

Inside Collection (Course):

Course by: Marco F. Duarte. E-mail the author

Instance-optimal guarantees revisited

Module by: Mark A. Davenport. E-mail the author

Summary: In this module we demonstrate the difficulty of obtaining instance-optimal guarantees in the L2 norm. We then show that it is much easier to obtain such guarantees in the probabilistic setting.

We now briefly return to the noise-free setting to take a closer look at instance-optimal guarantees for recovering non-sparse signals. To begin, recall that in Theorem 1 from "Noise-free signal recovery" we bounded the 22-norm of the reconstruction error of

x ^ = arg min z z 1 subject to z B ( y ) . x ^ = arg min z z 1 subject to z B ( y ) .
(1)

as

x ^ - x 2 C 0 σ K ( x ) 1 / K x ^ - x 2 C 0 σ K ( x ) 1 / K
(2)

when B(y)={z:Φz=y}B(y)={z:Φz=y}. One can generalize this result to measure the reconstruction error using the pp-norm for any p[1,2]p[1,2]. For example, by a slight modification of these arguments, one can also show that x^-x1C0σK(x)1x^-x1C0σK(x)1 (see [1]). This leads us to ask whether we might replace the bound for the 22 error with a result of the form x^-x2CσK(x)2x^-x2CσK(x)2. Unfortunately, obtaining such a result requires an unreasonably large number of measurements, as quantified by the following theorem of [3].

Theorem 1: (Theorem 5.1 of [3])

Suppose that ΦΦ is an M×NM×N matrix and that Δ:RMRNΔ:RMRN is a recovery algorithm that satisfies

x - Δ ( Φ x ) 2 C σ K ( x ) 2 x - Δ ( Φ x ) 2 C σ K ( x ) 2
(3)

for some K1K1, then M>1-1-1/C2NM>1-1-1/C2N.

Proof

We begin by letting hRNhRN denote any vector in N(Φ)N(Φ). We write h=hΛ+hΛch=hΛ+hΛc where ΛΛ is an arbitrary set of indices satisfying |Λ|K|Λ|K. Set x=hΛcx=hΛc, and note that Φx=ΦhΛc=Φh-ΦhΛ=-ΦhΛΦx=ΦhΛc=Φh-ΦhΛ=-ΦhΛ since hN(Φ)hN(Φ). Since hΛΣKhΛΣK, Equation 3 implies that Δ(Φx)=Δ(-ΦhΛ)=-hΛΔ(Φx)=Δ(-ΦhΛ)=-hΛ. Hence, x-Δ(Φx)2=hΛc-(-hΛ)2=h2x-Δ(Φx)2=hΛc-(-hΛ)2=h2. Furthermore, we observe that σK(x)2x2σK(x)2x2, since by definition σK(x)2x-x˜2σK(x)2x-x˜2 for all x˜ΣKx˜ΣK, including x˜=0x˜=0. Thus h2ChΛc2h2ChΛc2. Since h22=hΛ22+hΛc22h22=hΛ22+hΛc22, this yields

h Λ 2 2 = h 2 2 - h Λ c 2 2 h 2 2 - 1 C 2 h 2 2 = 1 - 1 C 2 h 2 2 . h Λ 2 2 = h 2 2 - h Λ c 2 2 h 2 2 - 1 C 2 h 2 2 = 1 - 1 C 2 h 2 2 .
(4)

This must hold for any vector hN(Φ)hN(Φ) and for any set of indices ΛΛ such that |Λ|K|Λ|K. In particular, let {vi}i=1N-M{vi}i=1N-M be an orthonormal basis for N(Φ)N(Φ), and define the vectors {hi}i=1N{hi}i=1N as follows:

h j = i = 1 N - M v i ( j ) v i . h j = i = 1 N - M v i ( j ) v i .
(5)

We note that hj=i=1N-Mej,vivihj=i=1N-Mej,vivi where ejej denotes the vector of all zeros except for a 1 in the jj-th entry. Thus we see that hj=PNejhj=PNej where PNPN denotes an orthogonal projection onto N(Φ)N(Φ). Since PNej22+PNej22=ej22=1PNej22+PNej22=ej22=1, we have that hj21hj21. Thus, by setting Λ={j}Λ={j} for hjhj we observe that

i = 1 N - M | v i ( j ) | 2 2 = | h j ( j ) | 2 1 - 1 C 2 h j 2 2 1 - 1 C 2 . i = 1 N - M | v i ( j ) | 2 2 = | h j ( j ) | 2 1 - 1 C 2 h j 2 2 1 - 1 C 2 .
(6)

Summing over j=1,2,...,Nj=1,2,...,N, we obtain

N 1 - 1 / C 2 j = 1 N i = 1 N - M | v i ( j ) | 2 = i = 1 N - M j = 1 N | v i ( j ) | 2 = i = 1 N - M v i 2 2 = N - M , N 1 - 1 / C 2 j = 1 N i = 1 N - M | v i ( j ) | 2 = i = 1 N - M j = 1 N | v i ( j ) | 2 = i = 1 N - M v i 2 2 = N - M ,
(7)

and thus M1-1-1/C2NM1-1-1/C2N as desired.

Thus, if we want a bound of the form Equation 3 that holds for all signals xx with a constant C1C1, then regardless of what recovery algorithm we use we will need to take MNMN measurements. However, in a sense this result is overly pessimistic, and we will now see that the results we just established for signal recovery in noise can actually allow us to overcome this limitation by essentially treating the approximation error as noise.

Towards this end, notice that all the results concerning 11 minimization stated thus far are deterministic instance-optimal guarantees that apply simultaneously to all xx given any matrix that satisfies the restricted isometry property (RIP). This is an important theoretical property, but as noted in "Matrices that satisfy the RIP", in practice it is very difficult to obtain a deterministic guarantee that the matrix ΦΦ satisfies the RIP. In particular, constructions that rely on randomness are only known to satisfy the RIP with high probability. As an example, recall Theorem 1 from "Matrices that satisfy the RIP", which opens the door to slightly weaker results that hold only with high probability.

Theorem 2

Fix δ(0,1)δ(0,1). Let ΦΦ be an M×NM×N random matrix whose entries φijφij are i.i.d. with φijφij drawn according to a strictly sub-Gaussian distribution with c2=1/Mc2=1/M. If

M κ 1 K log N K , M κ 1 K log N K ,
(8)

then ΦΦ satisfies the RIP of order KK with the prescribed δδ with probability exceeding 1-2e-κ2M1-2e-κ2M, where κ1κ1 is arbitrary and κ2=δ2/2κ*-log(42e/δ)/κ1κ2=δ2/2κ*-log(42e/δ)/κ1.

Even within the class of probabilistic results, there are two distinct flavors. The typical approach is to combine a probabilistic construction of a matrix that will satisfy the RIP with high probability with the previous results in this chapter. This yields a procedure that, with high probability, will satisfy a deterministic guarantee applying to all possible signals xx. A weaker kind of result is one that states that given a signal xx, we can draw a random matrix ΦΦ and with high probability expect certain performance for that signal xx. This type of guarantee is sometimes called instance-optimal in probability. The distinction is essentially whether or not we need to draw a new random ΦΦ for each signal xx. This may be an important distinction in practice, but if we assume for the moment that it is permissible to draw a new matrix ΦΦ for each xx, then we can see that Theorem 1 may be somewhat pessimistic. In order to establish our main result we will rely on the fact, previously used in "Matrices that satisfy the RIP", that sub-Gaussian matrices preserve the norm of an arbitrary vector with high probability. Specifically, a slight modification of Corollary 1 from "Matrices that satisfy the RIP" shows that for any xRNxRN, if we choose ΦΦ according to the procedure in Theorem 2, then we also have that

P Φ x 2 2 2 x 2 2 exp - κ 3 M P Φ x 2 2 2 x 2 2 exp - κ 3 M
(9)

with κ3=4/κ*κ3=4/κ*. Using this we obtain the following result.

Theorem 3

Let xRNxRN be fixed. Set δ2K<2-1δ2K<2-1 Suppose that ΦΦ is an M×NM×N sub-Gaussian random matrix with Mκ1KlogN/KMκ1KlogN/K. Suppose we obtain measurements of the form y=Φxy=Φx. Set ϵ=2σK(x)2ϵ=2σK(x)2. Then with probability exceeding 1-2exp(-κ2M)-exp(-κ3M)1-2exp(-κ2M)-exp(-κ3M), when B(y)={z:Φz-y2ϵ}B(y)={z:Φz-y2ϵ}, the solution x^x^ to Equation 1 obeys

x ^ - x 2 8 1 + δ 2 K - ( 1 + 2 ) δ 2 K 1 - ( 1 + 2 ) δ 2 K σ K ( x ) 2 . x ^ - x 2 8 1 + δ 2 K - ( 1 + 2 ) δ 2 K 1 - ( 1 + 2 ) δ 2 K σ K ( x ) 2 .
(10)

Proof

First we recall that, as noted above, from Theorem 2 we have that ΦΦ will satisfy the RIP of order 2K2K with probability at least 1-2exp(-κ2M)1-2exp(-κ2M). Next, let ΛΛ denote the index set corresponding to the KK entries of xx with largest magnitude and write x=xΛ+xΛcx=xΛ+xΛc. Since xΛΣKxΛΣK, we can write Φx=ΦxΛ+ΦxΛc=ΦxΛ+eΦx=ΦxΛ+ΦxΛc=ΦxΛ+e. If ΦΦ is sub-Gaussian then from Lemma 2 from "Sub-Gaussian random variables" we have that ΦxΛcΦxΛc is also sub-Gaussian, and one can apply Equation 9 to obtain that with probability at least 1-exp(-κ3M)1-exp(-κ3M), ΦxΛc22xΛc2=2σK(x)2ΦxΛc22xΛc2=2σK(x)2. Thus, applying the union bound we have that with probability exceeding 1-2exp(-κ2M)-exp(-κ3M)1-2exp(-κ2M)-exp(-κ3M), we satisfy the necessary conditions to apply Theorem 1 from "Signal recovery in noise" to xΛxΛ, in which case σK(xΛ)1=0σK(xΛ)1=0 and hence

x ^ - x Λ 2 2 C 2 σ K ( x ) 2 . x ^ - x Λ 2 2 C 2 σ K ( x ) 2 .
(11)

From the triangle inequality we thus obtain

x ^ - x 2 = x ^ - x Λ + x Λ - x 2 x ^ - x Λ 2 + x Λ - x 2 2 C 2 + 1 σ K ( x ) 2 x ^ - x 2 = x ^ - x Λ + x Λ - x 2 x ^ - x Λ 2 + x Λ - x 2 2 C 2 + 1 σ K ( x ) 2
(12)

which establishes the theorem.

Thus, although it is not possible to achieve a deterministic guarantee of the form in Equation 3 without taking a prohibitively large number of measurements, it is possible to show that such performance guarantees can hold with high probability while simultaneously taking far fewer measurements than would be suggested by Theorem 1. Note that the above result applies only to the case where the parameter is selected correctly, which requires some limited knowledge of xx, namely σK(x)2σK(x)2. In practice this limitation can easily be overcome through a parameter selection technique such as cross-validation [5], but there also exist more intricate analyses of 11 minimization that show it is possible to obtain similar performance without requiring an oracle for parameter selection [6]. Note that Theorem 3 can also be generalized to handle other measurement matrices and to the case where xx is compressible rather than sparse. Moreover, this proof technique is applicable to a variety of the greedy algorithms described later in this course that do not require knowledge of the noise level to establish similar results [2], [4].

References

1. Candès, E. (2008). The restricted isometry property and its implications for compressed sensing. Comptes rendus de l'Académie des Sciences, Série I, 346(9-10), 589–592.
2. Cohen, A. and Dahmen, W. and DeVore, R. (2008, Jun.). Instance optimal decoding by thresholding in compressed sensing. In Int. Conf. Harmonic Analysis and Partial Differential Equations. Madrid, Spain
3. Cohen, A. and Dahmen, W. and DeVore, R. (2009). Compressed Sensing and Best -term Approximation. J. Amer. Math. Soc., 22(1), 211–231.
4. Needell, D. and Tropp, J. (2009). CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal., 26(3), 301–321.
5. Ward, R. (2009). Compressive sensing with cross validation. IEEE Trans. Inform. Theory, 55(12), 5773–5782.
6. Wojtaszczyk, P. (2010). Stability and instance optimality for Gaussian measurements in compressed sensing. Found. Comput. Math., 10(1), 1–13.

Content actions

PDF | EPUB (?)

What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

PDF | EPUB (?)

What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

Collection to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags?

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks

Module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags?

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks