We now briefly return to the noise-free setting to take a closer look at instance-optimal guarantees for recovering non-sparse signals. To begin, recall that in Theorem 1 from "Noise-free signal recovery" we bounded the ℓ2ℓ2-norm of the reconstruction error of
x
^
=
arg
min
z
z
1
subject to
z
∈
B
(
y
)
.
x
^
=
arg
min
z
z
1
subject to
z
∈
B
(
y
)
.
(1)
as
x
^
-
x
2
≤
C
0
σ
K
(
x
)
1
/
K
x
^
-
x
2
≤
C
0
σ
K
(
x
)
1
/
K
(2)
when B(y)={z:Φz=y}B(y)={z:Φz=y}. One can generalize this result to measure the reconstruction error using the ℓpℓp-norm for any p∈[1,2]p∈[1,2]. For example, by a slight modification of these arguments, one can also show that x^-x1≤C0σK(x)1x^-x1≤C0σK(x)1 (see [1]). This leads us to ask whether we might replace the bound for the ℓ2ℓ2 error with a result of the form x^-x2≤CσK(x)2x^-x2≤CσK(x)2. Unfortunately, obtaining such a result requires an unreasonably large number of measurements, as quantified by the following theorem of [3].
Suppose that ΦΦ is an M×NM×N matrix and that Δ:RM→RNΔ:RM→RN is a recovery algorithm that satisfies
x
-
Δ
(
Φ
x
)
2
≤
C
σ
K
(
x
)
2
x
-
Δ
(
Φ
x
)
2
≤
C
σ
K
(
x
)
2
(3)for some K≥1K≥1, then M>1-1-1/C2NM>1-1-1/C2N.
We begin by letting h∈RNh∈RN denote any vector in N(Φ)N(Φ). We write h=hΛ+hΛch=hΛ+hΛc where ΛΛ is an arbitrary set of indices satisfying |Λ|≤K|Λ|≤K. Set x=hΛcx=hΛc, and note that Φx=ΦhΛc=Φh-ΦhΛ=-ΦhΛΦx=ΦhΛc=Φh-ΦhΛ=-ΦhΛ since h∈N(Φ)h∈N(Φ). Since hΛ∈ΣKhΛ∈ΣK, Equation 3 implies that Δ(Φx)=Δ(-ΦhΛ)=-hΛΔ(Φx)=Δ(-ΦhΛ)=-hΛ. Hence, x-Δ(Φx)2=hΛc-(-hΛ)2=h2x-Δ(Φx)2=hΛc-(-hΛ)2=h2. Furthermore, we observe that σK(x)2≤x2σK(x)2≤x2, since by definition σK(x)2≤x-x˜2σK(x)2≤x-x˜2 for all x˜∈ΣKx˜∈ΣK, including x˜=0x˜=0. Thus h2≤ChΛc2h2≤ChΛc2. Since h22=hΛ22+hΛc22h22=hΛ22+hΛc22, this yields
h
Λ
2
2
=
h
2
2
-
h
Λ
c
2
2
≤
h
2
2
-
1
C
2
h
2
2
=
1
-
1
C
2
h
2
2
.
h
Λ
2
2
=
h
2
2
-
h
Λ
c
2
2
≤
h
2
2
-
1
C
2
h
2
2
=
1
-
1
C
2
h
2
2
.
(4)This must hold for any vector h∈N(Φ)h∈N(Φ) and for any set of indices ΛΛ such that |Λ|≤K|Λ|≤K. In particular, let {vi}i=1N-M{vi}i=1N-M be an orthonormal basis for N(Φ)N(Φ), and define the vectors {hi}i=1N{hi}i=1N as follows:
h
j
=
∑
i
=
1
N
-
M
v
i
(
j
)
v
i
.
h
j
=
∑
i
=
1
N
-
M
v
i
(
j
)
v
i
.
(5)We note that hj=∑i=1N-M〈ej,vi〉vihj=∑i=1N-M〈ej,vi〉vi where ejej denotes the vector of all zeros except for a 1 in the jj-th entry. Thus we see that hj=PNejhj=PNej where PNPN denotes an orthogonal projection onto N(Φ)N(Φ). Since PNej22+PN⊥ej22=ej22=1PNej22+PN⊥ej22=ej22=1, we have that hj2≤1hj2≤1. Thus, by setting Λ={j}Λ={j} for hjhj we observe that
∑
i
=
1
N
-
M
|
v
i
(
j
)
|
2
2
=
|
h
j
(
j
)
|
2
≤
1
-
1
C
2
h
j
2
2
≤
1
-
1
C
2
.
∑
i
=
1
N
-
M
|
v
i
(
j
)
|
2
2
=
|
h
j
(
j
)
|
2
≤
1
-
1
C
2
h
j
2
2
≤
1
-
1
C
2
.
(6)Summing over j=1,2,...,Nj=1,2,...,N, we obtain
N
1
-
1
/
C
2
≥
∑
j
=
1
N
∑
i
=
1
N
-
M
|
v
i
(
j
)
|
2
=
∑
i
=
1
N
-
M
∑
j
=
1
N
|
v
i
(
j
)
|
2
=
∑
i
=
1
N
-
M
v
i
2
2
=
N
-
M
,
N
1
-
1
/
C
2
≥
∑
j
=
1
N
∑
i
=
1
N
-
M
|
v
i
(
j
)
|
2
=
∑
i
=
1
N
-
M
∑
j
=
1
N
|
v
i
(
j
)
|
2
=
∑
i
=
1
N
-
M
v
i
2
2
=
N
-
M
,
(7)and thus M≥1-1-1/C2NM≥1-1-1/C2N as desired.
Thus, if we want a bound of the form Equation 3 that holds for all signals xx with a constant C≈1C≈1, then regardless of what recovery algorithm we use we will need to take M≈NM≈N measurements. However, in a sense this result is overly pessimistic, and we will now see that the results we just established for signal recovery in noise can actually allow us to overcome this limitation by essentially treating the approximation error as noise.
Towards this end, notice that all the results concerning ℓ1ℓ1 minimization stated thus far are deterministic instance-optimal guarantees that apply simultaneously to all xx given any matrix that satisfies the restricted isometry property (RIP). This is an important theoretical property, but as noted in "Matrices that satisfy the RIP", in practice it is very difficult to obtain a deterministic guarantee that the matrix ΦΦ satisfies the RIP. In particular, constructions that rely on randomness are only known to satisfy the RIP with high probability. As an example, recall Theorem 1 from "Matrices that satisfy the RIP", which opens the door to slightly weaker results that hold only with high probability.
Fix δ∈(0,1)δ∈(0,1). Let ΦΦ be an M×NM×N random matrix whose entries φijφij are i.i.d. with φijφij drawn according to a strictly sub-Gaussian distribution with c2=1/Mc2=1/M. If
M
≥
κ
1
K
log
N
K
,
M
≥
κ
1
K
log
N
K
,
(8)then ΦΦ satisfies the RIP of order KK with the prescribed δδ with probability exceeding 1-2e-κ2M1-2e-κ2M, where κ1κ1 is arbitrary and κ2=δ2/2κ*-log(42e/δ)/κ1κ2=δ2/2κ*-log(42e/δ)/κ1.
Even within the class of probabilistic results, there are two distinct flavors. The typical approach is to combine a probabilistic construction of a matrix that will satisfy the RIP with high probability with the previous results in this chapter. This yields a procedure that, with high probability, will satisfy a deterministic guarantee applying to all possible signals xx. A weaker kind of result is one that states that given a signal xx, we can draw a random matrix ΦΦ and with high probability expect certain performance for that signal xx. This type of guarantee is sometimes called instance-optimal in probability. The distinction is essentially whether or not we need to draw a new random ΦΦ for each signal xx. This may be an important distinction in practice, but if we assume for the moment that it is permissible to draw a new matrix ΦΦ for each xx, then we can see that Theorem 1 may be somewhat pessimistic. In order to establish our main result we will rely on the fact, previously used in "Matrices that satisfy the RIP", that sub-Gaussian matrices preserve the norm of an arbitrary vector with high probability. Specifically, a slight modification of Corollary 1 from "Matrices that satisfy the RIP" shows that for any x∈RNx∈RN, if we choose ΦΦ according to the procedure in Theorem 2, then we also have that
P
∥
Φ
x
∥
2
2
≥
2
∥
x
∥
2
2
≤
exp
-
κ
3
M
P
∥
Φ
x
∥
2
2
≥
2
∥
x
∥
2
2
≤
exp
-
κ
3
M
(9)
with κ3=4/κ*κ3=4/κ*. Using this we obtain the following result.
Let x∈RNx∈RN be fixed. Set δ2K<2-1δ2K<2-1 Suppose that ΦΦ is an M×NM×N sub-Gaussian random matrix with M≥κ1KlogN/KM≥κ1KlogN/K. Suppose we obtain measurements of the form y=Φxy=Φx. Set ϵ=2σK(x)2ϵ=2σK(x)2. Then with probability exceeding 1-2exp(-κ2M)-exp(-κ3M)1-2exp(-κ2M)-exp(-κ3M), when B(y)={z:Φz-y2≤ϵ}B(y)={z:Φz-y2≤ϵ}, the solution x^x^ to Equation 1 obeys
x
^
-
x
2
≤
8
1
+
δ
2
K
-
(
1
+
2
)
δ
2
K
1
-
(
1
+
2
)
δ
2
K
σ
K
(
x
)
2
.
x
^
-
x
2
≤
8
1
+
δ
2
K
-
(
1
+
2
)
δ
2
K
1
-
(
1
+
2
)
δ
2
K
σ
K
(
x
)
2
.
(10)First we recall that, as noted above, from Theorem 2 we have that ΦΦ will satisfy the RIP of order 2K2K with probability at least 1-2exp(-κ2M)1-2exp(-κ2M). Next, let ΛΛ denote the index set corresponding to the KK entries of xx with largest magnitude and write x=xΛ+xΛcx=xΛ+xΛc. Since xΛ∈ΣKxΛ∈ΣK, we can write Φx=ΦxΛ+ΦxΛc=ΦxΛ+eΦx=ΦxΛ+ΦxΛc=ΦxΛ+e. If ΦΦ is sub-Gaussian then from Lemma 2 from "Sub-Gaussian random variables" we have that ΦxΛcΦxΛc is also sub-Gaussian, and one can apply Equation 9 to obtain that with probability at least 1-exp(-κ3M)1-exp(-κ3M), ΦxΛc2≤2xΛc2=2σK(x)2ΦxΛc2≤2xΛc2=2σK(x)2. Thus, applying the union bound we have that with probability exceeding 1-2exp(-κ2M)-exp(-κ3M)1-2exp(-κ2M)-exp(-κ3M), we satisfy the necessary conditions to apply Theorem 1 from "Signal recovery in noise" to xΛxΛ, in which case σK(xΛ)1=0σK(xΛ)1=0 and hence
x
^
-
x
Λ
2
≤
2
C
2
σ
K
(
x
)
2
.
x
^
-
x
Λ
2
≤
2
C
2
σ
K
(
x
)
2
.
(11)From the triangle inequality we thus obtain
x
^
-
x
2
=
x
^
-
x
Λ
+
x
Λ
-
x
2
≤
x
^
-
x
Λ
2
+
x
Λ
-
x
2
≤
2
C
2
+
1
σ
K
(
x
)
2
x
^
-
x
2
=
x
^
-
x
Λ
+
x
Λ
-
x
2
≤
x
^
-
x
Λ
2
+
x
Λ
-
x
2
≤
2
C
2
+
1
σ
K
(
x
)
2
(12)which establishes the theorem.
Thus, although it is not possible to achieve a deterministic guarantee of the form in Equation 3 without taking a prohibitively large number of measurements, it is possible to show that such performance guarantees can hold with high probability while simultaneously taking far fewer measurements than would be suggested by Theorem 1. Note that the above result applies only to the case where the parameter is selected correctly, which requires some limited knowledge of xx, namely σK(x)2σK(x)2. In practice this limitation can easily be overcome through a parameter selection technique such as cross-validation [5], but there also exist more intricate analyses of ℓ1ℓ1 minimization that show it is possible to obtain similar performance without requiring an oracle for parameter selection [6]. Note that Theorem 3 can also be generalized to handle other measurement matrices and to the case where xx is compressible rather than sparse. Moreover, this proof technique is applicable to a variety of the greedy algorithms described later in this course that do not require knowledge of the noise level to establish similar results [2], [4].
-
Candès, E. (2008). The restricted isometry property and its implications for compressed sensing. Comptes rendus de l'Académie des Sciences, Série I, 346(9-10), 589–592.
-
Cohen, A. and Dahmen, W. and DeVore, R. (2008, Jun.). Instance optimal decoding by thresholding in compressed sensing. In Int. Conf. Harmonic Analysis and Partial Differential Equations. Madrid, Spain
-
Cohen, A. and Dahmen, W. and DeVore, R. (2009). Compressed Sensing and Best -term Approximation. J. Amer. Math. Soc., 22(1), 211–231.
-
Needell, D. and Tropp, J. (2009). CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal., 26(3), 301–321.
-
Ward, R. (2009). Compressive sensing with cross validation. IEEE Trans. Inform. Theory, 55(12), 5773–5782.
-
Wojtaszczyk, P. (2010). Stability and instance optimality for Gaussian measurements in compressed sensing. Found. Comput. Math., 10(1), 1–13.