We now consider a different setting. Suppose
x∈ℝNx∈ℝNx in ℝ^N and we wish to sample
xxx, where taking a sample means the application of a linear functional
λ∈ℝNλ∈ℝNλ in ℝ^N to
xxx. Next, we prescribe a budget of
nnn samples, and consider all linear encoders using
nnn samples. We can write these
nnn linear functionals as an
n×Nn×Nn times N matrix
Φ:ℝN→ℝnΦ:ℝN→ℝnΦ :ℝ^N rightarrow ℝ^n. We then consider a decoder
Δ:ℝn→ℝNΔ:ℝn→ℝNΔ :ℝ^n rightarrow ℝ^N. Our approximation to
xxx is thus
Δ(Φ(x))Δ(Φ(x))Δ { \( {Φ { \( x \)}} \)}.
To make the problem precise, we first pick a measure for distortion:
error
=
‖
x
-
Δ
(
Φ
(
x
)
)
‖
ℓ
p
.
error
=
‖
x
-
Δ
(
Φ
(
x
)
)
‖
ℓ
p
.
We next must make some assumption about
xxx. For example, we can assume that
x
∈
Σ
k
=
{
x
:
x
i
=
0
for
i
∉
Λ
,
♯
Λ
≤
k
},
x
∈
Σ
k
=
{
x
:
x
i
=
0
for
i
∉
Λ
,
♯
Λ
≤
k
},
x in Σ_k ={ lbrace {x : x_i = 0 "for" i notin Λ , ♯ Λ <= k} rbrace}
or
x
∈
ℓ
τ
x∈
ℓ
τ
or
x
∈
w
ℓ
τ
x∈
w
ℓ
τ
We recall our basic problem: a signal
x∈ℝNx∈ℝNx in ℝ^N will be “sampled” or “sensed” by applying the
nnn linear projections represented by the columns of the sampling matrix
Φn×NΦn×NΦ_{n times N}. The resulting measurements are given in the vector
y∈ℝny∈ℝny in ℝ^n, where
y=Φxy=Φxy =Φ x. We will assume that
n<Nn<Nn <N, meaning that in addition to thinking of of the sampling operation
ΦΦΦ as an encoder, we can also view it as a projection to a lower dimensional linear subspace. In either case, we would like the measurements
yyy to preserve as much information about the signal
xxx as possible. To proceed in finding optimal solutions, we must formalize this problem and define how we will measure this information loss.
Moving forward, a critical quantity for us will be the null space of the sampling matrix,
N=N(Φ)={x:Φx=0}N=N(Φ)={x:Φx=0}N =N { \( Φ \)} ={ lbrace {x : Φ x = 0} rbrace}. Because we are trying to take as few measurements as possible, we will assume that we are taking measurements efficiently so that the rows of
ΦΦΦ are all linearly independent. Therefore, assume that
rank(Φ)=nrank(Φ)=n"rank" { \( Φ \)} =n, implying that
NNN has dimension
(N−n)(N−n) \( {N - n} \). The non-trivial null space of
ΦΦΦ means that it is not an invertible mapping. For any measurement vector
yyy we can define the class of all observable signals that would result in the same measurement,
ℱ(y):={x:Φx=y}⊆ℝNℱ(y):={x:Φx=y}⊆ℝNℱ { \( y \)} :={ lbrace {x : Φ x = y} rbrace} subseteq ℝ^N. The class
ℱ(y)ℱ(y)ℱ { \( y \)} can always be written as a sum of a vector in the class and a vector in the null space of the sampling matrix,
ℱ
(
y
)
=
x
0
+
N
ℱ
(
y
)
=
x
0
+
N
, where
x
0
∈
ℱ
(
y
)
x
0
∈
ℱ
(
y
)
x 0 in ℱ { \( y \)}
.
If we have two vectors
x
0
,
x
1
∈
ℱ
(
y
)
x
0
,
x
1
∈
ℱ
(
y
)
,
then by linearity we know that
Φ
(
x
1
−
x
0
)
=
0
Φ
(
x
1
−
x
0
)
=
0
.
This fact implies that
(
x
1
−
x
0
)
∈
N
(
x
1
−
x
0
)
∈
N
,
and consequently that
x
1
∈
x
0
+
N
x
1
∈
x
0
+
N
.
Associated to our encoder
ΦΦΦ, we shall have to describe a decoder
ΔΔΔ, which is a (not necessarily linear) map
Δ:ℝn→ℝNΔ:ℝn→ℝNΔ :ℝ^n rightarrow ℝ^N. This decoder will take the measurements
yyy and try to recover
xxx as closely as possible,
x¯=Δ(Φx)x¯=Δ(Φx){overline x} =Δ { \( {Φ x} \)}. In order to design the best decoder
ΔΔΔ, we must specify our metric for measuring the quality of the estimate
x¯x¯overline x. For the moment we shall always think of taking an optimal decoder for the problem at hand. Later we shall discuss specific and concrete decoders.
To begin our discussion of the efficiency of the enoder
ΦΦΦ we consider the following problem.
We will fix
nnn
and
NNN
and try to find
ΦΦΦ
and a decoder
ΔΔΔ
such that for all input signals in a sparsity class
x
∈
Σ
k
x
∈
Σ
k
x in Σ k
we can get perfect reconstruction,
Δ(Φx)=xΔ(Φx)=xΔ { \( {Φ x} \)} =x. We will be interested in determining what the largest value of
kkk is that we can find such an encoder/decoder pair.
An important role in this problem and later problems of compressed sensing is played by certain submatrices of
Φ
Φ
Φ
.
Given a set
T⊆{1,2,…,N}T⊆{1,2,…,N}T subseteq { lbrace {1 , 2 , dotslow , N} rbrace}, representing a collection of column indices we define the matrix
ΦTΦTΦ_T as the one formed from
ΦΦΦ by using the columns from the set
TTT. The matrix
ΦTΦTΦ_T is a
(n×#(T))(n×#(T)) \( {n times italic "#" { \( T \)}} \) matrix. But sometimes we will also use the same notation
ΦTΦTΦ_T to denote the matrix obtained from
ΦΦΦ by setting all entries not in the columns of
TTT to zero.
We can now state the following theorem.
The following statements are all equivalent for any given
ΦΦΦ:
- There exists a decoder
ΔΔΔ such that
Δ(Φx)=xΔ(Φx)=xΔ { \( {Φ x} \)} =x for all
x
∈
Σ
k
x
∈
Σ
k
x in Σ k
.
-
N
(
Φ
)
∩
Σ
2
k
=
{
0
}
N
(
Φ
)
∩
Σ
2
k
=
{
0
}
N { \( Φ \)} intersection Σ 2 k ={ lbrace 0 rbrace}
.
-
Φ
T
Φ
T
has rank
#(T)#(T)italic "#" { \( T \)} for all
TTT with
#(T)=2k#(T)=2kitalic "#" { \( T \)} =2 k.
-
Φ
T
t
Φ
T
Φ
T
t
Φ
T
is non-singular (i.e., invertible) for all
TTT with
#(T)=2k#(T)=2kitalic "#" { \( T \)} =2 k.
The equivalence of
b
↔
c
↔
d
b
↔
c
↔
d
is simple linear algebra.
First let us prove that
a→ba→ba rightarrow b. Assume
aaa. Suppose that there was a vector
η
∈
N
∩
Σ
2
k
η
∈
N
∩
Σ
2
k
η in N intersection Σ 2 k
. We know that we can write
η
=
x
0
−
x
1
η
=
x
0
−
x
1
η =x 0 - x 1
, where
x0,x1x0,x1x 0 ,x 1 both have support less than
kkk. We could write
ηηη simply as a composite vector with support
2k2k2 k where the first half of
ηηη is
x0x0x 0 and the second half of
ηηη is
x1x1x 1,
η
=
(
x
0
∣
x
1
)
η
=
(
x
0
∣
x
1
)
η ={ \( {x 0 \lline x 1} \)}. We know that
η∈Nη∈Nη in N, which implies that
Φη=0Φη=0Φ η =0. It then follows that
Φ
x
0
=
Φ
x
1
Φ
x
0
=
Φ
x
1
Φ x 0 =Φ x 1, and we have as a consequence that
Δ
Φ
x
0
=
Δ
Φ
x
1
Δ
Φ
x
0
=
Δ
Φ
x
1
Δ Φ x 0 =Δ Φ x 1 and finally that
x
0
=
x
1
x
0
=
x
1
x 0 =x 1. This proves that
η=0η=0η =0 as desired.
Now let us prove that
b→ab→ab rightarrow a. Suppose that we have a measurement
Φx=y∈ℝnΦx=y∈ℝnΦ x =y in ℝ^n, where
x
∈
Σ
k
x
∈
Σ
k
x in Σ k. We will define the decoder
Δ(y)Δ(y)Δ { \( y \)} to be the signal
̄
x
∈
ℱ
(y)
̄
x
∈
ℱ
(y){̄x} in ℱ { \( y \)} with smallest support. Since
xxx has support
kk <= k so will
̄
x
̄
x
̄x. We claim that there is no other
x′∈ℱ(y)∩Σkx′∈ℱ(y)∩Σkx^′ in ℱ { \( y \)} intersection Σ k. Indeed, if
x′x′x^′ existed, then
x−x′∈N∩Σ2kx−x′∈N∩Σ2kx - x^′ in N intersection Σ 2 k. But, the only vector in
N∩Σ2kN∩Σ2kN intersection Σ 2 k is zero, implying that
x=x′x=x′x =x^′. This finally gives us that
Δ(Φx)=xΔ(Φx)=xΔ { \( {Φ x} \)} =x. □
Using the previous theorem, we can turn to the question of finding good encoder/decoder pairs.
Given a fixed
NNN, how large can
kkk be, and what is the best
ΦΦΦ? Given a fixed
nnn, the largest
kkk is
k=⌊n2⌋k=⌊n2⌋k ={⌊ n over 2 ⌋}. Alternatively, we can say that given a fixed
kkk, we need at least
n=2kn=2kn =2 k measurements. Another way to say this is that there exist encoding matrices
Φ2k×NΦ2k×NΦ_{2 k times N} such that any selection of
2k2k2 k columns are linearly independent. Examples are the DFT matrix or the Vandermonde matrix corresponding to interpolation at distinct points
z1,…,zNz1,…,zNz_1 , dotslow ,z_N.