Another probabilistic approach to estimating the components of $x$ uses Relevance Vector Machines (RVMs). An RVM is essentially a Bayesian learning method that produces sparse classification by linearly weighting a small number of fixed basis functions chosen from a large dictionary of potential candidates (for more details, the interested reader may refer to [6], [7]). From the CS perspective, we may view this as a method for determining the elements of a sparse $x$ that linearly weight the basis functions forming the columns of $\Phi$.
The RVM setup employs a hierarchy of priors: first, a Gaussian prior is assigned to each of the $N$ elements of $x$; then a Gamma prior is assigned to the inverse variance $\alpha_i$ of the $i$-th Gaussian prior. Each $\alpha_i$ therefore controls the strength of the prior on its associated weight $x_i$. If $x$ is the sparse vector to be reconstructed, its associated Gaussian prior is given by
$$p(x \mid \alpha) = \prod_{i=1}^{N} \mathcal{N}\!\left(x_i \mid 0, \alpha_i^{-1}\right) \qquad (1)$$

and the Gamma prior on $\alpha$ is written as
$$p(\alpha \mid a, b) = \prod_{i=1}^{N} \Gamma(\alpha_i \mid a, b) \qquad (2)$$

The overall prior on $x$ can be evaluated analytically and is the Student-t distribution, which, with an appropriate choice of $a$ and $b$, can be designed to peak at $x_i = 0$. This encourages the desired solution $x$ to be sparse. The RVM approach can be visualized with a graphical model similar to the one in "Sparse recovery via belief propagation". Using the observed measurements $y$, the posterior density of each $x_i$ is estimated by an iterative algorithm (e.g., Markov chain Monte Carlo (MCMC) methods). For a detailed analysis of the RVM with a measurement noise prior, refer to [2], [7].
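The Student-t form of the marginal prior can be checked numerically: sampling each $\alpha_i$ from the Gamma prior and then $x_i$ from $\mathcal{N}(0, \alpha_i^{-1})$ should reproduce a Student-t density with $2a$ degrees of freedom and scale $\sqrt{b/a}$. A minimal sketch (the values of $a$ and $b$ below are illustrative assumptions, not from the source):

```python
import numpy as np
from scipy import stats

# Hierarchy: alpha_i ~ Gamma(a, rate=b), then x_i | alpha_i ~ N(0, 1/alpha_i).
# Marginalizing alpha_i analytically gives a Student-t on x_i with 2a degrees
# of freedom and scale sqrt(b/a); a and b here are illustrative choices.
rng = np.random.default_rng(0)
a, b = 2.0, 0.5

alpha = rng.gamma(shape=a, scale=1.0 / b, size=200_000)  # rate parameter b
x = rng.normal(0.0, 1.0 / np.sqrt(alpha))                # one x_i per alpha_i

# Compare the empirical density of x near 0 with the analytic Student-t pdf.
analytic = stats.t.pdf(0.0, df=2 * a, scale=np.sqrt(b / a))
empirical = np.mean(np.abs(x) < 0.05) / 0.10             # histogram bin at 0
print(analytic, empirical)
```

Smaller $a$ and $b$ make the marginal heavier-tailed and more sharply peaked at zero, which is what drives the sparsity of the solution.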
Alternatively, we can eliminate the need to set the hyperparameters $a$ and $b$ as follows. Assuming Gaussian measurement noise with mean $0$ and variance $\sigma^2$, we can directly form the marginal log-likelihood of $\alpha$ and maximize it by the EM algorithm (or by direct differentiation) to obtain estimates of $\alpha$:
$$\mathcal{L}(\alpha) = \log p(y \mid \alpha, \sigma^2) = \log \int p(y \mid x, \sigma^2)\, p(x \mid \alpha)\, dx. \qquad (3)$$
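Maximizing (3) leads to the classic sparse Bayesian fixed-point updates for $\alpha$ (Tipping-style type-II maximum likelihood): compute the Gaussian posterior over $x$, then re-estimate each $\alpha_i$ from its posterior mean and variance. A minimal sketch, with problem sizes, noise level, and the pruning threshold chosen as illustrative assumptions:

```python
import numpy as np

# Type-II ML sketch for eq. (3): iterate posterior inference over x and the
# fixed-point update alpha_i <- gamma_i / mu_i^2, gamma_i = 1 - alpha_i * Sigma_ii.
# Sizes, noise level sigma2, and thresholds below are illustrative assumptions.
rng = np.random.default_rng(1)
M, N, sigma2 = 40, 100, 1e-3

Phi = rng.normal(size=(M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[[5, 37, 62]] = [1.0, -0.8, 0.6]                 # sparse ground truth
y = Phi @ x_true + rng.normal(scale=np.sqrt(sigma2), size=M)

alpha = np.ones(N)                                     # precisions on x
for _ in range(200):
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.diag(alpha))  # post. cov
    mu = Sigma @ Phi.T @ y / sigma2                               # post. mean
    gamma = np.clip(1.0 - alpha * np.diag(Sigma), 1e-12, None)    # well-determined factors
    alpha = np.minimum(gamma / (mu ** 2 + 1e-12), 1e10)           # capped for stability

# Large alpha_i prunes weight i; the surviving weights approximate x_true.
support = np.where(alpha < 1e4)[0]
print(support, mu[support])
```

Weights whose $\alpha_i$ diverge are effectively removed from the model, which is how the evidence-maximization view recovers a sparse $x$ without ever fixing $a$ and $b$.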