Given the regression model in Equation 1, we can decompose the empirical detail coefficient $\hat{d}_{jk}$ in Equation 3 as
$$\hat{d}_{jk} = \frac{1}{n}\sum_{i=1}^{n} m(x_i)\,\psi_{jk}(x_i) + \frac{1}{n}\sum_{i=1}^{n} \epsilon_i\,\psi_{jk}(x_i) = d_{jk} + \rho_{jk} \tag{9}$$

If the function $m(x)$ admits a sparse wavelet representation, only a few detail coefficients $d_{jk}$ contribute to the signal and are non-negligible. However, every empirical coefficient $\hat{d}_{jk}$ has a nonzero contribution coming from the noise part $\rho_{jk}$.
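This decomposition can be illustrated numerically (a sketch with the Haar wavelet; the step function, sample size, noise level, and jump location are arbitrary choices). The signal part $d_{jk}$ vanishes wherever $m$ is constant on the wavelet's support, while the noise part makes every empirical coefficient nonzero:

```python
import numpy as np

def psi_jk(x, j, k):
    # Haar wavelet, dilated and translated: psi_jk(x) = 2^{j/2} psi(2^j x - k)
    u = 2.0**j * x - k
    return 2.0**(j / 2.0) * (((0 <= u) & (u < 0.5)).astype(float)
                             - ((0.5 <= u) & (u < 1)).astype(float))

n, j = 1024, 4
x = (np.arange(n) + 0.5) / n                  # equispaced design points
m = (x > 0.53).astype(float)                  # a step function: sparse in the Haar basis
eps = 0.1 * np.random.default_rng(0).normal(size=n)

# signal part d_jk and empirical coefficient d_jk + rho_jk, as in Equation 9
d_signal = np.array([np.mean(m * psi_jk(x, j, k)) for k in range(2**j)])
d_hat = np.array([np.mean((m + eps) * psi_jk(x, j, k)) for k in range(2**j)])

# only the coefficient whose support contains the jump (k = 8) is non-negligible,
# but the noise contribution makes every empirical coefficient nonzero
```

Here only `d_signal[8]` is nonzero, because the jump at $0.53$ falls inside the support $[8/16, 9/16)$ of that single wavelet; all entries of `d_hat` are nonzero.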
Note the link between the coefficients $d_{jk}$ in Equation 9 and the theoretical coefficients $d^{\circ}_{jk}$ in Equation 8:
$$d_{jk} = \frac{1}{n}\sum_{i=1}^{n} m(x_i)\,\psi_{jk}(x_i) = \int m(x)\,\psi_{jk}(x)\,dx + O\!\left(\frac{1}{n}\right) = d^{\circ}_{jk} + O\!\left(\frac{1}{n}\right). \tag{10}$$

In words, $d_{jk}$ constitutes a first-order approximation (using the trapezoidal rule) of the integral $d^{\circ}_{jk}$.
For the scaling coefficients $s^{\circ}_{jk}$, it can be proved [23] that the order of accuracy of the trapezoidal rule is equal to $N-1$, where $N$ is the order of the MRA associated with the scaling function.
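The $O(1/n)$ behaviour in Equation 10 can be checked numerically (a sketch using the Haar mother wavelet and $m(x) = x^2$, both arbitrary choices; the exact value of the integral is $-1/4$):

```python
import numpy as np

def psi(x):
    # Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere
    return np.where(x < 0.5, 1.0, -1.0) * ((0 <= x) & (x < 1))

def m(x):
    return x**2

exact = -0.25        # closed form of the integral of m(x) * psi(x) over [0, 1]

errors = []
for n in (2**8, 2**10, 2**12):
    x = np.arange(n) / n                      # equispaced design points
    d = np.mean(m(x) * psi(x))                # d_{jk} at j = k = 0, as in Equation 10
    errors.append(abs(d - exact))

# each 4-fold increase in n divides the error by roughly 4, i.e. the error is O(1/n)
```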
Suppose the noise level is not too high, so that the signal can be distinguished from the noise.
Then, from the sparsity property of the wavelet, only the largest detail coefficients should be included in the wavelet estimator.
Hence, when estimating an unknown function, it makes sense to include only those coefficients that are larger than some specified threshold value $t$:
$$\eta_{H}(\hat{d}_{jk}, t) = \hat{d}_{jk}\,\mathbb{1}\bigl\{|\hat{d}_{jk}| > t\bigr\}. \tag{11}$$

This `keep-or-kill' operation is called hard thresholding, see Figure 1(a).
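Equation 11 translates directly into code (a minimal sketch; the coefficient values and the threshold are illustrative):

```python
import numpy as np

def eta_H(d_hat, t):
    # hard thresholding, Equation 11: keep the coefficient if |d| > t, else kill it
    d_hat = np.asarray(d_hat, dtype=float)
    return np.where(np.abs(d_hat) > t, d_hat, 0.0)

# coefficients below the threshold in absolute value are set to zero,
# the others pass through unchanged
kept = eta_H([0.1, -2.0, 0.6, -0.3], t=0.5)
```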
Now, since each empirical coefficient consists of both a signal part and a noise part, it may be desirable to shrink even the coefficients that are larger than the threshold:
$$\hat{d}^{\,t}_{jk} := \eta_{S}(\hat{d}_{jk}, t) = \operatorname{sign}(\hat{d}_{jk})\,\bigl(|\hat{d}_{jk}| - t\bigr)_{+}. \tag{12}$$

Since the function $\eta_{S}$ is continuous in its first argument, this procedure is called soft thresholding.
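A sketch of Equation 12, showing both the shrinkage of large coefficients and the continuity at the threshold (example values are illustrative):

```python
import numpy as np

def eta_S(d_hat, t):
    # soft thresholding, Equation 12: sign(d) * (|d| - t)_+
    d_hat = np.asarray(d_hat, dtype=float)
    return np.sign(d_hat) * np.maximum(np.abs(d_hat) - t, 0.0)

# unlike hard thresholding, coefficients just above the threshold are mapped
# to values near zero, and surviving coefficients are shrunk towards zero by t
```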
More complex thresholding schemes have been proposed in the literature [2], [6], [13]. They often appear as a compromise between soft and hard thresholding, see Figure 1(b) for an example.
For a given threshold value $t$ and a thresholding scheme $\eta(\cdot)$, the nonlinear wavelet estimator is given by
$$\hat{m}(x) = \sum_{k} \hat{s}_{j_0 k}\,\phi_{j_0 k}(x) + \sum_{j,k} \eta(\hat{d}_{jk}, t)\,\psi_{jk}(x), \tag{13}$$

where $j_0$ denotes the primary resolution level. It indicates the level above which the detail coefficients are being manipulated.
Let now $\hat{d}_j = \{\hat{d}_{jk},\ k = 0, \ldots, 2^j - 1\}$ denote the vector of empirical detail coefficients at level $j$, and define $\hat{s}_j$ similarly.
In practice, a nonlinear wavelet estimator is obtained in three steps.

1. Apply the analyzing (forward) wavelet transform to the observations $\{Y_i\}_{i=1}^{n}$, yielding $\hat{s}_{j_0}$ and $\hat{d}_j$ for $j = j_0, \ldots, J-1$.
2. Manipulate the detail coefficients above the level $j_0$, e.g. by soft-thresholding them.
3. Invert the wavelet transform to produce an estimate of $m$ at the design points: $\{\hat{m}(x_i)\}_{i=1}^{n}$.
If necessary, a continuous estimator $\hat{m}$ can then be constructed by an appropriate interpolation of the estimated $\hat{m}(x_i)$ values [12].
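The three steps can be sketched end-to-end (a minimal sketch with an orthonormal Haar transform; the test signal, noise level, soft thresholding, and the universal threshold $t = \hat{\sigma}\sqrt{2\log n}$ with a MAD estimate of $\hat{\sigma}$ are illustrative choices, not prescribed by the text):

```python
import numpy as np

def haar_analysis(y, j0):
    # step 1: forward orthonormal Haar transform down to level j0
    s, details = np.asarray(y, dtype=float), {}
    J = int(np.log2(len(y)))
    for j in range(J - 1, j0 - 1, -1):
        even, odd = s[0::2], s[1::2]
        details[j] = (even - odd) / np.sqrt(2.0)   # detail coefficients at level j
        s = (even + odd) / np.sqrt(2.0)            # scaling coefficients at level j
    return s, details

def haar_synthesis(s, details):
    # step 3: inverse transform, back to the design points
    for j in sorted(details):
        out = np.empty(2 * len(s))
        out[0::2] = (s + details[j]) / np.sqrt(2.0)
        out[1::2] = (s - details[j]) / np.sqrt(2.0)
        s = out
    return s

rng = np.random.default_rng(1)
n, j0 = 1024, 3
x = np.arange(n) / n
m_true = np.where(x < 0.4, 1.0, -0.5)              # piecewise-constant test signal
y = m_true + 0.3 * rng.normal(size=n)              # noisy observations Y_i

s, details = haar_analysis(y, j0)                  # step 1
J = int(np.log2(n))
sigma_hat = np.median(np.abs(details[J - 1])) / 0.6745   # MAD noise estimate
t = sigma_hat * np.sqrt(2.0 * np.log(n))                 # universal threshold
for j in details:                                  # step 2: soft-threshold the details
    details[j] = np.sign(details[j]) * np.maximum(np.abs(details[j]) - t, 0.0)

m_hat = haar_synthesis(s, details)                 # step 3
# m_hat tracks m_true much more closely than the raw observations y do
```

The Haar transform suits the piecewise-constant test signal; for smooth functions a higher-order wavelet family would be the natural replacement.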
The choice of the primary resolution level in nonlinear wavelet estimation has the same importance as the choice of a particular kernel in local polynomial estimation, i.e., it is not the most important factor. It is common practice to take $j_0 = 2$ or $j_0 = 3$, although a cross-validation determination is of course possible [22].
The selection of a threshold value is much more crucial.
If it is chosen too large, the thresholding operation will kill too many coefficients. Too few coefficients will then be included in the reconstruction, resulting in an oversmoothed estimator. Conversely, a small threshold value will allow many coefficients into the reconstruction, giving a rough, or undersmoothed, estimator. A proper choice of the threshold thus involves a careful balance between smoothness and closeness of fit.
In the case of an orthogonal transform and i.i.d. white noise, the same threshold can be applied to all detail coefficients, since the errors in the wavelet domain are still i.i.d. white noise. However, if the errors are stationary but correlated, or if the transform is biorthogonal, a level-dependent threshold is necessary to obtain optimal results [20], [7]. Finally, in the irregular setting, a level- and location-dependent threshold must be utilized.
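The first claim can be verified directly: for an orthogonal matrix $W$ and noise with $\mathrm{Cov}(\epsilon) = \sigma^2 I$, we have $\mathrm{Cov}(W\epsilon) = \sigma^2 WW^{T} = \sigma^2 I$. A numerical sketch (the random orthogonal matrix from a QR factorization and the sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 64, 20000
W, _ = np.linalg.qr(rng.normal(size=(n, n)))   # a random orthogonal matrix
eps = rng.normal(size=(n, reps))               # columns: draws of i.i.d. N(0, 1) noise
cov = np.cov(W @ eps)                          # empirical covariance after the transform
# cov is close to the identity matrix: the transformed errors are still white
```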
Much effort has been devoted to proposing methods for selecting the threshold. We now review some of the procedures encountered in the literature.