# Nonparametric regression with wavelets

Module by: Veronique Delouille.

In this section, we consider only real-valued wavelet functions that form an orthogonal basis, hence $\phi \equiv \tilde{\phi}$ and $\psi \equiv \tilde{\psi}$. We saw in "Orthogonal Bases from Multiresolution analysis and wavelets" how a given function belonging to $L^2(\mathbb{R})$ can be represented as a wavelet series. Here, we explain how to use a wavelet basis to construct a nonparametric estimator for the regression function $m$ in the model

$$
Y_i = m(x_i) + \epsilon_i, \qquad i = 1, \dots, n, \qquad n = 2^J,\ J \in \mathbb{N}, \tag{1}
$$

where $x_i = i/n$ are equispaced design points and the errors are i.i.d. Gaussian, $\epsilon_i \sim N(0, \sigma_\epsilon^2)$.
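
A minimal simulation sketch of the model in Equation 1. The helper `simulate_regression` and the test function `step` (a jump at $x = 0.3$) are hypothetical and used only for illustration; the sketch draws $n = 2^J$ observations at $x_i = i/n$ with i.i.d. $N(0, \sigma_\epsilon^2)$ errors.

```python
import random


def simulate_regression(m, J, sigma, seed=0):
    """Draw one sample from Equation 1: Y_i = m(x_i) + eps_i, i = 1, ..., n.

    Uses n = 2**J equispaced design points x_i = i / n and i.i.d. N(0, sigma**2) errors.
    """
    rng = random.Random(seed)
    n = 2 ** J
    x = [i / n for i in range(1, n + 1)]
    y = [m(xi) + rng.gauss(0.0, sigma) for xi in x]
    return x, y


def step(x):
    """Hypothetical regression function with a jump at x = 0.3 (illustration only)."""
    return 1.0 if x > 0.3 else 0.0


x, y = simulate_regression(step, J=6, sigma=0.1)
print(len(x), x[0], x[-1])  # 64 0.015625 1.0
```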

A wavelet estimator can be linear or nonlinear. The linear wavelet estimator projects the data onto a coarse-level space; it is a kernel-type estimator, see "Linear smoothing with wavelets". Another way to estimate $m$ is to detect which detail coefficients convey the important information about $m$ and to set all the other coefficients to zero. This yields a nonlinear wavelet estimator, as described in "Nonlinear smoothing with wavelets".

## Linear smoothing with wavelets

Suppose we are given data $(x_i, Y_i)_{i=1}^n$ from the model in Equation 1 and an orthogonal wavelet basis generated by $\{\phi, \psi\}$. The linear wavelet estimator chooses a cutting level $j_1$ and estimates the projection of $m$ onto the space $V_{j_1}$:

$$
\hat{m}(x) = \sum_{k=0}^{2^{j_0}-1} \hat{s}_{j_0,k}\,\phi_{j_0,k}(x) + \sum_{j=j_0}^{j_1-1} \sum_{k=0}^{2^j-1} \hat{d}_{j,k}\,\psi_{j,k}(x) = \sum_k \hat{s}_{j_1,k}\,\phi_{j_1,k}(x), \tag{2}
$$

with $j_0$ the coarsest level in the decomposition, and where the so-called empirical coefficients are computed as

$$
\hat{s}_{j,k} = \frac{1}{n}\sum_{i=1}^n Y_i\,\phi_{jk}(x_i) \quad \text{and} \quad \hat{d}_{j,k} = \frac{1}{n}\sum_{i=1}^n Y_i\,\psi_{jk}(x_i). \tag{3}
$$

The cutting level $j_1$ plays the role of a smoothing parameter: a small value of $j_1$ means that many detail coefficients are left out, which may lead to oversmoothing. Conversely, if $j_1$ is too large, too many coefficients are kept, and some artificial bumps will probably remain in the estimate of $m(x)$.
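
To make the role of $j_1$ concrete, here is a sketch of the linear estimator in Equation 2 for the Haar basis only; the helper names are hypothetical. For Haar, $\hat m$ at level $j_1$ reduces to averaging the $Y_i$ over dyadic blocks of $n/2^{j_1}$ points, so a small $j_1$ oversmooths while $j_1 = J$ reproduces the data. Taking the support of $\phi_{j,k}$ as the half-open interval $(k2^{-j}, (k+1)2^{-j}]$ is an assumption made so that the design point $x_n = 1$ is covered.

```python
def haar_phi(j, k, x):
    """Haar scaling function phi_{j,k}(x) = 2^{j/2} phi(2^j x - k).

    Support taken as (k 2^-j, (k+1) 2^-j] (assumed convention, covers x_n = 1).
    """
    lo, hi = k / 2 ** j, (k + 1) / 2 ** j
    return 2 ** (j / 2) if lo < x <= hi else 0.0


def empirical_scaling_coeffs(x, y, j):
    """Empirical coefficients s_hat_{j,k} = (1/n) sum_i Y_i phi_{j,k}(x_i), Equation 3."""
    n = len(y)
    return [sum(yi * haar_phi(j, k, xi) for xi, yi in zip(x, y)) / n
            for k in range(2 ** j)]


def linear_estimate(x, y, j1):
    """Linear wavelet estimator of Equation 2, evaluated at the design points."""
    s = empirical_scaling_coeffs(x, y, j1)
    return [sum(s[k] * haar_phi(j1, k, xi) for k in range(2 ** j1)) for xi in x]


x = [i / 8 for i in range(1, 9)]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print([round(v, 6) for v in linear_estimate(x, y, j1=1)])  # [2.5, 2.5, 2.5, 2.5, 6.5, 6.5, 6.5, 6.5]
print([round(v, 6) for v in linear_estimate(x, y, j1=3)])  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```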

To see that the estimator in Equation 2 is of kernel type, consider first the projection of $m$ onto $V_{j_1}$:

$$
P_{V_{j_1}} m(x) = \sum_k \left( \int m(y)\,\phi_{j_1,k}(y)\,dy \right) \phi_{j_1,k}(x) = \int K_{j_1}(x,y)\,m(y)\,dy, \tag{4}
$$

where the (projection) kernel $K_{j_1}(x,y)$ is given by

$$
K_{j_1}(x,y) = \sum_k \phi_{j_1,k}(y)\,\phi_{j_1,k}(x). \tag{5}
$$
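
As a worked example of Equation 5, assume $\phi$ is the Haar scaling function $\phi = \mathbf{1}_{[0,1)}$ (a choice made here for illustration only). Since $\phi_{j_1,k}(x) = 2^{j_1/2}\phi(2^{j_1}x - k)$ is non-zero only when $\lfloor 2^{j_1}x \rfloor = k$, the sum over $k$ collapses to a single term:

$$
K_{j_1}(x,y) = 2^{j_1}\,\mathbf{1}\big\{\lfloor 2^{j_1}x \rfloor = \lfloor 2^{j_1}y \rfloor\big\},
\qquad
P_{V_{j_1}} m(x) = 2^{j_1} \int_{I_{j_1}(x)} m(y)\,dy,
$$

where $I_{j_1}(x)$ is the dyadic interval of length $2^{-j_1}$ that contains $x$. The projection is therefore a local average over a window of width $h = 2^{-j_1}$, which is why $j_1$ acts like a bandwidth. Because the window is tied to the dyadic grid rather than centered at $x$, $K_{j_1}(x,y)$ is not a function of $x - y$ alone.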

Härdle et al. [14] studied the approximation properties of this projection operator. In order to estimate Equation 4, Antoniadis et al. [3] proposed to take:

$$
\widehat{P_{V_{j_1}} m}(x) = \sum_{i=1}^n Y_i \int_{(i-1)/n}^{i/n} K_{j_1}(x,y)\,dy = \sum_k \left( \sum_{i=1}^n Y_i \int_{(i-1)/n}^{i/n} \phi_{j_1,k}(y)\,dy \right) \phi_{j_1,k}(x). \tag{6}
$$

Approximating the last integral by $\frac{1}{n}\phi_{j_1,k}(x_i)$, we recover the estimator $\hat{m}(x)$ of Equation 2.

By orthogonality of the wavelet transform and Parseval's equality, the $L^2$-risk (or integrated mean square error, IMSE) of a linear wavelet estimator is equal to the $\ell^2$-risk of its wavelet coefficients:

$$
\text{IMSE} = E\,\|\hat{m} - m\|_{L^2}^2 = \sum_k E\big[\hat{s}_{j_0,k} - s_{j_0,k}\big]^2 + \sum_{j=j_0}^{j_1-1} \sum_k E\big[\hat{d}_{jk} - d_{jk}\big]^2 + \sum_{j=j_1}^{\infty} \sum_k d_{jk}^2 = S_1 + S_2 + S_3, \tag{7}
$$

where

$$
s_{jk} := \langle m, \phi_{jk} \rangle \quad \text{and} \quad d_{jk} := \langle m, \psi_{jk} \rangle \tag{8}
$$

are called ‘theoretical’ coefficients in the regression context. The term $S_1 + S_2$ in Equation 7 constitutes the stochastic error, whereas $S_3$ is the deterministic bias. The optimal cutting level is such that these two terms are of the same order. If $m$ is $\beta$-Hölder continuous, one can show that the optimal cutting level satisfies $2^{j_1(n)} = O(n^{1/(1+2\beta)})$, i.e. $j_1(n) \approx \log_2(n)/(1+2\beta)$. The resulting optimal IMSE is of order $n^{-2\beta/(2\beta+1)}$. In practice, cross-validation methods are often used to determine the optimal level $j_1$ [3], [22].
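
A heuristic sketch of where this cutting level comes from, under stated assumptions: $m$ lives on $[0,1]$ (so there are $2^j$ coefficients per level), $\sigma_\epsilon^2$ is known, discretization error is ignored, and the wavelet is compactly supported with more than $\beta$ vanishing moments, so that $|d_{jk}| \le C\,2^{-j(\beta + 1/2)}$. Each empirical coefficient has variance about $\sigma_\epsilon^2/n$, and $V_{j_1}$ holds $2^{j_1}$ of them:

$$
S_1 + S_2 \approx \frac{\sigma_\epsilon^2\, 2^{j_1}}{n},
\qquad
S_3 \le \sum_{j \ge j_1} 2^j\, C^2\, 2^{-j(2\beta+1)} = C^2 \sum_{j \ge j_1} 2^{-2\beta j} = O\big(2^{-2\beta j_1}\big).
$$

Equating the orders of the two terms gives

$$
\frac{2^{j_1}}{n} \asymp 2^{-2\beta j_1}
\iff 2^{j_1(2\beta+1)} \asymp n
\iff 2^{j_1} \asymp n^{1/(2\beta+1)},
\qquad
\text{IMSE} \asymp \frac{2^{j_1}}{n} \asymp n^{-2\beta/(2\beta+1)},
$$

where $\asymp$ means "of the same order".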

## Nonlinear smoothing with wavelets

### Hard-, soft-thresholding and wavelet estimator

Given the regression model in Equation 1, we can decompose the empirical detail coefficient $\hat{d}_{jk}$ in Equation 3 as

$$
\hat{d}_{jk} = \frac{1}{n}\sum_{i=1}^n m(x_i)\,\psi_{jk}(x_i) + \frac{1}{n}\sum_{i=1}^n \epsilon_i\,\psi_{jk}(x_i) = d_{jk} + \rho_{jk}. \tag{9}
$$

If the function $m(x)$ admits a sparse wavelet representation, only a small number of detail coefficients $d_{jk}$ contribute to the signal and are non-negligible. However, every empirical coefficient $\hat{d}_{jk}$ has a non-zero contribution coming from the noise part $\rho_{jk}$.
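
The sparsity argument can be checked numerically. The sketch below uses the orthonormal discrete Haar transform (our choice of basis; for Haar, its coefficients equal $\sqrt{n}$ times the empirical coefficients of Equation 3, so sparsity is unaffected) and a hypothetical step function with a jump at $x = 0.3$. The clean signal has one non-zero detail coefficient per level, near the jump, while after adding noise every detail coefficient is non-zero.

```python
import math
import random


def haar_forward(y):
    """Orthonormal discrete Haar transform of a signal of length n = 2**J.

    Returns (s, details): s holds the single coarsest scaling coefficient and
    details[j] lists the 2**j detail coefficients at level j, j = 0, ..., J-1.
    """
    J = int(math.log2(len(y)))
    assert 2 ** J == len(y), "length must be a power of two"
    s, details = list(y), {}
    for j in range(J - 1, -1, -1):
        pairs = list(zip(s[0::2], s[1::2]))
        details[j] = [(a - b) / math.sqrt(2) for a, b in pairs]
        s = [(a + b) / math.sqrt(2) for a, b in pairs]
    return s, details


def count_nonzero(details):
    return sum(1 for d in details.values() for c in d if c != 0.0)


n = 64
x = [i / n for i in range(1, n + 1)]
signal = [1.0 if xi > 0.3 else 0.0 for xi in x]   # hypothetical piecewise-constant m
rng = random.Random(0)
noisy = [v + rng.gauss(0.0, 0.1) for v in signal]

print(count_nonzero(haar_forward(signal)[1]))  # 6: one per level, near the jump
print(count_nonzero(haar_forward(noisy)[1]))   # 63: every detail coefficient
```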

#### Remark:

Note the link between the coefficients $d_{jk}$ in Equation 9, defined there as the discrete sum $\frac{1}{n}\sum_{i} m(x_i)\psi_{jk}(x_i)$, and the theoretical coefficients $d_{jk}$ in Equation 8:
$$
\frac{1}{n}\sum_{i=1}^n m(x_i)\,\psi_{j,k}(x_i) = \int m(x)\,\psi_{jk}(x)\,dx + O\!\left(\frac{1}{n}\right) = d_{jk} + O\!\left(\frac{1}{n}\right). \tag{10}
$$

In words, the discrete coefficient of Equation 9 is a first-order approximation (using the trapezoidal rule) of the integral $d_{jk}$; the two differ by $O(1/n)$, which is why the same symbol is used for both. For the scaling coefficients $s_{jk}$, it can be proved [23] that the order of accuracy of the trapezoidal rule is equal to $N-1$, where $N$ is the order of the MRA associated with the scaling function.

Suppose the noise level is not too high, so that the signal can be distinguished from the noise. Then, by the sparsity of the wavelet representation, only the largest detail coefficients should be included in the wavelet estimator. Hence, when estimating an unknown function, it makes sense to include only those coefficients whose magnitude exceeds some specified threshold value $t$:

$$
\eta_H(\hat{d}_{jk}, t) = \hat{d}_{jk}\,\mathbf{1}\{|\hat{d}_{jk}| > t\}. \tag{11}
$$

This ‘keep-or-kill’ operation is called hard thresholding; see Figure 1(a).

Now, since each empirical coefficient consists of both a signal part and a noise part, it may be desirable to shrink even the coefficients that are larger than the threshold:

$$
\hat{d}_{jk}^t := \eta_S(\hat{d}_{jk}, t) = \operatorname{sign}(\hat{d}_{jk})\,(|\hat{d}_{jk}| - t)_+. \tag{12}
$$

Since the function $\eta_S$ is continuous in its first argument, this procedure is called soft thresholding. More complex thresholding schemes have been proposed in the literature [2], [6], [13]. They often appear as a compromise between soft and hard thresholding; see Figure 1(b) for an example.
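
A direct transcription of Equations 11 and 12, plus one compromise rule for comparison: firm shrinkage as associated with [6], written from our recollection of its usual definition (two thresholds $t_1 < t_2$; zero below $t_1$, identity above $t_2$, linear in between). Treat the firm rule as an illustrative assumption rather than a quotation of [6].

```python
import math


def hard_threshold(d, t):
    """Hard thresholding, Equation 11: keep d if |d| > t, otherwise kill it."""
    return d if abs(d) > t else 0.0


def soft_threshold(d, t):
    """Soft thresholding, Equation 12: sign(d) * (|d| - t)_+."""
    a = abs(d)
    return 0.0 if a <= t else math.copysign(a - t, d)


def firm_threshold(d, t1, t2):
    """Firm shrinkage (assumed form), 0 < t1 < t2.

    Zero for |d| <= t1, identity for |d| > t2, and a linear ramp in between,
    so it is continuous like soft thresholding but unbiased for large |d|.
    """
    a = abs(d)
    if a <= t1:
        return 0.0
    if a <= t2:
        return math.copysign(t2 * (a - t1) / (t2 - t1), d)
    return d


for d in (-3.0, -1.5, 0.5, 2.5, 5.0):
    print(d, hard_threshold(d, 2.0), soft_threshold(d, 2.0), firm_threshold(d, 2.0, 4.0))
```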

For a given threshold value $t$ and a thresholding scheme $\eta(\cdot)$, the nonlinear wavelet estimator is given by

$$
\hat{m}(x) = \sum_k \hat{s}_{j_0k}\,\phi_{j_0k}(x) + \sum_{j \ge j_0} \sum_k \eta(\hat{d}_{jk}, t)\,\psi_{jk}(x), \tag{13}
$$

where $j_0$ denotes the primary resolution level, that is, the level above which the detail coefficients are manipulated.

Let now $\hat{d}_j = \{\hat{d}_{jk},\ k = 0, \dots, 2^j - 1\}$ denote the vector of empirical detail coefficients at level $j$, and define $\hat{s}_j$ similarly. In practice, a nonlinear wavelet estimator is obtained in three steps.

1. Apply the analyzing (forward) wavelet transform to the observations $\{Y_i\}_{i=1}^n$, yielding $\hat{s}_{j_0}$ and $\hat{d}_j$ for $j = j_0, \dots, J-1$.
2. Manipulate the detail coefficients above the level $j_0$, e.g. by soft-thresholding them.
3. Invert the wavelet transform and produce an estimate of $m$ at the design points: $\{\hat{m}(x_i)\}_{i=1}^n$.
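
The three steps can be sketched with the orthonormal discrete Haar transform (an assumption; any orthogonal wavelet filter would do). For the Haar basis these coefficients are $\sqrt{n}$ times the empirical coefficients of Equation 3, so the threshold `t` in this sketch must be on that scale. The function names are ours.

```python
import math


def haar_forward(y, j0):
    """Step 1: orthonormal discrete Haar analysis down to the primary level j0.

    Returns (s_j0, details) with details[j] of length 2**j for j = j0, ..., J-1.
    """
    J = int(math.log2(len(y)))
    assert 2 ** J == len(y) and 0 <= j0 < J
    s, details = list(y), {}
    for j in range(J - 1, j0 - 1, -1):
        pairs = list(zip(s[0::2], s[1::2]))
        details[j] = [(a - b) / math.sqrt(2) for a, b in pairs]
        s = [(a + b) / math.sqrt(2) for a, b in pairs]
    return s, details


def haar_inverse(s, details, j0):
    """Step 3: Haar synthesis from level j0 back to the finest level."""
    for j in range(j0, j0 + len(details)):
        out = []
        for a, b in zip(s, details[j]):
            out += [(a + b) / math.sqrt(2), (a - b) / math.sqrt(2)]
        s = out
    return s


def soft_threshold(d, t):
    a = abs(d)
    return 0.0 if a <= t else math.copysign(a - t, d)


def nonlinear_estimate(y, t, j0=2):
    """Three-step nonlinear estimator evaluated at the design points.

    The scaling coefficients s_j0 are left untouched; only the details at
    levels >= j0 are soft-thresholded (step 2).
    """
    s, details = haar_forward(y, j0)
    shrunk = {j: [soft_threshold(d, t) for d in dj] for j, dj in details.items()}
    return haar_inverse(s, shrunk, j0)
```

With `t = 0` the input is reproduced (up to rounding), and with a very large `t` only the block averages at level $j_0$ survive: the two extremes of the trade-off governed by the threshold.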

If necessary, a continuous estimator $\hat{m}$ can then be constructed by an appropriate interpolation of the estimated values $\hat{m}(x_i)$ [12].

The choice of the primary resolution level in nonlinear wavelet estimation has the same importance as the choice of a particular kernel in local polynomial estimation, i.e., it is not the most important factor. It is common practice to take $j_0 = 2$ or $j_0 = 3$, although a cross-validation determination is of course possible [22].

The selection of a threshold value is much more crucial. If it is chosen too large, the thresholding operation will kill too many coefficients; too few coefficients are then included in the reconstruction, resulting in an oversmoothed estimator. Conversely, a small threshold value allows many coefficients to be included in the reconstruction, giving a rough, or undersmoothed, estimator. A proper choice of the threshold thus involves a careful balance between smoothness and closeness of fit.

In the case of an orthogonal transform and i.i.d. white noise, the same threshold can be applied to all detail coefficients, since the errors in the wavelet domain are still i.i.d. white noise. However, if the errors are stationary but correlated, or if the transform is biorthogonal, a level-dependent threshold is necessary to obtain optimal results [20], [7]. Finally, in the irregular setting, a level- and location-dependent threshold must be used.

Much effort has been devoted to methods for selecting the threshold. We now review some of the procedures encountered in the literature.

### Choice of the threshold

#### Universal threshold

Probably the simplest method for finding a threshold whose value is supported by statistical arguments is to use the so-called ‘universal threshold’ [12], [10]:

$$
t_{\text{univ}} = \sigma_d \sqrt{2 \log n}, \tag{14}
$$

where the only quantity to be estimated is $\sigma_d^2$, the variance of the empirical wavelet coefficients. In the case of an orthogonal transform, $\sigma_d = \sigma_\epsilon / \sqrt{n}$.

In a wavelet transform, the detail coefficients at the finest scale are, with a small fraction of exceptions, essentially pure noise. This is why Donoho and Johnstone [11] proposed to estimate $\sigma_d$ robustly using the median absolute deviation from the median (MAD) of $\hat{d}_{J-1}$:

$$
\hat{\sigma}_d = \frac{\operatorname{median}\big(|\hat{d}_{J-1} - \operatorname{median}(\hat{d}_{J-1})|\big)}{0.6745}. \tag{15}
$$

If the universal threshold is used in conjunction with soft thresholding, the resulting estimator possesses a noise-free property: with high probability, an appropriate interpolation of $\{\hat{m}(x_i)\}$ produces an estimator that is at least as smooth as the function $m$; see Theorem 1.1 in [12]. Hence the reconstruction is of good visual quality, and Donoho and Johnstone called the procedure ‘VisuShrink’ [11]. Although simple, this estimator enjoys a near-minimax adaptivity property; see "Adaptivity of wavelet estimator". However, this near-optimality is only asymptotic: for small sample sizes, $t_{\text{univ}}$ may be too large, leading to a poor mean square error.
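
A sketch of Equations 14 and 15, assuming the natural logarithm in Equation 14. The usage applies them to simulated pure-noise coefficients (standing in for orthonormal wavelet coefficients; the noise level 0.5 and the seed are arbitrary) to illustrate the heuristic behind the universal threshold: pure-noise coefficients rarely exceed $t_{\text{univ}}$.

```python
import math
import random
import statistics


def mad_sigma(finest_details):
    """Equation 15: median absolute deviation from the median, divided by 0.6745.

    0.6745 is approximately the 0.75 quantile of N(0, 1), which makes the
    estimate consistent for Gaussian noise.
    """
    med = statistics.median(finest_details)
    return statistics.median([abs(d - med) for d in finest_details]) / 0.6745


def universal_threshold(sigma_d, n):
    """Equation 14 with the natural logarithm: sigma_d * sqrt(2 log n)."""
    return sigma_d * math.sqrt(2.0 * math.log(n))


rng = random.Random(1)
n = 1024
coeffs = [rng.gauss(0.0, 0.5) for _ in range(n)]   # pure noise, true sigma_d = 0.5
sigma_hat = mad_sigma(coeffs[n // 2:])               # stands in for the finest level d_{J-1}
t_univ = universal_threshold(sigma_hat, n)
survivors = sum(1 for d in coeffs if abs(d) > t_univ)
print(round(sigma_hat, 3), round(t_univ, 3), survivors)
```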

#### Oracle inequality

Consider the soft-thresholded detail coefficients $\hat{d}^t$. Another approach to finding an optimal threshold is to look at the $\ell^2$-risk

$$
R(\hat{d}^t, d) = E \sum_{(j,k)} \big(\hat{d}_{jk}^t - d_{jk}\big)^2 = E\,\|\hat{d}^t - d\|_{\ell^2}^2, \tag{16}
$$

and to relate this risk to an ideal risk $R_{\text{ideal}}$, the risk obtained if an oracle tells us exactly which coefficients to keep and which to kill.

In [10], Donoho and Johnstone showed that, when the universal threshold is used, the following oracle inequality holds:

$$
R(\hat{d}^t, d) \le (2 \log n + 1) \left( \frac{\sigma_\epsilon^2}{n} + R_{\text{ideal}} \right). \tag{17}
$$

However, this inequality is not optimal. Donoho and Johnstone looked for the optimal threshold $t^*(n)$, which leads to the smallest possible constant $\Lambda_n^*$ in place of $2\log n + 1$. Such a threshold does not exist in closed form, but it can be approximated numerically. For small sample sizes, it is noticeably smaller than the universal threshold.
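
A Monte Carlo check of the oracle inequality in Equation 17, in the idealized sequence model $\hat{d}_{jk} = d_{jk} + \sigma_d z_{jk}$ with $z_{jk}$ i.i.d. $N(0,1)$ and $\sigma_d^2 = \sigma_\epsilon^2/n$. It uses the standard keep-or-kill form of the ideal risk, $R_{\text{ideal}} = \sum_{j,k} \min(d_{jk}^2, \sigma_d^2)$; the sparse vector $d$, the seed, and the number of replications are hypothetical choices.

```python
import math
import random


def soft_threshold(d, t):
    a = abs(d)
    return 0.0 if a <= t else math.copysign(a - t, d)


def soft_risk_mc(theta, eps, t, reps=200, seed=0):
    """Monte Carlo estimate of E||soft(theta + eps * z, t) - theta||^2, z ~ N(0, I)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += sum((soft_threshold(th + eps * rng.gauss(0.0, 1.0), t) - th) ** 2
                     for th in theta)
    return total / reps


n, eps = 256, 1.0                          # eps plays the role of sigma_d = sigma_eps / sqrt(n)
theta = [5.0] * 10 + [0.0] * (n - 10)      # hypothetical sparse coefficient vector d
r_ideal = sum(min(th ** 2, eps ** 2) for th in theta)    # keep-or-kill oracle risk
bound = (2 * math.log(n) + 1) * (eps ** 2 + r_ideal)    # right-hand side of Equation 17
risk = soft_risk_mc(theta, eps, t=eps * math.sqrt(2 * math.log(n)))
print(round(r_ideal, 2), round(risk, 1), round(bound, 1))
```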

#### SureShrink procedure

Given the expression of the $\ell^2$-risk in Equation 16, it is natural to look for a threshold that minimizes an estimate of this risk.

By minimizing Stein's unbiased estimate of the risk (SURE) [24] with a soft thresholding scheme, one obtains an estimator, called ‘SureShrink’, that is adaptive over a wide range of function spaces including Hölder, Sobolev, and Besov spaces; see "Adaptivity of wavelet estimator". That is, without any a priori knowledge of the type or amount of regularity of the function, the SURE procedure nevertheless achieves the optimal rate of convergence that one could attain by knowing the regularity of the function.
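
A sketch of SURE-based threshold selection for soft thresholding, using the closed form of Stein's unbiased risk estimate for unit-variance noise, $\text{SURE}(t; z) = n - 2\,\#\{i : |z_i| \le t\} + \sum_i \min(|z_i|, t)^2$, after rescaling the coefficients by a known (or estimated) $\sigma$. Between consecutive values of $|z_i|$ the count is constant and the last sum is non-decreasing, so the minimum is attained at $0$ or at one of the $|z_i|$; the naive search below is limited to these candidates. The published SureShrink procedure applies this level by level and adds a safeguard for very sparse levels; both refinements are omitted here.

```python
def sure_soft(z, t):
    """Stein's unbiased risk estimate for soft thresholding at t, unit noise variance:
    SURE(t; z) = n - 2 #{i : |z_i| <= t} + sum_i min(|z_i|, t)^2."""
    return (len(z) - 2 * sum(1 for zi in z if abs(zi) <= t)
            + sum(min(abs(zi), t) ** 2 for zi in z))


def sure_threshold(d, sigma):
    """Threshold minimizing SURE over {0} U {|d_i| / sigma}, rescaled back by sigma.

    Naive O(n^2) search, adequate for illustration.
    """
    z = [di / sigma for di in d]
    candidates = [0.0] + sorted(abs(zi) for zi in z)
    best = min(candidates, key=lambda t: sure_soft(z, t))
    return best * sigma


print(sure_threshold([0.1, -0.2, 10.0, 0.05], 1.0))
```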

#### Other thresholding procedures

We mention some of the other thresholding or shrinkage procedures proposed in the literature.

Instead of considering each coefficient individually, Cai [8] and Cai and Silverman [9] consider blocks of empirical wavelet coefficients in order to make simultaneous shrinkage decisions about all coefficients within a block.
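
A minimal sketch of a James-Stein-type block shrinkage rule in the spirit of [8]: each block $b$ of $L$ consecutive coefficients is multiplied by $(1 - \lambda L \sigma^2 / S_b^2)_+$, where $S_b^2$ is the block energy. The block length and the constant $\lambda$ are illustrative parameters, not the tuned values from [8], and the function name is hypothetical.

```python
def block_shrink(d, block_len, sigma, lam):
    """James-Stein-type block shrinkage (illustrative parameters, not those of [8]).

    Every coefficient in block b is multiplied by (1 - lam * L * sigma**2 / S_b**2)_+,
    where L is the block length and S_b**2 the sum of squares in the block.
    """
    out = []
    for start in range(0, len(d), block_len):
        block = d[start:start + block_len]
        energy = sum(c * c for c in block)
        if energy == 0.0:
            factor = 0.0
        else:
            factor = max(0.0, 1.0 - lam * len(block) * sigma ** 2 / energy)
        out.extend(factor * c for c in block)
    return out


print(block_shrink([3.0, 4.0, 0.5, 0.5], block_len=2, sigma=1.0, lam=1.0))
```

A block with large energy is only slightly shrunk, while a block of small coefficients is killed as a whole.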

Another fruitful idea is to use the Bayesian framework. There, a prior distribution is imposed on the wavelet coefficients $d_{jk}$. This prior model is designed to capture the sparseness of the wavelet expansion. Next, the function is estimated by applying some Bayes rule to the resulting posterior distribution of the wavelet coefficients; see for example [5], [4], [18], [19].

Antoniadis and Fan [2] treat the problem of selecting the wavelet coefficients as a penalized least squares problem. Let $W$ be the matrix of an orthogonal wavelet transform and $Y := \{Y_i\}_{i=1}^n$. The detail coefficients $d := \{d_{jk}\}$ that minimize

$$
\|WY - d\|_{\ell^2}^2 + \sum_{j,k} p_\lambda(|d_{jk}|) \tag{18}
$$

are used to estimate the true wavelet coefficients. In Equation 18, $p_\lambda(\cdot)$ is a penalty function that depends on the regularization parameter $\lambda$. The authors provide a general framework in which different penalty functions correspond to different types of thresholding procedures (e.g., soft and hard thresholding), and they obtain oracle inequalities for a large class of penalty functions.
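
Because $W$ is orthogonal, Equation 18 separates into independent scalar problems $\min_d\, (z - d)^2 + p_\lambda(|d|)$ with $z = (WY)_{jk}$. The sketch below checks, by brute-force grid search, two standard choices (stated here as known facts, not quoted from [2]): the $\ell^1$ penalty $p_\lambda(|d|) = 2\lambda|d|$, whose minimizer is soft thresholding at $\lambda$, and the penalty $p_\lambda(|d|) = \lambda^2 - (|d| - \lambda)^2\,\mathbf{1}\{|d| < \lambda\}$, whose minimizer is hard thresholding at $\lambda$. The grid range and step are arbitrary.

```python
import math


def soft_threshold(z, lam):
    a = abs(z)
    return 0.0 if a <= lam else math.copysign(a - lam, z)


def hard_threshold(z, lam):
    return z if abs(z) > lam else 0.0


def l1_penalty(a, lam):
    """p_lambda(|d|) = 2 * lam * |d|."""
    return 2.0 * lam * a


def hard_penalty(a, lam):
    """p_lambda(|d|) = lam^2 - (|d| - lam)^2 for |d| < lam, and lam^2 otherwise."""
    return lam ** 2 - (a - lam) ** 2 if a < lam else lam ** 2


def penalized_minimizer(z, lam, penalty, grid):
    """Brute-force minimizer of the scalar problem (z - d)^2 + p_lambda(|d|) over a grid."""
    return min(grid, key=lambda d: (z - d) ** 2 + penalty(abs(d), lam))


grid = [i / 1000 for i in range(-10000, 10001)]   # d in [-10, 10], step 0.001
for z in (-2.5, 0.5, 1.5, 3.0):
    print(z,
          penalized_minimizer(z, 1.0, l1_penalty, grid), soft_threshold(z, 1.0),
          penalized_minimizer(z, 2.0, hard_penalty, grid), hard_threshold(z, 2.0))
```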

Other methods include threshold selection by hypothesis testing [1], cross-validation [21], and generalized cross-validation [16], [17], which is used to estimate the $\ell^2$-risk of the empirical detail coefficients.
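
A sketch of generalized cross-validation for soft thresholding, using the criterion as we recall it from [17]: $\text{GCV}(t) = \frac{1}{N}\|w - w_t\|^2 \big/ (N_0/N)^2$, where $w$ holds the $N$ empirical detail coefficients, $w_t$ their soft-thresholded version, and $N_0$ the number of coefficients set to zero. Unlike SURE, it needs no estimate of the noise variance. The candidate set and the function names are our own choices.

```python
import math


def soft_threshold(d, t):
    a = abs(d)
    return 0.0 if a <= t else math.copysign(a - t, d)


def gcv(w, t):
    """GCV(t) = (1/N) ||w - w_t||^2 / (N0 / N)^2; infinite when no coefficient is zeroed."""
    N = len(w)
    wt = [soft_threshold(c, t) for c in w]
    n0 = sum(1 for c in wt if c == 0.0)
    if n0 == 0:
        return math.inf
    resid = sum((a - b) ** 2 for a, b in zip(w, wt))
    return (resid / N) / (n0 / N) ** 2


def gcv_threshold(w):
    """Minimize GCV over the candidate thresholds |w_i|."""
    return min(sorted(abs(c) for c in w), key=lambda t: gcv(w, t))


print(gcv_threshold([0.1, -0.2, 10.0, 0.05]))
```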

## Linear versus nonlinear wavelet estimator

To distinguish the behaviour of a linear and a nonlinear wavelet estimator, we consider the Sobolev class $W_q^s(C)$ defined as

$$
W_q^s(C) = \left\{ f : \|f\|_q^q + \left\| \frac{d^s}{dx^s} f(x) \right\|_q^q \le C^2 \right\}, \tag{19}
$$

which we denote by $V$ for short. Assume we know that $m$, the function to be estimated, belongs to $V$; in the next section we will relax this assumption. The $L^p$-risk of an arbitrary estimator $T_n$ based on the sample data is defined as $E\,\|T_n - m\|_p^p$, $1 \le p < \infty$, whereas the $L^p$-minimax risk is given by

$$
R_n(V, p) = \inf_{T_n} \sup_{m \in V} E\,\|T_n - m\|_p^p, \tag{20}
$$

where the infimum is taken over all measurable estimators $T_n$ of $m$. Similarly, we define the linear $L^p$-minimax risk as

$$
R_n^{\text{lin}}(V, p) = \inf_{T_n^{\text{lin}}} \sup_{m \in V} E\,\|T_n^{\text{lin}} - m\|_p^p, \tag{21}
$$

where the infimum is now taken over all linear estimators $T_n^{\text{lin}}$. Obviously, $R_n^{\text{lin}}(V, p) \ge R_n(V, p)$. We first state some definitions.

### Definition:

The sequences $\{a_n\}$ and $\{b_n\}$ are said to be asymptotically equivalent, written $a_n \asymp b_n$, if the ratio $a_n / b_n$ is bounded away from zero and from infinity as $n \to \infty$.

### Definition:

The sequence $a_n$ is called the optimal rate of convergence (or minimax rate of convergence) on the class $V$ for the $L^p$-risk if $a_n \asymp R_n(V, p)^{1/p}$. We say that an estimator $m_n$ of $m$ attains the optimal rate of convergence if
$$
\sup_{m \in V} E\,\|m_n - m\|_p^p \asymp R_n(V, p).
$$

To fix ideas, we consider only the $L^2$-risk in the remainder of this section, thus $p := 2$.

In [15], [25], the authors found that the optimal rate of convergence attainable by an estimator when the underlying function belongs to the Sobolev class $W_q^s$ is $a_n = n^{-s/(2s+1)}$, hence $R_n(V, 2) \asymp n^{-2s/(2s+1)}$. We saw in "Linear smoothing with wavelets" that linear wavelet estimators attain the optimal rate for $s$-Hölder functions in the case of the $L^2$-risk (also called ‘IMSE’). For a Sobolev class $W_q^s$, the same result holds provided that $q \ge 2$. More precisely, we have the following two situations.

1. If $q \ge 2$, we are in the so-called homogeneous zone. In this zone of spatial homogeneity, linear estimators can attain the optimal rate of convergence $n^{-s/(2s+1)}$.
2. If $q < 2$, we are in the non-homogeneous zone, where linear estimators do not attain the optimal rate of convergence. Instead, we have

   $$
   R_n^{\text{lin}}(V, 2) / R_n(V, 2) \to \infty, \quad \text{as } n \to \infty. \tag{22}
   $$

The second result is due to the spatial variability of functions in Sobolev spaces with small index $q$. Linear estimators are based on the idea of spatial homogeneity of the function and hence perform poorly in the presence of non-homogeneous functions. In contrast, even if $q < 2$, the SureShrink estimator attains the minimax rate [11]. The same type of result holds for more general Besov spaces; see for example [14], Chapter 10.
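
The mechanism behind the non-homogeneous zone can be illustrated in a noise-free setting by comparing linear and nonlinear approximation with the same budget of coefficients. The sketch below (our own example, using the orthonormal discrete Haar transform and a hypothetical step function with a jump at $x = 0.3$) keeps 8 coefficients either linearly (the scaling coefficient and all details below level $j_1 = 3$) or nonlinearly (the scaling coefficient and the 7 largest details). The nonlinear choice adapts to the location of the jump and reproduces the step, while the linear projection smears it over a dyadic block. This illustrates the mechanism only; it is not a proof of the minimax statements above.

```python
import math


def haar_forward(y):
    """Orthonormal discrete Haar transform: returns ([s_0], {j: details at level j})."""
    J = int(math.log2(len(y)))
    s, details = list(y), {}
    for j in range(J - 1, -1, -1):
        pairs = list(zip(s[0::2], s[1::2]))
        details[j] = [(a - b) / math.sqrt(2) for a, b in pairs]
        s = [(a + b) / math.sqrt(2) for a, b in pairs]
    return s, details


def haar_inverse(s, details):
    """Inverse of haar_forward, from level 0 up to the finest level."""
    for j in range(len(details)):
        out = []
        for a, b in zip(s, details[j]):
            out += [(a + b) / math.sqrt(2), (a - b) / math.sqrt(2)]
        s = out
    return s


def keep_coarse(details, j1):
    """Linear selection: zero every detail at level j >= j1 (projection onto V_{j1})."""
    return {j: (list(d) if j < j1 else [0.0] * len(d)) for j, d in details.items()}


def keep_largest(details, K):
    """Nonlinear selection: zero all but the K largest-magnitude details."""
    ranked = sorted(((abs(c), j, k) for j, d in details.items() for k, c in enumerate(d)),
                    reverse=True)
    keep = {(j, k) for _, j, k in ranked[:K]}
    return {j: [c if (j, k) in keep else 0.0 for k, c in enumerate(d)]
            for j, d in details.items()}


def sq_error(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


n = 64
y = [1.0 if i / n > 0.3 else 0.0 for i in range(1, n + 1)]   # hypothetical step function
s, details = haar_forward(y)
lin_err = sq_error(haar_inverse(s, keep_coarse(details, 3)), y)   # 1 + 7 coefficients
nl_err = sq_error(haar_inverse(s, keep_largest(details, 7)), y)   # 1 + 7 coefficients
print(lin_err, nl_err)
```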

## Adaptivity of wavelet estimator

We just saw that a nonlinear wavelet estimator is able to estimate functions of inhomogeneous regularity in an optimal way. However, it may not be sufficient to know that the estimator performs well for $m$ belonging to a given space. Indeed, in general we do not know which space the function belongs to. Hence it is of great interest to consider a scale of function classes and to look for an estimator that simultaneously attains the best rates of convergence across the whole scale. For example, the $L^q$-Sobolev scale is a set of Sobolev function classes $W_q^s(C)$ indexed by the parameters $s$ and $C$; see Equation 19 for the definition of a Sobolev class. We now formalize the notion of an adaptive estimator.

Let $A$ be a given set and let $\{\mathcal{F}_\alpha, \alpha \in A\}$ be the scale of functional classes $\mathcal{F}_\alpha$ indexed by $\alpha \in A$. Denote by $R_n(\alpha, p)$ the minimax risk over $\mathcal{F}_\alpha$ for the $L^p$-loss:

$$
R_n(\alpha, p) = \inf_{\hat{m}_n} \sup_{m \in \mathcal{F}_\alpha} E\,\|\hat{m}_n - m\|_p^p. \tag{23}
$$

### Definition:

The estimator $m_n^*$ is called rate adaptive for the $L^p$-loss and the scale of classes $\{\mathcal{F}_\alpha, \alpha \in A\}$ if for any $\alpha \in A$ there exists $c_\alpha > 0$ such that
$$
\sup_{m \in \mathcal{F}_\alpha} E\,\|m_n^* - m\|_p^p \le c_\alpha\, R_n(\alpha, p) \quad \forall\, n \ge 1. \tag{24}
$$

The estimator $m_n^*$ is called adaptive up to a logarithmic factor for the $L^p$-loss and the scale of classes $\{\mathcal{F}_\alpha, \alpha \in A\}$ if for any $\alpha \in A$ there exist $c_\alpha > 0$ and $\gamma = \gamma_\alpha > 0$ such that

$$
\sup_{m \in \mathcal{F}_\alpha} E\,\|m_n^* - m\|_p^p \le c_\alpha\, (\log n)^{\gamma}\, R_n(\alpha, p) \quad \forall\, n \ge 1. \tag{25}
$$

Thus, adaptive estimators have an optimal rate of convergence and behave as if they knew in advance which class the function to be estimated belongs to.

The VisuShrink procedure is adaptive up to a logarithmic factor for the $L^2$-loss over every Besov, Hölder and Sobolev class that is contained in $C[0,1]$; see Theorem 1.2 in [12]. The SureShrink estimator does better: it is adaptive for the $L^2$-loss over a large scale of Besov, Hölder and Sobolev classes; see Theorem 1 in [11].

## Conclusion

In this chapter, we saw the basic properties of standard wavelet theory and explained how these are related to the construction of wavelet regression estimators.

## References

1. Abramovich, F. and Benjamini, Y. (1995). Thresholding of wavelet coefficients as multiple hypotheses testing procedure. In Antoniadis, A. and Oppenheim, G. (Eds.), Wavelets in Statistics, Lecture Notes in Statistics, Vol. 103 (pp. 5-14). Springer-Verlag.
2. Antoniadis, A. and Fan, J. (2001). Regularization of Wavelets approximations (with discussion). J. Am. Statist. Assoc., 96, 939-967.
3. Antoniadis, A. and Grégoire, G. and McKeague, I. (1994). Wavelet methods for curve estimation. J. Am. Statist. Assoc., 89, 1340-1353.
4. Antoniadis, A. and Leporini, D. and Pesquet, J.C. (2000). Wavelet thresholding for some classes of non-Gaussian noise. Technical report. IMAG, Grenoble, France.
5. Abramovich, F. and Sapatinas, T. and Silverman, B.W. (1998). Wavelet Thresholding via a Bayesian Approach. J. Roy. Statist. Soc., Series B, 60, 725-749.
6. Bruce, A.G. and Gao, H.-Y. (1997). Waveshrink with firm shrinkage. Statistica Sinica, 4, 855-874.
7. Berkner, K. and Wells, R.O. (1998). A correlation-dependent model for denoising via nonorthogonal wavelet transforms. Technical report. Computational Mathematics Laboratory, Rice University.
8. Cai, T. (1999). Adaptive wavelet estimation: a block thresholding and oracle inequality approach. Annals of Statistics, 27, 898-924.
9. Cai, T. and Silverman, B.W. (2001). Incorporating information on neighboring coefficients into wavelet estimation. Sankhya: The Indian Journal of Statistics. Special Issue on Wavelets, 63, 127-148.
10. Donoho, D.L. and Johnstone, I.M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81, 425-455.
11. Donoho, D.L. and Johnstone, I.M. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Am. Statist. Assoc., 90, 1200-1224.
12. Donoho, D.L. (1995). De-noising via soft-thresholding. IEEE Transactions on Information Theory, 41, 613-627.
13. Gao, H.-Y. (1998). Wavelet shrinkage denoising using the non-negative garrote. Journal of Computational and Graphical Statistics, 7, 469-488.
14. Härdle, W. and Kerkyacharian, G. and Picard, D. and Tsybakov, A. (1998). Lecture Notes in Statistics 129: Wavelets, Approximation, and Statistical Applications. Springer-Verlag.
15. Ibragimov, I.A. and Hasminskii, R.Z. (1981). Statistical Estimation: Asymptotic Theory. New York: Springer-Verlag.
16. Jansen, M. and Bultheel, A. (1999). Multiple wavelet threshold estimation by generalized cross validation for images with correlated noise. IEEE Transactions on Image Processing, 8(7), 947-953.
17. Jansen, M. and Malfait, M. and Bultheel, A. (1997). Generalized Cross Validation for wavelet thresholding. Signal Processing, 56, 33-44.
18. Johnstone, I.M. and Silverman, B.W. (2002). Empirical Bayes selection of wavelet thresholds. [Stanford University and University of Bristol].
19. Johnstone, I.M. and Silverman, B.W. (2002). Finding needles and hay in haystacks: Risk bounds for Empirical Bayes estimates of possibly sparse sequences. [Stanford University and University of Bristol].
20. Johnstone, I.M. and Silverman, B.W. (1997). Wavelet methods for data with correlated noise. J. Roy. Statist. Soc., Series B, 59, 319-351.
21. Nason, G. (1994). Wavelet regression by cross-validation. Technical report. Stanford University.
22. Nason, G.P. (1999). Fast cross-validatory choice of wavelet smoothness, primary resolution and threshold in wavelet shrinkage using the Kovac-Silverman algorithm. Technical report. University of Bristol.
23. Sweldens, W. and Piessens, R. (1994). Quadrature Formulae and Asymptotic Error Expansions for wavelet approximations of smooth functions. SIAM J. Numer. Anal., 31(4), 1240-1264.
24. Stein, C.M. (1981). Estimation of the mean of a multivariate normal distribution. Annals of Statistics, 9, 1135-1151.
25. Stone, C.J. (1982). Optimal global rates of convergence for nonparametric regression. Annals of Statistics, 10, 1040-1053.
