
# Generalizations of the Basic Multiresolution Wavelet System

Module by: C. Sidney Burrus.


Up to this point in the book, we have developed the basic two-band wavelet system in some detail, trying to provide insight and intuition into this new mathematical tool. We will now develop a variety of interesting and valuable generalizations and extensions to the basic system, but in much less detail. We hope the detail of the earlier part of the book can be transferred to these generalizations and, together with the references, will provide an introduction to the topics.

## Tiling the Time–Frequency or Time–Scale Plane

A qualitative, descriptive presentation of the decomposition of a signal using wavelet systems or wavelet transforms consists of partitioning the time–scale plane into tiles according to the indices k and j defined in (Reference). That is possible for orthogonal bases (or tight frames) because of Parseval's theorem. Indeed, it is Parseval's theorem that states that the signal energy can be partitioned on the time–scale plane. The shape and location of the tiles show the logarithmic nature of the partitioning using basic wavelets and how the M-band systems or wavelet packets modify the basic picture. It also shows how time- or shift-varying wavelet systems, together with M-band systems and wavelet packets, can give an almost arbitrary partitioning of the plane.

The energy in a signal is given in terms of the DWT by Parseval's relation in (Reference) or (Reference). This shows the energy is a function of the translation index k and the scale index j.

$$\int |g(t)|^2 \, dt = \sum_{l=-\infty}^{\infty} |c(l)|^2 + \sum_{j=0}^{\infty} \sum_{k=-\infty}^{\infty} |d(j,k)|^2$$
(1)
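As a concrete check of this energy partition, here is a minimal sketch (using a hand-rolled Haar DWT as an assumed example, not any particular toolbox) that verifies Parseval's relation numerically:

```python
import numpy as np

def haar_dwt(x, levels):
    # One-stage Haar analysis repeated on the lowpass band:
    # c[k] = (x[2k] + x[2k+1]) / sqrt(2),  d[k] = (x[2k] - x[2k+1]) / sqrt(2)
    c, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        even, odd = c[0::2], c[1::2]
        details.append((even - odd) / np.sqrt(2))
        c = (even + odd) / np.sqrt(2)
    return c, details

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
c, details = haar_dwt(x, levels=3)

# Parseval: signal energy equals the sum of the coefficient energies
energy = np.sum(x**2)
partition = np.sum(c**2) + sum(np.sum(d**2) for d in details)
assert np.allclose(energy, partition)
```

Because the Haar filter bank is unitary, the same check works at any number of levels and for any input length divisible by 2^levels.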

The wavelet transform allows analysis of a signal or parameterization of a signal that can locate energy in both the time and scale (or frequency) domain within the constraints of the uncertainty principle. The spectrogram used in speech analysis is an example of using the short-time Fourier transform to describe speech simultaneously in the time and frequency domains.

This graphical or visual description of the partitioning of energy in a signal using tiling depends on the structure of the system, not the parameters of the system. In other words, the tiling partitioning will depend on whether one uses M=2 or M=3, whether one uses wavelet packets or time-varying wavelets, or whether one uses over-complete frame systems. It does not depend on the particular coefficients h(n) or h_i(n), on the number of coefficients N, or the number of zero moments. One should remember that the tiling may look as if the indices j and k are continuous variables, but they are not. The energy is really a function of discrete variables in the DWT domain, and the boundaries of the tiles are symbolic of the partitioning. These tiling boundaries become more literal when the continuous wavelet transform (CWT) is used as described in "Discrete Multiresolution Analysis, the Discrete-Time Wavelet Transform, and the Continuous Wavelet Transform", but even there it does not mean that the partitioned energy is literally confined to the tiles.

### Nonstationary Signal Analysis

In many applications, one studies the decomposition of a signal in terms of basis functions. For example, stationary signals are decomposed into the Fourier basis using the Fourier transform. For nonstationary signals (i.e., signals whose frequency characteristics are time-varying like music, speech, images, etc.) the Fourier basis is ill-suited because of the poor time-localization. The classical solution to this problem is to use the short-time (or windowed) Fourier transform (STFT). However, the STFT has several problems, the most severe being the fixed time-frequency resolution of the basis functions. Wavelet techniques give a new class of (potentially signal dependent) bases that have desired time-frequency resolution properties. The “optimal” decomposition depends on the signal (or class of signals) studied. All classical time-frequency decompositions like the Discrete STFT (DSTFT), however, are signal independent. Each function in a basis can be considered schematically as a tile in the time-frequency plane, where most of its energy is concentrated. Orthonormality of the basis functions can be schematically captured by nonoverlapping tiles. With this assumption, the time-frequency tiles for the standard basis (i.e., delta basis) and the Fourier basis (i.e., sinusoidal basis) are shown in (Reference).

### Tiling with the Discrete-Time Short-Time Fourier Transform

The DSTFT basis functions are of the form

$$w_{j,k}(t) = w(t - k\tau_0)\, e^{-\imath j \omega_0 t}$$
(2)

where w(t) is a window function (Reference). If these functions form an orthogonal (orthonormal) basis, x(t) = Σ_{j,k} ⟨x, w_{j,k}⟩ w_{j,k}(t). The DSTFT coefficients, ⟨x, w_{j,k}⟩, estimate the presence of signal components centered at (kτ₀, jω₀) in the time-frequency plane, i.e., the DSTFT gives a uniform tiling of the time-frequency plane with the basis functions w_{j,k}(t). If Δt and Δω are the time and frequency resolutions, respectively, of w(t), then the uncertainty principle demands that ΔtΔω ≥ 1/2 (Reference), (Reference). Moreover, if the basis is orthonormal, the Balian-Low theorem implies that either Δt or Δω is infinite. Both Δt and Δω can be controlled by the choice of w(t), but for any particular choice, there will be signals for which either the time or frequency resolution is not adequate. (Reference) shows the time-frequency tiles associated with the STFT basis for a narrow and a wide window, illustrating the inherent time-frequency trade-offs associated with this basis. Notice that the tiling schematic holds for several choices of windows (i.e., each figure represents all DSTFT bases with the particular time-frequency resolution characteristic).
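The uncertainty product can be examined numerically. The sketch below (a finite-grid approximation, with a Gaussian window as an assumed example) estimates Δt and Δω from the second moments of the energy densities in time and frequency, and confirms that a Gaussian sits at the lower bound of 1/2:

```python
import numpy as np

# Time and frequency spreads of a window, from second moments of |w(t)|^2
# and |W(omega)|^2. For a Gaussian window, Dt * Dw attains the bound 1/2.
N = 2**14
t = np.linspace(-50, 50, N, endpoint=False)
dt = t[1] - t[0]
sigma = 1.0
w = np.exp(-t**2 / (2 * sigma**2))

# time spread: normalized energy density in time
p = w**2 / np.sum(w**2 * dt)
Dt = np.sqrt(np.sum(t**2 * p * dt))

# frequency spread: angular-frequency axis from the FFT
W = np.fft.fftshift(np.fft.fft(w))
omega = np.fft.fftshift(np.fft.fftfreq(N, d=dt)) * 2 * np.pi
domega = omega[1] - omega[0]
P = np.abs(W)**2 / np.sum(np.abs(W)**2 * domega)
Dw = np.sqrt(np.sum(omega**2 * P * domega))

assert abs(Dt * Dw - 0.5) < 1e-3
```

Replacing the Gaussian with, say, a rectangular window makes the product strictly larger, which is one way to see the trade-off the text describes.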

### Tiling with the Discrete Two-Band Wavelet Transform

The discrete wavelet transform (DWT) is another signal-independent tiling of the time-frequency plane, suited for signals where high-frequency components have shorter duration than low-frequency components. Time-frequency atoms for the DWT, ψ_{j,k}(t) = 2^{j/2} ψ(2^j t − k), are obtained by translates and scales of the wavelet function ψ(t). One shrinks/stretches the wavelet to capture high-/low-frequency components of the signal. If these atoms form an orthonormal basis, then x(t) = Σ_{j,k} ⟨x, ψ_{j,k}⟩ ψ_{j,k}(t). The DWT coefficients, ⟨x, ψ_{j,k}⟩, are a measure of the energy of the signal components located at (2^{−j}k, 2^j) in the time-frequency plane, giving yet another tiling of the time-frequency plane. As discussed in Chapters (Reference) and (Reference), the DWT (for compactly supported wavelets) can be efficiently computed using two-channel unitary FIR filter banks (Reference). (Reference) shows the corresponding tiling description, which illustrates the time-frequency resolution properties of a DWT basis. If you look along the frequency (or scale) axis at some particular time (translation), you can imagine seeing the frequency response of the filter bank as shown in (Reference), with the logarithmic bandwidth of each channel. Indeed, each horizontal strip in the tiling of (Reference) corresponds to a channel, which in turn corresponds to a scale j. The location of the tiles corresponding to each coefficient is shown in (Reference). If, at a particular scale, you imagine the translations along the k axis, you see the construction of the components of a signal at that scale. This makes it obvious that at lower resolutions (smaller j) the translations are large and at higher resolutions the translations are small.

The tiling of the time-frequency plane is a powerful graphical method for understanding the properties of the DWT and for analyzing signals. For example, if the signal being analyzed were a single wavelet itself, of the form

$$f(t) = \psi(4t - 2),$$
(3)

the DWT would have only one nonzero coefficient, d₂(2). To see that the DWT is not time (or shift) invariant, imagine shifting f(t) some noninteger amount: the DWT changes considerably. If the shift is an integer, the energy stays the same in each scale, but it "spreads out" along more values of k, and spreads differently in each scale. If the shift is not an integer, the energy spreads in both j and k. There is no such thing as a "scale-limited" signal corresponding to a band-limited (Fourier) signal if arbitrary shifting is allowed. For integer shifts, there is a corresponding concept (Reference).
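This shift-variance is easy to observe numerically. The following sketch (a hand-rolled Haar DWT as an assumed example) shifts a spike by one sample: the coefficients change, while, consistent with the integer-shift case described above, the energy in each scale is preserved:

```python
import numpy as np

def haar_analysis(x, levels):
    # Haar DWT sketch; even-length (effectively periodic) signal assumed
    c, details = np.asarray(x, float), []
    for _ in range(levels):
        details.append((c[0::2] - c[1::2]) / np.sqrt(2))
        c = (c[0::2] + c[1::2]) / np.sqrt(2)
    return c, details

n = np.arange(64)
x = np.where(n == 16, 1.0, 0.0)   # a single spike
x1 = np.roll(x, 1)                # shift by one (integer) sample

_, d = haar_analysis(x, 3)
_, d1 = haar_analysis(x1, 3)

# Coefficients change under the shift: the DWT is not shift invariant ...
assert not np.allclose(d[0], d1[0])
# ... but for an integer shift the energy in each scale is unchanged.
e  = [np.sum(di**2) for di in d]
e1 = [np.sum(di**2) for di in d1]
assert np.allclose(e, e1)
```

A noninteger (e.g., half-sample) shift of the underlying continuous signal would additionally move energy between scales, which cannot be shown with this purely discrete sketch.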

### General Tiling

Notice that for general, nonstationary signal analysis, one desires methods for controlling the tiling of the time-frequency plane, not just the two special cases above (their importance notwithstanding). An alternative way to obtain orthonormal wavelets ψ(t) is to use unitary FIR filter bank (FB) theory. That will be done with M-band DWTs, wavelet packets, and time-varying wavelet transforms, addressed in Sections "Multiplicity-M (M-Band) Scaling Functions and Wavelets" and "Wavelet Packets" and Chapter (Reference), respectively.

Remember that the tiles represent the relative size of the translations and scale change. They do not literally mean the partitioned energy is confined to the tiles. Representations with similar tilings can have very different characteristics.

## Multiplicity-M (M-Band) Scaling Functions and Wavelets

While the use of a scale multiplier M of two in (Reference) or Equation 4 fits many problems, coincides with the concept of an octave, gives a binary tree for the Mallat fast algorithm, and gives the constant-Q or logarithmic frequency bandwidths, the conditions given in Chapter (Reference) and (Reference) can be stated and proved in a more general setting where the basic scaling equation (Reference) uses a general scale factor or multiplicity M (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference) rather than the specific doubling value M=2. Part of the motivation for a larger M comes from a desire to have a more flexible tiling of the time-scale plane than that resulting from the M=2 wavelet or the short-time Fourier transform discussed in "Tiling the Time–Frequency or Time–Scale Plane". It also comes from a desire for some regions of uniform bandwidths rather than the logarithmic spacing of the frequency responses illustrated in (Reference). The motivation for larger M also comes from filter bank theory, which is discussed in Chapter (Reference).

We pose the more general multiresolution formulation where (Reference) becomes

$$\varphi(x) = \sum_{n} h(n)\, \sqrt{M}\, \varphi(Mx - n).$$
(4)

In some cases, M may be allowed to be a rational number; however, in most cases it must be an integer, and in (Reference) it is required to be 2. In the frequency domain, this relationship becomes

$$\Phi(\omega) = \frac{1}{\sqrt{M}}\, H(\omega/M)\, \Phi(\omega/M)$$
(5)

and the limit after iteration is

$$\Phi(\omega) = \prod_{k=1}^{\infty} \frac{1}{\sqrt{M}}\, H\!\left(\frac{\omega}{M^{k}}\right) \Phi(0)$$
(6)

assuming the product converges and Φ(0) is well defined. This is a generalization of (Reference) and is derived in (Reference).
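A truncated version of this infinite product can be evaluated numerically. The sketch below uses the multiplicity-3 analog of the Haar filter, h = (1,1,1)/√3 (an assumed example, not from the text), whose scaling function is the unit-width box, so the truncated product can be compared against the box's known Fourier transform:

```python
import numpy as np

# Truncated infinite product (Eq. 6) for the multiplicity-3 "Haar-like"
# filter h = (1,1,1)/sqrt(3); its scaling function is the box on [0,1).
M = 3
h = np.ones(3) / np.sqrt(M)

def H(omega):
    n = np.arange(len(h))
    return np.sum(h * np.exp(-1j * omega * n))

def Phi(omega, K=40):
    # Phi(omega) ~= prod_{k=1..K} H(omega / M^k) / sqrt(M), with Phi(0) = 1
    prod = 1.0 + 0j
    for k in range(1, K + 1):
        prod *= H(omega / M**k) / np.sqrt(M)
    return prod

omega = 1.7
box = (1 - np.exp(-1j * omega)) / (1j * omega)   # transform of 1 on [0,1)
assert abs(Phi(omega) - box) < 1e-8
```

The factors approach H(0)/√M = 1 as k grows, so the truncation error is negligible after a few dozen terms.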

### Properties of M-Band Wavelet Systems

These theorems, relationships, and properties are generalizations of those given in Chapter (Reference) and (Reference), with some outline proofs or derivations given in the Appendix. For the multiplicity-M problem, if the support of the scaling function and wavelets and their respective coefficients is finite and the system is orthogonal or a tight frame, the length of the scaling function vector or filter h(n) is a multiple of the multiplier M. This is N = MG, where Resnikoff and Wells (Reference) call M the rank of the system and G the genus.

The results of (Reference), (Reference), (Reference), and (Reference) become

Theorem 28 If φ(t) is an L¹ solution to Equation 4 and ∫ φ(t) dt ≠ 0, then

$$\sum_{n} h(n) = \sqrt{M}.$$
(7)

This is a generalization of the basic multiplicity-2 result in (Reference) and does not depend on any particular normalization or orthogonality of φ(t).

Theorem 29 If integer translates of the solution to Equation 4 are orthogonal, then

$$\sum_{n} h(n + Mm)\, h(n) = \delta(m).$$
(8)

This is a generalization of (Reference) and also does not depend on any normalization. An interesting corollary of this theorem is

Corollary 3 If integer translates of the solution to Equation 4 are orthogonal, then

$$\sum_{n} |h(n)|^2 = 1.$$
(9)

A second corollary to this theorem is

Corollary 4 If integer translates of the solution to Equation 4 are orthogonal, then

$$\sum_{n} h(Mn + m) = \frac{1}{\sqrt{M}}, \qquad m \in \mathbf{Z}$$
(10)

This is also true under weaker conditions than orthogonality, as was discussed for the M=2 case.
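These coefficient conditions are easy to verify for a concrete filter. The sketch below checks Equations 7 through 10 for the multiplicity-3 filter h = (1,1,1)/√3 (an assumed example, not taken from the text):

```python
import numpy as np

# Check Eqs. (7)-(10) for the multiplicity-3 example filter h = (1,1,1)/sqrt(3)
M = 3
h = np.ones(3) / np.sqrt(M)

# (7)  sum_n h(n) = sqrt(M)
assert np.isclose(h.sum(), np.sqrt(M))

# (8)  sum_n h(n + M m) h(n) = delta(m)   (zero-padded shifts)
hp = np.concatenate([h, np.zeros(2 * M)])
for m in range(3):
    ip = np.sum(hp[: len(hp) - M * m] * hp[M * m:])
    assert np.isclose(ip, 1.0 if m == 0 else 0.0)

# (9)  sum_n |h(n)|^2 = 1  (the m = 0 case of Eq. 8)
assert np.isclose(np.sum(h**2), 1.0)

# (10) sum_n h(M n + m) = 1/sqrt(M) for every residue m
for m in range(M):
    assert np.isclose(hp[m::M].sum(), 1 / np.sqrt(M))
```

Any length-MG orthogonal M-band scaling filter would pass the same checks; this length-3, G = 1 example is simply the shortest one.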

Using the Fourier transform, the following relations can be derived:

Theorem 30 If φ(t) is an L¹ solution to Equation 4 and ∫ φ(t) dt ≠ 0, then

$$H(0) = \sqrt{M}$$
(11)

which is a frequency domain existence condition.

Theorem 31 The integer translates of the solution to Equation 4 are orthogonal if and only if

$$\sum_{\ell} |\Phi(\omega + 2\pi \ell)|^2 = 1$$
(12)

Theorem 32 If φ(t) is an L¹ solution to Equation 4 and ∫ φ(t) dt ≠ 0, then

$$\sum_{n} h(n + Mm)\, h(n) = \delta(m)$$
(13)

if and only if

$$|H(\omega)|^2 + |H(\omega + 2\pi/M)|^2 + |H(\omega + 4\pi/M)|^2 + \cdots + |H(\omega + 2\pi(M-1)/M)|^2 = M.$$
(14)

This is a frequency domain orthogonality condition on h(n).

Corollary 5

$$H(2\pi \ell / M) = 0, \qquad \text{for } \ell = 1, 2, \ldots, M-1$$
(15)

which is a generalization of (Reference) stating where the zeros of H(ω), the frequency response of the scaling filter, are located. This is an interesting constraint on just where certain zeros of H(z) must be located.
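Both the tiling condition of Equation 14 and the zero locations of Equation 15 can be checked numerically; the sketch below does so for the same multiplicity-3 filter h = (1,1,1)/√3 (an assumed example):

```python
import numpy as np

# Frequency-domain orthogonality (Eq. 14) and zero locations (Eq. 15)
# for the multiplicity-3 filter h = (1,1,1)/sqrt(3).
M = 3
h = np.ones(M) / np.sqrt(M)

def H(omega):
    return np.sum(h * np.exp(-1j * omega * np.arange(M)))

# Eq. (14): the M frequency-shifted magnitude-squares sum to the constant M
for omega in np.linspace(0, 2 * np.pi, 7):
    s = sum(abs(H(omega + 2 * np.pi * l / M))**2 for l in range(M))
    assert np.isclose(s, M)

# Eq. (15): zeros of H at the nontrivial M-th roots of unity
for l in range(1, M):
    assert abs(H(2 * np.pi * l / M)) < 1e-12
```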

Theorem 33 If Σ_n h(n) = √M, and h(n) has finite support or decays fast enough, then a φ(t) ∈ L² that satisfies Equation 4 exists and is unique.

Theorem 34 If Σ_n h(n) = √M and if Σ_n h(n) h(n − Mk) = δ(k), then φ(t) exists, is integrable, and generates a wavelet system that is a tight frame in L².

These results are a significant generalization of the basic M=2 wavelet system that we discussed in the earlier chapters. The definitions, properties, and generation of these more general scaling functions have the same form as for M=2, but there is no longer a single wavelet associated with the scaling function. There are M−1 wavelets. In addition to Equation 4 we now have M−1 wavelet equations, which we denote as

$$\psi_{\ell}(t) = \sum_{n} \sqrt{M}\, h_{\ell}(n)\, \varphi(Mt - n)$$
(16)

for

$$\ell = 1, 2, \ldots, M-1.$$
(17)

Some authors use the notation h₀(n) for h(n) and ψ₀(t) for φ(t), so that h_ℓ(n) represents the coefficients for the scaling function and all the wavelets, and ψ_ℓ(t) represents the scaling function and all the wavelets.

Just as for the M=2 case, the multiplicity-M scaling function and scaling coefficients are unique and are simply the solution of the basic recursive or refinement equation, Equation 4. However, the wavelets and wavelet coefficients are no longer unique or easy to design in general.

We now have the possibility of a more general and more flexible multiresolution expansion system with the M-band scaling function and wavelets. There are now M−1 signal spaces spanned by the M−1 wavelets at each scale j. They are denoted

$$W_{\ell, j} = \operatorname{Span}_{k} \{ \psi_{\ell}(M^{j} t + k) \}$$
(18)

for ℓ = 1, 2, ..., M−1. For example, with M=4,

$$V_1 = V_0 \oplus W_{1,0} \oplus W_{2,0} \oplus W_{3,0}$$
(19)

and

$$V_2 = V_1 \oplus W_{1,1} \oplus W_{2,1} \oplus W_{3,1}$$
(20)

or

$$V_2 = V_0 \oplus W_{1,0} \oplus W_{2,0} \oplus W_{3,0} \oplus W_{1,1} \oplus W_{2,1} \oplus W_{3,1}.$$
(21)

In the limit as j → ∞, we have

$$L^2 = V_0 \oplus W_{1,0} \oplus W_{2,0} \oplus W_{3,0} \oplus W_{1,1} \oplus W_{2,1} \oplus W_{3,1} \oplus \cdots$$
(22)

Our notation for M=2 in Chapter (Reference) is W_{1,j} = W_j.

This is illustrated pictorially in (Reference), where we see the nested scaling function spaces V_j, but each annular ring is now divided into M−1 subspaces, each spanned by the M−1 wavelets at that scale. Compare (Reference) with (Reference) for the classical M=2 case.

The expansion of a signal or function in terms of the M-band wavelets now involves a triple sum over ℓ, j, and k.

$$f(t) = \sum_{k} c(k)\, \varphi_{k}(t) + \sum_{k=-\infty}^{\infty} \sum_{j=0}^{\infty} \sum_{\ell=1}^{M-1} M^{j/2}\, d_{\ell,j}(k)\, \psi_{\ell}(M^{j} t - k)$$
(23)

where the expansion coefficients (DWT) are found by

$$c(k) = \int f(t)\, \varphi(t - k)\, dt$$
(24)

and

$$d_{\ell,j}(k) = \int f(t)\, M^{j/2}\, \psi_{\ell}(M^{j} t - k)\, dt.$$
(25)

We now have an M-band discrete wavelet transform.

Theorem 35 If the scaling function φ(t) satisfies the conditions for existence and orthogonality, the wavelets are defined by Equation 16, and the integer translates of these wavelets span W_{ℓ,0}, the orthogonal complements of V₀ (all being in V₁), i.e., the wavelets are orthogonal to the scaling function at the same scale; that is, if

$$\int \varphi(t - n)\, \psi_{\ell}(t - m)\, dt = 0$$
(26)

for ℓ = 1, 2, ..., M−1, then

$$\sum_{n} h(n)\, h_{\ell}(n - Mk) = 0$$
(27)

for all integers k and for ℓ = 1, 2, ..., M−1.

Combining Equation 8 and Equation 27 and calling h₀(n) = h(n) gives

$$\sum_{n} h_{m}(n)\, h_{\ell}(n - Mk) = \delta(k)\, \delta(m - \ell)$$
(28)

as necessary conditions on h_ℓ(n) for an orthogonal system.
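For M=2, the Haar pair satisfies these conditions exactly, which makes a compact numerical check of Equation 28 possible (a sketch using zero-padded inner products; the helper `ip` is ours, not from the text):

```python
import numpy as np

# Eq. (28) checked for the two-band Haar pair: scaling filter h0, wavelet h1.
h = {0: np.array([1.0, 1.0]) / np.sqrt(2),
     1: np.array([1.0, -1.0]) / np.sqrt(2)}

def ip(a, b, shift):
    # <a(n), b(n - shift)> with zero padding outside each filter's support
    n = np.arange(-8, 8)
    av = np.array([a[i] if 0 <= i < len(a) else 0.0 for i in n])
    bv = np.array([b[i - shift] if 0 <= i - shift < len(b) else 0.0 for i in n])
    return float(np.sum(av * bv))

# sum_n h_m(n) h_l(n - 2k) = delta(k) delta(m - l)
for m in (0, 1):
    for l in (0, 1):
        for k in (-1, 0, 1):
            expect = 1.0 if (k == 0 and m == l) else 0.0
            assert np.isclose(ip(h[m], h[l], 2 * k), expect)
```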

Unlike the M=2 case, for M>2 there is no formula for h_ℓ(n), and there are many possible wavelets for a given scaling function.

Mallat's algorithm takes on a more complex form as shown in (Reference). The advantage is a more flexible system that allows a mixture of linear and logarithmic tiling of the time–scale plane. A powerful tool that removes the ambiguity is choosing the wavelets by "modulated cosine" design.

Figure 7 shows the frequency response of the filter bank, much as (Reference) did for M=2. Examples of scaling functions and wavelets are illustrated in (Reference), and the tiling of the time-scale plane is shown in Figure 8. Figure 8 shows the time-frequency resolution characteristics of a four-band DWT basis. Notice how it differs from the standard, Fourier, DSTFT, and two-band DWT bases shown in earlier chapters. It gives a mixture of logarithmic and linear frequency resolution.

We next define the k-th moments of ψ_ℓ(t) as

$$m_{\ell}(k) = \int t^{k}\, \psi_{\ell}(t)\, dt$$
(29)

and the k-th discrete moments of h_ℓ(n) as

$$\mu_{\ell}(k) = \sum_{n} n^{k}\, h_{\ell}(n).$$
(30)

Theorem 36 (Equivalent Characterizations of K-Regular M-Band Filters) A unitary scaling filter is K-regular if and only if the following equivalent statements are true:

1. All moments of the wavelet filters are zero: μ_ℓ(k) = 0 for k = 0, 1, ..., K−1 and for ℓ = 1, 2, ..., M−1.
2. All moments of the wavelets are zero: m_ℓ(k) = 0 for k = 0, 1, ..., K−1 and for ℓ = 1, 2, ..., M−1.
3. The partial moments of the scaling filter are equal for k = 0, 1, ..., K−1.
4. The frequency response of the scaling filter has zeros of order K at the M-th roots of unity, ω = 2πℓ/M for ℓ = 1, 2, ..., M−1.
5. The magnitude-squared frequency response of the scaling filter is flat to order 2K at ω = 0. This follows from (Reference).
6. All polynomial sequences up to degree K−1 can be expressed as a linear combination of integer-shifted scaling filters.
7. All polynomials of degree up to K−1 can be expressed as a linear combination of integer-shifted scaling functions for all j.

This powerful result (Reference), (Reference) is similar to the M=2 case presented in Chapter (Reference). It ties the number of zero moments not only to the regularity but also to the degree of polynomials that can be exactly represented by a sum of weighted and shifted scaling functions. Note that the zeros of H(z) are equally spaced around the unit circle, resulting in a narrower frequency response than for the half-band filters of the M=2 case. This is consistent with the requirements given in Equation 14 and illustrated in Section 3.
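Statement 1 of the theorem can be checked directly for the familiar length-4 Daubechies scaling filter (K = 2, M = 2): the zeroth and first discrete moments of the wavelet filter vanish, while the second does not. The wavelet filter is built by the standard M=2 alternating flip:

```python
import numpy as np

# Discrete moments (Eq. 30) of the Daubechies length-4 wavelet filter.
# K = 2 regularity: mu_1(0) = mu_1(1) = 0, but mu_1(2) != 0.
s3 = np.sqrt(3)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))
g = np.array([h[3], -h[2], h[1], -h[0]])   # wavelet filter by alternating flip

n = np.arange(4)
mu = [np.sum(n**k * g) for k in range(3)]
assert abs(mu[0]) < 1e-12 and abs(mu[1]) < 1e-12   # two vanishing moments
assert abs(mu[2]) > 0.1                            # but not a third
```

The same kind of check applies to each of the M−1 wavelet filters of an M-band K-regular system.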

Sketches of some of the derivations in this section are given in the Appendix or are simple extensions of the M=2M=2 case. More details are given in (Reference), (Reference), (Reference).

### M-Band Scaling Function Design

Calculating values of φ(n) can be done by the same methods given in (Reference). However, the design of the scaling coefficients h(n) parallels that for the two-band case but is somewhat more difficult (Reference).

One special set of cases turns out to be a simple extension of the two-band system. If the multiplier M = 2^m, then the scaling function is simply a scaled version of the M=2 case, and a particular set of corresponding wavelets are those obtained by iterating the wavelet branches of the Mallat algorithm tree, as is done for the wavelet packets described in "Wavelet Packets". For other values of M, especially odd values, the situation is more complex.

### M-Band Wavelet Design and Cosine Modulated Methods

For M>2, the wavelet coefficients h_ℓ(n) are not uniquely determined by the scaling coefficients, as was the case for M=2. This is both a blessing and a curse. It gives us more flexibility in designing specific systems, but it complicates the design considerably. For small N and M, the designs can be done directly, but for longer lengths and/or for large M, direct design becomes impossible, and something like the cosine modulated design of the wavelets from the scaling function, as described in Chapter (Reference), is probably the best approach (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference).

## Wavelet Packets

The classical M=2 wavelet system results in a logarithmic frequency resolution. The low frequencies have narrow bandwidths, and the high frequencies have wide bandwidths, as illustrated in (Reference). This is called "constant-Q" filtering and is appropriate for some applications but not all. The wavelet packet system was proposed by Ronald Coifman (Reference), (Reference) to allow a finer and adjustable resolution of frequencies at high frequencies. It also gives a rich structure that allows adaptation to particular signals or signal classes. The cost of this richer structure is a computational complexity of O(N log N), similar to the FFT, in contrast to the classical wavelet transform, which is O(N).

### Full Wavelet Packet Decomposition

In order to generate a basis system that would allow a higher resolution decomposition at high frequencies, we will iterate (split and down-sample) the highpass wavelet branch of the Mallat algorithm tree as well as the lowpass scaling function branch. Recall that for the discrete wavelet transform we repeatedly split, filter, and decimate the lowpass bands. The resulting three-scale analysis tree (three-stage filter bank) is shown in (Reference). This type of tree results in a logarithmic splitting of the bandwidths and tiling of the time-scale plane, as shown in Figure 3.

If we split both the lowpass and highpass bands at all stages, the resulting filter bank structure is a full binary tree, as in Figure 9. It is this full tree that takes O(N log N) calculations and results in a completely evenly spaced frequency resolution. In fact, its structure is somewhat similar to the FFT algorithm. Notice the meaning of the subscripts on the signal spaces. The first integer subscript is the scale j of that space, as illustrated in Figure 10. Each following subscript is a zero or one, depending on the path taken through the filter bank illustrated in (Reference). A "zero" indicates going through a lowpass filter (scaling function decomposition) and a "one" indicates going through a highpass filter (wavelet decomposition). This is different from the convention for the M>2 case in "Multiplicity-M (M-Band) Scaling Functions and Wavelets".

Figure 10 pictorially shows the signal vector space decomposition for the scaling functions and wavelets. Figure 11 shows the frequency response of the packet filter bank, much as (Reference) did for the M=2 and (Reference) for the M=3 wavelet systems.

Figure 12 shows the Haar wavelet packets, with which we finish the example started in (Reference). This is an informative illustration that shows just what "packetizing" does to the regular wavelet system. It should be compared to the example at the end of Chapter (Reference). This is similar to the Walsh-Hadamard decomposition, and (Reference) shows the full wavelet packet system generated from the Daubechies φ_D8' scaling function. The "prime" indicates this is the Daubechies system with the spectral factorization chosen such that some zeros are inside the unit circle and some outside. This gives the maximum symmetry possible with a Daubechies system. Notice the three wavelets have increasing "frequency." They are somewhat like windowed sinusoids, hence the name wavelet packet. Compare these wavelets with the M=2 and M=4 Daubechies wavelets.

Normally we consider the outputs of each channel or band as the wavelet transform, and from this we have a nonredundant basis system. If, however, we consider the signals at the output of each band and at each stage or scale simultaneously, we have more outputs than inputs and clearly have a redundant system. From all of these outputs, we can choose an independent subset as a basis. This can be done in an adaptive way, depending on the signal characteristics, according to some optimization criterion. One possibility is the regular wavelet decomposition shown in (Reference). Another is the full packet decomposition shown in Figure 9. Any pruning of this full tree would generate a valid packet basis system and would allow a very flexible tiling of the time-scale plane.

We can choose a set of basis vectors and form an orthonormal basis such that some cost measure on the transformed coefficients is minimized. Moreover, when the cost is additive, the best orthonormal wavelet packet transform can be found using a binary searching algorithm (Reference) in O(N log N) time.

Some examples of the resulting time-frequency tilings are shown in Figure 13. These plots demonstrate the frequency adaptation power of the wavelet packet transform.

There are two approaches to using adaptive wavelet packets. One is to choose a particular decomposition (filter bank pruning) based on the characteristics of the class of signals to be processed, then to use the transform nonadaptively on the individual signals. The other is to adapt the decomposition for each individual signal. The first is a linear process over the class of signals. The second is not and will not obey superposition.

Let P(J) denote the number of different J-scale orthonormal wavelet packet transforms. We can easily see that

$$P(J) = P(J-1)^2 + 1, \qquad P(1) = 1.$$
(31)

So the number of possible choices grows dramatically as the scale increases. This is another reason why wavelet packets are a powerful tool in practice. For example, the FBI standard for fingerprint image compression (Reference), (Reference) is based on wavelet packet transforms. Wavelet packets have also been used successfully for acoustic signal compression (Reference). In (Reference), a rate-distortion measure is used with the wavelet packet transform to improve image compression performance.
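The recursion in Equation 31 is easy to evaluate; a short sketch shows how quickly the number of bases grows:

```python
# Number of distinct J-scale orthonormal wavelet packet bases (Eq. 31):
# P(J) = P(J-1)^2 + 1 with P(1) = 1.
def num_packet_bases(J):
    p = 1
    for _ in range(J - 1):
        p = p * p + 1
    return p

counts = [num_packet_bases(J) for J in range(1, 6)]
assert counts == [1, 2, 5, 26, 677]
```

Already at six scales there are 677² + 1 = 458,330 admissible bases, which is why an efficient (additive-cost) search matters.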

M-band DWTs give a flexible tiling of the time-frequency plane. They are associated with a particular tree-structured filter bank, where the lowpass channel at any depth is split into M bands. Combining the M-band and wavelet packet structures gives a rather arbitrary tree-structured filter bank, where all channels are split into sub-channels (using filter banks with a potentially different number of bands), and would give a very flexible signal decomposition. The wavelet analog of this is known as the wavelet packet decomposition (Reference). For a given signal or class of signals, one can, for a fixed set of filters, obtain the best (in some sense) filter bank tree-topology. For a binary tree, an efficient scheme using entropy as the criterion has been developed: the best wavelet packet basis algorithm (Reference), (Reference).

## Biorthogonal Wavelet Systems

Requiring the wavelet expansion system to be orthogonal across both translations and scale gives a clean, robust, and symmetric formulation with a Parseval's theorem. It also places strong limitations on the possibilities of the system. Requiring orthogonality uses up a large number of the degrees of freedom, results in complicated design equations, prevents linear phase analysis and synthesis filter banks, and prevents asymmetric analysis and synthesis systems. This section will develop the biorthogonal wavelet system using a nonorthogonal basis and dual basis to allow greater flexibility in achieving other goals at the expense of the energy partitioning property that Parseval's theorem states (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference). Some researchers have considered "almost orthogonal" systems where there is some relaxation of the orthogonal constraints in order to improve other characteristics (Reference). Indeed, many image compression schemes (including the fingerprint compression used by the FBI (Reference), (Reference)) use biorthogonal systems.

### Two Channel Biorthogonal Filter Banks

In previous chapters, for orthogonal wavelets, the analysis filters and synthesis filters are time reversals of each other; i.e., h̃(n) = h(−n), g̃(n) = g(−n). Here, for the biorthogonal case, we relax these restrictions. However, in order to perfectly reconstruct the input, these four filters still have to satisfy a set of relations.

Let c₁(n), n ∈ Z, be the input to the filter banks in Figure 14. Then the outputs of the analysis filter banks are

$$c_0(k) = \sum_{n} \tilde{h}(2k - n)\, c_1(n), \qquad d_0(k) = \sum_{n} \tilde{g}(2k - n)\, c_1(n).$$
(32)

The output of the synthesis filter bank is

$$\tilde{c}_1(m) = \sum_{k} \left[ h(2k - m)\, c_0(k) + g(2k - m)\, d_0(k) \right].$$
(33)

Substituting Equation 32 into Equation 33 and interchanging the summations gives

$$\tilde{c}_1(m) = \sum_{n} \sum_{k} \left[ h(2k - m)\, \tilde{h}(2k - n) + g(2k - m)\, \tilde{g}(2k - n) \right] c_1(n).$$
(34)

For perfect reconstruction, i.e., c̃₁(m) = c₁(m) for all m ∈ Z, we need

$$\sum_{k} \left[ h(2k - m)\, \tilde{h}(2k - n) + g(2k - m)\, \tilde{g}(2k - n) \right] = \delta(m - n).$$
(35)

Fortunately, this condition can be greatly simplified. In order for it to hold, the four filters have to be related as (Reference)

$$\tilde g(n) = (-1)^n h(1-n), \qquad g(n) = (-1)^n \tilde h(1-n),$$
(36)

up to some constant factors. Thus they are cross-related by time reversal and flipping signs of every other element. Clearly, when $\tilde h = h$, we get the familiar relation between the scaling coefficients and the wavelet coefficients for orthogonal wavelets, $g(n) = (-1)^n h(1-n)$. Substituting Equation 36 back into Equation 35, we get

$$\sum_n \tilde h(n)\, h(n+2k) = \delta(k).$$
(37)

In the orthogonal case, we have $\sum_n h(n)\, h(n+2k) = \delta(k)$; i.e., $h(n)$ is orthogonal to even translations of itself. Here $\tilde h$ is orthogonal to $h$, thus the name biorthogonal.

Equation 37 is the key to understanding biorthogonal filter banks. Let's assume $\tilde h(n)$ is nonzero when $\tilde N_1 \le n \le \tilde N_2$, and $h(n)$ is nonzero when $N_1 \le n \le N_2$. Equation 37 then implies that (Reference)

$$N_2 - \tilde N_1 = 2k + 1, \qquad \tilde N_2 - N_1 = 2\tilde k + 1, \qquad k, \tilde k \in \mathbf{Z}.$$
(38)

In the orthogonal case, this reduces to the well-known fact that the length of $h$ has to be even. Equation 38 also implies that the difference between the lengths of $\tilde h$ and $h$ must be even. Thus their lengths must be both even or both odd.
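These relations are easy to check numerically. The following sketch assumes the short spline pair from Table 1, normalized so that each filter sums to $\sqrt{2}$ as in Equation 41; it verifies the biorthogonality condition Equation 37 and then runs the analysis and synthesis banks of Equations 32 and 33 on a random periodic signal to confirm perfect reconstruction:

```python
import math
import random

# Spline pair from Table 1 (assumed normalization: each filter sums to sqrt(2)):
# h/sqrt(2) = [1/4, 1/2, 1/4] on n = -1..1, h~/sqrt(2) = [-1/8, 1/4, 3/4, 1/4, -1/8] on n = -2..2.
r2 = math.sqrt(2.0)
h = {n: r2 * c for n, c in zip(range(-1, 2), [0.25, 0.5, 0.25])}
ht = {n: r2 * c for n, c in zip(range(-2, 3), [-0.125, 0.25, 0.75, 0.25, -0.125])}
g = {n: (-1) ** n * ht[1 - n] for n in range(-1, 4)}    # g(n)  = (-1)^n h~(1-n), Eq. (36)
gt = {n: (-1) ** n * h[1 - n] for n in range(0, 3)}     # g~(n) = (-1)^n h(1-n),  Eq. (36)

# Eq. (37): sum_n h~(n) h(n+2k) = delta(k)
for k in range(-3, 4):
    s = sum(ht[n] * h.get(n + 2 * k, 0.0) for n in ht)
    assert abs(s - (1.0 if k == 0 else 0.0)) < 1e-12

# Eqs. (32)-(33) on a random signal, extended periodically with period N
N = 16
wrap = lambda j: ((j + N // 2) % N) - N // 2            # map an index into [-N/2, N/2)
c1 = [random.random() for _ in range(N)]
c0 = [sum(ht.get(wrap(2 * k - n), 0.0) * c1[n] for n in range(N)) for k in range(N // 2)]
d0 = [sum(gt.get(wrap(2 * k - n), 0.0) * c1[n] for n in range(N)) for k in range(N // 2)]
rec = [sum(h.get(wrap(2 * k - m), 0.0) * c0[k] + g.get(wrap(2 * k - m), 0.0) * d0[k]
           for k in range(N // 2)) for m in range(N)]
assert max(abs(a - b) for a, b in zip(rec, c1)) < 1e-10  # perfect reconstruction
```

Note that the support lengths (3 and 5) differ by an even number and the end indices satisfy Equation 38, as the theory requires.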

### Biorthogonal Wavelets

We now look at the scaling function and wavelet to see how removing orthogonality and introducing a dual basis changes their characteristics. We start again with the basic multiresolution definition of the scaling function and add to that a similar definition of a dual scaling function.

$$\Phi(t) = \sum_n h(n)\, \sqrt{2}\, \Phi(2t - n),$$
(39)
$$\tilde\Phi(t) = \sum_n \tilde h(n)\, \sqrt{2}\, \tilde\Phi(2t - n).$$
(40)

From Theorem (Reference) in Chapter (Reference), we know that for $\Phi$ and $\tilde\Phi$ to exist,

$$\sum_n h(n) = \sum_n \tilde h(n) = \sqrt{2}.$$
(41)

Continuing to parallel the construction of the orthogonal wavelets, we also define the wavelet and the dual wavelet as

$$\psi(t) = \sum_n g(n)\, \sqrt{2}\, \Phi(2t-n) = \sum_n (-1)^n\, \tilde h(1-n)\, \sqrt{2}\, \Phi(2t-n),$$
(42)
$$\tilde\psi(t) = \sum_n \tilde g(n)\, \sqrt{2}\, \tilde\Phi(2t-n) = \sum_n (-1)^n\, h(1-n)\, \sqrt{2}\, \tilde\Phi(2t-n).$$
(43)

Now that we have the scaling and wavelet functions and their duals, the question becomes whether we can expand and reconstruct arbitrary functions using them. The following theorem (Reference) answers this important question.

Theorem 37 For $\tilde h$ and $h$ satisfying Equation 37, suppose that for some $C$, $\epsilon > 0$,

$$|\Phi(\omega)| \le C\,(1+|\omega|)^{-1/2-\epsilon}, \qquad |\tilde\Phi(\omega)| \le C\,(1+|\omega|)^{-1/2-\epsilon}.$$
(44)

If $\Phi$ and $\tilde\Phi$ defined above have sufficient decay in the frequency domain, then $\psi_{j,k} \stackrel{\mathrm{def}}{=} 2^{j/2}\, \psi(2^j x - k)$, $j,k \in \mathbf{Z}$, constitute a frame in $L^2(\mathbf{R})$. Their dual frame is given by $\tilde\psi_{j,k} \stackrel{\mathrm{def}}{=} 2^{j/2}\, \tilde\psi(2^j x - k)$, $j,k \in \mathbf{Z}$; for any $f \in L^2(\mathbf{R})$,

$$f = \sum_{j,k} \langle f, \psi_{j,k} \rangle\, \tilde\psi_{j,k} = \sum_{j,k} \langle f, \tilde\psi_{j,k} \rangle\, \psi_{j,k}$$
(45)

where the series converge strongly.

Moreover, the $\psi_{j,k}$ and $\tilde\psi_{j,k}$ constitute two Riesz bases, with

$$\langle \psi_{j,k}, \tilde\psi_{j',k'} \rangle = \delta(j - j')\, \delta(k - k')$$
(46)

if and only if

$$\int \Phi(x)\, \tilde\Phi(x - k)\, dx = \delta(k).$$
(47)

This theorem tells us that under some technical conditions, we can expand functions using the wavelets and reconstruct using their duals. The multiresolution formulations in Chapter 2 can be revised as

$$\cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots$$
(48)
$$\cdots \subset \tilde V_{-2} \subset \tilde V_{-1} \subset \tilde V_0 \subset \tilde V_1 \subset \tilde V_2 \subset \cdots$$
(49)

where

$$V_j = \operatorname{Span}_k \{ \Phi_{j,k} \}, \qquad \tilde V_j = \operatorname{Span}_k \{ \tilde\Phi_{j,k} \}.$$
(50)

If Equation 47 holds, we have

$$V_j \perp \tilde W_j, \qquad \tilde V_j \perp W_j,$$
(51)

where

$$W_j = \operatorname{Span}_k \{ \psi_{j,k} \}, \qquad \tilde W_j = \operatorname{Span}_k \{ \tilde\psi_{j,k} \}.$$
(52)

Although $W_j$ is not the orthogonal complement of $V_j$ in $V_{j+1}$ as before, the dual space $\tilde W_j$ plays the much needed role. Thus we have four sets of spaces that form two hierarchies to span $L^2(\mathbf{R})$.

In (Reference), we have a list of properties of the scaling function and wavelet that do not require orthogonality. The results for regularity and moments in Chapter (Reference) can also be generalized to the biorthogonal systems.

### Comparisons of Orthogonal and Biorthogonal Wavelets

The biorthogonal wavelet systems generalize the classical orthogonal wavelet systems. They are more flexible and generally easy to design. The differences between the orthogonal and biorthogonal wavelet systems can be summarized as follows.

• For orthogonal wavelets, the wavelet filter and the scaling filter must be of the same length, and the length must be even. This restriction is greatly relaxed for biorthogonal systems.
• Symmetric wavelets and scaling functions are possible in the framework of biorthogonal wavelets. Actually, this is one of the main reasons to choose biorthogonal wavelets over the orthogonal ones.
• Parseval's theorem no longer holds in biorthogonal wavelet systems; i.e., the norm of the coefficients is not the same as the norm of the functions being spanned. This is one of the main disadvantages of using the biorthogonal systems. Many design efforts have been devoted to making the systems near orthogonal, so that the norms are close.
• In a biorthogonal system, if we switch the roles of the primary and the dual, the overall system is still sound. Thus we can choose the best arrangement for our application. For example, in image compression, we would like to use the smoother one of the pair to reconstruct the coded image to get better visual appearance.
• In statistical signal processing, white Gaussian noise remains white after orthogonal transforms. If the transforms are nonorthogonal, the noise becomes correlated or colored. Thus, when biorthogonal wavelets are used in estimation and detection, we might need to adjust the algorithm to better address the colored noise.
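The last point is easy to see in a small example. If the analysis coefficients are $\alpha = A y$ and $y$ is zero-mean white noise with covariance $I$, then the coefficient covariance is $A A^T$. The sketch below (the nonorthogonal matrix is a hypothetical illustration) shows this stays the identity only for an orthogonal $A$:

```python
import math

# If alpha = A y and y is white noise with covariance I, then cov(alpha) = A A^T.
def gram(A):
    # A A^T for a list-of-rows matrix
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in A] for r1 in A]

s = 1.0 / math.sqrt(2.0)
A_orth = [[s, s], [s, -s]]            # an orthogonal (Haar-like) analysis matrix
A_nonorth = [[1.0, 0.5], [0.0, 1.0]]  # a hypothetical nonorthogonal analysis matrix

C_orth = gram(A_orth)
C_nonorth = gram(A_nonorth)
# orthogonal transform: the noise stays white (identity covariance)
assert all(abs(C_orth[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(2) for j in range(2))
# nonorthogonal transform: an off-diagonal term appears, i.e., colored noise
assert abs(C_nonorth[0][1] - 0.5) < 1e-12
```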

### Example Families of Biorthogonal Systems

Because biorthogonal wavelet systems are very flexible, there are a wide variety of approaches to design different biorthogonal systems. The key is to design a pair of filters $h$ and $\tilde h$ that satisfy Equation 37 and Equation 41 and have other desirable characteristics. Here we review several families of biorthogonal wavelets and discuss their properties and design methods.

### Cohen-Daubechies-Feauveau Family of Biorthogonal Spline Wavelets

Splines have been widely used in approximation theory and numerical algorithms. Therefore, they may be desirable scaling functions, since they are symmetric, smooth, and have dyadic filter coefficients (see (Reference)). However, if we use them as scaling functions in orthogonal wavelet systems, the wavelets have to have infinite support (Reference). On the other hand, it is very easy to use splines in biorthogonal wavelet systems. Choose $h$ to be a filter that can generate splines; then Equation 37 and Equation 41 are linear in the coefficients of $\tilde h$. Thus we only have to solve a set of linear equations to get $\tilde h$, and the resulting $\tilde h$ also has dyadic coefficients. In (Reference), better methods are used to solve these equations indirectly.
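As a sketch of this design idea, take the hat-spline filter $h/\sqrt{2} = [1/4, 1/2, 1/4]$ and a symmetric 5-tap dual $\tilde h/\sqrt{2} = [a, b, c, b, a]$. Equations 37 and 41 are then linear in $(a, b, c)$ but leave one degree of freedom, which we pin down here with one extra assumed condition (a zero of $\tilde H(\omega)$ at $\omega = \pi$, a regularity condition not stated in this paragraph):

```python
# Unknowns (a, b, c) for h~/sqrt(2) = [a, b, c, b, a] dual to h/sqrt(2) = [1/4, 1/2, 1/4]:
#   b + c = 1            (Eq. 37 with k = 0; k = +/-1 turns out to be automatic here)
#   2a + 2b + c = 1      (Eq. 41, both sides divided by sqrt(2))
#   2a - 2b + c = 0      (assumed extra condition: a zero of H~ at w = pi)
def solve3(M, rhs):
    # naive Gaussian elimination with partial pivoting for a 3x3 system
    A = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(3):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [A[i][3] / A[i][i] for i in range(3)]

a, b, c = solve3([[0.0, 1.0, 1.0], [2.0, 2.0, 1.0], [2.0, -2.0, 1.0]], [1.0, 1.0, 0.0])
# recovers the dyadic Table 1 row h~/sqrt(2) = [-1/8, 1/4, 3/4, 1/4, -1/8]
assert abs(a + 0.125) < 1e-12 and abs(b - 0.25) < 1e-12 and abs(c - 0.75) < 1e-12
```

Note that the solution is dyadic, as the text promises for spline duals.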

The filter coefficients for some members of the Cohen-Daubechies-Feauveau family of biorthogonal spline wavelets are listed in Table 1. Note that they are symmetric. It has been shown that as the length increases, the regularity of $\varphi$ and $\tilde\varphi$ of this family also increases (Reference).

| $h/\sqrt{2}$ | $\tilde h/\sqrt{2}$ |
| --- | --- |
| 1/2, 1/2 | -1/16, 1/16, 1/2, 1/2, 1/16, -1/16 |
| 1/4, 1/2, 1/4 | -1/8, 1/4, 3/4, 1/4, -1/8 |
| 1/8, 3/8, 3/8, 1/8 | -5/512, 15/512, 19/512, -97/512, -13/256, 175/256, ⋯ |

### Cohen-Daubechies-Feauveau Family of Biorthogonal Wavelets with Less Dissimilar Filter Length

The Cohen-Daubechies-Feauveau family of biorthogonal wavelets are perhaps the most widely used biorthogonal wavelets, since the scaling function and wavelet are symmetric and have similar lengths. A member of the family is used in the FBI fingerprint compression standard (Reference), (Reference). The design method for this family is remarkably simple and elegant.

In the frequency domain, Equation 37 can be written as

$$H(\omega)\, \tilde H^*(\omega) + H(\omega+\pi)\, \tilde H^*(\omega+\pi) = 2.$$
(53)

Recall from Chapter (Reference) that we have an explicit solution for $|H(\omega)|^2 = M(\omega)$ such that

$$M(\omega) + M(\omega+\pi) = 2,$$
(54)

and the resulting compactly supported orthogonal wavelet has the maximum number of zero moments possible for its length. In the orthogonal case, we get a scaling filter by factoring $M(\omega)$ as $H(\omega)H^*(\omega)$. Here in the biorthogonal case, we can factor the same $M(\omega)$ into $H(\omega)$ and $\tilde H(\omega)$.

Factorizations that lead to symmetric $h$ and $\tilde h$ with similar lengths have been found in (Reference), and their coefficients are listed in Table 2. Plots of the scaling and wavelet functions, which are members of the family used in the FBI fingerprint compression standard, are in (Reference).

| $\tilde h$ | $h$ |
| --- | --- |
| 0.85269867900889 | 0.78848561640637 |
| 0.37740285561283 | 0.41809227322204 |
| -0.11062440441844 | -0.04068941760920 |
| -0.02384946501956 | -0.06453888262876 |
| 0.03782845550726 | |

The filters are symmetric, and each column lists the center coefficient first, so $\tilde h$ has length 9 and $h$ has length 7.
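A quick numerical check (assuming, as the symmetry and lengths suggest, that the table lists the center coefficient of each symmetric filter first, giving 9 and 7 taps) confirms that this pair satisfies Equation 53 to roughly the precision of the listed coefficients:

```python
import cmath
import math

# Table 2 pair, assuming symmetric filters listed center coefficient first
ht_half = [0.85269867900889, 0.37740285561283, -0.11062440441844,
           -0.02384946501956, 0.03782845550726]
h_half = [0.78848561640637, 0.41809227322204, -0.04068941760920,
          -0.06453888262876]
ht = {n: ht_half[abs(n)] for n in range(-4, 5)}   # 9 taps
h = {n: h_half[abs(n)] for n in range(-3, 4)}     # 7 taps

def freq(f, w):
    # H(w) = sum_n f(n) e^{iwn}
    return sum(c * cmath.exp(1j * w * n) for n, c in f.items())

# Eq. (53) sampled over a grid of frequencies
for i in range(64):
    w = 2.0 * math.pi * i / 64.0
    val = (freq(h, w) * freq(ht, w).conjugate()
           + freq(h, w + math.pi) * freq(ht, w + math.pi).conjugate())
    assert abs(val - 2.0) < 1e-6
```

Both filters also sum to $\sqrt{2}$, consistent with Equation 41.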

### Tian-Wells Family of Biorthogonal Coiflets

The coiflet system is a family of compactly supported orthogonal wavelets with zero moments of both the scaling functions and wavelets, described in (Reference). Compared with Daubechies wavelets, which have only zero wavelet moments, the coiflets are more symmetric and may have better approximation properties when sampled data are used. However, finding the orthogonal coiflets involves solving a set of nonlinear equations. No closed-form solutions have been found, and as the length increases, numerically solving these equations becomes less stable.

Tian and Wells (Reference), (Reference) have constructed biorthogonal wavelet systems with both zero scaling function moments and zero wavelet moments. Closed-form solutions for these biorthogonal coiflets have been found. They have approximation properties similar to the coiflets, and the filter coefficients are dyadic rationals, as are those of the splines. The filter coefficients for these biorthogonal coiflets are listed in Table 3. Some members of this family are also in the spline family described earlier.

### Lifting Construction of Biorthogonal Systems

We have introduced several families of biorthogonal systems and their design methods. There is another method, called the lifting scheme, which is very simple and general. It has a long history (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), and it has recently been systematically developed (Reference), (Reference). The key idea is to build complicated biorthogonal systems using simple and invertible stages. The first stage does nothing but separate even and odd samples, and it is easily invertible. The structure is shown in Figure 15 and is called the lazy wavelet transform in (Reference).

| $\sqrt{2}\, h$ | $\sqrt{2}\, \tilde h$ |
| --- | --- |
| 1, 1 | 1, 1 |
| 1/2, 1, 1/2 | -1/4, 1/2, 3/2, 1/2, -1/4 |
| 3/8, 1, 3/4, 0, -1/8 | 3/64, 0, -3/16, 3/8, 41/32, 3/4, -3/16, -1/8, 3/64 |
| -1/16, 0, 9/16, 1, 9/16, 0, -1/16 | -1/256, 0, 9/128, -1/16, -63/256, 9/16, 87/64, ⋯ |

After splitting the data into two parts, we can predict one part from the other and keep only the prediction error, as in Figure 16. We can reconstruct the data by recomputing the prediction and then adding it back. In Figure 16, $s$ and $t$ are prediction filters.
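A minimal sketch of this predict step, with one hypothetical choice of predictor (each odd sample predicted by the average of its two even neighbors, periodic at the boundary):

```python
import random

def predict(even, i):
    # hypothetical predictor s: average of the two neighboring even samples,
    # wrapping around periodically at the boundary
    return 0.5 * (even[i] + even[(i + 1) % len(even)])

x = [random.random() for _ in range(16)]
even, odd = x[0::2], x[1::2]                     # the "lazy wavelet transform" split
detail = [o - predict(even, i) for i, o in enumerate(odd)]   # keep only the error

# inverse: recompute the same prediction from the untouched even channel
odd_rec = [d + predict(even, i) for i, d in enumerate(detail)]
x_rec = [v for pair in zip(even, odd_rec) for v in pair]
assert max(abs(a - b) for a, b in zip(x_rec, x)) < 1e-12
```

Invertibility holds for any predictor, because the even channel the prediction is computed from is never modified.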

By concatenating simple stages, we can implement the forward and inverse wavelet transforms as in Figure 17. This is also called the ladder structure, and the reason for the name is clear from the figure. Clearly, the system is invertible, and thus biorthogonal. Moreover, it has been shown that orthogonal wavelet systems can also be implemented using lifting (Reference). The advantages of lifting are numerous:

• Lifting steps can be calculated in place. As seen in Figure 17, the prediction outputs based on one channel of the data can be added to or subtracted from the data in the other channel, and the results can be saved in place in the second channel. No auxiliary memory is needed.
• The predictors $s$ and $t$ do not have to be linear. Nonlinear operations like the median filter or rounding can be used, and the system remains invertible. This allows a very simple generalization to nonlinear wavelet transforms or nonlinear multiresolution analysis.
• The design of biorthogonal systems boils down to the design of the predictors. This may lead to simple approaches that do not rely on the Fourier transform (Reference), and can be generalized to irregular samples or manifolds.
• For biorthogonal systems, the lifting implementations require fewer numerical operations than direct implementations (Reference). For orthogonal cases, the lifting schemes have computational complexity similar to the lattice factorizations, which is almost half that of the direct implementation.
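The second point deserves a concrete illustration. The following hypothetical integer Haar-style ladder uses a rounded (floor) update, so it is nonlinear, yet it inverts exactly because each stage is undone in reverse order:

```python
# Hypothetical integer lifting ladder: predict each odd sample by its even
# neighbor, then update the even channel with a rounded (floor) half-detail.
# The rounding makes the steps nonlinear, but the ladder is still invertible.
def forward(x):
    s, d = x[0::2], x[1::2]
    d = [di - si for si, di in zip(s, d)]          # predict step
    s = [si + (di >> 1) for si, di in zip(s, d)]   # update step with floor rounding
    return s, d

def inverse(s, d):
    s = [si - (di >> 1) for si, di in zip(s, d)]   # undo the update
    d = [di + si for si, di in zip(s, d)]          # undo the predict
    return [v for pair in zip(s, d) for v in pair]

x = [7, 3, -2, 10, 0, 5, 5, 4]
s, d = forward(x)
assert inverse(s, d) == x                          # exact, despite the rounding
assert all(isinstance(v, int) for v in s + d)      # integer-to-integer transform
```

This is the mechanism behind integer-to-integer wavelet transforms used in lossless coding.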

## Multiwavelets

In Chapter (Reference), we introduced the multiresolution analysis for the space of $L^2$ functions, where we have a set of nested subspaces

$$\{0\} \subset \cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots \subset L^2,$$
(55)

where each subspace is spanned by translations of scaled versions of a single scaling function $\varphi$; e.g.,

$$V_j = \operatorname{Span}_k \{ 2^{j/2}\, \varphi(2^j t - k) \}.$$
(56)

The direct difference between adjacent nested subspaces is spanned by translations of a single wavelet at the corresponding scale; e.g.,

$$W_j = V_{j+1} \ominus V_j = \operatorname{Span}_k \{ 2^{j/2}\, \psi(2^j t - k) \}.$$
(57)

There are several limitations of this construction. For example, nontrivial orthogonal wavelets cannot be symmetric. To avoid this problem, we generalized the basic construction and introduced the multiplicity-$M$ ($M$-band) scaling functions and wavelets in "Multiplicity-M (M-Band) Scaling Functions and Wavelets", where the difference spaces are spanned by translations of $M-1$ wavelets. The scaling is in terms of powers of $M$; i.e.,

$$\varphi_{j,k}(t) = M^{j/2}\, \varphi(M^j t - k).$$
(58)

In general, there are more degrees of freedom in designing the $M$-band wavelets. However, the nested $V$ spaces are still spanned by translations of a single scaling function. It is the multiwavelets that remove this restriction, allowing multiple scaling functions to span the nested $V$ spaces (Reference), (Reference), (Reference). Although it is possible to construct $M$-band multiwavelets, here we only present results on the two-band case, as does most of the research in the literature.

### Construction of Two-Band Multiwavelets

Assume that $V_0$ is spanned by translations of $R$ different scaling functions $\varphi_i(t)$, $i = 1, \ldots, R$. For a two-band system, we define the scaling and translation of these functions by

$$\varphi_{i,j,k}(t) = 2^{j/2}\, \varphi_i(2^j t - k).$$
(59)

The multiresolution formulation implies

$$V_j = \operatorname{Span}_k \{ \varphi_{i,j,k}(t) : i = 1, \ldots, R \}.$$
(60)

We next construct a vector scaling function by

$$\Phi(t) = [\varphi_1(t), \ldots, \varphi_R(t)]^T.$$
(61)

Since $V_0 \subset V_1$, we have

$$\Phi(t) = 2 \sum_n H(n)\, \Phi(2t - n)$$
(62)

where $H(k)$ is an $R \times R$ matrix for each $k \in \mathbf{Z}$. This is a matrix version of the scalar recursive equation (Reference). The first and simplest multiscaling functions probably appear in (Reference), and they are shown in Figure 18.

The first scaling function $\varphi_1(t)$ is nothing but the Haar scaling function, and it is the sum of two time-compressed and shifted versions of itself, as shown in (Reference)(a). The second scaling function can be easily decomposed into linear combinations of time-compressed and shifted versions of the Haar scaling function and itself, as

$$\varphi_2(t) = \frac{\sqrt{3}}{2}\, \varphi_1(2t) + \frac{1}{2}\, \varphi_2(2t) - \frac{\sqrt{3}}{2}\, \varphi_1(2t - 1) + \frac{1}{2}\, \varphi_2(2t - 1).$$
(63)

This is shown in Figure 19.

Putting the two scaling functions together, we have

$$\begin{bmatrix} \varphi_1(t) \\ \varphi_2(t) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \sqrt{3}/2 & 1/2 \end{bmatrix} \begin{bmatrix} \varphi_1(2t) \\ \varphi_2(2t) \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ -\sqrt{3}/2 & 1/2 \end{bmatrix} \begin{bmatrix} \varphi_1(2t-1) \\ \varphi_2(2t-1) \end{bmatrix}.$$
(64)
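Equation 63 can be checked pointwise. The sketch below assumes (consistent with the cited example) that $\varphi_1$ is the Haar scaling function on $[0,1)$ and $\varphi_2(t) = \sqrt{3}\,(1-2t)$ on $[0,1)$, which makes the pair orthonormal:

```python
import math

# Assumed closed forms: phi1 = Haar box on [0,1), phi2(t) = sqrt(3)(1 - 2t) on [0,1)
def phi1(t):
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def phi2(t):
    return math.sqrt(3.0) * (1.0 - 2.0 * t) if 0.0 <= t < 1.0 else 0.0

r3 = math.sqrt(3.0)
for i in range(200):
    t = i / 200.0                     # sample the support [0, 1)
    rhs = (r3 / 2) * phi1(2 * t) + 0.5 * phi2(2 * t) \
        - (r3 / 2) * phi1(2 * t - 1) + 0.5 * phi2(2 * t - 1)
    assert abs(phi2(t) - rhs) < 1e-12  # Eq. (63) holds pointwise
```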

Further assume that $R$ wavelets span the difference spaces; i.e.,

$$W_j = V_{j+1} \ominus V_j = \operatorname{Span}_k \{ \psi_{i,j,k}(t) : i = 1, \ldots, R \}.$$
(65)

Since $W_0 \subset V_1$, for the stacked wavelets $\Psi(t)$ there must exist a sequence of $R \times R$ matrices $G(k)$ such that

$$\Psi(t) = 2 \sum_k G(k)\, \Phi(2t - k).$$
(66)

These are vector versions of the two-scale recursive equations (Reference) and (Reference).

We can also define the discrete-time Fourier transforms of $H(k)$ and $G(k)$ as

$$H(\omega) = \sum_k H(k)\, e^{i\omega k}, \qquad G(\omega) = \sum_k G(k)\, e^{i\omega k}.$$
(67)

### Approximation, Regularity and Smoothness

Recall from Chapter (Reference) that the key to regularity and smoothness is having a sufficient number of zeros at $\pi$ in $H(\omega)$. For multiwavelets, it has been shown that polynomials can be exactly reproduced by translates of $\Phi(t)$ if and only if $H(\omega)$ can be factored in a special form (Reference), (Reference), (Reference). The factorization is used to study the regularity and convergence of refinable function vectors (Reference), and to construct multiscaling functions with approximation and symmetry properties (Reference). Approximation and smoothness of multiple refinable functions are also studied in (Reference), (Reference), (Reference).

### Support

In general, the finite length of $H(k)$ and $G(k)$ ensures the finite support of $\Phi(t)$ and $\Psi(t)$. However, there is no straightforward relation between the support length and the number of nonzero coefficients in $H(k)$ and $G(k)$. An explanation is the existence of nilpotent matrices (Reference). A method to estimate the support is developed in (Reference).

### Orthogonality

For these scaling functions and wavelets to be orthogonal to each other and orthogonal to their translations, we need (Reference)

$$H(\omega)\, H^\dagger(\omega) + H(\omega+\pi)\, H^\dagger(\omega+\pi) = I_R,$$
(68)
$$G(\omega)\, G^\dagger(\omega) + G(\omega+\pi)\, G^\dagger(\omega+\pi) = I_R,$$
(69)
$$H(\omega)\, G^\dagger(\omega) + H(\omega+\pi)\, G^\dagger(\omega+\pi) = 0_R,$$
(70)

where $\dagger$ denotes the complex conjugate transpose, and $I_R$ and $0_R$ are the $R \times R$ identity and zero matrices, respectively. These are the matrix versions of (Reference) and (Reference). In the scalar case, (Reference) can be easily satisfied if we choose the wavelet filter by time-reversing the scaling filter and changing the signs of every other coefficient. However, for the matrix case here, since matrices do not commute in general, we cannot derive the $G(k)$'s from the $H(k)$'s so straightforwardly. This presents some difficulty in finding the wavelets from the scaling functions; however, it also gives us flexibility to design different wavelets even if the scaling functions are fixed (Reference).

The conditions in Equations 68 through 70 are necessary but not sufficient. A generalization of Lawton's sufficient condition (Theorem (Reference) in Chapter (Reference)) has been developed in (Reference), (Reference), (Reference).

### Implementation of Multiwavelet Transform

Let the expansion coefficients of multiscaling functions and multiwavelets be

$$c_{i,j}(k) = \langle f(t), \varphi_{i,j,k}(t) \rangle,$$
(71)
$$d_{i,j}(k) = \langle f(t), \psi_{i,j,k}(t) \rangle.$$
(72)

We create vectors by

$$C_j(k) = [c_{1,j}(k), \ldots, c_{R,j}(k)]^T,$$
(73)
$$D_j(k) = [d_{1,j}(k), \ldots, d_{R,j}(k)]^T.$$
(74)

A function $f(t)$ in $V_0$ can be written as a linear combination of scaling functions and wavelets,

$$f(t) = \sum_k C_{j_0}(k)^T\, \Phi_{j_0,k}(t) + \sum_{j=j_0}^{\infty} \sum_k D_j(k)^T\, \Psi_{j,k}(t).$$
(75)

Using Equation 62 and Equation 66, we have

$$C_{j-1}(k) = \sqrt{2}\, \sum_n H(n)\, C_j(2k + n)$$
(76)

and

$$D_{j-1}(k) = \sqrt{2}\, \sum_n G(n)\, C_j(2k + n).$$
(77)

Moreover,

$$C_j(n) = \sqrt{2}\, \sum_k \left[ H^\dagger(n - 2k)\, C_{j-1}(k) + G^\dagger(n - 2k)\, D_{j-1}(k) \right].$$
(78)

These are the vector forms of (Reference), (Reference), and (Reference). Thus the synthesis and analysis filter banks for multiwavelet transforms have structures similar to the scalar case. The difference is that the filter banks operate on blocks of $R$ inputs, and the filtering and rate changing are all done in terms of blocks of inputs.

To start the multiwavelet transform, we need to get the scaling coefficients at high resolution. Recall that in the scalar case, the scaling functions are close to delta functions at very high resolution, so the samples of the function are used as the scaling coefficients. However, for multiwavelets we need the expansion coefficients for $R$ scaling functions. Simply using nearby samples as the scaling coefficients is a bad choice. Data samples need to be preprocessed (prefiltered) to produce reasonable values of the expansion coefficients for the multiscaling functions at the highest scale. Prefilters have been designed based on interpolation (Reference), approximation (Reference), and orthogonal projection (Reference).

### Examples

Because of the larger number of degrees of freedom, many methods for constructing multiwavelets have been developed.

### Geronimo-Hardin-Massopust Multiwavelets

A set of multiscaling filters based on fractal interpolation functions was developed in (Reference), and the corresponding multiwavelets were constructed in (Reference). As shown in (Reference), they are both symmetric and orthogonal, a combination that is impossible for two-band orthogonal scalar wavelets. They also have short support and can exactly reproduce the hat function. These interesting properties make multiwavelets a promising expansion system.

### Spline Multiwavelets

Spline bases have a maximal approximation order with respect to their length; however, spline uniwavelets are only semiorthogonal (Reference). A family of spline multiwavelets that are symmetric and orthogonal is developed in (Reference).

### Other Constructions

Other types of multiwavelets are constructed using Hermite interpolating conditions (Reference), matrix spectral factorization (Reference), finite elements (Reference), and oblique projections (Reference). Similar to multiwavelets, vector-valued wavelets and vector filter banks have also been developed (Reference).

### Applications

Multiwavelets have been used in data compression (Reference), (Reference), (Reference), noise reduction (Reference), (Reference), and solution of integral equations (Reference). Because multiwavelets are able to offer a combination of orthogonality, symmetry, higher order of approximation, and short support, methods using multiwavelets frequently outperform those using comparable scalar wavelets. However, it has been found that prefiltering is very important and should be chosen carefully for the application (Reference), (Reference), (Reference). Also, since discrete multiwavelet transforms operate on size-$R$ blocks of data and generate blocks of wavelet coefficients, the correlation within each block of coefficients needs to be exploited. For image compression, prediction rules have been proposed to exploit the correlation in order to reduce the bit rate (Reference). For noise reduction, jointly thresholding coefficients within each block improves the performance (Reference).

## Overcomplete Representations, Frames, Redundant Transforms, and Adaptive Bases

In this chapter, we apply the ideas of frames and tight frames introduced in Chapter (Reference) as well as bases to obtain a more efficient representation of many interesting signal classes. It might be helpful to review the material on bases and frames in that chapter while reading this section.

Traditional basis systems such as Fourier, Gabor, wavelet, and wave packets are efficient representations for certain classes of signals, but there are many cases where a single system is not effective. For example, the Fourier basis is an efficient system for sinusoidal or smooth periodic signals, but poor for transient or chirp-like signals. Each system seems to be best for a rather well-defined but narrow class of signals. Recent research indicates that significant improvements in efficiency can be achieved by combining several basis systems. One can intuitively imagine removing Fourier components until the expansion coefficients quit dropping off rapidly, then switching to a different basis system to expand the residual and, after that expansion quits dropping off rapidly, switching to still another. Clearly, this is not a unique expansion, because the order in which the expansion systems are used would give different results. This is because the total expansion system is a linear combination of the individual basis systems and is, therefore, not a basis itself but a frame. It is an overcomplete expansion system, and a variety of criteria have been developed to use the freedom of the nonuniqueness of the expansion to advantage. The collection of basis systems from which a subset of expansion vectors is chosen is sometimes called a dictionary.

There are at least two descriptions of the problem. We may want a single expansion system to handle several different classes of signals, each of which is well represented by a particular basis system, or we may have a single class of signals, but the elements of that class are linear combinations of members of the well-represented classes. In either case, there are several criteria that have been identified as important (Reference), (Reference):

• Sparsity: The expansion should have most of the important information in the smallest number of coefficients so that the others are small enough to be neglected or set equal to zero. This is important for compression and denoising.
• Separation: If the measurement consists of a linear combination of signals with different characteristics, the expansion coefficients should clearly separate those signals. If a single signal has several features of interest, the expansion should clearly separate those features. This is important for filtering and detection.
• Superresolution: The resolution of signals or characteristics of a signal should be much better than with a traditional basis system. This is likewise important for linear filtering, detection, and estimation.
• Stability: The expansions in terms of our new overcomplete systems should not be significantly changed by perturbations or noise. This is important in implementation and data measurement.
• Speed: The numerical calculation of the expansion coefficients in the new overcomplete system should be of order $O(N)$ or $O(N \log N)$.

These criteria are often in conflict with each other, and various compromises will be made in the algorithms and problem formulations for an acceptable balance.

### Overcomplete Representations

This section uses the material in Chapter (Reference) on bases and frames. One goal is to represent a signal using a "dictionary" of expansion functions that could include the Fourier basis, wavelet basis, Gabor basis, etc. We formulate a finite-dimensional version of this problem as

$$y(n) = \sum_k \alpha_k\, x_k(n), \qquad n, k \in \mathbf{Z}$$
(79)

for $n = 0, 1, \ldots, N-1$ and $k = 0, 1, \ldots, K-1$. This can be written in matrix form as

$$y = X \alpha$$
(80)

where $y$ is an $N \times 1$ vector whose elements are the signal values $y(n)$, the matrix $X$ is $N \times K$ with columns made up of all the functions in the dictionary, and $\alpha$ is a $K \times 1$ vector of the expansion coefficients $\alpha_k$. The matrix operator has the basis signals $x_k$ as its columns, so that the matrix multiplication Equation 80 is simply the signal expansion Equation 79.

For a given signal representation problem, one has two decisions: what dictionary to use (i.e., the choice of $X$) and how to represent the signal in terms of this dictionary (i.e., the choice of $\alpha$). Since the dictionary is overcomplete, there are several possible choices of $\alpha$, and typically one uses prior knowledge or one or more of the desired properties listed earlier to calculate $\alpha$.

### A Matrix Example

Consider a simple two-dimensional system with orthogonal basis vectors

$$x_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad \text{and} \quad x_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
(81)

which gives the matrix operator with $x_1$ and $x_2$ as columns

$$X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
(82)

Decomposition with this rather trivial operator gives a time-domain description in that the first expansion coefficient $\alpha_0$ is simply the first value of the signal, $y(0)$, and the second coefficient is the second value of the signal. Using a different set of basis vectors might give the operator

$$X = \begin{bmatrix} 0.7071 & 0.7071 \\ 0.7071 & -0.7071 \end{bmatrix}$$
(83)

which has the normalized basis vectors still orthogonal but now at a 45° angle to the basis vectors in Equation 82. This decomposition is a sort of frequency-domain expansion. The first column vector is simply the constant signal, and its expansion coefficient $\alpha(0)$ will be the average of the signal. The coefficient of the second vector will calculate the difference between $y(0)$ and $y(1)$ and, therefore, be a measure of the change.

Notice that $y = [1\ 0]^T$ can be represented exactly with only one nonzero coefficient using Equation 82 but requires two with Equation 83, while for $y = [1\ 1]^T$ the opposite is true. This means the signals $y = [1\ 0]^T$ and $y = [0\ 1]^T$ can be represented sparsely by Equation 82, while $y = [1\ 1]^T$ and $y = [1\ {-1}]^T$ can be represented sparsely by Equation 83.

If we create an overcomplete expansion by combining the previous orthogonal basis systems, then it should be possible to have a sparse representation for all four of the previous signals. This is done by simply adjoining the columns of Equation 83 to those of Equation 82 to give

X = \begin{bmatrix} 1 & 0 & 0.7071 & 0.7071 \\ 0 & 1 & 0.7071 & -0.7071 \end{bmatrix}
(84)

This is clearly overcomplete, having four expansion vectors in a two-dimensional system. Finding the αk requires solving an underdetermined set of equations, and the solution is not unique.

For example, if the signal is given by

y = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
(85)

there are infinitely many solutions, several of which are listed in the following table.

| Case | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| α0 | 0.5000 | 1.0000 | 1.0000 | 1.0000 | 0 | 0 | 0 |
| α1 | 0.0000 | 0.0000 | 0 | 0 | -1.0000 | 1.0000 | 0 |
| α2 | 0.3536 | 0 | 0.0000 | 0 | 1.4142 | 0 | 0.7071 |
| α3 | 0.3536 | 0 | 0 | 0.0000 | 0 | 1.4142 | 0.7071 |
| ||α||^2 | 0.5000 | 1.0000 | 1.0000 | 1.0000 | 3.0000 | 3.0000 | 1.0000 |

Case 1 is the minimum norm solution of y = Xα for the αk. It is calculated by a pseudo-inverse with the Matlab command a = pinv(X)*y. It is also the redundant DWT discussed in the next section and, because X is a tight frame with redundancy two, can be calculated by a = X'*y/2. Case 2 is the minimum norm solution with no more than two nonzero values of αk; it can also be calculated by inverting the matrix of Equation 84 with columns 3 and 4 deleted. Case 3 is calculated the same way with columns 2 and 4 deleted, case 4 has columns 2 and 3 deleted, case 5 has 1 and 4 deleted, case 6 has 1 and 3 deleted, and case 7 has 1 and 2 deleted. Cases 3 through 7 are unique since the reduced matrix is square and nonsingular. The second entry of α for case 1 is zero because the signal is orthogonal to that expansion vector. Notice that the norm squared of α is minimum for case 1 and is equal to the norm squared of y divided by the redundancy, here two. Also notice that the coefficients in cases 2, 3, and 4 are the same even though calculated by different methods.
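These computations are easy to reproduce (a numpy sketch mirroring the text's Matlab commands, with the matrix of Equation 84 and the signal of Equation 85):

```python
import numpy as np

s = np.sqrt(2) / 2                       # the 0.7071 entries
X = np.array([[1.0, 0.0,  s,  s],        # Equation 84
              [0.0, 1.0,  s, -s]])
y = np.array([1.0, 0.0])                 # Equation 85

# Case 1: minimum L2-norm solution of the underdetermined system y = X a
a = np.linalg.pinv(X) @ y

# X is a tight frame with redundancy 2 (X X' = 2 I), so pinv(X) = X'/2
assert np.allclose(a, X.T @ y / 2)
assert np.allclose(X @ a, y)             # a reconstructs y exactly
```

The computed a matches case 1 of the table, [0.5, 0, 0.3536, 0.3536], and its squared norm is ||y||^2 / 2 = 0.5.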

Because X is not only a frame but a tight frame with a redundancy of two, the energy (norm squared) of α is one-half the norm squared of y. The other decompositions (not a tight frame or basis) do not preserve the energy.

Next consider a two-dimensional signal that cannot be exactly represented by only one expansion vector. If the unity norm signal is given by

y = \begin{bmatrix} 0.9806 \\ 0.1961 \end{bmatrix}
(86)

the expansion coefficients are listed next for the same cases described previously.

| Case | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| α0 | 0.4903 | 0.9806 | 0.7845 | 1.1767 | 0 | 0 | 0 |
| α1 | 0.0981 | 0.1961 | 0 | 0 | -0.7845 | 1.1767 | 0 |
| α2 | 0.4160 | 0 | 0.2774 | 0 | 1.3868 | 0 | 0.8321 |
| α3 | 0.2774 | 0 | 0 | -0.2774 | 0 | 1.3868 | 0.5547 |
| ||α||^2 | 0.5000 | 1.0000 | 0.6923 | 1.4615 | 2.5385 | 3.3077 | 1.0000 |

Again, case 1 is the minimum norm solution; however, it has no zero components this time because there are no expansion vectors orthogonal to the signal. Since the signal lies between the 90° and 45° expansion vectors, it is case 3 that gives the minimum-energy representation using only two vectors.

There is an infinite variety of ways to construct the overcomplete frame matrix X. The one in this example is a four-vector tight frame. Each vector is 45° from its neighbors, so the vectors are evenly distributed over the 180° upper half of the two-dimensional plane; the lower half-plane is covered by the negatives of these frame vectors. A three-vector tight frame would have three columns, each 60° apart. A 36-vector tight frame would have 36 columns spaced 5° apart; in that system, any signal vector would be very close to an expansion vector.
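Such evenly spaced frames are easy to check numerically. The sketch below (our construction: unit vectors at angles kπ/M, for illustration) verifies the tight frame condition X Xᵀ = (M/2) I, i.e., a redundancy of M/2, for several values of M:

```python
import numpy as np

def even_frame(M):
    """M unit-norm frame vectors spaced 180/M degrees apart
    in the upper half of the plane, returned as columns of X."""
    ang = np.arange(M) * np.pi / M
    return np.vstack([np.cos(ang), np.sin(ang)])

for M in (3, 4, 36):
    X = even_frame(M)
    # tight frame condition in 2-D: X X' = (M/2) I, redundancy M/2
    assert np.allclose(X @ X.T, (M / 2) * np.eye(2))
```

For M = 4 this reproduces (up to signs of individual vectors) the four-vector tight frame of Equation 84.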

Still another alternative would be to construct a frame (not tight) with nonorthogonal rows. This would result in columns that are not evenly spaced but might better describe some particular class of signals. Indeed, one can imagine constructing a frame operator with closely spaced expansion vectors in the regions where signals are most likely to occur or where they have the most energy.

We next consider a particular modified tight frame constructed so as to give a shift-invariant DWT.

### Shift-Invariant Redundant Wavelet Transforms and Nondecimated Filter Banks

One of the few flaws in the various wavelet basis decompositions and wavelet transforms is the fact that the DWT is not translation-invariant. If you shift a signal, you would like the DWT coefficients simply to shift, but they do more than that: they significantly change character.

Imagine a DWT of a signal that is a wavelet itself. For example, if the signal were

y(n) = \psi(2^4 n - 10)
(87)

then the DWT would be

d_4(10) = 1, \qquad \text{all other } d_j(k) = c(k) = 0.
(88)

In other words, the series expansion in the orthogonal wavelet basis would have only one nonzero coefficient.

If we shifted the signal to the right so that y(n) = ψ(2^4(n-1) - 10), there would be many nonzero coefficients because, at this shift or translation, the signal is no longer orthogonal to most of the basis functions. The signal energy would be partitioned over many more coefficients, which would therefore, because of Parseval's theorem, be smaller. This would degrade any denoising or compression using thresholding schemes. The DWT described in Chapter (Reference) is periodic in that at each scale j the periodized DWT repeats itself after a shift of n = 2^j, but the period depends on the scale. This can also be seen from the filter bank calculation of the DWT, where each scale goes through a different number of decimators and therefore has a different aliasing.
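The shift sensitivity can be seen with a one-stage decimated Haar filter bank (an illustrative sketch of ours; the Haar pair stands in for the general wavelet filters, and the pulse for a wavelet-aligned signal). The aligned pulse produces a single nonzero detail coefficient, while the shifted pulse spreads smaller coefficients over both bands:

```python
import numpy as np

# One stage of the decimated Haar DWT: split into even/odd samples,
# then take scaled sums (scaling coefficients c) and differences
# (wavelet coefficients d).
def haar_stage(x):
    e, o = x[0::2], x[1::2]
    return (e + o) / np.sqrt(2), (e - o) / np.sqrt(2)   # (c, d)

x = np.zeros(8)
x[2:4] = [1.0, -1.0]                 # pulse aligned with one Haar wavelet

c, d = haar_stage(x)                 # one nonzero detail coefficient
cs, ds = haar_stage(np.roll(x, 1))   # shift by one sample and repeat
```

The unshifted signal yields d = [0, √2, 0, 0] with c = 0; after the shift, the energy splits into two smaller detail coefficients plus nonzero scaling coefficients.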

A method to create a linear, shift-invariant DWT is to construct a frame from the orthogonal DWT supplemented by shifted orthogonal DWTs using the ideas from the previous section. If you do this, the result is a frame and, because of the redundancy, is called the redundant DWT or RDWT.

The typical wavelet-based signal processing framework consists of three simple steps: 1) wavelet transform; 2) point-by-point processing of the wavelet coefficients (e.g., thresholding for denoising, quantization for compression); 3) inverse wavelet transform. The diagram of the framework is shown in Figure 21. As mentioned before, the wavelet transform is not translation-invariant, so if we shift the signal, perform the above processing, and shift the output back, the results are different for different shifts. Since the frame vectors of the RDWT consist of the shifted orthogonal DWT basis, if we replace the forward/inverse wavelet transform in the above framework by the forward/inverse RDWT, then the result of the scheme in Figure 22 is the same as the average of all the processing results using DWTs with different shifts of the input data. This is one of the main reasons that RDWT-based signal processing tends to be more robust.

Still another view of this new transform can be had by looking at the Mallat-derived filter bank described in Chapters (Reference) and (Reference). The DWT filter banks illustrated in Figures (Reference) and (Reference) can be modified by removing the decimators between each stage to give the coefficients of the tight frame expansion (the RDWT) of the signal. We call this structure the undecimated filter bank. Notice that, without the decimation, the number of terms in the transform is larger than N. However, since these are the expansion coefficients in our new overcomplete frame, that is consistent. Also notice that this idea can be applied to M-band wavelets and wavelet packets in the same way.
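One stage of such an undecimated filter bank can be sketched with the Haar pair and circular (periodic) filtering (an illustrative assumption of ours; general wavelet filters work the same way). With the decimators removed, the analysis commutes with circular shifts, and the single-stage output is a redundancy-two tight frame expansion:

```python
import numpy as np

h0 = np.array([1.0, 1.0]) / np.sqrt(2)    # lowpass (scaling) filter
h1 = np.array([1.0, -1.0]) / np.sqrt(2)   # highpass (wavelet) filter

def undecimated_stage(x):
    # circular filtering with no decimation: one output sample per input
    c = sum(hk * np.roll(x, k) for k, hk in enumerate(h0))
    d = sum(hk * np.roll(x, k) for k, hk in enumerate(h1))
    return c, d

rng = np.random.default_rng(0)
x = rng.standard_normal(16)

c, d = undecimated_stage(x)
cs, ds = undecimated_stage(np.roll(x, 3))   # shift the input by 3
```

Shifting the input simply shifts both coefficient streams, and the coefficient energy is exactly twice the signal energy (redundancy two).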

These RDWTs are not precisely a tight frame because each scale has a different redundancy. However, except for this factor, the RDWT and undecimated filter bank have the same characteristics as a tight frame, and they support a form of Parseval's theorem or energy partitioning.

If we use this modified tight frame as a dictionary to choose a particular subset of expansion vectors as a new frame or basis, we can tailor the system to the signal or signal class. This is discussed in the next section on adaptive systems.

This idea of the RDWT was suggested by Mallat (Reference), Beylkin (Reference), Shensa (Reference), Dutilleux (Reference), Nason (Reference), Guo (Reference), (Reference), Coifman, and others. The redundancy comes at the price of the RDWT having O(N log N) arithmetic complexity rather than O(N). Liang and Parks (Reference), Bao and Erdol (Reference), (Reference), Marco and Weiss (Reference), (Reference), (Reference), Daubechies (Reference), and others (Reference) have used some form of averaging or “best basis" transform to obtain shift invariance.

Recent results indicate this nondecimated DWT, together with thresholding, may be the best denoising strategy (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference). The nondecimated DWT is shift invariant, is less affected by noise, quantization, and error, and has O(N log N) storage and arithmetic complexity. It combines with thresholding to give denoising and compression superior to the classical Donoho method for many examples. Further discussion of the use of the RDWT can be found in (Reference).

### Adaptive Construction of Frames and Bases

In the case of the redundant discrete wavelet transform just described, an overcomplete expansion system was constructed in such a way as to be a tight frame. This allowed a single linear, shift-invariant system to describe a very wide set of signals; however, the description was not adapted to the characteristics of any particular signal. Recent research has been quite successful in constructing expansion systems adaptively so as to give high sparsity and superresolution, but at a cost of added computation and of being nonlinear. This section will look at some of the recent results in this area (Reference), (Reference), (Reference), (Reference).

While use of an adaptive paradigm results in a shift-invariant orthogonal transform, the transform is nonlinear. It has the scaling property DWT{a f(x)} = a DWT{f(x)}, but it does not satisfy superposition, i.e., DWT{f(x) + g(x)} ≠ DWT{f(x)} + DWT{g(x)}. That can sometimes be a problem.

Since these finite dimensional overcomplete systems are a frame, a subset of the expansion vectors can be chosen to be a basis while keeping most of the desirable properties of the frame. This is described well by Chen and Donoho in (Reference), (Reference). Several of these methods are outlined as follows:

• The method of frames (MOF) was first described by Daubechies (Reference), (Reference), (Reference) and uses the rather straightforward idea of solving the overcomplete frame (underdetermined set of equations) in Equation 84 by minimizing the L2 norm of α. Indeed, this is one of the classical definitions of solving the normal equations, or use of a pseudo-inverse. It can easily be done in Matlab by a = pinv(X)*y. This gives a frame solution, but it is usually not sparse.
• The best orthogonal basis method (BOB) was proposed by Coifman and Wickerhauser (Reference), (Reference) to adaptively choose a best basis from a large collection. The method is fast (order N log N) but not necessarily sparse.
• Mallat and Zhang (Reference) proposed a sequential selection scheme called matching pursuit (MP) which builds a basis, vector by vector. The efficiency of the algorithm depends on the order in which vectors are added. If poor choices are made early, it takes many terms to correct them. Typically this method also does not give sparse representations.
• A method called basis pursuit (BP) was proposed by Chen and Donoho (Reference), (Reference) which solves Equation 84 while minimizing the L1 norm of α. This is done by linear programming and results in a globally optimal solution. It is similar in philosophy to the MOF but uses an L1 rather than an L2 norm and uses linear programming to obtain the optimization. Using interior point methods, it is reasonably efficient and usually gives a fairly sparse solution.
• Krim et al. describe a best basis method in (Reference). Tewfik et al. propose a method called optimal subset selection in (Reference) and others are (Reference), (Reference).

All of these methods are very signal and problem dependent and, in some cases, can give much better results than the standard M-band or wavelet packet based methods.
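Matching pursuit itself fits in a few lines. The sketch below (a minimal illustration, not Mallat and Zhang's full algorithm with large time-frequency dictionaries) greedily picks the unit-norm column of the Equation 84 dictionary most correlated with the current residual:

```python
import numpy as np

def matching_pursuit(X, y, n_iter=10):
    # Greedy MP: repeatedly select the dictionary column (atom) most
    # correlated with the residual; columns are assumed unit norm.
    alpha = np.zeros(X.shape[1])
    r = y.astype(float).copy()
    for _ in range(n_iter):
        k = np.argmax(np.abs(X.T @ r))    # best-matching atom
        ck = X[:, k] @ r                  # its coefficient
        alpha[k] += ck
        r = r - ck * X[:, k]              # update the residual
    return alpha, r

s = np.sqrt(2) / 2
X = np.array([[1.0, 0.0,  s,  s],         # the dictionary of Equation 84
              [0.0, 1.0,  s, -s]])
y = np.array([1.0, 1.0])                  # sparse in the 45-degree basis

alpha, resid = matching_pursuit(X, y)
```

For this signal the very first atom chosen, the constant vector [s, s], represents y exactly, illustrating both the greed of the method and the benefit of an overcomplete dictionary.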

## Local Trigonometric Bases

In the material up to this point, all of the expansion systems have required the translation and scaling properties of (Reference) and the satisfaction of the multiresolution analysis assumption of (Reference). From this we have been able to generate orthogonal basis systems with the basis functions having compact support and, through generalization to M-band wavelets and wavelet packets, we have been able to allow a rather general tiling of the time-frequency or time-scale plane with flexible frequency resolution.

By giving up the multiresolution analysis (MRA) requirement, we will be able to create another basis system with a time-frequency tiling that is somewhat the dual of the wavelet or wavelet packet system. Much as the multiresolution system divides the frequency bands logarithmically for the M = 2 systems, linearly for the higher-M cases, and in a rather general form for wavelet packets, we will now develop the local cosine and local sine basis systems for a more flexible time segmentation of the time-frequency plane. Rather than modifying the MRA systems by creating the time-varying wavelet systems, we will abandon the MRA and build a basis directly.

What we are looking for is an expansion of a signal or function in the form

f(t) = \sum_{k,n} a_k(n)\, \chi_{k,n}(t),
(89)

where the functions χk,n(t) are of the form (for example)

\chi_{k,n}(t) = w_k(t) \cos\big(\alpha \pi (n + \beta) t + \gamma\big).
(90)

Here wk(t) is a window function giving localization to the basis function, and α, β, and γ are constants whose choice we will get to shortly; k is a time index while n is a frequency index. By requiring orthogonality of these basis functions, the coefficients (the transform) are found by an inner product

a_k(n) = \langle f(t), \chi_{k,n}(t) \rangle = \int f(t)\, \chi_{k,n}(t)\, dt.
(91)

We will now examine how this can be achieved and what the properties of the expansion are.

Fundamentally, the wavelet packet system decomposes L²(ℝ) into a direct sum of orthogonal spaces, each typically covering a certain frequency band and spanned by the translates of a particular element of the wavelet packet system. Wavelet packets thus make possible a time-frequency tiling with flexible frequency resolution. However, the temporal resolution is determined by the frequency band associated with a particular element in the packet.

Local trigonometric bases (Reference), (Reference) are duals of wavelet packets in the sense that these bases give flexible temporal resolution. In this case, L²(ℝ) is decomposed into a direct sum of spaces, each typically covering a particular time interval. The basis functions are all modulates of a fixed window function.

One could argue that an obvious approach is to partition the time axis into disjoint bins and use a Fourier series expansion in each temporal bin. However, since the basis functions are “rectangular-windowed” exponentials, they are discontinuous at the bin boundaries and hence undesirable in the analysis of smooth signals. If one replaces the rectangular window with a “smooth” window, then, since products of smooth functions are smooth, one can generate smooth windowed exponential basis functions. For example, if the time axis is split uniformly, one is looking at basis functions of the form w(t - k)e^{i2πnt}, k, n ∈ ℤ, for some smooth window function w(t). Unfortunately, orthonormality disallows the window w(t) from being well concentrated in both time and frequency, which is undesirable for time-frequency analysis. More precisely, the Balian-Low theorem (see p. 108 in (Reference)) states that the Heisenberg product of the window (the product of the time-spread and frequency-spread, which is lower bounded by the Heisenberg uncertainty principle) is infinite. However, it turns out that windowed trigonometric bases (that use cosines and sines but not exponentials) can be orthonormal with a window that has a finite Heisenberg product (Reference). That is the reason we are looking for local trigonometric bases of the form given in Equation 90.

### Nonsmooth Local Trigonometric Bases

To construct local trigonometric bases we have to choose: (a) the window functions wk(t); and (b) the trigonometric functions (i.e., α, β, and γ in Equation 90). If we use the rectangular window (which we know is a bad choice), then it suffices to find a trigonometric basis for the interval that the window spans. Without loss of generality, we may consider the unit interval (0, 1), and hence we are interested in trigonometric bases for L²((0, 1)). It is easy to see that the following four sets of functions satisfy this requirement.

1. Φn(t) = √2 cos(π(n + 1/2)t), n = 0, 1, 2, ...;
2. Φn(t) = √2 sin(π(n + 1/2)t), n = 0, 1, 2, ...;
3. Φn(t) = {1, √2 cos(πnt)}, n = 1, 2, ...;
4. Φn(t) = √2 sin(πnt), n = 1, 2, ....

Indeed, these orthonormal bases are obtained from the Fourier series on (-2, 2) (the first two) and on (-1, 1) (the last two) by appropriately imposing symmetries, and hence are readily verified to be complete and orthonormal on (0, 1). If we choose a set of nonoverlapping rectangular window functions wk(t) such that ∑k wk(t) = 1 for all t, and define χk,n(t) = wk(t)Φn(t), then χk,n(t) is a local trigonometric basis for L²(ℝ) for each of the four choices of Φn(t) above.
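Orthonormality of the first two sets can be checked numerically: sampling them at the N midpoints (k + 1/2)/N and scaling by 1/√N yields exactly the orthonormal DCT-IV and DST-IV matrices (the discretization is our illustrative choice, not part of the text):

```python
import numpy as np

N = 32
t = (np.arange(N) + 0.5) / N              # midpoint samples of (0, 1)
n = np.arange(N)[:, None]

# Basis 1 sampled and normalized: rows sqrt(2/N) cos(pi (n+1/2) t),
# which is exactly the orthonormal DCT-IV matrix.
Phi = np.sqrt(2.0 / N) * np.cos(np.pi * (n + 0.5) * t)

# Basis 2 sampled the same way gives the DST-IV matrix.
Psi = np.sqrt(2.0 / N) * np.sin(np.pi * (n + 0.5) * t)
```

Both matrices multiply against their transposes to give the identity, confirming that the sampled bases are complete and orthonormal.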

### Construction of Smooth Windows

We know how to construct orthonormal trigonometric bases for disjoint temporal bins or intervals. Now we need to construct smooth windows wk(t) that retain orthonormality when applied to cosines and sines. An outline of the process is as follows: a unitary operation is applied that “unfolds” the discontinuities of all the local basis functions at the boundaries of each temporal bin. Unfolding leads to overlapping (unfolded) basis functions. However, since unfolding is unitary, the resulting functions still form an orthonormal basis. The unfolding operator is parameterized by a function r(t) that satisfies an algebraic constraint (which makes the operator unitary). The smoothness of the resulting basis functions depends on the smoothness of this underlying function r(t).

The function r(t), referred to as a rising cutoff function, satisfies the following conditions (see (Reference)):

r(t)^2 + r(-t)^2 = 1 \text{ for all } t \in \mathbb{R}; \qquad r(t) = \begin{cases} 0, & t \le -1 \\ 1, & t \ge 1 \end{cases}
(92)

r(t) is called a rising cutoff function because it rises from 0 to 1 in the interval [-1, 1] (note: it does not necessarily have to be monotone increasing). Multiplying a function by r(t) localizes it to [-1, ∞). Every real-valued function r(t) satisfying Equation 92 is of the form r(t) = sin(θ(t)), where

\theta(t) + \theta(-t) = \frac{\pi}{2} \text{ for all } t \in \mathbb{R}; \qquad \theta(t) = \begin{cases} 0, & t \le -1 \\ \frac{\pi}{2}, & t \ge 1 \end{cases}
(93)

This ensures that r(-t) = sin(θ(-t)) = sin(π/2 - θ(t)) = cos(θ(t)) and therefore r²(t) + r²(-t) = 1. One can easily construct arbitrarily smooth rising cutoff functions. We give one such recipe from (Reference) (p. 105). Start with the function

r^{[0]}(t) = \begin{cases} 0, & t \le -1 \\ \sin\left(\frac{\pi}{4}(1 + t)\right), & -1 < t < 1 \\ 1, & t \ge 1 \end{cases}
(94)

It is readily verified to be a rising cutoff function. Now recursively define r^[1](t), r^[2](t), ... as follows:

r^{[n+1]}(t) = r^{[n]}\left(\sin\left(\frac{\pi}{2} t\right)\right).
(95)

Notice that r^[n](t) is a rising cutoff function for every n. Moreover, by induction on n it is easy to show that r^[n](t) ∈ C^(2^n - 1) (it suffices to show that the derivatives at t = -1 and t = 1 exist and are zero up to order 2^n - 1).
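This recipe can be sketched directly (a numpy illustration; clamping t to [-1, 1] before the recursion is our implementation detail, since sin(πt/2) is only monotone on that interval and r is constant outside it):

```python
import numpy as np

def r0(t):
    # Equation 94: the starting rising cutoff function
    t = np.asarray(t, dtype=float)
    return np.where(t <= -1, 0.0,
           np.where(t >= 1, 1.0, np.sin(np.pi / 4 * (1 + t))))

def r_n(t, n):
    # Equation 95: r[n+1](t) = r[n](sin(pi t / 2)), applied after
    # clamping to [-1, 1], outside which every r[n] is 0 or 1
    t = np.clip(np.asarray(t, dtype=float), -1.0, 1.0)
    for _ in range(n):
        t = np.sin(np.pi / 2 * t)
    return r0(t)

t = np.linspace(-2, 2, 401)
for n in range(4):
    # each iterate is still a valid rising cutoff function (Equation 92)
    assert np.allclose(r_n(t, n)**2 + r_n(-t, n)**2, 1.0)
```

Each iterate rises from 0 at t = -1 to 1 at t = 1 while satisfying r(t)² + r(-t)² = 1, and successive iterates are visibly flatter (smoother) at the endpoints.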

### Folding and Unfolding

Using a rising cutoff function r(t), one can define the folding operator U(r) and its adjoint (and inverse), the unfolding operator U*(r), as follows:

U(r) f(t) = \begin{cases} r(t) f(t) + r(-t) f(-t), & t > 0 \\ r(-t) f(t) - r(t) f(-t), & t < 0 \end{cases}
(96)
U^*(r) f(t) = \begin{cases} r(t) f(t) - r(-t) f(-t), & t > 0 \\ r(-t) f(t) + r(t) f(-t), & t < 0 \end{cases}
(97)

Notice that U*(r)U(r)f(t) = (r(t)² + r(-t)²)f(t) = U(r)U*(r)f(t) and that ‖U(r)f‖ = ‖f‖ = ‖U*(r)f‖, showing that U(r) is a unitary operator on L²(ℝ). Also, these operators change f(t) only in [-1, 1], since U(r)f(t) = f(t) = U*(r)f(t) for t ≤ -1 and t ≥ 1. The interval [-1, 1] is the action region of the folding/unfolding operator. U(r) is called a folding operator acting at zero because, for smooth f, U(r)f has a discontinuity at zero. By translation and dilation of r(t) one can define U(r, t0, ϵ) and U*(r, t0, ϵ) that fold and unfold, respectively, about t = t0 with action region [t0 - ϵ, t0 + ϵ] and action radius ϵ.

Notice that Equation 96 and Equation 97 do not define the values U(r)f(0) and U*(r)f(0) because of the discontinuity that is potentially introduced. An elementary exercise in calculus divulges the nature of this discontinuity: if f ∈ C^d(ℝ), then U(r)f ∈ C^d away from 0. At t = 0, left and right derivatives exist, with all even-order left derivatives and all odd-order right derivatives (up to and including order d) being zero. Conversely, given any function f, C^d away from 0, with a discontinuity of the above type, U*(r)f has a unique extension across t = 0 (i.e., a choice of value for (U*(r)f)(0)) that is in C^d(ℝ). One can switch the signs in Equation 96 and Equation 97 to obtain another pair of folding and unfolding operators. In this case, for f ∈ C^d(ℝ), U(r)f will have its even-order right derivatives and odd-order left derivatives equal to zero. We will use U+, U+* and U-, U-*, respectively, to distinguish between the two types of folding/unfolding operators, and call them positive and negative polarity folding/unfolding operators, respectively.
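These operators are easy to exercise numerically. The sketch below samples on a grid symmetric about the folding point (so that f(-t) is just a reversal — an implementation choice of ours), uses the r[0] cutoff of Equation 94, and confirms that unfolding inverts folding and that folding preserves energy:

```python
import numpy as np

N = 64
t = (np.arange(-N, N) + 0.5) / N        # grid symmetric about 0

def r(u):
    # r[0] of Equation 94
    u = np.clip(u, -1.0, 1.0)
    return np.sin(np.pi / 4 * (1 + u))

rp, rm = r(t), r(-t)                    # r(t) and r(-t) on the grid
pos = t > 0

def fold(f):                            # U(r), Equation 96
    fm = f[::-1]                        # f(-t) on this symmetric grid
    g = np.empty_like(f)
    g[pos] = rp[pos] * f[pos] + rm[pos] * fm[pos]
    g[~pos] = rm[~pos] * f[~pos] - rp[~pos] * fm[~pos]
    return g

def unfold(f):                          # U*(r), Equation 97
    fm = f[::-1]
    g = np.empty_like(f)
    g[pos] = rp[pos] * f[pos] - rm[pos] * fm[pos]
    g[~pos] = rm[~pos] * f[~pos] + rp[~pos] * fm[~pos]
    return g

f = np.cos(3.0 * t) + t                 # an arbitrary smooth test signal
assert np.allclose(unfold(fold(f)), f)  # U* U = identity
assert np.allclose(np.sum(fold(f)**2), np.sum(f**2))  # unitary
```

At each mirror pair (t, -t) the folding operator acts as a 2x2 orthogonal matrix, which is why both the inversion and the energy check succeed exactly.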

So far we have seen that the folding operator is associated with a rising cutoff function, acts at a certain point, has a certain action region and radius and has a certain polarity. To get a qualitative idea of what these operators do, let us look at some examples.

First, consider a case where f(t) is even- or odd-symmetric about the folding point on the action interval. Then Uf corresponds to simply windowing f by an appropriate window function. Indeed, if f(t) = f(-t) on [-1, 1],

U_+(r, 0, 1) f(t) = \begin{cases} (r(t) + r(-t)) f(t), & t > 0 \\ (r(-t) - r(t)) f(t), & t < 0 \end{cases}
(98)

and if f(t) = -f(-t) on [-1, 1],

U_+(r, 0, 1) f(t) = \begin{cases} (r(t) - r(-t)) f(t), & t > 0 \\ (r(-t) + r(t)) f(t), & t < 0 \end{cases}
(99)

(Reference) shows a rising cutoff function and the action of the folding operators of both polarities on the constant function. Observe the nature of the discontinuity at t = 0 and the effect of polarity reversal.

We saw that for signals with symmetry in the action region, folding corresponds to windowing. Next we look at signals that are supported to the right (or left) of the folding point and see what unfolding does to them. In this case, U+*(r)f is obtained by windowing the even (or odd) extension of f(t) about the folding point. Indeed, if f(t) = 0 for t < 0,

U_+^*(r, 0, 1) f(t) = \begin{cases} r(t) f(t), & t > 0 \\ r(t) f(-t), & t < 0 \end{cases}
(100)

and if f(t) = 0 for t > 0,

U_+^*(r, 0, 1) f(t) = \begin{cases} -r(-t) f(-t), & t > 0 \\ r(-t) f(t), & t < 0 \end{cases}
(101)

(Reference) shows the effect of the positive unfolding operator acting on cosine and sine functions supported on the right and left half-lines respectively. Observe that unfolding removes the discontinuities at t=0t=0. If the polarity is reversed, the effects on signals on the half-line are switched; the right half-line is associated with windowed odd extensions and left half-line with windowed even extensions.

### Local Cosine and Sine Bases

Recall the four orthonormal trigonometric bases for L²((0, 1)) we described earlier.

1. Φn(t) = √2 cos(π(n + 1/2)t), n = 0, 1, 2, ...;
2. Φn(t) = √2 sin(π(n + 1/2)t), n = 0, 1, 2, ...;
3. Φn(t) = {1, √2 cos(πnt)}, n = 1, 2, ...;
4. Φn(t) = √2 sin(πnt), n = 1, 2, ....

The basis functions have discontinuities at t = 0 and t = 1 because they are restrictions of the cosines and sines to the unit interval by rectangular windowing. The natural extensions of these basis functions to all of ℝ (i.e., unwindowed cosines and sines) are either even (say “+”) or odd (say “-”) symmetric (locally) about the endpoints t = 0 and t = 1. Indeed, the basis functions for the four cases are (+,-), (-,+), (+,+), and (-,-) symmetric, respectively, at (0, 1). From the preceding analysis, this means that unfolding these basis functions corresponds to windowing if the unfolding operator has the right polarity. Also observe that the basis functions are discontinuous at the endpoints and, depending on the symmetry at each endpoint, all odd derivatives (for “+” symmetry) or even derivatives (for “-” symmetry) are zero there. By choosing unfolding operators of appropriate polarity at the two endpoints (with nonoverlapping action regions) for the four bases, we get smooth basis functions of compact support. For example, for (+,-) symmetry, the basis function U*(r0, 0, ϵ0)U*(r1, 1, ϵ1)ψn(t), with the appropriate polarity at each endpoint, is supported in (-ϵ0, 1 + ϵ1) and is as many times continuously differentiable as r0 and r1 are.

Let {tj} be an ordered set of points in ℝ defining a partition into disjoint intervals Ij = [tj, tj+1]. Now choose one of the four bases above for each interval such that, at each tj, the basis functions for Ij-1 and those for Ij have opposite symmetries. We say the polarity at tj is positive if the symmetry is -)(+ and negative if it is +)(-. At each tj choose a smooth cutoff function rj(t) and an action radius ϵj so that the action intervals do not overlap. Let p(j) be the polarity of tj and define the unitary operator

U = \prod_j U_{p(j)}(r_j, t_j, \epsilon_j).
(102)

Let {ψn(t)} denote all the basis functions for all the intervals put together. Then {ψn(t)} forms a nonsmooth orthonormal basis for L²(ℝ). Simultaneously, {Uψn(t)} forms a smooth orthonormal basis for L²(ℝ). To find the expansion coefficients of a function f(t) in this basis we use

\langle f, U \psi_n \rangle = \langle U^* f, \psi_n \rangle.
(103)

In other words, to compute the expansion coefficients of f in the new (smooth) basis, one merely folds f to U*f and finds its expansion coefficients with respect to the original basis. This allows one to exploit the fast algorithms available for coefficient computation in the original basis.

So for an arbitrary choice of polarities at the endpoints tj we have smooth local trigonometric bases. In particular, by choosing the polarity to be positive for all tj (consistent with the choice of the first basis in all intervals) we get local cosine bases. If the polarity is negative for all tj (consistent with the choice of the second basis for all intervals), we get local sine bases. An alternating choice of polarity (consistent with the alternating choice of the third and fourth bases in the intervals) leads to alternating cosine/sine bases.

All these bases can be constructed in discrete time by sampling the cosine/sine basis functions (Reference). Local cosine bases in discrete time were constructed originally by Malvar and are sometimes called lapped orthogonal transforms (Reference). In the discrete case, the efficient implementation of trigonometric transforms (using DCT I-IV and DST I-IV) can be utilized after folding. In this case, expanding in local trigonometric bases corresponds to computing a DCT after preprocessing (or folding) the signal.

For a sample basis function in each of the four bases, Figure 7 shows the corresponding smooth basis function after unfolding. Observe that for local cosine and sine bases, the basis functions are not linear phase; while the window is symmetric, the windowed functions are not. However, for alternating sine/cosine bases the (unfolded) basis functions are linear phase. There is a link between local sine (or cosine) bases and modulated filter banks that cannot have linear phase filters (discussed in Chapter (Reference)). So there is also a link between alternating cosine/sine bases and linear-phase modulated filter banks (again see Chapter (Reference)). This connection is further explored in (Reference).

Local trigonometric bases have been applied to several signal processing problems. For example, they have been used in adaptive spectral analysis and in the segmentation of speech into voiced and unvoiced regions (Reference). They are also used for image compression and are known in the literature as lapped-orthogonal transforms (Reference).

### Signal Adaptive Local Trigonometric Bases

In the adaptive wavelet packet analysis described in "Adaptive Wavelet Packet Systems", we considered a full filter bank tree of decompositions and used some algorithm (best-basis algorithm, for instance) to prune the tree to get the best tree topology (equivalently frequency partition) for a given signal. The idea here is similar. We partition the time axis (or interval) into bins and successively refine each partition into further bins, giving a tree of partitions for the time axis (or interval). If we use smooth local trigonometric bases at each of the leaves of a full or pruned tree, we get a smooth basis for all signals on the time axis (or interval). In adaptive local bases one grows a full tree and prunes it based on some criterion to get the optimal set of temporal bins.

Figure 23 schematically shows a sample time-frequency tiling associated with a particular local trigonometric basis. Observe that this is the dual of a wavelet packet tiling (see Figure 13)—in the sense that one can switch the time and frequency axes to go between the two.

## Discrete Multiresolution Analysis, the Discrete-Time Wavelet Transform, and the Continuous Wavelet Transform

Up to this point, we have developed wavelet methods using the series wavelet expansion of continuous-time signals called the discrete wavelet transform (DWT), even though it probably should be called the continuous-time wavelet series. This wavelet expansion is analogous to the Fourier series in that both are series expansions that transform continuous-time signals into a discrete sequence of coefficients. However, unlike the Fourier series, the DWT can be made periodic or nonperiodic and, therefore, is more versatile and practically useful.

In this chapter we will develop a wavelet method for expanding discrete-time signals in a series expansion since, in most practical situations, the signals are already in the form of discrete samples. Indeed, we have already discussed when it is possible to use samples of the signal as scaling function expansion coefficients in order to use the filter bank implementation of Mallat's algorithm. We find there is an intimate connection between the DWT and the discrete-time wavelet transform (DTWT), much as there is between the Fourier series and the DFT. One expands signals with the FS but often implements that with the DFT.

To further generalize the DWT, we will also briefly present the continuous wavelet transform which, similar to the Fourier transform, transforms a function of continuous time to a representation with continuous scale and translation. In order to develop the characteristics of these various wavelet representations, we will often call on analogies with corresponding Fourier representations. However, it is important to understand the differences between Fourier and wavelet methods. Much of that difference is connected to the wavelet being concentrated in both time and scale or frequency, to the periodic nature of the Fourier basis, and to the choice of wavelet bases.

### Discrete Multiresolution Analysis and the Discrete-Time Wavelet Transform

Parallel to the developments in early chapters on multiresolution analysis, we can define a discrete multiresolution analysis (DMRA) for $\ell^2$, where the basis functions are discrete sequences (Reference), (Reference), (Reference). The expansion of a discrete-time signal in terms of discrete-time basis functions is expressed in a form parallel to (Reference) as

$$f(n) = \sum_{j,k} d_j(k)\, \psi(2^j n - k) \tag{104}$$

where $\psi(m)$ is the basic expansion function of an integer variable $m$. If these expansion functions are an orthogonal basis (or form a tight frame), the expansion coefficients (discrete-time wavelet transform) are found from an inner product by

$$d_j(k) = \left\langle f(n), \psi(2^j n - k) \right\rangle = \sum_n f(n)\, \psi(2^j n - k) \tag{105}$$

If the expansion functions are not orthogonal or even independent but do span $\ell^2$, a biorthogonal system or a frame can be formed such that a transform and inverse can be defined.
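Equations (104)–(105) can be evaluated literally for a toy discrete wavelet. The two-tap sequence `psi` below is purely an illustrative assumption, not a wavelet from the text; the sketch only makes the inner-product computation of the DTWT coefficients concrete.

```python
def psi(m):
    """Toy two-tap discrete wavelet sequence (purely illustrative)."""
    if m == 0:
        return 1.0
    if m == 1:
        return -1.0
    return 0.0

def dtwt_coeff(f, j, k):
    """d_j(k) = sum_n f(n) * psi(2**j * n - k), a literal reading of (105)."""
    return sum(f[n] * psi(2 ** j * n - k) for n in range(len(f)))

f = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
d01 = dtwt_coeff(f, 0, 1)  # pairs f(1) with psi(0) and f(2) with psi(1): 1 - 4 = -3
```

With an orthogonal (or tight-frame) family of such sequences, the same inner products would also invert the transform.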

Because there is no underlying continuous-time scaling function or wavelet, many of the questions, properties, and characteristics of the analysis using the DWT in Chapters (Reference), (Reference), (Reference), etc. do not arise. In fact, because of the filter bank structure for calculating the DTWT, the design is often done using multirate frequency domain techniques, e.g., the work by Smith and Barnwell and associates (Reference). The questions of zero wavelet moments posed by Daubechies, which are related to ideas of convergence for iterations of filter banks, and Coifman's zero scaling function moments that were shown to help approximate inner products by samples, seem to have no DTWT interpretation.

The connections between the DTWT and DWT are:

• If the starting sequences are the scaling coefficients for the continuous multiresolution analysis at very fine scale, then the discrete multiresolution analysis generates the same coefficients as does the continuous multiresolution analysis on dyadic rationals.
• When the number of scales is large, the basis sequences of the discrete multiresolution analysis converge in shape to the basis functions of the continuous multiresolution analysis.

The DTWT or DMRA is often described by a matrix operator. This is especially easy if the transform is made periodic, much as the Fourier series or DFT are. For the discrete-time wavelet transform (DTWT), a matrix operator maps a vector of inputs to a vector of outputs. Several references on this approach are in (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference).
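As a sketch of the matrix-operator view, one periodized stage of a Haar DTWT can be written as an N × N orthogonal matrix. Haar is chosen here only because it is the simplest example; this is not the specific construction of any of the cited references.

```python
import math

def haar_matrix(N):
    """One periodized Haar DTWT stage as an N x N orthogonal matrix: the
    first N/2 rows average adjacent pairs, the last N/2 rows difference them."""
    s = 1.0 / math.sqrt(2.0)
    W = [[0.0] * N for _ in range(N)]
    for i in range(N // 2):
        W[i][2 * i] = s            # scaling (lowpass) rows
        W[i][2 * i + 1] = s
        W[N // 2 + i][2 * i] = s   # wavelet (highpass) rows
        W[N // 2 + i][2 * i + 1] = -s
    return W

def matvec(W, x):
    return [sum(Wi[j] * x[j] for j in range(len(x))) for Wi in W]

x = [4.0, 2.0, 5.0, 5.0]
y = matvec(haar_matrix(4), x)  # scaling coefficients first, then wavelet
```

Because the matrix is orthogonal, the output energy equals the input energy, which is Parseval's relation in matrix form.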

### Continuous Wavelet Transforms

The natural extension of the redundant DWT in "Overcomplete Representations, Frames, Redundant Transforms, and Adaptive Bases" is to the continuous wavelet transform (CWT), which transforms a continuous-time signal into a wavelet transform that is a function of continuous shift or translation and a continuous scale. This transform is analogous to the Fourier transform but, being redundant, results in a transform that is easier to interpret, is shift invariant, and is valuable for time-frequency/scale analysis. (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference)

The definition of the CWT in terms of the wavelet $w(t)$ is given by

$$F(s,\tau) = s^{-1/2} \int f(t)\, w\!\left(\frac{t-\tau}{s}\right) dt \tag{106}$$

where the inverse transform is

$$f(t) = K \iint \frac{1}{s^2}\, F(s,\tau)\, w\!\left(\frac{t-\tau}{s}\right) ds\, d\tau \tag{107}$$

with the normalizing constant given by

$$K = \int \frac{|W(\omega)|^2}{|\omega|}\, d\omega, \tag{108}$$

with $W(\omega)$ being the Fourier transform of the wavelet $w(t)$. In order for the wavelet to be admissible (for Equation 107 to hold), we need $K < \infty$. In most cases, this simply requires that $W(0) = 0$ and that $W(\omega)$ go to zero ($W(\infty) = 0$) fast enough that $K < \infty$.

These admissibility conditions are satisfied by a very large set of functions and give very little insight into what basic wavelet functions should be used. In most cases, the wavelet w(t)w(t) is chosen to give as good localization of the energy in both time and scale as possible for the class of signals of interest. It is also important to be able to calculate samples of the CWT as efficiently as possible, usually through the DWT and Mallat's filter banks or FFTs. This, and the interpretation of the CWT, is discussed in (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference), (Reference).
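Samples of the CWT can also be approximated by brute-force quadrature of (106), which is far less efficient than the filter bank or FFT routes mentioned above but makes the definition concrete. The Mexican-hat wavelet used below is a standard admissible choice (its Fourier transform vanishes at ω = 0); the grid limits and step count are arbitrary assumptions of this sketch.

```python
import math

def mexican_hat(t):
    """Mexican-hat wavelet (negative-normalized second derivative of a
    Gaussian); admissible since its Fourier transform vanishes at omega = 0."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)

def cwt_sample(f, s, tau, lo=-20.0, hi=20.0, n=4000):
    """Midpoint-rule quadrature of (106):
    F(s, tau) = s**-0.5 * integral of f(t) * w((t - tau) / s) dt."""
    dt = (hi - lo) / n
    acc = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * dt
        acc += f(t) * mexican_hat((t - tau) / s)
    return acc * dt / math.sqrt(s)

# A constant signal has no detail at any scale, so its CWT is (numerically) zero.
val = cwt_sample(lambda t: 1.0, s=1.0, tau=0.0)
```

The zero result for a constant input reflects the admissibility condition W(0) = 0: the wavelet has no DC response.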

The use of the CWT is part of a more general time-frequency analysis that may or may not use wavelets (Reference), (Reference), (Reference), (Reference), (Reference).

### Analogies between Fourier Systems and Wavelet Systems

In order to better understand the wavelet transforms and expansions, we will look at the various forms of Fourier transforms and expansions. If we denote continuous time by CT, discrete time by DT, continuous frequency by CF, and discrete frequency by DF, the following table shows what the discrete Fourier transform (DFT), Fourier series (FS), discrete-time Fourier transform (DTFT), and Fourier transform take as time-domain signals and produce as frequency-domain transforms or series. For example, the Fourier series takes a continuous-time input signal and produces a sequence of discrete-frequency coefficients, while the DTFT takes a discrete-time sequence of numbers as an input signal and produces a transform that is a function of continuous frequency.

|    | DT   | CT |
|----|------|----|
| DF | DFT  | FS |
| CF | DTFT | FT |

Because the basis functions of all four Fourier transforms are periodic, the transform of a periodic signal (CT or DT) is a function of discrete frequency. In other words, it is a sequence of series expansion coefficients. If the signal is infinitely long and not periodic, the transform is a function of continuous frequency and the inverse is an integral, not a sum.

$$\text{Periodic in time} \iff \text{Discrete in frequency} \tag{109}$$
$$\text{Periodic in frequency} \iff \text{Discrete in time} \tag{110}$$

A bit of thought and, perhaps, referring to appropriate materials on signal processing and Fourier methods will make this clear and show why so many properties of Fourier analysis are created by the periodic basis functions.

Also recall that in most cases, it is the Fourier transform, discrete-time Fourier transform, or Fourier series that is needed, but it is the DFT that can be calculated by a digital computer, probably using the FFT algorithm. If the coefficients of a Fourier series drop off fast enough or, even better, are zero after some harmonic, the DFT of samples of the signal will give the Fourier series coefficients. If a discrete-time signal has a finite nonzero duration, the DFT of its values will be samples of its DTFT. From this, one sees the relation of samples of a signal to the signal and the relation of the various Fourier transforms.
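The FS/DFT relationship in the paragraph above can be checked directly: for a periodic signal whose Fourier series coefficients are zero after some harmonic, the DFT of one period of samples recovers those coefficients (scaled by N). A small pure-Python check, with an illustrative two-harmonic signal:

```python
import cmath
import math

def dft(x):
    """Plain N-point DFT: X[k] = sum_n x[n] * exp(-2*pi*j*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

# A periodic signal with only harmonics 1 and 3; its FS coefficients vanish
# beyond harmonic 3, so N = 8 samples per period suffice.
N = 8
x = [2.0 * math.cos(2 * math.pi * n / N) + math.sin(2 * math.pi * 3 * n / N)
     for n in range(N)]
X = dft(x)
# X[1] is N times the FS coefficient of the first harmonic: 8 * 1.0 here,
# since 2*cos contributes coefficient 1 at harmonics +1 and -1.
```

If the signal were not bandlimited to fewer than N/2 harmonics, the higher harmonics would alias into these bins instead of matching the series coefficients.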

Now, what is the case for the various wavelet transforms? Well, it is both similar and different. The table that relates the continuous and discrete variables is given below, where DW indicates discrete values for scale and translation given by $j$ and $k$, and CW denotes continuous values for scale and translation.

|    | DT    | CT  |
|----|-------|-----|
| DW | DTWT  | DWT |
| CW | DTCWT | CWT |

We have spent most of this book developing the DWT, which is a series expansion of a continuous-time signal. Because the wavelet basis functions are concentrated in time and not periodic, both the DTWT and DWT can represent infinitely long signals. In most practical cases, they are made periodic to facilitate efficient computation. Chapter (Reference) gives the details of how the transform is made periodic. The discrete-time, continuous wavelet transform (DTCWT) is seldom used and not discussed here.

The naming of the various transforms has not been consistent in the literature, and this is complicated by the wavelet transforms having two transform variables, scale and translation. If we could rename all the transforms, it would be more consistent to use Fourier series (FS) or wavelet series (WS) for a series expansion that produces discrete expansion coefficients, and Fourier transform (FT) or wavelet transform (WT) for an integral expansion that produces a function of a continuous frequency, scale, or translation variable, together with DT (discrete time) or CT (continuous time) to describe the input signal. However, in common usage, only the DTFT follows this format!

| Common name | Consistent name | Time, C or D | Transform, C or D | Input periodic | Output periodic |
|-------------|-----------------|--------------|-------------------|----------------|-----------------|
| FS          | CTFS            | C            | D                 | Yes            | No              |
| DFT         | DTFS            | D            | D                 | Yes            | Yes             |
| DTFT        | DTFT            | D            | C                 | No             | Yes             |
| FT          | CTFT            | C            | C                 | No             | No              |
| DWT         | CTWS            | C            | D                 | Y or N         | Y or N          |
| DTWT        | DTWS            | D            | D                 | Y or N         | Y or N          |
| –           | DTWT            | D            | C                 | N              | N               |
| CWT         | CTWT            | C            | C                 | N              | N               |

Recall that the difference between the DWT and DTWT is that the input to the DWT is a sequence of expansion coefficients or a sequence of inner products while the input to the DTWT is the signal itself, probably samples of a continuous-time signal. The Mallat algorithm or filter bank structure is exactly the same. The approximation is made better by zero moments of the scaling function (see (Reference)) or by some sort of prefiltering of the samples to make them closer to the inner products (Reference).

As mentioned before, both the DWT and DTWT can be formulated as nonperiodic, on-going transforms for an exact expansion of infinite duration signals or they may be made periodic to handle finite-length or periodic signals. If they are made periodic (as in Chapter (Reference)), then there is an aliasing that takes place in the transform. Indeed, the aliasing has a different period at the different scales, which may make interpretation difficult. This does not harm the inverse transform, which uses the wavelet information to "unalias" the scaling function coefficients. Most (but not all) DWT, DTWT, and matrix operators use a periodized form (Reference).
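The "unaliasing" role of the wavelet coefficients can be seen in a single periodized Haar stage (again the simplest example, used here only as a sketch): the averaged scaling coefficients alone cannot recover the signal, but together with the difference coefficients the reconstruction is exact.

```python
import math

def haar_analysis(x):
    """One periodized Haar stage: adjacent pairs -> averages c, differences d."""
    s = 1.0 / math.sqrt(2.0)
    c = [s * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    d = [s * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return c, d

def haar_synthesis(c, d):
    """Inverse stage: the wavelet coefficients d 'unalias' the averages in c."""
    s = 1.0 / math.sqrt(2.0)
    x = []
    for ci, di in zip(c, d):
        x.append(s * (ci + di))
        x.append(s * (ci - di))
    return x

x = [1.0, 3.0, 2.0, 7.0]
c, d = haar_analysis(x)
xr = haar_synthesis(c, d)  # recovers x to machine precision
```

Iterating the analysis stage on `c` gives the usual multi-scale periodized transform, with the circular wrap (and hence the aliasing period) halving at each coarser scale.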
