Finally, we say a few words about DWT implementation. Here
we focus on a single DWT stage and assume circular
convolution, yielding an
MMxMM
DWT matrix
T
M
T
M
. In the general case,
MMxMM
matrix multiplication requires
M2
M
2
multiplications. The DWT matrices, however, have
a circular-convolution structure which allows us to
implement them using significantly less multiplies. Below
we present some simple and reasonably efficient approaches
for the implementation of
T
M
T
M
and
T
M
T
T
M
.
We treat the inverse DWT first. Recall that in the lowpass
synthesis branch, we upsample the input before circularly
convolving with
Hz
H
z
. Denoting the upsampled coefficient sequence by
an
a
n
, fast circular convolution
an*hn
a
n
h
n
can be described as follows (using Matlab
notation)
ifft( fft(a).*fft(h,length(a)) )
where we have assumed that length(a) ≥
length(h). The highpass branch is handled
similarly using
Gz
G
z
, after which the two branch outputs are summed.
Next we treat the forward DWT. Recall that in the lowpass
analysis branch, we circularly convolve the input with
Hz-1
H
z
and then downsample the result. The fast circular
convolution
an*h-n
a
n
h
n
can be implemented using
wshift('1', ifft(fft(a).*fft(flipud(h),length(a))), length(h)-1 )
where wshift accomplishes a circular
shift of the ifft output that makes up
for the unwanted delay of length(h)-1
samples imposed by the flipud operation.
The highpass branch is handled similarly but with filter
Gz-1
G
z
. Finally, each branch is downsampled by factor two.
We note that the proposed approach is not totally efficient
because downsampling is performed after circular convolution
(and upsampling before circular convolution). Still, we have
outlined this approach because it is easy to understand and
still results in major saving when
MM is large: it converts the
OM2
O
M
2
matrix multiply into an
OMlog2M
O
M
2
M
operation.