The basic nn-point DCT requires
n2
n
2
multiplications and
n(n−1)
n
n
1
additions to calculate
y=Tx
y
T
x
(64 mults and 56 adds for
n=8
n
8
).
From the figure in our discussion of DCT, it is clear
that symmetries exist in the DCT basis functions. These can be
exploited to reduce the computation load of the DCT.
All the odd rows of TT in this equation from our discussion of
DCT possess even symmetry about their centres and all the even
rows possess odd symmetry. Hence we may form:
∀i,i=1→4:(ui=xi+x9−i)∧(vi=xi−x9−i)
i
i
1
4
u
i
x
i
x
9
i
v
i
x
i
x
9
i
(1)
and then form the odd and even terms in
yy from two
4×4
4
4
transforms:
((
y1
y3
y5
y7
)=
T
left
,
odd
u)∧((
y2
y4
y6
y8
)=
T
left
,
even
v)
y
1
y
3
y
5
y
7
T
left
,
odd
u
y
2
y
4
y
6
y
8
T
left
,
even
v
(2)
where
T
left
,
odd
T
left
,
odd
and
T
left
,
even
T
left
,
even
are the
44x
44 matrices formed by the left
halves of the odd and even rows of
TT.
This reduces the computation to 8 add/subtract operations for
Equation 1 and
2×16
2
16
mults and
2×12
2
12
adds for Equation 2 - almost
halving the total computation load.
The matrix
T
left
,
even
T
left
,
even
cannot easily be simplified much further, but
T
left
,
odd
T
left
,
odd
can, as it possesses the same symmetries as
TT (it is equivalent
to a 4-point DCT matrix). Hence we may use the same technique on
this matrix to reduce the 16 mults and 12 adds for this product
to 4 add/subtract operations followed by a pair of
22 x 22
matrix products, requiring
2×4
2
4
mults and
2×2
2
2
adds. Finally two of these mults may be saved since
one of the 22 x
22matrices is just a scaled
add/subtract matrix (like the Haar transform).
The total computation load for the
8×8
8
8
DCT then becomes:
-
8+12+4+2+2=28
8
12
4
2
2
28
add/subtract operations;
-
16+4+2=22
16
4
2
22
multiply operations.
More complicated algorithms exist (JPEG Book, sections 4.3.2 to
4.3.5) which reduce the number of multiplies further. However
these all require more intermediate results to be stored. In
modern DSP chips this can cost more CPU cycles than the extra
multiplications which can often be done simultaneously with
additions. Hence the simple approach given above is frequently
optimal.