Because we compute prime point DFTs by converting them in to
circular convolutions, most of this and the next section is devoted
to an explanation of the split nesting convolution algorithm.
In this section we introduce the various operations needed
to carry out the split nesting algorithm.
In particular, we describe the prime factor permutation
that is used to convert a one-dimensional circular convolution
into a multi-dimensional one.
We also discuss the reduction operations needed when
the Chinese Remainder Theorem for polynomials is used
in the computation of convolution.
The reduction operations needed for the split nesting algorithm
are particularly well organized.
We give an explicit matrix description of the reduction operations
and give a program that implements the action of these
reduction operations.
The presentation relies upon the notions of similarity
transformations, companion matrices and Kronecker products.
With them, we describe the split nesting algorithm in a manner
that brings out its structure.
We find that when companion matrices are used to describe
convolution, the reduction operations block diagonalizes
the circular shift matrix.
The companion matrix
of a monic polynomial,
M(s)=m0+m1s+⋯+mn-1sn-1+snM(s)=m0+m1s+⋯+mn-1sn-1+sn
is given by
C
M
=
-
m
0
1
1
-
m
1
⋱
⋮
1
-
m
n
-
1
.
C
M
=
-
m
0
1
1
-
m
1
⋱
⋮
1
-
m
n
-
1
.
(1)
Its usefulness in the following discussion comes from
the following relation which permits a matrix formulation
of convolution.
Let
X
(
s
)
=
x
0
+
x
1
s
+
⋯
x
n
-
1
s
n
-
1
H
(
s
)
=
h
0
+
h
1
s
+
⋯
h
n
-
1
s
n
-
1
Y
(
s
)
=
y
0
+
y
1
s
+
⋯
y
n
-
1
s
n
-
1
M
(
s
)
=
m
0
+
m
1
s
+
⋯
m
n
-
1
s
n
-
1
+
s
n
X
(
s
)
=
x
0
+
x
1
s
+
⋯
x
n
-
1
s
n
-
1
H
(
s
)
=
h
0
+
h
1
s
+
⋯
h
n
-
1
s
n
-
1
Y
(
s
)
=
y
0
+
y
1
s
+
⋯
y
n
-
1
s
n
-
1
M
(
s
)
=
m
0
+
m
1
s
+
⋯
m
n
-
1
s
n
-
1
+
s
n
(2)
Then
Y
(
s
)
=
〈
H
(
s
)
X
(
s
)
〉
M
(
s
)
⇔
y
=
∑
k
=
0
n
-
1
h
k
C
M
k
x
Y
(
s
)
=
〈
H
(
s
)
X
(
s
)
〉
M
(
s
)
⇔
y
=
∑
k
=
0
n
-
1
h
k
C
M
k
x
(3)
where y=(y0,⋯,yn-1)ty=(y0,⋯,yn-1)t,
x=(x0,⋯,xn-1)tx=(x0,⋯,xn-1)t,
and CMCM is the companion matrix of M(s)M(s).
In Equation 3, we say yy is the convolution of xx and
hh with respect to M(s)M(s).
In the case of circular convolution,
M(s)=sn-1M(s)=sn-1 and Csn-1Csn-1 is
the circular shift matrix
denoted by SnSn,
S
n
=
1
1
⋱
1
S
n
=
1
1
⋱
1
(4)
Notice that any circulant matrix can be written as
∑khkSnk∑khkSnk.
Similarity transformations
can be used to interpret the action of some convolution algorithms.
If CM=T-1ATCM=T-1AT for some matrix TT
(CMCM and AA are similar, denoted
CM∼ACM∼A),
then Equation 3 becomes
y
=
T
-
1
∑
k
=
0
n
-
1
h
k
A
k
T
x
.
y
=
T
-
1
∑
k
=
0
n
-
1
h
k
A
k
T
x
.
(5)
That is, by employing the similarity transformation given
by TT in this way, the action of SnkSnk is
replaced by that of AkAk.
Many circular convolution algorithms can be understood,
in part,
by understanding the manipulations made to SnSn
and the resulting new matrix AA.
If the transformation TT is to be useful,
it must satisfy two requirements:
(1) TxTx must be simple to compute, and
(2) AA must have some advantageous structure.
For example, by the convolution property of the DFT,
the DFT matrix FF diagonalizes SnSn,
S
n
=
F
-
1
w
0
w
1
⋱
w
n
-
1
F
S
n
=
F
-
1
w
0
w
1
⋱
w
n
-
1
F
(6)
so that it diagonalizes every circulant matrix.
In this case, TxTx can be computed by using an FFT and
the structure of AA is the simplest possible.
So the two above mentioned conditions are met.
The Winograd Structure
can be described in this manner also.
Suppose M(s)M(s) can be factored as M(s)=M1(s)M2(s)M(s)=M1(s)M2(s)
where M1M1 and M2M2 have no common roots,
then CM∼CM1⊕CM2CM∼CM1⊕CM2
where ⊕⊕ denotes the matrix direct sum.
Using this similarity and recalling Equation 3,
the original convolution is decomposed into
disjoint convolutions.
This is, in fact, a statement of the Chinese Remainder Theorem
for polynomials expressed in matrix notation.
In the case of circular convolution,
sn-1=∏d|nΦd(s)sn-1=∏d|nΦd(s), so that
SnSn can be transformed to a block diagonal matrix,
S
n
∼
C
Φ
1
C
Φ
d
⋱
C
Φ
n
=
⊕
d
|
n
C
Φ
d
S
n
∼
C
Φ
1
C
Φ
d
⋱
C
Φ
n
=
⊕
d
|
n
C
Φ
d
(7)
where Φd(s)Φd(s) is the dthdth cyclotomic polynomial.
In this case, each block represents a convolution
with respect to a cyclotomic polynomial, or a
`cyclotomic convolution'.
Winograd's approach carries out these cyclotomic convolutions
using the Toom-Cook algorithm.
Note that for each divisor, dd, of nn there is a
corresponding block on the diagonal of size
φ(d)φ(d),
for the degree of Φd(s)Φd(s)
is φ(d)φ(d)
where φ(·)φ(·) is the Euler totient function.
This method is good for short lengths, but
as nn increases
the cyclotomic convolutions become cumbersome,
for as the number of distinct prime divisors of dd
increases,
the operation described by ∑khkCΦdk∑khkCΦdk
becomes more difficult to implement.
The Agarwal-Cooley Algorithm
utilizes the fact that
S
n
=
P
t
S
n
1
⊗
S
n
2
P
S
n
=
P
t
S
n
1
⊗
S
n
2
P
(8)
where n=n1n2n=n1n2, (n1,n2)=1(n1,n2)=1 and
PP is an appropriate permutation [1].
This converts the one dimensional circular convolution
of length nn to a two dimensional one of length
n1n1 along one dimension and length n2n2 along
the second.
Then an n1n1-point and an n2n2-point circular convolution
algorithm can be combined to obtain an nn-point algorithm.
In polynomial notation, the mapping accomplished by
this permutation
PP can be informally indicated by
Y
(
s
)
=
〈
X
(
s
)
H
(
s
)
〉
s
n
-
1
⇔
P
Y
(
s
,
t
)
=
〈
X
(
s
,
t
)
H
(
s
,
t
)
〉
s
n
1
-
1
,
t
n
2
-
1
.
Y
(
s
)
=
〈
X
(
s
)
H
(
s
)
〉
s
n
-
1
⇔
P
Y
(
s
,
t
)
=
〈
X
(
s
,
t
)
H
(
s
,
t
)
〉
s
n
1
-
1
,
t
n
2
-
1
.
(9)
It should be noted that Equation 8
implies that a circulant matrix of size n1n2n1n2 can be
written as a block circulant matrix with circulant
blocks.
The Split-Nesting algorithm
[3] combines the structures of the
Winograd and Agarwal-Cooley methods, so that SnSn is transformed
to a block diagonal
matrix as in Equation 7,
S
n
∼
⊕
d
|
n
Ψ
(
d
)
.
S
n
∼
⊕
d
|
n
Ψ
(
d
)
.
(10)
Here
Ψ(d)=⊗p|d,p∈PCΦHd(p)Ψ(d)=⊗p|d,p∈PCΦHd(p)
where Hd(p)Hd(p) is the highest power of pp dividing dd, and
PP is the set of primes.
S
45
∼
1
C
Φ
3
C
Φ
9
C
Φ
5
C
Φ
3
⊗
C
Φ
5
C
Φ
9
⊗
C
Φ
5
S
45
∼
1
C
Φ
3
C
Φ
9
C
Φ
5
C
Φ
3
⊗
C
Φ
5
C
Φ
9
⊗
C
Φ
5
(11)
In this structure a multidimensional cyclotomic convolution,
represented by Ψ(d)Ψ(d), replaces each cyclotomic convolution
in Winograd's algorithm (represented by CΦdCΦd in
Equation 7.
Indeed, if the product of b1,⋯,bkb1,⋯,bk is dd and they are
pairwise relatively prime, then
CΦd∼CΦb1⊗⋯⊗CΦbkCΦd∼CΦb1⊗⋯⊗CΦbk.
This gives a method for combining cyclotomic convolutions
to compute a longer circular convolution.
It is like the Agarwal-Cooley method but requires fewer
additions [3].
One can obtain Sn1⊗Sn2Sn1⊗Sn2 from
Sn1n2Sn1n2 when (n1,n2)=1(n1,n2)=1,
for in this case,
SnSn is similar to Sn1⊗Sn2Sn1⊗Sn2, n=n1n2n=n1n2.
Moreover, they are related by a permutation.
This permutation is that of the prime factor FFT
algorithms and is employed in nesting algorithms
for circular convolution [1], [2].
The permutation is described by Zalcstein [7],
among others, and it is his description we draw on in the following.
Let n=n1n2n=n1n2 where (n1,n2)=1(n1,n2)=1.
Define ekek, (0≤k≤n-10≤k≤n-1), to be the standard basis vector,
(0,⋯,0,1,0,⋯,0)t(0,⋯,0,1,0,⋯,0)t, where the 1 is in
the kthkth position.
Then, the circular shift matrix, SnSn, can be described by
S
n
e
k
=
e
〈
k
+
1
〉
n
.
S
n
e
k
=
e
〈
k
+
1
〉
n
.
(12)
Note that, by inspection,
(
S
n
2
⊗
S
n
1
)
e
a
+
n
1
b
=
e
〈
a
+
1
〉
n
1
+
n
1
〈
b
+
1
〉
n
2
(
S
n
2
⊗
S
n
1
)
e
a
+
n
1
b
=
e
〈
a
+
1
〉
n
1
+
n
1
〈
b
+
1
〉
n
2
(13)
where 0≤a≤n1-10≤a≤n1-1 and 0≤b≤n2-10≤b≤n2-1.
Because n1n1 and n2n2 are relatively prime a permutation matrix PP
can be defined by
P
e
k
=
e
〈
k
〉
n
1
+
n
1
〈
k
〉
n
2
.
P
e
k
=
e
〈
k
〉
n
1
+
n
1
〈
k
〉
n
2
.
(14)
With this PP,
P
S
n
e
k
=
P
e
〈
k
+
1
〉
n
=
e
〈
〈
k
+
1
〉
n
〉
n
1
+
n
1
〈
〈
k
+
1
〉
n
〉
n
2
=
e
〈
k
+
1
〉
n
1
+
n
1
〈
k
+
1
〉
n
2
P
S
n
e
k
=
P
e
〈
k
+
1
〉
n
=
e
〈
〈
k
+
1
〉
n
〉
n
1
+
n
1
〈
〈
k
+
1
〉
n
〉
n
2
=
e
〈
k
+
1
〉
n
1
+
n
1
〈
k
+
1
〉
n
2
(15)
and
(
S
n
2
⊗
S
n
1
)
P
e
k
=
(
S
n
2
⊗
S
n
1
)
e
〈
k
〉
n
1
+
n
1
〈
k
〉
n
2
=
e
〈
k
+
1
〉
n
1
+
n
1
〈
k
+
1
〉
n
2
.
(
S
n
2
⊗
S
n
1
)
P
e
k
=
(
S
n
2
⊗
S
n
1
)
e
〈
k
〉
n
1
+
n
1
〈
k
〉
n
2
=
e
〈
k
+
1
〉
n
1
+
n
1
〈
k
+
1
〉
n
2
.
(16)
Since PSnek=(Sn2⊗Sn1)PekPSnek=(Sn2⊗Sn1)Pek
and P-1=PtP-1=Pt, one gets, in the multi-factor case, the following.
If n=n1⋯nkn=n1⋯nk and n1,...,nkn1,...,nk are pairwise
relatively prime, then
Sn=Pt(Snk⊗⋯⊗Sn1)PSn=Pt(Snk⊗⋯⊗Sn1)P
where PP is the permutation matrix given by
P
ek
=e
〈k
〉
n1
+n1
〈k
〉
n2
+⋯+
n1⋯
nk-1〈k
〉
n
k
P
ek
=e
〈k
〉
n1
+n1
〈k
〉
n2
+⋯+
n1⋯
nk-1〈k
〉
n
k
.
This useful permutation will be denoted here as
Pnk,⋯,n1Pnk,⋯,n1.
If n=p1e1p2e2⋯pkekn=p1e1p2e2⋯pkek then
this permutation yields the matrix, Sp1e1⊗⋯⊗SpkekSp1e1⊗⋯⊗Spkek. This product can be written
simply as ⊗i=1kSpiei⊗i=1kSpiei, so that
one has
Sn=Pn1,⋯,nkt⊗i=1kSpieiPn1,⋯,nkSn=Pn1,⋯,nkt⊗i=1kSpieiPn1,⋯,nk.
It is quite simple to show that
P
a
,
b
,
c
=
(
I
a
⊗
P
b
,
c
)
P
a
,
b
c
=
(
P
a
,
b
⊗
I
c
)
P
a
b
,
c
.
P
a
,
b
,
c
=
(
I
a
⊗
P
b
,
c
)
P
a
,
b
c
=
(
P
a
,
b
⊗
I
c
)
P
a
b
,
c
.
(17)
In general, one has
P
n
1
,
⋯
,
n
k
=
∏
i
=
2
k
(
P
n
1
⋯
n
i
-
1
,
n
i
⊗
I
n
i
+
1
⋯
n
k
)
.
P
n
1
,
⋯
,
n
k
=
∏
i
=
2
k
(
P
n
1
⋯
n
i
-
1
,
n
i
⊗
I
n
i
+
1
⋯
n
k
)
.
(18)
A Matlab function for Pa,b⊗IsPa,b⊗Is is
pfp2I()
in one of the appendices.
This program is a direct implementation
of the definition.
In a paper by Templeton [5], another method for
implementing Pa,bPa,b, without `if' statements,
is given. That method requires some precalculations, however.
A function for
Pn1,⋯,nkPn1,⋯,nk is
pfp(). It uses Equation 18 and calls
pfp2I() with the appropriate arguments.
The Chinese Remainder Theorem for polynomials can be used to
decompose a convolution of two sequences
(the polynomial product of two polynomials evaluated
modulo a third polynomial)
into smaller convolutions (smaller polynomial products) [6].
The Winograd nn point circular convolution algorithm
requires that polynomials are reduced modulo the
cyclotomic polynomial factors of sn-1sn-1,
Φd(s)Φd(s) for each dd dividing nn.
When nn has several prime divisors the reduction operations
become quite complicated and writing a program to implement them
is difficult.
However, when nn is a prime power, the reduction
operations are very structured and can be done in a straightforward manner.
Therefore, by converting a one-dimensional convolution
to a multi-dimensional one, in which the length is a prime power
along each dimension,
the split nesting algorithm avoids the need for
complicated reductions operations.
This is one advantage the split nesting algorithm has over
the Winograd algorithm.
By applying the reduction operations appropriately to the circular
shift matrix, we are able to obtain a block diagonal form,
just as in the Winograd convolution algorithm.
However, in the split nesting algorithm, each diagonal block
represents multi-dimensional cyclotomic convolution
rather than a one-dimensional one.
By forming multi-dimensional convolutions out of one-dimensional ones,
it is possible to combine algorithms for smaller convolutions
(see the next section).
This is a second advantage split nesting algorithm has over
the Winograd algorithm.
The split nesting algorithm, however, generally uses more than
the minimum number of multiplications.
Below we give an explicit matrix description of the required
reduction operations, give a program that implements them,
and give a formula for the number of additions required.
(No multiplications are needed.)
First, consider n=pn=p, a prime.
Let
X
(
s
)
=
x
0
+
x
1
s
+
⋯
+
x
p
-
1
s
p
-
1
X
(
s
)
=
x
0
+
x
1
s
+
⋯
+
x
p
-
1
s
p
-
1
(19)
and recall sp-1=(s-1)(sp-1+sp-2+⋯+s+1)=Φ1(s)Φp(s)sp-1=(s-1)(sp-1+sp-2+⋯+s+1)=Φ1(s)Φp(s).
The residue
〈
X
(
s
)
〉
Φ
1
(
s
)
〈
X
(
s
)
〉
Φ
1
(
s
)
is found by summing the coefficients of X(s)X(s).
The residue
〈
X
(
s
)
〉
Φ
p
(
s
)
〈
X
(
s
)
〉
Φ
p
(
s
)
is given by
∑k=0p-2(xk-xp-1)sk∑k=0p-2(xk-xp-1)sk.
Define RpRp to be the matrix that
reduces X(s)X(s) modulo Φ1(s)Φ1(s) and Φp(s)Φp(s),
such that
if
X
0
(
s
)
=
〈
X
(
s
)
〉
Φ
1
(
s
)
X
0
(
s
)
=
〈
X
(
s
)
〉
Φ
1
(
s
)
and
X
1
(
s
)
=
〈
X
(
s
)
〉
Φ
p
(
s
)
X
1
(
s
)
=
〈
X
(
s
)
〉
Φ
p
(
s
)
then
X
0
X
1
=
R
p
X
X
0
X
1
=
R
p
X
(20)
where XX, X0X0 and X1X1 are vectors formed
from the coefficients of X(s)X(s), X0(s)X0(s) and X1(s)X1(s).
That is,
R
p
=
1
1
1
1
1
1
-
1
1
-
1
1
-
1
1
-
1
R
p
=
1
1
1
1
1
1
-
1
1
-
1
1
-
1
1
-
1
(21)
so that Rp=
1
-
1
GpRp=
1
-
1
Gp
where GpGp is the Φp(s)Φp(s) reduction matrix
of size (p-1)×p(p-1)×p.
Similarly,
let X(s)=x0+x1s+⋯+xpe-1spe-1X(s)=x0+x1s+⋯+xpe-1spe-1 and
define RpeRpe
to be the matrix that reduces X(s)X(s) modulo
Φ1(s)Φ1(s), Φp(s)Φp(s), ..., Φpe(s)Φpe(s)
such that
X
0
X
1
⋮
X
e
=
R
p
e
X
,
X
0
X
1
⋮
X
e
=
R
p
e
X
,
(22)
where as above, XX, X0X0, ..., XeXe are the coefficients
of
X
(
s
)
X
(
s
)
,
〈
X
(
s
)
〉
Φ
1
(
s
)
〈
X
(
s
)
〉
Φ
1
(
s
)
, ...,
〈
X
(
s
)
〉
Φ
p
e
(
s
)
〈
X
(
s
)
〉
Φ
p
e
(
s
)
.
It turns out that
R
p
e
R
p
e
can be written in terms of
R
p
R
p
.
Consider the reduction of
X
(
s
)
=
x
0
+
⋯
+
x
8
s
8
X
(
s
)
=
x
0
+
⋯
+
x
8
s
8
by
Φ
1
(
s
)
=
s
-
1
Φ
1
(
s
)
=
s
-
1
,
Φ
3
(
s
)
=
s
2
+
s
+
1
Φ
3
(
s
)
=
s
2
+
s
+
1
, and
Φ
9
(
s
)
=
s
6
+
s
3
+
1
Φ
9
(
s
)
=
s
6
+
s
3
+
1
.
This is most efficiently performed by reducing
X
(
s
)
X
(
s
)
in two steps.
That is, calculate
X
'
(
s
)
=
〈
X
(
s
)
〉
s
3
-
1
X
'
(
s
)
=
〈
X
(
s
)
〉
s
3
-
1
and
X
2
(
s
)
=
〈
X
(
s
)
〉
s
6
+
s
3
+
1
X
2
(
s
)
=
〈
X
(
s
)
〉
s
6
+
s
3
+
1
.
Then calculate
X
0
(
s
)
=
〈
X
'
(
s
)
〉
s
-
1
X
0
(
s
)
=
〈
X
'
(
s
)
〉
s
-
1
and
X
1
(
s
)
=
〈
X
'
(
s
)
〉
s
2
+
s
+
1
X
1
(
s
)
=
〈
X
'
(
s
)
〉
s
2
+
s
+
1
.
In matrix notation this becomes
X
'
X
2
=
I
3
I
3
I
3
I
3
-
I
3
I
3
-
I
3
X
and
X
0
X
1
=
1
1
1
1
-
1
1
-
1
X
'
.
X
'
X
2
=
I
3
I
3
I
3
I
3
-
I
3
I
3
-
I
3
X
and
X
0
X
1
=
1
1
1
1
-
1
1
-
1
X
'
.
(23)
Together these become
X
0
X
1
X
2
=
R
3
I
3
I
3
I
3
I
3
I
3
I
3
-
I
3
I
3
-
I
3
X
X
0
X
1
X
2
=
R
3
I
3
I
3
I
3
I
3
I
3
I
3
-
I
3
I
3
-
I
3
X
(24)
or
X
0
X
1
X
2
=
(
R
3
⊕
I
6
)
(
R
3
⊗
I
3
)
X
X
0
X
1
X
2
=
(
R
3
⊕
I
6
)
(
R
3
⊗
I
3
)
X
(25)
so that
R9=(R3⊕I6)(R3⊗I3)R9=(R3⊕I6)(R3⊗I3)
where ⊕⊕ denotes the matrix direct sum.
Similarly, one finds that
R27=(R3⊕I24)((R3⊗I3)⊕I18)(R3⊗I9)R27=(R3⊕I24)((R3⊗I3)⊕I18)(R3⊗I9).
In general, one has the following.
RpeRpe is a pe×pepe×pe matrix given by
Rpe=∏k=0e-1((Rp⊗Ipk)⊕Ipe-pk+1)Rpe=∏k=0e-1((Rp⊗Ipk)⊕Ipe-pk+1)
and can be implemented with 2(pe-1)2(pe-1)
additions.
The following formula gives the decomposition of a circular
convolution into disjoint non-circular convolutions when
the number of points is a prime power.
R
p
e
S
p
e
R
p
e
-
1
=
1
C
Φ
p
⋱
C
Φ
p
e
=
⊕
i
=
0
e
C
Φ
p
i
R
p
e
S
p
e
R
p
e
-
1
=
1
C
Φ
p
⋱
C
Φ
p
e
=
⊕
i
=
0
e
C
Φ
p
i
(26)
R
9
S
9
R
9
-
1
=
1
C
Φ
3
C
Φ
9
R
9
S
9
R
9
-
1
=
1
C
Φ
3
C
Φ
9
(27)
It turns out that when nn is not a prime power,
the reduction of polynomials modulo the cyclotomic polynomial
Φn(s)Φn(s) becomes complicated,
and with an increasing number of prime factors, the complication increases.
Recall, however, that a circular convolution of length
p1e1⋯pkekp1e1⋯pkek can be converted (by an appropriate
permutation) into a kk dimensional circular convolution
of length pieipiei along dimension ii.
By employing this one-dimensional to multi-dimensional
mapping technique, one can avoid having to perform
polynomial reductions modulo Φn(s)Φn(s)
for non-prime-power nn.
The prime factor permutation discussed previously
is the permutation that converts a one-dimensional circular
convolution into a multi-dimensional one.
Again, we can use the Kronecker product to
represent this.
In this case,
the reduction operations are applied to each matrix in the
following way:
T
S
p
1
e
1
⊗
⋯
⊗
S
p
k
e
k
T
-
1
=
⊕
i
=
0
e
1
C
Φ
p
1
i
⊗
⋯
⊗
⊕
i
=
0
e
k
C
Φ
p
k
i
T
S
p
1
e
1
⊗
⋯
⊗
S
p
k
e
k
T
-
1
=
⊕
i
=
0
e
1
C
Φ
p
1
i
⊗
⋯
⊗
⊕
i
=
0
e
k
C
Φ
p
k
i
(28)
where
T
=
R
p
1
e
1
⊗
⋯
⊗
R
p
k
e
k
T
=
R
p
1
e
1
⊗
⋯
⊗
R
p
k
e
k
(29)
T
S
9
⊗
S
5
T
-
1
=
1
C
Φ
3
C
Φ
9
⊗
1
C
Φ
5
T
S
9
⊗
S
5
T
-
1
=
1
C
Φ
3
C
Φ
9
⊗
1
C
Φ
5
(30)
where T=R9⊗R5T=R9⊗R5.
The matrix Rp1e1⊗⋯⊗RpkekRp1e1⊗⋯⊗Rpkek
can be factored using a property of the Kronecker product.
Notice that (A⊗B)=(A⊗I)(I⊗B)(A⊗B)=(A⊗I)(I⊗B),
and (A⊗B⊗C)=(A⊗I)(I⊗B⊗I)(I⊗C)(A⊗B⊗C)=(A⊗I)(I⊗B⊗I)(I⊗C)
(with appropriate dimensions) so that one gets
⊗
i
=
1
k
R
p
i
e
i
=
∏
i
=
1
k
(
I
m
i
⊗
R
p
i
e
i
⊗
I
n
i
)
,
⊗
i
=
1
k
R
p
i
e
i
=
∏
i
=
1
k
(
I
m
i
⊗
R
p
i
e
i
⊗
I
n
i
)
,
(31)
where mi=∏j=1i-1pjejmi=∏j=1i-1pjej,
ni=∏j=i+1kpjejni=∏j=i+1kpjej
and where the empty product is taken to be 1.
This factorization shows that TT can be implemented basically
by implementing copies of RpeRpe.
There are many variations on this factorization
as explained in [4].
That the various factorization can be interpreted as vector
or parallel implementations is also explained in
[4].
R
9
⊗
R
5
=
(
R
9
⊗
I
5
)
(
I
9
⊗
R
5
)
R
9
⊗
R
5
=
(
R
9
⊗
I
5
)
(
I
9
⊗
R
5
)
(32)
and
R
9
⊗
R
25
⊗
R
7
=
(
R
9
⊗
I
175
)
(
I
9
⊗
R
25
⊗
I
7
)
(
I
225
⊗
R
7
)
R
9
⊗
R
25
⊗
R
7
=
(
R
9
⊗
I
175
)
(
I
9
⊗
R
25
⊗
I
7
)
(
I
225
⊗
R
7
)
(33)
When this factored form of ⊗Rni⊗Rni
or any of the variations alluded to above,
is used, the number of additions incurred is given by
∑
i
=
1
k
N
p
i
e
i
A
(
R
p
i
e
i
)
=
∑
i
=
1
k
N
p
i
e
i
2
(
p
i
e
i
-
1
)
=
2
N
∑
i
=
1
k
1
-
1
p
i
e
i
=
2
N
k
-
∑
i
=
1
k
1
p
i
e
i
∑
i
=
1
k
N
p
i
e
i
A
(
R
p
i
e
i
)
=
∑
i
=
1
k
N
p
i
e
i
2
(
p
i
e
i
-
1
)
=
2
N
∑
i
=
1
k
1
-
1
p
i
e
i
=
2
N
k
-
∑
i
=
1
k
1
p
i
e
i
(34)
where N=p1e1⋯pkekN=p1e1⋯pkek.
Although the use of operations of the form
Rp1e1⊗⋯⊗RpkekRp1e1⊗⋯⊗Rpkek
is simple, it does not exactly separate the circular convolution
into smaller disjoint convolutions.
In other words, its use does not give rise in
Equation 28 and Equation 30 to
block diagonal matrices whose diagonal
blocks are the form ⊗iCΦpi⊗iCΦpi.
However, by reorganizing the arrangement of the
operations we can obtain the block diagonal
form we seek.
First, suppose AA, BB and CC are matrices of sizes
a×aa×a, b×bb×b and c×cc×c
respectively.
If
T
B
T
-
1
=
B
1
B
2
T
B
T
-
1
=
B
1
B
2
(35)
where B1B1 and B2B2 are
matrices of sizes b1×b1b1×b1 and
b2×b2b2×b2,
then
Q
(
A
⊗
B
⊗
C
)
Q
-
1
=
A
⊗
B
1
⊗
C
A
⊗
B
2
⊗
C
Q
(
A
⊗
B
⊗
C
)
Q
-
1
=
A
⊗
B
1
⊗
C
A
⊗
B
2
⊗
C
(36)
where
Q
=
I
a
⊗
T
(
1
:
b
1
,
:
)
⊗
I
c
I
a
⊗
T
(
b
1
+
1
:
b
,
:
)
⊗
I
c
.
Q
=
I
a
⊗
T
(
1
:
b
1
,
:
)
⊗
I
c
I
a
⊗
T
(
b
1
+
1
:
b
,
:
)
⊗
I
c
.
(37)
Here T(1:b1,:)T(1:b1,:) denotes the first b1b1 rows
and all the columns of TT and similarly for
T(b1+1:b,:)T(b1+1:b,:).
Note that
A
⊗
B
1
⊗
C
A
⊗
B
2
⊗
C
≠
A
⊗
B
1
B
2
⊗
C
.
A
⊗
B
1
⊗
C
A
⊗
B
2
⊗
C
≠
A
⊗
B
1
B
2
⊗
C
.
(38)
That these two expressions are not equal explains why
the arrangement of operations must be reorganized
in order to obtain the desired block diagonal form.
The appropriate reorganization is described by the
expression in Equation 37.
Therefore, we must modify the transformation of
Equation 28 appropriately.
It should be noted that this reorganization of
operations does not change their computational cost.
It is still given by Equation 34.
For example, we can use this observation
and the expression in Equation 37 to arrive at the following
similarity transformation:
Q
S
p
1
⊗
S
p
2
Q
-
1
=
1
C
Φ
p
1
C
Φ
p
2
C
Φ
p
1
⊗
C
Φ
p
2
Q
S
p
1
⊗
S
p
2
Q
-
1
=
1
C
Φ
p
1
C
Φ
p
2
C
Φ
p
1
⊗
C
Φ
p
2
(39)
where
Q
=
I
p
1
⊗
1
-p
2
t
I
p
1
⊗
G
p
2
R
p
1
⊗
I
p
2
Q
=
I
p
1
⊗
1
-p
2
t
I
p
1
⊗
G
p
2
R
p
1
⊗
I
p
2
(40)
1
-p
1
-p
is a column vector of pp 1's
1
-p
=
1
1
⋯
1
t
1
-p
=
1
1
⋯
1
t
(41)
and GpGp is the (p-1)×p(p-1)×p matrix:
G
p
=
1
-
1
1
-
1
⋱
⋮
1
-
1
=
I
p
-
1
-
1
̲
p
-
1
.
G
p
=
1
-
1
1
-
1
⋱
⋮
1
-
1
=
I
p
-
1
-
1
̲
p
-
1
.
(42)
In general we have
R
S
p
1
e
1
⊗
⋯
⊗
S
p
k
e
k
R
-
1
=
⊕
d
|
n
Ψ
(
d
)
R
S
p
1
e
1
⊗
⋯
⊗
S
p
k
e
k
R
-
1
=
⊕
d
|
n
Ψ
(
d
)
(43)
where R=Rp1e1,⋯,pkekR=Rp1e1,⋯,pkek is given by
R
p
1
e
1
,
⋯
,
p
k
e
k
=
∏
i
=
k
1
Q
(
m
i
,
p
i
e
i
,
n
i
)
R
p
1
e
1
,
⋯
,
p
k
e
k
=
∏
i
=
k
1
Q
(
m
i
,
p
i
e
i
,
n
i
)
(44)
with mi=∏j=1i-1pjejmi=∏j=1i-1pjej,
ni=∏j=i+1kpjejni=∏j=i+1kpjej
and
Q
(
a
,
p
e
,
c
)
=
∏
j
=
0
e
-
1
I
a
⊗
1
-
p
t
⊗
I
c
p
j
I
a
⊗
G
p
⊗
I
c
p
j
I
a
c
(
p
e
-
p
j
+
1
)
.
Q
(
a
,
p
e
,
c
)
=
∏
j
=
0
e
-
1
I
a
⊗
1
-
p
t
⊗
I
c
p
j
I
a
⊗
G
p
⊗
I
c
p
j
I
a
c
(
p
e
-
p
j
+
1
)
.
(45)
1
-p
1
-p
and
G
p
G
p
are as given in
Equation 41 and Equation 42.
R
S
9
⊗
S
5
R
-
1
=
1
C
Φ
3
C
Φ
9
C
Φ
5
C
Φ
3
⊗
C
Φ
5
C
Φ
9
⊗
C
Φ
5
R
S
9
⊗
S
5
R
-
1
=
1
C
Φ
3
C
Φ
9
C
Φ
5
C
Φ
3
⊗
C
Φ
5
C
Φ
9
⊗
C
Φ
5
(46)
where
R
=
R
9
,
5
=
Q
(
9
,
5
,
1
)
Q
(
1
,
9
,
5
)
R
=
R
9
,
5
=
Q
(
9
,
5
,
1
)
Q
(
1
,
9
,
5
)
(47)
and
RR can be implemented
with 152 additions.
Notice the distinction between this example and
example "Reduction Operations".
In example "Reduction Operations" we obtained from
S9⊗S5S9⊗S5 a Kronecker product of two
block diagonal matrices, but here we obtained
a block diagonal matrix whose diagonal blocks
are the Kronecker product of cyclotomic companion
matrices.
Each block in Equation 46 represents a multi-dimensional
cyclotomic convolution.
A Matlab program that carries out the operation
Rp1e1,⋯,pkekRp1e1,⋯,pkek
in Equation 43 is KRED()
.
function x = KRED(P,E,K,x)
% x = KRED(P,E,K,x);
% P : P = [P(1),...,P(K)];
% E : E = [E(K),...,E(K)];
for i = 1:K
a = prod(P(1:i-1).^E(1:i-1));
c = prod(P(i+1:K).^E(i+1:K));
p = P(i);
e = E(i);
for j = e-1:-1:0
x(1:a*c*(p^(j+1))) = RED(p,a,c*(p^j),x(1:a*c*(p^(j+1))));
end
end
It calls the Matlab program RED()
.
function y = RED(p,a,c,x)
% y = RED(p,a,c,x);
y = zeros(a*c*p,1);
for i = 0:c:(a-1)*c
for j = 0:c-1
y(i+j+1) = x(i*p+j+1);
for k = 0:c:c*(p-2)
y(i+j+1) = y(i+j+1) + x(i*p+j+k+c+1);
y(i*(p-1)+j+k+a*c+1) = x(i*p+j+k+1) - x(i*p+j+c*(p-1)+1);
end
end
end
These two Matlab programs are not written to run as fast as
they could be.
They are a `naive' coding of Rp1e1,⋯,pkekRp1e1,⋯,pkek
and are meant to serve as a basis for more efficient programs.
In particular, the indexing and the loop counters can
be modified to improve the efficiency.
However, the modifications that minimize the overhead
incurred by indexing operations depends on the
programming language, the compiler and the computer used.
These two programs are written with simple
loop counters and complicated indexing operations
so that appropriate modifications can be easily made.
The inverse of RpRp has the form
R
p
-
1
=
1
p
1
p
-
1
-
1
-
1
-
1
1
-
1
p
-
1
-
1
-
1
1
-
1
-
1
p
-
1
-
1
1
-
1
-
1
-
1
p
-
1
1
-
1
-
1
-
1
-
1
R
p
-
1
=
1
p
1
p
-
1
-
1
-
1
-
1
1
-
1
p
-
1
-
1
-
1
1
-
1
-
1
p
-
1
-
1
1
-
1
-
1
-
1
p
-
1
1
-
1
-
1
-
1
-
1
(48)
and
R
p
e
-
1
=
∏
k
=
0
e
-
1
(
(
R
p
-
1
⊗
I
p
e
-
1
-
k
)
⊕
I
p
e
-
p
e
-
k
)
.
R
p
e
-
1
=
∏
k
=
0
e
-
1
(
(
R
p
-
1
⊗
I
p
e
-
1
-
k
)
⊕
I
p
e
-
p
e
-
k
)
.
(49)
Because the inverse of QQ in Equation 37 is given by
Q
-
1
=
I
a
⊗
T
-
1
(
:
,
1
:
b
1
)
⊗
I
c
I
a
⊗
T
-
1
(
:
,
b
1
+
1
:
b
)
⊗
I
c
Q
-
1
=
I
a
⊗
T
-
1
(
:
,
1
:
b
1
)
⊗
I
c
I
a
⊗
T
-
1
(
:
,
b
1
+
1
:
b
)
⊗
I
c
(50)
the inverse of the matrix RR described by eqs
Equation 43, Equation 44 and Equation 45
is given by
R
-
1
=
∏
i
=
1
k
Q
(
m
i
,
p
i
e
i
,
n
i
)
-
1
R
-
1
=
∏
i
=
1
k
Q
(
m
i
,
p
i
e
i
,
n
i
)
-
1
(51)
with mi=∏j=1i-1pjejmi=∏j=1i-1pjej,
ni=∏j=i+1kpjejni=∏j=i+1kpjej
and
Q
(
a
,
p
e
,
c
)
-
1
=
∏
j
=
e
-
1
0
I
a
⊗
1
-p
t
⊗
I
c
p
j
I
a
⊗
V
p
⊗
I
c
p
j
I
a
c
(
p
e
-
p
j
+
1
)
Q
(
a
,
p
e
,
c
)
-
1
=
∏
j
=
e
-
1
0
I
a
⊗
1
-p
t
⊗
I
c
p
j
I
a
⊗
V
p
⊗
I
c
p
j
I
a
c
(
p
e
-
p
j
+
1
)
(52)
where VpVp denotes the matrix in Equation 48
without the first column.
A Matlab program for RtRt is tKRED()
,
it calls the Matlab program tRED()
.
A Matlab program for R-tR-t is itKRED()
,
it calls the Matlab program itRED()
.
These programs all appear in one of the appendices.