A basic technique in fast algorithms for convolution is
that of interpolation.
That is, two polynomials are evaluated at some common points
and these values are multiplied [1], [3], [4].
By interpolating these products,
the product of the two original polynomials can be determined.
In the Winograd short convolution algorithms, this technique
is used and the common points of evaluation are the simple integers,
0, 1, and -1-1.
Indeed, the computational savings of the interpolation technique
depends on the use of special points at which to interpolate.
In the Winograd algorithm the computational savings
come from the simplicity of the small integers.
(As an algorithm for convolution, the FFT interpolates over the
roots of unity.)
This interpolation method is often called the Toom-Cook method
and it is given by two matrices that describe a bilinear form.
We use bilinear forms to give a matrix formulation of the split
nesting algorithm.
The split nesting algorithm combines smaller convolution algorithms
to obtain algorithms for longer lengths.
We use the Kronecker product to explicitly describe the way in which
smaller convolution algorithms are appropriately combined.
First we consider the linear convolution of two nn point
sequences. Recall that
the linear convolution of hh and xx can be represented
by a matrix vector product.
When n=3n=3:
h
0
h
1
h
0
h
2
h
1
h
0
h
2
h
1
h
2
x
0
x
1
x
2
h
0
h
1
h
0
h
2
h
1
h
0
h
2
h
1
h
2
x
0
x
1
x
2
(1)
This linear convolution matrix can be written as
h0H0+h1H1+h2H2h0H0+h1H1+h2H2 where
HkHk are clear.
The product ∑k=0n-1hkHkx∑k=0n-1hkHkx
can be found using the Toom-Cook algorithm, an
interpolation method.
Choose 2n-12n-1 interpolation points, i1,⋯,i2n-1i1,⋯,i2n-1,
and let AA and CC be matrices given by
A
=
i
1
0
⋯
i
1
n
-
1
⋮
i
2
n
-
1
0
⋯
i
2
n
-
1
n
-
1
and
C
=
i
1
0
⋯
i
1
2
n
-
2
⋮
i
2
n
-
1
0
⋯
i
2
n
-
1
2
n
-
2
-
1
.
A
=
i
1
0
⋯
i
1
n
-
1
⋮
i
2
n
-
1
0
⋯
i
2
n
-
1
n
-
1
and
C
=
i
1
0
⋯
i
1
2
n
-
2
⋮
i
2
n
-
1
0
⋯
i
2
n
-
1
2
n
-
2
-
1
.
(2)
That is, AA is a degree n-1n-1 Vandermonde matrix and
CC is the inverse of the degree 2n-22n-2 Vandermonde matrix
specified by the same points specifying AA.
With these matrices, one has
∑
k
=
0
n
-
1
h
k
H
k
x
=
C
A
h
*
A
x
∑
k
=
0
n
-
1
h
k
H
k
x
=
C
A
h
*
A
x
(3)
where ** denotes point by point multiplication.
The terms AhAh and AxAx are the values of
H(s)H(s) and X(s)X(s) at the points i1,⋯i2n-1i1,⋯i2n-1.
The point by point multiplication gives the
values Y(i1),⋯,Y(i2n-1)Y(i1),⋯,Y(i2n-1).
The operation of CC obtains the coefficients of
Y(s)Y(s) from its values at these points of evaluation.
This is the bilinear form
and it implies that
H
k
=
Cdiag
(
A
e
k
)
A
.
H
k
=
Cdiag
(
A
e
k
)
A
.
(4)
If n=2n=2, then equations Equation 3 and
Equation 4 give
h
0
0
h
1
h
0
0
h
1
x
=
C
A
h
*
A
x
h
0
0
h
1
h
0
0
h
1
x
=
C
A
h
*
A
x
(5)
When the interpolation points are 0, 1,and -1-1,
A
=
1
0
1
1
1
-
1
and
C
=
1
0
0
0
.
5
-
.
5
-
1
.
5
.
5
A
=
1
0
1
1
1
-
1
and
C
=
1
0
0
0
.
5
-
.
5
-
1
.
5
.
5
(6)
However, AA and CC do not need to be Vandermonde
matrices as in Equation 2.
For example, see the two point linear convolution algorithm
in the appendix.
As long as AA and CC are matrices such that
Hk=Cdiag(Aek)AHk=Cdiag(Aek)A, then the linear convolution
of xx and hh is given by
the bilinear form y=C{Ah*Ax}y=C{Ah*Ax}.
More generally, as long as AA, BB and CC are
matrices satisfying Hk=Cdiag(Bek)AHk=Cdiag(Bek)A,
then y=C{Bh*Ax}y=C{Bh*Ax} computes the
linear convolution of hh and xx.
For convenience, if C{Ah*Ax}C{Ah*Ax}
computes the nn point linear convolution of hh and xx
(both hh and xx are nn point sequences),
then we say “(A,B,C)(A,B,C) describes a bilinear
form for nn point linear convolution."
Similarly, we can write a bilinear form for cyclotomic
convolution.
Let dd be any positive integer and let X(s)X(s)
and H(s)H(s) be polynomials of degree
φ(d)-1φ(d)-1 where φ(·)φ(·) is the Euler
totient function.
If AA, BB and CC are matrices
satisfying CΦdk=Cdiag(Bek)ACΦdk=Cdiag(Bek)A for 0≤k≤φ(d)-10≤k≤φ(d)-1,
then the coefficients of Y(s)=〈X(s)H(s)〉Φd(s)Y(s)=〈X(s)H(s)〉Φd(s)
are given by y=C{Bh*Ax}y=C{Bh*Ax}.
As above, if y=C{Bh*Ax}y=C{Bh*Ax} computes the
dd-cyclotomic convolution, then we say
“(A,B,C)(A,B,C) describes a bilinear form for Φd(s)Φd(s)
convolution."
But since 〈X(s)H(s)〉Φd(s)〈X(s)H(s)〉Φd(s) can be found by computing
the product of X(s)X(s) and H(s)H(s) and reducing the
result, a cyclotomic convolution algorithm
can always be derived by following a linear
convolution algorithm by the appropriate reduction
operation:
If GG is the appropriate reduction matrix and if (A,B,F)(A,B,F)
describes a bilinear form for a φ(d)φ(d) point
linear convolution, then
(A,B,GF)(A,B,GF) describes a bilinear form for Φd(s)Φd(s)
convolution.
That is, y=GF{Bh*Ax}y=GF{Bh*Ax}
computes the coefficients of 〈X(s)H(s)〉Φd(s)〈X(s)H(s)〉Φd(s).
By using the Chinese Remainder Theorem for polynomials,
circular convolution can be decomposed into disjoint
cyclotomic convolutions.
Let pp be a prime and consider pp point circular
convolution.
Above we found that
S
p
=
R
p
-
1
1
C
Φ
p
R
p
S
p
=
R
p
-
1
1
C
Φ
p
R
p
(7)
and therefore
S
p
k
=
R
p
-
1
1
C
Φ
p
k
R
p
.
S
p
k
=
R
p
-
1
1
C
Φ
p
k
R
p
.
(8)
If (Ap,Bp,Cp)(Ap,Bp,Cp) describes a bilinear form for Φp(s)Φp(s)
convolution,
then
S
p
k
=
R
p
-
1
1
C
p
diag
1
B
p
R
p
e
k
1
A
p
R
p
S
p
k
=
R
p
-
1
1
C
p
diag
1
B
p
R
p
e
k
1
A
p
R
p
(9)
and consequently the circular convolution of hh and xx can
be computed by
y
=
R
p
-
1
C
{
B
R
p
h
*
A
R
p
x
}
y
=
R
p
-
1
C
{
B
R
p
h
*
A
R
p
x
}
(10)
where A=1⊕ApA=1⊕Ap, B=1⊕BpB=1⊕Bp and C=1⊕CpC=1⊕Cp.
We say (A,B,C)(A,B,C)
describes a bilinear form for pp point circular convolution.
Note that if (D,E,F)(D,E,F) describes a (p-1)(p-1) point linear convolution
then ApAp, BpBp and CpCp can be taken to be
Ap=DAp=D, Bp=EBp=E and Cp=GpFCp=GpF where GpGp represents
the appropriate reduction operations.
Specifically, GpGp is given by
Equation 42 from Preliminaries.
Next we consider pepe point circular convolution.
Recall that
Spe=Rpe-1⊕i=0eCΦpiRpeSpe=Rpe-1⊕i=0eCΦpiRpe
as in Equation 27 from Preliminaries
so that the circular convolution is decomposed into
a set of e+1e+1 disjoint Φpi(s)Φpi(s) convolutions.
If (Api,Bpi,Cpi)(Api,Bpi,Cpi)
describes a bilinear form for Φpi(s)Φpi(s) convolution
and if
A
=
1
⊕
A
p
⊕
⋯
⊕
A
p
e
B
=
1
⊕
B
p
⊕
⋯
⊕
B
p
e
C
=
1
⊕
C
p
⊕
⋯
⊕
C
p
e
A
=
1
⊕
A
p
⊕
⋯
⊕
A
p
e
B
=
1
⊕
B
p
⊕
⋯
⊕
B
p
e
C
=
1
⊕
C
p
⊕
⋯
⊕
C
p
e
(11)
then (ARpe,BRpe,Rpe-1C)(ARpe,BRpe,Rpe-1C)
describes a bilinear form for pepe point circular convolution.
In particular, if
(Dd,Ed,Fd)(Dd,Ed,Fd) describes a bilinear form for dd point
linear convolution, then ApiApi, BpiBpi and CpiCpi
can be taken to be
A
p
i
=
D
φ
(
p
i
)
B
p
i
=
E
φ
(
p
i
)
C
p
i
=
G
p
i
F
φ
(
p
i
)
A
p
i
=
D
φ
(
p
i
)
B
p
i
=
E
φ
(
p
i
)
C
p
i
=
G
p
i
F
φ
(
p
i
)
(12)
where GpiGpi represents the appropriate reduction
operation and φ(·)φ(·) is the Euler totient function.
Specifically, GpiGpi has the following form
G
p
i
=
I
(
p
-
1
)
p
i
-
1
-
1
-p
-
1
⊗
I
p
i
-
1
I
(
p
-
2
)
p
i
-
1
-
1
0
p
i
-
1
+
1
,
(
p
-
2
)
p
i
-
1
-
1
G
p
i
=
I
(
p
-
1
)
p
i
-
1
-
1
-p
-
1
⊗
I
p
i
-
1
I
(
p
-
2
)
p
i
-
1
-
1
0
p
i
-
1
+
1
,
(
p
-
2
)
p
i
-
1
-
1
(13)
if p≥3p≥3, while
G
2
i
=
I
2
i
-
1
-
I
2
i
-
1
-
1
0
1
,
2
i
-
1
-
1
.
G
2
i
=
I
2
i
-
1
-
I
2
i
-
1
-
1
0
1
,
2
i
-
1
-
1
.
(14)
Note that the matrix RpeRpe
block diagonalizes SpeSpe and each diagonal block
represents a cyclotomic convolution.
Correspondingly, the matrices AA, BB and CC of
the bilinear form also have a block diagonal structure.
We now describe the split-nesting algorithm for general
length circular convolution [4].
Let n=p1e1⋯pkekn=p1e1⋯pkek
where pipi are distinct primes.
We have seen that
S
n
=
P
t
R
-
1
⊕
d
|
n
Ψ
(
d
)
R
P
S
n
=
P
t
R
-
1
⊕
d
|
n
Ψ
(
d
)
R
P
(15)
where PP is the prime factor permutation
P=Pp1e1,⋯,pkekP=Pp1e1,⋯,pkek
and RR represents the reduction operations.
For example, see Equation 46 in Preliminaries.
RPRP block diagonalizes SnSn and each diagonal
block represents a multi-dimensional cyclotomic
convolution.
To obtain a bilinear form for a multi-dimensional convolution,
we can combine bilinear forms for one-dimensional
convolutions.
If (Apji,Bpji,Cpji)(Apji,Bpji,Cpji)
describes a bilinear form for Φpji(s)Φpji(s)
convolution
and if
A
=
⊕
d
|
n
A
d
B
=
⊕
d
|
n
B
d
C
=
⊕
d
|
n
C
d
A
=
⊕
d
|
n
A
d
B
=
⊕
d
|
n
B
d
C
=
⊕
d
|
n
C
d
(16)
with
A
d
=
⊗
p
|
d
,
p
∈
P
A
H
d
(
p
)
B
d
=
⊗
p
|
d
,
p
∈
P
B
H
d
(
p
)
C
d
=
⊗
p
|
d
,
p
∈
P
C
H
d
(
p
)
A
d
=
⊗
p
|
d
,
p
∈
P
A
H
d
(
p
)
B
d
=
⊗
p
|
d
,
p
∈
P
B
H
d
(
p
)
C
d
=
⊗
p
|
d
,
p
∈
P
C
H
d
(
p
)
(17)
where Hd(p)Hd(p) is the highest power of pp dividing dd, and
PP is the set of primes,
then
(ARP,BRP,PtR-1C)(ARP,BRP,PtR-1C)
describes a bilinear form for nn point circular convolution.
That is
y
=
P
t
R
-
1
C
B
R
P
h
*
A
R
P
x
y
=
P
t
R
-
1
C
B
R
P
h
*
A
R
P
x
(18)
computes the circular convolution of hh and xx.
As above (Apji,Bpji,Cpji)(Apji,Bpji,Cpji) can be taken to
be (Dφ(pji),Eφ(pji),GpjiFφ(pji))(Dφ(pji),Eφ(pji),GpjiFφ(pji))
where
(Dd,Ed,Fd)(Dd,Ed,Fd) describes a bilinear form for dd point linear convolution.
This is one particular choice for (Apji,Bpji,Cpji)(Apji,Bpji,Cpji)
- other bilinear forms for cyclotomic convolution that are not derived
from linear convolution algorithms exist.
A 45 point circular convolution algorithm:
y
=
P
t
R
-
1
C
B
R
P
h
*
A
R
P
x
y
=
P
t
R
-
1
C
B
R
P
h
*
A
R
P
x
(19)
where
P
=
P
9
,
5
R
=
R
9
,
5
A
=
1
⊕
A
3
⊕
A
9
⊕
A
5
⊕
(
A
3
⊗
A
5
)
⊕
(
A
9
⊗
A
5
)
B
=
1
⊕
B
3
⊕
B
9
⊕
B
5
⊕
(
B
3
⊗
B
5
)
⊕
(
B
9
⊗
B
5
)
C
=
1
⊕
C
3
⊕
C
9
⊕
C
5
⊕
(
C
3
⊗
C
5
)
⊕
(
C
9
⊗
C
5
)
P
=
P
9
,
5
R
=
R
9
,
5
A
=
1
⊕
A
3
⊕
A
9
⊕
A
5
⊕
(
A
3
⊗
A
5
)
⊕
(
A
9
⊗
A
5
)
B
=
1
⊕
B
3
⊕
B
9
⊕
B
5
⊕
(
B
3
⊗
B
5
)
⊕
(
B
9
⊗
B
5
)
C
=
1
⊕
C
3
⊕
C
9
⊕
C
5
⊕
(
C
3
⊗
C
5
)
⊕
(
C
9
⊗
C
5
)
(20)
and where (Apji,Bpji,Cpji)(Apji,Bpji,Cpji)
describes a bilinear form for Φpji(s)Φpji(s) convolution.
The matrix exchange property is a useful technique
that, under certain circumstances,
allows one to save computation in carrying out
the action of bilinear forms [2]. Suppose
y
=
C
A
x
*
B
h
y
=
C
A
x
*
B
h
(21)
as in Equation 18.
When hh is known and fixed, BhBh can be pre-computed
so that yy can be found using only the operations
represented by CC and AA and the point by point
multiplications denoted by **.
The operation of BB is absorbed into the
multiplicative constants.
Note that in Equation 18, the matrix
corresponding to CC is
more complicated than is BB.
It is therefore advantageous to absorb the work
of CC instead of BB into the multiplicative constants if possible.
This can be done when yy is the circular convolution
of xx and hh by using the matrix exchange property.
To explain the matrix exchange property we draw from [2].
Note that y=Cdiag(Ax)Bhy=Cdiag(Ax)Bh, so that
Cdiag(Ax)BCdiag(Ax)B must be the corresponding circulant matrix,
Cdiag
(
A
x
)
B
=
x
0
x
n
-
1
⋯
x
1
x
1
x
0
x
2
⋮
x
n
-
1
x
n
-
2
x
0
.
Cdiag
(
A
x
)
B
=
x
0
x
n
-
1
⋯
x
1
x
1
x
0
x
2
⋮
x
n
-
1
x
n
-
2
x
0
.
(22)
Since
Cdiag(Ax)B=JCdiag(Ax)BtJCdiag(Ax)B=JCdiag(Ax)BtJ
where JJ is the reversal matrix, one gets
y
=
C
A
x
*
B
h
=
Cdiag
(
A
x
)
B
h
=
J
Cdiag
(
A
x
)
B
t
J
h
=
J
B
t
d
i
a
g
(
A
x
)
C
t
J
h
=
J
B
t
A
x
*
C
t
J
h
y
=
C
A
x
*
B
h
=
Cdiag
(
A
x
)
B
h
=
J
Cdiag
(
A
x
)
B
t
J
h
=
J
B
t
d
i
a
g
(
A
x
)
C
t
J
h
=
J
B
t
A
x
*
C
t
J
h
(23)
As noted in [2], the matrix exchange property
can be used whenever
y=T(x)hy=T(x)h where T(x)T(x) satisfies
T(x)=J1T(x)tJ2T(x)=J1T(x)tJ2 for some matrices J1J1 and J2J2.
In that case one gets
y=J1BtAx*CtJ2hy=J1BtAx*CtJ2h.
Applying the matrix exchange property
to Equation 18 one gets
y
=
J
P
t
R
t
B
t
C
t
R
-
t
P
J
h
*
A
R
P
x
.
y
=
J
P
t
R
t
B
t
C
t
R
-
t
P
J
h
*
A
R
P
x
.
(24)
A 45 point circular convolution algorithm:
y
=
J
P
t
R
t
B
t
u
*
A
R
P
x
y
=
J
P
t
R
t
B
t
u
*
A
R
P
x
(25)
where u=CtR-tPJhu=CtR-tPJh and
P
=
P
9
,
5
R
=
R
9
,
5
A
=
1
⊕
A
3
⊕
A
9
⊕
A
5
⊕
(
A
3
⊗
A
5
)
⊕
(
A
9
⊗
A
5
)
B
t
=
1
⊕
B
3
t
⊕
B
9
t
⊕
B
5
t
⊕
(
B
3
t
⊗
B
5
t
)
⊕
(
B
9
t
⊗
B
5
t
)
C
t
=
1
⊕
C
3
t
⊕
C
9
t
⊕
C
5
t
⊕
(
C
3
t
⊗
C
5
t
)
⊕
(
C
9
t
⊗
C
5
t
)
P
=
P
9
,
5
R
=
R
9
,
5
A
=
1
⊕
A
3
⊕
A
9
⊕
A
5
⊕
(
A
3
⊗
A
5
)
⊕
(
A
9
⊗
A
5
)
B
t
=
1
⊕
B
3
t
⊕
B
9
t
⊕
B
5
t
⊕
(
B
3
t
⊗
B
5
t
)
⊕
(
B
9
t
⊗
B
5
t
)
C
t
=
1
⊕
C
3
t
⊕
C
9
t
⊕
C
5
t
⊕
(
C
3
t
⊗
C
5
t
)
⊕
(
C
9
t
⊗
C
5
t
)
(26)
and where (Apji,Bpji,Cpji)(Apji,Bpji,Cpji)
describes a bilinear form for Φpji(s)Φpji(s) convolution.