Fast Circular Convolution
Since,
∑m=0N-1xmhn-mmodN=yn
is equivalent to
Yk=XkHk
m
0
N
1
x
m
h
n
m
N
y
n
is equivalent to
Y
k
X
k
H
k
yn
y
n
can be computed as
yn=IDFTDFTxnDFThn
y
n
IDFT
DFT
x
n
DFT
h
n
Cost-
Direct-
N2
N
2
complex multiplies.
-
NN-1
N
N
1
complex adds.
Via FFTs- 3 FFTs + NN multipies.
-
N+3N2log2N
N
3
N
2
2
N
complex multiplies.
-
3Nlog2N
3
N
2
N
complex adds.
If
Hk
H
k
can be precomputed, cost is only 2 FFts +
NN multiplies.
Fast Linear Convolution
DFT produces cicular convolution. For linear convolution, we
must zero-pad sequences so that circular wrap-around always
wraps over zeros.
To achieve linear convolution using fast circular convolution,
we must use zero-padded DFTs of length
N≥L+M-1
N
L
M
1
Choose shortest convenient
N
N
(usually smallest power-of-two greater than or equal to
L+M-1
L
M
1
)
yn=
IDFT
N
DFT
N
xn
DFT
N
hn
y
n
IDFT
N
DFT
N
x
n
DFT
N
h
n
note: There is some inefficiency when compared to circular convolution due to longer zero-padded
DFTs. Still,
ONlog2N
O
N
2
N
savings over direct computation.
Running Convolution
Suppose
L=∞
L
, as in a real time filter application, or
L≫M
≫
L
M
. There are efficient block methods for computing fast convolution.
Overlap-Save (OLS) Method
Note that if a length-
MM filter
hn
h
n
is circularly convulved with a
length-
NN segment of a signal
xn
x
n
,
the first
M-1
M
1
samples are wrapped around and thus is
incorrect. However, for
M-1≤n≤N-1
M
1
n
N
1
,the convolution is linear
convolution, so these samples are correct. Thus
N-M+1
N
M
1
good outputs are produced for each
length-
NN circular convolution.
The Overlap-Save Method: Break long signal into successive
blocks of
N
N
samples, each block overlapping the previous block by
M-1
M
1
samples. Perform circular convolution of each
block with filter
hm
h
m
. Discard first
M-1
M
1
points in each output block, and concatenate the remaining
points to create
yn
y
n
.
Computation cost for a length-NN equals
2n
2
n
FFT per output sample is (assuming precomputed
Hk
H
k
) 2 FFTs and NN multiplies
2N2log2N+NN-M+1=Nlog2N+1N-M+1
complex multiplies
2
N
2
2
N
N
N
M
1
N
2
N
1
N
M
1
complex multiplies
2Nlog2NN-M+1=2Nlog2NN-M+1
complex adds
2
N
2
N
N
M
1
2
N
2
N
N
M
1
complex adds
Compare to
M
M mults,
M-1
M
1
adds per output point for direct method. For a given
M
M, optimal
N
N can be determined by finding
N
N minimizing operation counts. Usualy, optimal
N
N is
4M≤Nopt≤8M
4
M
Nopt
8
M
.
Overlap-Add (OLA) Method
Zero-pad length-
LL blocks by
M-1
M
1
samples.
Add successive blocks, overlapped by
M-1
M
1
samples, so that the tails sum to produce the
complete linear convolution.
Computational Cost: Two length
N=L+M-1
N
L
M
1
FFTs and
M
M mults and
M-1
M
1
adds per
L
L output points; essentially the sames as OLS method.