We saw earlier that Fourier analysis is not well suited to
describing local changes in "frequency content" because the
frequency components defined by the Fourier transform have
infinite (i.e., global) time support. For
example, if we have a signal with periodic components plus a
glitch at time $t_0$, we might want accurate knowledge of both the periodic
component frequencies and the glitch time
(Figure 1).
The Short-Time Fourier Transform (STFT) provides a means of
joint time-frequency analysis. The STFT pair can be written
$$X_{\text{STFT}}(\Omega, \tau) = \int_{-\infty}^{\infty} x(t)\, w(t - \tau)\, e^{-i \Omega t}\, dt$$
$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} X_{\text{STFT}}(\Omega, \tau)\, w(t - \tau)\, e^{i \Omega t}\, d\Omega\, d\tau$$
assuming real-valued $w(t)$ for which $\int |w(t)|^2\, dt = 1$.
The STFT can be interpreted as a "sliding window CTFT": to
calculate $X_{\text{STFT}}(\Omega, \tau)$, slide the center of
window $w(t)$ to time $\tau$, window the input signal, and
compute the CTFT of the result (Figure 2).
The idea is to isolate the signal in the vicinity of time
$\tau$, then perform a CTFT analysis in order to estimate the
"local" frequency content at time $\tau$.
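The sliding-window recipe can be sketched numerically. The following is a minimal illustration, not a reference implementation: it approximates the STFT integral by a Riemann sum on a uniform grid, and the names `stft_point` and `gaussian_window`, the window width, and the test signal (a 20 rad/s sinusoid plus a narrow glitch at $t_0 = 6$) are all assumptions made for the example.

```python
import numpy as np

def gaussian_window(t, width=0.5):
    """Unit-energy Gaussian window; `width` is a hypothetical scale parameter."""
    return (np.pi * width**2) ** -0.25 * np.exp(-t**2 / (2.0 * width**2))

def stft_point(x, t, window, omega, tau):
    """Riemann-sum approximation of X_STFT(omega, tau) on the uniform grid t."""
    dt = t[1] - t[0]
    # Slide the window center to tau and window the input signal ...
    windowed = x * window(t - tau)
    # ... then compute the CTFT of the result at frequency omega.
    return np.sum(windowed * np.exp(-1j * omega * t)) * dt

# Assumed test signal: periodic component plus a short glitch at t0.
t = np.linspace(0.0, 10.0, 20001)
t0 = 6.0
x = np.sin(20.0 * t) + 5.0 * np.exp(-((t - t0) / 0.01) ** 2)

# Frequency slice at tau = 3 (far from the glitch): locate the sinusoid.
omegas = np.linspace(0.0, 40.0, 401)
freq_slice = np.array([abs(stft_point(x, t, gaussian_window, om, 3.0))
                       for om in omegas])
peak_omega = omegas[np.argmax(freq_slice)]

# Time slice at omega = 40 (far from the sinusoid): locate the glitch.
taus = np.linspace(2.0, 8.0, 61)
time_slice = np.array([abs(stft_point(x, t, gaussian_window, 40.0, tau))
                       for tau in taus])
peak_tau = taus[np.argmax(time_slice)]

print("sinusoid frequency estimate:", peak_omega)
print("glitch time estimate:", peak_tau)
```

The two slices illustrate the joint analysis: a cut along frequency at a fixed time recovers the periodic component, while a cut along time at a fixed frequency recovers the glitch location, each only to within the resolution set by the chosen window.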
Essentially, the STFT uses the basis elements
$$b_{\Omega,\tau}(t) = w(t - \tau)\, e^{i \Omega t}$$
over the range $\tau \in (-\infty, \infty)$ and
$\Omega \in (-\infty, \infty)$. These can be understood as time
and frequency shifts of the window function $w(t)$. The STFT
basis is often illustrated by a tiling of
the time-frequency plane, where each tile represents a
particular basis element (Figure 3).
The height and width of a tile represent the spectral and
temporal widths of the basis element, respectively, and the
position of a tile represents the spectral and temporal centers
of the basis element. Note that, while the tiling
diagram suggests that the STFT uses a discrete set of
time/frequency shifts, the STFT basis is really constructed from
a continuum of time/frequency shifts.
Note that we can decrease spectral width $\Delta\Omega$ at the
cost of increased temporal width $\Delta t$ by stretching basis
waveforms in time, although the time-bandwidth product
$\Delta t\, \Delta\Omega$ (i.e., the area of each tile) will
remain constant (Figure 4).
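One way to see why stretching leaves the tile area unchanged is a short scaling argument. The sketch below assumes root-mean-square definitions of the two widths (the text does not fix a definition) and a unit-energy window centered at the origin in both time and frequency. The stretch factor $a > 0$ and the stretched window $w_a$ are notation introduced here for illustration.

```latex
% Assumed RMS widths of a unit-energy window w with CTFT W:
\Delta t^2 = \int t^2\, |w(t)|^2 \, dt, \qquad
\Delta\Omega^2 = \frac{1}{2\pi} \int \Omega^2\, |W(\Omega)|^2 \, d\Omega .

% Stretch by a > 0 while keeping unit energy:
w_a(t) = a^{-1/2}\, w(t/a)
\quad\Longleftrightarrow\quad
W_a(\Omega) = a^{1/2}\, W(a\Omega) .

% Substituting t = a u and \Omega = v / a:
\begin{align}
\Delta t_a^2 &= \int t^2\, \tfrac{1}{a}\, |w(t/a)|^2 \, dt
              = a^2 \int u^2\, |w(u)|^2 \, du = a^2\, \Delta t^2 , \\
\Delta\Omega_a^2 &= \frac{1}{2\pi} \int \Omega^2\, a\, |W(a\Omega)|^2 \, d\Omega
              = \frac{1}{a^2} \cdot \frac{1}{2\pi} \int v^2\, |W(v)|^2 \, dv
              = \frac{\Delta\Omega^2}{a^2} , \\
\Delta t_a\, \Delta\Omega_a &= \Delta t\, \Delta\Omega .
\end{align}
```

Under these assumptions the temporal width grows by $a$, the spectral width shrinks by $a$, and the product is independent of $a$.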
Our observations can be summarized as follows:
- The time resolutions and frequency resolutions of every STFT
basis element will equal those of the window $w(t)$. (All STFT
tiles have the same shape.)
- The use of a wide window will give good frequency resolution
but poor time resolution, while the use of a narrow window
will give good time resolution but poor frequency
resolution. (When tiles are stretched in one direction they
shrink in the other.)
- The combined time-frequency resolution of the basis,
proportional to $\frac{1}{\Delta t\, \Delta\Omega}$, is
determined not by window width but by window
shape. Of all shapes, the Gaussian
$w(t) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} t^2}$
gives the highest time-frequency resolution,
although its infinite time-support makes it impossible to
implement. (The Gaussian window results in tiles with
minimum area.)
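The dependence of tile area on window shape can be checked numerically. This is a rough sketch under stated assumptions: the widths are taken to be root-mean-square widths (the text does not fix a definition), the helper `time_bandwidth_product` is a hypothetical name, and the Hann and triangular windows are chosen here only as comparison shapes.

```python
import numpy as np

def time_bandwidth_product(w, t):
    """RMS time width times RMS frequency width of a window centered at t = 0."""
    dt = t[1] - t[0]
    energy = np.sum(w**2) * dt
    delta_t = np.sqrt(np.sum(t**2 * w**2) * dt / energy)
    # Parseval: the mean of Omega^2 weighted by |W(Omega)|^2 equals the energy
    # of dw/dt divided by the energy of w, so no explicit transform is needed.
    dw_dt = np.gradient(w, dt)
    delta_omega = np.sqrt(np.sum(dw_dt**2) * dt / energy)
    return delta_t * delta_omega

t = np.linspace(-8.0, 8.0, 160001)
inside = np.abs(t) <= 1.0  # support of the two finite-length windows

windows = {
    "gaussian": np.exp(-0.5 * t**2),
    "hann": np.where(inside, 0.5 * (1.0 + np.cos(np.pi * t)), 0.0),
    "triangular": np.where(inside, 1.0 - np.abs(t), 0.0),
}

products = {name: time_bandwidth_product(w, t) for name, w in windows.items()}
for name, value in products.items():
    print(f"{name:>10s}: {value:.4f}")
```

With these definitions the Gaussian product should come out close to 1/2, and the two finite-length windows slightly larger. Rescaling any of the windows in time changes the individual widths but not the product.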
Finally, it is interesting to note that the STFT implies a
particular definition of
instantaneous
frequency. Consider the linear chirp
$x(t) = \sin(\Omega_0 t^2)$. From casual observation, we might
expect an instantaneous frequency of $\Omega_0 \tau$ at time
$\tau$ since
$$\forall\, t = \tau: \quad \sin(\Omega_0 t^2) = \sin(\Omega_0 \tau\, t)$$
The STFT, however, will indicate a time-$\tau$ instantaneous
frequency of
$$\left. \frac{d}{dt}\, \Omega_0 t^2 \right|_{t = \tau} = 2 \Omega_0 \tau$$
The phase-derivative interpretation of
instantaneous frequency only makes sense for signals containing
exactly one sinusoid, though! In summary,
always remember that the traditional notion of "frequency"
applies only to the CTFT; we must be very careful when bending
the notion to include, e.g., "instantaneous
frequency", as the results may be unexpected!
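The chirp claim can be checked with a small numerical experiment. This is only a sketch: the values of `omega0`, `tau`, and `sigma`, the Gaussian window, and the Riemann-sum approximation of the windowed CTFT are all assumptions chosen for illustration.

```python
import numpy as np

omega0 = 2.0  # chirp parameter (illustrative value)
tau = 5.0     # analysis time (illustrative value)
sigma = 0.5   # Gaussian window width (illustrative value)

t = np.linspace(0.0, 10.0, 10001)
dt = t[1] - t[0]
x = np.sin(omega0 * t**2)                        # linear chirp
w = np.exp(-(t - tau) ** 2 / (2.0 * sigma**2))   # window centered at tau
# (the window is left unnormalized; only the peak location matters here)

# Riemann-sum CTFT magnitude of the windowed chirp at each trial frequency.
omegas = np.linspace(0.0, 40.0, 401)
magnitude = np.array([abs(np.sum(x * w * np.exp(-1j * om * t)) * dt)
                      for om in omegas])
peak = omegas[np.argmax(magnitude)]

print("STFT peak frequency:", peak)
print("casual guess, omega0 * tau:", omega0 * tau)
print("phase derivative, 2 * omega0 * tau:", 2.0 * omega0 * tau)
```

For these parameter choices the magnitude peak should sit near $2 \Omega_0 \tau$ rather than $\Omega_0 \tau$, in line with the phase-derivative value.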