We now turn back to the encoding of signals. We are interested in encoding the set
B A (M)={f∈B A :f(t)≤M,t∈R}B A (M)={f∈B A :f(t)≤M,t∈R}
(1)
where
MM is arbitrary but fixed. We shall restrict our discussion to the case where distortion is measured in
L ∞ [T,T]L ∞ [T,T] where
T>0T>0 is arbitrary but fixed. Then,
B A (M)B A (M) is a compact subset of
L ∞ L ∞ :
B A (M)⊆L ∞ [T,T]B A (M)⊆L ∞ [T,T].
We shall sketch how one can construct an asymptotically optimal encoder/decoder for B A B A .
The details for this construction can be found in (Reference).
We know f ^(ω)=0f ^(ω)=0 for ω≥Aπω≥Aπ, and f≤Mf≤M. How can we encode ff in practice? We begin by chosing λ=λ(T)>1λ=λ(T)>1 (see Figure 1) which will represent a slight oversampling factor
we shall utilize.
Given a target distortion ϵ>0ϵ>0, we
choose kk so that 2 k1 <ϵ≤2 k 2 k1 <ϵ≤2 k . Given ff, we shall encode ff by first taking samples f(n λA)f(n λA) for n λA∈[T(1+δ),T(1+δ)]n λA∈[T(1+δ),T(1+δ)] where δ(T)>0δ(T)>0. In other words, we sample ff on a slightly larger interval than [T,T][T,T]. For each sample
f(n λA)f(n λA), we shall use the first k+k 0 (T)k+k 0 (T) bits of its binary expansion.
In other words, our encoder takes ff and the samples f(n λA)f(n λA) and then assigns to f(n λA)f(n λA)
the first k+k 0 (T)k+k 0 (T) bits of this number.
To decode, the receiver would take the bits and construct the approximation f ¯(n λA)f ¯(n λA)
to f(n Aλ)f(n Aλ) from the bits provided.
Notice that we have the accuracy
f(n λA)f ¯(n λA)≤2 kk 0 ·M.f(n λA)f ¯(n λA)≤2 kk 0 ·M.
(2)
We utilize the function
g λ g λ satisfying (
(Reference)) to define
f
¯
(
t
)
=
∑
n
∈
N
T
f
¯
n
λ
A
g
λ
(
λ
A
t

n
)
,
f
¯
(
t
)
=
∑
n
∈
N
T
f
¯
n
λ
A
g
λ
(
λ
A
t

n
)
,
(3)
where
N T :={n:T(1+δ)≤n λA≤T(1+δ)}.
N T :={n:T(1+δ)≤n λA≤T(1+δ)}.
(4)
We then have
f(t)f ¯(t)≤∑ n∈N T fn λAf ¯n λA·g λ (λAtn)+∑ n λA>T(1+δ) fn λA·g λ (λAtn)f(t)f ¯(t)≤∑ n∈N T fn λAf ¯n λA·g λ (λAtn)+∑ n λA>T(1+δ) fn λA·g λ (λAtn)
(5)
The term fn λAf ¯n λAfn λAf ¯n λA that appears in the
first summation in (Equation 5) is bounded by M·2 kk 0 M·2 kk 0 . The term fn λAfn λA that appears in the second summation in the same equation
is bounded by MM. Therefore,
f(t)f ¯(t)≤∑ n∈N T M·2 kk 0 ·g λ (λAtn)+∑ n λA>T(1+δ) M·g λ (λAtn)=:S 1 +S 2 f(t)f ¯(t)≤∑ n∈N T M·2 kk 0 ·g λ (λAtn)+∑ n λA>T(1+δ) M·g λ (λAtn)=:S 1 +S 2
(6)
We can estimate
S 1 S 1
by
S 1 =∑ n∈N T M·2 kk 0 ·g λ (λAtn)≤M·2 kk 0 ·∑ n g λ (λAtn)≤M·C 0 (λ)·2 kk 0 (becauseg(·)decaysfast)S 1 =∑ n∈N T M·2 kk 0 ·g λ (λAtn)≤M·2 kk 0 ·∑ n g λ (λAtn)≤M·C 0 (λ)·2 kk 0 (becauseg(·)decaysfast)
(7)
Therefore, if we choose
k 0 k 0 sufficiently large, then
S 1 ≤M·C 0 (λ)·2 kk 0 ≤ϵ 2S 1 ≤M·C 0 (λ)·2 kk 0 ≤ϵ 2.
The second summation
S 2 S 2 can also be bounded by
ϵ/2ϵ/2 by using the fast
decay of the function
g λ g λ (see (
(Reference))).
To make the encoder/decoder specific we need to precisely define δδ and λλ. It turns out that the best choices (in terms of bit rate performance on the class B A B A ) depend on TT. But δ T →0δ T →0 and λ T →1λ T →1 as T→∞T→∞. Recall that
Shannon sampling requires 2TλA2TλA samples. Since our encoder/decoder uses k+k 0 k+k 0 bits per sample, the total number of bits is (k+k 0 )·2λAT(1+δ)(k+k 0 )·2λAT(1+δ), and so coding will require
roughly kk bits per Shannon sample.
This encoder/decoder can be proven to be optimal in the sense of averaged performance as we shall now describe. The average of performance of optimal encoding is defined by
lim
T
→
∞
n ϵ B A (M),L ∞ ⌊T,T⌋ 2T
lim
T
→
∞
n ϵ B A (M),L ∞ ⌊T,T⌋ 2T
(8)
If we replace the optimal bit rate
n ϵ n ϵ in (
Equation 8) by the number of bits required by our encoder/decoder
then the resulting limit will be the same as that in (
Equation 8).
In summary, to encode band limited signals on an interval [T,T][T,T], an optimal strategy is to sample at a slightly higher rate than Nyquist and on a slightly large interval than [T,T][T,T]. Each sample should then be quantized by using the binary expansion of the sample. In this way, for an investment of kk bits per Nyquist rate sample, we get a distortion of
2 k 2 k .
To get a feel for the number of bits required by such an encoder, let us say A=10 6 A=10 6 (signals
band limited to 1Mhz). Say T=24hours≈10 5 secondsT=24hours≈10 5 seconds, and k=10k=10 bits. Then, A·k·2T=10 6 ·10·10 5 =10 12 A·k·2T=10 6 ·10·10 5 =10 12 bits. This is too BIG!
The above encoding is is known as Pulse Coded Modulation (PCM). In practice,
people frequently use another encoder called SigmaDelta Modulation. Instead of oversampling
just slightly, Sigma Delta over samples a lot and then assign only one (or a few) bits per sample.
Why is SigmaDelta preferred to PCM in practice? There are two reasons commonly given:
 Getting accurate samples, quantization, etc. is not
practical because of noise. For better accuracy, we need more
expensive hardware.
 Noise shaping. In SigmaDelta, the distortion is higher
but the distortion is spread over frequencies outside of the desired
range.
In PCM, the distortion decays exponentially (like 2 k 2 k ), whereas
for SigmaDelta, the distortion decays like a polynomial (like
1 k m 1 k m ). Although the distortion decays faster in PCM, the
distortion in SigmaDelta is spread outside the desired frequency
range.