Given ϵ>0ϵ>0, and the compact set KK, consider all coverings
of KK by balls of radius ϵϵ, as shown in
Figure 1. In other words,
K⊆U i=1 N b(f i ,ϵ).K⊆U i=1 N b(f i ,ϵ).
(1)
Let
N ϵ :=inf{N:overallsuchcovers}N ϵ :=inf{N:overallsuchcovers}.
N ϵ (K)N ϵ (K) is called the covering number of
KK. Since it
depends on
XX and
KK, we write it as
N ϵ =N ϵ (K,X)N ϵ =N ϵ (K,X).
The Kolmogorov entropy, denoted by H ϵ (K,X)H ϵ (K,X), of the compact set KK in XX is defined as the logarithm of the covering number:
H ϵ (K,X)=logN ϵ (K,X).H ϵ (K,X)=logN ϵ (K,X).
(2)
The Kolmogorov entropy solves our problem of optimal encoding in the sense of the following theorem.
For any compact set K⊂XK⊂X, we have n ϵ (K,X)=⌈H ϵ (K,X)⌉n ϵ (K,X)=⌈H ϵ (K,X)⌉, where ⌈·⌉⌈·⌉ is the ceiling function.
Sketch: We can define an encoder-decoder as follows
To encode: Say f∈Kf∈K. Just specify which ball it is
covered by. Because the number of balls is N ϵ (K,X ̲ ¯)N ϵ (K,X ̲ ¯), we
need at most ⌈logN ϵ (K,X ̲ ¯)⌉⌈logN ϵ (K,X ̲ ¯)⌉ bits to specify any such ball
ball.
To decode: Just take the center of the ball specified by the
bitstream.
It is now easy to see that this encoder-decoder pair is optimal in either of the senses given
above. □□
The above encoder is not practical. However, the Kolmogorov entropy tells us the best performance we can expect from any encoder-decoder pair. Kolmogorov entropy is defined in the deterministic setting. It is the analogue of the Shannon entropy which is defined in a stochastic setting.