The Shannon-McMillan Theorem can be expressed as the 'zero error limit' of the Rate Distortion Theorem (Dembo and Zeitouni, 1998; Cover and Thomas, 1991), which defines a splitting criterion that identifies high probability pairs of sequences. We follow closely the treatment of Cover and Thomas (1991).
The origin of the problem is the question of representing one information source by a simpler one in such a way that the least information is lost. For example, we might have a continuous variate between 0 and 100, and wish to represent it in terms of a small set of integers in a way that minimizes the inevitable distortion that the process creates. Typically, an analog audio signal will be replaced by a 'digital' one. The problem is to do this in a way which least distorts the reconstructed audio waveform.
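As a minimal sketch of such a quantization step (all names here are our own illustration, not the source's), a uniform quantizer onto a small integer alphabet might look like:

```python
# Illustrative sketch: uniform quantization of a continuous value in
# [0, 100] onto a small set of integer codes, plus reconstruction.

def quantize(y: float, levels: int = 8) -> int:
    """Map y in [0, 100] to an integer code in {0, ..., levels - 1}."""
    step = 100.0 / levels
    return min(int(y / step), levels - 1)

def reconstruct(code: int, levels: int = 8) -> float:
    """Map a code back to the midpoint of its quantization cell."""
    step = 100.0 / levels
    return (code + 0.5) * step

y = 37.4
y_hat = reconstruct(quantize(y))
# Information is lost: every y in the same cell reconstructs to the
# same y_hat, and the squared error (y - y_hat)**2 is the distortion.
```

Coarser alphabets (smaller `levels`) lose more information but require fewer bits; that tradeoff is exactly what the rate distortion function below quantifies.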
Suppose the original stationary, ergodic information source Y with output from a particular alphabet generates sequences of the form

y^n = y_1, \dots, y_n.

These are 'digitized,' in some sense, producing a chain of 'digitized values'

b^n = b_1, \dots, b_n,

where the b-alphabet is much more restricted than the y-alphabet.
b^n is, in turn, deterministically retranslated into a reproduction of the original signal y^n. That is, each b^n is mapped onto a unique n-length \hat{y}-sequence in the alphabet of the information source Y:

b^n \rightarrow \hat{y}^n = \hat{y}_1, \dots, \hat{y}_n.
Note, however, that many y^n sequences may be mapped onto the same retranslation sequence \hat{y}^n, so that information will, in general, be lost. The central problem is to explicitly minimize that loss.

The retranslation process defines a new stationary, ergodic information source, \hat{Y}.
The next step is to define a distortion measure, d(y, \hat{y}), which compares the original to the retranslated path. For example, the Hamming distortion is

d(y, \hat{y}) = 1, \; y \ne \hat{y}
d(y, \hat{y}) = 0, \; y = \hat{y}. (23)

For continuous variates the squared-error distortion is

d(y, \hat{y}) = (y - \hat{y})^2. (24)
There are many possibilities.
The distortion between paths y^n and \hat{y}^n is defined as

d(y^n, \hat{y}^n) = \frac{1}{n} \sum_{j=1}^{n} d(y_j, \hat{y}_j). (25)
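A direct coding of these definitions, with function names of our own choosing: the Hamming distortion (23), the squared-error distortion (24), and the path distortion (25).

```python
def hamming(y, y_hat):
    """Hamming distortion (23): 0 if the symbols agree, 1 otherwise."""
    return 0 if y == y_hat else 1

def squared_error(y, y_hat):
    """Squared-error distortion (24) for continuous variates."""
    return (y - y_hat) ** 2

def path_distortion(yn, yn_hat, d):
    """Path distortion (25): per-symbol distortion averaged along paths."""
    n = len(yn)
    return sum(d(yj, yj_hat) for yj, yj_hat in zip(yn, yn_hat)) / n
```

For example, `path_distortion("abca", "abda", hamming)` gives 0.25, since one of the four symbols differs.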
Suppose that with each path y^n, and with each b^n-path retranslated into the \hat{y}-language and denoted \hat{y}^n, there are associated individual, joint, and conditional probability distributions

p(y^n), \; p(\hat{y}^n), \; p(\hat{y}^n | y^n).

The average distortion is defined as

D = \sum_{y^n} p(y^n) d(y^n, \hat{y}^n). (26)
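A toy computation of the average distortion (26) under a deterministic retranslation; the example distribution and all names are our own:

```python
# Sketch: expected distortion over source paths y^n, where the
# retranslation y^n -> yhat^n is deterministic (here: collapse 'b' to 'a').

def average_distortion(p_yn, retranslate, d_path):
    """D = sum over y^n of p(y^n) * d(y^n, retranslate(y^n))  (26)."""
    return sum(p * d_path(yn, retranslate(yn)) for yn, p in p_yn.items())

p_yn = {"aa": 0.5, "ab": 0.25, "bb": 0.25}            # p(y^n)
collapse = lambda yn: yn.replace("b", "a")            # deterministic map
hamming_path = lambda yn, yh: sum(a != b for a, b in zip(yn, yh)) / len(yn)

D = average_distortion(p_yn, collapse, hamming_path)
# D = 0.5*0 + 0.25*0.5 + 0.25*1.0 = 0.375
```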
It is possible, using the distributions given above, to define the information transmitted from the incoming Y to the outgoing \hat{Y} process in the usual manner, using the Shannon source uncertainty of the strings:

I(Y, \hat{Y}) \equiv H(Y) - H(Y | \hat{Y}) = H(Y) + H(\hat{Y}) - H(Y, \hat{Y}).
If there is no uncertainty in Y given the retranslation \hat{Y}, then no information is lost.
In general, this will not be true.
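The identity I(Y, \hat{Y}) = H(Y) + H(\hat{Y}) - H(Y, \hat{Y}) can be evaluated directly from a joint distribution; a sketch, with our own naming:

```python
from math import log2

def mutual_information(p_joint):
    """I(Y, Yhat) = H(Y) + H(Yhat) - H(Y, Yhat), computed from a joint
    distribution given as {(y, y_hat): probability}."""
    p_y, p_yh = {}, {}
    for (y, yh), p in p_joint.items():
        p_y[y] = p_y.get(y, 0.0) + p       # marginal p(y)
        p_yh[yh] = p_yh.get(yh, 0.0) + p   # marginal p(y_hat)
    H = lambda dist: -sum(p * log2(p) for p in dist.values() if p > 0)
    return H(p_y) + H(p_yh) - H(p_joint)

# When the retranslation determines the source exactly, nothing is
# lost and I(Y, Yhat) equals the full source entropy H(Y):
perfect = {("a", 0): 0.5, ("b", 1): 0.5}
# mutual_information(perfect) == 1.0 bit
```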
The information rate distortion function R(D) for a source Y with a distortion measure d(y, \hat{y}) is defined as

R(D) = \min_{p(\hat{y}|y): \, \sum_{(y, \hat{y})} p(y) p(\hat{y}|y) d(y, \hat{y}) \le D} I(Y, \hat{Y}). (27)
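For a concrete instance of (27): for a Bernoulli(p) source under Hamming distortion, the standard closed form (Cover and Thomas, 1991) is R(D) = H(p) - H(D) for 0 <= D <= min(p, 1 - p), and zero beyond. A sketch:

```python
from math import log2

def binary_entropy(q):
    """H(q) = -q log2 q - (1 - q) log2(1 - q), with H(0) = H(1) = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)

def rate_distortion_binary(p, D):
    """R(D) for a Bernoulli(p) source with Hamming distortion:
    H(p) - H(D) on 0 <= D < min(p, 1 - p), zero beyond."""
    if D >= min(p, 1 - p):
        return 0.0
    return binary_entropy(p) - binary_entropy(D)
```

At D = 0 no distortion is tolerated and R(0) = H(p), the full source entropy, recovering the 'zero error limit' mentioned at the start of this section.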