$$
H[\mathbf{X}] = \lim_{n \to \infty} \frac{H(X_0, \ldots, X_n)}{n + 1}
\tag{3}
$$
This information source is defined as dual to the underlying
ergodic cognitive process.
Adiabatic means that the source has been parametrized ac-
cording to some scheme, and that, over a certain range, along
a particular piece, as the parameters vary, the source remains
as close to stationary and ergodic as needed for information
theory's central theorems to apply. Stationary means that
the system's probabilities do not change in time, and ergodic,
roughly, that cross-sectional means approximate long-time
averages. Between pieces it is necessary to invoke various
kinds of phase transition formalisms, as described more fully
in Wallace (2005) or Wallace and Wallace (2008).
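As a concrete illustration of equation (3) for a stationary, ergodic source, the following sketch (not part of the original argument; the transition matrix is an arbitrary assumption) computes the per-symbol uncertainty of a two-state Markov source and shows it converging to the closed-form entropy rate.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Two-state stationary Markov source; P[i, j] = Pr(X_{t+1}=j | X_t=i).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Stationary distribution pi solves pi P = pi.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()

# Conditional entropy rate H(X_1 | X_0) = sum_i pi_i H(P[i, :]).
h_rate = sum(pi[i] * entropy(P[i]) for i in range(2))

# For a stationary Markov chain, H(X_0, ..., X_n) = H(pi) + n * h_rate,
# so the per-symbol uncertainty of equation (3) converges to h_rate.
for n in [1, 10, 100, 1000]:
    joint = entropy(pi) + n * h_rate
    print(f"n = {n:5d}:  H(X_0..X_n)/(n+1) = {joint / (n + 1):.4f}")
print(f"entropy rate = {h_rate:.4f} bits/symbol")
```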
Wallace (2005, pp. 34-36) applies this formalism to a stan-
dard neural network model much like equation (1).
In the developmental vernacular of Ciliberti et al., we now
examine paths in phenotype space that begin at some S0
and converge, as n = t/Δt → ∞, to some other S∞. Suppose
the system is conceived at S0, and h represents (for exam-
ple) reproduction when phenotype S∞ is reached. Thus h(x)
can have two values: B0, not able to reproduce, and B1,
mature enough to reproduce. Then x = (S0, SΔt, ..., SnΔt, ...)
until h(x) = B1.
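A minimal sketch of this path construction follows; every specific below (the phenotype space, the update rule, and the maturity functional h) is a purely illustrative assumption rather than anything from Ciliberti et al.

```python
import random

random.seed(42)

N = 8                      # phenotype = length-N binary string (illustrative)
S0    = tuple([0] * N)     # initial phenotype
S_inf = tuple([1] * N)     # target phenotype at which h(x) = B1

def h(path):
    """Maturity functional: B1 once the path reaches S_inf, else B0."""
    return "B1" if path[-1] == S_inf else "B0"

def step(S):
    """One developmental update: flip a single randomly chosen locus."""
    i = random.randrange(N)
    S = list(S)
    S[i] ^= 1
    return tuple(S)

# Build the path x = (S0, S_dt, S_2dt, ...) until h(x) = B1.
path = [S0]
while h(path) == "B0":
    path.append(step(path[-1]))
print(f"reached S_inf after {len(path) - 1} steps")
```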
Structure is now subsumed within the sequential grammar
and syntax of the dual information source rather than within
the cross-sectional internals of (wij)-space, a simplifying shift
in perspective.
4 Consequences of the perspective change
This transformation carries computational burdens, as well as
providing mathematical insight.
First, the fact that viable networks comprise a tiny fraction
of all those possible emerges easily from the spinglass formula-
tion simply because of the ‘mechanical’ limit that the number
of paths from S0 to S∞ will always be far smaller than the
total number of possible paths, most of which simply do not
end on the target configuration.
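This counting argument can be made concrete in a toy setting. The sketch below (all parameters illustrative) counts, on a small hypercube of configurations, the exact fraction of unconstrained n-step paths from S0 that end on a given target; the fraction is small and shrinks exponentially with the dimension.

```python
import numpy as np
from itertools import product

N = 6                              # hypercube dimension (illustrative)
states = list(product([0, 1], repeat=N))
idx = {s: k for k, s in enumerate(states)}

# Adjacency matrix of the N-cube: one step flips exactly one coordinate.
M = np.zeros((2**N, 2**N))
for s in states:
    for i in range(N):
        t = list(s); t[i] ^= 1
        M[idx[s], idx[tuple(t)]] = 1

s0, target = idx[tuple([0] * N)], idx[tuple([1] * N)]
v = np.zeros(2**N); v[s0] = 1.0

# After n multiplications, v[t] = number of n-step paths from s0 to t,
# out of N**n possible paths in total.
for n in range(1, 25):
    v = v @ M
    if n % 6 == 0:
        print(f"n = {n:2d}: fraction ending on target = {v[target] / N**n:.3e}")
```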
From the information source perspective, which inherently
subsumes a far larger set of dynamical structures than is pos-
sible in a spinglass model - not simply those of symbolic dy-
namics - the result is what Khinchin (1957) characterizes as
the 'E-property' of a stationary, ergodic information source.
This allows, in the limit of infinitely long output, the classifi-
cation of output strings into two sets:
[1] a very large collection of gibberish which does not con-
form to underlying (sequential) rules of grammar and syntax,
in a large sense, and which has near-zero probability, and
[2] a relatively small ‘meaningful’ set, in conformity with
underlying structural rules, having very high probability.
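Khinchin's E-property is, in modern terms, the asymptotic equipartition property, and both sets can be exhibited numerically. The sketch below (a Bernoulli source with illustrative parameters) shows that the 'meaningful' (typical) strings are a vanishingly small fraction of all possible strings yet carry almost all of the probability.

```python
from math import log2, comb

p, n, eps = 0.1, 2000, 0.05
H = -p * log2(p) - (1 - p) * log2(1 - p)    # source uncertainty, bits/symbol

# A length-n string with k ones has probability p^k (1-p)^(n-k); call it
# 'typical' if its per-symbol log-probability is within eps of H.
typ_count, typ_prob = 0, 0.0
for k in range(n + 1):
    logp = k * log2(p) + (n - k) * log2(1 - p)   # log2 prob of one such string
    if abs(-logp / n - H) <= eps:
        typ_count += comb(n, k)
        typ_prob  += 2.0 ** (log2(comb(n, k)) + logp)

print(f"log2(#typical strings) = {log2(typ_count):.1f}  (strings of n = {n} bits)")
print(f"typical fraction of all strings = 2^{log2(typ_count) - n:.1f}")
print(f"probability carried by typical set = {typ_prob:.4f}")
```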
The essential content of the Shannon-McMillan Theorem is
that, if N(n) is the number of meaningful strings of length n,
then the uncertainty of an information source X can be de-
fined as H[X] = lim_{n→∞} log[N(n)]/n, which can be expressed
in terms of joint and conditional probabilities as in equation
(3) above. Proving these results for general stationary, er-
godic information sources requires considerable mathematical
machinery (e.g., Khinchin, 1957; Cover and Thomas, 1991;
Dembo and Zeitouni, 1998).
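The counting form H[X] = lim_{n→∞} log[N(n)]/n can be checked directly against a toy grammar. In the sketch below (an illustration, not from the original), the structural rule is 'no two consecutive 1s'; N(n) is counted by dynamic programming, and log2 N(n)/n converges to log2 of the golden ratio, the uncertainty of the maximum-entropy source respecting that rule.

```python
from math import log2

# Grammar: binary strings with no two consecutive 1s are 'meaningful'.
# Count N(n) by dynamic programming over the last emitted symbol.
def N(n):
    end0, end1 = 1, 1          # meaningful length-1 strings ending in 0 / 1
    for _ in range(n - 1):
        end0, end1 = end0 + end1, end0   # a 1 may only follow a 0
    return end0 + end1

phi = (1 + 5**0.5) / 2
for n in [10, 100, 1000, 10000]:
    print(f"n = {n:5d}: log2 N(n)/n = {log2(N(n)) / n:.5f}")
print(f"limit: log2(golden ratio) = {log2(phi):.5f}")
```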
Second, information source uncertainty has an important
heuristic interpretation. Ash (1990) puts it this way:
...[W]e may regard a portion of text in a particular language
as being produced by an information source. The probabilities
P[X_n = a_n | X_0 = a_0, ..., X_{n-1} = a_{n-1}] may be estimated
from the available data about the language; in this way we can
estimate the uncertainty associated with the language. A large
uncertainty means, by the [Shannon-McMillan Theorem], a large
number of 'meaningful' sequences. Thus given two languages with
uncertainties H1 and H2 respectively, if H1 > H2, then in the
absence of noise it is easier to communicate in the first language;
more can be said in the same amount of time. On the other hand,
it will be easier to reconstruct a scrambled portion of text in the
second language, since fewer of the possible sequences of length n
are meaningful.
This will prove important below.
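Ash's procedure of estimating the conditional probabilities from available data can be sketched directly; everything below (the context order, the sample text) is an illustrative assumption.

```python
from collections import Counter
from math import log2

def conditional_entropy(text, order=2):
    """Estimate H(X_n | X_{n-order}, ..., X_{n-1}) in bits/character
    from n-gram frequencies, in the spirit of Ash's remark."""
    ctx  = Counter(text[i:i + order]     for i in range(len(text) - order))
    full = Counter(text[i:i + order + 1] for i in range(len(text) - order))
    H, total = 0.0, sum(full.values())
    for gram, c in full.items():
        p_joint = c / total                  # Pr(context, next symbol)
        p_cond  = c / ctx[gram[:order]]      # Pr(next symbol | context)
        H -= p_joint * log2(p_cond)
    return H

sample = "the quick brown fox jumps over the lazy dog " * 50
print(f"estimated uncertainty: {conditional_entropy(sample):.3f} bits/char")
```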
Third, information source uncertainty is homologous with
free energy density in a physical system, a matter having im-
plications across a broad class of dynamical behaviors.
The free energy density of a physical system having vol-
ume V and partition function Z(K) derived from the system’s
Hamiltonian - the energy function - at inverse temperature
K is (e.g., Landau and Lifshitz 2007)
$$
F[K] = \lim_{V \to \infty} \frac{-1}{K} \, \frac{\log[Z(K, V)]}{V}
     = \lim_{V \to \infty} \frac{\log[\hat{Z}(K, V)]}{V},
\tag{4}
$$

where $\hat{Z} = Z^{-1/K}$.
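Equation (4) can be checked by brute force on a simple interacting Hamiltonian. The sketch below (an open one-dimensional Ising chain; all details illustrative) enumerates configurations, evaluates the finite-V quotient in (4), and shows it approaching the known V → ∞ limit.

```python
import numpy as np
from itertools import product

def log_Z(K, V):
    """Brute-force log partition function of an open 1-D Ising chain,
    Hamiltonian H = -sum_i s_i s_{i+1}, at inverse temperature K."""
    Z = 0.0
    for s in product([-1, 1], repeat=V):
        E = -sum(s[i] * s[i + 1] for i in range(V - 1))
        Z += np.exp(-K * E)
    return np.log(Z)

K = 0.7
for V in [2, 4, 8, 12, 16]:
    F = (-1.0 / K) * log_Z(K, V) / V       # finite-V version of equation (4)
    print(f"V = {V:2d}: F[K] = {F:.5f}")
# The V -> infinity limit for this chain is -(1/K) log(2 cosh K).
print(f"limit: {-(1 / K) * np.log(2 * np.cosh(K)):.5f}")
```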
Feynman (2000), following the classic work by Bennett
(1988), concludes that the information contained in a message
is simply the free energy needed to erase it. Thus, according
to this argument, source uncertainty is homologous to free