Consciousness, cognition, and the hierarchy of context: extending the global neuronal workspace model



Again, for non-ergodic sources, a limit $\lim_{n \rightarrow \infty} H$ may be defined for each path, but it will not necessarily be given by the simple cross-sectional law-of-large-numbers analogs above. For 'nearly' ergodic systems one might perhaps use something of the form

$$H(x + \delta x) \approx H(x) + \delta x \, dH/dx.$$
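The first-order expansion can be checked numerically. A minimal sketch, assuming a smooth binary-entropy-style curve for $H$ (a purely illustrative stand-in for the path-dependent limit of a 'nearly' ergodic source):

```python
import math

def H(x):
    # Hypothetical smooth source-uncertainty curve (binary entropy in nats);
    # stands in for the path limit of a 'nearly' ergodic source.
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def H_linear(x, dx, h=1e-6):
    # First-order extrapolation H(x + dx) ~ H(x) + dx * dH/dx,
    # with dH/dx estimated by a central difference.
    dHdx = (H(x + h) - H(x - h)) / (2 * h)
    return H(x) + dx * dHdx

# For small dx the linear estimate tracks the exact value closely.
error = abs(H(0.3 + 0.01) - H_linear(0.3, 0.01))
```

The approximation error shrinks quadratically in $\delta x$, as expected for a first-order expansion.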

Different language-analogs will, of course, be defined by different divisions of the total universe of possible responses into different pairs of sets $B_0$ and $B_1$, or by requiring more than one response in $B_1$ along a path. However, like the use of different distortion measures in the Rate Distortion Theorem (e.g. Cover and Thomas, 1991), it seems obvious that the underlying dynamics will all be qualitatively similar.

Similar but not identical, and herein lies the first of two essential matters: dividing the full set of possible responses into sets $B_0$ and $B_1$ may itself require higher-order cognitive decisions by another module or modules, suggesting the necessity of 'choice' within a more or less broad set of possible languages-of-thought. This would, in one way, reflect the need of the organism to shift gears according to the different challenges it faces, leading to a model for autocognitive disease when a normally excited state is recurrently (and incorrectly) identified as a member of the 'resting' set $B_0$.

A second possible source of structure, however, lies at the input rather than the output end of the model: i.e. suppose we classify paths instead of outputs. That is, we define equivalence classes in convolutional 'path space' according to whether a state $a_{kM}$ can be connected by a path with some originating state $a_M$. That is, we, in turn, set each possible state to an $a_0$, and define other states as formally equivalent to it if they can be reached from that (now variable) $a_0 = a_M$ by a grammatical/syntactical path. That is, a state which can be reached by a legitimate path from $a_M$ is taken as equivalent to it. We can thus divide path space into (ordinarily) disjoint sets of equivalence classes. Each equivalence class defines its own language-of-thought: disjoint cognitive modules, possibly associated with an embedding equivalence class algebra.
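The division of state space into reachability-based equivalence classes can be sketched computationally. The transition relation below is a hypothetical stand-in for a grammatical/syntactical path structure; the grouping itself is a standard connected-components computation:

```python
from collections import defaultdict

# A hypothetical 'grammar': which state transitions count as legitimate
# (grammatical/syntactical) path steps. Assumed for illustration only.
allowed = {
    "a0": {"a1"}, "a1": {"a2"}, "a2": {"a0"},   # one module
    "b0": {"b1"}, "b1": {"b0"},                  # a disjoint module
}

def equivalence_classes(allowed):
    # Treat 'reachable by a legitimate path' symmetrically and group
    # states into (ordinarily) disjoint equivalence classes.
    neighbors = defaultdict(set)
    for s, targets in allowed.items():
        for t in targets:
            neighbors[s].add(t)
            neighbors[t].add(s)
    seen, classes = set(), []
    for start in neighbors:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            s = stack.pop()
            if s in component:
                continue
            component.add(s)
            stack.extend(neighbors[s] - component)
        seen |= component
        classes.append(component)
    return classes

print(equivalence_classes(allowed))  # two disjoint classes
```

Each returned component corresponds to one language-of-thought in the sense above: states mutually reachable by legitimate paths.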

While meaningful paths - creating an inherent grammar
and syntax - are defined entirely in terms of system response,
as Atlan and Cohen (1998) propose, a critical task is to make
these (relatively) disjoint cognitive modules interact, and to
examine the effects of that interaction on global properties.
Punctuated phase transition effects will emerge in a natural
manner.

Before proceeding, however, we give two explicit neural network applications.

First, the simple stochastic neuron: a series of inputs $y_{ij}, i = 1, \ldots, m$ from $m$ nearby neurons at time $j$ is convoluted with 'weights' $w_{ij}, i = 1, \ldots, m$, using an inner product

$$a_j = \mathbf{y}_j \cdot \mathbf{w}_j = \sum_{i=1}^{m} y_{ij} w_{ij}$$

in the context of a 'transfer function' $f(\mathbf{y}_j \cdot \mathbf{w}_j)$ such that the probability of the neuron firing and having a discrete output $z_j = 1$ is $P(z_j = 1) = f(\mathbf{y}_j \cdot \mathbf{w}_j)$. Thus the probability that the neuron does not fire at time $j$ is $1 - f(\mathbf{y}_j \cdot \mathbf{w}_j)$.

In the terminology of this section the $m$ values $y_{ij}$ constitute 'sensory activity' and the $m$ weights $w_{ij}$ the 'ongoing activity' at time $j$, with $a_j = \mathbf{y}_j \cdot \mathbf{w}_j$ and $x = a_0, a_1, \ldots, a_n, \ldots$
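A minimal sketch of this neuron, assuming a logistic transfer function for $f$ (the text leaves $f$ unspecified, so the sigmoid here is an illustrative choice):

```python
import math
import random

def stochastic_neuron(y, w, rng):
    # Inner product of inputs y ('sensory activity') with weights w
    # ('ongoing activity'), as in a_j = y_j . w_j.
    a = sum(yi * wi for yi, wi in zip(y, w))
    # Logistic transfer function: an assumed, common choice for f.
    p_fire = 1.0 / (1.0 + math.exp(-a))
    # Discrete output z_j = 1 with probability f(y_j . w_j).
    z = 1 if rng.random() < p_fire else 0
    return a, p_fire, z

rng = random.Random(0)
a, p, z = stochastic_neuron([0.5, -0.2, 1.0], [1.0, 0.4, 0.3], rng)
```

Over repeated trials the empirical firing rate approaches $f(\mathbf{y}_j \cdot \mathbf{w}_j)$, and the non-firing rate approaches $1 - f(\mathbf{y}_j \cdot \mathbf{w}_j)$.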

A little more work leads to a fairly standard neural network model in which the network is trained by appropriately varying the $w$ through least squares or other error-minimization feedback. This can be shown, essentially, to replicate rate distortion arguments (Cover and Thomas, 1991), as we can use the error definition to define a distortion function $d(y, \hat{y})$ which measures the difference between the training pattern $y$ and the network output $\hat{y}$ as a function of, for example, the inverse number of training cycles, $K$. As discussed in some detail elsewhere (Wallace, 2002), learning plateau behavior follows as a phase transition on the parameter $K$ in the mutual information $I(Y, \hat{Y})$.
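A least-squares training loop of the kind described might look as follows; the data, learning rate, and sigmoid transfer function are illustrative assumptions:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def train(samples, w, eta=0.5, epochs=200):
    # samples: list of (inputs y, target output) pairs.
    # Squared-error feedback varies the weights w, as in the text.
    for _ in range(epochs):
        for y, target in samples:
            a = sum(yi * wi for yi, wi in zip(y, w))
            out = sigmoid(a)
            err = target - out
            # Gradient of the squared error through the sigmoid:
            g = err * out * (1 - out)
            w = [wi + eta * g * yi for yi, wi in zip(y, w)]
    return w

samples = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
w = train(samples, [0.0, 0.0])
```

Tracking the distortion $d(y, \hat{y})$ per epoch in such a loop is one way to observe the plateau behavior discussed above.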

Park et al. (2000) treat the stochastic neural network in terms of a space of related probability density functions $\{p(x, y; w) \mid w \in R^m\}$, where $x$ is the input, $y$ the output and $w$ the parameter vector. The goal of learning is to find an optimum $w^*$ which maximizes the log likelihood function. They define a loss function of learning as

$$L(x, y; w) \equiv -\log p(x, y; w),$$

and one can take as a learning paradigm the gradient relation

$$w_{t+1} = w_t - \eta_t \, \partial L(x, y; w)/\partial w,$$

where $\eta_t$ is a learning rate.
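The gradient relation can be sketched for a toy one-parameter Gaussian model; the model, data, and fixed learning rate are assumptions for illustration, not Park et al.'s setup:

```python
# Sketch of w_{t+1} = w_t - eta_t * dL/dw for a hypothetical
# one-parameter Gaussian model p(y; w) with unit variance.
def loss_grad(y, w):
    # L = -log p(y; w) for p Gaussian with mean w and variance 1
    # (up to an additive constant), so dL/dw = (w - y).
    return w - y

def sgd(data, w=0.0, eta=0.1, epochs=100):
    for _ in range(epochs):
        for y in data:
            w = w - eta * loss_grad(y, w)
    return w

w_star = sgd([1.0, 2.0, 3.0])
# w approaches the sample mean, the maximum-likelihood optimum here.
```

With a decaying schedule $\eta_t$ rather than a fixed rate, the iterates would converge to the optimum rather than oscillate near it.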

Park et al. (2000) attack this optimization problem by recognizing that the space of the $p(x, y; w)$ is Riemannian with a metric given by the Fisher information matrix

$$G(w) = \int\!\!\int \frac{\partial \log p}{\partial w} \left[\frac{\partial \log p}{\partial w}\right]^T p(x, y; w) \, dy \, dx,$$

where $T$ is the transpose operation. A Fisher-efficient on-line estimator is then obtained by using the 'natural' gradient algorithm

$$w_{t+1} = w_t - \eta_t G^{-1} \, \partial L(x, y; w)/\partial w.$$

Again, through the synergistic family of probability distributions $p(x, y; w)$, this can be viewed as a special case - a 'representation', to use physics jargon - of the general 'convolution argument' given above.
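The natural-gradient update can be illustrated with a toy Gaussian (mean, standard deviation) model, where the Fisher matrix has the closed form $\mathrm{diag}(1/\sigma^2, 2/\sigma^2)$; the model and data are assumptions for illustration, not Park et al.'s experiments:

```python
# Toy natural-gradient learning w_{t+1} = w_t - eta * G^{-1} dL/dw
# for a univariate Gaussian with parameters w = (mu, sigma).
def score(y, mu, sigma):
    # d(log p)/d(mu, sigma) for a univariate Gaussian.
    d_mu = (y - mu) / sigma**2
    d_sigma = ((y - mu) ** 2 - sigma**2) / sigma**3
    return [d_mu, d_sigma]

def fisher(mu, sigma):
    # Closed-form Fisher information matrix for (mu, sigma).
    return [[1 / sigma**2, 0.0], [0.0, 2 / sigma**2]]

def natural_step(y, mu, sigma, eta=0.1):
    g = score(y, mu, sigma)   # dL/dw = -score, since L = -log p
    G = fisher(mu, sigma)
    # G is diagonal here, so applying G^{-1} is elementwise division;
    # descending L is ascending log-likelihood along the score.
    mu_new = mu + eta * g[0] / G[0][0]
    sigma_new = sigma + eta * g[1] / G[1][1]
    return mu_new, sigma_new

mu, sigma = 0.0, 1.0
for y in [1.0, 2.0, 1.5] * 50:
    mu, sigma = natural_step(y, mu, sigma)
```

Preconditioning by $G^{-1}$ makes the step sizes invariant to how the distribution family is parameterized, which is the source of the Fisher efficiency claimed for the on-line estimator.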

It seems likely that a rate distortion analysis of the interaction between training language and network response language will nonetheless show the ubiquity of learning plateaus, even in this rather elegant special case.

We will eventually parameterize the information source uncertainty of the dual information source with respect to one or more variates, writing, e.g., $H[\mathbf{K}]$, where $\mathbf{K} = (K_1, \ldots, K_s)$ represents a vector in a parameter space. Let the vector $\mathbf{K}$ follow some path in time, i.e. trace out a generalized line or surface $\mathbf{K}(t)$. We will, following the argument of Wallace (2002b), assume that the probabilities defining $H$, for the most part, closely track changes in $\mathbf{K}(t)$, so that along a particular 'piece' of a path in parameter space the information source remains as close to memoryless and ergodic as is needed for the mathematics to work. Between pieces, below,


