semantics to symbols is much more problematic in con-
nectionist models, because it is part of the connectionist
doctrine that representations (symbols) are distributed
in the structures of neural nets. It is therefore more dif-
ficult to produce a trace of what happens semantically
in an ANN, because no syntactical structure exists.
ANNs are usually described as having distinct and discrete inputs and outputs8, each labeled as having a distinct and discrete meaning. Such labels may be words, like boy, girl, read, or book, or they may stand for concepts such as phonemes or visual inputs. Such labels bring their own set of problems. Attach-
ing the value ‘grandmother’ to one of the input nodes
illustrates my concern. While nearly everyone rejects
the existence of a grandmother-neuron in the brain as
a rather naive concept, boy-, girl-, or book-neurons are
willingly accepted in models.
Localized representations are no longer available once
the focus shifts onto the hidden nodes within the network,
and the ‘representations’ are now described in terms of
weights, or synaptic strengths, between individual units.
However, for a meaningful interpretation of the network
and its dynamics, it is necessary to convey content and
meaning in terms of non-distributed (localized) symbols,
because it is not sufficient for a discussion of what goes on
in ANNs to assign semantic content merely to inputs
and outputs. In order to track the flow of information through the networks, other kinds of descriptions are needed: explaining the processes in an ANN in terms of connection weights between neurons is tedious and unsuitable for the kinds of models in question, because the number of connections can be considerable even in small networks9. A distributed representation R, i.e. the activation pattern for a particular input I_1...k, could be specified in the form of a matrix, or as a vector, with as many elements as there are connections in the network:

R(I_1...k) = (.8234, .9872, .1290, ..., .0012).
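To see concretely how unwieldy such a description is, the following sketch in Python (using NumPy) writes out the connection weights of the 20-10-5 network from footnote 9 as one flat vector; the random numbers merely stand in for the weights of a trained network.

import numpy as np

# Network size taken from footnote 9: 20 input, 10 hidden, 5 output nodes.
rng = np.random.default_rng(0)
w_input_hidden = rng.normal(size=(20, 10))    # 20 x 10 = 200 connections
w_hidden_output = rng.normal(size=(10, 5))    # 10 x 5  =  50 connections

# The distributed representation written out as a single flat vector.
R = np.concatenate([w_input_hidden.ravel(), w_hidden_output.ravel()])
print(R.size)               # 250 numbers
print(np.round(R[:4], 4))   # the first few elements, as in the vector above

Listing all 250 numbers is trivial; reading a cognitive interpretation out of them is not.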
In any case, it is necessary to specify all of the numeric
values to capture every single activation pattern. Repre-
sentations and descriptions in this form are unsuitable,
because they reveal little in terms of the cognitive func-
tion that is modeled. Where do new and helpful descrip-
tions come from?
Interpreting models
The representations for words, concepts, phonemes, vi-
sual inputs, and so on, are usually coded in binary, or
as real values, in paired input and output vectors in the
training set for the ANN. During training, the rela-
tionships between the input and output vectors are en-
coded in the hidden layers of the ANN, or as Fodor and
Pylyshyn (1988) put it, “the weights among connections are adjusted until the system’s behavior comes to model the statistical properties of its inputs” (my italics).

8 Inputs and outputs of ANNs can also have continuous values. The kinds of models I am discussing here typically have discrete values.
9 A fully connected feed-forward network with 20 input nodes, 10 hidden nodes, and 5 output nodes has 250 connections.
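What such paired vectors look like can be sketched in a few lines of Python; the four-word vocabulary and the single example sentence are invented here purely for illustration and are not taken from any particular model.

import numpy as np

vocab = ["boy", "girl", "read", "book"]          # illustrative labels
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0                         # a single bit marks the word
    return v

# Training pairs for next-word prediction: input vector -> target output vector.
sentence = ["boy", "read", "book"]
pairs = [(one_hot(a), one_hot(b)) for a, b in zip(sentence, sentence[1:])]
for x, y in pairs:
    print(x, "->", y)

Pairs of this kind are all that the network is given during training.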
Elman (1990), for example, presented 29 English words one at a time to a simple recurrent network in the form of binary vectors I_1...I_n, such that a single bit represented a particular word. The words themselves were presented in sequences forming two- and three-word sentences that had been generated according to a set of 15 fixed templates. A cluster analysis
of the hidden nodes revealed that the trained network
exhibits similar activation patterns for inputs (words)
according to their relative position in the sequence (sen-
tence) and their probability of occurring in relation to
other words. The analysis of these activation patterns
allowed for the classification of inputs into categories like
nouns or verbs. Moreover, the categories of internal rep-
resentations could be broken down into smaller groups
like human, non-human, large animals, edibles, and so on.
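The clustering step of such an analysis can be sketched as follows in Python with SciPy. The word labels are illustrative, and the activation matrix is filled with random numbers purely as a stand-in for the mean hidden-layer activations one would record from a trained network.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

words = ["boy", "girl", "dragon", "book", "eat", "read"]   # illustrative labels
rng = np.random.default_rng(1)

# Placeholder: in a real analysis, each row would be the average hidden-unit
# activation vector recorded for that word while the trained network runs.
activations = rng.normal(size=(len(words), 10))

# Agglomerative (hierarchical) clustering of the activation vectors.
Z = linkage(activations, method="average", metric="euclidean")
groups = fcluster(Z, t=3, criterion="maxclust")            # cut into 3 groups
for word, group in zip(words, groups):
    print(word, group)

With real activations, the resulting cluster tree is what licenses statements such as ‘the network has formed a category of nouns’.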
Cluster analysis is used as a method to gain insights into
the internal representations of ANNs, but is not with-
out some conceptual problems. Clark (2001) argues that
cluster analysis is an analytic technique to provide an-
swers to the crucial question of what kinds of representa-
tions the network has acquired. However, cluster analy-
sis does not reveal anything that is not already contained
in the raw data of the model. The relationships and pat-
terns in the input datasets and training datasets become
embedded in the structure of the network during train-
ing10. What counts are the mathematical and statistical
relations that are contained in the training datasets. In
many cases the relations may just be tacitly accepted. In
other models these relations are purposefully introduced
from the outset. Under these conditions, the relations
are part of the model’s design. Elman (1990), for ex-
ample, states that “13 classes of nouns and verbs were
chosen” for generating the datasets. Whether the rela-
tions in the data are introduced by design, or whether
the experimenter is unaware of these statistical artifacts,
it should come as no surprise that the analysis will reveal these relations later in the experiment. The imple-
mentation of a model as an ANN and the subsequent ex-
traction of results that are already in the data may have
little value in terms of obtaining empirical evidence. The
training set of pairs of input and output vectors already
contains all there is to the model, and the ANN does
not add anything that could not be extracted from the
training sets through other mathematical or computa-
tional methods.
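The point can be made concrete by bypassing the network altogether: clustering words directly on co-occurrence counts gathered from the training sentences yields groupings of the same kind. The tiny corpus below is invented purely for illustration.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

sentences = [["boy", "read", "book"], ["girl", "read", "book"],
             ["dragon", "eat", "boy"], ["dragon", "eat", "girl"]]
vocab = sorted({w for s in sentences for w in s})
idx = {w: i for i, w in enumerate(vocab)}

# For each word, count which words occur immediately to its right and left.
context = np.zeros((len(vocab), 2 * len(vocab)))
for s in sentences:
    for a, b in zip(s, s[1:]):
        context[idx[a], idx[b]] += 1                  # right neighbour of a
        context[idx[b], len(vocab) + idx[a]] += 1     # left neighbour of b

# Cluster the raw co-occurrence vectors: no network is involved at any point.
Z = linkage(context, method="average")
for w, g in zip(vocab, fcluster(Z, t=3, criterion="maxclust")):
    print(w, g)

Words with identical distributional contexts (here boy and girl) are grouped together immediately; whatever category structure appears reflects nothing but the statistics of the generated sentences.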
Green (2001) argues that
these results are just as analytic as are the results
of a mathematical derivation; indeed they are just
mathematical derivation. It is logically not possible
that [the results] could have turned out other than
they did (Green, 2001, 109).
10 The patterns and relationships in these datasets can either be carefully designed or be an unwanted by-product.