function. Here we use:
\[
\varphi(x) = a \tanh(bx) = a\,\frac{e^{bx} - e^{-bx}}{e^{bx} + e^{-bx}}
\tag{2}
\]
with a = 1.716 and b = 2/3, as suggested by previous work (see e.g. Guyon,
1991). The precise values of these parameters are relevant only to the speed at
which networks learn, and do not affect their generalization properties.
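As a concrete illustration, here is a minimal sketch of this activation in Python; the name `phi` and the use of NumPy are our own choices, while the constants a = 1.716 and b = 2/3 come from the text above.

```python
import numpy as np

A = 1.716      # a: output scaling (from the text)
B = 2.0 / 3.0  # b: input gain (from the text)

def phi(x):
    """Hidden-unit activation of equation (2): a scaled hyperbolic tangent.

    Values lie in the open interval (-a, a) = (-1.716, 1.716).
    """
    return A * np.tanh(B * x)

# tanh(bx) = (e^{bx} - e^{-bx}) / (e^{bx} + e^{-bx}), so the two forms agree:
x = 1.0
assert np.isclose(phi(x), A * (np.exp(B * x) - np.exp(-B * x))
                            / (np.exp(B * x) + np.exp(-B * x)))
```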
The outputs coming from the hidden layer reach a single output cell via a
second system v of connections, and the activation of this cell is taken as the
response of the network to the stimulation, according to:
\[
o(s) = \psi\!\left( \sum_{i=1}^{M} v_i h_i(s) \right)
\tag{3}
\]
where M is the number of hidden units and the function ψ(·) is given by:
\[
\psi(x) = \frac{1}{1 + e^{-x}}
\tag{4}
\]
This activation function differs from the one pertaining to the hidden layer in that
its values are in the interval [0, 1], and this enables us to interpret the network
output o(s) as the probability of response to the stimulus s. A more detailed model
would distinguish between different kinds of response (corresponding to different
behaviours), for example by having more output units, but here we rely only on
o(s) as a model of behaviour.
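To make the forward computation explicit, the following sketch implements equations (2)-(4) in Python. It assumes the hidden activations take the form hi(s) = φ(wi·s), i.e. each hidden unit applies φ to a weighted sum of the stimulus components through the connections w; this reading, together with all names and the example weights, is illustrative rather than prescriptive.

```python
import numpy as np

A, B = 1.716, 2.0 / 3.0

def phi(x):
    return A * np.tanh(B * x)          # hidden activation, equation (2)

def psi(x):
    return 1.0 / (1.0 + np.exp(-x))    # output activation, equation (4): values in (0, 1)

def network_output(s, w, v):
    """Response o(s) of the network to a stimulus s, as in equation (3).

    s : stimulus vector, shape (n,)
    w : stimulus-to-hidden connections, shape (M, n), one row per hidden unit
    v : hidden-to-output connections, shape (M,)
    """
    h = phi(w @ s)     # assumed form of the hidden activations: h_i(s) = phi(w_i . s)
    return psi(v @ h)  # o(s) = psi(sum_i v_i h_i(s))

# Example: M = 5 hidden units, n = 3 stimulus components, random connections.
rng = np.random.default_rng(0)
w = rng.normal(size=(5, 3))
v = rng.normal(size=5)
print(network_output(np.array([1.0, 0.5, -0.2]), w, v))  # a value in (0, 1)
```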
In the present context a learning procedure is just an algorithm that is able to
adjust the network outputs to a set of stimuli according to a given criterion, by
changing the network connections w and v. Here we use the well-known back-
propagation algorithm (LeCun, 1985; Parker, 1985; Rumelhart et al., 1986) and
a simple model of learning through an evolutionary process (Enquist & Leimar,
1993). In both cases the performance of the network can be quantified by the
differences between the present outputs and the target ones, for example by the
quantity:
\[
e = \sum_{\alpha} \bigl( o(s_\alpha) - d_\alpha \bigr)^2
\tag{5}
\]
where the sum runs over the whole training set and dα is the desired output for stimulus sα.
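A minimal sketch of this measure, reusing network_output from the previous sketch and assuming the training set is given as (stimulus, desired output) pairs:

```python
def error(training_set, w, v):
    """Squared-error measure e of equation (5) over the whole training set.

    training_set : iterable of (stimulus vector s_alpha, desired output d_alpha) pairs
    """
    return sum((network_output(s, w, v) - d) ** 2 for s, d in training_set)
```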
This performance measure is used by the two algorithms in different ways: in
back-propagation, the contribution of each connection to the error of equation (5)
is computed, and the connection is then changed a little so as to diminish the
error; in the evolutionary algorithm, a new network is created with some mutated
connections, and whichever of the two networks yields the smaller error is retained.
In both algorithms the step just sketched is repeated until the network produces
the desired outputs.
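The two update steps can be sketched as follows. The gradient step stands in for back-propagation: the actual algorithm computes ∂e/∂w and ∂e/∂v analytically by propagating errors backwards, whereas here the same gradient is estimated by finite differences purely to keep the sketch short. The evolutionary step follows the mutate-and-select scheme just described; the learning rate, mutation probability and mutation scale are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_step(training_set, w, v, lr=0.1, eps=1e-6):
    """One error-reducing step in the spirit of back-propagation.

    Estimates de/dw and de/dv by central finite differences, then changes
    each connection a little in the direction that diminishes e.
    """
    grads = []
    for params in (w, v):
        g = np.zeros_like(params)
        for idx in np.ndindex(params.shape):
            old = params[idx]
            params[idx] = old + eps
            e_plus = error(training_set, w, v)
            params[idx] = old - eps
            e_minus = error(training_set, w, v)
            params[idx] = old
            g[idx] = (e_plus - e_minus) / (2.0 * eps)
        grads.append(g)
    w -= lr * grads[0]
    v -= lr * grads[1]
    return w, v

def evolutionary_step(training_set, w, v, p_mut=0.1, sigma=0.05):
    """One mutate-and-select step: retain whichever network yields the smaller error."""
    w_new = w + sigma * (rng.random(w.shape) < p_mut) * rng.standard_normal(w.shape)
    v_new = v + sigma * (rng.random(v.shape) < p_mut) * rng.standard_normal(v.shape)
    if error(training_set, w_new, v_new) < error(training_set, w, v):
        return w_new, v_new   # the mutated network is better: keep it
    return w, v               # otherwise keep the original network
```

In both cases, repeating the step until e is sufficiently small reproduces the training loop described in the text.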