Figure 7: The robot environment in the wall-following ex-
periment. The robot is programmed to track and follow
a human demonstrator using its on-board camera; the
input to the mirror system comes from the sonar sensors
around its body.
The input to the system comes from 20 sonar sensors around the top of the robot, which in practice are not affected by the presence of the demonstrator.
The learner can sometimes lose the demonstra-
tor, so it only inspects its perceptual input when the
demonstrator is in sight, that is, when it is in a social
context. Otherwise, the attention system would en-
counter situations not relevant to the task (Marom
and Hayes, 2001b). We regard this setup as social situatedness in the sense that information is implicitly shared between the demonstrator and imitator about the specific task to be learned. This is an appealing idea that has been experimented with before (for example Hayes and Demiris, 1994; Billard and Dautenhahn, 1997). Further, this setup allows for
the perceived input to be represented directly as tar-
gets to be achieved by the motor system (through the
inverse model), as described in Section 2. The object-
interactions in this case correspond to how the robot
responds to being near a wall or away from it.
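As an illustration, the gating of the attention system by the social context might be sketched as follows (in Python, with hypothetical names such as demonstrator_in_sight and read; this is not the robot's actual control code):

def perceptual_step(camera, sonar, attention_system):
    # Only inspect the sonar stimulus while the demonstrator is tracked
    # by the on-board camera, i.e. while the robot is in a social context.
    if camera.demonstrator_in_sight():
        stimulus = sonar.read()              # 20-dimensional sonar vector
        attention_system.inspect(stimulus)
    # Outside the social context the input is ignored, so the attention
    # system never sees situations irrelevant to the task.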
In this experiment the inverse model is not as
straightforward as in the first experiment where the
PID provides an intuitive inverse model. In similar
experiments in simulation we have used an inverse
model which consisted of a discretised database of states and transition matrices obtained by letting the agent explore its environment (Marom et al., 2002). We have found that in the physical system it is difficult to obtain such an inverse model which is reliable; we believe that a more sophisticated approach is required, such as reinforcement learning, and leave that to further work. We overcame this problem by hand-crafting a set of rules that operate on a small set of states which reliably generalise the robot's state space, and with which the robot can decide how to get from one perceptual state to another.
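A minimal sketch of such a rule-based inverse model is given below; the state labels, the grouping of the sonar readings, and the threshold are illustrative assumptions, not the actual rules used on the robot.

FORWARD, LEFT, RIGHT = range(3)            # the three available motor commands

def classify(percept, near=0.5):
    """Map the 20-dimensional sonar vector onto a small set of states
    (illustrative grouping and threshold)."""
    left, right = min(percept[:10]), min(percept[10:])
    if left > near and right > near:
        return 'NO_WALL'
    return 'WALL_LEFT' if left < right else 'WALL_RIGHT'

def inverse_model(current_percept, target_percept):
    """Select the action most likely to take the robot from its current
    perceptual state towards the target perceptual state."""
    current, target = classify(current_percept), classify(target_percept)
    if current == target:
        return FORWARD                      # already in the target state
    return LEFT if target == 'WALL_LEFT' else RIGHT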
To summarise, the stimulus in this experiment is
a 20-dimensional vector that represents the robot's perception of the wall, and the motor commands calculated by the inverse model are used to control the robot's motors to move forward, turn left, or turn right.
5.1 Learning & Recall
In the learning phase the robot follows behind the
human demonstrator for 10000 steps, which is approximately 40 minutes of real time; due to hardware and practical limitations, all the information is
stored for off-board learning. The various recall runs
reported below are all based on this one dataset.
Figure 8(a) shows one SOFM network that the at-
tention system can produce and highlights the emer-
gent clusters in the SOFM. As in the previous exper-
iment, since the dimensionality of the sensor space is
too high to visualise, we have used PCA to reduce the number of dimensions to two (the principal components used in the figure account for approximately 70% of the variance). We can see a cluster for no-wall (at the top), and as we move away from it we move towards clusters corresponding to the walls (left and right).
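Such a projection can be produced along the following lines (a sketch assuming the SOFM node vectors are available as an array; scikit-learn and matplotlib are used here purely for illustration):

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_sofm_nodes(node_vectors):
    """Project the 20-dimensional SOFM node vectors onto their first two
    principal components and plot them."""
    pca = PCA(n_components=2)
    projected = pca.fit_transform(node_vectors)    # shape: (n_nodes, 2)
    print('variance accounted for:', pca.explained_variance_ratio_.sum())
    plt.scatter(projected[:, 0], projected[:, 1])
    plt.xlabel('principal component 1')
    plt.ylabel('principal component 2')
    plt.show()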
Following training at the learning phase, the mir-
ror system is fixed and is used to control and test
the robot, as in the previous experiment: the robot
is placed in the environment on its own; at each step
the robot’s perception activates one of the nodes in
its SOFM; the corresponding motor schema provides
a motor target which, again, is simply the SOFM
node vector; and this target is passed to the inverse
model which selects the best action likely to achieve
it. When no node is active (the match is very poor) a
‘wandering’ behaviour is triggered (the robot moves
around randomly). The recall phase consists of 6000 steps, which corresponds to around 23 minutes of real time; to avoid the robot following the wall on one side for the duration of the run, and thus not testing the learned behaviour fully, we use an 'interrupt', which forces the robot to turn away from the wall, every 1000 steps during the run.
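Schematically, the recall loop described above could look as follows (a sketch with hypothetical interfaces for the robot and the SOFM; the step counts are those reported in the text, and the action constants are those of the inverse-model sketch above):

import random

FORWARD, LEFT, RIGHT = range(3)

def recall(robot, sofm, inverse_model, steps=6000, interrupt_every=1000):
    for t in range(steps):
        percept = robot.read_sonar()                # 20-dimensional stimulus
        node = sofm.best_matching_node(percept)     # None when the match is very poor
        if node is None:
            robot.execute(random.choice((FORWARD, LEFT, RIGHT)))   # 'wandering'
        else:
            target = node.vector                    # motor target = SOFM node vector
            robot.execute(inverse_model(percept, target))
        if (t + 1) % interrupt_every == 0:
            robot.turn_away_from_wall()             # periodic 'interrupt'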
We have also equipped the robot with a built-in
obstacle avoidance behaviour to protect it from un-
successful learning and also from situations not en-
countered at the learning phase. For example, when
the robot follows behind the demonstrator, it never
sees the wall directly in front of it, so we do not expect it to know how to handle such a situation in the recall phase, but we also don't want it to drive into the wall! To account for unsuccessful learning we penalise the evaluation whenever the obstacle-avoidance is triggered.
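In the evaluation this can be accounted for along the following lines (a sketch; the per-step score and the size of the penalty are assumptions, not the measure reported below):

def evaluate(run_log, penalty=1.0):
    """Score a recall run, subtracting a penalty whenever the built-in
    obstacle-avoidance behaviour was triggered (illustrative only)."""
    score = 0.0
    for step in run_log:
        score += step.task_score                   # e.g. credit for wall-following
        if step.obstacle_avoidance_triggered:
            score -= penalty                       # penalise unsuccessful learning
    return score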
Figure 8(b) shows the node activation, at the recall phase, of the SOFM shown in Figure 8(a). Firstly, we see that the nodes that form clusters are also activated together and intermittently at the recall phase.