[Figure 7 images omitted. The heading labels shown below the images range from -21 to 23 degrees.]
Figure 7: A subset of the images used in the autonomous navigation problem. The number directly below each image
shows the required heading direction (in degrees) associated with that image.
Table 1: Performance of the SAIL robot in grounded speech learning. After training, the trainer tested the SAIL
robot by guiding it through the second floor of the Engineering Building. As SAIL did not have perfect heading
alignment, the human trainer used verbal commands to adjust the robot's heading during turns and straight navigation.
During the navigation, the arm and eye commands were issued 10 times each at different locations.
Commands | Go left | Go right | Forward | Backward | Freeze
Correct rate (%) | 88.9 | 89.3 | 92.8 | 87.5 | 88.9
Commands | Arm left | Arm right | Arm up | Arm down | Hand open
Correct rate (%) | 90 | 90 | 100 | 100 | 90
Commands | Hand close | See left | See right | See up | See down
Correct rate (%) | 80 | 100 | 100 | 100 | 100
the Engineering Building, at a typical human walking speed.
4.6 Action chaining
The ability to learn new skills is essential for an artificial agent to scale up.
We have designed and implemented a hierarchical developmental learning architecture (Fig. 10),
which enables a robot to develop complex behaviors (chained actions) after acquiring simple
ones (Zhang and Weng, 2002b). The mechanism that makes this possible is chained secondary
conditioning. An action chaining process can be written mathematically as
$$C_c \to C_{s1} \to A_{s1} \to C_{s2} \to A_{s2} \;\Rightarrow\; C_c \to A_{s1} \to A_{s2} \qquad (3)$$
where $C_c$ is the composite command, and $C_{s1}$ and $C_{s2}$ are the commands invoking the
basic actions $A_{s1}$ and $A_{s2}$, respectively. Here $\to$ means “followed by” and
$\Rightarrow$ means “develops”. The problem is that $C_{s1}$ and $C_{s2}$ are missing from the
developed stimulus-response association. The major challenge of this work is that
training and testing must be conducted in the same mode, through online real-time
interactions between the robot and the trainer.
In the experiment, after learning the basic gripper tip movements (Fig. 11), the SAIL robot
learned to combine individually instructed movements into a composite movement invoked by a
single verbal command, without any reprogramming (Fig. 12). To solve the problem of missing
context in action chaining, we modeled a primed context as the follow-up sensation and action
of a real context. By backpropagating the primed context, a real context was able to predict
future contexts, which enabled the agent to react correctly even when some contexts were
missing. The learning strategy integrated supervised learning and reinforcement learning. To
handle the “abstraction” issue in real sensory inputs, a multi-level architecture was used,
with the higher level loosely emulating the function of higher-order cortex in biology.
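The idea of copying a primed action back to the preceding real context can be illustrated with a toy table lookup. The following is only a minimal sketch under strong assumptions, not the SAIL architecture: the class ChainingAgent, its methods, and the use of a (composite command, last event) pair as the real context are all hypothetical, and the supervised and reinforcement learning components described above are omitted.

```python
class ChainingAgent:
    """Toy illustration of action chaining (hypothetical, not the SAIL code)."""

    def __init__(self):
        # Basic associations: command -> action it invokes (e.g. "Cs1" -> "As1").
        self.basic = {}
        # Developed associations: (composite command, last event) -> primed action.
        self.primed = {}

    def teach_basic(self, command, action):
        """Learn that a basic command invokes a basic action."""
        self.basic[command] = action

    def teach_chain(self, episode):
        """Learn from one episode such as [Cc, Cs1, As1, Cs2, As2].

        Whenever a known basic command follows some event, the action primed
        by that command is copied back to the preceding context, so the
        command itself is no longer needed at test time.
        """
        composite = episode[0]
        for prev, cur in zip(episode, episode[1:]):
            if cur in self.basic:
                self.primed[(composite, prev)] = self.basic[cur]

    def react(self, command, max_steps=10):
        """Return the sequence of actions triggered by a single command."""
        if command in self.basic:
            return [self.basic[command]]
        actions = []
        last_event = command
        for _ in range(max_steps):  # guard against accidental loops
            action = self.primed.get((command, last_event))
            if action is None:
                break
            actions.append(action)
            last_event = action
        return actions


agent = ChainingAgent()
agent.teach_basic("Cs1", "As1")
agent.teach_basic("Cs2", "As2")
agent.teach_chain(["Cc", "Cs1", "As1", "Cs2", "As2"])
print(agent.react("Cc"))  # prints ['As1', 'As2']
```

After one training episode, the composite command alone produces both basic actions, which mirrors the right-hand side of Eq. (3). Keying the table on the composite command keeps the basic commands unchanged, so "Cs1" still invokes only "As1".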
5. Value system
A value system enables a robot to know what is good and what is bad, and to act for the
good. Without a value system, a robot either does nothing or performs every move
mechanically, and thus lacks intelligence. We have designed and implemented a low-level
value system for the SAIL robot. The value system integrates the habituation mechanism
and reinforcement learning so that the robot’s
responses to certain visual stimuli would change after