The cognitive mapping module is the central part of the system. It is responsible for learning the association between the sensory information, the context, and the behavior. Behaviors can be either external or internal. External behaviors correspond to control signals for external effectors, such as the joint motors of a robot arm or whatever peripherals the robot uses to act on the environment. Internal behaviors include the above-mentioned attention selection signals for the sensory mapping module, the effectors that manipulate the internal states, and the threshold control signals for the gating system. The cognitive mapping is implemented by the IHDR tree, which mathematically computes the mapping
g : S × X → S × A,
where S is the state (context) space, X is the sensory space, and A is the action space. IHDR derives the features most relevant to the output by performing a double clustering in both the input and output spaces. It constructs a tree structure and repeats the double clustering in a coarse-to-fine manner in each tree node. The resulting tree structure is used to find the best-matching input cluster in logarithmic time. Compared with other methods, such as artificial neural networks, linear discriminant analysis, and principal component analysis, IHDR has advantages in handling high-dimensional input, performing discriminant feature selection, and learning from a single instance.
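To make the retrieval step concrete, the following is a minimal sketch, not the authors' implementation, of how a coarse-to-fine tree of input clusters can answer a query in logarithmic time. The node layout, the plain Euclidean distance, and the flat cluster list per node are illustrative assumptions; IHDR additionally derives discriminant feature subspaces at each node, which is omitted here.

    import numpy as np

    class Node:
        """One tree node: a set of input-cluster centers, each leading
        to either a child node or a stored output (prototype action)."""
        def __init__(self, centers, children=None, outputs=None):
            self.centers = np.asarray(centers)   # (b, d) cluster centers
            self.children = children             # list of Node, or None at a leaf
            self.outputs = outputs               # list of outputs at a leaf

    def retrieve(node, x):
        """Descend coarse-to-fine: at each node, follow the nearest
        input cluster.  Depth is O(log N) for N fine prototypes."""
        while True:
            i = np.argmin(np.linalg.norm(node.centers - x, axis=1))
            if node.children is None:            # reached a leaf
                return node.outputs[i]           # best-matching prototype's output
            node = node.children[i]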
The gating system evaluates whether the intended action has accumulated sufficient thrust to be issued as an actual action. In this way, an action is actually executed only when a sufficient number of action primitives have been produced over time by the cognitive mapping module. This mechanism significantly reduces the required timing accuracy of the issued action primitives.
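This mechanism can be pictured as a leaky accumulator per action. The sketch below is an illustrative reading of the description above, not the robot's actual gating code; the threshold and decay rate are assumed parameters.

    class Gate:
        """Accumulate 'thrust' per candidate action; issue an action
        only after enough consistent primitives arrive over time."""
        def __init__(self, n_actions, threshold=3.0, decay=0.8):
            self.thrust = [0.0] * n_actions
            self.threshold = threshold   # assumed value, for illustration
            self.decay = decay           # leak factor applied each time step

        def step(self, primitive):
            """primitive: index of the action primitive proposed this step.
            Returns the action index to issue, or None to stay silent."""
            self.thrust = [t * self.decay for t in self.thrust]
            self.thrust[primitive] += 1.0
            if self.thrust[primitive] >= self.threshold:
                self.thrust = [0.0] * len(self.thrust)  # reset after firing
                return primitive
            return None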
Three types of learning modes have been implemented on SAIL: learning by imitation (supervised learning), reinforcement learning, and communicative learning. In the following sections, we explain how learning is conducted by the SAIL robot and present the corresponding experimental results.
4.2 Staggered Hierarchical Mapping
We have designed and implemented a sensory mapping method, called "Staggered Hierarchical Mapping" (SHM), shown in Fig. 6, and its developmental algorithm (Zhang and Weng, 2002a). Its goals are: (1) to generate feature representations for receptive fields at different positions in the sensory space and with different sizes, and (2) to allow attention selection for local processing. SHM is a model motivated by the early human visual pathway, including the processing performed by the retina, the Lateral Geniculate Nucleus (LGN), and the primary visual cortex. A new Incremental Principal Component Analysis (IPCA) method is used to automatically develop orientation-sensitive and other needed filters (Zhang and Weng, 2001a). From sequentially sensed video frames, the algorithm develops a hierarchy of filters whose outputs are uncorrelated within each layer, with the scale of the receptive fields increasing from lower to higher layers.
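As a rough illustration of how such filters can be developed incrementally from a video stream, here is a minimal sketch of an incremental PCA update for a single filter (eigenvector estimate). It follows the covariance-free style of the cited IPCA work, but the learning-rate schedule and the amnesic parameter l as written here are assumptions for this sketch, not the published algorithm verbatim.

    import numpy as np

    def ipca_update(v, u, n, l=2.0):
        """Update eigenvector estimate v with a new (mean-subtracted)
        sample u.  n: number of samples seen so far; l: amnesic
        parameter that down-weights old samples so that early,
        unreliable estimates fade out."""
        w_old = (n - 1.0 - l) / n          # weight on the current estimate
        w_new = (1.0 + l) / n              # weight on the new sample
        # Project u onto the current direction, then blend.
        return w_old * v + w_new * u * (u @ v) / (np.linalg.norm(v) + 1e-12)

    # Higher-order filters can be obtained by deflation: subtract each
    # sample's component along the filters already learned before
    # updating the next filter, keeping a layer's outputs uncorrelated.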
To study the completeness of the representation generated by SHM, we showed experimentally that the response produced at any layer is sufficient to reconstruct the corresponding "retinal" image to a large degree. This result indicates that the internal representations generated for receptive fields at different locations and sizes are nearly complete, in the sense that they do not lose important information. The attention selection effector is internal and thus cannot be guided from the "outside" by a human teacher. The behaviors for internal effectors can be learned through reinforcement learning and communicative learning.
4.3 Vision-Guided Navigation
In the experiment on vision-guided navigation (Weng et al., 2000a), a human teacher teaches the robot by taking it for a walk along the hallways of the MSU Engineering Building. Force sensors on the robot body sense the teacher's pushing action, and the two drive wheels comply by moving at speeds proportional to the force sensed on each side. In other words, the robot performs supervised learning in real time through imitation.
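The compliance rule described above amounts to a per-side proportional controller. The snippet below is a hypothetical illustration of that rule; the gain, the units, and the sensor interface are invented for the example and do not come from the paper.

    def comply(force_left, force_right, gain=0.05):
        """Map the pushing force sensed on each side of the body to a
        wheel speed on that side (differential-drive 'hand-lead'
        teaching).  gain is a hypothetical constant converting the
        sensed force to a wheel speed."""
        v_left = gain * force_left
        v_right = gain * force_right
        return v_left, v_right   # commanded speeds, also recorded as the
                                 # supervised action for the current image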
The IHDR mapping algorithm processes the input image in real time. It derives features that are related to the action and disregards features that are not; the human teacher does not need to define features. The system runs at about 10 Hz, i.e., 10 updates of the navigation decision per second. In other words, every 100 milliseconds a different set of feature subspaces may be used. To meet this real-time requirement, the IHDR method incrementally constructs a tree architecture that automatically generates and updates the representations in a coarse-to-fine fashion. The real-time speed is achieved through the logarithmic time complexity of the tree: the time required to update the tree for each sensory frame is a logarithmic function of the number of fine clusters (prototypes) in the tree.
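To suggest how per-frame updating can stay logarithmic, here is a companion to the retrieval sketch given earlier, reusing its Node class. The spawning rule, the distance threshold, and the learning rate are illustrative assumptions; the actual IHDR algorithm also re-derives discriminant features and splits nodes as samples accumulate.

    import numpy as np

    def update(node, x, a, max_dist=1.0, lr=0.1):
        """Incorporate one (image x, action a) sample by descending the
        tree (O(log N) hops) and nudging the nearest fine prototype,
        or spawning a new one when nothing is close enough."""
        while node.children is not None:
            i = np.argmin(np.linalg.norm(node.centers - x, axis=1))
            node = node.children[i]
        d = np.linalg.norm(node.centers - x, axis=1)
        i = np.argmin(d)
        if d[i] <= max_dist:                      # refine an existing prototype
            node.centers[i] += lr * (x - node.centers[i])
            node.outputs[i] += lr * (a - node.outputs[i])
        else:                                     # one-instance learning:
            node.centers = np.vstack([node.centers, x])
            node.outputs.append(np.array(a, dtype=float))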
After 4 trips along slightly different trajectories in the hallways, the human teacher started to let the robot go free, "hand-pushing" it at certain places when necessary, until the robot could reliably navigate along the hallway without any need for "hand-leading." We found that about 10 trips were sufficient for the SAIL robot to navigate along the hallways, using only vision, without using