Novelty and Reinforcement Learning in the Value
System of Developmental Robots
Xiao Huang and John Weng
Computer Science and Engineering Department
Michigan State University
East Lansing, MI, 48824
Abstract
The value system of a developmental robot
signals the occurrence of salient sensory in-
puts, modulates the mapping from sensory in-
puts to action outputs, and evaluates candi-
date actions. In the work reported here, a
low-level value system is modeled and imple-
mented. It simulates the non-associative an-
imal learning mechanism known as the habit-
uation effect. Reinforcement learning is also
integrated with novelty. Experimental results
show that the proposed value system works
as designed in a study of robot viewing-angle
selection.
1. Introduction
Motivated by studies of developmental psychology
and neuroscience (Piaget, 1952) (Flavell et al., 1993)
(Sur et al., 1999), computational studies of au-
tonomous mental development have drawn increasing
attention (Weng et al., 2000) (Almassy et al., 1998)
(Ogmen, 1997). Under the developmental paradigm
for robots, a task-nonspecific developmental program
is designed by a human programmer. The robot devel-
ops its mental skills through real-time, online inter-
actions with the environment. An important part of
a developmental program is its value system.
Neuroscience studies have shown that the value
system has the basic function of the multi-
ple diffuse ascending systems of the vertebrate
brain (Montague et al., 1996) (Sporns, 2000). The
detailed mechanisms of the value system and
its development are mostly unknown, although
some characterizations of this system are avail-
able (Schultz, 2000). Generally, value systems are
distributed in the brain. They respond to sensory
stimuli, modulate neural activity, and project the ef-
fect to wide areas of the brain.
Value-dependent learning has been successfully
applied to modeling the sensory maps in the barn
owl's inferior colliculus (Rucci et al., 1997). Sporns
and colleagues (Sporns et al., 2000) proposed a
value system based on this learning mechanism
to model robots' adaptive behavior. Their work
shows that a robot's value system can modulate
its own responses in the context of various con-
ditioning tasks. Although reinforcement learning
for robots is not new and has been widely stud-
ied (Watkins, 1992) (Sutton and Barto, 1998), stud-
ies on integrated value systems in robots are still few.
Ogmen's work (Ogmen, 1997) is very similar to our
study. His framework is based on ART (Adaptive
Resonance Theory), which considers novelty, rein-
forcement, and habit. However, only a simple sim-
ulation experiment is reported, so whether the model
can be used in real time and in complex environments
is unknown.
In this paper, we report the development of a
robotic value system that integrates novelty and re-
inforcement learning. The novelty component models
the habituation effect in animal learning. It is known
that animals respond differently to stimuli of differ-
ent novelty. Human babies get bored by constant
stimuli, which is displayed as a reduction in fixation
time (Kaplan et al., 1990). Infants pay longer atten-
tion to novel stimuli. However, this does not mean
that novelty is always preferred (Zeaman, 1976).
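As an illustration of the habituation effect described above (a minimal sketch, not the model developed in this paper), the following Python fragment lowers a stimulus's response strength under repeated exposure and lets unattended stimuli recover over time. The decay rate `tau` and recovery rate `rho` are assumed parameters chosen for illustration:

```python
class Habituation:
    """Toy habituation model: repeated stimuli evoke weaker responses."""

    def __init__(self, tau=0.5, rho=0.1):
        self.tau = tau      # fraction of response lost per exposure
        self.rho = rho      # spontaneous recovery per time step
        self.strength = {}  # response strength per stimulus id

    def respond(self, stimulus_id):
        # All previously seen stimuli recover slightly toward 1.0.
        for k in self.strength:
            self.strength[k] = min(1.0, self.strength[k] + self.rho)
        # Current response to the presented stimulus (1.0 if novel).
        s = self.strength.get(stimulus_id, 1.0)
        # Habituate: repeated exposure lowers future responses.
        self.strength[stimulus_id] = s * (1.0 - self.tau)
        return s

h = Habituation()
r1 = h.respond("tone")   # first exposure: full response (1.0)
r2 = h.respond("tone")   # repeated exposure: reduced response
r3 = h.respond("light")  # novel stimulus: full response again
```

The decreasing response to the repeated "tone" mirrors the reduced fixation time observed in infants, while the undiminished response to the novel "light" reflects the preference for novel stimuli.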
We propose a computational model of a low-level
value system that integrates novelty and other re-
wards. We present the working of this value system
through simulation and real-time testing on our SAIL
(short for Self-organizing, Autonomous, Incremental
Learner) robot. The work reported here does not
model high-level mechanisms such as stress.
2. System architecture
The basic architecture implemented for the SAIL
robot is shown in Fig. 1. The sensory input can be
visual, auditory, or tactile. These inputs are rep-
resented by a high-dimensional vector in which each
component corresponds to a scale-normalized recep-
tor (e.g., a pixel). The cognitive mapping module
derives the most discriminating features from input
streams and maps each input vector to the corre-
sponding effector control signal.
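As a purely illustrative sketch (not the mapping engine actually used on SAIL, whose feature derivation is more sophisticated), such a sensory-to-effector mapping can be approximated by nearest-neighbor lookup over stored input/control pairs; the stimulus vectors and control labels below are invented for the example:

```python
import math

class CognitiveMapping:
    """Toy sensory-to-effector mapping via nearest-neighbor lookup."""

    def __init__(self):
        self.pairs = []  # list of (sensory input vector, control signal)

    def learn(self, x, u):
        # Incrementally store an experienced input/control association.
        self.pairs.append((list(x), u))

    def map(self, x):
        # Return the control signal of the nearest stored input
        # (Euclidean distance in the sensory vector space).
        def dist(v):
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, v)))
        return min(self.pairs, key=lambda p: dist(p[0]))[1]

m = CognitiveMapping()
m.learn([0.0, 0.0], "turn_left")
m.learn([1.0, 1.0], "turn_right")
u = m.map([0.9, 1.1])  # closest stored input is [1.0, 1.0]
```

This captures only the input-to-effector association; the role of the value system, developed below, is to modulate which of several candidate actions such a mapping should prefer.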
Mathematically, the cognitive mapping is formu-