Novelty and Reinforcement Learning in the Value System of Developmental Robots



Provided by Cognitive Sciences ePrint Archive

Xiao Huang and John Weng

Computer Science and Engineering Department

Michigan State University

East Lansing, MI, 48824

Abstract

The value system of a developmental robot signals
the occurrence of salient sensory inputs, modulates
the mapping from sensory inputs to action outputs,
and evaluates candidate actions. In the work reported
here, a low-level value system is modeled and
implemented. It simulates the non-associative animal
learning mechanism known as the habituation effect.
Reinforcement learning is also integrated with
novelty. Experimental results show that the proposed
value system works as designed in a study of robot
viewing angle selection.

1. Introduction

Motivated by studies in developmental psychology
and neuroscience (Piaget, 1952) (Flavell et al., 1993)
(Sur et al., 1999), computational studies of
autonomous mental development have drawn increased
attention (Weng et al., 2000) (Almassy et al., 1998)
(Ogmen, 1997). In the developmental paradigm for
robots, a task-nonspecific developmental program is
designed by a human programmer. The robot develops
its mental skills through real-time, online
interactions with the environment. An important part
of a developmental program is its value system.

Neuroscience studies have shown that the value
system is a basic function of the multiple diffuse
ascending systems of the vertebrate brain
(Montague et al., 1996) (Sporns, 2000). The detailed
mechanisms of the value system and its development
are mostly unknown, although some characterizations
of this system are available (Schultz, 2000).
Generally, value systems are distributed in the
brain. They respond to sensory stimuli, modulate
neural activity, and project the effect to wide areas
of the brain.

Value-dependent learning has been successfully
applied to modeling the sensory maps in the barn
owl's inferior colliculus (Rucci et al., 1997). Sporns
and colleagues (Sporns et al., 2000) proposed a
value system based on this learning mechanism
to model robots' adaptive behavior. Their work
shows that a robot's value system can modulate
its own responses in the context of various
conditioning tasks. Although reinforcement learning
for robots is not new and has been widely studied
(Watkins, 1992) (Sutton and Barto, 1998), studies
on integrated value systems in robots are still few.
Ogmen's work (Ogmen, 1997) is very similar to our
study. His framework, based on ART (Adaptive
Resonance Theory), considers novelty, reinforcement,
and habit. However, only a simple simulation
experiment is reported. Whether the model can be
used in real-time, complex environments is unknown.

In this paper, we report the development of a
robotic value system that integrates novelty and
reinforcement learning. The novelty component models
the habituation effect in animal learning. It is
known that animals respond differently to stimuli of
different novelty. Human babies get bored by constant
stimuli, which is displayed as a reduction in
fixation time (Kaplan et al., 1990). Infants pay
longer attention to novel stimuli. However, this does
not mean that novelty is always preferred
(Zeaman, 1976). We propose a computational model of
a low-level value system that integrates novelty and
other rewards. We present the working of this value
system through simulation and real-time testing on
our SAIL (short for Self-organizing, Autonomous,
Incremental Learner) robot. The work reported here
does not model high-level mechanisms such as stress.

2. System architecture

The basic architecture implemented for the SAIL
robot is shown in Fig. 1. The sensory input can be
visual, auditory, and tactile. These inputs are
represented by a high-dimensional vector in which
each component corresponds to a scale-normalized
receptor (e.g., a pixel). The cognitive mapping
module derives the most discriminating features from
the input streams and maps each input vector to the
corresponding effector control signal.
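
As a rough illustration of the representation described above, the following
Python sketch flattens a small image into one input vector and maps it to an
action. It is not the SAIL implementation: the [0, 1] intensity scaling is
only an assumed reading of "scale-normalized", the nearest-prototype lookup is
a hypothetical stand-in for the cognitive mapping module, and the names
to_input_vector, nearest_prototype_action, and the action labels are invented
for this example.

```python
from math import dist


def to_input_vector(image, max_value=255.0):
    """Flatten a 2-D grid of pixel intensities into one vector.

    Assumption: each receptor value is scaled to [0, 1] by dividing by
    the maximum possible intensity.
    """
    return [pixel / max_value for row in image for pixel in row]


def nearest_prototype_action(x, prototypes):
    """Hypothetical stand-in for the cognitive mapping.

    Returns the action stored with the prototype vector that is closest
    to x in Euclidean distance.
    """
    _, best_action = min(prototypes, key=lambda p: dist(x, p[0]))
    return best_action


# A 2x2 "image" with 8-bit intensities becomes a 4-component vector.
image = [[0, 255], [128, 64]]
x = to_input_vector(image)

# Each prototype pairs a stored input vector with an effector command.
prototypes = [
    ([0.0, 1.0, 0.5, 0.25], "turn_left"),
    ([1.0, 0.0, 0.5, 0.75], "turn_right"),
]
action = nearest_prototype_action(x, prototypes)
print(x, action)
```

Inputs from other modalities could be appended to the same vector in the same
way, provided each component is brought to a comparable scale first.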

Mathematically, the cognitive mapping is formu-


