Novelty and Reinforcement Learning in the Value System of Developmental Robots



part and not being able to notice novel things in other
views.

3.4 Prototype updating queue

In the batch learning mode of a Q-learning al-
gorithm, back-propagation is applied to all
states. For real-time development, this global iter-
ation method is not applicable, due to the excessive
time required. We must use a local method that
only involves a small number of computations that
go through a local state trajectory. This is why we
designed the prototype updating queue in Fig. 1,
which stores the addresses of formerly visited primi-
tive prototypes. At each time step, after the sensory
input is received, the primed sensation is updated
according to the following expression:

$$\bar{a}^{(n)}(t) = \bar{a}^{(n-1)}(t) + \frac{1+l}{n}\left(a(t+1) - \bar{a}^{(n-1)}(t)\right) \qquad (5)$$

where l is the amnesic parameter and n is the number
of times this prototype has been updated. If l > 1,
the latest input contributes more.
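As a concrete illustration, here is a minimal sketch of the amnesic-average step of Eq. 5 in Python (the function name and array representation are our own, not the paper's):

```python
import numpy as np

def amnesic_update(primed, actual_next, n, l=1.0):
    """One amnesic-average step (Eq. 5).

    primed      -- current primed sensation, shape (d,)
    actual_next -- actual sensation observed at t+1, shape (d,)
    n           -- number of updates this prototype has received
    l           -- amnesic parameter; larger l weights recent inputs more
    """
    rate = (1.0 + l) / n
    return primed + rate * (actual_next - primed)
```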

Thus, not only is the Q value back-propagated; so
is the primed sensation. This back-propagation is
performed recursively from the tail of the queue back
to the head of the queue. After the entire queue is
updated, the current primitive prototype's address
is pushed into the queue and the oldest primitive
prototype at the head is pushed out of the queue.
Because we can limit the length of the prototype queue,
real-time updating becomes possible.
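A minimal sketch of such a bounded queue follows, assuming prototypes expose a mutable `q_value` field; the class name, the TD-style backup rule, and the discount γ of Eq. 3 are our reading, not the authors' code:

```python
from collections import deque

class PrototypeQueue:
    """Bounded queue of recently visited prototypes.

    Updates run from the tail (most recent) back to the head, so each
    time step touches only a fixed number of prototypes -- the property
    that makes real-time updating possible.
    """
    def __init__(self, maxlen=10, gamma=0.9, alpha=0.8):
        self.queue = deque(maxlen=maxlen)  # oldest entry falls off the head
        self.gamma = gamma
        self.alpha = alpha

    def backup(self, reward):
        # Propagate the Q value (and, in the full system, the primed
        # sensation via Eq. 5) recursively from tail to head.
        target = reward
        for proto in reversed(self.queue):
            proto.q_value += self.alpha * (target - proto.q_value)
            target = self.gamma * proto.q_value

    def push(self, proto):
        # deque(maxlen=...) discards the oldest prototype automatically.
        self.queue.append(proto)
```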

3.5 Algorithm of innate value system

The algorithm of the innate value system works in
the following way (a code sketch follows the list):

1. Grab the new sensory input x(t).

2. Query the IHDR tree and get a prototype s(t)
and the related list of primed contexts.

3. If x(t) is significantly different from s(t), it is con-
sidered a new prototype and we update the IHDR
tree by saving x(t). Otherwise, x(t) updates s(t)
through incremental averaging.

4. Use Boltzmann exploration (Eq. 4) to choose
an action based on the Q-value of every primed
action. Execute the action.

5. Calculate novelty with Eq. 1 and integrate it with
the immediate reward r(t + 1).

6. Update the prototype queue with Eq. 3 and Eq. 5.
Go to step 1.
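Putting the steps together, here is a minimal sketch of one pass through this loop. The IHDR interface (`query`/`insert`), the distance threshold, and the caller-supplied action-selection and novelty functions are assumptions for illustration, not the authors' API:

```python
import numpy as np

def innate_value_step(x, ihdr, queue, select_action, novelty, threshold=0.1):
    """One pass through steps 1-6 for the sensory input x(t).

    ihdr          -- object with query(x) -> (prototype, primed_contexts)
                     and insert(x) -> prototype (hypothetical interface)
    queue         -- e.g. the PrototypeQueue sketched above
    select_action -- Boltzmann exploration over primed Q-values (Eq. 4)
    novelty       -- novelty measure of Eq. 1 (primed vs. actual sensation)
    """
    proto, primed_contexts = ihdr.query(x)            # step 2

    if np.linalg.norm(x - proto.center) > threshold:  # step 3
        proto = ihdr.insert(x)                        # save as new prototype
    else:
        proto.update_average(x)                       # incremental averaging

    action = select_action(primed_contexts)           # step 4: Eq. 4
    reward = novelty(proto, x)                        # step 5: Eq. 1, plus any
                                                      # immediate reward r(t+1)
    queue.backup(reward)                              # step 6: Eq. 3 and Eq. 5
    queue.push(proto)
    return action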

Figure 2: The GUI simulator. The arrow indicates the
position and the viewing angle of the robot.

4. Simulations

In order to test the innate value system with ground
truth, a simulation environment was developed. The
simulator GUI is shown in Fig. 2. The big window
shows the viewing environment while the small win-
dow shows the images the robot currently observes.
Several buttons control the position
and viewing angle of the robot. The “Good” and
“Bad” buttons are used to issue rewards. In every
state, the baby robot has three possible actions: stay
at the current viewing angle (action 0), turn the neck
left 30 degrees (action 1), and turn the neck right 30 de-
grees (action 2). The representation of the sensory input
consists of visual images and the absolute viewing angle.
The dimension of the input image is 100 × 100. We as-
sume that the robot cannot look backward and that the
number of absolute viewing angles is 7 (from -3 to 3,
where 0 stands for the center). The parameters are defined as
follows: α = 0.8 and γ = 0.9 in Eq. 3; the initial value
of θ is 10 in Eq. 4.
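For instance, Boltzmann exploration (Eq. 4) with the initial temperature θ = 10 can be sketched as a standard softmax over Q-values (our reading; the function name is ours):

```python
import numpy as np

def boltzmann_select(q_values, theta=10.0):
    """Pick an action index with probability proportional to exp(Q/theta).

    A high theta (e.g. the initial value 10) makes the choice nearly
    uniform; as theta decreases, the highest-Q action dominates.
    """
    q = np.asarray(q_values, dtype=float)
    probs = np.exp((q - q.max()) / theta)   # subtract max for stability
    probs /= probs.sum()
    return np.random.choice(len(q), p=probs)
```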

4.1 Habituation effect

In the first experiment, we let the robot explore by
itself by viewing around. It is reasonable that a pos-
itive initial Q-value (e.g., 1) is assigned to action
0, which assumes that the robot at first simply stares
statically. Only when one view becomes really boring
will it turn its head away. The initial Q-value of the
other actions is 0.
Fig. 3 shows how the Q-value of each action changes
based on novelty in the state whose absolute view
angle is 0. As shown in the left part, action 0
starts with a positive Q-value, which means the
probability of staying at the same viewing angle is
large. After staring for a while, the primed sensa-
tion of action 0 is equal to the actual sensation of the
next step. According to Eq. 1, the novelty value is
then equal to zero, so the Q-value of primed action


