Berthouze, L., Kaplan, F., Kozima, H., Yano, H., Konczak, J., Metta, G., Nadel, J., Sandini, G., Stojanov, G. and Balkenius, C. (Eds.)
Proceedings of the Fifth International Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems
Lund University Cognitive Studies, 123. ISBN 91-974741-4-2
Towards Teaching a Robot to Count Objects
Julien Vitay
Loria / INRIA
Campus Scientifique, B.P. 239
54506 Vandoeuvre-lès-Nancy Cedex, FRANCE
Abstract
We present here an example of incremental
learning between two computational models
dealing with different modalities: a model that
switches spatial visual attention and a model
that learns the ordinal sequence of phonetic
number words. Merging them via a common
reward signal produces a cardinal counting
behaviour that can be implemented on a robot.
1. Context
The constructivist theory of learning (Piaget, 1972;
Vygotsky, 1986) states that cognitive development
proceeds through relatively discrete stages, in which
the infant learns new schemes on the basis of those
acquired in the previous stage. Two transitions
between stages are of particular interest for
the neurobotics community: the acquisition of
sensorimotor schemes from motor reflexes, and the
acquisition of basic language abilities such as
semantics from sensorimotor schemes. For the second
transition in particular, the fairly recent discovery of
so-called “mirror neurons” in premotor area F5 of
the monkey (Rizzolatti et al., 1996), which respond
equally to the execution and the observation of an
action, has led Rizzolatti and Arbib (1998) to explain
the acquisition of language via the abstract
representation of sensorimotor schemes shared by
the learner and his social environment.
Just as this indicates that the semantics of an action
(either performed or recognized) is linked to its
motor preparation, the same seems to hold for
the semantics of an object. The sensorimotor
contingency theory of O’Regan and Noë (2001) states
that seeing is not building an internal representation
of the whole visual information but rather exploring,
via visuomotor schemes (for example saccades),
the behaviourally relevant locations and ignoring the
others. Striking evidence is given by the “change
blindness” experiments, which showed that the
disappearance of a large part of an image can go
entirely unreported by a subject if this part is not
relevant to the understanding of the scene.
This idea of using previously acquired sensorimotor
schemes to learn the semantics of an action or
an object is in our view the major issue in
autonomous robotics: the work of Aude Billard
(Schaal et al., 2003), Luc Steels (Steels, 2003) and
Jun Tani (Sugita and Tani, 2002), for example,
highlights the advantages of that approach over
classical artificial intelligence (based only on explicit
representations).
In this paper, we present an example of incremental
learning of a cognitive ability (counting objects
in a scene) using a previously acquired sensorimotor
skill (switching attention between salient targets). We
will first briefly describe the proposed task and then
present the two different models and their merging.
2. Numbering Objects
Interaction of a robot with its environment needs
non-linear and complex computations to achieve a
successful behaviour. In particular in natural scenes,
targets are not always unique: the task “bring me
three apples” does not specify which apples are to be
brought. In such a task, a robot would first have to
determine whether there are enough apples in the scene:
determining the size of a set is called cardinality. When
performing the task itself, the robot has to know that
the first apple is followed by the second one and then
by the third: determining the position of an item in
a sequence is called ordinality.
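The distinction between the two notions can be made concrete with a small sketch. The scene contents and the helper names `cardinality` and `ordinal_position` are illustrative assumptions and are not part of the models described in this paper.

```python
# Illustrative only: a scene is assumed to be a plain list of object labels.
scene = ["apple", "pear", "apple", "apple", "banana"]


def cardinality(items, label):
    """Size of the set of items carrying the given label."""
    return sum(1 for item in items if item == label)


def ordinal_position(items, label, index):
    """1-based rank, among items with this label, of the item at `index`."""
    if items[index] != label:
        raise ValueError("item at index does not carry the label")
    return sum(1 for item in items[: index + 1] if item == label)


# Cardinality: are there enough apples to satisfy "bring me three apples"?
enough = cardinality(scene, "apple") >= 3

# Ordinality: which apple (first, second, third) is the one at index 3?
rank = ordinal_position(scene, "apple", 3)
```

Cardinality is a property of the whole set, whereas ordinality is a property of one item relative to the order in which the items are visited.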
The relationship between these two aspects of
numbering in developmental psychology is not yet
clear (see Brannon and Van de Walle, 2001, for a
debate). Young infants (< 2 years) seem to have a
cardinal ability limited to 3 or 4 items, called
“subitizing”, but this ability only improves with the
acquisition of verbal counting (with counting rhymes,
for example) at the age of 3.5 or 4 years. It is only when
they master the verbal sequence “one two three four
five...” that they are able to tell that four objects are
fewer than five. In other words, they have to establish
the correspondence between the word “four” in
the verbal sequence and the four objects in front of
them, which is a cross-modal task (between
phonological inputs and visual properties).
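One way to picture this correspondence is as a pairing of each attended object with the next word of the verbal sequence, the last word spoken giving the size of the set. The sketch below is only an illustration of that reading: the list `NUMBER_WORDS` and the functions `count_aloud` and `is_less` are hypothetical names, and the code is not the neural model presented in this paper.

```python
# Hypothetical verbal sequence, assumed to be already learned in order.
NUMBER_WORDS = ["one", "two", "three", "four", "five"]


def count_aloud(objects):
    """Pair each attended object with the next number word.

    Returns the last word spoken, or None for an empty scene.
    """
    if len(objects) > len(NUMBER_WORDS):
        raise ValueError("more objects than known number words")
    last_word = None
    for word, _obj in zip(NUMBER_WORDS, objects):
        last_word = word  # one word per object, in sequence order
    return last_word


def is_less(word_a, word_b):
    """Compare two quantities through their rank in the verbal sequence."""
    return NUMBER_WORDS.index(word_a) < NUMBER_WORDS.index(word_b)


spoken = count_aloud(["cup", "ball", "pen", "key"])
```

Here the comparison "four is less than five" relies only on the order of the words, which is why mastering the verbal sequence comes first in this reading.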
In this paper we will not consider the subitizing