Figure 5: Experimental setup for joint attention. (a) The experimental environment. (b) The robot's left camera image (left: the caregiver's face detected by template matching; right: the detected bright colors).
3.2 Learning performance in uncontrolled environments
We verify that the proposed model enables the robot to acquire the ability of joint attention even when multiple objects are placed in the environment. The gating function, i.e., the selection rate of LM ∆θ, is defined as the sigmoid function shown in Figure 6 (a).
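The sigmoid gate can be sketched as follows. This is a minimal illustration, assuming the gate is a sigmoid of learning time t; the gain a and midpoint t0 are hypothetical values chosen for readability, not taken from the paper.

```python
import math

def gate(t, a=1e-4, t0=5e4):
    """Selection rate of the learning module's output LM dtheta.

    A sigmoid of learning time t. The gain a and midpoint t0 are
    hypothetical parameters for illustration only.
    """
    return 1.0 / (1.0 + math.exp(-a * (t - t0)))

# Early in learning the gate is near 0, so the learned output is
# rarely selected; late in learning it approaches 1, so the learned
# sensorimotor coordination dominates.
```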
Figure 6 (b) shows the success rate of joint attention as a function of learning time, where the number of objects is set to 1, 3, 5, or 10. With one object, the robot encounters a correct learning situation at every step. With ten objects, by contrast, the robot experiences a correct learning situation with a probability of only 1/10 at the beginning of learning. However, the robot is expected to increase the proportion of correct situations by utilizing the learning module, which has already acquired the sensorimotor coordination up to that time.
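The chance-level argument above can be stated as a one-line calculation; this sketch only restates the 1/N reasoning, assuming the caregiver's target is equally likely to be any of the N objects at the start of learning.

```python
def correct_situation_prob(num_objects):
    """Chance-level probability that the object the robot happens to
    attend to coincides with the caregiver's target at the beginning
    of learning, assuming a uniform choice over num_objects objects."""
    return 1.0 / num_objects

# With 1 object, every step is a correct learning situation;
# with 10 objects, only 1 step in 10 is, on average.
```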
From Figure 6 (b), we can see that the success rates of joint attention are at chance level at the beginning of learning; however, they reach high performance at the end even though many objects are placed in the environment. Therefore, we can conclude that the proposed model enables the robot to acquire the ability of joint attention without a controlled environment or external task evaluation.
3.3 Incremental learning process
Next, we investigate the learning process of joint attention based on the proposed model. Figure 6 (c) shows the pan angle of the robot's camera head when the robot attends to an object, in which "°" and "×" indicate success and failure of joint attention, respectively; a success means that the object the robot attends to coincides with the object the caregiver attends to. The number of objects is five, and the data are plotted every 50 steps during the learning periods (I) 2-3, (II) 12-13, and (III) 27-28 [×10^4 steps], each of which is highlighted in Figure 6 (b). The pan angle is 0 [deg] when the robot attends to the caregiver, and the robot's view range is ±18 [deg]; in other words, objects within ±18 [deg] are observed in the robot's field of view while it attends to the caregiver. From this result, we can see that the number of successful joint-attention trials increases over learning time and, at the same time, the range of successful camera angles widens beyond ±18 [deg].
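The ±18 [deg] visibility condition used in this analysis can be written as a simple predicate; a minimal sketch, assuming the pan angle is measured from the caregiver direction (0 [deg]) and the view range is symmetric.

```python
def visible_when_facing_caregiver(object_pan_deg, view_range_deg=18.0):
    """True if an object at the given pan angle falls inside the
    robot's field of view while the robot attends to the caregiver
    (pan = 0 deg, view range +/- view_range_deg)."""
    return abs(object_pan_deg) <= view_range_deg

# Objects outside +/-18 deg can only be reached once the robot has
# learned to extrapolate the caregiver's gaze beyond its initial view.
```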
These phenomena in the three stages (I), (II), and (III) can be regarded as equivalent to the infant's developmental stages of joint attention at the 6th, 12th, and 18th months shown in Figure 1. Therefore, we conclude that the proposed model enables the robot to reproduce the developmental process of infant joint attention and consequently could explain how an infant acquires the ability of joint attention.
3.4 Final task performance
Finally, we evaluate the final task performance of the robot after it has learned in an environment containing five objects. Figure 7 shows the change in the robot's camera image as the robot shifts its gaze from the caregiver's face to the object based on the output of the learning module. The caregiver's face image (30 × 25 pixels), enclosed in a rectangle, is the input to the learning module, and the straight line on the face shows the module's output, whose width and height indicate the pan and tilt angles, respectively. The circle and the cross lines show the robot's gazing area and the object's position, respectively. The learning module incrementally generates the motor commands LM1 ∆θ, LM2 ∆θ, and LM3 ∆θ at each step, and the robot consequently attends to the object that the caregiver attends to. From this result, it is confirmed that the proposed model enables the robot to realize joint attention even when the object is far from the caregiver.
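The incremental gaze shift described above can be sketched as an accumulation of the per-step motor commands; the numeric increments below are hypothetical and serve only to illustrate how successive LMi ∆θ outputs carry the gaze from the caregiver to a distant object.

```python
def shift_gaze(start_angle_deg, increments_deg):
    """Accumulate the incremental motor commands (e.g. LM1 dtheta,
    LM2 dtheta, LM3 dtheta) produced at each step and return the
    resulting pan-angle trajectory, starting from start_angle_deg."""
    angle = start_angle_deg
    trajectory = [angle]
    for d in increments_deg:
        angle += d
        trajectory.append(angle)
    return trajectory

# Hypothetical example: three incremental pan commands moving the gaze
# from the caregiver (0 deg) toward an object at 24 deg.
print(shift_gaze(0.0, [12.0, 8.0, 4.0]))  # → [0.0, 12.0, 20.0, 24.0]
```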