2.2.2 Visual feedback controller
The visual feedback controller receives the detected
image feature of the object, itrg, and outputs the motor
command VF∆θ for the camera head to attend
to itrg. First, this controller calculates the object position
(xi, yi) in the camera image. Then, the motor
command VF∆θ is generated as
{}^{VF}\Delta\theta = \begin{bmatrix} \Delta\theta_{pan} \\ \Delta\theta_{tilt} \end{bmatrix} = g \begin{bmatrix} x_i - c_x \\ y_i - c_y \end{bmatrix},    (2)
where g is a scalar gain and (cx, cy) denotes the
center position of the image. The motor command
VF∆θ is sent to the gate as the output of the visual
feedback controller.
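As an informal illustration (not part of the original system), the following Python sketch computes the motor command of Eq. (2); the function name and the gain value are assumptions.

```python
import numpy as np

def visual_feedback_command(x_i, y_i, c_x, c_y, g=0.1):
    """Eq. (2): pan/tilt increments proportional to the offset of the
    object position (x_i, y_i) from the image center (c_x, c_y).
    The gain g = 0.1 is an assumed placeholder value."""
    return g * np.array([x_i - c_x, y_i - c_y])  # VF_dtheta = (d_pan, d_tilt)
```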
As described above, visual attention, which is one of
the robot's embedded mechanisms, is performed by
the salient feature detector and the visual feedback
controller.
2.2.3 Internal evaluator
The other embedded mechanism, learning with
self-evaluation, is realized by the internal evaluator
and the learning module.
The internal evaluator drives the learning mecha-
nism in the learning module when the following con-
dition is met:
\sqrt{(x_i - c_x)^2 + (y_i - c_y)^2} < d_{th},    (3)
where dth is a threshold for evaluating whether the
robot watches an object in the center of the camera
image or not. Note that the internal evaluator does
not know whether joint attention has succeeded or
failed, but only whether visual attention has been achieved.
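A minimal sketch of the self-evaluation condition in Eq. (3), assuming pixel coordinates; the threshold value d_th = 5.0 is an assumption.

```python
import math

def visual_attention_achieved(x_i, y_i, c_x, c_y, d_th=5.0):
    """Eq. (3): True when the object lies within d_th pixels of the image
    center, i.e. visual attention has been achieved; this triggers learning."""
    return math.hypot(x_i - c_x, y_i - c_y) < d_th
```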
2.2.4 Learning module
The learning module consists of a three-layered neu-
ral network. In the forward processing, this mod-
ule receives the image of the caregiver’s face and the
angle of the camera head θ as inputs, and outputs
LM∆θ as a motor command. The caregiver's face
image is required to estimate the motor command
LM∆θ to follow the caregiver’s gaze direction. The
angle of the camera head θ is utilized to move the
camera head incrementally because the caregiver’s
attention cannot be narrowed down to a particular
point along the line of the caregiver’s gaze. The gen-
erated motor command LM∆θ is sent to the gate as
the output of the learning module.
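The paper does not give the network dimensions, so the sketch below assumes a flattened face-image vector, a two-dimensional head angle θ, and a two-dimensional output LM∆θ; the hidden-layer size, the tanh activation, and the class name are placeholders.

```python
import numpy as np

class LearningModule:
    """Sketch of the three-layered network: (face image, theta) -> LM_dtheta.
    All layer sizes and the tanh activation are assumed details."""

    def __init__(self, n_image=64, n_hidden=20, seed=0):
        rng = np.random.default_rng(seed)
        n_in = n_image + 2                       # face-image features + (pan, tilt)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W2 = rng.normal(0.0, 0.1, (2, n_hidden))

    def forward(self, face_image, theta):
        self.x = np.concatenate([np.ravel(face_image), theta])
        self.h = np.tanh(self.W1 @ self.x)       # hidden-layer activation
        return self.W2 @ self.h                  # LM_dtheta = (d_pan, d_tilt)
```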
In the learning process, this module learns sen-
sorimotor coordination by back propagation when
it is triggered by the internal evaluator. As mentioned
above, the internal evaluator drives the learning
module according to the success of visual attention,
not joint attention; therefore, this module receives both correct and
incorrect learning data for joint attention. In the
former case, the learning module can acquire the ap-
propriate correlation between the inputs, the care-
giver’s face image and θ, and the output ∆θ. On
the other hand, in the latter case, this module can-
not find the appropriate sensorimotor coordination.
However, the learning module is expected to statistically
discard the incorrect data as outliers, as described
in 2.1, while the sensorimotor coordination
learned from the correct data survives in the learning module.
As a result, the surviving correlation in the learning
module allows the robot to realize joint attention.
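Continuing the hypothetical LearningModule sketch above, a single backpropagation step could use the executed motor command as the training target whenever the internal evaluator fires; the squared-error loss and learning rate are assumptions.

```python
    def backprop(self, target_dtheta, lr=0.01):
        """One learning step, called only when the internal evaluator signals
        that visual attention has been achieved (Eq. (3))."""
        err = self.W2 @ self.h - target_dtheta           # output error
        grad_W2 = np.outer(err, self.h)
        delta_h = (self.W2.T @ err) * (1.0 - self.h**2)  # backprop through tanh
        grad_W1 = np.outer(delta_h, self.x)
        self.W2 -= lr * grad_W2
        self.W1 -= lr * grad_W1
```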
2.2.5 Gate
The gate arbitrates the motor command ∆θ be-
tween VF∆θ from the visual feedback controller and
LM∆θ from the learning module. The gate uses a
gating function to define the selection rate of the
two outputs. At the beginning of learning, the selection
rate of VF∆θ is set to a high probability because
the learning module has not yet acquired the appropriate
sensorimotor coordination for joint attention.
On the other hand, in the later stage of learning,
the output LM∆θ from the learning module, which
has acquired the sensorimotor coordination for joint
attention, becomes more likely to be selected. As
a result, the robot can increase the proportion of
correct learning situations as learning progresses,
which allows the learning module to acquire
more appropriate sensorimotor coordination for joint
attention.
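One simple way to realize such a gating function (not necessarily the one used in the original work) is a selection probability for LM∆θ that increases with the learning step; the exponential schedule and the time constant tau are assumptions.

```python
import numpy as np

def gate_select(vf_dtheta, lm_dtheta, step, tau=1000.0):
    """Gate sketch: pick LM_dtheta with probability p_lm, otherwise VF_dtheta.
    p_lm grows from 0 toward 1 as learning proceeds (assumed schedule)."""
    p_lm = 1.0 - np.exp(-step / tau)             # selection rate of LM_dtheta
    return lm_dtheta if np.random.random() < p_lm else vf_dtheta
```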
2.3 Incremental learning
The proposed model is expected to enable the
robot to acquire the ability of joint attention through
the following incremental learning process.
stage I: At the beginning of learning, the robot
tends to attend to an interesting object
in its field of view, based on the embedded
mechanism of visual attention, since the
gate mainly selects VF∆θ. As shown at the top of Figure 4,
the robot outputs VF1∆θ or VF2∆θ, depending on the case,
and watches one object regardless of the
direction of the caregiver's attention. At the same
time, the robot begins to learn the sensorimotor
coordination in each case.
stage II: In the middle stage of learning, the
robot is able to realize joint attention, owing to
the learning in stage I, if the object that the caregiver
attends to is observed in the robot's initial
view. As shown at the middle left of Figure 4, the learning
module has acquired the sensorimotor coordination
of LM1∆θ because only that of VF1∆θ had
the correlation in stage I.