means of manipulating the behavior of other humans.
When one person makes a speech act, in addition to
communicating a piece of information, that person
may have intentions to change the state of the world
and especially the behavior of those who hear the ut-
terance, in a particular way. “It has started to rain
outside” often really means something like “Close the
window, please”. The speaker expects the hearer to
react in a certain way as a result of hearing the ut-
terance, which then becomes the speaker’s tool for
manipulation of others and of her surroundings.
This pragmatic view of the function of language
is extremely important in trying to explain, or de-
vise, an ability for early language acquisition, be-
cause infants and young children specifically learn to
use speech as a tool. Halliday identifies three main
stages of linguistic development: (I) the child’s initial
closed proto-linguistic system, (II) the transition
toward the adult language, and (III) the learning
of the adult language itself. In the first stage, the
child has a finite number of meanings to convey and
to that effect uses self-generated labels that may or
may not resemble adult words for similar occasions.
Halliday posits six initial functions of a developing
proto-linguistic system that may be expressed in the
first stage:
1. Instrumental - satisfying the child’s needs
2. Regulatory - controlling the behavior of others
3. Interactional - engaging in a social situation
4. Personal - asserting own unique self
5. Heuristic - exploring the environment
6. Imaginative - pretending and playing
These six functions of the child's Phase I proto-
language seem to develop in that sequence and
represent the child's growing cognitive ability and
awareness. They also offer a useful starting point
and timeline for an artificial system intended to
acquire a natural language in a way similar to
human children. The infant's protolanguage during Phase I is
finite and formulaic (see also Wray, 2000, for a
discussion of formulaic systems in the evolution of
language), as will be the first part of Kismet's
language development module (see Table 1).
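As an illustration, a finite, formulaic Phase I protolanguage of the kind described above can be sketched as a closed lookup from self-generated labels to Halliday's pragmatic functions. This is only a minimal sketch: the `ProtoLexicon` class and the labels below are hypothetical, invented here for illustration, not part of Kismet's actual module.

```python
# A minimal sketch of a Phase I protolanguage: a closed, formulaic
# mapping from self-generated labels to Halliday's six functions.
# All labels here are invented for illustration.
HALLIDAY_FUNCTIONS = (
    "instrumental", "regulatory", "interactional",
    "personal", "heuristic", "imaginative",
)

class ProtoLexicon:
    def __init__(self):
        self.entries = {}  # label -> function name

    def add(self, label, function):
        if function not in HALLIDAY_FUNCTIONS:
            raise ValueError(f"unknown function: {function}")
        self.entries[label] = function

    def function_of(self, label):
        # Closed system: labels outside the lexicon carry no meaning yet.
        return self.entries.get(label)

lexicon = ProtoLexicon()
lexicon.add("na", "instrumental")    # e.g. "give me that"
lexicon.add("eh", "interactional")   # e.g. "look at me"
```

The point of the sketch is the closedness: the system can only express the finite set of meanings it already holds, mirroring the finite, formulaic character of Phase I.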
Note that we make no claims of faithfully following
the developmental schedule of human infants, nor
do we achieve a complexity of development that
approaches that of an infant. The preceding
summary shows the main source of inspiration for
our pragmatics-based approach to language acqui-
sition by an artificial creature, designed to exhibit
certain properties of human infants. As can be seen
in the next two sections, learning in the artificial sys-
tem results from similar scenarios of, e.g., frustration
of a goal-directed behavior and an intuitive drive for
vocalization.
4. Kismet’s Protolanguage Module
In order to achieve communication between humans
and a sociable robotic creature, words must be a tool
used by the robot to manipulate its physical and so-
cial world and they must be interpreted by humans
as having such a pragmatic, functional meaning.
Kismet will start with proto-language and
proto-verbal behaviors; the “proto” prefix refers
to this pre-grammatical early stage of development.
4.1 The robotic platform
Kismet is an expressive robotic head, designed to
have a youthful appearance and perceptual and
motor capabilities tuned to human communication
channels. The robot receives visual input from four
color CCD cameras and auditory input from a micro-
phone. It performs motor acts such as vocalizations,
facial expressions, posture changes, as well as gaze
direction and head orientation.
Kismet’s control architectures run on a complex
network of processors in real time (approaching 30
Hz for visual signals, and 8 kHz sample rate with
frame windows of 10 ms for auditory signals), with
minimal latencies (less than 500 ms). Low-level
visual processing and eye/neck motor control are
performed by 12 networked 400 MHz PCs running QNX.
The high-level perceptual system, the motivation
and behavior systems, the motor skill system and
the face motor control run on four Motorola 68332
microprocessors running L, a multi-threaded Lisp de-
veloped in our lab. Expressive speech synthesis and
vocal affect recognition execute on a dual 450 MHz
PC running NT, and the speech recognition system
(ViaVoice) and protolanguage module run on two
500 MHz PCs running Linux.
Although Kismet is not a mobile robot and all of
its computation is performed off-board, it is an
autonomous agent in that it pursues its own agenda,
engaging in specific behaviors tuned to the satisfaction
of its own “goals” and “desires”. The motivation is
provided by a set of internal homeostatic variables
called “drives”, such as the level of engagement with
the environment or the intensity of social play, which
must be maintained within certain normal bounds in
order for Kismet’s system to be at equilibrium.
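A homeostatic drive of this kind can be sketched as a scalar that drifts away from its set point unless relevant stimuli restore it. The `Drive` class, its decay rate, and its bounds below are hypothetical illustrations under that assumption, not parameters taken from Kismet's actual architecture.

```python
class Drive:
    """A minimal sketch of a homeostatic drive: a scalar that drifts
    without stimulation and must stay near its set point (0.0) for
    the system to be at equilibrium."""
    def __init__(self, name, lo=-1.0, hi=1.0, decay=0.05):
        self.name = name
        self.level = 0.0        # 0.0 is the homeostatic set point
        self.lo, self.hi = lo, hi
        self.decay = decay      # drift per tick when unsatisfied

    def tick(self, stimulus=0.0):
        # Stimuli push the drive back toward (or past) its set point;
        # with no stimulus the level drifts steadily downward.
        self.level += stimulus - self.decay
        self.level = max(self.lo, min(self.hi, self.level))

    def in_equilibrium(self, margin=0.5):
        return abs(self.level) <= margin

social = Drive("social-play")
for _ in range(30):     # unattended, the drive drifts out of bounds
    social.tick()
```

Left unattended, `social` leaves equilibrium, which is the condition that would motivate the creature to seek interaction; a satisfying stimulus passed to `tick` would push it back toward the set point.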
“Emotions” constitute another facet of Kismet’s
motivational system. The robot’s emotional state is
modelled, following Ekman (as cited in Breazeal, 2000), as
a point in three-dimensional space, where the axes
represent arousal, valence, and stance. The choice
of emotion depends on simple appraisals of the per-
ceptual stimuli. The robot has a 15 DoF face that
mirrors its internal “emotional” state expressively.
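The three-dimensional affect space can be sketched as follows: simple appraisals of a stimulus place a point in (arousal, valence, stance) space, and the current emotion is the one whose region is nearest that point. The anchor coordinates and the `appraise` mapping below are illustrative assumptions, not Kismet's actual parameters.

```python
import math

# Illustrative anchor points in (arousal, valence, stance) space;
# the coordinates are invented for this sketch, not Kismet's values.
EMOTION_ANCHORS = {
    "joy":    ( 0.6,  0.8,  0.5),   # aroused, positive, approaching
    "sorrow": (-0.5, -0.7,  0.0),
    "anger":  ( 0.8, -0.8,  0.6),
    "fear":   ( 0.7, -0.6, -0.8),   # negative stance = withdrawing
    "calm":   (-0.3,  0.4,  0.2),
}

def appraise(intensity, pleasantness, novelty):
    """Map simple perceptual appraisals onto an affect-space point
    (a hypothetical mapping for illustration)."""
    arousal = intensity
    valence = pleasantness
    stance = 1.0 - novelty          # novel stimuli bias toward withdrawal
    return (arousal, valence, stance)

def current_emotion(point):
    """Pick the emotion whose anchor lies nearest the current state."""
    return min(EMOTION_ANCHORS,
               key=lambda e: math.dist(point, EMOTION_ANCHORS[e]))

state = appraise(0.7, 0.9, 0.3)     # intense, pleasant, familiar
```

In a full system the selected emotion would then drive the 15 DoF face toward the corresponding expression; here the nearest-anchor rule stands in for that appraisal-to-expression pathway.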