is over the man to describe a picture of a man holding an
umbrella were reduced when rain was depicted as falling on
the man even when the umbrella was depicted directly
above the man’s head [5].
1.2 Neural Network Models of Spatial Language
Some computational work has modeled the acquisition and
use of spatial terms using neural networks in a
psychologically and linguistically plausible way. Harris [9]
used neural networks to model the polysemy of the
preposition over, that is, the fact that the term appears to
have many different senses, such as "being above", "up",
"across", etc. Harris's model used feedforward neural
networks trained through backpropagation to learn to
associate the correct meaning of
over with different sentences. All input sentences contained
the term over to relate the position of a figure object with
respect to a ground object. Once the network has learned the
correct mapping of the meanings of over, the activity of
some of the hidden units self-organizes so that these units
become sensitive to certain features of the object set used
in the training sentences. There are units whose activation
distinguishes between objects which are or are not normally
in contact with a surface, and other units that are sensitive
to the size and shape of the objects.
The model introduces the problem of polysemy and
openness of the meaning of some spatial terms [9]. It shows
how object-knowledge effects in spatial language can emerge
in self-organizing systems such as neural networks.
However, this work lacks any reference to the role of
geometrical features in the learning and use of spatial
prepositions. Because the input is encoded in purely
linguistic terms, the model cannot process geometrical
relations between objects. The neural network model is
therefore subject to the symbol grounding problem for
cognitively plausible models [8].
Terry Regier [12] has proposed a computational model
for spatial prepositions using a method called "constrained
connectionism" [6]. The model is trained on the use of
various spatial prepositions for static (e.g. over and above)
and moving (e.g. through) objects, and makes explicit use
of the processing of geometrical information. The model
consists of a complex neural network in which the units'
layers and connection patterns are structured according to
neuropsychological and cognitive evidence; only a few
units are based on unstructured parallel distributed
processing. An image of two objects (ground and figure) is
input to the lower layer of the network. Then the image
goes through several levels of geometrical processing. The
output units, corresponding to spatial prepositions, are
activated according to the geometrical position of the figure
object with respect to the central ground. Regier [12] tested
this model for various cognitive and cross-linguistic spatial
language phenomena. For example, the model reproduced
Logan & Sadler's [11] experimental data on spatial
templates for the prepositions over, above, under and
below.
The Regier model, even though it is able to reproduce
many of the experimental and cross-linguistic data on the
use and learning of spatial terms, is limited in that it
relies only on geometrical processing and deals only with
abstract objects. The network uses different geometrical
indices, such as the center of mass between the two
objects, their minimal distance, and the overlap of their
shapes. Although the use of these geometric components
does allow the system to deal with change over time, no
other information, such as the objects' functionality, is
extracted or used.
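The three geometrical indices named above can be illustrated
with a minimal sketch. It is not Regier's implementation: the
text names the indices but not their formulas, so the
definitions below are assumptions. Objects are assumed to be
lists of occupied grid cells (x, y), the center-of-mass index
is taken to be the offset between the two centers of mass,
and all function names are hypothetical.

```python
import math

def center_of_mass(cells):
    """Mean (x, y) position of the grid cells occupied by an object."""
    n = len(cells)
    return (sum(x for x, _ in cells) / n, sum(y for _, y in cells) / n)

def center_of_mass_offset(figure, ground):
    """Vector from the ground's center of mass to the figure's (assumed index)."""
    fx, fy = center_of_mass(figure)
    gx, gy = center_of_mass(ground)
    return (fx - gx, fy - gy)

def minimal_distance(figure, ground):
    """Smallest Euclidean distance between any figure cell and any ground cell."""
    return min(math.dist(p, q) for p in figure for q in ground)

def overlap(figure, ground):
    """Number of grid cells shared by the two shapes."""
    return len(set(figure) & set(ground))

# Toy scene: a one-cell figure placed three rows away from a flat ground.
ground = [(0, 0), (1, 0), (2, 0)]
figure = [(1, 3)]
indices = {
    "offset": center_of_mass_offset(figure, ground),
    "min_dist": minimal_distance(figure, ground),
    "overlap": overlap(figure, ground),
}
```

Indices of this kind describe only where the objects are.
Nothing in them encodes what the objects are for, which is
the limitation discussed above.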
Recently, a new computational model for spatial
language has been proposed by Regier & Carlson [13]. This
model does not use connectionist techniques. It is based
both on attentional factors and on the processing of
geometrical features of abstract objects.
2 Method
A prototype of a hybrid NN and VR system has been
developed. The NN learns to use spatial prepositions in
response to input stimuli describing geometrical and
functional relationships between two objects. The NN
module is integrated with a VR interface, where a user can
directly manipulate geometric and extra-geometric factors.
This system can be used as an experimental tool for spatial
language and for natural language interfacing in VR
environments.
2.1 Neural Network
The NN architecture consists of a multi-layer perceptron.
The input layer receives information about a visual scene
depicting specific spatial configurations of objects. The
output units activate the correct spatial preposition(s)
describing the scene. The network has four output units,
one for each of the prepositions over, above, under and
below. The activation of each unit corresponds to the level
of agreement for the use of a specific term. After training,
the activation must correspond to the subjective ratings
collected in experimental studies. The hidden layer contains
five units, a number sufficient for the network to learn the
training data. The number of input units varies according to
the explicit/implicit encoding of some of the properties of
the objects and the scene.
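The architecture described above can be sketched as a small
multi-layer perceptron with five hidden units and four
output units. This is a minimal illustration rather than the
system's actual code, and several details are assumptions
not given in the text: sigmoid units, plain backpropagation
on squared error, the learning rate, a 1 to 7 rating scale
rescaled linearly to [0, 1], and all names (TinyMLP,
rescale_rating). The number of inputs is left as a
parameter, since it varies with the encoding.

```python
import math
import random

PREPOSITIONS = ["over", "above", "under", "below"]

def rescale_rating(rating, low=1, high=7):
    """Map a rating to [0, 1]; the 1..7 range and linear map are assumptions."""
    return (rating - low) / (high - low)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyMLP:
    """n_in inputs -> 5 hidden units -> 4 outputs, one per preposition."""

    def __init__(self, n_in, n_hidden=5, n_out=len(PREPOSITIONS), seed=0):
        rng = random.Random(seed)
        # One weight row per unit; the last entry of each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        """Return hidden and output activations for an input list x."""
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
             for row in self.w1]
        y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0])))
             for row in self.w2]
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One backpropagation update; returns the squared error before it."""
        h, y = self.forward(x)
        # Error signals for sigmoid units under squared error.
        d_out = [(y[k] - target[k]) * y[k] * (1 - y[k])
                 for k in range(len(y))]
        d_hid = [h[j] * (1 - h[j])
                 * sum(d_out[k] * self.w2[k][j] for k in range(len(y)))
                 for j in range(len(h))]
        for k, row in enumerate(self.w2):
            for j, v in enumerate(h + [1.0]):
                row[j] -= lr * d_out[k] * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * d_hid[j] * v
        return sum((y[k] - target[k]) ** 2 for k in range(len(y)))
```

In use, each scene would be encoded as an input list, its
four mean ratings rescaled with rescale_rating, and
train_step called repeatedly over the training scenes. The
output activations can then be read as graded acceptability
values for over, above, under and below.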
The training and testing tasks use the stimuli and data
from an experiment on the role of functional factors in the
rating of the spatial prepositions over/above/under/below
(experiment 2 in [5]). In this study, subjects used a 7-point
Likert scale to rate the use of the four spatial prepositions
for 72 scenes. A scene always depicted a man
holding/wearing an object (e.g. umbrella, visor) to protect
himself from another object (e.g., rain, spray). In this
experiment four independent variables were manipulated:
Orientation of the protecting object (3 levels: an umbrella