suitcase), and the rain. For the protecting objects, the user
can edit some of their features, such as the size and rotation.
The program starts by showing an almost full-screen
window with eleven buttons and displays a man with his
right hand up. This man is rotated 60 degrees around his Y-
axis. The user can then display/hide an object and edit its
features. Once all the attributes are ready, the user can click
on the “NNAnswer” button to ask the NN module to
provide the ratings for the four prepositions (Figure 3).
Figure 3: Interface of the VR system. The user can choose the
protecting object to display and edit its features. After the NN
processes the scene, the ratings for the four spatial prepositions
are shown in the bottom right corner of the interface.
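As a rough illustration of this interaction loop, the sketch below shows how an “NNAnswer” button could be wired up in Java Swing. The class, the stub methods and the placeholder values are hypothetical and only stand in for the system’s actual code, which collected the edited scene attributes and forwarded them to the NN module.

    import javax.swing.*;
    import java.awt.event.*;

    // Hypothetical sketch of the "NNAnswer" button wiring; the feature vector,
    // the NN bridge and the example values are illustrative, not the real code.
    public class NNAnswerButtonSketch {

        // Stub: gather the current scene attributes (object shown, size, rotation, ...).
        static double[] collectSceneFeatures() {
            return new double[] { 1.0, 0.5, 60.0 };   // placeholder values
        }

        // Stub: forward the features to the NN module and receive one rating
        // per spatial preposition.
        static double[] askNeuralNetwork(double[] features) {
            return new double[] { 0.8, 0.7, 0.2, 0.1 };   // placeholder ratings
        }

        public static void main(String[] args) {
            JFrame frame = new JFrame("VR interface sketch");
            final JLabel ratingsLabel = new JLabel("Ratings: -");
            JButton nnAnswer = new JButton("NNAnswer");

            nnAnswer.addActionListener(new ActionListener() {
                public void actionPerformed(ActionEvent e) {
                    double[] ratings = askNeuralNetwork(collectSceneFeatures());
                    ratingsLabel.setText("Ratings: " + java.util.Arrays.toString(ratings));
                }
            });

            frame.getContentPane().add(nnAnswer, java.awt.BorderLayout.CENTER);
            frame.getContentPane().add(ratingsLabel, java.awt.BorderLayout.SOUTH);
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.pack();
            frame.setVisible(true);
        }
    }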
This VR module was developed in Java, using Borland’s
JBuilder IDE and the Java3D library. Through the Java3D
API it is possible to create simple virtual reality worlds. The
Java program also controlled the communication with the
NN module running in MATLAB.
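The scene-graph style of the Java3D API can be illustrated with a minimal sketch such as the one below, in which a stand-in shape is rotated 60 degrees around the Y axis, much as the man figure is in the interface. The class name and the placeholder geometry are assumptions; the actual man model, the interface buttons and the MATLAB link are omitted.

    import javax.media.j3d.BranchGroup;
    import javax.media.j3d.Transform3D;
    import javax.media.j3d.TransformGroup;
    import com.sun.j3d.utils.geometry.ColorCube;
    import com.sun.j3d.utils.universe.SimpleUniverse;

    // Minimal Java3D sketch: a placeholder shape rotated 60 degrees about Y,
    // analogous to the man figure shown by the VR interface.
    public class RotatedFigureSketch {
        public static void main(String[] args) {
            Transform3D rotation = new Transform3D();
            rotation.rotY(Math.toRadians(60));                  // 60-degree rotation around Y

            TransformGroup figure = new TransformGroup(rotation);
            figure.setCapability(TransformGroup.ALLOW_TRANSFORM_WRITE);  // allow later edits
            figure.addChild(new ColorCube(0.3));                // placeholder for the man geometry

            BranchGroup scene = new BranchGroup();
            scene.addChild(figure);
            scene.compile();

            // In the real application the Canvas3D would be embedded in the Swing window.
            SimpleUniverse universe = new SimpleUniverse();
            universe.getViewingPlatform().setNominalViewingTransform();
            universe.addBranchGraph(scene);
        }
    }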
3 Results
3.1 Training and generalization
The training task was relatively easy to learn for a
multilayer perceptron, mainly due to the limited size of the
training set (71 training stimuli). The final error for all the
different architectures corresponded to an average SSE of
0.05. The networks were also able to generalize well to the
stimulus held out from the training set. The average
generalization error for all architectures was 0.04. Table 1
reports the detailed average errors for each architecture. The
results are similar in the three conditions, with a tendency
for the feature-based object encoding network to reach a
lower training error.
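For reference, the SSE used here is the sum of squared differences between the network’s preposition ratings and the corresponding target ratings; the short sketch below, with made-up example values, illustrates the computation for a single scene.

    // Illustrative SSE computation over the four preposition ratings of one
    // scene; the target and output values are invented for the example.
    public class SseSketch {
        static double sse(double[] target, double[] output) {
            double sum = 0.0;
            for (int i = 0; i < target.length; i++) {
                double diff = target[i] - output[i];
                sum += diff * diff;
            }
            return sum;
        }

        public static void main(String[] args) {
            double[] target = { 0.9, 0.8, 0.1, 0.1 };   // example target ratings
            double[] output = { 0.8, 0.7, 0.2, 0.1 };   // example network ratings
            System.out.println("SSE = " + sse(target, output));  // about 0.03
        }
    }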
The whole VR and NN system was also successfully
tested. After manipulating the properties of objects in the
VR interface, the network produced the correct ratings for
each preposition, which were passed back to the VR
interface and shown to the user.
Table 1: Average training and generalization errors for the three
network architectures.
SSE error      | Net A: Localist experiment encoding | Net B: Localist object encoding | Net C: Feature-based object encoding
Training       | 0.051                               | 0.055                           | 0.046
Generalization | 0.041                               | 0.046                           | 0.044
3.2 Analysis of Internal Representations
To understand the way geometrical and extra-geometrical
factors are processed by the networks, a cluster analysis of
the hidden unit activations was performed. This informs us
about the major criteria used by the networks to perform the
spatial language task. A greater distance between clusters
indicates which variables are used earliest to process (i.e.
separate) the stimuli and experimental conditions.
For each of the three network architectures, we chose the
five of the ten replications with the best learning
performance. The connection weights of the fifteen selected
networks after 10,000 training epochs were used to compute
the hidden unit activations. The activation values of the five
hidden units for each of the 72 input scenes were saved and
used to perform a cluster analysis. Subsequently, we studied the
cluster diagrams to identify the order in which some
functional and/or geometrical factors are used to separate
clusters of experimental scenes. Although there was
variability between the five cluster analyses of each
architecture, it was possible to identify some common
clustering strategies for each condition.
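To make the procedure concrete, the sketch below shows one way the hidden unit activations and the between-scene distance matrix underlying such a cluster analysis could be computed. The sigmoid activation function, the weight layout (five hidden units with the bias stored in the last column) and all names are assumptions for illustration, not the actual analysis code.

    // Sketch: hidden-unit activations for each scene and the Euclidean
    // distance matrix that a hierarchical cluster analysis would operate on.
    public class HiddenActivationSketch {

        static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

        // Hidden activations for one scene: weights[h] holds the input weights
        // of hidden unit h, with the bias as the final element (assumed layout).
        static double[] hiddenActivations(double[][] weights, double[] scene) {
            double[] act = new double[weights.length];
            for (int h = 0; h < weights.length; h++) {
                double net = weights[h][scene.length];          // bias term
                for (int i = 0; i < scene.length; i++) {
                    net += weights[h][i] * scene[i];
                }
                act[h] = sigmoid(net);
            }
            return act;
        }

        // Pairwise Euclidean distances between the scenes' hidden representations.
        static double[][] distanceMatrix(double[][] activations) {
            int n = activations.length;
            double[][] d = new double[n][n];
            for (int a = 0; a < n; a++) {
                for (int b = a + 1; b < n; b++) {
                    double sum = 0.0;
                    for (int k = 0; k < activations[a].length; k++) {
                        double diff = activations[a][k] - activations[b][k];
                        sum += diff * diff;
                    }
                    d[a][b] = d[b][a] = Math.sqrt(sum);
                }
            }
            return d;
        }
    }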
Diagrams of network A
With the experiment encoding architecture, three of the
diagrams share common and consistent clustering criteria.
In these networks, clusters are created
early according to a geometrical factor, i.e. the
Orientation variable. The first divisions group input
scenes according to the degree of rotation (0, 45, 90) of the
protecting object. The second consistent clustering criterion
groups scenes according to the type of objects falling on the
man (e.g. rain or spray). In the fourth diagram, the early
clustering criteria are a mix of the Function fulfillment and
the Orientation variables. The fifth diagram does not
have an identifiable clustering criterion.
Diagrams of network B
In the five diagrams for the architecture with localist object
encoding, the early divisions into clusters are determined by
the Orientation variable and by the type of the falling object.
There is no clear and consistent prioritization of these two
factors.
Diagrams of network C
The condition with feature-based encoding of objects has
four diagrams that share the same clustering criteria. The