Table 1: Comparison between children's semantic space
and adult semantic spaces
Semantic space | Size (in million words) | Percentage of correct answers | d
Children       |  3.2                    | .53                           | ~69
Literature     | 14.1                    | .38                           | .52
Le Monde 1993  | 19.3                    | .44                           | .23
Le Monde 1995  | 20.6                    | .37                           | .21
Le Monde 1997  | 24.7                    | .40                           | .28
Le Monde 1999  | 24.2                    | .34                           | .25
In accordance with the previous experiment, the children's
semantic space yields the best results, although it is
much smaller. Student's t-tests show that the children's
semantic space differs significantly from the others
(p < .05), except for the percentage of correct answers when
compared with the Le Monde 1993 corpus (p < .1).
Experiment 2
This second experiment is based on verbal association
norms published by de La Haye (2003). Two hundred
inducing words (144 nouns, 28 verbs and 28 adjectives)
were presented to 9- to 11-year-old children. For each word,
participants had to provide the first word that came to their
mind. This resulted in a list of words, ranked by frequency.
For instance, given the word cartable (satchel), results are
the following for 9-year-old children:
- ecole (school): 51%
- sac (bag): 12%
- affaires (stuff): 6%
...
- classe (class): 1%
- sacoche (satchel): 1%
- vieux (old): 1%
This means that 51% of the children answered with the word
ecole (school) when given the word cartable (satchel). The
two words are therefore strongly associated for 9-year-old
children. These association values were compared with the
LSA cosine between word vectors: we selected the three
best-ranked words as well as the three worst-ranked (as in
the previous example). We then measured the cosines
between the inducing word and the best-ranked, the 2nd
best-ranked and the 3rd best-ranked words, and the mean cosine
between the inducing word and the three worst-ranked words.
Results are presented in Table 2.
Table 2: Mean cosine between inducing word and various
associated words for 9-year-old children
Words                 | Mean cosine with inducing word
Best-ranked words     | .26
2nd best-ranked words | .23
3rd best-ranked words | .19
3 worst-ranked words  | .11
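The measurement summarized in Table 2 can be illustrated with a minimal sketch. It assumes that each word is available as a vector of the semantic space and that the answers to an inducing word are already sorted by decreasing frequency. The function names and the three-dimensional toy vectors are hypothetical and chosen only for illustration; they are not LSA output and not the code used in the study.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for null vectors
    return dot / (norm_u * norm_v)


def association_cosines(cue, ranked_answers, vectors):
    """Cosines for the 3 best-ranked answers and the mean cosine
    for the 3 worst-ranked answers of one inducing word.

    ranked_answers must be sorted by decreasing answer frequency.
    """
    if len(ranked_answers) < 6:
        raise ValueError("need at least six ranked answers")
    cue_vec = vectors[cue]
    best = [cosine(cue_vec, vectors[w]) for w in ranked_answers[:3]]
    worst = [cosine(cue_vec, vectors[w]) for w in ranked_answers[-3:]]
    return best, sum(worst) / len(worst)


# Toy vectors invented for illustration (assumption, not real data).
toy_vectors = {
    "cartable": [0.9, 0.3, 0.1],
    "ecole": [0.8, 0.4, 0.1],
    "sac": [0.7, 0.1, 0.5],
    "affaires": [0.5, 0.2, 0.6],
    "classe": [0.6, 0.7, 0.0],
    "sacoche": [0.4, 0.0, 0.8],
    "vieux": [0.0, 0.2, 0.9],
}
ranked = ["ecole", "sac", "affaires", "classe", "sacoche", "vieux"]
best, worst_mean = association_cosines("cartable", ranked, toy_vectors)
print(best, worst_mean)
```

Averaging the values returned for every inducing word would give one mean per row of the table.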
Student's t-tests show that all differences are significant
(p < .03). This means that our semantic space is not only
able to distinguish between the strong and weak associates,
but can also discriminate the first-ranked from the second-
ranked and the latter from the third-ranked.
The correlation with human data is also significant
(r(1184) = .39, p < .001). However, two factors might have
lowered this result. First, although we tried to mimic what a
child has been exposed to, we could not control all word
frequencies within the corpus. Therefore, some words might
have occurred with a low frequency in the corpus, leading
to an inaccurate semantic representation. When the previous
comparison was performed on the 20% most frequent
words, the correlation was much higher (r(234) = .57,
p < .001).
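As an illustration of how such a correlation can be computed, here is a minimal sketch of a Pearson coefficient between association frequencies and model cosines. It assumes the usual reporting convention r(df) with df = n - 2, under which r(1184) would correspond to 1186 word pairs. The frequencies reuse the cartable example; the cosine values are hypothetical and serve only to make the example runnable.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two samples of equal length (at least 3)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Association frequencies from the cartable example (children's answers).
human = [0.51, 0.12, 0.06, 0.01, 0.01, 0.01]
# Hypothetical cosines, invented for illustration (assumption).
model = [0.40, 0.25, 0.20, 0.10, 0.12, 0.05]

r = pearson_r(human, model)
df = len(human) - 2  # degrees of freedom, as in the r(df) notation
print(f"r({df}) = {r:.2f}")
```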
The second factor is the participant agreement: when
most children provide the same answer to an inducing word,
there is a high agreement, which means that both words are
very strongly associated. However, there are cases where
there is almost no agreement: for instance, the first three
answers to the word bruit (noise) are crier (to shout) (9%),
entendre (to hear) (7%) and silence (silence) (6%). It is not
surprising that the model corresponds better to the children's
data when agreement is high, since this denotes a strong
association that should be reflected in the corpus. In order to
select items whose answers showed higher agreement, we measured
their entropy. The formula is the following:
\[ \mathrm{entropy}(\mathit{item}) = \sum_{\mathit{answer}} \mathrm{freq}(\mathit{answer}) \cdot \log\left(\frac{1}{\mathrm{freq}(\mathit{answer})}\right) \]
A low entropy corresponds to a high agreement and vice
versa. When we selected the 20% of items with the lowest
entropy, the correlation also rose (r(234) = .48, p < .001).
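The entropy-based selection can be sketched as follows, assuming that the entropy of an item is the sum over its answers of freq(answer) times log(1/freq(answer)). The logarithm base is not stated in the text, so the natural logarithm is used here as an assumption. The function names and the toy answer distributions are hypothetical.

```python
import math


def answer_entropy(freqs):
    """Entropy of one item's answer distribution.

    freqs holds the relative frequency of each distinct answer;
    answers with zero frequency contribute nothing.
    """
    return sum(p * math.log(1.0 / p) for p in freqs if p > 0)


def lowest_entropy_items(answer_freqs, fraction=0.2):
    """Return the given fraction of items with the lowest entropy."""
    ranked = sorted(answer_freqs,
                    key=lambda item: answer_entropy(answer_freqs[item]))
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


# Toy answer distributions invented for illustration (assumption).
toy_norms = {
    "item_a": [0.90, 0.05, 0.05],        # high agreement
    "item_b": [0.40, 0.30, 0.30],
    "item_c": [0.25, 0.25, 0.25, 0.25],  # low agreement
    "item_d": [0.70, 0.20, 0.10],
    "item_e": [0.50, 0.50],
}
selected = lowest_entropy_items(toy_norms, fraction=0.2)
print(selected)
```

An item for which every child gives the same answer has an entropy of zero, whereas answers spread evenly over many words give the highest values.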
All these results show that the association degree between
words defined by the cosine measure within the semantic
space seems to correspond quite well to children's
judgement of association.
We also compared these results with the previous adult
semantic spaces. Results are presented in Table 3.
Table 3: Correlations between participant child data and
different kinds of semantic spaces
Semantic space | Size (in million words) | Correlation with children's data
Children       |  3.2                    | .39
Literature     | 14.1                    | .34
Le Monde 1993  | 19.3                    | .31
Le Monde 1995  | 20.6                    | .26
Le Monde 1997  | 24.7                    | .26
Le Monde 1999  | 24.2                    | .24
In spite of their much larger sizes, all adult semantic spaces
correlate less well with the participants' data than the
children's semantic space does. Statistical tests show
that all differences between the child model and the other
semantic spaces are significant (p < .03).
Experiment 3
The third experiment is based on recall or summary tasks.
Children were asked to read a text and write out as much as
they could recall, immediately after reading or after a fixed