Measuring Semantic Similarity by Latent Relational Analysis
Peter D. Turney
Institute for Information Technology
National Research Council Canada
M-50 Montreal Road, Ottawa, Ontario, Canada, K1A 0R6
Abstract
This paper introduces Latent Relational Analysis (LRA), a method for measuring semantic similarity. LRA measures similarity in the semantic relations between two pairs of words. When two pairs have a high degree of relational similarity, they are analogous. For example, the pair cat:meow is analogous to the pair dog:bark. There is evidence from cognitive science that relational similarity is fundamental to many cognitive and linguistic tasks (e.g., analogical reasoning). In the Vector Space Model (VSM) approach to measuring relational similarity, the similarity between two pairs is calculated by the cosine of the angle between the vectors that represent the two pairs. The elements in the vectors are based on the frequencies of manually constructed patterns in a large corpus. LRA extends the VSM approach in three ways: (1) patterns are derived automatically from the corpus, (2) Singular Value Decomposition is used to smooth the frequency data, and (3) synonyms are used to reformulate word pairs. This paper describes the LRA algorithm and experimentally compares LRA to VSM on two tasks, answering college-level multiple-choice word analogy questions and classifying semantic relations in noun-modifier expressions. LRA achieves state-of-the-art results, reaching human-level performance on the analogy questions and significantly exceeding VSM performance on both tasks.
1 Introduction
This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, machine translation, and information retrieval.
Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes [Medin et al., 1990]. When two words have a high degree of attributional similarity, we say they are synonymous. When two pairs of words have a high degree of relational similarity, we say they are analogous. For example, the word pair mason:stone is analogous to the pair carpenter:wood; the relation between mason and stone is highly similar to the relation between carpenter and wood.
Past work on semantic similarity measures has mainly been concerned with attributional similarity. For instance, Latent Semantic Analysis (LSA) can measure the degree of similarity between two words, but not between two relations [Landauer and Dumais, 1997].
Recently the Vector Space Model (VSM) of information retrieval has been adapted to the task of measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions [Turney and Littman, 2005]. The VSM approach represents the relation between a pair of words by a vector of frequencies of predefined patterns in a large corpus.
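The VSM similarity computation above can be sketched in a few lines. The pattern set and the frequency counts below are purely hypothetical stand-ins for the corpus statistics the paper describes; only the cosine formula itself is the actual VSM similarity measure.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical corpus frequencies of joining patterns such as
# "X cuts Y", "X works with Y", "X shapes Y" for each word pair.
mason_stone    = [12, 7, 30]
carpenter_wood = [10, 9, 25]
mason_sky      = [0, 1, 2]

# Analogous pairs should yield a larger cosine than unrelated pairs.
print(cosine(mason_stone, carpenter_wood))
print(cosine(mason_stone, mason_sky))
```

In the actual VSM approach the vectors have one element per predefined pattern, and the counts come from querying a large corpus; the cosine then scores how similar the relation of one pair is to the relation of another.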
LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus (they are not predefined), (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data (it is also used this way in LSA), and (3) automatically generated synonyms are used to explore reformulations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying noun-modifier relations, LRA achieves similar gains over the VSM. For both problems, LRA's performance is state-of-the-art.
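The SVD smoothing in extension (2) can be illustrated with a small sketch. The pair-by-pattern matrix below is hypothetical, and the choice of k = 2 latent dimensions is arbitrary for illustration; the actual LRA algorithm builds a much larger matrix from automatically derived patterns.

```python
import numpy as np

# Hypothetical pair-by-pattern frequency matrix: each row is a word
# pair, each column a pattern. Values stand in for corpus counts.
X = np.array([
    [12.0, 7.0, 30.0, 0.0],
    [10.0, 9.0, 25.0, 1.0],
    [ 0.0, 1.0,  2.0, 8.0],
    [ 1.0, 0.0,  3.0, 9.0],
])

# Truncated SVD: keep only the k largest singular values, projecting
# each row into a k-dimensional latent space (as LSA does). This
# smooths the sparse, noisy frequency counts.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_k = U[:, :k] @ np.diag(s[:k])  # smoothed row vectors

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Relational similarity is now the cosine in the latent space.
print(cosine(X_k[0], X_k[1]))  # similar pairs (rows 0 and 1)
print(cosine(X_k[0], X_k[2]))  # dissimilar pairs (rows 0 and 2)
```

The smoothing matters because any two word pairs share only a few patterns verbatim; projecting into the latent space lets pairs that co-occur with different but correlated patterns still come out similar.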
To motivate this research, Section 2 briefly outlines some possible applications for a measure of relational similarity. Related work with the VSM approach to relational similarity is described in Section 3. The LRA algorithm is presented in Section 4. LRA and VSM are experimentally evaluated by their performance on word analogy questions in Section 5 and on classifying semantic relations in noun-modifier expressions in Section 6. We discuss the interpretation of the results, limitations of LRA, and future work in Section 7. The paper concludes in Section 8.
2 Applications of Relational Similarity
Many problems in text processing would be solved (or at
least greatly simplified) if we had a black box that could
take as input two chunks of text and produce as output a
measure of the degree of similarity in the meanings of the
two chunks. We could use it for information retrieval, ques-