Measuring Semantic Similarity by Latent Relational Analysis



tion answering, machine translation using parallel corpora, information extraction, word sense disambiguation, text summarization, measuring lexical cohesion, identifying sentiment and affect in text, and many other tasks in natural language processing. This is the vision that motivates research in paraphrasing [Barzilay and McKeown, 2001] and textual entailment [Dagan and Glickman, 2004], two topics that have lately attracted much interest.

In the absence of such a black box, current approaches to these problems typically use measures of attributional similarity. For example, the standard bag-of-words approach to information retrieval is based on attributional similarity [Salton and McGill, 1983]. Given a query, a search engine produces a ranked list of documents, where the rank of a document depends on the attributional similarity of the document to the query. The attributes are based on word frequencies; relations between words are ignored.
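To make the limitation concrete, here is a minimal sketch of bag-of-words ranking. It is an illustration only, not a description of any particular search engine: we assume whitespace tokenization and raw term counts (no stemming, stop-word removal, or tf-idf weighting), and the function names are our own. Because each text is reduced to word frequencies, “food for the plant” and “the plant for food” receive identical vectors, so the relation between the words is invisible to this measure.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Reduce a text to word-frequency attributes; word order and relations are lost."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-frequency vectors."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query: str, documents: list[str]) -> list[tuple[float, str]]:
    """Rank documents by attributional (bag-of-words) similarity to the query."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Same words, different relation: the bag-of-words vectors are identical.
same = cosine(bag_of_words("food for the plant"), bag_of_words("the plant for food"))  # 1.0
```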

Although attributional similarity measures are very useful, we believe that they are limited and should be supplemented by relational similarity measures. Cognitive psychologists have also argued that human similarity judgements involve both attributional and relational similarity [Medin et al., 1990].

Consider word sense disambiguation, for example. In isolation, the word “plant” could refer to an industrial plant or a living organism. Suppose the word “plant” appears in some text near the word “food”. A typical approach to disambiguating “plant” would compare the attributional similarity of “food” and “industrial plant” to the attributional similarity of “food” and “living organism” [Lesk, 1986; Banerjee and Pedersen, 2003]. In this case, the decision may not be clear, since industrial plants often produce food and living organisms often serve as food. It would be very helpful to know the relation between “food” and “plant” in this example. In the text “food for the plant”, the relation between food and plant strongly suggests that the plant is a living organism, since industrial plants do not need food. In the text “food at the plant”, the relation strongly suggests that the plant is an industrial plant, since living organisms are not usually considered as locations.

A measure of relational similarity could potentially improve the performance of any text processing application that currently uses a measure of attributional similarity. We believe relational similarity is the next step, after attributional similarity, towards the black box envisioned above.

3 Related Work

Let R1 be the semantic relation between a pair of words, A and B, and let R2 be the semantic relation between another pair, C and D. We wish to measure the relational similarity between R1 and R2. The relations R1 and R2 are not given to us; our task is to infer these hidden (latent) relations and then compare them.

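In the VSM approach of Turney and Littman [2005], we create vectors, r1 and r2, that represent features of R1 and R2, and measure the similarity of R1 and R2 by the cosine of the angle θ between $\mathbf{r}_1 = \langle r_{1,1}, \ldots, r_{1,n} \rangle$ and $\mathbf{r}_2 = \langle r_{2,1}, \ldots, r_{2,n} \rangle$: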

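$$
\mathrm{cosine}(\theta) = \frac{\sum_{i=1}^{n} r_{1,i} \cdot r_{2,i}}{\sqrt{\sum_{i=1}^{n} (r_{1,i})^2 \cdot \sum_{i=1}^{n} (r_{2,i})^2}} = \frac{\mathbf{r}_1 \cdot \mathbf{r}_2}{\|\mathbf{r}_1\| \cdot \|\mathbf{r}_2\|}
$$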
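The cosine formula translates directly into code. This is a minimal sketch, assuming the vectors are plain Python lists of equal length n; returning 0 when either vector is all zeros is our own convention, since the formula is undefined in that case.

```python
import math


def cosine(r1: list[float], r2: list[float]) -> float:
    """cosine(theta) = sum_i r1[i] * r2[i] / (||r1|| * ||r2||)."""
    if len(r1) != len(r2):
        raise ValueError("vectors must have the same length n")
    dot = sum(a * b for a, b in zip(r1, r2))
    norm1 = math.sqrt(sum(a * a for a in r1))
    norm2 = math.sqrt(sum(b * b for b in r2))
    if norm1 == 0.0 or norm2 == 0.0:
        # Assumed convention: a zero vector has no direction, so report 0.
        return 0.0
    return dot / (norm1 * norm2)
```

Parallel vectors score 1 regardless of length (for example, [1, 2, 3] and [2, 4, 6]), and orthogonal vectors score 0, which is why the cosine compares the pattern of features rather than their absolute magnitude.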


We make a vector, r, to characterize the relationship between two words, X and Y, by counting the frequencies of various short phrases containing X and Y. Turney and Littman [2005] use a list of 64 joining terms, such as “of”, “for”, and “to”, to form 128 phrases that contain X and Y, such as “X of Y”, “Y of X”, “X for Y”, “Y for X”, “X to Y”, and “Y to X”. These phrases are then used as queries for a search engine and the number of hits (matching documents) is recorded for each query. This process yields a vector of 128 numbers. If the number of hits for a query is x, then the corresponding element in the vector r is log(x + 1).
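The vector construction can be sketched as follows. Several assumptions are ours: only three joining terms are used (the text names just “of”, “for”, and “to” out of 64), `hits` is a hypothetical callback standing in for a search-engine query (no real API is implied, and any exact-phrase quoting is left to it), and `math.log` is the natural logarithm, since the text does not state a base.

```python
import math
from typing import Callable, Sequence

# Hypothetical stand-in for the full list of 64 joining terms.
JOINING_TERMS = ("of", "for", "to")


def make_phrases(x: str, y: str, terms: Sequence[str]) -> list[str]:
    """Form two phrases per joining term: 'X term Y' and 'Y term X'."""
    phrases = []
    for term in terms:
        phrases.append(f"{x} {term} {y}")
        phrases.append(f"{y} {term} {x}")
    return phrases


def relation_vector(x: str, y: str, hits: Callable[[str], int],
                    terms: Sequence[str] = JOINING_TERMS) -> list[float]:
    """One element per phrase: log(hits + 1), so a phrase with no hits maps to 0.0.

    `hits` is a hypothetical callback returning the number of matching
    documents for a phrase; math.log (natural log) is an assumed base.
    """
    return [math.log(hits(phrase) + 1) for phrase in make_phrases(x, y, terms)]


# Made-up hit counts standing in for real search-engine results.
fake_counts = {"food for plant": 99, "plant of food": 3}
vec = relation_vector("food", "plant", lambda phrase: fake_counts.get(phrase, 0))
# With 3 terms vec has 6 elements; with all 64 terms it would have 128.
```

The +1 inside the logarithm keeps phrases with zero hits well defined (they map to 0), while the logarithm dampens the very large differences in raw hit counts between common and rare phrases.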

Turney and Littman [2005] evaluated the VSM approach by its performance on 374 college-level multiple-choice SAT analogy questions, achieving a score of 47%. A SAT analogy question consists of a target word pair, called the stem, and five choice word pairs. To answer an analogy question, vectors are created for the stem pair and each choice pair, and then cosines are calculated for the angles between the stem vector and each choice vector. The best guess is the choice pair with the highest cosine. We use the same set of analogy questions to evaluate LRA in Section 5.
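The answering procedure reduces to an argmax over cosines. In this minimal sketch the choice labels and the three-element vectors are made up for illustration; real vectors would come from phrase hit counts, and ties are broken by taking the first choice listed, which is our own assumption.

```python
import math


def cosine(r1: list[float], r2: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(r1, r2))
    norm = math.sqrt(sum(a * a for a in r1)) * math.sqrt(sum(b * b for b in r2))
    return dot / norm if norm else 0.0


def answer_analogy(stem: list[float], choices: dict[str, list[float]]) -> str:
    """Return the label of the choice pair whose vector has the highest cosine with the stem."""
    # max() keeps the first label on ties, i.e. the earliest listed choice.
    return max(choices, key=lambda label: cosine(stem, choices[label]))


# Toy vectors, invented for illustration only.
stem_vec = [3.0, 0.0, 1.0]
choice_vecs = {
    "a": [0.0, 2.0, 0.0],
    "b": [2.9, 0.1, 1.2],
    "c": [0.0, 0.0, 1.0],
    "d": [1.0, 1.0, 1.0],
    "e": [0.0, 3.0, 3.0],
}
best = answer_analogy(stem_vec, choice_vecs)  # "b"
```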

The best previous performance on the SAT questions was achieved by combining thirteen separate modules [Turney et al., 2003]. The performance of LRA significantly surpasses this combined system, but there is no real contest between these approaches, because we can simply add LRA to the combination, as a fourteenth module. Since the VSM module had the best performance of the thirteen modules [Turney et al., 2003], the following experiments focus on comparing VSM and LRA.

The VSM was also evaluated by its performance as a distance measure in a supervised nearest neighbour classifier for noun-modifier semantic relations [Turney and Littman, 2005]. The problem is to classify a noun-modifier pair, such as “laser printer”, according to the semantic relation between the head noun (printer) and the modifier (laser). The evaluation used 600 noun-modifier pairs that have been manually labeled with 30 classes of semantic relations [Nastase and Szpakowicz, 2003]. For example, “laser printer” is classified as instrument; the printer uses the laser as an instrument for printing. A testing pair is classified by searching for its single nearest neighbour in the labeled training data. The best guess is the label for the training pair with the highest cosine; that is, the training pair that is most analogous to the testing pair, according to VSM. LRA is evaluated with the same set of noun-modifier pairs in Section 6.
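Single nearest neighbour classification with cosine can be sketched as below. The training vectors are invented for illustration, and apart from “instrument” the class names are placeholders we chose; the sketch shows only how “nearest” becomes “highest cosine”, not the original experimental setup.

```python
import math


def cosine(r1: list[float], r2: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(r1, r2))
    norm = math.sqrt(sum(a * a for a in r1)) * math.sqrt(sum(b * b for b in r2))
    return dot / norm if norm else 0.0


def classify_nearest_neighbour(test_vec: list[float],
                               training: list[tuple[list[float], str]]) -> str:
    """Label a testing pair with the label of its single most similar training pair."""
    _, best_label = max(training, key=lambda item: cosine(test_vec, item[0]))
    return best_label


# Invented relation vectors for three labeled training pairs.
training = [
    ([2.0, 0.1, 0.0], "instrument"),
    ([0.0, 1.5, 1.5], "cause"),
    ([0.2, 0.0, 2.0], "location"),
]
label = classify_nearest_neighbour([1.8, 0.3, 0.1], training)  # "instrument"
```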

4 Latent Relational Analysis

LRA takes as input a set of word pairs and produces as output a measure of the relational similarity between any two of the input pairs. LRA relies on three resources: (1) a search engine with a very large corpus of text, (2) a broad-coverage thesaurus of synonyms, and (3) an efficient im-


