Word Sense Disambiguation by Web Mining for Word Co-occurrence Probabilities




Peter D. TURNEY

Institute for Information Technology

National Research Council of Canada

Ottawa, Ontario, Canada, K1A 0R6
[email protected]

Abstract

This paper describes the National Research Council (NRC) Word Sense Disambiguation (WSD) system, as applied to the English Lexical Sample (ELS) task in Senseval-3. The NRC system approaches WSD as a classical supervised machine learning problem, using familiar tools such as the Weka machine learning software and Brill's rule-based part-of-speech tagger. Head words are represented as feature vectors with several hundred features. Approximately half of the features are syntactic and the other half are semantic. The main novelty in the system is the method for generating the semantic features, based on word co-occurrence probabilities. The probabilities are estimated using the Waterloo MultiText System with a corpus of about one terabyte of unlabeled text, collected by a web crawler.

1 Introduction

The Senseval-3 English Lexical Sample (ELS) task requires disambiguating 57 words, with an average of roughly 140 training examples and 70 testing examples of each word. Each example is about a paragraph of text, in which the word that is to be disambiguated is marked as the head word. The average head word has around six senses. The training examples are manually classified according to the intended sense of the head word, inferred from the surrounding context. The task is to use the training data and any other relevant information to automatically assign classes to the testing examples.

This paper presents the National Research Council (NRC) Word Sense Disambiguation (WSD) system, which generated our four entries for the Senseval-3 ELS task (NRC-Fine, NRC-Fine2, NRC-Coarse, and NRC-Coarse2). Our approach to the ELS task is to treat it as a classical supervised machine learning problem. Each example is represented as a feature vector with several hundred features. Each of the 57 ambiguous words is represented with a different set of features. Typically, around half of the features are syntactic and the other half are semantic. After the raw examples are converted to feature vectors, the Weka machine learning software is used to induce a model of the training data and predict the classes of the testing examples (Witten and Frank, 1999).
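The supervised step can be pictured with a toy stand-in: labeled examples become binary feature vectors, a model is induced from the training vectors, and each test vector receives a sense label. The 1-nearest-neighbour rule and the example vectors below are illustrative assumptions, substituting for the Weka learners the system actually used.

```python
# Toy sketch of the supervised classification step (not the NRC system's
# actual learner): 1-nearest-neighbour over binary feature vectors.

def hamming(u, v):
    """Number of feature positions where two binary vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def classify(train, test_vector):
    """Assign the sense label of the closest training vector."""
    _, label = min(train, key=lambda ex: hamming(ex[0], test_vector))
    return label

# (vector, sense) pairs; the features and senses are made up for illustration.
train = [([1, 0, 1, 0], "bank/finance"),
         ([1, 1, 1, 0], "bank/finance"),
         ([0, 0, 0, 1], "bank/river")]
print(classify(train, [0, 1, 0, 1]))  # closest training vector is [0, 0, 0, 1]
```

Any standard supervised learner slots into the same pipeline; the paper's point is that the representation (syntactic plus semantic features), not the learner, is where the novelty lies.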

The syntactic features are based on part-of-speech tags, assigned by a rule-based tagger (Brill, 1994). The main innovation of the NRC WSD system is the method for generating the semantic features, which are derived from word co-occurrence probabilities. We estimated these probabilities using the Waterloo MultiText System with a corpus of about one terabyte of unlabeled text, collected by a web crawler (Clarke et al., 1995; Clarke and Cormack, 2000; Terra and Clarke, 2003).
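As a rough illustration of the underlying idea, a co-occurrence probability can be estimated as the fraction of fixed-width windows of corpus text that contain both words. The windowing scheme below is an assumption for illustration; it is not the Waterloo MultiText query mechanism the system actually used.

```python
# Minimal sketch of estimating a word co-occurrence probability from an
# unlabeled corpus (illustrative windowing, not the system's actual method).

def cooccurrence_probability(corpus_tokens, w1, w2, window=5):
    """Estimate P(w1, w2) as the fraction of windows containing both words."""
    n = len(corpus_tokens)
    if n < window:
        return 0.0
    hits = 0
    windows = n - window + 1
    for i in range(windows):
        win = corpus_tokens[i:i + window]
        if w1 in win and w2 in win:
            hits += 1
    return hits / windows

corpus = "the bank raised interest rates while the river bank flooded".split()
print(cooccurrence_probability(corpus, "bank", "interest", window=4))
# 2 of the 7 windows contain both words
```

At terabyte scale these counts come from an indexed retrieval system rather than a linear scan, but the probability being estimated is the same.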

In Section 2, we describe the NRC WSD system.
Our experimental results are presented in Section 3
and we conclude in Section 4.

2 System Description

This section presents various aspects of the system
in roughly the order in which they are executed. The
following definitions will simplify the description.

Head Word: One of the 57 words that are to be
disambiguated.

Example: One or more contiguous sentences, illustrating the usage of a head word.

Context: The non-head words in an example.

Feature: A property of a head word in a context. For instance, the feature tag_hp1_NNP is the property of having (or not having) a proper noun (NNP is the part-of-speech tag for a proper noun) immediately following the head word (hp1 represents the location head plus one).

Feature Value: Features have values, which depend on the specific example. For instance, tag_hp1_NNP is a binary feature that has the value 1 (true: the following word is a proper noun) or 0 (false: the following word is not a proper noun).

Feature Vector: Each example is represented by a vector. Features are the dimensions of the vector space and a vector of feature values specifies a point in the feature space.
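A syntactic feature like tag_hp1_NNP can be sketched as a lookup into the tagged example: test whether the tag at "head plus one" matches a target tag. The function below is a hypothetical illustration of this kind of feature, not the NRC system's actual code.

```python
# Hypothetical sketch of a binary positional tag feature such as tag_hp1_NNP:
# check the part-of-speech tag at a fixed offset from the head word.

def tag_feature(tags, head_index, offset, target_tag):
    """Return 1 if the tag at head_index + offset equals target_tag, else 0."""
    i = head_index + offset
    if 0 <= i < len(tags):
        return 1 if tags[i] == target_tag else 0
    return 0  # the position falls outside the example

# Example with Penn Treebank tags: "the bank Smith said ..."
tags = ["DT", "NN", "NNP", "VBD"]
print(tag_feature(tags, head_index=1, offset=1, target_tag="NNP"))  # 1
print(tag_feature(tags, head_index=1, offset=2, target_tag="NNP"))  # 0
```

One such binary feature is generated per (offset, tag) pair of interest, which is how a few positional templates expand into hundreds of vector dimensions.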


