Word Sense Disambiguation by Web Mining
for Word Co-occurrence Probabilities

Peter D. TURNEY

Institute for Information Technology

National Research Council of Canada

Ottawa, Ontario, Canada, K1A 0R6
[email protected]

Abstract

This paper describes the National Research Council (NRC) Word Sense Disambiguation (WSD) system, as applied to the English Lexical Sample (ELS) task in Senseval-3. The NRC system approaches WSD as a classical supervised machine learning problem, using familiar tools such as the Weka machine learning software and Brill's rule-based part-of-speech tagger. Head words are represented as feature vectors with several hundred features. Approximately half of the features are syntactic and the other half are semantic. The main novelty in the system is the method for generating the semantic features, based on word co-occurrence probabilities. The probabilities are estimated using the Waterloo MultiText System with a corpus of about one terabyte of unlabeled text, collected by a web crawler.

1 Introduction

The Senseval-3 English Lexical Sample (ELS) task requires disambiguating 57 words, with an average of roughly 140 training examples and 70 testing examples for each word. Each example is about a paragraph of text, in which the word to be disambiguated is marked as the head word. The average head word has around six senses. The training examples are manually classified according to the intended sense of the head word, inferred from the surrounding context. The task is to use the training data and any other relevant information to automatically assign classes to the testing examples.

This paper presents the National Research Council (NRC) Word Sense Disambiguation (WSD) system, which generated our four entries for the Senseval-3 ELS task (NRC-Fine, NRC-Fine2, NRC-Coarse, and NRC-Coarse2). Our approach to the ELS task is to treat it as a classical supervised machine learning problem. Each example is represented as a feature vector with several hundred features. Each of the 57 ambiguous words is represented with a different set of features. Typically, around half of the features are syntactic and the other half are semantic. After the raw examples are converted to feature vectors, the Weka machine learning software is used to induce a model of the training data and predict the classes of the testing examples (Witten and Frank, 1999).
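
As a rough illustration of this setup, the sketch below treats WSD as supervised classification. It uses scikit-learn as a stand-in for the Weka toolkit, and a toy bag-of-context-words extractor as a hypothetical placeholder for the syntactic and semantic feature generators described in Section 2; one such classifier would be trained for each of the 57 head words.

```python
# Minimal sketch of the supervised setup, assuming scikit-learn in
# place of Weka; extract_features() is a toy placeholder for the
# real syntactic and semantic features.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

def extract_features(words, head_index, window=3):
    """Toy features: bag of context words near the head word."""
    lo, hi = max(0, head_index - window), head_index + window + 1
    return {f"ctx_{w.lower()}": 1
            for i, w in enumerate(words[lo:hi], start=lo)
            if i != head_index}

def train_and_classify(train, senses, test):
    """train/test: lists of (words, head_index); senses: sense labels."""
    vec = DictVectorizer()
    X_train = vec.fit_transform(extract_features(w, h) for w, h in train)
    X_test = vec.transform(extract_features(w, h) for w, h in test)
    model = DecisionTreeClassifier().fit(X_train, senses)
    return model.predict(X_test)
```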

The syntactic features are based on part-of-speech tags, assigned by a rule-based tagger (Brill, 1994). The main innovation of the NRC WSD system is the method for generating the semantic features, which are derived from word co-occurrence probabilities. We estimated these probabilities using the Waterloo MultiText System with a corpus of about one terabyte of unlabeled text, collected by a web crawler (Clarke et al., 1995; Clarke and Cormack, 2000; Terra and Clarke, 2003).
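
The semantic features rest on co-occurrence probabilities estimated from corpus counts. The sketch below illustrates the general idea with pointwise mutual information over a tiny in-memory corpus; the toy hit counts stand in for Waterloo MultiText query results, and the paper's actual queries and statistics may differ.

```python
import math

# Toy corpus standing in for the terabyte web corpus; the counts
# below play the role of MultiText query results (an assumption for
# illustration, not the system's actual query interface).
CORPUS = ("the bank raised interest rates while "
          "the river bank flooded its banks").split()
N = len(CORPUS)

def hits(word):
    """Number of corpus positions matching a single word."""
    return sum(1 for w in CORPUS if w == word)

def cooccur_hits(w1, w2, window=4):
    """Occurrences of w1 with w2 within +/- window word positions."""
    return sum(
        1 for i, w in enumerate(CORPUS) if w == w1
        and any(CORPUS[j] == w2
                for j in range(max(0, i - window), min(N, i + window + 1))
                if j != i)
    )

def pmi(w1, w2, window=4):
    """Pointwise mutual information, one common co-occurrence statistic."""
    p_joint = cooccur_hits(w1, w2, window) / N
    return math.log(p_joint / ((hits(w1) / N) * (hits(w2) / N)))

print(pmi("bank", "river"))  # how often "bank" is near "river" vs. chance
```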

In Section 2, we describe the NRC WSD system.
Our experimental results are presented in Section 3
and we conclude in Section 4.

2 System Description

This section presents various aspects of the system
in roughly the order in which they are executed. The
following definitions will simplify the description.

Head Word: One of the 57 words that are to be disambiguated.

Example: One or more contiguous sentences, illustrating the usage of a head word.

Context: The non-head words in an example.

Feature: A property of a head word in a context. For instance, the feature tag_hp1_NNP is the property of having (or not having) a proper noun (NNP is the part-of-speech tag for a proper noun) immediately following the head word (hp1 represents the location head plus one); see the sketch following these definitions.

Feature Value: Features have values, which depend on the specific example. For instance, tag_hp1_NNP is a binary feature that has the value 1 (true: the following word is a proper noun) or 0 (false: the following word is not a proper noun).

Feature Vector: Each example is represented by a vector. Features are the dimensions of the vector space, and a vector of feature values specifies a point in the feature space.
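
To make these definitions concrete, here is a small sketch of computing the tag_hp1_NNP feature and placing it in a (partial) feature vector. In the actual system the (word, tag) pairs would come from Brill's tagger; here they are written out by hand for a made-up example.

```python
def tag_hp1_NNP(tagged_words, head_index):
    """1 if the word immediately following the head word is tagged NNP."""
    nxt = head_index + 1
    return int(nxt < len(tagged_words) and tagged_words[nxt][1] == "NNP")

# Hand-tagged toy example; the real system uses Brill's rule-based tagger.
tagged = [("They", "PRP"), ("watch", "VBP"), ("Niagara", "NNP"), ("Falls", "NNP")]
head = 1  # the head word is "watch"

# A (partial) feature vector: feature names mapped to feature values.
vector = {"tag_hp1_NNP": tag_hp1_NNP(tagged, head)}
print(vector)  # {'tag_hp1_NNP': 1}
```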


