Word Sense Disambiguation by Web Mining for Word Co-occurrence Probabilities

Provided by Cognitive Sciences ePrint Archive


Peter D. TURNEY

Institute for Information Technology

National Research Council of Canada

Ottawa, Ontario, Canada, K1A 0R6
[email protected]

Abstract

This paper describes the National Research Council (NRC) Word Sense Disambiguation (WSD) system, as applied to the English Lexical Sample (ELS) task in Senseval-3. The NRC system approaches WSD as a classical supervised machine learning problem, using familiar tools such as the Weka machine learning software and Brill’s rule-based part-of-speech tagger. Head words are represented as feature vectors with several hundred features. Approximately half of the features are syntactic and the other half are semantic. The main novelty in the system is the method for generating the semantic features, based on word co-occurrence probabilities. The probabilities are estimated using the Waterloo MultiText System with a corpus of about one terabyte of unlabeled text, collected by a web crawler.

1 Introduction

The Senseval-3 English Lexical Sample (ELS) task requires disambiguating 57 words, with an average of roughly 140 training examples and 70 testing examples of each word. Each example is about a paragraph of text, in which the word that is to be disambiguated is marked as the head word. The average head word has around six senses. The training examples are manually classified according to the intended sense of the head word, inferred from the surrounding context. The task is to use the training data and any other relevant information to automatically assign classes to the testing examples.
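To make the shape of the task concrete, the following minimal Python sketch shows one way an ELS example could be held in memory. The class and field names (`Example`, `tokens`, `head_index`, `sense`) and the sense label `"bank%river"` are hypothetical and do not reflect the Senseval-3 file format. A training example carries a manual sense label; a testing example has none until the system assigns one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    """One ELS example (hypothetical representation, not the Senseval-3 format)."""
    head_word: str               # the word to disambiguate, e.g. "bank"
    tokens: list[str]            # the paragraph of text, tokenized
    head_index: int              # position of the marked head word in tokens
    sense: Optional[str] = None  # manual sense label; None for testing examples

    def is_training(self) -> bool:
        return self.sense is not None


def split(examples: list[Example]) -> tuple[list[Example], list[Example]]:
    """Separate labeled training examples from unlabeled testing examples."""
    train = [e for e in examples if e.is_training()]
    test = [e for e in examples if not e.is_training()]
    return train, test


examples = [
    Example("bank", ["she", "sat", "on", "the", "bank"], 4, "bank%river"),
    Example("bank", ["the", "bank", "raised", "rates"], 1),
]
train, test = split(examples)
print(len(train), len(test))  # 1 1
```

The task then amounts to filling in `sense` for every example in `test`, using what can be learned from `train`.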

This paper presents the National Research Council (NRC) Word Sense Disambiguation (WSD) system, which generated our four entries for the Senseval-3 ELS task (NRC-Fine, NRC-Fine2, NRC-Coarse, and NRC-Coarse2). Our approach to the ELS task is to treat it as a classical supervised machine learning problem. Each example is represented as a feature vector with several hundred features. Each of the 57 ambiguous words is represented with a different set of features. Typically, around half of the features are syntactic and the other half are semantic. After the raw examples are converted to feature vectors, the Weka machine learning software is used to induce a model of the training data and predict the classes of the testing examples (Witten and Frank, 1999).
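Because each of the 57 words has its own feature set, the learning step naturally splits into one model per head word. The Python sketch below illustrates that structure only. It substitutes a plain 1-nearest-neighbour classifier (Hamming distance on binary vectors) for Weka, so the `NearestNeighbour` class and the `train_per_word` helper are hypothetical stand-ins and are not the learner the NRC entries used.

```python
from collections import defaultdict


def hamming(u, v):
    """Count the positions where two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))


class NearestNeighbour:
    """Stand-in learner (1-nearest-neighbour), not the NRC system's Weka setup."""

    def fit(self, vectors, senses):
        self.data = list(zip(vectors, senses))
        return self

    def predict(self, vector):
        # Return the sense of the closest training vector; ties go to the earliest one.
        return min(self.data, key=lambda pair: hamming(pair[0], vector))[1]


def train_per_word(training):
    """Induce one model per head word, since each word has its own feature set.

    training: iterable of (head_word, feature_vector, sense) triples.
    """
    grouped = defaultdict(lambda: ([], []))
    for word, vector, sense in training:
        vectors, senses = grouped[word]
        vectors.append(vector)
        senses.append(sense)
    return {word: NearestNeighbour().fit(vs, ss) for word, (vs, ss) in grouped.items()}


training = [
    ("bank", [1, 0, 0], "river"),
    ("bank", [0, 1, 1], "finance"),
    ("plant", [1, 1], "factory"),
    ("plant", [0, 0], "flora"),
]
models = train_per_word(training)
print(models["bank"].predict([0, 1, 0]))  # finance
print(models["plant"].predict([0, 0]))    # flora
```

Note that the "bank" and "plant" vectors have different lengths; keeping a separate model per word is what makes that harmless.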

The syntactic features are based on part-of-speech tags, assigned by a rule-based tagger (Brill, 1994). The main innovation of the NRC WSD system is the method for generating the semantic features, which are derived from word co-occurrence probabilities. We estimated these probabilities using the Waterloo MultiText System with a corpus of about one terabyte of unlabeled text, collected by a web crawler (Clarke et al., 1995; Clarke and Cormack, 2000; Terra and Clarke, 2003).
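As a rough illustration of what "word co-occurrence probability" can mean, the Python sketch below estimates P(w1), P(w2), and P(w1, w2) as the fraction of fixed-size sliding windows that contain each word or both words. This is an assumption made for illustration. The function `window_probabilities` and its `size` parameter are hypothetical, the sketch counts over a tiny in-memory token list rather than querying the Waterloo MultiText System, and this section does not specify how the probabilities are turned into semantic features.

```python
def window_probabilities(tokens, w1, w2, size=16):
    """Estimate P(w1), P(w2) and P(w1, w2) as the fraction of sliding
    windows of `size` tokens that contain w1, w2, or both.

    Toy in-memory stand-in for corpus queries; `size` is an assumed parameter.
    """
    n = max(len(tokens) - size + 1, 1)  # number of windows (at least one)
    has1 = has2 = both = 0
    for start in range(n):
        window = set(tokens[start:start + size])
        in1, in2 = w1 in window, w2 in window
        has1 += in1
        has2 += in2
        both += in1 and in2
    return has1 / n, has2 / n, both / n


tokens = "the bank raised interest rates while the river bank flooded".split()
p_bank, p_river, p_both = window_probabilities(tokens, "bank", "river", size=3)
print(p_bank, p_river, p_both)  # 0.5 0.375 0.25
print(p_both / p_bank)          # P(river | bank) = 0.5
```

With ten tokens and a window of three, there are eight windows. "bank" appears in four of them, "river" in three, and both words in two, which gives the printed estimates.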

In Section 2, we describe the NRC WSD system.
Our experimental results are presented in Section 3
and we conclude in Section 4.

2 System Description

This section presents various aspects of the system
in roughly the order in which they are executed. The
following definitions will simplify the description.

Head Word: One of the 57 words that are to be
disambiguated.

Example: One or more contiguous sentences, illustrating the usage of a head word.

Context: The non-head words in an example.

Feature: A property of a head word in a context. For instance, the feature tag_hp1_NNP is the property of having (or not having) a proper noun (NNP is the part-of-speech tag for a proper noun) immediately following the head word (hp1 represents the location head plus one).

Feature Value: Features have values, which
depend on the specific example. For instance,
tag_hp1_NNP is a binary feature that has the value
1 (true: the following word is a proper noun) or 0
(false: the following word is not a proper noun).
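The binary feature tag_hp1_NNP can be computed directly from a part-of-speech-tagged example. In the Python sketch below, the helper names `feature_name` and `tag_feature_value` are hypothetical. The tags are hand-assigned Penn Treebank tags rather than actual Brill tagger output, and only offsets after the head word are named, because the text defines only the hp1 pattern. Treating a position past the end of the example as "not having" the tag is also an assumption.

```python
def feature_name(offset: int, tag: str) -> str:
    """Name for 'tag at head + offset', following the tag_hp1_NNP pattern.

    Only positive offsets (positions after the head word) are handled here.
    """
    if offset < 1:
        raise ValueError("this sketch only names positions after the head word")
    return f"tag_hp{offset}_{tag}"


def tag_feature_value(tags, head_index, offset, tag) -> int:
    """Return 1 if the word at head_index + offset carries `tag`, else 0.

    Positions past the end of the example count as not having the tag.
    """
    position = head_index + offset
    return int(position < len(tags) and tags[position] == tag)


# "the capital Ottawa grew", head word "capital" at index 1 (hand-assigned tags)
tags = ["DT", "NN", "NNP", "VBD"]
print(feature_name(1, "NNP"), tag_feature_value(tags, 1, 1, "NNP"))  # tag_hp1_NNP 1
print(tag_feature_value(tags, 2, 1, "NNP"))  # 0: "grew" is VBD, not NNP
```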

Feature Vector: Each example is represented by
a vector. Features are the dimensions of the vector
space and a vector of feature values specifies a point
in the feature space.
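The Python sketch below shows the mapping from feature values to a point in feature space. A fixed, ordered list of feature names defines the dimensions, and each example supplies values for some of them. The function name `to_vector` is hypothetical. Defaulting missing features to 0 is an assumption that matches the binary "not having" reading of features such as tag_hp1_NNP. The sketch also accepts non-binary values, since the text does not say whether the semantic features are binary.

```python
def to_vector(feature_names, values):
    """Map an example to a point in feature space.

    feature_names: ordered list fixing the dimensions (one per feature)
    values: dict from feature name to value for this example; missing features are 0
    """
    return [values.get(name, 0) for name in feature_names]


dimensions = ["tag_hp1_NNP", "tag_hp1_NN", "tag_hp1_VBD"]
print(to_vector(dimensions, {"tag_hp1_NNP": 1}))  # [1, 0, 0]
```

Because the dimension list is fixed per head word, every example of that word yields a vector of the same length, which is what a learner such as Weka expects.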
