2.1 Preprocessing
The NRC WSD system first assigns part-of-speech
tags to the words in a given example (Brill, 1994),
and then extracts a nine-word window of tagged
text, centered on the head word (i.e., four words
before and after the head word). Any remaining
words in the example are ignored (usually most of
the example is ignored). The window is not allowed
to cross sentence boundaries. If the head word
appears near the beginning or end of the sentence,
where the window may overlap with adjacent sen-
tences, special null characters fill the positions of
any missing words in the window.
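For concreteness, the window extraction can be sketched as follows (an illustrative Python fragment, not the NRC implementation; the token representation as (word, tag) pairs and the null placeholder are assumptions):

NULL = ("<NULL>", "<NULL>")  # placeholder for positions outside the sentence

def nine_word_window(sentence, head_index):
    """sentence: list of (word, tag) pairs; head_index: index of the head word."""
    window = []
    for i in range(head_index - 4, head_index + 5):
        if 0 <= i < len(sentence):
            window.append(sentence[i])
        else:
            window.append(NULL)  # window would cross the sentence boundary
    return window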
In rare cases, a head word appears more than once
in an example. In such cases, the system selects
a single window, giving preference to the earliest
occurring window with the least nulls. Thus each
example is converted into one nine-word window of
tagged text. Windows from the training examples
for a given head word are then used to build the fea-
ture set for that head word.
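Building on the sketch above, the selection among multiple occurrences of the head word can be written as follows (again illustrative; min() is stable, so ties on the null count go to the earliest window):

def select_window(sentence, head_word):
    candidates = [nine_word_window(sentence, i)
                  for i, (word, tag) in enumerate(sentence)
                  if word == head_word]
    return min(candidates,
               key=lambda win: sum(1 for tok in win if tok == NULL))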
2.2 Syntactic Features
Each head word has a unique set of feature names,
describing how the feature values are calculated.
Feature Names: Every syntactic feature has a name
of the form matchtype position model. There are
three matchtypes, ptag, tag, and word, in order
of increasingly strict matching. A ptag match is
a partial tag match, which counts similar part-of-
speech tags, such as NN (singular noun), NNS (plu-
ral noun), NNP (singular proper noun), and NNPS
(plural proper noun), as equivalent. A tag match
requires exact matching in the part-of-speech tags
for the word and the model. A word match requires
that the word and the model are exactly the same,
letter-for-letter, including upper and lower case.
There are five positions, hm2 (head minus two),
hm1 (head minus one), hd0 (head), hp1 (head plus
one), and hp2 (head plus two). Thus syntactic fea-
tures use only a five-word sub-window of the nine-
word window.
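The three match types can be sketched as simple predicates (illustrative only; the partial-tag grouping below covers nouns and verbs as examples, since the full set of groups is not listed in this section):

PTAG_GROUPS = {
    "NN": "N", "NNS": "N", "NNP": "N", "NNPS": "N",   # nouns
    "VB": "V", "VBD": "V", "VBG": "V",                # verbs (assumed grouping)
    "VBN": "V", "VBP": "V", "VBZ": "V",
}

def ptag_match(tag, model):
    return PTAG_GROUPS.get(tag, tag) == PTAG_GROUPS.get(model, model)

def tag_match(tag, model):
    return tag == model

def word_match(word, model):
    return word == model  # exact, case-sensitive comparison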
The syntactic feature names for a head word
are generated by all of the possible legal combina-
tions of matchtype, position, and model. For ptag
names, the model can be any partial tag. For tag
names, the model can be any tag. For word names,
the model names are not predetermined; they are
extracted from the training windows for the given
head word. For instance, if a training window con-
tains the head word followed by “of”, then one of
the features will be word hp1 of.
For word names, the model names are not
allowed to be words that are tagged as nouns, verbs,
or adjectives. These words are reserved for use in
building the semantic features.
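The harvesting of word-type feature names can be sketched as follows (illustrative; the content-word test by tag prefix is an assumption):

POSITIONS = {"hm2": 0, "hm1": 1, "hd0": 2, "hp1": 3, "hp2": 4}

def is_content_tag(tag):
    # nouns, verbs, and adjectives are reserved for the semantic features
    return tag.startswith(("NN", "VB", "JJ"))

def word_feature_names(training_windows):
    names = set()
    for window in training_windows:
        sub = window[2:7]  # five-word sub-window of the nine-word window
        for position, index in POSITIONS.items():
            word, tag = sub[index]
            if word != "<NULL>" and not is_content_tag(tag):
                names.add(("word", position, word))  # e.g., ("word", "hp1", "of")
    return names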
Feature Values: The syntactic features are all
binary-valued. Given a feature with a name of the
form matchtype position model, the feature value
for a given window depends on whether there is a
match of matchtype between the word in the posi-
tion position and the model model. For instance,
the value of tag hp1 NNP depends on whether
the given window has a word in the position hp1
(head plus one) with a tag (part-of-speech tag) that
matches NNP (singular proper noun). Similarly, the feature
word hp1 of has the value 1 (true) if the given
window contains the head word followed by “of”;
otherwise, it has the value 0 (false).
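Using the match functions and the POSITIONS table from the sketches above, evaluating a binary syntactic feature reduces to a single lookup (illustrative):

MATCHERS = {"ptag": ptag_match, "tag": tag_match, "word": word_match}

def syntactic_feature_value(window, matchtype, position, model):
    word, tag = window[2:7][POSITIONS[position]]
    token = word if matchtype == "word" else tag
    return 1 if MATCHERS[matchtype](token, model) else 0

For example, syntactic_feature_value(window, "word", "hp1", "of") is 1 exactly when the word after the head is "of".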
2.3 Semantic Features
Each head word has a unique set of feature names,
describing how the feature values are calculated.
Feature Names: Most of the semantic features have
names of the form position model. The position
names can be pre (preceding) or fol (following).
They refer to the nearest noun, verb, or adjective
that precedes or follows the head word in the nine-
word window.
The model names are extracted from the training
windows for the head word. For instance, if a train-
ing window contains the word “compelling”, and
this word is the nearest noun, verb, or adjective that
precedes the head word, then one of the features will
be pre compelling.
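The search for the nearest content word can be sketched as follows (illustrative; it reuses is_content_tag from above and assumes the head word sits at index 4 of the nine-word window):

def nearest_content_word(window, direction):
    """direction is "pre" or "fol"; window is nine (word, tag) pairs."""
    indices = range(3, -1, -1) if direction == "pre" else range(5, 9)
    for i in indices:
        word, tag = window[i]
        if word != "<NULL>" and is_content_tag(tag):
            return word
    return None  # no noun, verb, or adjective on that side of the head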
A few of the semantic features have a different
form of name, avg position sense. In names of this
form, position can be pre (preceding) or fol (fol-
lowing), and sense can be any of the possible senses
(i.e., classes, labels) of the head word.
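The avg-type names can be enumerated directly (illustrative; the head word's sense inventory is assumed to be given as a list of labels):

def avg_feature_names(senses):
    return [("avg", position, sense)
            for position in ("pre", "fol")
            for sense in senses]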
Feature Values: The semantic features are all
real-valued. For feature names of the form posi-
tion model, the feature value depends on the seman-
tic similarity between the word in position position
and the model word model.
The semantic similarity between two words is
estimated by their Pointwise Mutual Information,
$\mathrm{PMI}(w_1, w_2)$, using Information Retrieval (Turney,
2001; Terra and Clarke, 2003):
\[
\mathrm{PMI}(w_1, w_2) = \log \frac{p(w_1 \,\&\, w_2)}{p(w_1)\, p(w_2)}
\]
We estimate the probabilities in this equation by
issuing queries to the Waterloo MultiText System
(Clarke et al., 1995; Clarke and Cormack, 2000;
Terra and Clarke, 2003). Laplace smoothing is
applied to the PMI estimates, to avoid division by
zero.
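A minimal sketch of the smoothed estimate, assuming the corpus counts have already been obtained from the queries (the smoothing constant k and the base-2 logarithm are assumptions; the base does not affect the similarity ranking):

import math

def smoothed_pmi(count_w1, count_w2, count_joint, total, k=1.0):
    # Laplace smoothing keeps every probability strictly positive,
    # so the ratio is defined even when a raw count is zero.
    p1 = (count_w1 + k) / (total + k)
    p2 = (count_w2 + k) / (total + k)
    p12 = (count_joint + k) / (total + k)
    return math.log(p12 / (p1 * p2), 2)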