Word Sense Disambiguation by Web Mining for Word Co-occurrence Probabilities



weka.classifiers.meta.Bagging
-W weka.classifiers.meta.MultiClassClassifier
-W weka.classifiers.meta.Vote
-B weka.classifiers.functions.supportVector.SMO
-B weka.classifiers.meta.LogitBoost -W weka.classifiers.trees.DecisionStump
-B weka.classifiers.meta.LogitBoost -W weka.classifiers.functions.SimpleLinearRegression
-B weka.classifiers.trees.adtree.ADTree
-B weka.classifiers.rules.JRip

Table 1: Weka (version 3.4) commands for processing the feature vectors.

PMI(w1, w2) has a value of zero when the two words are statistically independent. A high positive value indicates that the two words tend to co-occur, and hence are likely to be semantically related. A negative value indicates that the presence of one of the words suggests the absence of the other. Past work demonstrates that PMI is a good estimator of semantic similarity (Turney, 2001; Terra and Clarke, 2003) and that features based on PMI can be useful for supervised learning (Turney, 2003).
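For reference, this is the standard pointwise mutual information: the log ratio of the joint probability of co-occurrence (within the chosen neighbourhood) to the product of the marginal probabilities. The definition below restates the usual formula, with the common base-2 logarithm, rather than anything specific to this system:

    PMI(w_1, w_2) = \log_2 \frac{p(w_1 \wedge w_2)}{p(w_1)\, p(w_2)}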

The Waterloo MultiText System allows us to set the neighbourhood size for co-occurrence (i.e., the meaning of w1 ∧ w2). In preliminary experiments with the ELS data from Senseval-2, we got good results with a neighbourhood size of 20 words.
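The paper obtains these probabilities from queries to the Waterloo MultiText System over a large web corpus. As a minimal illustration of the same computation, the Python sketch below estimates PMI from an in-memory token list, counting w2 as co-occurring with w1 whenever it falls within the 20-word neighbourhood; the function name and the toy counting scheme are assumptions for illustration, not the system's actual query interface.

    import math
    from collections import Counter

    def pmi(w1, w2, tokens, window=20):
        # Toy stand-in for the Waterloo MultiText hit counts: estimate
        # PMI(w1, w2) from a single tokenized corpus held in memory.
        n = len(tokens)
        unigrams = Counter(tokens)
        if unigrams[w1] == 0 or unigrams[w2] == 0:
            return 0.0  # no evidence for one of the words
        joint = 0
        for i, tok in enumerate(tokens):
            if tok == w1 and w2 in tokens[max(0, i - window):i + window + 1]:
                joint += 1  # w2 occurs within the neighbourhood of w1
        if joint == 0:
            return float("-inf")  # never co-occur in this sample
        return math.log2((joint / n) / ((unigrams[w1] / n) * (unigrams[w2] / n)))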

For instance, if w is the noun, verb, or adjective that precedes the head word and is nearest to the head word in a given window, then the value of pre_compelling is PMI(w, compelling). If there is no preceding noun, verb, or adjective within the window, the value is set to zero.
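A sketch of how such a feature could be computed, assuming a part-of-speech-tagged window and a two-argument PMI function like the one above; pre_feature, CONTENT_TAGS, and the tag names are hypothetical, chosen only to mirror the description.

    CONTENT_TAGS = {"NOUN", "VERB", "ADJ"}  # assumed tag inventory

    def pre_feature(tokens, tags, head_index, model_word, pmi_fn, window=10):
        # Value of a hypothetical pre_<model> feature: PMI between the
        # nearest preceding noun/verb/adjective and the model word,
        # or zero when no such word precedes the head in the window.
        lo = max(0, head_index - window)
        for i in range(head_index - 1, lo - 1, -1):  # scan leftward
            if tags[i] in CONTENT_TAGS:
                return pmi_fn(tokens[i], model_word)
        return 0.0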

In names of the form avg_position_sense, the feature value is the average of the feature values of the corresponding features. For instance, the value of avg_pre_argument_1_10_02 is the average of the values of all of the pre_model features, such that model was extracted from a training window in which the head word was labeled with the sense argument_1_10_02.
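The sketch below derives such averages, assuming a feature vector stored as a dict from names like pre_<model> to values, plus a map from each model word to the sense of the training window it came from; both representations are assumptions, not the paper's actual data format.

    from collections import defaultdict

    def avg_sense_features(vector, model_sense, position="pre"):
        # Average every <position>_<model> value whose model word was
        # extracted from a training window labeled with a given sense,
        # yielding one avg_<position>_<sense> feature per sense.
        sums, counts = defaultdict(float), defaultdict(int)
        prefix = position + "_"
        for name, value in vector.items():
            if name.startswith(prefix):
                sense = model_sense.get(name[len(prefix):])
                if sense is not None:
                    sums[sense] += value
                    counts[sense] += 1
        return {f"avg_{position}_{s}": sums[s] / counts[s] for s in sums}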

The idea here is that, if a testing example should be labeled, say, argument_1_10_02, and w1 is a noun, verb, or adjective that is close to the head word in the testing example, then PMI(w1, w2) should be relatively high when w2 is extracted from a training window with the same sense, argument_1_10_02, but relatively low when w2 is extracted from a training window with a different sense. Thus avg_position_argument_1_10_02 is likely to be relatively high, compared to other avg_position_sense features.

All semantic features with names of the form position_model are normalized by converting them to percentiles. The percentiles are calculated separately for each feature vector; that is, each feature vector is normalized internally, with respect to its own values, not externally, with respect to the other feature vectors. The pre features are normalized independently from the fol features. The semantic features with names of the form avg_position_sense are calculated after the other features are normalized, so they do not need any further normalization. Preliminary experiments with the ELS data from Senseval-2 supported the merit of percentile normalization, which was also found useful in another application where features based on PMI were used for supervised learning (Turney, 2003).
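A sketch of this normalization, assuming the dict representation used above; the rank/n percentile formula and the tie-handling are assumptions, since the paper does not spell them out. The pre_ and fol_ groups are normalized independently, as described.

    def percentile_normalize(values):
        # Map each value to its percentile within this list (rank / n);
        # tied values share the rank of their first sorted position.
        order = sorted(values)
        n = len(values)
        return [(order.index(v) + 1) / n for v in values]

    def normalize_vector(vector):
        # Normalize the pre_* and fol_* features of one feature vector
        # internally, each group with respect to its own values only.
        out = dict(vector)
        for prefix in ("pre_", "fol_"):
            names = [k for k in vector if k.startswith(prefix)]
            if names:
                normed = percentile_normalize([vector[k] for k in names])
                out.update(zip(names, normed))
        return out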

2.4 Weka Configuration

Table 1 shows the commands that were used to execute Weka (Witten and Frank, 1999). The default parameters were used for all of the classifiers. Five base classifiers (-B) were combined by voting. Multiple classes were handled by treating them as multiple two-class problems, using a 1-against-all strategy. Finally, the variance of the system was reduced with bagging.

We designed the Weka configuration by evaluating many different Weka base classifiers on the Senseval-2 ELS data, until we had identified five good base classifiers. We then experimented with combining the base classifiers, using a variety of meta-learning algorithms. The resulting system is somewhat similar to the JHU system, which had the best ELS scores in Senseval-2 (Yarowsky et al., 2001). The JHU system combined four base classifiers using a form of voting, called Thresholded Model Voting (Yarowsky et al., 2001).

2.5 Postprocessing

The output of Weka includes an estimate of the
probability for each prediction. When the head
word is frequently labeled U (unassignable) in the
training examples, we ignore U examples during
training, and then, after running Weka, relabel the
lowest probability testing examples as U.
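A sketch of this relabeling step; how many test examples are moved to U is not stated here, so the fraction used below (e.g., the share of U labels seen in training) is an assumed parameter.

    def relabel_unassignable(predictions, probabilities, u_fraction):
        # Relabel the u_fraction least confident predictions as U
        # (unassignable); u_fraction is an assumed parameter.
        n_relabel = int(round(u_fraction * len(predictions)))
        by_confidence = sorted(range(len(predictions)),
                               key=lambda i: probabilities[i])
        relabeled = list(predictions)
        for i in by_confidence[:n_relabel]:
            relabeled[i] = "U"
        return relabeled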

3 Results

A total of 26 teams entered 47 systems (both
supervised and unsupervised) in the Senseval-3
ELS task. Table 2 compares the fine-grained and


