weka.classifiers.meta.Bagging
  -W weka.classifiers.meta.MultiClassClassifier
    -W weka.classifiers.meta.Vote
      -B weka.classifiers.functions.SMO
      -B weka.classifiers.meta.LogitBoost -W weka.classifiers.trees.DecisionStump
      -B weka.classifiers.meta.LogitBoost -W weka.classifiers.functions.SimpleLinearRegression
      -B weka.classifiers.trees.adtree.ADTree
      -B weka.classifiers.rules.JRip
Table 1: Weka (version 3.4) commands for processing the feature vectors.
PMI(w1, w2) has a value of zero when the two words are statistically independent. A high positive value indicates that the two words tend to co-occur, and hence are likely to be semantically related. A negative value indicates that the presence of one of the words suggests the absence of the other. Past work demonstrates that PMI is a good estimator of semantic similarity (Turney, 2001; Terra and Clarke, 2003) and that features based on PMI can be useful for supervised learning (Turney, 2003).
The Waterloo MultiText System allows us to set the neighbourhood size for co-occurrence (i.e., the meaning of w1 ∧ w2). In preliminary experiments with the ELS data from Senseval-2, we got good results with a neighbourhood size of 20 words.
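The sketch below illustrates how such a PMI score can be computed from raw corpus counts; the count arguments stand in for queries to a corpus index such as the Waterloo MultiText System, not its actual API, and log base 2 is an assumption:

```python
import math

def pmi(count_w1, count_w2, count_near, corpus_size):
    # Pointwise mutual information from corpus counts, where count_near
    # is the number of times w1 and w2 co-occur within the chosen
    # neighbourhood (here, 20 words).
    p1 = count_w1 / corpus_size
    p2 = count_w2 / corpus_size
    p12 = count_near / corpus_size
    if p1 == 0 or p2 == 0 or p12 == 0:
        return 0.0  # treat unseen pairs as independent
    return math.log2(p12 / (p1 * p2))
```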
For instance, if w is the noun, verb, or adjective that precedes the head word and is nearest to the head word in a given window, then the value of pre_compelling is PMI(w, compelling). If there is no preceding noun, verb, or adjective within the window, the value is set to zero.
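A minimal sketch of this feature computation, assuming a part-of-speech-tagged window and a pmi(w1, w2) helper that wraps corpus counts (the representation and helper names are illustrative, not from the paper):

```python
CONTENT_POS = {"NOUN", "VERB", "ADJ"}

def pre_feature(tagged_window, head_index, model_word, pmi, window=20):
    # tagged_window: list of (word, pos) pairs; head_index: position of
    # the head word; pmi: callable mapping two words to their PMI score.
    # Scan backwards for the nearest preceding noun, verb, or adjective.
    start = max(0, head_index - window)
    for word, pos in reversed(tagged_window[start:head_index]):
        if pos in CONTENT_POS:
            return pmi(word, model_word)
    return 0.0  # no preceding content word within the window
```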
In names of the form avg_position_sense, the feature value is the average of the feature values of the corresponding features. For instance, the value of avg_pre_argument_1_10_02 is the average of the values of all of the pre_model features, such that model was extracted from a training window in which the head word was labeled with the sense argument_1_10_02.
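Correspondingly, an avg_position_sense value can be computed by averaging the position_model values whose model word came from a training window labeled with the given sense; the following is a sketch under assumed data structures, not the paper's code:

```python
from statistics import mean

def avg_feature(features, model_sense, sense, position="pre"):
    # features: {"pre_<model>": value, ...} for one feature vector
    # model_sense: maps each model word to the sense label of the
    # training window it was extracted from
    prefix = position + "_"
    vals = [v for name, v in features.items()
            if name.startswith(prefix)
            and model_sense.get(name[len(prefix):]) == sense]
    return mean(vals) if vals else 0.0
```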
The idea here is that, if a testing example should be labeled, say, argument_1_10_02, and w1 is a noun, verb, or adjective that is close to the head word in the testing example, then PMI(w1, w2) should be relatively high when w2 is extracted from a training window with the same sense, argument_1_10_02, but relatively low when w2 is extracted from a training window with a different sense. Thus avg_position_argument_1_10_02 is likely to be relatively high, compared to other avg_position_sense features.
All semantic features with names of the form position_model are normalized by converting them to percentiles. The percentiles are calculated separately for each feature vector; that is, each feature vector is normalized internally, with respect to its own values, not externally, with respect to the other feature vectors. The pre features are normalized independently from the fol features. The semantic features with names of the form avg_position_sense are calculated after the other features are normalized, so they do not need any further normalization.
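A minimal sketch of this per-vector percentile normalization (the tie-breaking rule and the 0-100 scale are assumptions; the paper does not spell them out):

```python
def percentile_normalize(values):
    # Replace each value by its percentile rank within this single
    # feature vector (pre and fol features would be passed separately).
    n = len(values)
    if n < 2:
        return [50.0] * n
    order = sorted(range(n), key=lambda i: values[i])
    pct = [0.0] * n
    for rank, i in enumerate(order):
        pct[i] = 100.0 * rank / (n - 1)  # ties broken by position
    return pct
```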
Preliminary experiments with the ELS data from Senseval-2 supported the merit of percentile normalization, which was also found useful in another application where features based on PMI were used for supervised learning (Turney, 2003).
2.4 Weka Configuration
Table 1 shows the commands that were used to execute Weka (Witten and Frank, 1999). The default parameters were used for all of the classifiers. Five base classifiers (-B) were combined by voting. Multiple classes were handled by treating them as multiple two-class problems, using a 1-against-all strategy. Finally, the variance of the system was reduced with bagging.
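For readers more familiar with scikit-learn than Weka, the following is a rough structural analogue of Table 1 (an assumed mapping, not the authors' code: SMO is approximated by SVC, LogitBoost over decision stumps by depth-1 gradient boosting, and a plain decision tree stands in for ADTree and JRip, which have no direct scikit-learn counterparts):

```python
from sklearn.ensemble import (BaggingClassifier,
                              GradientBoostingClassifier,
                              VotingClassifier)
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Vote over the base classifiers (innermost layer of Table 1).
vote = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True)),                       # ~ SMO
        ("stumps", GradientBoostingClassifier(max_depth=1)),  # ~ LogitBoost + stumps
        ("tree", DecisionTreeClassifier()),                   # stand-in for ADTree / JRip
    ],
    voting="soft",
)

# One-vs-rest reduces the multi-class problem to two-class problems,
# and bagging on the outside reduces variance, as in Table 1.
clf = BaggingClassifier(OneVsRestClassifier(vote))
```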
We designed the Weka configuration by evaluating many different Weka base classifiers on the Senseval-2 ELS data, until we had identified five good base classifiers. We then experimented with combining the base classifiers, using a variety of meta-learning algorithms. The resulting system is somewhat similar to the JHU system, which had the best ELS scores in Senseval-2 (Yarowsky et al., 2001). The JHU system combined four base classifiers using a form of voting, called Thresholded Model Voting (Yarowsky et al., 2001).
2.5 Postprocessing
The output of Weka includes an estimate of the probability for each prediction. When the head word is frequently labeled U (unassignable) in the training examples, we ignore the U examples during training and then, after running Weka, relabel the lowest-probability testing examples as U.
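A sketch of this relabeling step; relabeling the same fraction of test examples as the training frequency of U is our assumption, since the exact threshold is not given here:

```python
def relabel_unassignable(labels, confidences, u_rate):
    # labels: predicted sense per test example; confidences: Weka's
    # probability estimate for each prediction; u_rate: fraction of
    # training examples labeled U for this head word.
    n_u = int(round(u_rate * len(labels)))
    lowest = sorted(range(len(labels)), key=lambda i: confidences[i])[:n_u]
    out = list(labels)
    for i in lowest:
        out[i] = "U"  # least-confident predictions become U
    return out
```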
3 Results
A total of 26 teams entered 47 systems (both
supervised and unsupervised) in the Senseval-3
ELS task. Table 2 compares the fine-grained and