MACHINE LEARNING
the m speakers and n classes in the training set. The intra-
class deviation of xi is:
\[
\sigma_{\mathrm{intra}} = \frac{1}{mn} \sum_{j=1}^{m} \sum_{k=1}^{n} \sigma_{j,k} \qquad (14)
\]
The ratio of inter-class deviation to intra-class deviation is
high when a feature varies greatly across class boundaries,
but varies little within a class. A high weight (a high ratio)
suggests that the feature will be useful for classification.
This is a form of contextual weighting, because the weight
is calculated on the basis of the speaker’s identity, which is
a contextual feature.
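The intra-class deviation can be sketched in code. Only equation (14) is reproduced here; the function name `sigma_intra` is illustrative, and the inter-class deviation (defined earlier in the paper, not in this excerpt) would supply the numerator of the weight:

```python
import numpy as np

def sigma_intra(x, speakers, classes):
    """Intra-class deviation of a feature (equation 14): the mean of the
    per-(speaker, class) standard deviations sigma_{j,k}, averaged over
    all m speakers and n classes."""
    devs = [np.std(x[(speakers == j) & (classes == k)])
            for j in np.unique(speakers)
            for k in np.unique(classes)]
    return float(np.mean(devs))
```

The feature weight described in the text would then be the ratio of the inter-class deviation to this value; a high ratio marks a feature that varies across class boundaries but little within a class.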
Table 4 shows the results of using different combina-
tions of these three strategies with IBL. These results show
that there is a form of synergy here, since the sum of the
improvements of each strategy used separately is less than
the improvement of the three strategies used together
((58 - 56) + (55 - 56) + (58 - 56) = 3% for the sum
of the three strategies used separately versus
66 - 56 = 10% for the three strategies used together).
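The synergy arithmetic above can be restated as a short check (percentages taken from Table 4; variable names are illustrative):

```python
baseline = 56                 # no preprocessing
separate = [58, 55, 58]       # strategies 1, 2, and 5 applied alone
combined = 66                 # all three strategies together

sum_of_gains = sum(p - baseline for p in separate)  # 3 points
combined_gain = combined - baseline                 # 10 points

# Synergy: the combined gain exceeds the sum of the separate gains.
assert combined_gain > sum_of_gains
```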
The three strategies were also tested with cascade-
correlation [5]. Because of the time required for training
CC, results were gathered for only two cases. With no preprocessing, cascade-correlation correctly classified 216 observations (47%). With preprocessing by all three strategies, it correctly classified 236 observations (51%). This shows that contextual information can
be of benefit for both neural networks and nearest
neighbor pattern recognition.
HEPATITIS PROGNOSIS
As in the previous section, this section examines
strategies 1, 2, and 5: contextual normalization, contextual
expansion, and contextual weighting. The problem is to
determine whether hepatitis patients will live or die from
their disease. There are seventeen primary features, of
which twelve are discrete (such as “patient is taking
steroids”, “patient reports fatigue”) and five are continu-
ous (such as “patient’s bilirubin level”). There are two
contextual features, of which one is discrete (patient’s sex)
and one is continuous (patient’s age). The patient’s sex
was not used in the following experiments, since 90% of
the patients were male. The observations fall in two
classes (live or die) [10]. There are many missing values in
the hepatitis data. These were filled in by using the single-
nearest neighbor algorithm with the training data.
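The single-nearest-neighbor fill-in described above might be sketched as follows. The function name and the use of NaN to mark missing values are assumptions, and since the paper's distance measure is not given in this excerpt, mean squared difference over the features both rows share is used for illustration:

```python
import numpy as np

def impute_1nn(data):
    """Fill each row's missing values (NaN) by copying them from the row's
    single nearest neighbor, where distance is computed over the features
    both rows have present. Ties are broken by the first minimum."""
    filled = data.copy()
    for i, row in enumerate(data):
        missing = np.isnan(row)
        if not missing.any():
            continue
        best, best_d = None, np.inf
        for j, other in enumerate(data):
            # Skip the row itself and donors missing the needed values.
            if j == i or np.isnan(other[missing]).any():
                continue
            shared = ~np.isnan(row) & ~np.isnan(other)
            if not shared.any():
                continue
            d = np.mean((row[shared] - other[shared]) ** 2)
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            filled[i, missing] = data[best, missing]
    return filled
```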
For hepatitis prognosis, bilirubin level is a primary
feature for determining whether the patient will die from
the disease. The age of the patient is a contextual feature,
since we can achieve more accurate prognoses by using
the patient’s age. Age is not a primary feature, since
knowing the patient’s age, by itself, does not help us to
make a prognosis. In support of this claim, compare rows
one and three in Table 5. Adding age as a feature actually
reduces accuracy. Background knowledge does not help us
to determine whether age is primary or contextual, since it
is plausible that the patient’s age could be a primary factor
in hepatitis prognosis. In this case, we must use the data to
estimate the probability distribution. The data suggest that
age is a contextual feature.
The data were divided into a training set and a testing
set. Unlike the previous two experiments, there was no
systematic distinction between the training and testing
sets. The data consist of 155 observations, which were
randomly split to make 10 pairs of training and testing
sets. In each pair, there were 100 training observations and
55 testing observations. Thus the total number of observa-
tions for testing purposes was 550.
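The splitting procedure can be sketched as follows (the function name and seed are illustrative):

```python
import random

def make_splits(n_obs=155, n_train=100, n_pairs=10, seed=0):
    """Randomly split the observation indices into 10 train/test pairs,
    as described above: 100 training and 55 testing observations each."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        idx = list(range(n_obs))
        rng.shuffle(idx)
        pairs.append((idx[:n_train], idx[n_train:]))
    return pairs
```

Summing the test halves over the 10 pairs gives the 550 total testing observations reported above.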
Three of the five strategies discussed above were
applied to the data:
Contextual normalization: Each feature was normalized
by equation (11), where the context vector c is simply the
patient’s age. Age was converted into a discrete feature by
dividing age into five intervals, with an equal number of
Table 4: The three strategies applied to the vowel data.

strategy 1 | strategy 2 | strategy 5 | no. correct | percent
No         | No         | No         | 258         | 56
No         | No         | Yes        | 269         | 58
No         | Yes        | No         | 253         | 55
No         | Yes        | Yes        | 272         | 59
Yes        | No         | No         | 267         | 58
Yes        | No         | Yes        | 295         | 64
Yes        | Yes        | No         | 273         | 59
Yes        | Yes        | Yes        | 305         | 66

Table 5: The three strategies applied to the hepatitis data.

strategy 1 | strategy 2 | strategy 5 | no. correct | percent
No         | No         | No         | 393         | 71
No         | No         | Yes        | 393         | 71
No         | Yes        | No         | 390         | 71
No         | Yes        | Yes        | 391         | 71
Yes        | No         | No         | 454         | 83
Yes        | No         | Yes        | 460         | 84
Yes        | Yes        | No         | 457         | 83
Yes        | Yes        | Yes        | 464         | 84