patients in each interval. The values of μ_i(c) and σ_i(c) were estimated by taking the average and standard deviation of x_i for each interval c. This is different from the method used for contextual normalization with the continuous contextual features in gas turbine engine diagnosis [7]. Note that equation (11) does not require continuous features; it works well with the boolean features in the hepatitis data, when true and false are represented by one and zero.
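
As a concrete illustration, here is a minimal sketch of this form of contextual normalization in Python, assuming equation (11) has the form x'_i = (x_i - μ_i(c)) / σ_i(c); the function name, the NumPy-based implementation, and the zero-variance guard are illustrative assumptions, not the paper's code.

    import numpy as np

    def contextual_normalize(X, context, bin_edges):
        # Assign each row to a context interval (e.g., one of the five
        # age intervals used for the hepatitis data).
        c = np.digitize(context, bin_edges)
        X = np.asarray(X, dtype=float)
        out = np.empty_like(X)
        for interval in np.unique(c):
            rows = (c == interval)
            mu = X[rows].mean(axis=0)     # mu_i(c): average of x_i in interval c
            sigma = X[rows].std(axis=0)   # sigma_i(c): deviation in interval c
            sigma[sigma == 0] = 1.0       # guard against constant features
            out[rows] = (X[rows] - mu) / sigma
        return out

As the text notes, this works for boolean features coded as zero and one, since their per-interval means and deviations remain well defined.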

Contextual expansion: The age of the patient was treated
as another feature. This strategy is not useful for the
patient’s sex, since so few patients are female.
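
In code, this strategy amounts to a single step; the snippet below is a hypothetical illustration in which X is a feature matrix and age is the contextual feature:

    import numpy as np

    X = np.array([[1.0, 0.0], [0.0, 1.0]])   # two hypothetical feature vectors
    age = np.array([30.0, 62.0])              # contextual feature per patient

    # Contextual expansion: treat the contextual feature as just another
    # input feature by appending it to the feature vectors.
    X_expanded = np.column_stack([X, age])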

Contextual weighting: The features were multiplied by
weights, where the weight for a feature was the ratio of
inter-class deviation to intra-class deviation, as in equation
(12). The inter-class deviation and the intra-class deviation
were calculated using the five age intervals.
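
A sketch of this weighting scheme in Python follows; the exact estimator in equation (12) is not reproduced here, so the way the per-interval deviations are pooled (simple averaging) and the function names are assumptions.

    import numpy as np

    def deviation_ratio(X, y):
        # Ratio of inter-class deviation (spread of the class means) to
        # intra-class deviation (average spread within each class).
        classes = np.unique(y)
        means = np.array([X[y == k].mean(axis=0) for k in classes])
        inter = means.std(axis=0)
        intra = np.mean([X[y == k].std(axis=0) for k in classes], axis=0)
        return inter / np.where(intra > 0, intra, 1.0)

    def contextual_feature_weights(X, y, context, bin_edges):
        # Equation (12)-style weights, pooled over the context intervals:
        # here the per-interval ratios are simply averaged.
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        c = np.digitize(context, bin_edges)
        ratios = [deviation_ratio(X[c == i], y[c == i]) for i in np.unique(c)]
        return np.mean(ratios, axis=0)

    # The weighted features are then w_j * x_j:
    # X_weighted = X * contextual_feature_weights(X, y, context, bin_edges)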

Table 5 shows the results of using different combinations of the three strategies (contextual normalization, contextual expansion, and contextual weighting) with IBL. As in the previous section, there is a form of synergy here, since the sum of the improvements of each strategy used separately is less than the improvement of the three strategies used together: (71 - 71) + (71 - 71) + (83 - 71) = 12% for the sum of the three strategies, versus 84 - 71 = 13% for the three strategies used together. In this case, however, the synergy is not as marked as it is in the previous section. This may be because there is no systematic difference between the training and testing sets in the hepatitis data, while the testing set for the vowel data uses different speakers from the training set.

For comparison, other researchers have reported accuracies of 80% [11] and 83% [12] on the hepatitis data. It is interesting that a single-nearest-neighbor algorithm can match or surpass these results when strategies are employed to use the contextual information contained in the data.

DISCUSSION OF RESULTS

The results reported above indicate that contextual normalization and contextual weighting can significantly improve the accuracy of classification. Contextual expansion is less effective than contextual normalization and contextual weighting, although it appears useful when used in conjunction with the other techniques.

Equation (11) (a form of contextual normalization)
has three characteristics:

1. The normalized features all have the same scale, so
we can directly compare features that were originally
measured with different scales.

2. Equation (11) tends to weight features according to their relevance for classification. Features that are far from average, in a given context, are normalized to values that are far from zero. That is, a surprising feature will get a high absolute value. A feature that is irrelevant will tend to have a high variation, so it will tend to be normalized to a value near zero. A feature that is near average will also be normalized to a value near zero. Note that this is true for boolean features, as well as continuous features.

3. Equation (11) compensates for variations in a feature that are due to variations in the context. Thus it reduces the impact of the context, allowing the classification system to generalize across different contexts more easily.

Equation (11) is only one possible form of contextual normalization. For example, another form of contextual normalization could use a context-sensitive estimate of the minimum and maximum values to normalize a feature.
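
For instance, a min-max variant along those lines might look like the following sketch; the function name and the [0, 1] target range are assumptions, since the paper does not specify this variant further.

    import numpy as np

    def minmax_contextual_normalize(X, context, bin_edges):
        # Rescale each feature to [0, 1] using context-sensitive estimates
        # of its minimum and maximum, instead of mu_i(c) and sigma_i(c).
        X = np.asarray(X, dtype=float)
        out = np.empty_like(X)
        c = np.digitize(context, bin_edges)
        for interval in np.unique(c):
            rows = (c == interval)
            lo = X[rows].min(axis=0)
            hi = X[rows].max(axis=0)
            span = np.where(hi > lo, hi - lo, 1.0)   # guard degenerate features
            out[rows] = (X[rows] - lo) / span
        return out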

Contextual weighting is a new technique for using
contextual information. The idea of contextual weighting
is to assign more weight to the features that seem more
useful for classification, in a given context. Equation (12)
is only one possible form of contextual weighting. For
example, another form of contextual weighting might vary
the weight as a function of the context. With equation (12),
the weight is calculated using contextual information, but
the weight does not change as a function of the context.
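
One hypothetical reading of such a variant is sketched below: equation (12)-style weights are computed separately within each context interval, so the weight vector changes with the context. It reuses the deviation_ratio helper from the earlier sketch.

    import numpy as np

    def context_varying_weighting(X, y, context, bin_edges):
        # Unlike equation (12), the weight vector here is a function of the
        # context: each interval gets its own inter/intra deviation ratios.
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        out = np.empty_like(X)
        c = np.digitize(context, bin_edges)
        for interval in np.unique(c):
            rows = (c == interval)
            out[rows] = X[rows] * deviation_ratio(X[rows], y[rows])
        return out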

Note that equation (11) is a linear transformation of the data when the context c is constant, but it is a nonlinear transformation when the context is variable. Equation (12) is a linear transformation of the data, both when the context c is constant and when it is variable, since the weight w_j is fixed; it does not vary with the context c.

Of the three classification algorithms, IBL gained the
most from contextual normalization and contextual
weighting. The form of IBL that was used here (single-
nearest neighbor with sum of absolute values as a distance
measure) is particularly sensitive to the scales of the
features. If one feature ranges from 0 to 100 and the
remaining features range from 0 to 1, then the first feature
will have much more influence on the distance measure
than the remaining features. Therefore IBL can benefit significantly from contextual normalization, which attempts to equalize scales. MLR and CC are designed to be unaffected by linear transformations of the features. Therefore
they do not favor features with larger ranges. However,
this strength is also a weakness, because MLR and CC
cannot benefit from preprocessing of the data that
increases the scale of more significant variables. For
example, contextual weighting (using equation (12)) has
no effect on MLR and it has only minor effects on CC.
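
A small numeric illustration of this scale sensitivity, with made-up values:

    import numpy as np

    # Single-nearest-neighbor distance as the sum of absolute differences.
    a = np.array([50.0, 0.2, 0.9])   # feature 1 ranges over 0..100
    b = np.array([55.0, 0.8, 0.1])   # features 2 and 3 range over 0..1

    diffs = np.abs(a - b)            # [5.0, 0.6, 0.8]
    print(diffs.sum())               # 6.4 -- feature 1 contributes 5.0 of it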


