It seems natural that contextual normalization and contextual weighting combine synergistically. Raw data consist of features that have essentially random scales. The scale of a feature usually has no relation to the importance of the feature for classification. Contextual normalization adjusts the features so that their scales are more equal. It seems plausible that, in many cases, assigning equal scales to the features is better for classification than assigning random scales to the features. Contextual weighting emphasizes the features that are most relevant for classification. Again, it seems plausible that, in many cases, contextual weighting will work better when the features have first been adjusted so that they have equal scales. Thus the synergy found in the experiments reported here is to be expected.
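To make the layering concrete, the following is a minimal sketch (in Python, not part of the original experiments) of how contextual normalization followed by contextual weighting might be embedded in a simple nearest-neighbor (IBL) classifier. The z-score normalization within each context group, the stats and weights dictionaries, and the single-nearest-neighbor rule are illustrative assumptions, not the exact procedures used in the experiments reported here.

import numpy as np

def contextual_normalize(x, mu, sigma):
    """Normalize the primary features of one observation using statistics
    (mu, sigma) estimated separately for its context group."""
    return (x - mu) / sigma

def weighted_distance(a, b, weights):
    """Euclidean distance after emphasizing the more relevant features."""
    return np.sqrt(np.sum((weights * (a - b)) ** 2))

def classify_nearest(x, context, training, stats, weights):
    """Single-nearest-neighbor classification that applies contextual
    normalization first and contextual weighting second.

    training : list of (features, context, label) tuples
    stats    : dict mapping a context to (mu, sigma) arrays
    weights  : dict mapping a context to per-feature relevance weights
    """
    w = weights[context]
    x_norm = contextual_normalize(np.asarray(x), *stats[context])
    best_label, best_dist = None, np.inf
    for features, ctx, label in training:
        t_norm = contextual_normalize(np.asarray(features), *stats[ctx])
        d = weighted_distance(x_norm, t_norm, w)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label

In this sketch the weights are applied only after normalization, which is the ordering that the synergy argument above suggests should help.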
RELATED WORK
The work described here is most closely related to [6]. However, [6] did not give a precise definition of the distinction between contextual features (their terminology: parameters or global features) and primary features (their terminology: features). They examined only contextual classifier selection, using neural networks to classify images, with context such as lighting. They found that contextual classifier selection resulted in increased accuracy and efficiency. They did not address the difficulties that arise when the context in the testing set is different from the context in the training set.
This work is also related to work in speech recognition on speaker normalization [8]. However, the work on speaker normalization tends to be specific to speech recognition. Here, the concern is with general-purpose strategies for exploiting context.
FUTURE WORK
Future work will extend the list of strategies, the list of domains that have been examined, and the list of classification algorithms that have been tested. It may also be possible and interesting to develop a general theory of strategies for exploiting context.
Due to its simplicity, IBL can easily be enhanced with strategies for exploiting context. Other classification algorithms can also be enhanced, although this may require more effort. It should be possible to modify algorithms such as MLR and CC so that they can benefit from a form of contextual weighting. For example, instead of preprocessing the data by multiplying the features by weights, a classification algorithm can be designed to take the original data and the set of weights as two separate sets of inputs. The algorithm can then use the weights to adjust its internal processing of the original data. MLR, for instance, could use the contextual weights to decide which features it should include in its linear equations.
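As an illustration of this idea, here is a minimal sketch (Python with NumPy) of a linear model that accepts the original features and a vector of contextual weights as two separate inputs, and uses the weights only to decide which features enter its linear equation. The 0.1 threshold and the ordinary least-squares fit are illustrative assumptions, not a description of MLR as used in the experiments.

import numpy as np

def fit_with_weights(X, y, weights, threshold=0.1):
    """Fit a linear equation on the original (unscaled) data, using the
    contextual weights only to select which features to include.
    The threshold value is a hypothetical choice for illustration."""
    selected = np.where(np.asarray(weights) > threshold)[0]
    A = np.column_stack([X[:, selected], np.ones(len(X))])  # add intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return selected, coef

def predict(X, selected, coef):
    """Apply the fitted linear equation to new observations."""
    A = np.column_stack([X[:, selected], np.ones(len(X))])
    return A @ coef

The same scheme could be applied per context, with a different weight vector, and hence a different set of selected features, for each value of the contextual features.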
Another possibility is to design classification algorithms that can automatically distinguish primary features from contextual features. The definitions given in equations (1) and (2) should allow automatic distinction.
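Equations (1) and (2) are not reproduced in this section; the following sketch assumes, purely for illustration, that a primary feature is one for which some individual value shifts the empirical class distribution away from the unconditional class distribution. That reading of the definitions, and the tolerance used, are assumptions rather than a transcription of the paper's equations.

import numpy as np

def looks_primary(feature_values, labels, tol=0.05):
    """Flag a discrete feature as primary if, for some value of the
    feature, the class distribution conditioned on that value differs
    from the unconditional class distribution by more than tol.
    Both the criterion and the tolerance are illustrative assumptions."""
    classes = sorted(set(labels))
    n = len(labels)
    base = np.array([labels.count(c) / n for c in classes])
    for v in set(feature_values):
        idx = [i for i, fv in enumerate(feature_values) if fv == v]
        cond = np.array([sum(1 for i in idx if labels[i] == c) / len(idx)
                         for c in classes])
        if np.max(np.abs(cond - base)) > tol:
            return True
    return False

Features that fail this test but still influence the class when combined with other features would then be candidates for the contextual role.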
CONCLUSIONS
The general problem examined here is to accurately classify observations that have context-sensitive features. Examples are: the diagnosis of spinal problems, given that spinal tests are sensitive to the age of the patient; the diagnosis of gas turbine engine faults, given that engine performance is sensitive to ambient weather conditions; the recognition of speech, given that different speakers have different voices; the prognosis of hepatitis, given the patient’s age; the classification of images, given varying lighting conditions. There is clearly a need for general strategies for exploiting contextual information. The results presented here demonstrate that contextual information can be used to increase the accuracy of classifiers, particularly when the context in the testing set is different from the context in the training set.
ACKNOWLEDGMENTS
The gas turbine engine data and engine expertise were
provided by the Engine Laboratory of the NRC, with
funding from DND. The vowel data and the hepatitis data
were obtained from the University of California data
repository (ftp ics.uci.edu, directory /pub/machine-
learning-databases) [10]. The cascade-correlation [5]
software was obtained from Carnegie-Mellon University
(ftp pt.cs.cmu.edu, directory /afs/cs/project/connect/code).
The author wishes to thank Rob Wylie and Peter Clark of
the NRC and two anonymous referees of IEA/AIE-93 for
their helpful comments on this paper.
This paper is an expanded version of a paper that first
appeared in the Proceedings of the European Conference
on Machine Learning, 1993. The author wishes to thank
the conference chairs of both IEA/AIE-93 and ECML-93
for permitting this paper to appear here.
REFERENCES
1. Aha, D.W., Kibler, D., and Albert, M.K., “Instance-based learning algorithms”, Machine Learning, 6, pp. 37-66, 1991.
2. Kibler, D., Aha, D.W., and Albert, M.K., “Instance-based prediction of real-valued attributes”, Computational Intelligence, 5, pp. 51-57, 1989.