context of inductive concept learning, but we are not
aware of any attempt to measure the cost.
10. Cost of Instability
When an induced model is used to gain understanding of
the underlying process that generated the data, it is
important that the model should be stable (Turney, 1995c;
Domingos, 1998). By stability, we mean that, if two
batches of data are generated from the same physical
process, then the two corresponding induced models
should be similar. If the two models are dissimilar, the
learning algorithm is unstable. This is related to the
scientific principle that experiments should be repeatable.
Stability may be seen as a benefit and instability as a cost.
Stability may be increased by acquiring more data (using
a larger training set) or by increasing the bias of the
learning algorithm (Turney, 1995c). Acquiring more data
can be costly (Section 8). Increasing the bias of an
algorithm can increase the misclassification error rate
(Section 2), unless the bias is suitable for the given
learning task. Domingos (1998) presents a meta-learning
algorithm, CMM, that can be used to trade off accuracy
(Section 2), comprehensibility (Section 9.3), and stability.
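To make this notion concrete, stability can be estimated empirically by inducing models from two batches of data drawn from the same source and measuring how often the models agree. The following sketch is purely illustrative and is not taken from any of the cited work; the use of scikit-learn's DecisionTreeClassifier and of prediction agreement as the stability measure are our own assumptions.

```python
# Illustrative sketch (not from the cited work): estimate the stability of a
# learning algorithm as the rate of agreement between models induced from
# two disjoint samples of the same underlying process.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a physical process generating examples.
X, y = make_classification(n_samples=3000, random_state=0)

# Hold out a test set, then split the remainder into two training batches,
# playing the role of two batches of data from the same process.
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)
X_a, X_b, y_a, y_b = train_test_split(
    X_pool, y_pool, test_size=0.5, random_state=2)

model_a = DecisionTreeClassifier(random_state=0).fit(X_a, y_a)
model_b = DecisionTreeClassifier(random_state=0).fit(X_b, y_b)

# Stability is measured here as the fraction of test cases on which the two
# induced models agree; a low agreement rate signals an unstable learner.
agreement = np.mean(model_a.predict(X_test) == model_b.predict(X_test))
print(f"agreement between the two induced models: {agreement:.3f}")
```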
11. Conclusion
In this paper, we have presented a taxonomy of types of
cost in inductive concept learning. It is our hope that this
taxonomy may serve to organize the literature on cost-
sensitive learning and to inspire research into under-
investigated types of cost.
We do not claim that this taxonomy is complete or
unique. No doubt we have omitted important types of
cost, and certainly other researchers would prefer other
taxonomies.
As we worked on this paper, it gradually became clear to
us that the cost of misclassification errors occupies a
unique position in the taxonomy. All of the other costs
that we have discussed here can only be rationally
evaluated in the context of the misclassification error cost
(for the cost of intervention, the unwanted achievement
cost is analogous to the misclassification error cost).
In decision theory (Pearl, 1988) and in the uncertainty in
artificial intelligence literature (Pipitone et al., 1991), test
costs are generally considered in conjunction with
misclassification error costs. However, in the inductive
concept learning literature, it is striking that this has
largely been overlooked. For example, before Turney
(1995a), none of the papers on inductive concept learning
with test costs considered them in the context of
misclassification error costs (Nunez, 1988, 1991; Tan,
1991a, 1991b, 1993). Yet, if all test costs are greater than
the misclassification error cost, then it is never rational to
do any tests; and if the misclassification error cost is
much greater than the cost of any test, then it is rational to
do all of the tests, unless you are certain that they are
irrelevant.
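This argument can be made concrete with a small expected-cost calculation. All of the figures in the following sketch are hypothetical; the point is only that a test is worth performing exactly when its cost is less than the expected reduction in misclassification cost that it buys.

```python
# Hypothetical expected-cost comparison: is a diagnostic test worth its price?
def expected_cost(error_rate: float, misclassification_cost: float,
                  test_cost: float = 0.0) -> float:
    """Expected total cost: cost of any tests performed, plus the expected
    cost of the resulting misclassification errors."""
    return test_cost + error_rate * misclassification_cost

mc_cost = 100.0            # assumed cost of one misclassification error
error_without_test = 0.30  # assumed error rate when classifying untested cases
error_with_test = 0.05     # assumed error rate when the test result is known

for test_cost in (5.0, 40.0, 150.0):
    no_test = expected_cost(error_without_test, mc_cost)
    with_test = expected_cost(error_with_test, mc_cost, test_cost)
    decision = "do the test" if with_test < no_test else "skip the test"
    print(f"test cost {test_cost:5.1f}: "
          f"{with_test:5.1f} (test) vs. {no_test:5.1f} (no test) -> {decision}")
```

In the 150.0 case, the test cost exceeds the misclassification cost itself, so testing can never pay off, as argued above; in the intermediate case, the decision turns on how much the test actually reduces the error rate.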
Similarly, as far as we know, none of the papers on active
learning (Cohn et al., 1995, 1996; Krogh and Vedelsby,
1995; Hasenjager and Ritter, 1998) consider the
misclassification error cost, although we must know
something about the misclassification error cost in order
to rationally determine whether to pay the cost of the
teacher.
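The same reasoning can be sketched for active learning. In the hypothetical decision rule below, a query to the teacher is rational only when the expected saving in future misclassification costs exceeds the cost of the label; the function and all of its parameters are illustrative assumptions, not drawn from the cited papers.

```python
# Hypothetical decision rule for active learning with a costly teacher.
def worth_querying(expected_error_drop: float,
                   future_cases: int,
                   misclassification_cost: float,
                   label_cost: float) -> bool:
    """Pay the teacher only if the expected saving in misclassification
    cost over future cases exceeds the cost of the label (assumed units)."""
    expected_saving = expected_error_drop * future_cases * misclassification_cost
    return expected_saving > label_cost

# A label expected to cut the error rate by 0.001, applied over 1000 future
# cases at a cost of 10 per error, saves 10.0 in expectation.
print(worth_querying(0.001, 1000, 10.0, label_cost=5.0))   # True: query
print(worth_querying(0.001, 1000, 10.0, label_cost=25.0))  # False: too costly
```

Without some estimate of the misclassification error cost, no such comparison is possible, which is the point of the observation above.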
Acknowledgements
Thanks to Eibe Frank, Tom Fawcett, and Foster Provost
for helpful comments on an earlier version of this paper.
References
Breiman, L., Friedman, J., Olshen, R., and Stone, C.
(1984). Classification and regression trees. California:
Wadsworth.
Cohn, D.A., Ghahramani, Z., and Jordan, M.I. (1995).
Active learning with statistical models. In Tesauro, G.,
Touretzky, D., and Leen, T. (eds.), Advances in Neural
Information Processing Systems 7, pp. 705-712. MIT
Press, Cambridge, MA.
Cohn, D.A., Ghahramani, Z., and Jordan, M.I. (1996).
Active learning with statistical models. Journal of
Artificial Intelligence Research, 4, 129-145.
Domingos, P. (1998). Knowledge discovery via multiple
models. Intelligent Data Analysis, 2, 187-202.
Fawcett, T. (1993). Feature Discovery for Problem
Solving Systems. Doctoral dissertation, Department
of Computer Science, University of Massachusetts,
Amherst, MA.
Fawcett, T., and Provost, F.J. (1996). Combining data
mining and machine learning for effective user
profiling. In Proceedings of the Second International
Conference on Knowledge Discovery and Data Mining,
KDD-96, pp. 8-13.
Fawcett, T., and Provost, F.J. (1997). Adaptive fraud
detection. Data Mining and Knowledge Discovery,
1(3), 291-316.
Fawcett, T., and Provost, F.J. (1999). Activity monitoring:
Noticing interesting changes in behavior. In
Proceedings of the Fifth International Conference on
Knowledge Discovery and Data Mining, KDD-99.
Hasenjager, M., and Ritter, H. (1998). Active learning
with local models. Neural Processing Letters, 7(2), 107-
117.
Hermans, J., Habbema, J.D.F., and Van der Burght, A.T.
(1974). Cases of doubt in allocation problems, k
populations. Bulletin of the International Statistical
Institute, 45, 523-529.