Case-based reasoning, for example, typically has a low
dynamic complexity during training, but a high dynamic
complexity during testing. On the other hand, neural
networks typically have a high dynamic complexity
during training, but a low dynamic complexity during
testing.
8. Cost of Cases
There is often a cost associated with acquiring cases (i.e.,
examples, feature vectors). Typically a machine learning
researcher is given a small set of cases, and acquiring
further cases is either very expensive or practically
impossible. This is why many papers are concerned with
the “learning curve” (performance as a function of the
sample size) of a machine learning algorithm.
8.1 Cost of Cases for a Batch Learner
Suppose that we plan to use a batch learning algorithm to
build a model that will be embedded in a certain software
system. The model will be built once, using a set of
training data. The software system will perform some
task, using the embedded model, a certain number of
times over the operational lifetime of the system.
For a given learning algorithm, if we can estimate (1) the
learning curve (the relation between training set size and
misclassification error rate), (2) the expected number of
classifications that the learned model will make when
embedded in the operational system, over the lifetime of
the system, (3) the cost of misclassification errors, and (4)
the cost of acquiring cases for training data, then we can
calculate the combined cost of training (building the
model) and operating (using the model) as a function of
training set size. We can then optimize the size of the
training set to minimize this combined cost (Provost et
al., 1999).
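The combined-cost calculation above can be sketched as follows. The learning-curve model (an inverse power law) and all cost figures are illustrative assumptions, not values taken from Provost et al. (1999).

```python
def error_rate(n):
    """(1) Assumed learning curve: misclassification rate as a
    function of training set size n (inverse power law)."""
    return 0.05 + 0.5 * n ** -0.5

def combined_cost(n,
                  cost_per_case=2.0,                 # (4) cost of acquiring one case
                  lifetime_classifications=100_000,  # (2) classifications over lifetime
                  cost_per_error=1.0):               # (3) cost of one misclassification
    """Cost of training (acquiring n cases) plus cost of operating
    (expected misclassifications over the system's lifetime)."""
    training = n * cost_per_case
    operating = lifetime_classifications * error_rate(n) * cost_per_error
    return training + operating

# Optimize training set size by brute-force search over candidate sizes.
best_n = min(range(10, 20_001, 10), key=combined_cost)
```

Under these assumed figures the optimum is interior: a small training set leaves the operating cost high, while a very large one makes acquisition cost dominate.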
Alternatively, an adaptive learning system, given (1) the
expected number of classifications that the learned model
will make when embedded in the operational system, (2)
the cost of misclassification errors, and (3) the cost of
acquiring cases for training data, could adjust its learning
curve (fast but naïve versus slow but sophisticated) and
training set size to optimize the combined cost of training
and operating.
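The adaptive choice described above can be sketched by comparing two hypothetical learners, one fast but naïve and one slow but sophisticated, over the same grid of training set sizes. Both learning curves and the extra per-case training cost of the sophisticated learner are assumptions for illustration.

```python
LIFETIME = 100_000       # (1) expected classifications over system lifetime
COST_PER_ERROR = 1.0     # (2) cost of one misclassification
COST_PER_CASE = 2.0      # (3) cost of acquiring one training case

curves = {
    # name: (assumed error-rate curve, extra training cost per case)
    "fast_naive":         (lambda n: 0.10 + 0.5 * n ** -0.5, 0.0),
    "slow_sophisticated": (lambda n: 0.03 + 0.5 * n ** -0.5, 1.0),
}

def combined_cost(name, n):
    """Training cost plus expected operating cost for the named learner."""
    err, extra = curves[name]
    training = n * (COST_PER_CASE + extra)
    operating = LIFETIME * err(n) * COST_PER_ERROR
    return training + operating

# Jointly optimize the choice of learner and the training set size.
best = min(((name, n) for name in curves for n in range(10, 20_001, 10)),
           key=lambda p: combined_cost(*p))
```

With a long operational lifetime, the lower asymptotic error of the sophisticated learner outweighs its higher training cost; with a short lifetime, the balance can tip the other way.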
8.2 Cost of Cases for an Incremental Learner
Suppose that we plan to use an incremental learning
algorithm to build a model that will be embedded in a
certain software system. Unlike the batch learning
scenario, the model will be continuously refined over the
operational lifetime of the system. However, it is likely
that the software system cannot be operationally deployed
without any training. We must decide how many training
cases we should give to the incremental learner before it
becomes sufficiently reliable to deploy the software
system. To make this decision rationally, we need to
assign a cost to acquiring cases for training data. The
situation is similar to the batch learning situation, except
that we suppose that the misclassification error rate will
continue to decrease after the software system is
deployed.
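The deployment decision can be sketched as follows, under assumed figures: cases acquired before deployment cost money, while after deployment the incremental learner keeps refining itself on the cases it classifies, so the error rate continues to fall during operation.

```python
def error_rate(n_seen):
    """Assumed learning curve over the total number of cases seen so far."""
    return 0.05 + 0.5 * (n_seen + 1) ** -0.5

def combined_cost(n0, lifetime=20_000, cost_per_case=2.0, cost_per_error=50.0):
    """Cost of acquiring n0 cases before deployment, plus expected
    misclassification cost over the operational lifetime, during which
    the error rate keeps decreasing as more cases are seen."""
    acquisition = n0 * cost_per_case
    # Re-evaluate the error rate as the model sees each deployed case.
    expected_errors = sum(error_rate(n0 + t) for t in range(lifetime))
    return acquisition + expected_errors * cost_per_error

# How many cases should the learner see before deployment?
best_n0 = min(range(0, 2_001, 50), key=combined_cost)
```

With a high misclassification cost, some pre-deployment training pays for itself; as the cost of errors falls, the optimum shifts toward deploying with fewer (or no) purchased cases and letting the free operational cases do the training.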
9. Human-Computer Interaction Cost
There is a human cost to using inductive learning
software. This cost includes finding the right features for
describing the cases, finding the right parameters for
optimizing the performance of the learning algorithm,
converting the data to the format required by the learning
algorithm, analyzing the output of the learning algorithm,
and incorporating domain knowledge into the learning
algorithm or the learned model.
9.1 HCI Cost of Data Engineering
By “data engineering”, we mean the steps required to
prepare the data so that they are suitable for a standard
inductive concept learning algorithm. This includes
finding the right features and converting the data to the
required format. Although there has been some discussion
of the issues involved in data engineering (Turney et al.,
1995), we are not aware of any attempt to measure the
HCI costs involved in data engineering.
9.2 HCI Cost of Parameter Setting
Most learning algorithms have a number of parameters
that affect their performance, often by adjusting their bias.
There is a cost involved in determining the best parameter
settings. Often cross-validation is used to set the
parameters (Breiman et al., 1984). Again, we are not
aware of any attempt to measure the HCI costs of
parameter setting.
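Cross-validation for parameter setting can be sketched with a toy example: choosing the neighborhood size k of a one-dimensional k-nearest-neighbor classifier by 5-fold cross-validation. The data and the candidate parameter values are illustrative assumptions.

```python
import random

random.seed(0)
# Toy data: class 0 clusters near 0.0, class 1 near 1.0, with noise.
data = [(random.gauss(c, 0.4), c) for c in (0, 1) for _ in range(50)]
random.shuffle(data)

def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if votes * 2 > k else 0

def cv_error(k, folds=5):
    """Mean misclassification rate of k-NN under k-fold cross-validation."""
    errors = 0
    for f in range(folds):
        test = data[f::folds]
        train = [p for i, p in enumerate(data) if i % folds != f]
        errors += sum(knn_predict(train, x, k) != y for x, y in test)
    return errors / len(data)

# Select the parameter value with the lowest cross-validated error.
best_k = min([1, 3, 5, 7, 9], key=cv_error)
```

This automates one parameter choice, but the HCI cost discussed above remains: someone must still decide which parameters to tune, over what candidate values, and how many folds to use.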
9.3 HCI Cost of Analysis of Learned Models
There is a human cost associated with understanding
induced models, which is particularly important when the
aim of inductive concept learning is to gain insight into
the physical process that generated the data, rather than to
predict the class of future cases. This is often discussed in
the decision tree induction literature, where it is (crudely)
measured by the number of nodes in the induced decision
tree (Mingers, 1989).
9.4 HCI Cost of Incorporating Domain Knowledge
Several researchers have examined ways of embedding
domain knowledge in a learning algorithm (Opitz and
Shavlik, 1997). It has often been observed, in the context
of expert system construction, that acquiring domain
knowledge from a domain expert is a major bottleneck.
We suppose that it would also be a bottleneck in the