6.3. Error analysis
In Section 6.2 we observed that the style features were
the best overall indicator for detecting genre; however, the
situation may be more complicated than such a conclusion
suggests. To understand the results in Section
6.2 fully, a thorough error analysis is necessary. In Tables 5, 6,
and 7, we have displayed the errors as six-by-six
confusion matrices. The genre class names have been
denoted by their abbreviated names to save space. As a
reminder, AM stands for Academic Monograph, BF
stands for Book of Fiction, BR stands for Business
Report, M stands for Minutes, P stands for Periodicals,
and T stands for Thesis.
We have used two different measures of the
confusion level displayed by the classifier: one based on
belief ([8]) and another based on error impact. The belief
BC(C1:C2) of a classifier C that class C1 is class C2 is the
number of documents in class C1 labelled as being in C2
divided by the number of documents in class C1. The
error impact EC(C1:C2) of the class C1 in the documents
labelled by the classifier C as C2 measures the percentage
of errors arising from the predicted labels of documents
in class C1 within the errors arising from the classifier's
decision to label documents as belonging to C2. More
precisely, if C1 = C2, EC(C1:C2) is defined to be 0, and
if C1 ≠ C2, EC(C1:C2) is defined to be the number of
documents of class C1 which have been labelled as
belonging to C2 divided by the total number of
documents incorrectly labelled as belonging to class C2.
To compare values across classes, we have compensated
for the different numbers of documents in each class by
dividing BC(C1:C2) by the sum of BC(C1:C2) over all
C1, and EC(C1:C2) by the sum of EC(C1:C2) over all
C2. If the relevant sum is zero, we simply define the
normalised belief or error impact to be zero. We retain
the same notation for belief and error impact to denote
the normalised quantities.
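As an illustrative sketch (not the authors' code), the belief and error-impact matrices, with the normalisation just described, can be computed from a raw confusion matrix as follows; the function and variable names are our own.

```python
# Sketch: belief B(C1:C2) and error impact E(C1:C2) from a confusion
# matrix conf, where conf[c1][c2] counts documents of true class c1
# labelled as c2. Normalisation follows the text: B is divided by its
# sum over C1, E by its sum over C2, with zero sums left as zero.

def belief(conf):
    """B(c1:c2): fraction of class-c1 documents labelled c2, then
    normalised over c1 (i.e. column-wise)."""
    n = len(conf)
    b = [[0.0] * n for _ in range(n)]
    for c1 in range(n):
        row_total = sum(conf[c1])
        for c2 in range(n):
            b[c1][c2] = conf[c1][c2] / row_total if row_total else 0.0
    for c2 in range(n):
        col = sum(b[c1][c2] for c1 in range(n))
        if col:
            for c1 in range(n):
                b[c1][c2] /= col
    return b

def error_impact(conf):
    """E(c1:c2): share of the errors labelled c2 that came from class
    c1 (zero on the diagonal), then normalised over c2 (row-wise)."""
    n = len(conf)
    e = [[0.0] * n for _ in range(n)]
    for c2 in range(n):
        wrong = sum(conf[c1][c2] for c1 in range(n) if c1 != c2)
        for c1 in range(n):
            if c1 != c2 and wrong:
                e[c1][c2] = conf[c1][c2] / wrong
    for c1 in range(n):
        row = sum(e[c1])
        if row:
            for c2 in range(n):
                e[c1][c2] /= row
    return e
```

For a six-class problem such as the one here, `conf` would be one of the six-by-six matrices of Tables 5 to 7.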
We have introduced error impact in contrast to
belief because belief is heavily influenced by the overall
performance of the classifier itself. That is, having a high
level of correct beliefs greatly reduces the incorrect
beliefs of the classifier. In contrast, the number of
academic monographs labelled correctly as Academic
Monograph has far less influence on the relative
distribution of classes amongst the documents which
have been incorrectly
labelled Academic Monograph. We deemed error impact
to be a better metric for accentuating the differences in
confusion levels between classes within the performance
of a single classifier.
Between two different classes C1 and C2, the
confusion level on the basis of belief, CB(C1:C2), is
defined to be CB(C1:C2) = BC(C1:C2) + BC(C2:C1), and
the confusion level on the basis of error impact,
CE(C1:C2), is defined to be CE(C1:C2) = EC(C1:C2) +
EC(C2:C1).
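A hypothetical sketch of how these pairwise levels, and hence a table like Table 8, could be assembled: given the normalised belief (or error-impact) matrix for each feature type, take the symmetric sum for a pair of classes and keep the feature type(s) attaining the minimum. The names and structure below are our assumptions, not the authors' implementation.

```python
# Sketch: picking, for each genre pair, the feature type(s) with the
# lowest pairwise confusion level.

def confusion_level(m, c1, c2):
    # CB(C1:C2) = B(C1:C2) + B(C2:C1) when m is the belief matrix;
    # CE(C1:C2) = E(C1:C2) + E(C2:C1) when m is the error-impact matrix.
    return m[c1][c2] + m[c2][c1]

def lowest_confusion(matrices, c1, c2):
    """matrices maps a feature-type name (e.g. 'style', 'image',
    'Rainbow') to its normalised B or E matrix; returns every feature
    type tied for the minimum confusion level on the pair (c1, c2)."""
    levels = {name: confusion_level(m, c1, c2)
              for name, m in matrices.items()}
    best = min(levels.values())
    return sorted(name for name, lvl in levels.items() if lvl == best)
```

Ties explain why some cells of Table 8 list two feature types.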
Table 8. Feature types with lowest pairwise confusion
level on two confusion metrics.
Genre pair | CB             | CE
AM | BF    | style, Rainbow | Rainbow
AM | BR    | style          | style
AM | M     | style          | Rainbow
AM | P     | style          | style
AM | T     | style          | image
BF | BR    | style, Rainbow | style
BF | M     | style          | style
BF | P     | style          | image
BF | T     | style, Rainbow | Rainbow
BR | M     | style, Rainbow | Rainbow
BR | P     | style          | style
BR | T     | style, Rainbow | style
M  | P     | style          | image
M  | T     | style          | style
P  | T     | image          | image
The contents of Table 8 indicate the feature type
of the classifier exhibiting the lowest confusion level
between the pair of genre classes given in the two
leftmost columns, according to the confusion metric noted in the
top row. Both feature types are listed where their
confusion levels were equal.
Both metrics agree that style displays the lowest
level of confusion in differentiating the pairs Academic
Monograph and Business Report, Academic Monograph
and Periodicals, Book of Fiction and Minutes, Business
Report and Periodicals, and Minutes and Thesis, and that
image displays the lowest level for Periodicals and
Thesis (see Table 8). However, we would ideally like to
minimise both error impact and out-of-class belief. For
each pair of classes in Table 8, if we combine the features
which have been calculated to have the lowest level of
confusion on the basis of belief and error impact, the
results seem to support our intuition. For example, style
and image would be estimated as the best features to