are compared in detail across the six genres (Sections 6.2
and 6.3).
5.2. Evaluation
The results, apart from those reported in Section
6.3, have been evaluated with three conventional metrics
for classification: accuracy, precision and recall. To make
precise what we mean by these terms, let N be the total
number of documents in the test data, Nc the number of
documents in the class C, TP(C) the number of
documents correctly predicted to be a member of class C,
and FP(C) the number of documents incorrectly predicted
as belonging to class C. Accuracy A is defined to be
\[ A = \frac{1}{N} \sum_{C} TP(C), \]
the precision P(C) of class C is defined to be
\[ P(C) = \frac{TP(C)}{TP(C) + FP(C)}, \]
and the recall R(C) of class C is defined to be
\[ R(C) = \frac{TP(C)}{N_C}. \]
Although there is some debate about the suitability of
accuracy, precision and recall as measures for
information retrieval tasks, for classification tasks they
are still deemed a reasonable indicator of classifier
performance.
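The three metrics can be computed directly from their definitions. The following sketch is our own illustration (the function and variable names are not from the paper): it tallies true and false positives per class over a list of gold and predicted labels.

```python
from collections import Counter

def evaluate(y_true, y_pred, classes):
    """Accuracy, per-class precision, and per-class recall,
    following the definitions of A, P(C), and R(C) above."""
    N = len(y_true)
    tp = Counter()          # TP(C): documents correctly predicted as C
    fp = Counter()          # FP(C): documents incorrectly predicted as C
    n_c = Counter(y_true)   # N_C: documents actually in class C
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[p] += 1
        else:
            fp[p] += 1
    accuracy = sum(tp[c] for c in classes) / N
    precision = {c: tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
                 for c in classes}
    recall = {c: tp[c] / n_c[c] if n_c[c] else 0.0 for c in classes}
    return accuracy, precision, recall
```

For example, with gold labels `["a", "a", "b", "b"]` and predictions `["a", "b", "b", "b"]`, accuracy is 0.75, precision is 1.0 for class a and 2/3 for class b, and recall is 0.5 for class a and 1.0 for class b.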
6. Results
6.1. Overall accuracy
The overall accuracies of classifiers built on each
feature type across statistical methods are reported in Table
2 (best performances are indicated in bold face).
The tests on the two datasets consistently
indicate Naive Bayes as the best statistical method for
image features. Although the overall accuracies of Naive
Bayes and Random Forest are comparable on the larger
dataset, averaging the performances on the two datasets
(with a heavier weight on the larger set) suggests Naive
Bayes as the better performer for image. On both datasets,
Support Vector Machine and Random Forest are
better than Naive Bayes for style features. Although
Support Vector Machine and Random Forest perform
comparably on the smaller Dataset I, we have chosen
Random Forest as the better choice for style, because the
difference was shown to be prominent on Dataset II. We
have chosen Naive Bayes for Rainbow for comparison on
Dataset I, and Support Vector Machine for Rainbow on
Dataset II: in both cases the difference in performance
was too large to indicate an overall better method for
Rainbow.
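The size-weighted comparison for image features can be reproduced from the Table 2 figures. The exact weighting scheme is not stated in the text; the sketch below assumes a plain mean weighted by dataset size (103 and 494 items), which favours Naive Bayes over Random Forest as described.

```python
# Dataset sizes and image-feature accuracies taken from Table 2.
sizes = {"I": 103, "II": 494}
image_acc = {
    "NB": {"I": 0.524, "II": 0.48},
    "RF": {"I": 0.417, "II": 0.48},
}

def weighted_avg(acc):
    """Mean accuracy weighted by the number of items in each dataset."""
    total = sum(sizes.values())
    return sum(sizes[d] * acc[d] for d in sizes) / total

nb = weighted_avg(image_acc["NB"])  # ~0.488
rf = weighted_avg(image_acc["RF"])  # ~0.469
```

Under this assumed weighting, Naive Bayes edges out Random Forest on image features despite their tie on the larger dataset.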
In passing, we observe that, based on the overall
accuracies of the classifiers on the two datasets, the
classifiers based on image features are the least affected
by training dataset size (average difference in accuracy
0.036), while the classifiers based on Rainbow are the most
affected by dataset size (average difference in accuracy
0.328). The results also indicate that Support Vector
Machine and Random Forest seem more affected by
dataset size than Naive Bayes.
6.2. Precision and recall
In this section we compare the precision and
recall across genres of the classifiers for each feature type
that were shown to have the best overall accuracies
in the previous section (on Dataset I: image NB, style RF,
and Rainbow NB; on Dataset II: image NB, style RF, and
Rainbow SVM). The figures in Tables 3 and 4 show the
precision and recall across the six genres of each classifier
tested on Datasets I and II. The genres are indicated in the
leftmost column of the tables, with the number of
documents in each class noted in parentheses. The
classifiers being tested are indicated in parentheses at the
top of each of the following columns.
Table 2. Overall accuracy of feature types across statistical methods

| Feature type | Dataset I (103 items) |       |           | Dataset II (494 items) |           |           |
|              | NB        | SVM   | RF        | NB        | SVM       | RF        |
|--------------|-----------|-------|-----------|-----------|-----------|-----------|
| image        | **0.524** | 0.35  | 0.417     | **0.48**  | 0.395     | **0.48**  |
| style        | 0.505     | 0.573 | **0.641** | 0.63      | 0.724     | **0.828** |
| Rainbow      | **0.428** | 0.25  | N/A       | 0.618     | **0.715** | N/A       |