Table 3. Genre classification across six classes on Dataset I, 10-fold cross-validation.

| Genre (no. items) | Image NB precision | Image NB recall | Style RF precision | Style RF recall | Rainbow NB precision | Rainbow NB recall |
|---|---|---|---|---|---|---|
| Academic Monograph | 0.462 | 0.375 | 0.643 | 0.563 | 0.241 | 0.217 |
| Book of Fiction (16) | 0.400 | 0.125 | 0.813 | 0.813 | 0.763 | 0.971 |
| Business Report (15) | 0.273 | 0.200 | 0.667 | 0.400 | 0.453 | 0.173 |
| Minutes (19) | 0.667 | 0.526 | 0.560 | 0.737 | 0.767 | 0.272 |
| Periodicals | 0.773 | 0.895 | 0.565 | 0.684 | 0.232 | 0.570 |
| Thesis (19) | 0.432 | 0.889 | 0.688 | 0.611 | 0.541 | 0.377 |
Table 4. Genre classification across six genres on Dataset II, 10-fold cross-validation.

| Genre (no. items) | Image NB precision | Image NB recall | Style RF precision | Style RF recall | Rainbow NB precision | Rainbow NB recall |
|---|---|---|---|---|---|---|
| Academic Monograph | 0.250 | 0.101 | 0.718 | 0.747 | 0.740 | 0.411 |
| Book of Fiction (29) | 0.111 | 0.069 | 0.923 | 0.828 | 0.931 | 0.807 |
| Business Report (100) | 0.385 | 0.050 | 0.825 | 0.850 | 0.797 | 0.609 |
| Minutes (99) | 0.604 | 0.818 | 0.913 | 0.949 | 0.910 | 0.874 |
| Periodicals | 0.425 | 0.716 | 0.774 | 0.716 | 0.457 | 0.794 |
| Thesis (100) | 0.517 | 0.910 | 0.866 | 0.840 | 0.696 | 0.893 |
The results in Table 3 indicate that, on Dataset I, both the precision and recall of image NB for Periodicals are much higher than those of the other two classifiers. They also indicate that academic monographs and business reports are best recognised by style RF. Books of fiction seem to be best distinguished by style RF and Rainbow NB, but the two classifiers appear to behave in a complementary fashion (where one achieves better recall, the other achieves better precision). For the genre class Thesis, a similar complementary relationship holds between image NB and style RF.
Performance on the genre class Minutes is harder to judge: on the basis of precision, Rainbow NB outperforms the other two classifiers, but, on the basis of recall, style RF outperforms Rainbow NB. The comparison is further complicated by the observation that the average of precision and recall (given equal weight) suggests style RF as the best performer.
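The trade-off for Minutes can be checked directly from the values in Table 3. The sketch below (in Python, with the precision/recall pairs copied from the Minutes row) computes both the equal-weight arithmetic average and the F1 score (harmonic mean), which weights the two rates differently:

```python
# Precision/recall for the Minutes class, taken from Table 3 (Dataset I).
minutes = {
    "image NB":   (0.667, 0.526),
    "style RF":   (0.560, 0.737),
    "Rainbow NB": (0.767, 0.272),
}

# Equal-weight (arithmetic) average per classifier.
averages = {name: (p + r) / 2 for name, (p, r) in minutes.items()}
# F1 score (harmonic mean), which penalises imbalance between the two rates.
f1 = {name: 2 * p * r / (p + r) for name, (p, r) in minutes.items()}

for name in minutes:
    print(f"{name}: mean = {averages[name]:.3f}, F1 = {f1[name]:.3f}")
```

Under either summary, Rainbow NB's high precision does not compensate for its low recall on this class, which is why a single ranking of the three classifiers is hard to justify.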
On the basis of average performance taken over precision and recall, the results in Table 4 present style RF as the best overall performer. The precision of style RF is better than that of both other classifiers for all classes except academic monographs and books of fiction, and its recall is better for all classes except Periodicals and Thesis. The classifier image NB shows the best recall for detecting theses and a comparable recall for detecting periodicals.