Examining Variations of Prominent Features in Genre Classification



Table 3. Genre classification across six classes on Dataset I, 10-fold cross validation.

Genre (no. items)                         image NB           style RF           Rainbow NB
                                          Prec.    Recall    Prec.    Recall    Prec.    Recall
Academic Monograph (16)                   0.462    0.375     0.643    0.563     0.241    0.217
Book of Fiction (16)                      0.4      0.125     0.813    0.813     0.763    0.971
Business Report (15)                      0.273    0.2       0.667    0.4       0.453    0.173
Minutes (19)                              0.667    0.526     0.56     0.737     0.767    0.272
Periodicals (Newspaper, Magazine) (19)    0.773    0.895     0.565    0.684     0.232    0.570
Thesis (19)                               0.432    0.889     0.688    0.611     0.541    0.377

Table 4. Genre classification across six genres on Dataset II, 10-fold cross validation.

Genre (no. items)                         image NB           style RF           Rainbow SVM
                                          Prec.    Recall    Prec.    Recall    Prec.    Recall
Academic Monograph (99)                   0.25     0.101     0.718    0.747     0.74     0.411
Book of Fiction (29)                      0.111    0.069     0.923    0.828     0.931    0.807
Business Report (100)                     0.385    0.05      0.825    0.85      0.797    0.609
Minutes (99)                              0.604    0.818     0.913    0.949     0.91     0.874
Periodicals (Newspaper, Magazine) (67)    0.425    0.716     0.774    0.716     0.457    0.794
Thesis (100)                              0.517    0.91      0.866    0.84      0.696    0.893

The results in Table 3 indicate that, on Dataset I, both the precision and the recall of image NB with respect to Periodicals are much higher than those of the other two classifiers. On the other hand, the results indicate that academic monographs and business reports are best recognised by style RF. Books of fiction seem to be best distinguished by style RF and Rainbow NB, but we also observe that the two classifiers seem to play complementary roles (that is, where one has better recall, the other has better precision). With the genre class Thesis, the complementary relationship seems to hold between image NB and style RF.
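The complementary relationship described above can be checked mechanically. The following is a minimal sketch, not part of the original experiments: the figures are copied from Table 3, while the function names and the strict-inequality definition of "complementary" are assumptions made for illustration.

```python
# Precision/recall pairs copied from Table 3 (Dataset I).
# The dictionary layout and function names are illustrative assumptions.
TABLE3 = {
    "Book of Fiction": {
        "image NB": (0.4, 0.125),
        "style RF": (0.813, 0.813),
        "Rainbow NB": (0.763, 0.971),
    },
    "Thesis": {
        "image NB": (0.432, 0.889),
        "style RF": (0.688, 0.611),
        "Rainbow NB": (0.541, 0.377),
    },
}


def is_complementary(a, b):
    """True when one classifier has strictly better precision
    and the other has strictly better recall."""
    (pa, ra), (pb, rb) = a, b
    return (pa > pb and ra < rb) or (pa < pb and ra > rb)


def complementary_pairs(scores):
    """Return all classifier pairs that trade precision against recall."""
    names = list(scores)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if is_complementary(scores[x], scores[y])
    ]


for genre, scores in TABLE3.items():
    print(genre, complementary_pairs(scores))
```

Under this definition, image NB also trades off against Rainbow NB on Thesis, but Rainbow NB is beaten by style RF on both measures there, so the pairing of image NB with style RF is the informative one.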

The performance on the genre class Minutes is harder to interpret: on the basis of precision, Rainbow NB shows a higher rate than the other two classifiers, but, on the basis of recall, style RF outperforms Rainbow NB. The comparison is further complicated by the observation that the average of precision and recall (given equal weight) suggests image NB as the best performer.

On the basis of average performance taken over precision and recall, the results in Table 4 present style RF as the best overall performer. The precision of style RF is better than that of both of the other classifiers for all classes except academic monographs and books of fiction, and its recall is better for all classes except Periodicals and Thesis. The classifier image NB shows the best recall rate for detecting theses and displays a comparable recall rate for detecting periodicals.
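The averaging behind this comparison can be sketched as follows. This is an illustrative reconstruction only: the figures are copied from Table 4, and the choice of an unweighted mean over the six genres (rather than one weighted by the number of items per genre) is an assumption, as are the function names.

```python
# Precision/recall pairs copied from Table 4 (Dataset II).
# Column order per genre: image NB, style RF, Rainbow SVM.
CLASSIFIERS = ("image NB", "style RF", "Rainbow SVM")
TABLE4 = {
    "Academic Monograph": ((0.25, 0.101), (0.718, 0.747), (0.74, 0.411)),
    "Book of Fiction": ((0.111, 0.069), (0.923, 0.828), (0.931, 0.807)),
    "Business Report": ((0.385, 0.05), (0.825, 0.85), (0.797, 0.609)),
    "Minutes": ((0.604, 0.818), (0.913, 0.949), (0.91, 0.874)),
    "Periodicals": ((0.425, 0.716), (0.774, 0.716), (0.457, 0.794)),
    "Thesis": ((0.517, 0.91), (0.866, 0.84), (0.696, 0.893)),
}


def mean_pr(precision, recall):
    """Equal-weight average of precision and recall."""
    return (precision + recall) / 2


def overall_means(table):
    """Unweighted mean over genres of the per-genre precision/recall average
    (assumption: every genre counts equally, regardless of item count)."""
    sums = {name: 0.0 for name in CLASSIFIERS}
    for rows in table.values():
        for name, (p, r) in zip(CLASSIFIERS, rows):
            sums[name] += mean_pr(p, r)
    return {name: total / len(table) for name, total in sums.items()}


means = overall_means(TABLE4)
best = max(means, key=means.get)
for name in CLASSIFIERS:
    print(f"{name}: {means[name]:.3f}")
print("best overall:", best)
```

With these figures the ordering is style RF first, Rainbow SVM second and image NB third, which agrees with the reading of Table 4 given above.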
