Examining Variations of Prominent Features in Genre Classification

Table 3. Genre classification across six classes on Dataset I, 10-fold cross validation.

Genre (no. items)	Precision (image NB)	Recall (image NB)	Precision (style RF)	Recall (style RF)	Precision (Rainbow NB)	Recall (Rainbow NB)
Academic Monograph (16)	0.462	0.375	0.643	0.563	0.241	0.217
Book of Fiction (16)	0.4	0.125	0.813	0.813	0.763	0.971
Business Report (15)	0.273	0.2	0.667	0.4	0.453	0.173
Minutes (19)	0.667	0.526	0.56	0.737	0.767	0.272
Periodicals (Newspaper, Magazine) (19)	0.773	0.895	0.565	0.684	0.232	0.570
Thesis (19)	0.432	0.889	0.688	0.611	0.541	0.377

Table 4. Genre classification across six genres on Dataset II, 10-fold cross validation.

Genre (no. items)	Precision (image NB)	Recall (image NB)	Precision (style RF)	Recall (style RF)	Precision (Rainbow SVM)	Recall (Rainbow SVM)
Academic Monograph (99)	0.25	0.101	0.718	0.747	0.74	0.411
Book of Fiction (29)	0.111	0.069	0.923	0.828	0.931	0.807
Business Report (100)	0.385	0.05	0.825	0.85	0.797	0.609
Minutes (99)	0.604	0.818	0.913	0.949	0.91	0.874
Periodicals (Newspaper, Magazine) (67)	0.425	0.716	0.774	0.716	0.457	0.794
Thesis (100)	0.517	0.91	0.866	0.84	0.696	0.893

The results in Table 3 indicate that, on Dataset I,
both the precision and the recall of image NB with
respect to Periodicals are much higher than those of the
other two classifiers. On the other hand, academic
monographs and business reports are best recognised by
style RF. Books of fiction seem to be best distinguished
by style RF and Rainbow NB, but the two classifiers
appear to play complementary roles (that is, where one
has better recall the other has better precision). For
the genre class Thesis, a similar complementary
relationship appears between image NB and style RF.
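
The complementary relationship described above can be checked directly against the rows of Table 3. The following is a minimal sketch: the helper name is_complementary and the dictionary layout are illustrative assumptions, and only the (precision, recall) values are taken from the table.

```python
# (precision, recall) pairs copied from Table 3 (Dataset I).
fiction = {"style RF": (0.813, 0.813), "Rainbow NB": (0.763, 0.971)}
thesis = {"image NB": (0.432, 0.889), "style RF": (0.688, 0.611)}
periodicals = {"image NB": (0.773, 0.895), "style RF": (0.565, 0.684)}


def is_complementary(a, b):
    """True when one classifier has strictly better precision and the
    other has strictly better recall (hypothetical helper)."""
    (pa, ra), (pb, rb) = a, b
    return (pa > pb and ra < rb) or (pa < pb and ra > rb)


fiction_pair = is_complementary(*fiction.values())
thesis_pair = is_complementary(*thesis.values())
# Periodicals is included as a contrast: image NB leads on both measures.
periodicals_pair = is_complementary(*periodicals.values())
```

Under this definition the Book of Fiction and Thesis pairs qualify, whereas the Periodicals pair does not, since image NB is ahead on both precision and recall.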

The performance on the genre class Minutes is harder
to interpret: on the basis of precision, Rainbow NB
shows a higher rate than the other two classifiers, but,
on the basis of recall, style RF outperforms Rainbow NB.
The comparison is further complicated by the observation
that the average of precision and recall (given equal
weight) suggests image NB as the best performer.
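
The three rankings for Minutes can be recomputed from the corresponding row of Table 3. The sketch below assumes that "average of precision and recall (given equal weight)" means the arithmetic mean; the function name mean_pr is illustrative. With the values as printed in Table 3, this mean comes out highest for style RF rather than image NB, so the ranking stated above may rest on figures or a weighting not reproduced here.

```python
# Minutes row of Table 3 (Dataset I): (precision, recall) per classifier.
minutes = {
    "image NB": (0.667, 0.526),
    "style RF": (0.560, 0.737),
    "Rainbow NB": (0.767, 0.272),
}


def mean_pr(precision, recall):
    """Equal-weight arithmetic mean of precision and recall
    (assumed reading of the averaging used in the text)."""
    return (precision + recall) / 2


# Score each classifier and pick the leader under each criterion.
scores = {name: mean_pr(p, r) for name, (p, r) in minutes.items()}
best_precision = max(minutes, key=lambda name: minutes[name][0])
best_recall = max(minutes, key=lambda name: minutes[name][1])
best_mean = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```

The precision and recall leaders agree with the text (Rainbow NB and style RF respectively); only the equal-weight mean differs from the stated ordering.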

On the basis of average performance taken over
precision and recall, the results in Table 4 present style
RF as the best overall performer. The precision of style
RF is better than that of both of the other classifiers
for all classes except academic monographs and books of
fiction, and its recall is better for all classes except
Periodicals and Thesis. The classifier image NB shows
the best recall rate for detecting theses and displays a
comparable recall rate for detecting periodicals.
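
The overall comparison on Dataset II can be reproduced from Table 4. The sketch below assumes an unweighted (macro) average over the six classes, which the text does not specify; a class-size-weighted average would be an equally plausible reading. The helper names macro_mean and not_best are illustrative.

```python
# Table 4 (Dataset II): (precision, recall) per class and classifier.
table4 = {
    "Academic Monograph": {"image NB": (0.250, 0.101), "style RF": (0.718, 0.747), "Rainbow SVM": (0.740, 0.411)},
    "Book of Fiction": {"image NB": (0.111, 0.069), "style RF": (0.923, 0.828), "Rainbow SVM": (0.931, 0.807)},
    "Business Report": {"image NB": (0.385, 0.050), "style RF": (0.825, 0.850), "Rainbow SVM": (0.797, 0.609)},
    "Minutes": {"image NB": (0.604, 0.818), "style RF": (0.913, 0.949), "Rainbow SVM": (0.910, 0.874)},
    "Periodicals": {"image NB": (0.425, 0.716), "style RF": (0.774, 0.716), "Rainbow SVM": (0.457, 0.794)},
    "Thesis": {"image NB": (0.517, 0.910), "style RF": (0.866, 0.840), "Rainbow SVM": (0.696, 0.893)},
}
classifiers = ["image NB", "style RF", "Rainbow SVM"]


def macro_mean(clf):
    """Unweighted mean of all precision and recall values for one
    classifier (assumed averaging scheme)."""
    values = [v for row in table4.values() for v in row[clf]]
    return sum(values) / len(values)


def not_best(metric_index):
    """Classes where another classifier matches or beats style RF
    on the given metric (0 = precision, 1 = recall)."""
    return [
        genre
        for genre, row in table4.items()
        if any(
            row[c][metric_index] >= row["style RF"][metric_index]
            for c in classifiers
            if c != "style RF"
        )
    ]


overall = {clf: macro_mean(clf) for clf in classifiers}
best_overall = max(overall, key=overall.get)
precision_exceptions = not_best(0)
recall_exceptions = not_best(1)
```

Under these assumptions style RF has the highest overall mean, and the exception lists match the classes named in the text. For Periodicals, style RF and image NB tie on recall (0.716), with Rainbow SVM higher.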
