Table 5. Confusion matrix: Image NB on Dataset II.
classified as ---> | AM | BF | BR |  M |  P |  T
AM                 | 10 |  4 |  4 | 20 | 25 | 36
BF                 |  1 |  2 |  0 |  4 |  7 | 15
BR                 | 17 |  9 |  5 | 16 | 32 | 21
M                  |  6 |  0 |  1 | 81 |  0 | 11
P                  |  5 |  1 |  3 |  8 | 48 |  2
T                  |  1 |  2 |  0 |  5 |  1 | 91
Table 6. Confusion matrix: Style RF on Dataset II.
classified as ---> | AM | BF | BR |  M |  P |  T
AM                 | 74 |  1 |  8 |  3 |  4 |  9
BF                 |  0 | 24 |  0 |  0 |  2 |  0
BR                 |  7 |  0 | 85 |  3 |  4 |  1
M                  |  4 |  0 |  1 | 94 |  0 |  0
P                  |  6 |  0 |  7 |  3 | 48 |  3
T                  |  9 |  1 |  2 |  0 |  4 | 84
Table 7. Confusion matrix: Rainbow SVM on Dataset II.
classified as ---> | AM | BF | BR |  M |  P |  T
AM                 | 41 |  1 |  8 |  3 | 18 | 28
BF                 |  0 | 23 |  1 |  0 |  4 |  1
BR                 |  6 |  0 | 61 |  3 | 28 |  2
M                  |  3 |  1 |  2 | 87 |  4 |  3
P                  |  3 |  0 |  5 |  3 | 53 |  3
T                  |  2 |  0 |  1 |  0 |  8 | 89
Although the experimental results suggest that style RF is
the best overall performer on the two datasets, they do not
identify, for any classifier, genre classes on which that
classifier consistently outperforms the other two.
On closer examination, however, the results do show that
the binary partition of the genre classes into the three
best-performing and the three worst-performing classes is
preserved across the experiments on the two datasets.
These partitions are (Minutes, Periodicals, Thesis) and
(Academic Monograph, Book of Fiction, Business Report) for
image NB, and (Book of Fiction, Minutes, Thesis) and
(Academic Monograph, Business Report, Periodicals) for
style RF and Rainbow SVM.
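
The best-three / worst-three partition can be checked against the Dataset II confusion matrices with a short script. The sketch below is illustrative only and rests on two assumptions that this section does not state: rows are taken to be the actual genre and columns the assigned genre (in line with the "classified as --->" header), and per-class F1 is used as the performance measure. The names TABLES, per_class_f1 and best_three are hypothetical helpers introduced here.

```python
# Genre labels in the order used by Tables 5-7.
LABELS = ["AM", "BF", "BR", "M", "P", "T"]

# Confusion matrices copied from Tables 5-7 (Dataset II).
# Assumption: rows = actual genre, columns = genre assigned by the classifier.
TABLES = {
    "image NB": [
        [10, 4, 4, 20, 25, 36],
        [1, 2, 0, 4, 7, 15],
        [17, 9, 5, 16, 32, 21],
        [6, 0, 1, 81, 0, 11],
        [5, 1, 3, 8, 48, 2],
        [1, 2, 0, 5, 1, 91],
    ],
    "style RF": [
        [74, 1, 8, 3, 4, 9],
        [0, 24, 0, 0, 2, 0],
        [7, 0, 85, 3, 4, 1],
        [4, 0, 1, 94, 0, 0],
        [6, 0, 7, 3, 48, 3],
        [9, 1, 2, 0, 4, 84],
    ],
    "Rainbow SVM": [
        [41, 1, 8, 3, 18, 28],
        [0, 23, 1, 0, 4, 1],
        [6, 0, 61, 3, 28, 2],
        [3, 1, 2, 87, 4, 3],
        [3, 0, 5, 3, 53, 3],
        [2, 0, 1, 0, 8, 89],
    ],
}


def per_class_f1(matrix):
    """Return the F1 score of each class, in row order."""
    size = len(matrix)
    scores = []
    for i in range(size):
        true_pos = matrix[i][i]
        actual = sum(matrix[i])                           # row total
        assigned = sum(matrix[r][i] for r in range(size))  # column total
        denom = actual + assigned
        # F1 = 2*TP / (actual + assigned); defined as 0 when both are empty.
        scores.append(2 * true_pos / denom if denom else 0.0)
    return scores


def best_three(matrix, labels=LABELS):
    """Return the three labels with the highest F1, sorted alphabetically."""
    ranked = sorted(zip(labels, per_class_f1(matrix)),
                    key=lambda pair: pair[1], reverse=True)
    return sorted(label for label, _ in ranked[:3])


for name, matrix in TABLES.items():
    print(name, best_three(matrix))
```

Under these assumptions, the three tables give (M, P, T) for image NB and (BF, M, T) for both style RF and Rainbow SVM, which agrees with the partitions described above for Dataset II. A different measure, such as per-class recall alone, need not give the same ranking.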
The generally low performance of the image features is
partly due to the crude image representation. In the
current model, the image features capture only the first
page of the document, and each pixel value is strongly
anchored to its position. This representation could be
improved by combining representations of several pages of
the document and by softening the positional information
so that it captures the general shape or topology of the
image. Likewise, for style, the size of the dataset and
the variety of the documents used for training and for
compiling word lists should be examined further with a
view to refinement.
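
One possible reading of the suggested image refinement is sketched below. It is not the representation used in the experiments: it assumes each page is a grayscale image given as a list of equally sized rows, softens position by averaging pixel values over square blocks, and combines pages by averaging the pooled grids. The function names and the block size are hypothetical.

```python
def block_average(page, block):
    """Average-pool a 2D grayscale page over block x block cells.

    Each output value summarises a neighbourhood rather than a single
    pixel, so small shifts in layout change the features less.
    """
    rows, cols = len(page), len(page[0])
    pooled = []
    for r in range(0, rows, block):
        pooled_row = []
        for c in range(0, cols, block):
            cell = [page[i][j]
                    for i in range(r, min(r + block, rows))
                    for j in range(c, min(c + block, cols))]
            pooled_row.append(sum(cell) / len(cell))
        pooled.append(pooled_row)
    return pooled


def document_features(pages, block):
    """Combine several pages into one flat feature vector.

    Assumption: all pages have the same dimensions. Each page is pooled
    separately and the pooled grids are averaged cell by cell.
    """
    pooled_pages = [block_average(page, block) for page in pages]
    count = len(pooled_pages)
    height, width = len(pooled_pages[0]), len(pooled_pages[0][0])
    return [sum(p[i][j] for p in pooled_pages) / count
            for i in range(height)
            for j in range(width)]


# Two tiny 2x4 "pages" pooled with 2x2 blocks.
pages = [
    [[0, 0, 4, 4],
     [0, 0, 4, 4]],
    [[2, 2, 0, 0],
     [2, 2, 0, 0]],
]
print(document_features(pages, 2))
```

Larger blocks discard more positional detail; whether this helps genre classification would have to be tested experimentally.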