Examining Variations of Prominent Features in Genre Classification



7. Conclusions

The results in this paper provide evidence that
genre classification is a multi-dimensional task, possibly
composed of several classification tasks, each distinguished
by a different distribution of strengths across feature types.

This research proposes expressing each document genre
class, in context, as an array of strengths that vary
across several feature types. This representation will not
only help us find ways to remedy deficiencies in current
classification methods, by suggesting why selected genres
fail to be detected, but will also enable us to relate
document classes from different classification schemes
through similar or dissimilar distribution patterns.
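The proposed representation can be illustrated with a minimal sketch. Each genre class becomes a vector of strengths, one value per feature type, and classes from two classification schemes are related by comparing those vectors. Everything specific here is a hypothetical illustration and not this paper's method or data: the feature-type names, the genre labels, the strength values, the helper names (`closest_class`, `dominant_feature_type`), and the choice of cosine similarity as the comparison measure.

```python
import math

# Hypothetical feature types, chosen for illustration only.
FEATURE_TYPES = ("image", "style", "language_model", "semantic")


def cosine_similarity(p, q):
    """Cosine similarity of two strength profiles; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def closest_class(profile, scheme):
    """Return (name, similarity) of the class in `scheme` whose profile best matches."""
    best = max(scheme, key=lambda name: cosine_similarity(profile, scheme[name]))
    return best, cosine_similarity(profile, scheme[best])


def dominant_feature_type(profile):
    """Hypothetical helper: the feature type carrying the largest strength."""
    idx = max(range(len(profile)), key=lambda i: profile[i])
    return FEATURE_TYPES[idx]


# Hypothetical strength profiles (one value per entry in FEATURE_TYPES) for
# classes in two different classification schemes; not results from this paper.
scheme_a = {
    "scientific_article": [0.2, 0.6, 0.9, 0.7],
    "form": [0.8, 0.7, 0.1, 0.3],
}
scheme_b = {
    "research_paper": [0.25, 0.55, 0.85, 0.75],
    "application_form": [0.9, 0.6, 0.2, 0.2],
}

# Relate each class in scheme A to its most similar class in scheme B.
for name, profile in scheme_a.items():
    match, score = closest_class(profile, scheme_b)
    print(f"{name} -> {match} (cosine {score:.2f}); "
          f"dominant: {dominant_feature_type(profile)}")
```

With these made-up profiles, `scientific_article` is matched to `research_paper` and `form` to `application_form`. Cosine similarity compares the shape of two distributions rather than their magnitude, which fits the idea of relating classes by distribution pattern. Euclidean distance, or a divergence between normalised profiles, would be equally plausible choices. The `dominant_feature_type` helper hints at how a profile might suggest causes of failure: a classifier lacking the feature type a genre depends on most would be expected to struggle with that genre.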

8. Acknowledgments

Omitted in the draft. To be inserted later.



