Classifying Written Texts Through Rhythmic Features


Author(s): Balint, Mihaela; Dascalu, Mihai; Trausan-Matu, Stefan
Date(s)

27/09/2016

27/09/2016

2016

Abstract

Rhythm analysis of written texts focuses on literary analysis and mainly considers poetry. In this paper we investigate the relevance of rhythmic features for categorizing texts in prosaic form pertaining to different genres. Our contribution is threefold. First, we define a set of rhythmic features for written texts. Second, we extract these features from three corpora: speeches, essays, and newspaper articles. Third, we perform feature selection by means of statistical analyses, and determine a subset of features that efficiently discriminates between the three genres. We find that, using as few as eight rhythmic features, documents can be adequately assigned to a given genre with an accuracy of around 80%, significantly higher than the 33% baseline that results from random assignment.
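
As a rough illustration of the two quantitative ideas in the abstract, the sketch below computes the chance baseline for three genres and extracts a few toy rhythm-like features from a text. The feature names and the punctuation-based splitting are assumptions made for illustration only; the abstract does not list the paper's actual features, and this is not the authors' method.

```python
import re
import statistics


def chance_baseline(num_classes: int) -> float:
    """Expected accuracy when each document is assigned a class uniformly at random."""
    return 1.0 / num_classes


def toy_rhythm_features(text: str) -> dict:
    """Hypothetical stand-in features; NOT the feature set defined in the paper."""
    # Split at punctuation marks, used here as a crude proxy for rhythmic pauses.
    units = [u.split() for u in re.split(r"[.,;:!?]+", text)]
    lengths = [len(u) for u in units if u]  # words per non-empty unit
    if not lengths:
        return {"units": 0, "mean_unit_length": 0.0, "stdev_unit_length": 0.0}
    return {
        "units": len(lengths),
        "mean_unit_length": statistics.mean(lengths),
        # Population standard deviation: how regular the unit lengths are.
        "stdev_unit_length": statistics.pstdev(lengths),
    }


if __name__ == "__main__":
    # Three genres (speeches, essays, newspaper articles) give a 1/3 baseline.
    print(f"baseline: {chance_baseline(3):.0%}")
    print(toy_rhythm_features("We came, we saw, we conquered."))
```

With three genres, uniform random assignment is correct one time in three, which is where the 33% figure comes from. A feature vector like the one returned here could then be passed to any standard classifier.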

This study is part of the RAGE project. The RAGE project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 644187. This publication reflects only the author's view. The European Commission is not responsible for any use that may be made of the information it contains.

Identifier

Balint, M., Dascalu, M., & Trausan-Matu, S. (2016). Classifying Written Texts through Rhythmic Features. In 15th Int. Conf. on Artificial Intelligence: Methodology, Systems, and Applications (AIMSA 2016) (pp. 121–129). Varna, Bulgaria: Springer

http://hdl.handle.net/1820/7055

Publisher

Springer

Relation

info:eu-repo/grantAgreement/EC/H2020/644187/EU/Realising an Applied Gaming Eco-system/RAGE

Rights

openAccess

Keywords

#rhythm #text classification #natural language processing #discourse analysis

Type

conferenceObject