Classification of Texts' Authorship Using a Regression Model on Compressed Data


Author(s): Dackova, Diana; Mateev, Plamen
Date(s)

20/07/2016

20/07/2016

2013

Abstract

2010 Mathematics Subject Classification: 68T50,62H30,62J05.

An algorithm for text authorship identification is proposed. The procedure is based on Kolmogorov complexity and fits regression models to the lengths of the compressed texts. The classification uses the estimates of the regression parameters. Different combinations of compressor parameters and preliminary data processing are examined on prose texts by several classic English authors.
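The abstract does not spell out the exact regression design, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes zlib (DEFLATE) as the compressor and compressed size as a computable upper-bound proxy for Kolmogorov complexity. For each candidate author, growing prefixes of the disputed text are appended to that author's reference text, the increase in compressed size is regressed on prefix length, and the text is attributed to the author with the smallest fitted slope. The helper names (`compressed_len`, `fit_line`, `slope_for_author`, `classify`) are hypothetical, and `level` stands in for one of the compressor parameters the abstract mentions.

```python
import zlib
from typing import Dict, List, Tuple


def compressed_len(data: bytes, level: int = 9) -> int:
    """Size in bytes of the zlib (DEFLATE) output; an upper-bound proxy
    for Kolmogorov complexity (assumption: zlib as the compressor)."""
    return len(zlib.compress(data, level))


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def slope_for_author(reference: str, unknown: str,
                     steps: int = 10, level: int = 9) -> float:
    """Hypothetical design: regress the extra compressed size of
    reference + prefix on prefix length. The slope estimates compressed
    bytes per input byte of `unknown`, given the author's reference text.
    Requires len(unknown) >= steps so the prefix lengths are distinct."""
    ref = reference.encode("utf-8")
    unk = unknown.encode("utf-8")
    base = compressed_len(ref, level)
    xs, ys = [], []
    for k in range(1, steps + 1):
        prefix = unk[: len(unk) * k // steps]
        xs.append(float(len(prefix)))
        ys.append(float(compressed_len(ref + prefix, level) - base))
    return fit_line(xs, ys)[1]


def classify(references: Dict[str, str], unknown: str,
             steps: int = 10, level: int = 9) -> str:
    """Attribute `unknown` to the author whose reference gives the smallest slope."""
    slopes = {name: slope_for_author(text, unknown, steps, level)
              for name, text in references.items()}
    return min(slopes, key=slopes.get)
```

Usage: `classify({"Author A": text_a, "Author B": text_b}, disputed_text)` returns the key of the best-matching reference. Deciding on the fitted slope, rather than on a single conditional compressed length, is one way to let the regression estimates drive the classification; the intercept, which absorbs fixed header and start-up costs, is ignored here. Note that DEFLATE only looks back 32 KB, so reference material beyond that window cannot help; a compressor with a larger window, or preprocessing of the texts, may change the results.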

Identifier

Pliska Studia Mathematica Bulgarica, Vol. 22, No. 1 (2013), pp. 25-32

0204-9805

http://hdl.handle.net/10525/2508

Language(s)

en

Publisher

Institute of Mathematics and Informatics, Bulgarian Academy of Sciences

Keywords

#Text authorship identification #Classification #Compression #Linear Regression

Type

Article