Text authorship identification; Classification; Compression; Linear Regression
Institute of Mathematics and Informatics, Bulgarian Academy of Sciences
Pliska Studia Mathematica Bulgarica, Vol. 22, No. 1 (2013), pp. 25-32
An algorithm for text authorship identification is proposed. The procedure is based on Kolmogorov complexity and fits regression models to the lengths of the compressed texts. Classification then uses the estimated regression parameters. Different combinations of compressor parameters and data preprocessing steps are examined on prose texts by a few English classic authors.
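
The general idea can be illustrated with a minimal sketch. It is not the procedure of the paper: the compressor (zlib), the use of growing text prefixes, the simple straight-line fit, and the nearest-slope decision rule are all assumptions made for illustration, and the function names compressed_lengths, fit_line and classify are hypothetical.

```python
import zlib


def compressed_lengths(text, step=500, level=9):
    """Return (prefix length, compressed length) pairs for growing prefixes.

    Assumption: zlib stands in for whatever compressor is actually used;
    the compressed length is a computable proxy for Kolmogorov complexity.
    """
    data = text.encode("utf-8")
    return [
        (n, len(zlib.compress(data[:n], level)))
        for n in range(step, len(data) + 1, step)
    ]


def fit_line(points):
    """Ordinary least squares fit y = a + b * x; returns (a, b).

    Needs at least two points with distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def classify(unknown_text, known_texts):
    """Pick the author whose fitted slope is closest to the unknown text's.

    Assumption: a nearest-slope rule; known_texts maps author name to a
    sample text long enough to yield at least two prefixes.
    """
    _, slope = fit_line(compressed_lengths(unknown_text))

    def distance(author):
        _, author_slope = fit_line(compressed_lengths(known_texts[author]))
        return abs(author_slope - slope)

    return min(known_texts, key=distance)
```

A call such as classify(sample, {"Author A": text_a, "Author B": text_b}) returns the key whose text compresses at the most similar rate. The step and level arguments play the role of the compressor parameters and preprocessing choices that the abstract says are varied.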