Text authorship identification; Classification; Compression; Linear Regression
Issue Date:
2013
Publisher:
Institute of Mathematics and Informatics, Bulgarian Academy of Sciences
Citation:
Pliska Studia Mathematica Bulgarica, Vol. 22, No 1 (2013), pp. 25-32
Abstract:
An algorithm for text authorship identification is proposed. The procedure is based on Kolmogorov complexity and fits regression models to the lengths of the compressed texts. Classification uses the estimated regression parameters. Different combinations of compressor parameters and preliminary processing of the data are examined using prose texts by a few English classics.
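The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the compressor, the form of the regression, or the decision rule, so everything below is an assumption. The sketch uses `zlib` as the compressor, takes compressed length as a computable stand-in for Kolmogorov complexity (which is itself not computable), fits a straight line of compressed length against prefix length, and assigns an unknown text to the candidate whose fitted slope is closest. The names `compressed_lengths`, `fit_line`, and `classify`, and the parameters `step` and `level`, are hypothetical.

```python
import zlib


def compressed_lengths(text, step=500, level=9):
    """Return (prefix_length, compressed_length) pairs for growing prefixes.

    Assumption: prefixes grow by `step` bytes and are compressed with zlib
    at the given `level`; the paper may use a different compressor or scheme.
    """
    data = text.encode("utf-8")
    points = []
    for n in range(step, len(data) + 1, step):
        points.append((n, len(zlib.compress(data[:n], level))))
    return points


def fit_line(points):
    """Ordinary least squares fit of y = a + b * x; returns (a, b).

    Needs at least two points with distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def classify(unknown_text, known_texts, step=500, level=9):
    """Return the label in `known_texts` whose fitted slope is nearest.

    Assumption: only the slope estimate is compared, by absolute difference.
    Every text must be at least 2 * step bytes long so that a line can be fit.
    """
    _, slope = fit_line(compressed_lengths(unknown_text, step, level))

    def distance(label):
        _, known_slope = fit_line(
            compressed_lengths(known_texts[label], step, level)
        )
        return abs(known_slope - slope)

    return min(known_texts, key=distance)
```

Under these assumptions, a call such as `classify(sample, {"Author A": text_a, "Author B": text_b})` returns the label with the closest slope. Varying `level` and normalising the text before compression (for example, lower-casing or stripping punctuation) would correspond loosely to the compressor parameters and preliminary processing mentioned in the abstract.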