Keywords:
Kolmogorov Complexity; Compression Metric; Dialect Distance; Language Contacts
Issue Date:
2007
Publisher:
Institute of Mathematics and Informatics Bulgarian Academy of Sciences
Citation:
Serdica Journal of Computing, Vol. 1, No. 1 (2007), pp. 73-86
Abstract:
The paper discusses the application of a compression-based similarity metric to measuring the distances between Bulgarian dialects. The similarity metric is defined on the basis of the Kolmogorov complexity of a file (or binary string). Kolmogorov complexity cannot be applied directly in practice, because computing it for a given file is an undecidable problem. The similarity metric actually used is therefore based on a real-life compressor, which only approximates Kolmogorov complexity. To use the metric for measuring distances between Bulgarian dialects, we first represent the dialectological data in a form to which the metric is applicable. We propose two such representations and compare them against a baseline distance between dialects. We conclude the paper with an outline of our future work.
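The abstract does not give the formula for the compression-based metric. A widely used metric of this kind is the normalized compression distance (NCD) of Cilibrasi and Vitányi, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(z) is the length of z after compression by a real-life compressor standing in for Kolmogorov complexity. The sketch below assumes that form and uses Python's zlib as the compressor; both are our assumptions, not necessarily the paper's choices. The helper names (`compressed_size`, `ncd`, `distance_matrix`) and the idea of encoding each dialect's data as a single byte string are hypothetical, since the paper's two representations are not described here.

```python
from __future__ import annotations

import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """Compressed length C(data) in bytes; zlib at level 9 approximates K(data)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance (assumed form of the metric).

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    Values near 0 mean very similar inputs; values near 1 mean unrelated ones.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(samples: dict[str, bytes]) -> dict[tuple[str, str], float]:
    """Pairwise NCD for every unordered pair of (hypothetical) dialect samples.

    Keys are (name_a, name_b) with name_a < name_b in sorted order.
    """
    names = sorted(samples)
    return {(a, b): ncd(samples[a], samples[b]) for a, b in combinations(names, 2)}
```

Applied to per-dialect byte strings, a smaller value suggests more similar dialects. With a real compressor the distance of a string to itself is close to, but not exactly, 0, and the value can slightly exceed 1. Note also that zlib only finds repetitions within a 32 KB window, so for larger per-dialect files a compressor with a larger window may approximate Kolmogorov complexity better.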