Title: Applying A Normalized Compression Metric To The Measurement Of Dialect Distance
Authors: Simov, Kiril; Osenova, Petya
Keywords: Kolmogorov Complexity; Compression Metric; Dialect Distance; Language Contacts
Issue Date: 2007
Publisher: Institute of Mathematics and Informatics Bulgarian Academy of Sciences
Citation: Serdica Journal of Computing, Vol. 1, No 1, (2007), 73p-86p
Abstract: The paper discusses the application of a similarity metric based on compression to the measurement of the distance among Bulgarian dialects. The similarity metric is defined on the basis of the notion of Kolmogorov complexity of a file (or binary string). The application of Kolmogorov complexity in practice is not possible because its calculation over a file is an undecidable problem. Thus, the actual similarity metric is based on a real-life compressor which only approximates the Kolmogorov complexity. To use the metric for distance measurement of Bulgarian dialects we first represent the dialectological data in such a way that the metric is applicable. We propose two such representations which are compared to a baseline distance between dialects. Then we conclude the paper with an outline of our future work.
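As an illustration of the kind of metric the abstract describes, the following is a minimal sketch of the standard normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size in bytes. The abstract does not name the compressor or the data representations used in the paper, so zlib, the function names, and the toy input strings below are assumptions made for illustration only.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data.

    Assumption: zlib stands in for the 'real-life compressor'; the
    abstract does not say which compressor the paper uses.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    The compressed size approximates Kolmogorov complexity, which
    cannot be computed exactly.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical toy strings, not data from the paper.
sample_a = b"mlyako hlyab voda " * 40
sample_b = b"different words entirely, zzz qqq " * 40

print("NCD(a, a) =", ncd(sample_a, sample_a))
print("NCD(a, b) =", ncd(sample_a, sample_b))
```

Values near 0 indicate that the two inputs share most of their structure, and values near 1 indicate little shared structure. Because a real compressor only approximates Kolmogorov complexity, the distance of a string to itself is small but generally not exactly 0.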

