Applying A Normalized Compression Metric To The Measurement Of Dialect Distance


Author(s): Simov, Kiril; Osenova, Petya
Date(s)

16/09/2009

2007

Abstract

The paper discusses the application of a compression-based similarity metric to the measurement of the distance among Bulgarian dialects. The similarity metric is defined on the basis of the notion of the Kolmogorov complexity of a file (or binary string). Kolmogorov complexity cannot be applied directly in practice because its calculation over a file is an undecidable problem. Thus, the actual similarity metric is based on a real-life compressor, which only approximates the Kolmogorov complexity. To use the metric for measuring the distance between Bulgarian dialects, we first represent the dialectological data in such a way that the metric is applicable. We propose two such representations, which are compared to a baseline distance between dialects. We conclude the paper with an outline of our future work.
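The idea of replacing Kolmogorov complexity with the output size of a real compressor can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the commonly used normalized compression distance formula, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), and it assumes zlib as the compressor; the abstract names neither. The function names and the sample strings are hypothetical and are not data from the paper.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input.

    Stands in for the (uncomputable) Kolmogorov complexity.
    zlib is an assumption; any real compressor could be used instead.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Assumed formula: (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size and xy is the concatenation.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Toy strings standing in for encoded dialect data (made up for illustration).
    site_a = "mlyako hlyab voda ogan zemya".encode("utf-8")
    site_b = "mleko hleb voda ogin zemja".encode("utf-8")
    print("distance(site_a, site_a):", ncd(site_a, site_a))
    print("distance(site_a, site_b):", ncd(site_a, site_b))
```

Smaller values indicate more similar inputs. On very short strings the fixed overhead of the compressor dominates the result, which is one reason the choice of data representation matters.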

Identifier

Serdica Journal of Computing, Vol. 1, No. 1 (2007), pp. 73-86

1312-6555

http://hdl.handle.net/10525/334

Language(s)

en_US

Publisher

Institute of Mathematics and Informatics, Bulgarian Academy of Sciences

Keywords #Kolmogorov Complexity #Compression Metric #Dialect Distance #Language Contacts
Type

Article