Applying A Normalized Compression Metric To The Measurement Of Dialect Distance
Date(s) |
16/09/2009
16/09/2009
2007
|
---|---|
Abstract |
The paper discusses the application of a similarity metric based on compression to the measurement of the distance among Bulgarian dialects. The similarity metric is defined on the basis of the notion of Kolmogorov complexity of a file (or binary string). The application of Kolmogorov complexity in practice is not possible because its calculation over a file is an undecidable problem. Thus, the actual similarity metric is based on a real-life compressor which only approximates the Kolmogorov complexity. To use the metric for distance measurement of Bulgarian dialects we first represent the dialectological data in such a way that the metric is applicable. We propose two such representations which are compared to a baseline distance between dialects. Then we conclude the paper with an outline of our future work. |
Identifier |
Serdica Journal of Computing, Vol. 1, No 1, (2007), 73p-86p 1312-6555 |
Language(s) |
en_US |
Publisher |
Institute of Mathematics and Informatics Bulgarian Academy of Sciences |
Keywords | #Kolmogorov Complexity #Compression Metric #Dialect Distance #Language Contacts |
Type |
Article |
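
The abstract describes a similarity metric in which a real-life compressor stands in for the uncomputable Kolmogorov complexity. A minimal sketch of that idea follows, using the commonly cited normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size in bytes. The use of `zlib`, the function names, and the byte-string input format are assumptions made for illustration; the record does not state which compressor or which data representations the paper uses.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Approximate the Kolmogorov complexity of `data` by its zlib-compressed length.

    zlib is an assumption here; any real-life compressor could be substituted.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they share very little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation of both inputs
    return (cxy - min(cx, cy)) / max(cx, cy)
```

To compare two dialect sites under this sketch, each site's data would first be serialized to a byte string (for example, a fixed-order list of transcribed word forms) and the two strings passed to `ncd`. The choice of serialization affects the result considerably, which is presumably why the paper proposes and compares two representations.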