3 resultados para Dialects

em Bulgarian Digital Mathematics Library at IMI-BAS


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The paper presents our considerations related to the creation of a digital corpus of Bulgarian dialects. The dialectological archive of Bulgarian language consists of more than 250 audio tapes. All tapes were recorded between 1955 and 1965 in the course of regular dialectological expeditions throughout the country. The records typically contain interviews with inhabitants of small villages in Bulgaria. The topics covered are usually related to such issues as birth, everyday life, marriage, family relationship, death, etc. Only a few tapes contain folk songs from different regions of the country. Taking into account the progressive deterioration of the magnetic media and the realistic prospects of data loss, the Institute for Bulgarian Language at the Academy of Sciences launched in 1997 a project aiming at restoration and digital preservation of the dialectological archive. Within the framework of this project more than the half of the records was digitized, de-noised and stored on digital recording media. Since then restoration and digitization activities are done in the Institute on a regular basis. As a result a large collection of sound files has been gathered. Our further efforts are aimed at the creation of a digital corpus of Bulgarian dialects, which will be made available for phonological and linguistic research. Such corpora typically include besides the sound files two basic elements: a transcription, aligned with the sound file, and a set of standardized metadata that defines the corpus. In our work we will present considerations on how these tasks could be realized in the case of the corpus of Bulgarian dialects. Our suggestions will be based on a comparative analysis of existing methods and techniques to build such corpora, and by selecting the ones that fit closer to the particular needs. Our experience can be used in similar institutions storing folklore archives, history related spoken records etc.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The paper discusses the application of a similarity metric based on compression to the measurement of the distance among Bulgarian dia- lects. The similarity metric is de ned on the basis of the notion of Kolmo- gorov complexity of a le (or binary string). The application of Kolmogorov complexity in practice is not possible because its calculation over a le is an undecidable problem. Thus, the actual similarity metric is based on a real life compressor which only approximates the Kolmogorov complexity. To use the metric for distance measurement of Bulgarian dialects we rst represent the dialectological data in such a way that the metric is applicable. We propose two such representations which are compared to a baseline distance between dialects. Then we conclude the paper with an outline of our future work.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The paper presents a computational analysis of Bulgarian dialect variation, concentrating on pronunciation differences. It describes the phonetic data set compiled during the project* ‘Measuring Linguistic Unity and Diversity in Europe’ that consists of the pronunciations of 157 words collected at 197 sites from all over Bulgaria. We also present the results of analyzing this data set using various quantitative methods and compare them to the traditional scholarship on Bulgarian dialects. The results have shown that various dialectometrical techniques clearly identify east-west division of the country along the ‘jat’ border, as well as the third group of varieties in the Rodopi area. The rest of the groups specified in the traditional atlases either were not confirmed or were confirmed with a low confidence.