Digital Library

9 results for Bilingual lexicography

in Bulgarian Digital Mathematics Library at IMI-BAS

Bilingual Corpus - Digital Repository for Preservation of Language Heritage

Relevance:

20.00%

Publisher:

Abstract:

The article briefly reviews a bilingual Slovak-Bulgarian/Bulgarian-Slovak parallel and aligned corpus. The corpus was collected and developed as a result of collaboration within the framework of a joint research project between the Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, and the Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences. Multilingual corpora are large repositories of language data with an important role in preserving and supporting the world's cultural heritage, because natural language is an outstanding part of human cultural values and collective memory, and a bridge between cultures. This bilingual corpus will be widely applicable to contrastive studies of the two Slavic languages and will also be a useful resource for language engineering research and development, especially in machine translation.


Web-application for Presentation of Bulgarian Language Heritage: Bilingual Digital Corpora and Dictionaries

Relevance:

20.00%

Publisher:

Abstract:

The paper describes three software packages, the main components of a software system for processing and web presentation of Bulgarian language resources: parallel corpora and bilingual dictionaries. The author briefly presents the current versions of the core components “Dictionary” and “Corpus”, as well as the recently developed component “Connection”, which links “Dictionary” and “Corpus”. The components' main functionalities are also described, and some examples of using the system’s web applications are included.


Automatic Identification of False Friends in Parallel Corpora: Statistical and Semantic Approach

Relevance:

10.00%

Publisher:

Abstract:

False friends are pairs of words in two languages that are perceived as similar but have different meanings. We present an improved algorithm for acquiring false friends from a sentence-level aligned parallel corpus, based on statistical observations of word occurrences and co-occurrences in the parallel sentences. The results are compared with an entirely semantic measure of cross-lingual similarity between words that uses the Web as a corpus by analyzing the words’ local contexts extracted from the text snippets returned by Google searches. The statistical and semantic measures are further combined into an improved algorithm for identifying false friends that achieves results almost twice as good as those of previously known algorithms. The evaluation is performed for identifying cognates between Bulgarian and Russian, but the proposed methods could be adapted to other language pairs for which parallel corpora and bilingual glossaries are available.
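
The statistical idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the Dice-style score, the function name `cooccurrence_scores`, and the toy lemmatized corpus are all assumptions made for illustration. The intuition is that look-alike words which rarely appear together in aligned sentence pairs are likely false friends.

```python
from collections import Counter


def cooccurrence_scores(aligned_pairs, candidates):
    """Return a Dice-style co-occurrence score for each candidate word pair.

    aligned_pairs: list of (source_tokens, target_tokens), one per aligned
        sentence pair.
    candidates: list of (source_word, target_word) pairs that look alike.
    A score near 0 hints at false friends; a score near 1 hints at cognates.
    """
    src_count = Counter()   # sentence pairs where the source word occurs
    tgt_count = Counter()   # sentence pairs where the target word occurs
    both_count = Counter()  # sentence pairs where both occur
    for src_tokens, tgt_tokens in aligned_pairs:
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        for s, t in candidates:
            in_src = s in src_set
            in_tgt = t in tgt_set
            src_count[(s, t)] += in_src
            tgt_count[(s, t)] += in_tgt
            both_count[(s, t)] += in_src and in_tgt
    scores = {}
    for pair in candidates:
        total = src_count[pair] + tgt_count[pair]
        scores[pair] = 2 * both_count[pair] / total if total else 0.0
    return scores


# Hypothetical, hand-made Bulgarian/Russian toy data (already lemmatized).
# "гора" means "forest" in Bulgarian but "mountain" in Russian.
toy_corpus = [
    (["гора", "е", "зелена"], ["лес", "зеленый"]),
    (["вода", "е", "студена"], ["вода", "холодная"]),
    (["планина", "е", "висока"], ["гора", "высокая"]),
    (["пия", "вода"], ["пить", "вода"]),
]
toy_candidates = [("гора", "гора"), ("вода", "вода")]
toy_scores = cooccurrence_scores(toy_corpus, toy_candidates)
```

On this toy data the pair for "гора" never co-occurs in an aligned sentence pair, while the pair for "вода" always does, so the first is flagged as a likely false friend. A real system would need a much larger corpus and, as the abstract notes, a complementary semantic measure.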


Institute for Bulgarian Language, BAS

Relevance:

10.00%

Publisher:

Abstract:

The paper presents the history, structure and ongoing activities of the Institute for Bulgarian Language of the Bulgarian Academy of Sciences.


Services for Content Creation and Presentation in an Iconographical Digital Library

Relevance:

10.00%

Publisher:

Abstract:

Content creation and presentation are key activities in a multimedia digital library (MDL). The proper design and intelligent implementation of these services provide a stable base for overall MDL functionality. This paper presents the framework and the implementation of these services in the latest version of the “Virtual Encyclopaedia of Bulgarian Iconography” multimedia digital library. For the semantic description of the iconographical objects, a tree-based annotation template is implemented. It provides options for autocompletion, reuse of values, bilingual data entry, and automated media watermarking, resizing and conversion. The paper describes in detail the algorithm for the automated appearance of dependent values for different characteristics of an iconographical object. An algorithm for avoiding duplicate image objects is also included. The service for the automated appearance of new objects in a collection after they are entered is included as an important part of the content presentation. The paper also presents the overall service-based architecture of the library, covering its main service panels, repositories and their relationships. The presented vision is based on long-term observation of the users’ preferences, cognitive goals, and needs, aiming to find an optimal functionality solution for the end users.


Noun Sense Disambiguation using Co-Occurrence Relation in Machine Translation

Relevance:

10.00%

Publisher:

Abstract:

Word Sense Disambiguation, the process of identifying the meaning of a word in a sentence when the word has multiple meanings, is a critical problem in machine translation. It is generally very difficult to select the correct meaning of a word in a sentence, especially when the syntactic difference between the source and target languages is large, e.g., in English-Korean machine translation. To achieve a high level of accuracy of noun sense selection in machine translation, we introduced a statistical method based on co-occurrence relations of words in sentences and applied it to the English-Korean machine translator RyongNamSan. ACM Computing Classification System (1998): I.2.7.
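
A minimal sketch of co-occurrence-based noun sense selection follows. It is not the method used in RyongNamSan, which the abstract does not specify; the function name `pick_sense`, the add-one smoothing, and the toy counts are assumptions made for illustration. Each sense carries counts of words seen near it in some training corpus, and the sense whose counts best explain the current context is chosen.

```python
import math


def pick_sense(context_words, sense_cooc, smoothing=1.0):
    """Choose the sense whose co-occurrence statistics best match the context.

    context_words: words surrounding the ambiguous noun in the sentence.
    sense_cooc: dict mapping a sense label to {context_word: count}.
    Context words absent from every sense's counts are ignored.
    Ties go to the sense listed first in sense_cooc.
    """
    vocab = {w for counts in sense_cooc.values() for w in counts}
    best_sense, best_score = None, -math.inf
    for sense, counts in sense_cooc.items():
        # Smoothed total so that unseen words get a small non-zero probability.
        total = sum(counts.values()) + smoothing * len(vocab)
        score = sum(
            math.log((counts.get(w, 0) + smoothing) / total)
            for w in context_words
            if w in vocab
        )
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical counts for the English noun "bank"; the numbers are invented.
bank_cooc = {
    "bank/finance": {"money": 8, "loan": 5, "account": 6},
    "bank/river": {"river": 7, "water": 4, "fishing": 3},
}
finance_pick = pick_sense(["deposit", "money", "account"], bank_cooc)
river_pick = pick_sense(["water", "river"], bank_cooc)
```

In a translation setting, each sense label would map to a different target-language noun, so picking the sense amounts to picking the translation.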


Information Technologies for the Preservation of Language Heritage

Relevance:

10.00%

Publisher:

Abstract:

In this paper we show how information technologies, as tools for creating digital bilingual dictionaries, can help preserve natural languages. Natural languages are an outstanding part of human cultural values, and for that reason they should be preserved as part of the world's cultural heritage. We describe our work on the bilingual lexical database supporting the Bulgarian-Polish online dictionary. The main software tools for the web presentation of the dictionary are briefly described. We pay special attention to the presentation of verbs, the linguistic category in Bulgarian that is richest in specific characteristics.


Language Resources – a Part of World Cultural Heritage

Relevance:

10.00%

Publisher:

Abstract:

This article briefly reviews multilingual language resources for Bulgarian developed within the framework of several international projects: the first-ever annotated Bulgarian MTE digital lexical resources, the Bulgarian-Polish corpus, the Bulgarian-Slovak parallel and aligned corpus, and the Bulgarian-Polish-Lithuanian corpus. These resources are valuable multilingual datasets for language engineering research and development for the Bulgarian language. Multilingual corpora are large repositories of language data with an important role in preserving and supporting the world's cultural heritage, because natural language is an outstanding part of human cultural values and collective memory, and a bridge between cultures.


Online Dictionary - Tool for Preservation of Language Heritage

Relevance:

10.00%

Publisher:

Abstract:

The paper aims to present a bilingual online dictionary as a useful tool for helping to preserve natural languages. The author focuses on the approach taken to develop a compatible bilingual lexical database for the Bulgarian-Polish online dictionary. A formal model for the dictionary encoding is developed in accordance with the complex structures of the dictionary entries. These structures vary depending on the grammatical characteristics of the Bulgarian headwords. The web application for presenting the bilingual dictionary is also described.
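
To make the idea of an entry structure that varies with the headword's grammatical characteristics concrete, here is a minimal sketch. It is not the paper's formal model, which the abstract does not spell out; the class name `Entry`, its field names, and the validation rules are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    """One hypothetical Bulgarian-Polish dictionary entry."""

    headword: str                  # Bulgarian headword
    pos: str                       # part of speech, e.g. "noun" or "verb"
    translations: List[str] = field(default_factory=list)  # Polish equivalents
    gender: Optional[str] = None   # filled in for nouns only (assumed rule)
    aspect: Optional[str] = None   # filled in for verbs only (assumed rule)

    def validate(self) -> None:
        """Raise ValueError if the fields required for this part of speech are missing."""
        if self.pos == "noun" and self.gender is None:
            raise ValueError(f"noun entry {self.headword!r} needs a gender")
        if self.pos == "verb" and self.aspect is None:
            raise ValueError(f"verb entry {self.headword!r} needs an aspect")


# Illustrative entries; the field values are supplied by hand for the example.
noun_entry = Entry(headword="книга", pos="noun",
                   translations=["książka"], gender="feminine")
verb_entry = Entry(headword="чета", pos="verb",
                   translations=["czytać"], aspect="imperfective")
noun_entry.validate()
verb_entry.validate()
```

The design choice here is a single record type with optional, part-of-speech-specific fields checked at validation time. A fuller model could instead use one subclass per grammatical category.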

