Digital Library

3 results for Encyclopedias and dictionaries

in Bulgarian Digital Mathematics Library at IMI-BAS

Web-application for Presentation of Bulgarian Language Heritage: Bilingual Digital Corpora and Dictionaries

Relevance:

100.00%

Abstract:

The paper describes three software packages, the main components of a software system for processing and web presentation of Bulgarian language resources: parallel corpora and bilingual dictionaries. The author briefly presents the current versions of the core components “Dictionary” and “Corpus”, as well as the recently developed component “Connection”, which links “Dictionary” and “Corpus”. The components' main functionalities are also described, and some examples of the usage of the system’s web applications are included.

Experiments in Detection and Correction of Russian Malapropisms by Means of the WEB

Relevance:

30.00%

Abstract:

A malapropism is a semantic error that is hard to detect because it usually retains the syntactic links between words in the sentence but replaces one content word with a similar word that has a quite different meaning. A method for the automatic detection of malapropisms is described, based on Web statistics and a specially defined Semantic Compatibility Index (SCI). For correction of the detected errors, special dictionaries and heuristic rules are proposed, which retain only a few highly SCI-ranked correction candidates for the user’s selection. Experiments on Web-assisted detection and correction of Russian malapropisms are reported, demonstrating the efficacy of the described method.

Topic Segmentation: How Much Can We Do by Counting Words and Sequences of Words

Relevance:

30.00%

Abstract:

In this paper, we present an innovative topic segmentation system based on a new informative similarity measure that takes word co-occurrence into account, in order to avoid reliance on existing linguistic resources such as electronic dictionaries or lexico-semantic databases such as thesauri or ontologies. Topic segmentation is the task of breaking documents into topically coherent multi-paragraph subparts, and it has been used extensively in information retrieval and text summarization. In particular, our architecture proposes a language-independent topic segmentation system that solves three main problems evidenced by previous research: systems based solely on lexical repetition show reliability problems; systems based on lexical cohesion use existing linguistic resources that are usually available only for dominant languages and consequently do not apply to less favored languages; and some systems require previously harvested training data. For that purpose, we use only statistics on words and sequences of words computed from a set of texts. This approach provides a flexible solution that may narrow the gap between dominant and less favored languages, thus allowing equivalent access to information.
