Digital Library

**Author(s):** Nakov, Svetlin
Date(s)	16/09/2009 16/09/2009 2009
Abstract	False friends are pairs of words in two languages that are perceived as similar but have different meanings. We present an improved algorithm for acquiring false friends from a sentence-level aligned parallel corpus, based on statistical observations of word occurrences and co-occurrences in the parallel sentences. The results are compared with an entirely semantic measure of cross-lingual similarity between words, which uses the Web as a corpus by analyzing the words’ local contexts extracted from the text snippets returned by Google searches. The statistical and semantic measures are then combined into an improved algorithm for identifying false friends that achieves results almost twice as good as those of previously known algorithms. The evaluation is performed for identifying cognates between Bulgarian and Russian, but the proposed methods could be adopted for other language pairs for which parallel corpora and bilingual glossaries are available.
Identifier	Serdica Journal of Computing, Vol. 3, No 2, (2009), 133p-158p 1312-6555 http://hdl.handle.net/10525/366
Language(s)	en
Publisher	Institute of Mathematics and Informatics Bulgarian Academy of Sciences
Keywords	#Cognates #False Friends #Identification of False Friends #Parallel Corpus #Cross-Lingual Semantic Similarity #Web as a Corpus
Type	Article
