Multilingual lexical resources to detect cognates in non-aligned texts


Author(s): Wang, Haoxing; Sitbon, Laurianne
Contributor(s)

Ferraro, Gabriela

Wan, Stephen

Date(s)

27/11/2014

Abstract

The identification of cognates between two distinct languages has recently started to attract the attention of NLP research, but there has been little research into using semantic evidence to detect cognates. The approach presented in this paper aims to detect English-French cognates within monolingual texts (texts that are not accompanied by aligned translated equivalents), by integrating word shape similarity approaches with word sense disambiguation techniques in order to account for context. Our implementation is based on BabelNet, a semantic network that incorporates a multilingual encyclopedic dictionary. Our approach is evaluated on two manually annotated datasets. The first one shows that across different types of natural text, our method can identify the cognates with an overall accuracy of 80%. The second one, consisting of control sentences with semi-cognates acting as either true cognates or false friends, shows that our method can identify 80% of semi-cognates acting as cognates but also identifies 75% of the semi-cognates acting as false friends.
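
As a rough illustration of the "word shape similarity" component mentioned in the abstract, the sketch below scores an English-French word pair with a normalized edit distance after stripping accents. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which similarity measure the paper uses, and the function names and the 0.7 threshold are hypothetical choices made for this example. The semantic step (word sense disambiguation over BabelNet) is not shown.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Lowercase a word and drop combining accent marks (e.g. 'hôpital' -> 'hopital')."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def shape_similarity(english: str, french: str) -> float:
    """Similarity in [0, 1]: 1 minus the edit distance divided by the longer length."""
    a, b = strip_accents(english), strip_accents(french)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def is_candidate_cognate(english: str, french: str, threshold: float = 0.7) -> bool:
    """Hypothetical cutoff: flag the pair as a cognate candidate on shape alone."""
    return shape_similarity(english, french) >= threshold
```

Shape alone cannot separate true cognates from false friends: English "coin" and French "coin" (which means "corner") score 1.0 here. That gap is why the abstract pairs shape similarity with sense disambiguation in context.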

Format

application/pdf

Identifier

http://eprints.qut.edu.au/79707/

Relation

http://eprints.qut.edu.au/79707/1/Multilingual%20lexical%20resources%20to%20detect%20cognates%20in%20non-aligned%20texts.pdf

http://www.aclweb.org/anthology/U14-1003

Wang, Haoxing & Sitbon, Laurianne (2014) Multilingual lexical resources to detect cognates in non-aligned texts. In Ferraro, Gabriela & Wan, Stephen (Eds.) Proceedings of the Australasian Language Technology Association Workshop 2014, Melbourne, Australia, pp. 14-22.

Source

School of Electrical Engineering & Computer Science; Institute for Future Environments; Science & Engineering Faculty

Keywords

#080107 Natural Language Processing #English as a Second Language #Cognate Detection #Disambiguation

Type

Conference Paper