Multilingual lexical resources to detect cognates in non-aligned texts
Contributor(s) |
Ferraro, Gabriela; Wan, Stephen |
---|---|
Date(s) |
27/11/2014
|
Abstract |
The identification of cognates between two distinct languages has recently started to attract the attention of NLP research, but there has been little research into using semantic evidence to detect cognates. The approach presented in this paper aims to detect English-French cognates within monolingual texts (texts that are not accompanied by aligned translated equivalents), by integrating word shape similarity approaches with word sense disambiguation techniques in order to account for context. Our implementation is based on BabelNet, a semantic network that incorporates a multilingual encyclopedic dictionary. Our approach is evaluated on two manually annotated datasets. The first one shows that across different types of natural text, our method can identify the cognates with an overall accuracy of 80%. The second one, consisting of control sentences with semi-cognates acting as either true cognates or false friends, shows that our method can identify 80% of semi-cognates acting as cognates but also identifies 75% of the semi-cognates acting as false friends. |
Format |
application/pdf |
Identifier | |
Relation |
http://eprints.qut.edu.au/79707/1/Multilingual%20lexical%20resources%20to%20detect%20cognates%20in%20non-aligned%20texts.pdf http://www.aclweb.org/anthology/U14-1003 Wang, Haoxing & Sitbon, Laurianne (2014) Multilingual lexical resources to detect cognates in non-aligned texts. In Ferraro, Gabriela & Wan, Stephen (Eds.) Proceedings of the Australasian Language Technology Association Workshop 2014, Melbourne, Australia, pp. 14-22. |
Source |
School of Electrical Engineering & Computer Science; Institute for Future Environments; Science & Engineering Faculty |
Keywords | #080107 Natural Language Processing #English as a Second Language #Cognate Detection #Disambiguation |
Type |
Conference Paper |