How comparable are parallel corpora? Measuring the distribution of general vocabulary and connectives
Date(s) |
2011
|
---|---|
Abstract |
In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between its various sub-parts. We compare the results obtained with a general measure of lexical similarity based on χ² against those obtained by counting the number of discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general measure. We also provide evidence for the existence of specific characteristics defining translated texts as opposed to non-translated ones, due to a universal tendency for explicitation. |
Format |
application/pdf |
Identifier |
http://boris.unibe.ch/78680/1/p78-cartoni.pdf Zufferey, Sandrine; Cartoni, Bruno; Popescu-Belis, Andrei; Meyer, Thomas (2011). How comparable are parallel corpora? Measuring the distribution of general vocabulary and connectives. In: Proceedings of 4th Workshop on Building and Using Comparable Corpora. Portland, Oregon. 24.06.2011. doi:10.7892/boris.78680 urn:isbn:978-1-937284-015 |
Language(s) |
eng |
Relation |
http://boris.unibe.ch/78680/ http://dl.acm.org/citation.cfm?id=2024251&CFID=775998834&CFTOKEN=64365854 |
Rights |
info:eu-repo/semantics/restrictedAccess |
Source |
Zufferey, Sandrine; Cartoni, Bruno; Popescu-Belis, Andrei; Meyer, Thomas (2011). How comparable are parallel corpora? Measuring the distribution of general vocabulary and connectives. In: Proceedings of 4th Workshop on Building and Using Comparable Corpora. Portland, Oregon. 24.06.2011. |
Keywords | #840 French & related literatures #440 French & related languages |
Type |
info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion PeerReviewed |