How comparable are parallel corpora? Measuring the distribution of general vocabulary and connectives


Author(s): Zufferey, Sandrine; Cartoni, Bruno; Popescu-Belis, Andrei; Meyer, Thomas
Date(s)

2011

Abstract

In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between its various sub-parts. We compare the results obtained with a general measure of lexical similarity based on χ² with those obtained by counting discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general measure. We also provide evidence for specific characteristics that distinguish translated texts from non-translated ones, due to a universal tendency toward explicitation.
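This record names the two measures but gives no implementation details. The sketch below illustrates them under stated assumptions: a Kilgarriff-style χ² statistic computed over the most frequent words of two sub-corpora (lower values meaning more similar samples), and a per-1,000-token rate for a small, hypothetical list of single-word connectives. The names `tokenize`, `chi_square_similarity`, `connective_rate`, `CONNECTIVES`, and the `n_words=500` cut-off are illustrative assumptions, not the authors' code or parameters.

```python
import re
from collections import Counter

# Hypothetical connective inventory for illustration only: single-word English
# items; multi-word connectives ("even though") would need phrase matching.
CONNECTIVES = frozenset(
    {"although", "because", "but", "however", "since", "therefore", "while"}
)


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


def chi_square_similarity(tokens_a, tokens_b, n_words=500):
    """Kilgarriff-style chi-square over the n_words most frequent words of
    both samples combined. Lower values mean more similar samples."""
    if not tokens_a or not tokens_b:
        raise ValueError("both samples must be non-empty")
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    size_a, size_b = len(tokens_a), len(tokens_b)
    total = size_a + size_b
    chi2 = 0.0
    for word, joint in (freq_a + freq_b).most_common(n_words):
        # Expected counts if the word were spread in proportion to sample size.
        expected_a = joint * size_a / total
        expected_b = joint * size_b / total
        chi2 += (freq_a[word] - expected_a) ** 2 / expected_a
        chi2 += (freq_b[word] - expected_b) ** 2 / expected_b
    return chi2


def connective_rate(tokens, connectives=CONNECTIVES, per=1000):
    """Occurrences of listed connectives per `per` tokens."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in connectives)
    return hits * per / len(tokens)


if __name__ == "__main__":
    original = tokenize("The vote was postponed. We regret this, but the report was late.")
    translated = tokenize(
        "The vote was postponed. However, we regret this, because the report was late."
    )
    print(round(chi_square_similarity(original, translated), 3))
    print(round(connective_rate(original), 1), round(connective_rate(translated), 1))
    # second line prints: 83.3 153.8
```

Comparing connective rates across sub-corpora (for example, texts originally written in a language versus texts translated into it) is one way to look for the explicitation effect the abstract mentions. A real study would need a curated inventory including multi-word connectives, and some form of disambiguation, since words such as *since* or *while* do not always function as connectives.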

Format

application/pdf

Identifier

http://boris.unibe.ch/78680/1/p78-cartoni.pdf

Zufferey, Sandrine; Cartoni, Bruno; Popescu-Belis, Andrei; Meyer, Thomas (2011). How comparable are parallel corpora? Measuring the distribution of general vocabulary and connectives. In: Proceedings of the 4th Workshop on Building and Using Comparable Corpora. Portland, Oregon. 24.06.2011.

doi:10.7892/boris.78680

urn:isbn:978-1-937284-015

Language(s)

eng

Relation

http://boris.unibe.ch/78680/

http://dl.acm.org/citation.cfm?id=2024251

Rights

info:eu-repo/semantics/restrictedAccess

Keywords #840 French & related literatures #440 French & related languages
Type

info:eu-repo/semantics/conferenceObject

info:eu-repo/semantics/publishedVersion

PeerReviewed