Bitext alignment: building and evaluating a bilingual corpus and translation memory of academic course descriptions


Author(s): Cocozza, Daniele
Contributor(s)

Ferraresi, Adriano

Date(s)

12/03/2015

Abstract

Following the internationalization of contemporary higher education, academic institutions based in non-English-speaking countries are increasingly urged to produce content in English to address prospective international students and personnel, as well as to increase their attractiveness. The demand for English translations in the institutional academic domain is consequently increasing at a rate exceeding the capacity of the translation profession. Resources for assisting non-native authors and translators in the production of appropriate texts in L2 are therefore required in order to help academic institutions and professionals streamline their translation workload. These resources include: (i) parallel corpora to train machine translation systems and multilingual authoring tools; and (ii) translation memories for computer-aided translation tools. The purpose of this study is to create and evaluate reference resources like the ones mentioned in (i) and (ii) through the automatic sentence alignment of a large set of Italian and English as a Lingua Franca (ELF) institutional academic texts given as equivalent but not necessarily parallel (i.e. translated). In this framework, a set of alignment algorithms and alignment tools is examined in order to identify the most suitable one(s) in terms of accuracy and time- and cost-effectiveness. In order to determine the text pairs to align, a sample is selected according to document length similarity (measured in characters) and subsequently evaluated in terms of extent of noisiness/parallelism, alignment accuracy and content leverageability. The results of these analyses serve as the basis for the creation of an aligned bilingual corpus of academic course descriptions, which is ultimately used to create a translation memory in TMX format.
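The abstract states only that candidate text pairs are selected by document length similarity in characters; it does not give the measure or the cutoff. As a minimal sketch under those assumptions, the Python below uses the ratio of the shorter to the longer text as the similarity score. The function names `length_ratio` and `select_pairs` and the 0.8 threshold are hypothetical and are not taken from the thesis.

```python
from typing import Iterable, List, Tuple

# (document id, Italian text, English text); the tuple layout is an assumption.
DocPair = Tuple[str, str, str]


def length_ratio(text_a: str, text_b: str) -> float:
    """Return shorter/longer character length, a value between 0 and 1."""
    len_a, len_b = len(text_a), len(text_b)
    if len_a == 0 and len_b == 0:
        return 1.0  # two empty texts are treated as identical in length
    return min(len_a, len_b) / max(len_a, len_b)


def select_pairs(pairs: Iterable[DocPair], threshold: float = 0.8) -> List[DocPair]:
    """Keep the pairs whose length ratio is at least `threshold`.

    The default threshold is illustrative only, not a value from the thesis.
    """
    return [
        (doc_id, italian, english)
        for doc_id, italian, english in pairs
        if length_ratio(italian, english) >= threshold
    ]
```

A pair whose two texts differ widely in length is less likely to be a translation, so a filter of this kind can reduce the noise passed on to the sentence aligner, at the cost of discarding some genuinely parallel but unevenly sized documents.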

Format

application/pdf

Identifier

http://amslaurea.unibo.it/8199/1/cocozza_daniele_tesi.pdf

Cocozza, Daniele (2015) Bitext alignment: building and evaluating a bilingual corpus and translation memory of academic course descriptions. [Laurea magistrale], Università di Bologna, Corso di Studio in Traduzione specializzata [LM-DM270] - Forli' <http://amslaurea.unibo.it/view/cds/CDS8061/>

Relation

http://amslaurea.unibo.it/8199/

Rights

info:eu-repo/semantics/openAccess

Keywords

#alignment, corpora, translation technology, English as a Lingua Franca, academic course descriptions
#scuola :: 843894 :: Lingue e Letterature, Traduzione e Interpretazione
#cds :: 8061 :: Traduzione specializzata [LM-DM270] - Forli'
#sessione :: terza

Type

PeerReviewed