9 resultados para Statistical Machine Translation
em AMS Tesi di Laurea - Alm@DL - Università di Bologna
Resumo:
Con il presente studio si è inteso analizzare l’impatto dell’utilizzo di una memoria di traduzione (TM) e del post-editing (PE) di un output grezzo sul livello di difficoltà percepita e sul tempo necessario per ottenere un testo finale di alta qualità. L’esperimento ha coinvolto sei studenti, di madrelingua italiana, del corso di Laurea Magistrale in Traduzione Specializzata dell’Università di Bologna (Vicepresidenza di Forlì). I partecipanti sono stati divisi in tre coppie, a ognuna delle quali è stato assegnato un estratto di comunicato stampa in inglese. Per ogni coppia, ad un partecipante è stato chiesto di tradurre il testo in italiano usando la TM all’interno di SDL Trados Studio 2011. All’altro partecipante è stato chiesto di fare il PE completo in italiano dell’output grezzo ottenuto da Google Translate. Nei casi in cui la TM o l’output non contenevano traduzioni (corrette), i partecipanti avrebbero potuto consultare Internet. Ricorrendo ai Think-aloud Protocols (TAPs), è stato chiesto loro di riflettere a voce alta durante lo svolgimento dei compiti. È stato quindi possibile individuare i problemi traduttivi incontrati e i casi in cui la TM e l’output grezzo hanno fornito soluzioni corrette; inoltre, è stato possibile osservare le strategie traduttive impiegate, per poi chiedere ai partecipanti di indicarne la difficoltà attraverso interviste a posteriori. È stato anche misurato il tempo impiegato da ogni partecipante. I dati sulla difficoltà percepita e quelli sul tempo impiegato sono stati messi in relazione con il numero di soluzioni corrette rispettivamente fornito da TM e output grezzo. È stato osservato che usare la TM ha comportato un maggior risparmio di tempo e che, al contrario del PE, ha portato a una riduzione della difficoltà percepita. Il presente studio si propone di aiutare i futuri traduttori professionisti a scegliere strumenti tecnologici che gli permettano di risparmiare tempo e risorse.
Resumo:
Following the internationalization of contemporary higher education, academic institutions based in non-English speaking countries are increasingly urged to produce contents in English to address international prospective students and personnel, as well as to increase their attractiveness. The demand for English translations in the institutional academic domain is consequently increasing at a rate exceeding the capacity of the translation profession. Resources for assisting non-native authors and translators in the production of appropriate texts in L2 are therefore required in order to help academic institutions and professionals streamline their translation workload. Some of these resources include: (i) parallel corpora to train machine translation systems and multilingual authoring tools; and (ii) translation memories for computer-aided tools. The purpose of this study is to create and evaluate reference resources like the ones mentioned in (i) and (ii) through the automatic sentence alignment of a large set of Italian and English as a Lingua Franca (ELF) institutional academic texts given as equivalent but not necessarily parallel (i.e. translated). In this framework, a set of aligning algorithms and alignment tools is examined in order to identify the most profitable one(s) in terms of accuracy and time- and cost-effectiveness. In order to determine the text pairs to align, a sample is selected according to document length similarity (characters) and subsequently evaluated in terms of extent of noisiness/parallelism, alignment accuracy and content leverageability. The results of these analyses serve as the basis for the creation of an aligned bilingual corpus of academic course descriptions, which is eventually used to create a translation memory in TMX format.
Resumo:
Computer-assisted translation (or computer-aided translation or CAT) is a form of language translation in which a human translator uses computer software in order to facilitate the translation process. Machine translation (MT) is the automated process by which a computerized system produces a translated text or speech from one natural language to another. Both of them are leading and promising technologies in the translation industry; it therefore seems important that translation students and professional translators become familiar with this relatively new types of technology. Whether used together, not only might these two different types of systems reduce translation time, but also lead to a further improvement in the field of translation technologies. The dissertation consists of four chapters. The first one surveys the chronological development of MT and CAT tools, the emergence of pre-editing, post-editing and controlled language and the very last frontiers in this sector. The second one provide a general overview on the four main CAT tools that are used nowadays and tested hereto. The third chapter is dedicated to the experimentations that have been conducted in order to analyze and evaluate the performance of the four integrated systems that are the core subject of this dissertation. Finally, the fourth chapter deals with the issue of terminological equivalence in interlinguistic translation. The purpose of this dissertation is not to provide an objective and definitive solution to the complex issues that arise at any time in the field of translation technologies, this aim being well away from being achieved, but to supply information about the limits and potentiality that are typical of those instruments which are now essential to any professional translator.
Resumo:
This dissertation was conducted within the project Language Toolkit, which has the aim of integrating the worlds of work and university. In particular, it consists of the translation into English of documents commissioned by the Italian company TR Turoni and its primary purpose is to demonstrate that, in the field of translation for companies, the existing translation support tools and software can optimise and facilitate the translation process. The work consists of five chapters. The first introduces the Language Toolkit project, the TR Turoni company and its relationship with the CERMAC export consortium. After outlining the current state of company internationalisation, the importance of professional translators in enhancing the competitiveness of companies that enter new international markets is highlighted. Chapter two provides an overview of the texts to be translated, focusing on the textual function and typology and on the addressees. After that, manual translation and the main software developed specifically for translators are described, with a focus on computer-assisted translation (CAT) and machine translation (MT). The third chapter presents the target texts and the corresponding translations. Chapter four is dedicated to the analysis of the translation process. The first two texts were translated manually, with the support of a purpose-built specialized corpus. The following two documents were translated with the software SDL Trados Studio 2011 and its applications. The last texts were submitted to the Google Translate service and to a process of pre and post-editing. Finally, in chapter five conclusions are drawn about the main limits and potentialities of the different translations techniques. In addition to this, the importance of an integrated use of all available instruments is underlined.
Resumo:
The aim of this dissertation is to provide a translation from English into Italian of a highly specialized scientific article published by the online journal ALTEX. In this text, the authors propose a roadmap for how to overcome the acknowledged scientific gaps for the full replacement of systemic toxicity testing using animals. The main reasons behind this particular choice are my personal interest in specialized translation of scientific texts and in the alternatives to animal testing. Moreover, this translation has been directly requested by the Italian molecular biologist and clinical biochemist Candida Nastrucci. It was not possible to translate the whole article in this project, for this reason, I decided to translate only the introduction, the chapter about skin sensitization, and the conclusion. I intend to use the resources that were created for this project to translate the rest of the article in the near future. In this study, I will show how a translator can translate such a specialized text with the help of a field expert using CAT Tools and a specialized corpus. I will also discuss whether machine translation can prove useful to translate this type of document. This work is divided into six chapters. The first one introduces the main topic of the article and explains my reasons for choosing this text; the second one contains an analysis of the text type, focusing on the differences and similarities between Italian and English conventions. The third chapter provides a description of the resources that were used to translate this text, i.e. the corpus and the CAT Tools. The fourth one contains the actual translation, side-by-side with the original text, while the fifth one provides a general comment on the translation difficulties, an analysis of my translation choices and strategies, and a comment about the relationship between the field expert and the translator. Finally, the last chapter shows whether machine translation and post-editing can be an advantageous strategy to translate this type of document. The project also contains two appendixes. The first one includes 54 complex terminological sheets, while the second one includes 188 simple terminological sheets.
Resumo:
This dissertation is part of the Language Toolkit project which is a collaboration between the School of Foreign Languages and Literature, Interpreting and Translation of the University of Bologna, Forlì campus, and the Chamber of Commerce of Forlì-Cesena. This project aims to create an exchange between translation students and companies who want to pursue a process of internationalization. The purpose of this dissertation is demonstrating the benefits that translation systems can bring to businesses. In particular, it consists of the translation into English of documents supplied by the Italian company Technologica S.r.l. and the creation of linguistic resources that can be integrated into computer-assisted translation (CAT) software, in order to optimize the translation process. The latter is claimed to be a priority with respect to the actual translation products (the target texts), since the analysis conducted on the source texts highlighted that the company could streamline and optimize its English language communication thanks to the use of open source CAT tools such as OmegaT. The work consists of five chapters. The first introduces the Language Toolkit project, the company (Technologica S.r.l ) and its products. The second chapter provides some considerations about technical translation, its features and some misconceptions about it. The difference between technical translation and scientific translation is then clarified and an overview is offered of translation aids such as those used for computer-assisted translation, machine translation, termbases and translation memories. The third chapter contains the analysis of the texts commissioned by Technologica S.r.l. and their categorization. The fourth chapter describes the translation process, with particular attention to terminology extraction and the creation of a bilingual glossary based on a specialized corpus. The glossary was integrated into the OmegaT software in order to facilitate the translation process both for the present task and for future applications. The memory deriving from the translation represents a sort of hybrid resource between a translation memory and a glossary. This was found to be the most appropriate format, given the specific nature of the texts to be translated. Finally, in chapter five conclusions are offered about the importance of language training within a company environment, the potentialities of translation aids and the benefits that they would bring to a company wishing to internationalize itself.
Resumo:
La presente tesi nasce da un tirocinio avanzato svolto presso l’azienda CTI (Communication Trend Italia) di Milano. Gli obiettivi dello stage erano la verifica della possibilità di inserire gli strumenti automatici nel flusso di lavoro dell’azienda e l'individuazione delle tipologie testuali e delle combinazioni linguistiche a cui essi sono applicabili. Il presente elaborato si propone di partire da un’analisi teorica dei vari aspetti legati all’utilizzo della TA, per poi descriverne l’applicazione pratica nei procedimenti che hanno portato alla creazione dei sistemi custom. Il capitolo 1 offre una panoramica teorica sul mondo della machine translation, che porta a delineare la modalità di utilizzo della TA ad oggi più diffusa: quella in cui la traduzione fornita dal sistema viene modificata tramite post-editing oppure il testo di partenza viene ritoccato attraverso il pre-editing per eliminare gli elementi più ostici. Nel capitolo 2, partendo da una panoramica relativa ai principali software di traduzione automatica in uso, si arriva alla descrizione di Microsoft Translator Hub, lo strumento scelto per lo sviluppo dei sistemi custom di CTI. Nel successivo passaggio, l’attenzione si concentra sull’ottenimento di sistemi customizzati. Un ampio approfondimento è dedicato ai metodi per reperire ed utilizzare le risorse. In seguito viene descritto il percorso che ha portato alla creazione e allo sviluppo dei due sistemi Bilanci IT_EN e Atto Costitutivo IT_EN in Microsoft Translator Hub. Infine, nel quarto ed ultimo capitolo gli output che i due sistemi forniscono vengono rivisti per individuarne le caratteristiche e analizzati tramite alcuni tool di valutazione automatica. Grazie alle informazioni raccolte vengono poi formulate alcune previsioni sul futuro uso dei sistemi presso l’azienda CTI.
Resumo:
This work focuses on Machine Translation (MT) and Speech-to-Speech Translation, two emerging technologies that allow users to automatically translate written and spoken texts. The first part of this work provides a theoretical framework for the evaluation of Google Translate and Microsoft Translator, which is at the core of this study. Chapter one focuses on Machine Translation, providing a definition of this technology and glimpses of its history. In this chapter we will also learn how MT works, who uses it, for what purpose, what its pros and cons are, and how machine translation quality can be defined and assessed. Chapter two deals with Speech-to-Speech Translation by focusing on its history, characteristics and operation, potential uses and limits deriving from the intrinsic difficulty of translating spoken language. After describing the future prospects for SST, the final part of this chapter focuses on the quality assessment of Speech-to-Speech Translation applications. The last part of this dissertation describes the evaluation test carried out on Google Translate and Microsoft Translator, two mobile translation apps also providing a Speech-to-Speech Translation service. Chapter three illustrates the objectives, the research questions, the participants, the methodology and the elaboration of the questionnaires used to collect data. The collected data and the results of the evaluation of the automatic speech recognition subsystem and the language translation subsystem are presented in chapter four and finally analysed and compared in chapter five, which provides a general description of the performance of the evaluated apps and possible explanations for each set of results. In the final part of this work suggestions are made for future research and reflections on the usability and usefulness of the evaluated translation apps are provided.
Machine Learning applicato al Web Semantico: Statistical Relational Learning vs Tensor Factorization
Resumo:
Obiettivo della tesi è analizzare e testare i principali approcci di Machine Learning applicabili in contesti semantici, partendo da algoritmi di Statistical Relational Learning, quali Relational Probability Trees, Relational Bayesian Classifiers e Relational Dependency Networks, per poi passare ad approcci basati su fattorizzazione tensori, in particolare CANDECOMP/PARAFAC, Tucker e RESCAL.