15 resultados para Machine Translation (MT)
em AMS Tesi di Laurea - Alm@DL - Università di Bologna
Resumo:
The aim of this essay, which focuses on patent translation, is to compare the use of Computer-Assisted Translation (CAT) and Machine Translation (MT). During my curricular internship at a specialized-translation agency called Centro Traduzioni Imolese, I was able to practice patent translation thanks to CAT tools like SDL Trados Studio, something I have never studied at university in Forlì. Nowadays, however, Machine Translation is widely used in patent translation as well, due to the vast number of technical terms and their repetitiveness in patents, so the machine can translate automatically and rapidly all repeated terms with the same word, thanks to the use of corpora and translation memories linked to the patent field. In the first chapter I will give a definition of what a patent is, and I will introduce the concept of patent literature; afterwards, I will illustrate the differences between Computer-Assisted Translation and Machine Translation used in patent translation. In the second chapter I will translate two portions of patent 102019000018530, via the Matecat online application, translating the first part with CAT and the second part with MT, then doing the same for the second portion selected from the patent. Finally, in the third chapter, I will analyse the two translations, comparing the results in order to discover which is the more efficient method for translating patents.
Resumo:
Con il presente studio si è inteso analizzare l’impatto dell’utilizzo di una memoria di traduzione (TM) e del post-editing (PE) di un output grezzo sul livello di difficoltà percepita e sul tempo necessario per ottenere un testo finale di alta qualità. L’esperimento ha coinvolto sei studenti, di madrelingua italiana, del corso di Laurea Magistrale in Traduzione Specializzata dell’Università di Bologna (Vicepresidenza di Forlì). I partecipanti sono stati divisi in tre coppie, a ognuna delle quali è stato assegnato un estratto di comunicato stampa in inglese. Per ogni coppia, ad un partecipante è stato chiesto di tradurre il testo in italiano usando la TM all’interno di SDL Trados Studio 2011. All’altro partecipante è stato chiesto di fare il PE completo in italiano dell’output grezzo ottenuto da Google Translate. Nei casi in cui la TM o l’output non contenevano traduzioni (corrette), i partecipanti avrebbero potuto consultare Internet. Ricorrendo ai Think-aloud Protocols (TAPs), è stato chiesto loro di riflettere a voce alta durante lo svolgimento dei compiti. È stato quindi possibile individuare i problemi traduttivi incontrati e i casi in cui la TM e l’output grezzo hanno fornito soluzioni corrette; inoltre, è stato possibile osservare le strategie traduttive impiegate, per poi chiedere ai partecipanti di indicarne la difficoltà attraverso interviste a posteriori. È stato anche misurato il tempo impiegato da ogni partecipante. I dati sulla difficoltà percepita e quelli sul tempo impiegato sono stati messi in relazione con il numero di soluzioni corrette rispettivamente fornito da TM e output grezzo. È stato osservato che usare la TM ha comportato un maggior risparmio di tempo e che, al contrario del PE, ha portato a una riduzione della difficoltà percepita. Il presente studio si propone di aiutare i futuri traduttori professionisti a scegliere strumenti tecnologici che gli permettano di risparmiare tempo e risorse.
Resumo:
Artificial Intelligence (AI) is gaining ever more ground in every sphere of human life, to the point that it is now even used to pass sentences in courts. The use of AI in the field of Law is however deemed quite controversial, as it could provide more objectivity yet entail an abuse of power as well, given that bias in algorithms behind AI may cause lack of accuracy. As a product of AI, machine translation is being increasingly used in the field of Law too in order to translate laws, judgements, contracts, etc. between different languages and different legal systems. In the legal setting of Company Law, accuracy of the content and suitability of terminology play a crucial role within a translation task, as any addition or omission of content or mistranslation of terms could entail legal consequences for companies. The purpose of the present study is to first assess which neural machine translation system between DeepL and ModernMT produces a more suitable translation from Italian into German of the atto costitutivo of an Italian s.r.l. in terms of accuracy of the content and correctness of terminology, and then to assess which translation proves to be closer to a human reference translation. In order to achieve the above-mentioned aims, two human and automatic evaluations are carried out based on the MQM taxonomy and the BLEU metric. Results of both evaluations show an overall better performance delivered by ModernMT in terms of content accuracy, suitability of terminology, and closeness to a human translation. As emerged from the MQM-based evaluation, its accuracy and terminology errors account for just 8.43% (as opposed to DeepL’s 9.22%), while it obtains an overall BLEU score of 29.14 (against DeepL’s 27.02). The overall performances however show that machines still face barriers in overcoming semantic complexity, tackling polysemy, and choosing domain-specific terminology, which suggests that the discrepancy with human translation may still be remarkable.
Resumo:
Computer-assisted translation (or computer-aided translation or CAT) is a form of language translation in which a human translator uses computer software in order to facilitate the translation process. Machine translation (MT) is the automated process by which a computerized system produces a translated text or speech from one natural language to another. Both of them are leading and promising technologies in the translation industry; it therefore seems important that translation students and professional translators become familiar with this relatively new types of technology. Whether used together, not only might these two different types of systems reduce translation time, but also lead to a further improvement in the field of translation technologies. The dissertation consists of four chapters. The first one surveys the chronological development of MT and CAT tools, the emergence of pre-editing, post-editing and controlled language and the very last frontiers in this sector. The second one provide a general overview on the four main CAT tools that are used nowadays and tested hereto. The third chapter is dedicated to the experimentations that have been conducted in order to analyze and evaluate the performance of the four integrated systems that are the core subject of this dissertation. Finally, the fourth chapter deals with the issue of terminological equivalence in interlinguistic translation. The purpose of this dissertation is not to provide an objective and definitive solution to the complex issues that arise at any time in the field of translation technologies, this aim being well away from being achieved, but to supply information about the limits and potentiality that are typical of those instruments which are now essential to any professional translator.
Resumo:
This dissertation was conducted within the project Language Toolkit, which has the aim of integrating the worlds of work and university. In particular, it consists of the translation into English of documents commissioned by the Italian company TR Turoni and its primary purpose is to demonstrate that, in the field of translation for companies, the existing translation support tools and software can optimise and facilitate the translation process. The work consists of five chapters. The first introduces the Language Toolkit project, the TR Turoni company and its relationship with the CERMAC export consortium. After outlining the current state of company internationalisation, the importance of professional translators in enhancing the competitiveness of companies that enter new international markets is highlighted. Chapter two provides an overview of the texts to be translated, focusing on the textual function and typology and on the addressees. After that, manual translation and the main software developed specifically for translators are described, with a focus on computer-assisted translation (CAT) and machine translation (MT). The third chapter presents the target texts and the corresponding translations. Chapter four is dedicated to the analysis of the translation process. The first two texts were translated manually, with the support of a purpose-built specialized corpus. The following two documents were translated with the software SDL Trados Studio 2011 and its applications. The last texts were submitted to the Google Translate service and to a process of pre and post-editing. Finally, in chapter five conclusions are drawn about the main limits and potentialities of the different translations techniques. In addition to this, the importance of an integrated use of all available instruments is underlined.
Resumo:
This work focuses on Machine Translation (MT) and Speech-to-Speech Translation, two emerging technologies that allow users to automatically translate written and spoken texts. The first part of this work provides a theoretical framework for the evaluation of Google Translate and Microsoft Translator, which is at the core of this study. Chapter one focuses on Machine Translation, providing a definition of this technology and glimpses of its history. In this chapter we will also learn how MT works, who uses it, for what purpose, what its pros and cons are, and how machine translation quality can be defined and assessed. Chapter two deals with Speech-to-Speech Translation by focusing on its history, characteristics and operation, potential uses and limits deriving from the intrinsic difficulty of translating spoken language. After describing the future prospects for SST, the final part of this chapter focuses on the quality assessment of Speech-to-Speech Translation applications. The last part of this dissertation describes the evaluation test carried out on Google Translate and Microsoft Translator, two mobile translation apps also providing a Speech-to-Speech Translation service. Chapter three illustrates the objectives, the research questions, the participants, the methodology and the elaboration of the questionnaires used to collect data. The collected data and the results of the evaluation of the automatic speech recognition subsystem and the language translation subsystem are presented in chapter four and finally analysed and compared in chapter five, which provides a general description of the performance of the evaluated apps and possible explanations for each set of results. In the final part of this work suggestions are made for future research and reflections on the usability and usefulness of the evaluated translation apps are provided.
Resumo:
Sin dal suo ingresso nel mercato della traduzione, la machine translation (MT) è stata impiegata in numerosi settori professionali e ha suscitato l’interesse del grande pubblico. Con l’avvento della traduzione automatica neurale, la MT ha avuto un ulteriore impulso e ci si è iniziati a interrogare sulla possibilità di estenderne il raggio d’azione all’ambito letterario, che sembra tuttora non toccato dagli automatismi delle macchine in virtù della funzione espressiva e del contenuto creativo dei testi. Il presente elaborato nasce quindi con l’obiettivo di valutare l’effettiva applicabilità della traduzione automatica in ambito letterario attraverso un’attenta analisi dell’output prodotto dalla MT supportata da un confronto con la traduzione svolta da un professionista. Questa analisi si baserà sulla traduzione di un estratto del romanzo Hunger Games di Suzanne Collins e le traduzioni verranno effettuate da due dei migliori software di traduzione automatica neurale attualmente sul mercato: ModernMT e DeepL. Una volta terminata la valutazione dei singoli output verranno mostrati i risultati e determinati i limiti e i vantaggi della machine translation.
Resumo:
Following the internationalization of contemporary higher education, academic institutions based in non-English speaking countries are increasingly urged to produce contents in English to address international prospective students and personnel, as well as to increase their attractiveness. The demand for English translations in the institutional academic domain is consequently increasing at a rate exceeding the capacity of the translation profession. Resources for assisting non-native authors and translators in the production of appropriate texts in L2 are therefore required in order to help academic institutions and professionals streamline their translation workload. Some of these resources include: (i) parallel corpora to train machine translation systems and multilingual authoring tools; and (ii) translation memories for computer-aided tools. The purpose of this study is to create and evaluate reference resources like the ones mentioned in (i) and (ii) through the automatic sentence alignment of a large set of Italian and English as a Lingua Franca (ELF) institutional academic texts given as equivalent but not necessarily parallel (i.e. translated). In this framework, a set of aligning algorithms and alignment tools is examined in order to identify the most profitable one(s) in terms of accuracy and time- and cost-effectiveness. In order to determine the text pairs to align, a sample is selected according to document length similarity (characters) and subsequently evaluated in terms of extent of noisiness/parallelism, alignment accuracy and content leverageability. The results of these analyses serve as the basis for the creation of an aligned bilingual corpus of academic course descriptions, which is eventually used to create a translation memory in TMX format.
Resumo:
La presente tesi nasce da un tirocinio avanzato svolto presso l’azienda CTI (Communication Trend Italia) di Milano. Gli obiettivi dello stage erano la verifica della possibilità di inserire gli strumenti automatici nel flusso di lavoro dell’azienda e l'individuazione delle tipologie testuali e delle combinazioni linguistiche a cui essi sono applicabili. Il presente elaborato si propone di partire da un’analisi teorica dei vari aspetti legati all’utilizzo della TA, per poi descriverne l’applicazione pratica nei procedimenti che hanno portato alla creazione dei sistemi custom. Il capitolo 1 offre una panoramica teorica sul mondo della machine translation, che porta a delineare la modalità di utilizzo della TA ad oggi più diffusa: quella in cui la traduzione fornita dal sistema viene modificata tramite post-editing oppure il testo di partenza viene ritoccato attraverso il pre-editing per eliminare gli elementi più ostici. Nel capitolo 2, partendo da una panoramica relativa ai principali software di traduzione automatica in uso, si arriva alla descrizione di Microsoft Translator Hub, lo strumento scelto per lo sviluppo dei sistemi custom di CTI. Nel successivo passaggio, l’attenzione si concentra sull’ottenimento di sistemi customizzati. Un ampio approfondimento è dedicato ai metodi per reperire ed utilizzare le risorse. In seguito viene descritto il percorso che ha portato alla creazione e allo sviluppo dei due sistemi Bilanci IT_EN e Atto Costitutivo IT_EN in Microsoft Translator Hub. Infine, nel quarto ed ultimo capitolo gli output che i due sistemi forniscono vengono rivisti per individuarne le caratteristiche e analizzati tramite alcuni tool di valutazione automatica. Grazie alle informazioni raccolte vengono poi formulate alcune previsioni sul futuro uso dei sistemi presso l’azienda CTI.
Resumo:
This thesis examines the state of audiovisual translation (AVT) in the aftermath of the COVID-19 emergency, highlighting new trends with regards to the implementation of AI technologies as well as their strengths, constraints, and ethical implications. It starts with an overview of the current AVT landscape, focusing on future projections about its evolution and its critical aspects such as the worsening working conditions lamented by AVT professionals – especially freelancers – in recent years and how they might be affected by the advent of AI technologies in the industry. The second chapter delves into the history and development of three AI technologies which are used in combination with neural machine translation in automatic AVT tools: automatic speech recognition, speech synthesis and deepfakes (voice cloning and visual deepfakes for lip syncing), including real examples of start-up companies that utilize them – or are planning to do so – to localize audiovisual content automatically or semi-automatically. The third chapter explores the many ethical concerns around these innovative technologies, which extend far beyond the field of translation; at the same time, it attempts to revindicate their potential to bring about immense progress in terms of accessibility and international cooperation, provided that their use is properly regulated. Lastly, the fourth chapter describes two experiments, testing the efficacy of the currently available tools for automatic subtitling and automatic dubbing respectively, in order to take a closer look at their perks and limitations compared to more traditional approaches. This analysis aims to help discerning legitimate concerns from unfounded speculations with regards to the AI technologies which are entering the field of AVT; the intention behind it is to humbly suggest a constructive and optimistic view of the technological transformations that appear to be underway, whilst also acknowledging their potential risks.
Resumo:
The aim of this dissertation is to provide a translation from English into Italian of a highly specialized scientific article published by the online journal ALTEX. In this text, the authors propose a roadmap for how to overcome the acknowledged scientific gaps for the full replacement of systemic toxicity testing using animals. The main reasons behind this particular choice are my personal interest in specialized translation of scientific texts and in the alternatives to animal testing. Moreover, this translation has been directly requested by the Italian molecular biologist and clinical biochemist Candida Nastrucci. It was not possible to translate the whole article in this project, for this reason, I decided to translate only the introduction, the chapter about skin sensitization, and the conclusion. I intend to use the resources that were created for this project to translate the rest of the article in the near future. In this study, I will show how a translator can translate such a specialized text with the help of a field expert using CAT Tools and a specialized corpus. I will also discuss whether machine translation can prove useful to translate this type of document. This work is divided into six chapters. The first one introduces the main topic of the article and explains my reasons for choosing this text; the second one contains an analysis of the text type, focusing on the differences and similarities between Italian and English conventions. The third chapter provides a description of the resources that were used to translate this text, i.e. the corpus and the CAT Tools. The fourth one contains the actual translation, side-by-side with the original text, while the fifth one provides a general comment on the translation difficulties, an analysis of my translation choices and strategies, and a comment about the relationship between the field expert and the translator. Finally, the last chapter shows whether machine translation and post-editing can be an advantageous strategy to translate this type of document. The project also contains two appendixes. The first one includes 54 complex terminological sheets, while the second one includes 188 simple terminological sheets.
Resumo:
This dissertation is part of the Language Toolkit project which is a collaboration between the School of Foreign Languages and Literature, Interpreting and Translation of the University of Bologna, Forlì campus, and the Chamber of Commerce of Forlì-Cesena. This project aims to create an exchange between translation students and companies who want to pursue a process of internationalization. The purpose of this dissertation is demonstrating the benefits that translation systems can bring to businesses. In particular, it consists of the translation into English of documents supplied by the Italian company Technologica S.r.l. and the creation of linguistic resources that can be integrated into computer-assisted translation (CAT) software, in order to optimize the translation process. The latter is claimed to be a priority with respect to the actual translation products (the target texts), since the analysis conducted on the source texts highlighted that the company could streamline and optimize its English language communication thanks to the use of open source CAT tools such as OmegaT. The work consists of five chapters. The first introduces the Language Toolkit project, the company (Technologica S.r.l ) and its products. The second chapter provides some considerations about technical translation, its features and some misconceptions about it. The difference between technical translation and scientific translation is then clarified and an overview is offered of translation aids such as those used for computer-assisted translation, machine translation, termbases and translation memories. The third chapter contains the analysis of the texts commissioned by Technologica S.r.l. and their categorization. The fourth chapter describes the translation process, with particular attention to terminology extraction and the creation of a bilingual glossary based on a specialized corpus. The glossary was integrated into the OmegaT software in order to facilitate the translation process both for the present task and for future applications. The memory deriving from the translation represents a sort of hybrid resource between a translation memory and a glossary. This was found to be the most appropriate format, given the specific nature of the texts to be translated. Finally, in chapter five conclusions are offered about the importance of language training within a company environment, the potentialities of translation aids and the benefits that they would bring to a company wishing to internationalize itself.
Resumo:
Natural Language Processing (NLP) has seen tremendous improvements over the last few years. Transformer architectures achieved impressive results in almost any NLP task, such as Text Classification, Machine Translation, and Language Generation. As time went by, transformers continued to improve thanks to larger corpora and bigger networks, reaching hundreds of billions of parameters. Training and deploying such large models has become prohibitively expensive, such that only big high tech companies can afford to train those models. Therefore, a lot of research has been dedicated to reducing a model’s size. In this thesis, we investigate the effects of Vocabulary Transfer and Knowledge Distillation for compressing large Language Models. The goal is to combine these two methodologies to further compress models without significant loss of performance. In particular, we designed different combination strategies and conducted a series of experiments on different vertical domains (medical, legal, news) and downstream tasks (Text Classification and Named Entity Recognition). Four different methods involving Vocabulary Transfer (VIPI) with and without a Masked Language Modelling (MLM) step and with and without Knowledge Distillation are compared against a baseline that assigns random vectors to new elements of the vocabulary. Results indicate that VIPI effectively transfers information of the original vocabulary and that MLM is beneficial. It is also noted that both vocabulary transfer and knowledge distillation are orthogonal to one another and may be applied jointly. The application of knowledge distillation first before subsequently applying vocabulary transfer is recommended. Finally, model performance due to vocabulary transfer does not always show a consistent trend as the vocabulary size is reduced. Hence, the choice of vocabulary size should be empirically selected by evaluation on the downstream task similar to hyperparameter tuning.
Resumo:
Throughout the years, technology has had an undeniable impact on the AVT field. It has revolutionized the way audiovisual content is consumed by allowing audiences to easily access it at any time and on any device. Especially after the introduction of OTT streaming platforms such as Netflix, Amazon Prime Video, Disney+, Apple TV+, and HBO Max, which offer a vast catalog of national and international products, the consumption of audiovisual products has been on a constant rise and, consequently, the demand for localized content too. In turn, the AVT industry resorts to new technologies and practices to handle the ever-growing workload and the faster turnaround times. Due to the numerous implications that it has on the industry, technological advancement can be considered an area of research of particular interest for the AVT studies. However, in the case of dubbing, research and discussion regarding the topic is lagging behind because of the more limited impact that technology has had on the very conservative dubbing industry. Therefore, the aim of the dissertation is to offer an overview of some of the latest technological innovations and practices that have already been implemented (i.e. cloud dubbing and DeepDub technology) or that are still under development and research (i.e. automatic speech recognition and respeaking, machine translation and post-editing, audio-based and visual-based dubbing techniques, text-based editing of talking-head videos, and automatic dubbing), and respectively discuss their reception by the industry professionals, and make assumptions about their future implementation in the dubbing field.
Resumo:
Lo scopo dell’elaborato è quello di valutare una prima applicazione di intelligenza artificiale per svolgere manutenzione predittiva sui giunti installati nella rete di distribuzione, al fine di prevenirne i guasti. L’estensione delle reti in cavo è infatti in costante aumento per supportare lo sviluppo della rete elettrica e, per l’installazione di nuove linee o nella manutenzione di cavi, si fa uso di giunti, spesso soggetti a guasti e di difficile manutenzione. Risulta quindi importante svolgere manutenzione predittiva su questi componenti, al fine di prevenirne i guasti ed ottenere un risparmio in termini di tempo e denaro. A seguito un’attenta analisi della struttura dei giunti, dei loro modi di guasto e dei parametri caratteristici, si è proceduto alla generazione sintetica di un dataset contenente misure sui giunti e, sulla base di specifici criteri applicati a questi valori, l’informazione sulla presenza o sull’assenza di un guasto. Il dataset è stato poi utilizzato per la fase di apprendimento e di test di sei algoritmi di machine learning, selezionati a partire da considerazioni sullo stato dell’arte, valutato tramite una Revisione Sistematica della Letteratura (SLR). I risultati così ottenuti sono stati analizzati tramite specifiche metriche e da questo caso di studio sono emerse le elevate potenzialità delle tecnologie di intelligenza artificiale ai fini della prevenzione dei guasti nelle reti di distribuzione, soprattutto se considerata la semplicità implementativa degli algoritmi.