989 results for machine translation


Relevance:

60.00% 60.00%

Publisher:

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Part-of-speech tagging is a basic task required by many natural language processing applications, such as parsing and machine translation, and by speech processing applications, for example speech synthesis. The task consists of labeling the words in a sentence with their grammatical categories. Although these applications require taggers with higher precision, state-of-the-art taggers still reach an accuracy of only 96 to 97%. This thesis investigates corpus and software resources for developing a tagger for Brazilian Portuguese with accuracy above the state of the art. Centered on a hybrid solution that combines probabilistic tagging with rule-based tagging, the thesis proposal focuses on an exploratory study of the tagging method and of the size, quality, tagset and genre of the training and test corpora, and also evaluates the disambiguation of new or unknown words in the texts to be tagged. Four corpora were used in the experiments: CETENFolha, Bosque CF 7.4, Mac-Morpho and Selva Científica. The proposed tagging model started from transformation-based learning (TBL), to which three strategies were added, combined in an architecture that integrates the outputs (tagged texts) of two freely available tools, TreeTagger and µ-TBL, with the modules added to the model. With the tagger model trained on the Mac-Morpho corpus, of the journalistic genre, accuracy rates of 98.05% were obtained on Mac-Morpho texts and 98.27% on Bosque CF 7.4 texts, both of the journalistic genre. The performance of the proposed hybrid tagger model was also evaluated on texts from the Selva Científica corpus, of the scientific genre. 
Adjustments were found to be needed in the tagger and in the corpora and, as a result, accuracy rates of 98.07% on Selva Científica, 98.06% on the Mac-Morpho test set and 98.30% on Bosque CF 7.4 texts were reached. These results are significant because the accuracy rates achieved are above the state of the art, validating the proposed model in the pursuit of a more reliable part-of-speech tagger.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Machine translation systems have been increasingly used for translation of large volumes of specialized texts. The efficiency of these systems depends directly on the implementation of strategies for controlling lexical use in source texts as a way to guarantee machine performance and, ultimately, human revision and post-editing work. This paper presents a brief history of the application of machine translation, introduces the concepts of lexicon and ambiguity, and focuses on some of the lexical control strategies presently used, discussing their possible implications for the production and reading of specialized texts.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This paper analyzes how machine translation has changed the way translation is conceived and practiced in the information age. From a brief review of the early designs of machine translation programs, I discuss the changes implemented in the past decades in these systems to combine mechanical processing and the accessory work by the translator.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The classification of texts has become a major endeavor with so much electronic material available, for it is an essential task in several applications, including search engines and information retrieval. There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts mean those within the same semantic field, whereas in author recognition stylistic features should be considered. In this study, we introduce ways to classify texts employing concepts of complex networks, which may be able to capture syntactic, semantic and even pragmatic features. The interplay between various metrics of the complex networks is analyzed with three applications, namely identification of machine translation (MT) systems, evaluation of quality of machine translated texts and authorship recognition. We shall show that topological features of the networks representing texts can enhance the ability to identify MT systems in particular cases. For evaluating the quality of MT texts, on the other hand, high correlation was obtained with methods capable of capturing the semantics. This was expected because the gold standards used are themselves based on word co-occurrence. Notwithstanding, the Katz similarity, which involves semantics and structure in the comparison of texts, achieved the highest correlation with the NIST measurement, indicating that in some cases the combination of both approaches can improve the ability to quantify quality in MT. In authorship recognition, again the topological features were relevant in some contexts, though for the books and authors analyzed good results were obtained with semantic features as well. 
Because hybrid approaches encompassing semantic and topological features have not been extensively used, we believe that the methodology proposed here may be useful to enhance text classification considerably, as it combines well-established strategies. (c) 2012 Elsevier B.V. All rights reserved.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The realization that statistical physics methods can be applied to analyze written texts represented as complex networks has led to several developments in natural language processing, including automatic summarization and evaluation of machine translation. Most importantly, so far only a few metrics of complex networks have been used and therefore there is ample opportunity to enhance the statistics-based methods as new measures of network topology and dynamics are created. In this paper, we employ for the first time the metrics betweenness, vulnerability and diversity to analyze written texts in Brazilian Portuguese. Using strategies based on diversity metrics, a better performance in automatic summarization is achieved in comparison to previous work employing complex networks. With an optimized method the Rouge score (an automatic evaluation method used in summarization) was 0.5089, which is the best value ever achieved for an extractive summarizer with statistical methods based on complex networks for Brazilian Portuguese. Furthermore, the diversity metric can detect keywords with high precision, which is why we believe it is suitable to produce good summaries. It is also shown that incorporating linguistic knowledge through a syntactic parser does enhance the performance of the automatic summarizers, as expected, but the increase in the Rouge score is only minor. These results reinforce the suitability of complex network methods for improving automatic summarizers in particular, and treating text in general. (C) 2011 Elsevier B.V. All rights reserved.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The automatic disambiguation of word senses (i.e., the identification of which of the meanings is used in a given context for a word that has multiple meanings) is essential for such applications as machine translation and information retrieval, and represents a key step for developing the so-called Semantic Web. Humans disambiguate words in a straightforward fashion, but this does not apply to computers. In this paper we address the problem of Word Sense Disambiguation (WSD) by treating texts as complex networks, and show that word senses can be distinguished upon characterizing the local structure around ambiguous words. Our goal was not to obtain the best possible disambiguation system, but we nevertheless found that in half of the cases our approach outperforms traditional shallow methods. We show that the hierarchical connectivity and clustering of words are usually the most relevant features for WSD. The results reported here shed light on the relationship between semantic and structural parameters of complex networks. They also indicate that when combined with traditional techniques the complex network approach may be useful to enhance the discrimination of senses in large texts. Copyright (C) EPLA, 2012
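As a rough illustration of what "characterizing the local structure around ambiguous words" can mean, the sketch below builds a word adjacency network from a token sequence and measures the degree and local clustering coefficient of one word. The abstract does not say how the networks were constructed or which exact metrics were computed, so the adjacency rule, the function names and the toy sentence are all assumptions made for this example, not the authors' method.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency_network(tokens):
    """Link each pair of adjacent, distinct tokens with an undirected edge.

    Assumption: plain word adjacency with no stopword removal or lemmatization.
    """
    neighbors = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            neighbors[left].add(right)
            neighbors[right].add(left)
    return neighbors


def local_clustering(neighbors, word):
    """Fraction of pairs of a word's neighbors that are themselves linked."""
    linked = sorted(neighbors.get(word, ()))
    if len(linked) < 2:
        return 0.0
    pairs = list(combinations(linked, 2))
    closed = sum(1 for a, b in pairs if b in neighbors[a])
    return closed / len(pairs)


# Hypothetical toy text in which "bank" appears in two different senses.
tokens = "the bank raised rates while the river bank flooded the river".split()
network = build_adjacency_network(tokens)
bank_degree = len(network["bank"])
bank_clustering = local_clustering(network, "bank")
```

In a setting like the one the abstract describes, such per-word measurements would presumably be computed for each occurrence context of an ambiguous word and passed to a classifier; that step is left out here because the abstract gives no details on it.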

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Computer-assisted translation (or computer-aided translation or CAT) is a form of language translation in which a human translator uses computer software in order to facilitate the translation process. Machine translation (MT) is the automated process by which a computerized system produces a translated text or speech from one natural language to another. Both of them are leading and promising technologies in the translation industry; it therefore seems important that translation students and professional translators become familiar with these relatively new types of technology. When used together, these two different types of systems might not only reduce translation time but also lead to further improvement in the field of translation technologies. The dissertation consists of four chapters. The first one surveys the chronological development of MT and CAT tools, the emergence of pre-editing, post-editing and controlled language, and the latest frontiers in this sector. The second one provides a general overview of the four main CAT tools that are used nowadays and tested here. The third chapter is dedicated to the experiments that were conducted in order to analyze and evaluate the performance of the four integrated systems that are the core subject of this dissertation. Finally, the fourth chapter deals with the issue of terminological equivalence in interlinguistic translation. The purpose of this dissertation is not to provide an objective and definitive solution to the complex issues that arise at any time in the field of translation technologies, an aim that is far from being achieved, but to supply information about the limits and potential of those instruments which are now essential to any professional translator.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This dissertation was conducted within the project Language Toolkit, which has the aim of integrating the worlds of work and university. In particular, it consists of the translation into English of documents commissioned by the Italian company TR Turoni and its primary purpose is to demonstrate that, in the field of translation for companies, the existing translation support tools and software can optimise and facilitate the translation process. The work consists of five chapters. The first introduces the Language Toolkit project, the TR Turoni company and its relationship with the CERMAC export consortium. After outlining the current state of company internationalisation, the importance of professional translators in enhancing the competitiveness of companies that enter new international markets is highlighted. Chapter two provides an overview of the texts to be translated, focusing on the textual function and typology and on the addressees. After that, manual translation and the main software developed specifically for translators are described, with a focus on computer-assisted translation (CAT) and machine translation (MT). The third chapter presents the target texts and the corresponding translations. Chapter four is dedicated to the analysis of the translation process. The first two texts were translated manually, with the support of a purpose-built specialized corpus. The following two documents were translated with the software SDL Trados Studio 2011 and its applications. The last texts were submitted to the Google Translate service and to a process of pre- and post-editing. Finally, in chapter five conclusions are drawn about the main limits and potentialities of the different translation techniques. In addition to this, the importance of an integrated use of all available instruments is underlined.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The aim of this dissertation is to provide a translation from English into Italian of a highly specialized scientific article published by the online journal ALTEX. In this text, the authors propose a roadmap for how to overcome the acknowledged scientific gaps for the full replacement of systemic toxicity testing using animals. The main reasons behind this particular choice are my personal interest in specialized translation of scientific texts and in the alternatives to animal testing. Moreover, this translation has been directly requested by the Italian molecular biologist and clinical biochemist Candida Nastrucci. It was not possible to translate the whole article in this project; for this reason, I decided to translate only the introduction, the chapter about skin sensitization, and the conclusion. I intend to use the resources that were created for this project to translate the rest of the article in the near future. In this study, I will show how a translator can translate such a specialized text with the help of a field expert using CAT Tools and a specialized corpus. I will also discuss whether machine translation can prove useful to translate this type of document. This work is divided into six chapters. The first one introduces the main topic of the article and explains my reasons for choosing this text; the second one contains an analysis of the text type, focusing on the differences and similarities between Italian and English conventions. The third chapter provides a description of the resources that were used to translate this text, i.e. the corpus and the CAT Tools. The fourth one contains the actual translation, side-by-side with the original text, while the fifth one provides a general comment on the translation difficulties, an analysis of my translation choices and strategies, and a comment about the relationship between the field expert and the translator. 
Finally, the last chapter shows whether machine translation and post-editing can be an advantageous strategy to translate this type of document. The project also contains two appendixes. The first one includes 54 complex terminological sheets, while the second one includes 188 simple terminological sheets.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This dissertation is part of the Language Toolkit project which is a collaboration between the School of Foreign Languages and Literature, Interpreting and Translation of the University of Bologna, Forlì campus, and the Chamber of Commerce of Forlì-Cesena. This project aims to create an exchange between translation students and companies who want to pursue a process of internationalization. The purpose of this dissertation is to demonstrate the benefits that translation systems can bring to businesses. In particular, it consists of the translation into English of documents supplied by the Italian company Technologica S.r.l. and the creation of linguistic resources that can be integrated into computer-assisted translation (CAT) software, in order to optimize the translation process. The latter is claimed to be a priority with respect to the actual translation products (the target texts), since the analysis conducted on the source texts highlighted that the company could streamline and optimize its English language communication thanks to the use of open source CAT tools such as OmegaT. The work consists of five chapters. The first introduces the Language Toolkit project, the company (Technologica S.r.l.) and its products. The second chapter provides some considerations about technical translation, its features and some misconceptions about it. The difference between technical translation and scientific translation is then clarified and an overview is offered of translation aids such as those used for computer-assisted translation, machine translation, termbases and translation memories. The third chapter contains the analysis of the texts commissioned by Technologica S.r.l. and their categorization. The fourth chapter describes the translation process, with particular attention to terminology extraction and the creation of a bilingual glossary based on a specialized corpus. 
The glossary was integrated into the OmegaT software in order to facilitate the translation process both for the present task and for future applications. The memory deriving from the translation represents a sort of hybrid resource between a translation memory and a glossary. This was found to be the most appropriate format, given the specific nature of the texts to be translated. Finally, in chapter five conclusions are offered about the importance of language training within a company environment, the potentialities of translation aids and the benefits that they would bring to a company wishing to internationalize itself.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This thesis stems from an advanced internship carried out at the company CTI (Communication Trend Italia) in Milan. The goals of the internship were to verify whether automatic tools could be introduced into the company's workflow and to identify the text types and language combinations to which they are applicable. The thesis starts from a theoretical analysis of the various aspects of using machine translation (MT) and then describes its practical application in the procedures that led to the creation of the custom systems. Chapter 1 offers a theoretical overview of the world of machine translation, which leads to an outline of the most widespread way of using MT today: the one in which the translation produced by the system is modified through post-editing, or the source text is adjusted through pre-editing to remove the most difficult elements. Chapter 2 starts from an overview of the main machine translation software in use and arrives at a description of Microsoft Translator Hub, the tool chosen for developing CTI's custom systems. In the next step, attention turns to obtaining customized systems. An extensive discussion is devoted to the methods for finding and using resources. The path that led to the creation and development of the two systems Bilanci IT_EN and Atto Costitutivo IT_EN in Microsoft Translator Hub is then described. Finally, in the fourth and last chapter, the outputs of the two systems are reviewed to identify their characteristics and analyzed with some automatic evaluation tools. Based on the information gathered, some predictions are then made about the future use of the systems at CTI.