847 resultados para Machine Translation


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Word Sense Disambiguation, the process of identifying the meaning of a word in a sentence when the word has multiple meanings, is a critical problem of machine translation. It is generally very difficult to select the correct meaning of a word in a sentence, especially when the syntactical difference between the source and target language is big, e.g., English-Korean machine translation. To achieve a high level of accuracy of noun sense selection in machine translation, we introduced a statistical method based on co-occurrence relation of words in sentences and applied it to the English-Korean machine translator RyongNamSan. ACM Computing Classification System (1998): I.2.7.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The volume of Audiovisual Translation (AVT) is increasing to meet the rising demand for data that needs to be accessible around the world. Machine Translation (MT) is one of the most innovative technologies to be deployed in the field of translation, but it is still too early to predict how it can support the creativity and productivity of professional translators in the future. Currently, MT is more widely used in (non-AV) text translation than in AVT. In this article, we discuss MT technology and demonstrate why its use in AVT scenarios is particularly challenging. We also present some potentially useful methods and tools for measuring MT quality that have been developed primarily for text translation. The ultimate objective is to bridge the gap between the tech-savvy AVT community, on the one hand, and researchers and developers in the field of high-quality MT, on the other.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of this essay, which focuses on patent translation, is to compare the use of Computer-Assisted Translation (CAT) and Machine Translation (MT). During my curricular internship at a specialized-translation agency called Centro Traduzioni Imolese, I was able to practice patent translation thanks to CAT tools like SDL Trados Studio, something I have never studied at university in Forlì. Nowadays, however, Machine Translation is widely used in patent translation as well, due to the vast number of technical terms and their repetitiveness in patents, so the machine can translate automatically and rapidly all repeated terms with the same word, thanks to the use of corpora and translation memories linked to the patent field. In the first chapter I will give a definition of what a patent is, and I will introduce the concept of patent literature; afterwards, I will illustrate the differences between Computer-Assisted Translation and Machine Translation used in patent translation. In the second chapter I will translate two portions of patent 102019000018530, via the Matecat online application, translating the first part with CAT and the second part with MT, then doing the same for the second portion selected from the patent. Finally, in the third chapter, I will analyse the two translations, comparing the results in order to discover which is the more efficient method for translating patents.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Artificial Intelligence (AI) is gaining ever more ground in every sphere of human life, to the point that it is now even used to pass sentences in courts. The use of AI in the field of Law is however deemed quite controversial, as it could provide more objectivity yet entail an abuse of power as well, given that bias in algorithms behind AI may cause lack of accuracy. As a product of AI, machine translation is being increasingly used in the field of Law too in order to translate laws, judgements, contracts, etc. between different languages and different legal systems. In the legal setting of Company Law, accuracy of the content and suitability of terminology play a crucial role within a translation task, as any addition or omission of content or mistranslation of terms could entail legal consequences for companies. The purpose of the present study is to first assess which neural machine translation system between DeepL and ModernMT produces a more suitable translation from Italian into German of the atto costitutivo of an Italian s.r.l. in terms of accuracy of the content and correctness of terminology, and then to assess which translation proves to be closer to a human reference translation. In order to achieve the above-mentioned aims, two human and automatic evaluations are carried out based on the MQM taxonomy and the BLEU metric. Results of both evaluations show an overall better performance delivered by ModernMT in terms of content accuracy, suitability of terminology, and closeness to a human translation. As emerged from the MQM-based evaluation, its accuracy and terminology errors account for just 8.43% (as opposed to DeepL’s 9.22%), while it obtains an overall BLEU score of 29.14 (against DeepL’s 27.02). The overall performances however show that machines still face barriers in overcoming semantic complexity, tackling polysemy, and choosing domain-specific terminology, which suggests that the discrepancy with human translation may still be remarkable.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This thesis summarizes the results on the studies on a syntax based approach for translation between Malayalam, one of Dravidian languages and English and also on the development of the major modules in building a prototype machine translation system from Malayalam to English. The development of the system is a pioneering effort in Malayalam language unattempted by previous researchers. The computational models chosen for the system is first of its kind for Malayalam language. An in depth study has been carried out in the design of the computational models and data structures needed for different modules: morphological analyzer , a parser, a syntactic structure transfer module and target language sentence generator required for the prototype system. The generation of list of part of speech tags, chunk tags and the hierarchical dependencies among the chunks required for the translation process also has been done. In the development process, the major goals are: (a) accuracy of translation (b) speed and (c) space. Accuracy-wise, smart tools for handling transfer grammar and translation standards including equivalent words, expressions, phrases and styles in the target language are to be developed. The grammar should be optimized with a view to obtaining a single correct parse and hence a single translated output. Speed-wise, innovative use of corpus analysis, efficient parsing algorithm, design of efficient Data Structure and run-time frequency-based rearrangement of the grammar which substantially reduces the parsing and generation time are required. The space requirement also has to be minimised

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this paper we describe the methodology and the structural design of a system that translates English into Malayalam using statistical models. A monolingual Malayalam corpus and a bilingual English/Malayalam corpus are the main resource in building this Statistical Machine Translator. Training strategy adopted has been enhanced by PoS tagging which helps to get rid of the insignificant alignments. Moreover, incorporating units like suffix separator and the stop word eliminator has proven to be effective in bringing about better training results. In the decoder, order conversion rules are applied to reduce the structural difference between the language pair. The quality of statistical outcome of the decoder is further improved by applying mending rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results are verified with F measure, BLEU and WER evaluation metrics

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper underlines a methodology for translating text from English into the Dravidian language, Malayalam using statistical models. By using a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase, the machine automatically generates Malayalam translations of English sentences. This paper also discusses a technique to improve the alignment model by incorporating the parts of speech information into the bilingual corpus. Removing the insignificant alignments from the sentence pairs by this approach has ensured better training results. Pre-processing techniques like suffix separation from the Malayalam corpus and stop word elimination from the bilingual corpus also proved to be effective in training. Various handcrafted rules designed for the suffix separation process which can be used as a guideline in implementing suffix separation in Malayalam language are also presented in this paper. The structural difference between the English Malayalam pair is resolved in the decoder by applying the order conversion rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results are verified with F measure, BLEU and WER evaluation metrics

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam sentence using statistical models. A parallel corpus of English-Malayalam is used in the training phase. Word to word alignments has to be set among the sentence pairs of the source and target language before subjecting them for training. This paper deals with certain techniques which can be adopted for improving the alignment model of SMT. Methods to incorporate the parts of speech information into the bilingual corpus has resulted in eliminating many of the insignificant alignments. Also identifying the name entities and cognates present in the sentence pairs has proved to be advantageous while setting up the alignments. Presence of Malayalam words with predictable translations has also contributed in reducing the insignificant alignments. Moreover, reduction of the unwanted alignments has brought in better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results are verified with F measure, BLEU and WER evaluation metrics.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Complex networks have been increasingly used in text analysis, including in connection with natural language processing tools, as important text features appear to be captured by the topology and dynamics of the networks. Following previous works that apply complex networks concepts to text quality measurement, summary evaluation, and author characterization, we now focus on machine translation (MT). In this paper we assess the possible representation of texts as complex networks to evaluate cross-linguistic issues inherent in manual and machine translation. We show that different quality translations generated by NIT tools can be distinguished from their manual counterparts by means of metrics such as in-(ID) and out-degrees (OD), clustering coefficient (CC), and shortest paths (SP). For instance, we demonstrate that the average OD in networks of automatic translations consistently exceeds the values obtained for manual ones, and that the CC values of source texts are not preserved for manual translations, but are for good automatic translations. This probably reflects the text rearrangements humans perform during manual translation. We envisage that such findings could lead to better NIT tools and automatic evaluation metrics.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Following the internationalization of contemporary higher education, academic institutions based in non-English speaking countries are increasingly urged to produce contents in English to address international prospective students and personnel, as well as to increase their attractiveness. The demand for English translations in the institutional academic domain is consequently increasing at a rate exceeding the capacity of the translation profession. Resources for assisting non-native authors and translators in the production of appropriate texts in L2 are therefore required in order to help academic institutions and professionals streamline their translation workload. Some of these resources include: (i) parallel corpora to train machine translation systems and multilingual authoring tools; and (ii) translation memories for computer-aided tools. The purpose of this study is to create and evaluate reference resources like the ones mentioned in (i) and (ii) through the automatic sentence alignment of a large set of Italian and English as a Lingua Franca (ELF) institutional academic texts given as equivalent but not necessarily parallel (i.e. translated). In this framework, a set of aligning algorithms and alignment tools is examined in order to identify the most profitable one(s) in terms of accuracy and time- and cost-effectiveness. In order to determine the text pairs to align, a sample is selected according to document length similarity (characters) and subsequently evaluated in terms of extent of noisiness/parallelism, alignment accuracy and content leverageability. The results of these analyses serve as the basis for the creation of an aligned bilingual corpus of academic course descriptions, which is eventually used to create a translation memory in TMX format.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper describes the UPM system for translation task at the EMNLP 2011 workshop on statistical machine translation (http://www.statmt.org/wmt11/), and it has been used for both directions: Spanish-English and English-Spanish. This system is based on Moses with two new modules for pre and post processing the sentences. The main contribution is the method proposed (based on the similarity with the source language test set) for selecting the sentences for training the models and adjusting the weights. With system, we have obtained a 23.2 BLEU for Spanish-English and 21.7 BLEU for EnglishSpanish

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Ontologies and taxonomies are widely used to organize concepts providing the basis for activities such as indexing, and as background knowledge for NLP tasks. As such, translation of these resources would prove useful to adapt these systems to new languages. However, we show that the nature of these resources is significantly different from the "free-text" paradigm used to train most statistical machine translation systems. In particular, we see significant differences in the linguistic nature of these resources and such resources have rich additional semantics. We demonstrate that as a result of these linguistic differences, standard SMT methods, in particular evaluation metrics, can produce poor performance. We then look to the task of leveraging these semantics for translation, which we approach in three ways: by adapting the translation system to the domain of the resource; by examining if semantics can help to predict the syntactic structure used in translation; and by evaluating if we can use existing translated taxonomies to disambiguate translations. We present some early results from these experiments, which shed light on the degree of success we may have with each approach

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Semantic Web aims to allow machines to make inferences using the explicit conceptualisations contained in ontologies. By pointing to ontologies, Semantic Web-based applications are able to inter-operate and share common information easily. Nevertheless, multilingual semantic applications are still rare, owing to the fact that most online ontologies are monolingual in English. In order to solve this issue, techniques for ontology localisation and translation are needed. However, traditional machine translation is difficult to apply to ontologies, owing to the fact that ontology labels tend to be quite short in length and linguistically different from the free text paradigm. In this paper, we propose an approach to enhance machine translation of ontologies based on exploiting the well-structured concept descriptions contained in the ontology. In particular, our approach leverages the semantics contained in the ontology by using Cross Lingual Explicit Semantic Analysis (CLESA) for context-based disambiguation in phrase-based Statistical Machine Translation (SMT). The presented work is novel in the sense that application of CLESA in SMT has not been performed earlier to the best of our knowledge.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper describes the design, development and field evaluation of a machine translation system from Spanish to Spanish Sign Language (LSE: Lengua de Signos Española). The developed system focuses on helping Deaf people when they want to renew their Driver’s License. The system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the signs). For the natural language translator, three technological approaches have been implemented and evaluated: an example-based strategy, a rule-based translation method and a statistical translator. For the final version, the implemented language translator combines all the alternatives into a hierarchical structure. This paper includes a detailed description of the field evaluation. This evaluation was carried out in the Local Traffic Office in Toledo involving real government employees and Deaf people. The evaluation includes objective measurements from the system and subjective information from questionnaires. The paper details the main problems found and a discussion on how to solve them (some of them specific for LSE).

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The language barrier prevents Latino students from experiencing academic success, and prevents Latino parents from participating in their children's education. Through a review of journal articles, research projects, doctoral dissertations, legislation, and books, this project studies the benefits and dangers of various methods of translating and interpreting in the education system, including issues created by language barriers in schools, common methods of translating and interpreting, and legislation addressing language barriers and education. The project reveals that schools use various methods to translate and interpret, including relying on children, school staff and machine translation, although such methods are often problematic and inaccurate. The project also reveals that professional translation and interpretation are superior to the various non-professional methods.