868 resultados para traduzione automatica, machine translation, post-editing, pre-editing, workflow, LSP, TA, MT
Resumo:
This paper introduces a rule-based classification of single-word and compound verbs into a statistical machine translation approach. By substituting verb forms by the lemma of their head verb, the data sparseness problem caused by highly-inflected languages can be successfully addressed. On the other hand, the information of seen verb forms can be used to generate new translations for unseen verb forms. Translation results for an English to Spanish task are reported, producing a significant performance improvement.
Resumo:
In this paper we describe MARIE, an Ngram-based statistical machine translation decoder. It is implemented using a beam search strategy, with distortion (or reordering) capabilities. The underlying translation model is based on an Ngram approach, extended to introduce reordering at the phrase level. The search graph structure is designed to perform very accurate comparisons, what allows for a high level of pruning, improving the decoder efficiency. We report several techniques for efficiently prune out the search space. The combinatory explosion of the search space derived from the search graph structure is reduced by limiting the number of reorderings a given translation is allowed to perform, and also the maximum distance a word (or a phrase) is allowed to be reordered. We finally report translation accuracy results on three different translation tasks.
Resumo:
In this paper a method to incorporate linguistic information regarding single-word and compound verbs is proposed, as a first step towards an SMT model based on linguistically-classified phrases. By substituting these verb structures by the base form of the head verb, we achieve a better statistical word alignment performance, and are able to better estimate the translation model and generalize to unseen verb forms during translation. Preliminary experiments for the English - Spanish language pair are performed, and future research lines are detailed. © 2005 Association for Computational Linguistics.
Resumo:
We report an empirical study of n-gram posterior probability confidence measures for statistical machine translation (SMT). We first describe an efficient and practical algorithm for rapidly computing n-gram posterior probabilities from large translation word lattices. These probabilities are shown to be a good predictor of whether or not the n-gram is found in human reference translations, motivating their use as a confidence measure for SMT. Comprehensive n-gram precision and word coverage measurements are presented for a variety of different language pairs, domains and conditions. We analyze the effect on reference precision of using single or multiple references, and compare the precision of posteriors computed from k-best lists to those computed over the full evidence space of the lattice. We also demonstrate improved confidence by combining multiple lattices in a multi-source translation framework. © 2012 The Author(s).
Resumo:
Jones, E. H. G., Thomas, Ned, and King, Alan, 'Machine Translation and the Internet', Mercator Media Forums (2001) 5(1) pp.84-98 RAE2008
Resumo:
This paper examines changes in religious geographies for Ireland from 1834 to 1911. It shows that in a period of dramatic social and economic change religious geographies remained remarkably stable. In this it challenges the accepted historiography. It makes use of new data in new ways with the full exploitation of the 1834 Enumeration of Religion and, in so doing, is able to examine the impact of the Great Irish Famine on geographies of religion. These data are visualised both using traditional choropleth maps and, more innovatively in this subject area, cartograms.
Resumo:
The present dissertation examines how grammatical aspect and mood are handled by machine translation (MT) systems within the scope of imperative sentences (orders, recommendations) when dealing with the language pair French-Greek (unidirectional, towards Greek). As the grammatical category of aspect is not expressed in the same way in both languages, choosing the correct aspect value when translating a verb from French to Greek can pose problems. We are interested in describing the types of errors that occur and their frequency in a corpus taken from texts pertaining to the security domain and from technical manuals, where imperative sentences are very common. In order to further delimit our research, our sample consists of sentences that comply with the general principles of simplicity and readability provided by several controlled language guidelines and aimed at higher translatability when having MT in mind. In a second phase, this study aims at discovering how modifying some of the control rules would help (or not) the MT systems better decide upon the translation of aspect and mood.
Resumo:
Dans ce mémoire, nous examinons certaines propriétés des représentations distribuées de mots et nous proposons une technique pour élargir le vocabulaire des systèmes de traduction automatique neurale. En premier lieu, nous considérons un problème de résolution d'analogies bien connu et examinons l'effet de poids adaptés à la position, le choix de la fonction de combinaison et l'impact de l'apprentissage supervisé. Nous enchaînons en montrant que des représentations distribuées simples basées sur la traduction peuvent atteindre ou dépasser l'état de l'art sur le test de détection de synonymes TOEFL et sur le récent étalon-or SimLex-999. Finalament, motivé par d'impressionnants résultats obtenus avec des représentations distribuées issues de systèmes de traduction neurale à petit vocabulaire (30 000 mots), nous présentons une approche compatible à l'utilisation de cartes graphiques pour augmenter la taille du vocabulaire par plus d'un ordre de magnitude. Bien qu'originalement développée seulement pour obtenir les représentations distribuées, nous montrons que cette technique fonctionne plutôt bien sur des tâches de traduction, en particulier de l'anglais vers le français (WMT'14).
Resumo:
Due to the emergence of multiple language support on the Internet, machine translation (MT) technologies are indispensable to the communication between speakers using different languages. Recent research works have started to explore tree-based machine translation systems with syntactical and morphological information. This work aims the development of Syntactic Based Machine Translation from English to Malayalam by adding different case information during translation. The system identifies general rules for various sentence patterns in English. These rules are generated using the Parts Of Speech (POS) tag information of the texts. Word Reordering based on the Syntax Tree is used to improve the translation quality of the system. The system used Bilingual English –Malayalam dictionary for translation.
Resumo:
In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam translation using statistical models like translation model, language model and a decoder. A parallel corpus of English-Malayalam is used in the training phase. Word to word alignments has to be set up among the sentence pairs of the source and target language before subjecting them for training. This paper is deals with the techniques which can be adopted for improving the alignment model of SMT. Incorporating the parts of speech information into the bilingual corpus has eliminated many of the insignificant alignments. Also identifying the name entities and cognates present in the sentence pairs has proved to be advantageous while setting up the alignments. Moreover, reduction of the unwanted alignments has brought in better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results are verified with F measure, BLEU and WER evaluation metrics
Resumo:
Machine translation has been a particularly difficult problem in the area of Natural Language Processing for over two decades. Early approaches to translation failed since interaction effects of complex phenomena in part made translation appear to be unmanageable. Later approaches to the problem have succeeded (although only bilingually), but are based on many language-specific rules of a context-free nature. This report presents an alternative approach to natural language translation that relies on principle-based descriptions of grammar rather than rule-oriented descriptions. The model that has been constructed is based on abstract principles as developed by Chomsky (1981) and several other researchers working within the "Government and Binding" (GB) framework. Thus, the grammar is viewed as a modular system of principles rather than a large set of ad hoc language-specific rules.
Resumo:
[EU]Lan honetan semantika distribuzionalaren eta ikasketa automatikoaren erabilera aztertzen dugu itzulpen automatiko estatistikoa hobetzeko. Bide horretan, erregresio logistikoan oinarritutako ikasketa automatikoko eredu bat proposatzen dugu hitz-segiden itzulpen- probabilitatea modu dinamikoan modelatzeko. Proposatutako eredua itzulpen automatiko estatistikoko ohiko itzulpen-probabilitateen orokortze bat dela frogatzen dugu, eta testuinguruko nahiz semantika distribuzionaleko informazioa barneratzeko baliatu ezaugarri lexiko, hitz-cluster eta hitzen errepresentazio bektorialen bidez. Horretaz gain, semantika distribuzionaleko ezagutza itzulpen automatiko estatistikoan txertatzeko beste hurbilpen bat lantzen dugu: hitzen errepresentazio bektorial elebidunak erabiltzea hitz-segiden itzulpenen antzekotasuna modelatzeko. Gure esperimentuek proposatutako ereduen baliagarritasuna erakusten dute, emaitza itxaropentsuak eskuratuz oinarrizko sistema sendo baten gainean. Era berean, gure lanak ekarpen garrantzitsuak egiten ditu errepresentazio bektorialen mapaketa elebidunei eta hitzen errepresentazio bektorialetan oinarritutako hitz-segiden antzekotasun neurriei dagokienean, itzulpen automatikoaz haratago balio propio bat dutenak semantika distribuzionalaren arloan.