989 results for Machine translation


Abstract:

In this thesis, we examine certain properties of distributed word representations and propose a technique for enlarging the vocabulary of neural machine translation systems. First, we consider a well-known analogy-solving problem and examine the effect of position-dependent weights, the choice of combination function and the impact of supervised learning. We then show that simple translation-based distributed representations can match or exceed the state of the art on the TOEFL synonym-detection test and on the recent SimLex-999 gold standard. Finally, motivated by impressive results obtained with distributed representations produced by small-vocabulary (30,000-word) neural translation systems, we present a GPU-compatible approach that increases the vocabulary size by more than an order of magnitude. Although it was originally developed only to obtain the distributed representations, we show that this technique works quite well on translation tasks, in particular English-to-French (WMT'14).
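The analogy-solving problem mentioned above is commonly tackled with the vector-offset method (often called 3CosAdd): to answer a : b :: c : ?, return the vocabulary word whose vector has the highest cosine similarity to b - a + c. The Python sketch below shows only that baseline. It is not the thesis's method, it omits the position-dependent weights and alternative combination functions the abstract refers to, and the three-dimensional vectors are invented toy data.

```python
import numpy as np


def normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def solve_analogy(a, b, c, emb):
    """Return the word d maximizing cos(d, b - a + c), excluding a, b and c (3CosAdd)."""
    target = normalize(emb[b] - emb[a] + emb[c])
    best_word, best_score = None, -np.inf
    for word, vec in emb.items():
        if word in (a, b, c):
            continue  # the query words themselves are not valid answers
        score = float(np.dot(normalize(vec), target))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Hypothetical toy vectors: dim 0 ~ "royalty", dim 1 ~ "gender", dim 2 ~ unrelated
emb = {
    "man": np.array([0.0, 1.0, 0.1]),
    "woman": np.array([0.0, -1.0, 0.1]),
    "king": np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, -1.0, 0.0]),
    "apple": np.array([0.0, 0.0, 1.0]),
}

print(solve_analogy("man", "king", "woman", emb))  # queen
```

Excluding the three query words matters in practice: without it, b itself is often the nearest neighbour of b - a + c.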

Abstract:

Due to the emergence of multilingual support on the Internet, machine translation (MT) technologies have become indispensable for communication between speakers of different languages. Recent research has begun to explore tree-based machine translation systems that use syntactic and morphological information. This work aims to develop a syntax-based machine translation system from English to Malayalam that adds case information during translation. The system identifies general rules for various English sentence patterns. These rules are generated from the part-of-speech (POS) tag information of the texts. Word reordering based on the syntax tree is used to improve translation quality. The system uses a bilingual English-Malayalam dictionary for translation.
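The combination of POS-based sentence patterns, syntax-driven reordering and dictionary lookup can be illustrated with a deliberately tiny Python sketch. It assumes a single hypothetical rule for the subject-verb-object pattern (English is SVO, Malayalam is predominantly SOV): the verb is moved to the end, determiners are dropped, and each word is looked up in a three-entry toy dictionary. The system's actual rules, case handling and dictionary are not described in the abstract, so everything here is an illustrative assumption.

```python
# Hypothetical toy bilingual dictionary (English word -> Malayalam)
DICTIONARY = {"i": "ഞാൻ", "book": "പുസ്തകം", "read": "വായിക്കുന്നു"}


def reorder_svo_to_sov(tagged):
    """Move the first verb to the end (subject, verb, object -> subject, object, verb).

    `tagged` is a list of (word, Penn Treebank POS tag) pairs.
    """
    verb_idx = next((i for i, (_, tag) in enumerate(tagged) if tag.startswith("VB")), None)
    if verb_idx is None:
        return list(tagged)  # pattern not matched: keep the original order
    return tagged[:verb_idx] + tagged[verb_idx + 1:] + [tagged[verb_idx]]


def translate(tagged):
    reordered = reorder_svo_to_sov(tagged)
    # Toy simplification: drop determiners, since Malayalam has no definite article
    words = [word.lower() for word, tag in reordered if tag != "DT"]
    # Word-by-word dictionary lookup; unknown words are passed through unchanged
    return " ".join(DICTIONARY.get(word, word) for word in words)


sentence = [("I", "PRP"), ("read", "VBP"), ("the", "DT"), ("book", "NN")]
print(translate(sentence))  # ഞാൻ പുസ്തകം വായിക്കുന്നു
```

A real system would select among many such rules by matching the full POS pattern or syntax tree, and would also inflect the nouns for case.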

Abstract:

Statistical machine translation (SMT) is one of the potential applications in the field of natural language processing. In SMT, translation rules are acquired automatically from parallel corpora. However, for many language pairs (e.g., Malayalam-English), such corpora are available only in very limited quantities, so a large portion of the phrases encountered at run time will be unknown. This paper focuses on methods for handling out-of-vocabulary (OOV) Malayalam words that cannot be translated into English by conventional phrase-based SMT systems. The OOV words in the source sentence are pre-processed to obtain the root word and its suffix. Different inflected forms of the OOV root are generated, and the phrase translation table of the translation model is searched for matches to these word variants. A vocabulary filter chooses the best among the translations of the word variants on the basis of their unigram counts. A match for the OOV suffix is also looked up in the phrase entries, and the target translations are filtered out. The filtered phrases are then structured, and the SMT translation model is extended by adding each OOV word with its new phrase translations. Manual evaluation shows that the number of OOV words in the input is reduced considerably.
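The OOV pipeline described above (split into root and suffix, generate inflected variants, look them up in the phrase table, pick a candidate by unigram count, attach the suffix translation) can be sketched in Python as follows. All data are hypothetical toy entries: the transliterated forms katha (story), kathayude (of the story) and kathayil (in the story) stand in for real Malayalam input, the unigram counts are invented, and the suffix glosses are given directly in SUFFIX_TABLE rather than being looked up in the phrase table as the paper does.

```python
from collections import Counter

# Hypothetical toy data: transliterated Malayalam suffixes and their English renderings
SUFFIX_TABLE = {"yude": "of the", "yil": "in the", "": ""}
# Hypothetical phrase table: source word -> candidate English translations
PHRASE_TABLE = {
    "katha": ["story", "tale"],
    "kathayude": ["of the story", "of the tale"],
}
# Hypothetical unigram counts from the target-language corpus
TARGET_UNIGRAMS = Counter({"story": 120, "tale": 15})


def split_word(word):
    """Split off the longest known suffix; return (root, suffix)."""
    for suffix in sorted(SUFFIX_TABLE, key=len, reverse=True):
        if suffix and word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""


def translate_oov(word):
    root, suffix = split_word(word)
    candidates = set()
    # Generate inflected variants of the root and look each one up in the phrase table
    for variant_suffix, variant_gloss in SUFFIX_TABLE.items():
        for phrase in PHRASE_TABLE.get(root + variant_suffix, []):
            # Strip the variant's own suffix gloss to recover a translation of the root
            if variant_gloss and phrase.startswith(variant_gloss + " "):
                phrase = phrase[len(variant_gloss) + 1:]
            candidates.add(phrase)
    if not candidates:
        return None  # no variant of the root is known either
    # Vocabulary filter: keep the candidate with the highest target unigram count
    best = max(candidates, key=lambda p: TARGET_UNIGRAMS[p])
    # Attach the translation of the OOV word's own suffix
    return " ".join(part for part in (SUFFIX_TABLE[suffix], best) if part)


print(translate_oov("kathayil"))  # in the story
```

Here kathayil itself is absent from the phrase table, but its variants katha and kathayude are present, which is what lets the root translation be recovered.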

Abstract:

In statistical machine translation (SMT) from English to Malayalam, an unseen English sentence is translated into its Malayalam equivalent using statistical models such as a translation model and a language model, together with a decoder. An English-Malayalam parallel corpus is used in the training phase. Word-to-word alignments must be established between the source- and target-language sentence pairs before they are used for training. This paper deals with techniques that can be adopted to improve the alignment model of SMT. Incorporating part-of-speech information into the bilingual corpus eliminated many insignificant alignments. Identifying the named entities and cognates in the sentence pairs also proved advantageous when establishing the alignments. Moreover, reducing unwanted alignments led to better training results. Experiments conducted on a sample corpus produced reasonably good Malayalam translations, and the results were verified with the F-measure, BLEU and WER evaluation metrics.
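Two of the evaluation measures named above are easy to state precisely. The Python sketch below computes word error rate (WER) as word-level Levenshtein distance divided by reference length, and alignment precision, recall and F-measure over sets of (source index, target index) links. It assumes whitespace tokenization and treats all gold links alike (no sure/possible distinction); the paper's exact evaluation setup is not given in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = edit distance between the first i reference and first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


def alignment_f_measure(predicted, gold):
    """Precision, recall and F1 over sets of (source_index, target_index) links."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# One substitution (sat -> sit) and one deletion (the): 2 edits / 6 reference words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
print(alignment_f_measure({(0, 0), (1, 2), (2, 1)}, {(0, 0), (1, 2), (2, 2)}))
```

Pruning insignificant links, as the paper does, raises alignment precision; whether recall suffers depends on how many of the pruned links were actually correct.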

Abstract:

Machine translation has been a particularly difficult problem in natural language processing for over two decades. Early approaches failed in part because interaction effects among complex phenomena made translation appear unmanageable. Later approaches have succeeded (although only bilingually), but they rely on many language-specific rules of a context-free nature. This report presents an alternative approach to natural language translation that relies on principle-based rather than rule-oriented descriptions of grammar. The model is based on abstract principles developed by Chomsky (1981) and several other researchers working within the "Government and Binding" (GB) framework. The grammar is thus viewed as a modular system of principles rather than a large set of ad hoc language-specific rules.

Abstract:

In this work, we study the use of distributional semantics and machine learning to improve statistical machine translation. To that end, we propose a machine learning model based on logistic regression that dynamically models the translation probability of phrases. We prove that the proposed model is a generalization of the usual translation probabilities of statistical machine translation, and we use it to incorporate contextual and distributional semantic information through lexical features, word clusters and word embeddings. In addition, we explore another approach to incorporating distributional semantic knowledge into statistical machine translation: using bilingual word embeddings to model the similarity of phrase translations. Our experiments show the usefulness of the proposed models, with promising results over a strong baseline system. Our work also makes important contributions regarding bilingual mappings of word embeddings and embedding-based phrase similarity measures, which have value of their own in distributional semantics beyond machine translation.
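One common way to build the bilingual word representations described above is to learn a linear map from the source embedding space into the target space using a seed dictionary, then compare phrases by the cosine of their averaged word vectors. The Python sketch below uses a least-squares map for this. That choice, the averaging, and the toy Basque-English vectors (generated so that the target space is an exact rotation of the source space) are assumptions made for illustration, not the specific mapping or similarity measure proposed in the thesis.

```python
import numpy as np


def learn_mapping(src_vecs, tgt_vecs):
    """Least-squares linear map W such that src_vecs @ W ≈ tgt_vecs (rows are dictionary pairs)."""
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W


def phrase_vector(words, emb):
    """Represent a phrase by the average of its word vectors."""
    return np.mean([emb[w] for w in words], axis=0)


def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def phrase_similarity(src_phrase, tgt_phrase, src_emb, tgt_emb, W):
    """Cosine between the mapped source phrase vector and the target phrase vector."""
    return cosine(phrase_vector(src_phrase, src_emb) @ W, phrase_vector(tgt_phrase, tgt_emb))


# Toy seed dictionary: etxe = house, zuri = white, txakur = dog, handi = big
rng = np.random.default_rng(0)
src_words = ["etxe", "zuri", "txakur", "handi"]
tgt_words = ["house", "white", "dog", "big"]
src_emb = {w: rng.normal(size=3) for w in src_words}
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
tgt_emb = {t: src_emb[s] @ R for s, t in zip(src_words, tgt_words)}

W = learn_mapping(np.array([src_emb[w] for w in src_words]),
                  np.array([tgt_emb[w] for w in tgt_words]))
sim_match = phrase_similarity(["etxe", "zuri"], ["white", "house"], src_emb, tgt_emb, W)
sim_other = phrase_similarity(["etxe", "zuri"], ["big", "dog"], src_emb, tgt_emb, W)
# sim_match is 1.0 up to rounding, because the toy target space is an exact rotation
print(f"{sim_match:.3f} {sim_other:.3f}")
```

With real embeddings the two spaces are only approximately related, so the mapped similarity of a correct phrase pair is high but not 1.0.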

Abstract:

Establishing metrics to assess machine translation (MT) systems automatically is now crucial owing to the widespread use of MT over the web. In this study we show that such evaluation can be done by modeling text as complex networks. Specifically, we extend our previous work by employing additional metrics of complex networks, whose results were used as input for machine learning methods and allowed MT texts of distinct qualities to be distinguished. Also shown is that the node-to-node mapping between source and target texts (English-Portuguese and Spanish-Portuguese pairs) can be improved by adding further hierarchical levels for the metrics out-degree, in-degree, hierarchical common degree, cluster coefficient, inter-ring degree, intra-ring degree and convergence ratio. The results presented here amount to a proof-of-principle that the possible capturing of a wider context with the hierarchical levels may be combined with machine learning methods to yield an approach for assessing the quality of MT systems. (C) 2010 Elsevier B.V. All rights reserved.

Abstract:

This study examines how language teachers in a highly technology-friendly university environment view machine translation and what this implies for students' personal learning environments. It brings an activity-theory perspective to the question, examining how the introduction of new tools can disrupt the relationships between elements of an activity system. This perspective opens up an investigation of how new tools can fundamentally alter traditional learning activities. In questionnaires and group discussions, respondents generally agreed that although students' use of machine translation could be considered cheating, students are bound to use it anyway; they suggested that teachers focus on the skills students need when using machine translation and design assignments and exams to practice and assess these skills. The results of the empirical study are used to reflect on the roles of teachers and students in a context where many of the skills needed to interact in a foreign language can increasingly be outsourced to laptops and smartphones.

Abstract:

In this paper, we propose a data translation model that is potentially a major web service of the next-generation World Wide Web. The technique is somewhat analogous to traditional machine translation, but goes far beyond machine translation as it has been understood, past and present, in scope and content. To illustrate the new concept of web-service-based data translation, we present a multilingual machine translation electronic dictionary system and its web-service-based model, including generic services and multilingual translation services. The proposed data translation model aims to make web services easier to use, more convenient and efficient, and more accurate, scalable, self-learning and self-adapting.

Abstract:

This chapter addresses the use of a supervised machine learning technique to automatically induce Arabic-to-English transfer rules from chunks of parallel aligned linguistic resources. The induced structural transfer rules encode the linguistic translation knowledge needed to convert an Arabic syntactic structure into a target English syntactic structure. These rules are intended to become an integral part of an Arabic-English transfer-based machine translation system. In addition, a novel morphological rule induction method is employed to learn the Arabic morphological rules applied in our Arabic morphological analyzer. To demonstrate the capability of the automated rule induction technique, we conducted rule-based translation experiments using rules induced from a relatively small data set. The hybrid translation experiments achieved good translation quality in terms of WER.
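In the simplest case, a structural transfer rule can be read off an aligned chunk: the source part-of-speech pattern becomes the rule's left-hand side and the target-side order of the aligned words becomes its right-hand side. The Python sketch below induces one such reordering rule from a one-to-one aligned Arabic verb-subject-object chunk and applies it to a new chunk with the same pattern. The tag names, the approximate transliterations and the one-to-one alignment are simplifying assumptions; the chapter's supervised induction method is not detailed in the abstract.

```python
def induce_rule(src_tags, alignment):
    """Induce one reordering rule from an aligned chunk.

    `alignment[i]` is the target position aligned to source position i
    (a one-to-one alignment is assumed). Returns (tag pattern, source order).
    """
    order = sorted(range(len(src_tags)), key=lambda i: alignment[i])
    return tuple(src_tags), tuple(order)


def apply_rules(words, tags, rules):
    """Reorder a chunk with the rule matching its tag pattern, if any."""
    order = rules.get(tuple(tags))
    if order is None:
        return list(words)  # no matching rule: keep source order
    return [words[i] for i in order]


# Training chunk (toy transliteration): kataba al-waladu ad-darsa = "wrote the-boy the-lesson",
# aligned to English "the-boy wrote the-lesson"
tags = ["VERB", "NOUN_NOM", "NOUN_ACC"]
rule = induce_rule(tags, alignment=[1, 0, 2])
rules = dict([rule])

# New chunk with the same pattern: qara'a al-mudarrisu al-kitaba = "read the-teacher the-book"
print(" ".join(apply_rules(["qara'a", "al-mudarrisu", "al-kitaba"], tags, rules)))
# al-mudarrisu qara'a al-kitaba
```

Learning from many chunks would additionally require resolving conflicting orders for the same pattern, for example by keeping the most frequent one.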

Abstract:

This study analyses the impact of using a translation memory (TM) and of post-editing (PE) raw machine translation output on perceived difficulty and on the time needed to produce a high-quality final text. The experiment involved six native Italian-speaking students from the Master's degree in Specialized Translation at the University of Bologna (Vicepresidenza di Forlì). The participants were divided into three pairs, each of which was assigned an excerpt from a press release in English. In each pair, one participant was asked to translate the text into Italian using the TM within SDL Trados Studio 2011, and the other was asked to carry out full PE into Italian of the raw output obtained from Google Translate. Where the TM or the output did not contain (correct) translations, participants could consult the Internet. Using Think-aloud Protocols (TAPs), participants were asked to think aloud while performing the tasks. This made it possible to identify the translation problems encountered and the cases in which the TM and the raw output provided correct solutions, and to observe the translation strategies used; participants were then asked to rate their difficulty in retrospective interviews. The time taken by each participant was also measured. The data on perceived difficulty and on time taken were related to the number of correct solutions provided by the TM and by the raw output, respectively. Using the TM was found to save more time and, unlike PE, to reduce perceived difficulty. The study aims to help future professional translators choose technological tools that allow them to save time and resources.

Abstract:

This paper describes methods and results for annotating two discourse-level phenomena, connectives and pronouns, in a multilingual parallel corpus. Excerpts from Europarl in English and French have been annotated with disambiguation information for connectives and pronouns, covering about 3,600 tokens. The data are then used in several ways: for cross-linguistic studies, for training automatic disambiguation software, and ultimately for training and testing discourse-aware statistical machine translation systems. The paper presents the annotation procedures and their results in detail, and gives an overview of the first systems trained on the annotated resources and of their use for machine translation.