961 results for Statistical Machine Translation


Relevance:

100.00% 100.00%

Publisher:

Abstract:

The present dissertation examines how grammatical aspect and mood are handled by machine translation (MT) systems within the scope of imperative sentences (orders, recommendations) when dealing with the language pair French-Greek (unidirectional, towards Greek). As the grammatical category of aspect is not expressed in the same way in both languages, choosing the correct aspect value when translating a verb from French to Greek can pose problems. We are interested in describing the types of errors that occur and their frequency in a corpus taken from texts pertaining to the security domain and from technical manuals, where imperative sentences are very common. In order to further delimit our research, our sample consists of sentences that comply with the general principles of simplicity and readability provided by several controlled language guidelines and aimed at higher translatability when having MT in mind. In a second phase, this study aims at discovering how modifying some of the control rules would help (or not) the MT systems better decide upon the translation of aspect and mood.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this thesis, we examine certain properties of distributed word representations and propose a technique for enlarging the vocabulary of neural machine translation systems. First, we consider a well-known analogy-solving problem and examine the effect of position-dependent weights, the choice of combination function, and the impact of supervised learning. We then show that simple translation-based distributed representations can match or exceed the state of the art on the TOEFL synonym detection test and on the recent SimLex-999 gold standard. Finally, motivated by impressive results obtained with distributed representations derived from small-vocabulary (30,000 words) neural translation systems, we present a GPU-compatible approach for increasing the vocabulary size by more than an order of magnitude. Although originally developed only to obtain distributed representations, we show that this technique works rather well on translation tasks, in particular from English to French (WMT'14).
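
As a rough illustration of the analogy-solving task mentioned in this abstract, the sketch below uses the common vector-offset formulation (find the word closest to b - a + c by cosine similarity). The additive combination, the function names and the toy three-dimensional vectors are all assumptions made for this example; the abstract does not specify the combination functions or position-dependent weights studied in the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset method.

    The additive combination b - a + c is one possible choice
    (an assumption here, not the method of the thesis).
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded from the candidate answers.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hypothetical toy embeddings, hand-built so that the offset works.
toy_vectors = {
    "man": [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.2, 0.1],
}

answer = solve_analogy(toy_vectors, "man", "woman", "king")
print(answer)  # queen
```

With real embeddings the vectors would come from a trained model; only the scoring logic is shown here.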

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Learning disability (LD) is a general term that describes specific kinds of learning problems. It is a neurological condition that affects a child's brain and impairs the child's ability to carry out one or more specific tasks. Children with learning disabilities are neither slow learners nor intellectually disabled. The disorder can make it difficult for a child to learn as quickly, or in the same way, as a child who is not affected by a learning disability. An affected child can have normal or above-average intelligence. Such children may have difficulty paying attention, reading or recognizing letters, or doing mathematics. This does not mean that children who have learning disabilities are less intelligent; in fact, many of them are more intelligent than the average child. Learning disabilities vary from child to child: one child with LD may not have the same kind of learning problems as another child with LD. There is no cure for learning disabilities, and they are life-long. However, children with LD can be high achievers and can be taught ways to work around the learning disability. In this research work, data mining with machine learning techniques is used to analyze the symptoms of LD, establish the interrelationships between them, and evaluate their relative importance. To increase the diagnostic accuracy of learning disability prediction, a knowledge-based tool built on statistical machine learning or data mining techniques, and drawing on knowledge obtained from clinical information, is proposed. The basic idea of the developed tool is to increase the accuracy of learning disability assessment and reduce the time it takes. Different statistical machine learning techniques in data mining are used in the study.
Identifying the important parameters of LD prediction using data mining techniques, identifying the hidden relationships between the symptoms of LD, and estimating the relative significance of each symptom are also among the objectives of this research work. The developed tool has many advantages over the traditional method of using checklists to determine learning disabilities. To improve the performance of various classifiers, we developed several preprocessing methods for the LD prediction system. A new system based on fuzzy and rough set models was also developed for LD prediction, and the importance of preprocessing is studied here as well. A graphical user interface (GUI) is designed to provide an integrated knowledge-based tool for predicting LD as well as its degree. The tool stores the details of the children in a student database and retrieves their LD reports as and when required. The present study undoubtedly proves the effectiveness of the tool developed on the basis of various machine learning techniques. It also identifies the important parameters of LD and accurately predicts learning disability in school-age children. This thesis makes several major contributions in technical, general and social areas. The results are found to be very beneficial to parents, teachers and institutions, who are able to diagnose a child's problem at an early stage and seek proper treatment or counseling at the right time, so as to avoid academic and social losses.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Due to the emergence of multiple language support on the Internet, machine translation (MT) technologies are indispensable for communication between speakers of different languages. Recent research has started to explore tree-based machine translation systems that use syntactic and morphological information. This work aims to develop a syntax-based machine translation system from English to Malayalam by adding different case information during translation. The system identifies general rules for various sentence patterns in English. These rules are generated using the part-of-speech (POS) tag information of the texts. Word reordering based on the syntax tree is used to improve the translation quality of the system. The system uses a bilingual English-Malayalam dictionary for translation.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A methodology for translating text from English into the Dravidian language Malayalam using statistical models is discussed in this paper. The translator uses a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase and automatically generates the Malayalam translation of an unseen English sentence. Various techniques to improve the alignment model by incorporating morphological inputs into the bilingual corpus are discussed. Removing insignificant alignments from the sentence pairs with this approach has ensured better training results. Pre-processing techniques such as suffix separation in the Malayalam corpus and stop-word elimination in the bilingual corpus also proved effective in producing better alignments. Difficulties in the translation process that arise from the structural differences between English and Malayalam are resolved in the decoding phase by applying order conversion rules. The handcrafted rules designed for the suffix separation process, which can be used as a guideline for implementing suffix separation in Malayalam, are also presented in this paper. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F-measure, BLEU and WER evaluation metrics.
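
Of the evaluation metrics named in this abstract, word error rate (WER) is the simplest to illustrate. The sketch below uses the standard definition (word-level edit distance divided by the reference length); the function name, the whitespace tokenization and the example sentences are assumptions for illustration and are not taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One word of a six-word reference is missing in the hypothesis.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Lower values are better, and a WER above 1.0 is possible when the hypothesis is much longer than the reference.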

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Machine translation has been a particularly difficult problem in the area of Natural Language Processing for over two decades. Early approaches to translation failed since interaction effects of complex phenomena in part made translation appear to be unmanageable. Later approaches to the problem have succeeded (although only bilingually), but are based on many language-specific rules of a context-free nature. This report presents an alternative approach to natural language translation that relies on principle-based descriptions of grammar rather than rule-oriented descriptions. The model that has been constructed is based on abstract principles as developed by Chomsky (1981) and several other researchers working within the "Government and Binding" (GB) framework. Thus, the grammar is viewed as a modular system of principles rather than a large set of ad hoc language-specific rules.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This study examines the question of how language teachers in a highly technology-friendly university environment view machine translation and the implications that this has for the personal learning environments of students. It brings an activity-theory perspective to the question, examining the ways that the introduction of new tools can disrupt the relationship between different elements in an activity system. This perspective opens up for an investigation of the ways that new tools have the potential to fundamentally alter traditional learning activities. In questionnaires and group discussions, respondents showed general agreement that although use of machine translation by students could be considered cheating, students are bound to use it anyway, and suggested that teachers focus on the kinds of skills students would need when using machine translation and design assignments and exams to practice and assess these skills. The results of the empirical study are used to reflect upon questions of what the roles of teachers and students are in a context where many of the skills that a person needs to be able to interact in a foreign language increasingly can be outsourced to laptops and smartphones.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The aim of this study was to analyse the impact of using a translation memory (TM) and of post-editing (PE) raw output on the perceived level of difficulty and on the time needed to produce a high-quality final text. The experiment involved six native Italian-speaking students from the master's degree programme (Laurea Magistrale) in Specialized Translation at the Università di Bologna (Vicepresidenza di Forlì). The participants were divided into three pairs, each of which was assigned an excerpt from a press release in English. In each pair, one participant was asked to translate the text into Italian using the TM within SDL Trados Studio 2011. The other participant was asked to carry out full PE in Italian of the raw output obtained from Google Translate. Where the TM or the output did not contain (correct) translations, participants could consult the Internet. Using Think-aloud Protocols (TAPs), they were asked to think aloud while carrying out the tasks. This made it possible to identify the translation problems encountered and the cases in which the TM and the raw output provided correct solutions; it was also possible to observe the translation strategies employed and then ask participants to rate their difficulty in retrospective interviews. The time taken by each participant was also measured. The data on perceived difficulty and on time taken were related to the number of correct solutions provided by the TM and by the raw output, respectively. Using the TM was observed to save more time and, unlike PE, to reduce the perceived difficulty. This study aims to help future professional translators choose technological tools that allow them to save time and resources.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper describes the UPM system for the translation task at the EMNLP 2011 workshop on statistical machine translation (http://www.statmt.org/wmt11/). The system has been used for both directions: Spanish-English and English-Spanish. It is based on Moses, with two new modules for pre-processing and post-processing the sentences. The main contribution is the proposed method, based on similarity with the source-language test set, for selecting the sentences used to train the models and adjust the weights. With this system, we obtained 23.2 BLEU for Spanish-English and 21.7 BLEU for English-Spanish.
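
The abstract does not define the similarity measure used for sentence selection. As a minimal sketch of the general idea, the code below ranks training pairs by the fraction of their source-side word bigrams that also occur in the test set. The bigram-overlap score, the function names and the toy Spanish-English pairs are all assumptions for illustration, not the UPM method.

```python
def ngrams(tokens, n):
    """Set of word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def select_similar(train_pairs, test_source, n=2, top_k=2):
    """Keep the top_k (source, target) pairs closest to the test source.

    Similarity is the share of a training sentence's source n-grams
    that also appear somewhere in the test set (an assumed measure).
    """
    test_ngrams = set()
    for sentence in test_source:
        test_ngrams |= ngrams(sentence.lower().split(), n)

    def score(pair):
        grams = ngrams(pair[0].lower().split(), n)
        return len(grams & test_ngrams) / len(grams) if grams else 0.0

    # sorted() is stable, so ties keep their original corpus order.
    return sorted(train_pairs, key=score, reverse=True)[:top_k]


# Hypothetical toy corpus.
train_pairs = [
    ("el gato come pescado", "the cat eats fish"),
    ("la bolsa cae hoy", "the stock market falls today"),
    ("el perro come carne", "the dog eats meat"),
]
test_source = ["el gato come carne"]

selected = select_similar(train_pairs, test_source)
for source, target in selected:
    print(source, "|", target)
```

On this toy data the pair about the stock market shares no bigram with the test sentence and is dropped.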

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This work explores the automatic recognition of physical activity intensity patterns from multi-axial accelerometry and heart rate signals. Data collection was carried out in free-living conditions and in three controlled gymnasium circuits, for a total of 179.80 h of data divided into sedentary situations (65.5%), light-to-moderate activity (17.6%) and vigorous exercise (16.9%). The proposed machine learning algorithms comprise the following steps: time-domain feature definition, standardization and PCA projection, unsupervised clustering (by k-means and GMM) and an HMM to account for long-term temporal trends. Performance was evaluated by 30 runs of 10-fold cross-validation. Both the k-means and GMM-based approaches yielded high overall accuracy (86.97% and 85.03%, respectively) and, given the imbalance of the dataset, meritorious F-measures (up to 77.88%) for non-sedentary cases. Classification errors tended to be concentrated around transients, which limits their practical impact. Hence, we consider our proposal to be suitable for 24 h-based monitoring of physical activity in ambulatory scenarios and a first step towards intensity-specific energy expenditure estimators.
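
Two of the steps listed in this abstract, standardization and k-means clustering, can be sketched in a few lines. The code below is a simplified illustration only: it skips the PCA projection, the GMM and the HMM, uses fixed initial centroids, and runs on invented two-feature samples (an acceleration value and a heart rate). None of these choices comes from the paper.

```python
import statistics


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Constant columns get a divisor of 1.0 to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in rows]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, iterations=20):
    """Plain Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: squared_distance(point, centroids[k]))
                  for point in points]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [statistics.fmean(dim) for dim in zip(*members)]
    return labels, centroids


# Hypothetical samples: [acceleration feature, heart rate in bpm].
samples = [[0.1, 62], [0.2, 65], [0.9, 100], [1.0, 105], [1.9, 160], [2.0, 165]]
scaled = standardize(samples)
labels, _ = kmeans(scaled, [scaled[0], scaled[2], scaled[4]])
print(labels)  # [0, 0, 1, 1, 2, 2]
```

Standardizing first keeps the heart-rate column, whose raw values are far larger, from dominating the distance computation.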

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The Semantic Web aims to allow machines to make inferences using the explicit conceptualisations contained in ontologies. By pointing to ontologies, Semantic Web-based applications are able to inter-operate and share common information easily. Nevertheless, multilingual semantic applications are still rare, because most online ontologies are monolingual in English. To solve this issue, techniques for ontology localisation and translation are needed. However, traditional machine translation is difficult to apply to ontologies, because ontology labels tend to be quite short and linguistically different from free text. In this paper, we propose an approach to enhance machine translation of ontologies by exploiting the well-structured concept descriptions contained in the ontology. In particular, our approach leverages the semantics contained in the ontology by using Cross Lingual Explicit Semantic Analysis (CLESA) for context-based disambiguation in phrase-based Statistical Machine Translation (SMT). The presented work is novel in that, to the best of our knowledge, CLESA has not previously been applied in SMT.