993 results for Translation training


Relevance: 20.00%

Abstract:

This thesis summarizes studies of a syntax-based approach to translation between Malayalam, a Dravidian language, and English, and the development of the major modules of a prototype machine translation system from Malayalam to English. The system is a pioneering effort for Malayalam, not attempted by previous researchers, and the computational models chosen for it are the first of their kind for the language. An in-depth study was carried out on the design of the computational models and data structures needed for the different modules of the prototype: a morphological analyzer, a parser, a syntactic structure transfer module and a target-language sentence generator. The list of part-of-speech tags, the chunk tags and the hierarchical dependencies among the chunks required for translation were also produced. The major goals of the development process are (a) accuracy of translation, (b) speed and (c) space. For accuracy, smart tools are to be developed for handling the transfer grammar and translation standards, including equivalent words, expressions, phrases and styles in the target language. The grammar should be optimized to yield a single correct parse and hence a single translated output. For speed, the system requires innovative use of corpus analysis, an efficient parsing algorithm, efficient data structures and run-time frequency-based rearrangement of the grammar, which substantially reduces parsing and generation time. The space requirement also has to be minimised.
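The run-time frequency-based rearrangement of the grammar mentioned in this abstract can be illustrated with a small sketch. This is not the thesis's implementation: the `Rule` and `RuleTable` names, the flat tag-sequence patterns and the "move a rule ahead of less frequently used rules" policy are all assumptions chosen for illustration. The idea is only that rules which fire often drift toward the front of the table, so later lookups test them first.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    pattern: tuple  # hypothetical: sequence of chunk/POS tags the rule matches
    hits: int = 0   # how many times the rule has fired at run time


class RuleTable:
    """Linear rule table that reorders itself by observed usage (assumed policy)."""

    def __init__(self, rules):
        self.rules = list(rules)

    def match(self, tags):
        tags = tuple(tags)
        for i, rule in enumerate(self.rules):
            if rule.pattern == tags:
                rule.hits += 1
                # Move the rule forward past every rule that has fired less often.
                while i > 0 and self.rules[i - 1].hits < rule.hits:
                    self.rules[i - 1], self.rules[i] = self.rules[i], self.rules[i - 1]
                    i -= 1
                return rule
        return None  # no rule matched the tag sequence
```

With this policy a frequently matched sentence pattern is found after fewer comparisons on later inputs, at the cost of a few swaps per match.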

Relevance: 20.00%

Abstract:

With the emergence of multilingual support on the Internet, machine translation (MT) technologies have become indispensable for communication between speakers of different languages. Recent research has begun to explore tree-based machine translation systems that use syntactic and morphological information. This work aims to develop a syntax-based machine translation system from English to Malayalam that adds case information during translation. The system identifies general rules for various English sentence patterns. These rules are generated from the part-of-speech (POS) tags of the text. Word reordering based on the syntax tree is used to improve translation quality. The system uses a bilingual English-Malayalam dictionary for translation.
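As a rough illustration of POS-driven word reordering, the sketch below moves the verb group of an English subject-verb-object sentence to the end, which approximates the verb-final order of Malayalam. It is a deliberate simplification: the abstract describes reordering on a syntax tree, whereas this sketch works on a flat list of tagged tokens. The function name is hypothetical, and the tags are assumed to follow the Penn Treebank convention (`VB*` for verbs, `MD` for modals, `.` for sentence-final punctuation).

```python
def reorder_svo_to_sov(tagged_tokens):
    """Move verbs and modals after all other tokens, keeping final punctuation last.

    tagged_tokens: list of (word, tag) pairs with Penn Treebank style tags (assumed).
    """
    body = list(tagged_tokens)
    tail = []
    if body and body[-1][1] == ".":
        tail = [body.pop()]  # keep the sentence-final punctuation in place

    def is_verbal(tag):
        return tag.startswith("VB") or tag == "MD"

    verbs = [tok for tok in body if is_verbal(tok[1])]
    others = [tok for tok in body if not is_verbal(tok[1])]
    return others + verbs + tail
```

For example, `[("John", "NNP"), ("eats", "VBZ"), ("rice", "NN")]` is reordered so that the verb comes last. A tree-based version would apply the same idea per clause rather than per sentence.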

Relevance: 20.00%

Abstract:

This paper describes the methodology and structural design of a system that translates English into Malayalam using statistical models. A monolingual Malayalam corpus and a bilingual English/Malayalam corpus are the main resources for building this statistical machine translator. The training strategy is enhanced by PoS tagging, which helps to eliminate insignificant alignments. Moreover, incorporating units such as a suffix separator and a stop-word eliminator has proven effective in producing better training results. In the decoder, order conversion rules are applied to reduce the structural difference between the two languages. The quality of the decoder's statistical output is further improved by applying mending rules. Experiments on a sample corpus generated reasonably good Malayalam translations, and the results were verified with the F-measure, BLEU and WER evaluation metrics.
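Of the evaluation metrics named above, word error rate (WER) is the simplest to state: the word-level edit distance between the system output and a reference translation, divided by the number of reference words. The sketch below uses the standard dynamic-programming edit distance; whitespace tokenization and the function name are assumptions, and the paper's exact evaluation setup is not given in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

Lower values are better, and the score can exceed 1.0 when the hypothesis is much longer than the reference.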

Relevance: 20.00%

Abstract:

This paper discusses a methodology for translating text from English into Malayalam, a Dravidian language, using statistical models. The translator uses a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase and automatically generates the Malayalam translation of an unseen English sentence. Various techniques for improving the alignment model by incorporating morphological information into the bilingual corpus are discussed. Removing insignificant alignments from the sentence pairs in this way has produced better training results. Pre-processing techniques such as suffix separation in the Malayalam corpus and stop-word elimination in the bilingual corpus also proved effective in producing better alignments. Difficulties in translation that arise from the structural difference between English and Malayalam are resolved in the decoding phase by applying order conversion rules. The handcrafted rules designed for suffix separation, which can serve as a guideline for implementing suffix separation in Malayalam, are also presented. Experiments on a sample corpus generated reasonably good Malayalam translations, and the results were verified with the F-measure, BLEU and WER evaluation metrics.

Relevance: 20.00%

Abstract:

Statistical Machine Translation (SMT) is one of the potential applications in the field of Natural Language Processing. In SMT, translation rules are acquired automatically from parallel corpora. However, for many language pairs (e.g. Malayalam-English), parallel corpora are available only in very limited quantities, so a large portion of the phrases encountered at run time will be unknown. This paper focuses on methods for handling such out-of-vocabulary (OOV) words in Malayalam that cannot be translated into English by conventional phrase-based SMT systems. The OOV words in the source sentence are pre-processed to obtain the root word and its suffix. Different inflected forms of the OOV root are generated, and these word variants are looked up in the phrase translation table of the translation model. A vocabulary filter chooses the best among the translations of these word variants by their unigram counts. A match for the OOV suffix is also looked up in the phrase entries, and the target translations are filtered out. The filtered phrases are then structured, and the SMT translation model is extended by adding the OOV word with its new phrase translations. Manual evaluation shows that the number of OOV words in the input is reduced considerably.
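The root-and-variant lookup described in this abstract can be sketched roughly as follows. Everything concrete here is an assumption for illustration: the function names, the dictionary-based phrase table, the longest-suffix-first split, and the toy tokens used in the usage note, which are invented placeholders and not real Malayalam words or suffixes. The sketch covers only the variant lookup and the unigram-count filter, not the suffix translation or phrase structuring steps.

```python
def split_root_suffix(word, suffixes):
    """Split off the longest known suffix; return (root, suffix) or (word, "")."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[:-len(suffix)], suffix
    return word, ""


def translate_oov(word, suffixes, phrase_table, unigram_counts):
    """Look up inflected variants of an OOV word's root in the phrase table.

    phrase_table: dict mapping source words to target translations (assumed shape).
    unigram_counts: dict mapping target words to corpus counts (assumed shape).
    Returns the candidate translation with the highest unigram count, or None.
    """
    root, _ = split_root_suffix(word, suffixes)
    variants = [root] + [root + suffix for suffix in suffixes]
    candidates = [phrase_table[v] for v in variants if v in phrase_table]
    if not candidates:
        return None
    return max(candidates, key=lambda target: unigram_counts.get(target, 0))
```

With the placeholder suffixes `["ka", "ki"]`, a phrase table `{"zorki": "house", "zor": "home"}` and counts `{"house": 10, "home": 3}`, the unseen form `"zorka"` is reduced to the root `"zor"`, both known variants are found, and the more frequent target word is chosen.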

Relevance: 20.00%

Abstract:

In statistical machine translation (SMT) from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam sentence using statistical models. A parallel English-Malayalam corpus is used in the training phase. Word-to-word alignments have to be established among the sentence pairs of the source and target languages before they are used for training. This paper deals with techniques for improving the alignment model of SMT. Incorporating part-of-speech information into the bilingual corpus has eliminated many insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved advantageous when setting up the alignments. The presence of Malayalam words with predictable translations has further contributed to reducing insignificant alignments. Moreover, reducing the unwanted alignments has led to better training results. Experiments on a sample corpus generated reasonably good Malayalam translations, and the results were verified with the F-measure, BLEU and WER evaluation metrics.
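One way to picture how part-of-speech information prunes insignificant alignments is to restrict the candidate word pairs of a sentence pair to those whose tags agree. The sketch below assumes both sides are tagged with the same coarse tagset (for example `NOUN`, `VERB`), which is a simplification; the paper's actual tagsets and filtering criteria are not given in the abstract, and the function name and target tokens `t1`, `t2` are placeholders.

```python
def candidate_alignments(src_tagged, tgt_tagged):
    """Return (source_word, target_word) pairs whose coarse POS tags match.

    src_tagged, tgt_tagged: lists of (word, tag) pairs over a shared tagset (assumed).
    """
    return [
        (src_word, tgt_word)
        for src_word, src_tag in src_tagged
        for tgt_word, tgt_tag in tgt_tagged
        if src_tag == tgt_tag
    ]
```

For a two-word sentence pair with one noun and one verb on each side, the unrestricted cross product has four candidate pairs, while the filtered list has two, so the aligner has fewer spurious pairs to weigh during training.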

Relevance: 20.00%

Abstract:

Suffix separation plays a vital role in improving the quality of training in statistical machine translation (SMT) from English into Malayalam. The morphological richness and agglutinative nature of Malayalam make it necessary to retrieve the root word from its inflected form during training. The suffix separation process accomplishes this by examining the Malayalam words and applying sandhi rules. This paper presents various handcrafted rules designed for suffix separation in English-Malayalam SMT. The rules are classified according to the Malayalam syllable that precedes the suffix in the inflected form of the word (the check_letter). Suffixes beginning with vowel sounds, such as ആല, ഉെെ, ഇല etc., are the main focus. By examining the check_letter in a word, the suffix separation rules can be applied directly to extract the root word. The quick look-up table provided in the paper can be used as a guideline for implementing suffix separation in Malayalam.
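The check_letter idea lends itself to a table-driven implementation, sketched below. The rule entries used in the usage note are invented Latin-letter placeholders and are not actual Malayalam sandhi rules; the real rules are in the paper's look-up table, which is not reproduced in this abstract. The function name and the `(check_letter, suffix) -> restored root ending` table layout are likewise assumptions.

```python
def separate_suffix(word, rules):
    """Split an inflected word into (root, suffix) using check_letter rules.

    rules: dict mapping (check_letter, suffix) to the ending that must be
    restored on the root once the suffix is removed (hypothetical layout).
    Returns (word, "") when no rule applies.
    """
    for (check_letter, suffix), root_ending in rules.items():
        tail = check_letter + suffix
        if word.endswith(tail) and len(word) > len(tail):
            # Drop check_letter + suffix, then restore the root's own ending.
            return word[:-len(tail)] + root_ending, suffix
    return word, ""
```

With the placeholder table `{("t", "al"): "tu"}`, the form `"katal"` ends in the check letter `t` followed by the suffix `al`, so the sketch returns the root `"katu"` and the suffix `"al"`. A real implementation would operate on Malayalam syllables rather than single Latin characters.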

Relevance: 20.00%

Abstract:

Control of protein synthesis is a key step in the regulation of gene expression during apoptosis and the heat shock response. Under such conditions, cap-dependent translation is impaired and Internal Ribosome Entry Site (IRES)-dependent translation plays a major role in mammalian cells. Although the role of IRES-dependent translation during apoptosis has been studied mainly in mammals, its role in the translation of Drosophila apoptotic genes has not yet been studied. The observation that Drosophila embryos mutant for the cap-binding protein, the eukaryotic initiation factor eIF4E, exhibit increased apoptosis in correlation with up-regulated transcription of the proapoptotic gene reaper (rpr) constitutes the first evidence for the existence of a cap-independent mechanism for the translation of Drosophila proapoptotic genes. The mechanism of translation of rpr and other proapoptotic genes was investigated in this work. We found that the 5′ UTR of rpr mRNA drives translation in an IRES-dependent manner. It promotes the translation of reporter RNAs in vitro in the absence of cap, in the presence of cap competitors, or in extracts derived from heat-shocked and eIF4E mutant embryos, and in vivo in cells transfected with reporters bearing a non-functional cap structure, indicating that cap recognition is not required for translation of rpr mRNA. We also show that the rpr mRNA 5′ UTR exhibits a high degree of similarity with that of Drosophila heat shock protein 70 mRNA (hsp70), an antagonist of apoptosis, and that both are able to conduct IRES-mediated translation. The proapoptotic genes head involution defective (hid) and grim, but not sickle, also display IRES activity. Studies of mRNA association with polysomes indicate that endogenous rpr, hsp70, hid and grim mRNAs are all recruited to polysomes in embryos in which apoptosis or thermal stress was induced.
We conclude that hsp70 on the one hand, and rpr, hid and grim on the other, which are antagonizing factors during apoptosis, use a similar mechanism for protein synthesis. The outcome for the cell would thus depend on which protein is translated under a given stress condition. Factors involved in the differential translation driven by these IRESs could play an important role. To identify such factors, we undertook the identification of the ribonucleoprotein (RNP) complexes assembled onto the 5′ UTR of rpr mRNA. We established a tobramycin-affinity-selection protocol that allows the purification of specific RNPs that can be further analyzed by mass spectrometry. Several RNA-binding proteins were identified as part of the rpr 5′ UTR RNP complex, some of which have been related to IRES activity. The involvement of one of them, the La antigen, in the translation of rpr mRNA was established by RNA-crosslinking experiments using recombinant protein and the rpr 5′ UTR, and by analysis of the translation efficiency of reporter mRNAs in Drosophila cells after knock-down of endogenous La by RNAi. Several uncharacterized proteins were also identified, suggesting that they might play a role during translation, during the assembly of the translational machinery or in the priming of the mRNA before ribosome recognition. Our data provide evidence for the involvement of the La antigen in the translation of rpr mRNA and establish a protocol for the purification of tagged RNA-protein complexes from cytoplasmic extracts. To further understand the mechanisms of translation initiation in Drosophila, we analyzed the role of eIF4B in cap-dependent and cap-independent translation. We showed that eIF4B is mostly involved in cap-dependent, but not IRES-dependent, translation, as is the case in mammals.

Relevance: 20.00%

Abstract:

Krishi Vigyan Kendras (KVKs, Farm Science Centres) have been established by the Indian Council of Agricultural Research in 569 districts. The thrust areas of KVKs are the refinement and demonstration of technologies and the training of farmers and extension functionaries. Imparting vocational training in agriculture and allied fields to rural youth is one of their mandates. This study was undertaken to carry out a formative and summative (outcome and impact) evaluation of the beekeeping and mushroom-growing vocational training programmes in the Indian state of Punjab. A one-group pre- and post-evaluation design was employed for the formative and outcome evaluation. Knowledge tests were administered to 35 beekeeping and 25 mushroom cultivation trainees before and after the training programmes organized in 2004. The trainees gained significantly in knowledge. A separate sample of 640 trainees, trained prior to 2004, was selected to determine adoption status. Out of these 640, a sample of 200 was selected by proportionate sampling from three categories (non-adopters, discontinued adopters and continued adopters) to evaluate the long-term impact of the training programmes. An ex-post-facto one-shot case study design was applied for this impact analysis. The vocational training programmes resulted in continued adoption of beekeeping and mushroom cultivation enterprises by 20% and 51% of trained farmers, respectively. Age and trainee occupation had a significant influence on the decision to adopt beekeeping, whereas education and family income significantly affected the decision to adopt mushroom cultivation. The continued adopters of beekeeping and mushroom growing increased their family income by 49% and 24%, respectively. These training programmes are augmenting the dwindling farm income of farmers in Indian Punjab.

Relevance: 20.00%

Abstract:

Many students find it difficult to read Latin authors in the original during the reading phase of Latin instruction. This dissertation investigates to what extent the targeted use of learning strategies can improve text comprehension. Strategic work with texts can be traced back to a very early stage in writing-based cultures. In this work, source texts from Greco-Roman antiquity and the Middle Ages are examined for text-comprehension strategies, then systematized, annotated and used in modern Latin teaching. The students work in a self-directed, inquiry-based manner with reproductions of ancient papyri and parchments. In the course of the teaching project, which I call CLAVIS (Latin for "key"), the students become acquainted with six ancient strategies of text comprehension in connection with the subject content of Latin instruction. These strategies are used today just as they were 2000 years ago. Taking into account the findings of modern learning-strategy research, the strategies selected were those judged to be particularly effective measures for promoting text comprehension, namely CONIUGATIO: activating prior knowledge; LEGERE: reading repeatedly and, where possible, aloud; ACCIPERE: accepting help; VERTERE: translating systematically; INTERROGARE: allowing questions; SUMMA: writing a summary. The aim of CLAVIS is to give students a tool for systematic text comprehension that is easy to remember and can be applied flexibly to texts of any kind and in any language. To test the effectiveness of the CLAVIS teaching project, a pretest-posttest study was carried out with two parallel 10th-grade classes at the Johann-Schöner-Gymnasium in Karlstadt in the 2009/10 school year.
One of the classes was taught as the experimental group with an intervention in the form of CLAVIS; the other class, which formed the control group, received no strategy training. A questionnaire provided information on how the students approached the text in the pretest and posttest (each a translation of a Latin text of identical difficulty). The analysis of the data clearly showed that text comprehension and translation ability improved in those students who had applied the CLAVIS strategies in the posttest. In connection with the redesign of curricula against the background of competence orientation, new opportunities arise for Latin as a subject: not only to use valuable testimonies of antiquity for the general, non-utilitarian personal development of students, but also to teach, in a targeted way, strategies that have concrete benefits for the language and text competence indispensable in an information society.

Relevance: 20.00%

Abstract:

The central question of this work is why dicistronic mRNAs, an organization atypical for eukaryotes, exist and how translation of the second open reading frame is initiated. In seven of the nine gene cassettes initially selected, only dicistronic and no monocistronic transcripts are in fact produced. This organization does not always appear to be conserved in the course of evolution; there are indications of an operon-like structure. After transformation with a dicistronic reporter construct and in in vitro translation assays, the two gene cassettes CG31311 and CG33009 show an internal ribosome entry site (IRES) that can initiate translation of the second cistron. Both IRESs can be narrowed down to a region of under 100 nt. The functionality of the two identified IRESs was confirmed in vivo in the male germline of Drosophila, after the presence of cryptic promoters in these regions had been ruled out. The other five gene cassettes, by contrast, show no IRES activity and probably use alternative mechanisms such as leaky scanning or ribosomal shunting to translate the second cistron. Further analyses revealed very complex expression patterns that cannot obviously be reconciled with the mRNA organization described. In the gene cassette CG33009, for example, the first protein is synthesized in the germ cells throughout spermatogenesis, whereas the second, IRES-dependently translated protein appears in the cyst cells surrounding the germ cells and additionally in the elongated spermatids. This additional expression could be due to transport processes or to de novo synthesis. However, cyst-cell-specific expression of a fusion construct did not lead to detection of the fusion protein in the germ cells.
IRES-mediated de novo synthesis in the elongated spermatids is therefore more likely. Loss of this IRES-dependently translated protein in the cyst cells brings spermatogenesis to a halt and thus demonstrates its essential function. The gene cassette CG31311 also shows a remarkable peculiarity in its expression. While large amounts of transcript are present in testis tissue but do not lead to detectable amounts of protein, a differentiated expression pattern for both proteins can be documented in the ommatidia, although the amount of transcript there is below the detection limit. This observation suggests drastic control at the translational level, which in testis tissue could consist, for example, in a delay of translation until after fertilization (paternal mRNA). Initial experiments show that the IRES of CG33009 interacts with RNA-binding proteins, potential ITAFs (IRES trans-acting factors), whose binding is sequence-specific. Further experiments would need to test whether the IRESs identified here interact with the same or with different proteins.

Relevance: 20.00%

Abstract:

Background: The most common application of imputation is to infer genotypes of a high-density panel of markers on animals that are genotyped for a low-density panel. However, the increase in accuracy of genomic predictions resulting from an increase in the number of markers tends to reach a plateau beyond a certain density. Another application of imputation is to increase the size of the training set with un-genotyped animals. This strategy can be particularly successful when a set of closely related individuals is genotyped.

Methods: Imputation on completely un-genotyped dams was performed using known genotypes from the sire of each dam, one offspring and the offspring’s sire. Two methods were applied based on either allele or haplotype frequencies to infer genotypes at ambiguous loci. Results of these methods and of two available software packages were compared. Quality of imputation under different population structures was assessed. The impact of using imputed dams to enlarge training sets on the accuracy of genomic predictions was evaluated for different populations, heritabilities and sizes of training sets.

Results: Imputation accuracy ranged from 0.52 to 0.93 depending on the population structure and the method used. The method that used allele frequencies performed better than the method based on haplotype frequencies. Accuracy of imputation was higher for populations with higher levels of linkage disequilibrium and with larger proportions of markers with more extreme allele frequencies. Inclusion of imputed dams in the training set increased the accuracy of genomic predictions. Gains in accuracy ranged from close to zero to 37.14%, depending on the simulated scenario. Generally, the larger the accuracy already obtained with the genotyped training set, the lower the increase in accuracy achieved by adding imputed dams.

Conclusions: Whenever a reference population resembling the family configuration considered here is available, imputation can be used to achieve an extra increase in accuracy of genomic predictions by enlarging the training set with completely un-genotyped dams. This strategy was shown to be particularly useful for populations with lower levels of linkage disequilibrium, for genomic selection on traits with low heritability, and for species or breeds for which the size of the reference population is limited.
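To make the family configuration concrete, the sketch below infers a dam's genotype at a single biallelic marker from the genotypes of her sire, one offspring and that offspring's sire, using the population allele frequency where Mendelian rules alone leave the locus ambiguous. It is an illustrative single-locus calculation under stated assumptions, not the authors' method: genotypes are coded as 0, 1 or 2 copies of allele B, the dam's unknown maternal allele is assumed to be drawn from the population with frequency `freq_b`, and the function name is hypothetical.

```python
def impute_dam_genotype(grandsire, sire, offspring, freq_b):
    """Return (most probable dam genotype, posterior over genotypes 0/1/2).

    grandsire: genotype of the dam's sire; sire: genotype of the offspring's sire;
    offspring: genotype of the offspring; freq_b: population frequency of allele B.
    """
    for name, g in (("grandsire", grandsire), ("sire", sire), ("offspring", offspring)):
        if g not in (0, 1, 2):
            raise ValueError(f"{name} genotype must be 0, 1 or 2")

    def child_prob(child, p_first, p_second):
        # Probability of a child genotype when the two parental gametes carry
        # allele B with probabilities p_first and p_second, independently.
        if child == 0:
            return (1 - p_first) * (1 - p_second)
        if child == 2:
            return p_first * p_second
        return p_first * (1 - p_second) + (1 - p_first) * p_second

    scores = {}
    for dam in (0, 1, 2):
        # Prior: dam inherits one allele from her sire and one from the population.
        prior = child_prob(dam, grandsire / 2, freq_b)
        # Likelihood: offspring inherits one allele from the dam and one from its sire.
        likelihood = child_prob(offspring, dam / 2, sire / 2)
        scores[dam] = prior * likelihood

    total = sum(scores.values())
    if total == 0:
        raise ValueError("genotypes are inconsistent with Mendelian inheritance")
    posterior = {g: s / total for g, s in scores.items()}
    return max(posterior, key=posterior.get), posterior
```

When the grandsire is homozygous for B and the offspring is homozygous for the other allele, the dam must be heterozygous and the posterior puts all its mass on genotype 1; in ambiguous cases the allele frequency tips the balance, which matches the abstract's remark that markers with more extreme allele frequencies are imputed more accurately.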

Relevance: 20.00%

Abstract:

This dissertation investigates the reading behaviour of Thai learners of German, with the aim of fostering their ability to read critically by applying the MURDER scheme in German-as-a-foreign-language teaching. In addition to reading skills, the learners' critical thinking is also to be fostered, given the connections that exist between the two.

Relevance: 20.00%

Abstract:

Machine translation has been a particularly difficult problem in the area of Natural Language Processing for over two decades. Early approaches to translation failed in part because interaction effects among complex phenomena made translation appear unmanageable. Later approaches to the problem have succeeded (although only bilingually), but are based on many language-specific rules of a context-free nature. This report presents an alternative approach to natural language translation that relies on principle-based descriptions of grammar rather than rule-oriented descriptions. The model that has been constructed is based on abstract principles as developed by Chomsky (1981) and several other researchers working within the "Government and Binding" (GB) framework. Thus, the grammar is viewed as a modular system of principles rather than a large set of ad hoc language-specific rules.