933 results for Malayalam Translation
Abstract:
This paper outlines a methodology for translating text from English into the Dravidian language Malayalam using statistical models. Using a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase, the system automatically generates Malayalam translations of English sentences. The paper also discusses a technique for improving the alignment model by incorporating part-of-speech information into the bilingual corpus; removing insignificant alignments from the sentence pairs in this way yielded better training results. Pre-processing techniques such as suffix separation in the Malayalam corpus and stop-word elimination in the bilingual corpus also proved effective in training. The handcrafted rules designed for suffix separation, which can serve as a guideline for implementing suffix separation in Malayalam, are also presented. The structural difference between English and Malayalam is resolved in the decoder by applying order conversion rules. Experiments on a sample corpus generated reasonably good Malayalam translations, and the results were verified with the F-measure, BLEU and WER evaluation metrics.
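Of the three metrics named in this abstract, WER (word error rate) is the simplest to state: the word-level edit distance between a hypothesis and a reference translation, divided by the reference length. The sketch below is a generic illustration of that definition, not the evaluation script used by the authors; the function name `word_error_rate` and whitespace tokenisation are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words to reach an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

A lower value is better; a hypothesis identical to the reference scores 0.0, and the value can exceed 1.0 when the hypothesis is much longer than the reference.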
Abstract:
Statistical Machine Translation (SMT) is one of the potential applications in the field of Natural Language Processing. In SMT, translation rules are acquired automatically from parallel corpora. However, for many language pairs (e.g. Malayalam-English), parallel corpora are available only in very limited quantities, so a large portion of the phrases encountered at run-time will be unknown. This paper focuses on methods for handling such out-of-vocabulary (OOV) words in Malayalam that cannot be translated to English using conventional phrase-based statistical machine translation systems. The OOV words in the source sentence are pre-processed to obtain the root word and its suffix. Different inflected forms of the OOV root are generated, and the phrase translation table of the translation model is searched for a match with these word variants. A vocabulary filter chooses the best among the translations of these word variants by finding their unigram counts. A match for the OOV suffix is also looked up in the phrase entries, and the target translations are filtered out. The filtered phrases are then structured, and the SMT translation model is extended by adding the OOV word with its new phrase translations. Manual evaluation shows that the number of OOV words in the input is reduced considerably.
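The root-variant lookup described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: the suffix list, the phrase table, and the unigram counts are toy romanized data invented for the example, and the function names `split_root_suffix` and `translate_oov_root` are hypothetical. Variant generation is simplified to plain concatenation, with no sandhi handling.

```python
from typing import Optional

# Hypothetical toy data (romanized, not linguistically exact).
SUFFIXES = ["kal", "il", "ude", "e"]
PHRASE_TABLE = {"veedu": ["house", "home"], "veedukal": ["houses"]}
UNIGRAM_COUNTS = {"house": 120, "home": 80, "houses": 30}


def split_root_suffix(word: str, suffixes: list[str]) -> tuple[str, str]:
    """Strip the longest known suffix; return (root, suffix)."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""


def translate_oov_root(
    word: str,
    phrase_table: dict[str, list[str]],
    unigram_counts: dict[str, int],
    suffixes: list[str],
) -> Optional[str]:
    """Look up inflected variants of the OOV root and keep the translation
    with the highest unigram count (the 'vocabulary filter' step)."""
    root, _ = split_root_suffix(word, suffixes)
    variants = [root] + [root + s for s in suffixes]
    candidates = [t for v in variants for t in phrase_table.get(v, [])]
    if not candidates:
        return None  # still out of vocabulary
    return max(candidates, key=lambda t: unigram_counts.get(t, 0))


best = translate_oov_root("veeduil", PHRASE_TABLE, UNIGRAM_COUNTS, SUFFIXES)
```

The separate lookup for the suffix translation and the final structuring of the phrase are omitted here, because the abstract does not describe them in enough detail to reproduce.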
Abstract:
In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its Malayalam equivalent using statistical components: a translation model, a language model and a decoder. A parallel English-Malayalam corpus is used in the training phase. Word-to-word alignments have to be set up among the sentence pairs of the source and target languages before they are used for training. This paper deals with techniques that can be adopted to improve the alignment model of SMT. Incorporating part-of-speech information into the bilingual corpus eliminated many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs also proved advantageous when setting up the alignments. Moreover, reducing the unwanted alignments led to better training results. Experiments on a sample corpus generated reasonably good Malayalam translations, and the results were verified with the F-measure, BLEU and WER evaluation metrics.
Abstract:
In this paper we describe the methodology and the structural design of a system that translates English into Malayalam using statistical models. A monolingual Malayalam corpus and a bilingual English/Malayalam corpus are the main resources in building this statistical machine translator. The training strategy is enhanced by PoS tagging, which helps to remove insignificant alignments. Moreover, incorporating units such as a suffix separator and a stop-word eliminator proved effective in producing better training results. In the decoder, order conversion rules are applied to reduce the structural difference between the language pair. The quality of the decoder's statistical output is further improved by applying mending rules. Experiments on a sample corpus generated reasonably good Malayalam translations, and the results were verified with the F-measure, BLEU and WER evaluation metrics.
Abstract:
This paper discusses a methodology for translating text from English into the Dravidian language Malayalam using statistical models. The translator uses a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase and automatically generates the Malayalam translation of an unseen English sentence. Various techniques for improving the alignment model by incorporating morphological inputs into the bilingual corpus are discussed; removing insignificant alignments from the sentence pairs in this way yielded better training results. Pre-processing techniques such as suffix separation in the Malayalam corpus and stop-word elimination in the bilingual corpus also proved effective in producing better alignments. Difficulties in the translation process that arise from the structural difference between English and Malayalam are resolved in the decoding phase by applying order conversion rules. The handcrafted rules designed for suffix separation, which can serve as a guideline for implementing suffix separation in Malayalam, are also presented. Experiments on a sample corpus generated reasonably good Malayalam translations, and the results were verified with the F-measure, BLEU and WER evaluation metrics.
Abstract:
In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam sentence using statistical models. A parallel English-Malayalam corpus is used in the training phase. Word-to-word alignments have to be set up among the sentence pairs of the source and target languages before they are used for training. This paper deals with techniques that can be adopted to improve the alignment model of SMT. Incorporating part-of-speech information into the bilingual corpus eliminated many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs also proved advantageous when setting up the alignments. The presence of Malayalam words with predictable translations further helped to reduce the insignificant alignments. Moreover, reducing the unwanted alignments led to better training results. Experiments on a sample corpus generated reasonably good Malayalam translations, and the results were verified with the F-measure, BLEU and WER evaluation metrics.
Abstract:
Suffix separation plays a vital role in improving the quality of training in Statistical Machine Translation from English into Malayalam. The morphological richness and agglutinative nature of Malayalam make it necessary to retrieve the root word from its inflected form during training. The suffix separation process accomplishes this by examining the Malayalam words and applying sandhi rules. This paper presents various handcrafted rules designed for suffix separation in English-Malayalam SMT. The rules are classified according to the Malayalam syllable that precedes the suffix in the inflected form of the word (check_letter). Suffixes beginning with vowel sounds, such as ആൽ, ഉടെ and ഇൽ, are the main focus. By examining the check_letter in a word, the suffix separation rules can be applied directly to extract the root word. The quick look-up table provided in this paper can be used as a guideline for implementing suffix separation in Malayalam.
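The check_letter idea can be sketched as a rule table keyed on the suffix and on the characters immediately before it. The example below is only an illustration under stated assumptions: it works on romanized strings, the two rules are toy entries chosen for the demonstration rather than the paper's look-up table, and the names `RULES` and `separate_suffix` are hypothetical.

```python
# Each rule: (suffix, check_letter, replacement restoring the root ending).
# Toy romanized entries, NOT the rule table from the paper.
RULES = [
    ("il", "th", "m"),   # e.g. "marathil" -> root "maram" + suffix "il"
    ("ude", "y", ""),    # e.g. "kuttiyude" -> root "kutti" + suffix "ude"
]


def separate_suffix(word: str) -> tuple[str, str]:
    """Return (root, suffix); the suffix is empty when no rule applies."""
    for suffix, check_letter, replacement in RULES:
        tail = check_letter + suffix
        if word.endswith(tail) and len(word) > len(tail):
            # Drop check_letter + suffix, then restore the root's own ending.
            return word[: -len(tail)] + replacement, suffix
    return word, ""


pairs = [separate_suffix(w) for w in ("marathil", "kuttiyude", "pusthakam")]
```

A real implementation would operate on Malayalam script, where the check_letter is a syllable rather than one or two Latin characters, and would need a far larger rule set.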
Abstract:
This thesis summarizes the results of studies on a syntax-based approach to translation between Malayalam, one of the Dravidian languages, and English, and on the development of the major modules of a prototype machine translation system from Malayalam to English. The development of the system is a pioneering effort for the Malayalam language, unattempted by previous researchers, and the computational models chosen for the system are the first of their kind for Malayalam. An in-depth study has been carried out on the design of the computational models and data structures needed for the different modules of the prototype system: a morphological analyzer, a parser, a syntactic structure transfer module and a target-language sentence generator. The list of part-of-speech tags, the chunk tags and the hierarchical dependencies among the chunks required for the translation process have also been generated. The major goals of the development process are (a) accuracy of translation, (b) speed and (c) space. For accuracy, smart tools are to be developed for handling transfer grammar and translation standards, including equivalent words, expressions, phrases and styles in the target language. The grammar should be optimized with a view to obtaining a single correct parse and hence a single translated output. For speed, the requirements are innovative use of corpus analysis, an efficient parsing algorithm, efficient data structures and run-time frequency-based rearrangement of the grammar, which substantially reduces parsing and generation time. The space requirement also has to be minimised.
Abstract:
Due to the emergence of multiple-language support on the Internet, machine translation (MT) technologies are indispensable to communication between speakers of different languages. Recent research has started to explore tree-based machine translation systems that use syntactic and morphological information. This work aims to develop syntax-based machine translation from English to Malayalam by adding different case information during translation. The system identifies general rules for various sentence patterns in English. These rules are generated using the part-of-speech (POS) tag information of the texts. Word reordering based on the syntax tree is used to improve the translation quality of the system. The system uses a bilingual English-Malayalam dictionary for translation.
Abstract:
This paper describes an English-Malayalam Cross-Lingual Information Retrieval system. The system retrieves Malayalam documents in response to a query given in English or Malayalam, so monolingual information retrieval is also supported. Malayalam is one of the most prominent regional languages of the Indian subcontinent. It is spoken by more than 37 million people and is the native language of the state of Kerala in India. Since we had neither a full-fledged online bilingual dictionary nor any parallel corpora with which to build a statistical lexicon, we used a bilingual dictionary developed in house for translation. Other language-specific resources developed in house, such as a Malayalam stemmer and a Malayalam morphological root analyzer, were also used in this work.
Abstract:
This paper investigates certain training methods adopted in the Statistical Machine Translator (SMT) from English to Malayalam. In English-Malayalam SMT, word-to-word translations are determined by training on the parallel corpus. Our primary goal is to improve the alignment model by reducing the number of possible alignments across all sentence pairs in the bilingual corpus. Incorporating morphological information into the parallel corpus with the help of a part-of-speech tagger brought about better training results with improved accuracy.
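The effect of part-of-speech information on the number of possible alignments can be illustrated with a small sketch. It assumes, purely for the example, that both sides of a sentence pair are tagged with a shared coarse tagset and that a word pair is kept only when the tags match; the abstract does not state the exact filtering criterion, and the tagged sentence pair and the function name `candidate_links` are hypothetical.

```python
from itertools import product

TaggedSentence = list[tuple[str, str]]  # (word, coarse POS tag)


def candidate_links(
    source: TaggedSentence, target: TaggedSentence, use_pos: bool = True
) -> list[tuple[str, str]]:
    """Enumerate candidate word-to-word links for one sentence pair.

    With use_pos=False every source word may link to every target word;
    with use_pos=True only pairs whose tags agree are kept.
    """
    links = []
    for (src_word, src_tag), (tgt_word, tgt_tag) in product(source, target):
        if not use_pos or src_tag == tgt_tag:
            links.append((src_word, tgt_word))
    return links


# Hypothetical tagged pair (romanized Malayalam, toy tagset).
english = [("boy", "N"), ("reads", "V"), ("book", "N")]
malayalam = [("kutti", "N"), ("pusthakam", "N"), ("vaayikkunnu", "V")]

all_links = candidate_links(english, malayalam, use_pos=False)
pos_links = candidate_links(english, malayalam, use_pos=True)
```

For this toy pair the unconstrained search space holds nine candidate links, while the tag-matched one holds five, which is the kind of reduction the abstract describes before alignment training begins.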
Abstract:
Recent advances in the understanding of the genetic, neurochemical, behavioral and cultural underpinnings of addiction have led to rapid advances in the understanding of addiction as a disease. In fact, advances in basic science and the development of new pharmacological and behavioral therapies associated with them are appearing faster than can be assimilated not only by clinical researchers but practitioners and policy makers as well. Translation of science-based addictions knowledge into improved prevention, assessment and treatment, and communication of these changes to researchers and practitioners are significant challenges to the field. The general aim of this book is to summarize current and potential linkages between advances in addiction science and innovations in clinical practice. Whilst this book is primarily focused on translation, it also encompasses some scientific advances that are relevant to dissemination, and the book is itself a tool for disseminating innovative thinking. The goal is to generate interest in application opportunities from both recent research and theoretical advances.
Abstract:
Arabic satellite television has recently attracted tremendous attention in both the academic and professional worlds, with a special interest in Aljazeera as a curious phenomenon in the Arab region. Having made a household name for itself worldwide with the airing of the Bin Laden tapes, Aljazeera has set out to deliberately change the culture of Arabic journalism, as it has been repeatedly stated by its current General Manager Waddah Khanfar, and to shake up the Arab society by raising awareness to issues never discussed on television before and challenging long-established social and cultural values and norms while promoting, as it claims, Arab issues from a presumably Arab perspective. Working within the meta-frame of democracy, this Qatari-based network station has been received with mixed reactions ranging from complete support to utter rejection in both the west and the Arab world. This research examines the social semiotics of Arabic television and the socio-cultural impact of translation-mediated news in Arabic satellite television, with the aim to carry out a qualitative content analysis, informed by framing theory, critical linguistic analysis, social semiotics and translation theory, within a re-mediation framework which rests on the assumption that a medium “appropriates the techniques, forms and social significance of other media and attempts to rival or refashion them in the name of the real" (Bolter and Grusin, 2000: 66). This is a multilayered research into how translation operates at two different yet interwoven levels: translation proper, that is the rendition of discourse from one language into another at the text level, and translation as a broader process of interpretation of social behaviour that is driven by linguistic and cultural forms of another medium resulting in new social signs generated from source meaning reproduced as target meaning that is bound to be different in many respects. 
The research primarily focuses on the news media, news making and reporting at Arabic satellite television and looks at translation as a reframing process of news stories in terms of content and cultural values. This notion is based on the premise that by its very nature, news reporting is a framing process, which involves a reconstruction of reality into actualities in presenting the news and providing the context for it. In other words, the mediation of perceived reality through a media form, such as television, actually modifies the mind’s ordering and internal representation of the reality that is presented. The research examines the process of reframing, through translation, news already framed or actualized in another language and argues that in submitting framed news reports to the translation process several alterations take place, driven by the linguistic and cultural constraints and shaped by the context in which the content is presented. These alterations, which involve recontextualizations, may be intentional or unintentional, motivated or unmotivated. Generally, they are the product of lack of awareness of the dynamics and intricacies of turning a message from one language form into another. More specifically, they are the result of a synthesis process that consciously or subconsciously conforms to editorial policy and cultural interpretive frameworks. In either case, the original message is reproduced and the news is reframed. For the case study, this research examines news broadcasts by the now world-renowned Arabic satellite television station Aljazeera, and to a lesser extent the Lebanese Broadcasting Corporation (LBC) and Al-Arabiya where access is feasible, for comparison and crosschecking purposes.
As a new phenomenon in the Arab world, Arabic satellite television, especially 24-hour news and current affairs, provides an interesting area worthy of study, not only for its immediate socio-cultural and professional and ethical implications for the Arabic media in particular, but also for news and current affairs production in the western media that rely on foreign language sources and translation mediation for international stories.