7 results for Bilingual Corpus
in Cochin University of Science
Abstract:
This paper outlines a methodology for translating text from English into the Dravidian language Malayalam using statistical models. By using a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase, the machine automatically generates Malayalam translations of English sentences. This paper also discusses a technique to improve the alignment model by incorporating part-of-speech information into the bilingual corpus. Removing the insignificant alignments from the sentence pairs by this approach has ensured better training results. Pre-processing techniques such as suffix separation on the Malayalam corpus and stop word elimination on the bilingual corpus also proved to be effective in training. The handcrafted rules designed for the suffix separation process, which can serve as a guideline for implementing suffix separation in Malayalam, are also presented in this paper. The structural difference between the English-Malayalam pair is resolved in the decoder by applying order conversion rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
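As an illustration of one of the evaluation metrics named above, word error rate (WER) is commonly computed as the word-level edit distance between a reference translation and the system output, divided by the reference length. The sketch below assumes whitespace tokenization; the function name `word_error_rate` is illustrative and is not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

A lower value is better; an output identical to the reference scores 0.0.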
Abstract:
This paper investigates certain training methods adopted in the Statistical Machine Translator (SMT) from English to Malayalam. In English-Malayalam SMT, the word-to-word translation is determined by training on the parallel corpus. Our primary goal is to improve the alignment model by reducing the number of possible alignments of all sentence pairs present in the bilingual corpus. Incorporating morphological information into the parallel corpus with the help of a part-of-speech tagger has brought about better training results with improved accuracy.
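The idea of reducing the number of possible alignments with part-of-speech information can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tag set, the `COMPATIBLE` table, and the romanized Malayalam tokens are all assumptions made for the example.

```python
from itertools import product

# Hypothetical compatibility table: which target-side tags a source-side
# tag is allowed to align with. A real system would use its tagger's tag set.
COMPATIBLE = {
    "NOUN": {"NOUN"},
    "VERB": {"VERB"},
    "ADJ": {"ADJ"},
}


def candidate_pairs(src_tagged, tgt_tagged, compatible=COMPATIBLE):
    """Keep only word pairs whose part-of-speech tags are compatible."""
    pairs = []
    for (src, src_tag), (tgt, tgt_tag) in product(src_tagged, tgt_tagged):
        if tgt_tag in compatible.get(src_tag, set()):
            pairs.append((src, tgt))
    return pairs


# Illustrative sentence pair (romanized target tokens are placeholders).
english = [("boy", "NOUN"), ("reads", "VERB"), ("book", "NOUN")]
malayalam = [("kutti", "NOUN"), ("pusthakam", "NOUN"), ("vaayikkunnu", "VERB")]
pairs = candidate_pairs(english, malayalam)
```

Without the filter this sentence pair has nine candidate word pairs; with it, five remain, so the alignment model has fewer implausible pairs to distribute probability over.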
Abstract:
A methodology for translating text from English into the Dravidian language Malayalam using statistical models is discussed in this paper. The translator uses a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase and automatically generates the Malayalam translation of an unseen English sentence. Various techniques to improve the alignment model by incorporating morphological inputs into the bilingual corpus are discussed. Removing the insignificant alignments from the sentence pairs by this approach has ensured better training results. Pre-processing techniques such as suffix separation on the Malayalam corpus and stop word elimination on the bilingual corpus also proved to be effective in producing better alignments. Difficulties in the translation process that arise from the structural difference between the English-Malayalam pair are resolved in the decoding phase by applying order conversion rules. The handcrafted rules designed for the suffix separation process, which can serve as a guideline for implementing suffix separation in Malayalam, are also presented in this paper. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
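The abstract does not say how the F measure is computed. One common formulation, shown here purely as an assumption, combines bag-of-words precision and recall between the system output and a reference translation; the function name `f_measure` and the whitespace tokenization are illustrative.

```python
from collections import Counter


def f_measure(reference: str, hypothesis: str, beta: float = 1.0) -> float:
    """F score from clipped unigram overlap between reference and hypothesis."""
    ref_counts = Counter(reference.split())
    hyp_counts = Counter(hypothesis.split())

    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((ref_counts & hyp_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```

With `beta=1.0` this is the harmonic mean of precision and recall; unlike WER, it ignores word order.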
Abstract:
In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam sentence using statistical models. A parallel English-Malayalam corpus is used in the training phase. Word-to-word alignments have to be set among the sentence pairs of the source and target language before subjecting them to training. This paper deals with certain techniques that can be adopted for improving the alignment model of SMT. Methods to incorporate part-of-speech information into the bilingual corpus have resulted in eliminating many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved to be advantageous while setting up the alignments. The presence of Malayalam words with predictable translations has also contributed to reducing the insignificant alignments. Moreover, reduction of the unwanted alignments has brought about better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
Abstract:
In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam translation using statistical components such as a translation model, a language model and a decoder. A parallel English-Malayalam corpus is used in the training phase. Word-to-word alignments have to be set up among the sentence pairs of the source and target language before subjecting them to training. This paper deals with the techniques that can be adopted for improving the alignment model of SMT. Incorporating part-of-speech information into the bilingual corpus has eliminated many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved to be advantageous while setting up the alignments. Moreover, reduction of the unwanted alignments has brought about better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
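The way a translation model, a language model and a decoder fit together is usually written in the standard noisy-channel form shown below. The abstract does not state the formula, so the notation here is an assumption: e is the English source sentence, m a candidate Malayalam sentence, and the hat marks the decoder's output.

```latex
\hat{m} \;=\; \arg\max_{m} P(m \mid e)
        \;=\; \arg\max_{m} \underbrace{P(e \mid m)}_{\text{translation model}}
              \,\underbrace{P(m)}_{\text{language model}}
```

In this reading, the bilingual corpus is used to estimate P(e | m) through word alignments, a monolingual target-language corpus is used to estimate P(m), and the decoder carries out the search for the maximizing sentence.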
Abstract:
Dopamine D2 receptors are involved in ethanol self-administration behavior and are also suggested to mediate the onset and offset of ethanol drinking. In the present study, we investigated dopamine (DA) content and dopamine D2 (DA D2) receptors in the hypothalamus and corpus striatum of ethanol treated rats, and aldehyde dehydrogenase (ALDH) activity in the liver and plasma of ethanol treated rats and in in vitro hepatocyte cultures. Hypothalamic and corpus striatal DA content decreased significantly (P < 0.05, P < 0.001 respectively) and the homovanillic acid/dopamine (HVA/DA) ratio increased significantly (P < 0.001) in ethanol treated rats when compared to control. Scatchard analysis of [³H]YM-09151-2 binding to DA D2 receptors in the hypothalamus showed a significant increase (P < 0.001) in Bmax without any change in Kd in ethanol treated rats compared to control. The Kd of DA D2 receptors decreased significantly (P < 0.05) in the corpus striatum of ethanol treated rats when compared to control. DA D2 receptor affinity in the hypothalamus and corpus striatum of control and ethanol treated rats fitted a single-site model with a Hill slope of unity. The in vitro studies on hepatocyte cultures showed that 10⁻⁵ M and 10⁻⁷ M DA can reverse the increased ALDH activity in 10% ethanol treated cells to near control level. Sulpiride, an antagonist of DA D2, reversed the effect of dopamine on 10% ethanol induced ALDH activity in hepatocytes. Our results showed a decreased dopamine concentration with enhanced DA D2 receptors in the hypothalamus and corpus striatum of ethanol treated rats. Also, increased ALDH was observed in the plasma and liver of ethanol treated rats and in in vitro hepatocyte cultures with 10% ethanol, as a compensatory mechanism for increased aldehyde production due to increased dopamine metabolism. A decrease in dopamine concentration in major brain regions is coupled with an increase in ALDH activity in liver and plasma, which contributes to the tendency for alcoholism.
Since the administration of 10⁻⁵ M and 10⁻⁷ M DA can reverse the increased ALDH activity in ethanol treated cells to near control level, this has a therapeutic application in correcting ethanol addiction due to the allergic reaction observed in aldehyde accumulation.
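For readers unfamiliar with the binding parameters reported above, the textbook relations behind a Scatchard analysis are given below. This is the standard form and not necessarily the exact fitting procedure used in the study; B denotes bound ligand, F free ligand, and n_H the Hill slope.

```latex
B \;=\; \frac{B_{\max}\, F^{\,n_H}}{K_d^{\,n_H} + F^{\,n_H}},
\qquad
n_H = 1 \;\Longrightarrow\; \frac{B}{F} \;=\; \frac{B_{\max} - B}{K_d}
```

Plotting B/F against B then gives a straight line with slope −1/K_d and x-intercept B_max, so an increase in B_max with unchanged K_d indicates more binding sites at the same affinity, while a decrease in K_d indicates higher affinity.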
Abstract:
In this paper we describe the methodology and the structural design of a system that translates English into Malayalam using statistical models. A monolingual Malayalam corpus and a bilingual English/Malayalam corpus are the main resources in building this Statistical Machine Translator. The training strategy adopted has been enhanced by PoS tagging, which helps to eliminate the insignificant alignments. Moreover, incorporating units such as a suffix separator and a stop word eliminator has proven to be effective in bringing about better training results. In the decoder, order conversion rules are applied to reduce the structural difference between the language pair. The quality of the statistical outcome of the decoder is further improved by applying mending rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
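The abstract does not list the order conversion rules themselves. As a rough illustration of the kind of structural difference involved, English is typically subject-verb-object while Malayalam is typically subject-object-verb, so a simple rule might move verb chunks to the end of a clause. The role labels and the function name `svo_to_sov` below are hypothetical.

```python
def svo_to_sov(chunks):
    """Move verb chunks to the end of the clause, keeping all other
    chunks in their original relative order.

    `chunks` is a list of (text, role) pairs, where role "V" marks a verb
    chunk; any other label (for example "S" or "O") is left in place.
    """
    verbs = [chunk for chunk in chunks if chunk[1] == "V"]
    others = [chunk for chunk in chunks if chunk[1] != "V"]
    return others + verbs


clause = [("the boy", "S"), ("reads", "V"), ("a book", "O")]
reordered = svo_to_sov(clause)
```

A practical rule set would need to handle subordinate clauses, auxiliaries and modifiers separately; this sketch covers only a single flat clause.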