A Framework of Statistical Machine Translator from English to Malayalam


Author(s): Santhosh Kumar, G; Mary, Priya Sebastian; Sheena Kurian, K
Date(s)

19/07/2014

2010

Abstract

In this paper we describe the methodology and structural design of a system that translates English into Malayalam using statistical models. A monolingual Malayalam corpus and a bilingual English/Malayalam corpus are the main resources used in building this Statistical Machine Translator. The training strategy is enhanced by PoS tagging, which helps eliminate insignificant alignments. Moreover, incorporating units such as a suffix separator and a stop word eliminator has proven effective in improving training results. In the decoder, order conversion rules are applied to reduce the structural difference between the language pair. The quality of the decoder's statistical output is further improved by applying mending rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F measure, BLEU and WER evaluation metrics.
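
As an illustration of one of the evaluation metrics named in the abstract, the following is a minimal sketch of word error rate (WER), assuming the common definition: word-level edit distance between the system output and a reference translation, divided by the number of reference words. The function name and the whitespace tokenization are assumptions made for this example; the record does not describe the authors' actual evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words.

    Hypothetical helper for illustration; tokens are split on whitespace,
    which is an assumption and not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A lower value indicates a closer match; an output identical to the reference scores 0.0. Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference.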

Proceedings of Fourth International Conference on Information Processing, Bangalore, India

Cochin University of Science and Technology

Identifier

http://dyuthi.cusat.ac.in/purl/4138

Language(s)

en

Keywords

#Alignment #English Malayalam Translation #PoS Tagging #Statistical Machine Translation #Suffix Separation

Type

Article