Digital Library

3 results for Mario Bellatin

in Cambridge University Engineering Department Publications Database

Linguistic knowledge in statistical phrase-based word alignment

Relevance: 10.00%

Improving statistical machine translation by classifying and generalizing inflected verb forms

Relevance: 10.00%

Abstract:

This paper introduces a rule-based classification of single-word and compound verbs into a statistical machine translation approach. By substituting verb forms with the lemma of their head verb, the data sparseness problem caused by highly inflected languages can be successfully addressed. In addition, the information from seen verb forms can be used to generate new translations for unseen verb forms. Translation results for an English-to-Spanish task are reported, showing a significant performance improvement.
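
As a rough illustration of the verb-generalization idea in this abstract, the Python sketch below collapses single-word verbs and auxiliary-plus-verb compounds into the lemma of the head verb, keeping each original span so the inflection could be recovered later. The toy `LEXICON`, the `AUXILIARIES` set and the function `classify_verbs` are hypothetical stand-ins for the paper's rules, which the abstract does not spell out, and only the English side is handled.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: inflected English form -> (lemma, morphological tag).
LEXICON: Dict[str, Tuple[str, str]] = {
    "eats": ("eat", "3sg.pres"), "ate": ("eat", "past"),
    "eating": ("eat", "gerund"), "eaten": ("eat", "participle"),
    "speaks": ("speak", "3sg.pres"), "spoke": ("speak", "past"),
    "speaking": ("speak", "gerund"), "spoken": ("speak", "participle"),
}

# Hypothetical auxiliaries that may precede the head verb in a compound form.
AUXILIARIES = {"has", "have", "had", "will", "is", "are", "was", "were", "be", "been"}


def classify_verbs(tokens: List[str]) -> Tuple[List[str], List[Tuple[str, ...]]]:
    """Replace single-word and compound verb forms by the lemma of their head verb.

    Returns the generalized tokens plus the original span of every substituted
    verb form, so the inflection could be restored after translation.
    """
    out: List[str] = []
    spans: List[Tuple[str, ...]] = []
    i = 0
    while i < len(tokens):
        # Skip over a run of auxiliaries to find a candidate head verb.
        j = i
        while j < len(tokens) and tokens[j] in AUXILIARIES:
            j += 1
        if j < len(tokens) and tokens[j] in LEXICON:
            lemma, _tag = LEXICON[tokens[j]]
            out.append(lemma)                     # the whole verb span becomes one lemma
            spans.append(tuple(tokens[i:j + 1]))  # remember the surface form
            i = j + 1
        else:
            out.append(tokens[i])                 # not a known verb form: keep as-is
            i += 1
    return out, spans


if __name__ == "__main__":
    print(classify_verbs("she has been eating apples".split()))
    # (['she', 'eat', 'apples'], [('has', 'been', 'eating')])
```

Because `eats`, `ate` and `has been eating` all map to `eat`, a translation model trained on the generalized text sees one type instead of several, which is the kind of sparseness reduction the abstract describes; regenerating the Spanish inflection for unseen forms is not shown here.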

An ngram-based statistical machine translation decoder

Relevance: 10.00%

Abstract:

In this paper we describe MARIE, an Ngram-based statistical machine translation decoder. It is implemented using a beam search strategy, with distortion (or reordering) capabilities. The underlying translation model is based on an Ngram approach, extended to introduce reordering at the phrase level. The search graph structure is designed to perform very accurate comparisons, which allows for a high level of pruning and improves the decoder's efficiency. We report several techniques for efficiently pruning the search space. The combinatorial explosion of the search space derived from the search graph structure is reduced by limiting the number of reorderings a given translation is allowed to perform, and also the maximum distance a word (or a phrase) is allowed to be reordered. Finally, we report translation accuracy results on three different translation tasks.
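
The two reordering limits described in this abstract can be sketched as constraints inside a toy word-level beam search. This is not MARIE: the sketch swaps the Ngram translation model for a hypothetical word translation table plus a bigram target language model, and the names `beam_decode`, `table`, `lm`, `max_reorderings` and `max_distance` are illustrative assumptions. Each hypothesis records which source positions it covers; an expansion is rejected when its jump exceeds `max_distance` or when it would use more than `max_reorderings` non-monotone jumps, and each stack is pruned to the `beam_size` best hypotheses.

```python
import heapq
from typing import Dict, List, NamedTuple, Optional, Tuple


class Hyp(NamedTuple):
    score: float              # accumulated log-score (higher is better)
    covered: frozenset        # source positions already translated
    last: int                 # last source position translated (-1 at start)
    reorderings: int          # non-monotone jumps used so far
    output: Tuple[str, ...]   # target words produced so far


def beam_decode(source: List[str],
                table: Dict[str, List[Tuple[str, float]]],
                lm: Dict[Tuple[str, str], float],
                beam_size: int = 5,
                max_reorderings: int = 2,
                max_distance: int = 2,
                distortion_weight: float = 0.5,
                lm_unseen: float = -5.0) -> Optional[Tuple[str, ...]]:
    """Toy word-level beam search with a reordering-count and a distance limit."""
    n = len(source)
    beam = [Hyp(0.0, frozenset(), -1, 0, ())]
    for _ in range(n):  # after step k, every hypothesis covers k source words
        candidates = []
        for h in beam:
            prev = h.output[-1] if h.output else "<s>"
            for j in range(n):
                if j in h.covered:
                    continue
                jump = j - (h.last + 1)             # 0 means monotone continuation
                if abs(jump) > max_distance:        # limit 1: reordering distance
                    continue
                used = h.reorderings + int(jump != 0)
                if used > max_reorderings:          # limit 2: number of reorderings
                    continue
                for target, tm_logp in table.get(source[j], []):
                    score = (h.score + tm_logp
                             + lm.get((prev, target), lm_unseen)
                             - distortion_weight * abs(jump))
                    candidates.append(Hyp(score, h.covered | {j}, j, used,
                                          h.output + (target,)))
        # Beam pruning: keep only the best partial hypotheses.
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c.score)
        if not beam:
            return None  # every path violated a limit or hit an unknown word
    return max(beam, key=lambda c: c.score).output


if __name__ == "__main__":
    # Hypothetical toy tables for the Spanish phrase "casa blanca".
    table = {"casa": [("house", -0.1)], "blanca": [("white", -0.1)]}
    lm = {("<s>", "white"): -0.5, ("white", "house"): -0.2,
          ("<s>", "house"): -1.0, ("house", "white"): -3.0}
    print(beam_decode(["casa", "blanca"], table, lm))                     # ('white', 'house')
    print(beam_decode(["casa", "blanca"], table, lm, max_reorderings=0))  # ('house', 'white')
```

In this sketch, swapping two adjacent words costs two non-monotone jumps (skip forward, then come back), so tightening either `max_reorderings` or `max_distance` removes the reordered path and the decoder falls back to the monotone `house white`.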
