6 results for Persian language--Grammar, Comparative--Arabic
in Cambridge University Engineering Department Publications Database
Abstract:
A system of computer assisted grammar construction (CAGC) is presented in this paper. The CAGC system is designed to generate broad-coverage grammars for large natural language corpora by utilizing both an extended inside-outside algorithm and an automatic phrase bracketing (AUTO) system which is designed to provide the extended algorithm with constituent information during learning. This paper demonstrates the capability of the CAGC system to deal with realistic natural language problems and the usefulness of the AUTO system for constraining the inside-outside based grammar re-estimation. Performance results, including coverage, recall and precision, are presented for a grammar constructed for the Wall Street Journal (WSJ) corpus using the Penn Treebank.
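As a rough illustration of how bracketing recall and precision of the kind reported above might be computed, the sketch below compares predicted constituent spans against treebank spans. The span representation as `(start, end)` tuples, the function name `bracket_scores`, and the example spans are assumptions made for illustration; the paper's exact metric definitions are not given in the abstract.

```python
def bracket_scores(predicted, gold):
    """Return (precision, recall) over sets of (start, end) constituent spans.

    Hypothetical helper: spans are unlabeled word-index pairs, and a predicted
    span counts as correct only if it matches a gold span exactly.
    """
    predicted, gold = set(predicted), set(gold)
    matched = len(predicted & gold)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    return precision, recall


# Made-up spans for a five-word sentence (not taken from the WSJ corpus).
gold = [(0, 5), (0, 2), (2, 5), (3, 5)]
predicted = [(0, 5), (0, 2), (1, 5), (3, 5)]

precision, recall = bracket_scores(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Three of the four predicted spans match the gold bracketing here, so both scores come out at 0.75.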
Abstract:
In recent years, the use of morphological decomposition strategies for Arabic Automatic Speech Recognition (ASR) has become increasingly popular. Systems trained on morphologically decomposed data are often used in combination with standard word-based approaches, and they have been found to yield consistent performance improvements. The present article contributes to this ongoing research endeavour by exploring the use of the 'Morphological Analysis and Disambiguation for Arabic' (MADA) tools for this purpose. System integration issues concerning language modelling and dictionary construction, as well as the estimation of pronunciation probabilities, are discussed. In particular, a novel solution for morpheme-to-word conversion is presented which makes use of an N-gram Statistical Machine Translation (SMT) approach. System performance is investigated within a multi-pass adaptation/combination framework. All the systems described in this paper are evaluated on an Arabic large vocabulary speech recognition task which includes both Broadcast News and Broadcast Conversation test data. It is shown that the use of MADA-based systems, in combination with word-based systems, can reduce the Word Error Rate by up to 8.1% relative. © 2012 Elsevier Ltd. All rights reserved.
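A relative Word Error Rate reduction is measured as a fraction of the baseline WER, not as an absolute difference in percentage points. The sketch below shows the arithmetic; the function name and the baseline and combined WER values are hypothetical and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent, given two WERs in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers for illustration only: a word-based baseline at
# 25.0% WER and a combined system at 23.0% WER.
reduction = relative_wer_reduction(25.0, 23.0)
print(f"{reduction:.1f}% relative")
```

With these made-up values, an absolute gain of 2.0 points corresponds to an 8.0% relative reduction.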
Abstract:
This paper extends n-gram graphone model pronunciation generation to use a mixture of such models. This technique is useful when pronunciation data is for a specific variant (or set of variants) of a language, such as for a dialect, and only a small amount of pronunciation dictionary training data for that specific variant is available. The performance of the interpolated n-gram graphone model is evaluated on Arabic phonetic pronunciation generation for words that cannot be handled by the Buckwalter Morphological Analyser. The pronunciations produced are also used to train an Arabic broadcast audio speech recognition system. In both cases the interpolated graphone model leads to improved performance. Copyright © 2011 ISCA.
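One common way to combine n-gram models is linear interpolation, where each component model's conditional probability is weighted and summed. The sketch below assumes that form; the abstract does not specify the paper's exact mixture scheme, and the dictionary-based model layout, the graphone strings, the probabilities, and the weights are all hypothetical.

```python
def interpolate(models, weights, history, unit):
    """Linearly interpolate P(unit | history) across several n-gram models.

    Each model is a nested dict: history tuple -> {unit: probability}.
    Unseen events contribute 0.0 here; a real system would apply smoothing.
    """
    if len(models) != len(weights):
        raise ValueError("need one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(
        w * m.get(history, {}).get(unit, 0.0)
        for w, m in zip(weights, models)
    )


# Hypothetical bigram graphone models ("letter:phone" units), one trained on
# a large general dictionary and one on a small dialect-specific dictionary.
general = {("k:k",): {"t:t": 0.6, "t:T": 0.4}}
dialect = {("k:k",): {"t:t": 0.2, "t:T": 0.8}}

p = interpolate([general, dialect], [0.7, 0.3], ("k:k",), "t:t")
print(f"P(t:t | k:k) = {p:.2f}")
```

The weights let the small variant-specific model shift the estimates while the larger model supplies coverage for events the small one has not seen.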