48 results for Word and image


Relevance: 80.00%

Abstract:

State-of-the-art large vocabulary continuous speech recognition (LVCSR) systems often combine outputs from multiple sub-systems that may even be developed at different sites. Cross-system adaptation, in which model adaptation is performed using the outputs from another sub-system, can be used as an alternative to hypothesis-level combination schemes such as ROVER. Normally cross adaptation is only performed on the acoustic models. However, there are many other levels in LVCSR systems' modelling hierarchy where complementary features may be exploited, for example, the sub-word and the word level, to further improve cross-adaptation-based system combination. It is thus interesting to also cross adapt language models (LMs) to capture these additional useful features. In this paper cross adaptation is applied to three forms of language models: a multi-level LM that models both syllable and word sequences, a word-level neural network LM, and the linear combination of the two. Significant error rate reductions of 4.0-7.1% relative were obtained over ROVER and acoustic-model-only cross adaptation when combining a range of Chinese LVCSR sub-systems used in the 2010 and 2011 DARPA GALE evaluations. © 2012 Elsevier Ltd. All rights reserved.
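The "linear combination of the two" LMs mentioned above is standard linear interpolation of component probabilities. A minimal sketch, assuming a single interpolation weight `lam` (the function name and the example numbers are illustrative, not values from the paper):

```python
# Linear interpolation of two language model probabilities for the same
# word in the same context, e.g. a multi-level LM and a neural network LM.
def interpolate(p_multilevel, p_nnlm, lam=0.5):
    """Return lam * p_multilevel + (1 - lam) * p_nnlm."""
    return lam * p_multilevel + (1.0 - lam) * p_nnlm

# Example: combine two hypothetical next-word probabilities.
p = interpolate(0.02, 0.05, lam=0.6)  # 0.6*0.02 + 0.4*0.05 = 0.032
```

In practice `lam` would be tuned on held-out data, e.g. to minimise perplexity.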

Relevance: 80.00%

Abstract:

In natural languages multiple word sequences can represent the same underlying meaning. Only modelling the observed surface word sequence can result in poor context coverage, for example, when using n-gram language models (LMs). To handle this issue, this paper presents a novel form of language model, the paraphrastic LM. A phrase-level transduction model that is statistically learned from standard text data is used to generate paraphrase variants. LM probabilities are then estimated by maximizing their marginal probability. Significant error rate reductions of 0.5%-0.6% absolute were obtained on a state-of-the-art conversational telephone speech recognition task using a paraphrastic multi-level LM modelling both word and phrase sequences.
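The marginalization step above can be sketched as a weighted sum over paraphrase variants of a sentence, with weights given by the transduction model. This is an illustrative reconstruction, not the paper's implementation; the function name and the numbers are hypothetical:

```python
# Marginal probability of a sentence under a paraphrastic LM:
# sum over paraphrase variants of (LM probability of the variant) *
# (probability the transduction model assigns to that variant).
def paraphrastic_prob(variants):
    """variants: iterable of (p_lm, p_paraphrase) pairs for each
    paraphrase variant of the same underlying sentence."""
    return sum(p_lm * p_para for p_lm, p_para in variants)

# Example: two variants of one sentence, with hypothetical probabilities.
marginal = paraphrastic_prob([(0.010, 0.7), (0.004, 0.3)])
```

Training then adjusts the LM parameters to maximize this marginal over the data, so probability mass is shared across surface realisations of the same meaning.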

Relevance: 80.00%

Abstract:

State-of-the-art speech recognisers are usually based on hidden Markov models (HMMs). They model a hidden symbol sequence with a Markov process, with the observations independent given that sequence. These assumptions yield efficient algorithms, but limit the power of the model. An alternative model that allows a wide range of features, including word- and phone-level features, is a log-linear model. To handle, for example, word-level variable-length features, the original feature vectors must be segmented into words. Thus, decoding must find the optimal combination of segmentation of the utterance into words and word sequence. Features must therefore be extracted for each possible segment of audio. For many types of features, this becomes slow. In this paper, long-span features are derived from the likelihoods of word HMMs. Derivatives of the log-likelihoods, which break the Markov assumption, are appended. Previously, decoding with this model took cubic time in the length of the sequence, and longer for higher-order derivatives. This paper shows how to decode in quadratic time. © 2013 IEEE.
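The joint search over segmentations and word sequences described above can be organised as a dynamic programme over segment end points; scoring every (start, end) pair once gives the quadratic-time behaviour the paper refers to. A minimal sketch under that assumption, with a toy segment scorer (the paper's actual scores come from word HMM log-likelihoods and their derivatives):

```python
# Segment-level Viterbi-style decoding in O(T^2) for T frames:
# best[j] = best score of any segmentation of frames [0, j).
def decode(T, seg_score):
    """T: number of frames; seg_score(i, j): best score over words for
    the segment spanning frames [i, j). Returns the best total score."""
    best = [float("-inf")] * (T + 1)
    best[0] = 0.0
    for j in range(1, T + 1):
        for i in range(j):  # every segment ending at frame j
            s = best[i] + seg_score(i, j)
            if s > best[j]:
                best[j] = s
    return best[T]

# Toy scorer penalising long segments quadratically, so the optimal
# segmentation uses unit-length segments, each costing -1.
score = decode(4, lambda i, j: -(j - i) ** 2)
```

Keeping back-pointers alongside `best` would recover the optimal segmentation and word sequence, not just its score.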