130 resultados para Error Probability
Resumo:
A recent trend in spoken dialogue research is the use of reinforcement learning to train dialogue systems in a simulated environment. Past researchers have shown that the types of errors that are simulated can have a significant effect on simulated dialogue performance. Since modern systems typically receive an N-best list of possible user utterances, it is important to be able to simulate a full N-best list of hypotheses. This paper presents a new method for simulating such errors based on logistic regression, as well as a new method for simulating the structure of N-best lists of semantics and their probabilities, based on the Dirichlet distribution. Off-line evaluations show that the new Dirichlet model results in a much closer match to the receiver operating characteristics (ROC) of the live data. Experiments also show that the logistic model gives confusions that are closer to the type of confusions observed in live situations. The hope is that these new error models will be able to improve the resulting performance of trained dialogue systems. © 2012 IEEE.
Resumo:
We report an empirical study of n-gram posterior probability confidence measures for statistical machine translation (SMT). We first describe an efficient and practical algorithm for rapidly computing n-gram posterior probabilities from large translation word lattices. These probabilities are shown to be a good predictor of whether or not the n-gram is found in human reference translations, motivating their use as a confidence measure for SMT. Comprehensive n-gram precision and word coverage measurements are presented for a variety of different language pairs, domains and conditions. We analyze the effect on reference precision of using single or multiple references, and compare the precision of posteriors computed from k-best lists to those computed over the full evidence space of the lattice. We also demonstrate improved confidence by combining multiple lattices in a multi-source translation framework. © 2012 The Author(s).
Resumo:
In natural languages multiple word sequences can represent the same underlying meaning. Only modelling the observed surface word sequence can result in poor context coverage, for example, when using n-gram language models (LM). To handle this issue, this paper presents a novel form of language model, the paraphrastic LM. A phrase level transduction model that is statistically learned from standard text data is used to generate paraphrase variants. LM probabilities are then estimated by maximizing their marginal probability. Significant error rate reductions of 0.5%-0.6% absolute were obtained on a state-ofthe-art conversational telephone speech recognition task using a paraphrastic multi-level LM modelling both word and phrase sequences.
Resumo:
This paper introduces a novel method for the training of a complementary acoustic model with respect to set of given acoustic models. The method is based upon an extension of the Minimum Phone Error (MPE) criterion and aims at producing a model that makes complementary phone errors to those already trained. The technique is therefore called Complementary Phone Error (CPE) training. The method is evaluated using an Arabic large vocabulary continuous speech recognition task. Reductions in word error rate (WER) after combination with a CPE-trained system were obtained with up to 0.7% absolute for a system trained on 172 hours of acoustic data and up to 0.2% absolute for the final system trained on nearly 2000 hours of Arabic data.
Resumo:
A location- and scale-invariant predictor is constructed which exhibits good probability matching for extreme predictions outside the span of data drawn from a variety of (stationary) general distributions. It is constructed via the three-parameter {\mu, \sigma, \xi} Generalized Pareto Distribution (GPD). The predictor is designed to provide matching probability exactly for the GPD in both the extreme heavy-tailed limit and the extreme bounded-tail limit, whilst giving a good approximation to probability matching at all intermediate values of the tail parameter \xi. The predictor is valid even for small sample sizes N, even as small as N = 3. The main purpose of this paper is to present the somewhat lengthy derivations which draw heavily on the theory of hypergeometric functions, particularly the Lauricella functions. Whilst the construction is inspired by the Bayesian approach to the prediction problem, it considers the case of vague prior information about both parameters and model, and all derivations are undertaken using sampling theory.
Resumo:
This paper studies the random-coding exponent of joint source-channel coding for a scheme where source messages are assigned to disjoint subsets (referred to as classes), and codewords are independently generated according to a distribution that depends on the class index of the source message. For discrete memoryless systems, two optimally chosen classes and product distributions are found to be sufficient to attain the sphere-packing exponent in those cases where it is tight. © 2014 IEEE.