63 results for extinction probability
Abstract:
We report an empirical study of n-gram posterior probability confidence measures for statistical machine translation (SMT). We first describe an efficient and practical algorithm for rapidly computing n-gram posterior probabilities from large translation word lattices. These probabilities are shown to be a good predictor of whether or not the n-gram is found in human reference translations, motivating their use as a confidence measure for SMT. Comprehensive n-gram precision and word coverage measurements are presented for a variety of different language pairs, domains and conditions. We analyze the effect on reference precision of using single or multiple references, and compare the precision of posteriors computed from k-best lists to those computed over the full evidence space of the lattice. We also demonstrate improved confidence by combining multiple lattices in a multi-source translation framework. © 2012 The Author(s).
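As an illustration of the k-best case mentioned in the abstract, the following is a minimal sketch, not the authors' lattice algorithm. It assumes the usual definition of an n-gram posterior as the total posterior mass of the hypotheses that contain the n-gram, and it assumes each hypothesis carries a log score that can be normalized over the list. The names `ngram_posteriors` and `kbest` and the toy scores are hypothetical.

```python
import math
from collections import defaultdict


def ngrams(tokens, n):
    """Return the set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_posteriors(kbest, n):
    """Estimate n-gram posteriors from a k-best list.

    kbest: list of (tokens, log_score) pairs (assumed input format).
    Scores are normalized over the list, so the result approximates the
    posterior over the k-best evidence space only, not the full lattice.
    """
    max_log = max(score for _, score in kbest)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(score - max_log) for _, score in kbest]
    total = sum(weights)

    posteriors = defaultdict(float)
    for (tokens, _), weight in zip(kbest, weights):
        # Each hypothesis contributes its posterior once per distinct n-gram.
        for gram in ngrams(tokens, n):
            posteriors[gram] += weight / total
    return dict(posteriors)


# Toy 3-best list with made-up scores.
kbest = [
    ("the cat sat".split(), math.log(0.5)),
    ("the cat sits".split(), math.log(0.3)),
    ("a cat sat".split(), math.log(0.2)),
]
bigram_post = ngram_posteriors(kbest, 2)
```

In this toy list the bigram ("the", "cat") appears in the two highest-scoring hypotheses and so receives most of the posterior mass, while ("a", "cat") receives only the mass of the last hypothesis. A lattice-based computation would sum over far more hypotheses than a k-best list can enumerate, which is the comparison the abstract describes.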
Abstract:
A location- and scale-invariant predictor is constructed which exhibits good probability matching for extreme predictions outside the span of data drawn from a variety of (stationary) general distributions. It is constructed via the three-parameter {\mu, \sigma, \xi} Generalized Pareto Distribution (GPD). The predictor is designed to provide exact probability matching for the GPD in both the extreme heavy-tailed limit and the extreme bounded-tail limit, whilst giving a good approximation to probability matching at all intermediate values of the tail parameter \xi. The predictor is valid for small sample sizes N, down to N = 3. The main purpose of this paper is to present the somewhat lengthy derivations, which draw heavily on the theory of hypergeometric functions, particularly the Lauricella functions. Whilst the construction is inspired by the Bayesian approach to the prediction problem, it considers the case of vague prior information about both parameters and model, and all derivations are undertaken using sampling theory.
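For reference, the three-parameter GPD named in the abstract is usually written in the following standard form. This is the textbook parameterization, assumed here; the paper itself may use a different but equivalent convention.

```latex
F(x \mid \mu, \sigma, \xi) =
\begin{cases}
1 - \left(1 + \xi \, \dfrac{x - \mu}{\sigma}\right)^{-1/\xi}, & \xi \neq 0, \\[2ex]
1 - \exp\!\left(-\dfrac{x - \mu}{\sigma}\right), & \xi = 0,
\end{cases}
\qquad \sigma > 0, \; x \geq \mu .
```

In this form, \xi > 0 gives a heavy (power-law) tail, while \xi < 0 gives a bounded tail with upper endpoint x = \mu - \sigma/\xi. These are the two regimes whose extreme limits the abstract refers to.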