80 results for Structured documents
Abstract:
Recently there has been interest in structured discriminative models for speech recognition. In these models sentence posteriors are directly modelled, given a set of features extracted from the observation sequence, and hypothesised word sequence. In previous work these discriminative models have been combined with features derived from generative models for noise-robust speech recognition for continuous digits. This paper extends this work to medium to large vocabulary tasks. The form of the score-space extracted using the generative models, and parameter tying of the discriminative model, are both discussed. Update formulae for both conditional maximum likelihood and minimum Bayes' risk training are described. Experimental results are presented on small and medium to large vocabulary noise-corrupted speech recognition tasks: AURORA 2 and 4. © 2011 IEEE.
Abstract:
Structured precision modelling is an important approach to improving the intra-frame correlation modelling of the standard HMM, where Gaussian mixture models with diagonal covariance matrices are used. Previous work has focused on direct structured representation of the precision matrices. In this paper, a new framework is proposed in which the structure of the Cholesky square root of the precision matrix is investigated, referred to as Cholesky Basis Superposition (CBS). Each Cholesky matrix associated with a particular Gaussian distribution is represented as a linear combination of a set of Gaussian-independent basis upper-triangular matrices. Efficient optimization methods are derived for both the combination weights and the basis matrices. Experiments on a Chinese dictation task showed that the proposed approach can significantly outperform direct structured precision modelling with a similar number of parameters, as well as full covariance modelling. © 2011 IEEE.
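The representation described in this abstract can be illustrated with a minimal sketch. It assumes the precision matrix is formed as U^T U, where U is a weighted sum of shared upper-triangular basis matrices; the function name `cbs_precision`, the dimensions, and the random values are hypothetical and are not taken from the paper.

```python
import numpy as np

def cbs_precision(weights, bases):
    """Build one Gaussian's precision matrix from shared upper-triangular bases.

    weights: shape (n_bases,), the per-Gaussian combination weights (assumed)
    bases:   shape (n_bases, dim, dim), upper-triangular matrices shared by all Gaussians
    """
    # Cholesky factor as a linear combination of the basis matrices
    U = np.tensordot(weights, bases, axes=1)
    # U^T U is symmetric and positive semi-definite by construction
    return U.T @ U

# Hypothetical toy sizes and random values, for illustration only
rng = np.random.default_rng(0)
dim, n_bases = 3, 4
bases = np.triu(rng.standard_normal((n_bases, dim, dim)))
weights = rng.standard_normal(n_bases)
precision = cbs_precision(weights, bases)
```

Because each Gaussian stores only `n_bases` weights rather than a full matrix, the parameter count per Gaussian stays small while the resulting precision matrix is not restricted to be diagonal.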
Abstract:
We present a novel, implementation-friendly and occlusion-aware semi-supervised video segmentation algorithm using tree-structured graphical models, which delivers pixel labels along with their uncertainty estimates. Our motivation to employ supervision is to tackle a task-specific segmentation problem where the semantic objects are pre-defined by the user. The video model we propose for this problem is based on a tree-structured approximation of a patch-based undirected mixture model, which includes a novel time-series and a soft-label Random Forest classifier participating in a feedback mechanism. We demonstrate the efficacy of our model in cutting out foreground objects and multi-class segmentation problems in lengthy and complex road scene sequences. Our results have wide applicability, including harvesting labelled video data for training discriminative models, shape/pose/articulation learning and large-scale statistical analysis to develop priors for video segmentation. © 2011 IEEE.
The structured development of simulation-based learning tools with an example for the Taguchi method
Abstract:
This paper describes a structured SVM framework suitable for noise-robust medium/large vocabulary speech recognition. Several theoretical and practical extensions to previous work on small vocabulary tasks are detailed. The joint feature space based on word models is extended to allow context-dependent triphone models to be used. Interpreting the structured SVM as a large margin log-linear model shows that there is an implicit assumption that the prior of the discriminative parameter is a zero-mean Gaussian. However, depending on the definition of the likelihood feature space, a non-zero prior mean may be more appropriate. A general Gaussian prior is incorporated into the large margin training criterion in a form that allows the cutting plane algorithm to be directly applied. To further speed up the training process, a 1-slack algorithm, caching of competing hypotheses and parallelization strategies are also proposed. The performance of structured SVMs is evaluated on a noise-corrupted medium vocabulary speech recognition task: AURORA 4. © 2011 IEEE.
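As a hedged illustration of the point about the prior, a large margin criterion with a Gaussian prior of mean \mu on the discriminative parameter \alpha could take the following form. The notation is assumed for this sketch and is not taken from the paper: \phi is the joint feature vector, O_r and w_r are the r-th observation sequence and its reference transcription, \mathcal{L} is a loss between word sequences, and C is a regularisation constant.

```latex
\min_{\alpha}\;
\frac{1}{2}\,\lVert \alpha - \mu \rVert_2^2
+ \frac{C}{R} \sum_{r=1}^{R}
\Big[ \max_{w \neq w_r}
  \big( \mathcal{L}(w, w_r) + \alpha^{\top} \phi(O_r, w) \big)
  - \alpha^{\top} \phi(O_r, w_r) \Big]_{+}
```

Setting \mu = 0 gives the usual structured SVM regulariser \tfrac{1}{2}\lVert \alpha \rVert_2^2, which corresponds to the implicit zero-mean Gaussian prior mentioned in the abstract.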