28 resultados para seminar-based training
Resumo:
The use of hidden Markov models is placed in a connectionist framework, and an alternative approach to improving their ability to discriminate between classes is described. Using a network style of training, a measure of discrimination based on the a posteriori probability of state occupation is proposed, and the theory for its optimization using error back-propagation and gradient ascent is presented. The method is shown to be numerically well behaved, and results are presented which demonstrate that when using a simple threshold test on the probability of state occupation, the proposed optimization scheme leads to improved recognition performance.
Resumo:
This Chapter presents a vision-based system for touch-free interaction with a display at a distance. A single camera is fixed on top of the screen and is pointing towards the user. An attention mechanism allows the user to start the interaction and control a screen pointer by moving their hand in a fist pose directed at the camera. On-screen items can be chosen by a selection mechanism. Current sample applications include browsing video collections as well as viewing a gallery of 3D objects, which the user can rotate with their hand motion. We have included an up-to-date review of hand tracking methods, and comment on the merits and shortcomings of previous approaches. The proposed tracker uses multiple cues, appearance, color, and motion, for robustness. As the space of possible observation models is generally too large for exhaustive online search, we select models that are suitable for the particular tracking task at hand. During a training stage, various off-the-shelf trackers are evaluated. From this data differentmethods of fusing them online are investigated, including parallel and cascaded tracker evaluation. For the case of fist tracking, combining a small number of observers in a cascade results in an efficient algorithm that is used in our gesture interface. The system has been on public display at conferences where over a hundred users have engaged with it. © 2010 Springer-Verlag Berlin Heidelberg.
Resumo:
Model based compensation schemes are a powerful approach for noise robust speech recognition. Recently there have been a number of investigations into adaptive training, and estimating the noise models used for model adaptation. This paper examines the use of EM-based schemes for both canonical models and noise estimation, including discriminative adaptive training. One issue that arises when estimating the noise model is a mismatch between the noise estimation approximation and final model compensation scheme. This paper proposes FA-style compensation where this mismatch is eliminated, though at the expense of a sensitivity to the initial noise estimates. EM-based discriminative adaptive training is evaluated on in-car and Aurora4 tasks. FA-style compensation is then evaluated in an incremental mode on the in-car task. © 2011 IEEE.
Resumo:
Hidden Markov model (HMM)-based speech synthesis systems possess several advantages over concatenative synthesis systems. One such advantage is the relative ease with which HMM-based systems are adapted to speakers not present in the training dataset. Speaker adaptation methods used in the field of HMM-based automatic speech recognition (ASR) are adopted for this task. In the case of unsupervised speaker adaptation, previous work has used a supplementary set of acoustic models to estimate the transcription of the adaptation data. This paper first presents an approach to the unsupervised speaker adaptation task for HMM-based speech synthesis models which avoids the need for such supplementary acoustic models. This is achieved by defining a mapping between HMM-based synthesis models and ASR-style models, via a two-pass decision tree construction process. Second, it is shown that this mapping also enables unsupervised adaptation of HMM-based speech synthesis models without the need to perform linguistic analysis of the estimated transcription of the adaptation data. Third, this paper demonstrates how this technique lends itself to the task of unsupervised cross-lingual adaptation of HMM-based speech synthesis models, and explains the advantages of such an approach. Finally, listener evaluations reveal that the proposed unsupervised adaptation methods deliver performance approaching that of supervised adaptation.
Resumo:
An increasingly common scenario in building speech synthesis and recognition systems is training on inhomogeneous data. This paper proposes a new framework for estimating hidden Markov models on data containing both multiple speakers and multiple languages. The proposed framework, speaker and language factorization, attempts to factorize speaker-/language-specific characteristics in the data and then model them using separate transforms. Language-specific factors in the data are represented by transforms based on cluster mean interpolation with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum-likelihood linear regression. Experimental results on statistical parametric speech synthesis show that the proposed framework enables data from multiple speakers in different languages to be used to: train a synthesis system; synthesize speech in a language using speaker characteristics estimated in a different language; and adapt to a new language. © 2012 IEEE.
Resumo:
The optimization of dialogue policies using reinforcement learning (RL) is now an accepted part of the state of the art in spoken dialogue systems (SDS). Yet, it is still the case that the commonly used training algorithms for SDS require a large number of dialogues and hence most systems still rely on artificial data generated by a user simulator. Optimization is therefore performed off-line before releasing the system to real users. Gaussian Processes (GP) for RL have recently been applied to dialogue systems. One advantage of GP is that they compute an explicit measure of uncertainty in the value function estimates computed during learning. In this paper, a class of novel learning strategies is described which use uncertainty to control exploration on-line. Comparisons between several exploration schemes show that significant improvements to learning speed can be obtained and that rapid and safe online optimisation is possible, even on a complex task. Copyright © 2011 ISCA.
Resumo:
A recent trend in spoken dialogue research is the use of reinforcement learning to train dialogue systems in a simulated environment. Past researchers have shown that the types of errors that are simulated can have a significant effect on simulated dialogue performance. Since modern systems typically receive an N-best list of possible user utterances, it is important to be able to simulate a full N-best list of hypotheses. This paper presents a new method for simulating such errors based on logistic regression, as well as a new method for simulating the structure of N-best lists of semantics and their probabilities, based on the Dirichlet distribution. Off-line evaluations show that the new Dirichlet model results in a much closer match to the receiver operating characteristics (ROC) of the live data. Experiments also show that the logistic model gives confusions that are closer to the type of confusions observed in live situations. The hope is that these new error models will be able to improve the resulting performance of trained dialogue systems. © 2012 IEEE.
Resumo:
Statistical approaches for building non-rigid deformable models, such as the Active Appearance Model (AAM), have enjoyed great popularity in recent years, but typically require tedious manual annotation of training images. In this paper, a learning based approach for the automatic annotation of visually deformable objects from a single annotated frontal image is presented and demonstrated on the example of automatically annotating face images that can be used for building AAMs for fitting and tracking. This approach employs the idea of initially learning the correspondences between landmarks in a frontal image and a set of training images with a face in arbitrary poses. Using this learner, virtual images of unseen faces at any arbitrary pose for which the learner was trained can be reconstructed by predicting the new landmark locations and warping the texture from the frontal image. View-based AAMs are then built from the virtual images and used for automatically annotating unseen images, including images of different facial expressions, at any random pose within the maximum range spanned by the virtually reconstructed images. The approach is experimentally validated by automatically annotating face images from three different databases. © 2009 IEEE.
Resumo:
This paper introduces a novel method for the training of a complementary acoustic model with respect to set of given acoustic models. The method is based upon an extension of the Minimum Phone Error (MPE) criterion and aims at producing a model that makes complementary phone errors to those already trained. The technique is therefore called Complementary Phone Error (CPE) training. The method is evaluated using an Arabic large vocabulary continuous speech recognition task. Reductions in word error rate (WER) after combination with a CPE-trained system were obtained with up to 0.7% absolute for a system trained on 172 hours of acoustic data and up to 0.2% absolute for the final system trained on nearly 2000 hours of Arabic data.
Resumo:
Confronted with high variety and low volume market demands, many companies, especially the Japanese electronics manufacturing companies, have reconfigured their conveyor assembly lines and adopted seru production systems. Seru production system is a new type of work-cell-based manufacturing system. A lot of successful practices and experience show that seru production system can gain considerable flexibility of job shop and high efficiency of conveyor assembly line. In implementing seru production, the multi-skilled worker is the most important precondition, and some issues about multi-skilled workers are central and foremost. In this paper, we investigate the training and assignment problem of workers when a conveyor assembly line is entirely reconfigured into several serus. We formulate a mathematical model with double objectives which aim to minimize the total training cost and to balance the total processing times among multi-skilled workers in each seru. To obtain the satisfied task-to-worker training plan and worker-to-seru assignment plan, a three-stage heuristic algorithm with nine steps is developed to solve this mathematical model. Then, several computational cases are taken and computed by MATLAB programming. The computation and analysis results validate the performances of the proposed mathematical model and heuristic algorithm. © 2013 Springer-Verlag London.
Resumo:
An accurate description of atomic interactions, such as that provided by first principles quantum mechanics, is fundamental to realistic prediction of the properties that govern plasticity, fracture or crack propagation in metals. However, the computational complexity associated with modern schemes explicitly based on quantum mechanics limits their applications to systems of a few hundreds of atoms at most. This thesis investigates the application of the Gaussian Approximation Potential (GAP) scheme to atomistic modelling of tungsten - a bcc transition metal which exhibits a brittle-to-ductile transition and whose plasticity behaviour is controlled by the properties of $\frac{1}{2} \langle 111 \rangle$ screw dislocations. We apply Gaussian process regression to interpolate the quantum-mechanical (QM) potential energy surface from a set of points in atomic configuration space. Our training data is based on QM information that is computed directly using density functional theory (DFT). To perform the fitting, we represent atomic environments using a set of rotationally, permutationally and reflection invariant parameters which act as the independent variables in our equations of non-parametric, non-linear regression. We develop a protocol for generating GAP models capable of describing lattice defects in metals by building a series of interatomic potentials for tungsten. We then demonstrate that a GAP potential based on a Smooth Overlap of Atomic Positions (SOAP) covariance function provides a description of the $\frac{1}{2} \langle 111 \rangle$ screw dislocation that is in agreement with the DFT model. We use this potential to simulate the mobility of $\frac{1}{2} \langle 111 \rangle$ screw dislocations by computing the Peierls barrier and model dislocation-vacancy interactions to QM accuracy in a system containing more than 100,000 atoms.