11 results for alternance training system
in Cambridge University Engineering Department Publications Database
Abstract:
This paper describes the development of the 2003 CU-HTK large vocabulary speech recognition system for Conversational Telephone Speech (CTS). The system was designed based on a multi-pass, multi-branch structure where the output of all branches is combined using system combination. A number of advanced modelling techniques such as Speaker Adaptive Training, Heteroscedastic Linear Discriminant Analysis, Minimum Phone Error estimation and specially constructed Single Pronunciation dictionaries were employed. The effectiveness of each of these techniques and their potential contribution to the result of system combination was evaluated in the framework of a state-of-the-art LVCSR system with sophisticated adaptation. The final 2003 CU-HTK CTS system constructed from some of these models is described and its performance on the DARPA/NIST 2003 Rich Transcription (RT-03) evaluation test set is discussed.
Abstract:
A significant cost in obtaining acoustic training data is the generation of accurate transcriptions. For some sources closed-caption data is available, which allows the use of lightly-supervised training techniques. However, for some sources and languages closed-caption data is not available. In these cases unsupervised training techniques must be used. This paper examines the use of unsupervised techniques for discriminative training. In unsupervised training, automatic transcriptions from a recognition system are used for training. As these transcriptions may contain errors, data selection may be useful. Two forms of selection are described: one to remove non-target language shows, the other to remove segments with low confidence. Experiments were carried out on a Mandarin transcription task. Two types of test data were considered: Broadcast News (BN) and Broadcast Conversations (BC). Results show that the gains from unsupervised discriminative training are highly dependent on the accuracy of the automatic transcriptions. © 2007 IEEE.
Abstract:
This paper discusses the development of the CU-HTK Mandarin Broadcast News (BN) transcription system. The Mandarin BN task includes a significant amount of English data. Hence techniques have been investigated to allow the same system to handle both Mandarin and English by augmenting the Mandarin training sets with English acoustic and language model training data. A range of acoustic models were built including models based on Gaussianised features, speaker adaptive training and feature-space MPE. A multi-branch system architecture is described in which multiple acoustic model types, alternate phone sets and segmentations can be used in a system combination framework to generate the final output. The final system shows state-of-the-art performance over a range of test sets. ©2006 British Crown Copyright.
Abstract:
This paper reports our experiences with a phoneme recognition system for the TIMIT database which uses multiple-mixture continuous density monophone HMMs trained using MMI. A comprehensive set of results is presented comparing the ML and MMI training criteria for both diagonal and full covariance models. These results, obtained using simple monophone HMMs, show clear performance gains from MMI training and are comparable to the best reported by others, including those which use context-dependent models. In addition, the paper discusses a number of performance and implementation issues which are crucial to successful MMI training.
Abstract:
The optimization of dialogue policies using reinforcement learning (RL) is now an accepted part of the state of the art in spoken dialogue systems (SDS). Yet, it is still the case that the commonly used training algorithms for SDS require a large number of dialogues and hence most systems still rely on artificial data generated by a user simulator. Optimization is therefore performed off-line before releasing the system to real users. Gaussian Processes (GPs) for RL have recently been applied to dialogue systems. One advantage of GPs is that they compute an explicit measure of uncertainty in the value function estimates computed during learning. In this paper, a class of novel learning strategies is described which use uncertainty to control exploration on-line. Comparisons between several exploration schemes show that significant improvements to learning speed can be obtained and that rapid and safe online optimisation is possible, even on a complex task. Copyright © 2011 ISCA.
Abstract:
This paper introduces a novel method for the training of a complementary acoustic model with respect to a set of given acoustic models. The method is based upon an extension of the Minimum Phone Error (MPE) criterion and aims at producing a model that makes complementary phone errors to those already trained. The technique is therefore called Complementary Phone Error (CPE) training. The method is evaluated using an Arabic large vocabulary continuous speech recognition task. After combination with a CPE-trained system, reductions in word error rate (WER) of up to 0.7% absolute were obtained for a system trained on 172 hours of acoustic data, and of up to 0.2% absolute for the final system trained on nearly 2000 hours of Arabic data.
Abstract:
Confronted with high-variety, low-volume market demands, many companies, especially Japanese electronics manufacturers, have reconfigured their conveyor assembly lines and adopted seru production systems. The seru production system is a new type of work-cell-based manufacturing system. Many successful practices and experiences show that a seru production system can combine the flexibility of a job shop with the high efficiency of a conveyor assembly line. In implementing seru production, the multi-skilled worker is the most important precondition, and issues concerning multi-skilled workers are central. In this paper, we investigate the training and assignment problem of workers when a conveyor assembly line is entirely reconfigured into several serus. We formulate a mathematical model with two objectives: to minimize the total training cost and to balance the total processing times among multi-skilled workers in each seru. To obtain a satisfactory task-to-worker training plan and worker-to-seru assignment plan, a three-stage heuristic algorithm with nine steps is developed to solve this mathematical model. Several computational cases are then solved using MATLAB. The computation and analysis results validate the performance of the proposed mathematical model and heuristic algorithm. © 2013 Springer-Verlag London.
Abstract:
Adaptation to speaker and environment changes is an essential part of current automatic speech recognition (ASR) systems. In recent years the use of multi-layer perceptrons (MLPs) has become increasingly common in ASR systems. A standard approach to handling speaker differences when using MLPs is to apply a global speaker-specific constrained MLLR (CMLLR) transform to the features prior to training or using the MLP. This paper considers the situation when there are both speaker and channel (communication link) differences in the data. A more powerful transform, front-end CMLLR (FE-CMLLR), is applied to the inputs to the MLP to represent the channel differences. Though global, these FE-CMLLR transforms vary from time-instance to time-instance. Experiments on a channel-distorted dialect Arabic conversational speech recognition task indicate the usefulness of adapting MLP features using both CMLLR and FE-CMLLR transforms. © 2013 IEEE.