897 resultados para Telephone, Automatic


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Mode of access: Internet.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Adjust image to portrait layout

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modeling and model training, language and pronunciation modeling are presented. These include the use of conversation side based cepstral normalization, vocal tract length normalization, heteroscedastic linear discriminant analysis for feature projection, minimum phone error training and speaker adaptive training, lattice-based model adaptation, confusion network based decoding and confidence score estimation, pronunciation selection, language model interpolation, and class based language models. The transcription system developed for participation in the 2002 NIST Rich Transcription evaluations of English conversational telephone speech data is presented in detail. In this evaluation the CU-HTK system gave an overall word error rate of 23.9%, which was the best performance by a statistically significant margin. Further details on the derivation of faster systems with moderate performance degradation are discussed in the context of the 2002 CU-HTK 10 × RT conversational speech transcription system. © 2005 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Automatic recognition of people is an active field of research with important forensic and security applications. In these applications, it is not always possible for the subject to be in close proximity to the system. Voice represents a human behavioural trait which can be used to recognise people in such situations. Automatic Speaker Verification (ASV) is the process of verifying a persons identity through the analysis of their speech and enables recognition of a subject at a distance over a telephone channel { wired or wireless. A significant amount of research has focussed on the application of Gaussian mixture model (GMM) techniques to speaker verification systems providing state-of-the-art performance. GMM's are a type of generative classifier trained to model the probability distribution of the features used to represent a speaker. Recently introduced to the field of ASV research is the support vector machine (SVM). An SVM is a discriminative classifier requiring examples from both positive and negative classes to train a speaker model. The SVM is based on margin maximisation whereby a hyperplane attempts to separate classes in a high dimensional space. SVMs applied to the task of speaker verification have shown high potential, particularly when used to complement current GMM-based techniques in hybrid systems. This work aims to improve the performance of ASV systems using novel and innovative SVM-based techniques. Research was divided into three main themes: session variability compensation for SVMs; unsupervised model adaptation; and impostor dataset selection. The first theme investigated the differences between the GMM and SVM domains for the modelling of session variability | an aspect crucial for robust speaker verification. Techniques developed to improve the robustness of GMMbased classification were shown to bring about similar benefits to discriminative SVM classification through their integration in the hybrid GMM mean supervector SVM classifier. Further, the domains for the modelling of session variation were contrasted to find a number of common factors, however, the SVM-domain consistently provided marginally better session variation compensation. Minimal complementary information was found between the techniques due to the similarities in how they achieved their objectives. The second theme saw the proposal of a novel model for the purpose of session variation compensation in ASV systems. Continuous progressive model adaptation attempts to improve speaker models by retraining them after exploiting all encountered test utterances during normal use of the system. The introduction of the weight-based factor analysis model provided significant performance improvements of over 60% in an unsupervised scenario. SVM-based classification was then integrated into the progressive system providing further benefits in performance over the GMM counterpart. Analysis demonstrated that SVMs also hold several beneficial characteristics to the task of unsupervised model adaptation prompting further research in the area. In pursuing the final theme, an innovative background dataset selection technique was developed. This technique selects the most appropriate subset of examples from a large and diverse set of candidate impostor observations for use as the SVM background by exploiting the SVM training process. This selection was performed on a per-observation basis so as to overcome the shortcoming of the traditional heuristic-based approach to dataset selection. Results demonstrate the approach to provide performance improvements over both the use of the complete candidate dataset and the best heuristically-selected dataset whilst being only a fraction of the size. The refined dataset was also shown to generalise well to unseen corpora and be highly applicable to the selection of impostor cohorts required in alternate techniques for speaker verification.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis describes the investigation of an adaptive method of attenuation control for digital speech signals in an analogue-digital environment and its effects on the transmission performance of a national telecommunication network. The first part gives the design of a digital automatic gain control, able to operate upon a P.C.M. signal in its companded form and whose operation is based upon the counting of peaks of the digital speech signal above certain threshold levels. A study was ma.de of a digital automatic gain control (d.a.g.c.) in open-loop configuration and closed-loop configuration. The former was adopted as the means for carrying out the automatic control of attenuation. It was simulated and tested, both objectively and subjectively. The final part is the assessment of the effects on telephone connections of a d.a.g.c. that introduces gains of 6 dB or 12 dB. This work used a Telephone Connection Assessment Model developed at The University of Aston in Birmingham. The subjective tests showed that the d.a.g.c. gives advantage for listeners when the speech level is very low. The benefit is not great when speech is only a little quieter than preferred. The assessment showed that, when a standard British Telecom earphone is used, insertion of gain is desirable if speech voltage across the earphone terminals is below an upper limit of -38 dBV. People commented upon the presence of an adaptive-like effect during the tests. This could be the reason why they voted against the insertion of gain at level only little quieter than preferred, when they may otherwise have judged it to be desirable. A telephone connection with a d.a.g.c. in has a degree of difficulty less than half of that without it. The score Excellent plus Good is 10-30% greater.