36 results for trained incapacity
in Cambridge University Engineering Department Publications Database
Abstract:
As the use of found data increases, more systems are being built using adaptive training. Here transforms are used to represent unwanted acoustic variability, e.g. speaker and acoustic environment changes, allowing a canonical model that models only the "pure" variability of speech to be trained. Adaptive training may be described within a Bayesian framework. By using complexity control approaches to ensure robust parameter estimates, the standard point estimate adaptive training can be justified within this Bayesian framework. However during recognition there is usually no control over the amount of data available. It is therefore preferable to be able to use a full Bayesian approach to applying transforms during recognition rather than the standard point estimates. This paper discusses various approximations to Bayesian approaches including a new variational Bayes approximation. The application of these approaches to state-of-the-art adaptively trained systems using both CAT and MLLR transforms is then described and evaluated on a large vocabulary speech recognition task. © 2005 IEEE.
Abstract:
This study is the first step in the psychoacoustic exploration of perceptual differences between the sounds of different violins. A method was used which enabled the same performance to be replayed on different "virtual violins," so that the relationships between acoustical characteristics of violins and perceived qualities could be explored. Recordings of real performances were made using a bridge-mounted force transducer, giving an accurate representation of the signal from the violin string. These were then played through filters corresponding to the admittance curves of different violins. Initially, limits of listener performance in detecting changes in acoustical characteristics were characterized. These consisted of shifts in frequency or increases in amplitude of single modes or frequency bands that have been proposed previously to be significant in the perception of violin sound quality. Thresholds were significantly lower for musically trained than for nontrained subjects but were not significantly affected by the violin used as a baseline. Thresholds for the musicians typically ranged from 3 to 6 dB for amplitude changes and 1.5%-20% for frequency changes. Interpretation of the results using excitation patterns showed that thresholds for the best subjects were quite well predicted by a multichannel model based on optimal processing. © 2007 Acoustical Society of America.
Abstract:
INTRODUCTION: Recent studies in other European countries suggest that the prevalence of congenital cryptorchidism continues to increase. This study aimed to explore the prevalence and natural history of congenital cryptorchidism in a UK centre. METHODS: Between October 2001 and July 2008, 784 male infants were born in the prospective Cambridge Baby Growth Study. 742 infants were examined by trained research nurses at birth; testicular position was assessed using standard techniques. Follow-up assessments were completed at ages 3, 12, 18 and 24 months in 615, 462, 393 and 326 infants, respectively. RESULTS: The prevalence of cryptorchidism at birth was 5.9% (95% CI 4.4% to 7.9%). Congenital cryptorchidism was associated with earlier gestational age (p<0.001), lower birth weight (p<0.001), birth length (p<0.001) and shorter penile length at birth (p<0.0001) compared with other infants, but normal size after age 3 months. The prevalence of cryptorchidism declined to 2.4% at 3 months, but unexpectedly rose again to 6.7% at 12 months as a result of new cases. The cumulative incidence of "acquired cryptorchidism" by age 24 months was 7.0% and these cases had shorter penile length during infancy than other infants (p = 0.003). CONCLUSIONS: The prevalence of congenital cryptorchidism was higher than earlier estimates in UK populations. Furthermore, this study for the first time describes acquired cryptorchidism or "ascending testis" as a common entity in male infants, which is possibly associated with reduced early postnatal androgen activity.
Abstract:
The Chinese language is based on characters which are syllabic in nature. Since languages have syllabotactic rules which govern the construction of syllables and their allowed sequences, Chinese character sequence models can be used as a first-level approximation of allowed syllable sequences. N-gram character sequence models were trained on 4.3 billion characters. Characters are used as a first-level recognition unit with multiple pronunciations per character. For comparison, the CU-HTK Mandarin word-based system was used to recognize words which were then converted to character sequences. The character-only system error rates for one-best recognition were slightly worse than word-based character recognition. However, combining the two systems using log-linear combination gives better results than either system separately. An equally weighted combination gave consistent CER gains of 0.1-0.2% absolute over the word-based standard system. Copyright © 2009 ISCA.
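The log-linear combination mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which scores were combined or how hypotheses were aligned, so everything below is an assumption made for illustration: the function names, the hypothesis labels and the toy probabilities are hypothetical. Each system assigns a log-probability to every candidate, and the combined score is the weighted sum of the two log scores.

```python
import math


def log_linear_combine(log_scores_a, log_scores_b, weight_a=0.5, weight_b=0.5):
    """Weighted sum of log scores for hypotheses scored by both systems."""
    shared = set(log_scores_a) & set(log_scores_b)
    return {
        hyp: weight_a * log_scores_a[hyp] + weight_b * log_scores_b[hyp]
        for hyp in shared
    }


def best_hypothesis(scores):
    """Return the hypothesis with the highest score."""
    return max(scores, key=scores.get)


# Hypothetical toy scores; "hyp1".."hyp3" stand for candidate character sequences.
character_system = {"hyp1": math.log(0.50), "hyp2": math.log(0.40), "hyp3": math.log(0.10)}
word_system = {"hyp1": math.log(0.20), "hyp2": math.log(0.45), "hyp3": math.log(0.35)}

# Equal weights, matching the "equally weighted combination" in the abstract.
combined = log_linear_combine(character_system, word_system)
winner = best_hypothesis(combined)
```

In this toy case the character system alone prefers `hyp1`, while the combination prefers `hyp2`, because a hypothesis must score reasonably under both systems to win.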
Abstract:
This paper describes results obtained using the modified Kanerva model to perform word recognition in continuous speech after being trained on the multi-speaker Alvey 'Hotel' speech corpus. Theoretical discoveries have recently enabled us to increase the speed of execution of part of the model by two orders of magnitude over that previously reported by Prager & Fallside. The memory required for the operation of the model has been similarly reduced. The recognition accuracy reaches 95% without syntactic constraints when tested on different data from seven trained speakers. Real time simulation of a model with 9,734 active units is now possible in both training and recognition modes using the Alvey PARSIFAL transputer array. The modified Kanerva model is a static network consisting of a fixed nonlinear mapping (location matching) followed by a single layer of conventional adaptive links. A section of preprocessed speech is transformed by the non-linear mapping to a high dimensional representation. From this intermediate representation a simple linear mapping is able to perform complex pattern discrimination to form the output, indicating the nature of the speech features present in the input window.
Abstract:
This paper describes two applications in speech recognition of the use of stochastic context-free grammars (SCFGs) trained automatically via the Inside-Outside Algorithm. First, SCFGs are used to model VQ encoded speech for isolated word recognition and are compared directly to HMMs used for the same task. It is shown that SCFGs can model this low-level VQ data accurately and that a regular grammar based pre-training algorithm is effective both for reducing training time and obtaining robust solutions. Second, an SCFG is inferred from a transcription of the speech used to train a phoneme-based recognizer in an attempt to model phonotactic constraints. When used as a language model, this SCFG gives improved performance over a comparable regular grammar or bigram. © 1991.
Abstract:
This paper reports our experiences with a phoneme recognition system for the TIMIT database which uses multiple mixture continuous density monophone HMMs trained using MMI. A comprehensive set of results is presented comparing the ML and MMI training criteria for both diagonal and full covariance models. These results using simple monophone HMMs show clear performance gains achieved by MMI training, and are comparable to the best reported by others including those which use context-dependent models. In addition, the paper discusses a number of performance and implementation issues which are crucial to successful MMI training.
Abstract:
Bayesian formulated neural networks are implemented using the hybrid Monte Carlo method for probabilistic fault identification in cylindrical shells. Each of the 20 nominally identical cylindrical shells is divided into three substructures. Holes of (12±2) mm in diameter are introduced in each of the substructures and vibration data are measured. Modal properties and the Coordinate Modal Assurance Criterion (COMAC) are utilized to train the two modal-property-neural-networks. These COMAC are calculated by taking the natural-frequency-vector to be an additional mode. Modal energies are calculated by determining the integrals of the real and imaginary components of the frequency response functions over bandwidths of 12% of the natural frequencies. The modal energies and the Coordinate Modal Energy Assurance Criterion (COMEAC) are used to train the two frequency-response-function-neural-networks. The averages of the two sets of trained-networks (COMAC and COMEAC as well as modal properties and modal energies) form two committees of networks. The COMEAC and the COMAC are found to be better identification data than the modal properties and modal energies used directly. The committee approach is observed to give lower standard deviations than the individual methods. The main advantage of the Bayesian formulation is that it gives identities of damage and their respective confidence intervals.
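The committee idea in this abstract, averaging the outputs of two trained networks, can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the numbers are invented per-substructure damage probabilities, not results from the paper.

```python
def committee_average(outputs_a, outputs_b):
    """Average the outputs of two networks element by element."""
    if len(outputs_a) != len(outputs_b):
        raise ValueError("both networks must produce the same number of outputs")
    return [(a + b) / 2.0 for a, b in zip(outputs_a, outputs_b)]


# Hypothetical outputs: estimated damage probability for each of three substructures.
comac_network = [0.8, 0.1, 0.1]
comeac_network = [0.6, 0.3, 0.1]

committee = committee_average(comac_network, comeac_network)
```

Averaging tends to reduce the spread of the estimate when the errors of the two networks are not perfectly correlated, which is consistent with the lower standard deviations reported for the committee approach.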
Abstract:
A parallel processing network derived from Kanerva's associative memory theory (Kanerva 1984) is shown to be able to train rapidly on connected speech data and recognize further speech data with a label error rate of 0.68%. This modified Kanerva model can be trained substantially faster than other networks with comparable pattern discrimination properties. Kanerva presented his theory of a self-propagating search in 1984, and showed theoretically that large-scale versions of his model would have powerful pattern matching properties. This paper describes how the design for the modified Kanerva model is derived from Kanerva's original theory. Several designs are tested to discover which form may be implemented fastest while still maintaining versatile recognition performance. A method is developed to deal with the time-varying nature of the speech signal by recognizing static patterns together with a fixed quantity of contextual information. In order to recognize speech features in different contexts it is necessary for a network to be able to model disjoint pattern classes. This type of modelling cannot be performed by a single layer of links. Network research was once held back by the inability of single-layer networks to solve this sort of problem, and the lack of a training algorithm for multi-layer networks. Rumelhart, Hinton & Williams (1985) provided one solution by demonstrating the "back propagation" training algorithm for multi-layer networks. A second alternative is used in the modified Kanerva model. A non-linear fixed transformation maps the pattern space into a space of higher dimensionality in which the speech features are linearly separable. A single-layer network may then be used to perform the recognition. The advantage of this solution over the other using multi-layer networks lies in the greater power and speed of the single-layer network training algorithm. © 1989.
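The two-stage structure described in this abstract, a fixed non-linear mapping followed by a single trainable layer, can be illustrated with a toy sketch. It is not the authors' implementation. XOR stands in for a disjoint pattern class, the fixed mapping activates every stored location within a Hamming radius of the input (a simplified reading of location matching), and the single layer is trained with the classic perceptron rule. All names, the choice of locations and the radius are assumptions.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def fixed_mapping(pattern, locations, radius):
    """Fixed (untrained) non-linear map: one unit per location, active if close."""
    return [1 if hamming(pattern, loc) <= radius else 0 for loc in locations]


def train_single_layer(features, targets, max_epochs=100):
    """Perceptron rule on the mapped features; stops once an epoch has no errors."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(features, targets):
            y = 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0
            if y != t:
                errors += 1
                delta = t - y
                weights = [w + delta * xi for w, xi in zip(weights, x)]
                bias += delta
        if errors == 0:
            break
    return weights, bias


def predict(x, weights, bias):
    """Threshold the single-layer output."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# XOR: the positive class {01, 10} is disjoint in the 2-bit input space.
patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 0]

# Assumed locations: the four corners of the 2-bit space, with radius 1.
locations = list(product((0, 1), repeat=2))
features = [fixed_mapping(p, locations, radius=1) for p in patterns]

weights, bias = train_single_layer(features, targets)
predictions = [predict(x, weights, bias) for x in features]
```

A single layer applied directly to the two raw input bits cannot represent XOR, but after the fixed mapping into four dimensions the same single-layer rule separates the classes within a few epochs.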
Abstract:
Over the past decade, a variety of user models have been proposed for user simulation-based reinforcement-learning of dialogue strategies. However, the strategies learned with these models are rarely evaluated in actual user trials and it remains unclear how the choice of user model affects the quality of the learned strategy. In particular, the degree to which strategies learned with a user model generalise to real user populations has not been investigated. This paper presents a series of experiments that qualitatively and quantitatively examine the effect of the user model on the learned strategy. Our results show that the performance and characteristics of the strategy are in fact highly dependent on the user model. Furthermore, a policy trained with a poor user model may appear to perform well when tested with the same model, but fail when tested with a more sophisticated user model. This raises significant doubts about the current practice of learning and evaluating strategies with the same user model. The paper further investigates a new technique for testing and comparing strategies directly on real human-machine dialogues, thereby avoiding any evaluation bias introduced by the user model. © 2005 IEEE.