Biblioteca Digital

Four types of neural networks, previously established for speech recognition and tested on a small database of seven speakers and 100 sentences, are applied to the TIMIT database. The networks are a recurrent-network phoneme recognizer, a modified Kanerva-model morph recognizer, a compositional-representation phoneme-to-word recognizer, and a modified Kanerva-model morph-to-word recognizer. The main result comes from the recurrent network, which achieves a phoneme recognition accuracy of 57% on the SI and SX sentences. The Kanerva morph recognizer achieves 66.2% accuracy on a small subset of the SA and SX sentences. The results for the word recognizers are incomplete.

MMI training for continuous phoneme recognition on the TIMIT database

Abstract:

This paper reports our experiences with a phoneme recognition system for the TIMIT database that uses multiple-mixture, continuous-density monophone HMMs trained with MMI. A comprehensive set of results is presented comparing the ML and MMI training criteria for both diagonal and full covariance models. These results, obtained with simple monophone HMMs, show clear performance gains from MMI training and are comparable to the best reported by others, including systems that use context-dependent models. In addition, the paper discusses a number of performance and implementation issues that are crucial to successful MMI training.
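
The contrast between the two training criteria compared in this abstract can be seen on a single utterance: ML scores only the correct transcription, while MMI scores its posterior against competing hypotheses. The sketch below assumes a small explicit list of hypotheses with precomputed acoustic log-likelihoods and language-model log-probabilities; practical MMI training approximates the denominator with lattices, and the function names are illustrative, not taken from the paper.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def ml_objective(acoustic_loglik, correct):
    """ML criterion for one utterance: log p(O | M_correct).

    Only the correct transcription's model enters the objective.
    """
    return acoustic_loglik[correct]

def mmi_objective(acoustic_loglik, lm_logprob, correct):
    """MMI criterion for one utterance: log posterior of the correct transcription,

        log p(O|M_w) + log P(w) - log sum_w' p(O|M_w') P(w'),

    where the sum runs over all hypotheses w', including the correct one.
    """
    joint = {w: acoustic_loglik[w] + lm_logprob[w] for w in acoustic_loglik}
    return joint[correct] - log_sum_exp(list(joint.values()))
```

Lowering a competing hypothesis's likelihood leaves the ML value unchanged but raises the MMI value, which is what makes MMI a discriminative criterion; the MMI value is always at most 0.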

Design considerations for a micro aerial vehicle aerodynamic characterization facility at the University of Florida research and engineering education facility

Abstract:

This paper describes the design considerations for a proposed aerodynamic characterization facility (ACF) for micro aerial vehicles (MAVs). This is a collaborative effort between the Air Force Research Laboratory Munitions Directorate (AFRL/MN) and the University of Florida Research and Engineering Education Facility (UF/REEF). The ACF is expected to provide the capability to characterize the aerodynamic performance of future MAVs. This includes the ability to gather the data needed to devise control strategies, as well as the potential to investigate aerodynamic 'problem areas' or specific failings. Since future MAVs are likely to incorporate advanced control strategies, the facility must enable researchers to critically assess such novel methods. Furthermore, aerodynamic issues should not be seen (and tested) in isolation; the facility should also provide information on structural responses (such as aeroelasticity) and on integration issues (e.g., thrust integration or sensor integration). The mission of the proposed facility therefore ranges from fairly basic investigations of individual technical issues encountered by MAVs (for example, an evaluation of wing shapes or control effectiveness) all the way to testing a fully integrated vehicle in a flight configuration to evaluate its performance throughout the mission envelope.

Continuous F0 in the source-excitation generation for HMM-based TTS: Do we need voiced/unvoiced classification?

Abstract:

Most HMM-based TTS systems use a hard voiced/unvoiced classification to produce a discontinuous F0 signal, which is used to generate the source excitation. When a mixed source excitation is used, this decision can be based on two different sources of information: the state-specific MSD-prior of the F0 models and/or the frame-specific features generated by the aperiodicity model. This paper examines the meaning of these variables in the synthesis process, their interaction, and how they affect the perceived quality of the generated speech. The results of several perceptual experiments show that, with mixed excitation, subjects consistently prefer samples with very few or no false unvoiced errors, whereas reducing the rate of false voiced errors produces no perceptual improvement. This suggests that, rather than using any form of hard voiced/unvoiced classification (e.g., the MSD-prior), it is better for synthesis to use a continuous F0 signal and rely on the frame-level soft voiced/unvoiced decision of the aperiodicity model. © 2011 IEEE.
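
The recommendation above, to keep F0 continuous and leave voicing to a soft frame-level decision, presupposes a continuous F0 track. As a minimal sketch, assuming linear interpolation across unvoiced frames (a common choice, though the abstract does not say how its continuous F0 is obtained), such a track could be built from a frame-level F0 sequence and a voicing mask. The function name `continuous_f0` is hypothetical.

```python
import bisect

def continuous_f0(f0, voiced):
    """Make an F0 track continuous by linear interpolation over unvoiced frames.

    Voiced frames keep their value; each unvoiced frame is filled from the
    nearest voiced frames on either side. Frames before the first (or after
    the last) voiced frame copy that frame's value.
    """
    if len(f0) != len(voiced):
        raise ValueError("f0 and voiced must have the same length")
    voiced_idx = [i for i, v in enumerate(voiced) if v]
    if not voiced_idx:
        raise ValueError("at least one voiced frame is required")
    out = [float(x) for x in f0]
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            continue
        # i is not in voiced_idx, so bisect_left returns the first voiced index > i.
        pos = bisect.bisect_left(voiced_idx, i)
        if pos == 0:
            out[i] = float(f0[voiced_idx[0]])
        elif pos == len(voiced_idx):
            out[i] = float(f0[voiced_idx[-1]])
        else:
            left, right = voiced_idx[pos - 1], voiced_idx[pos]
            t = (i - left) / (right - left)
            out[i] = f0[left] + t * (f0[right] - f0[left])
    return out
```

For example, `continuous_f0([0, 100, 0, 0, 130, 0], [False, True, False, False, True, False])` gives approximately `[100, 100, 110, 120, 130, 130]`. The voicing mask is used only to build the track; in the approach the abstract favours, the voiced/unvoiced balance at synthesis time would come from the aperiodicity model instead.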

Joint modelling of voicing label and continuous F0 for HMM based speech synthesis

Abstract:

Fundamental frequency (F0) is critical for high-quality HMM-based speech synthesis. Traditionally, F0 values are considered to depend on a binary voicing decision, so that they are continuous in voiced regions and undefined in unvoiced regions. The multi-space distribution HMM (MSDHMM) has been used to model this discontinuous F0. Recently, a continuous F0 modelling framework has been proposed and shown to be effective; it assumes that continuous F0 observations always exist and models voicing labels explicitly in an independent stream. In this paper, a refined continuous F0 modelling approach is proposed, in which F0 values are assumed to depend on voicing labels and both are modelled jointly in a single stream. Because of this enforced dependency, the new method can effectively reduce the voicing classification error. Subjective listening tests also demonstrate that the new approach can yield significant improvements in the naturalness of the synthesised speech. A dynamic random unvoiced F0 generation method is also investigated; experiments show that it has a significant effect on the quality of the synthesised speech. © 2011 IEEE.
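
The key modelling change in this abstract is that F0 is made conditional on the voicing label instead of being modelled in an independent stream. A minimal sketch of that idea for a single state, assuming one scalar Gaussian per voicing label for (log-)F0 and a Bernoulli voicing prior (the paper's actual parameterisation, dynamic features and training are not described here, and all names are hypothetical), shows why the dependency can help voicing classification: the F0 value itself now shifts the voicing posterior.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def joint_loglik(voiced, f0, p_voiced, params):
    """log p(v, f0) = log P(v) + log N(f0; mean_v, var_v).

    `params` maps the voicing label (True/False) to a (mean, var) pair, so the
    F0 distribution depends on the label (hypothetical parameterisation).
    """
    prior = p_voiced if voiced else 1.0 - p_voiced
    mean, var = params[voiced]
    return math.log(prior) + gaussian_logpdf(f0, mean, var)

def voicing_posterior(f0, p_voiced, params):
    """P(v = voiced | f0) under the joint model, computed in the log domain."""
    log_v = joint_loglik(True, f0, p_voiced, params)
    log_u = joint_loglik(False, f0, p_voiced, params)
    # Logistic of the log-odds, arranged so exp() never overflows.
    d = log_v - log_u
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)
```

With `params = {True: (5.0, 0.1), False: (4.0, 1.0)}` (purely illustrative log-F0 values) and `p_voiced = 0.5`, a frame with log-F0 of 5.0 gets a voiced posterior above 0.5 and one at 2.0 gets a posterior below 0.5. If both labels shared one Gaussian, as when voicing and F0 are modelled independently, the posterior would equal the prior `p_voiced` whatever the F0 value.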
