Biblioteca Digital

Four types of neural networks which have previously been established for speech recognition and tested on a small, seven-speaker, 100-sentence database are applied to the TIMIT database. The networks are a recurrent network phoneme recognizer, a modified Kanerva model morph recognizer, a compositional representation phoneme-to-word recognizer, and a modified Kanerva model morph-to-word recognizer. The major result is for the recurrent net, giving a phoneme recognition accuracy of 57% from the si and sx sentences. The Kanerva morph recognizer achieves 66.2% accuracy for a small subset of the sa and sx sentences. The results for the word recognizers are incomplete.


MMI training for continuous phoneme recognition on the TIMIT database

Relevance: 20.00%

Abstract:

This paper reports our experiences with a phoneme recognition system for the TIMIT database that uses multiple-mixture continuous-density monophone HMMs trained using MMI. A comprehensive set of results is presented comparing the ML and MMI training criteria for both diagonal and full covariance models. These results, obtained with simple monophone HMMs, show clear performance gains from MMI training and are comparable to the best reported by others, including those that use context-dependent models. In addition, the paper discusses a number of performance and implementation issues that are crucial to successful MMI training.
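For readers unfamiliar with the two criteria compared in this abstract, they can be written in their usual textbook form. This is a sketch under stated assumptions, not necessarily the exact formulation used in the paper: the symbols $O_r$ (observations of training utterance $r$), $w_r$ (its correct transcription), $M_w$ (the composite HMM for transcription $w$), $P(w)$ (a prior or language-model probability) and $\lambda$ (the HMM parameters) are introduced here for illustration.

```latex
% Maximum likelihood (ML): maximise the likelihood of the correct model only.
F_{\mathrm{ML}}(\lambda) = \sum_{r=1}^{R} \log p_{\lambda}\!\left(O_r \mid M_{w_r}\right)

% Maximum mutual information (MMI): maximise the likelihood of the correct
% model relative to the total likelihood over all competing transcriptions w.
F_{\mathrm{MMI}}(\lambda) = \sum_{r=1}^{R} \log
  \frac{p_{\lambda}\!\left(O_r \mid M_{w_r}\right) P(w_r)}
       {\sum_{w} p_{\lambda}\!\left(O_r \mid M_{w}\right) P(w)}
```

The denominator is what makes MMI discriminative: raising the likelihood of the correct transcription only helps if the likelihoods of competing transcriptions do not rise with it.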


Wadge Bank trawl fishery studies. Pt. 4. An analysis of the length frequency measurements of the sea bream (Lethrinus nebulosus) made in 1949 and 1953 to 1958

Relevance: 20.00%

Abstract:

Length frequency distributions of the sea bream collected during the period 1953 to 1958 have been analysed. The increase in the average size of the sea bream with depth suggests a movement to deeper waters as the fish grow. By number, the sea bream is more abundant between 21 and 30 fathoms than in deeper areas. Recruitment was continuous and regular. There is no sign of the entry or progression of a dominant brood throughout the period under study. The length frequency distribution shows three distinct modes. The first mode occurs regularly but does not progress beyond 40 cm, recruitment being balanced by natural and fishing mortality. The other two, which are not regular, are probably the result of fishing outside the regular areas. Short sections of "growth lines", which fit into one another when extrapolated, are evident. The larger lines obtained by extrapolation are parallel to one another. These tentative "growth lines" indicate that this species, which enters the fishing grounds when 15 cm or larger in length, is exploited by the trawl fishery for a period of three to four years. The species appears to be six months old when it enters the fishing grounds and increases in length by about 37.5 cm in the next 30 months. Later growth slows down. The average size of the specimens sampled decreased steadily from 1953 to 1957. It is shown that this reduction in size is due to increased fishing effort.
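The growth figures quoted in this abstract imply a simple average rate, which the short Python sketch below works through. It rests on assumptions that are not in the abstract: growth is treated as linear over the 30 months after entry, and the entry length is taken as exactly 15 cm (the abstract says "15 cm or larger"). The function and constant names are hypothetical.

```python
# Figures taken from the abstract.
ENTRY_AGE_MONTHS = 6          # apparent age on entering the fishing grounds
GROWTH_CM = 37.5              # length gained after entry
GROWTH_PERIOD_MONTHS = 30     # period over which that length is gained

# Assumption (not from the abstract): fish enter at exactly 15 cm.
ENTRY_LENGTH_CM = 15.0


def monthly_growth_rate(growth_cm=GROWTH_CM, months=GROWTH_PERIOD_MONTHS):
    """Average growth in cm per month, assuming linear growth."""
    return growth_cm / months


def estimated_length_cm(age_months):
    """Estimated length at a given age under the linear-growth assumption.

    Only defined for the 30 months after entry; the abstract says that
    growth slows down afterwards, so extrapolating further would mislead.
    """
    last_age = ENTRY_AGE_MONTHS + GROWTH_PERIOD_MONTHS
    if not ENTRY_AGE_MONTHS <= age_months <= last_age:
        raise ValueError("age outside the period covered by the quoted figures")
    return ENTRY_LENGTH_CM + monthly_growth_rate() * (age_months - ENTRY_AGE_MONTHS)


if __name__ == "__main__":
    print(f"Average growth: {monthly_growth_rate():.2f} cm/month")
    print(f"Estimated length at 36 months: {estimated_length_cm(36):.1f} cm")
```

Under these assumptions the average rate is 1.25 cm per month. The estimate is only arithmetic on the quoted figures and says nothing about the shape of the actual growth curve.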


Continuous F0 in the source-excitation generation for HMM-based TTS: Do we need voiced/unvoiced classification?

Relevance: 20.00%

Abstract:

Most HMM-based TTS systems use a hard voiced/unvoiced classification to produce a discontinuous F0 signal, which is used to generate the source excitation. When a mixed source excitation is used, this decision can be based on two different sources of information: the state-specific MSD-prior of the F0 models, and/or the frame-specific features generated by the aperiodicity model. This paper examines the meaning of these variables in the synthesis process, their interaction, and how they affect the perceived quality of the generated speech. The results of several perceptual experiments show that, when using mixed excitation, subjects consistently prefer samples with very few or no false unvoiced errors, whereas a reduction in the rate of false voiced errors does not produce any perceptual improvement. This suggests that rather than using any form of hard voiced/unvoiced classification, e.g., the MSD-prior, it is better for synthesis to use a continuous F0 signal and rely on the frame-level soft voiced/unvoiced decision of the aperiodicity model. © 2011 IEEE.


955 results for Continuous progression
