Biblioteca Digital

A model of the auditory periphery assembled from analog network submodels of all the relevant anatomical structures is described. There is bidirectional coupling between networks representing the outer ear, middle ear and cochlea. A simple voltage source representation of the outer hair cells provides level-dependent basilar membrane curves. The networks are translated into efficient computational modules by means of wave digital filtering. A feedback unit regulates the average firing rate at the output of an inner hair cell module via a simplified modelling of the dynamics of the descending paths to the peripheral ear. This leads to a digital model of the entire auditory periphery with applications to both speech and hearing research.

Exploiting variable-width features in large vocabulary speech recognition

Abstract:

The use of variable-width features (prosodic information, broad structural information, etc.) in large vocabulary speech recognition systems is discussed. Although the value of this sort of information has been recognized in the past, previous approaches have not been widely used in speech systems, either because they have not been robust enough for realistic, large vocabulary tasks or because they have been limited to certain recognizer architectures. A framework for the use of variable-width features is presented which employs the N-Best algorithm, with the features being applied in a post-processing phase. The framework is flexible and widely applicable, giving greater scope for exploitation of the features than previous approaches. Large vocabulary speech recognition experiments using TIMIT show that the application of variable-width features has potential benefits.
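
As an illustration of applying features in a post-processing phase over an N-best list, the following is a minimal sketch, not the authors' implementation. The `Hypothesis` class, the `rescore_nbest` function, the `length_penalty` feature and all scores and weights are hypothetical stand-ins; a real system would plug in prosodic or structural scorers.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: list              # word sequence proposed by the first pass
    recognizer_score: float  # log-domain first-pass score (higher is better)


def rescore_nbest(nbest, feature_scorers, weights):
    """Re-rank an N-best list by adding weighted feature scores.

    Each scorer sees the whole hypothesis, so it can use features that
    span an arbitrary number of frames or words (variable width).
    """
    def total(hyp):
        extra = sum(w * f(hyp) for f, w in zip(feature_scorers, weights))
        return hyp.recognizer_score + extra

    return sorted(nbest, key=total, reverse=True)


# Hypothetical feature: a word-count penalty standing in for a real
# prosodic or structural scorer.
def length_penalty(hyp):
    return -float(len(hyp.words))


nbest = [
    Hypothesis(["recognise", "speech"], -10.0),
    Hypothesis(["wreck", "a", "nice", "beach"], -9.5),
]
best = rescore_nbest(nbest, [length_penalty], [0.5])[0]
print(" ".join(best.words))
```

Because the scorers only need the finished hypotheses, a scheme of this kind does not depend on the internals of the first-pass recognizer, which is the flexibility the abstract points to.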

Spoken document representations for probabilistic retrieval

Abstract:

This paper presents some developments in query expansion and document representation for our spoken document retrieval system and shows how various retrieval techniques affect performance for different sets of transcriptions derived from a common speech source. Modifications of the document representation are used, which combine several techniques for query expansion, knowledge-based on the one hand and statistics-based on the other. Taken together, these techniques can improve Average Precision by over 19% relative to a system similar to that which we presented at TREC-7. These new experiments have also confirmed that the degradation of Average Precision due to a word error rate (WER) of 25% is quite small (3.7% relative) and can be reduced to almost zero (0.2% relative). The overall improvement of the retrieval system can also be observed for seven different sets of transcriptions from different recognition engines with a WER ranging from 24.8% to 61.5%. We hope to repeat these experiments when larger document collections become available, in order to evaluate the scalability of these techniques.
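
For readers unfamiliar with the WER figures quoted above, the following is a minimal sketch of the usual definition: the word-level edit distance between a reference transcript and a recognizer output, divided by the number of reference words. The function name and the example sentences are illustrative assumptions, not taken from the paper, and the sketch assumes a non-empty reference.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]  # deleting all i reference words so far
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# Illustrative example: one substitution and one deletion in six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.1%}")
```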

Erratum: Language modelling for Russian and English using words and classes (Computer Speech and Language (2003) 17 (87-104))

Investigation of acoustic units for LVCSR systems

Abstract:

One important issue in designing state-of-the-art LVCSR systems is the choice of acoustic units. Context-dependent (CD) phones remain the dominant form of acoustic units. They can capture the co-articulatory effect in speech via explicit modelling. However, for other more complicated phonological processes, they rely on the implicit modelling ability of the underlying statistical models. Alternatively, it is possible to construct acoustic models based on higher level linguistic units, for example, syllables, to explicitly capture these complex patterns. When sufficient training data is available, this approach may show an advantage over implicit acoustic modelling. In this paper a wide range of acoustic units is investigated to improve LVCSR system performance. Significant error rate reductions of up to 7.1% relative (0.8% absolute) were obtained on a state-of-the-art Mandarin Chinese broadcast audio recognition task using word and syllable position dependent triphone and quinphone models. © 2011 IEEE.
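
The relationship between the relative and absolute figures quoted above can be checked with a little arithmetic. The sketch below is a back-of-envelope illustration only: the abstract does not state the baseline error rate, so the value computed here is an approximation inferred from two rounded numbers, and the function names are hypothetical.

```python
def relative_reduction(baseline, improved):
    """Relative error rate reduction, as a fraction of the baseline."""
    return (baseline - improved) / baseline


def implied_baseline(absolute_gain, relative_gain):
    """Baseline error rate implied by an absolute and a relative gain."""
    return absolute_gain / relative_gain


# 0.8 percentage points absolute and 7.1% relative, as quoted in the
# abstract; the result is only approximate because both are rounded.
baseline = implied_baseline(0.8, 0.071)
improved = baseline - 0.8
print(f"implied baseline: about {baseline:.1f}%")
print(f"implied improved rate: about {improved:.1f}%")
```

Under these assumptions the baseline comes out at roughly 11%, which shows why a gain of under one point absolute can still be a noticeable relative improvement.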

Morphological decomposition in Arabic ASR systems
