995 results for Acoustic modeling
Modeling pronunciation variation using context-dependent weighting and B/S refined acoustic modeling
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as part of many speech processing systems, and it is the computationally dominant phase for LVCSR systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods comes from exploiting the structure of these matrices and from an efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both methods lead to similar speedups but that the latter has a far smaller impact on recognition accuracy. Experiments on a 1138-word vocabulary RM1 task using the Sphinx 3.7 system show that, for a typical case, the matrix multiplication approach leads to an overall speedup of 46%. Both low-rank approximation methods increase the speedup to around 60%, with the former increasing the word error rate (WER) from 3.2% to 6.6% and the latter increasing it from 3.2% to 3.5%.
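The abstract above does not spell out its matrix formulation, so the following is only a minimal sketch of one common way to write Gaussian log-likelihoods as a matrix product. It assumes diagonal covariances, which the abstract does not state. Each frame x is augmented to [1, x, x²], and each Gaussian becomes a column [c, μ/σ², -1/(2σ²)], where c collects the terms that do not depend on x. The function names (`augment`, `gaussian_column`, `log_likelihoods`) are hypothetical, and the low-rank approximation step is not shown.

```python
import math


def augment(x):
    """Augmented feature vector [1, x_1..x_D, x_1^2..x_D^2]."""
    return [1.0] + list(x) + [v * v for v in x]


def gaussian_column(mean, var):
    """Parameter column for one diagonal-covariance Gaussian (an assumption)."""
    # Constant term: everything in the log-density that does not depend on x.
    const = -0.5 * sum(
        math.log(2.0 * math.pi * s) + m * m / s for m, s in zip(mean, var)
    )
    linear = [m / s for m, s in zip(mean, var)]   # multiplies x_d
    quadratic = [-0.5 / s for s in var]           # multiplies x_d^2
    return [const] + linear + quadratic


def matmul(a, b):
    """Plain-Python matrix product, standing in for an optimized routine."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def log_likelihoods(frames, gaussians):
    """Return a T x M matrix of log-likelihoods for T frames and M Gaussians."""
    x_aug = [augment(x) for x in frames]                    # T x (2D+1)
    columns = [gaussian_column(m, v) for m, v in gaussians]
    g = [list(row) for row in zip(*columns)]                # (2D+1) x M
    return matmul(x_aug, g)


def direct_log_likelihood(x, mean, var):
    """Reference computation, one frame against one Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * s) + (xi - m) ** 2 / s
        for xi, m, s in zip(x, mean, var)
    )
```

In this form all frames and all Gaussians are scored by a single product, which is the kind of structure an optimized matrix multiplication routine can exploit.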
Abstract:
Doctoral thesis, Electronic Engineering and Telecommunications (Signal Processing), Faculdade de Ciências e Tecnologia, Universidade do Algarve, 2014
Abstract:
The performance of any continuous speech recognition system depends on the accuracy of its acoustic model. Hence, preparing a robust and accurate acoustic model leads to satisfactory recognition performance. In acoustic modeling of phonetic units, context information is of prime importance, as phonemes are found to vary according to their place of occurrence in a word. In this paper we compare and evaluate context-dependent tied (CD tied), context-dependent (CD), and context-independent (CI) models for continuous speech recognition of the Malayalam language. The database for the speech recognition system has utterances from 21 speakers, 11 female and 10 male. Our evaluation results show that CD tied models outperform CI models by over 21%.
Abstract:
Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as part of many speech processing systems, and it is the computationally dominant phase for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods comes from exploiting the structure of these matrices and from an efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both methods lead to similar speedups but that the latter has a far smaller impact on recognition accuracy. Experiments on a 1,138-word vocabulary RM1 task and a 6,224-word vocabulary TIMIT task using the Sphinx 3.7 system show that, for a typical case, the matrix multiplication based approach leads to an overall speedup of 46% on the RM1 task and 115% on the TIMIT task. Our low-rank approximation methods provide a way of trading off recognition accuracy for a further increase in computational performance, extending overall speedups up to 61% for RM1 and 119% for TIMIT, with an increase in word error rate (WER) from 3.2% to 3.5% for RM1 and no increase in WER for TIMIT. We also express the pairwise Euclidean distance computation phase in Dynamic Time Warping (DTW) in terms of matrix multiplication leading to saving of approximately of computational operations. In our experiments using an efficient implementation of matrix multiplication, this leads to a speedup of 5.6 in computing the pairwise Euclidean distances and an overall speedup of up to 3.25 for DTW.
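The abstract above does not give the formulation it uses for DTW distances. As a rough illustration, the standard identity ‖a − b‖² = ‖a‖² + ‖b‖² − 2a·b lets all pairwise distances be read off the single product A·Bᵀ. The sketch below applies that identity and feeds the result into a textbook DTW recursion. The function names and the unconstrained step pattern are assumptions, not taken from the paper.

```python
import math


def pairwise_distances(a, b):
    """Euclidean distances between every row of a and every row of b.

    Uses ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 * (A B^T)[i][j], so the
    only O(N * M * D) work is the matrix product A B^T.
    """
    a_sq = [sum(v * v for v in row) for row in a]
    b_sq = [sum(v * v for v in row) for row in b]
    # Cross term A B^T, written out in plain Python for self-containment.
    cross = [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]
    return [
        # max(..., 0.0) guards against tiny negative values from rounding.
        [math.sqrt(max(a_sq[i] + b_sq[j] - 2.0 * cross[i][j], 0.0))
         for j in range(len(b))]
        for i in range(len(a))
    ]


def dtw_cost(a, b):
    """Accumulated cost of the best DTW alignment (basic step pattern)."""
    d = pairwise_distances(a, b)
    n, m = len(a), len(b)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = d[i - 1][j - 1] + min(
                acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1]
            )
    return acc[n][m]
```

For example, `pairwise_distances([[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0]])` gives a distance of 5.0 between the second row and the origin.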
Abstract:
This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modeling and model training, and language and pronunciation modeling is presented. These include the use of conversation-side-based cepstral normalization, vocal tract length normalization, heteroscedastic linear discriminant analysis for feature projection, minimum phone error training and speaker adaptive training, lattice-based model adaptation, confusion-network-based decoding and confidence score estimation, pronunciation selection, language model interpolation, and class-based language models. The transcription system developed for participation in the 2002 NIST Rich Transcription evaluations of English conversational telephone speech data is presented in detail. In this evaluation the CU-HTK system gave an overall word error rate of 23.9%, which was the best performance by a statistically significant margin. Further details on the derivation of faster systems with moderate performance degradation are discussed in the context of the 2002 CU-HTK 10 × RT conversational speech transcription system. © 2005 IEEE.
Abstract:
To further enhance the sound absorption of metal foams by combining the high sound absorption and good heat conductivity of cellular foam metals, the use and acoustic modeling of these materials are reviewed. The predictions made by three viscous models developed by the authors for the propagation of sound through open-cell metal foams are compared with experiment, both for the metal foams and for the polymer substrates used to manufacture the foam. All models are valid in the limit of low Reynolds number, which holds for the typical cell dimensions found in metal foams provided the amplitude of the waves is below 160 dB. The first model considers the drag experienced by acoustic waves as they propagate past rigid cylinders parallel to their axes, the second considers propagation normal to their axes, and the third considers propagation past the spherical joints. All three are combined to give a general model of the acoustic behavior of the foams. In particular, the sound absorption is found to be significant and well predicted by the combined model. In addition, a post-processing technique is described for the experiment used to extract the fundamental wave propagation characteristics of the material.
Abstract:
A scalable large-vocabulary, speaker-independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation, which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1 reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 9.03 ms, which, coupled with a backend search of 5000 words, has provided an accuracy of over 80%. Parallel implementations have been designed with up to 32 cores and have been successfully implemented with a clock frequency of 133 MHz.
Abstract:
This paper aims to provide better insight into 3D approximations of the wave equation using compact finite-difference time-domain (FDTD) schemes in the context of room acoustic simulations. A general family of 3D compact explicit and implicit schemes based on a nonstaggered rectilinear grid is analyzed in terms of stability, numerical error, and accuracy. Various special cases are compared, and the most accurate explicit and implicit schemes are identified. Further considerations presented in the paper include the direct relationship with other numerical approaches found in the literature on room acoustic modeling, such as the 3D digital waveguide mesh and Yee's staggered grid technique.
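The abstract does not list the individual schemes it analyzes. For orientation only, the sketch below shows the widely used 7-point explicit leapfrog update on a nonstaggered rectilinear grid. This is a textbook scheme, assumed here to be one simple member of the family the paper studies. The update is u^{n+1} = 2u^n − u^{n−1} + λ²(sum of the six nearest neighbours − 6u^n), with Courant number λ = cΔt/Δx, and the scheme is stable for λ ≤ 1/√3. The names `make_grid` and `slf_step` are hypothetical, and the boundary is simply held at zero, a simplification rather than a realistic room boundary.

```python
import math

# Stability limit of the 7-point explicit scheme in 3D: lambda <= 1/sqrt(3).
COURANT_LIMIT = 1.0 / math.sqrt(3.0)


def make_grid(n):
    """n x n x n grid of zeros (pressure values at one time step)."""
    return [[[0.0] * n for _ in range(n)] for _ in range(n)]


def slf_step(u_prev, u_curr, lam):
    """Advance one time step; boundary nodes are held at zero (assumption)."""
    if lam > COURANT_LIMIT + 1e-12:
        raise ValueError("Courant number exceeds the stability limit 1/sqrt(3)")
    n = len(u_curr)
    lam2 = lam * lam
    u_next = make_grid(n)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                neighbours = (
                    u_curr[i + 1][j][k] + u_curr[i - 1][j][k]
                    + u_curr[i][j + 1][k] + u_curr[i][j - 1][k]
                    + u_curr[i][j][k + 1] + u_curr[i][j][k - 1]
                )
                u_next[i][j][k] = (
                    2.0 * u_curr[i][j][k]
                    - u_prev[i][j][k]
                    + lam2 * (neighbours - 6.0 * u_curr[i][j][k])
                )
    return u_next
```

A simulation would call `slf_step` repeatedly, rotating the three grids so that the current field becomes the previous one at each step.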
Abstract:
Malayalam is one of the 22 scheduled languages in India with more than 130 million speakers. This paper presents a report on the development of a speaker-independent, continuous transcription system for Malayalam. The system employs Hidden Markov Models (HMMs) for acoustic modeling and Mel Frequency Cepstral Coefficients (MFCCs) for feature extraction. It is trained with 21 male and female speakers aged 20 to 40 years. The system obtained a word recognition accuracy of 87.4% and a sentence recognition accuracy of 84% when tested with a set of continuous speech data.
Abstract:
Acoustic modeling provides useful data for evaluating seismic processing and imaging methodologies in models with complex geological structure. Finite-difference (FD) and finite-element (FE) schemes were implemented and evaluated in homogeneous and heterogeneous models. The finite-difference algorithm was extended to the 2.5-D case in models with variable density. The modeling of geological targets of exploratory interest in the Paleozoic Solimões Basin in the Amazon was presented. Long-period multiple reflections produced between the free surface and the Cretaceous-Paleozoic unconformity, the low resolution of the seismic wave near the reservoir, and the weak reflections at the interface between the reservoir rocks and the seal rocks are the main characteristics of the synthetic data obtained, which pose a major challenge for seismic imaging.