198 resultados para free speech
Resumo:
Model-based approaches to handle additive and convolutional noise have been extensively investigated and used. However, the application of these schemes to handling reverberant noise has received less attention. This paper examines the extension of two standard additive/convolutional noise approaches to handling reverberant noise. The first is an extension of vector Taylor series (VTS) compensation, reverberant VTS, where a mismatch function including reverberant noise is used. The second scheme modifies constrained MLLR to allow a wide-span of frames to be taken into account and projected into the required dimensionality. To allow additive noise to be handled, both these schemes are combined with standard VTS. The approaches are evaluated and compared on two tasks, MC-WSJ-AV, and a reverberant simulated version of AURORA-4. © 2011 IEEE.
Resumo:
Model-based approaches to handling additive background noise and channel distortion, such as Vector Taylor Series (VTS), have been intensively studied and extended in a number of ways. In previous work, VTS has been extended to handle both reverberant and background noise, yielding the Reverberant VTS (RVTS) scheme. In this work, rather than assuming the observation vector is generated by the reverberation of a sequence of background noise corrupted speech vectors, as in RVTS, the observation vector is modelled as a superposition of the background noise and the reverberation of clean speech. This yields a new compensation scheme RVTS Joint (RVTSJ), which allows an easy formulation for joint estimation of both additive and reverberation noise parameters. These two compensation schemes were evaluated and compared on a simulated reverberant noise corrupted AURORA4 task. Both yielded large gains over VTS baseline system, with RVTSJ outperforming the previous RVTS scheme. © 2011 IEEE.
Resumo:
Using an entropy argument, it is shown that stochastic context-free grammars (SCFG's) can model sources with hidden branching processes more efficiently than stochastic regular grammars (or equivalently HMM's). However, the automatic estimation of SCFG's using the Inside-Outside algorithm is limited in practice by its O(n3) complexity. In this paper, a novel pre-training algorithm is described which can give significant computational savings. Also, the need for controlling the way that non-terminals are allocated to hidden processes is discussed and a solution is presented in the form of a grammar minimization procedure. © 1990.
Resumo:
VODIS II, a research system in which recognition is based on the conventional one-pass connected-word algorithm extended in two ways, is described. Syntactic constraints can now be applied directly via context-free-grammar rules, and the algorithm generates a lattice of candidate word matches rather than a single globally optimal sequence. This lattice is then processed by a chart parser and an intelligent dialogue controller to obtain the most plausible interpretations of the input. A key feature of the VODIS II architecture is that the concept of an abstract word model allows the system to be used with different pattern-matching technologies and hardware. The current system implements the word models on a real-time dynamic-time-warping recognizer.
Resumo:
This paper describes two applications in speech recognition of the use of stochastic context-free grammars (SCFGs) trained automatically via the Inside-Outside Algorithm. First, SCFGs are used to model VQ encoded speech for isolated word recognition and are compared directly to HMMs used for the same task. It is shown that SCFGs can model this low-level VQ data accurately and that a regular grammar based pre-training algorithm is effective both for reducing training time and obtaining robust solutions. Second, an SCFG is inferred from a transcription of the speech used to train a phoneme-based recognizer in an attempt to model phonotactic constraints. When used as a language model, this SCFG gives improved performance over a comparable regular grammar or bigram. © 1991.
Resumo:
This paper describes the development of the 2003 CU-HTK large vocabulary speech recognition system for Conversational Telephone Speech (CTS). The system was designed based on a multi-pass, multi-branch structure where the output of all branches is combined using system combination. A number of advanced modelling techniques such as Speaker Adaptive Training, Heteroscedastic Linear Discriminant Analysis, Minimum Phone Error estimation and specially constructed Single Pronunciation dictionaries were employed. The effectiveness of each of these techniques and their potential contribution to the result of system combination was evaluated in the framework of a state-of-the-art LVCSR system with sophisticated adaptation. The final 2003 CU-HTK CTS system constructed from some of these models is described and its performance on the DARPA/NIST 2003 Rich Transcription (RT-03) evaluation test set is discussed.
Resumo:
Polyaniline (PANI) nanobrushes were synthesized by template-free electrochemical galvanostatic methods. When the same method was applied to the carbon nanohorn (CNH) solution containing aniline monomers, a hybrid nanostructure containing PANI and CNHs was enabled after electropolymerization. This is the first report on the template-free method to make PANI nanobrushes and homogeneous hybrid soft matter (PANI) with carbon nanoparticles. Raman spectroscopy was used to analyze the interaction between CNH and PANI. Electrochemical nanofabrication offers simplicity and good control when used to make electronic devices. Both of these materials were applied in supercapacitors and an improvement capacitive current by using the hybrid material was observed.