996 results for Speech Acoustics
Abstract:
VODIS II, a research system in which recognition is based on the conventional one-pass connected-word algorithm extended in two ways, is described. Syntactic constraints can now be applied directly via context-free-grammar rules, and the algorithm generates a lattice of candidate word matches rather than a single globally optimal sequence. This lattice is then processed by a chart parser and an intelligent dialogue controller to obtain the most plausible interpretations of the input. A key feature of the VODIS II architecture is that the concept of an abstract word model allows the system to be used with different pattern-matching technologies and hardware. The current system implements the word models on a real-time dynamic-time-warping recognizer.
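The abstract above mentions a dynamic-time-warping recognizer without giving details. As a rough illustration only, the following sketch shows isolated-word template matching with classic DTW. It is not the VODIS II one-pass connected-word algorithm; the function names, the 1-D features, and the toy templates are all assumptions made for this example.

```python
def dtw_distance(a, b):
    """Cumulative DTW cost between two sequences of numbers (assumed 1-D features)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # local frame distance
            cost[i][j] = local + min(cost[i - 1][j],      # stay on the same frame of b
                                     cost[i][j - 1],      # stay on the same frame of a
                                     cost[i - 1][j - 1])  # advance both sequences
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the template with the lowest DTW cost (hypothetical helper)."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Toy templates, invented for illustration
templates = {"yes": [1, 2, 2, 3], "no": [3, 2, 1]}
best = recognize([1, 2, 3], templates)
```

In this toy case the utterance aligns with the "yes" template at zero cost, because DTW lets one input frame absorb the repeated template frame.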
Abstract:
Four types of neural networks which have previously been established for speech recognition and tested on a small, seven-speaker, 100-sentence database are applied to the TIMIT database. The networks are a recurrent network phoneme recognizer, a modified Kanerva model morph recognizer, a compositional representation phoneme-to-word recognizer, and a modified Kanerva model morph-to-word recognizer. The major result is for the recurrent net, giving a phoneme recognition accuracy of 57% from the si and sx sentences. The Kanerva morph recognizer achieves 66.2% accuracy for a small subset of the sa and sx sentences. The results for the word recognizers are incomplete.
Abstract:
The use of variable-width features (prosodics, broad structural information etc.) in large vocabulary speech recognition systems is discussed. Although the value of this sort of information has been recognized in the past, previous approaches have not been widely used in speech systems because either they have not been robust enough for realistic, large vocabulary tasks or they have been limited to certain recognizer architectures. A framework for the use of variable-width features is presented which employs the N-Best algorithm with the features being applied in a post-processing phase. The framework is flexible and widely applicable, giving greater scope for exploitation of the features than previous approaches. Large vocabulary speech recognition experiments using TIMIT show that the application of variable-width features has potential benefits.
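The post-processing idea described above can be pictured as rescoring an N-best list. The sketch below is a minimal illustration under stated assumptions: the log-score combination, the `weight` parameter, and the toy feature function are hypothetical and are not taken from the paper.

```python
def rescore_nbest(nbest, feature_score, weight=1.0):
    """Re-rank an N-best list.

    nbest: list of (hypothesis, recognizer_log_score) pairs.
    feature_score: callable giving an extra log-score per hypothesis
                   (stands in for a variable-width feature; an assumption).
    """
    rescored = [(hyp, score + weight * feature_score(hyp)) for hyp, score in nbest]
    # Highest combined score first
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy example: the extra feature penalises hypotheses with more than three words
nbest = [("recognise speech", -10.0), ("wreck a nice beach", -9.5)]
penalty = lambda hyp: -1.0 if len(hyp.split()) > 3 else 0.0
ranked = rescore_nbest(nbest, penalty)
```

Because the features are applied only after the recognizer has produced its list, the recognizer itself needs no modification, which is the flexibility the abstract emphasises.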
Abstract:
For speech recognition, mismatches between training and testing conditions due to speaker and noise are normally handled separately. The work presented in this paper aims to apply speaker adaptation and model-based noise compensation jointly by embedding speaker adaptation in the noise mismatch function. The proposed method gives faster and more optimal adaptation than compensating for the two factors separately. It is also more consistent with the basic assumptions of speaker and noise adaptation. Experimental results show significant and consistent gains from the proposed method. © 2011 IEEE.
Abstract:
Fundamental frequency, or F0, is critical for high-quality HMM-based speech synthesis. Traditionally, F0 values are considered to depend on a binary voicing decision such that they are continuous in voiced regions and undefined in unvoiced regions. The multi-space distribution HMM (MSDHMM) has been used for modelling the discontinuous F0. Recently, a continuous F0 modelling framework has been proposed and shown to be effective, where continuous F0 observations are assumed to always exist and voicing labels are explicitly modelled by an independent stream. In this paper, a refined continuous F0 modelling approach is proposed. Here, F0 values are assumed to be dependent on voicing labels and both are jointly modelled in a single stream. Due to the enforced dependency, the new method can effectively reduce the voicing classification error. Subjective listening tests also demonstrate that the new approach can yield significant improvements in the naturalness of the synthesised speech. A dynamic random unvoiced F0 generation method is also investigated. Experiments show that it has a significant effect on the quality of synthesised speech. © 2011 IEEE.
Abstract:
Recently there has been interest in structured discriminative models for speech recognition. In these models sentence posteriors are directly modelled, given a set of features extracted from the observation sequence, and hypothesised word sequence. In previous work these discriminative models have been combined with features derived from generative models for noise-robust speech recognition for continuous digits. This paper extends this work to medium to large vocabulary tasks. The form of the score-space extracted using the generative models, and parameter tying of the discriminative model, are both discussed. Update formulae for both conditional maximum likelihood and minimum Bayes' risk training are described. Experimental results are presented on small and medium to large vocabulary noise-corrupted speech recognition tasks: AURORA 2 and 4. © 2011 IEEE.
Abstract:
Structured precision modelling is an important approach to improving the intra-frame correlation modelling of the standard HMM, in which Gaussian mixture models with diagonal covariance matrices are used. Previous work has focused on direct structured representation of the precision matrices. In this paper, a new framework, referred to as Cholesky Basis Superposition (CBS), is proposed, in which the structure of the Cholesky square root of the precision matrix is investigated. Each Cholesky matrix associated with a particular Gaussian distribution is represented as a linear combination of a set of Gaussian-independent basis upper-triangular matrices. Efficient optimization methods are derived for both the combination weights and the basis matrices. Experiments on a Chinese dictation task showed that the proposed approach can significantly outperform direct structured precision modelling with a similar number of parameters, as well as full covariance modelling. © 2011 IEEE.
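To make the superposition idea concrete, the sketch below builds one Gaussian's Cholesky factor as a weighted sum of shared upper-triangular basis matrices and then forms the precision matrix from it. It assumes the convention P = UᵀU for the upper-triangular factor U; the abstract does not state the convention, and the function names, basis matrices, and weights are invented for illustration.

```python
def combine_basis(weights, basis):
    """Weighted sum of basis matrices (lists of lists), giving one Cholesky factor."""
    size = len(basis[0])
    factor = [[0.0] * size for _ in range(size)]
    for w, mat in zip(weights, basis):
        for i in range(size):
            for j in range(size):
                factor[i][j] += w * mat[i][j]
    return factor


def precision_from_factor(u):
    """Precision matrix P = U^T U (assumed convention for an upper-triangular U)."""
    size = len(u)
    return [[sum(u[k][i] * u[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


# Two shared upper-triangular basis matrices (toy values)
basis = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.0, 1.0], [0.0, 0.0]]]
# Per-Gaussian combination weights (toy values)
u = combine_basis([2.0, 0.5], basis)
p = precision_from_factor(u)
```

Building the precision matrix from a triangular factor keeps it symmetric and positive semi-definite by construction, which is one plausible motivation for working with the Cholesky factor rather than the precision matrix directly.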
Abstract:
In this paper, a complete method for finite-difference time-domain modeling of rooms in 2-D using compact explicit schemes is presented. A family of interpolated schemes using a rectilinear, nonstaggered grid is reviewed, and the most accurate and isotropic schemes are identified. Frequency-dependent boundaries are modeled using a digital impedance filter formulation that is consistent with locally reacting surface theory. A structurally stable and efficient boundary formulation is constructed by carefully combining the boundary condition with the interpolated scheme. An analytic prediction formula for the effective numerical reflectance is given, and a stability proof is provided. The results indicate that the identified accurate and isotropic schemes are also very accurate in terms of numerical boundary reflectance, and outperform directly related methods such as Yee's scheme and the standard digital waveguide mesh. In addition, one particular scheme, referred to here as the interpolated wideband scheme, is suggested as the best scheme for most applications.
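For orientation, the sketch below implements the textbook leapfrog update for the 2-D wave equation on a rectilinear, nonstaggered grid. It is not the interpolated wideband scheme, and it replaces the paper's impedance-filter boundaries with fixed zero-pressure edges; the function name and the grid size are assumptions for this example.

```python
def fdtd_step(p, p_prev, lam2=0.5):
    """Advance the pressure grid by one time step.

    p, p_prev: current and previous pressure grids (lists of lists).
    lam2: squared Courant number; this standard 2-D scheme is stable for lam2 <= 0.5.
    """
    rows, cols = len(p), len(p[0])
    p_next = [[0.0] * cols for _ in range(rows)]  # edge nodes stay at zero pressure
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neighbours = p[i + 1][j] + p[i - 1][j] + p[i][j + 1] + p[i][j - 1]
            p_next[i][j] = (lam2 * neighbours
                            + 2.0 * (1.0 - 2.0 * lam2) * p[i][j]
                            - p_prev[i][j])
    return p_next


# Unit impulse at the centre of a 5 x 5 grid
size = 5
p_prev = [[0.0] * size for _ in range(size)]
p = [[0.0] * size for _ in range(size)]
p[2][2] = 1.0
p_next = fdtd_step(p, p_prev)
```

At the stability limit (`lam2 = 0.5`) the centre term vanishes, so after one step the impulse has moved entirely to the four axial neighbours. The diagonal nodes stay at zero, which illustrates the direction-dependent error that interpolated schemes aim to reduce.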
Abstract:
This paper presents methods for simulating room acoustics using the finite-difference time-domain (FDTD) technique, focusing on boundary and medium modeling. A family of nonstaggered 3-D compact explicit FDTD schemes is analyzed in terms of stability, accuracy, and computational efficiency, and the most accurate and isotropic schemes based on a rectilinear grid are identified. A frequency-dependent boundary model that is consistent with locally reacting surface theory is also presented, in which the wall impedance is represented with a digital filter. For boundaries, accuracy in numerical reflection is analyzed and a stability proof is provided. The results indicate that the proposed 3-D interpolated wideband and isotropic schemes outperform directly related techniques based on Yee's staggered grid and the standard digital waveguide mesh, and that the boundary formulations generally have properties similar to those of the basic scheme used.
Abstract:
In this paper, a method for modeling diffusive boundaries in finite-difference time-domain (FDTD) room acoustics simulations with the use of impedance filters is presented. The proposed technique is based on the concept of phase grating diffusers, and is realized by designing boundary impedance filters from normal-incidence reflection filters with added delay. These added delays, which correspond to the diffuser well depths, are varied across the boundary surface and implemented using Thiran allpass filters. The proposed method for simulating sound scattering is suitable for modeling high-frequency diffusion caused by small variations in surface roughness and, more generally, diffusers characterized by narrow wells with infinitely thin separators. This concept is also applicable to other wave-based modeling techniques. The approach is validated by comparing numerical results for Schroeder diffusers to measured data. In addition, it is proposed that irregular surfaces be modeled by shaping them with Brownian noise, giving good control over the sound scattering properties of the simulated boundary through two parameters, namely the spectral density exponent and the maximum well depth.