182 results for Speech and voice functions


Relevance:

100.00% 100.00%

Publisher:

Abstract:

This article presents a novel algorithm for learning parameters in statistical dialogue systems which are modeled as Partially Observable Markov Decision Processes (POMDPs). The three main components of a POMDP dialogue manager are a dialogue model representing dialogue state information; a policy that selects the system's responses based on the inferred state; and a reward function that specifies the desired behavior of the system. Ideally both the model parameters and the policy would be designed to maximize the cumulative reward. However, while there are many techniques available for learning the optimal policy, no good ways of learning the optimal model parameters that scale to real-world dialogue systems have been found yet. The presented algorithm, called the Natural Actor and Belief Critic (NABC), is a policy gradient method that offers a solution to this problem. Based on observed rewards, the algorithm estimates the natural gradient of the expected cumulative reward. The resulting gradient is then used to adapt both the prior distribution of the dialogue model parameters and the policy parameters. In addition, the article presents a variant of the NABC algorithm, called the Natural Belief Critic (NBC), which assumes that the policy is fixed and only the model parameters need to be estimated. The algorithms are evaluated on a spoken dialogue system in the tourist information domain. The experiments show that model parameters estimated to maximize the expected cumulative reward result in significantly improved performance compared to the baseline hand-crafted model parameters. The algorithms are also compared to optimization techniques using plain gradients and state-of-the-art random search algorithms. In all cases, the algorithms based on the natural gradient work significantly better. © 2011 ACM.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Speech recognition systems typically contain many Gaussian distributions, and hence a large number of parameters. This makes them both slow to decode speech, and large to store. Techniques have been proposed to decrease the number of parameters. One approach is to share parameters between multiple Gaussians, thus reducing the total number of parameters and allowing for shared likelihood calculation. Gaussian tying and subspace clustering are two related techniques which take this approach to system compression. These techniques can decrease the number of parameters with no noticeable drop in performance for single systems. However, multiple acoustic models are often used in real speech recognition systems. This paper considers the application of Gaussian tying and subspace compression to multiple systems. Results show that two speech recognition systems can be modelled using the same number of Gaussians as just one system, with little effect on individual system performance. Copyright © 2009 ISCA.
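
The shared likelihood calculation that tying makes possible can be illustrated with a toy example. This is a minimal sketch, not the paper's method: the two-entry codebook, the state names `a` and `b`, and the mixture weights are all hypothetical. Each state's mixture points into one shared codebook of diagonal-covariance Gaussians, so each Gaussian is evaluated once per frame and reused by every state.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

# Hypothetical shared codebook: (mean, variance) pairs used by all states.
CODEBOOK = [
    ((0.0, 0.0), (1.0, 1.0)),
    ((3.0, 3.0), (1.0, 1.0)),
]

# Hypothetical tied states: each holds only (codebook index, mixture weight).
STATES = {
    "a": [(0, 0.7), (1, 0.3)],
    "b": [(0, 0.2), (1, 0.8)],
}

def state_log_likelihoods(x):
    """Evaluate every codebook Gaussian once, then reuse it for all states."""
    cache = [log_gauss_diag(x, mean, var) for mean, var in CODEBOOK]
    return {
        name: math.log(sum(w * math.exp(cache[i]) for i, w in mixture))
        for name, mixture in STATES.items()
    }

frame_scores = state_log_likelihoods((0.0, 0.0))
```

With tying, each state stores only indices and weights, while the means and variances live once in the codebook. That is where both the storage saving and the reduced computation come from.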

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Language models (LMs) are often constructed by building multiple individual component models that are combined using context independent interpolation weights. By tuning these weights, using either perplexity or discriminative approaches, it is possible to adapt LMs to a particular task. This paper investigates the use of context dependent weighting in both interpolation and test-time adaptation of language models. Depending on the previous word contexts, a discrete history weighting function is used to adjust the contribution from each component model. As this dramatically increases the number of parameters to estimate, robust weight estimation schemes are required. Several approaches are described in this paper. The first approach is based on MAP estimation where interpolation weights of lower order contexts are used as smoothing priors. The second approach uses training data to ensure robust estimation of LM interpolation weights. This can also serve as a smoothing prior for MAP adaptation. A normalized perplexity metric is proposed to handle the bias of the standard perplexity criterion to corpus size. A range of schemes to combine weight information obtained from training data and test data hypotheses are proposed to improve robustness during context dependent LM adaptation. In addition, a minimum Bayes' risk (MBR) based discriminative training scheme is proposed. An efficient weighted finite state transducer (WFST) decoding algorithm for context dependent interpolation is also presented. The proposed technique was evaluated using a state-of-the-art Mandarin Chinese broadcast speech transcription task. Character error rate (CER) reductions of up to 7.3% relative were obtained, as well as consistent perplexity improvements. © 2012 Elsevier Ltd. All rights reserved.
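
The idea of context dependent interpolation can be written as P(w|h) = Σ_m λ_m(h) · P_m(w|h), where the weights λ_m depend on the history h. The sketch below is an illustration under stated assumptions, not the paper's estimation scheme: the two toy component models, their probabilities, and the weight tables are hypothetical, and the history is reduced to its last word with a fallback to context independent weights.

```python
from typing import Callable, Dict, Sequence, Tuple

History = Tuple[str, ...]
ComponentLM = Callable[[str, History], float]

def interpolated_prob(
    word: str,
    history: History,
    components: Sequence[ComponentLM],
    context_weights: Dict[History, Sequence[float]],
    global_weights: Sequence[float],
) -> float:
    """Mix component LM probabilities with history-dependent weights."""
    # Key the weights on the most recent word; back off to the
    # context independent weights when that context was never seen.
    key = history[-1:]
    weights = context_weights.get(key, global_weights)
    return sum(w * lm(word, history) for w, lm in zip(weights, components))

# Hypothetical component models with made-up probabilities.
def news_lm(word: str, history: History) -> float:
    return {"market": 0.20, "goal": 0.01}.get(word, 0.001)

def sports_lm(word: str, history: History) -> float:
    return {"market": 0.02, "goal": 0.30}.get(word, 0.001)

GLOBAL_WEIGHTS = (0.5, 0.5)
CONTEXT_WEIGHTS = {("stock",): (0.9, 0.1), ("scored",): (0.1, 0.9)}

p_seen = interpolated_prob(
    "market", ("the", "stock"), (news_lm, sports_lm), CONTEXT_WEIGHTS, GLOBAL_WEIGHTS
)
p_unseen = interpolated_prob(
    "market", ("the", "big"), (news_lm, sports_lm), CONTEXT_WEIGHTS, GLOBAL_WEIGHTS
)
```

Because every distinct context carries its own weight vector, the number of weights grows with the number of contexts, which is why the abstract stresses robust estimation and smoothing priors.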

Relevance:

100.00% 100.00%

Publisher:

Abstract:

There are many methods for decomposing signals into a sum of amplitude and frequency modulated sinusoids. In this paper we take a new estimation based approach. Identifying the problem as ill-posed, we show how to regularize the solution by imposing soft constraints on the amplitude and phase variables of the sinusoids. Estimation proceeds using a version of Kalman smoothing. We evaluate the method on synthetic and natural, clean and noisy signals, showing that it outperforms previous decompositions, but at a higher computational cost. © 2012 IEEE.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Lyapunov-like conditions that utilize generalizations of energy and barrier functions certifying Zeno behavior near Zeno equilibria are presented. To better illustrate these conditions, we study them in the context of Lagrangian hybrid systems. Through the observation that Lagrangian hybrid systems with isolated Zeno equilibria must have a one-dimensional configuration space, we utilize our Lyapunov-like conditions to obtain easily verifiable necessary and sufficient conditions for the existence of Zeno behavior in systems of this form. © 2007 IEEE.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents flow field measurements for the turbulent stratified burner introduced in two previous publications in which high resolution scalar measurements were made by Sweeney et al. [1,2] for model validation. The flow fields of the series of premixed and stratified methane/air flames are investigated under turbulent, globally lean conditions (φg = 0.75). Velocity data acquired with laser Doppler anemometry (LDA) and particle image velocimetry (PIV) are presented and discussed. Pairwise 2-component LDA measurements provide profiles of axial velocity, radial velocity, tangential velocity and corresponding fluctuating velocities. The LDA measurements of axial and tangential velocities enable the swirl number to be evaluated and the degree of swirl characterized. Power spectral density and autocorrelation functions derived from the LDA data acquired at 10 kHz are optimized to calculate the integral time scales. Flow patterns are obtained using a 2-component PIV system operated at 7 Hz. Velocity profiles and spatial correlations derived from the PIV and LDA measurements are shown to be in very good agreement, thus offering 3D mapping of the velocities. A strong correlation was observed between the shape of the recirculation zones above the central bluff body and the effects of heat release, stoichiometry and swirl. Detailed analyses of the LDA data further demonstrate that the flow behavior changes significantly with the levels of swirl and stratification, which combines the contributions of dilatation, recirculation and swirl. Key turbulence parameters are derived from the total velocity components, combining axial, radial and tangential velocities. © 2013 The Combustion Institute.
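
As a rough illustration of how an integral time scale can be obtained from an autocorrelation function, the sketch below integrates the normalized autocorrelation of a velocity record up to its first zero crossing. This is one common convention and not necessarily the optimized procedure used in the paper. It also assumes the samples are uniformly spaced, whereas raw LDA data arrive at irregular times and would need resampling first. The function names and the synthetic test signal are hypothetical.

```python
import math

def autocorrelation(u, max_lag):
    """Normalized autocorrelation of a uniformly sampled signal, lags 0..max_lag."""
    n = len(u)
    mean = sum(u) / n
    fluct = [x - mean for x in u]
    var = sum(f * f for f in fluct) / n
    return [
        sum(fluct[i] * fluct[i + k] for i in range(n - k)) / ((n - k) * var)
        for k in range(max_lag + 1)
    ]

def integral_time_scale(u, dt, max_lag):
    """Trapezoidal integral of the autocorrelation up to its first zero crossing."""
    rho = autocorrelation(u, max_lag)
    total = 0.0
    for k in range(1, len(rho)):
        if rho[k] <= 0.0:
            break  # stop at the first non-positive value
        total += 0.5 * (rho[k - 1] + rho[k]) * dt
    return total

# Synthetic stand-in for a velocity record sampled at 10 kHz (dt = 1e-4 s).
dt = 1.0e-4
signal = [math.sin(2.0 * math.pi * i / 100.0) for i in range(2000)]
time_scale = integral_time_scale(signal, dt, max_lag=50)
```

For a pure sinusoid with a period of 100 samples the autocorrelation is close to a cosine, so the result should be near 100·dt/(2π). Real turbulence data would give a decaying autocorrelation instead.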

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The design of wind turbine blades is a true multi-objective engineering task. The aerodynamic effectiveness of the turbine needs to be balanced with the system loads introduced by the rotor. Moreover, the problem does not depend on a single geometric property but, among other parameters, on a combination of aerofoil family and various blade functions. The aim of this paper is therefore to present a tool which can help designers gain deeper insight into the complexity of the design space and find a blade design which is likely to have a low cost of energy. For the research we use a Computational Blade Optimisation and Load Deflation Tool (CoBOLDT) to investigate the three extreme point designs obtained from a multi-objective optimisation of turbine thrust, annual energy production and mass for a horizontal axis wind turbine blade. The optimisation algorithm utilised is based on Multi-Objective Tabu Search, which constitutes the core of CoBOLDT. The methodology is capable of parametrising the spanning aerofoils with two-dimensional Free Form Deformation and the blade functions with two tangentially connected cubic splines. After geometry generation we use a panel code to create aerofoil polars and a stationary Blade Element Momentum code to evaluate turbine performance. Finally, the obtained loads are fed into a structural layout module to estimate the mass and stiffness of the current blade by means of a fully stressed design. For the presented test case we chose post-optimisation analysis with parallel coordinates to reveal geometrical features of the extreme point designs and to select a compromise design from the Pareto set. The research revealed that a blade with a feasible laminate layout can be obtained that increases the energy capture and lowers steady-state system loads. Reduced aerofoil camber and an increased L/D ratio were identified as the main drivers. This statement could not be made with other tools of the research community before. © 2013 Elsevier Ltd.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Spoken content in languages of emerging importance needs to be searchable to provide access to the underlying information. In this paper, we investigate the problem of extending data fusion methodologies from Information Retrieval to Spoken Term Detection on low-resource languages in the framework of the IARPA Babel program. We describe a number of alternative methods for improving keyword search performance. We apply these methods to Cantonese, a language that presents some new issues in terms of reduced resources and shorter query lengths. First, we show a score normalization methodology that improves keyword search performance by 20% on average. Second, we show that properly combining the outputs of diverse ASR systems performs 14% better than the best normalized ASR system. © 2013 IEEE.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In natural languages multiple word sequences can represent the same underlying meaning. Only modelling the observed surface word sequence can result in poor context coverage, for example, when using n-gram language models (LM). To handle this issue, paraphrastic LMs were proposed in previous research and successfully applied to a US English conversational telephone speech transcription task. In order to exploit the complementary characteristics of paraphrastic LMs and neural network LMs (NNLM), the combination between the two is investigated in this paper. To investigate paraphrastic LMs' generalization ability to other languages, experiments are conducted on a Mandarin Chinese broadcast speech transcription task. Using a paraphrastic multi-level LM modelling both word and phrase sequences, significant error rate reductions of 0.9% absolute (9% relative) and 0.5% absolute (5% relative) were obtained over the baseline n-gram and NNLM systems respectively, after a combination with word and phrase level NNLMs. © 2013 IEEE.
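
The paired absolute and relative figures above can be cross-checked with simple arithmetic: a relative reduction is the absolute reduction divided by the baseline error rate, so the baseline is the absolute reduction divided by the relative one. The helper below is a hypothetical illustration. Because the reported figures are rounded, the implied baselines are only rough estimates and are not values stated in the abstract.

```python
def implied_baseline(absolute_reduction: float, relative_reduction: float) -> float:
    """Baseline error rate (in %) implied by an absolute and a relative reduction.

    absolute_reduction is in percentage points; relative_reduction is a fraction.
    """
    return absolute_reduction / relative_reduction

# Figures quoted in the abstract: 0.9% absolute (9% relative) over the n-gram
# baseline and 0.5% absolute (5% relative) over the NNLM baseline.
ngram_baseline = implied_baseline(0.9, 0.09)
nnlm_baseline = implied_baseline(0.5, 0.05)
```

Both ratios come out at roughly 10%, which is consistent with the two baselines having similar error rates once rounding of the reported figures is taken into account.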

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Spoken dialogue systems provide a convenient way for users to interact with a machine using only speech. However, they often rely on a rigid turn-taking regime in which a voice activity detection (VAD) module is used to determine when the user is speaking and to decide when it is appropriate for the system to respond. This paper investigates replacing the VAD and discrete utterance recogniser of a conventional turn-taking system with a continuously operating recogniser that is always listening, and using the recogniser 1-best path to guide turn taking. In this way, a flexible framework for incremental dialogue management is possible. Experimental results show that it is possible to remove the VAD component and successfully use the recogniser best path to identify user speech, with more robustness to noise, potentially lower latency, and a reduction in overall recognition error rate compared to the conventional approach. © 2013 IEEE.