174 results for continuous representations





This paper presents some developments in query expansion and document representation of our spoken document retrieval system and shows how various retrieval techniques affect performance for different sets of transcriptions derived from a common speech source. Modifications of the document representation are used, which combine several techniques for query expansion, knowledge-based on the one hand and statistics-based on the other. Taken together, these techniques can improve Average Precision by over 19% relative to a system similar to the one we presented at TREC-7. These new experiments have also confirmed that the degradation of Average Precision due to a word error rate (WER) of 25% is quite small (3.7% relative) and can be reduced to almost zero (0.2% relative). The overall improvement of the retrieval system can also be observed for seven different sets of transcriptions from different recognition engines with a WER ranging from 24.8% to 61.5%. We hope to repeat these experiments when larger document collections become available, in order to evaluate the scalability of these techniques.





In sensorimotor integration, sensory input and motor output signals are combined to provide an internal estimate of the state of both the world and one's own body. Although a single perceptual and motor snapshot can provide information about the current state, computational models show that the state can be optimally estimated by a recursive process in which an internal estimate is maintained and updated by the current sensory and motor signals. These models predict that an internal state estimate is maintained or stored in the brain. Here we report a patient with a lesion of the superior parietal lobe who shows both sensory and motor deficits consistent with an inability to maintain such an internal representation between updates. Our findings suggest that the superior parietal lobe is critical for sensorimotor integration, by maintaining an internal representation of the body's state.
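
The recursive estimation scheme described in this abstract can be illustrated with a scalar Kalman filter. This is only a minimal sketch under stated assumptions: the abstract does not name a specific model, and the function name `kalman_step`, the noise variances, and the one-dimensional "position" state are all hypothetical choices made for illustration.

```python
def kalman_step(estimate, variance, motor_command, observation,
                process_var=0.01, sensor_var=0.25):
    """One predict/update cycle of a scalar Kalman filter.

    Hypothetical setup: the state is a 1-D position, the motor command is
    the intended displacement, and the observation is a noisy position reading.
    """
    # Predict: advance the stored estimate using the motor signal.
    predicted = estimate + motor_command
    predicted_var = variance + process_var

    # Update: correct the prediction with the current sensory signal.
    gain = predicted_var / (predicted_var + sensor_var)
    new_estimate = predicted + gain * (observation - predicted)
    new_variance = (1.0 - gain) * predicted_var
    return new_estimate, new_variance


# The estimate is carried from one cycle to the next; this is the
# "maintained" internal state the abstract refers to.
estimate, variance = 0.0, 1.0
for command, reading in [(1.0, 1.2), (1.0, 1.9), (1.0, 3.1)]:
    estimate, variance = kalman_step(estimate, variance, command, reading)
```

The point of the sketch is that each cycle depends on the estimate stored from the previous one, so a failure to retain it between updates would degrade both the prediction and the correction.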





Most HMM-based TTS systems use a hard voiced/unvoiced classification to produce a discontinuous F0 signal which is used for the generation of the source-excitation. When a mixed source excitation is used, this decision can be based on two different sources of information: the state-specific MSD-prior of the F0 models, and/or the frame-specific features generated by the aperiodicity model. This paper examines the meaning of these variables in the synthesis process, their interaction, and how they affect the perceived quality of the generated speech. The results of several perceptual experiments show that when using mixed excitation, subjects consistently prefer samples with very few or no false unvoiced errors, whereas a reduction in the rate of false voiced errors does not produce any perceptual improvement. This suggests that rather than using any form of hard voiced/unvoiced classification, e.g., the MSD-prior, it is better for synthesis to use a continuous F0 signal and rely on the frame-level soft voiced/unvoiced decision of the aperiodicity model. © 2011 IEEE.





Fundamental frequency, or F0, is critical for high-quality HMM-based speech synthesis. Traditionally, F0 values are considered to depend on a binary voicing decision such that they are continuous in voiced regions and undefined in unvoiced regions. Multi-space distribution HMM (MSDHMM) has been used for modelling the discontinuous F0. Recently, a continuous F0 modelling framework has been proposed and shown to be effective, where continuous F0 observations are assumed to always exist and voicing labels are explicitly modelled by an independent stream. In this paper, a refined continuous F0 modelling approach is proposed. Here, F0 values are assumed to be dependent on voicing labels and both are jointly modelled in a single stream. Due to the enforced dependency, the new method can effectively reduce the voicing classification error. Subjective listening tests also demonstrate that the new approach can yield significant improvements in the naturalness of the synthesised speech. A dynamic random unvoiced F0 generation method is also investigated. Experiments show that it has a significant effect on the quality of synthesised speech. © 2011 IEEE.





Recently there has been interest in structured discriminative models for speech recognition. In these models sentence posteriors are directly modelled, given a set of features extracted from the observation sequence, and hypothesised word sequence. In previous work these discriminative models have been combined with features derived from generative models for noise-robust speech recognition for continuous digits. This paper extends this work to medium to large vocabulary tasks. The form of the score-space extracted using the generative models, and parameter tying of the discriminative model, are both discussed. Update formulae for both conditional maximum likelihood and minimum Bayes' risk training are described. Experimental results are presented on small and medium to large vocabulary noise-corrupted speech recognition tasks: AURORA 2 and 4. © 2011 IEEE.





In this paper we present Poisson sum series representations for α-stable (αS) random variables and α-stable processes, in particular concentrating on continuous-time autoregressive (CAR) models driven by α-stable Lévy processes. Our representations aim to provide a conditionally Gaussian framework, which will allow parameter estimation using Rao-Blackwellised versions of state-of-the-art Bayesian computational methods such as particle filters and Markov chain Monte Carlo (MCMC). To overcome the issues due to truncation of the series, novel residual approximations are developed. Simulations demonstrate the potential of these Poisson sum representations for inference in otherwise intractable α-stable models. © 2011 IEEE.





Simulated annealing is a popular method for approaching the solution of a global optimization problem. Existing results on its performance apply to discrete combinatorial optimization where the optimization variables can assume only a finite set of possible values. We introduce a new general formulation of simulated annealing which allows one to guarantee finite-time performance in the optimization of functions of continuous variables. The results hold universally for any optimization problem on a bounded domain and establish a connection between simulated annealing and up-to-date theory of convergence of Markov chain Monte Carlo methods on continuous domains. This work is inspired by the concept of finite-time learning with known accuracy and confidence developed in statistical learning theory.
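
For orientation, the classic textbook form of simulated annealing on a bounded continuous domain can be sketched as follows. This is not the new formulation introduced in the paper and carries none of its finite-time guarantees; the function name, the Gaussian proposal, the logarithmic cooling schedule, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def simulated_annealing(f, lower, upper, steps=5000, t0=1.0,
                        step_size=0.1, seed=0):
    """Minimise f on [lower, upper] with a basic Metropolis-style annealer.

    Hypothetical textbook sketch: Gaussian random-walk proposals and a
    logarithmic cooling schedule, both chosen only for illustration.
    """
    rng = random.Random(seed)
    x = rng.uniform(lower, upper)
    fx = f(x)
    best_x, best_f = x, fx
    for k in range(1, steps + 1):
        temperature = t0 / math.log(k + 1)
        candidate = x + rng.gauss(0.0, step_size)
        # Proposals that leave the bounded domain are rejected outright.
        if candidate < lower or candidate > upper:
            continue
        fc = f(candidate)
        # Always accept improvements; accept worse points with a
        # probability that shrinks as the temperature falls.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temperature):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f


best_x, best_f = simulated_annealing(lambda x: (x - 1.0) ** 2, -5.0, 5.0)
```

Here the optimization variable ranges over an interval rather than a finite set, which is the setting the abstract contrasts with discrete combinatorial optimization.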





A dynamical system can exhibit structure on multiple levels. Different system representations can capture different elements of a dynamical system's structure. We consider LTI input-output dynamical systems and present four representations of structure: complete computational structure, subsystem structure, signal structure, and input-output sparsity structure. We then explore some of the mathematical relationships that relate these different representations of structure. In particular, we show that signal and subsystem structure are fundamentally different ways of representing system structure. A signal structure does not always specify a unique subsystem structure, nor does a subsystem structure always specify a unique signal structure. We illustrate these concepts with a numerical example. © 2011 AACC American Automatic Control Council.





A control algorithm is presented that addresses the stability issues inherent to the operation of monolithic mode-locked laser diodes. It enables continuous pulse duration tuning without any onset of Q-switching instabilities. A demonstration of the algorithm's performance is presented for two radically different laser diode geometries, and continuous pulse duration tuning from 0.5 ps to 2.2 ps and from 1.2 ps to 10.2 ps is achieved. With practical applications in mind, this algorithm also facilitates control over performance parameters such as output power and wavelength during pulse duration tuning. The developed algorithm enables the user to harness the operational flexibility of such a laser with 'push-button' simplicity.





The contribution described in this paper is an algorithm for learning nonlinear, reference-tracking control policies given no prior knowledge of the dynamical system and limited interaction with the system through the learning process. Concepts from the fields of reinforcement learning, Bayesian statistics and classical control have been brought together in the formulation of this algorithm, which can be viewed as a form of indirect self-tuning regulator. On the task of reference tracking using a simulated inverted pendulum, it was shown to yield generally improved performance over the best controller derived from the standard linear quadratic method, using only 30 s of total interaction with the system. Finally, the algorithm was shown to work on the simulated double pendulum, demonstrating its ability to solve nontrivial control tasks. © 2011 IEEE.