914 results for "Speech synthesis Data processing"
Abstract:
A novel method for modelling the statistics of 2D photographic images, useful in image restoration, is defined. The method is based on the Dual-Tree Complex Wavelet Transform (DT-CWT), but a phase rotation is applied to the coefficients to create complex coefficients whose phase is shift-invariant at multiscale edge and ridge features. This is in addition to the magnitude shift invariance already achieved by the DT-CWT. The increased correlation between coefficients adjacent in space and scale provides an improved mechanism for signal estimation. © 2006 IEEE.
Abstract:
In this paper we study parameter estimation for time series with asymmetric α-stable innovations. The proposed methods use a Poisson sum series representation (PSSR) of the asymmetric α-stable noise to express the process in a conditionally Gaussian framework. This allows us to implement Bayesian parameter estimation using Markov chain Monte Carlo (MCMC) methods. We further enhance the series representation by introducing a novel approximation of the series residual terms, whose mean and variance we are able to characterise. Simulations illustrate the proposed framework applied to linear time series: we estimate the model parameter values and the model order P of an autoregressive (AR(P)) model driven by asymmetric α-stable innovations. © 2012 IEEE.
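The abstract's simulations use AR(P) processes driven by asymmetric α-stable innovations. The Python below is a minimal data-generation sketch, not the PSSR/MCMC estimator itself. It draws α-stable noise with the Chambers-Mallows-Stuck method and passes it through an AR(P) recursion. Using that sampler is an assumption, since the abstract does not say how its test data were generated. The function names and the exclusion of α = 1 are also illustrative choices.

```python
import math
import random


def stable_sample(alpha, beta, rng):
    """One standard alpha-stable draw (scale 1, location 0) via the
    Chambers-Mallows-Stuck method. Assumption: alpha == 1 is not handled."""
    if not (0 < alpha <= 2) or alpha == 1 or not (-1 <= beta <= 1):
        raise ValueError("need 0 < alpha <= 2, alpha != 1, -1 <= beta <= 1")
    v = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    w = rng.expovariate(1.0)                     # unit-mean exponential
    t = beta * math.tan(math.pi * alpha / 2)
    b = math.atan(t) / alpha
    s = (1 + t * t) ** (1 / (2 * alpha))
    return (s * math.sin(alpha * (v + b)) / math.cos(v) ** (1 / alpha)
            * (math.cos(v - alpha * (v + b)) / w) ** ((1 - alpha) / alpha))


def ar_filter(coeffs, innovations):
    """AR(P) recursion x[t] = sum_i coeffs[i] * x[t-1-i] + e[t], zero initial state."""
    x = []
    for t, e in enumerate(innovations):
        acc = e
        for i, a in enumerate(coeffs):
            if t - 1 - i >= 0:
                acc += a * x[t - 1 - i]
        x.append(acc)
    return x


def simulate_ar_stable(coeffs, alpha, beta, n, seed=0):
    """Hypothetical helper: AR(P) series driven by asymmetric alpha-stable noise."""
    rng = random.Random(seed)
    noise = [stable_sample(alpha, beta, rng) for _ in range(n)]
    return ar_filter(coeffs, noise)
```

For example, `simulate_ar_stable([0.5, -0.3], alpha=1.5, beta=0.8, n=500)` gives an AR(2) series with heavy-tailed, right-skewed innovations. Recovering `coeffs`, `alpha`, `beta` and the order P from such a series is the inverse problem the paper addresses.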
Abstract:
There are many methods for decomposing signals into a sum of amplitude- and frequency-modulated sinusoids. In this paper we take a new, estimation-based approach. Identifying the problem as ill-posed, we show how to regularize the solution by imposing soft constraints on the amplitude and phase variables of the sinusoids. Estimation proceeds using a version of Kalman smoothing. We evaluate the method on synthetic and natural signals, both clean and noisy, showing that it outperforms previous decompositions, albeit at a higher computational cost. © 2012 IEEE.
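The abstract says only that estimation uses "a version of Kalman smoothing". The code below is a minimal sketch of the generic forward-backward structure: a Kalman filter followed by a Rauch-Tung-Striebel smoother. It assumes a scalar random-walk state, which could stand for one slowly varying amplitude, observed in Gaussian noise. The model, the variances `q` and `r`, and the function name are assumptions. Reading `q` as a soft smoothness constraint is an interpretation, not the paper's formulation.

```python
def rts_smooth(y, q, r, m0=0.0, p0=1.0):
    """Kalman filter + Rauch-Tung-Striebel smoother.
    Assumed model (illustrative only): x[t] = x[t-1] + w, var(w) = q;
    y[t] = x[t] + v, var(v) = r; prior x[0] ~ N(m0, p0)."""
    n = len(y)
    m_pred, p_pred, m_filt, p_filt = [], [], [], []
    for t in range(n):
        # Predict: the random walk keeps the mean and inflates the variance by q.
        mp = m0 if t == 0 else m_filt[-1]
        pp = p0 if t == 0 else p_filt[-1] + q
        # Update with observation y[t] using the Kalman gain k.
        k = pp / (pp + r)
        m_pred.append(mp)
        p_pred.append(pp)
        m_filt.append(mp + k * (y[t] - mp))
        p_filt.append((1 - k) * pp)
    # Backward pass: fold in information from later observations.
    m_s, p_s = m_filt[:], p_filt[:]
    for t in range(n - 2, -1, -1):
        g = p_filt[t] / p_pred[t + 1]
        m_s[t] = m_filt[t] + g * (m_s[t + 1] - m_pred[t + 1])
        p_s[t] = p_filt[t] + g * g * (p_s[t + 1] - p_pred[t + 1])
    return m_s, p_s, p_filt
```

A smaller `q` relative to `r` gives a smoother estimate. The smoothed variances never exceed the filtered ones, because the backward pass adds information from later samples.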
Abstract:
In natural languages, multiple word sequences can represent the same underlying meaning. Modelling only the observed surface word sequence can result in poor context coverage, for example when using n-gram language models (LMs). To handle this issue, paraphrastic LMs were proposed in previous research and successfully applied to a US English conversational telephone speech transcription task. To exploit the complementary characteristics of paraphrastic LMs and neural network LMs (NNLMs), this paper investigates combining the two. To assess how well paraphrastic LMs generalize to other languages, experiments are conducted on a Mandarin Chinese broadcast speech transcription task. A paraphrastic multi-level LM modelling both word and phrase sequences was combined with word- and phrase-level NNLMs. This gave significant error rate reductions of 0.9% absolute (9% relative) over the baseline n-gram system and 0.5% absolute (5% relative) over the baseline NNLM system. © 2013 IEEE.
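The abstract does not state how the paraphrastic LM and the NNLMs are combined. Linear interpolation of next-word probabilities is a common way to combine language models, and this minimal sketch assumes it. The dictionaries and the weight `lam` are hypothetical inputs.

```python
def interpolate(p_a, p_b, lam):
    """Assumed combination rule (linear interpolation) of two next-word
    distributions for the same history h:
    P(w|h) = lam * P_a(w|h) + (1 - lam) * P_b(w|h).
    Words missing from one model get probability 0 in that model."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(p_a) | set(p_b)
    return {w: lam * p_a.get(w, 0.0) + (1 - lam) * p_b.get(w, 0.0) for w in vocab}
```

In practice the weight is usually tuned on held-out data. Separately, the reported figures are internally consistent. A 0.9% absolute reduction that equals 9% relative implies a baseline error rate of about 0.9 / 0.09 = 10%.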
Abstract:
The task in keyword spotting (KWS) is to hypothesise the times at which any of a set of key terms occurs in audio. An important aspect of such systems is the score assigned to each hypothesis, the accuracy of which has a significant impact on performance. Estimating these scores may be formulated as a confidence estimation problem, in which a measure of confidence is assigned to each key term hypothesis. In this work, a set of discriminative features is defined and combined using a conditional random field (CRF) model for improved confidence estimation. An extension to this model that directly addresses the problem of score normalisation across key terms is also introduced. Applying this approach to separate systems in a hybrid configuration produces implicit score normalisation, which yields further benefits. Results are presented that show notable improvements in KWS performance using these techniques. © 2013 IEEE.
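To illustrate why scores need normalising across key terms, the sketch below applies sum-to-one normalisation. This is a commonly used baseline in which each hit's score is divided by the total score of all hits for the same keyword. It is not the CRF-based normalisation the abstract introduces, and the `(keyword, time, score)` tuple layout is an assumption.

```python
from collections import defaultdict


def sum_to_one_normalise(hits):
    """Baseline (not the paper's CRF method): divide each hit's score by the
    summed score of all hits for that keyword.
    hits: list of (keyword, start_time, score) tuples (assumed layout)."""
    totals = defaultdict(float)
    for kw, _, score in hits:
        totals[kw] += score
    return [(kw, t, score / totals[kw] if totals[kw] > 0 else 0.0)
            for kw, t, score in hits]
```

After normalisation, a rare keyword whose few hits all have low raw scores is no longer penalised relative to a frequent one. A single global decision threshold can therefore behave more uniformly across key terms.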
Abstract:
Spoken dialogue systems provide a convenient way for users to interact with a machine using only speech. However, they often rely on a rigid turn-taking regime in which a voice activity detection (VAD) module determines when the user is speaking and when it is appropriate for the system to respond. This paper investigates replacing the VAD and discrete utterance recogniser of a conventional turn-taking system with a continuously operating recogniser that is always listening, and using the recogniser's 1-best path to guide turn taking. This makes a flexible framework for incremental dialogue management possible. Experimental results show that the VAD component can be removed and the recogniser's best path used successfully to identify user speech. Compared with the conventional approach, this gives greater robustness to noise, potentially lower latency, and a lower overall recognition error rate. © 2013 IEEE.
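The abstract does not describe the exact rule for turning the recogniser's 1-best path into turn-taking decisions. This hypothetical sketch assumes per-frame labels from the current best path, with a `<sil>` label for silence. It declares the end of a user turn once `min_silence` consecutive silence frames follow speech. The label name, threshold and function are illustrative only.

```python
def detect_turn_ends(best_path, min_silence=3, silence="<sil>"):
    """Hypothetical end-of-turn rule on a recogniser's 1-best path.
    best_path: per-frame labels (assumed format; `silence` marks non-speech).
    Returns the frame indices at which the system would decide the user has
    finished: the frame where a silence run following speech reaches
    min_silence frames."""
    ends = []
    in_speech = False  # has speech been seen since the last turn end?
    run = 0            # length of the current silence run
    for i, label in enumerate(best_path):
        if label == silence:
            run += 1
            if in_speech and run == min_silence:
                ends.append(i)
                in_speech = False
        else:
            run = 0
            in_speech = True
    return ends
```

Lowering `min_silence` reduces response latency at the cost of more premature interruptions.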
Abstract:
A partially observable Markov decision process (POMDP) has been proposed as a dialogue model that enables robustness to speech recognition errors and automatic policy optimisation using reinforcement learning (RL). However, conventional RL algorithms require a very large number of dialogues, necessitating a user simulator. Recently, Gaussian processes have been shown to substantially speed up optimisation, making it possible to learn directly from interaction with human users. However, early studies were limited to very low-dimensional spaces, and learning exhibited convergence problems. Here we investigate learning from human interaction using the Bayesian Update of Dialogue State system. This dynamic Bayesian network-based system has an optimisation space covering more than one hundred features, allowing a wide range of behaviours to be learned. Using an improved policy model and a more robust reward function, we show that stable learning can be achieved, yielding a policy that significantly outperforms one trained on a simulator. © 2013 IEEE.
Abstract:
Accurate estimation of the instantaneous frequency of speech resonances is a hard problem, mainly because of phase discontinuities in the speech signal associated with excitation instants. We review a variety of approaches for enhanced frequency and bandwidth estimation in the time domain and propose a new, cognitively motivated approach using filterbank arrays. We show that filtering speech resonances with filters of different center frequencies, bandwidths and shapes significantly reduces the ambiguity in instantaneous frequency estimation associated with amplitude envelope minima and phase discontinuities. The novel estimators are shown to perform well on synthetic speech signals with frequency and bandwidth micro-modulations (i.e., modulations within a pitch period), as well as on real speech signals. When applied to frequency and bandwidth modulation index estimation, filterbank arrays are shown to reduce the estimation error variance by 85% and 70%, respectively. © 2013 IEEE.
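As background for the estimators discussed, the sketch below has two parts. First, it computes the instantaneous frequency of a complex (analytic) filter output from the phase increment between consecutive samples. Second, it combines several filter outputs by weighting each estimate with its local energy. It assumes the analytic signal has already been obtained. The energy-weighted combination is a hypothetical stand-in, because the abstract does not say how the filterbank-array estimates are combined.

```python
import cmath
import math


def instantaneous_frequency(z, fs):
    """IF in Hz from a complex analytic signal z sampled at fs (assumed given):
    f[n] = angle(z[n+1] * conj(z[n])) * fs / (2*pi)."""
    return [cmath.phase(z[n + 1] * z[n].conjugate()) * fs / (2 * math.pi)
            for n in range(len(z) - 1)]


def weighted_if(outputs, fs):
    """Hypothetical filterbank-array combination (not from the paper):
    average per-filter IF estimates weighted by each filter's |z[n]|^2."""
    per_filter = [instantaneous_frequency(z, fs) for z in outputs]
    combined = []
    for n in range(len(per_filter[0])):
        w = [abs(z[n]) ** 2 for z in outputs]
        total = sum(w)
        combined.append(sum(wi * f[n] for wi, f in zip(w, per_filter)) / total
                        if total > 0 else float("nan"))
    return combined
```

Weighting by |z[n]|² de-emphasises a filter whose output is near an amplitude-envelope minimum. That is where its phase, and hence its frequency estimate, is least reliable.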
Abstract:
A partially observable Markov decision process (POMDP) has been proposed as a dialog model that enables automatic optimization of the dialog policy and provides robustness to speech understanding errors. Various approximations allow such a model to be used for building real-world dialog systems. However, they require a large number of dialogs to train the dialog policy and hence typically rely on the availability of a user simulator. They also require significant designer effort to hand-craft the policy representation. We investigate the use of Gaussian processes (GPs) in policy modeling to overcome these problems. We show that GP policy optimization can be implemented for a real-world POMDP dialog manager. In particular, 1) we examine different formulations of a GP policy to minimize variability in the learning process; 2) we find that the use of GPs increases the learning rate by an order of magnitude, thereby allowing learning by direct interaction with human users; and 3) we demonstrate that designer effort can be substantially reduced by basing the policy directly on the full belief space, thereby avoiding ad hoc feature space modeling. Overall, the GP approach represents an important step towards fully automatic dialog policy optimization in real-world systems. © 2013 IEEE.
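The code below is plain GP regression with a squared-exponential kernel over belief vectors (probability distributions over dialog states). It uses a hand-written Cholesky solve so it needs only the standard library. It is not the authors' policy model: the kernel, its parameters, the noise level and the example data are all assumptions. It shows the two quantities a GP supplies, a posterior mean and a posterior variance that shrinks near observed beliefs.

```python
import math


def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel (an assumed choice) between two vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return var * math.exp(-0.5 * d2 / length ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def solve_chol(l, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(l)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def gp_predict(xs, ys, x_star, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP at x_star, given
    training inputs xs (e.g. belief vectors) and noisy targets ys."""
    k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    l = cholesky(k)
    k_star = [rbf(x_star, a) for a in xs]
    alpha = solve_chol(l, ys)
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    v = solve_chol(l, k_star)
    var = rbf(x_star, x_star) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var
```

For instance, take observed beliefs `[[0.9, 0.1], [0.1, 0.9]]` with targets `[1.0, -1.0]` and a small noise level. Predicting at `[0.9, 0.1]` then returns a mean close to 1 and a small variance. An input far from the data reverts to the prior mean 0 and variance 1.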
Abstract:
The Padé approximation with Baker's algorithm is compared with the least-squares Prony method and the generalized pencil-of-functions (GPOF) method for calculating mode frequencies and mode Q factors of coupled optical microdisks using the FDTD technique. Comparisons of intensity spectra and the corresponding mode frequencies and Q factors show that the Padé approximation can yield more stable results than the Prony and GPOF methods, especially for the intensity spectrum. The results of the Prony and GPOF methods are strongly influenced by the length of the time-response signal and by the selected number of resonant modes, which must be optimized during data processing. Furthermore, the Padé approximation is applied to calculate the light delay of embedded microring resonators from complex transmission spectra, which are themselves obtained from the FDTD output by Padé approximation. The Prony and GPOF methods cannot be applied to calculate the transmission spectra, because the transmission signal obtained by FDTD simulation cannot be expressed as a sum of damped complex exponentials. © 2009 Optical Society of America
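The sketch below computes an [L/M] Padé approximant from power-series coefficients. It solves the standard linear system for the denominator directly, rather than using Baker's recursive algorithm named in the abstract. In spectral applications, one common formulation takes the sampled time response as the series coefficients and evaluates the approximant at a unit-modulus phase factor such as exp(iωΔt). That mapping, and all function names, are assumptions here. The check uses exp(x), whose [2/2] approximant is (1 + x/2 + x²/12)/(1 - x/2 + x²/12).

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting; a is n x n, b has length n."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("singular system: Pade approximant does not exist")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def pade(c, L, M):
    """[L/M] Pade approximant of sum_k c[k] x^k by direct linear solve
    (not Baker's algorithm). Returns numerator a[0..L], denominator b[0..M], b[0] = 1."""
    if len(c) < L + M + 1:
        raise ValueError("need at least L + M + 1 series coefficients")

    def coef(k):
        return c[k] if k >= 0 else 0.0

    # Denominator: sum_{k=1..M} b_k c_{L+j-k} = -c_{L+j} for j = 1..M.
    rows = [[coef(L + j - k) for k in range(1, M + 1)] for j in range(1, M + 1)]
    rhs = [-coef(L + j) for j in range(1, M + 1)]
    b = [1.0] + (solve_linear(rows, rhs) if M > 0 else [])
    # Numerator: a_i = sum_{k=0..min(i,M)} b_k c_{i-k}.
    a = [sum(b[k] * coef(i - k) for k in range(min(i, M) + 1)) for i in range(L + 1)]
    return a, b


def pade_eval(a, b, x):
    """Evaluate the rational function a(x)/b(x)."""
    num = sum(ak * x ** k for k, ak in enumerate(a))
    den = sum(bk * x ** k for k, bk in enumerate(b))
    return num / den
```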
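Abstract:
Speech is one of the most efficient and natural ways people communicate in daily life. Until now, however, speech interaction has seen relatively little application in computing. In recent years, the emergence of ubiquitous computing and portable computers has renewed the urgent demand for voice user interfaces, and advances in speech recognition and synthesis have provided a technical foundation for building them. Drawing on examples of voice interface applications in China and abroad, and on the strengths and limitations of speech as a distinctive communication medium, this paper summarizes the settings in which voice user interfaces are appropriate and guidelines for designing them. It also offers an outlook on the future development of voice interfaces.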
Abstract:
Gas chromatography-mass spectrometry (GC-MS) with electron ionization and positive-ion chemical ionization, and comprehensive two-dimensional gas chromatography-time-of-flight mass spectrometry (GC × GC-TOF-MS), were applied to characterize the chemical composition of complex hydrocarbons in the non-polar neutral fraction of cigarette smoke condensates. Automated data processing by TOF-MS software, combined with structured chromatograms and manual review of library hits, was used to assign the components in the GC × GC-TOF-MS analysis. The distributions of aliphatic hydrocarbons and aromatics were also investigated. Over 100 isoprenoid hydrocarbons were detected, including carotene degradation products, phytadiene isomers and carbocyclic diterpenoids. A total of 1800 hydrocarbons were tentatively identified, including aliphatic hydrocarbons, aromatics and isoprenoid hydrocarbons. Far more hydrocarbons were identified by GC × GC-TOF-MS than by GC-MS. © 2004 Elsevier B.V. All rights reserved.
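Abstract:
Data are at the core of geographic information system (GIS) applications. Real-world data are highly diverse, and how a GIS can accept different kinds of data has become both a difficult and an active topic in current GIS research. This paper discusses the conversion of data from external spatial data processing systems into a GIS from several angles. These are the formats of common spatial data types, the ways in which a GIS accepts external spatial data, and the data precision, scale and coordinate transformation issues that arise when it does so. Taking the general-purpose GIS software ARC/INFO as an example, it also analyzes how ARC/INFO accepts external vector spatial data.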
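Coordinate transformation is one of the issues the abstract lists. As a generic illustration, not ARC/INFO's own procedure, the sketch below fits a six-parameter affine transformation from three control-point pairs and applies it to vector vertices. A typical case is matching digitizer coordinates to map coordinates. The function names are hypothetical. Practical workflows often use more control points with a least-squares fit.

```python
def solve3(m, v):
    """Solve a 3x3 linear system m x = v by Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    out = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]  # replace one column with the right-hand side
        out.append(det(mc) / d)
    return out


def affine_from_controls(src, dst):
    """Hypothetical helper: fit X = a*x + b*y + c and Y = d*x + e*y + f
    exactly from three (x, y) -> (X, Y) control-point pairs."""
    m = [[x, y, 1.0] for x, y in src]
    a, b, c = solve3(m, [X for X, _ in dst])
    d, e, f = solve3(m, [Y for _, Y in dst])
    return a, b, c, d, e, f


def apply_affine(params, pts):
    """Transform a list of (x, y) vertices with the fitted parameters."""
    a, b, c, d, e, f = params
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in pts]
```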
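Abstract:
Observing the raindrop size distribution (rain spectrum) is an important part of understanding rainfall characteristics, and building small-watershed models is an effective means of carrying out experimental research on small watersheds. Model rainfall must satisfy similarity requirements, so its raindrops are small and their size distribution parameters are hard to measure. To address this, the paper combines CorelDRAW software with the traditional stain method to develop a new measurement and data processing method. Using it, the authors obtain the model's raindrop size distribution characteristics under different conditions. The method improves measurement accuracy and reduces workload.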
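The abstract does not give the formulas used, so the sketch below assumes a typical stain-method workflow in three steps:

1. Convert each stain's area (for instance, traced in drawing software such as CorelDRAW) to an equivalent-circle diameter.
2. Map stain diameter to raindrop diameter through a power-law calibration d = a·D^b, whose constants `a` and `b` must be determined experimentally.
3. Compute the median volume diameter as one summary parameter of the raindrop size distribution.

All of these steps and names are assumptions.

```python
import math


def stain_diameter(area):
    """Equivalent-circle diameter of a stain from its measured area
    (assumption: stains are treated as circles)."""
    return math.sqrt(4.0 * area / math.pi)


def drop_diameter(stain_d, a, b):
    """Assumed power-law calibration d = a * D**b from stain diameter D to
    raindrop diameter d; a and b must come from an experimental calibration."""
    return a * stain_d ** b


def median_volume_diameter(diams):
    """Drop diameter below which half of the total sampled rain volume lies
    (volume taken as proportional to diameter cubed)."""
    ds = sorted(diams)
    total = sum(d ** 3 for d in ds)
    cum = 0.0
    for d in ds:
        cum += d ** 3
        if cum >= total / 2:
            return d
    raise ValueError("empty drop list")
```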