991 resultados para Raphael, 1483-1520
Resumo:
In this paper we study parameter estimation for time series with asymmetric α-stable innovations. The proposed methods use a Poisson sum series representation (PSSR) for the asymmetric α-stable noise to express the process in a conditionally Gaussian framework. That allows us to implement Bayesian parameter estimation using Markov chain Monte Carlo (MCMC) methods. We further enhance the series representation by introducing a novel approximation of the series residual terms in which we are able to characterise the mean and variance of the approximation. Simulations illustrate the proposed framework applied to linear time series, estimating the model parameter values and model order P for an autoregressive (AR(P)) model driven by asymmetric α-stable innovations. © 2012 IEEE.
Resumo:
Vector Taylor Series (VTS) model based compensation is a powerful approach for noise robust speech recognition. An important extension to this approach is VTS adaptive training (VAT), which allows canonical models to be estimated on diverse noise-degraded training data. These canonical model can be estimated using EM-based approaches, allowing simple extensions to discriminative VAT (DVAT). However to ensure a diagonal corrupted speech covariance matrix the Jacobian (loading matrix) relating the noise and clean speech is diagonalised. In this work an approach for yielding optimal diagonal loading matrices based on minimising the expected KL-divergence between the diagonal loading matrix and "correct" distributions is proposed. The performance of DVAT using the standard and optimal diagonalisation was evaluated on both in-car collected data and the Aurora4 task. © 2012 IEEE.
Resumo:
There are many methods for decomposing signals into a sum of amplitude and frequency modulated sinusoids. In this paper we take a new estimation based approach. Identifying the problem as ill-posed, we show how to regularize the solution by imposing soft constraints on the amplitude and phase variables of the sinusoids. Estimation proceeds using a version of Kalman smoothing. We evaluate the method on synthetic and natural, clean and noisy signals, showing that it outperforms previous decompositions, but at a higher computational cost. © 2012 IEEE.
Resumo:
饲养在滇池水中的椭圆背角无齿蚌氨基酸含量只有饲养在松华坝水库水中蚌的氨基酸含量一半,质量分数分别是680 57mg/hg和1477 14mg/hg。检测表明椭圆背角无齿蚌对砷、铅、锌、铜、铁元素有一定的富集能力,说明由于滇池污染,椭圆背角无齿蚌在滇池中虽然可以生存,但这一物种资源的利用价值受到严重影响。
Resumo:
Spoken content in languages of emerging importance needs to be searchable to provide access to the underlying information. In this paper, we investigate the problem of extending data fusion methodologies from Information Retrieval for Spoken Term Detection on low-resource languages in the framework of the IARPA Babel program. We describe a number of alternative methods improving keyword search performance. We apply these methods to Cantonese, a language that presents some new issues in terms of reduced resources and shorter query lengths. First, we show score normalization methodology that improves in average by 20% keyword search performance. Second, we show that properly combining the outputs of diverse ASR systems performs 14% better than the best normalized ASR system. © 2013 IEEE.
Resumo:
We present a system for keyword search on Cantonese conversational telephony audio, collected for the IARPA Babel program, that achieves good performance by combining postings lists produced by diverse speech recognition systems from three different research groups. We describe the keyword search task, the data on which the work was done, four different speech recognition systems, and our approach to system combination for keyword search. We show that the combination of four systems outperforms the best single system by 7%, achieving an actual term-weighted value of 0.517. © 2013 IEEE.
Resumo:
In natural languages multiple word sequences can represent the same underlying meaning. Only modelling the observed surface word sequence can result in poor context coverage, for example, when using n-gram language models (LM). To handle this issue, paraphrastic LMs were proposed in previous research and successfully applied to a US English conversational telephone speech transcription task. In order to exploit the complementary characteristics of paraphrastic LMs and neural network LMs (NNLM), the combination between the two is investigated in this paper. To investigate paraphrastic LMs' generalization ability to other languages, experiments are conducted on a Mandarin Chinese broadcast speech transcription task. Using a paraphrastic multi-level LM modelling both word and phrase sequences, significant error rate reductions of 0.9% absolute (9% relative) and 0.5% absolute (5% relative) were obtained over the baseline n-gram and NNLM systems respectively, after a combination with word and phrase level NNLMs. © 2013 IEEE.
Resumo:
The task in keyword spotting (KWS) is to hypothesise times at which any of a set of key terms occurs in audio. An important aspect of such systems are the scores assigned to these hypotheses, the accuracy of which have a significant impact on performance. Estimating these scores may be formulated as a confidence estimation problem, where a measure of confidence is assigned to each key term hypothesis. In this work, a set of discriminative features is defined, and combined using a conditional random field (CRF) model for improved confidence estimation. An extension to this model to directly address the problem of score normalisation across key terms is also introduced. The implicit score normalisation which results from applying this approach to separate systems in a hybrid configuration yields further benefits. Results are presented which show notable improvements in KWS performance using the techniques presented in this work. © 2013 IEEE.
Resumo:
Spoken dialogue systems provide a convenient way for users to interact with a machine using only speech. However, they often rely on a rigid turn taking regime in which a voice activity detection (VAD) module is used to determine when the user is speaking and decide when is an appropriate time for the system to respond. This paper investigates replacing the VAD and discrete utterance recogniser of a conventional turn-taking system with a continuously operating recogniser that is always listening, and using the recogniser 1-best path to guide turn taking. In this way, a flexible framework for incremental dialogue management is possible. Experimental results show that it is possible to remove the VAD component and successfully use the recogniser best path to identify user speech, with more robustness to noise, potentially smaller latency times, and a reduction in overall recognition error rate compared to using the conventional approach. © 2013 IEEE.
Resumo:
A partially observable Markov decision process has been proposed as a dialogue model that enables robustness to speech recognition errors and automatic policy optimisation using reinforcement learning (RL). However, conventional RL algorithms require a very large number of dialogues, necessitating a user simulator. Recently, Gaussian processes have been shown to substantially speed up the optimisation, making it possible to learn directly from interaction with human users. However, early studies have been limited to very low dimensional spaces and the learning has exhibited convergence problems. Here we investigate learning from human interaction using the Bayesian Update of Dialogue State system. This dynamic Bayesian network based system has an optimisation space covering more than one hundred features, allowing a wide range of behaviours to be learned. Using an improved policy model and a more robust reward function, we show that stable learning can be achieved that significantly outperforms a simulator trained policy. © 2013 IEEE.
Resumo:
Accurate estimation of the instantaneous frequency of speech resonances is a hard problem mainly due to phase discontinuities in the speech signal associated with excitation instants. We review a variety of approaches for enhanced frequency and bandwidth estimation in the time-domain and propose a new cognitively motivated approach using filterbank arrays. We show that by filtering speech resonances using filters of different center frequency, bandwidth and shape, the ambiguity in instantaneous frequency estimation associated with amplitude envelope minima and phase discontinuities can be significantly reduced. The novel estimators are shown to perform well on synthetic speech signals with frequency and bandwidth micro-modulations (i.e., modulations within a pitch period), as well as on real speech signals. Filterbank arrays, when applied to frequency and bandwidth modulation index estimation, are shown to reduce the estimation error variance by 85% and 70% respectively. © 2013 IEEE.
Resumo:
A synthetic strategy for fabricating a dense amine functionalized self-assembled monolayer (SAM) on hydroxylated surfaces is presented. The assembly steps are monitored by X-ray photoelectron spectroscopy, Fourier transform infrared- attenuated total reflection, atomic force microscopy, variable angle spectroscopic ellipsometry, UV-vis surface spectroscopy, contact angle wettability, and contact potential difference measurements. The method applies alkylbromide-trichlorosilane for the fabrication of the SAM followed by surface transformation of the bromine moiety to amine by a two-step procedure: S(N)2 reaction that introduces the hidden amine, phthalimide, followed by the removal of the protecting group and exposing the free amine. The use of phthalimide moiety in the process enabled monitoring the substitution reaction rate on the surface (by absorption spectroscopy) and showed first-order kinetics. The simplicity of the process, nonharsh reagents, and short reaction time allow the use of such SAMs in molecular nanoelectronics applications, where complete control of the used SAM is needed. The different molecular dipole of each step of the process, which is verified by DFT calculations, supports the use of these SAMs as means to tune the electronic properties of semiconductors and for better synergism between SAMs and standard microelectronics processes and devices.
Resumo:
Iron deficiency can induce cyanobacteria to synthesize siderophore receptor proteins on the outer membrane to enhance the uptake of iron. In this study, an outer membrane of high purity was prepared from Anabaena sp. PCC 7120 based on aqueous polymer two-phase partitioning and discontinuous sucrose density ultra-centrifugation, and the induction of outer membrane proteins by iron deficiency was investigated using 2-D gel electrophoresis. At least. five outer membrane proteins were newly synthesized or significantly up-regulated in cells transferred to iron-deficient conditions, which were all identified to be siderophore receptor proteins according to MALDI-TOF-MS analyses. Bacterial luciferase reporter genes luxAB were employed to monitor the transcription of the encoding genes. The genes were induced by iron deficiency at the transcriptional level in different responsive modes. Luciferase activity expressed from an iron-regulated promoter may be used as a bioreporter for utilizable iron in natural water samples. (C) 2009 National Natural Science Foundation of China and Chinese Academy of Sciences. Published by Elsevier Limited and Science in China Press. All rights reserved.
Resumo:
Adaptation to speaker and environment changes is an essential part of current automatic speech recognition (ASR) systems. In recent years the use of multi-layer percpetrons (MLPs) has become increasingly common in ASR systems. A standard approach to handling speaker differences when using MLPs is to apply a global speaker-specific constrained MLLR (CMLLR) transform to the features prior to training or using the MLP. This paper considers the situation when there are both speaker and channel, communication link, differences in the data. A more powerful transform, front-end CMLLR (FE-CMLLR), is applied to the inputs to the MLP to represent the channel differences. Though global, these FE-CMLLR transforms vary from time-instance to time-instance. Experiments on a channel distorted dialect Arabic conversational speech recognition task indicates the usefulness of adapting MLP features using both CMLLR and FE-CMLLR transforms. © 2013 IEEE.