54 results for Raphael, 1483-1520.
Abstract:
We present a system for keyword search on Cantonese conversational telephony audio, collected for the IARPA Babel program, that achieves good performance by combining postings lists produced by diverse speech recognition systems from three different research groups. We describe the keyword search task, the data on which the work was done, four different speech recognition systems, and our approach to system combination for keyword search. We show that the combination of four systems outperforms the best single system by 7%, achieving an actual term-weighted value of 0.517. © 2013 IEEE.
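The term-weighted value quoted above can be illustrated with a short sketch. It assumes the standard NIST keyword-search definition, in which each keyword's miss and false-alarm probabilities are combined with a weight of 999.9 and averaged over keywords; the "actual" value is this metric computed at the system's own decision threshold. The class and function names, and the counts in the example, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

BETA = 999.9  # assumed weight on false alarms, as in the NIST KWS evaluations


@dataclass
class KeywordCounts:
    """Hypothetical per-keyword tallies after thresholding the postings list."""
    n_true: int         # occurrences of the keyword in the reference
    n_correct: int      # hypothesised occurrences that match the reference
    n_false_alarm: int  # hypothesised occurrences with no reference match


def term_weighted_value(counts, audio_seconds, beta=BETA):
    """Average of 1 - (P_miss + beta * P_FA) over keywords with n_true > 0."""
    values = []
    for c in counts:
        if c.n_true == 0:
            continue  # keywords absent from the reference are not scored
        p_miss = 1.0 - c.n_correct / c.n_true
        # Non-target trials are approximated as one per second of audio.
        p_fa = c.n_false_alarm / (audio_seconds - c.n_true)
        values.append(1.0 - (p_miss + beta * p_fa))
    if not values:
        raise ValueError("no keyword has reference occurrences")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Made-up counts for two keywords over ten hours of audio.
    example = [KeywordCounts(10, 8, 1), KeywordCounts(5, 5, 0)]
    print(term_weighted_value(example, audio_seconds=36000.0))
```

Under this definition a system that finds every occurrence with no false alarms scores 1.0, and a system that outputs nothing scores 0.0, which gives a scale for reading the reported 0.517.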
Abstract:
In natural languages, multiple word sequences can represent the same underlying meaning. Modelling only the observed surface word sequence can result in poor context coverage, for example, when using n-gram language models (LMs). To handle this issue, paraphrastic LMs were proposed in previous research and successfully applied to a US English conversational telephone speech transcription task. To exploit the complementary characteristics of paraphrastic LMs and neural network LMs (NNLMs), the combination of the two is investigated in this paper. To investigate the ability of paraphrastic LMs to generalize to other languages, experiments are conducted on a Mandarin Chinese broadcast speech transcription task. Using a paraphrastic multi-level LM modelling both word and phrase sequences, significant error rate reductions of 0.9% absolute (9% relative) and 0.5% absolute (5% relative) were obtained over the baseline n-gram and NNLM systems respectively, after a combination with word and phrase level NNLMs. © 2013 IEEE.
Abstract:
The task in keyword spotting (KWS) is to hypothesise times at which any of a set of key terms occurs in audio. An important aspect of such systems is the scores assigned to these hypotheses, the accuracy of which has a significant impact on performance. Estimating these scores may be formulated as a confidence estimation problem, where a measure of confidence is assigned to each key term hypothesis. In this work, a set of discriminative features is defined and combined using a conditional random field (CRF) model for improved confidence estimation. An extension to this model that directly addresses the problem of score normalisation across key terms is also introduced. The implicit score normalisation that results from applying this approach to separate systems in a hybrid configuration yields further benefits. Results are presented which show notable improvements in KWS performance using the techniques presented in this work. © 2013 IEEE.
Abstract:
Spoken dialogue systems provide a convenient way for users to interact with a machine using only speech. However, they often rely on a rigid turn-taking regime in which a voice activity detection (VAD) module is used to determine when the user is speaking and to decide when it is appropriate for the system to respond. This paper investigates replacing the VAD and discrete utterance recogniser of a conventional turn-taking system with a continuously operating recogniser that is always listening, and using the recogniser 1-best path to guide turn taking. In this way, a flexible framework for incremental dialogue management is possible. Experimental results show that it is possible to remove the VAD component and successfully use the recogniser best path to identify user speech, with more robustness to noise, potentially lower latency, and a reduction in overall recognition error rate compared to the conventional approach. © 2013 IEEE.
Abstract:
A partially observable Markov decision process has been proposed as a dialogue model that enables robustness to speech recognition errors and automatic policy optimisation using reinforcement learning (RL). However, conventional RL algorithms require a very large number of dialogues, necessitating a user simulator. Recently, Gaussian processes have been shown to substantially speed up the optimisation, making it possible to learn directly from interaction with human users. However, early studies have been limited to very low dimensional spaces and the learning has exhibited convergence problems. Here we investigate learning from human interaction using the Bayesian Update of Dialogue State system. This dynamic Bayesian network based system has an optimisation space covering more than one hundred features, allowing a wide range of behaviours to be learned. Using an improved policy model and a more robust reward function, we show that stable learning can be achieved that significantly outperforms a simulator trained policy. © 2013 IEEE.
Abstract:
Accurate estimation of the instantaneous frequency of speech resonances is a hard problem mainly due to phase discontinuities in the speech signal associated with excitation instants. We review a variety of approaches for enhanced frequency and bandwidth estimation in the time-domain and propose a new cognitively motivated approach using filterbank arrays. We show that by filtering speech resonances using filters of different center frequency, bandwidth and shape, the ambiguity in instantaneous frequency estimation associated with amplitude envelope minima and phase discontinuities can be significantly reduced. The novel estimators are shown to perform well on synthetic speech signals with frequency and bandwidth micro-modulations (i.e., modulations within a pitch period), as well as on real speech signals. Filterbank arrays, when applied to frequency and bandwidth modulation index estimation, are shown to reduce the estimation error variance by 85% and 70% respectively. © 2013 IEEE.
Abstract:
A synthetic strategy for fabricating a dense amine-functionalized self-assembled monolayer (SAM) on hydroxylated surfaces is presented. The assembly steps are monitored by X-ray photoelectron spectroscopy, Fourier transform infrared-attenuated total reflection, atomic force microscopy, variable angle spectroscopic ellipsometry, UV-vis surface spectroscopy, contact angle wettability, and contact potential difference measurements. The method applies alkylbromide-trichlorosilane for the fabrication of the SAM, followed by surface transformation of the bromine moiety to amine by a two-step procedure: an SN2 reaction that introduces the hidden amine, phthalimide, followed by removal of the protecting group to expose the free amine. The use of the phthalimide moiety in the process enabled monitoring of the substitution reaction rate on the surface (by absorption spectroscopy) and showed first-order kinetics. The simplicity of the process, nonharsh reagents, and short reaction time allow the use of such SAMs in molecular nanoelectronics applications, where complete control of the SAM used is needed. The different molecular dipole at each step of the process, which is verified by DFT calculations, supports the use of these SAMs as a means to tune the electronic properties of semiconductors and for better synergism between SAMs and standard microelectronics processes and devices.
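As an illustration of what first-order kinetics implies for the absorption measurements, the sketch below fits a rate constant to absorbance readings. It assumes that absorbance is proportional to phthalimide surface coverage and rises as A(t) = A_inf * (1 - exp(-k t)), so that ln(1 - A/A_inf) is linear in t with slope -k. The function name and the synthetic readings are hypothetical, and the abstract does not describe the authors' fitting procedure.

```python
import math


def fit_first_order_rate(times, absorbances, a_inf):
    """Least-squares slope through the origin of ln(1 - A/A_inf) against t.

    Returns the rate constant k in inverse units of `times`.
    Every absorbance must be strictly below the plateau value a_inf.
    """
    num = 0.0
    den = 0.0
    for t, a in zip(times, absorbances):
        if not 0.0 <= a < a_inf:
            raise ValueError("absorbance must lie in [0, a_inf)")
        y = math.log(1.0 - a / a_inf)  # equals -k * t for ideal first-order data
        num += t * y
        den += t * t
    if den == 0.0:
        raise ValueError("need at least one reading at a nonzero time")
    return -num / den


if __name__ == "__main__":
    # Synthetic readings generated from an assumed k, not measured data.
    k_assumed, a_inf = 0.05, 1.0
    times = [5.0, 10.0, 20.0, 40.0]
    readings = [a_inf * (1.0 - math.exp(-k_assumed * t)) for t in times]
    print(fit_first_order_rate(times, readings, a_inf))
```

A straight line in this transformed plot is the usual check for first-order behaviour; systematic curvature would point to a different rate law.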
Abstract:
Adaptation to speaker and environment changes is an essential part of current automatic speech recognition (ASR) systems. In recent years the use of multi-layer perceptrons (MLPs) has become increasingly common in ASR systems. A standard approach to handling speaker differences when using MLPs is to apply a global speaker-specific constrained MLLR (CMLLR) transform to the features prior to training or using the MLP. This paper considers the situation in which the data contain both speaker and channel (communication link) differences. A more powerful transform, front-end CMLLR (FE-CMLLR), is applied to the inputs to the MLP to represent the channel differences. Though global, these FE-CMLLR transforms vary from time instance to time instance. Experiments on a channel-distorted dialect Arabic conversational speech recognition task indicate the usefulness of adapting MLP features using both CMLLR and FE-CMLLR transforms. © 2013 IEEE.
Abstract:
In this paper, we propose a low-complexity, reliable wideband spectrum sensing technique that operates at sub-Nyquist sampling rates. Unlike the majority of other sub-Nyquist spectrum sensing algorithms, which rely on the Compressive Sensing (CS) methodology, the introduced method does not entail solving an optimisation problem. It is characterised by simplicity and low computational complexity, and it delivers substantial reductions in the operational sampling rates without compromising system performance. Reliability guidelines for the devised non-compressive sensing approach are provided, and simulations are presented to illustrate its superior performance. © 2013 IEEE.