Digital Library

11 results for Readers and speakers

in Cambridge University Engineering Department Publications Database

WWW++ - Adding why to what, when and where

Relevance:

90.00%

Publisher:

Abstract:

RFID technology can be used to its full potential only when software supplements the hardware with powerful capabilities for data capture, filtering, counting and storage. The EPCglobal Network architecture encourages minimizing the amount of business logic embedded in the tags, readers and middleware. This creates the need for a Business Logic Layer above the event filtering layer that enhances basic observation events with business context: in addition to the (what, when, where) information about an observation, it adds context information about why the object was there. The purpose of this project is to develop an implementation of the Business Logic Layer. This application accepts observation event data (e.g. from the Application Level Events (ALE) standard interface), enriches them with business context, and provides these enriched events to a repository of business-level events (e.g. via the EPC Information Services (EPCIS) capture interface). The strength of the application lies in the automatic addition of business context. It is quick and easy to adapt any business process to the suggested framework, and equally easy to reconfigure it if the business process changes. A sample application has been developed for a business scenario in the retail sector.
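A minimal sketch of the enrichment step described above, assuming a configurable rule table that maps each reader location to a business step and disposition. The record types, field names, location names and rule values are all hypothetical and only loosely echo the business-step/disposition idea used in EPCIS; this is not the project's implementation, and no ALE or EPCIS interface is actually called.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Observation:
    """Basic (what, when, where) observation, as a filtering layer might deliver it."""
    epc: str          # what: tag identifier
    time: datetime    # when: read time
    read_point: str   # where: reader location (hypothetical names below)


@dataclass(frozen=True)
class BusinessEvent:
    """The same observation enriched with business context (the 'why')."""
    epc: str
    time: datetime
    read_point: str
    biz_step: str
    disposition: str


# Hypothetical configuration: adapting to a changed business process means
# editing this table, not the enrichment code.
RULES: Dict[str, Tuple[str, str]] = {
    "dock-door-1": ("receiving", "in_progress"),
    "backroom-shelf": ("storing", "sellable_not_accessible"),
    "sales-floor": ("stocking", "sellable_accessible"),
    "checkout-3": ("retail_selling", "retail_sold"),
}


def enrich(obs: Observation,
           rules: Dict[str, Tuple[str, str]] = RULES) -> Optional[BusinessEvent]:
    """Attach business context to an observation; return None if no rule applies."""
    context = rules.get(obs.read_point)
    if context is None:
        return None
    biz_step, disposition = context
    return BusinessEvent(obs.epc, obs.time, obs.read_point, biz_step, disposition)


if __name__ == "__main__":
    # Example tag identifier and time are made up for illustration.
    obs = Observation("urn:epc:id:sgtin:0614141.107346.2017",
                      datetime(2024, 1, 1, 9, 30), "sales-floor")
    print(enrich(obs))
```

Keeping the rules as data rather than code is one way to get the "easy to reconfigure" property the abstract claims; a real system would likely also key rules on time, tag type or event sequence.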

Lexicon adaptation for LVCSR: speaker idiosyncracies, non-native speakers, and pronunciation choice

Relevance:

40.00%

Publisher:

Acoustic Source Localization and Tracking of a Time-Varying Number of Speakers

Relevance:

40.00%

Publisher:

Tracking multiple speakers using random sets

Relevance:

30.00%

Publisher:

Voice conversion for unknown speakers

Relevance:

30.00%

Publisher:

Tracking an unknown time-varying number of speakers using TDOA measurements: a random finite set approach

Relevance:

30.00%

Publisher:

A Unit Selection Text-to-Speech Synthesis System Optimized for Use with Screen Readers

Relevance:

30.00%

Publisher:

Phone-level pronunciation scoring and assessment for interactive language learning

Relevance:

30.00%

Publisher:

Abstract:

This paper investigates a method of automatic pronunciation scoring for use in computer-assisted language learning (CALL) systems. The method utilizes a likelihood-based "Goodness of Pronunciation" (GOP) measure which is extended to include individual thresholds for each phone based on both averaged native confidence scores and on rejection statistics provided by human judges. Further improvements are obtained by incorporating models of the subject's native language and by augmenting the recognition networks to include expected pronunciation errors. The various GOP measures are assessed using a specially recorded database of non-native speakers which has been annotated to mark phone-level pronunciation errors. Since pronunciation assessment is highly subjective, a set of four performance measures has been designed, each of them measuring different aspects of how well computer-derived phone-level scores agree with human scores. These performance measures are used to cross-validate the reference annotations and to assess the basic GOP algorithm and its refinements. The experimental results suggest that a likelihood-based pronunciation scoring metric can achieve usable performance, especially after applying the various enhancements.
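As a sketch of the basic idea, one common form of a likelihood-based GOP score (assumed here; the abstract does not state the formula) divides the absolute log-likelihood ratio between the forced-aligned phone and the best phone from an unconstrained phone-loop pass over the same segment by the number of frames. Per-phone thresholds then decide acceptance. The function names and threshold values are hypothetical, how the thresholds are derived from native scores and judge statistics is not shown, and the refinements mentioned in the abstract (native-language models, error networks) are not modelled.

```python
from typing import Mapping


def gop_score(forced_loglik: float,
              free_logliks: Mapping[str, float],
              n_frames: int) -> float:
    """Frame-normalised |log-likelihood ratio| between the forced-aligned phone
    and the best-scoring phone over the same segment (assumed GOP form)."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    best = max(free_logliks.values())  # best competing phone, target included
    return abs(forced_loglik - best) / n_frames


def is_mispronounced(phone: str, score: float,
                     thresholds: Mapping[str, float],
                     default_threshold: float) -> bool:
    """Reject the phone if its GOP score exceeds that phone's own threshold."""
    return score > thresholds.get(phone, default_threshold)


if __name__ == "__main__":
    # Made-up log-likelihoods for a 10-frame segment aligned to /ae/.
    score = gop_score(-120.0, {"ae": -120.0, "eh": -100.0}, n_frames=10)
    print(score, is_mispronounced("ae", score, {"ae": 1.5}, default_threshold=3.0))
```

Here a competing phone (/eh/) explains the segment much better than the expected /ae/, giving a large score that exceeds the hypothetical /ae/ threshold.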

Unsupervised intra-lingual and cross-lingual speaker adaptation for HMM-based speech synthesis using two-pass decision tree construction

Relevance:

30.00%

Publisher:

Abstract:

Hidden Markov model (HMM)-based speech synthesis systems possess several advantages over concatenative synthesis systems. One such advantage is the relative ease with which HMM-based systems are adapted to speakers not present in the training dataset. Speaker adaptation methods used in the field of HMM-based automatic speech recognition (ASR) are adopted for this task. In the case of unsupervised speaker adaptation, previous work has used a supplementary set of acoustic models to estimate the transcription of the adaptation data. This paper first presents an approach to the unsupervised speaker adaptation task for HMM-based speech synthesis models which avoids the need for such supplementary acoustic models. This is achieved by defining a mapping between HMM-based synthesis models and ASR-style models, via a two-pass decision tree construction process. Second, it is shown that this mapping also enables unsupervised adaptation of HMM-based speech synthesis models without the need to perform linguistic analysis of the estimated transcription of the adaptation data. Third, this paper demonstrates how this technique lends itself to the task of unsupervised cross-lingual adaptation of HMM-based speech synthesis models, and explains the advantages of such an approach. Finally, listener evaluations reveal that the proposed unsupervised adaptation methods deliver performance approaching that of supervised adaptation.

Statistical parametric speech synthesis based on speaker and language factorization

Relevance:

30.00%

Publisher:

Abstract:

An increasingly common scenario in building speech synthesis and recognition systems is training on inhomogeneous data. This paper proposes a new framework for estimating hidden Markov models on data containing both multiple speakers and multiple languages. The proposed framework, speaker and language factorization, attempts to factorize speaker-specific and language-specific characteristics in the data and then model them using separate transforms. Language-specific factors in the data are represented by transforms based on cluster mean interpolation with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum-likelihood linear regression. Experimental results on statistical parametric speech synthesis show that the proposed framework enables data from multiple speakers in different languages to be used to train a synthesis system, to synthesize speech in one language using speaker characteristics estimated in a different language, and to adapt to a new language. © 2012 IEEE.
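The two kinds of transform named in the abstract can be written in a common form. The notation here (speaker $s$, language $l$, Gaussian component $m$, $P$ clusters, language weights $\lambda_i^{(l)}$, and $c(m,i)$ for the leaf of cluster $i$'s decision tree to which component $m$ maps) is assumed for illustration and is not taken from the paper. Under cluster mean interpolation, each component mean is a language-weighted sum of cluster means; a CMLLR transform $(\mathbf{A}^{(s)}, \mathbf{b}^{(s)})$ then maps the observation into a speaker-normalized space, with the Jacobian $|\mathbf{A}^{(s)}|$ keeping the density properly scaled:

```latex
\begin{aligned}
\boldsymbol{\mu}_m^{(l)} &= \sum_{i=1}^{P} \lambda_i^{(l)}\, \boldsymbol{\mu}_{c(m,i)} \\
p(\mathbf{o}_t \mid m, s, l) &= \bigl|\mathbf{A}^{(s)}\bigr|\,
  \mathcal{N}\!\bigl(\mathbf{A}^{(s)}\mathbf{o}_t + \mathbf{b}^{(s)};\ \boldsymbol{\mu}_m^{(l)},\ \boldsymbol{\Sigma}_m\bigr)
\end{aligned}
```

In this form the language enters only through $\lambda^{(l)}$ and the speaker only through $(\mathbf{A}^{(s)}, \mathbf{b}^{(s)})$, which is what allows a speaker transform estimated in one language to be paired with the weights of another.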

Integrated online speaker clustering and adaptation

Relevance:

30.00%

Publisher:

Abstract:

For many applications, it is necessary to produce speech transcriptions in a causal fashion. To produce high quality transcripts, speaker adaptation is often used. This requires online speaker clustering and incremental adaptation techniques to be developed. This paper presents an integrated approach to online speaker clustering and adaptation which allows efficient clustering of speakers using the same accumulated statistics that are normally used for adaptation. Using a consistent criterion for both clustering and adaptation should yield gains for both stages. The proposed approach is evaluated on a meetings transcription task using audio from multiple distant microphones. Consistent gains over standard clustering and adaptation were obtained. Copyright © 2011 ISCA.
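A toy sketch of the central idea, that one set of accumulated statistics can drive both clustering and adaptation. It swaps in a deliberately simple model and is not the paper's method: each speaker cluster is a single one-dimensional Gaussian held only as zeroth-, first- and second-order statistics, and a hypothetical log-likelihood threshold decides when to open a new cluster. Each incoming segment is scored against every cluster, assigned causally to the best one (or a new one), and its frames are added to that cluster's statistics, which a simple mean/variance adaptation step could then reuse.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Cluster:
    n: float = 0.0   # zeroth-order statistic: frame count
    s1: float = 0.0  # first-order statistic: sum of features
    s2: float = 0.0  # second-order statistic: sum of squared features

    def accumulate(self, frames: Sequence[float]) -> None:
        """Add a segment's frames to the statistics (shared with adaptation)."""
        for x in frames:
            self.n += 1
            self.s1 += x
            self.s2 += x * x

    def mean(self) -> float:
        return self.s1 / self.n

    def var(self, floor: float = 1e-3) -> float:
        # Variance floor avoids degenerate clusters built from few frames.
        return max(self.s2 / self.n - self.mean() ** 2, floor)

    def avg_loglik(self, frames: Sequence[float]) -> float:
        """Average per-frame Gaussian log-likelihood of a non-empty segment."""
        mu, v = self.mean(), self.var()
        total = sum(-0.5 * (math.log(2 * math.pi * v) + (x - mu) ** 2 / v)
                    for x in frames)
        return total / len(frames)


def assign(segment: Sequence[float], clusters: List[Cluster],
           threshold: float) -> int:
    """Assign a segment to the best-scoring cluster, or open a new one."""
    scores = [c.avg_loglik(segment) for c in clusters]
    if scores and max(scores) >= threshold:
        best = scores.index(max(scores))
    else:
        clusters.append(Cluster())
        best = len(clusters) - 1
    clusters[best].accumulate(segment)
    return best


if __name__ == "__main__":
    clusters: List[Cluster] = []
    for seg in ([0.0, 0.1, -0.1, 0.05], [0.02, -0.05, 0.0], [10.0, 10.2, 9.9]):
        print(assign(seg, clusters, threshold=-5.0))
```

A real system would use multi-dimensional features, full acoustic models and the paper's own clustering criterion; the point of the sketch is only that no separate statistics are needed for the clustering decision.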
