944 resultados para Visual Speech Recognition, Multiple Views, Frontal View, Profile View


Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Query-by-Example Spoken Term Detection (QbE STD) aims at retrieving data from a speech data repository given an acoustic query containing the term of interest as input. Nowadays, it has been receiving much interest due to the high volume of information stored in audio or audiovisual format. QbE STD differs from automatic speech recognition (ASR) and keyword spotting (KWS)/spoken term detection (STD) since ASR is interested in all the terms/words that appear in the speech signal and KWS/STD relies on a textual transcription of the search term to retrieve the speech data. This paper presents the systems submitted to the ALBAYZIN 2012 QbE STD evaluation held as a part of ALBAYZIN 2012 evaluation campaign within the context of the IberSPEECH 2012 Conference(a). The evaluation consists of retrieving the speech files that contain the input queries, indicating their start and end timestamps within the appropriate speech file. Evaluation is conducted on a Spanish spontaneous speech database containing a set of talks from MAVIR workshops(b), which amount at about 7 h of speech in total. We present the database metric systems submitted along with all results and some discussion. Four different research groups took part in the evaluation. Evaluation results show the difficulty of this task and the limited performance indicates there is still a lot of room for improvement. The best result is achieved by a dynamic time warping-based search over Gaussian posteriorgrams/posterior phoneme probabilities. This paper also compares the systems aiming at establishing the best technique dealing with that difficult task and looking for defining promising directions for this relatively novel task.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Does language-specific orthography help language detection and lexical access in naturalistic bilingual contexts? This study investigates how L2 orthotactic properties influence bilingual language detection in bilingual societies and the extent to which it modulates lexical access and single word processing. Language specificity of naturalistically learnt L2 words was manipulated by including bigram combinations that could be either L2 language-specific or common in the two languages known by bilinguals. A group of balanced bilinguals and a group of highly proficient but unbalanced bilinguals who grew up in a bilingual society were tested, together with a group of monolinguals (for control purposes). All the participants completed a speeded language detection task and a progressive demasking task. Results showed that the use of the information of orthotactic rules across languages depends on the task demands at hand, and on participants' proficiency in the second language. The influence of language orthotactic rules during language detection, lexical access and word identification are discussed according to the most prominent models of bilingual word recognition.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes two applications in speech recognition of the use of stochastic context-free grammars (SCFGs) trained automatically via the Inside-Outside Algorithm. First, SCFGs are used to model VQ encoded speech for isolated word recognition and are compared directly to HMMs used for the same task. It is shown that SCFGs can model this low-level VQ data accurately and that a regular grammar based pre-training algorithm is effective both for reducing training time and obtaining robust solutions. Second, an SCFG is inferred from a transcription of the speech used to train a phoneme-based recognizer in an attempt to model phonotactic constraints. When used as a language model, this SCFG gives improved performance over a comparable regular grammar or bigram. © 1991.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Over the past decade, a variety of user models have been proposed for user simulation-based reinforcement-learning of dialogue strategies. However, the strategies learned with these models are rarely evaluated in actual user trials and it remains unclear how the choice of user model affects the quality of the learned strategy. In particular, the degree to which strategies learned with a user model generalise to real user populations has not be investigated. This paper presents a series of experiments that qualitatively and quantitatively examine the effect of the user model on the learned strategy. Our results show that the performance and characteristics of the strategy are in fact highly dependent on the user model. Furthermore, a policy trained with a poor user model may appear to perform well when tested with the same model, but fail when tested with a more sophisticated user model. This raises significant doubts about the current practice of learning and evaluating strategies with the same user model. The paper further investigates a new technique for testing and comparing strategies directly on real human-machine dialogues, thereby avoiding any evaluation bias introduced by the user model. © 2005 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As the use of found data increases, more systems are being built using adaptive training. Here transforms are used to represent unwanted acoustic variability, e.g. speaker and acoustic environment changes, allowing a canonical model that models only the "pure" variability of speech to be trained. Adaptive training may be described within a Bayesian framework. By using complexity control approaches to ensure robust parameter estimates, the standard point estimate adaptive training can be justified within this Bayesian framework. However during recognition there is usually no control over the amount of data available. It is therefore preferable to be able to use a full Bayesian approach to applying transforms during recognition rather than the standard point estimates. This paper discusses various approximations to Bayesian approaches including a new variational Bayes approximation. The application of these approaches to state-of-the-art adaptively trained systems using both CAT and MLLR transforms is then described and evaluated on a large vocabulary speech recognition task. © 2005 IEEE.