66 resultados para Visual Speaker Recognition, Visual Speech Recognition, Cascading Appearance-Based Features


Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper provides a summary of our studies on robust speech recognition based on a new statistical approach – the probabilistic union model. We consider speech recognition given that part of the acoustic features may be corrupted by noise. The union model is a method for basing the recognition on the clean part of the features, thereby reducing the effect of the noise on recognition. To this end, the union model is similar to the missing feature method. However, the two methods achieve this end through different routes. The missing feature method usually requires the identity of the noisy data for noise removal, while the union model combines the local features based on the union of random events, to reduce the dependence of the model on information about the noise. We previously investigated the applications of the union model to speech recognition involving unknown partial corruption in frequency band, in time duration, and in feature streams. Additionally, a combination of the union model with conventional noise-reduction techniques was studied, as a means of dealing with a mixture of known or trainable noise and unknown unexpected noise. In this paper, a unified review, in the context of dealing with unknown partial feature corruption, is provided into each of these applications, giving the appropriate theory and implementation algorithms, along with an experimental evaluation.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Alloparental care was investigated in the biparental West African cichlid, Pelvicachromis pulcher. Non-breeding adults typically consumed young conspecifics but this trait was inhibited in both sexes during reproductive attempts. Alien conspecific young were accepted into the brood if they were of a similar age/developmental stage to the parents' own young but not if they were much older or much younger. If not accepted they were consumed by whichever adult located them. Parents separated from their brood for up to 4 days accepted their young on reunion but separation for more than 4 days resulted in the young being consumed. This latter response occurred if chemical stimuli from the young were available during the separation but not if visual stimuli were available. In this latter case parental responsiveness was maintained. Both sexes of this externally fertilizing species appeared to have the same information about their young and showed the same changes in responsiveness and the same discriminatory abilities. (C) 1997 The Association for the Study of Animal Behaviour.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The Audio/Visual Emotion Challenge and Workshop (AVEC 2011) is the first competition event aimed at comparison of multimedia processing and machine learning methods for automatic audio, visual and audiovisual emotion analysis, with all participants competing under strictly the same conditions. This paper first describes the challenge participation conditions. Next follows the data used – the SEMAINE corpus – and its partitioning into train, development, and test partitions for the challenge with labelling in four dimensions, namely activity, expectation, power, and valence. Further, audio and video baseline features are introduced as well as baseline results that use these features for the three sub-challenges of audio, video, and audiovisual emotion recognition.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

There is considerable interest in creating embedded, speech recognition hardware using the weighted finite state transducer (WFST) technique but there are performance and memory usage challenges. Two system optimization techniques are presented to address this; one approach improves token propagation by removing the WFST epsilon input arcs; another one-pass, adaptive pruning algorithm gives a dramatic reduction in active nodes to be computed. Results for memory and bandwidth are given for a 5,000 word vocabulary giving a better practical performance than conventional WFST; this is then exploited in an adaptive pruning algorithm that reduces the active nodes from 30,000 down to 4,000 with only a 2 percent sacrifice in speech recognition accuracy; these optimizations lead to a more simplified design with deterministic performance.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper considers the separation and recognition of overlapped speech sentences assuming single-channel observation. A system based on a combination of several different techniques is proposed. The system uses a missing-feature approach for improving crosstalk/noise robustness, a Wiener filter for speech enhancement, hidden Markov models for speech reconstruction, and speaker-dependent/-independent modeling for speaker and speech recognition. We develop the system on the Speech Separation Challenge database, involving a task of separating and recognizing two mixing sentences without assuming advanced knowledge about the identity of the speakers nor about the signal-to-noise ratio. The paper is an extended version of a previous conference paper submitted for the challenge.