904 resultados para audio-visual automatic speech recognition


Relevância:

40.00% 40.00%

Publicador:

Resumo:

This thesis presents there important results in visual object recognition based on shape. (1) A new algorithm (RAST; Recognition by Adaptive Sudivisions of Tranformation space) is presented that has lower average-case complexity than any known recognition algorithm. (2) It is shown, both theoretically and empirically, that representing 3D objects as collections of 2D views (the "View-Based Approximation") is feasible and affects the reliability of 3D recognition systems no more than other commonly made approximations. (3) The problem of recognition in cluttered scenes is considered from a Bayesian perspective; the commonly-used "bounded-error errorsmeasure" is demonstrated to correspond to an independence assumption. It is shown that by modeling the statistical properties of real-scenes better, objects can be recognized more reliably.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper examines the visual speech processing abilities of older adults and the age-related effects on speechreading abilities.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper details a technique for training auditory memory for length of speech sounds in preschool children with a profound hearing loss.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper discusses a test for speech perception and scoring to test likelihood of success with mainstreaming.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper studies the relationship between consonant duration and recognition of these consanants by listeners with high frequency hearing loss.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Auditory-visual speech perception testing was completed using wordandconsonant-level stimuli in individuals with known degrees of dementia of theAlzheimer’s type. The correlations with the cognitive measures and the speechperception measures (A-only, V-only, AV, VE or AE) did not reveal significantrelationships.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This workshop paper reports recent developments to a vision system for traffic interpretation which relies extensively on the use of geometrical and scene context. Firstly, a new approach to pose refinement is reported, based on forces derived from prominent image derivatives found close to an initial hypothesis. Secondly, a parameterised vehicle model is reported, able to represent different vehicle classes. This general vehicle model has been fitted to sample data, and subjected to a Principal Component Analysis to create a deformable model of common car types having 6 parameters. We show that the new pose recovery technique is also able to operate on the PCA model, to allow the structure of an initial vehicle hypothesis to be adapted to fit the prevailing context. We report initial experiments with the model, which demonstrate significant improvements to pose recovery.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

There are still major challenges in the area of automatic indexing and retrieval of multimedia content data for very large multimedia content corpora. Current indexing and retrieval applications still use keywords to index multimedia content and those keywords usually do not provide any knowledge about the semantic content of the data. With the increasing amount of multimedia content, it is inefficient to continue with this approach. In this paper, we describe the project DREAM, which addresses such challenges by proposing a new framework for semi-automatic annotation and retrieval of multimedia based on the semantic content. The framework uses the Topic Map Technology, as a tool to model the knowledge automatically extracted from the multimedia content using an Automatic Labelling Engine. We describe how we acquire knowledge from the content and represent this knowledge using the support of NLP to automatically generate Topic Maps. The framework is described in the context of film post-production.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper presents a video surveillance framework that robustly and efficiently detects abandoned objects in surveillance scenes. The framework is based on a novel threat assessment algorithm which combines the concept of ownership with automatic understanding of social relations in order to infer abandonment of objects. Implementation is achieved through development of a logic-based inference engine based on Prolog. Threat detection performance is conducted by testing against a range of datasets describing realistic situations and demonstrates a reduction in the number of false alarms generated. The proposed system represents the approach employed in the EU SUBITO project (Surveillance of Unattended Baggage and the Identification and Tracking of the Owner).

Relevância:

40.00% 40.00%

Publicador:

Resumo:

There is evidence that automatic visual attention favors the right side. This study investigated whether this lateral asymmetry interacts with the right hemisphere dominance for visual location processing and left hemisphere dominance for visual shape processing. Volunteers were tested in a location discrimination task and a shape discrimination task. The target stimuli (S2) could occur in the left or right hemifield. They were preceded by an ipsilateral, contralateral or bilateral prime stimulus (S1). The attentional effect produced by the right S1 was larger than that produced by the left S1. This lateral asymmetry was similar between the two tasks suggesting that the hemispheric asymmetries of visual mechanisms do not contribute to it. The finding that it was basically due to a longer reaction time to the left S2 than to the right S2 for the contralateral S1 condition suggests that the inhibitory component of attention is laterally asymmetric.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Condition monitoring of wooden railway sleepers applications are generallycarried out by visual inspection and if necessary some impact acoustic examination iscarried out intuitively by skilled personnel. In this work, a pattern recognition solutionhas been proposed to automate the process for the achievement of robust results. Thestudy presents a comparison of several pattern recognition techniques together withvarious nonstationary feature extraction techniques for classification of impactacoustic emissions. Pattern classifiers such as multilayer perceptron, learning cectorquantization and gaussian mixture models, are combined with nonstationary featureextraction techniques such as Short Time Fourier Transform, Continuous WaveletTransform, Discrete Wavelet Transform and Wigner-Ville Distribution. Due to thepresence of several different feature extraction and classification technqies, datafusion has been investigated. Data fusion in the current case has mainly beeninvestigated on two levels, feature level and classifier level respectively. Fusion at thefeature level demonstrated best results with an overall accuracy of 82% whencompared to the human operator.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Parkinson's disease (PD) is a degenerative illness whose cardinal symptoms include rigidity, tremor, and slowness of movement. In addition to its widely recognized effects PD can have a profound effect on speech and voice.The speech symptoms most commonly demonstrated by patients with PD are reduced vocal loudness, monopitch, disruptions of voice quality, and abnormally fast rate of speech. This cluster of speech symptoms is often termed Hypokinetic Dysarthria.The disease can be difficult to diagnose accurately, especially in its early stages, due to this reason, automatic techniques based on Artificial Intelligence should increase the diagnosing accuracy and to help the doctors make better decisions. The aim of the thesis work is to predict the PD based on the audio files collected from various patients.Audio files are preprocessed in order to attain the features.The preprocessed data contains 23 attributes and 195 instances. On an average there are six voice recordings per person, By using data compression technique such as Discrete Cosine Transform (DCT) number of instances can be minimized, after data compression, attribute selection is done using several WEKA build in methods such as ChiSquared, GainRatio, Infogain after identifying the important attributes, we evaluate attributes one by one by using stepwise regression.Based on the selected attributes we process in WEKA by using cost sensitive classifier with various algorithms like MultiPass LVQ, Logistic Model Tree(LMT), K-Star.The classified results shows on an average 80%.By using this features 95% approximate classification of PD is acheived.This shows that using the audio dataset, PD could be predicted with a higher level of accuracy.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Johansson, Fredrik (2012). Filmljudets funktioner i dramafilm – En audio-visuell analys av filmen The King´s Speech. Examensuppsats inom Ljudproduktion, Högskolan Dalarna, Akademin för språk och medier, Falun. I denna uppsats undersöktes filmljudet i dramafilmen The King´s Speech. Detta för att ta reda på vilka funktioner filmljudet fyller i de valda sekvenserna ur nämnda film, samt hur ljudet är placerat i filmens flerkanalsmix. Filmen granskades med hjälp av en audio-visuell analys. Denna metod går ut på att ljudet och bilden undersöks separat, för att sedan åter kombineras och analyseras som helhet. Den audio-visuella analysmetod som använts kommer från ljudteoretikern Michel Chion, och kallas Masking. Resultatet av den audio-visuella analysen pekade mot att ljudets huvudsakliga funktioner var att skapa en realistisk skildring av karaktärer och omgivningar, skapa en känsla av närvaro, samt att skapa och bibehålla olika perspektiv i den narrativa världen. Den stora majoriteten av ljud visade sig vara placerade i centerkanalen, medan främst ickediegetisk musik och ambiensljud var placerade i front- och surroundkanalerna. Detta kanalanvändande tycktes gynna de funna funktionerna, främst genom att bidra till känslan av närvaro och realism, genom att omsluta filmpubliken med ambienta ljud.