891 results for lip movements


Relevance:

70.00%

Publisher:

Abstract:

This paper investigated the use of lip movements as a behavioural biometric for person authentication. The system was trained, evaluated and tested on the XM2VTS dataset, following Configuration II of the Lausanne Protocol. Features were selected from the DCT coefficients of the greyscale lip image, and the paper examined the number of DCT coefficients selected, the selection process, and combinations of static and dynamic features. Using a Gaussian Mixture Model - Universal Background Model (GMM-UBM) framework, an Equal Error Rate of 2.20% was achieved during evaluation, and on an unseen test set a False Acceptance Rate of 1.7% and a False Rejection Rate of 3.0% were achieved. This compares favourably with face authentication results on the same dataset, whilst not being susceptible to spoofing attacks.
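
As a rough illustration of the GMM-UBM verification scheme described above, the sketch below pairs 2-D DCT lip features with a log-likelihood-ratio decision. The feature dimensions, mixture sizes and the random stand-in data are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np
from scipy.fft import dct
from sklearn.mixture import GaussianMixture

def lip_dct_features(grey_lip_roi, n_coeffs=15):
    """2-D DCT of a greyscale lip region; keeps a low-order block of
    coefficients (the paper's coefficient-selection process is simplified)."""
    c = dct(dct(grey_lip_roi, axis=0, norm='ortho'), axis=1, norm='ortho')
    return c[:n_coeffs, :n_coeffs].ravel()

# Random stand-ins for feature matrices; in practice these would come from
# lip_dct_features() applied to frames of background speakers and the client.
rng = np.random.default_rng(0)
background_feats = rng.normal(size=(5000, 225))  # pooled background speakers
client_feats = rng.normal(size=(500, 225))       # enrolment data
test_feats = rng.normal(size=(300, 225))         # claimed-identity test data

ubm = GaussianMixture(n_components=64, covariance_type='diag',
                      random_state=0).fit(background_feats)
# A separately trained client model; MAP adaptation from the UBM is the
# usual refinement in GMM-UBM systems.
client = GaussianMixture(n_components=64, covariance_type='diag',
                         random_state=0).fit(client_feats)

# Accept when the average log-likelihood ratio exceeds a tuned threshold,
# e.g. one chosen at the Equal Error Rate point on the evaluation set.
llr = client.score(test_feats) - ubm.score(test_feats)
accept = llr > 0.0
```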

Relevance:

70.00%

Publisher:

Abstract:

Most existing models of language production and speech motor control do not explicitly address how language requirements affect speech motor functions, as these domains are usually treated as separate and independent from one another. This investigation compared lip movements during bilabial closure between five individuals with mild aphasia and five age- and gender-matched control speakers when the linguistic characteristics of the stimuli were varied by increasing the number of syllables. Upper and lower lip movement data were collected for mono-, bi- and tri-syllabic nonword sequences using an AG 100 EMMA system. Each task was performed under both normal and fast rate conditions. Single-articulator kinematic parameters (peak velocity, amplitude, duration, and cyclic spatio-temporal index) were measured to characterize lip movements. Results revealed that, compared to control speakers, individuals with aphasia showed significantly longer movement durations and lower movement stability for longer items (bi- and tri-syllables). Moreover, utterance length affected lip kinematics, in that the monosyllables had smaller peak velocities, smaller amplitudes and shorter durations compared to bi- and tri-syllables, and movement stability was lowest for the tri-syllables. In addition, the rate-induced changes (smaller amplitude and shorter duration with increased rate) were most prominent for the short items (i.e., monosyllables). These findings provide further support for the notion that linguistic changes have an impact on the characteristics of speech movements, and that individuals with aphasia are more affected by such changes than control speakers.
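
The cyclic spatio-temporal index used here as a stability measure is commonly computed as the sum of standard deviations across time- and amplitude-normalised repetitions of a movement trajectory. A minimal sketch of that formulation, assuming the standard 50-point normalisation (function names are illustrative):

```python
import numpy as np
from scipy.interpolate import interp1d

def spatiotemporal_index(trials, n_points=50):
    """Spatio-temporal index over repeated movement records. Each trial is
    a 1-D displacement trajectory; trials may differ in length. Lower
    values indicate more stable movement patterning across repetitions."""
    normalised = []
    for y in trials:
        t = np.linspace(0.0, 1.0, len(y))
        resampled = interp1d(t, y)(np.linspace(0.0, 1.0, n_points))  # time-normalise
        normalised.append((resampled - resampled.mean()) / resampled.std())  # amplitude-normalise
    normalised = np.stack(normalised)               # (n_trials, n_points)
    # Sum of standard deviations taken at each relative time point.
    return np.std(normalised, axis=0, ddof=1).sum()
```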

Relevance:

70.00%

Publisher:

Abstract:

Background and aims: In addition to the well-known linguistic processing impairments in aphasia, oro-motor skills and the articulatory implementation of speech segments are reported to be compromised to some degree in most types of aphasia. This study aimed to identify differences in the characteristics and coordination of lip movements in the production of a bilabial closure gesture between speech-like and nonspeech tasks in individuals with aphasia and healthy control subjects. Method and procedure: Upper and lower lip movement data were collected for a speech-like and a nonspeech task using an AG 100 EMMA system from five individuals with aphasia and five age- and gender-matched control subjects. Each task was produced at two rate conditions (normal and fast), and in a familiar and a less-familiar manner. Single-articulator kinematic parameters (peak velocity, amplitude, duration, and cyclic spatio-temporal index) and multi-articulator coordination indices (average relative phase and variability of relative phase) were measured to characterize lip movements. Outcome and results: The results showed that when the two lips had similar task goals (bilabial closure), kinematic and coordination characteristics did not differ between the speech-like and nonspeech tasks. However, when changes in rate were imposed on the bilabial gesture, only the speech-like task showed functional adaptations, indicated by a greater decrease in amplitude and duration at fast rates. In terms of group differences, individuals with aphasia showed smaller amplitudes and longer movement durations for the upper lip, higher spatio-temporal variability for both lips, and higher variability in lip coordination than the control speakers. Rate was an important factor in distinguishing the two groups, and individuals with aphasia were limited in implementing the rate changes. Conclusion and implications: The findings support the notion of subtle but robust differences in motor control characteristics between individuals with aphasia and the control participants, even in the context of producing bilabial closing gestures for a relatively simple speech-like task. The findings also highlight the functional differences between speech-like and nonspeech tasks, despite a common movement coordination goal of bilabial closure.
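
The coordination indices above (average relative phase and its variability) can be approximated from the two lip signals via the analytic signal. A minimal sketch under the assumption of roughly oscillatory, band-limited movement data; circular statistics would be the more rigorous treatment:

```python
import numpy as np
from scipy.signal import hilbert

def relative_phase(upper_lip, lower_lip):
    """Continuous relative phase (radians) between two articulator signals,
    via the analytic signal; signals are mean-centred first."""
    phi_u = np.angle(hilbert(upper_lip - np.mean(upper_lip)))
    phi_l = np.angle(hilbert(lower_lip - np.mean(lower_lip)))
    dphi = np.unwrap(phi_u - phi_l)     # relative phase over time
    return np.mean(dphi), np.std(dphi)  # average relative phase, variability
```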

Relevance:

60.00%

Publisher:

Abstract:

Acoustically, car cabins are extremely noisy, and as a consequence existing audio-only speech recognition systems for voice-based control of vehicle functions, such as the GPS-based navigator, perform poorly. Audio-only speech recognition systems fail to make use of the visual modality of speech (e.g., lip movements). As the visual modality is immune to acoustic noise, utilising this visual information in conjunction with an audio-only speech recognition system has the potential to improve the accuracy of the system. The field of recognising speech using both auditory and visual inputs is known as Audio-Visual Speech Recognition (AVSR). Research in the AVSR field has been ongoing for the past twenty-five years, with notable progress being made. However, the practical deployment of AVSR systems in a variety of real-world applications has not yet emerged, mainly because most research to date has neglected to address variabilities in the visual domain, such as illumination and viewpoint, in the design of the visual front-end of the AVSR system. In this paper we present an AVSR system for a real-world car environment using the AVICAR database [1], a publicly available in-car database, and we show that using visual speech in conjunction with the audio modality improves the robustness and effectiveness of voice-only recognition systems in car cabin environments.
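
One simple way to combine the two modalities is feature-level fusion: upsampling the slower visual feature stream to the audio frame rate and concatenating the streams frame by frame. The paper does not specify its fusion front-end, so the sketch below is only an illustrative baseline; the frame rates and names are assumptions.

```python
import numpy as np

def fuse_features(audio_feats, visual_feats):
    """Feature-level audio-visual fusion: linearly upsample the visual
    stream (e.g. 25 fps) to the audio frame rate (e.g. 100 fps), then
    concatenate the two streams frame by frame."""
    src = np.linspace(0.0, 1.0, len(visual_feats))
    dst = np.linspace(0.0, 1.0, len(audio_feats))
    visual_up = np.stack([np.interp(dst, src, visual_feats[:, d])
                          for d in range(visual_feats.shape[1])], axis=1)
    return np.hstack([audio_feats, visual_up])  # (n_audio_frames, d_a + d_v)
```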

Relevance:

60.00%

Publisher:

Abstract:

The use of visual features in the form of lip movements to improve the performance of acoustic speech recognition has been shown to work well, particularly in noisy acoustic conditions. However, whether this technique can outperform speech recognition incorporating well-known acoustic enhancement techniques, such as spectral subtraction or multi-channel beamforming, is not known. This is an important question to answer, especially in an automotive environment, for the design of an efficient human-vehicle computer interface. We perform a variety of speech recognition experiments on a challenging automotive speech dataset, and the results show that synchronous HMM-based audio-visual fusion can outperform traditional single-channel as well as multi-channel acoustic speech enhancement techniques. We also show that further improvement in recognition performance can be obtained by fusing speech-enhanced audio with the visual modality, demonstrating the complementary nature of the two robust speech recognition approaches.
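
Spectral subtraction, one of the enhancement baselines named above, estimates the noise magnitude spectrum (e.g. from non-speech frames) and subtracts it from each noisy frame. A minimal single-channel sketch; the frame sizes, spectral floor and the omitted overlap-add normalisation are simplifying assumptions:

```python
import numpy as np

def spectral_subtraction(noisy, noise_mag, frame_len=512, hop=256, floor=0.02):
    """Basic magnitude spectral subtraction. `noise_mag` is an estimate of
    the noise magnitude spectrum (length frame_len // 2 + 1), e.g. averaged
    over leading non-speech frames. Overlap-add normalisation is omitted."""
    window = np.hanning(frame_len)
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len, hop):
        spec = np.fft.rfft(noisy[start:start + frame_len] * window)
        mag = np.abs(spec) - noise_mag                  # subtract noise estimate
        mag = np.maximum(mag, floor * np.abs(spec))     # spectral floor
        frame = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=frame_len)
        out[start:start + frame_len] += frame * window  # overlap-add
    return out
```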

Relevance:

60.00%

Publisher:

Abstract:

Visual activity detection based on lip movements can be used to overcome the poor performance of voice activity detection performed solely in the audio domain, particularly in noisy acoustic conditions. However, most of the research conducted on visual voice activity detection (VVAD) has neglected to address variabilities in the visual domain, such as viewpoint variation. In this paper we investigate the effectiveness of visual information from the speaker's frontal and profile views (i.e., left and right side views) for the task of VVAD. As far as we are aware, our work constitutes the first real attempt to study this problem. We describe our visual front-end approach and the Gaussian mixture model (GMM) based VVAD framework, and report experimental results using the freely available CUAVE database. The experimental results show that VVAD is indeed possible from profile views, and we give a quantitative comparison of VVAD based on frontal and profile views. The results presented are useful in the development of multi-modal Human Machine Interaction (HMI) using a single camera, where the speaker's face may not always be frontal.
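
At its simplest, a GMM-based VVAD framework of the kind described trains one mixture on lip features from speech frames and another on non-speech frames, then classifies each incoming frame by likelihood. A toy sketch with random stand-in features; the real front-end features and mixture sizes are not specified here:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Random stand-ins for lip features extracted during speech and silence.
rng = np.random.default_rng(1)
speech_frames = rng.normal(1.0, 1.0, size=(2000, 20))
silence_frames = rng.normal(0.0, 1.0, size=(2000, 20))

gmm_speech = GaussianMixture(n_components=8, random_state=0).fit(speech_frames)
gmm_silence = GaussianMixture(n_components=8, random_state=0).fit(silence_frames)

def vvad(frames):
    """Per-frame decision: 1 = visual speech activity, 0 = no activity."""
    return (gmm_speech.score_samples(frames) >
            gmm_silence.score_samples(frames)).astype(int)
```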

Relevance:

60.00%

Publisher:

Abstract:

Novel techniques have been developed for the automatic recognition of human behaviour in challenging environments, using information from visual and infra-red camera feeds. The techniques have been applied to two interesting scenarios: recognising drivers' speech using lip movements, and recognising audience behaviour while watching a movie using facial features and body movements. The outcomes of the research in these two areas will be useful in improving the performance of voice recognition in automobiles for voice-based control, and in obtaining accurate movie interest ratings based on live audience response analysis.

Relevance:

60.00%

Publisher:

Abstract:

Visual information in the form of lip movements of the speaker has been shown to improve the performance of speech recognition and search applications. In our previous work, we proposed cross-database training of synchronous hidden Markov models (SHMMs) to make use of large, publicly available external audio databases in addition to the relatively small given audio-visual database. In this work, the cross-database training approach is improved by performing an additional audio adaptation step, which enables the audio-visual SHMMs to benefit from the audio observations of the external audio models before the visual modality is added to them. The proposed approach outperforms the baseline cross-database training approach in clean and noisy environments, in terms of both phone recognition accuracy and spoken term detection (STD) accuracy.
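
The paper's adaptation step operates on SHMMs, but the underlying operation is analogous to classic MAP adaptation of Gaussian mixture parameters towards new observations, applied per state. As a simplified, means-only illustration (the relevance factor and the use of a plain GMM are assumptions for the sketch, not the paper's method):

```python
import numpy as np

def map_adapt_means(prior_gmm, X, relevance=16.0):
    """MAP adaptation of GMM component means towards new data X (means
    only); `prior_gmm` is a fitted sklearn GaussianMixture. An analogous
    update is applied per state when adapting HMM output distributions."""
    resp = prior_gmm.predict_proba(X)                      # (n, K) posteriors
    n_k = resp.sum(axis=0)                                 # soft counts per component
    ex_k = (resp.T @ X) / np.maximum(n_k[:, None], 1e-10)  # data mean per component
    alpha = (n_k / (n_k + relevance))[:, None]             # data-dependent weight
    return alpha * ex_k + (1.0 - alpha) * prior_gmm.means_
```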

Relevance:

60.00%

Publisher:

Abstract:

Spoken term detection (STD) is the task of looking up a spoken term in a large volume of speech segments. To provide fast search, speech segments are first indexed into an intermediate representation using speech recognition engines, which provide multiple hypotheses for each speech segment. Approximate matching techniques are usually applied at the search stage to compensate for the poor performance of automatic speech recognition engines during indexing. Recently, using visual information in addition to audio information has been shown to improve phone recognition performance, particularly in noisy environments. In this paper, we make use of visual information in the form of lip movements of the speaker in the indexing stage and investigate its effect on STD performance. In particular, we investigate whether gains in phone recognition accuracy carry through the approximate matching stage to provide similar gains in the final audio-visual STD system over a traditional audio-only approach. We also investigate the effect of using visual information on STD performance in different noise environments.
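
Approximate matching of a query against a phone-level index is often built on edit distance with free start and end positions in the decoded sequence. A minimal dynamic-programming sketch of that idea; real systems typically search lattices and weight substitutions by phone confusability:

```python
def min_edit_cost(query, phones):
    """Approximate search for a phone sequence `query` inside a longer
    decoded sequence `phones`: minimum edit distance over all substrings
    (free start and end positions), via dynamic programming."""
    n = len(phones)
    prev = [0] * (n + 1)                # free starting position in the index
    for i in range(1, len(query) + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j - 1] + (query[i - 1] != phones[j - 1]),  # (mis)match
                         prev[j] + 1,                                    # deletion
                         cur[j - 1] + 1)                                 # insertion
        prev = cur
    return min(prev)                    # free end position

# A cost of 0 means an exact occurrence; small costs tolerate recognition errors.
```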

Relevance:

60.00%

Publisher:

Abstract:

Speech recognition can be improved by using visual information, in the form of lip movements of the speaker, in addition to audio information. To date, state-of-the-art techniques for audio-visual speech recognition continue to use the audio and visual data of the same database for training their models. In this paper, we present a new approach that makes use of one modality of an external dataset in addition to a given audio-visual dataset. By doing so, it is possible to create more powerful models from other extensive audio-only databases and adapt them to our comparatively smaller multi-stream databases. Results show that the presented approach outperforms, by 29% relative, the widely adopted synchronous hidden Markov models (HMMs) trained jointly on the audio and visual data of a given audio-visual database for phone recognition. It also outperforms the external audio models trained on extensive external audio datasets, and the internal audio models, by 5.5% and 46% relative, respectively. We also show that the proposed approach is beneficial in noisy environments where the audio source is affected by environmental noise.

Relevance:

60.00%

Publisher:

Abstract:

This research has made contributions to the area of spoken term detection (STD), defined as the process of finding all occurrences of a specified search term in a large collection of speech segments. The use of visual information in the form of lip movements of the speaker in addition to audio, of the topic of the speech segments, and of the expected frequency of words in the target speech domain is proposed. By using these complementary sources of information, improvements in STD performance have been achieved, enabling efficient search for keywords in large collections of multimedia documents.

Relevance:

60.00%

Publisher:

Abstract:

Automatic gender classification has many security and commercial applications. Various modalities have been investigated for gender classification, with face-based classification being the most popular. In some real-world scenarios the face may be partially occluded; in these circumstances, a classification based on individual parts of the face, known as local features, must be adopted. We investigate gender classification using lip movements. We show for the first time that important gender-specific information can be obtained from the way in which a person moves their lips during speech. Furthermore, our study indicates that lip dynamics during speech provide greater gender-discriminative information than lip appearance alone. We also show that lip dynamics and appearance contain complementary gender information, such that a model which captures both traits gives the highest overall classification result. We use Discrete Cosine Transform based features and Gaussian Mixture Modelling to model lip appearance and dynamics, and employ the XM2VTS database for our experiments. Our experiments show that a model which captures lip dynamics along with appearance can improve gender classification rates by 16-21% compared to models of lip appearance only.
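
A common way to capture dynamics alongside appearance is to append delta (regression) coefficients to the static DCT features before GMM modelling. The paper's exact dynamic features are not specified here, so this sketch shows the standard delta formula as one plausible realisation:

```python
import numpy as np

def add_deltas(static_feats, width=2):
    """Append delta (dynamic) coefficients to per-frame static features
    using the standard regression formula over +/- `width` frames."""
    n, _ = static_feats.shape
    padded = np.pad(static_feats, ((width, width), (0, 0)), mode='edge')
    denom = 2 * sum(w * w for w in range(1, width + 1))
    deltas = np.zeros_like(static_feats)
    for t in range(n):
        for w in range(1, width + 1):
            deltas[t] += w * (padded[t + width + w] - padded[t + width - w])
    return np.hstack([static_feats, deltas / denom])  # appearance + dynamics
```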

Relevance:

60.00%

Publisher:

Abstract:

Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data, and where they have, this was primarily restricted to the study of single articulators. If AOS reflects a basic neuromotor dysfunction, this should somehow be evident in the production of both dysfluent and perceptually fluent speech. The current study compared motor control strategies for the production of perceptually fluent speech between a young woman with AOS and Broca's aphasia and a group of age-matched control speakers, using concepts and tools from articulation-based theories. In addition, to examine the potential role of specific movement variables in gestural coordination, the second part of this study compared fluent and dysfluent speech samples from the speaker with AOS. Movement data from the lips, jaw and tongue were acquired using the AG-100 EMMA system during the reiterated production of multisyllabic nonwords. The findings indicated that, although the kinematic parameters of fluent speech in the subject with AOS and Broca's aphasia were in general similar to those of the age-matched controls, speech task-related differences were observed in upper lip movements and lip coordination. The comparison between fluent and dysfluent speech characteristics suggested that fluent speech was achieved through the use of specific motor control strategies, highlighting the potential association between the stability of coordinative patterns and movement range, as described in Coordination Dynamics theory.

Relevance:

60.00%

Publisher:

Abstract:

The McGurk effect, in which auditory [ba] dubbed onto [ga] lip movements is perceived as "da" or "tha", was employed in a real-time task to investigate auditory-visual speech perception in prelingual infants. Experiments 1A and 1B established the validity of real-time dubbing for producing the effect. In Experiment 2, 4.5-month-olds were tested in a habituation-test paradigm, in which an auditory-visual stimulus was presented contingent upon visual fixation of a live face. The experimental group was habituated to a McGurk stimulus (auditory [ba], visual [ga]), and the control group to a matching auditory-visual [ba]. Each group was then presented with three auditory-only test trials: [ba], [da], and [ða] (as in "then"). Visual-fixation durations in the test trials showed that the experimental group treated the emergent percept in the McGurk effect, [da] or [ða], as familiar (even though they had not heard these sounds previously) and [ba] as novel; for control-group infants, [da] and [ða] were no more familiar than [ba]. These results are consistent with infants' perception of the McGurk effect, and support the conclusion that prelinguistic infants integrate auditory and visual speech information.

Relevance:

60.00%

Publisher:

Abstract:

A study was conducted with the aim of establishing dental changes and evaluating their influence on the soft tissues, comparing patients treated with the Damon Q system (N=23) and the MBT system (N=16). A study sample of 39 individuals was selected, comprising 28 women and 11 men aged between 11 and 26 years. Digital lateral cephalometric radiographs taken before and after completion of treatment were recorded. The inclinations of the upper and lower incisors, as well as alterations of the facial profile, were measured using the Nemotec Dental cephalometric analysis software. To determine the changes in hard and soft tissues, Student's t-test was applied, verifying statistically significant differences (p < 0.05) for both systems in the following variables: maxillary incisor angle, mandibular incisor angle, and lower lip position according to the aesthetic planes of Ricketts and Burstone. The correlation between changes in incisor inclination and lip position was evaluated using Pearson's correlation coefficient. For the Damon system, a correlation was found between the position of the upper incisor and that of the upper and lower lips, and the position of the lower incisor had a moderate correlation with the lower lip; with the MBT technique, a moderate relationship was found between the position of the upper incisor and the lower lip. It was concluded that there are post-treatment changes in incisor position, as well as lower lip protrusion, with a relationship between the labial inclination of the incisors and lip movement in both techniques.
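
The analysis above rests on two standard tests: a paired Student's t-test for the pre/post-treatment change and Pearson's correlation between incisor inclination change and lip position. A minimal sketch with random stand-in data; the actual cephalometric measurements are not reproduced here:

```python
import numpy as np
from scipy import stats

# Random stand-ins for cephalometric measurements (degrees / mm).
rng = np.random.default_rng(2)
pre = rng.normal(110.0, 5.0, size=23)        # pre-treatment incisor inclination
post = pre - rng.normal(3.0, 2.0, size=23)   # post-treatment values
lip_change = rng.normal(0.0, 1.0, size=23)   # change in lower-lip position

# Paired Student's t-test for the pre/post treatment change (alpha = 0.05).
t_stat, p_value = stats.ttest_rel(pre, post)

# Pearson correlation between incisor inclination change and lip movement.
r, p_r = stats.pearsonr(pre - post, lip_change)
```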