951 results for Visual Speech Recognition, Multiple Views, Frontal View, Profile View
Abstract:
Data mining can be defined as the extraction of previously unknown and potentially useful information from large datasets. The main principle is to devise computer programs that run through databases and automatically seek deterministic patterns. It is applied in many fields, e.g., remote sensing, biometry, and speech recognition, but has seldom been applied to forensic case data. The intrinsic difficulty of using such data lies in its heterogeneity, which comes from the many different sources of information. The aim of this study is to highlight potential uses of pattern recognition that would provide relevant results from a criminal intelligence point of view. The role of data mining within a global crime analysis methodology is to detect all types of structures in a dataset. Once filtered and interpreted, those structures can point to previously unseen criminal activities. The interpretation of patterns for intelligence purposes is the final stage of the process. It allows the researcher to validate the whole methodology and to refine each step if necessary. An application to cutting agents found in illicit drug seizures was performed. A combinatorial approach was taken, based on the presence and absence of products. Methods from graph theory were used to extract patterns in data constituted by links between products and the place and date of seizure. A data mining process carried out with graph techniques is called "graph mining". Patterns were detected that had to be interpreted and compared with preliminary knowledge to establish their relevance. The illicit drug profiling process is in fact an intelligence process that uses preliminary illicit drug classes to classify new samples. The methods proposed in this study could be used a priori to compare structures from preliminary and post-detection patterns.
This new knowledge of a repeated structure may provide valuable complementary information to profiling and become a source of intelligence.
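As a rough illustration of the presence/absence approach described above, the sketch below builds a co-occurrence graph of cutting agents and keeps the links that recur across seizures. It is not the study's method: the seizure records, the agent names, and the `min_support` threshold are all hypothetical, and place and date are left out for brevity.

```python
from collections import Counter
from itertools import combinations

# Hypothetical seizures: each lists the cutting agents detected (presence only).
seizures = [
    {"id": "S1", "agents": {"caffeine", "paracetamol"}},
    {"id": "S2", "agents": {"caffeine", "paracetamol", "lidocaine"}},
    {"id": "S3", "agents": {"lidocaine"}},
    {"id": "S4", "agents": {"caffeine", "paracetamol"}},
]


def cooccurrence_edges(records):
    """Count how often each pair of agents appears in the same seizure.

    Each pair is an edge of the graph; the count is the edge weight.
    """
    edges = Counter()
    for record in records:
        # Sorting gives every pair a canonical (alphabetical) orientation.
        for pair in combinations(sorted(record["agents"]), 2):
            edges[pair] += 1
    return edges


def recurrent_patterns(edges, min_support=2):
    """Keep only the edges observed in at least `min_support` seizures."""
    return {pair: count for pair, count in edges.items() if count >= min_support}


edges = cooccurrence_edges(seizures)
patterns = recurrent_patterns(edges)
print(patterns)  # {('caffeine', 'paracetamol'): 3}
```

A recurrent edge of this kind is only a candidate pattern. As the abstract notes, it still has to be interpreted against prior knowledge before it counts as intelligence.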
Abstract:
Multisensory experiences enhance perceptions and facilitate memory retrieval processes, even when only unisensory information is available for accessing such memories. Using fMRI, we identified human brain regions involved in discriminating visual stimuli according to past multisensory vs. unisensory experiences. Subjects performed a completely orthogonal task, discriminating repeated from initial image presentations intermixed within a continuous recognition task. Half of initial presentations were multisensory, and all repetitions were exclusively visual. Despite only single-trial exposures to initial image presentations, accuracy in indicating image repetitions was significantly improved by past auditory-visual multisensory experiences over images only encountered visually. Similarly, regions within the lateral-occipital complex (areas typically associated with visual object recognition processes) were more active to visual stimuli with multisensory than unisensory pasts. Additional differential responses were observed in the anterior cingulate and frontal cortices. Multisensory experiences are registered by the brain even when of no immediate behavioral relevance and can be used to categorize memories. These data reveal the functional efficacy of multisensory processing.
Abstract:
We perceive our environment through multiple sensory channels. Nonetheless, research has traditionally focused on the investigation of sensory processing within single modalities. Thus, investigating how our brain integrates multisensory information is of crucial importance for understanding how organisms cope with a constantly changing and dynamic environment. During my thesis I investigated how multisensory events impact our perception and brain responses, both when auditory-visual stimuli were presented simultaneously and when multisensory events at one point in time affected later unisensory processing. In "Looming signals reveal synergistic principles of multisensory integration" (Cappe, Thelen et al., 2012) we investigated the neuronal substrates involved in motion detection in depth under multisensory vs. unisensory conditions. We have shown that congruent auditory-visual looming (i.e. approaching) signals are preferentially integrated by the brain. Further, we show that early effects under these conditions are relevant for behavior, effectively speeding up responses to these combined stimulus presentations. In "Electrical neuroimaging of memory discrimination based on single-trial multisensory learning" (Thelen et al., 2012), we investigated the behavioral impact of single encounters with meaningless auditory-visual object pairings upon subsequent visual object recognition. In addition to showing that these encounters lead to impaired recognition accuracy upon repeated visual presentations, we have shown that the brain discriminates images as soon as ~100 ms post-stimulus onset according to the initial encounter context. In "Single-trial multisensory memories affect later visual and auditory object recognition" (Thelen et al., in review) we addressed whether auditory object recognition is affected by single-trial multisensory memories, and whether recognition accuracy of sounds was affected by the initial encounter context in the same way as that of visual objects.
We found that this is in fact the case. Based on our behavioral findings, we propose that a common underlying brain network is differentially involved during encoding and retrieval of images and sounds. - We perceive our surroundings through several sensory organs. Previously, research on perception focused on studying the sensory systems independently of one another. However, studying the brain processes that support the integration of multisensory information is of crucial importance for understanding how our brain works in response to a dynamic, constantly changing world. During my thesis, I therefore studied how multisensory events affect our immediate and/or later perception and how they are processed by our brain. In the study "Looming signals reveal synergistic principles of multisensory integration" (Cappe, Thelen et al., 2012), we examined the neuronal processes involved in motion detection using auditory-visual stimuli presented alone or in combination. Our results showed that our brain preferentially integrates combined auditory-visual stimuli approaching the observer. Moreover, we showed that early effects, observed in the brain response, influence our behavior by speeding up the detection of these stimuli. In the study "Electrical neuroimaging of memory discrimination based on single-trial multisensory learning" (Thelen et al., 2012), we examined the impact that presenting an auditory-visual stimulus has on the accuracy of image recognition. We studied how the presentation of a meaningless auditory-visual combination affects, at the behavioral and brain levels, the later recognition of the image.
The results showed that recognition accuracy for images previously presented with a meaningless sound is lower than for images presented alone. Moreover, our brain differentiates these two types of stimuli very early in image processing. In the study "Single-trial multisensory memories affect later visual and auditory object recognition" (Thelen et al., in review), we asked whether the accuracy of sound recognition was similarly affected by the presentation of past multisensory events. Our results confirmed this. We proposed that this similarity could be explained by the differential recruitment of a common neuronal network.
Abstract:
Multisensory memory traces established via single-trial exposures can impact subsequent visual object recognition. This impact appears to depend on the meaningfulness of the initial multisensory pairing, implying that multisensory exposures establish distinct object representations that are accessible during later unisensory processing. Multisensory contexts may be particularly effective in influencing auditory discrimination, given the purportedly inferior recognition memory in this sensory modality. The possibility of this generalization and the equivalence of effects when memory discrimination was being performed in the visual vs. auditory modality were the focus of this study. First, we demonstrate that visual object discrimination is affected by the context of prior multisensory encounters, replicating and extending previous findings by controlling for the probability of multisensory contexts during initial as well as repeated object presentations. Second, we provide the first evidence that single-trial multisensory memories impact subsequent auditory object discrimination. Auditory object discrimination was enhanced when initial presentations entailed semantically congruent multisensory pairs and was impaired after semantically incongruent multisensory encounters, compared to sounds that had been encountered only in a unisensory manner. Third, the impact of single-trial multisensory memories upon unisensory object discrimination was greater when the task was performed in the auditory vs. visual modality. Fourth, there was no evidence for correlation between effects of past multisensory experiences on visual and auditory processing, suggestive of largely independent object processing mechanisms between modalities. We discuss these findings in terms of the conceptual short-term memory (CSTM) model and predictive coding. Our results suggest differential recruitment and modulation of conceptual memory networks according to the sensory task at hand.
Abstract:
Psychopathy is associated with well-known characteristics such as a lack of empathy and impulsive behaviour, but it has also been associated with impaired recognition of emotional facial expressions. The use of event-related potentials (ERPs) to examine this phenomenon could shed light on the specific time course and neural activation associated with emotion recognition processes as they relate to psychopathic traits. In the current study we examined the P1, N170, and vertex positive potential (VPP) ERP components and behavioural performance with respect to scores on the Self-Report Psychopathy (SRP-III) questionnaire. Thirty undergraduates completed two tasks, the first of which required the recognition and categorization of affective face stimuli under varying presentation conditions. Happy, angry or fearful faces were presented with attention directed to the mouth, nose or eye region and with varied stimulus exposure duration (30, 75, or 150 ms). We found behavioural performance to be unrelated to psychopathic personality traits in all conditions, but there was a trend for the N170 to peak later in response to fearful and happy facial expressions for individuals high in psychopathic traits. However, the amplitude of the VPP was significantly negatively associated with psychopathic traits, but only in response to stimuli presented under a nose-level fixation. Finally, psychopathic traits were found to be associated with longer N170 latencies in response to stimuli presented under the 30 ms exposure duration. In the second task, participants were required to inhibit processing of irrelevant affective and scrambled face distractors while categorizing unrelated word stimuli as living or nonliving.
Psychopathic traits were hypothesized to be positively associated with behavioural performance, as it was proposed that individuals high in psychopathic traits would be less likely to automatically attend to task-irrelevant affective distractors, facilitating word categorization. Thus, decreased interference would be reflected in smaller N170 components, indicating less neural activity associated with processing of distractor faces. We found that overall performance decreased in the presence of angry and fearful distractor faces as psychopathic traits increased. In addition, the amplitude of the N170 decreased and the latency increased in response to affective distractor faces for individuals with higher levels of psychopathic traits. Although we failed to find the predicted behavioural deficit in emotion recognition in Task 1 and facilitation effect in Task 2, the findings of increased N170 and VPP latencies in response to emotional faces are consistent with the proposition that abnormal emotion recognition processes may in fact be inherent to psychopathy as a continuous personality trait.
Abstract:
This lexical decision study with eye tracking of Japanese two-kanji-character words investigated the order in which a whole two-character word and its morphographic constituents are activated in the course of lexical access, the relative contributions of the left and the right characters in lexical decision, the depth to which semantic radicals are processed, and how nonlinguistic factors affect lexical processes. Mixed-effects regression analyses of response times and subgaze durations (i.e., first-pass fixation time spent on each of the two characters) revealed joint contributions of morphographic units at all levels of the linguistic structure with the magnitude and the direction of the lexical effects modulated by readers’ locus of attention in a left-to-right preferred processing path. During the early time frame, character effects were larger in magnitude and more robust than radical and whole-word effects, regardless of the font size and the type of nonwords. Extending previous radical-based and character-based models, we propose a task/decision-sensitive character-driven processing model with a level-skipping assumption: Connections from the feature level bypass the lower radical level and link up directly to the higher character level.
Abstract:
The main objective of this thesis was to quantify and compare the effort required to recognize speech in noise in young adults and older adults with normal hearing and normal visual acuity (with or without corrective lenses). The effort associated with speech perception is related to the attentional and cognitive resources required to understand speech. The first study (Experiment 1) aimed to evaluate the effort associated with auditory speech recognition (hearing a talker), whereas the second study (Experiment 2) aimed to evaluate the effort associated with audiovisual speech recognition (hearing and seeing a talker's face). Effort was measured in two different ways. The first was a behavioural approach using an experimental paradigm known as the dual task, in which a word recognition task was paired with a vibrotactile pattern recognition task. In addition, effort was quantified using a questionnaire asking participants to rate the effort associated with the behavioural tasks. Both measures of effort were used in two different experimental conditions: 1) equated level, that is, when the level of the noise masking the speech was the same for all participants, and 2) equated performance, that is, when the noise level was adjusted so that performance on the word recognition task was identical for the two groups of participants. The performance levels obtained on the vibrotactile task revealed that older adults expend more effort than young adults in both experimental conditions, regardless of the perceptual modality in which the speech stimuli are presented (i.e., auditory only or audiovisual).
Overall, the "cost" associated with performance on the vibrotactile task was highest for older adults when speech was presented in the audiovisual modality. While visual cues can improve audiovisual speech recognition, our results suggest that they can also place an additional load on the resources used to process information. This additional load has detrimental consequences for performance on the word recognition and vibrotactile pattern recognition tasks when these are carried out under dual-task conditions. Consistent with previous studies, the correlation coefficients computed from the data of Experiment 1 and Experiment 2 support the notion that dual-task behavioural measures and questionnaire responses assess different dimensions of the effort associated with speech recognition. Because the effort associated with speech perception depends on auditory and cognitive factors, a third study was completed to explore whether auditory working memory helps explain the variance in the data on the effort associated with speech perception. In addition, these analyses made it possible to compare the response patterns obtained for these two factors in young adults and older adults. For young adults, the results of a sequential regression analysis showed that a measure of auditory capacity (span size) was related to effort, whereas a measure of auditory processing (alphabetical recall) was related to the accuracy with which words were recognized when presented under dual-task conditions.
However, these same relationships were not present in the data obtained for the group of older adults, nor in the data obtained when the speech recognition tasks were performed in the audiovisual modality. Further studies are needed to identify the cognitive factors underlying the effort associated with speech perception, particularly in older adults.
Abstract:
The development of Malayalam speech recognition systems is still in its infancy, although much work has been done on other Indian languages. In this paper we present the first work on a speaker-independent Malayalam isolated speech recognizer based on PLP (Perceptual Linear Predictive) cepstral coefficients and the Hidden Markov Model (HMM). The performance of the developed system has been evaluated with different numbers of HMM states. The system is trained with 21 male and female speakers aged 19 to 41 years. The system obtained an accuracy of 99.5% on unseen data.
Abstract:
Speech is a primary medium through which human beings communicate in language. Automatic Speech Recognition is widespread today. Recognizing single digits is vital to a number of applications such as voice dialling of telephone numbers, automatic data entry, credit card entry, PIN (personal identification number) entry, entry of access codes for transactions, etc. In this paper we present a comparative study of SVM (Support Vector Machine) and HMM (Hidden Markov Model) to recognize and identify the digits used in Malayalam speech.
Abstract:
Speech is the primary, most prominent and convenient means of communication in audible language. Through speech, people can express their thoughts, feelings or perceptions by the articulation of words. Human speech is a complex signal which is non stationary in nature. It consists of immensely rich information about the words spoken, accent, attitude of the speaker, expression, intention, sex, emotion as well as style. The main objective of Automatic Speech Recognition (ASR) is to identify whatever people speak by means of computer algorithms. This enables people to communicate with a computer in a natural spoken language. Automatic recognition of speech by machines has been one of the most exciting, significant and challenging areas of research in the field of signal processing over the past five to six decades. Despite the developments and intensive research done in this area, the performance of ASR is still lower than that of speech recognition by humans and is yet to achieve a completely reliable performance level. The main objective of this thesis is to develop an efficient speech recognition system for recognising speaker independent isolated words in Malayalam.
Abstract:
Sketches are commonly used in the early stages of design. Our previous system allows users to sketch mechanical systems that the computer interprets. However, some parts of the mechanical system might be too hard or too complicated to express in the sketch. Adding speech recognition to create a multimodal system would move us toward our goal of creating a more natural user interface. This thesis examines the relationship between the verbal and sketch input, particularly how to segment and align the two inputs. Toward this end, subjects were recorded while they sketched and talked. These recordings were transcribed, and a set of rules to perform segmentation and alignment was created. These rules represent the knowledge that the computer needs to perform segmentation and alignment. The rules successfully interpreted the 24 data sets that they were given.
Abstract:
The registration of full 3D models is an important task in computer vision. Range finders only reconstruct a partial view of the object. Many authors have proposed techniques to register 3D surfaces from multiple views, in which there are basically two aspects to consider. First, coarse registration, in which some sort of correspondences are established. Second, accurate (fine) registration, in order to obtain a better solution. A survey of the most common techniques is presented, including experimental results for some of them.
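To make the two-stage idea concrete, the sketch below shows the step that follows once correspondences are available: a least-squares rigid fit between two views. It is a deliberately simplified 2D illustration, not one of the surveyed techniques. The point sets, the function names `fit_rigid_2d` and `apply_rigid_2d`, and the assumption that `source[i]` matches `target[i]` are all hypothetical.

```python
import math


def fit_rigid_2d(source, target):
    """Least-squares rotation angle and translation mapping source onto target.

    Assumes source[i] corresponds to target[i], i.e. correspondences are known.
    """
    n = len(source)
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    tx = sum(q[0] for q in target) / n
    ty = sum(q[1] for q in target) / n

    # Accumulate dot and cross products of the centred point pairs.
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(source, target):
        ax, ay = px - sx, py - sy
        bx, by = qx - tx, qy - ty
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx

    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that moves the rotated source centroid onto the target centroid.
    shift = (tx - (c * sx - s * sy), ty - (s * sx + c * sy))
    return theta, shift


def apply_rigid_2d(points, theta, shift):
    """Rotate points by theta (radians) about the origin, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]


# Hypothetical partial view, and the same points after a 30-degree turn and a shift.
view_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (0.0, 1.0)]
view_b = apply_rigid_2d(view_a, math.radians(30.0), (2.0, -1.0))

theta, shift = fit_rigid_2d(view_a, view_b)
aligned = apply_rigid_2d(view_a, theta, shift)
```

With real range data the correspondences are not known in advance. A fine-registration method would typically alternate between re-estimating them and re-running a fit of this kind, in 3D rather than 2D.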
Abstract:
Federmeier and Benjamin (2005) have suggested that semantic encoding for verbal information in the right hemisphere can be more effective when memory demands are higher. However, other studies (Kanske & Kotz, 2007) also suggest that visual word recognition differs as a function of emotional valence. In this context, the present study was designed to evaluate the effects of retention level upon recognition memory processes for negative and neutral words. The sample consisted of 15 right-handed Portuguese undergraduate students with normal or corrected-to-normal vision. Portuguese concrete negative and neutral words were selected in accordance with known linguistic capabilities of the right hemisphere. The participants were submitted to a visual half-field word presentation using a continuous recognition memory paradigm. Eye movements were continuously monitored with a Tobii T60 eye-tracker, which showed no significant differences in fixations to negative and neutral words. Reaction times in word recognition suggest an overall advantage of negative words in comparison to neutral words. Further analysis showed faster responses for negative words than for neutral words when they were recognised at longer retention intervals for left-hemisphere encoding. Electrophysiological data from event-related potentials revealed larger P2 amplitude over centro-posterior electrode sites for words studied in the left hemifield, suggesting a priming effect for right-hemisphere encoding. Overall, the data suggest different hemispheric memory strategies for the semantic encoding of negative and neutral words.
Abstract:
This paper discusses a study on postlingual cochlear implantees and the effectiveness of the CST in evaluating enhancement of speech recognition abilities.