904 results for audio-visual automatic speech recognition


Relevance:

40.00% 40.00%

Publisher:

Abstract:

Perceiving a possible predator may promote physiological changes to support prey 'fight or flight'. In this case, an increase in ventilatory frequency (VF) may be expected, because this is a way to improve oxygen uptake for escape tasks. Therefore, changes in VF may be used as a behavioral tool to evaluate visual recognition of a predator threat. Thus, we tested the effects of predator visual exposure on VF in the fish Nile tilapia, Oreochromis niloticus. For this, we measured tilapia VF before and after the presentation of three stimuli: an aquarium with a harmless fish or a predator or water (control). Nile tilapia VF increased significantly in the group visually exposed to a predator compared with the other two, which were similar to each other. Hence, we conclude that Nile tilapia may recognize an allopatric predator; consequently VF is an effective tool to indicate visual recognition of predator threat in fish. © 2002 Elsevier B.V. All rights reserved.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

An intelligent system that emulates human decision behaviour based on visual data acquisition is proposed. The approach is useful in applications where images are used to supply information to specialists who will choose suitable actions. An artificial neural classifier aids a fuzzy decision support system to deal with uncertainty and imprecision present in available information. Advantages of both techniques are exploited complementarily. As an example, this method was applied in automatic focus checking and adjustment in video monitor manufacturing. Copyright © 2005 IFAC.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Image categorization by means of bag of visual words has received increasing attention from the image processing and vision communities in recent years. In these approaches, each image is represented by invariant points of interest which are mapped to a Hilbert space representing a visual dictionary, which aims to comprise the most discriminative features in a set of images. Nevertheless, the main problem of such approaches is to find a compact and representative dictionary. Finding such a representative dictionary automatically, with no user intervention, is an even more difficult task. In this paper, we propose a method to automatically find such a dictionary by employing a recently developed graph-based clustering algorithm called Optimum-Path Forest, which does not make any assumption about the visual dictionary's size and is more efficient and effective than the state-of-the-art techniques used for dictionary generation. © 2012 IEEE.
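The representation step described in this abstract can be illustrated with a minimal sketch. It assumes the visual dictionary is already available as a list of codeword vectors; it does not implement Optimum-Path Forest clustering, and the function name `bovw_histogram`, the toy 2D descriptors, and the use of Euclidean distance are all assumptions made for illustration rather than details taken from the paper.

```python
from math import dist


def bovw_histogram(descriptors, dictionary):
    """Map each local descriptor to its nearest codeword and return
    the normalized histogram of codeword counts for one image."""
    counts = [0] * len(dictionary)
    for descriptor in descriptors:
        # Index of the codeword closest to this descriptor (Euclidean distance).
        nearest = min(range(len(dictionary)),
                      key=lambda k: dist(descriptor, dictionary[k]))
        counts[nearest] += 1
    total = sum(counts)
    # Normalize so images with different numbers of keypoints are comparable.
    return [c / total for c in counts] if total else counts


# Hypothetical dictionary of three codewords and four descriptors from one image.
dictionary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (4.8, 5.1), (5.2, 4.9)]
histogram = bovw_histogram(descriptors, dictionary)
print(histogram)
```

In a real pipeline the descriptors would come from an interest-point detector and the dictionary from a clustering step such as the one the abstract proposes; the resulting histogram is what a classifier would consume.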

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Human intestinal parasites constitute a problem in most tropical countries, causing death or physical and mental disorders. Their diagnosis usually relies on the visual analysis of microscopy images, with error rates that may range from moderate to high. The problem has been addressed via computational image analysis, but only for a few species and images free of fecal impurities. In routine practice, fecal impurities are a real challenge for automatic image analysis. We have circumvented this problem with a method that can segment and classify, from bright field microscopy images with fecal impurities, the 15 most common species of protozoan cysts, helminth eggs, and larvae in Brazil. Our approach exploits ellipse matching and image foresting transform for image segmentation, multiple object descriptors and their optimum combination by genetic programming for object representation, and the optimum-path forest classifier for object recognition. The results indicate that our method is a promising approach toward the full automation of enteroparasitosis diagnosis. © 2012 IEEE.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance:

40.00% 40.00%

Publisher:

Abstract:

This study investigated the influence of top-down and bottom-up information on speech perception in complex listening environments. Specifically, the effects of listening to different types of processed speech were examined on intelligibility and on simultaneous visual-motor performance. The goal was to extend the generalizability of results in speech perception to environments outside of the laboratory. The effect of bottom-up information was evaluated with natural, cell phone and synthetic speech. The effect of simultaneous tasks was evaluated with concurrent visual-motor and memory tasks. Earlier works on the perception of speech during simultaneous visual-motor tasks have shown inconsistent results (Choi, 2004; Strayer & Johnston, 2001). In the present experiments, two dual-task paradigms were constructed in order to mimic non-laboratory listening environments. In the first two experiments, an auditory word repetition task was the primary task and a visual-motor task was the secondary task. Participants were presented with different kinds of speech in a background of multi-speaker babble and were asked to repeat the last word of every sentence while doing the simultaneous tracking task. Word accuracy and visual-motor task performance were measured. Taken together, the results of Experiments 1 and 2 showed that the intelligibility of natural speech was better than synthetic speech and that synthetic speech was better perceived than cell phone speech. The visual-motor methodology was found to demonstrate independent and supplemental information and provided a better understanding of the entire speech perception process. Experiment 3 was conducted to determine whether the automaticity of the tasks (Schneider & Shiffrin, 1977) helped to explain the results of the first two experiments. It was found that cell phone speech allowed better simultaneous pursuit rotor performance only at low intelligibility levels when participants ignored the listening task. 
Also, simultaneous task performance improved dramatically for natural speech when intelligibility was good. Overall, it could be concluded that knowledge of intelligibility alone is insufficient to characterize processing of different speech sources. Additional measures such as attentional demands and performance of simultaneous tasks were also important in characterizing the perception of different kinds of speech in complex listening environments.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Visual tracking is the problem of estimating some variables related to a target given a video sequence depicting the target. Visual tracking is key to the automation of many tasks, such as visual surveillance, autonomous robot or vehicle navigation, and automatic video indexing in multimedia databases. Despite many years of research, long-term tracking of generic targets in real-world scenarios remains unsolved. The main contribution of this thesis is the definition of effective algorithms that can foster a general solution to visual tracking by letting the tracker adapt to changing working conditions. In particular, we propose to adapt two crucial components of visual trackers: the transition model and the appearance model. The less general but widespread case of tracking from a static camera is also considered, and a novel change detection algorithm robust to sudden illumination changes is proposed. Based on this, a principled adaptive framework to model the interaction between Bayesian change detection and recursive Bayesian trackers is introduced. Finally, the problem of automatic tracker initialization is considered. In particular, a novel solution for categorization of 3D data is presented. The category recognition algorithm is based on a novel 3D descriptor that is shown to achieve state-of-the-art performance in several applications of surface matching.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Introduction and aims of the research: Nitric oxide (NO) and endocannabinoids (eCBs) are major retrograde messengers, involved in synaptic plasticity (long-term potentiation, LTP, and long-term depression, LTD) in many brain areas (including hippocampus and neocortex), as well as in learning and memory processes. NO is synthesized by NO synthase (NOS) in response to increased cytosolic Ca2+ and mainly exerts its functions through soluble guanylate cyclase (sGC) and cGMP production. The main target of cGMP is the cGMP-dependent protein kinase (PKG). Activity-dependent release of eCBs in the CNS leads to the activation of the Gαi/o-coupled cannabinoid receptor 1 (CB1) at both glutamatergic and inhibitory synapses. The perirhinal cortex (Prh) is a multimodal associative cortex of the temporal lobe, critically involved in visual recognition memory. LTD is proposed to be the cellular correlate underlying this form of memory. Cholinergic neurotransmission has been shown to play a critical role in both visual recognition memory and LTD in Prh. Moreover, visual recognition memory is one of the main cognitive functions impaired in the early stages of Alzheimer’s disease. The main aim of my research was to investigate the role of NO and eCBs in synaptic plasticity in rat Prh and in visual recognition memory. Part of this research was dedicated to the study of synaptic transmission and plasticity in a murine model (Tg2576) of Alzheimer’s disease. Methods: Field potential recordings. Extracellular field potential recordings were carried out in horizontal Prh slices from Sprague-Dawley or Dark Agouti juvenile (p21-35) rats. LTD was induced with a single train of 3000 pulses delivered at 5 Hz (10 min), or via bath application of carbachol (Cch; 50 μM) for 10 min. LTP was induced by theta-burst stimulation (TBS). In addition, input/output curves and 5Hz-LTD were carried out in Prh slices from 3-month-old Tg2576 mice and littermate controls. Behavioural experiments.
The spontaneous novel object exploration task was performed in intra-Prh bilaterally cannulated adult Dark Agouti rats. Drugs or vehicle (saline) were directly infused into the Prh 15 min before training to verify the role of nNOS and CB1 in visual recognition memory acquisition. Object recognition memory was tested at 20 min and 24 h after the end of the training phase. Results: Electrophysiological experiments in Prh slices from juvenile rats showed that 5Hz-LTD is due to the activation of the NOS/sGC/PKG pathway, whereas Cch-LTD relies on NOS/sGC but not PKG activation. By contrast, NO does not appear to be involved in LTP in this preparation. Furthermore, I found that eCBs are involved in LTP induction, but not in basal synaptic transmission, 5Hz-LTD and Cch-LTD. Behavioural experiments demonstrated that the blockade of nNOS impairs rat visual recognition memory tested at 24 hours, but not at 20 min; however, the blockade of CB1 did not affect visual recognition memory acquisition at either time point. In three-month-old Tg2576 mice, deficits in basal synaptic transmission and 5Hz-LTD were observed compared to littermate controls. Conclusions: The results obtained in Prh slices from juvenile rats indicate that NO and CB1 play a role in the induction of LTD and LTP, respectively. These results are confirmed by the observation that nNOS, but not CB1, is involved in visual recognition memory acquisition. The preliminary results obtained in the murine model of Alzheimer’s disease indicate that deficits in synaptic transmission and plasticity occur very early in Prh; further investigations are required to characterize the molecular mechanisms underlying these deficits.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

The identification of people by measuring some traits of individual anatomy or physiology has led to a specific research area called biometric recognition. This thesis is focused on improving fingerprint recognition systems considering three important problems: fingerprint enhancement, fingerprint orientation extraction and automatic evaluation of fingerprint algorithms. An effective extraction of salient fingerprint features depends on the quality of the input fingerprint. If the fingerprint is very noisy, we are not able to detect a reliable set of features. A new fingerprint enhancement method, which is both iterative and contextual, is proposed. This approach detects high-quality regions in fingerprints, selectively applies contextual filtering and iteratively expands like wildfire toward low-quality ones. A precise estimation of the orientation field would greatly simplify the estimation of other fingerprint features (singular points, minutiae) and improve the performance of a fingerprint recognition system. The fingerprint orientation extraction is improved in two directions. First, after the introduction of a new taxonomy of fingerprint orientation extraction methods, several variants of baseline methods are implemented and, pointing out the role of pre- and post-processing, we show how to improve the extraction. Second, the introduction of a new hybrid orientation extraction method, which follows an adaptive scheme, significantly improves the orientation extraction in noisy fingerprints. Scientific papers typically propose recognition systems that integrate many modules, and therefore an automatic evaluation of fingerprint algorithms is needed to isolate the contributions that determine actual progress in the state of the art. 
The lack of a publicly available framework to compare fingerprint orientation extraction algorithms motivates the introduction of a new benchmark area called FOE (including fingerprints and manually marked orientation ground truth) along with fingerprint matching benchmarks in the FVC-onGoing framework. The success of this framework is discussed by providing relevant statistics: more than 1450 algorithms submitted and two international competitions.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Coordinated eye and head movements simultaneously occur to scan the visual world for relevant targets. However, measuring both eye and head movements in experiments allowing natural head movements may be challenging. This paper provides an approach to study eye-head coordination: First, we demonstrate the capabilities and limits of the eye-head tracking system used, and compare it to other technologies. Second, a behavioral task is introduced to invoke eye-head coordination. Third, a method is introduced to reconstruct signal loss in video-based oculography caused by cornea reflection artifacts in order to extend the tracking range. Finally, parameters of eye-head coordination are identified using EHCA (eye-head coordination analyzer), a MATLAB software which was developed to analyze eye-head shifts. To demonstrate the capabilities of the approach, a study with 11 healthy subjects was performed to investigate motion behavior. The approach presented here is discussed as an instrument to explore eye-head coordination, which may lead to further insights into attentional and motor symptoms of certain neurological or psychiatric diseases, e.g., schizophrenia.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

BACKGROUND The aim of this study was to evaluate imaging-based response to a standardized neoadjuvant chemotherapy (NACT) regimen by dynamic contrast-enhanced magnetic resonance mammography (DCE-MRM), with MR images analyzed by an automatic computer-assisted diagnosis (CAD) system in comparison to visual evaluation. MRI findings were correlated with histopathologic response to NACT and also with the occurrence of metastases in a follow-up analysis. PATIENTS AND METHODS Fifty-four patients with invasive ductal breast carcinomas received two identical MRI examinations (before and after NACT; 1.5T, contrast medium gadoteric acid). Pre-therapeutic images were compared with post-therapeutic examinations by CAD and two blinded human observers, considering morphologic and dynamic MRI parameters as well as tumor size measurements. Imaging-assessed response to NACT was compared with histopathologically verified response. All clinical, histopathologic, and DCE-MRM parameters were correlated with the occurrence of distant metastases. RESULTS Initial and post-initial dynamic parameters significantly changed between pre- and post-therapeutic DCE-MRM. Visually evaluated DCE-MRM revealed sensitivity of 85.7%, specificity of 91.7%, and diagnostic accuracy of 87.0% in evaluating the response to NACT compared to histopathology. CAD analysis led to more false-negative findings (37.0%) compared to visual evaluation (11.1%), resulting in sensitivity of 52.4%, specificity of 100.0%, and diagnostic accuracy of 63.0%. The following dynamic MRI parameters showed significant associations with the occurrence of metastases: post-initial curve type before NACT (entire lesions, calculated by CAD) and post-initial curve type of the most enhancing tumor parts after NACT (calculated by CAD and manually). 
CONCLUSIONS In the accurate evaluation of response to neoadjuvant treatment, CAD systems can provide useful additional information due to their high specificity; however, they cannot replace visual imaging evaluation. Besides traditional prognostic factors, contrast medium-induced dynamic MRI parameters reveal significant associations with patient outcome, i.e. occurrence of distant metastases.
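As a worked check of how the reported sensitivity, specificity, accuracy, and false-negative percentages relate to one another, the sketch below computes them from confusion-matrix counts. The abstract does not state the counts; the values used here (42 histopathologic responders and 12 non-responders among the 54 patients) are an assumption, chosen because they are consistent with every percentage quoted above. The function name `diagnostic_metrics` is likewise illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy and the false-negative
    share of all cases, each as a percentage rounded to one decimal."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": round(100 * tp / (tp + fn), 1),
        "specificity": round(100 * tn / (tn + fp), 1),
        "accuracy": round(100 * (tp + tn) / total, 1),
        "false_negative_share": round(100 * fn / total, 1),
    }


# Assumed counts (not given in the abstract), consistent with the reported figures.
visual = diagnostic_metrics(tp=36, fn=6, tn=11, fp=1)
cad = diagnostic_metrics(tp=22, fn=20, tn=12, fp=0)

print("visual:", visual)
print("CAD:   ", cad)
```

Under these assumed counts, the false-negative percentages (11.1% and 37.0%) are fractions of all 54 patients rather than of the responders alone, which is why they do not simply equal 100% minus sensitivity.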

Relevance:

40.00% 40.00%

Publisher:

Abstract:

INTRODUCTION The Rondo is a single-unit cochlear implant (CI) audio processor comprising the same components as its behind-the-ear predecessor, the Opus 2. Replacing the Opus 2 with the Rondo shifts the microphone position toward the back of the head. This study aimed to investigate the influence of the Rondo wearing position on speech intelligibility in noise. METHODS Speech intelligibility in noise was measured in 4 spatial configurations with 12 experienced CI users using the German adaptive Oldenburg sentence test. A physical model and a numerical model were used to enable a comparison of the observations. RESULTS No statistically significant differences in speech intelligibility were found in the situations in which the signal came from the front and the noise came from the frontal, ipsilateral, or contralateral side. The signal-to-noise ratio (SNR) was significantly better with the Opus 2 in the case with the noise presented from the back (4.4 dB, p < 0.001). The differences in the SNR were significantly worse with the Rondo processors placed further behind the ear than closer to the ear. CONCLUSION The study indicates that CI users with the receiver/stimulator implanted in positions further behind the ear can be expected to have greater difficulty in noisy situations when wearing the single-unit audio processor.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Detecting user affect automatically during real-time conversation is the main challenge towards our greater aim of infusing social intelligence into a natural-language mixed-initiative High-Fidelity (Hi-Fi) audio control spoken dialog agent. In recent years, studies on affect detection from voice have moved on to using realistic, non-acted data, which is subtler. However, it is more challenging to perceive subtler emotions and this is demonstrated in tasks such as labelling and machine prediction. This paper attempts to address part of this challenge by considering the role of user satisfaction ratings and also conversational/dialog features in discriminating contentment and frustration, two types of emotions that are known to be prevalent within spoken human-computer interaction. However, given the laboratory constraints, users might be positively biased when rating the system, indirectly making the reliability of the satisfaction data questionable. Machine learning experiments were conducted on two datasets, users and annotators, which were then compared in order to assess the reliability of these datasets. Our results indicated that standard classifiers were significantly more successful in discriminating the abovementioned emotions and their intensities (reflected by user satisfaction ratings) from annotator data than from user data. These results corroborated that: first, satisfaction data could be used directly as an alternative target variable to model affect, and that they could be predicted exclusively by dialog features. Second, these were only true when trying to predict the abovementioned emotions using annotator's data, suggesting that user bias does exist in a laboratory-led evaluation.