Biblioteca Digital

904 resultados para audio-visual automatic speech recognition

Alpha-band suppression in the visual word form area as a functional bottleneck to consciousness.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The current state of empirical investigations refers to consciousness as an all-or-none phenomenon. However, a recent theoretical account opens up this perspective by proposing a partial level (between nil and full) of conscious perception. In the well-studied case of single-word reading, short-lived exposure can trigger incomplete word-form recognition wherein letters fall short of forming a whole word in one's conscious perception thereby hindering word-meaning access and report. Hence, the processing from incomplete to complete word-form recognition straightforwardly mirrors a transition from partial to full-blown consciousness. We therefore hypothesized that this putative functional bottleneck to consciousness (i.e. the perceptual boundary between partial and full conscious perception) would emerge at a major key hub region for word-form recognition during reading, namely the left occipito-temporal junction. We applied a real-time staircase procedure and titrated subjective reports at the threshold between partial (letters) and full (whole word) conscious perception. This experimental approach allowed us to collect trials with identical physical stimulation, yet reflecting distinct perceptual experience levels. Oscillatory brain activity was monitored with magnetoencephalography and revealed that the transition from partial-to-full word-form perception was accompanied by alpha-band (7-11 Hz) power suppression in the posterior left occipito-temporal cortex. This modulation of rhythmic activity extended anteriorly towards the visual word form area (VWFA), a region whose selectivity for word-forms in perception is highly debated. The current findings provide electrophysiological evidence for a functional bottleneck to consciousness thereby empirically instantiating a recently proposed partial perspective on consciousness. Moreover, the findings provide an entirely new outlook on the functioning of the VWFA as a late bottleneck to full-blown conscious word-form perception.

Automatic Prediction of Facial Trait Judgments: Appearance vs. Structural Models

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Evaluating other individuals with respect to personality characteristics plays a crucial role in human relations and it is the focus of attention for research in diverse fields such as psychology and interactive computer systems. In psychology, face perception has been recognized as a key component of this evaluation system. Multiple studies suggest that observers use face information to infer personality characteristics. Interactive computer systems are trying to take advantage of these findings and apply them to increase the natural aspect of interaction and to improve the performance of interactive computer systems. Here, we experimentally test whether the automatic prediction of facial trait judgments (e.g. dominance) can be made by using the full appearance information of the face and whether a reduced representation of its structure is sufficient. We evaluate two separate approaches: a holistic representation model using the facial appearance information and a structural model constructed from the relations among facial salient points. State of the art machine learning methods are applied to a) derive a facial trait judgment model from training data and b) predict a facial trait value for any face. Furthermore, we address the issue of whether there are specific structural relations among facial points that predict perception of facial traits. Experimental results over a set of labeled data (9 different trait evaluations) and classification rules (4 rules) suggest that a) prediction of perception of facial traits is learnable by both holistic and structural approaches; b) the most reliable prediction of facial trait judgments is obtained by certain type of holistic descriptions of the face appearance; and c) for some traits such as attractiveness and extroversion, there are relationships between specific structural features and social perceptions.

An active contour-based atlas registration model applied to automatic subthalamic nucleus targeting on MRI: method and validation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a new non parametric atlas registration framework, derived from the optical flow model and the active contour theory, applied to automatic subthalamic nucleus (STN) targeting in deep brain stimulation (DBS) surgery. In a previous work, we demonstrated that the STN position can be predicted based on the position of surrounding visible structures, namely the lateral and third ventricles. A STN targeting process can thus be obtained by registering these structures of interest between a brain atlas and the patient image. Here we aim to improve the results of the state of the art targeting methods and at the same time to reduce the computational time. Our simultaneous segmentation and registration model shows mean STN localization errors statistically similar to the most performing registration algorithms tested so far and to the targeting expert's variability. Moreover, the computational time of our registration method is much lower, which is a worthwhile improvement from a clinical point of view.

Application of the Mutual Information Minimization to speaker recognition / identification improvement

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we propose the inversion of nonlinear distortions in order to improve the recognition rates of a speaker recognizer system. We study the effect of saturations on the test signals, trying to take into account real situations where the training material has been recorded in a controlled situation but the testing signals present some mismatch with the input signal level (saturations). The experimental results for speaker recognition shows that a combination of several strategies can improve the recognition rates with saturated test sentences from 80% to 89.39%, while the results with clean speech (without saturation) is 87.76% for one microphone, and for speaker identification can reduce the minimum detection cost function with saturated test sentences from 6.42% to 4.15%, while the results with clean speech (without saturation) is 5.74% for one microphone and 7.02% for the other one.

Detection of Raga-characteristic phrases from Hindustani Classical Music Audio

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Melodic motifs form essential building blocks in Indian Classical music. The motifs, or key phrases, providestrong cues to the identity of the underlying raga in both Hindustani and Carnatic styles of Indian music. Automatic identification and clustering of similar motifs is relevant in this context. The inherent variations in various instances of a characteristic phrase in a bandish (composition)performance make it challenging to identify similar phrases in a performance. A nyas svara (long note)marks the ending of these phrases. The proposed method does segmentation of phrases through identification ofnyas and computes similarity with the reference characteristic phrase.

Auditory localisation and recognition after left inferior collicular lesion: effects of work load on cortical activation patterns (fMRI study)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Evidence from neuropsychological and activation studies (Clarke et al., 2oo0, Maeder et al., 2000) suggests that sound recognitionand localisation are processed by two anatomically and functionally distinct cortical networks. We report here on a case of a patientthat had an interruption of auditory information and we show: i) the effects of this interruption on cortical auditory processing; ii)the effect of the workload on activation pattern.A 36 year old man suffered from a small left mesencephalic haemotrhage, due to cavernous angioma; the let% inferior colliculuswas resected in the surgical approach of the vascular malformation. In the acute stage, the patient complained of auditoryhallucinations and of auditory loss in right ear, while tonal audiometry was normal. At 12 months, auditory recognition, auditorylocalisation (assessed by lTD and IID cues) and auditory motion perception were normal (Clarke et al., 2000), while verbal dichoticlistening was deficient on the right side.Sound recognition and sound localisation activation patterns were investigated with fMRI, using a passive and an activeparadigm. In normal subjects, distinct cortical networks were involved in sound recognition and localisation, both in passive andactive paradigm (Maeder et al., 2OOOa, 2000b).Passive listening of environmental and spatial stimuli as compared to rest strongly activated right auditory cortex, but failed toactivate left primary auditory cortex. The specialised networks for sound recognition and localisation could not be visual&d onthe right and only minimally on the left convexity. A very different activation pattern was obtained in the active condition wherea motor response was required. Workload not only increased the activation of the right auditory cortex, but also allowed theactivation of the left primary auditory cortex. The specialised networks for sound recognition and localisation were almostcompletely present in both hemispheres.These results show that increasing the workload can i) help to recruit cortical region in the auditory deafferented hemisphere;and ii) lead to processing auditory information within specific cortical networks.References:Clarke et al. (2000). Neuropsychologia 38: 797-807.Mae.der et al. (2OOOa), Neuroimage 11: S52.Maeder et al. (2OOOb), Neuroimage 11: S33

Bootstrapping a statistical speech translator from a rule-based one

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We describe a series of experiments in which we start with English to French and English to Japanese versions of an Open Source rule-based speech translation system for a medical domain, and bootstrap correspondign statistical systems. Comparative evaluation reveals that the rule-based systems are still significantly better than the statistical ones, despite the fact that considerable effort has been invested in tuning both the recognition and translation components; also, a hybrid system only marginally improved recall at the cost of a los in precision. The result suggests that rule-based architectures may still be preferable to statistical ones for safety-critical speech translation tasks.

Visual measurement of laser-arc hybrid welding

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Tässä työssä raportoidaan hybridihitsauksesta otettujen suurnopeuskuvasarjojen automaattisen analyysijärjestelmän kehittäminen.Järjestelmän tarkoitus oli tuottaa tietoa, joka avustaisi analysoijaa arvioimaan kuvatun hitsausprosessin laatua. Tutkimus keskittyi valokaaren taajuuden säännöllisyyden ja lisäainepisaroiden lentosuuntien mittaamiseen. Valokaaria havaittiin kuvasarjoista sumean c-means-klusterointimenetelmän avullaja perättäisten valokaarien välistä aikaväliä käytettiin valokaaren taajuuden säännöllisyyden mittarina. Pisaroita paikannettiin menetelmällä, jossa yhdistyi pääkomponenttianalyysi ja tukivektoriluokitin. Kalman-suodinta käytettiin tuottamaan arvioita pisaroiden lentosuunnista ja nopeuksista. Lentosuunnanmääritysmenetelmä luokitteli pisarat niiden arvioitujen lentosuuntien perusteella. Järjestelmän kehittämiseen käytettävissä olleet kuvasarjat poikkesivat merkittävästi toisistaan kuvanlaadun ja pisaroiden ulkomuodon osalta, johtuen eroista kuvaus- ja hitsausprosesseissa. Analyysijärjestelmä kehitettiin toimimaan pienellä osajoukolla kuvasarjoja, joissa oli tietynlainen kuvaus- ja hitsausprosessi ja joiden kuvanlaatu ja pisaroiden ulkomuoto olivat samankaltaisia, mutta järjestelmää testattiin myös osajoukon ulkopuolisilla kuvasarjoilla. Testitulokset osoittivat, että lentosuunnanmääritystarkkuus oli kohtuullisen suuri osajoukonsisällä ja pieni muissa kuvasarjoissa. Valokaaren taajuuden säännöllisyyden määritys oli tarkka useammassa kuvasarjassa.

The role of auditory cortices in the retrieval of single-trial auditory-visual object memories.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Single-trial encounters with multisensory stimuli affect both memory performance and early-latency brain responses to visual stimuli. Whether and how auditory cortices support memory processes based on single-trial multisensory learning is unknown and may differ qualitatively and quantitatively from comparable processes within visual cortices due to purported differences in memory capacities across the senses. We recorded event-related potentials (ERPs) as healthy adults (n = 18) performed a continuous recognition task in the auditory modality, discriminating initial (new) from repeated (old) sounds of environmental objects. Initial presentations were either unisensory or multisensory; the latter entailed synchronous presentation of a semantically congruent or a meaningless image. Repeated presentations were exclusively auditory, thus differing only according to the context in which the sound was initially encountered. Discrimination abilities (indexed by d') were increased for repeated sounds that were initially encountered with a semantically congruent image versus sounds initially encountered with either a meaningless or no image. Analyses of ERPs within an electrical neuroimaging framework revealed that early stages of auditory processing of repeated sounds were affected by prior single-trial multisensory contexts. These effects followed from significantly reduced activity within a distributed network, including the right superior temporal cortex, suggesting an inverse relationship between brain activity and behavioural outcome on this task. The present findings demonstrate how auditory cortices contribute to long-term effects of multisensory experiences on auditory object discrimination. We propose a new framework for the efficacy of multisensory processes to impact both current multisensory stimulus processing and unisensory discrimination abilities later in time.

Working memory load improves early stages of independent visual processing.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Increasing evidence suggests that working memory and perceptual processes are dynamically interrelated due to modulating activity in overlapping brain networks. However, the direct influence of working memory on the spatio-temporal brain dynamics of behaviorally relevant intervening information remains unclear. To investigate this issue, subjects performed a visual proximity grid perception task under three different visual-spatial working memory (VSWM) load conditions. VSWM load was manipulated by asking subjects to memorize the spatial locations of 6 or 3 disks. The grid was always presented between the encoding and recognition of the disk pattern. As a baseline condition, grid stimuli were presented without a VSWM context. VSWM load altered both perceptual performance and neural networks active during intervening grid encoding. Participants performed faster and more accurately on a challenging perceptual task under high VSWM load as compared to the low load and the baseline condition. Visual evoked potential (VEP) analyses identified changes in the configuration of the underlying sources in one particular period occurring 160-190 ms post-stimulus onset. Source analyses further showed an occipito-parietal down-regulation concurrent to the increased involvement of temporal and frontal resources in the high VSWM context. Together, these data suggest that cognitive control mechanisms supporting working memory may selectively enhance concurrent visual processing related to an independent goal. More broadly, our findings are in line with theoretical models implicating the engagement of frontal regions in synchronizing and optimizing mnemonic and perceptual resources towards multiple goals.

Single-trial multisensory memories affect later auditory and visual object discrimination.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Multisensory memory traces established via single-trial exposures can impact subsequent visual object recognition. This impact appears to depend on the meaningfulness of the initial multisensory pairing, implying that multisensory exposures establish distinct object representations that are accessible during later unisensory processing. Multisensory contexts may be particularly effective in influencing auditory discrimination, given the purportedly inferior recognition memory in this sensory modality. The possibility of this generalization and the equivalence of effects when memory discrimination was being performed in the visual vs. auditory modality were at the focus of this study. First, we demonstrate that visual object discrimination is affected by the context of prior multisensory encounters, replicating and extending previous findings by controlling for the probability of multisensory contexts during initial as well as repeated object presentations. Second, we provide the first evidence that single-trial multisensory memories impact subsequent auditory object discrimination. Auditory object discrimination was enhanced when initial presentations entailed semantically congruent multisensory pairs and was impaired after semantically incongruent multisensory encounters, compared to sounds that had been encountered only in a unisensory manner. Third, the impact of single-trial multisensory memories upon unisensory object discrimination was greater when the task was performed in the auditory vs. visual modality. Fourth, there was no evidence for correlation between effects of past multisensory experiences on visual and auditory processing, suggestive of largely independent object processing mechanisms between modalities. We discuss these findings in terms of the conceptual short term memory (CSTM) model and predictive coding. Our results suggest differential recruitment and modulation of conceptual memory networks according to the sensory task at hand.

Speaker recognition improvement using blind inversion of distortions

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we propose the inversion of nonlinear distortions in order to improve the recognition rates of a speaker recognizer system. We study the effect of saturations on the test signals, trying to take into account real situations where the training material has been recorded in a controlled situation but the testing signals present some mismatch with the input signal level (saturations). The experimental results shows that a combination of several strategies can improve the recognition rates with saturated test sentences from 80% to 89.39%, while the results with clean speech (without saturation) is 87.76% for one microphone.

Visual Measurement and Modelling of Tool Wear in Rough Turning

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Huolimatta korkeasta automaatioasteesta sorvausteollisuudessa, muutama keskeinen ongelma estää sorvauksen täydellisen automatisoinnin. Yksi näistä ongelmista on työkalun kuluminen. Tämä työ keskittyy toteuttamaan automaattisen järjestelmän kulumisen, erityisesti viistekulumisen, mittaukseen konenäön avulla. Kulumisen mittausjärjestelmä poistaa manuaalisen mittauksen tarpeen ja minimoi ajan, joka käytetään työkalun kulumisen mittaukseen. Mittauksen lisäksi tutkitaan kulumisen mallinnusta sekä ennustamista. Automaattinen mittausjärjestelmä sijoitettiin sorvin sisälle ja järjestelmä integroitiin onnistuneesti ulkopuolisten järjestelmien kanssa. Tehdyt kokeet osoittivat, että mittausjärjestelmä kykenee mittaamaan työkalun kulumisen järjestelmän oikeassa ympäristössä. Mittausjärjestelmä pystyy myös kestämään häiriöitä, jotka ovat konenäköjärjestelmille yleisiä. Työkalun kulumista mallinnusta tutkittiin useilla eri menetelmillä. Näihin kuuluivat muiden muassa neuroverkot ja tukivektoriregressio. Kokeet osoittivat, että tutkitut mallit pystyivät ennustamaan työkalun kulumisasteen käytetyn ajan perusteella. Parhaan tuloksen antoivat neuroverkot Bayesiläisellä regularisoinnilla.

Lexical Influences on Spoken Spondaic Word Recognition in Hearing-Impaired Patients.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Top-down contextual influences play a major part in speech understanding, especially in hearing-impaired patients with deteriorated auditory input. Those influences are most obvious in difficult listening situations, such as listening to sentences in noise but can also be observed at the word level under more favorable conditions, as in one of the most commonly used tasks in audiology, i.e., repeating isolated words in silence. This study aimed to explore the role of top-down contextual influences and their dependence on lexical factors and patient-specific factors using standard clinical linguistic material. Spondaic word perception was tested in 160 hearing-impaired patients aged 23-88 years with a four-frequency average pure-tone threshold ranging from 21 to 88 dB HL. Sixty spondaic words were randomly presented at a level adjusted to correspond to a speech perception score ranging between 40 and 70% of the performance intensity function obtained using monosyllabic words. Phoneme and whole-word recognition scores were used to calculate two context-influence indices (the j factor and the ratio of word scores to phonemic scores) and were correlated with linguistic factors, such as the phonological neighborhood density and several indices of word occurrence frequencies. Contextual influence was greater for spondaic words than in similar studies using monosyllabic words, with an overall j factor of 2.07 (SD = 0.5). For both indices, context use decreased with increasing hearing loss once the average hearing loss exceeded 55 dB HL. In right-handed patients, significantly greater context influence was observed for words presented in the right ears than for words presented in the left, especially in patients with many years of education. The correlations between raw word scores (and context influence indices) and word occurrence frequencies showed a significant age-dependent effect, with a stronger correlation between perception scores and word occurrence frequencies when the occurrence frequencies were based on the years corresponding to the patients' youth, showing a "historic" word frequency effect. This effect was still observed for patients with few years of formal education, but recent occurrence frequencies based on current word exposure had a stronger influence for those patients, especially for younger ones.

Model-based objects recognition in man-made environments

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We describe a model-based objects recognition system which is part of an image interpretation system intended to assist autonomous vehicles navigation. The system is intended to operate in man-made environments. Behavior-based navigation of autonomous vehicles involves the recognition of navigable areas and the potential obstacles. The recognition system integrates color, shape and texture information together with the location of the vanishing point. The recognition process starts from some prior scene knowledge, that is, a generic model of the expected scene and the potential objects. The recognition system constitutes an approach where different low-level vision techniques extract a multitude of image descriptors which are then analyzed using a rule-based reasoning system to interpret the image content. This system has been implemented using CEES, the C++ embedded expert system shell developed in the Systems Engineering and Automatic Control Laboratory (University of Girona) as a specific rule-based problem solving tool. It has been especially conceived for supporting cooperative expert systems, and uses the object oriented programming paradigm

«
1
2
...
41
42
43
44
45
46
47
...
60
61
»