4 resultados para auditory scene analysis

em Duke University


Relevância:

90.00% 90.00%

Publicador:

Resumo:

Integrating information from multiple sources is a crucial function of the brain. Examples of such integration include multiple stimuli of different modalties, such as visual and auditory, multiple stimuli of the same modality, such as auditory and auditory, and integrating stimuli from the sensory organs (i.e. ears) with stimuli delivered from brain-machine interfaces.

The overall aim of this body of work is to empirically examine stimulus integration in these three domains to inform our broader understanding of how and when the brain combines information from multiple sources.

First, I examine visually-guided auditory, a problem with implications for the general problem in learning of how the brain determines what lesson to learn (and what lessons not to learn). For example, sound localization is a behavior that is partially learned with the aid of vision. This process requires correctly matching a visual location to that of a sound. This is an intrinsically circular problem when sound location is itself uncertain and the visual scene is rife with possible visual matches. Here, we develop a simple paradigm using visual guidance of sound localization to gain insight into how the brain confronts this type of circularity. We tested two competing hypotheses. 1: The brain guides sound location learning based on the synchrony or simultaneity of auditory-visual stimuli, potentially involving a Hebbian associative mechanism. 2: The brain uses a ‘guess and check’ heuristic in which visual feedback that is obtained after an eye movement to a sound alters future performance, perhaps by recruiting the brain’s reward-related circuitry. We assessed the effects of exposure to visual stimuli spatially mismatched from sounds on performance of an interleaved auditory-only saccade task. We found that when humans and monkeys were provided the visual stimulus asynchronously with the sound but as feedback to an auditory-guided saccade, they shifted their subsequent auditory-only performance toward the direction of the visual cue by 1.3-1.7 degrees, or 22-28% of the original 6 degree visual-auditory mismatch. In contrast when the visual stimulus was presented synchronously with the sound but extinguished too quickly to provide this feedback, there was little change in subsequent auditory-only performance. Our results suggest that the outcome of our own actions is vital to localizing sounds correctly. Contrary to previous expectations, visual calibration of auditory space does not appear to require visual-auditory associations based on synchrony/simultaneity.

My next line of research examines how electrical stimulation of the inferior colliculus influences perception of sounds in a nonhuman primate. The central nucleus of the inferior colliculus is the major ascending relay of auditory information before it reaches the forebrain, and thus an ideal target for understanding low-level information processing prior to the forebrain, as almost all auditory signals pass through the central nucleus of the inferior colliculus before reaching the forebrain. Thus, the inferior colliculus is the ideal structure to examine to understand the format of the inputs into the forebrain and, by extension, the processing of auditory scenes that occurs in the brainstem. Therefore, the inferior colliculus was an attractive target for understanding stimulus integration in the ascending auditory pathway.

Moreover, understanding the relationship between the auditory selectivity of neurons and their contribution to perception is critical to the design of effective auditory brain prosthetics. These prosthetics seek to mimic natural activity patterns to achieve desired perceptual outcomes. We measured the contribution of inferior colliculus (IC) sites to perception using combined recording and electrical stimulation. Monkeys performed a frequency-based discrimination task, reporting whether a probe sound was higher or lower in frequency than a reference sound. Stimulation pulses were paired with the probe sound on 50% of trials (0.5-80 µA, 100-300 Hz, n=172 IC locations in 3 rhesus monkeys). Electrical stimulation tended to bias the animals’ judgments in a fashion that was coarsely but significantly correlated with the best frequency of the stimulation site in comparison to the reference frequency employed in the task. Although there was considerable variability in the effects of stimulation (including impairments in performance and shifts in performance away from the direction predicted based on the site’s response properties), the results indicate that stimulation of the IC can evoke percepts correlated with the frequency tuning properties of the IC. Consistent with the implications of recent human studies, the main avenue for improvement for the auditory midbrain implant suggested by our findings is to increase the number and spatial extent of electrodes, to increase the size of the region that can be electrically activated and provide a greater range of evoked percepts.

My next line of research employs a frequency-tagging approach to examine the extent to which multiple sound sources are combined (or segregated) in the nonhuman primate inferior colliculus. In the single-sound case, most inferior colliculus neurons respond and entrain to sounds in a very broad region of space, and many are entirely spatially insensitive, so it is unknown how the neurons will respond to a situation with more than one sound. I use multiple AM stimuli of different frequencies, which the inferior colliculus represents using a spike timing code. This allows me to measure spike timing in the inferior colliculus to determine which sound source is responsible for neural activity in an auditory scene containing multiple sounds. Using this approach, I find that the same neurons that are tuned to broad regions of space in the single sound condition become dramatically more selective in the dual sound condition, preferentially entraining spikes to stimuli from a smaller region of space. I will examine the possibility that there may be a conceptual linkage between this finding and the finding of receptive field shifts in the visual system.

In chapter 5, I will comment on these findings more generally, compare them to existing theoretical models, and discuss what these results tell us about processing in the central nervous system in a multi-stimulus situation. My results suggest that the brain is flexible in its processing and can adapt its integration schema to fit the available cues and the demands of the task.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This dissertation studies the coding strategies of computational imaging to overcome the limitation of conventional sensing techniques. The information capacity of conventional sensing is limited by the physical properties of optics, such as aperture size, detector pixels, quantum efficiency, and sampling rate. These parameters determine the spatial, depth, spectral, temporal, and polarization sensitivity of each imager. To increase sensitivity in any dimension can significantly compromise the others.

This research implements various coding strategies subject to optical multidimensional imaging and acoustic sensing in order to extend their sensing abilities. The proposed coding strategies combine hardware modification and signal processing to exploiting bandwidth and sensitivity from conventional sensors. We discuss the hardware architecture, compression strategies, sensing process modeling, and reconstruction algorithm of each sensing system.

Optical multidimensional imaging measures three or more dimensional information of the optical signal. Traditional multidimensional imagers acquire extra dimensional information at the cost of degrading temporal or spatial resolution. Compressive multidimensional imaging multiplexes the transverse spatial, spectral, temporal, and polarization information on a two-dimensional (2D) detector. The corresponding spectral, temporal and polarization coding strategies adapt optics, electronic devices, and designed modulation techniques for multiplex measurement. This computational imaging technique provides multispectral, temporal super-resolution, and polarization imaging abilities with minimal loss in spatial resolution and noise level while maintaining or gaining higher temporal resolution. The experimental results prove that the appropriate coding strategies may improve hundreds times more sensing capacity.

Human auditory system has the astonishing ability in localizing, tracking, and filtering the selected sound sources or information from a noisy environment. Using engineering efforts to accomplish the same task usually requires multiple detectors, advanced computational algorithms, or artificial intelligence systems. Compressive acoustic sensing incorporates acoustic metamaterials in compressive sensing theory to emulate the abilities of sound localization and selective attention. This research investigates and optimizes the sensing capacity and the spatial sensitivity of the acoustic sensor. The well-modeled acoustic sensor allows localizing multiple speakers in both stationary and dynamic auditory scene; and distinguishing mixed conversations from independent sources with high audio recognition rate.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Bayesian nonparametric models, such as the Gaussian process and the Dirichlet process, have been extensively applied for target kinematics modeling in various applications including environmental monitoring, traffic planning, endangered species tracking, dynamic scene analysis, autonomous robot navigation, and human motion modeling. As shown by these successful applications, Bayesian nonparametric models are able to adjust their complexities adaptively from data as necessary, and are resistant to overfitting or underfitting. However, most existing works assume that the sensor measurements used to learn the Bayesian nonparametric target kinematics models are obtained a priori or that the target kinematics can be measured by the sensor at any given time throughout the task. Little work has been done for controlling the sensor with bounded field of view to obtain measurements of mobile targets that are most informative for reducing the uncertainty of the Bayesian nonparametric models. To present the systematic sensor planning approach to leaning Bayesian nonparametric models, the Gaussian process target kinematics model is introduced at first, which is capable of describing time-invariant spatial phenomena, such as ocean currents, temperature distributions and wind velocity fields. The Dirichlet process-Gaussian process target kinematics model is subsequently discussed for modeling mixture of mobile targets, such as pedestrian motion patterns.

Novel information theoretic functions are developed for these introduced Bayesian nonparametric target kinematics models to represent the expected utility of measurements as a function of sensor control inputs and random environmental variables. A Gaussian process expected Kullback Leibler divergence is developed as the expectation of the KL divergence between the current (prior) and posterior Gaussian process target kinematics models with respect to the future measurements. Then, this approach is extended to develop a new information value function that can be used to estimate target kinematics described by a Dirichlet process-Gaussian process mixture model. A theorem is proposed that shows the novel information theoretic functions are bounded. Based on this theorem, efficient estimators of the new information theoretic functions are designed, which are proved to be unbiased with the variance of the resultant approximation error decreasing linearly as the number of samples increases. Computational complexities for optimizing the novel information theoretic functions under sensor dynamics constraints are studied, and are proved to be NP-hard. A cumulative lower bound is then proposed to reduce the computational complexity to polynomial time.

Three sensor planning algorithms are developed according to the assumptions on the target kinematics and the sensor dynamics. For problems where the control space of the sensor is discrete, a greedy algorithm is proposed. The efficiency of the greedy algorithm is demonstrated by a numerical experiment with data of ocean currents obtained by moored buoys. A sweep line algorithm is developed for applications where the sensor control space is continuous and unconstrained. Synthetic simulations as well as physical experiments with ground robots and a surveillance camera are conducted to evaluate the performance of the sweep line algorithm. Moreover, a lexicographic algorithm is designed based on the cumulative lower bound of the novel information theoretic functions, for the scenario where the sensor dynamics are constrained. Numerical experiments with real data collected from indoor pedestrians by a commercial pan-tilt camera are performed to examine the lexicographic algorithm. Results from both the numerical simulations and the physical experiments show that the three sensor planning algorithms proposed in this dissertation based on the novel information theoretic functions are superior at learning the target kinematics with

little or no prior knowledge

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Remembering past events - or episodic retrieval - consists of several components. There is evidence that mental imagery plays an important role in retrieval and that the brain regions supporting imagery overlap with those supporting retrieval. An open issue is to what extent these regions support successful vs. unsuccessful imagery and retrieval processes. Previous studies that examined regional overlap between imagery and retrieval used uncontrolled memory conditions, such as autobiographical memory tasks, that cannot distinguish between successful and unsuccessful retrieval. A second issue is that fMRI studies that compared imagery and retrieval have used modality-aspecific cues that are likely to activate auditory and visual processing regions simultaneously. Thus, it is not clear to what extent identified brain regions support modality-specific or modality-independent imagery and retrieval processes. In the current fMRI study, we addressed this issue by comparing imagery to retrieval under controlled memory conditions in both auditory and visual modalities. We also obtained subjective measures of imagery quality allowing us to dissociate regions contributing to successful vs. unsuccessful imagery. Results indicated that auditory and visual regions contribute both to imagery and retrieval in a modality-specific fashion. In addition, we identified four sets of brain regions with distinct patterns of activity that contributed to imagery and retrieval in a modality-independent fashion. The first set of regions, including hippocampus, posterior cingulate cortex, medial prefrontal cortex and angular gyrus, showed a pattern common to imagery/retrieval and consistent with successful performance regardless of task. The second set of regions, including dorsal precuneus, anterior cingulate and dorsolateral prefrontal cortex, also showed a pattern common to imagery and retrieval, but consistent with unsuccessful performance during both tasks. Third, left ventrolateral prefrontal cortex showed an interaction between task and performance and was associated with successful imagery but unsuccessful retrieval. Finally, the fourth set of regions, including ventral precuneus, midcingulate cortex and supramarginal gyrus, showed the opposite interaction, supporting unsuccessful imagery, but successful retrieval performance. Results are discussed in relation to reconstructive, attentional, semantic memory, and working memory processes. This is the first study to separate the neural correlates of successful and unsuccessful performance for both imagery and retrieval and for both auditory and visual modalities.