5 results for sound source segregation
at Duke University
Abstract:
The ability to isolate a single sound source among concurrent sources and reverberant energy is necessary for understanding the auditory world. The precedence effect describes a related experimental finding, that when presented with identical sounds from two locations with a short onset asynchrony (on the order of milliseconds), listeners report a single source with a location dominated by the lead sound. Single-cell recordings in multiple animal models have indicated that there are low-level mechanisms that may contribute to the precedence effect, yet psychophysical studies in humans have provided evidence that top-down cognitive processes have a great deal of influence on the perception of simulated echoes. In the present study, event-related potentials evoked by click pairs at and around listeners' echo thresholds indicate that perception of the lead and lag sound as individual sources elicits a negativity between 100 and 250 msec, previously termed the object-related negativity (ORN). Even for physically identical stimuli, the ORN is evident when listeners report hearing, as compared with not hearing, a second sound source. These results define a neural mechanism related to the conscious perception of multiple auditory objects.
Abstract:
Social structure is a key determinant of population biology and is central to the way animals exploit their environment. The risk of predation is often invoked as an important factor influencing the evolution of social structure in cetaceans and other mammals, but little direct information is available about how cetaceans actually respond to predators or other perceived threats. The playback of sounds to an animal is a powerful tool for assessing behavioral responses to predators, but quantifying behavioral responses to playback experiments requires baseline knowledge of normal behavioral patterns and variation. The central goal of my dissertation is to describe baseline foraging behavior for the western Atlantic short-finned pilot whales (Globicephala macrorhynchus) and examine the role of social organization in their response to predators. To accomplish this I used multi-sensor digital acoustic tags (DTAGs), satellite-linked time-depth recorders (SLTDR), and playback experiments to study foraging behavior and behavioral response to predators in pilot whales. Fine scale foraging strategies and population level patterns were identified by estimating the body size and examining the location and movement around feeding events using data collected with DTAGs deployed on 40 pilot whales in summers of 2008-2014 off the coast of Cape Hatteras, North Carolina. Pilot whales were found to forage throughout the water column and performed feeding buzzes at depths ranging from 29 to 1176 meters. The results indicated potential habitat segregation in foraging depth in short-finned pilot whales, with larger individuals foraging on average at deeper depths. Calculated aerobic dive limit for large adult males was approximately 6 minutes longer than that of females and likely facilitated the difference in foraging depth. Furthermore, the buzz frequency and speed around feeding attempts indicate that this population of pilot whales is likely targeting multiple small prey items. 
Using these results, I built decision trees to inform foraging dive classification in coarse, long-term dive data collected with SLTDRs deployed on 6 pilot whales in the summers of 2014 and 2015 in the same area off the coast of North Carolina. I used these long-term foraging records to compare diurnal foraging rates and depths, as well as classify bouts with a maximum likelihood method, and evaluate behavioral aerobic dive limits (ADLB) through examination of dive durations and inter-dive intervals. Dive duration was the best predictor of foraging, with dives >400.6 seconds classified as foraging, and a 96% classification accuracy. There were no diurnal patterns in foraging depth or rates, and the average duration of bouts was 2.94 hours, with maximum bout durations lasting up to 14 hours. The results indicated that pilot whales forage in relatively long bouts, and the ADLB indicate that pilot whales rarely, if ever, exceed their aerobic limits. To evaluate the response to predators I used controlled playback experiments to examine the behavioral responses of 10 of the tagged short-finned pilot whales off Cape Hatteras, North Carolina and 4 Risso’s dolphins (Grampus griseus) off Southern California to the calls of mammal-eating killer whales (MEK). Both species responded to a subset of MEK calls with increased movement, swim speed and increased cohesion of the focal groups, but the two species exhibited different directional movement and vocal responses. Pilot whales increased their call rate and approached the sound source, but Risso’s dolphins exhibited no change in their vocal behavior and moved in a rapid, directed manner away from the source. Thus, at least to a subset of mammal-eating killer whale calls, these two study species reacted in a manner that is consistent with their patterns of social organization. 
Pilot whales, which live in relatively permanent groups bound by strong social bonds, responded in a manner that built on their high levels of social cohesion. In contrast, Risso’s dolphins exhibited an exaggerated flight response and moved rapidly away from the sound source. The fact that both species responded strongly to a select number of MEK calls suggests that structural features of signals play critical contextual roles in the probability of response to potential threats in odontocete cetaceans.
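The dive-duration rule reported in this abstract (dives longer than 400.6 seconds classified as foraging) can be illustrated with a minimal sketch. Only the 400.6 s threshold comes from the abstract. The function names, example durations, and example labels are hypothetical, and the dissertation's actual decision trees may use additional predictors.

```python
# Threshold reported in the abstract; everything else here is illustrative.
FORAGING_THRESHOLD_S = 400.6


def is_foraging_dive(duration_s, threshold_s=FORAGING_THRESHOLD_S):
    """Classify a dive as foraging when its duration exceeds the threshold."""
    return duration_s > threshold_s


def classification_accuracy(durations_s, true_labels, threshold_s=FORAGING_THRESHOLD_S):
    """Fraction of dives whose predicted label matches the reference label."""
    if len(durations_s) != len(true_labels):
        raise ValueError("durations and labels must have the same length")
    if not durations_s:
        raise ValueError("at least one dive is required")
    correct = sum(
        is_foraging_dive(duration, threshold_s) == label
        for duration, label in zip(durations_s, true_labels)
    )
    return correct / len(durations_s)


# Hypothetical dive durations (seconds) and reference labels (True = foraging).
example_durations_s = [120.0, 380.0, 410.0, 650.0, 900.0]
example_labels = [False, False, True, True, True]
example_accuracy = classification_accuracy(example_durations_s, example_labels)
```

In practice the reference labels would come from the fine-scale DTAG records, where feeding buzzes identify foraging dives directly.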
Abstract:
Integrating information from multiple sources is a crucial function of the brain. Examples of such integration include multiple stimuli of different modalities, such as visual and auditory, multiple stimuli of the same modality, such as auditory and auditory, and integrating stimuli from the sensory organs (i.e., ears) with stimuli delivered from brain-machine interfaces.
The overall aim of this body of work is to empirically examine stimulus integration in these three domains to inform our broader understanding of how and when the brain combines information from multiple sources.
First, I examine visually-guided auditory learning, a problem with implications for the general problem in learning of how the brain determines what lesson to learn (and what lessons not to learn). For example, sound localization is a behavior that is partially learned with the aid of vision. This process requires correctly matching a visual location to that of a sound. This is an intrinsically circular problem when sound location is itself uncertain and the visual scene is rife with possible visual matches. Here, we develop a simple paradigm using visual guidance of sound localization to gain insight into how the brain confronts this type of circularity. We tested two competing hypotheses. 1: The brain guides sound location learning based on the synchrony or simultaneity of auditory-visual stimuli, potentially involving a Hebbian associative mechanism. 2: The brain uses a ‘guess and check’ heuristic in which visual feedback that is obtained after an eye movement to a sound alters future performance, perhaps by recruiting the brain’s reward-related circuitry. We assessed the effects of exposure to visual stimuli spatially mismatched from sounds on performance of an interleaved auditory-only saccade task. We found that when humans and monkeys were provided the visual stimulus asynchronously with the sound but as feedback to an auditory-guided saccade, they shifted their subsequent auditory-only performance toward the direction of the visual cue by 1.3-1.7 degrees, or 22-28% of the original 6 degree visual-auditory mismatch. In contrast, when the visual stimulus was presented synchronously with the sound but extinguished too quickly to provide this feedback, there was little change in subsequent auditory-only performance. Our results suggest that the outcome of our own actions is vital to localizing sounds correctly. Contrary to previous expectations, visual calibration of auditory space does not appear to require visual-auditory associations based on synchrony/simultaneity.
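The percentages quoted above follow directly from dividing the reported shift by the 6 degree mismatch. The short check below is only an arithmetic illustration; the function name is hypothetical and is not taken from the dissertation.

```python
def shift_as_percent_of_mismatch(shift_deg, mismatch_deg):
    """Express a localization shift as a percentage of the imposed mismatch."""
    if mismatch_deg == 0:
        raise ValueError("mismatch must be non-zero")
    return 100.0 * shift_deg / mismatch_deg


MISMATCH_DEG = 6.0  # visual-auditory mismatch stated in the abstract

# Lower and upper ends of the reported 1.3-1.7 degree shift.
low_percent = shift_as_percent_of_mismatch(1.3, MISMATCH_DEG)   # about 21.7
high_percent = shift_as_percent_of_mismatch(1.7, MISMATCH_DEG)  # about 28.3
```

Rounded to whole numbers, these give the 22-28% range stated in the abstract.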
My next line of research examines how electrical stimulation of the inferior colliculus influences perception of sounds in a nonhuman primate. The central nucleus of the inferior colliculus is the major ascending relay of auditory information: almost all auditory signals pass through it before reaching the forebrain. It is therefore an attractive target for understanding the format of the inputs to the forebrain and, by extension, the stimulus integration and auditory scene processing that occur in the brainstem.
Moreover, understanding the relationship between the auditory selectivity of neurons and their contribution to perception is critical to the design of effective auditory brain prosthetics. These prosthetics seek to mimic natural activity patterns to achieve desired perceptual outcomes. We measured the contribution of inferior colliculus (IC) sites to perception using combined recording and electrical stimulation. Monkeys performed a frequency-based discrimination task, reporting whether a probe sound was higher or lower in frequency than a reference sound. Stimulation pulses were paired with the probe sound on 50% of trials (0.5-80 µA, 100-300 Hz, n=172 IC locations in 3 rhesus monkeys). Electrical stimulation tended to bias the animals’ judgments in a fashion that was coarsely but significantly correlated with the best frequency of the stimulation site in comparison to the reference frequency employed in the task. Although there was considerable variability in the effects of stimulation (including impairments in performance and shifts in performance away from the direction predicted based on the site’s response properties), the results indicate that stimulation of the IC can evoke percepts correlated with the frequency tuning properties of the IC. Consistent with the implications of recent human studies, the main avenue for improvement for the auditory midbrain implant suggested by our findings is to increase the number and spatial extent of electrodes, to increase the size of the region that can be electrically activated and provide a greater range of evoked percepts.
My next line of research employs a frequency-tagging approach to examine the extent to which multiple sound sources are combined (or segregated) in the nonhuman primate inferior colliculus. In the single-sound case, most inferior colliculus neurons respond and entrain to sounds in a very broad region of space, and many are entirely spatially insensitive, so it is unknown how the neurons will respond to a situation with more than one sound. I use multiple AM stimuli of different frequencies, which the inferior colliculus represents using a spike timing code. This allows me to measure spike timing in the inferior colliculus to determine which sound source is responsible for neural activity in an auditory scene containing multiple sounds. Using this approach, I find that the same neurons that are tuned to broad regions of space in the single sound condition become dramatically more selective in the dual sound condition, preferentially entraining spikes to stimuli from a smaller region of space. I will examine the possibility that there may be a conceptual linkage between this finding and the finding of receptive field shifts in the visual system.
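One common way to quantify how strongly spikes entrain to an amplitude-modulated (AM) sound is vector strength, computed separately at each tagged modulation frequency. The sketch below is an assumption about how such an analysis could look, not the dissertation's actual method. The 20 Hz and 28 Hz modulation frequencies and the synthetic spike train are hypothetical.

```python
import cmath
import math


def vector_strength(spike_times_s, mod_freq_hz):
    """Vector strength of spikes relative to one modulation frequency.

    Returns 1.0 for perfect phase locking and values near 0.0 when spike
    phases are spread evenly around the modulation cycle.
    """
    if not spike_times_s:
        return 0.0
    phasor_sum = sum(
        cmath.exp(2j * math.pi * mod_freq_hz * t) for t in spike_times_s
    )
    return abs(phasor_sum) / len(spike_times_s)


# Synthetic spike train: one spike per cycle of a hypothetical 20 Hz AM sound.
spike_times_s = [k / 20.0 for k in range(40)]

vs_sound_a = vector_strength(spike_times_s, 20.0)  # frequency tag of sound A
vs_sound_b = vector_strength(spike_times_s, 28.0)  # frequency tag of sound B

# The sound whose tag yields the larger vector strength is the one this
# synthetic neuron is entrained to.
dominant_sound = "A" if vs_sound_a > vs_sound_b else "B"
```

Because each sound carries its own modulation frequency, comparing the two values indicates which source is driving the spike timing when both sounds are present.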
In chapter 5, I will comment on these findings more generally, compare them to existing theoretical models, and discuss what these results tell us about processing in the central nervous system in a multi-stimulus situation. My results suggest that the brain is flexible in its processing and can adapt its integration schema to fit the available cues and the demands of the task.
Abstract:
Marine mammals exploit the efficiency of sound propagation in the marine environment for essential activities like communication and navigation. For this reason, passive acoustics has particularly high potential for marine mammal studies, especially those aimed at population management and conservation. Despite the rapid realization of this potential through a growing number of studies, much crucial information remains unknown or poorly understood. This research attempts to address two key knowledge gaps, using the well-studied bottlenose dolphin (Tursiops truncatus) as a model species, and underwater acoustic recordings collected on four fixed autonomous sensors deployed at multiple locations in Sarasota Bay, Florida, between September 2012 and August 2013. Underwater noise can hinder dolphin communication. The ability of these animals to overcome this obstacle was examined using recorded noise and dolphin whistles. I found that bottlenose dolphins are able to compensate for increased noise in their environment using a wide range of strategies employed in a singular fashion or in various combinations, depending on the frequency content of the noise, noise source, and time of day. These strategies include modifying whistle frequency characteristics, increasing whistle duration, and increasing whistle redundancy. Recordings were also used to evaluate the performance of six recently developed passive acoustic abundance estimation methods, by comparing their results to the true abundance of animals, obtained via a census conducted within the same area and time period. The methods employed were broadly divided into two categories: those involving direct counts of animals, and those involving counts of cues (signature whistles). The animal-based methods were traditional capture-recapture, spatially explicit capture-recapture (SECR), and an approach that blends the “snapshot” method and mark-recapture distance sampling, referred to here as SMRDS. 
The cue-based methods were conventional distance sampling (CDS), an acoustic modeling approach involving the use of the passive sonar equation, and SECR. In the latter approach, detection probability was modelled as a function of sound transmission loss, rather than the Euclidean distance typically used. Of these methods, while SMRDS produced the most accurate estimate, SECR demonstrated the greatest potential for broad applicability to other species and locations, with minimal to no auxiliary data, such as distance from sound source to detector(s), which is often difficult to obtain. This was especially true when this method was compared to traditional capture-recapture results, which greatly underestimated abundance, despite attempts to account for major unmodelled heterogeneity. Furthermore, the incorporation of non-Euclidean distance significantly improved model accuracy. The acoustic modelling approach performed similarly to CDS, but both methods also strongly underestimated abundance. In particular, CDS proved to be inefficient. This approach requires at least 3 sensors for localization at a single point. It was also difficult to obtain accurate distances, and the sample size was greatly reduced by the failure to detect some whistles on all three recorders. As a result, this approach is not recommended for marine mammal abundance estimation when few recorders are available, or in high sound attenuation environments with relatively low sample sizes. It is hoped that these results lead to more informed management decisions, and therefore, more effective species conservation.
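The passive sonar equation mentioned among the cue-based methods relates source level, transmission loss, and noise level to the signal-to-noise ratio at a recorder. The sketch below is a simplified illustration under stated assumptions: spherical spreading with no absorption, no array gain, and hypothetical source, noise, and threshold values. It is not the acoustic model used in the dissertation.

```python
import math


def spherical_transmission_loss_db(range_m):
    """Transmission loss in dB, assuming spherical spreading only."""
    if range_m < 1.0:
        raise ValueError("range must be at least 1 m (the reference distance)")
    return 20.0 * math.log10(range_m)


def received_snr_db(source_level_db, noise_level_db, range_m):
    """Simplified passive sonar equation: SNR = SL - TL - NL."""
    return (
        source_level_db
        - spherical_transmission_loss_db(range_m)
        - noise_level_db
    )


def is_detected(source_level_db, noise_level_db, range_m, detection_threshold_db):
    """A whistle counts as detected when its SNR meets the threshold."""
    snr_db = received_snr_db(source_level_db, noise_level_db, range_m)
    return snr_db >= detection_threshold_db


# Hypothetical values: 160 dB source level, 90 dB noise level, 10 dB threshold.
snr_at_100_m = received_snr_db(160.0, 90.0, 100.0)
detected_at_100_m = is_detected(160.0, 90.0, 100.0, 10.0)
detected_at_5_km = is_detected(160.0, 90.0, 5000.0, 10.0)
```

Expressing detectability through transmission loss rather than straight-line distance is the same idea the abstract describes for the SECR detection function, although the dissertation's model will differ in its details.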
Abstract:
OBJECTIVES: In natural hearing, cochlear mechanical compression is dynamically adjusted via the efferent medial olivocochlear reflex (MOCR). These adjustments probably help understanding speech in noisy environments and are not available to the users of current cochlear implants (CIs). The aims of the present study are to: (1) present a binaural CI sound processing strategy inspired by the control of cochlear compression provided by the contralateral MOCR in natural hearing; and (2) assess the benefits of the new strategy for understanding speech presented in competition with steady noise with a speech-like spectrum in various spatial configurations of the speech and noise sources. DESIGN: Pairs of CI sound processors (one per ear) were constructed to mimic or not mimic the effects of the contralateral MOCR on compression. For the nonmimicking condition (standard strategy or STD), the two processors in a pair functioned similarly to standard clinical processors (i.e., with fixed back-end compression and independently of each other). When configured to mimic the effects of the MOCR (MOC strategy), the two processors communicated with each other and the amount of back-end compression in a given frequency channel of each processor in the pair decreased/increased dynamically (so that output levels dropped/increased) with increases/decreases in the output energy from the corresponding frequency channel in the contralateral processor. Speech reception thresholds in speech-shaped noise were measured for 3 bilateral CI users and 2 single-sided deaf unilateral CI users. Thresholds were compared for the STD and MOC strategies in unilateral and bilateral listening conditions and for three spatial configurations of the speech and noise sources in simulated free-field conditions: speech and noise sources colocated in front of the listener, speech on the left ear with noise in front of the listener, and speech on the left ear with noise on the right ear. 
In both bilateral and unilateral listening, the electrical stimulus delivered to the test ear(s) was always calculated as if the listeners were wearing bilateral processors. RESULTS: In both unilateral and bilateral listening conditions, mean speech reception thresholds were comparable with the two strategies for colocated speech and noise sources, but were at least 2 dB lower (better) with the MOC than with the STD strategy for spatially separated speech and noise sources. In unilateral listening conditions, mean thresholds improved with increasing the spatial separation between the speech and noise sources regardless of the strategy but the improvement was significantly greater with the MOC strategy. In bilateral listening conditions, thresholds improved significantly with increasing the speech-noise spatial separation only with the MOC strategy. CONCLUSIONS: The MOC strategy (1) significantly improved the intelligibility of speech presented in competition with a spatially separated noise source, both in unilateral and bilateral listening conditions; (2) produced significant spatial release from masking in bilateral listening conditions, something that did not occur with fixed compression; and (3) enhanced spatial release from masking in unilateral listening conditions. The MOC strategy as implemented here, or a modified version of it, may be usefully applied in CIs and in hearing aids.
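The coupling described in this abstract, where compression in one processor's channel is reduced as output energy in the matching contralateral channel rises, can be sketched as follows. This is an illustration only. The logarithmic compression function, the linear mapping from contralateral energy to the compression parameter, and all numeric values are assumptions, not the published implementation.

```python
import math

# Hypothetical bounds for the compression parameter (larger = more compression).
C_MAX = 1000.0
C_MIN = 10.0


def compress(envelope, c):
    """Logarithmic back-end compression of a normalized envelope in [0, 1]."""
    if not 0.0 <= envelope <= 1.0:
        raise ValueError("envelope must be normalized to [0, 1]")
    return math.log(1.0 + c * envelope) / math.log(1.0 + c)


def compression_parameter(contralateral_energy):
    """Reduce compression linearly as contralateral output energy increases."""
    energy = min(max(contralateral_energy, 0.0), 1.0)
    return C_MAX - (C_MAX - C_MIN) * energy


def moc_channel_output(envelope, contralateral_energy):
    """Output of one frequency channel given the contralateral channel's energy."""
    return compress(envelope, compression_parameter(contralateral_energy))


# Same input envelope, with a quiet versus a loud contralateral channel.
output_quiet_contra = moc_channel_output(0.1, 0.0)
output_loud_contra = moc_channel_output(0.1, 0.8)
```

With these assumed values, the channel output drops when the contralateral channel is loud, which matches the direction of the effect described in the abstract.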