865 results for sound source segregation


Relevance:

100.00% 100.00%

Publisher:

Abstract:

While humans can easily segregate and track a speaker's voice in a loud, noisy environment, most modern speech recognition systems still perform poorly in loud background noise. The computational principles behind auditory source segregation in humans are not yet fully understood. In this dissertation, we develop a computational model for source segregation inspired by auditory processing in the brain. To support the key principles behind the computational model, we conduct a series of electro-encephalography experiments using both simple tone-based stimuli and more natural speech stimuli. Most source segregation algorithms utilize some form of prior information about the target speaker or use more than one simultaneous recording of the noisy speech mixtures. Other methods develop models of the noise characteristics. Source segregation of simultaneous speech mixtures with a single microphone recording and no knowledge of the target speaker is still a challenge. Using the principle of temporal coherence, we develop a novel computational model that exploits the difference in the temporal evolution of features that belong to different sources to perform unsupervised monaural source segregation. While using no prior information about the target speaker, this method can gracefully incorporate knowledge about the target speaker to further enhance the segregation. Through a series of EEG experiments we collect neurological evidence to support the principle behind the model. Aside from its unusual structure and computational innovations, the proposed model provides testable hypotheses of the physiological mechanisms of the remarkable perceptual ability of humans to segregate acoustic sources, and of its psychophysical manifestations in navigating complex sensory environments.
Results from EEG experiments provide further insights into the assumptions behind the model and provide motivation for future single unit studies that can provide more direct evidence for the principle of temporal coherence.
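
The temporal-coherence principle described above can be illustrated with a toy sketch. This is not the dissertation's model: it only assumes that feature channels whose envelopes rise and fall together belong to the same source, and groups channels by their correlation with a chosen anchor channel. The function names, the 0.5 threshold and the synthetic envelopes are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def coherent_channels(envelopes, anchor, threshold=0.5):
    """Indices of channels whose envelope is temporally coherent with the anchor."""
    ref = envelopes[anchor]
    return [i for i, env in enumerate(envelopes) if pearson(ref, env) >= threshold]

# Toy data (assumed): channels 0 and 2 share a 4 Hz modulation (source A),
# channels 1 and 3 share a 7 Hz modulation (source B).
t = [k / 100 for k in range(200)]  # 2 s sampled at 100 Hz
src_a = [1 + math.sin(2 * math.pi * 4 * s) for s in t]
src_b = [1 + math.sin(2 * math.pi * 7 * s) for s in t]
envelopes = [src_a, src_b, [0.5 * v for v in src_a], [2.0 * v for v in src_b]]

group_a = coherent_channels(envelopes, anchor=0)
group_b = coherent_channels(envelopes, anchor=1)
```

Choosing the anchor channel is where prior knowledge about a target speaker could enter in such a scheme; without it, any channel can serve as a starting point.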

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Sound source localization (SSL) is an essential task in many applications involving speech capture and enhancement. As such, speaker localization with microphone arrays has received significant research attention. Nevertheless, existing SSL algorithms for small arrays still have two significant limitations: lack of range resolution, and accuracy degradation with increasing reverberation. The latter is natural and expected, given that strong reflections can have amplitudes similar to that of the direct signal, but different directions of arrival. Therefore, correctly modeling the room and compensating for the reflections should reduce the degradation due to reverberation. In this paper, we show a stronger result. If modeled correctly, early reflections can be used to provide more information about the source location than would have been available in an anechoic scenario. The modeling not only compensates for the reverberation, but also significantly increases resolution for range and elevation. Thus, we show that under certain conditions and limitations, reverberation can be used to improve SSL performance. Prior attempts to compensate for reverberation tried to model the room impulse response (RIR). However, RIRs change quickly with speaker position, and are nearly impossible to track accurately. Instead, we build a 3-D model of the room, which we use to predict early reflections, which are then incorporated into the SSL estimation. Simulation results with real and synthetic data show that even a simplistic room model is sufficient to produce significant improvements in range and elevation estimation, tasks which would be very difficult when relying only on direct path signal components.
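
One standard way to predict an early reflection from a geometric room model is the image-source construction: mirror the source across a wall and treat the mirrored point as a second source. The sketch below is an assumption-laden illustration rather than the paper's method; it handles a single wall at `x = wall_x`, takes the speed of sound as 343 m/s, and uses hypothetical function names and coordinates.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def mirror_across_x_wall(source, wall_x):
    """Image of a source reflected in a wall lying in the plane x = wall_x."""
    x, y, z = source
    return (2 * wall_x - x, y, z)

def propagation_delay(a, b):
    """Travel time in seconds between two points."""
    return math.dist(a, b) / SPEED_OF_SOUND

# Hypothetical geometry, in metres.
source = (1.0, 2.0, 1.5)
mic = (3.0, 2.0, 1.5)

image = mirror_across_x_wall(source, wall_x=0.0)
direct = propagation_delay(source, mic)     # direct path, 2 m
reflected = propagation_delay(image, mic)   # first-order reflection, 4 m
extra_delay = reflected - direct
```

Because the extra delay depends on the distance from the source to the wall, a predicted reflection carries range information that the direct path alone does not.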

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents a new verification procedure for sound source coverage according to ISO 140-5 requirements. The ISO 140-5 standard applies to the measurement of façade insulation and requires a sound source able to achieve a sufficiently uniform sound field in free field conditions on the façade under study. The proposed method involves the electroacoustic characterisation of the sound source in laboratory free field conditions (anechoic room) and the subsequent prediction by computer simulation of the free sound field radiated on a rectangular surface equal in size to the façade being measured. The loudspeaker is characterised in an anechoic room under laboratory controlled conditions, carefully measuring directivity, and then a computer model is designed to calculate the acoustic free field coverage for different loudspeaker positions and façade sizes. For each sound source position, the method provides the maximum direct acoustic level differences on a façade specimen and therefore determines whether the loudspeaker meets the maximum allowed level difference of 5 dB (or 10 dB for façade dimensions greater than 5 m) required by the ISO standard. Additionally, the maximum horizontal dimension of the façade meeting the standard is calculated and provided for each sound source position, both with the 5 dB and 10 dB criteria. In the last section of the paper, the proposed procedure is compared with another method used by the authors in the past to achieve the same purpose: in situ outdoor measurements attempting to recreate free field conditions. From this comparison, it is concluded that the proposed method reproduces the actual measurements with high accuracy. In addition, the ground reflection effect, which is difficult to avoid in the outdoor measurement method, at least at low frequencies, is fully eliminated with the proposed method, so the free field requirement is met.
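
The level-difference check can be sketched for the simplest possible source. The code below assumes an omnidirectional point source with spherical spreading, whereas the paper uses measured loudspeaker directivity, so it only illustrates the geometry of the criterion. The façade is assumed to lie in the plane y = 0, and the function names, grid step and positions are hypothetical.

```python
import math

def level_spread_db(source, width, height, step=0.1):
    """Max minus min direct level (dB) over a façade spanning [0, width] x [0, height] at y = 0."""
    nx = int(round(width / step))
    nz = int(round(height / step))
    dists = [
        math.dist(source, (i * step, 0.0, j * step))
        for i in range(nx + 1)
        for j in range(nz + 1)
    ]
    # Spherical spreading: level falls by 20*log10 of the distance ratio.
    return 20 * math.log10(max(dists) / min(dists))

def meets_criterion(spread_db, limit_db=5.0):
    """True if the spread is within the allowed level difference."""
    return spread_db <= limit_db

# 4 m x 3 m façade, source centred 5 m in front of it (assumed positions).
far_spread = level_spread_db((2.0, 5.0, 1.5), width=4.0, height=3.0)
# Same façade, source 1 m in front of one bottom corner.
near_spread = level_spread_db((0.0, 1.0, 0.0), width=4.0, height=3.0)
```

Passing `limit_db=10.0` to `meets_criterion` would apply the relaxed limit that the abstract quotes for façade dimensions greater than 5 m.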

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents a novel method for enabling a robot to determine the direction to a sound source through interacting with its environment. The method uses a new neural network, the Parameter-Less Self-Organizing Map algorithm, and reinforcement learning to achieve rapid and accurate response.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

SOUND OBJECTS IN TIME, SPACE AND ACTION
The term "sound object" describes an auditory experience that is associated with an acoustic event produced by a sound source. At cortical level, sound objects are represented by temporo-spatial activity patterns within distributed neural networks. This investigation concerns temporal, spatial and action aspects as assessed in normal subjects using electrical imaging or measurement of motor activity induced by transcranial magnetic stimulation (TMS).
Hearing the same sound again has been shown to facilitate behavioral responses (repetition priming) and to modulate neural activity (repetition suppression). In natural settings the same source is often heard again and again, with variations in spectro-temporal and spatial characteristics. I have investigated how such repeats influence response times in a living vs. non-living categorization task and the associated spatio-temporal patterns of brain activity in humans. Dynamic analysis of distributed source estimations revealed differential sound object representations within the auditory cortex as a function of the temporal history of exposure to these objects. Often heard sounds are coded by a modulation in a bilateral network. Recently heard sounds, independently of the number of previous exposures, are coded by a modulation of a left-sided network.
With sound objects which carry spatial information, I have investigated how spatial aspects of the repeats influence neural representations. Dynamic analyses of distributed source estimations revealed an ultra-rapid discrimination of sound objects which are characterized by spatial cues. This discrimination involved two temporo-spatially distinct cortical representations, one associated with position-independent and the other with position-linked representations within the auditory ventral/what stream.
Action-related sounds were shown to increase the excitability of motoneurons within the primary motor cortex, possibly via an input from the mirror neuron system. The role of motor representations remains unclear. I have investigated repetition priming-induced plasticity of the motor representations of action sounds with the measurement of motor activity induced by TMS pulses applied on the hand motor cortex. TMS delivered to the hand area within the primary motor cortex yielded larger motor evoked potentials (MEPs) while the subject was listening to sounds associated with manual than with non-manual actions. Repetition suppression was observed at motoneuron level, since during a repeated exposure to the same manual action sound the MEPs were smaller. I discuss these results in terms of a specialized neural network involved in sound processing, which is characterized by repetition-induced plasticity.
Thus, neural networks which underlie sound object representations are characterized by modulations which keep track of the temporal and spatial history of the sound and, in case of action related sounds, also of the way in which the sound is produced.
SOUND OBJECTS ACROSS TIME, SPACE AND ACTIONS
The term "sound object" describes an auditory experience associated with an acoustic event produced by a sound source. At the cortical level, sound objects are represented by activity patterns within distributed neural networks. This work addresses temporal, spatial and action-related aspects, assessed in healthy subjects using electrical imaging or measurements of motor activity induced by transcranial magnetic stimulation (TMS).
Hearing the same sound repeatedly facilitates the behavioral response (repetition priming) and modulates neural activity (repetition suppression). In natural settings, the same source is often heard several times, with variations in its spectro-temporal and spatial characteristics. I studied how these repetitions influence response time in a living vs. non-living categorization task, and the associated patterns of brain activity. Dynamic analyses of source estimations revealed differentiated representations of sound objects in the auditory cortex as a function of the history of exposure to these objects. Often heard sounds are coded by modulations of a bilateral network. Recently heard sounds are coded by modulations of a left-sided network, independently of the number of exposures. With sound objects carrying spatial information, I studied how the spatial aspects of repeated sounds influence neural representations. Dynamic analyses of source estimations revealed an ultra-rapid discrimination of sound objects characterized by spatial cues. This discrimination involves two temporally and spatially distinct cortical representations, one associated with position-independent representations and the other with position-linked representations. These representations are located in the auditory ventral "what" stream.
Action sounds increase the excitability of motoneurons in the primary motor cortex, possibly via an input from the mirror neuron system. The role of the motor representations of action sounds remains unclear. I studied the plasticity of motor representations induced by repetition priming using measurements of motor evoked potentials (MEPs) induced by TMS pulses over the hand motor cortex. TMS applied to the primary hand motor cortex produces larger MEPs while subjects listen to sounds associated with manual actions than with sounds of non-manual actions. Repetition suppression was observed at the motoneuron level, since the MEPs were smaller during repeated exposure to the sound of the same manual action. These results are discussed in terms of specialized neural networks involved in sound processing and characterized by repetition-induced plasticity. Thus, the neural networks underlying sound object representations are characterized by modulations that keep track of the temporal and spatial history of the sound and, in the case of action sounds, of the way in which the sound was produced.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Environmental sounds are highly complex stimuli whose recognition depends on the interaction of top-down and bottom-up processes in the brain. Their semantic representations were shown to yield repetition suppression effects, i.e., a decrease in activity during exposure to a sound that is perceived as belonging to the same source as a preceding sound. Making use of the high spatial resolution of 7T fMRI, we have investigated the representations of sound objects within early-stage auditory areas on the supratemporal plane. The primary auditory cortex was identified by means of tonotopic mapping and the non-primary areas by comparison with previous histological studies. Repeated presentations of different exemplars of the same sound source, as compared to the presentation of different sound sources, yielded significant repetition suppression effects within a subset of early-stage areas. This effect was found within the right hemisphere in primary areas A1 and R as well as two non-primary areas on the antero-medial part of the planum temporale, and within the left hemisphere in A1 and a non-primary area on the medial part of Heschl's gyrus. Thus several, but not all, early-stage auditory areas encode the meaning of environmental sounds.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

The characteristics of moving sound sources have strong implications for the listener's distance perception and the estimation of velocity. Changes to typical vehicle sound emissions, such as those currently occurring with the shift towards electromobility, affect pedestrian safety in road traffic. Thus, investigations of the relevant cues for velocity and distance perception of moving sound sources are not only of interest for the psychoacoustic community, but also for several applications, such as virtual reality, noise pollution and safety aspects of road traffic. This article describes a series of psychoacoustic experiments in this field. Dichotic and diotic stimuli of a set of real-life recordings taken from a passing passenger car and a motorcycle were presented to test subjects, who in turn were asked to determine the velocity of the object and its minimal distance from the listener. The results of these psychoacoustic experiments show that the estimated velocity is strongly linked to the object's distance. Furthermore, it could be shown that binaural cues contribute significantly to the perception of velocity. In a further experiment, it was shown that, independently of the type of vehicle, the main parameter for distance determination is the maximum sound pressure level at the listener's position. The article suggests a system architecture for the adequate consideration of moving sound sources in virtual auditory environments. Virtual environments can thus be used to investigate the influence of new vehicle powertrain concepts and the related sound emissions of these vehicles on the pedestrians' ability to estimate the distance and velocity of moving objects.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Barn owls can localize a sound source using either the map of auditory space contained in the optic tectum or the auditory forebrain. The auditory thalamus, nucleus ovoidalis (N.Ov), is situated between these two auditory areas, and its inactivation precludes the use of the auditory forebrain for sound localization. We examined the sources of inputs to the N.Ov as well as their patterns of termination within the nucleus. We also examined the response of single neurons within the N.Ov to tonal stimuli and sound localization cues. Afferents to the N.Ov originated with a diffuse population of neurons located bilaterally within the lateral shell, core, and medial shell subdivisions of the central nucleus of the inferior colliculus. Additional afferent input originated from the ipsilateral ventral nucleus of the lateral lemniscus. No afferent input was provided to the N.Ov from the external nucleus of the inferior colliculus or the optic tectum. The N.Ov was tonotopically organized with high frequencies represented dorsally and low frequencies ventrally. Although neurons in the N.Ov responded to localization cues, there was no apparent topographic mapping of these cues within the nucleus, in contrast to the tectal pathway. However, nearly all possible types of binaural response to sound localization cues were represented. These findings suggest that in the thalamo-telencephalic auditory pathway, sound localization is subserved by a nontopographic representation of auditory space.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Our current understanding of the sound-generating mechanism in the songbird vocal organ, the syrinx, is based on indirect evidence and theoretical treatments. The classical avian model of sound production postulates that the medial tympaniform membranes (MTM) are the principal sound generators. We tested the role of the MTM in sound generation and studied the songbird syrinx more directly by filming it endoscopically. After we surgically incapacitated the MTM as a vibratory source, zebra finches and cardinals were not only able to vocalize, but sang nearly normal song. This result shows clearly that the MTM are not the principal sound source. The endoscopic images of the intact songbird syrinx during spontaneous and brain stimulation-induced vocalizations illustrate the dynamics of syringeal reconfiguration before phonation and suggest a different model for sound production. Phonation is initiated by rostrad movement and stretching of the syrinx. At the same time, the syrinx is closed through movement of two soft tissue masses, the medial and lateral labia, into the bronchial lumen. Sound production always is accompanied by vibratory motions of both labia, indicating that these vibrations may be the sound source. However, because of the low temporal resolution of the imaging system, the frequency and phase of labial vibrations could not be assessed in relation to that of the generated sound. Nevertheless, in contrast to the previous model, these observations show that both labia contribute to aperture control and strongly suggest that they play an important role as principal sound generators.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

One of the most popular techniques for creating spatialized virtual sounds is based on the use of Head-Related Transfer Functions (HRTFs). HRTFs are signal processing models that represent the modifications undergone by the acoustic signal as it travels from a sound source to each of the listener's eardrums. These modifications are due to the interaction of the acoustic waves with the listener's torso, shoulders, head and pinnae, or outer ears. As such, HRTFs are somewhat different for each listener. For a listener to perceive synthesized 3-D sound cues correctly, the synthesized cues must be similar to the listener's own HRTFs. One can measure individual HRTFs using specialized recording systems; however, these systems are prohibitively expensive and restrict the portability of the 3-D sound system. HRTF-based systems also face several computational challenges. This dissertation presents an alternative method for the synthesis of binaural spatialized sounds. The sound entering the pinna undergoes several reflective, diffractive and resonant phenomena, which determine the HRTF. Using signal processing tools, such as Prony's signal modeling method, an appropriate set of time delays and a resonant frequency were used to approximate the measured Head-Related Impulse Responses (HRIRs). Statistical analysis was used to derive empirical equations describing how the reflections and resonances are determined by the shape and size of the pinna features obtained from 3D images of 15 experimental subjects modeled in the project. These equations were used to yield “Model HRTFs” that can create elevation effects. Listening tests conducted on 10 subjects show that these model HRTFs are 5% more effective than generic HRTFs when it comes to localizing sounds in the frontal plane.
The number of reversals (perception of sound source above the horizontal plane when actually it is below the plane and vice versa) was also reduced by 5.7%, showing the perceptual effectiveness of this approach. The model is simple, yet versatile because it relies on easy to measure parameters to create an individualized HRTF. This low-order parameterized model also reduces the computational and storage demands, while maintaining a sufficient number of perceptually relevant spectral cues.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Social structure is a key determinant of population biology and is central to the way animals exploit their environment. The risk of predation is often invoked as an important factor influencing the evolution of social structure in cetaceans and other mammals, but little direct information is available about how cetaceans actually respond to predators or other perceived threats. The playback of sounds to an animal is a powerful tool for assessing behavioral responses to predators, but quantifying behavioral responses to playback experiments requires baseline knowledge of normal behavioral patterns and variation. The central goal of my dissertation is to describe baseline foraging behavior for the western Atlantic short-finned pilot whales (Globicephala macrorhynchus) and examine the role of social organization in their response to predators. To accomplish this I used multi-sensor digital acoustic tags (DTAGs), satellite-linked time-depth recorders (SLTDR), and playback experiments to study foraging behavior and behavioral response to predators in pilot whales. Fine scale foraging strategies and population level patterns were identified by estimating the body size and examining the location and movement around feeding events using data collected with DTAGs deployed on 40 pilot whales in summers of 2008-2014 off the coast of Cape Hatteras, North Carolina. Pilot whales were found to forage throughout the water column and performed feeding buzzes at depths ranging from 29-1176 meters. The results indicated potential habitat segregation in foraging depth in short-finned pilot whales, with larger individuals foraging on average at deeper depths. Calculated aerobic dive limit for large adult males was approximately 6 minutes longer than that of females and likely facilitated the difference in foraging depth. Furthermore, the buzz frequency and speed around feeding attempts indicate that this population of pilot whales is likely targeting multiple small prey items.
Using these results, I built decision trees to inform foraging dive classification in coarse, long-term dive data collected with SLTDRs deployed on 6 pilot whales in the summers of 2014 and 2015 in the same area off the coast of North Carolina. I used these long term foraging records to compare diurnal foraging rates and depths, as well as classify bouts with a maximum likelihood method, and evaluate behavioral aerobic dive limits (ADLB) through examination of dive durations and inter-dive intervals. Dive duration was the best predictor of foraging, with dives >400.6 seconds classified as foraging, and a 96% classification accuracy. There were no diurnal patterns in foraging depth or rates, and average duration of bouts was 2.94 hours with maximum bout durations lasting up to 14 hours. The results indicated that pilot whales forage in relatively long bouts and the ADLB indicate that pilot whales rarely, if ever, exceed their aerobic limits. To evaluate the response to predators I used controlled playback experiments to examine the behavioral responses of 10 of the tagged short-finned pilot whales off Cape Hatteras, North Carolina and 4 Risso’s dolphins (Grampus griseus) off Southern California to the calls of mammal-eating killer whales (MEK). Both species responded to a subset of MEK calls with increased movement, swim speed and increased cohesion of the focal groups, but the two species exhibited different directional movement and vocal responses. Pilot whales increased their call rate and approached the sound source, but Risso’s dolphins exhibited no change in their vocal behavior and moved in a rapid, directed manner away from the source. Thus, at least to a subset of mammal-eating killer whale calls, these two study species reacted in a manner that is consistent with their patterns of social organization.
Pilot whales, which live in relatively permanent groups bound by strong social bonds, responded in a manner that built on their high levels of social cohesion. In contrast, Risso’s dolphins exhibited an exaggerated flight response and moved rapidly away from the sound source. The fact that both species responded strongly to a select number of MEK calls suggests that structural features of signals play critical contextual roles in the probability of response to potential threats in odontocete cetaceans.
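
The dive-duration rule reported above reduces to a single threshold, which can be written out as a sketch. Only the 400.6 second cut-off comes from the abstract; the function name and the sample durations are hypothetical, and the full decision trees described in the dissertation are not reproduced here.

```python
FORAGING_THRESHOLD_S = 400.6  # cut-off reported in the abstract

def classify_dive(duration_s):
    """Label a dive as foraging if it lasts longer than the reported threshold."""
    return "foraging" if duration_s > FORAGING_THRESHOLD_S else "non-foraging"

# Hypothetical dive durations in seconds.
durations = [120.0, 400.6, 455.2, 910.0]
labels = [classify_dive(d) for d in durations]
foraging_share = labels.count("foraging") / len(labels)
```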

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Sound localisation is defined as the ability to identify the position of a sound source. The brain employs two cues to achieve this functionality for the horizontal plane, interaural time difference (ITD) by means of neurons in the medial superior olive (MSO) and interaural intensity difference (IID) by neurons of the lateral superior olive (LSO), both located in the superior olivary complex of the auditory pathway. This paper presents spiking neuron architectures of the MSO and LSO. An implementation of the Jeffress model using spiking neurons is presented as a representation of the MSO, while a spiking neuron architecture showing how neurons of the medial nucleus of the trapezoid body interact with LSO neurons to determine the azimuthal angle is discussed. Experimental results to support this work are presented.
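
The coincidence-detection idea behind the Jeffress model can be sketched without spiking-neuron dynamics. In the toy code below each candidate lag plays the role of one coincidence detector along a delay line, and the lag with the most coincident spikes gives the ITD. This is an illustration under stated assumptions, not the paper's architecture: the spike times, the 10 µs time step, the 0.18 m ear spacing, the far-field formula and the sign convention (positive azimuth towards the left ear) are all assumed.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def spike_train(length, spike_times):
    """Binary spike train with ones at the given sample indices."""
    return [1 if i in spike_times else 0 for i in range(length)]

def best_lag(left, right, max_lag):
    """Lag in samples at which the right train best matches a delayed left train."""
    def coincidences(lag):
        return sum(
            1
            for i, s in enumerate(left)
            if s and 0 <= i + lag < len(right) and right[i + lag]
        )
    return max(range(-max_lag, max_lag + 1), key=coincidences)

def azimuth_deg(lag, dt=1e-5, ear_distance=0.18):
    """Far-field azimuth from the ITD; positive means towards the left ear."""
    s = lag * dt * SPEED_OF_SOUND / ear_distance
    s = max(-1.0, min(1.0, s))  # guard against ITDs beyond the physical maximum
    return math.degrees(math.asin(s))

# The right ear receives the same spikes three samples later than the left.
left = spike_train(64, {5, 20, 37, 50})
right = spike_train(64, {8, 23, 40, 53})

lag = best_lag(left, right, max_lag=10)
angle = azimuth_deg(lag)
```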

Relevance:

80.00% 80.00%

Publisher:

Abstract:

One of the major challenges in the development of an immersive system is handling the delay between the tracking of the user’s head position and the updated projection of a 3D image or auralised sound, also called end-to-end delay. Excessive end-to-end delay can result in a general decrease in the “feeling of presence”, the occurrence of motion sickness and poor performance in perception-action tasks. These latencies must be known in order to provide insights on the technological (hardware/software optimization) or psychophysical (recalibration sessions) strategies to deal with them. Our goal was to develop a new measurement method of end-to-end delay that is both precise and easily replicated. We used a Head and Torso simulator (HATS) as an auditory signal sensor, a fast response photo-sensor to detect a visual stimulus response from a Motion Capture System, and a voltage input trigger as real-time event. The HATS was mounted on a turntable which allowed us to precisely change the 3D sound relative to the head position. When the virtual sound source was at 90° azimuth, the corresponding HRTF would set all the intensity values to zero; at the same time, a trigger would register the real-time event of turning the HATS to 90° azimuth. Furthermore, with the HATS turned 90° to the left, the motion capture marker visualization would fall exactly on the photo-sensor receptor. This method allowed us to precisely measure the delay from tracking to displaying. Moreover, our results show that the method of tracking, its tracking frequency, and the rendering of the sound reflections are the main predictors of end-to-end delay.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The paper analyses and compares infrasonic and seismic data from snow avalanches monitored at the Vallée de la Sionne test site in Switzerland from 2009 to 2010. Using a combination of seismic and infrasound sensors, it is possible not only to detect a snow avalanche but also to distinguish between the different flow regimes and to analyse duration, average speed (for sections of the avalanche path) and avalanche size. The differing sensitivity of the seismic and infrasound sensors to the avalanche regimes is shown. Furthermore, the high amplitudes observed in the infrasound signal for one avalanche were modelled assuming that the suspension layer of the avalanche acts as a moving turbulent sound source. Our results show reproducibility for similar avalanches on the same avalanche path.