991 results for sound source segregation


Relevance:

100.00%

Publisher:

Abstract:

While humans can easily segregate and track a speaker's voice in a loud, noisy environment, most modern speech recognition systems still perform poorly in loud background noise. The computational principles behind auditory source segregation in humans are not yet fully understood. In this dissertation, we develop a computational model for source segregation inspired by auditory processing in the brain. To support the key principles behind the computational model, we conduct a series of electroencephalography (EEG) experiments using both simple tone-based stimuli and more natural speech stimuli. Most source segregation algorithms utilize some form of prior information about the target speaker or use more than one simultaneous recording of the noisy speech mixture. Other methods build models of the noise characteristics. Source segregation of simultaneous speech mixtures from a single microphone recording, with no knowledge of the target speaker, remains a challenge. Using the principle of temporal coherence, we develop a novel computational model that exploits the difference in the temporal evolution of features belonging to different sources to perform unsupervised monaural source segregation. While using no prior information about the target speaker, the method can gracefully incorporate knowledge about the target speaker to further enhance the segregation. Through a series of EEG experiments, we collect neurological evidence to support the principle behind the model. Aside from its unusual structure and computational innovations, the proposed model provides testable hypotheses about the physiological mechanisms underlying the remarkable perceptual ability of humans to segregate acoustic sources, and about its psychophysical manifestations in navigating complex sensory environments. Results from the EEG experiments provide further insight into the assumptions behind the model and motivate future single-unit studies that could provide more direct evidence for the principle of temporal coherence.
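
As a loose illustration of the temporal coherence principle described above (a minimal sketch with assumed inputs, not the dissertation's model), channels of a time-frequency representation whose envelopes rise and fall together can be grouped into one stream:

```python
# Minimal temporal-coherence grouping sketch: channels whose temporal envelopes
# co-vary are assigned to the same source. Inputs and thresholds are illustrative.
import numpy as np

def coherence_grouping(spectrogram, anchor=None):
    """spectrogram: array (n_channels, n_frames) of non-negative channel envelopes."""
    env = spectrogram - spectrogram.mean(axis=1, keepdims=True)
    env /= np.linalg.norm(env, axis=1, keepdims=True) + 1e-12
    coherence = env @ env.T                               # pairwise envelope correlation
    if anchor is None:
        anchor = int(np.argmax(spectrogram.sum(axis=1)))  # most energetic channel
    mask = coherence[anchor] > 0.0                        # channels coherent with the anchor
    return mask, coherence

# Usage: foreground = spectrogram * mask[:, None]; background = spectrogram * ~mask[:, None]
```

In this sketch, prior knowledge of the target speaker would simply bias the choice of anchor channels rather than being required up front.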

Relevance:

100.00%

Publisher:

Abstract:

Multiple sound sources often contain harmonics that overlap and may be degraded by environmental noise. The auditory system is capable of teasing apart these sources into distinct mental objects, or streams. Such an "auditory scene analysis" enables the brain to solve the cocktail party problem. A neural network model of auditory scene analysis, called the AIRSTREAM model, is presented to propose how the brain accomplishes this feat. The model clarifies how the frequency components that correspond to a given acoustic source may be coherently grouped together into distinct streams based on pitch and spatial cues. The model also clarifies how multiple streams may be distinguished and separated by the brain. Streams are formed as spectral-pitch resonances that emerge through feedback interactions between frequency-specific spectral representations of a sound source and its pitch. First, the model transforms a sound into a spatial pattern of frequency-specific activation across a spectral stream layer. The sound has multiple parallel representations at this layer. A sound's spectral representation activates a bottom-up filter that is sensitive to harmonics of the sound's pitch. The filter activates a pitch category which, in turn, activates a top-down expectation that allows one voice or instrument to be tracked through a noisy multiple-source environment. Spectral components are suppressed if they do not match harmonics of the top-down expectation that is read out by the selected pitch, thereby allowing another stream to capture these components, as in the "old-plus-new heuristic" of Bregman. Multiple simultaneously occurring spectral-pitch resonances can hereby emerge. These resonance and matching mechanisms are specialized versions of Adaptive Resonance Theory, or ART, which clarifies how pitch representations can self-organize during learning of harmonic bottom-up filters and top-down expectations. The model also clarifies how spatial location cues can help to disambiguate two sources with similar spectral cues. Data are simulated from psychophysical grouping experiments, such as how a tone sweeping upwards in frequency creates a bounce percept by grouping with a downward-sweeping tone due to proximity in frequency, even if noise replaces the tones at their intersection point. Illusory auditory percepts are also simulated, such as the auditory continuity illusion of a tone continuing through a noise burst even if the tone is not present during the noise, and the scale illusion of Deutsch, whereby downward and upward scales presented alternately to the two ears are regrouped based on frequency proximity, leading to a bounce percept. Since related sorts of resonances have been used to quantitatively simulate psychophysical data about speech perception, the model strengthens the hypothesis that ART-like mechanisms are used at multiple levels of the auditory system. Proposals for developing the model to explain more complex streaming data are also provided.
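
The bottom-up harmonic filter and top-down expectation described above can be caricatured by a simple harmonic sieve; the sketch below is an assumed simplification for illustration, not the AIRSTREAM implementation:

```python
# Toy harmonic-sieve sketch of the bottom-up/top-down grouping idea: components that
# match harmonics of the winning pitch form one stream, the remaining components are
# released to be captured by another stream. Pitch grid and tolerance are illustrative.
import numpy as np

def group_by_pitch(component_freqs, candidate_pitches, tol=0.03):
    freqs = np.asarray(component_freqs, dtype=float)
    best_pitch, best_score = None, -1
    for f0 in candidate_pitches:                        # bottom-up harmonic filter
        n = np.round(freqs / f0)
        ok = n >= 1
        mismatch = np.abs(freqs - n * f0) / f0
        score = np.sum(ok & (mismatch < tol))
        if score > best_score:
            best_pitch, best_score = f0, score
    n = np.round(freqs / best_pitch)                    # top-down expectation read-out
    matched = (n >= 1) & (np.abs(freqs - n * best_pitch) / best_pitch < tol)
    return best_pitch, freqs[matched], freqs[~matched]

# Example: group_by_pitch([220, 440, 500, 660, 750], candidate_pitches=np.arange(80, 400, 1.0))
```

In ART terms, the mismatch test stands in for the top-down matching step: components that fail it are released so that another spectral-pitch resonance can capture them.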

Relevance:

100.00%

Publisher:

Abstract:

A new algorithm based on a signal subspace approach is proposed for localizing a sound source in shallow water. In the first instance we assume an ideal channel with plane parallel boundaries and known reflection properties. The sound source is assumed to emit a broadband stationary stochastic signal. The algorithm takes into account the spatial distribution of all images and the reflection characteristics of the sea bottom. It is shown that both the range and depth of a source can be measured accurately with the help of a vertical array of sensors. For good results the number of sensors should be greater than the number of significant images; however, localization is possible even with a smaller array, but at the cost of higher side lobes. Next, we allow the channel to be stochastically perturbed; this results in random phase errors in the reflection coefficients. The most significant effect of the phase errors is to introduce into the spectral matrix an extra term which may be regarded as signal-generated coloured noise. It is shown through computer simulations that the signal peak height is reduced considerably as a consequence of the random phase errors.
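
A hedged sketch of the signal subspace idea with a vertical array is given below, assuming for brevity only the direct path and the surface image with reflection coefficient -1, whereas the paper includes all significant images and the bottom reflection characteristics:

```python
# MUSIC-style range/depth search over a shallow-water channel with a vertical array.
# Steering vectors combine the direct path and the pressure-release surface image only;
# geometry, wavenumber, and the spectral matrix R are assumed inputs.
import numpy as np

def steering_vector(sensor_depths, src_range, src_depth, k):
    d = np.hypot(src_range, sensor_depths - src_depth)           # direct path
    di = np.hypot(src_range, sensor_depths + src_depth)          # surface-image path
    return np.exp(-1j * k * d) / d - np.exp(-1j * k * di) / di   # reflection coefficient -1

def music_spectrum(R, sensor_depths, ranges, depths, k, n_sources=1):
    _, V = np.linalg.eigh(R)
    En = V[:, :-n_sources]                                       # noise subspace
    P = np.zeros((len(ranges), len(depths)))
    for i, r in enumerate(ranges):
        for j, z in enumerate(depths):
            a = steering_vector(sensor_depths, r, z, k)
            a /= np.linalg.norm(a)
            P[i, j] = 1.0 / np.real(a.conj() @ En @ En.conj().T @ a)
    return P   # the peak location gives the estimated (range, depth)
```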

Relevance:

100.00%

Publisher:

Abstract:

This paper addresses the problem of separation of pitched sounds in monaural recordings. We present a novel feature for the estimation of the parameters of overlapping harmonics which considers the covariance of the partials of pitched sounds. Sound templates are formed from the monophonic parts of the mixture recording. A match for every note is found among these templates on the basis of the covariance profile of their harmonics. The matching template for a note provides the second-order characteristics for the overlapped harmonics of that note. The algorithm is tested on instrument sounds from the RWC music database. The results clearly show that the covariance characteristics can be used to reconstruct overlapping harmonics effectively.
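
One possible reading of the covariance-profile matching step, sketched with assumed array shapes (not the authors' code): partial amplitude trajectories from monophonic template segments are summarised by their covariance matrix, and a note is matched to the template whose profile is most similar.

```python
# Covariance-profile template matching sketch; assumes the note and every template
# track the same number of partials.
import numpy as np

def covariance_profile(partial_tracks):
    """partial_tracks: array (n_partials, n_frames) of partial amplitude envelopes."""
    C = np.cov(partial_tracks)
    profile = C[np.triu_indices_from(C)]          # upper-triangular covariance profile
    return profile / (np.linalg.norm(profile) + 1e-12)

def best_template(note_tracks, templates):
    """templates: dict name -> (n_partials, n_frames) array; returns the best-matching name."""
    p = covariance_profile(note_tracks)
    scores = {name: float(p @ covariance_profile(t)) for name, t in templates.items()}
    return max(scores, key=scores.get)
```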

Relevance:

100.00%

Publisher:

Abstract:

This paper presents a new verification procedure for sound source coverage according to ISO 140-5 requirements. The ISO 140-5 standard applies to the measurement of façade insulation and requires a sound source able to achieve a sufficiently uniform sound field in free field conditions on the façade under study. The proposed method involves the electroacoustic characterisation of the sound source in laboratory free field conditions (anechoic room) and the subsequent prediction by computer simulation of the free field radiated onto a rectangular surface equal in size to the façade being measured. The loudspeaker is characterised in an anechoic room under laboratory-controlled conditions, carefully measuring its directivity, and a computer model is then used to calculate the acoustic free field coverage for different loudspeaker positions and façade sizes. For each sound source position, the method provides the maximum direct acoustic level difference on a façade specimen and therefore determines whether the loudspeaker meets the maximum allowed level difference of 5 dB (or 10 dB for façade dimensions greater than 5 m) required by the ISO standard. Additionally, the maximum horizontal dimension of the façade meeting the standard is calculated for each sound source position, with both the 5 dB and 10 dB criteria. In the last section of the paper, the proposed procedure is compared with another method used by the authors in the past for the same purpose: in situ outdoor measurements attempting to recreate free field conditions. From this comparison, it is concluded that the proposed method reproduces the actual measurements with high accuracy. For example, the ground reflection effect, which is difficult to avoid in the outdoor measurement method at least at low frequencies, is fully eliminated with the proposed method, thereby satisfying the free field requirement.
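
An illustrative sketch of the coverage check (assumed geometry and a placeholder directivity function, not the authors' simulation code): the direct free-field level is predicted on a façade grid from the measured directivity and the 1/r spreading law, and the maximum level difference is compared with the 5 dB (or 10 dB) limit.

```python
# Predict relative direct free-field levels on a facade grid and return the maximum
# level difference for one loudspeaker position. Geometry and directivity are examples.
import numpy as np

def facade_level_difference(src_pos, facade_w, facade_h, directivity_db, n=25):
    """src_pos: (x, y, z) of the loudspeaker; facade lies in the plane x = 0.
    directivity_db(theta): measured directivity correction in dB at off-axis angle theta (rad)."""
    ys = np.linspace(0.0, facade_w, n)
    zs = np.linspace(0.0, facade_h, n)
    Y, Z = np.meshgrid(ys, zs)
    dx, dy, dz = -src_pos[0], Y - src_pos[1], Z - src_pos[2]
    r = np.sqrt(dx**2 + dy**2 + dz**2)
    theta = np.arccos(np.clip(np.abs(dx) / r, -1.0, 1.0))   # angle off the loudspeaker axis
    L = -20.0 * np.log10(r) + directivity_db(theta)          # relative direct free-field level
    return float(L.max() - L.min())

# e.g. facade_level_difference((-5.0, 2.0, 1.0), 4.0, 3.0, lambda th: -3.0 * (th / np.pi)**2) <= 5.0
```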

Relevance:

100.00%

Publisher:

Abstract:

This paper presents a novel method for enabling a robot to determine the direction to a sound source through interaction with its environment. The method uses a new neural network, the Parameter-Less Self-Organizing Map algorithm, together with reinforcement learning to achieve a rapid and accurate response.
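
As a generic sketch only, the idea of mapping a binaural feature to a direction and refining that map with a reward signal could look as follows (a plain reward-gated self-organizing map over an assumed feature, not the paper's Parameter-Less Self-Organizing Map formulation):

```python
# Toy direction map: each node holds a prototype feature value (e.g. a normalised ITD)
# and a direction label; updates are gated by a scalar reward from the interaction.
import numpy as np

class DirectionMap:
    def __init__(self, n_nodes=36):
        self.nodes = np.linspace(-1.0, 1.0, n_nodes)       # prototype feature values
        self.angles = np.linspace(-90.0, 90.0, n_nodes)    # direction label per node

    def estimate(self, feature):
        return self.angles[int(np.argmin(np.abs(self.nodes - feature)))]

    def update(self, feature, reward, lr=0.1, sigma=2.0):
        winner = int(np.argmin(np.abs(self.nodes - feature)))
        h = np.exp(-0.5 * ((np.arange(len(self.nodes)) - winner) / sigma) ** 2)
        self.nodes += reward * lr * h * (feature - self.nodes)   # reward-gated SOM update
```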

Relevance:

90.00%

Publisher:

Abstract:

We consider sound source mechanisms involving the acoustic and instability modes of dual-stream isothermal supersonic jets with the inner nozzle buried within an outer shroud-like nozzle. A particular focus is scattering into radiating sound waves at the shroud lip. For such jets, several families of acoustically coupled instability waves exist, beyond the regular vortical Kelvin-Helmholtz mode, with different shapes and propagation characteristics, which can therefore affect the character of the radiated sound. In our model, the coaxial shear layers are vortex sheets while the incident acoustic disturbances are the propagating shroud modes. The Wiener-Hopf method is used to compute their scattering at the sharp shroud edge to obtain the far-field radiation. The resulting far-field directivity quantifies the acoustic efficiency of different mechanisms, which is particularly important in the upstream direction, where the results show that the scattered sound is more intense than that radiated directly by the shear-layer modes.
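
For context, the factorization step at the heart of the Wiener-Hopf method splits the transform-domain kernel into half-plane-analytic factors (generic form only; the paper's specific kernel is not reproduced here):

\[
K(k) = K_+(k)\,K_-(k),
\]

where $K_+(k)$ is analytic and non-zero in the upper half of the complex $k$-plane and $K_-(k)$ in the lower half. Rearranging the edge conditions so that each side of the resulting equation is analytic in one half-plane, and invoking Liouville's theorem, yields the scattered field, whose far-field directivity follows from a steepest-descent evaluation of the inverse transform.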

Relevance:

90.00%

Publisher:

Abstract:

Over the past 50 years, economic and technological developments have dramatically increased the human contribution to ambient noise in the ocean. The dominant frequencies of most human-made noise in the ocean are in the low-frequency range (defined as sound energy below 1000 Hz), and low-frequency sound (LFS) may travel great distances in the ocean due to the unique propagation characteristics of the deep ocean (Munk et al. 1989). For example, in the Northern Hemisphere oceans, low-frequency ambient noise levels increased by as much as 10 dB during the period from 1950 to 1975 (Urick 1986; review by NRC 1994). Shipping is the overwhelmingly dominant source of low-frequency man-made noise in the ocean, but other sources of man-made LFS include sounds from oil and gas industrial development and production activities (seismic exploration, construction work, drilling, production platforms) and scientific research (e.g., acoustic tomography and thermometry, underwater communication). The SURTASS LFA system is an additional source of human-produced LFS in the ocean, contributing sound energy in the 100-500 Hz band. When considering a document that addresses the potential effects of a low-frequency sound source on the marine environment, it is important to focus upon those species that are the most likely to be affected. Important criteria are: 1) the physics of sound as it relates to biological organisms; 2) the nature of the exposure (i.e., duration, frequency, and intensity); and 3) the geographic region in which the sound source will be operated (which, when considered together with the distribution of the organisms, will determine which species will be exposed). The goal of this section of the LFA/EIS is to examine the status, distribution, abundance, reproduction, foraging behavior, vocal behavior, and known impacts of human activity for those species that may be impacted by LFA operations. To focus our efforts, we have examined species that may be physically affected and are found in the region where the LFA source will be operated. The large-scale geographic location of species in relation to the sound source can be determined from the distribution of each species. However, the potential for an organism to be physically affected depends upon the nature of the sound source (i.e., explosive, impulsive, or non-impulsive) and the acoustic properties of the medium (i.e., seawater) and of the organism. Non-impulsive sound consists of the movement of particles in a medium. Motion is imparted by a vibrating object (the diaphragm of a speaker, vocal cords, etc.). Due to the proximity of the particles in the medium, this motion is transmitted from particle to particle in waves away from the sound source. Because the particle motion is along the same axis as the propagating wave, the waves are longitudinal. Particles move away from and then back towards the vibrating source, creating areas of compression (high pressure) and areas of rarefaction (low pressure). As the motion is transferred from one particle to the next, the sound propagates away from the sound source. Wavelength is the distance from one pressure peak to the next. Frequency is the number of waves passing per unit time (Hz). Sound velocity (not to be confused with particle velocity) is the speed at which the sound wave travels through the medium. Impedance is loosely equivalent to the resistance of a medium to the passage of sound waves (technically, it is the ratio of acoustic pressure to particle velocity).
A high impedance means that the acoustic particle velocity is small for a given pressure (low impedance, the opposite). When a sound strikes a boundary between media of different impedances, reflection, refraction, and a transfer of energy can occur. The intensity of the reflection is a function of the intensity of the sound wave and the impedances of the two media. Two key factors in determining the potential for damage due to a sound source are the intensity of the sound wave and the impedance difference between the two media (impedance mismatch). The bodies of the vast majority of organisms in the ocean (particularly phytoplankton and zooplankton) have sound impedance values similar to that of seawater. As a result, the potential for sound damage is low; such organisms are effectively transparent to the sound, which passes through them without transferring damage-causing energy. Given the considerations above, we have undertaken a detailed analysis of species which met the following criteria: 1) Is the species capable of being physically affected by LFS? Are acoustic impedance mismatches large enough to enable LFS to have a physical effect or to allow the species to sense LFS? 2) Does the proposed SURTASS LFA geographical sphere of acoustic influence overlap the distribution of the species? Species that did not meet the above criteria were excluded from consideration. For example, phytoplankton and zooplankton species lack acoustic impedance mismatches at low frequencies large enough for them to be physically affected by SURTASS LFA. Vertebrates are the organisms that fit these criteria, and we have accordingly focused our analysis of the affected environment on these vertebrate groups in the world's oceans: fishes, reptiles, seabirds, pinnipeds, cetaceans, mustelids, and sirenians (Table 1).
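
The impedance mismatch argument can be made concrete with the textbook normal-incidence pressure reflection coefficient (standard acoustics, not taken from the EIS itself):

\[
R = \frac{Z_2 - Z_1}{Z_2 + Z_1}, \qquad Z_i = \rho_i c_i,
\]

so when the characteristic impedances of an organism's tissue and of seawater are nearly equal, $R \approx 0$ and almost all of the incident energy passes through the organism rather than being reflected or deposited at the boundary.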

Relevance:

90.00%

Publisher:

Abstract:

The ability to isolate a single sound source among concurrent sources and reverberant energy is necessary for understanding the auditory world. The precedence effect describes a related experimental finding, that when presented with identical sounds from two locations with a short onset asynchrony (on the order of milliseconds), listeners report a single source with a location dominated by the lead sound. Single-cell recordings in multiple animal models have indicated that there are low-level mechanisms that may contribute to the precedence effect, yet psychophysical studies in humans have provided evidence that top-down cognitive processes have a great deal of influence on the perception of simulated echoes. In the present study, event-related potentials evoked by click pairs at and around listeners' echo thresholds indicate that perception of the lead and lag sound as individual sources elicits a negativity between 100 and 250 msec, previously termed the object-related negativity (ORN). Even for physically identical stimuli, the ORN is evident when listeners report hearing, as compared with not hearing, a second sound source. These results define a neural mechanism related to the conscious perception of multiple auditory objects.

Relevance:

90.00%

Publisher:

Abstract:

An aerodynamic sound source extraction from a general flow field is applied to a number of model problems and to a problem of engineering interest. The extraction technique is based on a variable decomposition, which results in an acoustic correction method, of each of the flow variables into a dominant flow component and a perturbation component. The dominant flow component is obtained with a general-purpose Computational Fluid Dynamics (CFD) code which uses a cell-centred finite volume method to solve the Reynolds-averaged Navier–Stokes equations. The perturbations are calculated from a set of acoustic perturbation equations with source terms extracted from the unsteady CFD solution at each time step via a staggered dispersion-relation-preserving (DRP) finite-difference scheme. Numerical experiments include (1) propagation of a 1-D acoustic pulse without mean flow, (2) propagation of a 2-D acoustic pulse with and without mean flow, (3) reflection of an acoustic pulse from a flat plate with mean flow, and (4) flow-induced noise generated by an unsteady laminar flow past a 2-D cavity. The computational results demonstrate the accuracy of the source extraction technique for the model problems and illustrate its feasibility for more complex aeroacoustic problems.
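
A minimal statement of the decomposition described above, with assumed notation:

\[
q(\mathbf{x}, t) = \bar{q}(\mathbf{x}, t) + q'(\mathbf{x}, t),
\]

where $\bar{q}$ denotes the dominant flow component supplied by the cell-centred finite-volume RANS solution and $q'$ the acoustic perturbation advanced by the staggered DRP scheme, with source terms assembled from the unsteady CFD field at each time step.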

Relevance:

90.00%

Publisher:

Abstract:

To intercept a moving object, one needs to be in the right place at the right time. In order to do this, it is necessary to pick up and use perceptual information that specifies the time of arrival of an object at an interception point. In the present study, we examined the ability to intercept a laterally moving virtual sound object by controlling the displacement of a sliding handle, and tested whether and how the interaural time difference (ITD) could be the main source of perceptual information for successfully intercepting the virtual object. The results revealed that in order to accomplish the task, one might need to vary the duration of the movement and control the hand velocity and the time to peak velocity (speed coupling), while adjustment of movement initiation did not facilitate performance. Furthermore, overall performance was more successful when subjects employed a time-to-contact (tau) coupling strategy. This result shows that prospective information is available in sound for guiding goal-directed actions.
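
For reference, the time-to-contact variable and the coupling strategy mentioned above are commonly written as follows (general tau theory; symbols assumed here, not taken from the paper):

\[
\tau(x) = \frac{x}{\dot{x}}, \qquad \tau_{\text{hand}}(t) = k\,\tau_{\text{object}}(t),
\]

where $x$ is a gap being closed and $\dot{x}$ its rate of change; keeping the two taus in a constant ratio $k$ brings the hand and the virtual sound object to the interception point at the same time.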

Relevance:

90.00%

Publisher:

Abstract:

Sound localization can be defined as the ability to identify the position of an input sound source and is considered a powerful aspect of mammalian perception. For low-frequency sounds, i.e., in the range 270 Hz to 1.5 kHz, the mammalian auditory pathway achieves this by extracting the Interaural Time Difference (ITD) between the sound signals received by the left and right ears. This processing is performed in a region of the brain known as the Medial Superior Olive (MSO). This paper presents a Spiking Neural Network (SNN) based model of the MSO. The network model is trained using the Spike Timing Dependent Plasticity learning rule with experimentally observed Head Related Transfer Function data from an adult domestic cat. The results presented demonstrate that the proposed SNN model is able to perform sound localization with an accuracy of 91.82% when an error tolerance of +/-10 degrees is used. For angular resolutions down to 2.5 degrees, it is demonstrated that software-based simulations of the model incur significant computation times. The paper therefore also addresses a preliminary implementation on a Field Programmable Gate Array based hardware platform to accelerate system performance.
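
For context on the cue the model extracts, a textbook ITD estimate takes the lag that maximises the cross-correlation of the left and right ear signals (this is not the paper's spiking MSO network, which learns the mapping with STDP):

```python
# Cross-correlation ITD estimate over a physiologically plausible lag window.
import numpy as np

def estimate_itd(left, right, fs, max_itd=700e-6):
    """Return the ITD in seconds (positive if the sound reaches the left ear first)."""
    max_lag = int(round(max_itd * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    xcorr = [np.dot(left[max(0, -l):len(left) - max(0, l)],
                    right[max(0, l):len(right) - max(0, -l)]) for l in lags]
    return lags[int(np.argmax(xcorr))] / fs
```

Mammalian ITDs are below about a millisecond, which is why the search is restricted to a small lag window.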