Digital Library

998 results for Moving Sound Source

Robust indoor speaker recognition in a network of audio and video sensors

Relevance:

80.00%

Abstract:

Situational awareness is achieved naturally by the human senses of sight and hearing in combination. Automatic scene understanding aims at replicating this human ability using microphones and cameras in cooperation. In this paper, audio and video signals are fused and integrated at different levels of semantic abstraction. We detect and track a speaker who is relatively unconstrained, i.e., free to move indoors within an area larger than in comparable reported work, which is usually limited to round-table meetings. The system is relatively simple, consisting of just four microphone pairs and a single camera. Results show that the overall multimodal tracker is more reliable than single-modality systems, tolerating large occlusions and cross-talk. System evaluation is performed on both single- and multi-modality tracking. The performance improvement given by the audio–video integration and fusion is quantified in terms of tracking precision and accuracy as well as speaker diarisation error rate and precision–recall (recognition). Improvements over the closest works are evaluated: 56% in sound source localisation computational cost over an audio-only system, 8% in speaker diarisation error rate over an audio-only speaker recognition unit, and 36% on the precision–recall metric over an audio–video dominant speaker recognition method.

Student's second-language grade may depend on classroom listening position

Relevance:

80.00%

Abstract:

The purpose of this experiment was to explore whether listening position in the classroom (close to or distant from the sound source) and classroom reverberation influence students' scores on a test of second-language (L2) listening comprehension (i.e., comprehension of English in Swedish-speaking participants). The listening comprehension test administered was part of a standardized national test of English used in the Swedish school system. A total of 125 high school pupils, 15 years old, participated. Listening position was manipulated within subjects, classroom reverberation between subjects. The results showed that L2 listening comprehension decreased as distance from the sound source increased. The effect of reverberation was qualified by the participants' baseline L2 proficiency. A shorter reverberation was beneficial to participants with high L2 proficiency, while the opposite pattern was found among the participants with low L2 proficiency. The results indicate that listening comprehension scores, and hence students' grades in English, may depend on students' classroom listening position.

Using the interaural time difference and cross-correlation to localise short-term complex noises

Relevance:

80.00%

Abstract:

The mammalian binaural cue of interaural time difference (ITD) and cross-correlation have long been used to determine the point of origin of a sound source. The ITD can be defined as the difference between the points in time at which a sound from a single location arrives at each individual ear [1]. From this time difference, the brain can calculate the angle of the sound source in relation to the head [2]. Cross-correlation compares the similarity of the two channels of a binaural waveform, producing the time lag or offset required for both channels to be in phase with one another. This offset corresponds to the maximum value produced by the cross-correlation function and can be used to determine the ITD and thus the azimuthal angle θ of the original sound source. However, in indoor environments, cross-correlation has been known to have problems with both sound reflections and reverberation. Additionally, cross-correlation has difficulties with localising short-term complex noises when they occur during a longer-duration waveform, i.e. in the presence of background noise. The cross-correlation algorithm processes the entire waveform, and the short-term complex noise can be ignored. This paper presents a technique using thresholding which enables better localisation of short-term complex sounds in the midst of background noise. To determine the success of this thresholding technique, twenty-five sounds were recorded in a dynamic and echoic environment. The twenty-five sounds consist of hand-claps, finger-clicks and speech. The proposed technique was compared to the regular cross-correlation function for the same waveforms, and an average of the azimuthal angles was determined for each individual sample. The sound localisation ability for all twenty-five sound samples is as follows: average of the sampled angles using cross-correlation: 44%; cross-correlation technique with thresholding: 84%. From these results, it is clear that the proposed technique is very successful for the localisation of short-term complex sounds in the midst of background noise and in a dynamic and echoic indoor environment.
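The relationship between cross-correlation lag, ITD and azimuth described in this abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the speed of sound, the 0.18 m sensor spacing, the far-field formula θ = arcsin(c·ITD / d), and the `loudest_segment` gate (a hypothetical stand-in for the unspecified thresholding step) are all assumptions introduced here.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
EAR_DISTANCE = 0.18     # m, assumed spacing between the two sensors


def estimate_itd(left, right, sample_rate):
    """Return the lag (in seconds) at which the cross-correlation peaks.

    A positive value means the right channel lags the left channel.
    """
    corr = np.correlate(right, left, mode="full")
    # In "full" mode, index len(left) - 1 corresponds to zero lag.
    lag_samples = int(np.argmax(corr)) - (len(left) - 1)
    return lag_samples / sample_rate


def itd_to_azimuth(itd, spacing=EAR_DISTANCE, c=SPEED_OF_SOUND):
    """Convert an ITD to an azimuth in degrees with a far-field model."""
    ratio = np.clip(itd * c / spacing, -1.0, 1.0)  # guard arcsin's domain
    return float(np.degrees(np.arcsin(ratio)))


def loudest_segment(left, right, sample_rate, threshold_ratio=0.5, window_s=0.02):
    """Hypothetical gate: crop both channels around the first sample whose
    magnitude reaches threshold_ratio times the overall peak."""
    envelope = np.maximum(np.abs(left), np.abs(right))
    onset = int(np.argmax(envelope >= threshold_ratio * envelope.max()))
    half = int(window_s * sample_rate)
    start, stop = max(onset - half, 0), min(onset + half, len(left))
    return left[start:stop], right[start:stop]
```

Cropping with `loudest_segment` before calling `estimate_itd` restricts the correlation to the neighbourhood of the short event, which is the general idea behind gating out background noise; the actual thresholding rule used in the paper is not given in the abstract.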

Using electroencephalography signals to control acoustical processing

Relevance:

80.00%

Abstract:

Current hearing-assistive technology performs poorly in noisy multi-talker conditions. The goal of this thesis was to establish the feasibility of using EEG to guide acoustic processing in such conditions. To attain this goal, this research developed a model via the constructive research method, relying on a literature review. Several approaches have been shown to improve the performance of hearing-assistive devices under multi-talker conditions, namely beamforming spatial filtering, model-based sparse coding shrinkage, and onset enhancement of the speech signal. Prior research has shown that electroencephalography (EEG) signals contain information about whether the person is actively listening, what the listener is listening to, and where the attended sound source is. This thesis constructed a model for using EEG information to control beamforming, model-based sparse coding shrinkage, and onset enhancement of the speech signal. The purpose of this model is to propose a framework for using EEG signals to control sound processing so as to select a single talker in a noisy environment containing multiple talkers speaking simultaneously. On a theoretical level, the model showed that EEG can control acoustical processing. An analysis of the model identified a requirement for real-time processing and showed that the model inherits the computationally intensive properties of acoustical processing, although the model itself is of low complexity, placing a relatively small load on computational resources. A research priority is to develop a prototype that controls hearing-assistive devices with EEG. This thesis concludes by highlighting challenges for future research.

Leader Based Cyclic Pursuit

Relevance:

80.00%

Abstract:

In this work, a system of autonomous agents engaged in cyclic pursuit (under a constant bearing (CB) strategy) is considered, for which one informed agent (the leader) also senses and responds to a stationary beacon. Building on the framework proposed in a previous work on beacon-referenced cyclic pursuit, necessary and sufficient conditions for the existence of circling equilibria in a system with one informed agent are derived, with discussion of stability and performance. In a physical testbed, the leader (robot) is equipped with a sound sensing apparatus composed of a real-time embedded system, estimating the direction of arrival of sound by an Interaural Level and Phase Difference Algorithm, using empirically determined phase and level signatures, and breaking front-back ambiguity with appropriate sensor placement. Furthermore, a simple framework for implementing and evaluating the performance of control laws with the Robot Operating System (ROS) is proposed, demonstrated, and discussed.

Study of the mechanisms of auditory localisation and their plasticity in the human auditory cortex

Relevance:

80.00%

Abstract:

Being able to determine where sounds come from is fundamental to interacting well with our environment. Auditory localisation is an important and complex faculty of the human auditory system. The brain must decode the acoustic signal to extract the cues that allow it to localise a sound source. These auditory localisation cues depend in part on morphological and environmental properties that cannot be anticipated by genetic encoding. The processing of these cues must therefore be adjusted by experience during the developmental period. In adulthood, plasticity in auditory localisation still exists. This plasticity has been studied at the behavioural level, but very little is known about its neural correlates and mechanisms. The present research aimed to examine this plasticity, as well as the mechanisms that encode auditory localisation cues, both behaviourally and through the neural correlates of the observed behaviour. In the first two studies, we imposed a perceptual shift of horizontal auditory space using digital earplugs. We showed that young adults can rapidly adapt to a large perceptual shift. Using high-resolution functional MRI, we observed changes in auditory cortical activity accompanying this adaptation, in terms of hemispheric lateralisation. We were also able to confirm the hemifield coding hypothesis as the representation of horizontal auditory space. In a third study, we modified the most important auditory cue for the perception of vertical space using silicone moulds. We showed that adaptation to this modification was followed by no aftereffect when the moulds were removed, even on the very first presentation of a sound stimulus. This result is consistent with the hypothesis of a so-called many-to-one mapping mechanism, through which several spectral profiles can be associated with the same spatial position. In a fourth study, using functional MRI and taking advantage of the adaptation to the silicone moulds, we revealed the encoding of sound elevation in the human auditory cortex.

Modelling sound propagation in ducts: a mathematical approach in the frequency and time domains using the Laplace transform

Relevance:

80.00%

Abstract:

Dissertation (master's), Universidade de Brasília, Faculdade UnB Gama, Faculdade de Tecnologia, Programa de Pós-graduação em Integridade de Materiais da Engenharia, 2016.

Softening the Lower Crust: Modes of Syn-Transport Transposition Around and Adjacent to a Deep Crustal Granulite Nappe, Parry Sound Domain, Grenville Province, Ontario, Canada

Relevance:

40.00%

Abstract:

The Parry Sound domain is a granulite nappe-stack transported cratonward during reactivation of the ductile lower and middle crust in the late convergence of the Mesoproterozoic Grenville orogeny. Field observations suggest the following with respect to the ductile sheath: (1) Formation of a carapace of transposed amphibolite facies gneiss derived from and enveloping the western extremity of the Parry Sound domain and separating it from high-strain gneiss of adjacent allochthons. This ductile sheath formed dynamically around the moving granulite nappe through the development of systems of progressively linked shear zones. (2) Transposition initiated by hydration (amphibolization) of granulite facies gneiss by introduction of fluid along cracks accompanying pegmatite emplacement. Shear zones nucleated along pegmatite margins and subsequently linked and rotated. The source of the pegmatites was most likely subjacent migmatitic and pegmatite-rich units or units over which the Parry Sound domain was transported. Comparison of gneisses of the ductile sheath with high-strain layered gneiss of adjacent allochthons shows that the mode of transposition of penetratively layered gneiss depended on whether the gneiss protoliths were amphibolite or granulite facies tectonites before initiation of transposition, resulting in, e.g., folding before shearing or no folding before shearing, respectively. Meter-scale truncation along high-strain gradients at the margins of both types of transposition-related shear zones observed within and marginal to the Parry Sound domain mimics features at kilometer scales, implying that apparent truncation by transposition originating in a manner similar to the ductile sheath may be a common feature of deep crustal ductile reworking. Citation: Culshaw, N., C. Gerbi, and J. Marsh (2010), Softening the lower crust: Modes of syn-transport transposition around and adjacent to a deep crustal granulite nappe, Parry Sound domain, Grenville Province, Ontario, Canada, Tectonics, 29, TC5013, doi:10.1029/2009TC002537.

Moving toward change: Institutionalizing reform through implementation of the Learning Assistant model and Open Source Tutorials

Relevance:

40.00%

Abstract:

Florida International University has undergone a reform in the introductory physics classes by focusing on the laboratory component of these classes. We present results from the secondary implementation of two research-based instructional strategies: the implementation of the Learning Assistant model as developed by the University of Colorado at Boulder and the Open Source Tutorial curriculum developed at the University of Maryland, College Park. We examine the results of the Force Concept Inventory (FCI) for introductory students over five years (n=872) and find that the mean raw gain of students in transformed lab sections was 0.243, while the mean raw gain of the traditional labs was 0.159, with a Cohen’s d effect size of 0.59. Average raw gains on the FCI were 0.243 for Hispanic students and 0.213 for women in the transformed labs, indicating that these reforms are not widening the gaps between underrepresented student groups and majority groups. Our results illustrate how research-based instructional strategies can be successfully implemented in a physics department with minimal department engagement and in a sustainable manner.

Towards an Asynchronous Cinema: how can the asynchronous use of sound in artists' moving image underpin the creation of dialectic tension between the audio, the visual and the audience?

Relevance:

40.00%

Abstract:

This PhD by publication examines selected practice-based audio-visual works made by the author over a ten-year period, placing them in a critical context. Central to the publications, and the focus of the thesis, is an exploration of the role of sound in the creation of dialectic tension between the audio, the visual and the audience. By first analysing a number of texts (films/videos and key writings) the thesis locates the principal issues and debates around the use of audio in artists’ moving image practice. From this it is argued that asynchronism, first advocated in 1929 by Pudovkin as a response to the advent of synchronised sound, can be used to articulate audio-visual relationships. Central to asynchronism’s application in this paper is a recognition of the propensity for sound and image to adhere, and in visual music for there to be a literal equation of audio with the visual, often married with a quest for the synaesthetic. These elements can either be used in an illusionist fashion, or employed as part of an anti-illusionist strategy for realising dialectic. Using this as a theoretical basis, the paper examines how the publications implement asynchronism, including digital mapping to facilitate innovative reciprocal sound and image combinations, and the asynchronous use of ‘found sound’ from a range of online sources to reframe the moving image. The synthesis of publications and practice demonstrates that asynchronism can both underpin the creation of dialectic, and be an integral component in an audio-visual anti-illusionist methodology.

High-frequency acousto-electric single-photon source

Relevance:

30.00%

Abstract:

We propose a single optical photon source for quantum cryptography based on the acoustoelectric effect. Surface acoustic waves (SAWs) propagating through a quasi-one-dimensional channel have been shown to produce packets of electrons that reside in the SAW minima and travel at the velocity of sound. In our scheme, the electron packets are injected into a p-type region, resulting in photon emission. Since the number of electrons in each packet can be controlled down to a single electron, a stream of single- (or N-) photon states, with a creation time strongly correlated with the driving acoustic field, should be generated.

Direct measurement of instantaneous source speed for a HDR brachytherapy unit using an optical fiber based detector

Relevance:

30.00%

Abstract:

Purpose: Several attempts to determine the transit time of a high dose rate (HDR) brachytherapy unit have been reported in the literature with controversial results. The determination of the source speed is necessary to accurately calculate the transient dose in brachytherapy treatments. In these studies, only the average speed of the source was measured as a parameter for transit dose calculation, which does not account for the realistic movement of the source, and is therefore inaccurate for numerical simulations. The purpose of this work is to report the implementation and technical design of an optical fiber based detector to directly measure the instantaneous speed profile of a ¹⁹²Ir source in a Nucletron HDR brachytherapy unit. Methods: To accomplish this task, we have developed a setup that uses the Cerenkov light induced in optical fibers as a detection signal for the radiation source moving inside the HDR catheter. As the ¹⁹²Ir source travels between two optical fibers separated by a known distance, the thresholds of the induced signals are used to extract the transit time and thus the velocity. The high resolution of the detector enables the measurement of the transit time at short separation distances of the fibers, providing the instantaneous speed. Results: Accurate and high-resolution speed profiles of the ¹⁹²Ir radiation source traveling from the safe to the end of the catheter and between dwell positions are presented. The maximum and minimum velocities of the source were found to be 52.0 ± 1.0 and 17.3 ± 1.2 cm/s. The authors demonstrate that the radiation source follows a uniformly accelerated linear motion with acceleration of |a| = 113 cm/s². In addition, the authors compare the average speed measured using the optical fiber detector to those obtained in the literature, showing deviation up to 265%. Conclusions: To the best of the authors' knowledge, the authors directly measured for the first time the instantaneous speed profile of a radiation source in an HDR brachytherapy unit traveling from the unit safe to the end of the catheter and between interdwell distances. The method is feasible and accurate to implement on quality assurance tests and provides a unique database for efficient computational simulations of the transient dose. © 2010 American Association of Physicists in Medicine. [DOI: 10.1118/1.3483780]
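The measurement principle in this abstract (two fibers a known distance apart, with the transit time taken from threshold crossings of the two signals) can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the authors' analysis code: the function names, the "first sample at or above threshold" crossing rule, and uniform sampling are all hypothetical choices made here.

```python
import numpy as np


def crossing_time(signal, sample_rate, threshold):
    """Time in seconds of the first sample at or above the threshold.

    Returns None when the signal never reaches the threshold.
    """
    above = np.asarray(signal) >= threshold
    if not above.any():
        return None
    return int(np.argmax(above)) / sample_rate


def source_speed(signal_a, signal_b, sample_rate, threshold, fiber_separation_cm):
    """Speed in cm/s from the delay between two fiber signals.

    signal_a is assumed to come from the fiber the source passes first.
    """
    t_a = crossing_time(signal_a, sample_rate, threshold)
    t_b = crossing_time(signal_b, sample_rate, threshold)
    if t_a is None or t_b is None or t_b <= t_a:
        raise ValueError("no valid pair of threshold crossings found")
    return fiber_separation_cm / (t_b - t_a)
```

Because the speed is a distance divided by a time difference, a shorter fiber separation gives a more local (closer to instantaneous) estimate, at the cost of needing finer time resolution.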

Interictal Spike EEG Source Analysis in Hypothalamic Hamartoma Epilepsy

Relevance:

30.00%

Abstract:

Objective: The epilepsy associated with hypothalamic hamartomas constitutes a syndrome with peculiar seizures, usually refractory to medical therapy, mild cognitive delay, behavioural problems and multifocal spike activity in the scalp electroencephalogram (EEG). The cortical origin of spikes has been widely assumed but not specifically demonstrated. Methods: We present results of a source analysis of interictal spikes from 4 patients (age 2–25 years) with epilepsy and hypothalamic hamartoma, using EEG scalp recordings (32 electrodes) and realistic boundary element models constructed from volumetric magnetic resonance images (MRIs). Multifocal spike activity was the most common finding, distributed mainly over the frontal and temporal lobes. A spike classification based on scalp topography was done, and averaging within each class was performed to improve the signal-to-noise ratio. Single moving dipole models were used, as well as the RAP-MUSIC algorithm. Results: All spikes with a good signal-to-noise ratio were best explained by initial deep sources in the neighbourhood of the hamartoma, with late sources located in the cortex. Not a single patient could have their spike activity explained by a combination of cortical sources. Conclusions: Overall, the results demonstrate a consistent origin of spike activity in the subcortical region in the neighbourhood of the hamartoma, with late spread to cortical areas.

The spatio-temporal brain dynamics of processing and integrating sound localization cues in humans.

Relevance:

30.00%

Abstract:

Interaural intensity and time differences (IID and ITD) are two binaural auditory cues for localizing sounds in space. This study investigated the spatio-temporal brain mechanisms for processing and integrating IID and ITD cues in humans. Auditory-evoked potentials were recorded, while subjects passively listened to noise bursts lateralized with IID, ITD or both cues simultaneously, as well as a more frequent centrally presented noise. In a separate psychophysical experiment, subjects actively discriminated lateralized from centrally presented stimuli. IID and ITD cues elicited different electric field topographies starting at approximately 75 ms post-stimulus onset, indicative of the engagement of distinct cortical networks. By contrast, no performance differences were observed between IID and ITD cues during the psychophysical experiment. Subjects did, however, respond significantly faster and more accurately when both cues were presented simultaneously. This performance facilitation exceeded predictions from probability summation, suggestive of interactions in neural processing of IID and ITD cues. Supra-additive neural response interactions as well as topographic modulations were indeed observed approximately 200 ms post-stimulus for the comparison of responses to the simultaneous presentation of both cues with the mean of those to separate IID and ITD cues. Source estimations revealed differential processing of IID and ITD cues initially within superior temporal cortices and also at later stages within temporo-parietal and inferior frontal cortices. Differences were principally in terms of hemispheric lateralization. The collective psychophysical and electrophysiological results support the hypothesis that IID and ITD cues are processed by distinct, but interacting, cortical networks that can in turn facilitate auditory localization.

Sound Texture Modelling

Relevance:

30.00%

Abstract:

In the PhD thesis “Sound Texture Modeling” we deal with statistical modelling of textural sounds like water, wind, rain, etc., for synthesis and classification. Our initial model is based on a wavelet tree signal decomposition and the modeling of the resulting sequence by means of a parametric probabilistic model that can be situated within the family of models trainable via expectation maximization (hidden Markov tree model). Our model is able to capture key characteristics of the source textures (water, rain, fire, applause, crowd chatter), and faithfully reproduces some of the sound classes. In terms of a more general taxonomy of natural events proposed by Graver, we worked on models for natural event classification and segmentation. While the event labels comprise physical interactions between materials that do not have textural properties in their entirety, those segmentation models can help in identifying textural portions of an audio recording useful for analysis and resynthesis. Following our work on concatenative synthesis of musical instruments, we have developed a pattern-based synthesis system that allows sonic exploration of a database of units by means of their representation in a perceptual feature space. Concatenative synthesis with “molecules” built from sparse atomic representations also allows capturing low-level correlations in perceptual audio features, while facilitating the manipulation of textural sounds based on their physical and perceptual properties. We have approached the problem of sound texture modelling for synthesis from different directions, namely a low-level signal-theoretic point of view through a wavelet transform, and a higher-level point of view driven by perceptual audio features in the concatenative synthesis setting. The developed framework provides a unified approach to the high-quality resynthesis of natural texture sounds. Our research is embedded within the Metaverse 1 European project (2008-2011), where our models are contributing as low-level building blocks within a semi-automated soundscape generation system.
