980 resultados para auditory scene analysis


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Multi-camera 3D tracking systems with overlapping cameras represent a powerful mean for scene analysis, as they potentially allow greater robustness than monocular systems and provide useful 3D information about object location and movement. However, their performance relies on accurately calibrated camera networks, which is not a realistic assumption in real surveillance environments. Here, we introduce a multi-camera system for tracking the 3D position of a varying number of objects and simultaneously refin-ing the calibration of the network of overlapping cameras. Therefore, we introduce a Bayesian framework that combines Particle Filtering for tracking with recursive Bayesian estimation methods by means of adapted transdimensional MCMC sampling. Addi-tionally, the system has been designed to work on simple motion detection masks, making it suitable for camera networks with low transmission capabilities. Tests show that our approach allows a successful performance even when starting from clearly inaccurate camera calibrations, which would ruin conventional approaches.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This dissertation studies the coding strategies of computational imaging to overcome the limitation of conventional sensing techniques. The information capacity of conventional sensing is limited by the physical properties of optics, such as aperture size, detector pixels, quantum efficiency, and sampling rate. These parameters determine the spatial, depth, spectral, temporal, and polarization sensitivity of each imager. To increase sensitivity in any dimension can significantly compromise the others.

This research implements various coding strategies subject to optical multidimensional imaging and acoustic sensing in order to extend their sensing abilities. The proposed coding strategies combine hardware modification and signal processing to exploiting bandwidth and sensitivity from conventional sensors. We discuss the hardware architecture, compression strategies, sensing process modeling, and reconstruction algorithm of each sensing system.

Optical multidimensional imaging measures three or more dimensional information of the optical signal. Traditional multidimensional imagers acquire extra dimensional information at the cost of degrading temporal or spatial resolution. Compressive multidimensional imaging multiplexes the transverse spatial, spectral, temporal, and polarization information on a two-dimensional (2D) detector. The corresponding spectral, temporal and polarization coding strategies adapt optics, electronic devices, and designed modulation techniques for multiplex measurement. This computational imaging technique provides multispectral, temporal super-resolution, and polarization imaging abilities with minimal loss in spatial resolution and noise level while maintaining or gaining higher temporal resolution. The experimental results prove that the appropriate coding strategies may improve hundreds times more sensing capacity.

Human auditory system has the astonishing ability in localizing, tracking, and filtering the selected sound sources or information from a noisy environment. Using engineering efforts to accomplish the same task usually requires multiple detectors, advanced computational algorithms, or artificial intelligence systems. Compressive acoustic sensing incorporates acoustic metamaterials in compressive sensing theory to emulate the abilities of sound localization and selective attention. This research investigates and optimizes the sensing capacity and the spatial sensitivity of the acoustic sensor. The well-modeled acoustic sensor allows localizing multiple speakers in both stationary and dynamic auditory scene; and distinguishing mixed conversations from independent sources with high audio recognition rate.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Bayesian nonparametric models, such as the Gaussian process and the Dirichlet process, have been extensively applied for target kinematics modeling in various applications including environmental monitoring, traffic planning, endangered species tracking, dynamic scene analysis, autonomous robot navigation, and human motion modeling. As shown by these successful applications, Bayesian nonparametric models are able to adjust their complexities adaptively from data as necessary, and are resistant to overfitting or underfitting. However, most existing works assume that the sensor measurements used to learn the Bayesian nonparametric target kinematics models are obtained a priori or that the target kinematics can be measured by the sensor at any given time throughout the task. Little work has been done for controlling the sensor with bounded field of view to obtain measurements of mobile targets that are most informative for reducing the uncertainty of the Bayesian nonparametric models. To present the systematic sensor planning approach to leaning Bayesian nonparametric models, the Gaussian process target kinematics model is introduced at first, which is capable of describing time-invariant spatial phenomena, such as ocean currents, temperature distributions and wind velocity fields. The Dirichlet process-Gaussian process target kinematics model is subsequently discussed for modeling mixture of mobile targets, such as pedestrian motion patterns.

Novel information theoretic functions are developed for these introduced Bayesian nonparametric target kinematics models to represent the expected utility of measurements as a function of sensor control inputs and random environmental variables. A Gaussian process expected Kullback Leibler divergence is developed as the expectation of the KL divergence between the current (prior) and posterior Gaussian process target kinematics models with respect to the future measurements. Then, this approach is extended to develop a new information value function that can be used to estimate target kinematics described by a Dirichlet process-Gaussian process mixture model. A theorem is proposed that shows the novel information theoretic functions are bounded. Based on this theorem, efficient estimators of the new information theoretic functions are designed, which are proved to be unbiased with the variance of the resultant approximation error decreasing linearly as the number of samples increases. Computational complexities for optimizing the novel information theoretic functions under sensor dynamics constraints are studied, and are proved to be NP-hard. A cumulative lower bound is then proposed to reduce the computational complexity to polynomial time.

Three sensor planning algorithms are developed according to the assumptions on the target kinematics and the sensor dynamics. For problems where the control space of the sensor is discrete, a greedy algorithm is proposed. The efficiency of the greedy algorithm is demonstrated by a numerical experiment with data of ocean currents obtained by moored buoys. A sweep line algorithm is developed for applications where the sensor control space is continuous and unconstrained. Synthetic simulations as well as physical experiments with ground robots and a surveillance camera are conducted to evaluate the performance of the sweep line algorithm. Moreover, a lexicographic algorithm is designed based on the cumulative lower bound of the novel information theoretic functions, for the scenario where the sensor dynamics are constrained. Numerical experiments with real data collected from indoor pedestrians by a commercial pan-tilt camera are performed to examine the lexicographic algorithm. Results from both the numerical simulations and the physical experiments show that the three sensor planning algorithms proposed in this dissertation based on the novel information theoretic functions are superior at learning the target kinematics with

little or no prior knowledge

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In public places, crowd size may be an indicator of congestion, delay, instability, or of abnormal events, such as a fight, riot or emergency. Crowd related information can also provide important business intelligence such as the distribution of people throughout spaces, throughput rates, and local densities. A major drawback of many crowd counting approaches is their reliance on large numbers of holistic features, training data requirements of hundreds or thousands of frames per camera, and that each camera must be trained separately. This makes deployment in large multi-camera environments such as shopping centres very costly and difficult. In this chapter, we present a novel scene-invariant crowd counting algorithm that uses local features to monitor crowd size. The use of local features allows the proposed algorithm to calculate local occupancy statistics, scale to conditions which are unseen in the training data, and be trained on significantly less data. Scene invariance is achieved through the use of camera calibration, allowing the system to be trained on one or more viewpoints and then deployed on any number of new cameras for testing without further training. A pre-trained system could then be used as a ‘turn-key’ solution for crowd counting across a wide range of environments, eliminating many of the costly barriers to deployment which currently exist.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Time-varying bispectra, computed using a classical sliding window short-time Fourier approach, are analyzed for scalp EEG potentials evoked by an auditory stimulus and new observations are presented. A single, short duration tone is presented from the left or the right, direction unknown to the test subject. The subject responds by moving the eyes to the direction of the sound. EEG epochs sampled at 200 Hz for repeated trials are processed between -70 ms and +1200 ms with reference to the stimulus. It is observed that for an ensemble of correctly recognized cases, the best matching timevarying bispectra at (8 Hz, 8Hz) are for PZ-FZ channels and this is also largely the case for grand averages but not for power spectra at 8 Hz. Out of 11 subjects, the only exception for time-varying bispectral match was a subject with family history of Alzheimer’s disease and the difference was in bicoherence, not biphase.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A study on Extremenian music and audiovisual production has been carried out through YouTube media. The purpose of this study is to analyze all audiovisual products and also assessing what broadcasting processes are conducted by creators to publicize their musical and audiovisual works. A quantitative and qualitative study of audiovisual materials of Extremadura bands was done to find out which processes are most suitable to increase the number of viewers for videos.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

(ABR) is of fundamental importance to the investiga- tion of the auditory system behavior, though its in- terpretation has a subjective nature because of the manual process employed in its study and the clinical experience required for its analysis. When analyzing the ABR, clinicians are often interested in the identi- fication of ABR signal components referred to as Jewett waves. In particular, the detection and study of the time when these waves occur (i.e., the wave la- tency) is a practical tool for the diagnosis of disorders affecting the auditory system. In this context, the aim of this research is to compare ABR manual/visual analysis provided by different examiners. Methods: The ABR data were collected from 10 normal-hearing subjects (5 men and 5 women, from 20 to 52 years). A total of 160 data samples were analyzed and a pair- wise comparison between four distinct examiners was executed. We carried out a statistical study aiming to identify significant differences between assessments provided by the examiners. For this, we used Linear Regression in conjunction with Bootstrap, as a me- thod for evaluating the relation between the responses given by the examiners. Results: The analysis sug- gests agreement among examiners however reveals differences between assessments of the variability of the waves. We quantified the magnitude of the ob- tained wave latency differences and 18% of the inves- tigated waves presented substantial differences (large and moderate) and of these 3.79% were considered not acceptable for the clinical practice. Conclusions: Our results characterize the variability of the manual analysis of ABR data and the necessity of establishing unified standards and protocols for the analysis of these data. These results may also contribute to the validation and development of automatic systems that are employed in the early diagnosis of hearing loss.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Full-waveform laser scanning data acquired with a Riegl LMS-Q560 instrument were used to classify an orange orchard into orange trees, grass and ground using waveform parameters alone. Gaussian decomposition was performed on this data capture from the National Airborne Field Experiment in November 2006 using a custom peak-detection procedure and a trust-region-reflective algorithm for fitting Gauss functions. Calibration was carried out using waveforms returned from a road surface, and the backscattering coefficient c was derived for every waveform peak. The processed data were then analysed according to the number of returns detected within each waveform and classified into three classes based on pulse width and c. For single-peak waveforms the scatterplot of c versus pulse width was used to distinguish between ground, grass and orange trees. In the case of multiple returns, the relationship between first (or first plus middle) and last return c values was used to separate ground from other targets. Refinement of this classification, and further sub-classification into grass and orange trees was performed using the c versus pulse width scatterplots of last returns. In all cases the separation was carried out using a decision tree with empirical relationships between the waveform parameters. Ground points were successfully separated from orange tree points. The most difficult class to separate and verify was grass, but those points in general corresponded well with the grass areas identified in the aerial photography. The overall accuracy reached 91%, using photography and relative elevation as ground truth. The overall accuracy for two classes, orange tree and combined class of grass and ground, yielded 95%. Finally, the backscattering coefficient c of single-peak waveforms was also used to derive reflectance values of the three classes. The reflectance of the orange tree class (0.31) and ground class (0.60) are consistent with published values at the wavelength of the Riegl scanner (1550 nm). The grass class reflectance (0.46) falls in between the other two classes as might be expected, as this class has a mixture of the contributions of both vegetation and ground reflectance properties.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The study proposes a method of interpretation, which gives the contemporary singer-actor and director a method of performance analysis considering both the visual and the auditory aspects of vocal performance. The analysis begins with an analysis of the structure, which maps out the key ideas and translates them into main states of mind. The text analysis is followed by an analysis of the musical treatment of the text, also scanning for extra dramatic meaning inherent in vocal line and musical accompaniment. The so isolated dramatic ideas in text and musical structure and content are finally discussed in terms of their physical aspects, using the system of gesture as practiced by actors and singers before, during and beyond the Baroque period. The method is exemplified by the obbligato recitative "In quali eccessi" from Donna Elvira’s scene in the second act of the opera "Don Giovanni" by L. Da Ponte and W. A. Mozart. The musical example includes gestural notation and a table gestural notation symbols.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We examine localised sound energy patterns, or events, that we associate with high level affect experienced with films. The study of sound energy events in conjunction with their intended affect enable the analysis of film at a higher conceptual level, such as genre. The various affect/emotional responses we investigate in this paper are brought about by well established patterns of sound energy dynamics employed in audio tracks of horror films. This allows the examination of the thematic content of the films in relation to horror elements. We analyse the frequency of sound energy and affect events at a film level as well as at a scene level, and propose measures indicative of the film genre and scene content. Using 4 horror, and 2 non-horror movies as experimental data we establish a correlation between the sound energy event types and horrific thematic content within film, thus enabling an automated mechanism for genre typing and scene content labeling in film.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A hiperbilirrubinemia é tóxica às vias auditivas e ao sistema nervoso central, deixando sequelas como surdez e encefalopatia. OBJETIVOS: avaliar a audição de neonatos portadores de hiperbilirrubinemia, utilizando-se a pesquisa das emissões otoacústicas evocadas transientes (EOAET) e dos potenciais evocados auditivos do tronco encefálico (PEATE). Estudo prospectivo. CASUÍSTICA E MÉTODOS: Constituíram-se dois grupos: GI (n-25), neonatos com hiperbilirrubinemia; GII (n-22), neonatos sem hiperbilirrubinemia e sem fatores de risco para surdez. Todos os neonatos tinham até 60 dias de vida e foram submetidos à EOAET e ao PEATE. RESULTADOS: 12 neonatos de GI e 10 de GII eram meninas e 13 de GI e 12 de GII eram meninos. As EOAET estavam presentes em todas as crianças, porém com amplitudes menores em GI, especialmente nas frequências de 2 e 3KHz (p < 0,05). No PEATE, observou-se discreto prolongamento de PV e de LI-V em GI. As alterações observadas nesses testes não se correlacionaram aos níveis séricos da bilirrubinemia. CONCLUSÕES: em neonatos portadores de hiperbilirrubinemia, menores amplitudes das EOAET e discreto prolongamento de PV e de LI-V foram constatados indicando comprometimento coclear e retrococlear das vias auditivas, salientando-se a importância da utilização e da interpretação minuciosa de ambos os testes nessas avaliações.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Introduction: The effects of lead on children’s health have been widely studied. Aim: To analyze the correlation between the long latency auditory evoked potential N2 and cognitive P3 with the level of lead poisoning in Brazilian children. Methods: This retrospective study evaluated 20 children ranging in age from 7 to 14 years at the time of audiological and electrophysiological evaluations. We performed periodic surveys of the lead concentration in the blood and basic audiological evaluations. Furthermore, we studied the auditory evoked potential long latency N2 and cognitive P3 by analyzing the absolute latency of the N2 and P3 potentials and the P3 amplitude recorded at Cz. At the time of audiological and electrophysiological evaluations, the average concentration of lead in the blood was less than 10 ug/dL. Results: In conventional audiologic evaluations, all children had hearing thresholds below 20 dBHL for the frequencies tested and normal tympanometry findings; the auditory evoked potential long latency N2 and cognitive P3 were present in 95% of children. No significant correlations were found between the blood lead concentration and latency (p = 0.821) or amplitude (p = 0.411) of the P3 potential. However, the latency of the N2 potential increased with the concentration of lead in the blood, with a significant correlation (p = 0.030). Conclusion: Among Brazilian children with low lead exposure, a significant correlation was found between blood lead levels and the average latency of the auditory evoked potential long latency N2; however, a significant correlation was not observed for the amplitude and latency of the cognitive potential P3

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Heschl's gyrus (HG) is functionally involved in the genesis of auditory verbal hallucinations (AVH). This dysfunction seems to be structurally facilitated. The aim of the study was to analyze macrostructural features of HG in a group of patients reporting AVH who demonstrated white matter diffusion tensor imaging abnormalities reported previously.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this article, we will link neuroimaging, data analysis, and intervention methods in an important psychiatric condition: auditory verbal hallucinations (AVH). The clinical and phenomenological background as well as neurophysiological findings will be covered and discussed with respect to noninvasive brain stimulation. Additionally, methods of noninvasive brain stimulation will be presented as ways to intervene with AVH. Finally, preliminary conclusions and possible future perspectives will be proposed.