939 resultados para Audio signal
Resumo:
On étudie l’application des algorithmes de décomposition matricielles tel que la Factorisation Matricielle Non-négative (FMN), aux représentations fréquentielles de signaux audio musicaux. Ces algorithmes, dirigés par une fonction d’erreur de reconstruction, apprennent un ensemble de fonctions de base et un ensemble de coef- ficients correspondants qui approximent le signal d’entrée. On compare l’utilisation de trois fonctions d’erreur de reconstruction quand la FMN est appliquée à des gammes monophoniques et harmonisées: moindre carré, divergence Kullback-Leibler, et une mesure de divergence dépendente de la phase, introduite récemment. Des nouvelles méthodes pour interpréter les décompositions résultantes sont présentées et sont comparées aux méthodes utilisées précédemment qui nécessitent des connaissances du domaine acoustique. Finalement, on analyse la capacité de généralisation des fonctions de bases apprises par rapport à trois paramètres musicaux: l’amplitude, la durée et le type d’instrument. Pour ce faire, on introduit deux algorithmes d’étiquetage des fonctions de bases qui performent mieux que l’approche précédente dans la majorité de nos tests, la tâche d’instrument avec audio monophonique étant la seule exception importante.
Resumo:
This paper addresses the problem of processing biological data, such as cardiac beats in the audio and ultrasonic range, and on calculating wavelet coefficients in real time, with the processor clock running at a frequency of present application-specified integrated circuits and field programmable gate array. The parallel filter architecture for discrete wavelet transform (DWT) has been improved, calculating the wavelet coefficients in real time with hardware reduced up to 60%. The new architecture, which also processes inverse DWT, is implemented with the Radix-2 or the Booth-Wallace constant multipliers. One integrated circuit signal analyzer in the ultrasonic range, including series memory register banks, is presented. © 2007 IEEE.
Resumo:
Purpose: To analyze the components of the acoustic signal of swallowing using a specific software. Methods: Fourteen healthy subjects ranging in age from 20 to 50 years (mean age 31±10 years), were evaluated. Data collection consisted on the simultaneous capture of the swallowing audio with a microphone and of the swallowing videofluoroscopic image. The bursts of the swallowing acoustic signal were identified and their duration and the interval between them were later analyzed using a specific software, which allowed the simultaneous analyses between the acoustic wave and the videofluoroscopic image. Results: Three burst components were identified in most of the swallows evaluated. The first burst presented mean time of 87.3 milliseconds (ms) for water and 78.2 for the substance. The second burst presented mean time of 112.9 ms for water and 85.5 for the pasty substance. The mean interval between first and second burst was 82.1 ms for water and 95.3 ms for the pasty consistency, and between second and third burst was 339.8 ms for water and 322.0 ms for the pasty consistency. Conclusion: The software allowed the visualization of three bursts during the swallowing of healthy individuals, and showed that the swallowing signal in normal subjects is highly variable.
Resumo:
INTRODUCTION The Rondo is a single-unit cochlear implant (CI) audio processor comprising the identical components as its behind-the-ear predecessor, the Opus 2. An interchange of the Opus 2 with the Rondo leads to a shift of the microphone position toward the back of the head. This study aimed to investigate the influence of the Rondo wearing position on speech intelligibility in noise. METHODS Speech intelligibility in noise was measured in 4 spatial configurations with 12 experienced CI users using the German adaptive Oldenburg sentence test. A physical model and a numerical model were used to enable a comparison of the observations. RESULTS No statistically significant differences of the speech intelligibility were found in the situations in which the signal came from the front and the noise came from the frontal, ipsilateral, or contralateral side. The signal-to-noise ratio (SNR) was significantly better with the Opus 2 in the case with the noise presented from the back (4.4 dB, p < 0.001). The differences in the SNR were significantly worse with the Rondo processors placed further behind the ear than closer to the ear. CONCLUSION The study indicates that CI users with the receiver/stimulator implanted in positions further behind the ear are expected to have higher difficulties in noisy situations when wearing the single-unit audio processor.
Resumo:
Este proyecto consiste en el diseño e implementación de un procesador digital de efectos de audio en tiempo real orientado a instrumentos eléctricos tales como guitarras, bajos, teclados, etc. El procesador está basado en la tarjeta Raspberry Pi B+, ordenador de placa reducida de bajo coste, desarrollado en Reino unido y cuyo lanzamiento tuvo lugar en el año 2012. En primer lugar, ha sido necesario lograr que la tarjeta asuma la funcionalidad de un procesador de audio en tiempo real. Para ello se ha instalado un sistema operativo Linux orientado a Raspberry (Raspbian) y se ha hecho uso de Pure Data (Pd): lenguaje de programación gráfico que fue desarrollado en los años 90 por Miller Puckette con intención de ser enfocado a la creación de eventos multimedia y de música por computador. El papel que desempeña Pd es de capa intermedia entre el hardware y el software ya que se encarga de tomar bloques de N muestras del convertidor analógico/digital y encaminarlas a través del flujo de señal diseñado gráficamente. En segundo lugar, se han implementado diferentes efectos de audio de distintas características. Así pues, se encuentran efectos basados en retardos, filtros digitales y procesadores de dinámica. Concretamente, los efectos implementados son los siguientes: delay, flanger, vibrato, reverberador de Schroeder, filtros (paso bajo, paso alto y paso banda), ecualizador paramétrico y compresor y expansor de dinámica. Estos efectos han sido implementados en lenguaje C de acuerdo con la API de Pd. Con esto se ha conseguido obtener un objeto por cada efecto, el cual es “instanciado” en Pd pudiendo ejecutarlo en tiempo real. En este proyecto se expone la problemática que supone cada paso del diseño proponiendo soluciones válidas. Además se incluye una guía paso a paso para configurar la tarjeta y lograr realizar un bypass de señal y un efecto simple partiendo desde cero. ABSTRACT. This project involves the design and implementation of a digital real-time audio processor for electrical instruments (guitars, basses, keyboards, etc.). The processor is based on the Raspberry Pi B + card: low cost computer, developed in UK in 2012. First, it was necessary to make the cards assume the functionality of a real time audio processor. A Linux operating system called Raspberry (Raspbian) was installed. In this Project is used Pure Data (Pd): a graphical programming language developed in the 90s by Miller Puckette intending to be focused on creating multimedia and computer music events. The role of Pd is an intermediate layer between the hardware and the software. It is responsible for taking blocks of N samples of the analog/digital converter and route it through the signal flow. Secondly, it is necessary to implemented the different audio effects. There are delays based effects, digital filter and dynamics effects. Specifically, the implemented effects are: delay, flanger, vibrato, Schroeder reverb, filters (lowpass, highpass and bandpass), parametric equalizer and compressor and expander dynamics. These effects have been implemented in C language according to the Pd API. As a result, it has been obtained an object for each effect, which is instantiated in Pd. In this Project, the problems of every step are exposed with his corresponding solution. It is inlcuded a step-by-step guide to configure the card and achieve perform a bypass signal process and a simple effect.
Resumo:
We propose an original method to geoposition an audio/video stream with multiple emitters that are at the same time receivers of the mixed signal. The achieved method is suitable for those comes where a list of positions within a designated area is encoded with a degree of precision adjusted to the visualization capabilities; and is also easily extensible to support new requirements. This method extends a previously proposed protocol, without incurring in any performance penalty.
Resumo:
In this paper, we propose an original method to geoposition an audio/video stream with multiple emitters that are at the same time receivers of the mixed signal. The obtained method is suitable when a list of positions within a known area is encoded with precision tailored to the visualization capabilities of the target device. Nevertheless, it is easily adaptable to new precision requirements, as well as parameterized data precision. This method extends a previously proposed protocol, without incurring in any performance penalty.
Resumo:
In this report we summarize the state-of-the-art of speech emotion recognition from the signal processing point of view. On the bases of multi-corporal experiments with machine-learning classifiers, the observation is made that existing approaches for supervised machine learning lead to database dependent classifiers which can not be applied for multi-language speech emotion recognition without additional training because they discriminate the emotion classes following the used training language. As there are experimental results showing that Humans can perform language independent categorisation, we made a parallel between machine recognition and the cognitive process and tried to discover the sources of these divergent results. The analysis suggests that the main difference is that the speech perception allows extraction of language independent features although language dependent features are incorporated in all levels of the speech signal and play as a strong discriminative function in human perception. Based on several results in related domains, we have suggested that in addition, the cognitive process of emotion-recognition is based on categorisation, assisted by some hierarchical structure of the emotional categories, existing in the cognitive space of all humans. We propose a strategy for developing language independent machine emotion recognition, related to the identification of language independent speech features and the use of additional information from visual (expression) features.
Resumo:
Situational awareness is achieved naturally by the human senses of sight and hearing in combination. Automatic scene understanding aims at replicating this human ability using microphones and cameras in cooperation. In this paper, audio and video signals are fused and integrated at different levels of semantic abstractions. We detect and track a speaker who is relatively unconstrained, i.e., free to move indoors within an area larger than the comparable reported work, which is usually limited to round table meetings. The system is relatively simple: consisting of just 4 microphone pairs and a single camera. Results show that the overall multimodal tracker is more reliable than single modality systems, tolerating large occlusions and cross-talk. System evaluation is performed on both single and multi-modality tracking. The performance improvement given by the audio–video integration and fusion is quantified in terms of tracking precision and accuracy as well as speaker diarisation error rate and precision–recall (recognition). Improvements vs. the closest works are evaluated: 56% sound source localisation computational cost over an audio only system, 8% speaker diarisation error rate over an audio only speaker recognition unit and 36% on the precision–recall metric over an audio–video dominant speaker recognition method.
Resumo:
There are many different designs for audio amplifiers. Class-D, or switching, amplifiers generate their output signal in the form of a high-frequency square wave of variable duty cycle (ratio of on time to off time). The square-wave nature of the output allows a particularly efficient output stage, with minimal losses. The output is ultimately filtered to remove components of the spectrum above the audio range. Mathematical models are derived here for a variety of related class-D amplifier designs that use negative feedback. These models use an asymptotic expansion in powers of a small parameter related to the ratio of typical audio frequencies to the switching frequency to develop a power series for the output component in the audio spectrum. These models confirm that there is a form of distortion intrinsic to such amplifier designs. The models also explain why two approaches used commercially succeed in largely eliminating this distortion; a new means of overcoming the intrinsic distortion is revealed by the analysis. Copyright (2006) Society for Industrial and Applied Mathematics