899 results for Audio signals


Relevance:

100.00% 100.00%

Publisher:

Abstract:

We study the application of matrix decomposition algorithms, such as Non-negative Matrix Factorization (NMF), to frequency-domain representations of musical audio signals. These algorithms, driven by a reconstruction error function, learn a set of basis functions and a set of corresponding coefficients that approximate the input signal. We compare three reconstruction error functions when NMF is applied to monophonic and harmonized scales: least squares, Kullback-Leibler divergence, and a recently introduced phase-dependent divergence measure. New methods for interpreting the resulting decompositions are presented and compared with previously used methods that require knowledge of the acoustic domain. Finally, we analyze how well the learned basis functions generalize with respect to three musical parameters: amplitude, duration, and instrument type. To do so, we introduce two algorithms for labeling the basis functions that outperform the previous approach in the majority of our tests, the instrument task with monophonic audio being the only notable exception.
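
As an illustration of the decomposition described above, the following is a minimal sketch of NMF with the classic multiplicative update rules for two of the error functions mentioned (least squares and Kullback-Leibler divergence). It is not the authors' implementation: the function name `nmf`, its parameters, and the random initialization are assumptions made for this example, and the phase-dependent divergence is not covered.

```python
import numpy as np

def nmf(V, rank, n_iter=200, loss="kl", seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (frequency x time) as W @ H.

    W holds `rank` basis functions (columns); H holds their activations.
    `loss` selects the multiplicative updates: "ls" (least squares) or
    "kl" (Kullback-Leibler divergence). All names here are illustrative.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_time)) + eps
    for _ in range(n_iter):
        if loss == "ls":
            # Updates that do not increase the squared Frobenius error.
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
        else:
            # Updates that do not increase the generalized KL divergence.
            WH = W @ H + eps
            H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
            WH = W @ H + eps
            W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

# Toy "spectrogram": two spectral templates active at different times.
templates = np.array([[1.0, 0.0], [0.5, 0.2], [0.0, 1.0], [0.1, 0.6]])
activations = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.5, 1.0, 1.0]])
W, H = nmf(templates @ activations, rank=2)
```

In a real setting, `V` would be a magnitude spectrogram, and the columns of `W` would then be inspected or labeled (for example, by note).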

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A novel approach to watermarking of audio signals using Independent Component Analysis (ICA) is proposed. It exploits the statistical independence of components obtained by practical ICA algorithms to provide a robust watermarking scheme with high information rate and low distortion. Numerical simulations have been performed on audio signals, showing good robustness of the watermark against common attacks with unnoticeable distortion, even for high information rates. An important aspect of the method is its domain independence: it can be used to hide information in other types of data, with minor technical adaptations.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

It is well established that accent recognition can reach up to 95% accuracy when the signals are noise-free, using feature extraction techniques such as mel-frequency cepstral coefficients and binary classifiers such as discriminant analysis, support vector machines, and k-nearest neighbors. In this paper, we demonstrate that predictive performance can be reduced by as much as 15 percentage points when the signals are noisy. Specifically, we perturb the signals with different levels of white noise, and as the noise becomes stronger, the out-of-sample predictive performance deteriorates from 95% to 80%, although in-sample prediction gives overly optimistic results. ACM Computing Classification System (1998): C.3, C.5.1, H.1.2, H.2.4, G.3.
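
The perturbation step can be sketched as follows. The abstract does not say how the noise levels were parameterized, so expressing the level as a signal-to-noise ratio in decibels is an assumption of this example, as are the function name `add_white_noise` and its arguments.

```python
import numpy as np

def add_white_noise(signal, snr_db, seed=0):
    """Return `signal` plus white Gaussian noise at the requested SNR (in dB).

    Hypothetical helper: the SNR parameterization is an assumption,
    not something stated in the abstract.
    """
    rng = np.random.default_rng(seed)
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# One second of a 440 Hz tone, perturbed at progressively lower SNRs.
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)
noisy_versions = {snr: add_white_noise(clean, snr) for snr in (30, 20, 10, 0)}
```

Features would then be extracted from each noisy version and scored on held-out data, since the abstract notes that in-sample results are overly optimistic.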

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Every year, global music piracy costs several billion dollars in economic losses, lost jobs, and lost workers' earnings, as well as millions of dollars in lost tax revenue. Most music piracy is due to the rapid growth and ease of use of current technologies for copying, sharing, manipulating, and distributing musical data [Domingo, 2015], [Siwek, 2007]. Audio watermarking has been proposed to protect authors' rights and to locate the instants at which an audio signal has been tampered with. In this thesis, we propose to use the bio-inspired sparse spike-graph representation (spikegram) to design a new method for localizing tampering in audio signals, a new copyright protection method, and, finally, a new perceptual attack that uses the spikegram to attack audio watermarking systems. We first propose a technique for localizing tampering in audio signals. To this end, we combine a modified spread spectrum (MSS) method with a sparse representation. We use an adapted perceptual matching pursuit technique (PMP [Hossein Najaf-Zadeh, 2008]) to generate a sparse representation (spikegram) of the input audio signal that is invariant to time shifts [E. C. Smith, 2006] and that takes into account masking phenomena as observed in hearing. An authentication code is inserted into the coefficients of the spikegram representation. These are then combined with the masking thresholds. The watermarked signal is resynthesized from the modified coefficients, and the resulting signal is transmitted to the decoder.
At the decoder, to identify a tampered segment of the audio signal, the authentication codes of all intact segments are analyzed. If the codes cannot be detected correctly, the segment is known to have been tampered with. We propose watermarking according to the spread spectrum principle (called MSS) in order to obtain a high capacity in terms of embedded watermark bits. In situations where the encoder and decoder are desynchronized, our method can still detect tampered segments. Compared with the state of the art, our approach has the lowest error rate in detecting tampered segments. We used the mean opinion score (MOS) test to measure the quality of the watermarked signals. We evaluate the semi-fragile watermarking method by its bit error rate (number of erroneous bits divided by the total number of bits submitted) after several attacks. The results confirm the superiority of our approach for localizing tampered segments in audio signals while preserving signal quality. Next, we propose a new technique for protecting audio signals. This technique is based on the spikegram representation of audio signals and uses two dictionaries (TDA, for Two-Dictionary Approach). The spikegram is used to encode the host signal using a dictionary of gammatone filters. For watermarking, we use two different dictionaries that are selected according to the input bit to be embedded and the signal content. Our approach finds the appropriate gammatones (called watermark kernels) based on the value of the bit to be embedded, and embeds the watermark bits in the phase of the watermark gammatones. Moreover, TDA is shown to be error-free when no attack is applied.
It is shown that decorrelating the watermark kernels enables the design of a highly robust audio watermarking method. Experiments showed that, compared with several recent techniques, the proposed method achieves the best robustness when the watermarked signal is corrupted by MP3 compression at 32 kbps with a payload of 56.5 bps. We also studied the robustness of the watermark when the new USAC (Unified Speech and Audio Coding) codec at 24 kbps is used. The payload is then between 5 and 15 bps. Finally, we use spikegrams to propose three new attack methods. We compare them with recent attack methods such as 32 kbps MP3 and 24 kbps USAC. These attacks comprise the PMP attack, the inaudible noise attack, and the sparse replacement attack. In the PMP attack, the watermarked signal is represented and resynthesized with a spikegram. In the inaudible noise attack, inaudible noise is generated and added to the spikegram coefficients. In the sparse replacement attack, the spectro-temporal features of the signal (the time spikes) are found in each segment using the spikegram, and similar time spikes are replaced with one another. To compare the effectiveness of the proposed attacks, we test them against the spread spectrum watermarking decoder. It is shown that the sparse replacement attack reduces the normalized correlation of the spread spectrum decoder by a larger factor than when the spread spectrum decoder is attacked by MP3 (32 kbps) and 24 kbps USAC transformations.
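
For readers unfamiliar with the spread spectrum principle and the bit error rate used above, here is a minimal sketch. It deliberately uses plain time-domain spread spectrum rather than the thesis's modified spread spectrum on spikegram coefficients, and it ignores perceptual masking; the function names, the chip length, and the embedding strength `alpha` are all assumptions of this example.

```python
import numpy as np

def embed_bits(host, bits, chip_len, alpha, seed=42):
    """Add +/- alpha times a pseudo-random sequence to one segment per bit."""
    rng = np.random.default_rng(seed)
    pn = rng.choice([-1.0, 1.0], size=chip_len)  # shared secret sequence
    marked = np.array(host, dtype=float)
    for i, bit in enumerate(bits):
        segment = slice(i * chip_len, (i + 1) * chip_len)
        marked[segment] += alpha * (1.0 if bit else -1.0) * pn
    return marked, pn

def detect_bits(marked, pn, n_bits):
    """Recover each bit from the sign of the correlation with the sequence."""
    chip_len = len(pn)
    decoded = []
    for i in range(n_bits):
        segment = marked[i * chip_len:(i + 1) * chip_len]
        decoded.append(int(np.dot(segment, pn) > 0))
    return decoded

def bit_error_rate(sent, received):
    """Number of erroneous bits divided by the number of bits submitted."""
    return float(np.mean(np.asarray(sent) != np.asarray(received)))

# Example: hide 8 bits in a noise-like host signal and read them back.
host = np.random.default_rng(0).normal(size=8 * 2048)
payload = [1, 0, 1, 1, 0, 0, 1, 0]
marked, pn = embed_bits(host, payload, chip_len=2048, alpha=0.2)
ber = bit_error_rate(payload, detect_bits(marked, pn, len(payload)))
```

An attack such as lossy compression would be applied to `marked` before calling `detect_bits`, and the resulting bit error rate compared across methods.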

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Audio coding is used to compress digital audio signals, thereby reducing the number of bits needed to transmit or store an audio signal. This is useful when network bandwidth or storage capacity is very limited. Audio compression algorithms are based on an encoding and decoding process. In the encoding step, the uncompressed audio signal is transformed into a coded representation, thereby compressing it. The coded audio signal eventually needs to be restored (e.g., for playback) by decoding: the decoder receives the bitstream and converts it back into an uncompressed signal. ISO-MPEG is a standard for high-quality, low-bit-rate video and audio coding. The audio part of the standard is composed of algorithms for high-quality, low-bit-rate audio coding, i.e., algorithms that reduce the original bit rate while guaranteeing high quality of the audio signal. The audio coding algorithms consist of MPEG-1 (with three different layers), MPEG-2, MPEG-2 AAC, and MPEG-4. This work presents a study of the MPEG-4 AAC audio coding algorithm. It also presents implementations of the AAC algorithm on different platforms and comparisons among them. The implementations are in C, in Intel Pentium assembly, in C on a DSP processor, and in HDL. Since each implementation has its own application niche, each one is valid as a final solution. Another purpose of this work is to compare these implementations in terms of estimated cost, execution time, and the advantages and disadvantages of each.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

This project aims to show the time offsets that exist between audio signals obtained from the same source at different points spaced apart from one another. To do so, we rely on correlation analysis of multi-microphone audio signals to determine the delays between them. Across the three parts that make up this project, we explain where, how, and why this effect arises in this type of signal. The first part presents some of the theoretical concepts needed to understand the subsequent development, such as coherence and correlation between signals, phase delays, and the importance of micro-timing. It also explains several microphone techniques that are used in the third part. The second part presents the software developed to determine and correct the delay between the signals to be analyzed. Matlab was chosen as the programming tool because it was the one most used in the majority of the courses in the degree program, so the author has sufficient command of it. In addition to presenting the software itself, the end of this part includes a user manual that explains how to operate it, for possible future use by other interested people. The last part demonstrates, in several real cases, the study of the alignment of multi-microphone takes in which the effect to be detected and corrected occurs. Three studies of this phenomenon are carried out. The first uses internally generated digital signals, specifically white noise, delaying the signals by a few samples relative to one another and then analyzing them with the developed software to verify its effectiveness.
The second analyzes audio signals recorded in the studio from several modern music groups, showing the results of using the software on some of them, such as drum, bass, and guitar takes. The third analyzes audio signals obtained outside the recording studio, where the supposedly ideal (acoustically speaking) conditions of a recording studio environment are not available. Some of the microphone techniques explained in the last section of the part on theoretical concepts are used to record a symphony orchestra, and the effect in question is then analyzed with our software and the results presented. The same is done with a four-voice choir in a church.
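
The first study (white noise delayed by a few samples) can be reproduced with a short cross-correlation sketch. The project's own software is written in Matlab and is not shown in the abstract; the Python functions below, including the names `estimate_delay` and `align`, are assumptions made for illustration only.

```python
import numpy as np

def estimate_delay(reference, delayed):
    """Estimate by how many samples `delayed` lags behind `reference`.

    The lag that maximizes the cross-correlation is taken as the delay;
    a negative result means `delayed` actually leads `reference`.
    """
    reference = np.asarray(reference, dtype=float)
    delayed = np.asarray(delayed, dtype=float)
    corr = np.correlate(delayed, reference, mode="full")
    lags = np.arange(-(len(reference) - 1), len(delayed))
    return int(lags[np.argmax(corr)])

def align(delayed, delay):
    """Shift `delayed` by `delay` samples, zero-padding the vacated end."""
    delayed = np.asarray(delayed, dtype=float)
    out = np.zeros_like(delayed)
    if delay >= 0:
        out[:len(delayed) - delay] = delayed[delay:]
    else:
        out[-delay:] = delayed[:len(delayed) + delay]
    return out

# White noise reaching a second "microphone" 7 samples later.
rng = np.random.default_rng(0)
mic_a = rng.normal(size=2000)
mic_b = np.concatenate([np.zeros(7), mic_a[:-7]])
delay = estimate_delay(mic_a, mic_b)
mic_b_aligned = align(mic_b, delay)
```

This direct correlation is quadratic in the signal length; for long recordings an FFT-based correlation would be the usual choice.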

Relevance:

70.00% 70.00%

Publisher:

Abstract:

This study examines the correlation between how certified music educators understand audio technology and how they incorporate it in their instructional methods. Participants were classroom music teachers selected from fifty middle schools in Miami-Dade Public Schools. The study adopted a non-experimental research design in which a survey was the primary tool of investigation. The findings reveal that a majority of middle school music teachers in Miami-Dade are not familiar with advanced audio-recording software or any other digital device dedicated to the recording and processing of audio signals. Moreover, they report a lack of opportunities to develop this knowledge. Younger music teachers, however, are more open to developing up-to-date instructional methodologies. Most of the participants agreed that music instruction should be a platform for preparing students for a future in the entertainment industry. A basic knowledge of music business should be delivered to students enrolled in middle-school music courses.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Automatic classification of urban sounds is important for environmental monitoring. This work presents a new methodology for classifying urban sounds, based on discovering frequent patterns (motifs) in the audio signals and using them as features for classification. To extract the motifs, a multi-resolution discovery method based on SAX is used. Decision trees and SVMs are used for classification. This new methodology is compared with another widely used one based on MFCC. The publicly available UrbanSound dataset was used for the experiments. The experiments led to the conclusion that motif features are better than MFCC at discriminating sounds with similar timbres, and that the best results are achieved with both types of features combined. This work also developed a mobile application for Android that makes it possible to use the developed classification methods in a real-life context and to expand the dataset.
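
To make the SAX step more concrete, the sketch below converts a window of samples into a SAX word (z-normalization, piecewise aggregate approximation, then quantization against Gaussian breakpoints) and counts repeated words as candidate motifs. It works at a single resolution with a four-letter alphabet, whereas the abstract describes a multi-resolution method; the function names and parameters are assumptions of this example.

```python
from collections import Counter

import numpy as np

# Breakpoints splitting a standard normal distribution into 4 equal areas.
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])
ALPHABET = "abcd"

def sax_word(series, n_segments):
    """Convert a numeric series into a SAX word of length `n_segments`."""
    x = np.asarray(series, dtype=float)
    std = x.std()
    z = (x - x.mean()) / std if std > 0 else np.zeros_like(x)
    # Piecewise aggregate approximation: mean of each (near-)equal chunk.
    paa = np.array([chunk.mean() for chunk in np.array_split(z, n_segments)])
    symbols = np.searchsorted(BREAKPOINTS, paa)
    return "".join(ALPHABET[i] for i in symbols)

def frequent_words(series, window, n_segments, min_count=2):
    """Count SAX words over non-overlapping windows; keep the repeated ones."""
    series = np.asarray(series, dtype=float)
    counts = Counter(
        sax_word(series[start:start + window], n_segments)
        for start in range(0, len(series) - window + 1, window)
    )
    return {word: n for word, n in counts.items() if n >= min_count}

# A rising ramp repeated three times yields one motif seen three times.
signal = np.tile(np.arange(8.0), 3)
motifs = frequent_words(signal, window=8, n_segments=4)
```

The resulting word counts could serve as a feature vector for a decision tree or SVM, alongside or instead of MFCC features.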

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Non-audio signals have been recorded as WAV files in the flash ROM of a portable MP3 player to examine the possibility of using these cheap and small instruments as general-purpose portable data loggers. A 1200-Hz FM carrier modulated by the non-audio signal replaced the microphone signal while the MP3 player was in its REC operating mode, which triggers the voice recording function. The signal was recovered by a PLL-based FM demodulator whose input is the FM signal captured at the coil leads of the MP3 player's earphone. Sinusoidal and electrocardiogram signals were used in the system evaluation. Although the quality of low-frequency signals needs improvement, overall the results indicate the viability of the proposal. Suggestions are made for improvements and extensions of the work.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This paper discusses two pitch detection algorithms (PDAs) for simple audio signals, based on the zero-crossing rate (ZCR) and the autocorrelation function (ACF). As is well known, pitch detection methods based on ZCR and ACF are widely used in signal processing. This work shows some features and problems of these methods, as well as some improvements developed to increase their performance. © 2008 IEEE.
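
The two basic estimators can be sketched in a few lines. This is a textbook-style illustration, not the paper's improved versions; the function names and the search range `fmin`/`fmax` are assumptions of this example.

```python
import numpy as np

def pitch_zcr(x, fs):
    """Estimate pitch from the zero-crossing rate (two crossings per period)."""
    x = np.asarray(x, dtype=float)
    negative = np.signbit(x)
    crossings = np.count_nonzero(negative[1:] != negative[:-1])
    duration = (len(x) - 1) / fs
    return crossings / (2.0 * duration)

def pitch_acf(x, fs, fmin=50.0, fmax=1000.0):
    """Estimate pitch from the strongest autocorrelation peak in a lag range."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Keep non-negative lags only: acf[k] = sum over n of x[n] * x[n + k].
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(fs / fmax)
    lag_max = int(fs / fmin)
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return fs / lag

# Half a second of a 220 Hz sine sampled at 8 kHz.
fs = 8000
t = np.arange(4000) / fs
tone = np.sin(2 * np.pi * 220 * t + 0.3)
f_zcr = pitch_zcr(tone, fs)
f_acf = pitch_acf(tone, fs)
```

The ZCR estimate breaks down when harmonics or noise add extra crossings, and the ACF estimate is quantized to integer lags, which are among the kinds of problems that typically motivate refinements of these methods.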

Relevance:

60.00% 60.00%

Publisher:

Abstract:

For enhanced immersion into a virtual scene more than just the visual sense should be addressed by a Virtual Reality system. Additional auditory stimulation appears to have much potential, as it realizes a multisensory system. This is especially useful when the user does not have to wear any additional hardware, e.g., headphones. Creating a virtual sound scene with spatially distributed sources requires a technique for adding spatial cues to audio signals and an appropriate reproduction. In this paper we present a real-time audio rendering system that combines dynamic crosstalk cancellation and multi-track binaural synthesis for virtual acoustical imaging. This provides the possibility of simulating spatially distributed sources and, in addition to that, near-to-head sources for a freely moving listener in room-mounted virtual environments without using any headphones. A special focus will be put on near-to-head acoustics, and requirements in respect of the head-related transfer function databases are discussed.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Music can affect the individual at every level: physical, mental, and spiritual. This article focuses on the role it plays in the development of spiritual and transcendental life. To that end, we review the history of its aesthetic and social evolution, address the phenomenon at the physiological level, and present its clinical and social applications. Then, as an example of Western and Eastern conceptions of thought, we discuss how Christianity and Buddhism conceive of music within their doctrine. We conclude with some reflections on the subject.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

A simple but efficient voice activity detector based on the Hilbert transform and a dynamic threshold is presented for use in the pre-processing of audio signals. The algorithm that defines the dynamic threshold is a modification of a convex combination found in the literature. This scheme allows the detection of prosodic and silence segments in speech under non-ideal conditions such as spectrally overlapping noise. The present work shows preliminary results on a database built from political speeches. The tests were performed by adding artificial noise to the natural noise present in the audio signals, and several algorithms are compared. In future work, the results will be extended to adaptive filtering of monophonic signals and to the analysis of speech pathologies.
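
A rough sketch of the idea follows: the Hilbert envelope is computed through the FFT, smoothed, and compared against a threshold formed as a convex combination of the envelope's maximum and minimum. The abstract does not give the authors' modified threshold rule, so this plain convex combination, the weight `lam`, the smoothing length, and the function names are all assumptions.

```python
import numpy as np

def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed with the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0  # keep the DC component once
    if n % 2 == 0:
        gain[n // 2] = 1.0  # keep the Nyquist component once
        gain[1:n // 2] = 2.0  # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))

def detect_activity(x, smooth=80, lam=0.2):
    """Return a boolean mask that is True where the signal is judged active."""
    envelope = hilbert_envelope(x)
    envelope = np.convolve(envelope, np.ones(smooth) / smooth, mode="same")
    # Convex combination of the extreme envelope values (0 <= lam <= 1).
    threshold = lam * envelope.max() + (1.0 - lam) * envelope.min()
    return envelope > threshold

# A 200 Hz burst between samples 3000 and 5000, on a weak noise floor.
fs = 8000
n = np.arange(fs)
signal = 0.01 * np.random.default_rng(0).normal(size=fs)
signal[3000:5000] += np.sin(2 * np.pi * 200 * n[3000:5000] / fs)
active = detect_activity(signal)
```

Because the threshold is recomputed from each signal's own envelope, it adapts to the recording level; a frame-by-frame variant would be needed for streaming use.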

Relevance:

60.00% 60.00%

Publisher:

Abstract:

We propose a study of the mathematical properties of voice as an audio signal. This work includes signals in which the channel conditions are not ideal for emotion recognition. Multiresolution analysis (the discrete wavelet transform) was performed using the Daubechies wavelet family (Db1-Haar, Db6, Db8, Db10), decomposing the initial audio signal into sets of coefficients from which a set of features was extracted and analyzed statistically in order to differentiate emotional states. ANNs proved to be a system that allows an appropriate classification of such states. This study shows that the features extracted using wavelet decomposition are enough to analyze and extract emotional content in audio signals, yielding a high accuracy rate in the classification of emotional states without the need for other kinds of classical time-frequency features. Accordingly, this paper seeks to characterize mathematically the six basic human emotions (boredom, disgust, happiness, anxiety, anger, and sadness), plus neutrality, for a total of seven states to identify.
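
As a small illustration of the decomposition step, the sketch below implements the Haar (Db1) wavelet transform by hand and gathers simple statistics from each set of coefficients. The abstract does not list which features were extracted, so the mean, standard deviation, and energy used here are assumptions, as are the function names; the longer Daubechies filters (Db6, Db8, Db10) are not covered.

```python
import numpy as np

def haar_step(x):
    """One level of the Haar (Db1) DWT: approximation and detail coefficients."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = np.append(x, x[-1])  # repeat the last sample to get an even length
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def wavelet_features(x, levels=3):
    """Mean, standard deviation and energy of each coefficient set."""
    features = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features += [detail.mean(), detail.std(), float(np.sum(detail ** 2))]
    features += [approx.mean(), approx.std(), float(np.sum(approx ** 2))]
    return np.array(features)

# Feature vector for a short synthetic frame (3 levels give 12 features).
frame = np.sin(2 * np.pi * 5 * np.arange(64) / 64)
feature_vector = wavelet_features(frame, levels=3)
```

Feature vectors like this one, computed per utterance, would then be passed to a classifier such as the ANN mentioned in the abstract.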

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Pitch estimation, also known as fundamental frequency (F0) estimation, has been a popular research topic for many years and is still being investigated. The goal of pitch estimation is to find the pitch, or fundamental frequency, of a digital recording of speech or musical notes. It plays an important role because it is the key to identifying which notes are being played and at what time. Pitch estimation for real instruments is a very hard task. Each instrument has its own physical characteristics, which are reflected in different spectral characteristics. Furthermore, recording conditions can vary from studio to studio, and background noise must be considered. This dissertation presents a novel approach to the problem of pitch estimation using Cartesian Genetic Programming (CGP). We take advantage of evolutionary algorithms, in particular CGP, to explore and evolve complex mathematical functions that act as classifiers. These classifiers are used to identify the pitches of piano notes in an audio signal. To help with encoding the problem, we built a highly flexible CGP Toolbox, generic enough to encode different kinds of programs. The evolutionary algorithm implemented is the one known as 1 + λ, and the value of λ can be chosen. The toolbox is very simple to use. Settings such as the mutation probability and the number of runs and generations are configurable. The Cartesian representation of CGP can take multiple forms and is able to encode function parameters. It can handle different types of fitness functions, both minimization and maximization of f(x), and has a useful system of callbacks. We trained 61 classifiers corresponding to 61 piano notes. A training set of audio signals was used for each classifier: half were signals with the same pitch as the classifier (true positive signals) and the other half were signals with different pitches (true negative signals). F-measure was used as the fitness function.
Signals with the same pitch as the classifier that were correctly identified by the classifier count as true positives. Signals with the same pitch as the classifier that were not identified count as false negatives. Signals with a different pitch that were not identified by the classifier count as true negatives. Signals with a different pitch that were identified by the classifier count as false positives. Our first approach was to evolve classifiers for identifying artificial signals created by mathematical functions: sine, sawtooth, and square waves. Our function set is composed mainly of filtering operations on vectors and of arithmetic operations with constants and vectors. All the classifiers correctly identified true positive signals and did not identify true negative signals. We then moved to real audio recordings. To test the classifiers, we picked audio signals different from the ones used during the training phase. For a first approach, the results obtained were very promising but could be improved. We made slight changes to our approach, and the number of false positives was reduced by 33% compared with the first approach. We then applied the evolved classifiers to polyphonic audio signals, and the results indicate that our approach is a good starting point for addressing the problem of pitch estimation.
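
The counting rules above and the F-measure fitness can be written out as a short sketch. The function names are assumptions for this example, and the balanced F1 form (`beta=1`) is used because the abstract does not state which weighting was chosen.

```python
def confusion_counts(identified, same_pitch):
    """Count outcomes for one classifier over a set of signals.

    identified[i] is True if the classifier flagged signal i;
    same_pitch[i] is True if signal i has the classifier's pitch.
    """
    pairs = list(zip(identified, same_pitch))
    tp = sum(1 for hit, same in pairs if hit and same)
    fn = sum(1 for hit, same in pairs if not hit and same)
    tn = sum(1 for hit, same in pairs if not hit and not same)
    fp = sum(1 for hit, same in pairs if hit and not same)
    return tp, fp, fn, tn

def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from confusion counts; returns 0.0 when undefined."""
    b2 = beta * beta
    denominator = (1.0 + b2) * tp + b2 * fn + fp
    return (1.0 + b2) * tp / denominator if denominator else 0.0

# Four signals: one of each outcome.
tp, fp, fn, tn = confusion_counts(
    identified=[True, True, False, False],
    same_pitch=[True, False, True, False],
)
fitness = f_measure(tp, fp, fn)
```

Note that true negatives do not enter the F-measure, so a classifier cannot score well simply by rejecting every signal.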