939 results for Audio signal


Relevance: 60.00%

Abstract:

The Biomuse Trio (R. Benjamin Knapp, Eric Lyon, Gascia Ouzounian) was formed in 2008 to perform computer chamber music integrating performance, laptop processing of sound and the transduction of bio-signals for the control of musical gesture. The work of the ensemble encompasses hardware design, audio signal processing, bio-signal processing, composition, improvisation and gesture choreography. The Biomuse Trio has performed and lectured across North America and Europe, including at BEAM Festival, CHI, Diapason Gallery, Green Man Festival, Issue Project Room, NIME, Science Gallery Dublin, STEIM and TheatreLab NYC.

Relevance: 60.00%

Abstract:

Studying the mechanisms underlying speech production is a complex and demanding task, requiring data obtained through a variety of techniques, including several imaging modalities. Among these, Magnetic Resonance Imaging (MRI) has gained prominence in recent years, positioning itself as one of the most promising modalities in the domain of speech production. An important contribution of this work is the optimization and implementation of MRI protocols, together with proposed image-processing strategies, tailored to the requirements of speech production in general and to the specificities of the different sounds. In addition, motivated by the scarcity of data for European Portuguese (EP), one objective was to obtain articulatory data that complement existing information and clarify some questions regarding the production of EP sounds (namely, lateral consonants and nasal vowels). For the lateral consonants, 2D and 3D MR images were obtained from sustained productions using a fast Gradient Echo (GE) sequence (3D VIBE) in the sagittal plane, covering the entire vocal tract. The corpus, acquired from seven speakers, covered different syllable positions and vowel contexts. For the nasal vowels, real-time images were acquired from three speakers with a Spoiled GE sequence (TurboFLASH) in the sagittal and coronal planes, achieving a temporal resolution of 72 ms (14 frames/s). Image acquisition was synchronized with the acoustic signal by means of an optical microphone. Several semi-automatic algorithms were used for image processing and analysis.
The processing and analysis of the data enabled an articulatory description of the lateral consonants, grounded in qualitative data (e.g., 3D visualizations, contour comparisons) and in quantitative data including areas, vocal tract area functions, length and area of the lateral passages, and the assessment of contextual and positional effects. Regarding the velarization of the alveolar lateral /l/, the results point to a velarized /l/ regardless of syllable position. As for /L/, for which little information was available, its articulation proved to be considerably more anterior than traditionally described and also more extensive than that of the alveolar lateral. The temporal resolution of 72 ms achieved with the real-time MRI acquisitions proved adequate for studying the dynamic characteristics of nasal vowels, namely aspects such as the duration of the velar gesture, the oral gesture and the coordination between gestures, complementing and corroborating existing EP results obtained with other instrumental techniques. Furthermore, new production data relevant to a better understanding of nasality were obtained (variation of nasal/oral area over time, nasal/oral proportion). This study demonstrates the versatility and potential of MRI for the study of speech production, with clear and important contributions to a better knowledge of Portuguese articulation, to the development of articulatory-based speech synthesis models, and to future application in more clinical areas (e.g., speech disorders).

Relevance: 60.00%

Abstract:

Doctoral thesis, Informatics (Informatics Engineering), Universidade de Lisboa, Faculdade de Ciências, 2015.

Relevance: 60.00%

Abstract:

Advances in artificial intelligence allow computer systems to solve increasingly complex tasks related, for example, to vision, the understanding of audio signals, or language processing. Among existing models are Artificial Neural Networks (ANNs), whose popularity took a great leap forward with the discovery by Hinton et al. [22] that Restricted Boltzmann Machines (RBMs) could be used for layer-by-layer unsupervised pre-training, greatly easing the supervised training of networks with several hidden layers (DBNs), a training that had until then proved very difficult. Since that discovery, researchers have studied the effectiveness of new pre-training strategies, such as stacking traditional autoencoders (SAE) [5, 38] and stacking denoising autoencoders (SDAE) [44]. It is in this context that the present study began. After a brief review of the basics of machine learning and of the pre-training methods used so far with RBM, AE and DAE modules, we deepened our understanding of SDAE pre-training, explored its various properties, and studied variants of the SDAE as an initialization strategy for deep architectures. We were thus able, among other things, to shed light on the influence of the noise level, the number of layers and the number of hidden units on the generalization error of the SDAE. We observed improved performance on the supervised task when using salt-and-pepper (PS) and Gaussian (GS) noise, which turn out to be better justified than the noise used until now, namely zero masking (MN). Moreover, we showed that performance benefits from an emphasis placed on the reconstruction of the corrupted inputs during the training of the individual DAEs.
Our work also revealed that the DAE is able to learn, on natural images, filters similar to those found in the V1 cells of the visual cortex, namely edge-detector filters. We further showed that the representations learned by the SDAE, composed of the features thus extracted, prove very useful for training a Support Vector Machine (SVM) with a linear or Gaussian kernel, greatly improving its generalization performance. We also observed that, similarly to the DBN and unlike the SAE, the SDAE has good capacity as a generative model. Finally, we opened the door to new pre-training strategies and uncovered the potential of one of them, the stacking of "renoising" autoencoders (SRAE).
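The corruption types compared above — zero masking (MN), salt-and-pepper (PS) and Gaussian (GS) noise — can be sketched as simple functions. This is an illustrative Python sketch operating on plain lists of values, not the thesis's actual implementation; the corruption fractions and noise level are arbitrary examples.

```python
import random

def masking_noise(x, frac, rng):
    """Zero masking (MN): set a random fraction of the inputs to 0."""
    return [0.0 if rng.random() < frac else v for v in x]

def salt_and_pepper(x, frac, rng):
    """Salt-and-pepper (PS): force a random fraction to the minimum (0.0)
    or maximum (1.0) value, each with probability 1/2."""
    return [(1.0 if rng.random() < 0.5 else 0.0) if rng.random() < frac else v
            for v in x]

def gaussian_noise(x, sigma, rng):
    """Additive Gaussian (GS) corruption."""
    return [v + rng.gauss(0.0, sigma) for v in x]

rng = random.Random(0)
pixel_row = [0.2, 0.8, 0.5, 0.9, 0.1, 0.4]  # a toy row of pixel intensities
print(masking_noise(pixel_row, 0.3, rng))
print(salt_and_pepper(pixel_row, 0.3, rng))
print(gaussian_noise(pixel_row, 0.1, rng))
```

A denoising autoencoder is then trained to reconstruct the clean `pixel_row` from one of these corrupted versions.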

Relevance: 60.00%

Abstract:

Frequency recognition is an important task in many engineering fields such as audio signal processing and telecommunications engineering, for example in applications like Dual-Tone Multi-Frequency (DTMF) detection or the recognition of the carrier frequency of a Global Positioning System (GPS) signal. This paper presents the results of investigations of several common Fourier-transform-based frequency recognition algorithms implemented in real time on a Texas Instruments (TI) TMS320C6713 Digital Signal Processor (DSP) core. In addition, suitable metrics are evaluated in order to ascertain which of the selected algorithms is appropriate for audio signal processing.
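One classic Fourier-based detector for tasks like DTMF detection is the Goertzel algorithm, which computes the power of a single DFT bin much more cheaply than a full FFT. The following is a minimal Python sketch, not the paper's DSP implementation; the 205-sample frame at 8 kHz is an illustrative choice commonly used for DTMF.

```python
import math

def goertzel_power(samples, sample_rate, target_freq):
    """Estimate the power of one frequency bin with the Goertzel algorithm."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)   # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:                          # second-order IIR recursion
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # squared magnitude of the k-th DFT bin
    return s_prev2**2 + s_prev**2 - coeff * s_prev * s_prev2

# Example: which DTMF row tone dominates a synthetic "1" keypress (697 + 1209 Hz)?
fs = 8000
tone = [math.sin(2*math.pi*697*i/fs) + math.sin(2*math.pi*1209*i/fs)
        for i in range(205)]
powers = {f: goertzel_power(tone, fs, f) for f in (697, 770, 852, 941)}
print(max(powers, key=powers.get))
```

Because only the desired bins are evaluated, this scales with the number of target frequencies rather than with the transform length, which is why it is attractive on small DSP cores.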

Relevance: 60.00%

Abstract:

The aim of this thesis is to investigate computerized voice assessment methods to classify between normal and dysarthric speech signals. In the proposed system, computerized assessment methods equipped with signal processing and artificial intelligence techniques are introduced. The sentences used for the measurement of inter-stress intervals (ISI) were read by each subject and computed for comparisons between normal and impaired voice. A band-pass filter is used for the preprocessing of the speech samples. Speech segmentation is performed using signal energy and the spectral centroid to separate voiced and unvoiced regions of the speech signal. Acoustic features are extracted from the LPC model and from the speech segments of each audio signal to find anomalies. The speech features assessed for classification are energy entropy, zero-crossing rate (ZCR), spectral centroid, mean fundamental frequency (meanF0), jitter (RAP), jitter (PPQ), and shimmer (APQ). A Naïve Bayes (NB) classifier is used for speech classification. For speech tests 1 and 2, classification accuracies of 72% and 80%, respectively, between healthy and impaired speech samples were achieved using the NB classifier; for speech test 3, 64% correct classification was achieved. The results point to the possibility of classifying speech impairment in Parkinson's disease (PD) patients based on the clinical rating scale.
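Two of the frame-level features named above, zero-crossing rate and spectral centroid, are easy to sketch; unvoiced (noise-like) frames score high on both, which is what makes them useful for voiced/unvoiced segmentation. A minimal pure-Python sketch with synthetic frames, not the thesis's implementation:

```python
import math
import random

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of the frame's DFT (naive O(n^2) DFT)."""
    n = len(frame)
    mags, freqs = [], []
    for k in range(n // 2):
        re = sum(x * math.cos(2*math.pi*k*i/n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2*math.pi*k*i/n) for i, x in enumerate(frame))
        mags.append(math.hypot(re, im))
        freqs.append(k * sample_rate / n)
    total = sum(mags) or 1.0
    return sum(f * m for f, m in zip(freqs, mags)) / total

fs = 8000
voiced = [math.sin(2*math.pi*200*i/fs) for i in range(240)]   # low-pitch tone
rng = random.Random(0)
noise = [rng.uniform(-1, 1) for _ in range(240)]              # unvoiced stand-in
print(zero_crossing_rate(voiced), zero_crossing_rate(noise))
print(spectral_centroid(voiced, fs), spectral_centroid(noise, fs))
```

A segmenter then thresholds these two values per frame (the thesis pairs the centroid with signal energy; energy is simply the mean of squared samples).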

Relevance: 60.00%

Abstract:

Non-audio signals have been recorded in the flash ROM memory of a portable MP3 player, in a WAV-format file, to examine the possibility of using these cheap and small devices as general-purpose portable data loggers. A 1200 Hz FM carrier modulated by the non-audio signal replaced the microphone signal while using the REC operating mode of the MP3 player, which triggers the voice-recording function. Signal recovery was carried out by a PLL-based FM demodulator whose input is the FM signal captured at the coil leads of the MP3 player's earphone. Sinusoidal and electrocardiogram signals were used in the system evaluation. Although the quality of low-frequency signals needs improvement, overall the results indicate the viability of the proposal. Suggestions are made for improvements and extensions of the work.
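The modulation side of the scheme — a 1200 Hz audio carrier frequency-modulated by the slow signal of interest — can be sketched as follows. This is a hedged illustration: the 1200 Hz carrier comes from the paper, but the frequency deviation and sample rate here are assumed values, not the authors' setup.

```python
import math

def fm_modulate(signal, fs, carrier_hz=1200.0, deviation_hz=100.0):
    """Frequency-modulate a slow signal onto an audio-band carrier.

    Instantaneous frequency is carrier_hz + deviation_hz * signal[i], so the
    output stays in the voice band the MP3 recorder expects.
    """
    phase, out = 0.0, []
    for x in signal:
        phase += 2.0 * math.pi * (carrier_hz + deviation_hz * x) / fs
        out.append(math.sin(phase))
    return out

fs = 8000
# a 5 Hz sinusoid standing in for a slow biosignal such as an ECG trace
msg = [math.sin(2*math.pi*5*i/fs) for i in range(fs)]
modulated = fm_modulate(msg, fs)   # one second of audio-band FM
```

On playback, a PLL-based demodulator tracks the instantaneous frequency of this tone and recovers the original waveform.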

Relevance: 60.00%

Abstract:

The effect of snoring on the cardiovascular system is not well known. In this study we analyzed the Heart Rate Variability (HRV) differences between light and heavy snorers. The experiments were performed on full-night polysomnography (PSG) recordings with ECG and audio channels from a patient group (heavy snorers) and a control group (light snorers), gender- and age-matched, for a total of 30 subjects. A Snoring Density (SND) feature of the audio signal was computed as the classification criterion, along with the HRV features. A Mann-Whitney statistical test and Support Vector Machine (SVM) classification were applied to assess the correlation. The results show that snoring has a clear impact on the HRV features, which can provide deeper insight into the physiological understanding of snoring. © 2011 CCAL.
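The two analysis steps named above — a per-feature Mann-Whitney group comparison followed by SVM classification — can be sketched with `scipy` and `scikit-learn`. The data below are synthetic stand-ins (the real study computed HRV features from the PSG ECG channel); group sizes, means and spreads are illustrative only.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-ins for 3 HRV features of light (0) vs heavy (1) snorers
light = rng.normal(loc=50.0, scale=5.0, size=(15, 3))
heavy = rng.normal(loc=42.0, scale=5.0, size=(15, 3))
X = np.vstack([light, heavy])
y = np.array([0] * 15 + [1] * 15)

# Per-feature group comparison, as in the study's Mann-Whitney step
pvals = []
for j in range(X.shape[1]):
    stat, p = mannwhitneyu(light[:, j], heavy[:, j])
    pvals.append(p)
    print(f"feature {j}: U={stat:.1f}, p={p:.5f}")

# SVM classification of the two groups, with cross-validation
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)
print("mean CV accuracy:", round(scores.mean(), 3))
```

The statistical test asks whether each feature differs between groups; the SVM asks whether the features jointly separate them.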

Relevance: 60.00%

Abstract:

In this work, the use of a portable MP3 player was proposed for recording non-audio signals in its memory, using its voice-recording feature. To evaluate the proposal, the player's electret microphone was removed and replaced by an FM signal with a 1200 Hz audio carrier modulated by the signal of interest. The signal is recorded on the device as a WAV-format file. Its recovery was carried out with a PLL-based FM demodulator whose input is the FM signal captured at the coil leads of the MP3 player's earphone. Sinusoidal and electrocardiogram signals were used in the tests. Although the quality of low-frequency signals needs improvement, overall the results indicate the viability of the proposal. Suggestions are made for improvements and extensions of the work.

Relevance: 60.00%

Abstract:

In this paper we present the design and implementation of a wearable application in Prolog. The application program is a "sound spatializer": given an audio signal and real-time data from a head-mounted compass, a signal is generated for stereo headphones that will appear to come from a position in space. We describe high-level and low-level optimizations and transformations that were applied in order to fit this application onto the wearable device. The final application operates comfortably in real time on a wearable computer and has a memory footprint that remains constant over time, enabling it to run on continuous audio streams. Comparison with a version hand-written in C shows that the C version is no more than 20-40% faster: a small price to pay for a high-level description.
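The paper's Prolog implementation is not reproduced here, but the core idea of a spatializer — weighting the two headphone channels according to the source azimuth — can be illustrated with a constant-power pan law. This Python sketch ignores the interaural time delay and head-related filtering a full spatializer would apply; the pan law and angle convention are assumptions for illustration.

```python
import math

def spatialize(samples, azimuth_deg):
    """Pan a mono signal toward azimuth_deg (-90 = hard left, +90 = hard right)
    using a constant-power pan law: left/right gains are cos/sin of the pan
    angle, so total power is independent of position."""
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)  # map to 0..pi/2
    left_gain, right_gain = math.cos(theta), math.sin(theta)
    return ([s * left_gain for s in samples],
            [s * right_gain for s in samples])

mono = [math.sin(2*math.pi*440*i/8000) for i in range(8000)]
left, right = spatialize(mono, 45.0)   # source to the front-right
rms = lambda xs: math.sqrt(sum(x*x for x in xs) / len(xs))
print(rms(left), rms(right))
```

In the wearable setting, `azimuth_deg` would be updated continuously from the head-mounted compass so the virtual source stays fixed in space as the head turns.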

Relevance: 60.00%

Abstract:

Pattern recognition and classification theory and machine learning are areas of knowledge under constant development, with practical applications in many sectors of industry. The purpose of this Final Degree Project is to study them and to implement a software system that solves an impulsive-noise classification problem, specifically by developing a security system based on the real-time classification of sound events. The solution covers every stage of the process, from sound capture to the labelling of the recorded events, including digital signal processing and feature extraction. The development comprises two main parts: the first covers the user interface and the audio signal processing module, where monitoring and impulsive-noise detection take place; the second focuses on the classification of the detected sound events, defining a double-classifier architecture that first determines whether a detected event is a false alarm or a threat and, in the latter case, labels it with a specific type. The results have been satisfactory, showing an overall reliability of around 90% despite some limitations in building the audio-file database, which demonstrates that a security device based on the analysis of ambient noise could be included in a complete home alarm system, increasing home protection.
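The double-classifier architecture described — a first stage that filters false alarms and a second that labels the surviving threats — can be sketched as two chained models. Everything below is an assumption for illustration: the features are synthetic, and the choice of random forests is this sketch's, not the project's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic feature vectors for detected sound events; labels follow the
# double-classifier idea: first "false alarm vs threat", then threat type.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
is_threat = (X[:, 0] + X[:, 1] > 0).astype(int)   # stage-1 labels (toy rule)
threat_type = (X[:, 2] > 0).astype(int)           # e.g. glass break vs impact

stage1 = RandomForestClassifier(random_state=0).fit(X, is_threat)
threat_rows = is_threat == 1
stage2 = RandomForestClassifier(random_state=0).fit(X[threat_rows],
                                                    threat_type[threat_rows])

def classify_event(features):
    """Stage 1 filters false alarms; stage 2 labels the surviving threats."""
    f = np.asarray(features).reshape(1, -1)
    if stage1.predict(f)[0] == 0:
        return "false alarm"
    return f"threat type {stage2.predict(f)[0]}"

print(classify_event(X[0]))
```

Splitting the problem this way lets the first stage be tuned for a low false-negative rate while the second stage concentrates on discriminating among genuine threat categories.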

Relevance: 60.00%

Abstract:

To characterize the process of evolution of technical systems for music playback, two main lines of analysis were followed: first, the study of the technical characteristics of four playback systems considered representative of that evolution (systems based on the vinyl record, the compact cassette, the compact disc and perceptual coding formats); second, the operation of these systems within the concept here called the chain of transmission of musical information. The characterization is carried out by evaluating the degree of evolution of each system and then comparing them. Objective factors, such as the quality of the audio signal, are taken into account, as well as qualitative ones, such as the portability each type of system provides. A second objective is to update the definition of "player", if the analyses so warrant, in light of the appearance of players based on perceptual coding.

Relevance: 60.00%

Abstract:

We propose a study of the mathematical properties of voice as an audio signal, including signals in which the channel conditions are not ideal for emotion recognition. Multiresolution analysis (the discrete wavelet transform) was performed using the Daubechies wavelet family (Db1-Haar, Db6, Db8, Db10), decomposing the initial audio signal into sets of coefficients from which a set of features was extracted and analyzed statistically in order to differentiate emotional states. Artificial neural networks (ANNs) proved to be a system that allows an appropriate classification of such states. This study shows that the features extracted using wavelet decomposition are sufficient to analyze and extract emotional content from audio signals, yielding a high accuracy rate in the classification of emotional states without the need for other kinds of classical time-frequency features. Accordingly, this paper seeks to characterize mathematically the six basic human emotions — boredom, disgust, happiness, anxiety, anger and sadness — plus neutrality, for a total of seven states to identify.
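The simplest member of the family used above, the Haar (Db1) wavelet, makes the decomposition easy to sketch: each level splits the signal into scaled pairwise sums (approximation) and differences (detail). The per-band energies computed below are one plausible feature of the kind the study extracts; this is an illustrative pure-Python sketch, not the authors' feature set.

```python
import math

def haar_dwt(signal):
    """One level of the Haar (Db1) DWT: approximation = scaled pairwise sums,
    detail = scaled pairwise differences."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i+1]) / s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i+1]) / s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def wavelet_energies(signal, levels):
    """Decompose `levels` times; return the energy of each detail band plus
    the final approximation band."""
    energies, current = [], list(signal)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in current))
    return energies

fs = 8000
# stand-in for a voiced frame: a low-frequency tone, whose energy should
# concentrate in the coarse (low-frequency) approximation band
tone = [math.sin(2*math.pi*100*i/fs) for i in range(1024)]
bands = wavelet_energies(tone, 4)   # [d1, d2, d3, d4, a4]
print([round(b, 3) for b in bands])
```

Statistics of such per-band coefficient sets (energy, entropy, variance, etc.) are then fed to a classifier such as an ANN.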

Relevance: 60.00%

Abstract:

Common bottlenose dolphins (Tursiops truncatus) produce a wide variety of vocal emissions for communication and echolocation, of which the pulsed repertoire has been the most difficult to categorize. Packets of high-repetition, broadband pulses are still largely reported under the general designation of burst-pulses, and traditional attempts to classify these emissions rely mainly on their aural characteristics and on graphical aspects of spectrograms. Here, we present a quantitative analysis of pulsed signals emitted by wild bottlenose dolphins in the Sado estuary, Portugal (2011-2014), and test the reliability of a traditional classification approach. Acoustic parameters (minimum frequency, maximum frequency, peak frequency, duration, repetition rate and inter-click interval) were extracted from 930 pulsed signals previously categorized using a traditional approach. Discriminant function analysis revealed a high reliability of the traditional classification approach (93.5% of pulsed signals were consistently assigned to their aurally based categories). According to the discriminant function analysis (Wilks' Λ = 0.11, F(3, 2.41) = 282.75, P < 0.001), repetition rate is the feature that best enables the discrimination of different pulsed signals (structure coefficient = 0.98). Classification using hierarchical cluster analysis led to a similar categorization pattern: two main signal types with distinct magnitudes of repetition rate were clustered into five groups. The pulsed signals described here present significant differences in their time-frequency features, especially repetition rate (P < 0.001), inter-click interval (P < 0.001) and duration (P < 0.001). We document the occurrence of a distinct signal type, short burst-pulses, and highlight the existence of a diverse repertoire of pulsed vocalizations emitted in graded sequences. The use of quantitative analysis of pulsed signals is essential to improve classifications and to better assess the contexts of emission, geographic variation and the functional significance of pulsed signals.
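A discriminant function analysis of acoustic parameters like the one above can be sketched with scikit-learn's linear discriminant analysis. The data below are synthetic stand-ins, not the study's measurements; the repetition-rate column is deliberately made the most separating feature, echoing the reported result.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
n = 100
# columns: repetition rate (pulses/s), peak frequency (kHz), duration (s)
slow = np.column_stack([rng.normal(80, 15, n),
                        rng.normal(60, 20, n),
                        rng.normal(0.40, 0.10, n)])
fast = np.column_stack([rng.normal(300, 40, n),
                        rng.normal(65, 20, n),
                        rng.normal(0.35, 0.10, n)])
X = np.vstack([slow, fast])
y = np.array([0] * n + [1] * n)   # two aurally defined signal types

lda = LinearDiscriminantAnalysis().fit(X, y)
print("training accuracy:", lda.score(X, y))
print("discriminant coefficients:", lda.coef_[0].round(4))
```

A high agreement between the discriminant assignments and the original labels is the kind of evidence the study uses to validate the aurally based categories.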

Relevance: 60.00%

Abstract:

In this paper, a musical learning application for mobile devices is presented. The main objective is to design and develop an application capable of offering exercises to practise and improve a selection of music skills to users interested in music learning and training. The selected music skills are rhythm, melodic dictation and singing. The application includes an audio signal analysis system, implemented using the Goertzel algorithm, which is employed in the singing exercises to check whether the user sings the right musical note. The application also includes a graphical interface to represent musical symbols. A set of tests was conducted to check the usefulness of the application as a musical learning tool. A group of users with different levels of music knowledge tested the system and reported finding it effective, easy to use and accessible.
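Once a fundamental frequency has been detected (e.g. with Goertzel-style bin measurements), checking whether the user sang the right note reduces to mapping the frequency to the nearest equal-tempered pitch. The sketch below illustrates that comparison step; the 50-cent tolerance and the helper names are this sketch's assumptions, not the paper's design.

```python
import math

A4 = 440.0  # reference tuning
NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def nearest_note(freq_hz):
    """Map a frequency to the nearest equal-tempered note.

    Returns (note name with octave, deviation in cents from that note)."""
    semitones = 12.0 * math.log2(freq_hz / A4)   # distance from A4 in semitones
    nearest = round(semitones)
    cents = 100.0 * (semitones - nearest)
    midi = 69 + nearest                          # MIDI number of nearest note
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return name, cents

def is_correct(freq_hz, target_name, tolerance_cents=50.0):
    """True if the sung frequency rounds to the target note within tolerance."""
    name, cents = nearest_note(freq_hz)
    return name == target_name and abs(cents) <= tolerance_cents

print(nearest_note(440.0))   # exactly A4
print(is_correct(262.0, 'C4'))
```

The tolerance controls how strict the singing exercise is: 50 cents accepts anything closer to the target note than to its neighbours.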