987 resultados para sound processing
Resumo:
premiered by Mel Puga Iglesias
Resumo:
Composers of digital music today have a bewildering variety of sound-processing tools and techniques at their disposal. At their best, these tools allow composers to hone a sound to perfection. However, they can also lead us into a routine which bypasses avenues of experimentation, simply because the known tools work so well and their sonic output is so attractive. An alternative strategy is oracular sound processing. An oracular sound processor creates a derived version of its input whose characteristics could not have been fully predicted, while affording the user little or no parametric control over the process.
Resumo:
OBJECTIVES: In natural hearing, cochlear mechanical compression is dynamically adjusted via the efferent medial olivocochlear reflex (MOCR). These adjustments probably help understanding speech in noisy environments and are not available to the users of current cochlear implants (CIs). The aims of the present study are to: (1) present a binaural CI sound processing strategy inspired by the control of cochlear compression provided by the contralateral MOCR in natural hearing; and (2) assess the benefits of the new strategy for understanding speech presented in competition with steady noise with a speech-like spectrum in various spatial configurations of the speech and noise sources. DESIGN: Pairs of CI sound processors (one per ear) were constructed to mimic or not mimic the effects of the contralateral MOCR on compression. For the nonmimicking condition (standard strategy or STD), the two processors in a pair functioned similarly to standard clinical processors (i.e., with fixed back-end compression and independently of each other). When configured to mimic the effects of the MOCR (MOC strategy), the two processors communicated with each other and the amount of back-end compression in a given frequency channel of each processor in the pair decreased/increased dynamically (so that output levels dropped/increased) with increases/decreases in the output energy from the corresponding frequency channel in the contralateral processor. Speech reception thresholds in speech-shaped noise were measured for 3 bilateral CI users and 2 single-sided deaf unilateral CI users. Thresholds were compared for the STD and MOC strategies in unilateral and bilateral listening conditions and for three spatial configurations of the speech and noise sources in simulated free-field conditions: speech and noise sources colocated in front of the listener, speech on the left ear with noise in front of the listener, and speech on the left ear with noise on the right ear. In both bilateral and unilateral listening, the electrical stimulus delivered to the test ear(s) was always calculated as if the listeners were wearing bilateral processors. RESULTS: In both unilateral and bilateral listening conditions, mean speech reception thresholds were comparable with the two strategies for colocated speech and noise sources, but were at least 2 dB lower (better) with the MOC than with the STD strategy for spatially separated speech and noise sources. In unilateral listening conditions, mean thresholds improved with increasing the spatial separation between the speech and noise sources regardless of the strategy but the improvement was significantly greater with the MOC strategy. In bilateral listening conditions, thresholds improved significantly with increasing the speech-noise spatial separation only with the MOC strategy. CONCLUSIONS: The MOC strategy (1) significantly improved the intelligibility of speech presented in competition with a spatially separated noise source, both in unilateral and bilateral listening conditions; (2) produced significant spatial release from masking in bilateral listening conditions, something that did not occur with fixed compression; and (3) enhanced spatial release from masking in unilateral listening conditions. The MOC strategy as implemented here, or a modified version of it, may be usefully applied in CIs and in hearing aids.
Resumo:
Current hearing-assistive technology performs poorly in noisy multi-talker conditions. The goal of this thesis was to establish the feasibility of using EEG to guide acoustic processing in such conditions. To attain this goal, this research developed a model via the constructive research method, relying on literature review. Several approaches have revealed improvements in the performance of hearing-assistive devices under multi-talker conditions, namely beamforming spatial filtering, model-based sparse coding shrinkage, and onset enhancement of the speech signal. Prior research has shown that electroencephalography (EEG) signals contain information that concerns whether the person is actively listening, what the listener is listening to, and where the attended sound source is. This thesis constructed a model for using EEG information to control beamforming, model-based sparse coding shrinkage, and onset enhancement of the speech signal. The purpose of this model is to propose a framework for using EEG signals to control sound processing to select a single talker in a noisy environment containing multiple talkers speaking simultaneously. On a theoretical level, the model showed that EEG can control acoustical processing. An analysis of the model identified a requirement for real-time processing and that the model inherits the computationally intensive properties of acoustical processing, although the model itself is low complexity placing a relatively small load on computational resources. A research priority is to develop a prototype that controls hearing-assistive devices with EEG. This thesis concludes highlighting challenges for future research.
Resumo:
The Pomegranate Cycle is a practice-led enquiry consisting of a creative work and an exegesis. This project investigates the potential of self-directed, technologically mediated composition as a means of reconfiguring gender stereotypes within the operatic tradition. This practice confronts two primary stereotypes: the positioning of female performing bodies within narratives of violence and the absence of women from authorial roles that construct and regulate the operatic tradition. The Pomegranate Cycle redresses these stereotypes by presenting a new narrative trajectory of healing for its central character, and by placing the singer inside the role of composer and producer. During the twentieth and early twenty-first century, operatic and classical music institutions have resisted incorporating works of living composers into their repertory. Consequently, the canon’s historic representations of gender remain unchallenged. Historically and contemporarily, men have almost exclusively occupied the roles of composer, conductor, director and critic, and therefore men have regulated the pedagogy, performance practices, repertoire and organisations that sustain classical music. In this landscape, women are singers, and few have the means to challenge the constructions of gender they are asked to reproduce. The Pomegranate Cycle uses recording technologies as the means of driving change because these technologies have already challenged the regulation of the classical tradition by changing people’s modes of accessing, creating and interacting with music. Building on the work of artists including Phillips and van Veen, Robert Ashley and Diamanda Galas, The Pomegranate Cycle seeks to broaden the definition of what opera can be. This work examines the ways in which the operatic tradition can be hybridised with contemporary musical forms such as ambient electronica, glitch, spoken word and concrete sounds as a way of bringing the form into dialogue with contemporary music cultures. The ultilisation of other sound cultures within the context of opera enables women’s voices and stories to be presented in new ways, while also providing a point of friction with opera’s traditional storytelling devices. The Pomegranate Cycle simulates aesthetics associated with Western art music genres by drawing on contemporary recording techniques, virtual instruments and sound-processing plug-ins. Through such simulations, the work disrupts the way virtuosic human craft has been used to generate authenticity and regulate access to the institutions that protect and produce Western art music. The DIY approach to production, recording, composition and performance of The Pomegranate Cycle demonstrates that an opera can be realised by a single person. Access to the broader institutions which regulate the tradition are not necessary. In short, The Pomegranate Cycle establishes that a singer can be more than a voice and a performing body. She can be her own multimedia storyteller. Her audience can be anywhere.
Resumo:
Donna Hewitt (composer/vocalist/emic designer), Avril Huddy (choreographer/dancer) and Vanessa Mafé (dancer). Donna is performing with her invention called the eMic (extended mic-stand interface controller). The work explores the use of gesture to control live sound processing during the performance. The work was choreographed by Avril Huddy.
Resumo:
Pitch discrimination is a fundamental property of the human auditory system. Our understanding of pitch-discrimination mechanisms is important from both theoretical and clinical perspectives. The discrimination of spectrally complex sounds is crucial in the processing of music and speech. Current methods of cognitive neuroscience can track the brain processes underlying sound processing either with precise temporal (EEG and MEG) or spatial resolution (PET and fMRI). A combination of different techniques is therefore required in contemporary auditory research. One of the problems in comparing the EEG/MEG and fMRI methods, however, is the fMRI acoustic noise. In the present thesis, EEG and MEG in combination with behavioral techniques were used, first, to define the ERP correlates of automatic pitch discrimination across a wide frequency range in adults and neonates and, second, they were used to determine the effect of recorded acoustic fMRI noise on those adult ERP and ERF correlates during passive and active pitch discrimination. Pure tones and complex 3-harmonic sounds served as stimuli in the oddball and matching-to-sample paradigms. The results suggest that pitch discrimination in adults, as reflected by MMN latency, is most accurate in the 1000-2000 Hz frequency range, and that pitch discrimination is facilitated further by adding harmonics to the fundamental frequency. Newborn infants are able to discriminate a 20% frequency change in the 250-4000 Hz frequency range, whereas the discrimination of a 5% frequency change was unconfirmed. Furthermore, the effect of the fMRI gradient noise on the automatic processing of pitch change was more prominent for tones with frequencies exceeding 500 Hz, overlapping with the spectral maximum of the noise. When the fundamental frequency of the tones was lower than the spectral maximum of the noise, fMRI noise had no effect on MMN and P3a, whereas the noise delayed and suppressed N1 and exogenous N2. Noise also suppressed the N1 amplitude in a matching-to-sample working memory task. However, the task-related difference observed in the N1 component, suggesting a functional dissociation between the processing of spatial and non-spatial auditory information, was partially preserved in the noise condition. Noise hampered feature coding mechanisms more than it hampered the mechanisms of change detection, involuntary attention, and the segregation of the spatial and non-spatial domains of working-memory. The data presented in the thesis can be used to develop clinical ERP-based frequency-discrimination protocols and combined EEG and fMRI experimental paradigms.
Resumo:
Speech has both auditory and visual components (heard speech sounds and seen articulatory gestures). During all perception, selective attention facilitates efficient information processing and enables concentration on high-priority stimuli. Auditory and visual sensory systems interact at multiple processing levels during speech perception and, further, the classical motor speech regions seem also to participate in speech perception. Auditory, visual, and motor-articulatory processes may thus work in parallel during speech perception, their use possibly depending on the information available and the individual characteristics of the observer. Because of their subtle speech perception difficulties possibly stemming from disturbances at elemental levels of sensory processing, dyslexic readers may rely more on motor-articulatory speech perception strategies than do fluent readers. This thesis aimed to investigate the neural mechanisms of speech perception and selective attention in fluent and dyslexic readers. We conducted four functional magnetic resonance imaging experiments, during which subjects perceived articulatory gestures, speech sounds, and other auditory and visual stimuli. Gradient echo-planar images depicting blood oxygenation level-dependent contrast were acquired during stimulus presentation to indirectly measure brain hemodynamic activation. Lip-reading activated the primary auditory cortex, and selective attention to visual speech gestures enhanced activity within the left secondary auditory cortex. Attention to non-speech sounds enhanced auditory cortex activity bilaterally; this effect showed modulation by sound presentation rate. A comparison between fluent and dyslexic readers' brain hemodynamic activity during audiovisual speech perception revealed stronger activation of predominantly motor speech areas in dyslexic readers during a contrast test that allowed exploration of the processing of phonetic features extracted from auditory and visual speech. The results show that visual speech perception modulates hemodynamic activity within auditory cortex areas once considered unimodal, and suggest that the left secondary auditory cortex specifically participates in extracting the linguistic content of seen articulatory gestures. They are strong evidence for the importance of attention as a modulator of auditory cortex function during both sound processing and visual speech perception, and point out the nature of attention as an interactive process (influenced by stimulus-driven effects). Further, they suggest heightened reliance on motor-articulatory and visual speech perception strategies among dyslexic readers, possibly compensating for their auditory speech perception difficulties.
Resumo:
We address the problem of estimating the fundamental frequency of voiced speech. We present a novel solution motivated by the importance of amplitude modulation in sound processing and speech perception. The new algorithm is based on a cumulative spectrum computed from the temporal envelope of various subbands. We provide theoretical analysis to derive the new pitch estimator based on the temporal envelope of the bandpass speech signal. We report extensive experimental performance for synthetic as well as natural vowels for both realworld noisy and noise-free data. Experimental results show that the new technique performs accurate pitch estimation and is robust to noise. We also show that the technique is superior to the autocorrelation technique for pitch estimation.
Resumo:
Desde os primórdios da humanidade, a descoberta do método de processamento cerebral do som, e consequentemente da música, fazem parte do imaginário humano. Portanto, as pesquisas relacionadas a este processo constituem um dos mais vastos campos de estudos das áreas de ciências. Dentre as inúmeras tentativas para compreensão do processamento biológico do som, o ser humano inventou o processo automático de composição musical, com o intuito de aferir a possibilidade da realização de composições musicais de qualidade sem a imposição sentimental, ou seja, apenas com a utilização das definições e estruturas de música existentes. Este procedimento automático de composição musical, também denominado música aleatória ou música do acaso, tem sido vastamente explorado ao longo dos séculos, já tendo sido utilizado por alguns dos grandes nomes do cenário musical, como por exemplo, Mozart. Os avanços nas áreas de engenharia e computação permitiram a evolução dos métodos utilizados para composição de música aleatória, tornando a aplicação de autômatos celulares uma alternativa viável para determinação da sequência de execução de notas musicais e outros itens utilizados durante a composição deste tipo de música. Esta dissertação propõe uma arquitetura para geração de música harmonizada a partir de intervalos melódicos determinados por autômatos celulares, implementada em hardware reconfigurável do tipo FPGA. A arquitetura proposta possui quatro tipos de autômatos celulares, desenvolvidos através dos modelos de vizinhança unidimensional de Wolfram, vizinhança bidimensional de Neumann, vizinhança bidimensional Moore e vizinhança tridimensional de Neumann, que podem ser combinados de 16 formas diferentes para geração de melodias. Os resultados do processamento realizado pela arquitetura proposta são melodias no formato .mid, compostas através da utilização de dois autômatos celulares, um para escolha das notas e outro para escolha dos instrumentos a serem emulados, de acordo com o protocolo MIDI. Para tal esta arquitetura é formada por três unidades principais, a unidade divisor de frequência, que é responsável pelo sincronismo das tarefas executadas pela arquitetura, a unidade de conjunto de autômatos celulares, que é responsável pelo controle e habilitação dos autômatos celulares, e a unidade máquina MIDI, que é responsável por organizar os resultados de cada iteração corrente dos autômatos celulares e convertê-los conforme a estrutura do protocolo MIDI, gerando-se assim o produto musical. A arquitetura proposta é parametrizável, de modo que a configuração dos dados que influenciam no produto musical gerado, como por exemplo, a definição dos conjuntos de regras para os autômatos celulares habilitados, fica a cargo do usuário, não havendo então limites para as combinações possíveis a serem realizadas na arquitetura. Para validação da funcionalidade e aplicabilidade da arquitetura proposta, alguns dos resultados obtidos foram apresentados e detalhados através do uso de técnicas de obtenção de informação musical.
Resumo:
La voix humaine constitue la partie dominante de notre environnement auditif. Non seulement les humains utilisent-ils la voix pour la parole, mais ils sont tout aussi habiles pour en extraire une multitude d’informations pertinentes sur le locuteur. Cette expertise universelle pour la voix humaine se reflète dans la présence d’aires préférentielles à celle-ci le long des sillons temporaux supérieurs. À ce jour, peu de données nous informent sur la nature et le développement de cette réponse sélective à la voix. Dans le domaine visuel, une vaste littérature aborde une problématique semblable en ce qui a trait à la perception des visages. L’étude d’experts visuels a permis de dégager les processus et régions impliqués dans leur expertise et a démontré une forte ressemblance avec ceux utilisés pour les visages. Dans le domaine auditif, très peu d’études se sont penchées sur la comparaison entre l’expertise pour la voix et d’autres catégories auditives, alors que ces comparaisons pourraient contribuer à une meilleure compréhension de la perception vocale et auditive. La présente thèse a pour dessein de préciser la spécificité des processus et régions impliqués dans le traitement de la voix. Pour ce faire, le recrutement de différents types d’experts ainsi que l’utilisation de différentes méthodes expérimentales ont été préconisés. La première étude a évalué l’influence d’une expertise musicale sur le traitement de la voix humaine, à l’aide de tâches comportementales de discrimination de voix et d’instruments de musique. Les résultats ont démontré que les musiciens amateurs étaient meilleurs que les non-musiciens pour discriminer des timbres d’instruments de musique mais aussi les voix humaines, suggérant une généralisation des apprentissages perceptifs causés par la pratique musicale. La seconde étude avait pour but de comparer les potentiels évoqués auditifs liés aux chants d’oiseaux entre des ornithologues amateurs et des participants novices. L’observation d’une distribution topographique différente chez les ornithologues à la présentation des trois catégories sonores (voix, chants d’oiseaux, sons de l’environnement) a rendu les résultats difficiles à interpréter. Dans la troisième étude, il était question de préciser le rôle des aires temporales de la voix dans le traitement de catégories d’expertise chez deux groupes d’experts auditifs, soit des ornithologues amateurs et des luthiers. Les données comportementales ont démontré une interaction entre les deux groupes d’experts et leur catégorie d’expertise respective pour des tâches de discrimination et de mémorisation. Les résultats obtenus en imagerie par résonance magnétique fonctionnelle ont démontré une interaction du même type dans le sillon temporal supérieur gauche et le gyrus cingulaire postérieur gauche. Ainsi, les aires de la voix sont impliquées dans le traitement de stimuli d’expertise dans deux groupes d’experts auditifs différents. Ce résultat suggère que la sélectivité à la voix humaine, telle que retrouvée dans les sillons temporaux supérieurs, pourrait être expliquée par une exposition prolongée à ces stimuli. Les données présentées démontrent plusieurs similitudes comportementales et anatomo-fonctionnelles entre le traitement de la voix et d’autres catégories d’expertise. Ces aspects communs sont explicables par une organisation à la fois fonctionnelle et économique du cerveau. Par conséquent, le traitement de la voix et d’autres catégories sonores se baserait sur les mêmes réseaux neuronaux, sauf en cas de traitement plus poussé. Cette interprétation s’avère particulièrement importante pour proposer une approche intégrative quant à la spécificité du traitement de la voix.
Resumo:
La version intégrale de cette thèse est disponible uniquement pour consultation individuelle à la Bibliothèque de musique de l’Université de Montréal (www.bib.umontreal.ca/MU).
Resumo:
Background: Voice processing in real-time is challenging. A drawback of previous work for Hypokinetic Dysarthria (HKD) recognition is the requirement of controlled settings in a laboratory environment. A personal digital assistant (PDA) has been developed for home assessment of PD patients. The PDA offers sound processing capabilities, which allow for developing a module for recognition and quantification HKD. Objective: To compose an algorithm for assessment of PD speech severity in the home environment based on a review synthesis. Methods: A two-tier review methodology is utilized. The first tier focuses on real-time problems in speech detection. In the second tier, acoustics features that are robust to medication changes in Levodopa-responsive patients are investigated for HKD recognition. Keywords such as Hypokinetic Dysarthria , and Speech recognition in real time were used in the search engines. IEEE explorer produced the most useful search hits as compared to Google Scholar, ELIN, EBRARY, PubMed and LIBRIS. Results: Vowel and consonant formants are the most relevant acoustic parameters to reflect PD medication changes. Since relevant speech segments (consonants and vowels) contains minority of speech energy, intelligibility can be improved by amplifying the voice signal using amplitude compression. Pause detection and peak to average power rate calculations for voice segmentation produce rich voice features in real time. Enhancements in voice segmentation can be done by inducing Zero-Crossing rate (ZCR). Consonants have high ZCR whereas vowels have low ZCR. Wavelet transform is found promising for voice analysis since it quantizes non-stationary voice signals over time-series using scale and translation parameters. In this way voice intelligibility in the waveforms can be analyzed in each time frame. Conclusions: This review evaluated HKD recognition algorithms to develop a tool for PD speech home-assessment using modern mobile technology. An algorithm that tackles realtime constraints in HKD recognition based on the review synthesis is proposed. We suggest that speech features may be further processed using wavelet transforms and used with a neural network for detection and quantification of speech anomalies related to PD. Based on this model, patients' speech can be automatically categorized according to UPDRS speech ratings.