997 results for audio processing


Relevance:

70.00%

Publisher:

Abstract:

In this paper we derive the a posteriori probability for the location of bursts of noise additively superimposed on a Gaussian AR process. The theory is developed to give a sequentially based restoration algorithm suitable for real-time applications. The algorithm is particularly appropriate for digital audio restoration, where clicks and scratches may be modelled as additive bursts of noise. Experiments are carried out on both real audio data and synthetic AR processes, and significant improvements are demonstrated over existing restoration techniques. © 1995 IEEE
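As a rough illustration of the signal model in this abstract (additive noise bursts on an AR process), the sketch below flags samples whose AR prediction error is unusually large. This is a simple threshold detector, not the a posteriori probability method the paper derives. The AR coefficients, the noise level `sigma`, the burst amplitude, and the threshold are all hypothetical values chosen for the toy example.

```python
import random

def prediction_error(x, a):
    """Residual e[n] = x[n] - sum_k a[k] * x[n-1-k] for AR coefficients a."""
    p = len(a)
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(p))
            for n in range(p, len(x))]

def detect_bursts(x, a, sigma, threshold=4.0):
    """Return sample indices whose prediction error exceeds threshold * sigma."""
    p = len(a)
    err = prediction_error(x, a)
    return [n + p for n, e in enumerate(err) if abs(e) > threshold * sigma]

# Synthetic stable AR(2) process driven by Gaussian noise (assumed parameters).
rng = random.Random(0)
a = [1.5, -0.8]
sigma = 1.0
clean = [0.0, 0.0]
for _ in range(500):
    clean.append(a[0] * clean[-1] + a[1] * clean[-2] + rng.gauss(0.0, sigma))

# Superimpose an additive burst on samples 200..204 (hypothetical "click").
noisy = list(clean)
for n in range(200, 205):
    noisy[n] += 40.0

flagged = detect_bursts(noisy, a, sigma)
print(flagged)
```

Because the prediction filter looks back over earlier samples, a threshold detector of this kind also reacts for a few samples after the burst ends, which is one reason a probabilistic treatment of burst location is attractive.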

Relevance:

70.00%

Publisher:

Abstract:

This paper describes our experiences in implementing an audio lecture streaming facility for Deakin University. For many years Deakin students have benefited from some of the most comprehensive printed study notes of any university in Australia. In 2002, portable digital audio recorders were utilised by academic staff to capture lecture presentations in order to supplement existing unit learning materials and teaching delivery methods. Audio recordings were processed to enable streamed access via the web browser interface using QuickTime. A trial of incorporating PowerPoint presentations was conducted on a limited basis. In total, 68 undergraduate and postgraduate units implemented lecture streaming. This represented over 1700 lecture recordings and 20000 audio streams. Evaluation findings indicate that students find this facility highly valuable to their studies and regularly access the audio recordings throughout the semester. Benefits include access to lecture presentations for off-campus enrolled students, the ability to revisit lecture presentations, and the ability to study at a place and time of convenience. Future enhancements to the audio lecture streaming facility may include implementing a hard-wired audio capture system in lecture theatres and providing a more rapid turnaround of audio processing.

Relevance:

70.00%

Publisher:

Abstract:

Modern computing technology has radically changed technological research in every field. The general process used previously consisted of developing analog prototypes, creating multiple versions until the right result was reached. This process is costly in both money and workload. For this reason, the current research process takes advantage of new technologies to reach the final objective through simulation. Thanks to the development of simulation software for different areas, the pace of technological progress has increased and the cost of research and development projects has fallen. Simulation therefore makes it possible to develop simulated prototypes beforehand, at a much lower cost, on the way to a final product, which is then built in its corresponding field. This process applies not only to products with circuitry but also to programmed products. Many current programs rely on specific algorithms whose behavior must be verified beforehand, so that the work can then focus on coding them. This is where the objective of this project lies: to simulate digital signal processing algorithms before coding the final program. Audio systems are based entirely on signal processing algorithms, both analog and digital, with the latter replacing the analog world through processors and computers. These algorithms are the most complex part of the system, and the creation of new algorithms is the basis for achieving innovative and functional audio systems. It should be noted that audio system development groups have a large number of members with different roles, separating the functions of programmers and audio signal engineers. For this reason, the simulation of these algorithms is essential when developing new and more powerful audio systems. Matlab is one of the fundamental tools for computer simulation and offers utilities for developing projects in different areas. However, Simulink is now in growing use: it is a tool specialized in high-level simulation that reduces the difficulty of programming in Matlab and allows models to be developed more quickly. Simulink offers complete functionality for the development of digital audio processing algorithms. The objective of this project is therefore to study the capabilities of Simulink for generating functional audio systems. In turn, the project aims to explore the methods of digital audio signal processing in greater depth, finally producing a package of audio systems compatible with current audio editing software.

Relevance:

60.00%

Publisher:

Abstract:

Amphibian is a 10'00" musical work which explores new musical interfaces and approaches to hybridising performance practices from the popular music, electronic dance music and computer music traditions. The work is designed to be presented in a range of contexts associated with the electro-acoustic, popular and classical music traditions. The work is for two performers using two synchronised laptops, an electric guitar and a custom designed gestural interface for vocal performers - the e-Mic (Extended Mic-stand Interface Controller). This interface was developed by one of the co-authors, Donna Hewitt. The e-Mic allows a vocal performer to manipulate the voice in real time through the capture of physical gestures via an array of sensors - pressure, distance, tilt - along with ribbon controllers and an X-Y joystick microphone mount. Performance data are then sent to a computer, running audio-processing software, which is used to transform the audio signal from the microphone. In this work, data are also exchanged between performers via a local wireless network, allowing performers to work with shared data streams. The duo employs the gestural conventions of guitarist and singer (i.e. 'a band' in a popular music context), but transforms these sounds and gestures into new digital music. The gestural language of popular music is deliberately subverted and taken into a new context. The piece thus explores the nexus between the sonic and performative practices of electro-acoustic music and intelligent electronic dance music (‘idm’). This work was situated in the research fields of new musical interfacing, interaction design, experimental music composition and performance. The contexts in which the research was conducted were live musical performance and studio music production. The work investigated new methods for musical interfacing, performance data mapping, hybrid performance and compositional practices in electronic music. The research methodology was practice-led. 
New insights were gained from the iterative experimental workshopping of gestural inputs, musical data mapping, inter-performer data exchange, software patch design, data and audio processing chains. In respect of interfacing, there were innovations in the design and implementation of a novel sensor-based gestural interface for singers, the e-Mic, one of the few existing gestural controllers for singers. This work explored the compositional potential of sharing real-time performance data between performers and deployed novel methods for inter-performer data exchange and mapping. As regards stylistic and performance innovation, the work explored and demonstrated an approach to the hybridisation of the gestural and sonic language of popular music with recent ‘post-digital’ approaches to laptop-based experimental music. The development of the work was supported by an Australia Council Grant. Research findings have been disseminated via a range of international conference publications, recordings, radio interviews (ABC Classic FM), broadcasts, and performances at international events and festivals. The work was curated into the major Australian international festival, Liquid Architecture, and was selected by an international music jury (through blind peer review) for presentation at the International Computer Music Conference in Belfast, N. Ireland.

Relevance:

60.00%

Publisher:

Abstract:

Sleeper is an 18'00" musical work for live performer and laptop computer which exists as both a live performance work and a recorded work for audio CD. The work has been presented at a range of international performance events and survey exhibitions. These include the 2003 International Computer Music Conference (Singapore) where it was selected for CD publication, Variable Resistance (San Francisco Museum of Modern Art, USA), and i.audio, a survey of experimental sound at the Performance Space, Sydney. The source sound materials are drawn from field recordings made in acoustically resonant spaces in the Australian urban environment, amplified and acoustic instruments, radio signals, and sound synthesis procedures. The processing techniques blur the boundaries between, and exploit, the perceptual ambiguities of de-contextualised and processed sound. The work thus challenges the arbitrary distinctions between sound, noise and music and attempts to reveal the inherent musicality in so-called non-musical materials via digitally re-processed location audio. Thematically the work investigates Paul Virilio’s theory that technology ‘collapses space’ via the relationship of technology to speed. Technically this is explored through the design of a music composition process that draws upon spatially and temporally dispersed sound materials treated using digital audio processing technologies. One of the contributions to knowledge in this work is a demonstration of how disparate materials may be employed within a compositional process to produce music through the establishment of musically meaningful morphological, spectral and pitch relationships. This is achieved through the design of novel digital audio processing networks and a software performance interface. 
The work explores, tests and extends the music perception theories of ‘reduced listening’ (Schaeffer, 1967) and ‘surrogacy’ (Smalley, 1997), by demonstrating how, through specific audio processing techniques, sounds may be shifted away from ‘causal’ listening contexts towards abstract aesthetic listening contexts. In doing so, it demonstrates how various time and frequency domain processing techniques may be used to achieve this shift.

Relevance:

60.00%

Publisher:

Abstract:

Nodule is a 19'54" musical work for two electronic music performers, two laptop computers and a custom built, sensor-based microphone controller - the e-Mic (Extended Mic-stand Interface Controller). This interface was developed by one of the co-authors, Donna Hewitt. The e-Mic allows a vocal performer to manipulate their voice in real time by capturing physical gestures via an array of sensors - pressure, distance, tilt - in addition to ribbon controllers and an X-Y joystick microphone mount. Performance data are then sent to a computer, running audio-processing software, which is used to transform the audio signal from the microphone in real time. The work seeks to explore the liminal space between the electro-acoustic music tradition and more recent developments in the electronic dance music tradition. It does so on both a performative (gestural) and compositional (sonic) level. Visually, the performance consists of a singer and a laptop performer, hybridising the gestural context of these traditions. On a sonic level, the work explores hybridity at deeper levels of the musical structure than simple bricolage or collage approaches. Hybridity is explored at the level of the sonic gesture (source material), in production (audio processing gestures), in performance gesture, and in approaches to the use of the frequency spectrum, pulse and meter. The work was designed to be performed in a range of contexts from concert halls, to clubs, to rock festivals, across a range of staging and production platforms. As a consequence, the work has been tested in a range of audience contexts, and has allowed the transportation of compositional and performance practices across traditional audience demographic boundaries.

Relevance:

60.00%

Publisher:

Abstract:

This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook. In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these product-code vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model for the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy compression and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology. The second major focus of the research is an investigation of the effect of low-rate spectral compression methods on the task of automatic speaker identification. 
The motivation for this aspect of the research arose from a need to simultaneously preserve the speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective - that of maximizing the identification rate - the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model when based on the use of compressed speech are put forward.
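To make the bit-rate argument in this abstract concrete, the following is a minimal sketch of split VQ (one simple form of product-code VQ) together with the empirical entropy of the resulting index stream, which indicates how much an ideal lossless coder modelling only symbol frequencies could save. The codebooks, the frame vector, and the index stream are hypothetical toy values, not the thesis's trained codebooks or its statistical model.

```python
import math
from collections import Counter

def nearest_index(vec, codebook):
    """Index of the codeword closest to vec in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])))

def split_vq_encode(vec, codebooks):
    """Quantize consecutive sub-vectors of vec, each with its own codebook."""
    indices, start = [], 0
    for cb in codebooks:
        dim = len(cb[0])
        indices.append(nearest_index(vec[start:start + dim], cb))
        start += dim
    return indices

def fixed_bits(codebooks):
    """Bits per frame when every index is sent with a fixed-length code."""
    return sum(math.ceil(math.log2(len(cb))) for cb in codebooks)

def empirical_entropy(symbols):
    """Zeroth-order entropy in bits per symbol of an index stream."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Toy codebooks: 4 codewords (2 bits) for the first two dimensions,
# 2 codewords (1 bit) for the third dimension.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]],
    [[0.0], [1.0]],
]
frame = [0.9, 0.8, 0.2]
indices = split_vq_encode(frame, codebooks)

# Bit rate for the figures quoted in the abstract: 20 ms frames give
# 50 frames per second.
frames_per_second = 1000 // 20
rate_24 = 24 * frames_per_second   # bits per second at 24 bits per frame
rate_20 = 20 * frames_per_second   # bits per second at 20 bits per frame

# A skewed index stream needs fewer bits on average than its fixed-length code.
skewed = [0, 0, 0, 1]
print(indices, fixed_bits(codebooks), rate_24, rate_20, empirical_entropy(skewed))
```

At 24 bits per 20 ms frame the spectral parameters cost 1200 bit/s, so getting below 20 bits per frame means getting below 1000 bit/s. The entropy function shows why a statistical model of the indices can help: whenever some codewords are used more often than others, the average code length can drop below the fixed-length figure.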

Relevance:

60.00%

Publisher:

Abstract:

This dissertation seeks to define and classify potential forms of Nonlinear structure and explore the possibilities they afford for the creation of new musical works. It provides the first comprehensive framework for the discussion of Nonlinear structure in musical works and provides a detailed overview of the rise of nonlinearity in music during the 20th century. Nonlinear events are shown to emerge through significant parametrical discontinuity at the boundaries between regions of relatively strong internal cohesion. The dissertation situates Nonlinear structures in relation to linear structures and unstructured sonic phenomena and provides a means of evaluating Nonlinearity in a musical structure through the consideration of the degree to which the structure is integrated, contingent, compressible and determinate as a whole. It is proposed that Nonlinearity can be classified as a three-dimensional space described by three continuums: the temporal continuum, encompassing sequential and multilinear forms of organization; the narrative continuum, encompassing processual, game structure and developmental narrative forms; and the referential continuum, encompassing stylistic allusion, adaptation and quotation. The use of spectrograms of recorded musical works is proposed as a means of evaluating Nonlinearity in a musical work through the visual representation of parametrical divergence in pitch, duration, timbre and dynamic over time. Spectral and structural analysis of repertoire works is undertaken as part of an exploration of musical nonlinearity and the compositional and performative features that characterize it. The contribution of cultural, ideological, scientific and technological shifts to the emergence of Nonlinearity in music is discussed and a range of compositional factors that contributed to the emergence of musical Nonlinearity is examined. 
The evolution of notational innovations from the mobile score to the screen score is plotted and a novel framework for the discussion of these forms of musical transmission is proposed. A computer coordinated performative model is discussed, in which a computer synchronises screening of notational information, provides temporal coordination of the performers through click-tracks or similar methods and synchronises the audio processing and synthesized elements of the work. It is proposed that such a model constitutes a highly effective means of realizing complex Nonlinear structures. A creative folio comprising 29 original works that explore nonlinearity is presented, discussed and categorised utilising the proposed classifications. Spectrograms of these works are employed where appropriate to illustrate the instantiation of parametrically divergent substructures and examples of structural openness through multiple versioning.

Relevance:

60.00%

Publisher:

Abstract:

Raven and Song Scope are two automated sound analysis tools based on machine learning techniques for environmental monitoring. Much research has been conducted with them; however, little or no work has examined their performance or compared them. This paper compares them on six aspects: theory, software interface, ease of use, detection targets, detection accuracy, and potential application. This comparison identifies one critical gap: no approach detects both syllables and call structures, since Raven only aims to detect syllables while Song Scope targets call structures. Therefore, a Timed Probabilistic Automata (TPA) system is proposed which first separates syllables and then clusters them into complex structures.

Relevance:

60.00%

Publisher:

Abstract:

This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modeling and model training, and language and pronunciation modeling is presented. These include the use of conversation side based cepstral normalization, vocal tract length normalization, heteroscedastic linear discriminant analysis for feature projection, minimum phone error training and speaker adaptive training, lattice-based model adaptation, confusion network based decoding and confidence score estimation, pronunciation selection, language model interpolation, and class based language models. The transcription system developed for participation in the 2002 NIST Rich Transcription evaluations of English conversational telephone speech data is presented in detail. In this evaluation the CU-HTK system gave an overall word error rate of 23.9%, which was the best performance by a statistically significant margin. Further details on the derivation of faster systems with moderate performance degradation are discussed in the context of the 2002 CU-HTK 10 × RT conversational speech transcription system. © 2005 IEEE.
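The word error rate quoted in this abstract is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the reference transcript into the recognizer output, divided by the number of reference words. The sketch below computes that standard definition with a word-level edit distance. It is only an illustration: it does not reproduce the NIST scoring tools or any text normalization they apply, and the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical example: one deleted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"{100 * wer:.1f}%")
```

Note that because insertions count as errors, this measure can exceed 100% when the hypothesis is much longer than the reference.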