997 resultados para audio processing


Relevância:

60.00% 60.00%

Publicador:

Resumo:

La version intégrale de cette thèse est disponible uniquement pour consultation individuelle à la Bibliothèque de musique de l’Université de Montréal (www.bib.umontreal.ca/MU).

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper investigates the robustness of a hybrid analog/digital feedback active noise cancellation (ANC) headset system. The digital ANC systems with the filtered-x least-mean-square (FXLMS) algorithm require accurate estimation of the secondary path for the stability and convergence of the algorithm. This demands a great challenge for the ANC headset design because the secondary path may fluctuate dramatically such as when the user adjusts the position of the ear-cup. In this paper, we analytically show that adding an analog feedback loop into the digital ANC systems can effectively reduce the plant fluctuation, thus achieving a more robust system. The method for designing the analog controller is highlighted. A practical hybrid analog/digital feedback ANC headset has been built and used to conduct experiments, and the experimental results show that the hybrid headset system is more robust under large plant fluctuation, and has achieved satisfactory noise cancellation for both narrowband and broadband noises.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The International Multimedia Modelling conference series is an annual forum to discuss the efficient representation, processing, interaction, integration, communication, and retrieval of multimedia information.
In particular, the 10th International Multimedia Modelling Conference (MMM2004) concentrates on common modelling frameworks for integrating the diverse fields of visual, audio, video, and
virtual world information.
MMM2004 deals with emerging Multimedia Modelling topics including:
• Multimedia Databases
Audio Processing, Coding and Encryption
• Network Games and Animation
• Video Applications
• Multimedia Frameworks and QoS
• Topological and 3D Geometric Modelling
• Image Applications
• Image Retrieval
• Modelling / Editing / Virtual Environment
• Video Retrieval and Browsing

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Process algebraic architectural description languages provide a formal means for modeling software systems and assessing their properties. In order to bridge the gap between system modeling and system im- plementation, in this thesis an approach is proposed for automatically generating multithreaded object-oriented code from process algebraic architectural descriptions, in a way that preserves – under certain assumptions – the properties proved at the architectural level. The approach is divided into three phases, which are illustrated by means of a running example based on an audio processing system. First, we develop an architecture-driven technique for thread coordination management, which is completely automated through a suitable package. Second, we address the translation of the algebraically-specified behavior of the individual software units into thread templates, which will have to be filled in by the software developer according to certain guidelines. Third, we discuss performance issues related to the suitability of synthesizing monitors rather than threads from software unit descriptions that satisfy specific constraints. In addition to the running example, we present two case studies about a video animation repainting system and the implementation of a leader election algorithm, in order to summarize the whole approach. The outcome of this thesis is the implementation of the proposed approach in a translator called PADL2Java and its integration in the architecture-centric verification tool TwoTowers.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In the last two decades, there has been an important increase in research on speech technology in Spain, mainly due to a higher level of funding from European, Spanish and local institutions and also due to a growing interest in these technologies for developing new services and applications. This paper provides a review of the main areas of speech technology addressed by research groups in Spain, their main contributions in the recent years and the main focus of interest these days. This description is classified in five main areas: audio processing including speech, speaker characterization, speech and language processing, text to speech conversion and spoken language applications. This paper also introduces the Spanish Network of Speech Technologies (RTTH. Red Temática en Tecnologías del Habla) as the research network that includes almost all the researchers working in this area, presenting some figures, its objectives and its main activities developed in the last years.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Este proyecto consiste en crear una serie de tres pequeños videojuegos incluidos en una sola aplicación, para plataformas móviles Android, que permitan en cualquier lugar entrenar la estética de la voz del paciente con problemas de fonación. Dependiendo de los aspectos de la voz (sonidos sonoros y sordos, el pitch y la intensidad) a trabajar se le asignará un ejercicio u otro. En primer lugar se introduce el concepto de rehabilitación de la voz y en qué casos es necesario. Seguidamente se realiza un trabajo de búsqueda en el que se identifican las distintas plataformas de desarrollo de videojuegos que son compatibles con los sistemas Android, así como para la captura de audio y las librerías de procesado de señal. A continuación se eligen las herramientas que presentan las mejores capacidades y con las que se va a trabajar. Estas son el motor de juego Andengine, para la parte gráfica, el entorno Java específico de Android, para la captura de muestras de audio y la librería JTransforms que realiza transformadas de Fourier permitiendo procesar el audio para la detección de pitch. Al desarrollar y ensamblar los distintos bloques se prioriza el funcionamiento en tiempo real de la aplicación. Las líneas de mejora y conclusiones se comentan en el último capítulo del trabajo así como el manual de usuario para mayor comprensión. ABSTRACT. The main aim of this project is to create an application for mobile devices which includes three small speech therapy videogames for the Android OS. These videogames allow patients to train certain voice parameters (such as voice and unvoiced sounds, pitch and intensity) wherever they want and need to. First, an overview of the concept of voice rehabilitation and its uses for patients with speech disorders is given. Secondly a study has been made to identify the most suitable video game engine for the Android OS, the best possible way to capture audio from the device and the audio processing library which will combine with the latter. Therefore, the chosen tools are exposed. Andengine has been selected regarding the game engine, Android’s Java framework for audio capture and the fast Fourier transform library, JTransforms, for pitch detection. Real time processing is vital for the proper functioning of the application. Lines of improvement and other conclusions are discussed in the last part of this dissertation paper.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Nowadays, a lot of interesting and useful and imaginative applications are springing to Android software market. And for guitar fans, some related apps bring great connivence to them, like a guitar tuner can save people from carrying a entity tuner all the time, some apps can simulate a real guitar, and some apps provide some simple lessons allowing people to learn some basic things. But these apps which can teach people, they can't really “monitor ” people, that is, they just give some instructions and hope people would follow them. So my project is to design an app which can detect if users are playing wrong and right real-timely. Guitar chords are always the first for new guitar beginners to learn, and a chord is a set of notes combined together in a regulated way ( get from the music theory having millions of developing ), and 'pitch' is the term for determining if the note different from other notes or noise, so the problem here is to manage the multi-pitch analysis in real time. And it's necessary to know some basics of digital signal processing ( DSP ) because digital signals are always more convenient for computers to analyze compared to analog signals. Then I found an audio processing Java library – TarsosDSP, and try to apply it to my Android project.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Thesis (Ph.D.)--University of Washington, 2016-08

Relevância:

60.00% 60.00%

Publicador:

Resumo:

With the rapid development of Internet technologies, video and audio processing are among the most important parts due to the constant requirements of high quality media contents. Along with the improvement of network environment and the hardware equipment, this demand is becoming more and more imperious, people prefer high quality videos and audios as well as the net streaming media resources. FFmpeg is a set of open source program about the A/V decoding. Many commercial players use FFmpeg as their displaying cores. This paper designed a simple and easy-to-use video player based on FFmpeg. The first part is about the basic theories and related knowledge of video displaying, including some concepts like data formats, streaming media data, video coding and decoding. In a word, the realization of the video player depend on the a set of video decoding process. The general idea about the process is to get the video packets from the Internet, to read the related protocols and de-encapsulate the protocols, to de-encapsulate the packaging data and to get encoded formats data, to decode them to pixel data that can be displayed directly through graphics cards. During the coding and decoding process, there could be different degrees of data losing, which is called lossy compression, but it usually does not influence the quality of user experiences. The second part is about the principle of the FFmpeg decoding process, that is one of the key point of the paper. In this project, FFmpeg is used for the main decoding task, by call some main functions and structures from FFmpeg class libraries, packaging video formats could be transfer to pixel data, after getting the pixel data, SDL is used for the displaying process. The third part is about the SDL displaying flow. Similarly, it would invoke some important displaying functions from SDL class libraries to realize the function, though SDL is able to do not only displaying task, but also many other game playing process. After that, a independent video displayer is completed, it is provided with all the key function of a player. The fourth part make a simple users interface for the player based on the MFC program, it enable the player could be used by most people. At last, in consideration of the mobile Internet’s blossom, people nowadays can hardly ever drop their mobile phones, there is a brief introduction about how to transplant the video player to Android platform which is one of the most used mobile systems.

Relevância:

40.00% 40.00%

Publicador:

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Human listeners seem to be remarkably able to recognise acoustic sound sources based on timbre cues. Here we describe a psychophysical paradigm to estimate the time it takes to recognise a set of complex sounds differing only in timbre cues: both in terms of the minimum duration of the sounds and the inferred neural processing time. Listeners had to respond to the human voice while ignoring a set of distractors. All sounds were recorded from natural sources over the same pitch range and equalised to the same duration and power. In a first experiment, stimuli were gated in time with a raised-cosine window of variable duration and random onset time. A voice/non-voice (yes/no) task was used. Performance, as measured by d', remained above chance for the shortest sounds tested (2 ms); d's above 1 were observed for durations longer than or equal to 8 ms. Then, we constructed sequences of short sounds presented in rapid succession. Listeners were asked to report the presence of a single voice token that could occur at a random position within the sequence. This method is analogous to the "rapid sequential visual presentation" paradigm (RSVP), which has been used to evaluate neural processing time for images. For 500-ms sequences made of 32-ms and 16-ms sounds, d' remained above chance for presentation rates of up to 30 sounds per second. There was no effect of the pitch relation between successive sounds: identical for all sounds in the sequence or random for each sound. This implies that the task was not determined by streaming or forward masking, as both phenomena would predict better performance for the random pitch condition. Overall, the recognition of familiar sound categories such as the voice seems to be surprisingly fast, both in terms of the acoustic duration required and of the underlying neural time constants.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

People possess different sensory modalities to detect, interpret, and efficiently act upon various events in a complex and dynamic environment (Fetsch, DeAngelis, & Angelaki, 2013). Much empirical work has been done to understand the interplay of modalities (e.g. audio-visual interactions, see Calvert, Spence, & Stein, 2004). On the one hand, integration of multimodal input as a functional principle of the brain enables the versatile and coherent perception of the environment (Lewkowicz & Ghazanfar, 2009). On the other hand, sensory integration does not necessarily mean that input from modalities is always weighted equally (Ernst, 2008). Rather, when two or more modalities are stimulated concurrently, one often finds one modality dominating over another. Study 1 and 2 of the dissertation addressed the developmental trajectory of sensory dominance. In both studies, 6-year-olds, 9-year-olds, and adults were tested in order to examine sensory (audio-visual) dominance across different age groups. In Study 3, sensory dominance was put into an applied context by examining verbal and visual overshadowing effects among 4- to 6-year olds performing a face recognition task. The results of Study 1 and Study 2 support default auditory dominance in young children as proposed by Napolitano and Sloutsky (2004) that persists up to 6 years of age. For 9-year-olds, results on privileged modality processing were inconsistent. Whereas visual dominance was revealed in Study 1, privileged auditory processing was revealed in Study 2. Among adults, a visual dominance was observed in Study 1, which has also been demonstrated in preceding studies (see Spence, Parise, & Chen, 2012). No sensory dominance was revealed in Study 2 for adults. Potential explanations are discussed. Study 3 referred to verbal and visual overshadowing effects in 4- to 6-year-olds. The aim was to examine whether verbalization (i.e., verbally describing a previously seen face), or visualization (i.e., drawing the seen face) might affect later face recognition. No effect of visualization on recognition accuracy was revealed. As opposed to a verbal overshadowing effect, a verbal facilitation effect occurred. Moreover, verbal intelligence was a significant predictor for recognition accuracy in the verbalization group but not in the control group. This suggests that strengthening verbal intelligence in children can pay off in non-verbal domains as well, which might have educational implications.