998 results for Spatial Audio
Abstract:
REVERIE (REal and Virtual Engagement in Realistic Immersive Environments) [1] is a multimedia and multimodal framework that supports the creation of immersive games. It integrates technologies such as 3D spatial audio, detection of the player's body movement using Kinect and WIMO sensors, and NPCs (Non-Playable Characters) with advanced AI capabilities and various levels of representation, all within an immersive 3D environment. A demonstration game, an adapted version of the popular Simon Says game, was developed for REVERIE. In the REVERIE version, a player tries to follow physical instructions issued by two autonomous agents with different degrees of realism. A player who follows a physical instruction correctly is awarded one point; otherwise, one point is deducted. This paper presents a technical overview of the game technologies integrated in the Simon Says demo and its evaluation by players with varying computer literacy skills. Finally, the potential of REVERIE as an immersive framework for gaming is discussed, followed by recommendations for improvements in future versions of the framework.
Abstract:
Multichannel audio has advanced by leaps and bounds in recent years, not only in playback techniques but also in recording techniques. This project therefore brings both together: a microphone array, the EigenMike32 from MH Acoustics, and a playback system based on Wave Field Synthesis technology, installed by Iosono at the Jade Hochschule Oldenburg. To link these two points of the audio chain, two different kinds of encoding are proposed: the direct reproduction of the EigenMike32's horizontal-plane capsules, and third-order Ambisonics (Higher Order Ambisonics, HOA), an encoding technique based on Spherical Harmonics through which the acoustic field itself is simulated rather than the individual sound sources. Both were developed in the Matlab environment, supported by the Isophonics script collection called the Spatial Audio Matlab Toolbox. To test them, a series of listening tests was carried out in which they were compared with recordings made at the same time with a Dummy Head, which is assumed to be the method closest to the way we hear. These tests also included recordings and encodings made with a Schoeps Double MS (DMS) setup, which are explained in the companion project "3D audio rendering through Ambisonics techniques: from multi-microphone recordings (DMS Schoeps) to a WFS system, through Matlab". The tests consisted of a battery of 4 audio excerpts repeated 4 times for each recorded situation (a conversation, a class, a street, and a college canteen or Mensa).
The results were unexpected: the third-order HOA encoding fell below the "Good" rating, possibly because material intended for a three-dimensional loudspeaker array was reproduced on a two-dimensional one. On the other hand, the encoding that consisted of extracting the horizontal-plane microphones maintained the "Good" rating in all situations. It is concluded that HOA should continue to be tested with deeper knowledge of Spherical Harmonics, while the other encoder, which is much simpler, can be used for situations without much spatial complexity.
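To make the encoding step concrete, the following is a minimal Python sketch, not the thesis's Matlab code, of how a mono signal can be encoded as a third-order ambisonic plane wave using real spherical harmonics; ACN channel ordering, N3D-style normalisation, and the use of scipy's complex spherical harmonics are assumptions of this illustration.

```python
# A minimal sketch (not the thesis code) of encoding a mono source into
# third-order Ambisonics with real spherical harmonics. ACN channel
# ordering and N3D-style normalisation are assumed here.
import numpy as np
from scipy.special import sph_harm

def real_sph_harm(m, n, azimuth, colatitude):
    """Real-valued spherical harmonic Y_n^m built from scipy's complex Y."""
    if m > 0:
        return np.sqrt(2.0) * (-1.0) ** m * sph_harm(m, n, azimuth, colatitude).real
    if m < 0:
        return np.sqrt(2.0) * (-1.0) ** m * sph_harm(-m, n, azimuth, colatitude).imag
    return sph_harm(0, n, azimuth, colatitude).real

def encode_hoa3(signal, azimuth, colatitude):
    """Encode a mono signal as a plane wave: 16 ambisonic channels (order 3)."""
    channels = []
    for n in range(4):                 # orders 0..3
        for m in range(-n, n + 1):     # ACN ordering within each order
            gain = real_sph_harm(m, n, azimuth, colatitude)
            channels.append(gain * signal)
    return np.stack(channels)          # shape: (16, num_samples)

# Usage: a 1 kHz tone arriving from 45 degrees azimuth in the horizontal plane.
fs = 48000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
b_format = encode_hoa3(tone, np.deg2rad(45), np.deg2rad(90))
```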
Abstract:
Several groups around the world are researching ways to render 3D sound. One way to achieve this is to use Head Related Transfer Functions (HRTFs). These measurements contain the frequency response of the human head and torso for each angle. Some years ago it was only possible to measure these frequency responses in the horizontal plane; nowadays, several improvements have made it possible to measure and use 3D data for this purpose. The problem was that the groups did not have a standard file format in which to store the data. This was an obstacle whenever a third party wanted to use different HRTF sets for 3D audio rendering, since each group had its own way of storing the data. The Spatially Oriented Format for Acoustics (SOFA) was created to solve this problem. It is a format definition intended to unify all the previously different ways of storing any kind of acoustic data. At the time of this project, some basics of the format and some recommendations for storing HRTFs had been defined; it is still under development, so several changes may come. The SOFA [1] file format uses a numeric container called netCDF [2], specifically the enhanced data model described in netCDF-4, which is based on HDF5 [3]. The SoundScape Renderer (SSR) is a tool for real-time spatial audio reproduction providing a variety of rendering algorithms, some aimed at loudspeaker arrays in different configurations and others designed for headphones. The SSR was developed at the Quality and Usability Lab at TU Berlin and is now further developed at the Institut für Nachrichtentechnik at Universität Rostock [4]. This project is intended as an introduction to the use of SOFA files, providing a C++ API to manipulate them and adapting the binaural renderer of the SSR, which is based on HRTFs measured from different angles, to work with the SOFA format so that it can render binaural sound in 3D.
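As an illustration of what reading a SOFA file involves, here is a minimal Python sketch using the netCDF4 library rather than the project's C++ API; the variable names Data.IR, SourcePosition, and Data.SamplingRate follow the SimpleFreeFieldHRIR convention, and the file name is a hypothetical placeholder.

```python
# A minimal sketch (the project's API is C++; this is an illustrative
# Python equivalent) of reading HRIRs from a SimpleFreeFieldHRIR SOFA file.
import numpy as np
from netCDF4 import Dataset

def load_sofa_hrirs(path):
    """Return (impulse_responses, source_positions, sampling_rate)."""
    with Dataset(path, "r") as sofa:
        # Data.IR: measurements x receivers (2 ears) x samples
        hrirs = np.asarray(sofa.variables["Data.IR"][:])
        # SourcePosition: measurements x 3 (azimuth deg, elevation deg, distance m)
        positions = np.asarray(sofa.variables["SourcePosition"][:])
        fs = float(np.asarray(sofa.variables["Data.SamplingRate"][:]).ravel()[0])
    return hrirs, positions, fs

def nearest_hrir(hrirs, positions, azimuth_deg, elevation_deg):
    """Pick the measured HRIR pair closest to the requested direction."""
    err = (np.abs(((positions[:, 0] - azimuth_deg + 180) % 360) - 180)
           + np.abs(positions[:, 1] - elevation_deg))
    return hrirs[np.argmin(err)]       # shape: (2, num_samples)

hrirs, positions, fs = load_sofa_hrirs("subject_003.sofa")  # hypothetical file
left, right = nearest_hrir(hrirs, positions, azimuth_deg=30, elevation_deg=0)
```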
Abstract:
Listeners experience electroacoustic music as full of significance and meaning, and they experience spatiality as one of the factors contributing to its meaningfulness. If we want to understand spatiality in electroacoustic music, we must understand how the listener's mental processes give rise to the experience of meaning. In electroacoustic music as in everyday life, these mental processes unite the peripheral auditory system with human spatial cognition. In the discussion that follows, we consider a range of the listener's mental processes relating space and meaning, from the perceptual attributes of spatial imagery to the spatial reference frames for places and navigation. When considering multichannel loudspeaker systems in particular, an important part of the discussion focuses on the distinctive and idiomatic ways in which this mode of sound production contributes to and situates meaning. These idiosyncrasies include the phenomenon of image dispersion, the important consequences of the precedence effect, and the influence of source characteristics on spatial imagery. These are discussed in close relation to the practicalities of artistic practice and to the potential for artistic meaning experienced by the listener.
Abstract:
Automated digital recordings are useful for large-scale temporal and spatial environmental monitoring. An important research effort has been the automated classification of calling bird species. In this paper we examine a related task: retrieval, from a database of audio recordings, of birdcalls similar to a user-supplied query call. Such a retrieval task can sometimes be more useful than an automated classifier. We compare three approaches to similarity-based birdcall retrieval: spectral ridge features and two kinds of gradient features, the structure tensor and the histogram of oriented gradients. The retrieval accuracy of our spectral ridge method is 94%, compared to 82% for the structure tensor method and 90% for the histogram of oriented gradients method. Additionally, the spectral ridge approach potentially offers a more compact representation and is more computationally efficient.
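For a sense of how gradient-feature retrieval of this kind can work, the following is a minimal Python sketch, not the paper's implementation, that ranks recordings by cosine similarity of histogram-of-oriented-gradients features computed on log-spectrograms; the window sizes, the fixed 5-second analysis span, and all parameter values are illustrative assumptions.

```python
# A minimal sketch (not the paper's method) of similarity-based birdcall
# retrieval using HOG features on spectrograms, one of the three kinds of
# features the paper compares. Assumes all recordings share one sample rate.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
from skimage.feature import hog

def hog_signature(wav_path):
    """Compute a HOG feature vector from a recording's log-spectrogram."""
    fs, audio = wavfile.read(wav_path)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)              # mix down to mono
    target = fs * 5                             # fixed 5 s analysis window
    audio = np.pad(audio, (0, max(0, target - len(audio))))[:target]
    _, _, spec = spectrogram(audio, fs=fs, nperseg=512, noverlap=256)
    log_spec = np.log1p(spec)
    return hog(log_spec, orientations=8, pixels_per_cell=(16, 16),
               cells_per_block=(1, 1))

def retrieve(query_path, database_paths, top_k=5):
    """Rank database recordings by cosine similarity to the query call."""
    q = hog_signature(query_path)
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    scored = [(cosine(q, hog_signature(p)), p) for p in database_paths]
    return sorted(scored, reverse=True)[:top_k]
```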
Abstract:
Interest in low bit-rate video coding has increased considerably. Despite rapid progress in storage density and digital communication system performance, demand for data-transmission bandwidth and storage capacity continues to exceed the capabilities of available technologies. The growth of data-intensive digital audio and video applications, and the increased use of bandwidth-limited media such as video conferencing and full-motion video, have not only sustained the need for efficient ways to encode analog signals but made signal compression central to digital communication and data-storage technology. In this paper we explore techniques for compressing image sequences in a manner that optimizes the results for the human receiver. We propose a new motion estimator using two novel block-matching algorithms that are based on human perception. Simulations with image sequences have shown an improved bit rate while maintaining "image quality" when compared to conventional motion estimation techniques using the MAD block-matching criterion.
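As a reference point for the conventional baseline the paper compares against, here is a minimal Python sketch of exhaustive block-matching motion estimation with the mean-absolute-difference (MAD) criterion; the block size and search radius are illustrative assumptions, and the paper's perceptual match criteria are not reproduced here.

```python
# A minimal sketch (the conventional MAD baseline, not the paper's
# perceptual estimator) of exhaustive block-matching motion estimation.
import numpy as np

def mad(block_a, block_b):
    """Mean absolute difference between two equally sized blocks."""
    return np.mean(np.abs(block_a.astype(float) - block_b.astype(float)))

def full_search(prev_frame, curr_frame, block=16, radius=7):
    """Exhaustive-search motion vectors for every block of curr_frame."""
    h, w = curr_frame.shape
    vectors = {}
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            target = curr_frame[y:y + block, x:x + block]
            best, best_cost = (0, 0), np.inf
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    py, px = y + dy, x + dx
                    if 0 <= py <= h - block and 0 <= px <= w - block:
                        cost = mad(target, prev_frame[py:py + block, px:px + block])
                        if cost < best_cost:
                            best, best_cost = (dy, dx), cost
            vectors[(y, x)] = best      # displacement minimising MAD
    return vectors
```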
Abstract:
This thesis explores the possibilities of spatial hearing in relation to sound perception and presents three acousmatic compositions based on a musical aesthetic that emphasizes this relation in musical discourse. The first important characteristic of these compositions is the exclusive use of sine waves and other time-invariant sound signals. Even though these types of sound signals present no variations in time, it is possible to perceive pitch, loudness, and tone-color variations as soon as they move in space, due to the acoustic processes involved in spatial hearing. To emphasize the perception of such variations, this thesis proposes dividing a tone into multiple sound units and spreading them in space using several loudspeakers arranged around the listener. In addition to the perception of sound-attribute variations, it is also possible to create rhythm and texture variations that depend on how the sound units are arranged in space. This strategy makes it possible to overcome the so-called "sound surrogacy" implicit in acousmatic music, as it establishes cause-effect relations between sound movement and the perception of sound-attribute, rhythm, and texture variations. Another important consequence of combining sound fragmentation with sound spatialization is the possibility of producing diffuse sound fields independently of the room's reverberation level, and of creating sound spaces with a certain spatial depth without using any kind of artificial delay or reverberation.
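The fragmentation-and-spatialization strategy can be sketched in a few lines; the following Python example, an illustration rather than the composer's actual tools, chops a sine tone into short units and assigns them round-robin to a ring of loudspeakers, with all durations and counts chosen arbitrarily.

```python
# A minimal sketch (illustrative only) of dividing a time-invariant tone
# into short sound units and spreading them across a loudspeaker ring.
import numpy as np

def spatialised_tone(freq=440.0, fs=48000, duration=4.0,
                     unit_ms=50, num_speakers=8):
    """Return a (num_speakers, samples) array; each unit goes to one speaker."""
    n = int(fs * duration)
    tone = np.sin(2 * np.pi * freq * np.arange(n) / fs)
    unit = int(fs * unit_ms / 1000)
    out = np.zeros((num_speakers, n))
    for i, start in enumerate(range(0, n, unit)):
        speaker = i % num_speakers          # round-robin around the ring
        out[speaker, start:start + unit] = tone[start:start + unit]
    return out

channels = spatialised_tone()               # one row per loudspeaker feed
```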
Abstract:
Primate multisensory object perception involves distributed brain regions. To investigate the network character of these regions of the human brain, we applied data-driven group spatial independent component analysis (ICA) to a functional magnetic resonance imaging (fMRI) data set acquired during a passive audio-visual (AV) experiment with common object stimuli. We labeled three group-level independent component (IC) maps as auditory (A), visual (V), and AV, based on their spatial layouts and activation time courses. The overlap between these IC maps served as the definition of a distributed network of multisensory candidate regions including superior temporal, ventral occipito-temporal, posterior parietal, and prefrontal regions. During an independent second fMRI experiment, we explicitly tested their involvement in AV integration. Activations in nine of these twelve regions met the max-criterion (A < AV > V) for multisensory integration. Comparison of this approach with a general linear model-based region-of-interest definition revealed its complementary value for multisensory neuroimaging. In conclusion, we estimated functional networks of uni- and multisensory functional connectivity from one dataset and validated their functional roles in an independent dataset. These findings demonstrate the particular value of ICA for multisensory neuroimaging research and of using independent datasets to test hypotheses generated from a data-driven analysis.
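The max-criterion itself reduces to a simple comparison; the following Python sketch, with hypothetical beta values rather than the study's data, shows how a region's AV response can be tested against both unisensory responses.

```python
# A minimal sketch (illustrative, not the study's analysis code) of the
# max-criterion: a region counts as multisensory when its audio-visual
# response exceeds both unisensory responses (A < AV > V).
import numpy as np

def meets_max_criterion(beta_a, beta_v, beta_av):
    """True where the AV response exceeds the stronger unisensory response."""
    return beta_av > np.maximum(beta_a, beta_v)

# Hypothetical per-region GLM betas for the A, V, and AV conditions.
beta_a  = np.array([0.8, 0.2, 1.1, 0.5])
beta_v  = np.array([0.3, 0.9, 0.7, 0.6])
beta_av = np.array([1.2, 1.0, 1.0, 1.4])
print(meets_max_criterion(beta_a, beta_v, beta_av))  # [ True  True False  True]
```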
Abstract:
INTRODUCTION The Rondo is a single-unit cochlear implant (CI) audio processor comprising the same components as its behind-the-ear predecessor, the Opus 2. Exchanging the Opus 2 for the Rondo shifts the microphone position toward the back of the head. This study aimed to investigate the influence of the Rondo wearing position on speech intelligibility in noise. METHODS Speech intelligibility in noise was measured in 4 spatial configurations with 12 experienced CI users using the German adaptive Oldenburg sentence test. A physical model and a numerical model were used to enable comparison with the observations. RESULTS No statistically significant differences in speech intelligibility were found in the situations in which the signal came from the front and the noise came from the frontal, ipsilateral, or contralateral side. The signal-to-noise ratio (SNR) was significantly better with the Opus 2 when the noise was presented from the back (4.4 dB, p < 0.001). The SNRs were significantly worse when the Rondo was placed further behind the ear than when it was closer to the ear. CONCLUSION The study indicates that CI users with the receiver/stimulator implanted further behind the ear can be expected to have greater difficulty in noisy situations when wearing the single-unit audio processor.
Abstract:
The aim of the present study was to develop a pictorial presence scale using self-assessment manikins (SAM). The instrument assesses presence sub-dimensions (self-location and possible actions) as well as presence determinants (attention allocation, spatial situation model, higher cognitive involvement, and suspension of disbelief). To qualitatively validate the scale, think-aloud protocols and interviews (n = 12) were conducted. The results reveal that the SAM items are quickly filled out and are easily, intuitively, and unambiguously understood. Furthermore, the instrument's validity and sensitivity were quantitatively examined in a two-factorial design (n = 317). Factors were medium (written story, audio book, video, and computer game) and distraction (non-distraction vs. distraction). Factor analyses reveal that the SAM presence dimensions and determinants closely correspond to those of the MEC Spatial Presence Questionnaire, which was used as a comparison measure. The findings of the qualitative and quantitative validation procedures show that the Pictorial Presence SAM successfully assesses spatial presence. In contrast to the verbal questionnaire data (MEC), the significant distraction effect suggests that the new scale is even more sensitive. This indicates that the scale can be a useful alternative to existing verbal presence self-report measures.
Abstract:
An implementation of a real-time 3D videoconferencing system using currently available technology is presented. The approach is based on side-by-side spatial compression of the stereoscopic images. The encoder and the decoder have been implemented on a standard personal computer, and a conventional 3D-compatible TV has been used to present the frames. Moreover, users without 3D technology can use the system because a 2D compatibility mode has been implemented in the decoder. The performance results show that a conventional computer can be used for encoding/decoding the audio and video streams, and that the transmission delay is lower than 200 ms.
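Side-by-side spatial compression can be illustrated compactly; the following Python sketch, not the paper's encoder, halves each view horizontally and packs the pair into a single frame, with the frame sizes and the naive column-repeat upsampling being assumptions of this illustration.

```python
# A minimal sketch (not the paper's encoder) of side-by-side spatial
# compression: each stereoscopic view loses half its horizontal resolution
# and the pair is packed into one standard-resolution frame.
import numpy as np

def pack_side_by_side(left, right):
    """Pack left/right views (H x W x 3) into one H x W x 3 frame."""
    half_l = left[:, ::2]                # keep every other column
    half_r = right[:, ::2]
    return np.concatenate([half_l, half_r], axis=1)

def unpack_side_by_side(frame):
    """Recover half-resolution views; a 2D decoder can simply show one."""
    w = frame.shape[1] // 2
    left = np.repeat(frame[:, :w], 2, axis=1)    # naive upsampling to full width
    right = np.repeat(frame[:, w:], 2, axis=1)
    return left, right

left = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)
right = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)
frame = pack_side_by_side(left, right)           # still 720 x 1280 x 3
```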
Abstract:
Hearing impairment affects millions of people around the world, giving rise to various problems, particularly at the psychosocial level, that compromise the individual's quality of life. Hearing impairment influences behaviour, particularly by making communication difficult. With technological progress, assistive products, in particular hearing aids and cochlear implants, improve this quality of life by improving communication. With assessment scales we determine how hearing impairment influences daily life, with or without amplification, and how it affects the individual's psychosocial, emotional, or professional performance; this information is important for determining the need for, and the success of, amplification, regardless of the type and degree of hearing impairment. The aim of the present study was the translation and adaptation to Portuguese culture of The Speech, Spatial and Qualities of Hearing Scale (SSQ), developed by Stuart Gatehouse and William Noble in 2004. This work was carried out in the hearing centres of Widex Portugal. After the translation and back-translation procedures, the Portuguese version was tested on 12 individuals aged between 36 and 80 years, of whom 6 had used a hearing aid for more than one year, one had used one for less than a year, and 5 had never used one. With the translation and cultural adaptation into European Portuguese of the "Questionário sobre as Qualidades Espaciais do Discurso – SSQ", we contribute to a better assessment of individuals who are, or will be, following auditory rehabilitation programmes.