870 resultados para swd: 3D-Audio


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Future generations of mobile communication devices will serve more and more as multimedia platforms capable of reproducing high quality audio. In order to achieve a 3-D sound perception the reproduction quality of audio via headphones can be significantly increased by applying binaural technology. To be independent of individual head-related transfer functions (HRTFs) and to guarantee a good performance for all listeners, an adaptation of the synthesized sound field to the listener's head movements is required. In this article several methods of head-tracking for mobile communication devices are presented and compared. A system for testing the identified methods is set up and experiments are performed to evaluate the prosand cons of each method. The implementation of such a device in a 3-D audio system is described and applications making use of such a system are identified and discussed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper introduces a database of freely available stereo-3D content designed to facilitate research in stereo post-production. It describes the structure and content of the database and provides some details about how the material was gathered. The database includes examples of many of the scenarios characteristic to broadcast footage. Material was gathered at different locations including a studio with controlled lighting and both indoor and outdoor on-location sites with more restricted lighting control. The database also includes video sequences with accompanying 3D audio data recorded in an Ambisonics format. An intended consequence of gathering the material is that the database contains examples of degradations that would be commonly present in real-world scenarios. This paper describes one such artefact caused by uneven exposure in the stereo views, causing saturation in the over-exposed view. An algorithm for the restoration of this artefact is proposed in order to highlight the usefuiness of the database.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

wo methods for registering laser-scans of human heads and transforming them to a new semantically consistent topology defined by a user-provided template mesh are described. Both algorithms are stated within the Iterative Closest Point framework. The first method is based on finding landmark correspondences by iteratively registering the vicinity of a landmark with a re-weighted error function. Thin-plate spline interpolation is then used to deform the template mesh and finally the scan is resampled in the topology of the deformed template. The second algorithm employs a morphable shape model, which can be computed from a database of laser-scans using the first algorithm. It directly optimizes pose and shape of the morphable model. The use of the algorithm with PCA mixture models, where the shape is split up into regions each described by an individual subspace, is addressed. Mixture models require either blending or regularization strategies, both of which are described in detail. For both algorithms, strategies for filling in missing geometry for incomplete laser-scans are described. While an interpolation-based approach can be used to fill in small or smooth regions, the model-driven algorithm is capable of fitting a plausible complete head mesh to arbitrarily small geometry, which is known as "shape completion". The importance of regularization in the case of extreme shape completion is shown.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

El audio multicanal ha avanzado a pasos agigantados en los últimos años, y no solo en las técnicas de reproducción, sino que en las de capitación también. Por eso en este proyecto se encuentran ambas cosas: un array microfónico, EigenMike32 de MH Acoustics, y un sistema de reproducción con tecnología Wave Field Synthesis, instalado Iosono en la Jade Höchscule Oldenburg. Para enlazar estos dos puntos de la cadena de audio se proponen dos tipos distintos de codificación: la reproducción de la toma horizontal del EigenMike32; y el 3er orden de Ambisonics (High Order Ambisonics, HOA), una técnica de codificación basada en Armónicos Esféricos mediante la cual se simula el campo acústico en vez de simular las distintas fuentes. Ambas se desarrollaron en el entorno Matlab y apoyadas por la colección de scripts de Isophonics llamada Spatial Audio Matlab Toolbox. Para probar éstas se llevaron a cabo una serie de test en los que se las comparó con las grabaciones realizadas a la vez con un Dummy Head, a la que se supone el método más aproximado a nuestro modo de escucha. Estas pruebas incluían otras grabaciones hechas con un Doble MS de Schoeps que se explican en el proyecto “Sally”. La forma de realizar éstas fue, una batería de 4 audios repetida 4 veces para cada una de las situaciones garbadas (una conversación, una clase, una calle y un comedor universitario). Los resultados fueron inesperados, ya que la codificación del tercer orden de HOA quedo por debajo de la valoración Buena, posiblemente debido a la introducción de material hecho para un array tridimensional dentro de uno de 2 dimensiones. Por el otro lado, la codificación que consistía en extraer los micrófonos del plano horizontal se mantuvo en el nivel de Buena en todas las situaciones. Se concluye que HOA debe seguir siendo probado con mayores conocimientos sobre Armónicos Esféricos; mientras que el otro codificador, mucho más sencillo, puede ser usado para situaciones sin mucha complejidad en cuanto a espacialidad. In the last years the multichannel audio has increased in leaps and bounds and not only in the playback techniques, but also in the recording ones. That is the reason of both things being in this project: a microphone array, EigenMike32 from MH Acoustics; and a playback system with Wave Field Synthesis technology, installed by Iosono in Jade Höchscule Oldenburg. To link these two points of the audio chain, 2 different kinds of codification are proposed: the reproduction of the EigenMike32´s horizontal take, and the Ambisonics´ third order (High Order Ambisonics, HOA), a codification technique based in Spherical Harmonics through which the acoustic field is simulated instead of the different sound sources. Both have been developed inside Matlab´s environment and supported by the Isophonics´ scripts collection called Spatial Audio Matlab Toolbox. To test these, a serial of tests were made in which they were compared with recordings made at the time by a Dummy Head, which is supposed to be the closest method to our hearing way. These tests included other recording and codifications made by a Double MS (DMS) from Schoeps which are explained in the project named “3D audio rendering through Ambisonics techniques: from multi-microphone recordings (DMS Schoeps) to a WFS system, through Matlab”. The way to perform the tests was, a collection made of 4 audios repeated 4 times for each recorded situation (a chat, a class, a street and college canteen or Mensa). The results were unexpected, because the HOA´s third order stood under the Well valuation, possibly caused by introducing material made for a tridimensional array inside one made only by 2 dimensions. On the other hand, the codification that consisted of extracting the horizontal plane microphones kept the Well valuation in all the situations. It is concluded that HOA should keep being tested with larger knowledge about Spherical Harmonics; while the other coder, quite simpler, can be used for situations without a lot of complexity with regards to spatiality.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Este proyecto de fin de carrera tiene como objetivo obtener una visión detallada de los sistemas y tecnologías de grabación y reproducción utilizadas para aplicaciones de audio 3D y entornos de realidad virtual, analizando las diferentes alternativas existentes, su funcionamiento, características, detalles técnicos y sus ámbitos de aplicación. Como punto de partida se estudiará la teoría psicoacústica y la localización de fuentes sonoras en el espacio, base para el estudio de los sistemas de audio 3D. Se estudiará tanto la espacialización sonora en un espacio real y la espacialización virtual (simulación mediante procesado de información de la localización de fuentes sonoras), en los que intervienen algunos fenómenos acústicos y psicoacústicos como ITD, o diferencia de tiempo que existe entre una señal acústica que llega a los pabellones auditivos, la ILD, o diferencia de intensidad o amplitud que hay entre la señal que llega a los pabellones auditivos y la localización espacial mediante otra serie de mecanismos biaurales. Tras una visión general de la teoría psicoacústica y la espacialización sonora, se analizarán con detalle los elementos de grabación y reproducción existentes para audio 3D. Concretamente, a lo largo del proyecto se profundizará en el funcionamiento del sistema estéreo, caracterizado por el posicionamiento sonoro mediante la utilización de dos canales; del sistema biaural, caracterizado por reconstruir campos sonoros mediante el uso de las HRTF; de los sistemas multicanal, detallando gran parte de las alternativas y configuraciones existentes; del sistema Ambiophonics, caracterizado por implementar filtros de cruce; del sistema Ambisonics, y sus diferentes formatos y técnicas de codificación y decodificación; y del sistema Wavefield Synthesis, caracterizado por recrear ambientes sonoros en grandes espacios. ABSTRACT This project aims to get a detailed view of recording and reproducing systems and technologies used to 3D audio applications and virtual reality environments, analyzing the different alternatives available, their functioning, features, technical details and their different scopes of applications. As a starting point, will be studied the psychoacoustic theory and the localization of sound sources in space, basis for the 3D audio study. Will be studied both the spacialization of sound sources in real space as virtual spatialization of sound sources (simulation by information processing of localization of sound sources), in which involves some acoustic and psychoacoustic phenomena like ITD (or the Interaural time difference), the ILD, (or the Interaural Level Difference) and spatial localization by another set of binaural mechanisms. After a general overview of the psychoacoustics theory and the sound spatialization, will be analyzed in detail existing methods of recording and reproducing for 3D audio. Specifically, during the project will analyze the characteristics of the stereo systems, characterized by sound positioning using two channels; the binaural systems, characterized by reconstructing sound fields by using the HRTF; the multichannel systems, detailing many of the existing alternatives and configurations; the Ambiophonics system, which is characterized by implementing crosstalk elimination techniques; the Ambiosonics system, and its various formats and encoding and decoding techniques; and the Wavefield Synthesis system, characterized by recreate soundscapes in large spaces.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Several groups all over the world are researching in several ways to render 3D sounds. One way to achieve this is to use Head Related Transfer Functions (HRTFs). These measurements contain the Frequency Response of the human head and torso for each angle. Some years ago, was only possible to measure these Frequency Responses only in the horizontal plane. Nowadays, several improvements have made possible to measure and use 3D data for this purpose. The problem was that the groups didn't have a standard format file to store the data. That was a problem when a third part wanted to use some different HRTFs for 3D audio rendering. Every of them have different ways to store the data. The Spatially Oriented Format for Acoustics or SOFA was created to provide a solution to this problem. It is a format definition to unify all the previous different ways of storing any kind of acoustics data. At the moment of this project they have defined some basis for the format and some recommendations to store HRTFs. It is actually under development, so several changes could come. The SOFA[1] file format uses a numeric container called netCDF[2], specifically the Enhaced data model described in netCDF 4 that is based on HDF5[3]. The SoundScape Renderer (SSR) is a tool for real-time spatial audio reproduction providing a variety of rendering algorithms. The SSR was developed at the Quality and Usability Lab at TU Berlin and is now further developed at the Institut für Nachrichtentechnik at Universität Rostock [4]. This project is intended to be an introduction to the use of SOFA files, providing a C++ API to manipulate them and adapt the binaural renderer of the SSR for working with the SOFA format. RESUMEN. El SSR (SoundScape Renderer) es un programa que está siendo desarrollado actualmente por la Universität Rostock, y previamente por la Technische Universität Berlin. El SSR es una herramienta diseñada para la reproducción y renderización de audio 2D en tiempo real. Para ello utiliza diversos algoritmos, algunos orientados a sistemas formados por arrays de altavoces en diferentes configuraciones y otros algoritmos diseñados para cascos. El principal objetivo de este proyecto es dotar al SSR de la capacidad de renderizar sonidos binaurales en 3D. Este proyecto está centrado en el binaural renderer del SSR. Este algoritmo se basa en el uso de HRTFs (Head Related Transfer Function). Las HRTFs representan la función de transferencia del sistema formado por la cabeza y el torso del oyente. Esta función es medida desde diferentes ángulos. Con estos datos el binaural renderer puede generar audio en tiempo real simulando la posición de diferentes fuentes. Para poder incluir una base de datos con HRTFs en 3D se ha hecho uso del nuevo formato SOFA (Spatially Oriented Format for Acoustics). Este nuevo formato se encuentra en una fase bastante temprana de su desarrollo. Está pensado para servir como formato estándar para almacenar HRTFs y cualquier otro tipo de medidas acústicas, ya que actualmente cada laboratorio cuenta con su propio formato de almacenamiento y esto hace bastante difícil usar varias bases de datos diferentes en un mismo proyecto. El formato SOFA hace uso del contenedor numérico netCDF, que a su vez esta basado en un contenedor más básico llamado HRTF-5. Para poder incluir el formato SOFA en el binaural renderer del SSR se ha desarrollado una API en C++ para poder crear y leer archivos SOFA con el fin de utilizar los datos contenidos en ellos dentro del SSR.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Mobile and wearable computers present input/output prob-lems due to limited screen space and interaction techniques. When mobile, users typically focus their visual attention on navigating their environment - making visually demanding interface designs hard to operate. This paper presents two multimodal interaction techniques designed to overcome these problems and allow truly mobile, 'eyes-free' device use. The first is a 3D audio radial pie menu that uses head gestures for selecting items. An evaluation of a range of different audio designs showed that egocentric sounds re-duced task completion time, perceived annoyance, and al-lowed users to walk closer to their preferred walking speed. The second is a sonically enhanced 2D gesture recognition system for use on a belt-mounted PDA. An evaluation of the system with and without audio feedback showed users' ges-tures were more accurate when dynamically guided by au-dio-feedback. These novel interaction techniques demon-strate effective alternatives to visual-centric interface de-signs on mobile devices.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Universidade Estadual de Campinas . Faculdade de Educação Física

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Monográfico con el título: 'Adaptación y accesibilidad de las tecnologías para el aprendizaje'. Resumen basado en el de la publicación

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We consider the problem of approximating the 3D scan of a real object through an affine combination of examples. Common approaches depend either on the explicit estimation of point-to-point correspondences or on 2-dimensional projections of the target mesh; both present drawbacks. We follow an approach similar to [IF03] by representing the target via an implicit function, whose values at the vertices of the approximation are used to define a robust cost function. The problem is approached in two steps, by approximating first a coarse implicit representation of the whole target, and then finer, local ones; the local approximations are then merged together with a Poisson-based method. We report the results of applying our method on a subset of 3D scans from the Face Recognition Grand Challenge v.1.0.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper proposes a new compression algorithm for dynamic 3d meshes. In such a sequence of meshes, neighboring vertices have a strong tendency to behave similarly and the degree of dependencies between their locations in two successive frames is very large which can be efficiently exploited using a combination of Predictive and DCT coders (PDCT). Our strategy gathers mesh vertices of similar motions into clusters, establish a local coordinate frame (LCF) for each cluster and encodes frame by frame and each cluster separately. The vertices of each cluster have small variation over a time relative to the LCF. Therefore, the location of each new vertex is well predicted from its location in the previous frame relative to the LCF of its cluster. The difference between the original and the predicted local coordinates are then transformed into frequency domain using DCT. The resulting DCT coefficients are quantized and compressed with entropy coding. The original sequence of meshes can be reconstructed from only a few non-zero DCT coefficients without significant loss in visual quality. Experimental results show that our strategy outperforms or comes close to other coders.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Tracking user’s visual attention is a fundamental aspect in novel human-computer interaction paradigms found in Virtual Reality. For example, multimodal interfaces or dialogue-based communications with virtual and real agents greatly benefit from the analysis of the user’s visual attention as a vital source for deictic references or turn-taking signals. Current approaches to determine visual attention rely primarily on monocular eye trackers. Hence they are restricted to the interpretation of two-dimensional fixations relative to a defined area of projection. The study presented in this article compares precision, accuracy and application performance of two binocular eye tracking devices. Two algorithms are compared which derive depth information as required for visual attention-based 3D interfaces. This information is further applied to an improved VR selection task in which a binocular eye tracker and an adaptive neural network algorithm is used during the disambiguation of partly occluded objects.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents an empirical study of affine invariant feature detectors to perform matching on video sequences of people with non-rigid surface deformation. Recent advances in feature detection and wide baseline matching have focused on static scenes. Video frames of human movement capture highly non-rigid deformation such as loose hair, cloth creases, skin stretching and free flowing clothing. This study evaluates the performance of six widely used feature detectors for sparse temporal correspondence on single view and multiple view video sequences. Quantitative evaluation is performed of both the number of features detected and their temporal matching against and without ground truth correspondence. Recall-accuracy analysis of feature matching is reported for temporal correspondence on single view and multiple view sequences of people with variation in clothing and movement. This analysis identifies that existing feature detection and matching algorithms are unreliable for fast movement with common clothing.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This article describes a series of experiments which were carried out to measure the sense of presence in auditory virtual environments. Within the study a comparison of self-created signals to signals created by the surrounding environment is drawn. Furthermore, it is investigated if the room characteristics of the simulated environment have consequences on the perception of presence during vocalization or when listening to speech. Finally the experiments give information about the influence of background signals on the sense of presence. In the experiments subjects rated the degree of perceived presence in an auditory virtual environment on a perceptual scale. It is described which parameters have the most influence on the perception of presence and which ones are of minor influence. The results show that on the one hand an external speaker has more influence on the sense of presence than an adequate presentation of one’s own voice. On the other hand both room reflections and adequately presented background signals significantly increase the perceived presence in the virtual environment.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Visual fixation is employed by humans and some animals to keep a specific 3D location at the center of the visual gaze. Inspired by this phenomenon in nature, this paper explores the idea to transfer this mechanism to the context of video stabilization for a handheld video camera. A novel approach is presented that stabilizes a video by fixating on automatically extracted 3D target points. This approach is different from existing automatic solutions that stabilize the video by smoothing. To determine the 3D target points, the recorded scene is analyzed with a stateof- the-art structure-from-motion algorithm, which estimates camera motion and reconstructs a 3D point cloud of the static scene objects. Special algorithms are presented that search either virtual or real 3D target points, which back-project close to the center of the image for as long a period of time as possible. The stabilization algorithm then transforms the original images of the sequence so that these 3D target points are kept exactly in the center of the image, which, in case of real 3D target points, produces a perfectly stable result at the image center. Furthermore, different methods of additional user interaction are investigated. It is shown that the stabilization process can easily be controlled and that it can be combined with state-of-theart tracking techniques in order to obtain a powerful image stabilization tool. The approach is evaluated on a variety of videos taken with a hand-held camera in natural scenes.