960 results for audiovisual speech perception


Relevance: 30.00%

Abstract:

The choice of speech coding strategy might affect how cochlear implant users perceive music. This questionnaire study uses the Munich Music Questionnaire to compare the musical activities and music perception of postlingually deafened cochlear implant users with three different coding strategies (CIS, ACE, SPEAK). Overall, the self-reported music perception of CIS, SPEAK, and ACE users did not differ substantially.

Relevance: 30.00%

Abstract:

Music plays an important role in the daily life of cochlear implant (CI) users, but electrical hearing and speech processing pose challenges for enjoying music. Studies of music perception in unilateral CI (UCI) users have found that they have little difficulty recognizing tempo and rhythm but great difficulty with pitch, intervals, and melody. The present study is an initial step towards understanding music perception in bilateral CI (BCI) users. The Munich Music Questionnaire was used to investigate music listening habits and enjoyment in 23 BCI users compared with two control groups: 23 UCI users and 23 normal-hearing (NH) listeners. Bilateral users appeared to have a number of advantages over unilateral users, though their enjoyment of music did not reach the level of NH listeners.

Relevance: 30.00%

Abstract:

The introduction of open-plan offices in the 1960s, intended to make the workplace more flexible, efficient, and team-oriented, raised the background noise level. This not only made concentrated work more difficult but also caused physiological problems, such as increased stress, as well as a loss of speech privacy. Irrelevant background speech in particular has proven to be a major factor in disrupting concentration and lowering performance. Reducing speech intelligibility has therefore become an increasingly important goal in recent years. One method for doing so is noise masking: a continuous noise signal is played over a loudspeaker system to conceal the disturbing speech. Studies have shown that, while effective, the maskers used to date (normally filtered pink noise) are generally poorly accepted by users. The collaborative "Private Workspace" project, within which this thesis was carried out, aims to develop a coupled, adaptive noise masking system, together with a physical structure for open-plan offices, to address these issues. There is evidence that nature sounds may be more readily accepted as maskers, in part because they can be paired with a visual object that acts as the apparent source of the sound. Direct audio recordings are not recommended for various reasons, so the nature sounds must be synthesized. This work consists of the synthesis of a sound texture for use as a masker, and its evaluation. The sound texture has two parts: a wind-like noise produced by subtractive synthesis and a leaf-like noise produced by granular synthesis.
Different combinations of these two noises produced five variations of the masker. These were evaluated at different levels, along with white noise and pink noise, using a modified version of an Oldenburger Satztest to test their effect on speech intelligibility and a questionnaire to assess their subjective acceptance. The goal was to find which of the synthesized noises works best as a speech masker. The thesis first gives a theoretical introduction to the basics of sound perception, psychoacoustic masking, and sound texture synthesis. The design of each noise and its implementation in MATLAB are then explained, followed by the procedures used to evaluate the maskers. The results of the evaluation are analyzed. Finally, conclusions are drawn, and future work and modifications to the masker are proposed.
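The two synthesis techniques named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's MATLAB implementation: the sample rate, the band-pass filter (an RBJ-cookbook biquad), the gust envelope, the grain length and density, and the `leaf_weight` mixing knob are all hypothetical choices made for the example. The wind layer is subtractive (white noise shaped by a filter); the leaf layer is granular (many short windowed noise bursts at random onsets).

```python
import math
import random

FS = 16_000  # sample rate in Hz (assumption; the abstract gives none)


def bandpass(x, fc, q, fs=FS):
    """Subtractive step: RBJ-cookbook band-pass biquad (0 dB peak gain)."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b2 = alpha, -alpha  # b1 is 0 for this band-pass form
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = (b0 * xn + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y


def wind(n, rng, fc=400.0, q=2.0, gust_hz=0.3):
    """Wind-like layer: white noise shaped by a band-pass, then a slow gust envelope."""
    shaped = bandpass([rng.uniform(-1, 1) for _ in range(n)], fc, q)
    phase = rng.uniform(0, 2 * math.pi)
    return [s * (0.6 + 0.4 * math.sin(2 * math.pi * gust_hz * i / FS + phase))
            for i, s in enumerate(shaped)]


def leaves(n, rng, grains_per_s=200, grain_ms=8.0):
    """Leaf-like layer: granular synthesis of short Hann-windowed noise grains."""
    g = int(FS * grain_ms / 1000)
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * k / (g - 1)) for k in range(g)]
    out = [0.0] * n
    for _ in range(int(grains_per_s * n / FS)):
        start = rng.randrange(0, n - g)  # random grain onset
        amp = rng.uniform(0.1, 1.0)      # random grain loudness
        for k in range(g):
            out[start + k] += amp * hann[k] * rng.uniform(-1, 1)
    return out


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def masker(seconds=1.0, leaf_weight=0.5, seed=0):
    """Mix the two layers; leaf_weight in [0, 1] sets their balance (hypothetical knob)."""
    rng = random.Random(seed)
    n = int(FS * seconds)
    w, l = wind(n, rng), leaves(n, rng)
    rw, rl = rms(w), rms(l)
    w = [v / rw for v in w]  # equalize layer RMS so leaf_weight is an amplitude balance
    l = [v / rl for v in l]
    mix = [(1 - leaf_weight) * a + leaf_weight * b for a, b in zip(w, l)]
    peak = max(abs(v) for v in mix)
    return [v / peak for v in mix]  # peak-normalize to [-1, 1]
```

Sweeping `leaf_weight` over a few values gives different wind/leaf blends, loosely analogous to the five masker variations; the actual combinations and parameters used in the thesis are not stated in the abstract.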

Relevance: 30.00%

Abstract:

This research focuses on the audiovisual dimension of architecture as an intersensory approach to the apprehension and conception of space. By highlighting the complexity of the relationship between people and their environment, it argues for the need to develop new methodologies and tools that take this complexity into account and support the design project. The research is driven by the conviction that the rapid and profound changes that characterize our times in every sphere (social, economic, political…) inevitably bring new ways of knowing and experiencing space, and therefore new lines of research. The growing value placed on subjective and sensory aspects in all fields of knowledge, the development of technologies that have completely changed our relationships with one another and with our surroundings, new capabilities for analyzing, recording, preserving, and manipulating data, and, last but not least, the democratic and global availability of knowledge through the Internet all demand a different approach to making, conceiving, and living architecture. The research carries out a critical analysis of the state of the art, building new networks of relationships between disciplines that make it possible to propose the audiovisual dimension as a new line of research within architecture, and highlighting the need for transversal, interdisciplinary analysis. Particular attention is paid to the evolution of sound and its qualitative approach to architecture, showing how sound, with its capacity to introduce time and dynamic aspects (movement, the presence of the body…), is not simply another sensory channel in the apprehension of space: its interaction with the visual generates an inseparable space-time that is specific to each moment and place.
Building on this position, a methodological review was carried out with the aim of using the walk (route) as an analytical tool for studying the relationship between space, action, and audiovisual perception. To do so, data on the morphology of the space are cross-referenced with data on individual perceptual experience and on the collective uses of the space. Finally, video is used not only as a means of representing reality but also as an instrument of analysis that makes it possible to collect data (audio and video recordings, observations…), isolate, study, classify, and order them, and finally restore them through editing. A first in situ experiment was carried out to explore the application of the method, raising new questions and opening lines of analysis for further research.

Relevance: 30.00%

Abstract:

Video quality assessment remains necessary for defining the criteria that characterize a signal meeting the viewing requirements imposed by the user. New technologies, such as stereoscopic 3D video and formats beyond high definition, impose new criteria that must be analyzed to maximize user satisfaction. The problems identified during this doctoral thesis include phenomena that affect different stages of the audiovisual production chain and a variety of content types. First, content generation must be controlled through parameters that prevent visual discomfort and, consequently, visual fatigue, especially for stereoscopic 3D content, both animated and live-action. Second, quality assessment of the video compression stage relies on metrics that are sometimes not adapted to user perception. Psychovisual models and visual attention maps would make it possible to weight image regions so that more importance is given to the pixels the user is most likely to focus on. These two areas are linked through the notion of saliency: the capacity of the visual system to characterize a displayed image by weighting the areas most attractive to the human eye. In stereoscopic content generation, saliency refers mainly to the depth simulated through the optical illusion, measured as the distance from the virtual object to the human eye. In two-dimensional video, however, saliency is based not on depth but on other elements, such as motion, level of detail, pixel position, or the presence of faces, which are the basic factors of the visual attention model developed here.
To identify the features of a stereoscopic video sequence most likely to cause visual discomfort, the extensive literature on the subject was reviewed and preliminary subjective tests were conducted with users. These led to the conclusion that discomfort occurred when there was an abrupt change in the distribution of simulated depths in the image, in addition to other degradations such as so-called window violation. Further subjective tests, focused on these effects under different depth distributions, sought to pin down the parameters that characterize such images. The results show that abrupt changes occur in scenes with motion and large negative disparities, which interfere with the accommodation and vergence processes of the human eye and increase the time the crystalline lens needs to focus. To improve quality metrics through models adapted to the human visual system, subjective tests were also conducted to determine how much each factor contributes to masking a given degradation. The results show a slight improvement when weighting and visual attention masks are applied, bringing objective quality scores closer to the response of the human eye.
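The idea of weighting pixels by visual attention can be sketched as a saliency-weighted PSNR. This is a minimal illustration, assuming the attention map is already available as a 2D array of non-negative weights; the thesis's actual model (motion, detail, position, faces) is not reproduced, and `center_bias` is a hypothetical position-only map included only to show one of the listed factors. With uniform weights the metric reduces to ordinary PSNR.

```python
import math


def weighted_mse(ref, test, weights):
    """MSE in which each pixel's squared error is scaled by its saliency weight."""
    num = den = 0.0
    for r_row, t_row, w_row in zip(ref, test, weights):
        for r, t, w in zip(r_row, t_row, w_row):
            num += w * (r - t) ** 2
            den += w
    return num / den


def weighted_psnr(ref, test, weights, peak=255.0):
    """PSNR computed from the saliency-weighted MSE; uniform weights give plain PSNR."""
    mse = weighted_mse(ref, test, weights)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def center_bias(h, w, sigma=0.3):
    """Hypothetical position-only attention map: a Gaussian centred in the frame."""
    cy, cx = (h - 1) / 2, (w - 1) / 2
    return [[math.exp(-(((y - cy) / (sigma * h)) ** 2 + ((x - cx) / (sigma * w)) ** 2) / 2)
             for x in range(w)] for y in range(h)]
```

A distortion confined to a low-weight region (for example, a frame corner under `center_bias`) lowers the weighted score less than the unweighted one, which is the behaviour such a weighting is meant to capture.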

Relevance: 30.00%

Abstract:

Research in speech recognition and synthesis over the past several decades has brought speech technology to a point where it is being used in "real-world" applications. However, despite the progress, the perception remains that the current technology is not flexible enough to allow easy voice communication with machines. The focus of speech research is now on producing systems that are accurate and robust but that do not impose unnecessary constraints on the user. This chapter takes a critical look at the shortcomings of the current speech recognition and synthesis algorithms, discusses the technical challenges facing research, and examines the new directions that research in speech recognition and synthesis must take in order to form the basis of new solutions suitable for supporting a wide range of applications.

Relevance: 30.00%

Abstract:

The speech characteristics, oromotor function, and speech intelligibility of a group of children treated for cerebellar tumour (CT) were investigated perceptually. These areas were assessed in 11 children with dysarthric speech following treatment for CT and in 21 non-neurologically impaired controls matched for age and sex, to obtain a comprehensive perceptual profile of their speech and oromotor mechanism. Several deviant speech dimensions contributed to the perception of dysarthria, including imprecise consonants, hoarseness, and decreased pitch variation, together with reduced overall speech intelligibility for both sentences and connected speech. Oromotor assessment revealed deficits in lip, tongue, and laryngeal function, particularly in the timing and coordination of movements. The most salient features of the dysarthria seen in children treated for CT were the mild nature of the speech disorder and the clustering of deficits in the prosodic, phonatory, and articulatory aspects of speech production.

Relevance: 30.00%

Abstract:

The hallucinogenic serotonin 1A/2A agonist psilocybin is known for its ability to induce illusions of motion in otherwise stationary objects or textured surfaces. This study investigated the effect of psilocybin on local and global motion processing in nine human volunteers. Using a forced-choice direction-of-motion discrimination task, we show that psilocybin selectively impairs coherence sensitivity for random dot patterns, likely mediated by high-level global motion detectors, but not contrast sensitivity for drifting gratings, believed to be mediated by low-level detectors. These results are in line with those observed in schizophrenic populations and are discussed with respect to the proposition that psilocybin may provide a model for investigating clinical psychosis and the pharmacological underpinnings of visual perception in normal populations.
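"Coherence" in a random dot pattern usually means the fraction of dots that move together in the signal direction while the rest move randomly; lowering it until direction judgments fail gives the coherence threshold. The sketch below shows one frame update of such a display. It is an illustration under assumptions (unit-square field, wrap-around edges, signal dots re-drawn every frame, noise dots moving in random directions) and is not the stimulus code from the study.

```python
import math
import random


def make_dots(n, rng):
    """Scatter n dots uniformly over the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def step_dots(dots, coherence, direction_deg, speed, rng):
    """Advance a random-dot kinematogram by one frame.

    A fraction `coherence` of dots (re-chosen every frame) moves `speed` units in the
    signal direction; the remaining noise dots each move in a random direction.
    """
    n_signal = round(coherence * len(dots))
    signal_ids = set(rng.sample(range(len(dots)), n_signal))
    theta_signal = math.radians(direction_deg)
    moved = []
    for i, (x, y) in enumerate(dots):
        theta = theta_signal if i in signal_ids else rng.uniform(0, 2 * math.pi)
        # wrap around the edges so dot density stays uniform
        moved.append(((x + speed * math.cos(theta)) % 1.0,
                      (y + speed * math.sin(theta)) % 1.0))
    return moved
```

Because the signal is carried only by the population of dots, not by any single dot, judging its direction requires pooling motion across the field, which is why such tasks are taken to probe global rather than local motion detectors.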

Relevance: 30.00%

Abstract:

Drawing on ethnographic, empirical, and historical/cultural perspectives, we examine the extent to which visual aspects of music contribute to the communication that takes place between performers and their listeners. First, we introduce a framework for understanding how media and genres shape aural and visual experiences of music. Second, we present case studies of two performances and describe the relation between visual and aural aspects of performance. Third, we report empirical evidence that visual aspects of performance reliably influence perceptions of musical structure (pitch-related features) and affective interpretations of music. Finally, we trace new and old media trajectories of aural and visual dimensions of music, and highlight how our conceptions, perceptions, and appreciation of music are intertwined with technological innovation and media deployment strategies.

Relevance: 30.00%

Abstract:

Previous investigations using electropalatography (EPG) have identified articulatory timing deficits in individuals with acquired dysarthria. However, this technology has yet to be applied to the articulatory timing disturbance present in Parkinson's disease (PD). The current investigation therefore used EPG to comprehensively examine the temporal aspects of articulation in a group of nine individuals with PD at sentence, word, and segment level. Following on from a prior study (McAuliffe, Ward and Murdoch), it compared the results of the participants with PD with those of aged (n=7) and young (n=8) control groups to determine whether ageing contributed to any articulatory timing deficits observed. Participants read aloud the phrase "I saw a ___ today" with the EPG palate in situ. Target words began with the consonants /l/, /s/, and /t/ in both the /i/ and /a/ vowel environments. Speech rate was investigated perceptually, and sentence, word, and segment durations were measured objectively. Segment durations included the total segment length and the durations of the approach, closure/constriction, and release phases of EPG consonant production. Perceptually, the group with PD showed impaired speech rate; however, this was not confirmed objectively. In general, the EPG segment durations of the group with PD were consistent with those of the control groups. Only one significant difference was found: the group with PD showed a significantly longer release phase for /la/ than both control groups. It is therefore possible that EPG failed to detect impaired lingual movement, as it does not measure the complete movement of the tongue towards and away from the hard palate.
Furthermore, the contribution of individual variation to the present findings should not be overlooked.
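The approach, closure/constriction, and release phases can be illustrated with a simple rule applied to the number of contacted electrodes in each EPG frame. This is a sketch under explicit assumptions: a 100 Hz frame rate (10 ms per frame), a segment spanning every frame with any contact, and closure defined as the span from the first to the last frame at maximum contact. The study's actual segmentation criteria are not given in the abstract, and `epg_phases` is a hypothetical helper.

```python
FRAME_MS = 10.0  # assumed 100 Hz EPG sampling rate


def epg_phases(contacts, frame_ms=FRAME_MS):
    """Split one consonant's per-frame contact counts into phase durations (ms).

    Hypothetical criteria: the segment spans frames with any contact; closure runs
    from the first to the last frame at the maximum contact count; approach precedes
    closure and release follows it.
    """
    active = [i for i, c in enumerate(contacts) if c > 0]
    if not active:
        raise ValueError("no lingual-palatal contact in this token")
    onset, offset = active[0], active[-1]
    peak = max(contacts)
    peak_frames = [i for i in range(onset, offset + 1) if contacts[i] == peak]
    closure_start, closure_end = peak_frames[0], peak_frames[-1]
    return {
        "total": (offset - onset + 1) * frame_ms,
        "approach": (closure_start - onset) * frame_ms,
        "closure": (closure_end - closure_start + 1) * frame_ms,
        "release": (offset - closure_end) * frame_ms,
    }
```

For example, `epg_phases([0, 2, 10, 30, 62, 62, 62, 40, 12, 0])` splits an 80 ms segment into 30 ms approach, 30 ms closure, and 20 ms release. Because only tongue-palate contact is registered, movement before first contact or after last contact is invisible to such a measure, which is the limitation the authors raise.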

Relevance: 30.00%

Abstract:

Acuity for elbow joint position sense (JPS) is reduced when head position is modified. Movement of the head is associated with biomechanical changes in the neck and shoulder musculoskeletal system, which may explain changes in elbow JPS. The present study aimed to determine whether elbow JPS is also influenced by illusory changes in head position. Simultaneous vibration of sternocleidomastoid (SCM) and the contralateral splenius was applied to 14 healthy adult human subjects. Muscle vibration or passive head rotation was introduced between presentation and reproduction of a target elbow position. Ten out of 14 subjects reported illusions consistent with lengthening of the vibrated muscles. In these 10 subjects, absolute error for elbow JPS increased with left SCM/right splenius vibration but not with right SCM/left splenius vibration. Absolute error also increased with right rotation, with a trend for increased error with left rotation. These results demonstrated that both actual and illusory changes in head position are associated with diminished acuity for elbow JPS, suggesting that the influence of head position on upper limb JPS depends, at least partially, on perceived head position.
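Absolute error in a joint-repositioning task is the mean unsigned difference between the reproduced and target angles. The sketch below computes it alongside the signed (constant) error, a commonly reported companion measure that is added here for contrast and is not mentioned in the abstract; the trial values are hypothetical.

```python
from statistics import mean


def jps_errors(targets_deg, reproduced_deg):
    """Return (absolute error, constant error) in degrees over repositioning trials.

    Absolute error ignores direction; constant error keeps the sign of
    reproduced minus target.
    """
    diffs = [r - t for t, r in zip(targets_deg, reproduced_deg)]
    return mean(abs(d) for d in diffs), mean(diffs)
```

With a 90 degree target and reproductions of 94, 87, and 92 degrees, absolute error is 3 degrees while constant error is only 1 degree, which shows why a reduction in acuity can appear in absolute error even when errors scatter in both directions.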

Relevance: 30.00%

Abstract:

Although reading ability has been related to the processing of simple pitch features such as isolated transitions or continuous modulation, spoken language also contains complex patterns of pitch changes that are important for establishing stress location and for segmenting the speech stream. These aspects of spoken language processing depend critically on the pitch pattern (global structure) rather than on absolute pitch values (local structure). Here we show that the detection of global structure, but not local structure, predicts performance on measures of phonological skill and reading ability, supporting the critical importance of pitch contour processing in the acquisition of literacy.
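The distinction between global and local pitch structure can be made concrete by reducing a pitch sequence to its contour of rises, falls, and level steps. This is a simplified illustration, not the stimuli or analysis of the study: intervals are measured in semitones so that a transposed sequence keeps the same contour, and the 0.5-semitone tolerance for "level" steps is an assumed value.

```python
import math


def contour(pitches_hz, tolerance_st=0.5):
    """Global structure: +1 rise, -1 fall, 0 level for each successive pitch step.

    Steps are measured in semitones, so a transposed sequence keeps the same contour.
    """
    steps = []
    for a, b in zip(pitches_hz, pitches_hz[1:]):
        st = 12 * math.log2(b / a)  # interval in semitones
        steps.append(0 if abs(st) <= tolerance_st else (1 if st > 0 else -1))
    return steps
```

For example, `contour([200, 250, 220, 220])` and `contour([150, 190, 160, 160])` both return `[1, -1, 0]`: the two sequences differ in every absolute value (local structure) but share the same pattern of pitch changes (global structure).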