948 results for Time code (Audio-visual technology)


Relevance:

100.00%

Publisher:

Abstract:

Contains information on Project III of the CCCT work programme: the preparation and exchange of audio-visual material for education on the benefits, limitations and dangers of developments in science and technology.

Relevance:

100.00%

Publisher:

Abstract:

Acoustically, car cabins are extremely noisy and, as a consequence, audio-only in-car voice recognition systems perform poorly. Because the visual modality is immune to acoustic noise, using visual lip information from the driver is seen as a viable strategy for circumventing this problem through audio-visual automatic speech recognition (AVASR). However, implementing AVASR requires a system able to accurately locate and track the driver's face and lip area in real time. In this paper we present such an approach using the Viola-Jones algorithm. Using the AVICAR [1] in-car database, we show that the Viola-Jones approach is a suitable method for locating and tracking the driver's lips for an audio-visual speech recognition system, despite visual variability in illumination and head pose.
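As a rough illustration of the detection step described above, the sketch below uses OpenCV's Haar-cascade (Viola-Jones) face detector and a simple geometric heuristic to crop a lip region. The cascade file and the lower-third heuristic are assumptions for illustration only, not the authors' configuration.

```python
# Minimal sketch (not the paper's implementation): locating a driver's face
# and a rough lip region with OpenCV's Viola-Jones (Haar cascade) detector.
import cv2

# Standard cascade shipped with OpenCV; an assumption, not the AVICAR setup.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_lip_region(frame_bgr):
    """Return (x, y, w, h) of a rough lip region, or None if no face is found."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    # Take the largest detection as the driver's face.
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    # Heuristic: the lips occupy roughly the lower third of the face box.
    return (x + w // 4, y + 2 * h // 3, w // 2, h // 3)
```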

Relevance:

100.00%

Publisher:

Abstract:

Interacting with in-vehicle technology through a voice interface can greatly reduce the effects of driver distraction. Most current approaches to this problem use only the audio signal, making them susceptible to acoustic noise. An obvious way to circumvent this is to use the visual modality as well. However, capturing, storing and distributing audio-visual data in a vehicle environment is very costly and difficult. One dataset currently available for such research is the AVICAR [1] database. Unfortunately, this database has been largely unusable due to a timing mismatch between the two streams; in addition, no evaluation protocol was available. We have overcome this problem by re-synchronising the streams on the phone-number portion of the dataset and have established a protocol for further research. This paper presents the first audio-visual results on this dataset for speaker-independent speech recognition. We hope this will serve as a catalyst for future research in this area.
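The abstract does not detail how the streams were re-synchronised, so the sketch below only illustrates one generic way to estimate a constant audio-video offset: cross-correlating an audio-energy envelope with a visual-activity signal sampled at the same rate. All function and variable names are hypothetical.

```python
# Hypothetical sketch: estimating a constant audio/video offset by
# cross-correlating an audio-energy envelope with a visual-activity signal,
# both resampled to the same rate. Not the AVICAR re-synchronisation recipe.
import numpy as np

def estimate_offset(audio_env: np.ndarray, visual_act: np.ndarray,
                    rate_hz: float) -> float:
    """Return the relative lag in seconds; positive means audio lags video."""
    a = (audio_env - audio_env.mean()) / (audio_env.std() + 1e-9)
    v = (visual_act - visual_act.mean()) / (visual_act.std() + 1e-9)
    corr = np.correlate(a, v, mode="full")   # correlation at all relative lags
    lag = np.argmax(corr) - (len(v) - 1)     # lag in samples
    return lag / rate_hz
```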

Relevance:

100.00%

Publisher:

Abstract:

In recent years, many of the world's leading media producers, screenwriters, technicians and investors, particularly those in the Asia-Pacific region, have been drawn to work in the People's Republic of China (hereafter China or Mainland China). Media projects with a lighter commercial entertainment feel, compared with the heavily propaganda-oriented content of the past, have multiplied, thanks to the Chinese state's newfound willingness to consider collaboration with foreign partners. Nowhere is this more evident than in film. Despite their long-standing reputation for rigorous censorship, state policymakers are now encouraging Chinese media entrepreneurs to generate fresh ideas and to develop products that will revitalise the stagnant domestic production sector. It is hoped that an increase in both the quality and quantity of domestic feature films, stimulated by an infusion of creativity and cutting-edge technology from outside the country, will help reverse China's 'cultural trade deficit' (wenhua maoyi chizi) (Keane 2007).

Relevance:

100.00%

Publisher:

Abstract:

This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream-integration method for audio-visual speech recognition in environments where the audio or video streams may be subject to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurement of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements, and can be used alongside, many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances, with corruption added to the video and/or audio streams using a variety of noise types (e.g., MPEG-4 video compression) and levels. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream-weighting approach, and also compared to any fixed-weight integration approach, both in clean conditions and when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams, and according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
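As a simplified illustration of frame-level stream weighting, the sketch below combines per-class audio and video log-likelihoods with a weight chosen, per frame, to maximise the best class posterior. This is one plausible reading of the weighted-stream-posterior idea, not the paper's exact MWSP formulation.

```python
# Simplified sketch of frame-level weighted stream integration for AV-ASR.
# For each frame, the weight that maximises the best class posterior is kept.
import numpy as np

def fuse_frame(log_p_audio: np.ndarray, log_p_video: np.ndarray,
               weights=np.linspace(0.0, 1.0, 11)) -> np.ndarray:
    """log_p_audio / log_p_video: per-class log-likelihoods for one frame."""
    best_scores, best_post = None, -np.inf
    for lam in weights:
        scores = lam * log_p_audio + (1.0 - lam) * log_p_video
        post = np.exp(scores - np.logaddexp.reduce(scores))  # normalised posterior
        if post.max() > best_post:
            best_post, best_scores = post.max(), scores
    return best_scores
```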

Relevance:

100.00%

Publisher:

Abstract:

Utilising advanced technologies such as virtual environments (VEs) is important to training and education, and the need to develop and effectively apply interactive, immersive 3D VEs continues to grow. As with any emerging technology, user acceptance of new software and hardware devices is often difficult to measure, and guidelines for introducing such technologies and ensuring their adequate and correct use are lacking. It is therefore imperative to obtain a solid understanding of the elements that play a role in effective learning through VEs; in particular, 3D VEs may present unusual and varied interaction and adoption considerations. The major contribution of this study is to investigate a complex set of interrelated factors in the relatively new sphere of VEs for training and education. Although many of these factors appear important in past research, researchers have not explicitly studied a comprehensive set of interdependent, empirically validated factors in order to understand how VEs aid complex procedural knowledge and motor skill learning. By integrating theory from research on training, human-computer interaction (HCI), ergonomics and cognitive psychology, this research proposes and validates a model that contributes to application-specific VE efficacy formation. The findings show that visual feedback has a significant effect on performance, whereas no significant effects were found for tactile/force feedback or auditory feedback. User control proved salient for both satisfaction and performance, and other factors, such as interactivity, system comfort and level of task difficulty, also affected performance.

Relevance:

100.00%

Publisher:

Abstract:

Primate multisensory object perception involves distributed brain regions. To investigate the network character of these regions of the human brain, we applied data-driven group spatial independent component analysis (ICA) to a functional magnetic resonance imaging (fMRI) data set acquired during a passive audio-visual (AV) experiment with common object stimuli. We labeled three group-level independent component (IC) maps as auditory (A), visual (V), and AV, based on their spatial layouts and activation time courses. The overlap between these IC maps served as definition of a distributed network of multisensory candidate regions including superior temporal, ventral occipito-temporal, posterior parietal and prefrontal regions. During an independent second fMRI experiment, we explicitly tested their involvement in AV integration. Activations in nine out of these twelve regions met the max-criterion (A < AV > V) for multisensory integration. Comparison of this approach with a general linear model-based region-of-interest definition revealed its complementary value for multisensory neuroimaging. In conclusion, we estimated functional networks of uni- and multisensory functional connectivity from one dataset and validated their functional roles in an independent dataset. These findings demonstrate the particular value of ICA for multisensory neuroimaging research and using independent datasets to test hypotheses generated from a data-driven analysis.
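The max-criterion mentioned above (A < AV > V) simply requires a region's audio-visual response to exceed both unimodal responses. Below is a minimal, hypothetical check using invented beta estimates, not data from the study.

```python
# Minimal illustration of the max-criterion for multisensory integration:
# the audio-visual (AV) response must exceed both unimodal responses.
def meets_max_criterion(beta_a: float, beta_v: float, beta_av: float) -> bool:
    """True if AV > max(A, V), i.e. A < AV > V."""
    return beta_av > max(beta_a, beta_v)

# Hypothetical (A, V, AV) beta estimates per region, purely for illustration.
regions = {"superior_temporal": (0.8, 0.9, 1.4), "occipital": (0.1, 1.2, 1.1)}
integrating = [name for name, betas in regions.items()
               if meets_max_criterion(*betas)]
```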

Relevance:

100.00%

Publisher:

Abstract:

This thesis explores the role of multimodality in language learners' comprehension and, more specifically, the effects on students' audio-visual comprehension when different orchestrations of modes appear in the vodcasts they watch. Firstly, I describe the state of the art in its three main areas of concern, namely the evolution of meaning-making, Information and Communication Technology (ICT), and audio-visual comprehension. One of the most important contributions of the theoretical overview is the proposed integrative model of audio-visual comprehension, which attempts to explain how students process information received from different inputs. Secondly, I present a study based on the following research questions: 'Which modes are orchestrated throughout the vodcasts?', 'Are there any multimodal ensembles that are more beneficial for students' audio-visual comprehension?', and 'What are the students' attitudes towards audio-visual (e.g., vodcast) comprehension activities compared to traditional audio (e.g., audio track) activities?'. Along with these research questions, I formulated two hypotheses: audio-visual comprehension improves when a greater number of modes is orchestrated, and students have a more positive attitude towards vodcasts than towards traditional audio when carrying out comprehension activities. The study includes a multimodal discourse analysis, audio-visual comprehension tests, and student questionnaires. The multimodal discourse analysis of two British Council language-learning vodcasts, entitled English is GREAT and Camden Fashion, using ELAN as the multimodal annotation tool, shows that they contain a variety of multimodal ensembles of two, three and four modes. The audio-visual comprehension tests were given to 40 Spanish students of English as a foreign language after they had watched the vodcasts; the tests contain questions related to specific orchestrations of modes appearing in the vodcasts. The statistical analysis of the test results, using repeated-measures ANOVA, reveals that students obtain better audio-visual comprehension results when the multimodal ensembles are constituted by a greater number of orchestrated modes. Finally, the data compiled from the questionnaires show that students have a more positive attitude towards vodcasts than towards traditional audio listening activities. The results from the audio-visual comprehension tests and questionnaires support the two hypotheses of this study.
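As an illustration of the analysis step, a repeated-measures ANOVA of comprehension scores across ensemble conditions can be run with statsmodels' AnovaRM; the column names and scores below are invented for the sketch, not the study's data.

```python
# Hypothetical sketch of a repeated-measures ANOVA comparing comprehension
# scores across multimodal-ensemble conditions (illustrative data only).
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.DataFrame({
    "student":  [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "ensemble": ["2_modes", "3_modes", "4_modes"] * 3,
    "score":    [5.0, 6.5, 7.0, 4.5, 6.0, 7.5, 5.5, 6.0, 8.0],
})

# Each student contributes one score per ensemble condition (balanced design).
result = AnovaRM(df, depvar="score", subject="student", within=["ensemble"]).fit()
print(result)
```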

Relevance:

100.00%

Publisher:

Abstract:

In Cuenca there is no journalistic research and audiovisual production project that investigates, compiles and presents information about the traditional professions passed down over time, which are gradually being lost and heading towards complete extinction. This project can, in a sense, be considered innovative, since it involves two areas: audiovisual communication and journalistic writing. They are brought together in order to present relevant information, through a final product that is both visual and written, showing how these professions are practised by different people, in their contexts and through their processes, with the intention of serving as cultural research support at the local and national level.

Relevance:

100.00%

Publisher:

Abstract:

Dirk de Bruyn has been creating film works for over 35 years, mostly in the hand-made, 'direct animation' mode. He also performs live with multiple projections of his films in a highly embodied mode of expanded cinema performance. His work is renowned for its intricate, suggestive layering of sound and image, and its use of sumptuous, blooming fields of colour. An active participant in our PyR16, Dirk will discuss his conceptual work, his meticulous creative process, and his particular relationship with materials, light, space and time, both on film and on stage, illustrating with some examples. We'll conclude with a Q&A session with the attendees.