956 results for Audio-visual archives





Visual information in the form of the speaker's lip movements has been shown to improve the performance of speech recognition and search applications. In our previous work, we proposed cross-database training of synchronous hidden Markov models (SHMMs) to make use of large, publicly available external audio databases in addition to the relatively small given audio-visual database. In this work, the cross-database training approach is improved by performing an additional audio adaptation step, which enables audio-visual SHMMs to benefit from the audio observations of the external audio models before the visual modality is added to them. The proposed approach outperforms the baseline cross-database training approach in clean and noisy environments in terms of phone recognition accuracy as well as spoken term detection (STD) accuracy.





Speech recognition can be improved by using visual information in the form of the speaker's lip movements in addition to audio information. To date, state-of-the-art techniques for audio-visual speech recognition continue to use audio and visual data from the same database for training their models. In this paper, we present a new approach that makes use of one modality of an external dataset in addition to a given audio-visual dataset. In doing so, it is possible to create more powerful models from other extensive audio-only databases and adapt them to our comparatively smaller multi-stream databases. Results show that the presented approach outperforms the widely adopted synchronous hidden Markov models (HMMs) trained jointly on audio and visual data of a given audio-visual database for phone recognition by 29% relative. It also outperforms the external audio models trained on extensive external audio datasets and the internal audio models by 5.5% and 46% relative, respectively. We also show that the proposed approach is beneficial in noisy environments where the audio source is affected by environmental noise.





In recent years, many of the world’s leading media producers, screenwriters, technicians and investors, particularly those in the Asia-Pacific region, have been drawn to work in the People's Republic of China (hereafter China or Mainland China). Media projects with a lighter commercial entertainment feel – compared with the heavy propaganda-oriented content of the past – have multiplied, thanks to the Chinese state’s newfound willingness to consider collaboration with foreign partners. Nowhere is this more evident than in film. Despite their long-standing reputation for rigorous censorship, state policymakers are now encouraging Chinese media entrepreneurs to generate fresh ideas and to develop products that will revitalise the stagnant domestic production sector. It is hoped that an increase in both the quality and quantity of domestic feature films, stimulated by an infusion of creativity and cutting-edge technology from outside the country, will help reverse China’s ‘cultural trade deficit’ (wenhua maoyi chizi) (Keane 2007).





The Audio/Visual Emotion Challenge and Workshop (AVEC 2011) is the first competition event aimed at comparing multimedia processing and machine learning methods for automatic audio, visual and audiovisual emotion analysis, with all participants competing under strictly the same conditions. This paper first describes the challenge participation conditions. It then describes the data used, the SEMAINE corpus, and its partitioning into training, development, and test partitions for the challenge, with labelling in four dimensions, namely activity, expectation, power, and valence. Finally, audio and video baseline features are introduced, together with baseline results that use these features for the three sub-challenges of audio, video, and audiovisual emotion recognition.





This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements, and can be used alongside, many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances, with corruption added to the video stream, the audio stream, or both, using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison with another well-known dynamic stream weighting approach, and also in comparison with any fixed-weight integration approach, both in clean conditions and when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities, even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
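The abstract does not give the MWSP formula, so the following is only a minimal sketch of one plausible reading of frame-by-frame weight selection, not the authors' implementation. It assumes log-linear fusion of per-class audio and video log-likelihoods with weights that sum to one, a fixed grid of candidate weights, and selection of the weight whose fused class posterior has the highest peak. The function name `mwsp_frame`, the grid, and the toy scores are all hypothetical.

```python
import numpy as np


def log_softmax(x):
    """Numerically stable log-softmax over a 1-D array of scores."""
    shifted = x - np.max(x)
    return shifted - np.log(np.sum(np.exp(shifted)))


def mwsp_frame(audio_loglik, video_loglik, weights=None):
    """Choose an audio stream weight for one frame (assumed reading of MWSP).

    For each candidate audio weight w, the streams are fused log-linearly as
    w * audio + (1 - w) * video, normalised into a class posterior, and the
    weight whose posterior has the largest maximum is kept.
    Returns (chosen audio weight, fused posterior for that weight).
    """
    if weights is None:
        weights = np.linspace(0.0, 1.0, 11)  # hypothetical candidate grid
    best_w, best_post = None, None
    for w in weights:
        fused = w * audio_loglik + (1.0 - w) * video_loglik
        post = np.exp(log_softmax(fused))
        if best_post is None or post.max() > best_post.max():
            best_w, best_post = float(w), post
    return best_w, best_post


# Toy frame: the audio stream is confident about class 0, the video stream
# is uninformative (flat scores), so the audio stream should get the weight.
audio = np.array([-1.0, -5.0, -6.0])
video = np.array([-3.0, -3.0, -3.0])
w_audio, posterior = mwsp_frame(audio, video)
print(w_audio, int(np.argmax(posterior)))
```

Under these assumptions no noise estimate is needed: an uninformative stream flattens the fused posterior, so the search moves weight toward whichever stream is currently more discriminative. Swapping the two toy arrays moves the chosen audio weight to zero.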





Direct experience of social work in another country is making an increasingly important contribution to internationalising the social work academic curriculum and to the cultural competency of students. However, at present this opportunity is still restricted to a limited number of students. The aim of this paper is to describe and reflect on the production of an audio-visual presentation representing the experience of three students who participated in an exchange with a social work programme in Pune, India. It describes and assesses the rationale, production and use of video to capture student learning from the Belfast/Pune exchange. We also describe the use of the video in a classroom setting with a year group of 53 students from a younger cohort. This exercise aimed to stimulate students’ curiosity about international dimensions of social work and add to their awareness of poverty, social justice, cultural competence and community social work as global issues. Written classroom feedback informs our discussion of the technical as well as the pedagogical benefits and challenges of this approach. We conclude that audio-visual presentation offers some benefit in helping students connect with diverse cultural contexts, but that a complementary discussion challenging stereotyped viewpoints and unconscious professional imperialism is also crucial.





This book is the product of years of cooperation among research teams from five different countries, all of them Key Institutions of the Childwatch International network, within the framework of a multinational project on adolescents and media





The video was produced by lecturers from the areas of Didactics and School Organisation and Theory and History of Education during 2000 and 2001. It gathers the opinions of professionals, parents, people with physical disabilities (deaf people, blind people and people with cerebral palsy (P.C.I.)) and people with mental disabilities on different aspects of daily life: the home, entry into employment, etc. This teaching resource is designed for viewing, theoretical and practical interpretation, and comparison of opinions in the higher-education classroom, as part of the training of professionals who will work with people with disabilities.





The role of the language in the Bachillerato is threefold. First, it is a factor of socio-economic advancement that in some cases brings better pay and in others gives access to posts closed to those who do not know languages. Second, UNESCO recommends its study for its educational function with respect to the human being as a member of different national groups: it enriches critical sense and tolerance through an appreciation of the differences and similarities among peoples, a humanist culture that the study of the French language should foster. This applies all the more to us, given that France is a country bordering ours and opens the way to Europe; it is natural that French should be so important to us because of the commercial, economic and other relations conducted in that language. Third, and most fundamentally, learning at least one language is essential to the formation of personality. Since 1975 there have been important advances in language study, above all in efforts at didactic renewal, most notably the contributions of the structuro-global audio-visual methodology, which emerged in the 1950s and is being constantly renewed. A student who is to learn French at a distance needs suitable material in the form of cassettes with dialogues for learning correct pronunciation. Reading and writing are learned afterwards, because correct pronunciation is by then assumed, and transcribing the spoken language is an exercise that consolidates what has been learned. Learning a language, however, requires setting aside a specific amount of time every day; it is this regularity that makes learning possible. In each case, therefore, the student should follow the more precise and personal guidance of their teacher-tutor and their own working habits, provided these prove effective.