The Audio/Visual Emotion Challenge and Workshop (AVEC 2011) is the first competition event aimed at comparing multimedia processing and machine learning methods for automatic audio, visual and audiovisual emotion analysis, with all participants competing under strictly the same conditions. This paper first describes the conditions for participating in the challenge. It then describes the data used, the SEMAINE corpus, and its partitioning into training, development, and test sets for the challenge, with labelling in four dimensions, namely activity, expectation, power, and valence. Finally, it introduces audio and video baseline features, together with baseline results that use these features for the three sub-challenges of audio, video, and audiovisual emotion recognition.


Recent debates about media literacy and the internet have begun to acknowledge the importance of active user engagement and interaction. It is no longer enough simply to access material online; users also expect to comment upon it and re-use it. Yet how do these new user expectations fit within digital initiatives which increase access to audio-visual content but which prioritise access, preservation of archives and online research over active user engagement? This article addresses these issues of media literacy in relation to audio-visual content. It considers how they are currently being addressed, focusing particularly on the high-profile European initiative EUscreen. EUscreen brings together 20 European television archives into a single searchable database of over 40,000 digital items. Yet restrictions on creative re-use and copyright issues prevent users from re-working the material they find on the site. Instead of re-use, EUscreen offers access to its collection and detailed contextualisation of the material. But if the emphasis for resources within an online environment no longer rests upon access but upon user engagement, what do EUscreen and similar sites offer to different users?


This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements, and can be used alongside, many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances, with corruption added to the video stream, the audio stream, or both, using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach, and also compared to any fixed-weight integration approach, both in clean conditions and when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities, even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
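
As a rough illustration of frame-by-frame weight selection, the sketch below picks, for a single frame, the audio weight whose combined class posterior is most peaked. This is only one plausible reading of the abstract, not the authors' formulation: the function name `select_frame_weight`, the candidate weight grid, the log-linear combination of stream log-likelihoods, and the use of the highest class posterior as the selection score are all assumptions made for illustration.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def select_frame_weight(audio_loglik, video_loglik, candidate_weights):
    """Hypothetical per-frame weight selection (illustrative assumption).

    audio_loglik, video_loglik: per-class log-likelihoods for ONE frame.
    candidate_weights: audio weights in [0, 1]; the video weight is 1 - w.
    Returns (best_weight, best_class_index, best_posterior).
    """
    best = None
    for w in candidate_weights:
        # Log-linear combination of the two streams (assumed form).
        combined = [w * a + (1.0 - w) * v
                    for a, v in zip(audio_loglik, video_loglik)]
        # Normalise over classes to obtain a posterior with a uniform prior.
        norm = log_sum_exp(combined)
        posteriors = [math.exp(c - norm) for c in combined]
        top = max(range(len(posteriors)), key=posteriors.__getitem__)
        # Keep the weight that gives the largest top-class posterior.
        if best is None or posteriors[top] > best[2]:
            best = (w, top, posteriors[top])
    return best


if __name__ == "__main__":
    weights = [0.0, 0.25, 0.5, 0.75, 1.0]
    # Audio is uninformative (flat), video clearly favours class 0.
    print(select_frame_weight([-1.0, -1.0, -1.0], [0.0, -5.0, -5.0], weights))
```

In this toy frame the flat audio scores carry no information, so the selection falls on the video-only weight; swapping the two inputs moves it to the audio-only weight. No signal measurement such as an SNR estimate is used, only the stream scores themselves, which is the property the abstract highlights.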


This book is the product of years of cooperation among research teams from five different countries, all of them Key Institutions of the Childwatch International network, within the framework of a multinational project on adolescents and media.


The video was produced by lecturers from the areas of Didactics and School Organisation and of Theory and History of Education during 2000 and 2001. It gathers the views of professionals, parents, people with physical disabilities (deaf people, blind people and people with cerebral palsy (P.C.I.)) and people with mental disabilities on different aspects of daily life: home, employment integration, etc. This teaching resource is designed for viewing, theoretical and practical interpretation, and comparison of opinions in the higher-education classroom, in order to support the training of professionals who will work with people with disabilities.


Language study in the Bachillerato serves three functions. First, it is a factor in socio-economic advancement, in some cases bringing salary improvements and in others opening up posts that are closed to those who do not know languages. Second, UNESCO recommends language study for its educational function with respect to the human being as a member of different national groups: it enriches critical sense and tolerance through an appreciation of the differences and similarities among peoples, a humanist culture that the study of French should foster. This matters all the more for us given that France is a neighbouring country and opens the way to Europe, so it is logical that French should be so important to us because of the commercial, economic and other relations conducted in that language. Third, and most fundamentally, learning at least one foreign language is essential to the formation of personality. Since 1975 there have been important advances in language study, above all in efforts at didactic renewal, notably the contributions of the structuro-global audio-visual methodology, which emerged in the 1950s and is constantly being renewed. A student who is to learn French at a distance needs suitable material, in the form of cassettes with dialogues, in order to learn correct pronunciation. Reading and writing come afterwards, because correct pronunciation is by then assumed and transcribing the spoken language is an exercise that consolidates what has been learned. Learning a language, however, requires setting aside a specific amount of time every day; it is this regularity that makes learning possible. In each case, therefore, the student should follow the more specific and personal guidance of the teacher-tutor and his or her own working habits, provided these prove effective.


This paper forms part of the excellence project entitled “University teachers' professional knowledge and competence regarding teaching and learning in advanced information and communication technology environments”. The project sought to support, through field research, the improvement of university academic activity and to disseminate, at an operational level, educational innovation initiatives and uses of advanced technology in teaching-learning processes that would make it possible to build and compile professional teaching knowledge at the university.


This dissertation examines auditory perception and audio-visual reception in noise for both hearing-impaired and normal-hearing persons, with the goal of determining some of the noise conditions under which amplified acoustic cues for speech can be beneficial to hearing-impaired persons.


The encoding of goal-oriented motion events varies across different languages. Speakers of languages without grammatical aspect (e.g., Swedish) tend to mention motion endpoints when describing events (e.g., “two nuns walk to a house”) and attach importance to event endpoints when matching scenes from memory. Speakers of aspect languages (e.g., English), on the other hand, are more prone to direct attention to the ongoingness of motion events, which is reflected both in their event descriptions (e.g., “two nuns are walking”) and in their non-verbal similarity judgements. This study examines to what extent native speakers of Swedish (n = 82) with English as a foreign language (FL) restructure their categorisation of goal-oriented motion as a function of their English proficiency and experience with the English language (e.g., exposure, learning). Seventeen monolingual native English speakers from the United Kingdom (UK) were engaged for comparison purposes. Data on motion event cognition were collected through a memory-based triads matching task, in which a target scene with an intermediate degree of endpoint orientation was matched with two alternative scenes with low and high degrees of endpoint orientation, respectively. Results showed that the preference among the Swedish speakers of L2 English to base their similarity judgements on ongoingness rather than event endpoints was correlated with their use of English in their everyday lives, such that those who often watched television in English approximated the ongoingness preference of the English native speakers. These findings suggest that event cognition patterns may be restructured through exposure to FL audio-visual media. The results thus add to the emerging picture that learning a new language entails learning new ways of observing and reasoning about reality.


Synesthesia entails a special kind of sensory perception, in which stimulation in one sensory modality leads to an internally generated perceptual experience in another, non-stimulated sensory modality. This phenomenon can be viewed as an abnormal multisensory integration process, since the synesthetic percept is aberrantly fused with the stimulated modality. Indeed, recent synesthesia research has focused on multimodal processing even outside of the specific synesthesia-inducing context and has revealed changed multimodal integration, thus suggesting perceptual alterations at a global level. Here, we focused on audio-visual processing in synesthesia using a semantic classification task in combination with visually or auditory-visually presented animate and inanimate objects in an audio-visually congruent or incongruent manner. Fourteen subjects with auditory-visual and/or grapheme-color synesthesia and 14 control subjects participated in the experiment. During presentation of the stimuli, event-related potentials were recorded from 32 electrodes. The analysis of reaction times and error rates revealed no group differences, with best performance for audio-visually congruent stimulation, indicating the well-known multimodal facilitation effect. We found enhanced amplitude of the N1 component over occipital electrode sites for synesthetes compared to controls. The differences occurred irrespective of the experimental condition and therefore suggest a global influence on early sensory processing in synesthetes.