19 results for Audio-visual materials

in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevance: 100.00%

Abstract:

The Audio/Visual Emotion Challenge and Workshop (AVEC 2011) is the first competition event aimed at the comparison of multimedia processing and machine learning methods for automatic audio, visual, and audiovisual emotion analysis, with all participants competing under strictly the same conditions. This paper first describes the challenge participation conditions. It then describes the data used, the SEMAINE corpus, and its partitioning into train, development, and test sets for the challenge, with labelling in four dimensions: activity, expectation, power, and valence. Finally, audio and video baseline features are introduced, along with baseline results that use these features for the three sub-challenges of audio, video, and audiovisual emotion recognition.
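As a rough illustration of the kind of train/development/test partitioning described above, the following hypothetical Python sketch splits a set of recording identifiers three ways; the 60/20/20 fractions, the random assignment, and all names are assumptions, since AVEC 2011 used a fixed, predefined split of the SEMAINE corpus.

```python
import numpy as np

def partition_corpus(recording_ids, frac=(0.6, 0.2, 0.2), seed=0):
    """Randomly split recording identifiers into disjoint train,
    development, and test partitions. The fractions and the random
    assignment are illustrative only; the actual challenge split is
    fixed and published with the corpus."""
    rng = np.random.default_rng(seed)
    ids = rng.permutation(np.asarray(recording_ids))
    n_train = int(frac[0] * len(ids))
    n_dev = int(frac[1] * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_dev], ids[n_train + n_dev:]
```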

Relevance: 100.00%

Abstract:

Recent debates about media literacy and the internet have begun to acknowledge the importance of active user-engagement and interaction. It is not enough simply to access material online; users are also expected to comment upon and re-use it. Yet how do these new user expectations fit within digital initiatives which increase access to audio-visual content but which prioritise access and preservation of archives and online research rather than active user-engagement? This article addresses these issues of media literacy in relation to audio-visual content. It considers how these issues are currently being dealt with, focusing particularly on the high-profile European initiative EUscreen. EUscreen brings together 20 European television archives into a single searchable database of over 40,000 digital items. Yet creative re-use restrictions and copyright issues prevent users from re-working the material they find on the site. Instead of re-use, EUscreen offers access and detailed contextualisation of its collection of material. But if the emphasis for resources within an online environment no longer rests upon access but upon user-engagement, what do EUscreen and similar sites offer to different users?

Relevance: 100.00%

Abstract:

This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments where the audio or video streams may be subject to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements, and can be used alongside, many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances, with corruption added to either or both of the video and audio streams using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison with another well-known dynamic stream-weighting approach and with any fixed-weight integration approach, both in clean conditions and when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and to the naturally fluctuating relative reliability of the modalities, even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
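The abstract does not spell out the weight-selection rule, but a minimal NumPy sketch of one plausible frame-wise realisation of the idea is given below; the function name, the discrete weight grid, and the log-sum-exp renormalisation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mwsp_combine(audio_post, video_post, weight_grid=np.linspace(0.0, 1.0, 11)):
    """Frame-wise combination of audio and video state posteriors.
    For each frame, every candidate stream weight is tried and the one
    yielding the largest renormalised combined posterior is kept.
    audio_post, video_post: arrays of shape (n_frames, n_states)."""
    log_a = np.log(audio_post + 1e-12)
    log_v = np.log(video_post + 1e-12)
    combined = np.empty_like(log_a)
    for t in range(log_a.shape[0]):
        best_score = -np.inf
        for lam in weight_grid:
            cand = lam * log_a[t] + (1.0 - lam) * log_v[t]
            # log-sum-exp renormalisation so different weights are comparable
            m = cand.max()
            cand = cand - (m + np.log(np.exp(cand - m).sum()))
            if cand.max() > best_score:
                best_score = cand.max()
                combined[t] = cand
    return combined  # per-frame log posteriors; argmax over states decodes
```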

Relevance: 100.00%

Abstract:

Direct experience of social work in another country is making an increasingly important contribution to internationalising the social work academic curriculum and to developing the cultural competency of students. However, at present this opportunity is still restricted to a limited number of students. The aim of this paper is to describe and reflect on the production of an audio-visual presentation representing the experience of three students who participated in an exchange with a social work programme in Pune, India. It describes and assesses the rationale, production and use of video to capture student learning from the Belfast/Pune exchange. We also describe the use of the video in a classroom setting with a year group of 53 students from a younger cohort. This exercise aimed to stimulate students' curiosity about international dimensions of social work and to add to their awareness of poverty, social justice, cultural competence and community social work as global issues. Written classroom feedback informs our discussion of the technical as well as the pedagogical benefits and challenges of this approach. We conclude that audio-visual presentation offers some benefit in helping students connect with diverse cultural contexts, but that a complementary discussion challenging stereotyped viewpoints and unconscious professional imperialism is also crucial.

Relevance: 80.00%

Abstract:

In this paper the authors present, for the first time, results showing the effect of out-of-plane speaker head-pose variation on a lip-biometric-based speaker verification system. Using appearance-based DCT features, they adopt a mutual information analysis technique to highlight the class-discriminant DCT components that are most robust to changes in out-of-plane pose. Experiments are conducted using the initial phase of a new multi-view audio-visual database designed for research and development of pose-invariant speech and speaker recognition. They show that verification performance can be improved by substituting higher-order horizontal DCT components for vertical ones, particularly in the case of a train/test pose-angle mismatch.
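A minimal sketch of one plausible reading of this selection procedure, using scikit-learn's mutual information estimator, is given below; the function names, the top_k value, and the intersection heuristic for cross-pose robustness are assumptions rather than the paper's exact method.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def rank_dct_components(dct_features, speaker_labels):
    """Rank appearance DCT components by their mutual information with
    speaker identity. dct_features: (n_samples, n_coeffs)."""
    mi = mutual_info_classif(dct_features, speaker_labels, random_state=0)
    return np.argsort(mi)[::-1]  # most class-discriminant first

def pose_robust_components(features_per_pose, speaker_labels, top_k=40):
    """Keep only the components that rank among the top_k discriminative
    coefficients at every out-of-plane pose angle, i.e. those least
    disturbed by pose change."""
    top_sets = [set(rank_dct_components(f, speaker_labels)[:top_k].tolist())
                for f in features_per_pose]
    return sorted(set.intersection(*top_sets))
```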

Relevance: 80.00%

Abstract:

A software system, recently developed by the authors for the efficient capturing, editing, and delivery of audio-visual web lectures, was used to create a series of lectures for a first-year undergraduate course in Dynamics. These web lectures were developed to serve as an extra study resource for students attending lectures, not as a replacement for them. A questionnaire was produced to obtain feedback from students. The overall response was very favourable, and numerous requests were made for other lecturers to adopt this technology. Despite the students' approval of this added resource, there was no significant improvement in overall examination performance.

Relevance: 80.00%

Abstract:

In this paper we present, for the first time, results showing the effect of speaker head-pose angle on automatic lip-reading performance over a wide range of closely spaced angles. We analyse the effect head pose has upon the features themselves and show that, by selecting coefficients with minimum variance with respect to pose angle, recognition performance can be improved when the train and test pose angles differ. Experiments are conducted using the initial phase of a unique multi-view audio-visual database designed specifically for research and development of pose-invariant lip-reading systems. We first show that it is the higher-order horizontal spatial-frequency components that become most detrimental as the pose deviates. Secondly, we assess the performance of different feature-selection masks across a range of pose angles, including a new mask based on Minimum Cross-Pose Variance coefficients. We report a relative improvement of 50% in Word Error Rate when using our selection mask over a common energy-based selection during profile-view lip-reading.
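A minimal sketch of one plausible reading of the Minimum Cross-Pose Variance selection is given below; the function name, the n_keep value, and the use of per-pose coefficient means are assumptions, not the paper's exact formulation.

```python
import numpy as np

def min_cross_pose_variance_mask(features_per_pose, n_keep=60):
    """Select the coefficients whose mean value varies least across
    pose angles. features_per_pose: one (n_samples, n_coeffs) array
    per closely spaced pose angle."""
    pose_means = np.stack([f.mean(axis=0) for f in features_per_pose])
    cross_pose_var = pose_means.var(axis=0)      # variance w.r.t. pose angle
    keep = np.argsort(cross_pose_var)[:n_keep]   # lowest cross-pose variance
    mask = np.zeros(pose_means.shape[1], dtype=bool)
    mask[keep] = True
    return mask
```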

Relevance: 80.00%

Abstract:

This paper presents a novel method of audio-visual feature-level fusion for person identification where both the speech and facial modalities may be corrupted and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is only a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image per person). A new multimodal feature representation and a modified cosine similarity are introduced to combine and compare bimodal features with limited training data, as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal dataset created from the SPIDRE speaker recognition database and the AR face recognition database, with variable noise corruption of the speech and occlusion in the face images. The system's speaker identification performance on the SPIDRE database and facial identification performance on the AR database are comparable with the literature. Combining both modalities using the new method of multimodal fusion leads to significantly improved accuracy over the unimodal systems, even when both modalities have been corrupted. The new method also shows improved identification accuracy compared with bimodal systems based on multicondition model training or missing-feature decoding alone.
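A minimal sketch of generic feature-level fusion under these constraints is given below; it shows only per-modality normalisation, concatenation, and a plain cosine score, since the paper's modified cosine similarity and optimal feature selection are not reproduced. All names are illustrative.

```python
import numpy as np

def fuse_features(speech_feat, face_feat):
    """Feature-level fusion: L2-normalise each modality separately
    before concatenation, so that differing feature sizes and numeric
    ranges do not let one modality dominate the combined vector."""
    s = speech_feat / (np.linalg.norm(speech_feat) + 1e-12)
    f = face_feat / (np.linalg.norm(face_feat) + 1e-12)
    return np.concatenate([s, f])

def cosine_score(x, y):
    """Plain cosine similarity between two fused feature vectors; the
    paper's modified variant is not reproduced here."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12))
```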

Relevance: 80.00%

Abstract:

The Routledge Guide to Interviewing sets out a well-tested and practical approach and methodology: what works, difficulties and dangers to avoid, and key questions which must be answered before you set out. Background methodological issues and arguments are considered and drawn upon, but the focus is on what is ethical, legally acceptable and productive:
- Rationale (why, what for, where, how)
- Ethics and Legalities (informed consent, data protection, risks, embargoes)
- Resources (organisational, technical, intellectual)
- Preparation (selecting and approaching interviewees, background and biographical research, establishing credentials, identifying topics)
- Technique (developing expertise and confidence)
- Audio-visual interviews
- Analysis (modes, methods, difficulties)
- Storage (archiving and long-term preservation)
- Sharing Resources (dissemination and development)

From death row to the mansion of a head of state, small kitchens and front parlours, to legislatures and presbyteries, Anna Bryson and Seán McConville’s wide interviewing experience has been condensed into this book. The material set out here has been acquired by trial, error and reflection over a period of more than four decades. The interviewees have ranged from the delightfully straightforward to the painfully difficult to the near impossible – with a sprinkling of those that were impossible.
Successful interviewing draws on the survival skills of everyday life. This guide will help you to adapt, develop and apply these innate skills. Including a range of supporting material such as sample waivers, internet resources, hints and checklists, it provides sound and plain-speaking support for the oral historian, social scientist and investigator.