886 results for Audio-visual library service.


Relevance:

100.00%

Publisher:

Abstract:

Interacting with in-vehicle technology through a voice interface can greatly reduce the effects of driver distraction. Most current approaches to this problem use only the audio signal, making them susceptible to acoustic noise. An obvious way to circumvent this is to use the visual modality as well. However, capturing, storing and distributing audio-visual data in a vehicle environment is very costly and difficult. One dataset currently available for such research is the AVICAR [1] database. Unfortunately, this database has been largely unusable due to a timing mismatch between the two streams; in addition, no evaluation protocol is available. We have overcome this problem by re-synchronising the streams on the phone-number portion of the dataset and have established a protocol for further research. This paper presents the first audio-visual results on this dataset for speaker-independent speech recognition. We hope this will serve as a catalyst for future research in this area.
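
The abstract does not describe how the streams were re-synchronised, so the Python sketch below only illustrates one common way to estimate a fixed audio-video offset, by cross-correlating the audio energy envelope with a per-frame lip-motion signal; all function names, parameters and the toy data are hypothetical.

# Illustrative only: estimate the audio-video lag (in video frames) by
# cross-correlation. This is not the procedure used on AVICAR.
import numpy as np

def frame_energy(audio, sample_rate, video_fps):
    """Average audio energy per video frame, putting audio on the video time base."""
    samples_per_frame = int(round(sample_rate / video_fps))
    n_frames = len(audio) // samples_per_frame
    frames = audio[:n_frames * samples_per_frame].reshape(n_frames, samples_per_frame)
    return (frames ** 2).mean(axis=1)

def estimate_offset(audio_envelope, lip_motion, max_lag=50):
    """Return the lag (in frames) with the highest mean correlation of the z-normalised signals."""
    a = (audio_envelope - audio_envelope.mean()) / (audio_envelope.std() + 1e-9)
    v = (lip_motion - lip_motion.mean()) / (lip_motion.std() + 1e-9)
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[lag:], v[:len(v) - lag]
        else:
            x, y = a[:lag], v[-lag:]
        n = min(len(x), len(y))
        score = float(np.dot(x[:n], y[:n])) / max(n, 1)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Toy check: an envelope delayed by 7 frames should yield a lag near 7.
rng = np.random.default_rng(0)
lip = rng.normal(size=300)
audio_env = np.roll(lip, 7) + 0.1 * rng.normal(size=300)
print(estimate_offset(audio_env, lip))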

Relevance:

100.00%

Publisher:

Abstract:

Insensitivity to visual noise is important for audio-visual speech recognition (AVSR). Visual noise can take a number of forms, such as varying frame rate, occlusion, lighting changes or speaker variability. We investigate the use of a high-dimensional secondary classifier on the word likelihood scores from both the audio and video modalities for the purpose of adaptive fusion. Preliminary results demonstrate performance above the catastrophic fusion boundary for our confidence measure, irrespective of the type of visual noise presented to it. Our experiments were restricted to small-vocabulary applications.
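
The abstract does not specify the form of the secondary classifier, so the sketch below is purely illustrative: it trains a simple discriminative model (multinomial logistic regression via scikit-learn, an assumed dependency) on the concatenated audio and video word log-likelihood scores, which is the general shape of score-level adaptive fusion; the simulated scores and labels are invented.

# Illustrative only: a secondary classifier over per-word likelihood scores from
# both modalities, standing in for the high-dimensional classifier in the abstract.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_utterances, vocab_size = 200, 10

# Simulated per-word log-likelihoods from separate audio and video recognisers.
audio_scores = rng.normal(size=(n_utterances, vocab_size))
video_scores = rng.normal(size=(n_utterances, vocab_size))
labels = rng.integers(0, vocab_size, size=n_utterances)   # true word identities

# The fusion classifier sees both modalities' scores as one feature vector,
# letting it learn how much to trust each modality rather than using a fixed weight.
features = np.hstack([audio_scores, video_scores])
fusion_clf = LogisticRegression(max_iter=1000).fit(features, labels)
print("training accuracy:", (fusion_clf.predict(features) == labels).mean())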

Relevance:

100.00%

Publisher:

Abstract:

Using visual features, in the form of lip movements, to improve the performance of acoustic speech recognition has been shown to work well, particularly in noisy acoustic conditions. However, it is not known whether this technique can outperform speech recognition that incorporates well-known acoustic enhancement techniques such as spectral subtraction or multi-channel beamforming. This is an important question to answer, especially in an automotive environment, for the design of an efficient human-vehicle interface. We perform a variety of speech recognition experiments on a challenging automotive speech dataset, and the results show that synchronous HMM-based audio-visual fusion can outperform both traditional single-channel and multi-channel acoustic speech enhancement techniques. We also show that a further improvement in recognition performance can be obtained by fusing speech-enhanced audio with the visual modality, demonstrating the complementary nature of the two robust speech recognition approaches.
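
As background on the acoustic baseline named above, here is a minimal sketch of magnitude spectral subtraction; the frame length, over-subtraction factor and spectral floor are assumed values rather than parameters from the paper, and no windowing or overlap-add is applied, for brevity.

# Illustrative only: basic magnitude spectral subtraction on non-overlapping frames.
import numpy as np

def spectral_subtraction(noisy, noise_sample, frame_len=512, alpha=1.0, floor=0.01):
    # Noise magnitude spectrum estimated from a noise-only segment (>= frame_len samples).
    noise_mag = np.abs(np.fft.rfft(noise_sample[:frame_len]))
    n_frames = len(noisy) // frame_len
    enhanced = np.zeros(n_frames * frame_len)
    for i in range(n_frames):
        frame = noisy[i * frame_len:(i + 1) * frame_len]
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        # Subtract the noise estimate and floor the result to avoid negative magnitudes.
        clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)
        enhanced[i * frame_len:(i + 1) * frame_len] = np.fft.irfft(
            clean_mag * np.exp(1j * phase), n=frame_len)
    return enhanced

# Toy usage on a noisy sinusoid.
rng = np.random.default_rng(2)
noise = 0.1 * rng.normal(size=2048)
clean = np.sin(2 * np.pi * 5 * np.arange(2048) / 512)
print(spectral_subtraction(clean + noise, noise).shape)   # (2048,)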

Relevance:

100.00%

Publisher:

Abstract:

Audio-visual speech recognition, or the combination of visual lip-reading with traditional acoustic speech recognition, has previously been shown to provide a considerable improvement over acoustic-only approaches in noisy environments, such as an automotive cabin. The research presented in this paper extends the established audio-visual speech recognition literature to show that further improvements in speech recognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visual speech recognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) is conducted on the four-camera AVICAR automotive audio-visual speech database. We study the relative contribution of the side and centrally orientated cameras in improving visual speech recognition accuracy. Finally, combining the four visual streams with a single audio stream in a five-stream SHMM demonstrates a relative improvement of over 56% in word recognition accuracy compared to the acoustic-only approach in the noisiest conditions of the AVICAR database.
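
The abstract does not spell out how the five streams are combined inside the SHMM; in the multi-stream HMM literature the synchronous state emission score is commonly a weighted combination of the per-stream log-likelihoods, which the short sketch below illustrates with invented scores and assumed stream weights.

# Illustrative only: weighted multi-stream emission score of a synchronous HMM state.
import numpy as np

def shmm_emission_loglik(stream_logliks, stream_weights):
    """Weighted sum of per-stream log-likelihoods (a weighted product of likelihoods)."""
    return float(np.dot(np.asarray(stream_weights, float), np.asarray(stream_logliks, float)))

# One audio stream plus four visual streams (e.g. the four AVICAR camera views).
logliks = [-12.3, -20.1, -19.4, -22.0, -18.7]   # invented per-stream scores
weights = [0.6, 0.1, 0.1, 0.1, 0.1]             # assumed stream weights, summing to 1
print(shmm_emission_loglik(logliks, weights))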

Relevance:

100.00%

Publisher:

Abstract:

This is an exploratory study into the effective use of embedded, custom-made audio-visual case studies (AVCS) to enhance the student learning experience. The paper describes a project that used AVCS for a large, divergent cohort of undergraduate students enrolled in an International Business course. The study makes a number of key contributions to advancing learning and teaching within the discipline. AVCS provide first-hand reporting of the case material, allowing students to improve their understanding from both verbal and non-verbal cues. The paper demonstrates how AVCS can be embedded in a student-centred teaching approach to capture students' interest and to foster a deep approach to learning by providing authentic, real-world experience.

Relevance:

100.00%

Publisher:

Abstract:

Public libraries are increasingly using social media in an attempt to meet users in their own spaces. Social media can be useful when used to create a participatory library service, that is, to engage with users. However, there has been little empirical investigation into the success of social media use by public libraries. This article reports on the findings of a research project that explored the use of social media by Australian public libraries. Two organisations participated in case studies involving interviews, document analysis and social media observation. To contextualise the use of social media in the case study organisations, a sub-study was undertaken involving observation of an additional 24 public libraries across Australia. This article focuses on the findings from the observation sub-study. It presents and applies a methodology for classifying social media content to determine whether the sample libraries' social media use is indicative of a participatory approach to service delivery. The article explores how a range of social media platforms are used by the sample libraries and considers what 'best practice' in participatory library service looks like. The two case study organisations' use of social media is highlighted as exemplary practice.

Relevance:

100.00%

Publisher:

Abstract:

Visual information in the form of the speaker's lip movements has been shown to improve the performance of speech recognition and search applications. In our previous work, we proposed cross-database training of synchronous hidden Markov models (SHMMs) to make use of large, publicly available external audio databases in addition to the relatively small audio-visual database at hand. In this work, the cross-database training approach is improved by performing an additional audio adaptation step, which enables the audio-visual SHMMs to benefit from the audio observations of the external audio models before the visual modality is added to them. The proposed approach outperforms the baseline cross-database training approach in both clean and noisy environments in terms of phone recognition accuracy as well as spoken term detection (STD) accuracy.
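
The adaptation method used in the additional audio adaptation step is not named in the abstract; as one plausible, purely illustrative possibility, the sketch below performs a MAP-style interpolation of the external audio models' Gaussian means towards statistics gathered on the target audio-visual corpus (the function, the toy data and the relevance factor tau are all hypothetical).

# Illustrative only: MAP-style adaptation of Gaussian means towards target-corpus data.
import numpy as np

def map_adapt_means(prior_means, target_feats, assignments, tau=10.0):
    """Interpolate each Gaussian mean with the sample mean of the frames assigned
    to it; components with more adaptation data move further from the prior."""
    adapted = prior_means.copy()
    for k in range(prior_means.shape[0]):
        frames = target_feats[assignments == k]
        n_k = len(frames)
        if n_k == 0:
            continue                      # no adaptation data: keep the prior mean
        adapted[k] = (tau * prior_means[k] + n_k * frames.mean(axis=0)) / (tau + n_k)
    return adapted

# Toy usage: 3 Gaussians, 39-dimensional MFCC-like features, random frame assignments.
rng = np.random.default_rng(1)
prior = rng.normal(size=(3, 39))
feats = rng.normal(size=(500, 39))
assign = rng.integers(0, 3, size=500)
print(map_adapt_means(prior, feats, assign).shape)   # (3, 39)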

Relevance:

100.00%

Publisher:

Abstract:

Speech recognition can be improved by using visual information, in the form of the speaker's lip movements, in addition to audio information. To date, state-of-the-art techniques for audio-visual speech recognition continue to train their models on the audio and visual data of the same database. In this paper, we present a new approach that makes use of one modality of an external dataset in addition to a given audio-visual dataset. In this way, more powerful models can be built from other, extensive audio-only databases and adapted to our comparatively smaller multi-stream databases. Results show that, for phone recognition, the presented approach outperforms the widely adopted synchronous hidden Markov models (HMMs) trained jointly on the audio and visual data of a given audio-visual database by 29% relative. It also outperforms the external audio models trained on extensive external audio datasets and the internal audio models by 5.5% and 46% relative, respectively. We also show that the proposed approach is beneficial in noisy environments where the audio source is affected by environmental noise.
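
For readers unfamiliar with the phrasing, a figure such as "29% relative" is normally the reduction in error rate expressed as a fraction of the baseline's error rate; the sketch below shows that calculation with invented error rates, since the abstract does not give the absolute figures.

# Illustrative only: 'relative improvement' read as relative error-rate reduction.
def relative_improvement(baseline_error, new_error):
    return 100.0 * (baseline_error - new_error) / baseline_error

# e.g. an invented baseline phone error rate of 40% reduced to 28.4%:
print(round(relative_improvement(40.0, 28.4), 1))   # 29.0 (% relative)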

Relevance:

100.00%

Publisher:

Abstract:

In recent years, many of the world's leading media producers, screenwriters, technicians and investors, particularly those in the Asia-Pacific region, have been drawn to work in the People's Republic of China (hereafter China or Mainland China). Media projects with a lighter, more commercial entertainment feel, compared with the heavily propaganda-oriented content of the past, have multiplied, thanks to the Chinese state's newfound willingness to consider collaboration with foreign partners. Nowhere is this more evident than in film. Despite their long-standing reputation for rigorous censorship, state policymakers are now encouraging Chinese media entrepreneurs to generate fresh ideas and to develop products that will revitalise the stagnant domestic production sector. It is hoped that an increase in both the quality and quantity of domestic feature films, stimulated by an infusion of creativity and cutting-edge technology from outside the country, will help reverse China's 'cultural trade deficit' (wenhua maoyi chizi) (Keane 2007).

Relevance:

100.00%

Publisher:

Abstract:

Urquhart, C. & Weightman, A. (2008). Assessing the impact of a health library service: best practice guidance. Based on research originally funded by LKDN and now sponsored by the National Library for Health. Aberystwyth: Department of Information Studies, Aberystwyth University. The guidance relates to the project report Developing a toolkit for assessing the impact of health library services on patient care (also available in CADAIR). A version of this item is available as an online appendix to a paper in Health Information and Libraries Journal entitled "The value and impact of information provided through library services for patient care: developing guidance for best practice" (Weightman, A., Urquhart, C. et al.), available electronically ahead of publication. Sponsorship: LKDN/NLH.