476 results for Audio-visual materials


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Speech recognition can be improved by using visual information, in the form of the speaker's lip movements, in addition to audio information. To date, state-of-the-art techniques for audio-visual speech recognition continue to use audio and visual data from the same database for training their models. In this paper, we present a new approach that makes use of one modality of an external dataset in addition to a given audio-visual dataset. By doing so, it is possible to create more powerful models from other extensive audio-only databases and adapt them to our comparatively smaller multi-stream databases. Results show that the presented approach outperforms the widely adopted synchronous hidden Markov models (HMM) trained jointly on audio and visual data of a given audio-visual database for phone recognition by 29% relative. It also outperforms the external audio models trained on extensive external audio datasets and the internal audio models by 5.5% and 46% relative, respectively. We also show that the proposed approach is beneficial in noisy environments where the audio source is affected by environmental noise.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In recent years, many of the world’s leading media producers, screenwriters, technicians and investors, particularly those in the Asia-Pacific region, have been drawn to work in the People's Republic of China (hereafter China or Mainland China). Media projects with a lighter commercial entertainment feel – compared with the heavy propaganda-oriented content of the past – have multiplied, thanks to the Chinese state’s newfound willingness to consider collaboration with foreign partners. Nowhere is this more evident than in film. Despite their long-standing reputation for rigorous censorship, state policymakers are now encouraging Chinese media entrepreneurs to generate fresh ideas and to develop products that will revitalise the stagnant domestic production sector. It is hoped that an increase in both the quality and quantity of domestic feature films, stimulated by an infusion of creativity and cutting-edge technology from outside the country, will help reverse China’s ‘cultural trade deficit’ (wenhua maoyi chizi) (Keane 2007).

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We propose a novel technique for conducting robust voice activity detection (VAD) in high-noise recordings. We use Gaussian mixture modeling (GMM) to train two generic models: speech and non-speech. We then score smaller segments of a given (unseen) recording against each of these GMMs to obtain two respective likelihood scores for each segment. These scores are used to compute a dissimilarity measure between pairs of segments and to carry out complete-linkage clustering of the segments into speech and non-speech clusters. We compare the accuracy of our method against state-of-the-art and standardised VAD techniques to demonstrate an absolute improvement of 15% in half-total error rate (HTER) over the best-performing baseline system across the QUT-NOISE-TIMIT database. We then apply our approach to the Audio-Visual Database of American English (AVDBAE) to demonstrate the performance of our algorithm in using visual, audio-visual or a proposed fusion of these features.
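The clustering stage described in this abstract can be illustrated with a minimal sketch. It assumes each segment has already been scored against the two GMMs, so every segment is a pair of log-likelihoods. The abstract does not specify the dissimilarity measure; Euclidean distance between score pairs is an assumption made here for illustration, as are all function names and the example scores. HTER is computed as the mean of the false-alarm rate and the miss rate.

```python
import math


def complete_linkage_two_clusters(points):
    """Agglomerative complete-linkage clustering of score pairs into two clusters."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Complete linkage: cluster distance is the largest pairwise distance.
        # Euclidean distance is an assumed dissimilarity measure.
        return max(math.dist(points[a], points[b]) for a in c1 for b in c2)

    while len(clusters) > 2:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


def label_speech(points):
    """Return one boolean per segment (True = speech)."""
    clusters = complete_linkage_two_clusters(points)

    def mean_llr(cluster):
        # Mean of (speech log-likelihood - non-speech log-likelihood).
        return sum(points[k][0] - points[k][1] for k in cluster) / len(cluster)

    speech = set(max(clusters, key=mean_llr))
    return [k in speech for k in range(len(points))]


def hter(predicted, reference):
    """Half-total error rate; assumes both classes occur in the reference."""
    false_alarms = sum(p and not r for p, r in zip(predicted, reference))
    misses = sum(r and not p for p, r in zip(predicted, reference))
    n_speech = sum(reference)
    n_nonspeech = len(reference) - n_speech
    return 0.5 * (false_alarms / n_nonspeech + misses / n_speech)


# Hypothetical (speech_loglik, nonspeech_loglik) scores for six segments.
scores = [(-10.0, -20.0), (-11.0, -19.0), (-9.5, -21.0),
          (-20.0, -10.0), (-19.0, -11.0), (-21.0, -9.0)]
labels = label_speech(scores)
```

The pairwise search is cubic in the number of segments, which is acceptable for a small illustration but would need a more efficient linkage implementation for long recordings.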

Relevance:

90.00% 90.00%

Publisher:

Abstract:

The performance of visual speech recognition (VSR) systems is significantly influenced by the accuracy of the visual front-end. The current state-of-the-art VSR systems use off-the-shelf face detectors such as Viola-Jones (VJ), which have limited reliability under changes in illumination and head pose. For a VSR system to perform well under these conditions, an accurate visual front-end is required. This is an important problem to be solved in many practical implementations of audio-visual speech recognition systems, for example in automotive environments for an efficient human-vehicle computer interface. In this paper, we re-examine the current state-of-the-art VSR by comparing off-the-shelf face detectors with the recently developed Fourier Lucas-Kanade (FLK) image alignment technique. A variety of image alignment and visual speech recognition experiments are performed on a clean dataset as well as on a challenging automotive audio-visual speech dataset. Our results indicate that the FLK image alignment technique can significantly outperform off-the-shelf face detectors, but requires frequent fine-tuning.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Spoken term detection (STD) is the task of looking up a spoken term in a large volume of speech segments. In order to provide fast search, speech segments are first indexed into an intermediate representation using speech recognition engines which provide multiple hypotheses for each speech segment. Approximate matching techniques are usually applied at the search stage to compensate for the poor performance of automatic speech recognition engines during indexing. Recently, using visual information in addition to audio information has been shown to improve phone recognition performance, particularly in noisy environments. In this paper, we will make use of visual information in the form of lip movements of the speaker in the indexing stage and will investigate its effect on STD performance. In particular, we will investigate whether gains in phone recognition accuracy carry through the approximate matching stage to provide similar gains in the final audio-visual STD system over a traditional audio-only approach. We will also investigate the effect of using visual information on STD performance in different noise environments.
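To make the approximate matching stage concrete, the following minimal sketch searches an indexed phone sequence for the best approximate occurrence of a query phone sequence. The abstract does not say which matching technique is used; the unit-cost edit-distance recurrence for approximate substring search shown here is an assumption, and the function name and phone labels are hypothetical.

```python
def best_approximate_match(query, indexed):
    """Return (min_edit_distance, end_position) of `query` inside `indexed`.

    The match may start anywhere in `indexed` at no cost, so recognition
    errors (substituted, inserted or deleted phones) only raise the distance
    of the matched region rather than ruling the match out.
    """
    # prev[i] = cost of aligning query[:i] against a region of `indexed`
    # ending at the previous position.
    prev = list(range(len(query) + 1))
    best = (prev[-1], 0)
    for j, phone in enumerate(indexed, start=1):
        cur = [0]  # an empty query matches anywhere at zero cost
        for i, q in enumerate(query, start=1):
            cost = 0 if q == phone else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra phone in the index
                           cur[i - 1] + 1))     # query phone missing from the index
        if cur[-1] < best[0]:
            best = (cur[-1], j)
        prev = cur
    return best


# Hypothetical query and index: the recogniser wrote "eh" where "ae" was spoken.
hit = best_approximate_match(["k", "ae", "t"],
                             ["dh", "ax", "k", "eh", "t", "s", "ae", "t"])
# hit == (1, 5): one substitution, match ending after the fifth indexed phone
```

A detection threshold on the returned distance would then decide whether the hit is reported; better phone recognition at indexing time lowers the distances of true occurrences.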

Relevance:

80.00% 80.00%

Publisher:

Abstract:

There are two aspects to the problem of digital scholarship and pedagogy. One is to do with scholarship; the other with pedagogy. In scholarship, the association of knowledge with its printed form remains dominant. In pedagogy, the desire to abandon print for ‘new’ media is urgent, at least in some parts of the academy. Film and media studies are thus at the intersection of opposing forces – pulling the field ‘back’ to print and ‘forward’ to digital media. These tensions may be especially painful in a field whose own object of study is another form of communication, neither print nor digital but broadcast. Although print has been overtaken in the popular marketplace by audio-visual forms, this was never achieved in the domain of scholarship. Even when it is digitally distributed, the output of research is still a ‘paper.’ But meanwhile, in the realm of teaching, production- and practice-based pedagogy has become firmly established. Nevertheless a disjunction remains, between high-end scholarship in research universities and vocational training in teaching institutions; but neither is well equipped to deal with the digital challenge.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Videotelephony (real-time audio-visual communication) has been used successfully in adult palliative home care. This paper describes two attempts to complete an RCT (both of which were abandoned following difficulties with family recruitment), designed to investigate the use of videotelephony with families receiving palliative care from a tertiary paediatric oncology service in Brisbane, Australia. To investigate whether providing videotelephone-based support was acceptable to these families, a 12-month non-randomised acceptability trial was completed. Seventeen palliative care families were offered access to a videotelephone support service in addition to the 24-hour ‘on-call’ service already offered. A 92% participation rate in this study provided some reassurance that the use of videotelephones themselves was not a factor in poor RCT participation rates. The next phase of research is to investigate the integration of videotelephone-based support from the time of diagnosis, through outpatient care and support, and for palliative care, rather than for palliative care in isolation.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Research is often characterised as the search for new ideas and understanding. The language of this view privileges the cognitive and intellectual aspects of discovery. However, in the research process theoretical claims are usually evaluated in practice and, indeed, the observations and experiences of practical circumstances often lead to new research questions. This feedback loop between speculation and experimentation is fundamental to research in many disciplines, and is also appropriate for research in the creative arts. In this chapter we will examine how our creative desire for artistic expressivity results in interplay between actions and ideas that direct the development of techniques and approaches for our audio/visual live-coding activities.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Practice-led or multi-modal theses (describing examinable outcomes of postgraduate study which comprise the practice of dancing/choreography with an accompanying exegesis) are an emerging strength of dance scholarship, a form of enquiry that has been gaining momentum for over a decade, particularly in Australia and the United Kingdom. It has been strongly argued that, in this form of research, legitimate claims to new knowledge are embodied predominantly within the practice itself (Pakes, 2003) and that these findings are emergent, contingent and often interstitial, contained within both the material form of the practice and in the symbolic languages surrounding the form. In a recent study on ‘dancing’ theses, Phillips, Stock and Vincs (2009) found that there was general agreement from academics and artists that ‘there could be more flexibility in matching written language with conceptual thought expressed in practice’. The authors discuss how the seemingly intangible nature of danced/embodied research, reliant on what Melrose (2003) terms ‘performance mastery’ by the ‘expert practitioner’ (2006, Point 4) involving ‘expert’ intuition (2006, Point 5), might be accessed, articulated and validated in terms of alternative ways of knowing through exploring an ongoing dialogue in which the danced practice develops emergent theory. They also propose ways in which the danced thesis can be ‘converted’ into the required ‘durable’ artefact which the ephemerality of live performance denies, drawing on the work of Rye’s ‘multi-view’ digital record (2003) and Stapleton’s ‘multi-voiced audio visual document’ (2006, 82).
Building on a two-year research project (2007-2008), Dancing Between Diversity and Consistency: Refining Assessment in Postgraduate Degrees in Dance, which examined such issues in relation to assessment in an Australian context, the three researchers have further explored issues around interdisciplinarity, cultural differences and documentation through engaging with the following questions: How do we represent research in which understandings, meanings and findings are situated within the body of the dancer/choreographer? Do these need a form of ‘translating’ into textual form in order to be accessed as research? What kind of language structures can be developed to effect this translation: metaphor, allusion, symbol? How important is contextualising the creative practice? How do we incorporate differing cultural inflections and practices into our reading and evaluation? What kind of layered documentation can assist in producing a ‘durable’ research artefact from a non-reproducible live event?

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The evolution of the laptop computer as a musical instrument in the 1990s provided a tool for empowering the solo musician, and divergent approaches to the application of this technology in performance remain consistently debated. The increasing ubiquity of digital media combined with the power of current generation notebook technology has provided the perfect platform to realise integrated audio-visual toolsets that respond to musical controllers and provide mixed-media results. Despite emerging practitioners increasingly availing themselves of the musical affordances of this technology, theoretical discussion in the field ignores the various approaches a solo musician might take in developing integrated media works for performance. In an increasingly crowded niche there is a clear compulsion to consider expanded modes of performance, yet, lacking any formal framework, these integrations can easily alienate an audience, distract from performance and lead to criticisms of novelty for novelty's sake.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Grid music systems provide discrete geometric methods for simplified music-making by providing spatialised input to construct patterned music on a 2D matrix layout. While they are conceptually simple, grid systems may be layered to enable complex and satisfying musical results. Grid music systems have been applied to a range of systems from small portable devices up to larger systems. In this paper we will discuss the use of grid music systems in general and present an overview of the HarmonyGrid system we have developed as a new interactive performance system. We discuss a range of issues related to the design and use of larger-scale grid-based interactive performance systems such as the HarmonyGrid.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Digital rights management allows information owners to control the use and dissemination of electronic documents via a machine-readable licence. Documents are distributed in a protected form such that they may only be used within trusted environments, and only in accordance with the terms and conditions stated in the licence. Digital rights management has found uses in protecting copyrighted audio-visual productions, private personal information, and companies' trade secrets and intellectual property. This chapter describes a general model of digital rights management together with the technologies used to implement each component of a digital rights management system, and describes how digital rights management can be applied to secure the distribution of electronic information in a variety of contexts.
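The idea of a machine-readable licence that a trusted environment consults before allowing an action can be sketched as follows. This is a toy illustration only: the `Licence` fields, the `is_permitted` function and the example values are all hypothetical and do not correspond to any particular rights expression language or to the model described in the chapter.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Licence:
    """Hypothetical machine-readable licence for one protected document."""
    document_id: str
    licensee: str
    permitted_actions: frozenset
    expires: date


def is_permitted(licence, document_id, user, action, today):
    """Check a requested action against the licence terms.

    A trusted environment would call this before releasing the
    protected content for the requested use.
    """
    return (licence.document_id == document_id
            and licence.licensee == user
            and action in licence.permitted_actions
            and today <= licence.expires)


# Hypothetical licence: "alice" may view and print "doc-1" until the end of 2030.
licence = Licence("doc-1", "alice", frozenset({"view", "print"}), date(2030, 12, 31))
can_view = is_permitted(licence, "doc-1", "alice", "view", date(2030, 1, 1))
can_copy = is_permitted(licence, "doc-1", "alice", "copy", date(2030, 1, 1))
```

In a real system the licence would also be cryptographically bound to the protected document and verified before being trusted; that step is omitted here.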

Relevance:

80.00% 80.00%

Publisher: