57 results for "Audio director"


Relevance: 20.00%

Abstract:

The cascading appearance-based (CAB) feature extraction technique has established itself as the state of the art in extracting dynamic visual speech features for speech recognition. In this paper, we investigate the effectiveness of this technique for the related application of speaker verification. By examining the speaker verification ability of each stage of the cascade, we demonstrate that the same steps taken to reduce static speaker and environmental information for visual speech recognition also provide similar improvements for visual speaker recognition. A further study compares synchronous HMM (SHMM) based fusion of CAB visual features and traditional perceptual linear predictive (PLP) acoustic features. It shows that the higher complexity inherent in the SHMM approach does not appear to improve the final audio-visual speaker verification system over simpler utterance-level score fusion.
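
The abstract contrasts SHMM fusion with simpler utterance-level score fusion but does not say how the scores were combined. The following is a minimal sketch that assumes a weighted sum of z-normalised per-utterance scores. This is a common choice, not necessarily the authors' method. The function names, the default weight of 0.5 and the zero decision threshold are illustrative assumptions.

```python
from statistics import mean, pstdev


def z_normalise(scores):
    """Z-normalise a list of scores to zero mean and unit variance.

    Assumption: statistics are taken from the supplied scores for simplicity;
    a real system would estimate them on held-out development data.
    """
    mu = mean(scores)
    sigma = pstdev(scores) or 1.0  # guard against a constant score list
    return [(s - mu) / sigma for s in scores]


def fuse_scores(audio_scores, visual_scores, weight=0.5):
    """Utterance-level fusion: weighted sum of normalised audio and visual scores."""
    a = z_normalise(audio_scores)
    v = z_normalise(visual_scores)
    return [weight * sa + (1.0 - weight) * sv for sa, sv in zip(a, v)]


def accept(fused, threshold=0.0):
    """Accept each verification trial whose fused score reaches the threshold."""
    return [s >= threshold for s in fused]
```

Setting `weight` to 1.0 or 0.0 recovers the audio-only or visual-only system. This makes it easy to compare fused performance against either single modality.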

Relevance: 20.00%

Abstract:

Interacting with technology within a vehicle through a voice interface can greatly reduce the effects of driver distraction. Most current approaches to this problem use only the audio signal, making them susceptible to acoustic noise. An obvious way around this is to use the visual modality as well. However, capturing, storing and distributing audio-visual data in a vehicle environment is costly and difficult. One dataset currently available for such research is the AVICAR [1] database. Unfortunately, this database is largely unusable because of a timing mismatch between the two streams, and no protocol is available for it. We have overcome this problem by re-synchronising the streams on the phone-number portion of the dataset and establishing a protocol for further research. This paper presents the first audio-visual results on this dataset for speaker-independent speech recognition. We hope this will serve as a catalyst for future research in this area.

Relevance: 20.00%

Abstract:

Some of my most powerful spiritual experiences have come from the splendorous and sublime-sounding hymns performed by the choir and church organ at the traditional Anglican church I have attended since I was very young. Later in life, my pursuit of an engineering education took me to Australia. There I regularly attended a contemporary evangelical church and subsequently became a music director in that faith community. This environmental and cultural shift altered my perception and experience of Christian music and led me to enquire into the relationship between Christian liturgy and church music. Throughout history, church musicians and composers have synthesised the theological, congregational, cultural and musical aspects of church liturgy. Many great composers have taken into account the conditions surrounding the composition and arrangement of sacred music to enhance the experience of religious ecstasy. They sought resonances with Christian values and beliefs to draw the congregation into praising and glorifying God. As a music director in an evangelical church, I share this aspiration. I hope to identify and define the qualities of the resonances that have been successful and apply them to my own practice.

Introduction and Structure of the Thesis

In this study I examine four purposively selected excerpts of Christian church vocal music. I combine theomusicological and semiotic analysis to help identify guidelines that might be useful in my practice as a church music director. The four excerpts were selected for their sustained musical and theological impact over time and their ability to evoke ecstatic responses from congregations.
This thesis documents a personal journey through the analysis of music. It draws on ethnomusicological, theological and semiotic tools that lead to a preliminary framework and principles, which can then be applied to the identified qualities of resonance in church music today. The thesis comprises four parts. Part 1 presents a literature study on the relationship between sacred music, the effects of religious ecstasy and the Christian church. Multiple lenses on this phenomenon are drawn from the viewpoints of prominent Western church historians, Biblical theologians and philosophers. The literature study continues in Part 2, where the role of embodiment is examined from the current perspective of cognitive learning environments. This provides a platform for critical reflection on two distinct musical liturgical systems that have treated the notion of embodied understanding differently amid a shifting church paradigm. It allows an in-depth theological and philosophical understanding of the liturgical conditions around sacred music-making as they relate to monistic and dualistic views of body and mind. Part 3 applies a theomusicological methodology through creative case studies of four purposively selected spiritual pieces. A semiotic study focuses on specific sections of sacred vocal works that express the notions of ‘praise’ and ‘glorification’, particularly in relation to these effects. It combines an analysis of theological perspectives on religious ecstasy with an analysis of particular spiritual themes. Part 4 presents the critiques and findings of the study. These incorporate theoretical and technological means of analysing the purposively selected musical artefacts, particularly the sonic narratives expressing the notions of ‘Praise’ and ‘Glory’. The musical findings are further discussed in relation to the notion of resonance, and a conceptual framework for the role of the contemporary music director is then proposed.
The musical and Christian terminology used in the thesis is explained in the glossary. The appendices include tables illustrating the musical findings, the surveys conducted and written musical analyses, together with audio examples of the selected sacred pieces on the enclosed compact disc.

Relevance: 20.00%

Abstract:

Insensitivity to visual noise is important in audio-visual speech recognition (AVSR). Visual noise can take a number of forms, such as varying frame rate, occlusion, lighting or speaker variability. We investigate the use of a high-dimensional secondary classifier, applied to the word likelihood scores from both the audio and video modalities, for adaptive fusion. Preliminary results show that, with our confidence measure, performance stays above the catastrophic fusion boundary irrespective of the type of visual noise presented. Our experiments were restricted to small-vocabulary applications.
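
The abstract does not describe the secondary classifier itself. The sketch below shows only the kind of weighted late fusion of word log-likelihoods that such a classifier could drive. It pairs this with one common reading of the catastrophic fusion boundary: fused accuracy falling below that of the better single modality. The fixed weight `alpha` stands in for the adaptive confidence measure. It, and every name here, is an assumption.

```python
def fuse_word_scores(audio_ll, video_ll, alpha):
    """Late fusion of per-word log-likelihoods: alpha*audio + (1 - alpha)*video.

    Assumption: alpha would be set per utterance by a confidence measure;
    here it is simply passed in.
    """
    return {w: alpha * audio_ll[w] + (1.0 - alpha) * video_ll[w] for w in audio_ll}


def recognise(scores):
    """Return the word with the highest fused score."""
    return max(scores, key=scores.get)


def is_catastrophic(fused_acc, audio_acc, video_acc):
    """Fusion is treated as catastrophic if it underperforms the better single modality."""
    return fused_acc < max(audio_acc, video_acc)
```

Sweeping `alpha` from 0 to 1 on held-out data and checking `is_catastrophic` at each value is one simple way to see where a fixed weighting would fall below the boundary.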

Relevance: 20.00%

Abstract:

The use of visual features in the form of lip movements to improve the performance of acoustic speech recognition has been shown to work well, particularly in noisy acoustic conditions. However, it is not known whether this technique can outperform speech recognition that incorporates well-known acoustic enhancement techniques such as spectral subtraction or multi-channel beamforming. Answering this question is important for the design of an efficient human-vehicle computer interface, especially in an automotive environment. We perform a variety of speech recognition experiments on a challenging automotive speech dataset. The results show that synchronous HMM-based audio-visual fusion can outperform traditional single-channel as well as multi-channel acoustic speech enhancement techniques. We also show that further improvement in recognition performance can be obtained by fusing speech-enhanced audio with the visual modality. This demonstrates the complementary nature of the two robust speech recognition approaches.
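
Spectral subtraction, one of the enhancement baselines named above, removes an estimate of the noise power spectrum from each frame. The following is a textbook-style sketch of power spectral subtraction, not the configuration used in the paper. It makes three assumptions:

- The per-frame power spectra have already been computed; the STFT and resynthesis steps are omitted.
- The first frames contain noise only.
- `over_sub` and `floor` are illustrative values.

```python
def estimate_noise(power_frames, n_noise_frames):
    """Average the power spectra of the first n frames, assumed to be noise only."""
    frames = power_frames[:n_noise_frames]
    n_bins = len(frames[0])
    return [sum(f[k] for f in frames) / len(frames) for k in range(n_bins)]


def spectral_subtract(power_frames, noise_psd, over_sub=1.0, floor=0.01):
    """Power spectral subtraction with an over-subtraction factor and a spectral floor.

    Each bin becomes max(P - over_sub * N, floor * P); the floor keeps the
    result positive and limits 'musical noise' from over-subtraction.
    """
    cleaned = []
    for frame in power_frames:
        cleaned.append([max(p - over_sub * n, floor * p) for p, n in zip(frame, noise_psd)])
    return cleaned
```

In a full front end, the cleaned power spectra would feed the usual feature extraction, for example filterbank and cepstral features, before recognition.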

Relevance: 20.00%

Abstract:

Wikis have proved to be very effective collaboration and knowledge management tools in a wide variety of fields thanks to their simplicity and flexibility. Another important development for the internet is the emergence of powerful mobile devices supported by fast and reliable wireless networks. The combination of these developments raises two questions: how to extend wikis to mobile devices, and how to leverage mobile devices' rich modalities to supplement current wikis. Composing and consuming content through the auditory channel is the most natural and efficient way for mobile device users to interact. Realizing this, this paper explores the use of audio as the medium of a wiki. Our work, as a first step in this direction, creates a framework called Mobile Audio Wiki, which facilitates asynchronous audio-mediated collaboration on the move. In this paper, we present the design of Mobile Audio Wiki. As part of this design, we propose an innovative approach to a lightweight audio content annotation system that enables group editing, versioning and cross-linking among audio clips. To elucidate the novel collaboration model introduced by Mobile Audio Wiki, we identify its four usage modes and present them in storyboard format. Finally, we describe the initial design for the presentation and navigation of Mobile Audio Wiki.
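
The abstract does not give the annotation schema. The following is therefore a hypothetical data model for time-bounded annotations on versioned audio clips, with cross-links between clips. It is meant only to make the editing, versioning and cross-linking ideas concrete. Every class, field and method name is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """A time-bounded note on an audio clip (hypothetical schema)."""
    start_s: float
    end_s: float
    author: str
    text: str
    link_to: Optional[str] = None  # id of another clip, enabling cross-linking


@dataclass
class AudioClip:
    """A clip with an append-only version history and a list of annotations."""
    clip_id: str
    versions: List[str] = field(default_factory=list)  # e.g. storage URIs, oldest first
    annotations: List[Annotation] = field(default_factory=list)

    def add_version(self, uri: str) -> int:
        """Record a new edit of the clip and return its version number."""
        self.versions.append(uri)
        return len(self.versions) - 1

    def annotate(self, ann: Annotation) -> None:
        """Attach an annotation after checking its time bounds."""
        if not 0 <= ann.start_s < ann.end_s:
            raise ValueError("annotation must satisfy 0 <= start < end")
        self.annotations.append(ann)

    def links(self) -> List[str]:
        """Ids of the clips this clip links to via its annotations."""
        return [a.link_to for a in self.annotations if a.link_to]
```

Keeping versions append-only is a simple way to support group editing without overwriting other users' contributions. Whether Mobile Audio Wiki does this is not stated in the abstract.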

Relevance: 20.00%

Abstract:

Audio-visual speech recognition, or the combination of visual lip-reading with traditional acoustic speech recognition, has previously been shown to provide a considerable improvement over acoustic-only approaches in noisy environments, such as an automotive cabin. The research presented in this paper extends the established audio-visual speech recognition literature. It shows that further improvements in speech recognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visual speech recognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) is conducted on the four-camera AVICAR automotive audio-visual speech database. We study the relative contributions of the side and centrally oriented cameras to improving visual speech recognition accuracy. Finally, combining the four visual streams with a single audio stream in a five-stream SHMM yields a relative improvement of over 56% in word recognition accuracy compared with the acoustic-only approach in the noisiest conditions of the AVICAR database.
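
In a multi-stream (synchronous) HMM, each state's emission score is commonly formed as a weighted sum of per-stream log-likelihoods: log b(o) = sum over streams s of w_s * log b_s(o_s). The sketch below makes two simplifying assumptions. It uses a single diagonal Gaussian per stream, whereas practical systems typically use Gaussian mixtures. It also requires the stream weights to sum to one. The names are illustrative, and the abstract does not state the weights used in the paper.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def multistream_loglik(observations, models, weights):
    """Stream-weighted emission log-likelihood for one SHMM state.

    observations: one feature vector per stream
    models: one {"mean": [...], "var": [...]} dict per stream
    weights: one weight per stream, assumed to sum to 1
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("stream weights are assumed to sum to 1")
    return sum(
        w * gaussian_loglik(o, m["mean"], m["var"])
        for o, m, w in zip(observations, models, weights)
    )
```

For the five-stream case described above, `observations`, `models` and `weights` would each have five entries (four visual streams and one audio stream). Shifting weight towards the visual streams is the usual lever when the audio is noisy.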

Relevance: 20.00%

Abstract:

In the 10 years since the addition of uncommercial transactions to the table of deemed “debts incurred” in s 588G(1A) of the Corporations Act, the sub-section has arguably achieved little. This article explains why this has been so, and what needs to be done to enable this aspect of Australia’s insolvent trading laws to operate effectively and as originally intended.

Relevance: 20.00%

Abstract:

This paper presents a novel technique for segmenting an audio stream into homogeneous regions according to speaker identity, background noise, music, and environmental and channel conditions. Audio segmentation is useful in audio diarization systems, which aim to annotate an input audio stream with information that attributes temporal regions of the audio to their specific sources. The segmentation method introduced in this paper uses the Generalized Likelihood Ratio (GLR), computed between two adjacent sliding windows over preprocessed speech. This approach is inspired by the popular segmentation method proposed in the pioneering work of Chen and Gopalakrishnan, which uses the Bayesian Information Criterion (BIC) with an expanding search window. This paper aims to identify and address the shortcomings of that approach. The proposed segmentation strategy is evaluated on the 2002 Rich Transcription (RT-02) Evaluation dataset, achieving a miss rate of 19.47% and a false alarm rate of 16.94% at the optimal threshold.
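
A GLR distance between two adjacent windows can be sketched by fitting one Gaussian to each window and one to their union, then comparing the resulting log-likelihoods. This version assumes diagonal covariances and generic feature frames, for example cepstral vectors. The abstract does not state the paper's model order or threshold. `delta_bic` shows how a BIC-style penalty, in the spirit of the Chen and Gopalakrishnan approach, would be subtracted from the same statistic. The penalty weight `lam` is a tunable assumption.

```python
import math


def _log_det_diag(frames):
    """Log-determinant of the ML diagonal covariance fitted to frames (sum of log variances)."""
    n, d = len(frames), len(frames[0])
    total = 0.0
    for k in range(d):
        col = [f[k] for f in frames]
        mu = sum(col) / n
        var = sum((c - mu) ** 2 for c in col) / n
        total += math.log(max(var, 1e-10))  # variance floor avoids log(0)
    return total


def glr_distance(left, right):
    """GLR distance with ML Gaussians: 0.5 * (N log|S| - N1 log|S1| - N2 log|S2|)."""
    n1, n2 = len(left), len(right)
    n = n1 + n2
    return 0.5 * (n * _log_det_diag(left + right)
                  - n1 * _log_det_diag(left)
                  - n2 * _log_det_diag(right))


def delta_bic(left, right, lam=1.0):
    """GLR minus a BIC complexity penalty; a positive value suggests a change point.

    Splitting adds one diagonal Gaussian, i.e. 2*d parameters (d means, d variances).
    """
    n, d = len(left) + len(right), len(left[0])
    penalty = 0.5 * (2 * d) * math.log(n)
    return glr_distance(left, right) - lam * penalty


def sliding_glr(frames, win):
    """Score each candidate boundary t using windows frames[t-win:t] and frames[t:t+win]."""
    return [(t, glr_distance(frames[t - win:t], frames[t:t + win]))
            for t in range(win, len(frames) - win + 1)]
```

Peaks in the sliding GLR curve mark candidate change points. A threshold, tuned on development data as the reported "optimal threshold" suggests, then decides which peaks are kept.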

Relevance: 20.00%

Abstract:

Background: Optimal adherence to antiretroviral therapy (ART) is necessary for people living with HIV/AIDS (PLHIV). There have been relatively few systematic analyses of factors that promote or inhibit adherence to antiretroviral therapy among PLHIV in Asia. This study assessed ART adherence and examined factors associated with suboptimal adherence in northern Viet Nam. Methods: Data from 615 PLHIV on ART in two urban and three rural outpatient clinics were collected by medical record extraction and from patient interviews using audio computer-assisted self-interview (ACASI). Results: The prevalence of suboptimal adherence was estimated to be 24.9% via a visual analogue scale (VAS) of past-month dose-missing and 29.1% using a modified Adult AIDS Clinical Trial Group scale for on-time dose-taking in the past 4 days. Factors significantly associated with the more conservative VAS score were: depression (p < 0.001), side-effect experiences (p < 0.001), heavy alcohol use (p = 0.001), chance health locus of control (p = 0.003), low perceived quality of information from care providers (p = 0.04) and low social connectedness (p = 0.03). Illicit drug use alone was not significantly associated with suboptimal adherence, but interacted with heavy alcohol use to reduce adherence (p < 0.001). Conclusions: This is the largest survey of ART adherence yet reported from Asia and the first in a developing country to use the ACASI method in this context. The evidence strongly indicates that ART services in Viet Nam should include screening and treatment for depression, linkage with alcohol and/or drug dependence treatment, and counselling to address the belief that chance or luck determines health outcomes.

Relevance: 20.00%

Abstract:

In relation to enterprise technology governance (ETG), opinions range from there being no need for board of director involvement to there being an urgent need for it. This research highlights the need for boards to provide ETG oversight of technology-related strategy, investment and risk, and to be competent in doing so. We identify a large gap between boards’ awareness of the importance of ETG, their taking action, and the competencies required for effective ETG. Further, there is considerable research and literature on operational IT governance frameworks and operational IT competencies. However, there is no known research into the specific competencies boards of directors need to govern enterprise technology effectively. This research focuses on and develops a board-level ETG competency set using a mixed methods approach within a recognised competency development framework. Further development is tracked using a rigour scale to demonstrate a medium to high level of competency validity for the derived set. This research contributes to practice by providing the first known industry-validated ETG competency set situated within new and emerging technology. It contributes to the body of knowledge through the modification and application of competency development and competency validation frameworks not previously applied to the role of board director.

Relevance: 20.00%

Abstract:

Bioacoustic data can provide an important basis for environmental monitoring. To explore the large volume of field recordings collected, this paper presents an automated similarity search algorithm. A user provides a region of an audio recording, defined by frequency and time bounds, and the content of that region is used to construct a query. During retrieval, our algorithm automatically scans through recordings to search for similar regions. Specifically, we present a feature extraction approach based on the visual content of vocalisations (in this case, ridges) and develop a generic regional representation of vocalisations for indexing. Our feature extraction method works best for bird vocalisations that show ridge characteristics. The regional representation allows the content of an arbitrary region of a continuous recording to be described in a compressed format.
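
The abstract does not describe the ridge-based features in enough detail to reproduce them. The sketch below therefore swaps in raw spectrogram values and cosine similarity to show the general shape of a region query. The user's time-frequency box becomes a vector, and a window of the same size is slid along the recording within the same frequency band. All names are hypothetical, and a ridge-based representation would replace `region_vector`.

```python
import math


def region_vector(spec, t0, f0, dt, df):
    """Flatten the region spec[t0:t0+dt][f0:f0+df] (time-major spectrogram) into a vector."""
    return [spec[t][f] for t in range(t0, t0 + dt) for f in range(f0, f0 + df)]


def cosine(a, b):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(spec, query, f0, top_k=3):
    """Slide a (dt x df) query along time in the band starting at bin f0.

    Returns the top_k (start_frame, similarity) pairs, best first.
    """
    dt, df = len(query), len(query[0])
    q = [v for row in query for v in row]
    hits = [(t, cosine(q, region_vector(spec, t, f0, dt, df)))
            for t in range(len(spec) - dt + 1)]
    return sorted(hits, key=lambda h: h[1], reverse=True)[:top_k]
```

Restricting the scan to the query's own frequency band keeps the search linear in recording length. Matches shifted in pitch would need an extra loop over `f0`.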

Relevance: 20.00%

Abstract:

Interpreting acoustic recordings of the natural environment is an increasingly important technique for ecologists wishing to monitor terrestrial ecosystems. Technological advances make it possible to accumulate many more recordings than can be listened to or interpreted, so automated assistance is needed to identify elements in the soundscape. In this paper we examine the problem of estimating avian species richness by sampling from very long acoustic recordings. We work with data recorded under natural conditions, with all the attendant problems of undefined and unconstrained acoustic content (such as wind, rain and traffic) that can mask the content of interest (in our case, bird calls). We describe 14 acoustic indices calculated at one-minute resolution over the duration of a 24-hour recording. An acoustic index is a statistic that summarizes some aspect of the structure and distribution of acoustic energy and information in a recording. The indices we calculate fall into three groups:

- standard indices (e.g. signal-to-noise ratio);
- indices reported to be useful for detecting bioacoustic activity (e.g. temporal and spectral entropies);
- indices directed to avian sources (e.g. spectral persistence of whistles).

We rank the one-minute segments of a 24-hour recording in descending order according to an "acoustic richness" score, derived from a single index or a weighted combination of two or more. We describe combinations of indices that lead to more efficient estimates of species richness than random sampling from the same recording. Here, efficiency is defined as the total number of species identified for a given listening effort. Using random sampling, we achieve a 53% increase in species recognized over traditional field surveys, and an increase of 87% when combinations of indices are used to direct the sampling.
We also demonstrate how combinations of the same indices can be used to detect long-duration acoustic events (such as heavy rain and cicada choruses) and to construct long-duration (24 h) spectrograms.
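
As an illustration of the ranking step, the sketch below computes one index and then ranks segments. The index is a normalised temporal entropy of an amplitude envelope, in one common formulation. The ranking orders one-minute segments by a weighted sum of min-max-normalised indices. The normalisation, the weights and the index names are assumptions. The abstract gives neither the full set of 14 indices nor the weightings actually used.

```python
import math


def temporal_entropy(envelope):
    """Normalised Shannon entropy (0..1) of a non-negative amplitude envelope.

    Low values mean the energy is concentrated in a few samples.
    """
    total = sum(envelope)
    n = len(envelope)
    if total == 0 or n < 2:
        return 0.0
    probs = [a / total for a in envelope if a > 0]
    return -sum(p * math.log2(p) for p in probs) / math.log2(n)


def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def rank_minutes(indices, weights):
    """Rank one-minute segments by a weighted 'acoustic richness' score.

    indices: {index_name: [value for each one-minute segment]}
    weights: {index_name: weight}
    Returns segment numbers, highest score first.
    """
    n = len(next(iter(indices.values())))
    scores = [0.0] * n
    for name, w in weights.items():
        for i, v in enumerate(min_max(indices[name])):
            scores[i] += w * v
    return sorted(range(n), key=lambda i: scores[i], reverse=True)
```

Low temporal entropy usually indicates concentrated acoustic activity. An entropy index may therefore need a negative weight, or replacement by 1 minus the entropy, so that higher scores mean richer minutes. Listening to segments in the returned order is what "directing the sampling" amounts to in this sketch.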