972 results for Audio acoustics


Relevance:

20.00%

Publisher:

Abstract:

Audio-visual speech recognition, the combination of visual lip-reading with traditional acoustic speech recognition, has previously been shown to provide a considerable improvement over acoustic-only approaches in noisy environments such as an automotive cabin. The research presented in this paper extends the established audio-visual speech recognition literature to show that further improvements in speech recognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visual speech recognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) is conducted on the four-camera AVICAR automotive audio-visual speech database. We study the relative contributions of the side and centrally oriented cameras to improving visual speech recognition accuracy. Finally, combining the four visual streams with a single audio stream in a five-stream SHMM yields a relative improvement of over 56% in word recognition accuracy compared with the acoustic-only approach in the noisiest conditions of the AVICAR database.

Relevance:

20.00%

Publisher:

Abstract:

This paper presents a novel technique for segmenting an audio stream into homogeneous regions according to speaker identity, background noise, music, and environmental and channel conditions. Audio segmentation is useful in audio diarization systems, which aim to annotate an input audio stream with information that attributes temporal regions of the audio to their specific sources. The segmentation method introduced in this paper uses the Generalized Likelihood Ratio (GLR), computed between two adjacent sliding windows over preprocessed speech. This approach is inspired by the popular segmentation method proposed in the pioneering work of Chen and Gopalakrishnan, which uses the Bayesian Information Criterion (BIC) with an expanding search window. This paper aims to identify and address the shortcomings associated with such an approach. The proposed segmentation strategy is evaluated on the 2002 Rich Transcription (RT-02) Evaluation dataset, and a miss rate of 19.47% and a false alarm rate of 16.94% are achieved at the optimal threshold.
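
The GLR comparison of two adjacent windows described above can be illustrated with a minimal sketch. It assumes each window is modelled by a single univariate Gaussian over scalar features, which is a simplification chosen for illustration and not necessarily the model used in the paper; the function names, the window length and the variance floor are all hypothetical.

```python
import math
from statistics import pvariance


def gaussian_glr_distance(left, right, var_floor=1e-8):
    """Negative log GLR between two adjacent windows of scalar features.

    Assumption: each window (and their union) is modelled by one
    maximum-likelihood univariate Gaussian. Larger values mean the two
    windows are less likely to come from the same source.
    """
    merged = list(left) + list(right)

    def half_n_log_var(window):
        # Floor the variance so a constant window does not produce log(0).
        variance = max(pvariance(window), var_floor)
        return 0.5 * len(window) * math.log(variance)

    return half_n_log_var(merged) - half_n_log_var(left) - half_n_log_var(right)


def glr_curve(features, window=20):
    """Slide two adjacent fixed-size windows and score every boundary t."""
    return [
        (t, gaussian_glr_distance(features[t - window:t], features[t:t + window]))
        for t in range(window, len(features) - window + 1)
    ]


# Two synthetic "sources" that change at sample 40.
features = [0.0, 1.0] * 20 + [10.0, 11.0] * 20
best_t, best_score = max(glr_curve(features), key=lambda item: item[1])
print(best_t, round(best_score, 2))
```

In a segmenter of this kind, peaks of the curve above a tuned threshold would be treated as candidate change points; the threshold trades misses against false alarms.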

Relevance:

20.00%

Publisher:

Abstract:

In this paper we propose and evaluate a speaker attribution system using a complete-linkage clustering method. Speaker attribution refers to the annotation of a collection of spoken audio based on speaker identities. This can be achieved using diarization and speaker linking. The main challenge associated with attribution is achieving computational efficiency when dealing with large audio archives. Traditional agglomerative clustering methods with model merging and retraining are not feasible for this purpose. This has motivated the use of linkage clustering methods without retraining. We first propose a diarization system using complete-linkage clustering and show that it outperforms traditional agglomerative and single-linkage clustering-based diarization systems, with relative improvements of 40% and 68%, respectively. We then propose a complete-linkage speaker linking system to achieve attribution and demonstrate a 26% relative improvement in attribution error rate (AER) over the single-linkage speaker linking approach.
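
As a rough illustration of linkage clustering without retraining, the sketch below merges clusters using only a fixed matrix of pairwise distances, so no model is re-estimated after a merge. How the pairwise distances between speaker segments are computed is not stated in the abstract; the distance matrix, the stopping threshold and the function name are assumptions made for this example.

```python
def complete_linkage(distance, threshold):
    """Agglomerative clustering with complete linkage on a fixed distance matrix.

    distance: symmetric list-of-lists of pairwise distances between items
              (hypothetically, speaker segments scored against each other).
    threshold: merging stops once the closest pair of clusters is farther
               apart than this value.
    """
    clusters = [[i] for i in range(len(distance))]

    def linkage(a, b):
        # Complete linkage: the distance between two clusters is the
        # largest distance between any pair of their members.
        return max(distance[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy example: five items on a line stand in for five speaker segments.
points = [0.0, 0.1, 0.2, 5.0, 5.1]
matrix = [[abs(p - q) for q in points] for p in points]
result = complete_linkage(matrix, threshold=1.0)
print(result)
```

Replacing `max` with `min` in `linkage` gives single-linkage clustering, the baseline the abstract compares against.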

Relevance:

20.00%

Publisher:

Abstract:

Background: Optimal adherence to antiretroviral therapy (ART) is necessary for people living with HIV/AIDS (PLHIV). There have been relatively few systematic analyses of factors that promote or inhibit adherence to antiretroviral therapy among PLHIV in Asia. This study assessed ART adherence and examined factors associated with suboptimal adherence in northern Viet Nam. Methods: Data from 615 PLHIV on ART in two urban and three rural outpatient clinics were collected by medical record extraction and from patient interviews using audio computer-assisted self-interview (ACASI). Results: The prevalence of suboptimal adherence was estimated to be 24.9% via a visual analogue scale (VAS) of past-month dose-missing and 29.1% using a modified Adult AIDS Clinical Trial Group scale for on-time dose-taking in the past 4 days. Factors significantly associated with the more conservative VAS score were: depression (p < 0.001), side-effect experiences (p < 0.001), heavy alcohol use (p = 0.001), chance health locus of control (p = 0.003), low perceived quality of information from care providers (p = 0.04) and low social connectedness (p = 0.03). Illicit drug use alone was not significantly associated with suboptimal adherence, but interacted with heavy alcohol use to reduce adherence (p < 0.001). Conclusions: This is the largest survey of ART adherence yet reported from Asia and the first in a developing country to use the ACASI method in this context. The evidence strongly indicates that ART services in Viet Nam should include screening and treatment for depression, linkage with alcohol and/or drug dependence treatment, and counselling to address the belief that chance or luck determines health outcomes.

Relevance:

20.00%

Publisher:

Abstract:

Bioacoustic data can provide an important base for environmental monitoring. To explore the large volume of field recordings collected, this paper presents an automated similarity search algorithm. A user provides a region of an audio recording defined by frequency and time bounds, and the content of that region is used to construct a query. In the retrieval process, our algorithm automatically scans through recordings to search for similar regions. In detail, we present a feature extraction approach based on the visual content of vocalisations (in this case, ridges) and develop a generic regional representation of vocalisations for indexing. Our feature extraction method works best for bird vocalisations showing ridge characteristics. The regional representation method allows the content of an arbitrary region of a continuous recording to be described in a compressed format.

Relevance:

20.00%

Publisher:

Abstract:

Interpreting acoustic recordings of the natural environment is an increasingly important technique for ecologists wishing to monitor terrestrial ecosystems. Technological advances make it possible to accumulate many more recordings than can be listened to or interpreted, thereby necessitating automated assistance to identify elements in the soundscape. In this paper we examine the problem of estimating avian species richness by sampling from very long acoustic recordings. We work with data recorded under natural conditions and with all the attendant problems of undefined and unconstrained acoustic content (such as wind, rain and traffic), which can mask content of interest (in our case, bird calls). We describe 14 acoustic indices calculated at one-minute resolution for the duration of a 24-hour recording. An acoustic index is a statistic that summarizes some aspect of the structure and distribution of acoustic energy and information in a recording. Some of the indices we calculate are standard (e.g. signal-to-noise ratio), some have been reported useful for the detection of bioacoustic activity (e.g. temporal and spectral entropies) and some are directed to avian sources (spectral persistence of whistles). We rank the one-minute segments of a 24-hour recording in descending order according to an "acoustic richness" score, which is derived from a single index or a weighted combination of two or more. We describe combinations of indices which lead to more efficient estimates of species richness than random sampling from the same recording, where efficiency is defined as total species identified for a given listening effort. Using random sampling, we achieve a 53% increase in species recognized over traditional field surveys, and an increase of 87% using combinations of indices to direct the sampling. We also demonstrate how combinations of the same indices can be used to detect long-duration acoustic events (such as heavy rain and cicada chorus) and to construct long-duration (24 h) spectrograms.
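
To make the idea of an acoustic index and of index-directed ranking concrete, here is a small sketch. It computes one commonly used form of temporal entropy (the normalised Shannon entropy of frame energies) and ranks minutes by a weighted sum of index values. The exact index definitions and weights used in the paper are not given in the abstract, so the frame length, the silence convention, the weights and the function names are all assumptions.

```python
import math


def temporal_entropy(samples, frame=4):
    """Normalised Shannon entropy of the energy in consecutive frames.

    Returns a value in [0, 1]: near 1 when energy is spread evenly over
    time, near 0 when it is concentrated in a single frame.
    """
    energies = [
        sum(s * s for s in samples[i:i + frame])
        for i in range(0, len(samples) - frame + 1, frame)
    ]
    if len(energies) < 2:
        raise ValueError("need at least two full frames")
    total = sum(energies)
    if total == 0:
        return 1.0  # convention chosen here: silence counts as evenly spread
    probs = [e / total for e in energies]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(energies))


def rank_minutes(index_rows, weights):
    """Order minute numbers by a weighted sum of their index values, best first."""
    scores = [sum(w * v for w, v in zip(weights, row)) for row in index_rows]
    return sorted(range(len(scores)), key=lambda m: scores[m], reverse=True)


flat = temporal_entropy([1.0] * 16)
spike = temporal_entropy([0.0] * 15 + [1.0])
order = rank_minutes([(0.2, 0.1), (0.9, 0.5), (0.5, 0.5)], weights=(1.0, 1.0))
print(flat, spike, order)
```

A listener would then audit minutes in the returned order rather than at random, which is the sense in which the indices direct the sampling.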

Relevance:

20.00%

Publisher:

Abstract:

Acoustic recordings of the environment are an important aid to ecologists monitoring biodiversity and environmental health. However, rapid advances in recording technology, storage and computing make it possible to accumulate thousands of hours of recordings, of which ecologists can listen to only a small fraction. The big-data challenge is to visualize the content of long-duration audio recordings on multiple scales, from hours and days to months and years. The visualization should facilitate navigation and yield ecologically meaningful information. Our approach is to extract (at one-minute resolution) acoustic indices which reflect content of ecological interest. An acoustic index is a statistic that summarizes some aspect of the distribution of acoustic energy in a recording. We combine indices to produce false-colour images that reveal acoustic content and facilitate navigation through recordings that are months or even years in duration.
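
One simple way to combine indices into a false-colour image is to assign three index series to the red, green and blue channels after rescaling each to the 0 to 255 range. The sketch below does this with one pixel per minute; it is a deliberately reduced illustration, and the choice of min-max scaling, the channel assignment and the function name are assumptions not taken from the paper.

```python
def false_colour_row(red_index, green_index, blue_index):
    """Turn three per-minute index series into one row of (R, G, B) pixels."""

    def scale(values):
        # Min-max normalise to 0..255; a constant series maps to 0.
        lo, hi = min(values), max(values)
        span = hi - lo
        if span == 0:
            return [0 for _ in values]
        return [round(255 * (v - lo) / span) for v in values]

    return list(zip(scale(red_index), scale(green_index), scale(blue_index)))


# Three minutes of three hypothetical indices.
pixels = false_colour_row([0.0, 1.0, 4.0], [5.0, 5.0, 5.0], [4.0, 1.0, 0.0])
print(pixels)
```

Stacking such rows (for example, one row per frequency band or per day) would give an image in which colour changes mark changes in acoustic content.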

Relevance:

20.00%

Publisher:

Abstract:

Environmental monitoring is becoming critical as human activity and climate change place greater pressure on biodiversity, increasing the need for data to support informed decisions. Acoustic sensors can collect data across large areas for extended periods, which makes them attractive for environmental monitoring. However, managing and analysing large volumes of environmental acoustic data is a major challenge that hinders effective use of the large datasets collected. This paper presents an overview of our current techniques for collecting, storing and analysing large volumes of acoustic data efficiently, accurately and cost-effectively.

Relevance:

20.00%

Publisher:

Abstract:

The present paper explores extreme car audio systems and the culture and practices that surround car audio competitions. I begin by examining whether, and how, car audio can be thought of as a 'music scene' and in what ways the culture and practice of car audio may fit within post-subcultural discourses. Following this, I offer a description of car audio competitions, revealing some of the practices that define this aspect of car audio scenes. In particular, I concentrate on sound pressure level (SPL) competitions and some of the interesting aspects of the SPL scene. Finally, I briefly examine how the powerful effects (and affects) of bass frequencies are an important part of the attraction of loud car audio systems and how car audio systems contribute to the territorializing of urban spaces.

Relevance:

20.00%

Publisher:

Abstract:

This is an exploratory study into the effective use of embedded custom-made audiovisual case studies (AVCS) to enhance students' learning experience. This paper describes a project that used AVCS for a large, divergent cohort of undergraduate students enrolled in an International Business course. The study makes a number of key contributions to advancing learning and teaching within the discipline. AVCS provide first-hand reporting of the case material, allowing students to improve their understanding from both verbal and nonverbal cues. The paper demonstrates how AVCS can be embedded in a student-centred teaching approach to capture students' interest and to encourage a deep approach to learning by providing authentic real-world experience.

Relevance:

20.00%

Publisher:

Abstract:

Environmental monitoring has become increasingly important due to the significant impact of human activities and climate change on biodiversity. Environmental sound sources such as rain and insect vocalizations are a rich and underexploited source of information in environmental audio recordings. This paper is concerned with the classification of rain within acoustic sensor recordings. We present the novel application of a set of features for classifying environmental acoustics: acoustic entropy, the acoustic complexity index, spectral cover, and background noise. In order to improve the performance of the rain classification system, we automatically classify segments of environmental recordings into the classes of heavy rain or non-rain. A decision tree classifier is experimentally compared with other classifiers. The experimental results show that our system is effective in classifying segments of environmental audio recordings, with an accuracy of 93% for the binary classification of heavy rain/non-rain.

Relevance:

20.00%

Publisher:

Abstract:

This paper presents a low-bandwidth multi-robot communication system designed to serve as a backup communication channel in the event that a robot suffers a network device fault. While much research has been performed in the area of distributing network communication across multiple robots within a system, individual robots are still susceptible to hardware failure. In the past, such robots would simply be removed from service and their tasks re-allocated to other members. However, there are times when a faulty robot might be crucial to a mission, or be able to contribute in a less communication-intensive area. By allowing robots to encode messages as unique sequences of DTMF symbols, called words, and to decode them again, our system facilitates continued low-bandwidth communication between robots without access to network communication. Our results show that the system is capable of permitting robots to negotiate task initiation and termination, and is flexible enough to permit a pair of robots to perform a simple turn-taking task.
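
The idea of encoding messages as sequences of DTMF symbols can be sketched as follows. The tone frequencies and keypad layout are the standard DTMF ones; the mapping from message bytes to symbols (two symbols per byte, one per 4-bit half) is purely an assumption for illustration, since the abstract does not describe the paper's actual word codebook, and all names here are hypothetical.

```python
# Standard DTMF grid: each key is one low (row) tone plus one high (column) tone.
DTMF_ROWS_HZ = (697, 770, 852, 941)
DTMF_COLS_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")

# Hypothetical ordering that maps the values 0..15 onto the 16 DTMF symbols.
ALPHABET = "0123456789ABCD*#"


def symbol_frequencies(symbol):
    """Return the (low, high) tone pair in Hz for one DTMF symbol."""
    if len(symbol) == 1:
        for r, row in enumerate(KEYPAD):
            c = row.find(symbol)
            if c != -1:
                return DTMF_ROWS_HZ[r], DTMF_COLS_HZ[c]
    raise ValueError(f"not a DTMF symbol: {symbol!r}")


def encode_word(message: bytes) -> str:
    """Encode each byte as two DTMF symbols (high half first)."""
    return "".join(ALPHABET[b >> 4] + ALPHABET[b & 0x0F] for b in message)


def decode_word(word: str) -> bytes:
    """Invert encode_word; raises ValueError on malformed input."""
    if len(word) % 2:
        raise ValueError("a word must contain an even number of symbols")
    values = [ALPHABET.index(s) for s in word]
    return bytes((hi << 4) | lo for hi, lo in zip(values[::2], values[1::2]))


word = encode_word(b"GO")
print(word, [symbol_frequencies(s) for s in word], decode_word(word))
```

A real system would also need tone generation, detection in noise and framing between words, none of which is covered by this sketch.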

Relevance:

20.00%

Publisher:

Abstract:

It is impracticable to upgrade the 18,900 Australian passive crossings, as such crossings are often located in remote areas where power is lacking and road and rail traffic is low. The rail industry is interested in developing innovative in-vehicle technology interventions to warn motorists of approaching trains directly in their vehicles. The objective of this study was therefore to evaluate the benefits of introducing such technology. We evaluated the changes in driver performance once the technology is enabled and functioning correctly, as well as the effects of an unsafe failure of the technology. We conducted a driving simulator study in which participants (N=15) were familiarised with an in-vehicle audio warning for an extended period. After participants had been familiarised with the system, the technology started failing, and we tested the reaction of drivers with a train approaching. This study has shown that at traditional passive crossings with RX2 signage, the majority of drivers (70%) complied and looked for trains on both sides of the rail track. With the introduction of the in-vehicle audio message, drivers did not approach crossings faster, did not reduce their safety margins and did not reduce their gaze towards the rail tracks. However, participants' compliance at the stop sign decreased by 16.5% with the technology installed in the vehicle. When the in-vehicle audio warning failed, most participants did not experience difficulties in detecting the approaching train even though they did not receive any warning message. This showed that participants were still actively looking for trains with the system in their vehicle. However, two participants did not stop and one decided to beat the train when they did not receive the audio message, suggesting potential human factors issues to be considered with such technology.