Digital Library

929 results for Audio acoustics

Cross database training of audio-visual hidden Markov models for phone recognition

Relevance:

20.00%

Publisher:

Abstract:

Speech recognition can be improved by using visual information, in the form of the speaker's lip movements, in addition to audio information. To date, state-of-the-art techniques for audio-visual speech recognition continue to train their models on audio and visual data from the same database. In this paper, we present a new approach that makes use of one modality of an external dataset in addition to a given audio-visual dataset. In this way, more powerful models can be created from other, extensive audio-only databases and then adapted to our comparatively small multi-stream databases. Results show that, for phone recognition, the presented approach outperforms the widely adopted synchronous hidden Markov models (HMMs) trained jointly on the audio and visual data of a given audio-visual database by 29% relative. It also outperforms external audio models trained on extensive external audio datasets, and internal audio models, by 5.5% and 46% relative, respectively. We also show that the proposed approach is beneficial in noisy environments, where the audio source is affected by environmental noise.

Similarity-based birdcall retrieval from environmental audio

Relevance:

20.00%

Publisher:

Abstract:

Automated digital recordings are useful for large-scale temporal and spatial environmental monitoring. An important research effort has been the automated classification of calling bird species. In this paper we examine a related task: retrieving, from a database of audio recordings, birdcalls that are similar to a user-supplied query call. Such a retrieval task can sometimes be more useful than an automated classifier. We compare three approaches to similarity-based birdcall retrieval, using spectral ridge features and two kinds of gradient features: the structure tensor and the histogram of oriented gradients. The retrieval accuracy of our spectral ridge method is 94%, compared with 82% for the structure tensor method and 90% for the histogram of gradients method. Additionally, the spectral ridge approach potentially offers a more compact representation and is more computationally efficient.
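To make "similarity-based retrieval" concrete, the sketch below ranks database recordings by cosine similarity between fixed-length feature vectors and a query vector. It is a minimal illustration under stated assumptions: the feature vectors are hypothetical placeholders (the paper's spectral ridge, structure tensor and HOG features are not reproduced), and cosine similarity is an assumed choice of similarity measure, not necessarily the authors'.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, database, k=3):
    """Return the k database keys whose feature vectors are most similar to the query."""
    scored = [(cosine_similarity(query, feats), key) for key, feats in database.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Hypothetical feature vectors standing in for extracted birdcall features.
db = {
    "call_a": [1.0, 0.0, 0.2],
    "call_b": [0.0, 1.0, 0.0],
    "call_c": [0.9, 0.1, 0.3],
}
print(retrieve([1.0, 0.0, 0.25], db, k=2))
```

Returning a ranked list rather than a single label is what distinguishes retrieval from classification: the user inspects the top matches instead of trusting one decision.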

Impedance tube technology for flow acoustics

Relevance:

20.00%

Publisher:

Abstract:

The acoustic impedance of a termination, or of a passive subsystem, needs to be measured not only for acoustic lining materials but also in the exhaust systems of flow machinery, where mean flow introduces peculiar problems. Of the various methods for measuring acoustic impedance, the discrete-frequency, steady-state impedance tube method [1] is the most reliable; although time-consuming, it requires no special instrumentation.
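As background for the impedance tube method, the sketch below applies the textbook standing-wave relations for the case *without* mean flow: the reflection coefficient magnitude follows from the standing wave ratio, its phase from the position of the first pressure minimum, and the normalized impedance from the reflection coefficient. The assumptions are loud ones: an e^{jωt} time convention, distances measured from the termination, and no flow. The mean-flow case that the abstract is concerned with modifies these relations and is not covered here.

```python
import cmath
import math


def impedance_from_swr(swr, x_min, wavelength):
    """Normalized impedance z = Z / (rho * c) of a tube termination, assuming no mean flow.

    swr: standing wave ratio |p|max / |p|min (finite, >= 1)
    x_min: distance from the termination to the first pressure minimum
    wavelength: acoustic wavelength at the test frequency (same units as x_min)
    """
    k = 2 * math.pi / wavelength          # wavenumber
    magnitude = (swr - 1) / (swr + 1)     # |R| from the standing wave ratio
    phase = 2 * k * x_min - math.pi       # phase of R from the first minimum
    r = cmath.rect(magnitude, phase)      # complex reflection coefficient
    return (1 + r) / (1 - r)


# Example: SWR of 3 with the first minimum a quarter wavelength from the termination.
print(impedance_from_swr(3.0, 0.25, 1.0))
```

A minimum exactly a quarter wavelength away gives a purely real reflection coefficient of 0.5 and hence a real normalized impedance of 3; shifting the minimum changes the reactive part.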

Perceptions of audio feedback in higher education assessment

Relevance:

20.00%

Publisher:

Abstract:

The aim of this paper is to present the results of research investigating the effectiveness of audio feedback in a third-year undergraduate unit. While there is a large and growing body of literature about providing assessment feedback, little of it focuses on the use of audio media. This study employs a mixed-method approach, involving semi-structured interviews with academic staff and a survey of students. Analysis of the interview data suggests that there are a number of issues surrounding lecturers' acceptance of audio feedback. The next stage of the study is to examine the extent to which lecturers change their perceptions as they use audio feedback, and to analyse the perceptions of the students (n=120), including the perceived importance of feedback, the ways in which they used the audio feedback and the extent to which they believe they control events that affect them. Ultimately, this study seeks to provide recommendations for implementing audio feedback in higher education.

'Take away from the dry sixties style marking': Lecturer and student perceptions and experiences of audio feedback

Relevance:

20.00%

Publisher:

Abstract:

Providing audio feedback on assessment is relatively uncommon in higher education. However, published research suggests that students prefer it to written feedback, whereas lecturers are less convinced. The aim of this paper is to examine these findings further in the context of a third-year business ethics unit. Data were collected from two sources. The first is a series of in-depth, semi-structured interviews conducted with three lecturers providing audio feedback for the first time in Semester One 2011. The second source of data was drawn from the university student evaluation system. A total of 363 responses were used, providing 'before' and 'after' perspectives on the effectiveness of audio feedback versus written feedback. Between 2005 and 2009 the survey data provided information about student attitudes to written assessment feedback (n=261). From 2010 onwards the data relate to audio (mp3) feedback (n=102). The analysis of the interview data indicated that audio feedback should be introduced with care. The perceptions of the participating lecturers were mixed, ranging from scepticism to outright enthusiasm, but over time their overall attitude became positive. It was found that particular attention needs to be paid to small (but important) technical details, and that lecturers need to be convinced of its effectiveness, and in particular that it is not necessarily more time-consuming than providing written feedback. For students, the analysis revealed a clear preference for audio feedback. It is concluded that there is both cause for concern and reason for optimism. There is cause for concern because the scepticism of academic staff may be based on assumptions about what students prefer and on concerns about using the technology. There is reason for optimism because the evidence points towards students preferring audio feedback, and as academic staff become more familiar with the technology their scepticism tends to evaporate.
While this study is limited in scope, questions are raised about tackling negative staff perceptions of audio feedback that are worthy of further research.

Content-based birdcall retrieval from environmental audio

Relevance:

20.00%

Publisher:

Abstract:

This research investigates techniques for analysing long-duration acoustic recordings to help ecologists monitor birdcall activity. It develops a generalized algorithm capable of identifying a broad range of bird species. The algorithm allows ecologists to search for arbitrary birdcalls of interest, rather than restricting them to the small number of species on which a recogniser has been trained. It can help ecologists find sounds of interest more efficiently by filtering out large volumes of unwanted sounds and focusing only on birdcalls.

Effects of unilateral vocal fold paralysis treatment with fascia augmentation on airflow mechanics, tracheal sounds and voice acoustics

Relevance:

20.00%

Publisher:

Abstract:

Data on the influence of unilateral vocal fold paralysis on breathing, especially data other than those obtained by spirometry, are relatively scarce. Even less is known about the effect of its treatment by vocal fold medialization. Consequently, there was a need to study the issue by combining multiple instruments capable of assessing airflow dynamics and voice. This need was emphasized by a recently developed medialization technique, autologous fascia injection, whose effects on breathing have not previously been investigated. A cohort of ten patients with unilateral vocal fold paralysis was studied before and after autologous fascia injection using flow-volume spirometry, body plethysmography and acoustic analysis of breathing and voice. Preoperative results were compared with those of ten healthy controls. A second cohort of 11 subjects with unilateral vocal fold paralysis was studied pre- and postoperatively using flow-volume spirometry, impulse oscillometry, acoustic analysis of voice, the voice handicap index and subjective assessment of dyspnoea. Preoperative peak inspiratory flow and specific airway conductance were significantly lower, and airway resistance was significantly higher, in the patients than in the healthy controls (78% vs. 107%, 73% vs. 116% and 182% vs. 125% of predicted; p = 0.004, p = 0.004 and p = 0.026, respectively). Patients had a higher root mean square of the spectral power of tracheal sounds than controls, and three of them had wheezes, whereas no wheezing was found in healthy subjects. Autologous fascia injection significantly improved the acoustic parameters of the voice in both cohorts and the voice handicap index in the second cohort, indicating that this procedure successfully improved voice in unilateral vocal fold paralysis.
Peak inspiratory flow decreased significantly as a consequence of this procedure (from 4.54 ± 1.68 l to 4.21 ± 1.26 l, p = 0.03, in pooled data of both cohorts), but no change occurred in the other variables of flow-volume spirometry, body plethysmography and impulse oscillometry. Eight of the ten patients studied by acoustic analysis of breathing had wheezes after vocal fold medialization, compared with only three patients before the procedure, and the numbers of wheezes per recorded inspirium and expirium increased significantly (from 0.02 to 0.42 and from 0.03 to 0.36; p = 0.028 and p = 0.043, respectively). In conclusion, unilateral vocal fold paralysis was observed to disturb forced breathing and also to cause some signs of disturbed tidal breathing. Findings of flow-volume spirometry were consistent with variable extrathoracic obstruction. Vocal fold medialization by autologous fascia injection improved the quality of the voice in patients with unilateral vocal fold paralysis, but also decreased peak inspiratory flow and induced wheezing during tidal breathing. However, these airflow changes did not appear to cause significant symptoms in patients.

Using multi-label classification for acoustic pattern detection and assisting bird species surveys

Relevance:

20.00%

Publisher:

Abstract:

Acoustics is a rich source of environmental information that can reflect ecological dynamics. To deal with the escalating volume of acoustic data, a variety of automated classification techniques have been used for acoustic pattern or scene recognition, covering urban soundscapes such as streets and restaurants, and natural soundscapes such as rain and thunder. Acoustic patterns are commonly classified under the assumption that a single type of soundscape is present in an audio clip. This assumption is reasonable for some carefully selected audio clips. However, few experiments have focused on classifying simultaneous acoustic patterns in long-duration recordings. This paper proposes a multi-label classification approach based on binary relevance to recognise simultaneous acoustic patterns in one-minute audio clips. Using acoustic indices as global features and a multilayer perceptron as the base classifier, we achieve good classification performance on in-the-field data. Compared with single-label classification, the multi-label approach provides more detailed information about the distributions of various acoustic patterns in long-duration recordings. These results can support further biodiversity investigations, such as bird species surveys.
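Binary relevance turns a multi-label problem into one independent binary problem per label, so a clip can receive several labels at once. The sketch below illustrates that decomposition only. Assumptions: a single-layer perceptron is swapped in for the paper's multilayer perceptron to keep the code dependency-free, and the two features and the labels "birds" and "rain" are hypothetical stand-ins for acoustic indices and acoustic patterns.

```python
def train_perceptron(features, targets, epochs=50, lr=0.1):
    """Train a single-layer perceptron for one binary label; returns (weights, bias)."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, t in zip(features, targets):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else 0
            error = t - predicted  # -1, 0 or +1
            if error:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias


def train_binary_relevance(features, label_sets, labels):
    """Binary relevance: train one independent binary classifier per label."""
    models = {}
    for label in labels:
        targets = [1 if label in s else 0 for s in label_sets]
        models[label] = train_perceptron(features, targets)
    return models


def predict(models, x):
    """Return the set of labels whose classifier fires for feature vector x."""
    result = set()
    for label, (weights, bias) in models.items():
        if sum(w * xi for w, xi in zip(weights, x)) + bias > 0:
            result.add(label)
    return result


# Hypothetical per-clip features (two made-up "acoustic indices") and label sets.
X = [[0.9, 0.1], [0.8, 0.9], [0.1, 0.8], [0.2, 0.1], [0.9, 0.8], [0.1, 0.2]]
Y = [{"birds"}, {"birds", "rain"}, {"rain"}, set(), {"birds", "rain"}, set()]
models = train_binary_relevance(X, Y, ["birds", "rain"])
print(predict(models, X[1]))
```

Because each label has its own classifier, co-occurring patterns are detected independently; the trade-off is that binary relevance ignores correlations between labels.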

East Asian audio-visual collaboration and the global expansion of Chinese media

Relevance:

20.00%

Publisher:

Abstract:

In recent years, many of the world’s leading media producers, screenwriters, technicians and investors, particularly those in the Asia-Pacific region, have been drawn to work in the People's Republic of China (hereafter China or Mainland China). Media projects with a lighter commercial entertainment feel, compared with the heavy propaganda-oriented content of the past, have multiplied thanks to the Chinese state’s newfound willingness to consider collaboration with foreign partners. Nowhere is this more evident than in film. Despite their long-standing reputation for rigorous censorship, state policymakers are now encouraging Chinese media entrepreneurs to generate fresh ideas and to develop products that will revitalise the stagnant domestic production sector. It is hoped that an increase in both the quality and quantity of domestic feature films, stimulated by an infusion of creativity and cutting-edge technology from outside the country, will help reverse China’s ‘cultural trade deficit’ (wenhua maoyi chizi) (Keane 2007).

Is there a difference in sound acoustics between aspirating versus non-aspirating swallows in children with dysphagia?

Relevance:

20.00%

Publisher:

Abstract:

Cervical auscultation (CA) may be used to complement the clinical feeding examination when assessing for oropharyngeal aspiration (OPA). Data exist on the acoustic properties of normal and abnormal swallowing sounds in adults and children. However, there are no published paediatric studies comparing the acoustic properties of OPA and non-OPA swallow sounds. We aimed to determine whether there is an acoustic difference between OPA and non-OPA swallow sounds, as identified by modified barium swallow (MBS), in children.

C. V. Raman Centenary Symposium on Acoustics, 25-28 October 1988

Relevance:

20.00%

Publisher:

Reduced Rate Ultra Low Delay Audio Coder using Multistage Vector Quantization

Relevance:

20.00%

Publisher:

Abstract:

Communication applications are usually delay-constrained, especially when musicians play together over the Internet. This requires a one-way delay of at most 25 ms, while high audio quality is also desired at feasible bit rates. The ultra low delay (ULD) audio coding structure is well suited to this application, and we further investigate the application of multistage vector quantization (MSVQ) to reach bit rates below 64 kb/s in a scalable manner. Results at 32 kb/s and 64 kb/s show that MSVQ with trained codebooks performs best, outperforming both KLT normalization followed by a simulated Gaussian MSVQ and a simulated Gaussian MSVQ alone. The results also show that there is only a weak dependence on the training data, and that at 64 kb/s we indeed converge to the perceptual quality of our previous ULD coder.
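The core idea of multistage vector quantization is that each stage quantizes the residual left by the previous stages, and the decoder sums the selected codewords. The sketch below shows that encode/decode loop with tiny hand-made codebooks; the codebooks and vectors are hypothetical, and the paper's trained codebooks, KLT normalization and ULD coder structure are not reproduced.

```python
def nearest(codebook, vector):
    """Index of the codeword with the smallest squared Euclidean distance to vector."""
    def sq_dist(c):
        return sum((ci - vi) ** 2 for ci, vi in zip(c, vector))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def msvq_encode(vector, stages):
    """Multistage VQ: each stage quantizes the residual left by the previous stages."""
    indices = []
    residual = list(vector)
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def msvq_decode(indices, stages):
    """Reconstruction is the sum of the selected codewords from each decoded stage."""
    out = [0.0] * len(stages[0][0])
    for i, codebook in zip(indices, stages):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


# Hypothetical two-stage codebooks: a coarse first stage and a finer second stage.
stage1 = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]]
stage2 = [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]]
stages = [stage1, stage2]

idx = msvq_encode([1.2, 1.0], stages)
print(idx, msvq_decode(idx, stages))
print(msvq_decode(idx[:1], stages[:1]))  # coarser reconstruction from the first stage only
```

Decoding only a prefix of the stages still yields a valid, coarser reconstruction, which is one way the stage structure lends itself to the scalable bit rates mentioned in the abstract.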

Analysis and Synthesis of Reverberation for Parametric Stereo Audio

Relevance:

20.00%

Publisher:

Abstract:

We propose parametric stereo coding analysis and synthesis performed directly in the MDCT domain, using analysis-by-synthesis parameter estimation. The stereo signal is represented by an equalized sum signal and spatialization parameters, both of which are obtained by sub-band analysis in the MDCT domain. The de-correlated signal required for the stereo synthesis is also generated in the MDCT domain. A subjective evaluation test using MUSHRA shows that the synthesized stereo signal is perceptually satisfactory and comparable to state-of-the-art parametric coders.
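The abstract does not list which spatialization parameters are used, so the sketch below assumes two that are common in parametric stereo: a per-band inter-channel level difference (ILD, in dB) and an inter-channel correlation (ICC), computed from real-valued coefficients such as one MDCT frame per channel. The band layout is hypothetical, and the sum signal here is a plain (L + R) / 2 downmix; the paper's equalization and analysis-by-synthesis estimation are not reproduced.

```python
import math


def band_parameters(left, right, band_edges):
    """Per-band ILD (dB) and ICC from real-valued transform coefficients.

    left, right: coefficients of one frame for each channel (assumed real, e.g. MDCT)
    band_edges: list of (start, stop) index pairs defining the sub-bands
    """
    eps = 1e-12  # avoids log(0) and division by zero in silent bands
    params = []
    for start, stop in band_edges:
        l = left[start:stop]
        r = right[start:stop]
        e_l = sum(x * x for x in l)
        e_r = sum(x * x for x in r)
        cross = sum(a * b for a, b in zip(l, r))
        ild_db = 10 * math.log10((e_l + eps) / (e_r + eps))
        icc = cross / math.sqrt((e_l + eps) * (e_r + eps))
        params.append((ild_db, icc))
    return params


def sum_signal(left, right):
    """Plain downmix (L + R) / 2; the paper's equalization step is not reproduced."""
    return [(a + b) / 2 for a, b in zip(left, right)]


# Hypothetical frame: the right channel is the left channel attenuated by half.
left = [1.0, 2.0, 3.0, 4.0]
right = [0.5, 1.0, 1.5, 2.0]
print(band_parameters(left, right, [(0, 2), (2, 4)]))
print(sum_signal(left, right))
```

Halving the amplitude of one channel quarters its energy, so each band reports an ILD of about 6.02 dB with an ICC of 1, since the channels differ only in level.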

A hybrid pre-whitening technique for detection of additive spread spectrum watermarks in audio signals

Relevance:

20.00%

Publisher:

Abstract:

Pre-whitening techniques are employed in blind correlation detection of additive spread spectrum watermarks in audio signals to reduce host signal interference. In this paper, a direct deterministic whitening (DDW) scheme is derived from a frequency-domain analysis of the time-domain correlation process. Our experimental studies reveal that Savitzky-Golay Whitening (SGW), which is otherwise inferior to the DDW technique, performs better when the audio signal is predominantly lowpass. The novelty of this paper lies in exploiting the complementary nature of the two whitening techniques to obtain a hybrid whitening (HbW) scheme, in which the DDW and SGW techniques are selectively applied based on the short-time spectral characteristics of the audio signal. The hybrid scheme extends the reliability of watermark detection to a wider range of audio signals.
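To show why pre-whitening helps correlation detection, the sketch below embeds a ±1 spread spectrum watermark in a strong lowpass host and correlates after a first-difference filter, which suppresses the slowly varying host. This first-difference filter is a deliberately simple stand-in; it is not the paper's DDW, SGW or hybrid scheme. The host signal, sequence length, seed and embedding strength `alpha` are all assumptions chosen for illustration.

```python
import math
import random


def first_difference(signal):
    """Simple high-pass pre-whitening filter: e[n] = s[n] - s[n-1]."""
    return [b - a for a, b in zip(signal, signal[1:])]


def detection_statistic(received, watermark):
    """Normalized correlation between the whitened received signal and whitened watermark."""
    e = first_difference(received)
    w = first_difference(watermark)
    return sum(a * b for a, b in zip(e, w)) / len(e)


# Hypothetical test signal: a strong low-frequency host plus a +/-1 watermark sequence.
rng = random.Random(1234)
n = 4000
host = [10 * math.sin(2 * math.pi * 0.001 * i) for i in range(n)]
watermark = [rng.choice((-1.0, 1.0)) for _ in range(n)]
alpha = 0.5  # embedding strength (assumed)
marked = [h + alpha * w for h, w in zip(host, watermark)]

print(detection_statistic(marked, watermark))  # roughly 1.0: watermark present
print(detection_statistic(host, watermark))    # near 0: watermark absent
```

Because differencing nearly cancels a slowly varying host while leaving the white watermark energy intact, the statistic separates marked and unmarked signals clearly; a host with strong high-frequency content would defeat this simple filter, which is the kind of signal dependence the paper's hybrid scheme addresses.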

Carnatic music analysis: Shadja, swara identification and rAga verification in AlApana using stochastic models

Relevance:

20.00%

Publisher:

Abstract:

We analyze the AlApana of a Carnatic music piece without prior knowledge of the singer or the rAga. The AlApana is a means of communicating to the audience the flavor, or bhAva, of the rAga through its permitted notes and phrases. The input to our analysis is a recording of the vocal AlApana along with the accompanying instrument. The AdhAra shadja (base note) of the singer for that AlApana is estimated through a stochastic model of note frequencies. Based on the shadja, we identify the notes (swaras) used in the AlApana using a semi-continuous GMM. Using the probabilities of each note interval, we recognize the swaras of the AlApana. For sampurNa rAgas, we can identify the possible rAga from the swaras. We achieve correct shadja identification, which is crucial to all further steps, in 88.8% of 55 AlApanas. Among these (48 AlApanas of 7 rAgas), we obtain 91.5% correct swara identification and 62.13% rAga accuracy.
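Once the shadja is known, each pitch can be expressed as an interval in cents above it, which is what makes swara identification possible. The sketch below maps a frequency to the nearest of the twelve swarasthanas assuming equally spaced 100-cent positions. That is a hypothetical simplification: Carnatic intonation is not equal-tempered, and the paper instead models note intervals with a semi-continuous GMM.

```python
import math

# Twelve swarasthanas per octave; positions shared by two names are listed together.
SWARAS = ["S", "R1", "R2/G1", "R3/G2", "G3", "M1", "M2", "P", "D1", "D2/N1", "D3/N2", "N3"]


def cents_above(freq_hz, shadja_hz):
    """Interval in cents between a pitch and the shadja (tonic)."""
    return 1200 * math.log2(freq_hz / shadja_hz)


def nearest_swara(freq_hz, shadja_hz):
    """Map a pitch to the nearest of 12 equally spaced positions, folded into one octave."""
    cents = cents_above(freq_hz, shadja_hz) % 1200
    return SWARAS[round(cents / 100) % 12]  # % 12 wraps values just below the octave to S


# Example with a hypothetical shadja of 220 Hz.
print(nearest_swara(330.0, 220.0))  # a 3:2 ratio, about 702 cents
print(nearest_swara(440.0, 220.0))  # one octave above the shadja
```

Folding into one octave lets pitches sung in the lower or upper register map to the same swara, which is why an accurate shadja estimate matters for every later step.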
