313 results for audio processing


Relevance:

20.00%

Publisher:

Abstract:

An increasingly common scenario in building speech synthesis and recognition systems is training on inhomogeneous data. This paper proposes a new framework for estimating hidden Markov models on data containing both multiple speakers and multiple languages. The proposed framework, speaker and language factorization, attempts to factorize speaker-/language-specific characteristics in the data and then model them using separate transforms. Language-specific factors in the data are represented by transforms based on cluster mean interpolation with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum-likelihood linear regression. Experimental results on statistical parametric speech synthesis show that the proposed framework enables data from multiple speakers in different languages to be used to: train a synthesis system; synthesize speech in a language using speaker characteristics estimated in a different language; and adapt to a new language. © 2012 IEEE.
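The two transform types named in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes diagonal-covariance Gaussians and, for brevity, restricts the constrained maximum-likelihood linear regression (CMLLR) matrix to a diagonal. The function names and all numeric values are hypothetical.

```python
import math


def interpolate_mean(cluster_means, weights):
    """Cluster mean interpolation: mu = sum_i w_i * mu_i.

    cluster_means is a list of mean vectors, weights the (assumed
    language-specific) interpolation weights, one per cluster.
    """
    dim = len(cluster_means[0])
    return [sum(w * m[d] for w, m in zip(weights, cluster_means))
            for d in range(dim)]


def cmllr_log_likelihood(obs, a_diag, bias, mean, var):
    """Log-likelihood of obs under a diagonal Gaussian after the
    feature-space transform x = a * o + b (assumed speaker-specific).

    The log|a| term is the Jacobian of the transform; the full-matrix
    case would use log|det A| instead. Diagonal A is a simplification.
    """
    total = 0.0
    for o, a, b, m, v in zip(obs, a_diag, bias, mean, var):
        x = a * o + b
        total += math.log(abs(a))
        total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


# Hypothetical example: two clusters, two-dimensional features.
mean = interpolate_mean([[1.0, 0.0], [0.0, 1.0]], [0.25, 0.75])
score = cmllr_log_likelihood([0.2, 0.8], [1.1, 0.9], [0.0, 0.05],
                             mean, [1.0, 1.0])
```

Keeping the two steps in separate functions mirrors the factorization idea: the language-dependent part only changes the model mean, while the speaker-dependent part only changes the features.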

Relevance:

20.00%

Publisher:

Abstract:

In this paper, we consider Bayesian interpolation and parameter estimation in a dynamic sinusoidal model. This model is more flexible than the static sinusoidal model because it allows the amplitudes and phases of the sinusoids to vary over time. For the dynamic sinusoidal model, we derive a Bayesian inference scheme for the missing observations, hidden states, and model parameters. The inference scheme is based on a Markov chain Monte Carlo method known as the Gibbs sampler. We illustrate the performance of the inference scheme by applying it to the concealment of lost audio and speech packets. © EURASIP, 2010.
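The abstract does not give the model equations, so the following is only a sketch of one common way to write a dynamic sinusoid: a two-dimensional state is rotated by the angular frequency at every step and perturbed by Gaussian noise, which lets amplitude and phase drift over time. The function name, the parameters, and the noise levels are assumptions made for illustration.

```python
import math
import random


def simulate_dynamic_sinusoid(omega, n, state_std, obs_std,
                              x0=(1.0, 0.0), seed=0):
    """Simulate n samples of a single dynamic sinusoid.

    State update: x_{t+1} = R(omega) x_t + state noise, where R is a
    2-D rotation. Observation: y_t = first state component + noise.
    With state_std = 0 the model reduces to a static sinusoid.
    """
    rng = random.Random(seed)
    c, s = math.cos(omega), math.sin(omega)
    x1, x2 = x0
    samples = []
    for _ in range(n):
        samples.append(x1 + rng.gauss(0.0, obs_std))
        # The right-hand side is evaluated with the old x1, x2.
        x1, x2 = (c * x1 - s * x2 + rng.gauss(0.0, state_std),
                  s * x1 + c * x2 + rng.gauss(0.0, state_std))
    return samples


# Hypothetical settings: a slowly drifting sinusoid with mild noise.
signal = simulate_dynamic_sinusoid(omega=0.3, n=200,
                                   state_std=0.02, obs_std=0.05)
```

In a packet-loss setting, some of these samples would be treated as missing and inferred jointly with the hidden states and parameters; that inference step (the Gibbs sampler) is not shown here.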

Relevance:

20.00%

Publisher:

Abstract:

This paper presents an automatic speaker recognition system for intelligence applications. The system has to provide functionalities for a speaker skimming application in which databases of recorded conversations belonging to an ongoing investigation can be annotated and quickly browsed by an operator. The paper discusses the critical issues introduced by the characteristics of the audio signals under consideration, in particular background noise and channel/coding distortions, as well as the requirements and functionalities of the system under development. It is shown that the performance of state-of-the-art approaches degrades significantly in the presence of moderately high background noise. Finally, a novel speaker recognizer based on phonetic features and an ensemble classifier is presented. Results show that the proposed approach improves performance on clean audio, and suggest that it can be employed towards improved real-world robustness. © EURASIP, 2009.

Relevance:

20.00%

Publisher:

Abstract:

Psychophysical evidence suggests that sensations arising from our own movements are diminished when predicted by motor forward models and that these models may also encode the timing and intensity of movement. Here we report a functional magnetic resonance imaging study in which the effects on sensation of varying the occurrence, timing and force of movements were measured. We observed that tactile-related activity in a region of secondary somatosensory cortex is reduced when sensation is associated with movement and, further, that this reduction is maximal when movement and sensation occur synchronously. Motor force is not represented in the degree of attenuation but rather in the magnitude of this region's response. These findings provide neurophysiological correlates of previously observed behavioural forward-model phenomena, and support using this approach to study clinical conditions in which forward-model deficits have been posited to play a crucial role.

Relevance:

20.00%

Publisher:

Abstract:

Graphene is at the center of an ever-growing research effort due to its unique properties, interesting for both fundamental science and applications. A key requirement for applications is the development of industrial-scale, reliable, inexpensive production processes. Here we review the state of the art of graphene preparation, production, placement and handling. Graphene is just the first of a new class of two-dimensional materials, derived from layered bulk crystals. Most of the approaches used for graphene can be extended to these crystals, accelerating their journey towards applications. © 2012 Elsevier Ltd.

Relevance:

20.00%

Publisher:

Abstract:

The change in acoustic characteristics, from personal computers to console gaming and home entertainment systems, that accompanies a change in the Graphics Processing Unit (GPU) is presented. The tests are carried out using identical configurations of the software and system hardware. The main hardware components used in the project are the central processing unit, motherboard, hard disc drive, memory, power supply, optical drive, and additional cooling system. The measurements taken for each GPU tested are analyzed and compared. Fan speed is measured using a photo tachometer and reflective tape adhered to one particular fan blade. Loudness, a psychoacoustic metric developed by Zwicker and Fastl, aims to quantify how loud a sound is perceived to be compared with a standard sound. The acoustic experiment reveals that inherent noise generation increases with the complexity of the cooling solution.
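As a rough illustration of what "loudness compared to a standard sound" means, the sketch below converts a loudness level in phon to loudness in sone using the conventional rule that 40 phon equals 1 sone and every additional 10 phon doubles the perceived loudness. This is the standard sone definition, not the full Zwicker loudness model, which works from the sound's spectrum. The function name and the sample levels are hypothetical.

```python
def phon_to_sone(loudness_level_phon):
    """Convert loudness level (phon) to loudness (sone).

    Uses N = 2 ** ((L - 40) / 10), which is the conventional relation
    for levels of 40 phon and above. Lower levels follow a different
    curve, so they are rejected here rather than approximated.
    """
    if loudness_level_phon < 40:
        raise ValueError("relation is only used here for levels >= 40 phon")
    return 2.0 ** ((loudness_level_phon - 40.0) / 10.0)


# Hypothetical levels for three cooling configurations.
for level in (40, 50, 60):
    print(level, "phon ->", phon_to_sone(level), "sone")
```

Because the scale is ratio-based, a configuration measured at 4 sone is perceived as roughly twice as loud as one at 2 sone, which makes results for different GPUs easier to compare than raw decibel values.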

Relevance:

20.00%

Publisher:

Abstract:

Human listeners can identify vowels regardless of speaker size, although the sound waves for an adult and a child speaking the ’same’ vowel would differ enormously. The differences are mainly due to differences in vocal tract length (VTL) and glottal pulse rate (GPR), which are both related to body size. Automatic speech recognition machines are notoriously bad at understanding children if they have been trained on the speech of an adult. In this paper, we propose that the auditory system adapts its analysis of speech sounds, dynamically and automatically, to the GPR and VTL of the speaker on a syllable-to-syllable basis. We illustrate how this rapid adaptation might be performed with the aid of a computational version of the auditory image model, and we propose that an auditory preprocessor of this form would improve the robustness of speech recognisers.
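To see why vocal tract length matters so much, a textbook approximation treats the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of a quarter wavelength. The sketch below uses that approximation only; it is not the auditory image model from the abstract, and the tract lengths and speed of sound are assumed values chosen for illustration.

```python
def tube_formants(vtl_m, n_formants=3, speed_of_sound=350.0):
    """Resonances of a uniform closed-open tube of length vtl_m (metres).

    F_n = (2n - 1) * c / (4 * L). Real vowels deviate from this because
    the vocal tract is not uniform; this only shows the scaling with L.
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * vtl_m)
            for n in range(1, n_formants + 1)]


# Hypothetical lengths: a longer (adult-like) and shorter (child-like) tract.
adult = tube_formants(0.175)
child = tube_formants(0.125)
scale = [c / a for c, a in zip(child, adult)]  # same ratio for every formant
```

Every resonance shifts by the same factor when the length changes, which is the kind of regular scaling a size-normalising preprocessor could exploit.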

Relevance:

20.00%

Publisher:

Abstract:

Psychological factors play a major role in exacerbating chronic pain. Effective self-management of pain is often hindered by inaccurate beliefs about the nature of pain which lead to a high degree of emotional reactivity. Probabilistic models of perception state that greater confidence (certainty) in beliefs increases their influence on perception and behavior. In this study, we treat confidence as a metacognitive process dissociable from the content of belief. We hypothesized that confidence is associated with anticipatory activation of areas of the pain matrix involved with top-down modulation of pain. Healthy volunteers rated their beliefs about the emotional distress that experimental pain would cause, and separately rated their level of confidence in this belief. Confidence predicted the influence of anticipation cues on experienced pain. We measured brain activity during anticipation of pain using high-density EEG and used electromagnetic tomography to determine neural substrates of this effect. Confidence correlated with activity in right anterior insula, posterior midcingulate and inferior parietal cortices during the anticipation of pain. Activity in the right anterior insula predicted a greater influence of anticipation cues on pain perception, whereas activity in right inferior parietal cortex predicted a decreased influence of anticipatory cues. The results support probabilistic models of pain perception and suggest that confidence in beliefs is an important determinant of expectancy effects on pain perception.
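The claim that greater confidence increases the influence of a belief on perception can be made concrete with a standard Gaussian combination rule, in which each source is weighted by its precision (inverse variance). The sketch below is a generic illustration of that rule, not the analysis used in the study; the function name and the numbers are hypothetical.

```python
def precision_weighted_estimate(prior_mean, prior_precision,
                                sensory_mean, sensory_precision):
    """Posterior mean and precision for a Gaussian prior combined with
    a Gaussian sensory observation.

    The posterior mean is the precision-weighted average of the two
    means, so a more confident (higher-precision) prior pulls the
    percept further towards the expected value.
    """
    total = prior_precision + sensory_precision
    mean = (prior_precision * prior_mean
            + sensory_precision * sensory_mean) / total
    return mean, total


# Hypothetical ratings on a 0-10 scale: expected pain 0, stimulus 10.
low_confidence, _ = precision_weighted_estimate(0.0, 1.0, 10.0, 1.0)
high_confidence, _ = precision_weighted_estimate(0.0, 9.0, 10.0, 1.0)
```

With equal precisions the estimate sits midway between expectation and stimulus; raising the prior precision moves it towards the expectation, which is the expectancy effect described in the abstract.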