8 results for TRACT
in Cambridge University Engineering Department Publications Database
Abstract:
We investigated whether stimulation of the pyramidal tract (PT) could reset the phase of 15-30 Hz beta oscillations observed in the macaque motor cortex. We recorded local field potentials (LFPs) and multiple single-unit activity from two conscious macaque monkeys performing a precision grip task. EMG activity was also recorded from the second animal. Single PT stimuli were delivered during the hold period of the task, when oscillations in the LFP were most prominent. Stimulus-triggered averaging of the LFP showed a phase-locked oscillatory response to PT stimulation. Frequency domain analysis revealed two components within the response: a 15-30 Hz component, which represented resetting of on-going beta rhythms, and a lower frequency 10 Hz response. Only the higher frequency could be observed in the EMG activity, at stronger stimulus intensities than were required for resetting the cortical rhythm. Stimulation of the PT during movement elicited a greatly reduced oscillatory response. Analysis of single-unit discharge confirmed that PT stimulation was capable of resetting periodic activity in motor cortex. The firing patterns of pyramidal tract neurones (PTNs) and unidentified neurones exhibited successive cycles of suppression and facilitation, time locked to the stimulus. We conclude that PTN activity directly influences the generation of the 15-30 Hz rhythm. These PTNs facilitate EMG activity in upper limb muscles, contributing to corticomuscular coherence at this same frequency. Since the earliest oscillatory effect observed following stimulation was a suppression of firing, we speculate that inhibitory feedback may be the key mechanism generating such oscillations in the motor cortex.
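The stimulus-triggered averaging step described above lends itself to a short sketch. The snippet below is a minimal NumPy illustration, not the authors' analysis code: it aligns LFP segments on stimulus times and averages them, and a Fourier transform of the averaged response then separates a stimulus-locked beta-band (15-30 Hz) component from any slower (~10 Hz) component. All variable names and the synthetic data are assumptions made for the example.

```python
import numpy as np

def stimulus_triggered_average(lfp, stim_samples, fs, pre_s=0.1, post_s=0.3):
    """Average LFP segments aligned on stimulus times (given as sample indices)."""
    pre, post = int(pre_s * fs), int(post_s * fs)
    segments = [lfp[s - pre:s + post] for s in stim_samples
                if s - pre >= 0 and s + post <= len(lfp)]
    return np.mean(segments, axis=0)

# Synthetic example: background LFP plus a 20 Hz (beta-band) burst whose phase is
# reset at each hypothetical stimulus time.
fs = 1000.0
lfp = 0.5 * np.random.randn(60_000)
stim_samples = np.arange(2_000, 58_000, 3_000)
burst_t = np.arange(int(0.3 * fs)) / fs
for s in stim_samples:
    lfp[s:s + len(burst_t)] += np.cos(2 * np.pi * 20 * burst_t) * np.exp(-burst_t / 0.15)

sta = stimulus_triggered_average(lfp, stim_samples, fs)
# Frequency-domain analysis of the averaged response: the spectrum of the
# stimulus-locked waveform reveals its oscillatory components.
freqs = np.fft.rfftfreq(len(sta), 1.0 / fs)
spectrum = np.abs(np.fft.rfft(sta - sta.mean()))
```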
Abstract:
In current methods for voice transformation and speech synthesis, the vocal tract filter is usually assumed to be excited by a flat amplitude spectrum. In this article, we present a method using a mixed source model defined as a mixture of the Liljencrants-Fant (LF) model and Gaussian noise. Since it uses the LF model, the approach taken in this work is close to a vocoder with an exogenous input, like ARX-based methods or the Glottal Spectral Separation (GSS) method. Such approaches are dedicated to voice processing and promise improved naturalness compared with generic signal models. To estimate the Vocal Tract Filter (VTF), we show that spectral division, as in GSS, allows a glottal source model to be combined with any envelope estimation method, in contrast to the ARX approach, where a least-squares AR solution is used. We therefore derive a VTF estimate which takes into account the amplitude spectra of both the deterministic and random components of the glottal source. The proposed mixed source model is controlled by a small set of intuitive and independent parameters. The relevance of this voice production model is evaluated, through listening tests, in the context of resynthesis, HMM-based speech synthesis, breathiness modification and pitch transposition. © 2012 Elsevier B.V. All rights reserved.
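A rough sketch of the GSS-style spectral division may help here. The NumPy snippet below divides a speech frame's amplitude spectrum by a mixed-source amplitude spectrum that combines a deterministic (voiced) component with a flat noise floor. A simple -12 dB/octave tilt above f0 stands in for the LF model's amplitude spectrum; that substitution, and all names and parameter values, are assumptions for illustration, not the paper's parametrisation.

```python
import numpy as np

def vtf_estimate(speech_frame, fs, f0, noise_level=0.05):
    """Sketch of spectral division: divide the speech amplitude spectrum by a
    model of the glottal-source amplitude spectrum to recover the vocal tract
    filter (VTF) magnitude response."""
    n = len(speech_frame)
    spec = np.abs(np.fft.rfft(speech_frame * np.hanning(n))) + 1e-12
    freqs = np.fft.rfftfreq(n, 1.0 / fs)

    # Deterministic source component: flat up to f0, then roughly -12 dB/octave.
    src_det = np.where(freqs <= f0, 1.0, (f0 / np.maximum(freqs, 1e-6)) ** 2)
    # Random (aspiration) component: a flat noise floor, as in a mixed source model.
    src = np.sqrt(src_det ** 2 + noise_level ** 2)

    return freqs, spec / src   # spectral division yields the VTF magnitude estimate
```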
Abstract:
This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modeling and model training, and language and pronunciation modeling is presented. These include the use of conversation-side-based cepstral normalization, vocal tract length normalization, heteroscedastic linear discriminant analysis for feature projection, minimum phone error training and speaker adaptive training, lattice-based model adaptation, confusion-network-based decoding and confidence score estimation, pronunciation selection, language model interpolation, and class-based language models. The transcription system developed for participation in the 2002 NIST Rich Transcription evaluations of English conversational telephone speech data is presented in detail. In this evaluation the CU-HTK system gave an overall word error rate of 23.9%, which was the best performance by a statistically significant margin. Further details on the derivation of faster systems with moderate performance degradation are discussed in the context of the 2002 CU-HTK 10 × RT conversational speech transcription system. © 2005 IEEE.
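Of the front-end techniques listed, conversation-side-based cepstral normalisation is simple enough to illustrate. The NumPy sketch below pools cepstral frames from all utterances of one conversation side, then applies mean and variance normalisation to each utterance; it is a generic illustration of the idea, not the CU-HTK implementation, and the function and variable names are invented for the example.

```python
import numpy as np

def side_based_cmvn(cepstra_per_utt):
    """Cepstral mean and variance normalisation using statistics pooled over
    all utterances from one conversation side.

    cepstra_per_utt: list of (frames x coefficients) arrays from the same side.
    Returns the list with the side-level mean removed and variance scaled out.
    """
    stacked = np.concatenate(cepstra_per_utt, axis=0)
    mean = stacked.mean(axis=0)
    std = stacked.std(axis=0) + 1e-8
    return [(c - mean) / std for c in cepstra_per_utt]
```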
Abstract:
Simultaneous recording from multiple single neurones presents many technical difficulties. However, obtaining such data has many advantages, which make it highly worthwhile to overcome the technical problems. This report describes methods which we have developed to permit recordings in awake behaving monkeys using the 'Eckhorn' 16-electrode microdrive. Structural magnetic resonance images are collected to guide electrode placement. Head fixation is achieved using a specially designed headpiece, modified for the multiple electrode approach, and access to the cortex is provided via a novel recording chamber. Growth of scar tissue over the exposed dura mater is reduced using an anti-mitotic compound. Control of the microdrive is achieved by a computerised system which permits several experimenters to move different electrodes simultaneously, considerably reducing the load on an individual operator. Neurones are identified as pyramidal tract neurones by antidromic stimulation through chronically implanted electrodes; stimulus control is integrated into the computerised system. Finally, analysis of multiple single-unit recordings requires accurate methods to correct for non-stationarity in unit firing. A novel technique for such correction is discussed.
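The abstract only mentions the correction for non-stationary firing, so the sketch below shows the classical baseline such methods improve on: a cross-correlogram corrected with a trial-shuffled "shift predictor". This is a standard textbook correction offered purely as context, not the paper's novel technique; the array layout and names are assumptions.

```python
import numpy as np

def shift_predictor_corrected_ccg(spikes_a, spikes_b, lags):
    """Cross-correlogram between two units, corrected for slow co-variation of
    firing rates by subtracting a trial-shuffled ("shift") predictor.

    spikes_a, spikes_b: (trials x bins) spike-count arrays aligned to the task.
    lags: iterable of integer lags in bins (|lag| < number of bins).
    """
    def ccg(x, y):
        out = np.zeros(len(lags))
        for i, lag in enumerate(lags):
            if lag >= 0:
                out[i] = np.mean(x[:, :x.shape[1] - lag] * y[:, lag:])
            else:
                out[i] = np.mean(x[:, -lag:] * y[:, :y.shape[1] + lag])
        return out

    raw = ccg(spikes_a, spikes_b)
    # Shift predictor: pair each trial of unit A with a different trial of unit B,
    # which preserves task-locked rate modulation but destroys spike-level timing.
    shifted = ccg(spikes_a, np.roll(spikes_b, 1, axis=0))
    return raw - shifted
```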
Abstract:
Human listeners can identify vowels regardless of speaker size, although the sound waves for an adult and a child speaking the 'same' vowel would differ enormously. The differences are mainly due to the differences in vocal tract length (VTL) and glottal pulse rate (GPR), which are both related to body size. Automatic speech recognition machines are notoriously bad at understanding children if they have been trained on the speech of an adult. In this paper, we propose that the auditory system adapts its analysis of speech sounds, dynamically and automatically, to the GPR and VTL of the speaker on a syllable-to-syllable basis. We illustrate how this rapid adaptation might be performed with the aid of a computational version of the auditory image model, and we propose that an auditory preprocessor of this form would improve the robustness of speech recognisers.
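The effect of vocal tract length on the spectrum, and the idea of compensating for it, can be illustrated with a simple linear frequency warp. The NumPy toy example below raises the formants of an "adult" spectral envelope by about 25%, as a shorter tract would, and then warps the frequency axis back toward the reference. This linear warp illustrates VTL normalisation in general, not the auditory image model's adaptive mechanism, and all names and numbers are invented.

```python
import numpy as np

def warp_spectrum(spectrum, freqs, alpha):
    """Resample an amplitude spectrum at freqs * alpha, i.e. linearly warp its
    frequency axis. With alpha > 1 the spectrum is read at higher frequencies,
    mapping formants that sit ~alpha times higher back onto the reference axis."""
    return np.interp(freqs * alpha, freqs, spectrum)

# Toy example: an "adult" envelope with two formant-like peaks, a "child-like"
# version with the peaks raised ~25% (shorter vocal tract), and a normalised
# version warped back toward the adult reference.
freqs = np.linspace(0.0, 8000.0, 257)
adult = np.exp(-((freqs - 500.0) / 300.0) ** 2) \
        + 0.7 * np.exp(-((freqs - 1500.0) / 400.0) ** 2)
child_like = np.interp(freqs / 1.25, freqs, adult)   # peaks near 625 and 1875 Hz
normalised = warp_spectrum(child_like, freqs, 1.25)  # peaks back near 500 and 1500 Hz
```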