988 results for speech disorder
Abstract:
In this paper, using the intrinsically disordered oncoprotein Myc as an example, we present a mathematical model to help explain how protein oscillatory dynamics can influence state switching. Earlier studies have demonstrated that, while Myc overexpression can facilitate state switching and transform a normal cell into a cancer phenotype, its downregulation can reverse state switching. A fundamental aspect of the model is that a Myc threshold determines cell fate in cells expressing p53. We demonstrate that a non-cooperative positive feedback loop coupled with Myc sequestration at multiple binding sites can generate bistable Myc levels. Normal quiescent cells with Myc levels below the threshold can respond to mitogenic signals to activate the cyclin/cdk oscillator for limited cell divisions, but the p53/Mdm2 oscillator remains nonfunctional. In response to stress, the p53/Mdm2 oscillator is activated in pulses that are critical to DNA repair. But if stress causes Myc levels to cross the threshold, Myc inactivates the p53/Mdm2 oscillator, abrogates p53 pulses, and pushes the cyclin/cdk oscillator into overdrive, sustaining the unchecked proliferation seen in cancer. However, if Myc is downregulated, the cyclin/cdk oscillator is inactivated, the p53/Mdm2 oscillator is reset, and the cancer phenotype is reversed. (C) 2015 Elsevier Ltd. All rights reserved.
Abstract:
Speech polarity detection is a crucial first step in many speech processing techniques. In this paper, an algorithm is proposed that improves on the existing technique that uses the skewness of the voice source (VS) signal. Here, the integrated linear prediction residual (ILPR) is used as the VS estimate, which is obtained using linear prediction on long-term frames of the low-pass filtered speech signal. This excludes the unvoiced regions from analysis and also reduces the computation. Further, a modified skewness measure is proposed for the decision, which considers the magnitude of the skewness of the ILPR along with its sign. With the detection error rate (DER) as the performance metric, the algorithm is tested on 8 large databases and its performance (DER=0.20%) is found to be comparable to that of the best technique (DER=0.06%) on both clean and noisy speech. Further, the proposed method is found to be ten times faster than the best technique.
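As a rough illustration of a skewness-based polarity decision, the sketch below takes the sign of the sample skewness of a voice-source estimate. It is not the algorithm from the abstract: the ILPR computation and the modified skewness measure are omitted, the input is assumed to be a ready-made voice-source estimate, the mapping from skewness sign to polarity is an assumed convention, and the names `skewness` and `detect_polarity` are hypothetical.

```python
def skewness(x):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    if m2 == 0:
        return 0.0  # constant signal: skewness undefined, treated as zero here
    return m3 / m2 ** 1.5


def detect_polarity(voice_source):
    """Return +1 or -1 from the sign of the skewness (assumed convention)."""
    return 1 if skewness(voice_source) >= 0 else -1


# Toy voice-source estimate: small positive values with sharp negative peaks.
pulse_train = [(-1.0 if i % 20 == 0 else 0.05) for i in range(200)]
inverted = [-v for v in pulse_train]

print(detect_polarity(pulse_train), detect_polarity(inverted))
```

Inverting the signal flips the sign of the skewness, which is what makes the sign usable as a polarity cue in this toy setting.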
Abstract:
We propose a practical feature-level and score-level fusion approach that combines acoustic and estimated articulatory information for both text independent and text dependent speaker verification. From a practical point of view, we study how to improve speaker verification performance by combining dynamic articulatory information with the conventional acoustic features. On text independent speaker verification, we find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves the performance dramatically. However, since directly measuring articulatory data is not feasible in many real world applications, we also experiment with estimated articulatory features obtained through acoustic-to-articulatory inversion. We explore both feature level and score level fusion methods and find that the overall system performance is significantly enhanced even with estimated articulatory features. Such a performance boost could be due to the inter-speaker variation information embedded in the estimated articulatory features. Since the dynamics of articulation contain important information, we included inverted articulatory trajectories in text dependent speaker verification. We demonstrate that the articulatory constraints introduced by inverted articulatory features help to reject wrong password trials and improve the performance after score level fusion. We evaluate the proposed methods on the X-ray Microbeam database and the RSR 2015 database, respectively, for the aforementioned two tasks. Experimental results show that we achieve more than 15% relative equal error rate reduction for both speaker verification tasks. (C) 2015 Elsevier Ltd. All rights reserved.
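To make the terms score-level fusion and equal error rate concrete, here is a minimal sketch. It assumes fusion is a simple weighted sum of two per-trial scores and computes the equal error rate (EER) by sweeping the observed scores as thresholds; the abstract does not state the fusion rule, so the weighting scheme and the names `fuse_scores`, `equal_error_rate`, and `relative_reduction` are illustrative assumptions.

```python
def fuse_scores(acoustic, articulatory, weight=0.5):
    """Weighted-sum fusion of two per-trial score lists (assumed fusion rule)."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(acoustic, articulatory)]


def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER: pick the threshold where false-accept and false-reject rates are closest."""
    best_gap, eer = None, None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        # A trial is accepted when its score is at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, e.g. 0.15 means a 15% relative improvement."""
    return (baseline_eer - new_eer) / baseline_eer
```

With this definition, a system that moves from a 4% EER to a 3% EER has a relative reduction of 25%, which is the kind of figure the abstract reports as "relative equal error rate reduction".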
Abstract:
The complex nature of the structural disorder in the lead-free ferroelectric Na1/2Bi1/2TiO3 has a profound impact on the perceived global structure and polar properties. In this paper, we have investigated the effect of electric field and temperature on the local structure around the Bi and Ti atoms using extended x-ray absorption fine structure. Detailed analysis revealed that poling brings about a noticeable change in the bond distances associated with the Bi-coordination sphere, whereas the Ti coordination remains unaffected. We also observed a discontinuity in the Bi-O bond lengths across the depolarization temperature of the poled specimen. These results establish that the disappearance of the monoclinic-like (Cc) global distortion, along with the drastic suppression of the short-ranged in-phase octahedral tilt after poling [B. N. Rao et al., Phys. Rev. B 88, 224103 (2013)], is a result of the readjustment of the A-O bonds by the electric field, so as to be in conformity with the rhombohedral R3c structure.
Abstract:
Oversmoothing of speech parameter trajectories is one of the causes of quality degradation in HMM-based speech synthesis. Various methods have been proposed to overcome this effect, the most recent ones being global variance (GV) and modulation-spectrum-based post-filter (MSPF). However, there is still a significant quality gap between natural and synthesized speech. In this paper, we propose a two-fold post-filtering technique to partially alleviate the oversmoothing of spectral and excitation parameter trajectories in HMM-based speech synthesis. For the spectral parameters, we propose a sparse coding-based post-filter to match the trajectories of synthetic speech to those of natural speech, and for the excitation trajectory, we introduce a perceptually motivated post-filter. Experimental evaluations show quality improvement compared with existing methods.
Abstract:
Speech enhancement in stationary noise is addressed using the ideal channel selection framework. In order to estimate the binary mask, we propose to classify each time-frequency (T-F) bin of the noisy signal as speech or noise using Discriminative Random Fields (DRF). The DRF function contains two terms: an enhancement function and a smoothing term. On each T-F bin, we propose to use an enhancement function based on a likelihood ratio test for speech presence, while an Ising model is used as the smoothing function for spectro-temporal continuity in the estimated binary mask. The effect of the smoothing function over successive iterations is found to reduce musical noise, as opposed to using only the enhancement function. The binary mask is inferred from the noisy signal using the Iterated Conditional Modes (ICM) algorithm. Sentences from the NOIZEUS corpus are evaluated from 0 dB to 15 dB Signal to Noise Ratio (SNR) in 4 kinds of additive noise settings: additive white Gaussian noise, car noise, street noise and pink noise. The reconstructed speech using the proposed technique is evaluated in terms of average segmental SNR, Perceptual Evaluation of Speech Quality (PESQ) and Mean Opinion Score (MOS).
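The interplay between a per-bin likelihood term and an Ising smoothing term can be sketched with a small ICM loop. This is a simplified stand-in, not the DRF formulation from the abstract: it assumes a precomputed log-likelihood ratio per T-F bin (positive values favouring speech), a 4-neighbour Ising term with a single hypothetical coupling weight `beta`, and a decision rule that simply takes the sign of their sum.

```python
def icm_binary_mask(llr, beta=1.0, n_iter=5):
    """Estimate a binary mask from llr[t][f] with Ising smoothing via ICM.

    llr[t][f] > 0 is assumed to favour speech; beta is an assumed coupling weight.
    """
    n_t, n_f = len(llr), len(llr[0])
    # Initialise from the likelihood term alone.
    mask = [[1 if llr[t][f] > 0 else 0 for f in range(n_f)] for t in range(n_t)]
    for _ in range(n_iter):
        changed = False
        for t in range(n_t):
            for f in range(n_f):
                # Sum of neighbouring labels as spins (+1 speech, -1 noise).
                spin_sum = 0
                for dt, df in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    tt, ff = t + dt, f + df
                    if 0 <= tt < n_t and 0 <= ff < n_f:
                        spin_sum += 2 * mask[tt][ff] - 1
                # Pick the label that maximises the local score given the neighbours.
                new_label = 1 if llr[t][f] + beta * spin_sum > 0 else 0
                if new_label != mask[t][f]:
                    mask[t][f] = new_label
                    changed = True
        if not changed:
            break  # converged: no label changed in a full sweep
    return mask


# A weakly "noise" bin surrounded by confident "speech" bins.
toy_llr = [[1.0, 1.0, 1.0],
           [1.0, -0.5, 1.0],
           [1.0, 1.0, 1.0]]
print(icm_binary_mask(toy_llr, beta=1.0))
print(icm_binary_mask(toy_llr, beta=0.0))
```

With `beta=0.0` the isolated bin keeps its likelihood-only label, while a positive `beta` lets the neighbours overrule it. Isolated label flips of this kind are what is typically heard as musical noise.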
Abstract:
We have investigated the multiferroic and glassy behaviour of the metal-organic framework (MOF) material (CH3)2NH2Co(CHOO)3. The compound has a perovskite-like architecture in which the metal-formate forms a framework. The organic cation (CH3)2NH2+ occupies the cavities in the formate framework via N-H···O hydrogen bonds. At room temperature, the organic cation is disordered and occupies three crystallographically equivalent positions. Upon cooling, the organic cation becomes ordered, which leads to a structural phase transition at 155 K. The structural phase transition is associated with a para-ferroelectric phase transition and is revealed by dielectric and pyroelectric measurements. Further, a P-E hysteresis loop below 155 K confirms the ferroelectric behaviour of the material. Analysis of the dielectric data reveals a large frequency dispersion in the values of the dielectric constant and tan delta, which signifies the presence of glassy dielectric behaviour. The material displays an antiferromagnetic ordering below 15 K, which is attributed to the super-exchange interaction between Co2+ ions mediated via formate linkers. Interestingly, another magnetic transition is also found around 11 K. The peak of the transition shifts to lower temperature with increasing frequency, suggesting glassy magnetism in the sample. (C) 2016 AIP Publishing LLC.
Abstract:
Acoustic feature based speech (syllable) rate estimation and syllable nuclei detection are important problems in automatic speech recognition (ASR), computer assisted language learning (CALL) and fluency analysis. A typical solution for both problems consists of two stages. The first stage involves computing a short-time feature contour such that most of the peaks of the contour correspond to the syllabic nuclei. In the second stage, the peaks corresponding to the syllable nuclei are detected. In this work, instead of peak detection, we perform a mode-shape classification, which is formulated as a supervised binary classification problem - mode-shapes representing the syllabic nuclei as one class and the remaining ones as the other. We use the temporal correlation and selected sub-band correlation (TCSSBC) feature contour, and the mode-shapes in the TCSSBC feature contour are converted into a set of feature vectors using an interpolation technique. A support vector machine classifier is used for the classification. Experiments are performed separately using the Switchboard, TIMIT and CTIMIT corpora in a five-fold cross validation setup. The average correlation coefficients for the syllable rate estimation turn out to be 0.6761, 0.6928 and 0.3604 for the three corpora, respectively, which outperform those obtained by the best of the existing peak detection techniques. Similarly, the average F-scores (syllable level) for the syllable nuclei detection are 0.8917, 0.8200 and 0.7637 for the three corpora, respectively. (C) 2016 Elsevier B.V. All rights reserved.
Abstract:
This paper describes the development of the 2003 CU-HTK large vocabulary speech recognition system for Conversational Telephone Speech (CTS). The system was designed based on a multi-pass, multi-branch structure where the output of all branches is combined using system combination. A number of advanced modelling techniques such as Speaker Adaptive Training, Heteroscedastic Linear Discriminant Analysis, Minimum Phone Error estimation and specially constructed Single Pronunciation dictionaries were employed. The effectiveness of each of these techniques and their potential contribution to the result of system combination was evaluated in the framework of a state-of-the-art LVCSR system with sophisticated adaptation. The final 2003 CU-HTK CTS system constructed from some of these models is described and its performance on the DARPA/NIST 2003 Rich Transcription (RT-03) evaluation test set is discussed.
Abstract:
The electronic structure of amorphous diamond-like carbon is studied. Analysis of the participation ratio shows that π states within the σ-σ* gap are localized. The localization arises from dihedral angle disorder. The localization of π states causes the mobility gap to exceed the optical gap, which accounts for the low carrier mobility and the flat photoluminescence excitation spectrum. © 1998 Elsevier Science B.V. All rights reserved.
Discriminative language model adaptation for Mandarin broadcast speech transcription and translation
Abstract:
This paper investigates unsupervised test-time adaptation of language models (LM) using discriminative methods for a Mandarin broadcast speech transcription and translation task. A standard approach to adapting interpolated language models is to optimize the component weights by minimizing the perplexity on supervision data. This is a widely made approximation for language modeling in automatic speech recognition (ASR) systems. For speech translation tasks, it is unclear whether a strong correlation still exists between perplexity and various forms of error cost functions in the recognition and translation stages. The proposed minimum Bayes risk (MBR) based approach provides a flexible framework for unsupervised LM adaptation. It generalizes to a variety of forms of recognition and translation error metrics. LM adaptation is performed at the audio document level using either the character error rate (CER) or the translation edit rate (TER) as the cost function. An efficient parameter estimation scheme using the extended Baum-Welch (EBW) algorithm is proposed. Experimental results on a state-of-the-art speech recognition and translation system are presented. The MBR adapted language models gave the best recognition and translation performance and reduced the TER score by up to 0.54% absolute. © 2007 IEEE.
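The standard perplexity-based baseline mentioned in the abstract, tuning the interpolation weights of component language models on supervision data, is commonly done with expectation-maximisation. The sketch below illustrates that baseline only, not the proposed MBR/EBW method. It assumes each component model's per-word probabilities are already available and strictly positive; the function names and the toy data are hypothetical.

```python
import math


def em_interpolation_weights(component_probs, n_iter=50):
    """Estimate mixture weights that lower perplexity on the given data.

    component_probs[i][k] is the probability component LM k assigns to word i
    (assumed strictly positive).
    """
    n_components = len(component_probs[0])
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        counts = [0.0] * n_components
        for probs in component_probs:
            mix = sum(w * p for w, p in zip(weights, probs))
            # E-step: posterior responsibility of each component for this word.
            for k in range(n_components):
                counts[k] += weights[k] * probs[k] / mix
        # M-step: renormalise the expected counts into new weights.
        total = sum(counts)
        weights = [c / total for c in counts]
    return weights


def perplexity(component_probs, weights):
    """Perplexity of the interpolated model on the same per-word probabilities."""
    log_sum = sum(
        math.log(sum(w * p for w, p in zip(weights, probs)))
        for probs in component_probs
    )
    return math.exp(-log_sum / len(component_probs))


# Toy supervision data: component 0 fits every word better than component 1.
toy_probs = [[0.5, 0.1]] * 10
tuned = em_interpolation_weights(toy_probs)
print(tuned, perplexity(toy_probs, tuned), perplexity(toy_probs, [0.5, 0.5]))
```

Each EM iteration cannot increase the perplexity on the tuning data, which is why this procedure is a convenient baseline. The abstract's point is that lower perplexity need not translate into lower CER or TER.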