986 resultados para speech signals
Resumo:
A new interpolation technique has been developed for replacing missing samples in a sampled waveform drawn from a stationary stochastic process, given the power spectrum for the process. The method works with a finite block of data and is based on the assumption that components of the block DFT are Gaussian zero-mean independent random variables with variance proportional to the power spectrum at each frequency value. These assumptions make the interpolator particularly suitable for signals with a sharply-defined harmonic structure, such as audio waveforms recorded from music or voiced speech. Some results are presented and comparisons are made with existing techniques.
Resumo:
There are many methods for decomposing signals into a sum of amplitude and frequency modulated sinusoids. In this paper we take a new estimation based approach. Identifying the problem as ill-posed, we show how to regularize the solution by imposing soft constraints on the amplitude and phase variables of the sinusoids. Estimation proceeds using a version of Kalman smoothing. We evaluate the method on synthetic and natural, clean and noisy signals, showing that it outperforms previous decompositions, but at a higher computational cost. © 2012 IEEE.
Resumo:
The subjective performance of the G. 722 7-kHz wideband speech-coding recommendation using music signals is described. A number of audible distortions specific to music signals were found to be present in real-time evaluations of the coder. As a result, three modifications are proposed which are found to improve the performance for music signals. These modifications are compatible with the G. 722 system configuration. The results obtained clearly demonstrate the very high coding efficiency of subband ADPCM (adaptive differential pulse-code modulation) with comparison to digitally companding and ADM schemes when applied to music signals.
Resumo:
In this work an adaptive modeling and spectral estimation scheme based on a dual Discrete Kalman Filtering (DKF) is proposed for speech enhancement. Both speech and noise signals are modeled by an autoregressive structure which provides an underlying time frame dependency and improves time-frequency resolution. The model parameters are arranged to obtain a combined state-space model and are also used to calculate instantaneous power spectral density estimates. The speech enhancement is performed by a dual discrete Kalman filter that simultaneously gives estimates for the models and the signals. This approach is particularly useful as a pre-processing module for parametric based speech recognition systems that rely on spectral time dependent models. The system performance has been evaluated by a set of human listeners and by spectral distances. In both cases the use of this pre-processing module has led to improved results.
Resumo:
In this work an adaptive filtering scheme based on a dual Discrete Kalman Filtering (DKF) is proposed for Hidden Markov Model (HMM) based speech synthesis quality enhancement. The objective is to improve signal smoothness across HMMs and their related states and to reduce artifacts due to acoustic model's limitations. Both speech and artifacts are modelled by an autoregressive structure which provides an underlying time frame dependency and improves time-frequency resolution. Themodel parameters are arranged to obtain a combined state-space model and are also used to calculate instantaneous power spectral density estimates. The quality enhancement is performed by a dual discrete Kalman filter that simultaneously gives estimates for the models and the signals. The system's performance has been evaluated using mean opinion score tests and the proposed technique has led to improved results.
Resumo:
Medical fields requires fast, simple and noninvasive methods of diagnostic techniques. Several methods are available and possible because of the growth of technology that provides the necessary means of collecting and processing signals. The present thesis details the work done in the field of voice signals. New methods of analysis have been developed to understand the complexity of voice signals, such as nonlinear dynamics aiming at the exploration of voice signals dynamic nature. The purpose of this thesis is to characterize complexities of pathological voice from healthy signals and to differentiate stuttering signals from healthy signals. Efficiency of various acoustic as well as non linear time series methods are analysed. Three groups of samples are used, one from healthy individuals, subjects with vocal pathologies and stuttering subjects. Individual vowels/ and a continuous speech data for the utterance of the sentence "iruvarum changatimaranu" the meaning in English is "Both are good friends" from Malayalam language are recorded using a microphone . The recorded audio are converted to digital signals and are subjected to analysis.Acoustic perturbation methods like fundamental frequency (FO), jitter, shimmer, Zero Crossing Rate(ZCR) were carried out and non linear measures like maximum lyapunov exponent(Lamda max), correlation dimension (D2), Kolmogorov exponent(K2), and a new measure of entropy viz., Permutation entropy (PE) are evaluated for all three groups of the subjects. Permutation Entropy is a nonlinear complexity measure which can efficiently distinguish regular and complex nature of any signal and extract information about the change in dynamics of the process by indicating sudden change in its value. The results shows that nonlinear dynamical methods seem to be a suitable technique for voice signal analysis, due to the chaotic component of the human voice. Permutation entropy is well suited due to its sensitivity to uncertainties, since the pathologies are characterized by an increase in the signal complexity and unpredictability. Pathological groups have higher entropy values compared to the normal group. The stuttering signals have lower entropy values compared to the normal signals.PE is effective in charaterising the level of improvement after two weeks of speech therapy in the case of stuttering subjects. PE is also effective in characterizing the dynamical difference between healthy and pathological subjects. This suggests that PE can improve and complement the recent voice analysis methods available for clinicians. The work establishes the application of the simple, inexpensive and fast algorithm of PE for diagnosis in vocal disorders and stuttering subjects.
Resumo:
A speech message played several metres from the listener in a room is usually heard to have much the same phonetic content as it does when played nearby, even though the different amounts of reflected sound make the temporal envelopes of these signals very different. To study this ‘constancy’ effect, listeners heard speech messages and speech-like sounds comprising 8 auditory-filter shaped noise-bands that had temporal envelopes corresponding to those in these filters when the speech message is played. The ‘contexts’ were “next you’ll get _to click on”, into which a “sir” or “stir” test word was inserted. These test words were from an 11-step continuum, formed by amplitude modulation. Listeners identified the test words appropriately, even in the 8-band conditions where the speech had a ‘robotic’ quality. Constancy was assessed by comparing the influence of room reflections on the test word across conditions where the context had either the same level of room reflections (i.e. from the same, far distance), or where it had a much lower level (i.e. from nearby). Constancy effects were obtained with both the natural- and the 8-band speech. Results are considered in terms of the degree of ‘matching’ between the context’s and test-word’s bands.
Resumo:
Background: Voice processing in real-time is challenging. A drawback of previous work for Hypokinetic Dysarthria (HKD) recognition is the requirement of controlled settings in a laboratory environment. A personal digital assistant (PDA) has been developed for home assessment of PD patients. The PDA offers sound processing capabilities, which allow for developing a module for recognition and quantification HKD. Objective: To compose an algorithm for assessment of PD speech severity in the home environment based on a review synthesis. Methods: A two-tier review methodology is utilized. The first tier focuses on real-time problems in speech detection. In the second tier, acoustics features that are robust to medication changes in Levodopa-responsive patients are investigated for HKD recognition. Keywords such as Hypokinetic Dysarthria , and Speech recognition in real time were used in the search engines. IEEE explorer produced the most useful search hits as compared to Google Scholar, ELIN, EBRARY, PubMed and LIBRIS. Results: Vowel and consonant formants are the most relevant acoustic parameters to reflect PD medication changes. Since relevant speech segments (consonants and vowels) contains minority of speech energy, intelligibility can be improved by amplifying the voice signal using amplitude compression. Pause detection and peak to average power rate calculations for voice segmentation produce rich voice features in real time. Enhancements in voice segmentation can be done by inducing Zero-Crossing rate (ZCR). Consonants have high ZCR whereas vowels have low ZCR. Wavelet transform is found promising for voice analysis since it quantizes non-stationary voice signals over time-series using scale and translation parameters. In this way voice intelligibility in the waveforms can be analyzed in each time frame. Conclusions: This review evaluated HKD recognition algorithms to develop a tool for PD speech home-assessment using modern mobile technology. An algorithm that tackles realtime constraints in HKD recognition based on the review synthesis is proposed. We suggest that speech features may be further processed using wavelet transforms and used with a neural network for detection and quantification of speech anomalies related to PD. Based on this model, patients' speech can be automatically categorized according to UPDRS speech ratings.
Resumo:
Objective: To evaluate the maximum residual signal auto-correlation also known as pitch amplitude (PA) values in patients with Parkinson's disease (PD) patients. Method. The signals of 21 Parkinson's patients were compared with 15 healthy individuals, divided according age and gender. Results: Statistical difference was seen between groups for PA, 0.39 for controls and 0.25 for PD. Normal value threshold was set as 0.3; (p <= 0.001). In the Parkinson's group 80.77%, and in the control group only 12.28%, had a PA < 0.3 demonstrating an association between these variables. The dispersion diagram for age and PA for PD individuals showed p=0.01 and r=0.54. There was no significant difference in relation to gender and PA between groups: Conclusion: the significant differences in pitch's amplitude between PD patients and healthy individuals demonstrate the methods specificity.-The results showed the need of prospective controlled studies,to improve the use and indications of residual signal auto-correlation to evaluate speech in PD patients.