23 resultados para Speech signals

em Cochin University of Science


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Modeling nonlinear systems using Volterra series is a century old method but practical realizations were hampered by inadequate hardware to handle the increased computational complexity stemming from its use. But interest is renewed recently, in designing and implementing filters which can model much of the polynomial nonlinearities inherent in practical systems. The key advantage in resorting to Volterra power series for this purpose is that nonlinear filters so designed can be made to work in parallel with the existing LTI systems, yielding improved performance. This paper describes the inclusion of a quadratic predictor (with nonlinearity order 2) with a linear predictor in an analog source coding system. Analog coding schemes generally ignore the source generation mechanisms but focuses on high fidelity reconstruction at the receiver. The widely used method of differential pnlse code modulation (DPCM) for speech transmission uses a linear predictor to estimate the next possible value of the input speech signal. But this linear system do not account for the inherent nonlinearities in speech signals arising out of multiple reflections in the vocal tract. So a quadratic predictor is designed and implemented in parallel with the linear predictor to yield improved mean square error performance. The augmented speech coder is tested on speech signals transmitted over an additive white gaussian noise (AWGN) channel.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speech is the primary, most prominent and convenient means of communication in audible language. Through speech, people can express their thoughts, feelings or perceptions by the articulation of words. Human speech is a complex signal which is non stationary in nature. It consists of immensely rich information about the words spoken, accent, attitude of the speaker, expression, intention, sex, emotion as well as style. The main objective of Automatic Speech Recognition (ASR) is to identify whatever people speak by means of computer algorithms. This enables people to communicate with a computer in a natural spoken language. Automatic recognition of speech by machines has been one of the most exciting, significant and challenging areas of research in the field of signal processing over the past five to six decades. Despite the developments and intensive research done in this area, the performance of ASR is still lower than that of speech recognition by humans and is yet to achieve a completely reliable performance level. The main objective of this thesis is to develop an efficient speech recognition system for recognising speaker independent isolated words in Malayalam.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Timely detection of sudden change in dynamics that adversely affect the performance of systems and quality of products has great scientific relevance. This work focuses on effective detection of dynamical changes of real time signals from mechanical as well as biological systems using a fast and robust technique of permutation entropy (PE). The results are used in detecting chatter onset in machine turning and identifying vocal disorders from speech signal.Permutation Entropy is a nonlinear complexity measure which can efficiently distinguish regular and complex nature of any signal and extract information about the change in dynamics of the process by indicating sudden change in its value. Here we propose the use of permutation entropy (PE), to detect the dynamical changes in two non linear processes, turning under mechanical system and speech under biological system.Effectiveness of PE in detecting the change in dynamics in turning process from the time series generated with samples of audio and current signals is studied. Experiments are carried out on a lathe machine for sudden increase in depth of cut and continuous increase in depth of cut on mild steel work pieces keeping the speed and feed rate constant. The results are applied to detect chatter onset in machining. These results are verified using frequency spectra of the signals and the non linear measure, normalized coarse-grained information rate (NCIR).PE analysis is carried out to investigate the variation in surface texture caused by chatter on the machined work piece. Statistical parameter from the optical grey level intensity histogram of laser speckle pattern recorded using a charge coupled device (CCD) camera is used to generate the time series required for PE analysis. Standard optical roughness parameter is used to confirm the results.Application of PE in identifying the vocal disorders is studied from speech signal recorded using microphone. Here analysis is carried out using speech signals of subjects with different pathological conditions and normal subjects, and the results are used for identifying vocal disorders. Standard linear technique of FFT is used to substantiate thc results.The results of PE analysis in all three cases clearly indicate that this complexity measure is sensitive to change in regularity of a signal and hence can suitably be used for detection of dynamical changes in real world systems. This work establishes the application of the simple, inexpensive and fast algorithm of PE for the benefit of advanced manufacturing process as well as clinical diagnosis in vocal disorders.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Speech is the most natural means of communication among human beings and speech processing and recognition are intensive areas of research for the last five decades. Since speech recognition is a pattern recognition problem, classification is an important part of any speech recognition system. In this work, a speech recognition system is developed for recognizing speaker independent spoken digits in Malayalam. Voice signals are sampled directly from the microphone. The proposed method is implemented for 1000 speakers uttering 10 digits each. Since the speech signals are affected by background noise, the signals are tuned by removing the noise from it using wavelet denoising method based on Soft Thresholding. Here, the features from the signals are extracted using Discrete Wavelet Transforms (DWT) because they are well suitable for processing non-stationary signals like speech. This is due to their multi- resolutional, multi-scale analysis characteristics. Speech recognition is a multiclass classification problem. So, the feature vector set obtained are classified using three classifiers namely, Artificial Neural Networks (ANN), Support Vector Machines (SVM) and Naive Bayes classifiers which are capable of handling multiclasses. During classification stage, the input feature vector data is trained using information relating to known patterns and then they are tested using the test data set. The performances of all these classifiers are evaluated based on recognition accuracy. All the three methods produced good recognition accuracy. DWT and ANN produced a recognition accuracy of 89%, SVM and DWT combination produced an accuracy of 86.6% and Naive Bayes and DWT combination produced an accuracy of 83.5%. ANN is found to be better among the three methods.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Speech signals are one of the most important means of communication among the human beings. In this paper, a comparative study of two feature extraction techniques are carried out for recognizing speaker independent spoken isolated words. First one is a hybrid approach with Linear Predictive Coding (LPC) and Artificial Neural Networks (ANN) and the second method uses a combination of Wavelet Packet Decomposition (WPD) and Artificial Neural Networks. Voice signals are sampled directly from the microphone and then they are processed using these two techniques for extracting the features. Words from Malayalam, one of the four major Dravidian languages of southern India are chosen for recognition. Training, testing and pattern recognition are performed using Artificial Neural Networks. Back propagation method is used to train the ANN. The proposed method is implemented for 50 speakers uttering 20 isolated words each. Both the methods produce good recognition accuracy. But Wavelet Packet Decomposition is found to be more suitable for recognizing speech because of its multi-resolution characteristics and efficient time frequency localizations

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Medical fields requires fast, simple and noninvasive methods of diagnostic techniques. Several methods are available and possible because of the growth of technology that provides the necessary means of collecting and processing signals. The present thesis details the work done in the field of voice signals. New methods of analysis have been developed to understand the complexity of voice signals, such as nonlinear dynamics aiming at the exploration of voice signals dynamic nature. The purpose of this thesis is to characterize complexities of pathological voice from healthy signals and to differentiate stuttering signals from healthy signals. Efficiency of various acoustic as well as non linear time series methods are analysed. Three groups of samples are used, one from healthy individuals, subjects with vocal pathologies and stuttering subjects. Individual vowels/ and a continuous speech data for the utterance of the sentence "iruvarum changatimaranu" the meaning in English is "Both are good friends" from Malayalam language are recorded using a microphone . The recorded audio are converted to digital signals and are subjected to analysis.Acoustic perturbation methods like fundamental frequency (FO), jitter, shimmer, Zero Crossing Rate(ZCR) were carried out and non linear measures like maximum lyapunov exponent(Lamda max), correlation dimension (D2), Kolmogorov exponent(K2), and a new measure of entropy viz., Permutation entropy (PE) are evaluated for all three groups of the subjects. Permutation Entropy is a nonlinear complexity measure which can efficiently distinguish regular and complex nature of any signal and extract information about the change in dynamics of the process by indicating sudden change in its value. The results shows that nonlinear dynamical methods seem to be a suitable technique for voice signal analysis, due to the chaotic component of the human voice. Permutation entropy is well suited due to its sensitivity to uncertainties, since the pathologies are characterized by an increase in the signal complexity and unpredictability. Pathological groups have higher entropy values compared to the normal group. The stuttering signals have lower entropy values compared to the normal signals.PE is effective in charaterising the level of improvement after two weeks of speech therapy in the case of stuttering subjects. PE is also effective in characterizing the dynamical difference between healthy and pathological subjects. This suggests that PE can improve and complement the recent voice analysis methods available for clinicians. The work establishes the application of the simple, inexpensive and fast algorithm of PE for diagnosis in vocal disorders and stuttering subjects.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A forward - biased point contact germanium signal diode placed inside a waveguide section along the E -vector is found to introduce significant phase shift of microwave signals . The usefulness of the arrangement as a phase modulator for microwave carriers is demonstrated. While there is a less significant amplitude modulation accompanying phase modulation , the insertion losses are found to be negligible. The observations can be explained on the basis of the capacitance variation of the barrier layer with forward current in the diode

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Fine magnetic particles (size≅100 Å) belonging to the series ZnxFe1−xFe2O4 were synthesized by cold co-precipitation methods and their structural properties were evaluated using X-ray diffraction. Magnetization studies have been carried out using vibrating sample magnetometry (VSM) showing near-zero loss loop characteristics. Ferrofluids were then prepared employing these fine magnetic powders using oleic acid as surfactant and kerosene as carrier liquid by modifying the usually reported synthesis technique in order to induce anisotropy and enhance the magneto-optical signals. Liquid thin films of these fluids were prepared and field-induced laser transmission through these films was studied. The transmitted light intensity decreases at the centre with applied magnetic field in a linear fashion when subjected to low magnetic fields and saturate at higher fields. This is in accordance with the saturation in cluster formation. The pattern exhibited by these films in the presence of different magnetic fields was observed with the help of a CCD camera and was recorded photographically.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose to show in this paper, that the time series obtained from biological systems such as human brain are invariably nonstationary because of different time scales involved in the dynamical process. This makes the invariant parameters time dependent. We made a global analysis of the EEG data obtained from the eight locations on the skull space and studied simultaneously the dynamical characteristics from various parts of the brain. We have proved that the dynamical parameters are sensitive to the time scales and hence in the study of brain one must identify all relevant time scales involved in the process to get an insight in the working of brain.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Sonar signal processing comprises of a large number of signal processing algorithms for implementing functions such as Target Detection, Localisation, Classification, Tracking and Parameter estimation. Current implementations of these functions rely on conventional techniques largely based on Fourier Techniques, primarily meant for stationary signals. Interestingly enough, the signals received by the sonar sensors are often non-stationary and hence processing methods capable of handling the non-stationarity will definitely fare better than Fourier transform based methods.Time-frequency methods(TFMs) are known as one of the best DSP tools for nonstationary signal processing, with which one can analyze signals in time and frequency domains simultaneously. But, other than STFT, TFMs have been largely limited to academic research because of the complexity of the algorithms and the limitations of computing power. With the availability of fast processors, many applications of TFMs have been reported in the fields of speech and image processing and biomedical applications, but not many in sonar processing. A structured effort, to fill these lacunae by exploring the potential of TFMs in sonar applications, is the net outcome of this thesis. To this end, four TFMs have been explored in detail viz. Wavelet Transform, Fractional Fourier Transfonn, Wigner Ville Distribution and Ambiguity Function and their potential in implementing five major sonar functions has been demonstrated with very promising results. What has been conclusively brought out in this thesis, is that there is no "one best TFM" for all applications, but there is "one best TFM" for each application. Accordingly, the TFM has to be adapted and tailored in many ways in order to develop specific algorithms for each of the applications.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this thesis, the applications of the recurrence quantification analysis in metal cutting operation in a lathe, with specific objective to detect tool wear and chatter, are presented.This study is based on the discovery that process dynamics in a lathe is low dimensional chaotic. It implies that the machine dynamics is controllable using principles of chaos theory. This understanding is to revolutionize the feature extraction methodologies used in condition monitoring systems as conventional linear methods or models are incapable of capturing the critical and strange behaviors associated with the metal cutting process.As sensor based approaches provide an automated and cost effective way to monitor and control, an efficient feature extraction methodology based on nonlinear time series analysis is much more demanding. The task here is more complex when the information has to be deduced solely from sensor signals since traditional methods do not address the issue of how to treat noise present in real-world processes and its non-stationarity. In an effort to get over these two issues to the maximum possible, this thesis adopts the recurrence quantification analysis methodology in the study since this feature extraction technique is found to be robust against noise and stationarity in the signals.The work consists of two different sets of experiments in a lathe; set-I and set-2. The experiment, set-I, study the influence of tool wear on the RQA variables whereas the set-2 is carried out to identify the sensitive RQA variables to machine tool chatter followed by its validation in actual cutting. To obtain the bounds of the spectrum of the significant RQA variable values, in set-i, a fresh tool and a worn tool are used for cutting. The first part of the set-2 experiments uses a stepped shaft in order to create chatter at a known location. And the second part uses a conical section having a uniform taper along the axis for creating chatter to onset at some distance from the smaller end by gradually increasing the depth of cut while keeping the spindle speed and feed rate constant.The study concludes by revealing the dependence of certain RQA variables; percent determinism, percent recurrence and entropy, to tool wear and chatter unambiguously. The performances of the results establish this methodology to be viable for detection of tool wear and chatter in metal cutting operation in a lathe. The key reason is that the dynamics of the system under study have been nonlinear and the recurrence quantification analysis can characterize them adequately.This work establishes that principles and practice of machining can be considerably benefited and advanced from using nonlinear dynamics and chaos theory.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Natural systems are inherently non linear. Recurrent behaviours are typical of natural systems. Recurrence is a fundamental property of non linear dynamical systems which can be exploited to characterize the system behaviour effectively. Cross recurrence based analysis of sensor signals from non linear dynamical system is presented in this thesis. The mutual dependency among relatively independent components of a system is referred as coupling. The analysis is done for a mechanically coupled system specifically designed for conducting experiment. Further, cross recurrence method is extended to the actual machining process in a lathe to characterize the chatter during turning. The result is verified by permutation entropy method. Conventional linear methods or models are incapable of capturing the critical and strange behaviours associated with the dynamical process. Hence any effective feature extraction methodologies should invariably gather information thorough nonlinear time series analysis. The sensor signals from the dynamical system normally contain noise and non stationarity. In an effort to get over these two issues to the maximum possible extent, this work adopts the cross recurrence quantification analysis (CRQA) methodology since it is found to be robust against noise and stationarity in the signals. The study reveals that the CRQA is capable of characterizing even weak coupling among system signals. It also divulges the dependence of certain CRQA variables like percent determinism, percent recurrence and entropy to chatter unambiguously. The surrogate data test shows that the results obtained by CRQA are the true properties of the temporal evolution of the dynamics and contain a degree of deterministic structure. The results are verified using permutation entropy (PE) to detect the onset of chatter from the time series. The present study ascertains that this CRP based methodology is capable of recognizing the transition from regular cutting to the chatter cutting irrespective of the machining parameters or work piece material. The results establish this methodology to be feasible for detection of chatter in metal cutting operation in a lathe.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis investigates the potential use of zerocrossing information for speech sample estimation. It provides 21 new method tn) estimate speech samples using composite zerocrossings. A simple linear interpolation technique is developed for this purpose. By using this method the A/D converter can be avoided in a speech coder. The newly proposed zerocrossing sampling theory is supported with results of computer simulations using real speech data. The thesis also presents two methods for voiced/ unvoiced classification. One of these methods is based on a distance measure which is a function of short time zerocrossing rate and short time energy of the signal. The other one is based on the attractor dimension and entropy of the signal. Among these two methods the first one is simple and reguires only very few computations compared to the other. This method is used imtea later chapter to design an enhanced Adaptive Transform Coder. The later part of the thesis addresses a few problems in Adaptive Transform Coding and presents an improved ATC. Transform coefficient with maximum amplitude is considered as ‘side information’. This. enables more accurate tfiiz assignment enui step—size computation. A new bit reassignment scheme is also introduced in this work. Finally, sum ATC which applies switching between luiscrete Cosine Transform and Discrete Walsh-Hadamard Transform for voiced and unvoiced speech segments respectively is presented. Simulation results are provided to show the improved performance of the coder

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Biometrics deals with the physiological and behavioral characteristics of an individual to establish identity. Fingerprint based authentication is the most advanced biometric authentication technology. The minutiae based fingerprint identification method offer reasonable identification rate. The feature minutiae map consists of about 70-100 minutia points and matching accuracy is dropping down while the size of database is growing up. Hence it is inevitable to make the size of the fingerprint feature code to be as smaller as possible so that identification may be much easier. In this research, a novel global singularity based fingerprint representation is proposed. Fingerprint baseline, which is the line between distal and intermediate phalangeal joint line in the fingerprint, is taken as the reference line. A polygon is formed with the singularities and the fingerprint baseline. The feature vectors are the polygonal angle, sides, area, type and the ridge counts in between the singularities. 100% recognition rate is achieved in this method. The method is compared with the conventional minutiae based recognition method in terms of computation time, receiver operator characteristics (ROC) and the feature vector length. Speech is a behavioural biometric modality and can be used for identification of a speaker. In this work, MFCC of text dependant speeches are computed and clustered using k-means algorithm. A backpropagation based Artificial Neural Network is trained to identify the clustered speech code. The performance of the neural network classifier is compared with the VQ based Euclidean minimum classifier. Biometric systems that use a single modality are usually affected by problems like noisy sensor data, non-universality and/or lack of distinctiveness of the biometric trait, unacceptable error rates, and spoof attacks. Multifinger feature level fusion based fingerprint recognition is developed and the performances are measured in terms of the ROC curve. Score level fusion of fingerprint and speech based recognition system is done and 100% accuracy is achieved for a considerable range of matching threshold

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis investigated the potential use of Linear Predictive Coding in speech communication applications. A Modified Block Adaptive Predictive Coder is developed, which reduces the computational burden and complexity without sacrificing the speech quality, as compared to the conventional adaptive predictive coding (APC) system. For this, changes in the evaluation methods have been evolved. This method is as different from the usual APC system in that the difference between the true and the predicted value is not transmitted. This allows the replacement of the high order predictor in the transmitter section of a predictive coding system, by a simple delay unit, which makes the transmitter quite simple. Also, the block length used in the processing of the speech signal is adjusted relative to the pitch period of the signal being processed rather than choosing a constant length as hitherto done by other researchers. The efficiency of the newly proposed coder has been supported with results of computer simulation using real speech data. Three methods for voiced/unvoiced/silent/transition classification have been presented. The first one is based on energy, zerocrossing rate and the periodicity of the waveform. The second method uses normalised correlation coefficient as the main parameter, while the third method utilizes a pitch-dependent correlation factor. The third algorithm which gives the minimum error probability has been chosen in a later chapter to design the modified coder The thesis also presents a comparazive study beh-cm the autocorrelation and the covariance methods used in the evaluaiicn of the predictor parameters. It has been proved that the azztocorrelation method is superior to the covariance method with respect to the filter stabf-it)‘ and also in an SNR sense, though the increase in gain is only small. The Modified Block Adaptive Coder applies a switching from pitch precitzion to spectrum prediction when the speech segment changes from a voiced or transition region to an unvoiced region. The experiments cont;-:ted in coding, transmission and simulation, used speech samples from .\£=_‘ajr2_1a:r1 and English phrases. Proposal for a speaker reecgnifion syste: and a phoneme identification system has also been outlized towards the end of the thesis.