2 resultados para Speech in Noise
em Cochin University of Science
Resumo:
In a leading service economy like India, services lie at the very center of economic activity. Competitive organizations now look not only at the skills and knowledge, but also at the behavior required by an employee to be successful on the job. Emotionally competent employees can effectively deal with occupational stress and maintain psychological well-being. This study explores the scope of the first two formants and jitter to assess seven common emotional states present in the natural speech in English. The k-means method was used to classify emotional speech as neutral, happy, surprised, angry, disgusted and sad. The accuracy of classification obtained using raw jitter was more than 65 percent for happy and sad but less accurate for the others. The overall classification accuracy was 72% in the case of preprocessed jitter. The experimental study was done on 1664 English utterances of 6 females. This is a simple, interesting and more proactive method for employees from varied backgrounds to become aware of their own communication styles as well as that of their colleagues' and customers and is therefore socially beneficial. It is a cheap method also as it requires only a computer. Since knowledge of sophisticated software or signal processing is not necessary, it is easy to analyze
Resumo:
Speech is the most natural means of communication among human beings and speech processing and recognition are intensive areas of research for the last five decades. Since speech recognition is a pattern recognition problem, classification is an important part of any speech recognition system. In this work, a speech recognition system is developed for recognizing speaker independent spoken digits in Malayalam. Voice signals are sampled directly from the microphone. The proposed method is implemented for 1000 speakers uttering 10 digits each. Since the speech signals are affected by background noise, the signals are tuned by removing the noise from it using wavelet denoising method based on Soft Thresholding. Here, the features from the signals are extracted using Discrete Wavelet Transforms (DWT) because they are well suitable for processing non-stationary signals like speech. This is due to their multi- resolutional, multi-scale analysis characteristics. Speech recognition is a multiclass classification problem. So, the feature vector set obtained are classified using three classifiers namely, Artificial Neural Networks (ANN), Support Vector Machines (SVM) and Naive Bayes classifiers which are capable of handling multiclasses. During classification stage, the input feature vector data is trained using information relating to known patterns and then they are tested using the test data set. The performances of all these classifiers are evaluated based on recognition accuracy. All the three methods produced good recognition accuracy. DWT and ANN produced a recognition accuracy of 89%, SVM and DWT combination produced an accuracy of 86.6% and Naive Bayes and DWT combination produced an accuracy of 83.5%. ANN is found to be better among the three methods.