16 results for Speech perception
at Cochin University of Science
Abstract:
The medical field requires fast, simple and noninvasive diagnostic techniques. Many such techniques have become possible because technology now provides the means to collect and process signals. The present thesis details work done in the field of voice signals. New analysis methods, such as nonlinear dynamics, have been developed to understand the complexity of voice signals by exploring their dynamic nature. The purpose of this thesis is to characterize the complexity of pathological voice signals relative to healthy ones and to differentiate stuttering signals from healthy signals. The efficiency of various acoustic and nonlinear time-series methods is analysed. Three groups of samples are used: healthy individuals, subjects with vocal pathologies, and stuttering subjects. Individual vowels and continuous speech data for the Malayalam sentence "iruvarum changatimaranu" (in English, "Both are good friends") are recorded using a microphone. The recorded audio is converted to digital signals and analysed. Acoustic perturbation measures such as fundamental frequency (F0), jitter, shimmer and zero-crossing rate (ZCR), and nonlinear measures such as the maximum Lyapunov exponent (λmax), correlation dimension (D2), Kolmogorov exponent (K2) and a new entropy measure, permutation entropy (PE), are evaluated for all three groups of subjects. Permutation entropy is a nonlinear complexity measure that can efficiently distinguish regular from complex behaviour in any signal and can reveal changes in the dynamics of the underlying process through sudden changes in its value. The results show that nonlinear dynamical methods appear to be a suitable technique for voice signal analysis, owing to the chaotic component of the human voice.
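The perturbation measures named above have several variants. As a minimal sketch, assuming the glottal cycle periods and per-cycle peak amplitudes have already been extracted (a step not shown), the common "local" definitions of jitter and shimmer and a frame-level zero-crossing rate can be computed as follows. The function names are illustrative and are not taken from the thesis.

```python
def local_jitter(periods):
    """Relative local jitter: mean |T[i+1] - T[i]| divided by the mean period."""
    if len(periods) < 2:
        raise ValueError("need at least two glottal periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))


def local_shimmer(amplitudes):
    """Relative local shimmer: mean |A[i+1] - A[i]| divided by the mean peak amplitude."""
    if len(amplitudes) < 2:
        raise ValueError("need at least two cycle amplitudes")
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ (0 is treated as positive)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


# Usage with made-up cycle measurements (seconds and arbitrary amplitude units):
print(local_jitter([0.0100, 0.0102, 0.0099, 0.0101]))
print(local_shimmer([0.80, 0.82, 0.79, 0.81]))
print(zero_crossing_rate([0.3, -0.1, -0.4, 0.2, 0.5]))  # prints 0.5
```

Jitter and shimmer are dimensionless here; multiply by 100 to report them as percentages.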
Permutation entropy is well suited because of its sensitivity to uncertainty, since pathologies are characterized by an increase in signal complexity and unpredictability. Pathological groups have higher entropy values than the normal group, while stuttering signals have lower entropy values than normal signals. PE is effective in characterising the level of improvement after two weeks of speech therapy in stuttering subjects, and also in characterizing the dynamical difference between healthy and pathological subjects. This suggests that PE can improve and complement the voice analysis methods currently available to clinicians. The work establishes the application of the simple, inexpensive and fast PE algorithm to the diagnosis of vocal disorders and stuttering.
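Permutation entropy, as introduced by Bandt and Pompe, counts how often each ordinal pattern (the ranking of m consecutive samples) occurs and takes the Shannon entropy of that distribution, normalized here by log(m!) so that it lies between 0 and 1. A minimal sketch follows; the embedding order m = 3 and delay 1 are assumptions, since the thesis's parameter choices are not given in this abstract.

```python
import math
from collections import Counter


def permutation_entropy(x, order=3, delay=1, normalize=True):
    """Bandt-Pompe permutation entropy of a 1-D sequence.

    Each window of `order` samples spaced `delay` apart is mapped to its ordinal
    pattern (the argsort of the window); the Shannon entropy of the pattern
    frequencies is returned, optionally divided by log(order!) to lie in [0, 1].
    """
    if order < 2:
        raise ValueError("order must be at least 2")
    n_windows = len(x) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("signal too short for this order and delay")
    counts = Counter()
    for start in range(n_windows):
        window = [x[start + k * delay] for k in range(order)]
        # sorted() is stable, so tied values are ranked by their position
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        counts[pattern] += 1
    entropy = -sum((c / n_windows) * math.log(c / n_windows) for c in counts.values())
    if normalize:
        entropy /= math.log(math.factorial(order))
    return entropy
```

A strictly increasing signal has a single ordinal pattern and therefore entropy 0, while white noise visits all m! patterns about equally and approaches 1. This matches the abstract's reading that more irregular (pathological) voices yield higher PE values.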
Abstract:
Farm communication and extension programs are a vital part of farm development efforts, and electronic media play a major role in farm extension activities. Kerala, now a consumer state but a fully agricultural one in the pre-independence period, is where agricultural extension and publication activities in print media first took root. Later, All India Radio (AIR) farm programs and Doordarshan's farm broadcasts enriched the role of electronic media in farm extension. This media-saturated southern state of India wholeheartedly embraced the new electronic farm-communication revolution. After 1990, however, Kerala saw a flood of private TV channels, and there are currently 24 channels in the regional language, Malayalam. All major news and entertainment channels broadcast farm programs. The Malayalam farm programs of AIR and Doordarshan have been well accepted by farmers in Kerala. The post-independence period also saw the formation of Kerala state in the Indian Union, and the first ballot-elected communist government began its administration. After the land reform bills, the state saw a gradual decrease in agricultural production. Although this is not strongly reflected in the attitudes and practices of the farming community or in traditional electronic farm broadcasting, a change is observable in India's post-liberalization era. Private television channels, previously focused on the entertainment value of programs, started broadcasting farm programs, and the parameters of program production changed. In this situation, a comparative study of audience perception of electronic-media farm programs is highly relevant. The study is limited to Kerala, as it is the most media-saturated state in India. It analyzes the rate, nature and scope of adoption of farming methods transmitted through electronic media (TV and radio) in Malayalam.
All kinds of farm programs, including comprehensive program serials, success stories, seasonal cropping methods and expert opinion, have been analyzed against the following objectives: (1) to find whether propagating new farm methods through electronic-media farm programs, or the availability of adequate infrastructure and economic factors, leads a farmer to adopt a new farming method; (2) to find which electronic medium has more influence on farmers in adopting agricultural programs; (3) to find which form of electronic media gets better feedback from farmers; (4) to find whether TV or radio programs are more acceptable to farmers than print media; and (5) to find whether farmers receive messages through their preferred medium. The researcher recorded the opinions of a panel of agricultural officers, farm information officers, agro-extension researchers and experts. Following their opinions and guidelines, a pilot study was designed and conducted in Kanjikuzhy Panchayath, Alappuzha district, Kerala. The Panchayath was selected because it is well suited as a sample for social science research, and the absence of cash-crop cultivation there further supported its value as a sample. Based on the pilot study, the researcher adopted triangulation as the research methodology. The questionnaire survey, the primary component, contained 42 questions with 6 independent and 32 dependent variables. The survey was conducted among 400 respondents in Idukki, Alappuzha and Pathanamthitta districts, chosen to reflect geographical differences and the distribution of different types of crops.
Responses from 360 respondents, 120 from each district, were finally selected for tabulation and data analysis. Percentage analysis of these data, together with a focus group discussion among 20 selected farmers, produced the following results. Farmers, as the audience of farm programs, take the medium very seriously and maintain a critical view of program content. They are reasonably aware of the financial side of the programs and of the monetary aspirations of both private and government-owned television channels. Although they are unfamiliar with technical terminology and jargon, they know about success stories and program serials, and they are even aware that the channels do not maintain an audience research section as AIR does. Though farmers accept Doordarshan as a credible source of farm information and methods, they also value entertainment and would like Doordarshan's programs to have more entertainment value. Surprisingly, they have concrete suggestions even about the shots that would add entertainment value to Doordarshan's farm broadcasts. Farmers are well aware that media are merely instruments of inspiration and persuasion. They strongly believe that agricultural research is the source of information and new methods, and that effective change happens only when there are adequate infrastructure and marketing facilities, along with proper support from government agricultural guidelines and support systems such as Krishi Bhavans. They strongly believe that media alone cannot work magic in increasing agricultural production.
Farmers point to the lack of response to their feedback and queries on farming methods as evidence of the differing levels of commitment of government-owned and private television channels. Farmers still perceive AIR farm programs as far more committed to farmers and farming than any other electronic medium. However, they seriously lack radio receivers with medium-wave reception. Farmers perceive farming methods for new crops, as presented on both private and government-owned television channels, as more adoptable than methods for traditional crops, and there are multiple factors behind this observation. Farmers' viewing habits have changed: they prefer success stories, even totally irrelevant ones, believing that such stories encourage people to take up farming and are good sources of inspiration. They are also firm about the importance of entertainment, even in farm programs. Farmers expect direct interaction with an expert in a new farming method before implementing it in their own agricultural practice. Although introducing a new idea on TV is acceptable, farmers need direct on-field instruction from an expert before they start implementing new farming practices. Farmers still have an affinity for print media reports and agricultural pages, and they complain about newspapers removing agricultural information pages. They prefer print reports because they can collect articles and refer to them when needed. Farmers doubt the credibility of farm programs on private TV channels. Even though they prefer private television channels for learning about and adopting new farming methods and other farm information, they scrutinize programs to check whether they are sponsored by agrochemical or fertilizer manufacturers.
Abstract:
This thesis investigates the potential use of zero-crossing information for speech sample estimation. It provides a new method to estimate speech samples using composite zero-crossings, for which a simple linear interpolation technique is developed. Using this method, the A/D converter can be avoided in a speech coder. The newly proposed zero-crossing sampling theory is supported by computer simulations using real speech data. The thesis also presents two methods for voiced/unvoiced classification. One is based on a distance measure that is a function of the short-time zero-crossing rate and short-time energy of the signal; the other is based on the attractor dimension and entropy of the signal. The first method is simple and requires far fewer computations than the second, and it is used in a later chapter to design an enhanced Adaptive Transform Coder (ATC). The later part of the thesis addresses several problems in adaptive transform coding and presents an improved ATC. The transform coefficient with maximum amplitude is treated as side information, which enables more accurate bit assignment and step-size computation. A new bit reassignment scheme is also introduced. Finally, an ATC that switches between the Discrete Cosine Transform for voiced speech segments and the Discrete Walsh-Hadamard Transform for unvoiced segments is presented. Simulation results show the improved performance of the coder.
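The thesis's exact distance measure is not spelled out in this abstract. As a hedged illustration only, a frame can be described by its short-time log energy and zero-crossing rate and assigned to the nearer of two class centroids learned from labelled frames. The weighting factor `zcr_weight` is a hypothetical parameter that puts the two features on comparable scales.

```python
import math


def frame_features(frame):
    """Short-time log10 energy and zero-crossing rate of one frame."""
    energy = sum(s * s for s in frame) / len(frame)
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)) / (len(frame) - 1)
    return math.log10(energy + 1e-12), zcr


def train_centroids(labelled_frames):
    """Average the (log energy, ZCR) features of labelled frames per class."""
    sums = {}
    for frame, label in labelled_frames:
        e, z = frame_features(frame)
        acc = sums.setdefault(label, [0.0, 0.0, 0])
        acc[0] += e
        acc[1] += z
        acc[2] += 1
    return {label: (acc[0] / acc[2], acc[1] / acc[2]) for label, acc in sums.items()}


def classify(frame, centroids, zcr_weight=10.0):
    """Return the label of the nearest centroid (hypothetical weighted Euclidean distance)."""
    e, z = frame_features(frame)

    def distance(label):
        ce, cz = centroids[label]
        return math.hypot(e - ce, zcr_weight * (z - cz))

    return min(centroids, key=distance)
```

Voiced frames typically show high energy and a low ZCR, unvoiced frames the opposite, which is why two cheap features can already separate them. This is consistent with the abstract's remark that this method needs far fewer computations than the attractor-dimension approach.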
Abstract:
Biometrics uses the physiological and behavioral characteristics of an individual to establish identity. Fingerprint-based authentication is the most advanced biometric authentication technology. The minutiae-based fingerprint identification method offers a reasonable identification rate, but a minutiae feature map consists of about 70-100 minutia points, and matching accuracy drops as the database grows. Hence the fingerprint feature code should be made as small as possible so that identification becomes easier. In this research, a novel fingerprint representation based on global singularities is proposed. The fingerprint baseline, which is the joint line between the distal and intermediate phalanges, is taken as the reference line. A polygon is formed from the singularities and the fingerprint baseline, and the feature vectors are the polygon's angles, sides, area and type and the ridge counts between the singularities. This method achieves a 100% recognition rate. It is compared with the conventional minutiae-based recognition method in terms of computation time, receiver operating characteristic (ROC) and feature vector length. Speech is a behavioural biometric modality and can be used to identify a speaker. In this work, the MFCCs of text-dependent speech are computed and clustered using the k-means algorithm, and a backpropagation-based artificial neural network is trained to identify the clustered speech code. The performance of the neural network classifier is compared with that of a VQ-based minimum Euclidean distance classifier. Biometric systems that use a single modality usually suffer from problems such as noisy sensor data, non-universality and/or lack of distinctiveness of the biometric trait, unacceptable error rates, and spoof attacks. Multi-finger, feature-level fusion-based fingerprint recognition is therefore developed, and its performance is measured in terms of the ROC curve.
Score-level fusion of the fingerprint- and speech-based recognition systems achieves 100% accuracy over a considerable range of matching thresholds.
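The abstract does not say which fusion rule was used. One common choice for score-level fusion, shown here as a sketch under that assumption, normalizes each matcher's score to [0, 1] with min-max normalization and takes a weighted sum that is compared with the matching threshold. The score ranges, the weight `w_fp` and the function names are hypothetical.

```python
def min_max_normalize(score, lo, hi):
    """Map a raw score from [lo, hi] onto [0, 1], clipping values outside the range."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def fuse_scores(fp_score, sp_score, fp_range, sp_range, w_fp=0.5, sp_is_distance=False):
    """Weighted-sum fusion of a fingerprint similarity and a speech score."""
    fp = min_max_normalize(fp_score, *fp_range)
    sp = min_max_normalize(sp_score, *sp_range)
    if sp_is_distance:
        sp = 1.0 - sp  # e.g. a VQ distortion: smaller means a better match
    return w_fp * fp + (1.0 - w_fp) * sp


def accept(fused_score, threshold):
    """Claim is accepted when the fused similarity reaches the matching threshold."""
    return fused_score >= threshold


# Usage with made-up scores: fingerprint 50 on a 0-100 scale, speech 0.8 on 0-1.
fused = fuse_scores(50, 0.8, (0, 100), (0, 1))
print(fused, accept(fused, threshold=0.6))
```

Sweeping `threshold` over [0, 1] and counting false accepts and false rejects on genuine and impostor trials gives the range of thresholds over which a fused system reaches a target accuracy.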
Abstract:
This thesis investigated the potential use of linear predictive coding in speech communication applications. A Modified Block Adaptive Predictive Coder is developed that, compared with the conventional adaptive predictive coding (APC) system, reduces computational burden and complexity without sacrificing speech quality. To achieve this, the evaluation methods were modified. The method differs from the usual APC system in that the difference between the true and predicted values is not transmitted. This allows the high-order predictor in the transmitter section of a predictive coding system to be replaced by a simple delay unit, which makes the transmitter quite simple. In addition, the block length used to process the speech signal is adjusted relative to the pitch period of the signal being processed, rather than fixed at a constant length as in earlier work by other researchers. The efficiency of the proposed coder is supported by computer simulations using real speech data. Three methods for voiced/unvoiced/silent/transition classification are presented. The first is based on energy, zero-crossing rate and the periodicity of the waveform; the second uses the normalised correlation coefficient as its main parameter; and the third uses a pitch-dependent correlation factor. The third algorithm, which gives the minimum error probability, is chosen in a later chapter to design the modified coder. The thesis also presents a comparative study of the autocorrelation and covariance methods for evaluating the predictor parameters. The autocorrelation method is shown to be superior to the covariance method with respect to filter stability and also in an SNR sense, though the gain is small.
The Modified Block Adaptive Coder switches from pitch prediction to spectrum prediction when the speech segment changes from a voiced or transition region to an unvoiced region. The coding, transmission and simulation experiments used speech samples of Malayalam and English phrases. Proposals for a speaker recognition system and a phoneme identification system are also outlined towards the end of the thesis.
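The comparison of the autocorrelation and covariance methods concerns how the predictor coefficients are obtained. As a minimal sketch of the autocorrelation method (not the thesis's code), the normal equations can be solved with the Levinson-Durbin recursion. In exact arithmetic each reflection coefficient it produces has magnitude below one for a non-zero frame, so the resulting all-pole synthesis filter is stable, which is consistent with the stability advantage reported above. Framing and windowing are omitted, and the function names are illustrative.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns (a, err, reflection) where the predictor is
    x_hat[n] = sum_{i=1..order} a[i-1] * x[n-i] and err is the final
    prediction-error energy.
    """
    if r[0] <= 0:
        raise ValueError("zero-energy frame")
    a, err, reflection = [], r[0], []
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / err
        # order-m update: a_i <- a_i - k * a_{m-i}, then append a_m = k
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= 1.0 - k * k
        reflection.append(k)
    return a, err, reflection


# Usage: a decaying exponential behaves like a first-order AR process.
x = [0.9 ** n for n in range(200)]
coeffs, err, ks = levinson_durbin(autocorrelation(x, 1), 1)
print(coeffs)  # close to [0.9]
```

The covariance method instead solves an unwindowed least-squares system (typically by Cholesky decomposition) and does not carry the same stability guarantee.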
Abstract:
Speech processing and recognition are important areas of digital signal processing, since speech allows people to communicate more naturally and efficiently. In this work, a speech recognition system is developed for recognizing digits in Malayalam. Recognizing speech requires extracting features from it, so the feature extraction method plays an important role in speech recognition. Here, front-end feature extraction is performed using two wavelet-based methods, the Discrete Wavelet Transform (DWT) and Wavelet Packet Decomposition (WPD), and a Naive Bayes classifier is used for classification. With the Naive Bayes classifier, DWT produced a recognition accuracy of 83.5% and WPD an accuracy of 80.7%. This paper aims to devise a new feature extraction method that improves recognition accuracy. A new method called Discrete Wavelet Packet Decomposition (DWPD) is therefore introduced, which uses hybrid features of both DWT and WPD. Evaluated with the Naive Bayes classifier, this new approach produced an improved recognition accuracy of 86.2%.
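The difference between the DWT and WPD front ends lies in which branches get split: the DWT recursively splits only the low-pass (approximation) branch, while wavelet packet decomposition splits both branches at every level. The sketch below shows this with the Haar wavelet and per-subband log energies as features. The wavelet, the number of levels and the energy feature are assumptions, and the paper's hybrid DWPD construction is not reproduced.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_step(x):
    """One orthonormal Haar analysis step -> (approximation, detail), each half length."""
    if len(x) % 2:
        x = list(x) + [0.0]  # zero-pad odd-length input
    approx = [(x[i] + x[i + 1]) * INV_SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * INV_SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def dwt_subbands(x, levels):
    """DWT: only the approximation branch is split again -> [D1, ..., DL, AL]."""
    bands, approx = [], list(x)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands


def wpd_subbands(x, levels):
    """Wavelet packets: every node is split -> 2**levels leaves (natural, not frequency, order)."""
    nodes = [list(x)]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_step(node)]
    return nodes


def log_energy_features(bands):
    """One feature per subband: log of the mean squared coefficient."""
    return [math.log(sum(c * c for c in band) / len(band) + 1e-12) for band in bands]
```

With three levels, the DWT yields 4 subbands (finer resolution at low frequencies) and the WPD yields 8 equal-width subbands. Because the Haar transform is orthonormal, both preserve the total energy of an even-length frame.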
Abstract:
Speech is the most natural means of communication among human beings, and speech processing and recognition have been intensive areas of research for the last five decades. Since speech recognition is a pattern recognition problem, classification is an important part of any speech recognition system. In this work, a speech recognition system is developed for recognizing speaker-independent spoken digits in Malayalam. Voice signals are sampled directly from the microphone. The proposed method is implemented for 1000 speakers uttering 10 digits each. Because the speech signals are affected by background noise, they are cleaned using a wavelet denoising method based on soft thresholding. Features are then extracted using the Discrete Wavelet Transform (DWT), which is well suited to non-stationary signals such as speech because of its multiresolution, multiscale analysis characteristics. Speech recognition is a multiclass classification problem, so the resulting feature vectors are classified using three classifiers that can handle multiple classes: Artificial Neural Networks (ANN), Support Vector Machines (SVM) and Naive Bayes. In the classification stage, each classifier is trained on feature vectors with known labels and then tested on the test data set. The classifiers are evaluated on recognition accuracy, and all three performed well: the DWT and ANN combination achieved 89%, DWT and SVM 86.6%, and DWT and Naive Bayes 83.5%. ANN was found to be the best of the three.
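Soft thresholding shrinks every wavelet detail coefficient toward zero by a threshold t and zeros those smaller than t in magnitude. A minimal sketch follows, assuming a Haar wavelet, a noise level estimated from the median absolute deviation of the finest detail coefficients, and the universal threshold t = sigma * sqrt(2 ln N). The paper's wavelet, decomposition depth and threshold rule are not stated in the abstract, and the function names are illustrative.

```python
import math
import statistics

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_forward(x, levels):
    """Multilevel orthonormal Haar DWT -> (final approximation, [D1, D2, ...])."""
    approx, details = list(x), []
    for _ in range(levels):
        a = [(approx[i] + approx[i + 1]) * INV_SQRT2 for i in range(0, len(approx), 2)]
        d = [(approx[i] - approx[i + 1]) * INV_SQRT2 for i in range(0, len(approx), 2)]
        details.append(d)
        approx = a
    return approx, details


def haar_inverse(approx, details):
    """Undo haar_forward, coarsest level first."""
    x = list(approx)
    for d in reversed(details):
        out = []
        for a, dd in zip(x, d):
            out.extend([(a + dd) * INV_SQRT2, (a - dd) * INV_SQRT2])
        x = out
    return x


def soft_threshold(c, t):
    """sign(c) * max(|c| - t, 0)."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def wavelet_denoise(x, levels=3):
    if len(x) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    approx, details = haar_forward(x, levels)
    # robust noise estimate from the finest detail band (MAD / 0.6745)
    sigma = statistics.median([abs(c) for c in details[0]]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(len(x)))
    details = [[soft_threshold(c, t) for c in d] for d in details]
    return haar_inverse(approx, details)
```

Only the detail coefficients are thresholded; the coarse approximation is kept, so slowly varying structure survives while broadband noise, spread thinly across all detail coefficients, is largely removed.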
Abstract:
The development of continuous shrimp cell lines for effective investigation of shrimp viruses remains elusive after an arduous history of over 25 years. While developing a cell line remains a challenge for researchers, the billion-dollar aquaculture industry is under viral threat. Advances in molecular biology and in gene transfer technologies for immortalizing cells have led to hundreds of cell lines from insects and mammals, yet not a single cell line has been developed from shrimp or other marine invertebrates. Although modified growth media have improved the growth and longevity of shrimp cells in vitro, this has not led to spontaneous transformation, probably because shrimp cells inhibit neoplastic transformation. Oncogenic induction and immortalization are considered possible routes, and a dedicated medium for shrimp cell culture and an appropriate mode of transformation are crucial. This review discusses the status of shrimp cell line development and its future directions.
Abstract:
Digit speech recognition is important in many applications such as automatic data entry, PIN entry, voice-dialing telephones and automated banking systems. This paper presents a speaker-independent speech recognition system for Malayalam digits. The system uses Mel-frequency cepstral coefficients (MFCCs) as features and a hidden Markov model (HMM) for recognition. The system was trained with 21 male and female voices aged 20 to 40 years and achieved 98.5% word recognition accuracy (94.8% sentence recognition accuracy) on a continuous digit recognition test set.
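The abstract names MFCCs as the front end. As a minimal, unoptimised sketch of one common recipe (Hamming window, power spectrum, triangular mel filterbank, log, DCT-II), not necessarily the paper's configuration, one frame's coefficients can be computed as below. The 26 filters and 13 coefficients are assumptions, and a naive DFT stands in for an FFT to keep the example dependency-free.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-windowed power spectrum via a naive O(n^2) DFT (bins 0..n/2)."""
    n = len(frame)
    w = [frame[t] * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1))) for t in range(n)]
    return [abs(sum(w[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2 / n
            for k in range(n // 2 + 1)]


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [lo + (hi - lo) * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):
            weights[k] = (k - left) / (centre - left)    # rising edge
        for k in range(centre, right):
            weights[k] = (right - k) / (right - centre)  # falling edge
        bank.append(weights)
    return bank


def mfcc(frame, sample_rate, num_filters=26, num_ceps=13):
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    log_energies = [math.log(sum(w * p for w, p in zip(f, spec)) + 1e-10) for f in bank]
    # DCT-II of the log filterbank energies (unnormalised)
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_energies))
            for c in range(num_ceps)]
```

In a recognizer, this is applied to overlapping frames (often with delta and delta-delta coefficients appended), and the resulting vector sequence is what the HMMs model.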
Abstract:
The performance of any continuous speech recognition system depends on the accuracy of its acoustic model. Hence, a robust and accurate acoustic model leads to satisfactory recognition performance. In acoustic modeling of phonetic units, context information is of prime importance, as phonemes vary according to their position in a word. In this paper we compare and evaluate the effect of context-dependent tied (CD tied), context-dependent (CD) and context-independent (CI) models for continuous speech recognition of the Malayalam language. The speech database contains utterances from 21 speakers, 11 female and 10 male. Our evaluation results show that CD tied models outperform CI models by over 21%.
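To make the CI/CD distinction concrete: a context-independent model uses one model per phoneme, whereas a context-dependent model is conditioned on the neighbouring phonemes. The sketch below expands a phoneme sequence into triphone labels using the HTK-style `left-centre+right` notation. The notation and the phone symbols are assumptions for illustration, since the paper does not state its toolkit.

```python
def to_triphones(phones):
    """Expand a phone sequence into HTK-style 'left-centre+right' labels.

    The first and last phones keep only the context that exists; a
    context-independent system would simply use `phones` unchanged.
    """
    labels = []
    for i, centre in enumerate(phones):
        label = centre
        if i > 0:
            label = f"{phones[i - 1]}-{label}"
        if i + 1 < len(phones):
            label = f"{label}+{phones[i + 1]}"
        labels.append(label)
    return labels


# Hypothetical phone symbols; the paper's phone set is not given.
print(to_triphones(["a", "m", "a"]))  # ['a+m', 'a-m+a', 'm-a']
```

With N phonemes there can be up to N^3 triphones, far more than a 21-speaker corpus can train individually. Tying clusters acoustically similar triphone states so that they share parameters, which helps explain why the tied CD models perform best.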
Abstract:
Connected digit speech recognition is important in many applications, such as automated banking systems, catalogue dialing and automatic data entry. This paper presents an optimum speaker-independent connected digit recognizer for the Malayalam language. The system uses Perceptual Linear Predictive (PLP) cepstral coefficients for speech parameterization and continuous-density hidden Markov models (HMMs) for recognition, with the Viterbi algorithm used for decoding. The training database contains utterances from 21 speakers aged 20 to 40 years, recorded in a normal office environment, with each speaker reading 20 sets of continuous digits. The system obtained an accuracy of 99.5% on unseen data.
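Viterbi decoding finds the single most likely state sequence through the HMM. A minimal log-domain sketch follows. For a continuous-density HMM, the per-frame emission scores would come from Gaussian (mixture) densities over the PLP vectors, simplified here to one-dimensional Gaussians. The function names and the toy two-state model are illustrative assumptions, not the paper's system.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log density of a 1-D Gaussian (stand-in for a multivariate/mixture density)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi(log_init, log_trans, log_emit):
    """Most likely state path and its log score.

    log_init[s]     : log P(first state = s)
    log_trans[s][t] : log P(next state = t | state = s)
    log_emit[n][s]  : log-likelihood of frame n in state s
    """
    n_states = len(log_init)
    delta = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backpointers = []
    for frame in log_emit[1:]:
        prev, delta, ptrs = delta, [], []
        for t in range(n_states):
            best = max(range(n_states), key=lambda s: prev[s] + log_trans[s][t])
            ptrs.append(best)
            delta.append(prev[best] + log_trans[best][t] + frame[t])
        backpointers.append(ptrs)
    state = max(range(n_states), key=lambda s: delta[s])
    best_score = delta[state]
    path = [state]
    for ptrs in reversed(backpointers):
        state = ptrs[state]
        path.append(state)
    return path[::-1], best_score


# Toy left-to-right model: state 0 emits around 0.0, state 1 around 5.0.
NEG_INF = float("-inf")
obs = [0.0, 0.0, 5.0, 5.0]
log_emit = [[gaussian_loglik(x, m, 1.0) for m in (0.0, 5.0)] for x in obs]
path, score = viterbi([0.0, NEG_INF],
                      [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]],
                      log_emit)
print(path)  # [0, 0, 1, 1]
```

Working in log probabilities avoids numerical underflow over long utterances. In connected-digit recognition, the same search runs over a network of concatenated digit HMMs, and the digit string is read off the best path.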
Abstract:
Modeling nonlinear systems with Volterra series is a century-old method, but practical realizations were hampered by hardware inadequate to handle the resulting computational complexity. Interest has recently been renewed in designing and implementing filters that can model much of the polynomial nonlinearity inherent in practical systems. The key advantage of using a Volterra power series for this purpose is that the resulting nonlinear filters can work in parallel with existing LTI systems, yielding improved performance. This paper describes the inclusion of a quadratic predictor (nonlinearity order 2) alongside a linear predictor in an analog source coding system. Analog coding schemes generally ignore the source generation mechanism and focus on high-fidelity reconstruction at the receiver. Differential pulse code modulation (DPCM), widely used for speech transmission, uses a linear predictor to estimate the next value of the input speech signal. However, this linear system does not account for the nonlinearities inherent in speech signals, which arise from multiple reflections in the vocal tract. A quadratic predictor is therefore designed and implemented in parallel with the linear predictor to improve mean square error performance. The augmented speech coder is tested on speech signals transmitted over an additive white Gaussian noise (AWGN) channel.
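The second-order Volterra predictor described above adds weighted products of past samples to the usual weighted sum of past samples. A minimal sketch of closed-loop DPCM with such a predictor is given below. The coefficients, the uniform quantizer step and the function names are hypothetical. How the paper designs the quadratic kernel is not reproduced, and the AWGN channel is not modelled.

```python
import math


def volterra_predict(hist, a, b):
    """Linear + quadratic prediction.

    hist[k] is the reconstructed sample x~(n-1-k); `a` is the linear kernel and
    only the upper triangle (i <= j) of the quadratic kernel `b` is used.
    """
    linear = sum(a[i] * hist[i] for i in range(len(a)))
    quadratic = sum(b[i][j] * hist[i] * hist[j]
                    for i in range(len(b)) for j in range(i, len(b)))
    return linear + quadratic


def _history(recon, p):
    """Last p reconstructed samples, most recent first, zero-padded at the start."""
    n = len(recon)
    return [recon[n - 1 - k] if n - 1 - k >= 0 else 0.0 for k in range(p)]


def dpcm_encode(signal, a, b, step):
    p = max(len(a), len(b))
    codes, recon = [], []
    for x in signal:
        pred = volterra_predict(_history(recon, p), a, b)
        q = round((x - pred) / step)   # uniform quantizer index for the residual
        codes.append(q)
        recon.append(pred + q * step)  # encoder tracks the decoder's reconstruction
    return codes, recon


def dpcm_decode(codes, a, b, step):
    p = max(len(a), len(b))
    recon = []
    for q in codes:
        recon.append(volterra_predict(_history(recon, p), a, b) + q * step)
    return recon


# Usage with a synthetic signal and made-up predictor coefficients.
sig = [math.sin(0.1 * n) + 0.3 * math.sin(0.1 * n) ** 2 for n in range(200)]
codes, recon = dpcm_encode(sig, a=[1.8, -0.9], b=[[0.05, 0.0], [0.0, 0.0]], step=0.01)
print(max(abs(x - y) for x, y in zip(sig, recon)))  # at most step / 2
```

Because the predictor runs on reconstructed rather than original samples, the decoder reproduces the encoder's reconstruction exactly, and the reconstruction error stays within half a quantizer step. A better predictor, such as one with the added quadratic term, shrinks the residuals and therefore the code indices that must be sent.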
Abstract:
This paper discusses the implementation details of a child-friendly, good-quality English text-to-speech (TTS) system that is phoneme-based and concatenative, is easy to set up and use, and requires little memory. Direct waveform concatenation and linear predictive coding (LPC) are used. Most existing TTS systems are unit-selection based and use standard speech databases recorded in neutral adult voices. Here, memory is reduced by concatenating phonemes and by replacing phonetic wave files with their LPC coefficients. Linguistic analysis, rather than signal processing techniques, was used to reduce algorithmic complexity. A sufficient degree of customization and generalization for the child user is provided through vocabulary and voice selection suited to the child's requirements, and prosody has also been incorporated. This inexpensive TTS system was implemented in MATLAB, with synthesis presented through a graphical user interface (GUI) to make it child-friendly. It can serve not only as an engaging language-learning aid for the normal child but also as a speech aid for the vocally disabled child. The quality of the synthesized speech was evaluated using the mean opinion score (MOS).
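Storing LPC coefficients instead of waveforms works because speech can be regenerated by driving an all-pole filter with a suitable excitation. A minimal sketch of that analysis/synthesis pair follows, using an impulse train as the voiced excitation. The excitation model, gain handling and names are assumptions, not details of the described MATLAB system.

```python
def lpc_residual(x, a):
    """Inverse (analysis) filter: e[n] = x[n] - sum_i a[i] * x[n-1-i]."""
    return [x[n] - sum(a[i] * x[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
            for n in range(len(x))]


def lpc_synthesize(excitation, a, gain=1.0):
    """All-pole synthesis filter: y[n] = gain * e[n] + sum_i a[i] * y[n-1-i]."""
    y = []
    for n, e in enumerate(excitation):
        y.append(gain * e + sum(a[i] * y[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0))
    return y


def impulse_train(length, period):
    """Voiced excitation: one unit pulse every `period` samples (pitch period)."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


# Usage: a hypothetical 2-coefficient "phoneme" driven at a 100-sample pitch period.
coeffs = [0.5, -0.2]
voiced = lpc_synthesize(impulse_train(400, 100), coeffs, gain=0.8)
print(len(voiced))
```

Feeding the exact residual back through the synthesis filter reconstructs the original frame. A TTS system that stores only the coefficients replaces that residual with a generic excitation (pulses for voiced phones, noise for unvoiced ones), trading some naturalness for a large memory saving.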
Abstract:
This paper describes findings from a study of intonation and intensity in emotive speech that makes minimal use of signal processing algorithms. The study was based on six basic emotions and a neutral state, elicited in 1660 English utterances taken from speech recordings of six Indian women. The correctness of the emotional content was verified through perceptual listening tests. Marked similarity was noted among the pitch contours of like-worded, positive-valence emotions, though no such similarity was observed among the four negative-valence emotional expressions. Intensity patterns were also studied. The results were validated using arbitrary television recordings for four emotions. The findings are useful to technical researchers, social psychologists and lay readers interested in the dynamics of the vocal expression of emotion.
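Pitch (intonation) and intensity contours of the kind studied here can be obtained frame by frame. As a hedged sketch, assuming a simple autocorrelation pitch estimator that may differ from whatever tool the authors used, the code below returns an F0 estimate (or None for unvoiced or silent frames) and an RMS intensity in dB for each frame. The search range, voicing threshold and frame sizes are illustrative defaults.

```python
import math


def frame_signal(x, frame_len, hop):
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def intensity_db(frame, ref=1.0):
    """RMS level in dB relative to `ref` (dB full scale when ref = 1.0)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-12) / ref)


def autocorr_pitch(frame, sample_rate, fmin=75.0, fmax=500.0, voicing_threshold=0.3):
    """F0 from the highest normalised autocorrelation peak in [1/fmax, 1/fmin]."""
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]
    r0 = sum(s * s for s in x)
    if r0 == 0:
        return None  # silence
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(x) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag)) / r0
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < voicing_threshold:
        return None  # treated as unvoiced
    return sample_rate / best_lag


def contours(x, sample_rate, frame_len=400, hop=160):
    """Pitch and intensity contours (25 ms frames, 10 ms hop at 16 kHz)."""
    frames = frame_signal(x, frame_len, hop)
    return ([autocorr_pitch(f, sample_rate) for f in frames],
            [intensity_db(f) for f in frames])
```

The biased autocorrelation used here slightly favours shorter lags, so production pitch trackers add refinements such as interpolation around the peak and octave-error checks. For comparing contour shapes across emotions, as in the study above, a coarse estimate like this is often a reasonable starting point.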
Abstract:
This study attempts to assess the quality of life and standard of living of local communities at ecotourism destinations, including their perception of forest conservation and their level of satisfaction. The data source comprises 650 EDC/VSS members from Kerala, divided into three zones. Four variables are considered for evaluating the quality of life of stakeholders at ecotourism sites; these are then mapped onto the income-education spectrum to frame hypotheses within the SLI framework. Zone-wise analysis of community members working in the tourism sector shows that they have benefited fully from tourism development in the region, gaining both employment and secure livelihood options. Most quality-of-life indicators for the community in the ecotourism centres are promising. Community perceptions indicate no negative impact on either the environment or the local culture.