946 results for speech features


Relevance: 20.00%

Abstract:

This paper proposes a region-based image retrieval system that uses the local colour and texture features of image sub-regions. The regions of interest (ROIs) are roughly identified by segmenting the image into fixed partitions, finding the edge map and applying morphological dilation. The colour and texture features of the ROIs are computed from histograms of the quantized HSV colour space and from the Gray Level Co-occurrence Matrix (GLCM), respectively. Each ROI of the query image is compared with the same number of ROIs of the target image, arranged in descending order of white pixel density in the regions, using the Euclidean distance as the similarity measure. Preliminary experimental results show that the proposed method gives better retrieval results than some of the existing methods.
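
The colour half of this comparison can be sketched as follows. This is a minimal illustration and not the authors' implementation: the bin counts (8, 3, 3), the function names and the two toy ROIs are assumptions, and the GLCM texture features are left out.

```python
import colorsys
import math

def hsv_histogram(pixels, bins=(8, 3, 3)):
    """Normalised histogram of RGB pixels quantised in HSV space.
    The (8, 3, 3) bin layout is an assumption, not taken from the paper."""
    hb, sb, vb = bins
    hist = [0.0] * (hb * sb * vb)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # min(...) keeps a component equal to 1.0 inside the last bin
        hi = min(int(h * hb), hb - 1)
        si = min(int(s * sb), sb - 1)
        vi = min(int(v * vb), vb - 1)
        hist[(hi * sb + si) * vb + vi] += 1.0
    total = sum(hist) or 1.0
    return [count / total for count in hist]

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical ROIs: one uniformly red, one uniformly blue.
red_roi = [(255, 0, 0)] * 16
blue_roi = [(0, 0, 255)] * 16
d_same = euclidean(hsv_histogram(red_roi), hsv_histogram(red_roi))
d_diff = euclidean(hsv_histogram(red_roi), hsv_histogram(blue_roi))
```

A smaller distance means a closer match, so identical ROIs score zero and ROIs with no colour bins in common score the maximum for normalised histograms.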

Abstract:

Speech is one of the most important means of communication among human beings. In this paper, a comparative study of two feature extraction techniques is carried out for recognizing speaker-independent isolated spoken words. The first is a hybrid approach with Linear Predictive Coding (LPC) and Artificial Neural Networks (ANN), and the second uses a combination of Wavelet Packet Decomposition (WPD) and Artificial Neural Networks. Voice signals are sampled directly from the microphone and then processed using these two techniques to extract the features. Words from Malayalam, one of the four major Dravidian languages of southern India, are chosen for recognition. Training, testing and pattern recognition are performed using Artificial Neural Networks, and the back-propagation method is used to train the ANN. The proposed method is implemented for 50 speakers uttering 20 isolated words each. Both methods produce good recognition accuracy, but Wavelet Packet Decomposition is found to be more suitable for recognizing speech because of its multi-resolution characteristics and efficient time-frequency localization.
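
As an illustration of the LPC side of this comparison, the sketch below derives predictor coefficients from one frame with the autocorrelation method and the Levinson-Durbin recursion. The abstract does not say which LPC variant or order was used, so the method, the order and the synthetic frame are all assumptions.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc(frame, order):
    """Levinson-Durbin recursion.
    Returns (a, err) where x[n] is approximated by sum_k a[k-1] * x[n-k]
    and err is the residual prediction energy."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err == 0.0:               # silent frame: nothing left to predict
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:], err

# Hypothetical frame: a decaying exponential, which a first-order
# predictor with coefficient 0.9 models almost perfectly.
frame = [0.9 ** n for n in range(200)]
coeffs, residual = lpc(frame, 2)
```

In a recognizer of the kind described, the coefficient vector of each frame (or a transform of it) would be the input to the neural network.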

Abstract:

Speech is a natural mode of communication for people, and speech recognition is an intensive area of research because of its versatile applications. This paper presents a comparative study of wavelet-based feature extraction methods for recognizing isolated spoken words. Isolated words from Malayalam, one of the four major Dravidian languages of southern India, are chosen for recognition. The work includes two speech recognition methods. The first is a hybrid approach with Discrete Wavelet Transforms (DWT) and Artificial Neural Networks (ANN), and the second uses a combination of Wavelet Packet Decomposition (WPD) and Artificial Neural Networks. Features are extracted using DWT and WPD, and training, testing and pattern recognition are performed using the ANN. The proposed method is implemented for 50 speakers uttering 20 isolated words each. The experimental results show the efficiency of these techniques in recognizing speech.
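
A rough sketch of DWT-based feature extraction follows. The abstract names neither the wavelet nor the features derived from the coefficients, so the Haar wavelet and the sub-band energies used here are assumptions chosen for brevity.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail

def dwt_energy_features(signal, levels):
    """Energy of each detail band, then of the final approximation band."""
    features = []
    approx = list(signal)
    for _ in range(levels):
        # DWT splits only the approximation branch at each level;
        # WPD would split the detail branch as well.
        approx, detail = haar_step(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features

# Hypothetical 8-sample frame; its length is divisible by 2**levels.
frame = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
features = dwt_energy_features(frame, 2)
```

Because the Haar transform is orthonormal, the band energies add up to the energy of the input frame, which gives a quick sanity check on any implementation.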

Abstract:

Digit speech recognition is important in many applications such as automatic data entry, PIN entry, voice-dialing telephones and automated banking systems. This paper presents a speaker-independent speech recognition system for Malayalam digits. The system uses Mel-frequency cepstral coefficients (MFCC) as features and a Hidden Markov Model (HMM) for recognition. The system was trained with 21 male and female voices in the age group of 20 to 40 years and achieved 98.5% word recognition accuracy (94.8% sentence recognition accuracy) on a continuous digit recognition test set.
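
For readers unfamiliar with MFCC, one common recipe for a single frame is sketched below: window, power spectrum, triangular mel filterbank, logarithm, DCT. The frame length, sample rate, filter count and number of coefficients are assumptions, since the abstract gives none of them, and the naive DFT is used only to keep the example free of dependencies.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..n/2."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [lo + (hi - lo) * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):
            filt[k] = (k - left) / (centre - left)
        for k in range(centre, right):
            filt[k] = (right - k) / (right - centre)
        bank.append(filt)
    return bank

def mfcc(frame, sample_rate, num_filters=20, num_coeffs=13):
    n = len(frame)
    # Hamming window reduces spectral leakage at the frame edges
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    spec = power_spectrum(windowed)
    log_energy = []
    for filt in mel_filterbank(num_filters, n, sample_rate):
        energy = sum(w * p for w, p in zip(filt, spec))
        log_energy.append(math.log(max(energy, 1e-10)))  # floor avoids log(0)
    # DCT-II of the log filterbank energies
    return [sum(log_energy[j] * math.cos(math.pi * i * (j + 0.5) / num_filters)
                for j in range(num_filters))
            for i in range(num_coeffs)]

# Hypothetical input: one 256-sample frame of a 440 Hz tone at 8 kHz.
tone = [math.sin(2 * math.pi * 440.0 * t / 8000.0) for t in range(256)]
coeffs = mfcc(tone, 8000)
```

In an HMM recognizer the sequence of such vectors, one per frame, forms the observation sequence.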

Abstract:

Grey Level Co-occurrence Matrices (GLCM) are one of the earliest techniques used for image texture analysis. In this paper we define a new feature, called trace, extracted from the GLCM, and discuss its implications for texture analysis in the context of Content-Based Image Retrieval (CBIR). The theoretical extension of the GLCM to n-dimensional gray-scale images is also discussed. The results indicate that trace features outperform Haralick features when applied to CBIR.
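
The sketch below builds a normalised GLCM for one pixel offset and sums its diagonal. It assumes that the paper's "trace" is the ordinary matrix trace; the abstract does not give the definition, so treat that reading, along with the toy images, as an assumption.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix for the pixel offset (dx, dy).
    image is a list of rows holding integer gray levels in [0, levels)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                counts[image[y][x]][image[ny][nx]] += 1.0
    total = sum(map(sum, counts)) or 1.0
    return [[c / total for c in row] for row in counts]

def trace(matrix):
    """Sum of the diagonal: the share of neighbour pairs with equal gray level."""
    return sum(matrix[i][i] for i in range(len(matrix)))

# Hypothetical textures: horizontal bands and vertical stripes.
bands = [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]]
stripes = [[0, 1, 0, 1] for _ in range(4)]
trace_bands = trace(glcm(bands, 4))      # every horizontal pair matches
trace_stripes = trace(glcm(stripes, 2))  # no horizontal pair matches
```

Under this reading the feature is high for textures that are smooth along the chosen offset and low for textures that alternate along it.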

Abstract:

The performance of any continuous speech recognition system depends on the accuracy of its acoustic model. Hence, preparing a robust and accurate acoustic model leads to satisfactory recognition performance for a speech recognizer. In acoustic modeling of phonetic units, context information is of prime importance, as phonemes are found to vary according to their place of occurrence in a word. In this paper we compare and evaluate the effect of context-dependent tied (CD tied), context-dependent (CD) and context-independent (CI) models for continuous speech recognition of the Malayalam language. The database for the speech recognition system has utterances from 21 speakers, 11 female and 10 male. Our evaluation results show that CD tied models outperform CI models by over 21%.
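
The difference between CI and CD units can be shown with a small sketch. It assumes that the context unit is the triphone, written in the widely used `left-centre+right` notation with `sil` at utterance boundaries; the abstract names neither, and the phone sequence is made up.

```python
def to_triphones(phones, boundary="sil"):
    """Expand context-independent phones into context-dependent triphones.
    Triphone context and the 'sil' boundary label are assumptions."""
    triphones = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] if i > 0 else boundary
        right = phones[i + 1] if i + 1 < len(phones) else boundary
        triphones.append(f"{left}-{phone}+{right}")
    return triphones

# Hypothetical phone sequence in which the phone "a" occurs twice.
phones = ["k", "a", "t", "a"]
ci_units = set(phones)                # 3 distinct CI models
cd_units = set(to_triphones(phones))  # 4 distinct CD models
```

The two occurrences of "a" share one CI model but get separate CD models. Tying then merges the states of acoustically similar CD models so that each one still sees enough training data.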

Abstract:

Connected digit speech recognition is important in many applications such as automated banking systems, catalogue dialing and automatic data entry. This paper presents an optimum speaker-independent connected digit recognizer for the Malayalam language. The system uses Perceptual Linear Predictive (PLP) cepstral coefficients for speech parameterization and a continuous-density Hidden Markov Model (HMM) in the recognition process. The Viterbi algorithm is used for decoding. The training database contains utterances from 21 speakers in the age group of 20 to 40 years, recorded in a normal office environment, where each speaker was asked to read 20 sets of continuous digits. The system obtained an accuracy of 99.5% on unseen data.
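
The decoding step can be illustrated with a log-domain Viterbi sketch. To keep it short, the toy model uses discrete emission probabilities, whereas the paper describes a continuous-density HMM; the states, symbols and probabilities are all made up for the example.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely state sequence for obs under a discrete-emission HMM."""
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # best predecessor of state s at time t
            prev = max(states, key=lambda p: best[t - 1][p] + log_trans[p][s])
            best[t][s] = best[t - 1][prev] + log_trans[prev][s] + log_emit[s][obs[t]]
            back[t][s] = prev
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):   # follow the back-pointers
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {key: math.log(p) for key, p in table.items()}

# Hypothetical two-state model with two observation symbols.
states = ["one", "two"]
log_start = logs({"one": 0.5, "two": 0.5})
log_trans = {"one": logs({"one": 0.8, "two": 0.2}),
             "two": logs({"one": 0.2, "two": 0.8})}
log_emit = {"one": logs({"a": 0.9, "b": 0.1}),
            "two": logs({"a": 0.1, "b": 0.9})}
path, score = viterbi(["a", "a", "b", "b"], states, log_start, log_trans, log_emit)
```

Working with log-probabilities avoids numerical underflow on long utterances, which is why most decoders add scores instead of multiplying probabilities.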

Abstract:

This paper presents a writer identification scheme for Malayalam documents. As the success rate of such a scheme depends heavily on the features extracted from the documents, the process of feature selection and extraction is highly relevant. The paper describes a set of novel features designed exclusively for the Malayalam language. The features were studied in detail, which resulted in a comparative study of all the features. The features are fused to form the feature vector, or knowledge vector, which is then used in all phases of the writer identification scheme. The scheme has been tested on a test bed of 280 writers, of whom 50 have only one page, 215 have at least 2 pages and 15 have at least 4 pages. For a comparative evaluation of the scheme, the test is also conducted using the WD-LBP method. A recognition rate of around 95% was obtained for the proposed approach.

Abstract:

The scope of writer identification extends to broad domains such as digital rights administration, forensic expert decision-making systems and document analysis systems. As the success rate of a writer identification scheme depends heavily on the features extracted from the documents, the phase of feature extraction, and therefore selection, is highly significant for writer identification schemes. In this paper, writer identification in the Malayalam language is attempted using a feature extraction technique, the Scale Invariant Feature Transform (SIFT). The schemes are tested on a test bed of 280 writers and their performance is evaluated.

Abstract:

Modeling nonlinear systems using Volterra series is a century-old method, but practical realizations were hampered by hardware that was inadequate to handle the increased computational complexity stemming from its use. Interest has recently been renewed in designing and implementing filters that can model much of the polynomial nonlinearity inherent in practical systems. The key advantage of resorting to Volterra power series for this purpose is that nonlinear filters so designed can be made to work in parallel with existing LTI systems, yielding improved performance. This paper describes the inclusion of a quadratic predictor (with nonlinearity order 2) alongside a linear predictor in an analog source coding system. Analog coding schemes generally ignore the source generation mechanisms and focus on high-fidelity reconstruction at the receiver. The widely used method of differential pulse code modulation (DPCM) for speech transmission uses a linear predictor to estimate the next possible value of the input speech signal. But this linear system does not account for the inherent nonlinearities in speech signals arising out of multiple reflections in the vocal tract. So a quadratic predictor is designed and implemented in parallel with the linear predictor to yield improved mean square error performance. The augmented speech coder is tested on speech signals transmitted over an additive white Gaussian noise (AWGN) channel.
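
The idea of running a quadratic predictor in parallel with a linear one can be sketched as a second-order Volterra predictor. The coefficients below are set by hand for a synthetic signal with a known quadratic term; how the paper designs its predictor is not stated in the abstract, and quantization and the AWGN channel are left out.

```python
def volterra_predict(history, linear, quadratic):
    """Second-order Volterra prediction of x[n].
    history[0] is x[n-1], history[1] is x[n-2], and so on."""
    p = len(linear)
    pred = sum(linear[i] * history[i] for i in range(p))        # linear branch
    for i in range(p):
        for j in range(i, p):                                   # quadratic branch
            pred += quadratic[i][j] * history[i] * history[j]
    return pred

def prediction_mse(signal, linear, quadratic):
    """Mean square prediction error, the quantity DPCM would quantise."""
    p = len(linear)
    errors = []
    for n in range(p, len(signal)):
        history = [signal[n - 1 - i] for i in range(p)]
        errors.append(signal[n] - volterra_predict(history, linear, quadratic))
    return sum(e * e for e in errors) / len(errors)

# Hypothetical source with a quadratic nonlinearity (a logistic map).
signal = [0.3]
for _ in range(50):
    x = signal[-1]
    signal.append(3.7 * x - 3.7 * x * x)

mse_linear_only = prediction_mse(signal, [3.7], [[0.0]])
mse_augmented = prediction_mse(signal, [3.7], [[-3.7]])
```

On this toy source the linear branch alone leaves a large residual, while adding the quadratic branch removes it, which is the effect the abstract describes for speech.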

Abstract:

This paper discusses the implementation details of a child-friendly, good-quality English text-to-speech (TTS) system that is phoneme-based, concatenative, easy to set up and use, and has a small memory footprint. Direct waveform concatenation and linear predictive coding (LPC) are used. Most existing TTS systems are unit-selection based and use standard speech databases available in neutral adult voices. Here, reduced memory is achieved by concatenating phonemes and by replacing phonetic wave files with their LPC coefficients. Linguistic analysis, instead of signal processing techniques, was used to reduce the algorithmic complexity. A sufficient degree of customization and generalization for the child user is included through provision for vocabulary and voice selection. Prosody is also incorporated. This inexpensive TTS system was implemented in MATLAB, with the synthesis presented through a graphical user interface (GUI), which makes it child-friendly. It can be used not only as an interesting language learning aid for the normal child but also as a speech aid for the vocally disabled child. The quality of the synthesized speech was evaluated using the mean opinion score (MOS).

Abstract:

This paper describes certain findings of an intonation and intensity study of emotive speech, carried out with minimal use of signal processing algorithms. The study was based on six basic emotions and the neutral state, elicited from 1660 English utterances obtained from the speech recordings of six Indian women. The correctness of the emotional content was verified through perceptual listening tests. Marked similarity was noted among the pitch contours of like-worded, positive-valence emotions, though no such similarity was observed among the four negative-valence emotional expressions. The intensity patterns were also studied. The results of the study were validated using arbitrary television recordings for four emotions. The findings are useful to technical researchers, social psychologists and the common man interested in the dynamics of vocal expression of emotions.

Abstract:

Low-grade and high-grade gliomas are tumors that originate in the glial cells. The main challenge in brain tumor diagnosis is determining whether a tumor is benign or malignant, primary or metastatic, and low or high grade. From the patient's MRI alone, a radiologist cannot differentiate a low-grade glioma from a high-grade glioma. Because the two are visually almost identical, it is autopsy that confirms a diagnosis of low-grade glioma with high-grade and infiltrative features. In this paper, textural descriptions of Grade I and Grade III glioma are extracted using first-order statistics and the Gray Level Co-occurrence Matrix (GLCM) method. Textural features are extracted from 16×16 sub-images of the segmented region of interest (ROI). In the proposed method, first-order statistical features such as contrast, intensity, entropy, kurtosis and spectral energy, together with the extracted GLCM features, showed promising results. The ranges of these first-order and GLCM-based features are highly discriminant between Grade I and Grade III. This study provides statistical textural information on Grade I and Grade III glioma that is useful for further classification and analysis, thus assisting radiologists to a greater extent.
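
A few of the first-order statistics named above can be computed from a 16×16 block as sketched below. The exact definitions used in the paper are not given in the abstract, so the formulas (population variance, Shannon entropy in bits, non-excess kurtosis) and the synthetic block are assumptions. Contrast and spectral energy are omitted.

```python
import math

def first_order_stats(block, levels=256):
    """Mean intensity, variance, entropy and kurtosis of a gray-level block.
    block is a list of rows holding integers in [0, levels)."""
    pixels = [p for row in block for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Shannon entropy of the gray-level distribution, in bits
    entropy = -sum((h / n) * math.log2(h / n) for h in hist if h)
    fourth = sum((p - mean) ** 4 for p in pixels) / n
    kurtosis = fourth / (variance ** 2) if variance > 0 else 0.0
    return {"mean": mean, "variance": variance,
            "entropy": entropy, "kurtosis": kurtosis}

# Hypothetical 16x16 sub-image: a black-and-white checkerboard.
block = [[255 if (r + c) % 2 else 0 for c in range(16)] for r in range(16)]
stats = first_order_stats(block)
```

In a grading pipeline of the kind described, one such feature dictionary would be computed per sub-image and combined with the GLCM features.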

Abstract:

The characterization and grading of glioma tumors via image-derived features, for diagnosis, prognosis and treatment response, has been an active research area in medical image computing. This paper presents a novel method for automatic detection and classification of glioma from conventional T2-weighted MR images. Automatic detection of the tumor was achieved using a newly developed method called the Adaptive Gray level Algebraic set Segmentation Algorithm (AGASA). Statistical features were extracted from the detected tumor texture using first-order statistics and gray level co-occurrence matrix (GLCM) based second-order statistical methods. The statistical significance of the features was determined by a t-test and its corresponding p-value. A decision system was developed for the grade detection of glioma using the selected features and their p-values. The detection performance of the decision system was validated using the receiver operating characteristic (ROC) curve. The diagnosis and grading of glioma using this non-invasive method can contribute promising results in medical image computing.
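
The two evaluation tools mentioned here can be sketched in a few lines. The example uses Welch's form of the t statistic and the pairwise-comparison form of the area under the ROC curve; the abstract specifies neither variant, the p-value lookup is omitted, and all numbers are invented.

```python
import math

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve: the probability that a positive case
    scores higher than a negative one, with ties counted as one half."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical feature values and decision scores for two grades.
feature_high = [2.0, 4.0, 6.0]
feature_low = [1.0, 2.0, 3.0]
t_stat = welch_t(feature_high, feature_low)

scores_high = [0.9, 0.8, 0.4]
scores_low = [0.5, 0.3, 0.2]
auc = roc_auc(scores_high, scores_low)
```

A feature with a large absolute t statistic separates the two grades well, and an AUC close to 1 indicates that the decision scores rank high-grade cases above low-grade ones.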

Abstract:

Speech is the primary, most prominent and most convenient means of communication in audible language. Through speech, people can express their thoughts, feelings or perceptions by the articulation of words. Human speech is a complex signal that is non-stationary in nature. It carries immensely rich information about the words spoken and about the speaker's accent, attitude, expression, intention, sex, emotion and style. The main objective of Automatic Speech Recognition (ASR) is to identify what people say by means of computer algorithms. This enables people to communicate with a computer in a natural spoken language. Automatic recognition of speech by machines has been one of the most exciting, significant and challenging areas of research in signal processing over the past five to six decades. Despite the developments and intensive research in this area, the performance of ASR is still lower than that of speech recognition by humans and has yet to reach a completely reliable level. The main objective of this thesis is to develop an efficient speech recognition system for recognising speaker-independent isolated words in Malayalam.