19 resultados para Speaker verification
em Cochin University of Science
Resumo:
Presently different audio watermarking methods are available; most of them inclined towards copyright protection and copy protection. This is the key motive for the notion to develop a speaker verification scheme that guar- antees non-repudiation services and the thesis is its outcome. The research presented in this thesis scrutinizes the field of audio water- marking and the outcome is a speaker verification scheme that is proficient in addressing issues allied to non-repudiation to a great extent. This work aimed in developing novel audio watermarking schemes utilizing the fun- damental ideas of Fast-Fourier Transform (FFT) or Fast Walsh-Hadamard Transform (FWHT). The Mel-Frequency Cepstral Coefficients (MFCC) the best parametric representation of the acoustic signals along with few other key acoustic characteristics is employed in crafting of new schemes. The au- dio watermark created is entirely dependent to the acoustic features, hence named as FeatureMark and is crucial in this work. In any watermarking scheme, the quality of the extracted watermark de- pends exclusively on the pre-processing action and in this work framing and windowing techniques are involved. The theme non-repudiation provides immense significance in the audio watermarking schemes proposed in this work. Modification of the signal spectrum is achieved in a variety of ways by selecting appropriate FFT/FWHT coefficients and the watermarking schemes were evaluated for imperceptibility, robustness and capacity char- acteristics. The proposed schemes are unequivocally effective in terms of maintaining the sound quality, retrieving the embedded FeatureMark and in terms of the capacity to hold the mark bits. Robust nature of these marking schemes is achieved with the help of syn- chronization codes such as Barker Code with FFT based FeatureMarking scheme and Walsh Code with FWHT based FeatureMarking scheme. An- other important feature associated with this scheme is the employment of an encryption scheme towards the preparation of its FeatureMark that scrambles the signal features that helps to keep the signal features unreve- laed. A comparative study with the existing watermarking schemes and the ex- periments to evaluate imperceptibility, robustness and capacity tests guar- antee that the proposed schemes can be baselined as efficient audio water- marking schemes. The four new digital audio watermarking algorithms in terms of their performance are remarkable thereby opening more opportu- nities for further research.
Resumo:
Motivation for Speaker recognition work is presented in the first part of the thesis. An exhaustive survey of past work in this field is also presented. A low cost system not including complex computation has been chosen for implementation. Towards achieving this a PC based system is designed and developed. A front end analog to digital convertor (12 bit) is built and interfaced to a PC. Software to control the ADC and to perform various analytical functions including feature vector evaluation is developed. It is shown that a fixed set of phrases incorporating evenly balanced phonemes is aptly suited for the speaker recognition work at hand. A set of phrases are chosen for recognition. Two new methods are adopted for the feature evaluation. Some new measurements involving a symmetry check method for pitch period detection and ACE‘ are used as featured. Arguments are provided to show the need for a new model for speech production. Starting from heuristic, a knowledge based (KB) speech production model is presented. In this model, a KB provides impulses to a voice producing mechanism and constant correction is applied via a feedback path. It is this correction that differs from speaker to speaker. Methods of defining measurable parameters for use as features are described. Algorithms for speaker recognition are developed and implemented. Two methods are presented. The first is based on the model postulated. Here the entropy on the utterance of a phoneme is evaluated. The transitions of voiced regions are used as speaker dependent features. The second method presented uses features found in other works, but evaluated differently. A knock—out scheme is used to provide the weightage values for the selection of features. Results of implementation are presented which show on an average of 80% recognition. It is also shown that if there are long gaps between sessions, the performance deteriorates and is speaker dependent. Cross recognition percentages are also presented and this in the worst case rises to 30% while the best case is 0%. Suggestions for further work are given in the concluding chapter.
Resumo:
This thesis deals with the use of simulation as a problem-solving tool to solve a few logistic system related problems. More specifically it relates to studies on transport terminals. Transport terminals are key elements in the supply chains of industrial systems. One of the problems related to use of simulation is that of the multiplicity of models needed to study different problems. There is a need for development of methodologies related to conceptual modelling which will help reduce the number of models needed. Three different logistic terminal systems Viz. a railway yard, container terminal of apart and airport terminal were selected as cases for this study. The standard methodology for simulation development consisting of system study and data collection, conceptual model design, detailed model design and development, model verification and validation, experimentation, and analysis of results, reporting of finding were carried out. We found that models could be classified into tightly pre-scheduled, moderately pre-scheduled and unscheduled systems. Three types simulation models( called TYPE 1, TYPE 2 and TYPE 3) of various terminal operations were developed in the simulation package Extend. All models were of the type discrete-event simulation. Simulation models were successfully used to help solve strategic, tactical and operational problems related to three important logistic terminals as set in our objectives. From the point of contribution to conceptual modelling we have demonstrated that clubbing problems into operational, tactical and strategic and matching them with tightly pre-scheduled, moderately pre-scheduled and unscheduled systems is a good workable approach which reduces the number of models needed to study different terminal related problems.
Resumo:
Biometrics deals with the physiological and behavioral characteristics of an individual to establish identity. Fingerprint based authentication is the most advanced biometric authentication technology. The minutiae based fingerprint identification method offer reasonable identification rate. The feature minutiae map consists of about 70-100 minutia points and matching accuracy is dropping down while the size of database is growing up. Hence it is inevitable to make the size of the fingerprint feature code to be as smaller as possible so that identification may be much easier. In this research, a novel global singularity based fingerprint representation is proposed. Fingerprint baseline, which is the line between distal and intermediate phalangeal joint line in the fingerprint, is taken as the reference line. A polygon is formed with the singularities and the fingerprint baseline. The feature vectors are the polygonal angle, sides, area, type and the ridge counts in between the singularities. 100% recognition rate is achieved in this method. The method is compared with the conventional minutiae based recognition method in terms of computation time, receiver operator characteristics (ROC) and the feature vector length. Speech is a behavioural biometric modality and can be used for identification of a speaker. In this work, MFCC of text dependant speeches are computed and clustered using k-means algorithm. A backpropagation based Artificial Neural Network is trained to identify the clustered speech code. The performance of the neural network classifier is compared with the VQ based Euclidean minimum classifier. Biometric systems that use a single modality are usually affected by problems like noisy sensor data, non-universality and/or lack of distinctiveness of the biometric trait, unacceptable error rates, and spoof attacks. Multifinger feature level fusion based fingerprint recognition is developed and the performances are measured in terms of the ROC curve. Score level fusion of fingerprint and speech based recognition system is done and 100% accuracy is achieved for a considerable range of matching threshold
Resumo:
This thesis investigated the potential use of Linear Predictive Coding in speech communication applications. A Modified Block Adaptive Predictive Coder is developed, which reduces the computational burden and complexity without sacrificing the speech quality, as compared to the conventional adaptive predictive coding (APC) system. For this, changes in the evaluation methods have been evolved. This method is as different from the usual APC system in that the difference between the true and the predicted value is not transmitted. This allows the replacement of the high order predictor in the transmitter section of a predictive coding system, by a simple delay unit, which makes the transmitter quite simple. Also, the block length used in the processing of the speech signal is adjusted relative to the pitch period of the signal being processed rather than choosing a constant length as hitherto done by other researchers. The efficiency of the newly proposed coder has been supported with results of computer simulation using real speech data. Three methods for voiced/unvoiced/silent/transition classification have been presented. The first one is based on energy, zerocrossing rate and the periodicity of the waveform. The second method uses normalised correlation coefficient as the main parameter, while the third method utilizes a pitch-dependent correlation factor. The third algorithm which gives the minimum error probability has been chosen in a later chapter to design the modified coder The thesis also presents a comparazive study beh-cm the autocorrelation and the covariance methods used in the evaluaiicn of the predictor parameters. It has been proved that the azztocorrelation method is superior to the covariance method with respect to the filter stabf-it)‘ and also in an SNR sense, though the increase in gain is only small. The Modified Block Adaptive Coder applies a switching from pitch precitzion to spectrum prediction when the speech segment changes from a voiced or transition region to an unvoiced region. The experiments cont;-:ted in coding, transmission and simulation, used speech samples from .\£=_‘ajr2_1a:r1 and English phrases. Proposal for a speaker reecgnifion syste: and a phoneme identification system has also been outlized towards the end of the thesis.
Resumo:
Speech signals are one of the most important means of communication among the human beings. In this paper, a comparative study of two feature extraction techniques are carried out for recognizing speaker independent spoken isolated words. First one is a hybrid approach with Linear Predictive Coding (LPC) and Artificial Neural Networks (ANN) and the second method uses a combination of Wavelet Packet Decomposition (WPD) and Artificial Neural Networks. Voice signals are sampled directly from the microphone and then they are processed using these two techniques for extracting the features. Words from Malayalam, one of the four major Dravidian languages of southern India are chosen for recognition. Training, testing and pattern recognition are performed using Artificial Neural Networks. Back propagation method is used to train the ANN. The proposed method is implemented for 50 speakers uttering 20 isolated words each. Both the methods produce good recognition accuracy. But Wavelet Packet Decomposition is found to be more suitable for recognizing speech because of its multi-resolution characteristics and efficient time frequency localizations
Resumo:
Speech is the most natural means of communication among human beings and speech processing and recognition are intensive areas of research for the last five decades. Since speech recognition is a pattern recognition problem, classification is an important part of any speech recognition system. In this work, a speech recognition system is developed for recognizing speaker independent spoken digits in Malayalam. Voice signals are sampled directly from the microphone. The proposed method is implemented for 1000 speakers uttering 10 digits each. Since the speech signals are affected by background noise, the signals are tuned by removing the noise from it using wavelet denoising method based on Soft Thresholding. Here, the features from the signals are extracted using Discrete Wavelet Transforms (DWT) because they are well suitable for processing non-stationary signals like speech. This is due to their multi- resolutional, multi-scale analysis characteristics. Speech recognition is a multiclass classification problem. So, the feature vector set obtained are classified using three classifiers namely, Artificial Neural Networks (ANN), Support Vector Machines (SVM) and Naive Bayes classifiers which are capable of handling multiclasses. During classification stage, the input feature vector data is trained using information relating to known patterns and then they are tested using the test data set. The performances of all these classifiers are evaluated based on recognition accuracy. All the three methods produced good recognition accuracy. DWT and ANN produced a recognition accuracy of 89%, SVM and DWT combination produced an accuracy of 86.6% and Naive Bayes and DWT combination produced an accuracy of 83.5%. ANN is found to be better among the three methods.
Resumo:
In this paper we address the problem of face detection and recognition of grey scale frontal view images. We propose a face recognition system based on probabilistic neural networks (PNN) architecture. The system is implemented using voronoi/ delaunay tessellations and template matching. Images are segmented successfully into homogeneous regions by virtue of voronoi diagram properties. Face verification is achieved using matching scores computed by correlating edge gradients of reference images. The advantage of classification using PNN models is its short training time. The correlation based template matching guarantees good classification results
Resumo:
In Safety critical software failure can have a high price. Such software should be free of errors before it is put into operation. Application of formal methods in the Software Development Life Cycle helps to ensure that the software for safety critical missions are ultra reliable. PVS theorem prover, a formal method tool, can be used for the formal verification of software in ADA Language for Flight Software Application (ALFA.). This paper describes the modeling of ALFA programs for PVS theorem prover. An ALFA2PVS translator is developed which automatically converts the software in ALFA to PVS specification. By this approach the software can be verified formally with respect to underflow/overflow errors and divide by zero conditions without the actual execution of the code.
Resumo:
In Safety critical software failure can have a high price. Such software should be free of errors before it is put into operation. Application of formal methods in the Software Development Life Cycle helps to ensure that the software for safety critical missions are ultra reliable. PVS theorem prover, a formal method tool, can be used for the formal verification of software in ADA Language for Flight Software Application (ALFA.). This paper describes the modeling of ALFA programs for PVS theorem prover. An ALFA2PVS translator is developed which automatically converts the software in ALFA to PVS specification. By this approach the software can be verified formally with respect to underflow/overflow errors and divide by zero conditions without the actual execution of the code
Resumo:
n this paper we address the problem of face detection and recognition of grey scale frontal view images. We propose a face recognition system based on probabilistic neural networks (PNN) architecture. The system is implemented using voronoi/ delaunay tessellations and template matching. Images are segmented successfully into homogeneous regions by virtue of voronoi diagram properties. Face verification is achieved using matching scores computed by correlating edge gradients of reference images. The advantage of classification using PNN models is its short training time. The correlation based template matching guarantees good classification results.
Resumo:
Digit speech recognition is important in many applications such as automatic data entry, PIN entry, voice dialing telephone, automated banking system, etc. This paper presents speaker independent speech recognition system for Malayalam digits. The system employs Mel frequency cepstrum coefficient (MFCC) as feature for signal processing and Hidden Markov model (HMM) for recognition. The system is trained with 21 male and female voices in the age group of 20 to 40 years and there was 98.5% word recognition accuracy (94.8% sentence recognition accuracy) on a test set of continuous digit recognition task.
Resumo:
Malayalam is one of the 22 scheduled languages in India with more than 130 million speakers. This paper presents a report on the development of a speaker independent, continuous transcription system for Malayalam. The system employs Hidden Markov Model (HMM) for acoustic modeling and Mel Frequency Cepstral Coefficient (MFCC) for feature extraction. It is trained with 21 male and female speakers in the age group ranging from 20 to 40 years. The system obtained a word recognition accuracy of 87.4% and a sentence recognition accuracy of 84%, when tested with a set of continuous speech data.
Resumo:
Development of Malayalam speech recognition system is in its infancy stage; although many works have been done in other Indian languages. In this paper we present the first work on speaker independent Malayalam isolated speech recognizer based on PLP (Perceptual Linear Predictive) Cepstral Coefficient and Hidden Markov Model (HMM). The performance of the developed system has been evaluated with different number of states of HMM (Hidden Markov Model). The system is trained with 21 male and female speakers in the age group ranging from 19 to 41 years. The system obtained an accuracy of 99.5% with the unseen data
Resumo:
A connected digit speech recognition is important in many applications such as automated banking system, catalogue-dialing, automatic data entry, automated banking system, etc. This paper presents an optimum speaker-independent connected digit recognizer forMalayalam language. The system employs Perceptual Linear Predictive (PLP) cepstral coefficient for speech parameterization and continuous density Hidden Markov Model (HMM) in the recognition process. Viterbi algorithm is used for decoding. The training data base has the utterance of 21 speakers from the age group of 20 to 40 years and the sound is recorded in the normal office environment where each speaker is asked to read 20 set of continuous digits. The system obtained an accuracy of 99.5 % with the unseen data.