998 resultados para speech segmentation


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Les néphropaties (maladie des tissus rénaux) postradiques constituent l'un des facteurs limitants pour l'élaboration des plans de traitement lors des radiothérapies abdominales. Le processus actuel, qui consiste à évaluer la fonctionnalité relative des reins grâce à une scintigraphie gamma deux dimensions, ne permet pas d'identifier les portions fonctionnelles qui pourraient être évitées lors de l' élaboration des plans de traitement. Une méthode permettant de cartographier la fonctionnalité rénale en trois dimensions et d'extraire un contour fonctionnel utilisable lors de la planification a été développée à partir de CT double énergie injectés à l'iode. La concentration en produit de contraste est considérée reliée à la fonctionnalité rénale. La technique utilisée repose sur la décomposition à trois matériaux permettant de reconstruire des images en concentration d'iode. Un algorithme de segmentation semi-automatisé basé sur la déformation hiérarchique et anamorphique de surfaces permet ensuite d'extraire le contour fonctionnel des reins. Les premiers résultats obtenus avec des images patient démontrent qu'une utilisation en clinique est envisageable et pourra être bénéfique.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Le foie est un organe vital ayant une capacité de régénération exceptionnelle et un rôle crucial dans le fonctionnement de l’organisme. L’évaluation du volume du foie est un outil important pouvant être utilisé comme marqueur biologique de sévérité de maladies hépatiques. La volumétrie du foie est indiquée avant les hépatectomies majeures, l’embolisation de la veine porte et la transplantation. La méthode la plus répandue sur la base d'examens de tomodensitométrie (TDM) et d'imagerie par résonance magnétique (IRM) consiste à délimiter le contour du foie sur plusieurs coupes consécutives, un processus appelé la «segmentation». Nous présentons la conception et la stratégie de validation pour une méthode de segmentation semi-automatisée développée à notre institution. Notre méthode représente une approche basée sur un modèle utilisant l’interpolation variationnelle de forme ainsi que l’optimisation de maillages de Laplace. La méthode a été conçue afin d’être compatible avec la TDM ainsi que l' IRM. Nous avons évalué la répétabilité, la fiabilité ainsi que l’efficacité de notre méthode semi-automatisée de segmentation avec deux études transversales conçues rétrospectivement. Les résultats de nos études de validation suggèrent que la méthode de segmentation confère une fiabilité et répétabilité comparables à la segmentation manuelle. De plus, cette méthode diminue de façon significative le temps d’interaction, la rendant ainsi adaptée à la pratique clinique courante. D’autres études pourraient incorporer la volumétrie afin de déterminer des marqueurs biologiques de maladie hépatique basés sur le volume tels que la présence de stéatose, de fer, ou encore la mesure de fibrose par unité de volume.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Medical fields requires fast, simple and noninvasive methods of diagnostic techniques. Several methods are available and possible because of the growth of technology that provides the necessary means of collecting and processing signals. The present thesis details the work done in the field of voice signals. New methods of analysis have been developed to understand the complexity of voice signals, such as nonlinear dynamics aiming at the exploration of voice signals dynamic nature. The purpose of this thesis is to characterize complexities of pathological voice from healthy signals and to differentiate stuttering signals from healthy signals. Efficiency of various acoustic as well as non linear time series methods are analysed. Three groups of samples are used, one from healthy individuals, subjects with vocal pathologies and stuttering subjects. Individual vowels/ and a continuous speech data for the utterance of the sentence "iruvarum changatimaranu" the meaning in English is "Both are good friends" from Malayalam language are recorded using a microphone . The recorded audio are converted to digital signals and are subjected to analysis.Acoustic perturbation methods like fundamental frequency (FO), jitter, shimmer, Zero Crossing Rate(ZCR) were carried out and non linear measures like maximum lyapunov exponent(Lamda max), correlation dimension (D2), Kolmogorov exponent(K2), and a new measure of entropy viz., Permutation entropy (PE) are evaluated for all three groups of the subjects. Permutation Entropy is a nonlinear complexity measure which can efficiently distinguish regular and complex nature of any signal and extract information about the change in dynamics of the process by indicating sudden change in its value. The results shows that nonlinear dynamical methods seem to be a suitable technique for voice signal analysis, due to the chaotic component of the human voice. Permutation entropy is well suited due to its sensitivity to uncertainties, since the pathologies are characterized by an increase in the signal complexity and unpredictability. Pathological groups have higher entropy values compared to the normal group. The stuttering signals have lower entropy values compared to the normal signals.PE is effective in charaterising the level of improvement after two weeks of speech therapy in the case of stuttering subjects. PE is also effective in characterizing the dynamical difference between healthy and pathological subjects. This suggests that PE can improve and complement the recent voice analysis methods available for clinicians. The work establishes the application of the simple, inexpensive and fast algorithm of PE for diagnosis in vocal disorders and stuttering subjects.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis investigates the potential use of zerocrossing information for speech sample estimation. It provides 21 new method tn) estimate speech samples using composite zerocrossings. A simple linear interpolation technique is developed for this purpose. By using this method the A/D converter can be avoided in a speech coder. The newly proposed zerocrossing sampling theory is supported with results of computer simulations using real speech data. The thesis also presents two methods for voiced/ unvoiced classification. One of these methods is based on a distance measure which is a function of short time zerocrossing rate and short time energy of the signal. The other one is based on the attractor dimension and entropy of the signal. Among these two methods the first one is simple and reguires only very few computations compared to the other. This method is used imtea later chapter to design an enhanced Adaptive Transform Coder. The later part of the thesis addresses a few problems in Adaptive Transform Coding and presents an improved ATC. Transform coefficient with maximum amplitude is considered as ‘side information’. This. enables more accurate tfiiz assignment enui step—size computation. A new bit reassignment scheme is also introduced in this work. Finally, sum ATC which applies switching between luiscrete Cosine Transform and Discrete Walsh-Hadamard Transform for voiced and unvoiced speech segments respectively is presented. Simulation results are provided to show the improved performance of the coder

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The research problem selected for this study is one of the important issues in the field of financial market and its marketing dimensions on which researchers and academicians encourage more research studies. This research study may be relevant considering its significance in terms of some possible findings which may be useful to Fls in framing successful market segmentation approach to turn their dissatisfied and ‘merely' satisfied customers into ‘delighted’ customers, which in turn can result in better savings mobilisation. The household segments may also be benefited from the research findings if they bring about an attitudinal change in their savings behaviour. The importance of the study may be briefly highlighted in the following points. The research study examines existing theories on market segmentation by Fls and the findings might supplement the existing theories on this topic. The study brings to light certain clues to strengthen market segmentation approach of Fls.The study throws light on the existing beliefs and perceptions on customer behaviour which may be useful in effecting some positive changes in market segmentation approach by Fls. The study suggests certain relationship between market segmentation variables and customer behaviour in the context of marketing of financial products by Fls. The study supplements the existing knowledge on different dimension of market segmentation in the financial market which might encourage future research in the field.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Biometrics deals with the physiological and behavioral characteristics of an individual to establish identity. Fingerprint based authentication is the most advanced biometric authentication technology. The minutiae based fingerprint identification method offer reasonable identification rate. The feature minutiae map consists of about 70-100 minutia points and matching accuracy is dropping down while the size of database is growing up. Hence it is inevitable to make the size of the fingerprint feature code to be as smaller as possible so that identification may be much easier. In this research, a novel global singularity based fingerprint representation is proposed. Fingerprint baseline, which is the line between distal and intermediate phalangeal joint line in the fingerprint, is taken as the reference line. A polygon is formed with the singularities and the fingerprint baseline. The feature vectors are the polygonal angle, sides, area, type and the ridge counts in between the singularities. 100% recognition rate is achieved in this method. The method is compared with the conventional minutiae based recognition method in terms of computation time, receiver operator characteristics (ROC) and the feature vector length. Speech is a behavioural biometric modality and can be used for identification of a speaker. In this work, MFCC of text dependant speeches are computed and clustered using k-means algorithm. A backpropagation based Artificial Neural Network is trained to identify the clustered speech code. The performance of the neural network classifier is compared with the VQ based Euclidean minimum classifier. Biometric systems that use a single modality are usually affected by problems like noisy sensor data, non-universality and/or lack of distinctiveness of the biometric trait, unacceptable error rates, and spoof attacks. Multifinger feature level fusion based fingerprint recognition is developed and the performances are measured in terms of the ROC curve. Score level fusion of fingerprint and speech based recognition system is done and 100% accuracy is achieved for a considerable range of matching threshold

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The work is intended to study the following important aspects of document image processing and develop new methods. (1) Segmentation ofdocument images using adaptive interval valued neuro-fuzzy method. (2) Improving the segmentation procedure using Simulated Annealing technique. (3) Development of optimized compression algorithms using Genetic Algorithm and parallel Genetic Algorithm (4) Feature extraction of document images (5) Development of IV fuzzy rules. This work also helps for feature extraction and foreground and background identification. The proposed work incorporates Evolutionary and hybrid methods for segmentation and compression of document images. A study of different neural networks used in image processing, the study of developments in the area of fuzzy logic etc is carried out in this work

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis investigated the potential use of Linear Predictive Coding in speech communication applications. A Modified Block Adaptive Predictive Coder is developed, which reduces the computational burden and complexity without sacrificing the speech quality, as compared to the conventional adaptive predictive coding (APC) system. For this, changes in the evaluation methods have been evolved. This method is as different from the usual APC system in that the difference between the true and the predicted value is not transmitted. This allows the replacement of the high order predictor in the transmitter section of a predictive coding system, by a simple delay unit, which makes the transmitter quite simple. Also, the block length used in the processing of the speech signal is adjusted relative to the pitch period of the signal being processed rather than choosing a constant length as hitherto done by other researchers. The efficiency of the newly proposed coder has been supported with results of computer simulation using real speech data. Three methods for voiced/unvoiced/silent/transition classification have been presented. The first one is based on energy, zerocrossing rate and the periodicity of the waveform. The second method uses normalised correlation coefficient as the main parameter, while the third method utilizes a pitch-dependent correlation factor. The third algorithm which gives the minimum error probability has been chosen in a later chapter to design the modified coder The thesis also presents a comparazive study beh-cm the autocorrelation and the covariance methods used in the evaluaiicn of the predictor parameters. It has been proved that the azztocorrelation method is superior to the covariance method with respect to the filter stabf-it)‘ and also in an SNR sense, though the increase in gain is only small. The Modified Block Adaptive Coder applies a switching from pitch precitzion to spectrum prediction when the speech segment changes from a voiced or transition region to an unvoiced region. The experiments cont;-:ted in coding, transmission and simulation, used speech samples from .\£=_‘ajr2_1a:r1 and English phrases. Proposal for a speaker reecgnifion syste: and a phoneme identification system has also been outlized towards the end of the thesis.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Speech processing and consequent recognition are important areas of Digital Signal Processing since speech allows people to communicate more natu-rally and efficiently. In this work, a speech recognition system is developed for re-cognizing digits in Malayalam. For recognizing speech, features are to be ex-tracted from speech and hence feature extraction method plays an important role in speech recognition. Here, front end processing for extracting the features is per-formed using two wavelet based methods namely Discrete Wavelet Transforms (DWT) and Wavelet Packet Decomposition (WPD). Naive Bayes classifier is used for classification purpose. After classification using Naive Bayes classifier, DWT produced a recognition accuracy of 83.5% and WPD produced an accuracy of 80.7%. This paper is intended to devise a new feature extraction method which produces improvements in the recognition accuracy. So, a new method called Dis-crete Wavelet Packet Decomposition (DWPD) is introduced which utilizes the hy-brid features of both DWT and WPD. The performance of this new approach is evaluated and it produced an improved recognition accuracy of 86.2% along with Naive Bayes classifier.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Speech is the most natural means of communication among human beings and speech processing and recognition are intensive areas of research for the last five decades. Since speech recognition is a pattern recognition problem, classification is an important part of any speech recognition system. In this work, a speech recognition system is developed for recognizing speaker independent spoken digits in Malayalam. Voice signals are sampled directly from the microphone. The proposed method is implemented for 1000 speakers uttering 10 digits each. Since the speech signals are affected by background noise, the signals are tuned by removing the noise from it using wavelet denoising method based on Soft Thresholding. Here, the features from the signals are extracted using Discrete Wavelet Transforms (DWT) because they are well suitable for processing non-stationary signals like speech. This is due to their multi- resolutional, multi-scale analysis characteristics. Speech recognition is a multiclass classification problem. So, the feature vector set obtained are classified using three classifiers namely, Artificial Neural Networks (ANN), Support Vector Machines (SVM) and Naive Bayes classifiers which are capable of handling multiclasses. During classification stage, the input feature vector data is trained using information relating to known patterns and then they are tested using the test data set. The performances of all these classifiers are evaluated based on recognition accuracy. All the three methods produced good recognition accuracy. DWT and ANN produced a recognition accuracy of 89%, SVM and DWT combination produced an accuracy of 86.6% and Naive Bayes and DWT combination produced an accuracy of 83.5%. ANN is found to be better among the three methods.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents methods for moving object detection in airborne video surveillance. The motion segmentation in the above scenario is usually difficult because of small size of the object, motion of camera, and inconsistency in detected object shape etc. Here we present a motion segmentation system for moving camera video, based on background subtraction. An adaptive background building is used to take advantage of creation of background based on most recent frame. Our proposed system suggests CPU efficient alternative for conventional batch processing based background subtraction systems. We further refine the segmented motion by meanshift based mode association.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Digit speech recognition is important in many applications such as automatic data entry, PIN entry, voice dialing telephone, automated banking system, etc. This paper presents speaker independent speech recognition system for Malayalam digits. The system employs Mel frequency cepstrum coefficient (MFCC) as feature for signal processing and Hidden Markov model (HMM) for recognition. The system is trained with 21 male and female voices in the age group of 20 to 40 years and there was 98.5% word recognition accuracy (94.8% sentence recognition accuracy) on a test set of continuous digit recognition task.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Performance of any continuous speech recognition system is dependent on the accuracy of its acoustic model. Hence, preparation of a robust and accurate acoustic model lead to satisfactory recognition performance for a speech recognizer. In acoustic modeling of phonetic unit, context information is of prime importance as the phonemes are found to vary according to the place of occurrence in a word. In this paper we compare and evaluate the effect of context dependent tied (CD tied) models, context dependent (CD) and context independent (CI) models in the perspective of continuous speech recognition of Malayalam language. The database for the speech recognition system has utterance from 21 speakers including 11 female and 10 males. Our evaluation results show that CD tied models outperforms CI models over 21%.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A connected digit speech recognition is important in many applications such as automated banking system, catalogue-dialing, automatic data entry, automated banking system, etc. This paper presents an optimum speaker-independent connected digit recognizer forMalayalam language. The system employs Perceptual Linear Predictive (PLP) cepstral coefficient for speech parameterization and continuous density Hidden Markov Model (HMM) in the recognition process. Viterbi algorithm is used for decoding. The training data base has the utterance of 21 speakers from the age group of 20 to 40 years and the sound is recorded in the normal office environment where each speaker is asked to read 20 set of continuous digits. The system obtained an accuracy of 99.5 % with the unseen data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Modeling nonlinear systems using Volterra series is a century old method but practical realizations were hampered by inadequate hardware to handle the increased computational complexity stemming from its use. But interest is renewed recently, in designing and implementing filters which can model much of the polynomial nonlinearities inherent in practical systems. The key advantage in resorting to Volterra power series for this purpose is that nonlinear filters so designed can be made to work in parallel with the existing LTI systems, yielding improved performance. This paper describes the inclusion of a quadratic predictor (with nonlinearity order 2) with a linear predictor in an analog source coding system. Analog coding schemes generally ignore the source generation mechanisms but focuses on high fidelity reconstruction at the receiver. The widely used method of differential pnlse code modulation (DPCM) for speech transmission uses a linear predictor to estimate the next possible value of the input speech signal. But this linear system do not account for the inherent nonlinearities in speech signals arising out of multiple reflections in the vocal tract. So a quadratic predictor is designed and implemented in parallel with the linear predictor to yield improved mean square error performance. The augmented speech coder is tested on speech signals transmitted over an additive white gaussian noise (AWGN) channel.