34 results for MFCC


Relevance: 10.00%

Abstract:

Speech is a primary medium through which human beings communicate using language, and automatic speech recognition is widespread today. Recognizing single digits is vital to a number of applications such as voice dialling of telephone numbers, automatic data entry, credit card entry, PIN (personal identification number) entry and entry of access codes for transactions. In this paper we present a comparative study of SVM (Support Vector Machine) and HMM (Hidden Markov Model) classifiers for recognizing digits spoken in Malayalam.
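The abstract does not say how the SVM was configured. As a minimal sketch, assuming a linear kernel and a two-class problem (a digit recognizer would combine several such classifiers, for example one-vs-rest), the following trains a linear SVM with a Pegasos-style stochastic subgradient method on fixed-length feature vectors. The function names and hyperparameters (`lam`, `epochs`) are illustrative, not the authors'.

```python
import random


def train_linear_svm(samples, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient training of a linear SVM.

    samples: list of feature vectors (lists of floats)
    labels:  list of +1 / -1 class labels
    Returns (weights, bias). The bias is handled by appending a constant
    feature, so it is also (mildly) regularized: a common simplification.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(samples, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink the weights (gradient of the L2 regularizer).
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss subgradient: only points inside the margin contribute.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w[:-1], w[-1]


def predict(w, b, x):
    """Sign of the decision function w.x + b."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

Multi-class digit recognition could train one such classifier per digit and pick the highest-scoring one; kernel SVMs would need a kernelized variant of the solver.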

Relevance: 10.00%

Abstract:

Speech is the primary, most prominent and most convenient means of communication in audible language. Through speech, people express their thoughts, feelings or perceptions by articulating words. Human speech is a complex, non-stationary signal. It carries immensely rich information about the words spoken, the accent, the attitude of the speaker, expression, intention, sex, emotion and style. The main objective of Automatic Speech Recognition (ASR) is to identify what people say by means of computer algorithms, which enables people to communicate with a computer in a natural spoken language. Automatic speech recognition by machines has been one of the most exciting, significant and challenging research areas in signal processing over the past five to six decades. Despite the developments and intensive research in this area, ASR performance is still lower than human speech recognition and has yet to reach a completely reliable level. The main objective of this thesis is to develop an efficient speech recognition system for speaker-independent isolated words in Malayalam.

Relevance: 10.00%

Abstract:

Several audio watermarking methods are currently available, most of them oriented towards copyright protection and copy protection. This motivated the development of a speaker verification scheme that guarantees non-repudiation services, and this thesis is its outcome. The research presented in this thesis examines the field of audio watermarking, and its outcome is a speaker verification scheme that addresses issues related to non-repudiation to a great extent. The work aimed at developing novel audio watermarking schemes based on the Fast Fourier Transform (FFT) or the Fast Walsh-Hadamard Transform (FWHT). The Mel-Frequency Cepstral Coefficients (MFCC), considered the best parametric representation of acoustic signals, are used together with a few other key acoustic characteristics to construct the new schemes. The audio watermark created depends entirely on the acoustic features, hence the name FeatureMark, and it is central to this work. In any watermarking scheme, the quality of the extracted watermark depends exclusively on the pre-processing step; in this work, framing and windowing techniques are used. Non-repudiation is central to the audio watermarking schemes proposed in this work. The signal spectrum is modified in a variety of ways by selecting appropriate FFT/FWHT coefficients, and the watermarking schemes were evaluated for imperceptibility, robustness and capacity. The proposed schemes are unequivocally effective at maintaining sound quality, at retrieving the embedded FeatureMark and in terms of the capacity to hold the mark bits. Robustness is achieved with the help of synchronization codes: a Barker code in the FFT-based FeatureMarking scheme and a Walsh code in the FWHT-based FeatureMarking scheme.
Another important feature of this scheme is the use of an encryption scheme in preparing the FeatureMark, which scrambles the signal features and keeps them hidden. A comparative study with existing watermarking schemes, together with experiments evaluating imperceptibility, robustness and capacity, confirms that the proposed schemes can be regarded as efficient audio watermarking schemes. The performance of the four new digital audio watermarking algorithms is remarkable, opening further opportunities for research.
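The abstract names the FWHT and Barker-code synchronization but not how coefficients are selected or modified. The sketch below shows only those two building blocks, assuming a power-of-two frame length: an unnormalized FWHT in Hadamard (natural) order, and a search for the offset at which the 13-chip Barker code correlates best with a signal. It does not reproduce the FeatureMark embedding; `find_sync` and the signal layout are hypothetical.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (Hadamard/natural order).

    len(values) must be a power of two. Applying it twice returns
    len(values) times the input, so the inverse is fwht(...) / n.
    """
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly over pairs (i, i + h) inside each block of size 2h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a


# 13-chip Barker code (+1/-1 chips).
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def find_sync(signal, code=BARKER_13):
    """Return the offset where the code correlates most strongly with signal."""
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(signal) - len(code) + 1):
        score = sum(c * s for c, s in zip(code, signal[offset:]))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset
```

Barker codes suit synchronization because their aperiodic autocorrelation sidelobes have magnitude at most 1, so the correlation peak at the true offset stands out clearly.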

Relevance: 10.00%

Abstract:

In neuroscience, the extracellular action potentials of neurons, called spikes, are the most important signals. However, a single extracellular electrode can capture spikes from more than one neuron. Spike sorting is therefore an important task for diagnosing various neural activities: the better we understand neurons, the more neural diseases we can hope to treat. Spike sorting is typically performed in three steps: detection, feature extraction and clustering. In this paper we propose using Mel-frequency cepstral coefficients (MFCC) to extract spike features, combined with a Hidden Markov Model (HMM) in the clustering step. Our results show that MFCC features differentiate between spikes more clearly than other feature extraction methods, and that using an HMM as the clustering algorithm yields better sorting accuracy.

Relevance: 10.00%

Abstract:

Brain-computer interfaces (BCIs) play a very important role in human-machine communication. Recent communication systems rely on brain signals: users deliberately modulate their brain activity, instead of using motor movements, to generate signals that can be used to issue commands to and control communication devices, robots or computers. In this paper, the aim was to estimate the performance of a BCI system that detects prosthetic motor imagery tasks using only a single channel of electroencephalography (EEG). The participant is asked to imagine moving his arm up or down, and the system detects the imagined movement from the participant's brain signal. Features are extracted from the brain signal using Mel-Frequency Cepstral Coefficients (MFCC), and based on these features a Hidden Markov Model (HMM) decides whether the participant imagined moving up or down. The major advantage of the method is that only one channel is needed to make the decision. Moreover, the method works online, meaning that it can give a decision as soon as the signal is presented to the system. One hundred signals were used for testing; on average, 89% of the up/down prosthetic motor imagery tasks were detected correctly. Because of its high speed and accuracy, the method can be used in many applications, such as moving artificial prosthetic limbs and wheelchairs.
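The abstract does not give the HMM topology or how MFCC vectors are turned into observations. As a minimal sketch, assuming the features have been vector-quantized into discrete symbols and that one HMM has been trained per class ("up" and "down"), the decision reduces to scoring the observation sequence with the scaled forward algorithm and picking the model with the higher log-likelihood. All model parameters and function names here are hypothetical.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-output HMM.

    start[i]    : initial probability of state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability of emitting symbol k in state i
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        # Normalizing at each step avoids numerical underflow on long inputs.
        alpha = [a / scale for a in alpha]
        log_p += math.log(scale)
    return log_p


def classify(obs, models):
    """Pick the label whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))
```

For the online use the abstract describes, the same recursion can be advanced one frame at a time, since each step only needs the previous normalized alpha vector.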

Relevance: 10.00%

Abstract:

In this dissertation, the theoretical principles governing molecular modeling were applied to the electronic characterization of the oligopeptide α3 and its variants (5Q, 7Q)-α3, as well as to the quantum description of the interaction between the aminoglycoside hygromycin B and the 30S subunit of the bacterial ribosome. In the first study, the linear, neutral dipeptides that make up these oligopeptides were modeled and then optimized to obtain structures of lower potential energy with appropriate dihedral angles. To this end, three successive geometry optimization steps, based on classical Newtonian mechanics, semi-empirical methods and density functional theory (DFT), explored the energy landscape of each dipeptide in search of ideal minimum-energy structures. Finally, the best conformers were characterized in terms of their electrostatic potential, ionization energy (amino acids), frontier molecular orbitals and hopping terms. The hopping terms obtained in this study made it possible, in subsequent studies, to characterize the charge transport properties of these peptide models. The envisioned application is a new biosensor technology capable of diagnosing amyloid diseases, which are related to the accumulation of misshapen proteins, based on the conductivity displayed by the patient's proteins. In the second part of this dissertation, quantum molecular modeling was used to study the interaction energy of an aminoglycoside antibiotic with its ribosomal receptor. Hygromycin B (hygB) is an aminoglycoside antibiotic known to affect ribosomal translocation by interacting directly with the small subunit of the bacterial ribosome (30S), specifically with nucleotides in helix 44 of the 16S ribosomal RNA (16S rRNA).
Because of the strongly electrostatic character of this binding, the binding mechanism of the complex was investigated energetically using different values of the dielectric constant (ε = 0, 4, 10, 20 and 40), values widely used to study the electrostatic properties of biomolecules. For this, spheres of increasing radius centered on the hygB centroid were defined from the 30S-hygB crystal structure (1HNZ.pdb), and the individual interaction energy of each enclosed nucleotide was computed with quantum calculations using the molecular fractionation with conjugate caps (MFCC) strategy. The dielectric constants were found to underestimate the individual interaction energies, which allowed convergence to be reached quickly. However, only for ε = 40 did the total binding energy of the drug-receptor interaction stabilize, at r = 18 Å, which provided an appropriate binding pocket because it encompassed the main residues that interact most strongly with hygB: C1403, C1404, G1405, A1493, G1494, U1495, U1498 and C1496. Thus, a dielectric constant of about 40 is ideal for treating systems with many electrical charges. Comparing the individual binding energies of the 16S rRNA nucleotides with experimental tests that determine the minimum inhibitory concentration (MIC) of hygB suggests that mutations in residues with high binding energies give rise to bacterial resistance to the drug. By the same reasoning, residues with low interaction energy do not significantly affect the affinity of hygB for its binding site, so replacing them should cause no loss of effectiveness.
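The abstract names the molecular fractionation with conjugate caps (MFCC) strategy without giving its formula. As it is commonly written in the MFCC literature, the interaction energy between a ligand $L$ (here hygB) and the $i$-th fragment $R_i$ (here a nucleotide), capped by $C_{i-1}$ and $C_{i+1}^{*}$, is the four-term combination below, and the total binding energy within radius $r$ is the sum over the enclosed fragments. The cap notation and the distance criterion $d_i \le r$ (distance of fragment $i$ from the hygB centroid) are assumptions about how this dissertation applies the method.

```latex
E(L\text{--}R_i) = E\left(L + C_{i-1} R_i C_{i+1}^{*}\right)
                 - E\left(C_{i-1} R_i C_{i+1}^{*}\right)
                 - E\left(L + C_{i-1} C_{i+1}^{*}\right)
                 + E\left(C_{i-1} C_{i+1}^{*}\right)

E_{\mathrm{bind}}(r) = \sum_{i \,:\, d_i \le r} E(L\text{--}R_i)
```

Subtracting the capped-fragment term and the ligand-plus-caps term cancels the ligand's own energy and its interaction with the caps, leaving, to a good approximation, the ligand-fragment interaction alone.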

Relevance: 10.00%

Abstract:

Computational methods are increasingly used to aid in the characterization of molecular biological systems, especially those relevant to human health. Ibuprofen is a nonsteroidal anti-inflammatory drug widely used in clinical practice. Once in the bloodstream, most ibuprofen is bound to human serum albumin, the major protein of blood plasma, which decreases its bioavailability and requires larger doses to produce its anti-inflammatory action. This study aims to characterize, through the interaction energy, how ibuprofen binds to albumin and to establish the main amino acids and molecular interactions involved in the process. For this purpose, an in silico study was conducted using quantum mechanical calculations based on Density Functional Theory (DFT), with the Generalized Gradient Approximation (GGA) to describe exchange and correlation effects. The interaction energy between the ligand and each amino acid of the binding site was calculated using the molecular fragmentation with conjugate caps (MFCC) method. Besides the energy, the distances, types of molecular interactions and atomic groups involved were determined. The theoretical models used were satisfactory and gave a more accurate description when the dielectric constant ε = 40 was used. The findings corroborate the literature in identifying Sudlow site I (I-FA3) as the primary binding site and site I-FA6 as a secondary site. However, they differ in identifying the most important amino acids, which, in order of decreasing interaction energy, are Arg410, Lys414, Ser489, Leu453 and Tyr411 for site I-FA3, and Leu481, Ser480, Lys351, Val482 and Arg209 for site I-FA6. Quantifying the interaction energy and describing the most important amino acids opens new avenues for studies aimed at modifying the structure of ibuprofen to decrease its interaction with albumin and consequently increase its distribution.
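Once the four fragment energies for each residue are available, combining them and ranking the residues is simple arithmetic. The sketch assumes the usual conjugate-caps combination E(L + C R C*) - E(C R C*) - E(L + C C*) + E(C C*) and orders residues from most to least attractive (most negative first). The residue labels and energies in the usage check are made-up placeholders, not results from this study.

```python
def mfcc_interaction(e_lig_capped, e_capped, e_lig_caps, e_caps):
    """Four-term conjugate-caps combination for one residue.

    e_lig_capped : E(ligand + C_{i-1} R_i C_{i+1}*)
    e_capped     : E(C_{i-1} R_i C_{i+1}*)
    e_lig_caps   : E(ligand + C_{i-1} C_{i+1}*)
    e_caps       : E(C_{i-1} C_{i+1}*)
    """
    return e_lig_capped - e_capped - e_lig_caps + e_caps


def rank_residues(fragment_energies):
    """Return (residue, energy) pairs, most attractive (most negative) first.

    fragment_energies maps a residue label to its four fragment energies,
    in the argument order of mfcc_interaction.
    """
    energies = {
        res: mfcc_interaction(*terms) for res, terms in fragment_energies.items()
    }
    return sorted(energies.items(), key=lambda item: item[1])
```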

Relevance: 10.00%

Abstract:

Automatic speech recognition by machines has been a target of researchers over the past five decades. Numerous advances have been made in this period, for example in the recognition of isolated words (commands), which currently achieves very high recognition rates. However, we are still far from a system whose performance approaches that of human beings (automatic continuous speech recognition). One of the great challenges in continuous speech recognition research is the large number of patterns: modern languages such as English, French, Spanish and Portuguese have approximately 500,000 words, or patterns, to be identified. The purpose of this study is to use units smaller than the word, such as phonemes, syllables and diphones, as the basis for speech recognition, aiming to recognize any word without necessarily having trained on it. The main goal is to reduce the restriction imposed by the excessive number of patterns. To validate this proposal, the system was tested on speaker-dependent isolated word recognition. The phonemic characteristics of Brazilian Portuguese were used to develop a hierarchical decision system, in which the decisions are made by Support Vector Machines (SVMs). The main speech features were obtained from the Wavelet Packet Transform, and MFCC (Mel-Frequency Cepstral Coefficient) descriptors were also used. The proposed method showed good results in the recognition of vowels, consonants (syllables) and words when compared with other methods in the literature.
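The abstract does not specify the mother wavelet, the depth or how WPT coefficients become features. As a minimal sketch, assuming the orthonormal Haar wavelet and node energies as features, the following builds a full wavelet packet tree (both the approximation and the detail branches are split at every level, which is what distinguishes a packet transform from the ordinary discrete wavelet transform) and returns one energy per terminal node. The input length is assumed to be divisible by 2**levels, and the function names are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar filter bank: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wavelet_packet_energies(signal, levels):
    """Full wavelet packet tree: split BOTH branches at every level.

    Returns the energy of each of the 2**levels terminal nodes.
    """
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return [sum(c * c for c in node) for node in nodes]
```

Because the Haar filters are orthonormal, the node energies sum to the energy of the input. Nodes come out in natural (Paley) order rather than in order of increasing frequency, so a reordering step would be needed for a frequency-sorted feature vector.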

Relevance: 10.00%

Abstract:

The human voice is an important communication tool, and any voice disorder can have profound implications for an individual's social and professional life. Digital signal processing techniques have been used for the acoustic analysis of vocal disorders caused by laryngeal pathologies because of their simplicity and noninvasive nature. This work deals with the acoustic analysis of voice signals affected by laryngeal pathologies, specifically edema and nodules on the vocal folds. Its purpose is to develop a voice classification system to help in the pre-diagnosis of laryngeal pathologies, as well as in monitoring pharmacological treatment and post-surgical recovery. Linear Prediction Coefficients (LPC), Mel-Frequency Cepstral Coefficients (MFCC) and coefficients obtained through the Wavelet Packet Transform (WPT) are used to extract relevant characteristics of the voice signal. For classification, a Support Vector Machine (SVM) is used, which builds optimal hyperplanes that maximize the margin of separation between the classes involved. The resulting hyperplane is determined by the support vectors, which are subsets of the points in these classes. On the database used in this work, the results showed good performance, with a hit rate of 98.46% for classifying normal versus pathological voices in general, and 98.75% for the classification of the pathologies, edema and nodules.
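For the LPC features, the abstract gives no order or windowing. A minimal sketch, assuming the autocorrelation method, computes the predictor coefficients with the Levinson-Durbin recursion. `autocorrelation` and `lpc` are illustrative names, and the sign convention a[0] = 1 (prediction-error filter A(z)) is one of two common choices.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a single frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def lpc(x, order):
    """LPC coefficients by the autocorrelation method and Levinson-Durbin.

    Returns (a, err) with a[0] == 1 so that the prediction-error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, and err the final residual energy.
    """
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this order.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update a[1..i-1] from the previous order's coefficients.
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

A decaying exponential 0.9**n behaves like a first-order autoregressive process, so on such a frame the recursion recovers a[1] close to -0.9 and a second-order coefficient close to zero.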

Relevance: 10.00%

Abstract:

We present a novel approach for detecting severe obstructive sleep apnea (OSA) by introducing nonlinear analysis into the characterization of sustained speech. The proposed scheme was designed to provide additional information to our baseline system, which is built on state-of-the-art cepstral-domain modeling techniques, with the aim of improving accuracy. This new information is only weakly correlated with our previous MFCC modeling of sustained speech and uncorrelated with the information in our continuous speech modeling scheme. Tests were performed to evaluate the improvement in our detection task, both for sustained speech alone and in combination with a continuous speech classifier, resulting in a 10% relative reduction in classification error for the former and a 33% relative reduction for the fused scheme. The results encourage us to consider the existence of nonlinear effects in OSA patients' voices and to think about tools that could improve short-time analysis.

Relevance: 10.00%

Abstract:

We present a novel approach for detecting severe obstructive sleep apnea (OSA) from patients' voices, introducing nonlinear measures to describe the dynamics of sustained speech. Nonlinear features were combined with state-of-the-art speech recognition systems that use statistical modeling techniques (Gaussian mixture models, GMMs) over a cepstral parameterization (MFCC) for both continuous and sustained speech. Tests were performed on a database of speech recordings from both severe OSA and control speakers. A 10% relative reduction in classification error was obtained for sustained speech when combining MFCC-GMM and nonlinear features, and a 33% reduction when fusing nonlinear features with both sustained and continuous MFCC-GMM. Accuracy reached 88.5%, which would allow the system to be used for early detection of OSA. Tests showed that nonlinear features and MFCCs are weakly correlated on sustained speech but uncorrelated on continuous speech. The results also suggest the existence of nonlinear effects in OSA patients' voices, which should also be found in continuous speech.
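The abstract does not give the GMM sizes or the fusion rules. A minimal sketch, assuming diagonal-covariance GMMs already trained on MFCC frames for each class, scores an utterance by the average per-frame log-likelihood under each model and takes their difference as a log-likelihood ratio (the same form used in GMM-UBM verification). All model parameters and function names are hypothetical; log-sum-exp keeps the mixture sum numerically stable.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of feature frames under a diagonal GMM."""
    total = 0.0
    for x in frames:
        comps = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        top = max(comps)
        # log-sum-exp over mixture components.
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total / len(frames)


def osa_score(frames, osa_gmm, control_gmm):
    """Log-likelihood ratio: positive values favour the OSA model."""
    return gmm_log_likelihood(frames, *osa_gmm) - gmm_log_likelihood(frames, *control_gmm)
```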

Relevance: 10.00%

Abstract:

Voice quality assessment procedures based on subjective evaluation, through acoustic perception by an expert, are quite widespread. Among them, the GRBAS protocol is the most commonly used in clinical routine. However, this type of estimation has several problems, the first being that it requires properly trained professionals. Another drawback is that, because the assessment is subjective, many significant circumstances influence the evaluator's final decision, and in many cases there is inter-rater and intra-rater variability in the judgments. For these reasons, objective parameters are needed to assess voice quality and detect various pathologies. This work aims to compare the effectiveness of various techniques for computing representative voice parameters for use in the automatic classification of perceptual scales. Some of the parameters analyzed are the Mel-Frequency Cepstral Coefficients (MFCC), complexity measures and noise measures. In addition, a new set of features extracted from the Modulation Spectrum (MS), called Modulation Spectrum Centroids (MSC), is introduced. Specifically, the automatic detection of two of the five traits that make up the GRBAS scale, G and R, is analyzed. This document shows that the MSC features give results similar to those of previously used techniques and, in some cases, increase classification effectiveness when combined with other parameters.
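The abstract does not define how the Modulation Spectrum Centroids are computed. Assuming they are magnitude-weighted mean frequencies along the modulation-frequency axis, computed separately for each acoustic band of the modulation spectrum, the computation reduces to the generic centroid below. The array layout `mod_spectrum[b][k]` and the function names are hypothetical.

```python
def spectral_centroid(freqs, magnitudes):
    """Magnitude-weighted mean frequency of one spectrum (or one band)."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # silent band: centroid undefined, return 0 by convention
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total


def band_centroids(mod_freqs, mod_spectrum):
    """One centroid per acoustic band over the modulation-frequency axis.

    mod_spectrum[b][k]: magnitude of acoustic band b at modulation frequency k.
    """
    return [spectral_centroid(mod_freqs, row) for row in mod_spectrum]
```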

Relevance: 10.00%

Abstract:

MFCCs extracted from the power spectral density of speech as a whole seem to have become the de facto standard in speaker recognition, as demonstrated by their use in almost all systems submitted to the 2013 Speaker Recognition Evaluation (SRE) in Mobile Environment [1], which has relegated this component of recognition systems to the background. However, in this article we show that selecting an adequate speaker characterization (feature extraction) system is as important as selecting the classifier. To do so, we compare the recognition rates achieved by different recognition systems that rely on the same classifier (GMM-UBM) but are connected to different feature extraction systems (based on both classical and biometric parameters). As a result, we show that a gender-dependent biometric parameterization with a simple GMM-UBM recognition system provides very competitive or even better recognition rates than more complex classification systems based on classical features.
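For reference, here is a minimal sketch of a conventional MFCC front end for a single frame: Hamming window, power spectrum, triangular mel filterbank, logarithm and DCT-II. The parameter values (256-point DFT, 20 filters, 13 coefficients), the HTK-style mel formula and the omission of pre-emphasis, liftering and delta features are assumptions, not the configuration used in the article. A naive DFT keeps the sketch dependency-free, where a real system would use an FFT.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale (HTK-style formula)."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):  # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def power_spectrum(frame, n_fft):
    """|DFT|^2 / n_fft of a zero-padded frame (naive O(N^2) DFT for clarity)."""
    x = list(frame) + [0.0] * (n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x[n] * math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        im = -sum(x[n] * math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        spec.append((re * re + im * im) / n_fft)
    return spec


def mfcc_frame(frame, sample_rate, n_fft=256, n_filters=20, n_ceps=13):
    """MFCCs of one frame (len(frame) <= n_fft): window, power, mel, log, DCT-II."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    spec = power_spectrum([s * w for s, w in zip(frame, window)], n_fft)
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    # Floor the filterbank energies to avoid log(0) on silent bands.
    log_energies = [
        math.log(max(sum(f * p for f, p in zip(filt, spec)), 1e-10)) for filt in bank
    ]
    return [
        sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
            for m, e in enumerate(log_energies))
        for c in range(n_ceps)
    ]
```

Scaling the input by a constant shifts every log filterbank energy by the same amount, so only the 0th coefficient changes; this is one reason c0 is often treated separately from the remaining coefficients.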

Relevance: 10.00%

Abstract:

The main question addressed in this thesis is the improvement of automatic speaker recognition systems through a new front-end module that we have called Gender Dependent Extended Biometric Parameterisation (GDEBP). This front-end does not constitute a complete break with the classical parameterisation techniques used in speaker recognition, but rather a new way to obtain these parameters while introducing some complementary ones. Specifically, we propose a gender-dependent parameterisation: since, as is well known, male and female voices have different characteristics, using different parameters to model these distinguishing characteristics should provide a better characterisation of speakers.
Additionally, we propose a new set of biometric parameters extracted from the components that result from deconstructing the voice into its glottal source estimate (closely related to the phonation process and the organs involved, and therefore to the physical characteristics of the speaker) and its vocal tract estimate (closely related to acoustic articulation and therefore to the spoken message). These biometric parameters complement the classical MFCCs extracted from the power spectral density of speech as a whole. To check the validity of this proposal, we establish different practical scenarios, using different databases, from which we conclude that GDEBP generates a more accurate description of speakers than classical approaches based on gender-independent MFCCs. Specifically, we propose text-constrained and text-independent test scenarios using the HESPERIA and ALBAYZIN databases. This work is completed by participation in two international speaker recognition evaluations, NIST SRE (2010 and 2012) and MOBIO 2013, with diverse results. In the first case, owing to the nature of the NIST databases, we obtained results close to the state of the art while confirming our hypothesis, whereas in the MOBIO SRE we obtained the best simple system performance for female speakers. Although the study of classification systems is beyond the scope of this thesis, we found it necessary to analyse the performance of different classification systems in order to verify their effect on the proposed parameterisation. In particular, we addressed speaker recognition systems based on the GMM-UBM paradigm, supervectors and i-vectors. The presented results confirm that selecting a set of parameters that describes the speakers more accurately is as important as selecting the classification method used by the biometric system.
In this sense, the proposed parameterisation constitutes a step forward in improving speaker recognition systems, since even when using relatively simple classification systems, really competitive recognition rates are achieved.

Relevance: 10.00%

Abstract:

This paper presents new techniques that bring relevant improvements to the primary system submitted by our group to the Albayzin 2012 LRE competition, in which the use of any additional corpora for training or optimizing the models was forbidden. We incorporate an additional phonotactic subsystem based on phone log-likelihood ratio (PLLR) features extracted from different phonotactic recognizers, which improves the accuracy of the system by 21.4% in terms of Cavg (we also report results for Fact, the official metric during the evaluation). We show that using these features at the phone-state level provides significant improvements when combined with dimensionality reduction techniques, especially PCA. We also experimented with applying alternative SDC-like configurations to these PLLR features, obtaining additional improvements. Finally, we describe some modifications to the MFCC-based acoustic i-vector system that also contributed further improvements. The final fused system outperformed the baseline by 27.4% in Cavg.
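The abstract mentions SDC-like configurations on PLLR features without listing them. As a minimal sketch, the standard shifted delta cepstra with parameters N-d-P-k (default 7-1-3-7, a configuration commonly used in language recognition) stack k delta blocks spaced P frames apart; the input can be any per-frame feature vectors, MFCC or PLLR. The edge handling (dropping frames without full context) and the function name are assumptions.

```python
def shifted_delta_cepstra(frames, n=7, d=1, p=3, k=7):
    """Shifted delta cepstra with the usual N-d-P-k parameterization.

    For frame t and block i in 0..k-1, the block is
        c[t + i*p + d] - c[t + i*p - d]
    taken over the first n coefficients; the k blocks are concatenated.
    Frames whose context would run past either end are skipped.
    """
    out = []
    for t in range(d, len(frames) - (k - 1) * p - d):
        feat = []
        for i in range(k):
            ahead = frames[t + i * p + d]
            behind = frames[t + i * p - d]
            feat.extend(a - b for a, b in zip(ahead[:n], behind[:n]))
        out.append(feat)
    return out
```

Each output vector has n * k entries (49 for 7-1-3-7), giving the classifier a view of temporal dynamics spanning about (k - 1) * P + 2d frames.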