39 results for Automatic speech recognition (ASR)
Abstract:
This paper describes a novel approach to phonotactic LID where, instead of using soft counts based on phoneme lattices, we use posteriorgrams to obtain n-gram counts. The high-dimensional vectors of counts are reduced to low-dimensional units, for which we adopt the commonly used term i-vectors. The reduction is based on multinomial subspace modeling and is designed to work in the total-variability space. The proposed technique was tested on the NIST 2009 LRE set, with better results than a system based on soft counts (Cavg on 30s: 3.15% vs. 3.43%), and with very good results when fused with an acoustic i-vector LID system (Cavg on 30s: 2.4% for the acoustic system alone vs. 1.25% fused). The proposed technique is also compared with another low-dimensional projection system based on PCA. In comparison with the original soft counts, the proposed technique provides better results, reduces the problems due to sparse counts, and avoids the need for pruning techniques when creating the lattices.
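To make the counting step concrete, here is a minimal sketch of how expected (soft) bigram counts can be accumulated directly from a phone posteriorgram. It illustrates only the count extraction, not the multinomial subspace i-vector reduction; the data is synthetic and the function name is hypothetical:

```python
import numpy as np

def expected_bigram_counts(posteriorgram):
    """Accumulate expected bigram counts from a phone posteriorgram.

    posteriorgram: (T, P) array, each row a posterior distribution
    over P phone units at one frame.
    Returns the flattened (P*P,) vector of soft bigram counts.
    """
    T, P = posteriorgram.shape
    counts = np.zeros((P, P))
    for t in range(1, T):
        # Outer product: joint "soft" occurrence of unit i followed by unit j
        counts += np.outer(posteriorgram[t - 1], posteriorgram[t])
    return counts.ravel()

# Example: random posteriorgram for a 3-second utterance, 40 phone units
rng = np.random.default_rng(0)
post = rng.random((300, 40))
post /= post.sum(axis=1, keepdims=True)  # normalize each frame to sum to 1
print(expected_bigram_counts(post).shape)  # (1600,)
```

Each frame transition contributes the outer product of consecutive posterior vectors, which is why no lattice construction or pruning is needed.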
Abstract:
In order to obtain more human-sounding human-machine interfaces, we must first give them expressive capabilities in the form of emotional and stylistic features, so as to adapt them closely to the intended task. If we want to replicate those features, it is not enough to merely replicate the prosodic information of fundamental frequency and speaking rhythm. The additional layer we propose is the modification of the glottal model, for which we make use of the GlottHMM parameters. This paper analyzes the viability of such an approach by verifying that the expressive nuances are captured by the aforementioned features, obtaining 95% recognition rates on styled speech and 82% on emotional speech. We then evaluate the effect of speaker bias and recording environment on the source modeling, in order to quantify possible problems when analyzing multi-speaker databases. Finally, we propose a speaking-style separation for Spanish based on prosodic features and check its perceptual significance.
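As an illustration of the verification step described above (checking that expressive nuances are captured by the source features), a minimal sketch of a style classifier trained on per-utterance glottal feature vectors; the features and labels below are synthetic placeholders, not actual GlottHMM output:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: one glottal-source feature vector per utterance
# (e.g. statistics of glottal parameters) and one style label per utterance.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))    # 200 utterances, 12 glottal features
y = rng.integers(0, 4, size=200)  # 4 speaking styles

# If style recognition on these features is well above chance, the
# expressive nuances are being captured by the parameterization.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=5)
print("style recognition accuracy: %.1f%%" % (100 * scores.mean()))
```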
Abstract:
This paper describes a low-complexity strategy for detecting and recognizing text signs automatically. Traditional approaches use heavy image-processing algorithms to detect the text sign, followed by the application of an Optical Character Recognition (OCR) algorithm in the previously identified areas. This paper proposes a new architecture that applies the OCR to the whole, lightly preprocessed image and then carries out the text detection process on the OCR output. The strategy presented in this paper significantly reduces the processing time required for text localization in an image, while guaranteeing a high recognition rate. This strategy will facilitate the incorporation of automatic text-sign detection into video processing-based applications on devices such as smartphones. These applications will increase the autonomy of visually impaired people in their daily lives.
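A minimal sketch of the OCR-first architecture, using pytesseract as a stand-in OCR engine; the confidence threshold and the grayscale preprocessing are illustrative assumptions, not the paper's exact pipeline:

```python
import pytesseract
from PIL import Image, ImageOps

def detect_text_signs(path, min_conf=60):
    """Run OCR on the whole image first, then detect text in the OCR output."""
    img = ImageOps.grayscale(Image.open(path))  # light preprocessing only
    data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
    boxes = []
    for text, conf, x, y, w, h in zip(data["text"], data["conf"],
                                      data["left"], data["top"],
                                      data["width"], data["height"]):
        # Keeping only confident, non-empty words is the detection step here,
        # replacing a separate text-localization stage before the OCR.
        if text.strip() and float(conf) >= min_conf:
            boxes.append((text, (x, y, w, h)))
    return boxes
```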
Abstract:
This paper presents a description of our system for the Albayzin 2012 LRE competition. One of the main characteristics of this evaluation was the reduced number of files available for training the system, especially for the empty condition, where no training data set was provided, only a development set. In addition, the whole database was created from online videos, and around one third of the training data was labeled as noisy files. Our primary system was the fusion of three different i-vector based systems: an acoustic system based on MFCCs, a phonotactic system using trigrams of phone-posteriorgram counts, and another acoustic system based on RPLPs that improved robustness against noise. A contrastive system that included new features based on the glottal source was also presented. Official and post-evaluation results for all the conditions, using both the metrics proposed for the evaluation and the Cavg metric, are presented in the paper.
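A minimal sketch of score-level fusion of the three subsystems; logistic regression is a common fusion backend in LRE systems and is used here as an assumed stand-in, since the abstract does not specify the fusion method:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder scores: rows are trials, columns are the three subsystems
# (MFCC i-vector, phone-posteriorgram trigram, RPLP i-vector).
rng = np.random.default_rng(0)
dev_scores = rng.normal(size=(500, 3))
dev_labels = rng.integers(0, 2, size=500)  # target / non-target per trial

fuser = LogisticRegression()
fuser.fit(dev_scores, dev_labels)            # learn fusion weights on dev data

test_scores = rng.normal(size=(10, 3))
fused = fuser.decision_function(test_scores)  # one fused score per trial
print(fused)
```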
Abstract:
When designing human-machine interfaces, it is important to consider not only the bare-bones functionality but also the ease of use and accessibility they provide. For voice-based interfaces, it has been proven that imbuing synthetic voices with expressiveness significantly increases their perceived naturalness, which in the end is very helpful when building user-friendly interfaces. This paper proposes an adaptation-based expressiveness transplantation system capable of copying the emotions of a source speaker onto any desired target speaker with just a few minutes of read speech and without requiring the recording of additional expressive data. This system was evaluated through a perceptual test for 3 speakers, showing up to an average of 52% emotion recognition rate relative to the natural-voice recognition rates, while at the same time keeping good scores in similarity and naturalness.
Abstract:
The aim of automatic pathological voice detection systems is to serve as tools for medical specialists, enabling a more objective, less invasive and improved diagnosis of diseases. In this respect, the gold standard for such systems includes the usage of an optimized representation of the spectral envelope, based on cepstral coefficients either from the mel-scaled Fourier spectral envelope (Mel-Frequency Cepstral Coefficients) or from an all-pole estimation (Linear Prediction Coding Cepstral Coefficients) for characterization, and Gaussian Mixture Models for posterior classification. However, recently proposed GMM-based classifiers, as well as nuisance mitigation techniques such as those employed in speaker recognition, have not been widely considered in pathology detection tasks. The present work aims at testing whether or not the employment of such speaker recognition tools might contribute to improving performance in pathology detection systems, specifically in the automatic detection of Obstructive Sleep Apnea. The testing procedure employs an Obstructive Sleep Apnea database in conjunction with GMM-based classifiers. The results show that an improved performance can be obtained by using such an approach.
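A minimal sketch of the GMM-based classification scheme described above: one GMM per class trained on frame-level cepstral features, with a log-likelihood-ratio decision. The features here are synthetic placeholders standing in for real MFCCs:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder frame-level cepstral features for the two classes.
rng = np.random.default_rng(0)
apnea_frames   = rng.normal(0.3, 1.0, size=(5000, 13))
control_frames = rng.normal(0.0, 1.0, size=(5000, 13))

# Train one GMM per class on the pooled training frames.
gmm_apnea = GaussianMixture(n_components=8, random_state=0).fit(apnea_frames)
gmm_ctrl  = GaussianMixture(n_components=8, random_state=0).fit(control_frames)

# Score a test speaker by the mean log-likelihood ratio over their frames.
test = rng.normal(0.3, 1.0, size=(300, 13))
llr = gmm_apnea.score(test) - gmm_ctrl.score(test)
print("detected apnea" if llr > 0 else "detected control")
```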
Abstract:
MFCC coefficients extracted from the power spectral density of speech as a whole seem to have become the de facto standard in the area of speaker recognition, as demonstrated by their use in almost all systems submitted to the 2013 Speaker Recognition Evaluation (SRE) in Mobile Environment [1], thus relegating this component of the recognition systems to the background. However, in this article we show that selecting an adequate speaker characterization system is as important as the selection of the classifier. To accomplish this, we compare the recognition rates achieved by different recognition systems that rely on the same classifier (GMM-UBM) but are connected to different feature extraction systems (based on both classical and biometric parameters). As a result, we show that a gender-dependent biometric parameterization with a simple recognition system based on the GMM-UBM paradigm provides very competitive or even better recognition rates when compared to more complex classification systems based on classical features.
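To illustrate the idea of swapping the front-end while keeping the GMM-UBM backend fixed, a hypothetical sketch; the "pitch_energy" branch is a crude stand-in for biometric-style parameters, which the abstract does not specify:

```python
import numpy as np
import librosa

def extract_features(path, kind="mfcc"):
    """Swap the feature front-end while the GMM-UBM classifier stays fixed.

    'mfcc' is the classical parameterization; 'pitch_energy' is an
    illustrative placeholder for biometric parameters (assumption).
    """
    y, sr = librosa.load(path, sr=16000)
    if kind == "mfcc":
        feats = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    elif kind == "pitch_energy":
        f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)  # frame-level pitch
        rms = librosa.feature.rms(y=y)[0]              # frame-level energy
        n = min(len(f0), len(rms))
        feats = np.vstack([f0[:n], rms[:n]])
    else:
        raise ValueError(kind)
    return feats.T  # frames x features, ready for GMM-UBM training
```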
Abstract:
This project is based on the technologies used for object detection and recognition, especially of leaves and chromosomes. The document contains the typical parts of a scientific paper: an Abstract, an Introduction, sections related to the investigated area, future work, conclusions, and the references used in its elaboration. The Abstract describes what the paper covers, namely the technologies employed in pattern detection and recognition for leaves and chromosomes, and the existing work on cataloguing these objects.

In the Introduction, the meanings of detection and recognition are explained. This is necessary because many papers confuse these terms, especially those dealing with chromosomes. Detecting an object means gathering the parts of the image that are useful and eliminating the useless parts; in short, detection amounts to recognizing the object's borders. Recognition, in turn, is the process by which the computer or machine determines what kind of object it is handling.

Afterwards we present a compilation of the technologies most used in object detection in general. There are two main groups in this category: those based on image derivatives and those based on ASIFT points. The methods based on image derivatives have in common that the image is processed by convolution with a previously created matrix. This is done to detect borders in the images, which are changes in pixel intensity. Within these technologies there are two groups: gradient-based methods, which search for maxima and minima of pixel intensity, since they only use the first derivative; and Laplacian-based methods, which search for zeros of the second derivative. Depending on the level of detail wanted in the final result, one option or the other is chosen: gradient-based methods consume fewer resources and less time, as they involve fewer operations, but the quality is worse; Laplacian-based methods need more time and resources, as they require more operations, but give a much better quality result. After explaining the derivative-based methods, we review the different algorithms available for both groups.

The other big group of technologies for object recognition is based on ASIFT points, which rely on 6 image parameters and compare one image with another taking those parameters into account. The disadvantage of these methods, for our future purposes, is that they are only valid for one single object: if we are going to recognize two different leaves, even if they belong to the same species, we will not be able to recognize them with this method. It is nevertheless important to mention these technologies, as we are discussing recognition methods in general. At the end of the chapter there is a comparison of the pros and cons of all the technologies employed, first comparing them separately and then all together, based on our purposes.

The next chapter, on recognition techniques, is not very extensive because, even though there are general steps for object recognition, every object to be recognized requires its own method, as they are all different; hence no general method can be specified in that chapter.
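The gradient/Laplacian distinction above maps directly onto small convolution kernels; a minimal sketch on a synthetic grayscale image (the threshold value is illustrative):

```python
import numpy as np
from scipy.ndimage import convolve

# The two operator families as convolution matrices:
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])      # gradient-based (first derivative)
laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]])    # Laplacian-based (second derivative)

img = np.random.rand(64, 64)          # placeholder grayscale image

# Gradient methods look for large magnitudes of intensity change.
gx = convolve(img, sobel_x)
gy = convolve(img, sobel_x.T)
grad_edges = np.hypot(gx, gy) > 0.5   # illustrative threshold

# Laplacian methods look for zero crossings of the second derivative.
lap = convolve(img, laplacian)
```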
We now move on to leaf detection techniques on computers, using the derivative-based technique explained above. The next step is to turn the leaf into several parameters; depending on the document consulted, there are more or fewer of them. Some papers recommend dividing the leaf into 3 main features (shape, dent and vein), from which mathematical operations yield up to 16 secondary features. Another proposal divides the leaf into 5 main features (diameter, physiological length, physiological width, area and perimeter), from which 12 secondary features are extracted. This second alternative is the most widely used, so it is taken as the reference here (a sketch of these measurements is given at the end of this summary). Moving on to leaf recognition, we rely on a paper that provides source code which, after clicking on both ends of the leaf, automatically reports the species of the leaf we are trying to recognize. To do so, it only requires a database. In the tests reported in that document, the authors claim 90.312% accuracy over 320 total tests (32 plants in the database and 10 tests per species).

The next chapter deals with chromosome detection, where we must pass from the metaphase plate, in which the chromosomes are disorganized, to the karyotype plate, the usual view of the 23 chromosomes ordered by number. There are two types of techniques for this step: the skeletonization process and sweeping angles (a skeletonization sketch is also given at the end of this summary). Skeletonization consists of suppressing the inside pixels of the chromosome so as to keep only its silhouette; this method is very similar to those based on image derivatives, but the difference is that it does not detect the borders but the interior of the chromosome. The second technique consists of sweeping angles from the beginning of the chromosome and, taking into account that a single chromosome cannot bend by more than a given angle X, detecting its various regions. Once the karyotype plate is defined, we continue with chromosome recognition. For this there is a technique based on the banding pattern (grey-scale bands) that makes each chromosome unique. The program detects the longitudinal axis of the chromosome and reconstructs the band profiles, after which the computer is able to recognize the chromosome.

Concerning future work, we generally have two independent techniques that do not combine detection and recognition, so our main focus would be to prepare a program that gathers both. On the leaf side we have seen that detection and recognition are linked, as both share the option of dividing the leaf into 5 main features. The work to be done is to create an algorithm linking both methods, since in the existing recognition program both leaf ends must be clicked, so it is not an automatic algorithm. On the chromosome side, we should create an algorithm that searches for the beginning of the chromosome and then starts sweeping angles, later passing the parameters to the program that searches for the band profiles.

Finally, the summary explains why this type of investigation is needed: with global warming, many species (both animals and plants) are beginning to go extinct, which is why a big database gathering all possible species is needed. To recognize an animal species, it suffices to have its 23 chromosomes.
While recognizing a plant, there are several ways of doing it, but the easiest way to input it into a computer is to scan a leaf of the plant.
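As an illustration of the 5 main leaf features discussed in this summary, a sketch computing approximations of them from a binary leaf mask with scikit-image; the region-property names are scikit-image's, the mapping of major/minor axis lengths onto physiological length/width is an approximation, and the ellipse mask is a placeholder:

```python
import numpy as np
from skimage import measure

def leaf_main_features(mask):
    """Approximate the 5 main leaf features from a binary mask (True = leaf)."""
    props = measure.regionprops(mask.astype(int))[0]
    return {
        "diameter": props.feret_diameter_max,  # longest point-to-point distance
        "physiological_length": props.major_axis_length,
        "physiological_width": props.minor_axis_length,
        "area": props.area,
        "perimeter": props.perimeter,
    }

# Placeholder "leaf": an ellipse-shaped binary mask
yy, xx = np.mgrid[:200, :300]
mask = ((xx - 150) / 120.0) ** 2 + ((yy - 100) / 60.0) ** 2 <= 1
print(leaf_main_features(mask))
```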
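And a sketch of the skeletonization step for chromosomes, read here as standard morphological skeletonization (medial-axis thinning), which is one common interpretation of the process described above; the binary blob is a placeholder for a segmented chromosome:

```python
import numpy as np
from skimage.morphology import skeletonize

# Suppress the interior pixels of a binary chromosome mask until only a
# one-pixel-wide skeleton remains.
mask = np.zeros((60, 60), dtype=bool)
mask[20:40, 10:50] = True        # placeholder blob standing in for a chromosome
skeleton = skeletonize(mask)     # True only along the medial axis
print(skeleton.sum(), "skeleton pixels")
```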
Abstract:
This paper presents new techniques, with relevant improvements, added to the primary system presented by our group at the Albayzin 2012 LRE competition, where the use of any additional corpora for training or optimizing the models was forbidden. In this work, we present the incorporation of an additional phonotactic subsystem based on phone log-likelihood ratio (PLLR) features extracted from different phonotactic recognizers, which improves the accuracy of the system by 21.4% in terms of Cavg (we also present results for Fact, the official metric during the evaluation). We show how using these features at the phone-state level provides significant improvements when used together with dimensionality reduction techniques, especially PCA. We have also experimented with applying alternative SDC-like configurations to these PLLR features, with additional improvements. We also describe some modifications to the MFCC-based acoustic i-vector system which have contributed further improvements. The final fused system outperformed the baseline by 27.4% in Cavg.
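The PLLR features mentioned above are commonly defined per frame and phone unit as log(p / (1 - p)) from the phone posteriors; a minimal sketch of their computation followed by PCA dimensionality reduction, with synthetic posteriors and illustrative dimensions:

```python
import numpy as np
from sklearn.decomposition import PCA

def pllr(posteriors, eps=1e-10):
    """Phone log-likelihood ratio features from frame-level phone posteriors.

    posteriors: (T, P) array of phone (or phone-state) posteriors per frame.
    PLLR_i = log(p_i / (1 - p_i)), computed per frame and unit.
    """
    p = np.clip(posteriors, eps, 1 - eps)
    return np.log(p / (1 - p))

# Placeholder posteriors for one utterance: 300 frames, 120 phone states
rng = np.random.default_rng(0)
post = rng.dirichlet(np.ones(120), size=300)

feats = pllr(post)
feats_reduced = PCA(n_components=40).fit_transform(feats)  # dimensionality reduction
print(feats_reduced.shape)  # (300, 40)
```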