45 results for speech recognition
in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Abstract:
This project gives an introduction to speech recognizers, how they work, and their mathematical basis. Once all the concepts have been made clear, it presents the method we followed to build our own speech recognizer for Catalan using the HTK tools. Its strengths and weaknesses are evaluated through various tests carried out on its components. In addition, the project rounds off the work by implementing an automatic dictation system that exploits the speech recognizer using Julius.
Abstract:
Human-machine interaction by voice covers many research areas. Notable among them are speech recognition, speech synthesis and identification, speaker verification and identification, and voice activation (commands) of robotic systems. Recognizing speech is natural and simple for people but a complex task for machines, for which various methodologies and techniques exist, among them neural networks. The aim of this work is to develop a tool in Matlab for recognizing and identifying words spoken by a speaker, from a set of possible words, with good reliability within pre-established margins. The system is independent of the speaker who utters the word; that is, this speaker will not have taken part in the training of the system. An interface has been designed that allows the voice signal to be acquired and processed by means of neural networks and other techniques. By adapting a control part to the system, it could be used to give commands to a robot such as the Alfa6Uvic or any other device.
Abstract:
In this paper we propose the inversion of nonlinear distortions in order to improve the recognition rates of a speaker recognizer system. We study the effect of saturation on the test signals, aiming to reflect real situations where the training material has been recorded under controlled conditions but the test signals show some mismatch in input signal level (saturation). The experimental results for speaker recognition show that a combination of several strategies can improve the recognition rates with saturated test sentences from 80% to 89.39%, while the result with clean speech (without saturation) is 87.76% for one microphone. For speaker identification, the combination can reduce the minimum detection cost function with saturated test sentences from 6.42% to 4.15%, while the results with clean speech (without saturation) are 5.74% for one microphone and 7.02% for the other.
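To make the notion of a saturated test signal concrete, the following minimal sketch simulates saturation as hard clipping and measures how many samples are affected. It is an illustration only: the abstract does not specify the saturation model or the inversion strategies, so the hard-clipping assumption and the names `saturate` and `clipped_fraction` are hypothetical, and no inversion is attempted here.

```python
import math


def saturate(signal, limit):
    """Hard-clip every sample to the range [-limit, +limit] (assumed model)."""
    return [max(-limit, min(limit, s)) for s in signal]


def clipped_fraction(signal, limit):
    """Fraction of samples whose magnitude reaches the clipping limit."""
    return sum(1 for s in signal if abs(s) >= limit) / len(signal)


# Synthetic test tone: four periods of a sine wave, 16 samples per period.
tone = [math.sin(2 * math.pi * n / 16) for n in range(64)]

# Simulate an input level mismatch by clipping at half the peak amplitude.
clipped = saturate(tone, 0.5)

print(max(clipped), min(clipped))
print(clipped_fraction(clipped, 0.5))
```

A measure such as the clipped fraction could be used to decide which test sentences count as saturated before any compensation is applied.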
Abstract:
We describe a series of experiments in which we start with English-to-French and English-to-Japanese versions of an open-source rule-based speech translation system for a medical domain, and bootstrap corresponding statistical systems. Comparative evaluation reveals that the rule-based systems are still significantly better than the statistical ones, despite the fact that considerable effort has been invested in tuning both the recognition and translation components; also, a hybrid system only marginally improved recall at the cost of a loss in precision. The result suggests that rule-based architectures may still be preferable to statistical ones for safety-critical speech translation tasks.
Abstract:
In this paper we propose the inversion of nonlinear distortions in order to improve the recognition rates of a speaker recognizer system. We study the effect of saturation on the test signals, aiming to reflect real situations where the training material has been recorded under controlled conditions but the test signals show some mismatch in input signal level (saturation). The experimental results show that a combination of several strategies can improve the recognition rates with saturated test sentences from 80% to 89.39%, while the result with clean speech (without saturation) is 87.76% for one microphone.
Abstract:
The purpose of our project is to contribute to earlier diagnosis of Alzheimer's disease (AD) and better estimates of its severity by using automatic analysis performed through new biomarkers extracted from non-invasive intelligent methods. The methods selected in this case are speech biomarkers oriented to Spontaneous Speech and Emotional Response Analysis. Thus the main goal of the present work is feature search in Spontaneous Speech oriented to pre-clinical evaluation, for the definition of a test for AD diagnosis using a one-class classifier. The one-class classification problem differs from the multi-class problem in one essential aspect: in one-class classification it is assumed that only information about one of the classes, the target class, is available. In this work we explore the problem of imbalanced datasets, which is particularly crucial in applications where the goal is to maximize recognition of the minority class, as in medical diagnosis. The use of information about outliers and Fractal Dimension features improves the system performance.
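The one-class setting described above can be illustrated with a deliberately simple model that is fitted on target-class samples only and rejects anything too far from them. This is a sketch under stated assumptions, not the classifier used in the work: the centroid-distance rule, the class name `CentroidOneClass`, the `quantile` parameter, and the two-dimensional toy features are all hypothetical.

```python
import math


class CentroidOneClass:
    """Accept a sample if it lies within a learned radius of the target centroid."""

    def fit(self, targets, quantile=0.95):
        # Only target-class samples are available at training time.
        dim = len(targets[0])
        self.center = [sum(x[i] for x in targets) / len(targets) for i in range(dim)]
        # The radius is the chosen quantile of the training distances.
        dists = sorted(math.dist(x, self.center) for x in targets)
        idx = min(len(dists) - 1, int(quantile * len(dists)))
        self.radius = dists[idx]
        return self

    def is_target(self, x):
        # Anything outside the radius is treated as an outlier.
        return math.dist(x, self.center) <= self.radius


# Toy target-class feature vectors (hypothetical values).
targets = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
model = CentroidOneClass().fit(targets)

print(model.is_target((0.5, 0.5)))  # near the centroid
print(model.is_target((5.0, 5.0)))  # far from every training sample
```

Because no outlier samples are needed for fitting, a rule of this kind remains usable when the non-target class is scarce, which is the imbalance situation the abstract refers to.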
Abstract:
"This is a project divided into two independent but complementary parts, carried out by different authors. This document originally contains other material and/or software that can only be consulted at the Biblioteca de Ciència i Tecnologia"
Abstract:
Report for the scientific sojourn at the Swiss Federal Institute of Technology Zurich, Switzerland, between September and December 2007. In order to make robots useful assistants for our everyday life, the ability to learn and recognize objects is of essential importance. However, object recognition in real scenes is one of the most challenging problems in computer vision, as it is necessary to deal with difficulties. Furthermore, in mobile robotics a new challenge is added to the list: computational complexity. In a dynamic world, information about the objects in the scene can become obsolete before it is ready to be used if the detection algorithm is not fast enough. Two recent object recognition techniques have achieved notable results: the constellation approach proposed by Lowe and the bag of words approach proposed by Nistér and Stewénius. The Lowe constellation approach is the one currently being used in the robot localization project of the COGNIRON project. This report is divided into two main sections. The first section is devoted to briefly reviewing the currently used object recognition system, the Lowe approach, and bringing to light the drawbacks found for object recognition in the context of indoor mobile robot navigation. Additionally, the proposed improvements to the algorithm are described. In the second section the alternative bag of words method is reviewed, as well as several experiments conducted to evaluate its performance with our own object databases. Furthermore, some modifications to the original algorithm are proposed to make it suitable for object detection in unsegmented images.
Abstract:
The automatic interpretation of conventional traffic signs is very complex and time-consuming. The paper concerns an automatic warning system for driving assistance. It does not interpret the standard traffic signs on the roadside; the proposal is to incorporate into the existing signs another type of traffic sign whose information will be more easily interpreted by a processor. The type of information to be added is profuse and therefore the most important objective is the robustness of the system. The basic proposal of this new philosophy is that the co-pilot system for automatic warning and driving assistance can interpret with greater ease the information contained in the new sign, whilst the human driver only has to interpret the "classic" sign. One of the codings that has been tested with good results, and which seems to us easy to implement, has a rectangular shape and 4 vertical bars of different colours. The size of these signs is equivalent to the size of the conventional signs (approximately 0.4 m²). The colour information from the sign can be easily interpreted by the proposed processor, and the interpretation is much easier and quicker than for the information shown by the pictographs of the classic signs.
Real-Time implementation of a blind authentication method using self-synchronous speech watermarking
Abstract:
A blind speech watermarking scheme that meets hard real-time deadlines is presented and implemented. One of the key issues in these block-oriented watermarking techniques is to preserve the synchronization, namely, to recover the exact position of each block in the mark extraction process. The presented scheme can be split into two distinct parts, the synchronization and the information mark methods. The former is embedded in the time domain and is fast enough to run within real-time requirements. The latter contains the authentication information and is embedded in the wavelet domain. The synchronization and information mark techniques are both tunable in order to allow a configurable method. Thus, capacity, transparency and robustness can be configured depending on the needs. This makes the scheme useful for professional applications, such as telephony authentication or even sending information through radio applications.
Abstract:
Bilingual speakers have slower and less robust lexical access than monolinguals, even when speaking in their native and dominant language. This phenomenon, commonly called “the bilingual disadvantage”, is also observed in second-language speakers compared with first-language speakers. One cause that possibly contributes to these disadvantages is the use of inhibitory control during language production: inhibiting co-activated words from the language not currently in use may prevent intrusions from that language, but at the same time slow down language production. The first aim of the studies described in this report was to test this hypothesis through different predictions generated by theories of inhibitory language control. A second aim was to investigate the extent of the bilingual disadvantage within and beyond the production of isolated words, and to advance knowledge of the variables that modulate it. Regarding the first aim, the evidence obtained is incompatible with global inhibitory control, challenging the idea of mechanisms specific to the bilingual speaker that are used for lexical selection. This implies that a common explanation for language control and the bilingual disadvantage in lexical access is implausible. Regarding the second aim, the results show that (a) the bilingual disadvantage has no impact on memory access; (b) the bilingual disadvantage extends to connected speech production; and (c) similarities between languages at different levels of representation, as well as frequency of use, are factors that modulate the bilingual disadvantage.
Abstract:
This paper describes a systematic study of free software solutions and techniques for the problem of computer recognition of art imagery.
Abstract:
Background: Single Nucleotide Polymorphisms, among other types of sequence variants, constitute key elements in genetic epidemiology and pharmacogenomics. While sequence data about genetic variation is found in databases such as dbSNP, clues about the functional and phenotypic consequences of the variations are generally found in biomedical literature. The identification of the relevant documents and the extraction of the information from them are hampered by the large size of literature databases and the lack of widely accepted standard notation for biomedical entities. Thus, automatic systems for the identification of citations of allelic variants of genes in biomedical texts are required. Results: Our group has previously reported the development of OSIRIS, a system aimed at the retrieval of literature about allelic variants of genes (http://ibi.imim.es/osirisform.html). Here we describe the development of a new version of OSIRIS (OSIRISv1.2, http://ibi.imim.es/OSIRISv1.2.html) which incorporates a new entity recognition module and is built on top of a local mirror of the MEDLINE collection and HgenetInfoDB, a database that collects data on human gene sequence variations. The new entity recognition module is based on a pattern-based search algorithm for the identification of variation terms in the texts and their mapping to dbSNP identifiers. The performance of OSIRISv1.2 was evaluated on a manually annotated corpus, resulting in 99% precision, 82% recall, and an F-score of 0.89. As an example, the application of the system for collecting literature citations for the allelic variants of genes related to the diseases intracranial aneurysm and breast cancer is presented. Conclusion: OSIRISv1.2 can be used to link literature references to dbSNP database entries with high accuracy, and is therefore suitable for collecting current knowledge on gene sequence variations and supporting the functional annotation of variation databases.
The application of OSIRISv1.2 in combination with controlled vocabularies like MeSH provides a way to identify associations of biomedical interest, such as those that relate SNPs with diseases.
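For readers who want to check how the reported precision, recall, and F-score relate, the following sketch computes the standard F-measure as the weighted harmonic mean of the two. It assumes the balanced F1 variant (`beta = 1`), which the abstract does not state explicitly; the function name `f_score` is illustrative.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures quoted in the abstract: 99% precision, 82% recall.
f1 = f_score(0.99, 0.82)
print(f"{f1:.3f}")
```

Under the F1 assumption the value comes out at roughly 0.897, which agrees with the reported 0.89 if that figure was truncated rather than rounded.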