868 results for Discriminative model training


Relevance:

100.00%

Abstract:

Vector Taylor Series (VTS) model-based compensation is a powerful approach for noise-robust speech recognition. An important extension to this approach is VTS adaptive training (VAT), which allows canonical models to be estimated on diverse noise-degraded training data. These canonical models can be estimated using EM-based approaches, allowing simple extensions to discriminative VAT (DVAT). However, to ensure a diagonal corrupted-speech covariance matrix, the Jacobian (loading matrix) relating the noise and clean speech is diagonalised. In this work, an approach is proposed for yielding optimal diagonal loading matrices, based on minimising the expected KL-divergence between the diagonal loading matrix and "correct" distributions. The performance of DVAT using the standard and optimal diagonalisation was evaluated on both in-car collected data and the Aurora4 task. © 2012 IEEE.

Relevance:

100.00%

Abstract:

Discriminative training of Gaussian Mixture Models (GMMs) for speech or speaker recognition is usually based on the gradient descent method, in which the iteration step size, ε, is typically defined experimentally. In this letter, we derive an equation to determine ε adaptively, by showing that the second-order Newton-Raphson iterative method for finding roots of equations is equivalent to the gradient descent algorithm. © 2010 IEEE.
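The relationship between a Newton-Raphson update and a gradient descent update can be illustrated with a minimal one-dimensional sketch. This is not the derivation from the letter: it only uses the textbook fact that, for a twice-differentiable objective, applying Newton-Raphson to the gradient gives the same update as gradient descent with step size ε = 1 / f''(x). The objective, the function names, and the starting point are all hypothetical.

```python
def newton_step_size(second_derivative):
    """Step size that makes one gradient-descent update equal a Newton-Raphson update."""
    return 1.0 / second_derivative


def adaptive_gradient_descent(grad, hess, x0, steps=20):
    """Gradient descent in one dimension with the step size recomputed at every iteration."""
    x = x0
    for _ in range(steps):
        eps = newton_step_size(hess(x))  # adaptive step size, assumed form
        x = x - eps * grad(x)            # ordinary gradient-descent update
    return x


# Hypothetical convex objective f(x) = x**4 / 4 + x**2, minimised at x = 0.
def grad(x):
    return x**3 + 2.0 * x


def hess(x):
    return 3.0 * x**2 + 2.0


x_min = adaptive_gradient_descent(grad, hess, x0=2.0)
```

In this sketch the step size shrinks where the curvature is large and grows where it is small, so no value of ε has to be tuned by hand.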

Relevance:

100.00%

Abstract:

This paper introduces a new tool for pattern recognition. Called the Discriminative Paraconsistent Machine (DPM), it is based on supervised discriminative model training that incorporates paraconsistency criteria and allows an intelligent treatment of contradictions and uncertainties. DPMs can be applied to solve problems in many fields of science, using the tests and discussions presented here, which demonstrate their efficacy and usefulness. The major difficulties and challenges that were overcome consisted mainly of establishing the proper model with which to represent the concept of paraconsistency.

Relevance:

100.00%

Abstract:

Recently there has been interest in structured discriminative models for speech recognition. In these models sentence posteriors are directly modelled, given a set of features extracted from the observation sequence, and hypothesised word sequence. In previous work these discriminative models have been combined with features derived from generative models for noise-robust speech recognition for continuous digits. This paper extends this work to medium to large vocabulary tasks. The form of the score-space extracted using the generative models, and parameter tying of the discriminative model, are both discussed. Update formulae for both conditional maximum likelihood and minimum Bayes' risk training are described. Experimental results are presented on small and medium to large vocabulary noise-corrupted speech recognition tasks: AURORA 2 and 4. © 2011 IEEE.

Relevance:

100.00%

Abstract:

Background and Purpose: Becoming proficient in laparoscopic surgery is dependent on the acquisition of specialized skills that can only be obtained from specific training. This training could be achieved in various ways using inanimate models, animal models, or live patient surgery, each with its own pros and cons. Currently, there are substantial data that support the benefits of animal model training in the initial learning of laparoscopy. Nevertheless, whether these benefits extend to moderately experienced surgeons is uncertain. The purpose of this study was to determine if training using a porcine model results in a quantifiable gain in laparoscopic skills for moderately experienced laparoscopic surgeons. Materials and Methods: Six urologists with some laparoscopic experience were asked to perform a radical nephrectomy weekly for 10 weeks in a porcine model. The procedures were recorded, and surgical performance was assessed by two experienced laparoscopic surgeons using a previously published surgical performance assessment tool. The obtained data were then submitted to statistical analysis. Results: With training, blood loss was reduced approximately 45% when comparing the averages of the first and last surgical procedures (P = 0.006). Depth perception showed an improvement close to 35% (P = 0.041), and dexterity showed an improvement close to 25% (P = 0.011). Total operative time showed a trend toward improvement, although it was not significant (P = 0.158). Autonomy, efficiency, and tissue handling were the only aspects that did not show any noteworthy change (P = 0.202, P = 0.677, and P = 0.456, respectively). Conclusions: These findings suggest that there are quantifiable gains in laparoscopic skills obtained from training in an animal model.
Our results suggest that these benefits also extend to more advanced stages of the learning curve, but it is unclear how far along the learning curve training with animal models provides a clear benefit for the performance of laparoscopic procedures. Future studies are necessary to confirm these findings and better understand the impact of this learning tool on surgical practice.

Relevance:

90.00%

Abstract:

Model based compensation schemes are a powerful approach for noise robust speech recognition. Recently there have been a number of investigations into adaptive training, and estimating the noise models used for model adaptation. This paper examines the use of EM-based schemes for both canonical models and noise estimation, including discriminative adaptive training. One issue that arises when estimating the noise model is a mismatch between the noise estimation approximation and final model compensation scheme. This paper proposes FA-style compensation where this mismatch is eliminated, though at the expense of a sensitivity to the initial noise estimates. EM-based discriminative adaptive training is evaluated on in-car and Aurora4 tasks. FA-style compensation is then evaluated in an incremental mode on the in-car task. © 2011 IEEE.

Relevance:

90.00%

Abstract:

This paper presents a novel method of audio-visual feature-level fusion for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new multimodal feature representation and a modified cosine similarity are introduced to combine and compare bimodal features with limited training data, as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal dataset created from the SPIDRE speaker recognition database and AR face recognition database with variable noise corruption of speech and occlusion in the face images. The system's speaker identification performance on the SPIDRE database and its facial identification performance on the AR database are comparable with the literature. Combining both modalities using the new method of multimodal fusion leads to significantly improved accuracy over the unimodal systems, even when both modalities have been corrupted. The new method also shows improved identification accuracy compared with the bimodal systems based on multicondition model training or missing-feature decoding alone.

Relevance:

90.00%

Abstract:

The growth of databases that contain increasingly difficult images and a larger number of categories is driving the development of image representation techniques that are discriminative when working with multiple classes, and of algorithms that are efficient in learning and classification. This thesis explores the problem of classifying images according to the object they contain when a large number of categories is available. First, we investigate how a hybrid system consisting of a generative model and a discriminative model can benefit image classification when the level of human annotation is minimal. For this task we introduce a new vocabulary using a dense representation of color-SIFT descriptors, and then investigate how the different parameters affect the final classification. Next, a method is proposed for incorporating spatial information into the hybrid system, showing that context information is of great help for image classification. We then introduce a new shape descriptor that represents the image by its local shape and its spatial shape, together with a kernel that incorporates this spatial information in pyramidal form. Shape is represented by a compact vector, yielding a descriptor well suited to kernel-based learning algorithms. The experiments show that this shape information gives results similar to (and sometimes better than) appearance-based descriptors. We also investigate how different features can be combined for image classification, and show that the proposed shape descriptor together with an appearance descriptor substantially improves classification. Finally, an algorithm is described that automatically detects regions of interest during training and classification.
This provides a method for suppressing the image background and adds invariance to the position of objects within the images. We show that using shape and appearance over this region of interest with random forest classifiers improves both classification and computation time. Our results are compared with results from the literature using the same databases as the authors, as well as the same training and classification protocols. All of the innovations introduced are shown to improve the final image classification.

Relevance:

90.00%

Abstract:

OBJECTIVE: Although several inanimate bench models have been described for training suturing skills, there is so far no ideal method for teaching and learning these skills during medical training. The objective was to evaluate whether the fidelity of bench models affects the acquisition of suturing skills by medical students who are novices in surgical practice. METHODS: 36 medical students with no prior exposure to surgical skills were randomized into three groups (n = 12): suture training based on didactic materials (control); suture training on a low-fidelity model (ethylene vinyl acetate bench model); or suture training on a high-fidelity model (pig foot skin bench model). Pre- and post-tests were applied (performing simple stitches and inverted subdermal stitches on ox tongue). Three tools (the Global Rating Scale with blinded assessment, effect size, and self-perceived confidence based on a Likert scale) were used to measure all suturing performances. RESULTS: Post-training analysis showed that the students who trained on the models performed better (p < 0.0000) on the Global Rating Scale assessment than the control group, regardless of model fidelity. The magnitude of the (training) effect was considered large (> 0.80) in all measurements. After training, the students felt more confident (p < 0.0000) in performing both types of suture. CONCLUSION: The acquisition of suturing skills on the low-fidelity model was similar to that achieved with practice on the high-fidelity model, and the improvement in performance of the participants who trained on these two models was superior to that of learning based on didactic materials.

Relevance:

90.00%

Abstract:

This is the first part of a study investigating a model-based transient calibration process for diesel engines. The motivation is to populate hundreds of parameters (which can be calibrated) in a methodical and optimum manner by using model-based optimization in conjunction with the manual process so that, relative to the manual process used by itself, a significant improvement in transient emissions and fuel consumption and a sizable reduction in calibration time and test cell requirements is achieved. Empirical transient modelling and optimization has been addressed in the second part of this work, while the required data for model training and generalization are the focus of the current work. Transient and steady-state data from a turbocharged multicylinder diesel engine have been examined from a model training perspective. A single-cylinder engine with external air-handling has been used to expand the steady-state data to encompass transient parameter space. Based on comparative model performance and differences in the non-parametric space, primarily driven by a high engine difference between exhaust and intake manifold pressures (ΔP) during transients, it has been recommended that transient emission models should be trained with transient training data. It has been shown that electronic control module (ECM) estimates of transient charge flow and the exhaust gas recirculation (EGR) fraction cannot be accurate at the high engine ΔP frequently encountered during transient operation, and that such estimates do not account for cylinder-to-cylinder variation. The effects of high engine ΔP must therefore be incorporated empirically by using transient data generated from a spectrum of transient calibrations. Specific recommendations on how to choose such calibrations, how many data to acquire, and how to specify transient segments for data acquisition have been made. Methods to process transient data to account for transport delays and sensor lags have been developed. 
The processed data have then been visualized using statistical means to understand transient emission formation. Two modes of transient opacity formation have been observed and described. The first mode is driven by high engine ΔP and low fresh air flowrates, while the second mode is driven by high engine ΔP and high EGR flowrates. The EGR fraction is inaccurately estimated at both modes, while EGR distribution has been shown to be present but unaccounted for by the ECM. The two modes and associated phenomena are essential to understanding why transient emission models are calibration dependent and furthermore how to choose training data that will result in good model generalization.

Relevance:

90.00%

Abstract:

This dissertation, whose research has been conducted at the Group of Electronic and Microelectronic Design (GDEM) within the framework of the project Power Consumption Control in Multimedia Terminals (PCCMUTE), focuses on the development of an energy estimation model for a battery-powered embedded processor board. The main objectives and contributions of the work are summarized as follows: A model is proposed to obtain accurate energy estimates based on the linear correlation between the performance monitoring counters (PMCs) and energy consumption. Considering the uniqueness of the appropriate PMCs for each system, the modeling methodology is improved to obtain stable accuracies with slight variations among multiple scenarios and to be repeatable in other systems. It includes two steps: the first, the PMC filter, identifies the most suitable set among the available PMCs of a system; the second, the k-fold cross-validation method, avoids bias during the model training stage. The methodology is implemented on a commercial embedded board running the 2.6.34 Linux kernel and PAPI, a cross-platform interface to configure and access PMCs. The results show that the methodology maintains good stability in different scenarios and provides robust estimation results, with the average relative error being less than 5%.
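As a rough illustration of the k-fold cross-validation step, the sketch below fits a linear model from a single counter to energy and reports the mean relative error over held-out folds. It is a simplification under stated assumptions, not the dissertation's model: the data are synthetic, only one hypothetical PMC is used instead of a filtered set, and the function names are invented for this example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kfold_mean_relative_error(xs, ys, k=5, seed=0):
    """Mean relative error of the linear model, measured only on held-out samples."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            predicted = slope * xs[i] + intercept
            errors.append(abs(predicted - ys[i]) / abs(ys[i]))
    return sum(errors) / len(errors)


# Synthetic stand-in data: one hypothetical counter value and a noisy linear energy reading.
rng = random.Random(1)
counts = [rng.uniform(1.0, 10.0) for _ in range(100)]
energy = [2.0 + 0.8 * c + rng.gauss(0.0, 0.05) for c in counts]

error = kfold_mean_relative_error(counts, energy)
```

Because every sample is scored by a model that never saw it during fitting, the reported error is less likely to be flattered by the particular training split.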

Relevance:

90.00%

Abstract:

In this paper, we propose a speech recognition engine using a hybrid of the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM). The two models are trained independently, and their respective likelihood values are considered jointly and passed to a decision logic that outputs a net likelihood. This hybrid model is compared with the HMM alone. Training and testing were done using a database of 20 Hindi words spoken by 80 different speakers. The recognition rate achieved by the plain HMM is 83.5%, which increases to 85% with the hybrid HMM and GMM approach.