38 results for recognition system
in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast
Abstract:
A novel methodology is proposed for the development of neural network models for complex engineering systems exhibiting nonlinearity. The method first establishes a set of fundamental nonlinear functions from a priori engineering knowledge, which are then coded into appropriate chromosome representations. Given a suitable fitness function, an evolutionary approach such as a genetic algorithm evolves a population of chromosomes over a number of generations to produce the neural network model that best fits the system data. The objective is to improve the transparency of the neural networks, i.e. to produce physically meaningful …
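The evolutionary modelling loop described above can be sketched as follows. The basis-function library, the (name, weight) gene encoding, and the sum-of-squares fitness here are hypothetical stand-ins for the a priori engineering knowledge and fitness function the abstract assumes:

```python
import math
import random

# Hypothetical library of fundamental nonlinear functions drawn from
# a priori engineering knowledge (illustrative choices only).
BASIS = {
    "tanh": math.tanh,
    "sine": math.sin,
    "square": lambda x: x * x,
}

def decode(chromosome):
    """A chromosome is a list of (basis name, weight) genes; decoding
    yields a transparent model: a weighted sum of basis functions."""
    def model(x):
        return sum(w * BASIS[name](x) for name, w in chromosome)
    return model

def fitness(chromosome, data):
    """Negative sum of squared errors against the system data."""
    model = decode(chromosome)
    return -sum((model(x) - y) ** 2 for x, y in data)

def evolve(data, pop_size=30, generations=80, seed=0):
    rng = random.Random(seed)
    new_gene = lambda: (rng.choice(sorted(BASIS)), rng.uniform(-2.0, 2.0))
    pop = [[new_gene() for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, data), reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        for parent in survivors:
            # small weight perturbations ...
            child = [(n, w + rng.gauss(0.0, 0.1)) for n, w in parent]
            if rng.random() < 0.2:                # ... plus occasional structural mutation
                child[rng.randrange(len(child))] = new_gene()
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda c: fitness(c, data))

# Target system: y = 1.5 * tanh(x); the GA should recover a transparent fit.
data = [(x / 10.0, 1.5 * math.tanh(x / 10.0)) for x in range(-20, 21)]
best = evolve(data)
```

Because each gene names a physically meaningful basis function, the evolved chromosome can be read directly, which is the transparency benefit the abstract aims at.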
Abstract:
A scalable, large-vocabulary, speaker-independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation, which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1 reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 9.03 ms, which, coupled with a backend search of 5000 words, has provided an accuracy of over 80%. Parallel implementations have been designed with up to 32 cores and have been successfully implemented at a clock frequency of 133 MHz.
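The per-frame Gaussian calculation that the FPGA core pipelines is, in essence, a batch of diagonal-covariance log-likelihood evaluations over all acoustic models. A minimal software sketch is below; the 3825-model count is from the abstract, while the 39-dimensional feature size and the random parameters standing in for trained acoustic models are assumptions:

```python
import numpy as np

def log_gaussians(obs, means, inv_vars, log_consts):
    """Log-likelihood of one feature frame under every diagonal-covariance
    Gaussian: log N(o; mu, Sigma) = C - 0.5 * sum_d (o_d - mu_d)^2 / var_d.
    This is the per-frame computation a pipelined core can parallelise."""
    diff = obs - means                              # (M, D) by broadcasting
    return log_consts - 0.5 * np.sum(diff * diff * inv_vars, axis=1)

rng = np.random.default_rng(0)
M, D = 3825, 39           # 3825 models (from the abstract); 39-dim features assumed
means = rng.normal(size=(M, D))
variances = rng.uniform(0.5, 2.0, size=(M, D))
inv_vars = 1.0 / variances                          # precomputed, as hardware would
log_consts = -0.5 * (D * np.log(2.0 * np.pi) + np.log(variances).sum(axis=1))
obs = rng.normal(size=D)
scores = log_gaussians(obs, means, inv_vars, log_consts)
```

Precomputing the inverse variances and log-constants leaves only multiply-accumulate work per frame, which is what makes the calculation amenable to a fixed hardware pipeline.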
Abstract:
Recent work suggests that the human ear varies significantly between different subjects and can be used for identification. In principle, therefore, using ears in addition to the face within a recognition system could improve accuracy and robustness, particularly for non-frontal views. The paper describes work that investigates this hypothesis using an approach based on the construction of a 3D morphable model of the head and ear. One issue with creating a model that includes the ear is that existing training datasets contain noise and partial occlusion. Rather than exclude these regions manually, a classifier has been developed which automates this process. When combined with a robust registration algorithm the resulting system enables full head morphable models to be constructed efficiently using less constrained datasets. The algorithm has been evaluated using registration consistency, model coverage and minimalism metrics, which together demonstrate the accuracy of the approach. To make it easier to build on this work, the source code has been made available online.
Abstract:
A practically viable multi-biometric recognition system should not only be stable, robust and accurate but should also adhere to real-time processing speed and memory constraints. This study proposes a cascaded classifier-based framework for use in biometric recognition systems. The proposed framework utilises a set of weak classifiers to reduce the enrolled users' dataset to a small list of candidate users. This list is then used by a strong classifier set as the final stage of the cascade to formulate the decision. At each stage, the candidate list is generated by a Mahalanobis distance-based match score quality measure. One of the key features of the authors' framework is that each classifier in the ensemble can be designed to use a different modality, thus providing the advantages of a truly multimodal biometric recognition system. In addition, it is one of the first truly multimodal cascaded classifier-based approaches for biometric recognition. The performance of the proposed system is evaluated for both single and multiple modalities to demonstrate the effectiveness of the approach.
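The two-stage idea, cheap Mahalanobis-based pruning followed by an expensive final decision on a shortlist, can be sketched roughly as below. The Euclidean strong stage, the modality dimensions, and the synthetic gallery are illustrative assumptions, not the authors' classifiers:

```python
import numpy as np

def mahalanobis_sq(probe, templates, cov_inv):
    """Squared Mahalanobis distance from a probe to every enrolled template."""
    diff = templates - probe
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

def cascade_identify(probe_weak, probe_strong, gallery_weak, gallery_strong,
                     cov_inv, shortlist=5):
    """Stage 1: a cheap 'weak' modality prunes the enrolled set to a short
    candidate list via Mahalanobis distance. Stage 2: an expensive 'strong'
    modality scores only the shortlist and makes the final decision."""
    d_weak = mahalanobis_sq(probe_weak, gallery_weak, cov_inv)
    candidates = np.argsort(d_weak)[:shortlist]
    d_strong = np.linalg.norm(gallery_strong[candidates] - probe_strong, axis=1)
    return int(candidates[np.argmin(d_strong)])

# Illustrative enrolment: 100 users, a 2-D weak modality and an 8-D strong one.
rng = np.random.default_rng(1)
gallery_weak = rng.normal(size=(100, 2))
gallery_strong = rng.normal(size=(100, 8))
probe_weak = gallery_weak[42] + 0.001 * rng.normal(size=2)
probe_strong = gallery_strong[42] + 0.001 * rng.normal(size=8)
who = cascade_identify(probe_weak, probe_strong, gallery_weak, gallery_strong,
                       cov_inv=np.eye(2))
```

The strong classifier only ever sees `shortlist` candidates rather than the whole enrolled set, which is where the real-time speed and memory benefit comes from.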
Abstract:
This paper presents a new approach to speech enhancement from single-channel measurements involving both noise and channel distortion (i.e., convolutional noise), and demonstrates its application to robust speech recognition and to improving noisy speech quality. The approach is based on finding longest matching segments (LMS) from a corpus of clean, wideband speech. The approach adds three novel developments to our previous LMS research. First, we address the problem of channel distortion as well as additive noise. Second, we present an improved method for modeling noise for speech estimation. Third, we present an iterative algorithm which updates the noise and channel estimates of the corpus data model. In experiments using speech recognition as a test with the Aurora 4 database, the use of our enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system. In another comparison against conventional enhancement algorithms, both the PESQ and the segmental SNR ratings of the LMS algorithm were superior to the other methods for noisy speech enhancement.
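The core LMS idea, finding the longest run of clean-corpus frames that stays close to the noisy utterance, can be sketched as below. The Euclidean frame distance, the fixed threshold, and the brute-force search are illustrative simplifications of the paper's matching criterion:

```python
import numpy as np

def longest_matching_segment(noisy, corpus, threshold):
    """Return (noisy_start, corpus_start, length) of the longest contiguous
    run in which every corpus frame stays within `threshold` (Euclidean)
    of the corresponding noisy frame."""
    best = (0, 0, 0)
    T, C = len(noisy), len(corpus)
    for i in range(T):
        for j in range(C):
            k = 0
            while (i + k < T and j + k < C
                   and np.linalg.norm(noisy[i + k] - corpus[j + k]) < threshold):
                k += 1
            if k > best[2]:
                best = (i, j, k)
    return best

# Toy data: 10 clean corpus frames are hidden (lightly corrupted) inside a
# noisy utterance; the search should recover exactly that segment.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(50, 5))
noisy = rng.normal(size=(16, 5))
noisy[3:13] = corpus[10:20] + 0.01 * rng.normal(size=(10, 5))
match = longest_matching_segment(noisy, corpus, threshold=0.1)
```

In an enhancement system the recovered corpus segments, being clean wideband speech, would replace or guide the estimate of the noisy frames they matched.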
Abstract:
This paper presents a new approach to single-channel speech enhancement involving both noise and channel distortion (i.e., convolutional noise). The approach is based on finding longest matching segments (LMS) from a corpus of clean, wideband speech. The approach adds three novel developments to our previous LMS research. First, we address the problem of channel distortion as well as additive noise. Second, we present an improved method for modeling noise. Third, we present an iterative algorithm for improved speech estimates. In experiments using speech recognition as a test with the Aurora 4 database, the use of our enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system. In another comparison against conventional enhancement algorithms, both the PESQ and the segmental SNR ratings of the LMS algorithm were superior to the other methods for noisy speech enhancement. Index Terms: corpus-based speech model, longest matching segment, speech enhancement, speech recognition
Abstract:
This work presents a novel approach to human action recognition based on the combination of computer vision techniques with common-sense knowledge and reasoning capabilities. The emphasis of this work is on how common-sense knowledge can be leveraged in vision-based human action recognition so that nonsensical errors can be corrected at the understanding stage. The proposed framework is to be deployed in a realistic environment in which humans behave rationally, that is, motivated by an aim or a reason. © 2012 Springer-Verlag.
Abstract:
This paper presents the results of an experimental investigation carried out to verify the feasibility of a 'drive-by' approach which uses a vehicle instrumented with accelerometers to detect and locate damage in a bridge. In theoretical simulations, a simplified vehicle-bridge interaction model is used to investigate the effectiveness of the approach in detecting damage in a bridge from vehicle accelerations. For this purpose, the accelerations are processed using a continuous wavelet transform and damage indicators are evaluated and compared. Alternative statistical pattern recognition techniques are incorporated to allow for repeated vehicle passes. Parameters such as vehicle speed, damage level, damage location and road roughness are varied in simulations to investigate their effects. A scaled laboratory experiment is carried out to assess the effectiveness of the approach in a more realistic environment, considering a number of bridge damage scenarios.
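The processing chain, vehicle accelerations passed through a continuous wavelet transform to form a per-sample damage indicator, can be sketched as follows. The Ricker (Mexican-hat) wavelet, the synthetic step standing in for a damage signature, the scale choices, and the summed-magnitude indicator are all assumptions for illustration, not the paper's specific indicators:

```python
import numpy as np

def ricker(points, a):
    """Mexican-hat (Ricker) wavelet of scale `a`, a common choice for
    picking out the local singularity damage leaves in a response signal."""
    t = np.arange(points) - (points - 1) / 2.0
    x = t / a
    return (1.0 - x ** 2) * np.exp(-x ** 2 / 2.0)

def cwt(signal, scales, width=101):
    """Continuous wavelet transform by direct convolution, one row per scale."""
    return np.array([np.convolve(signal, ricker(width, a), mode="same")
                     for a in scales])

# Synthetic vehicle acceleration: a smooth component plus a small step at
# sample 600 standing in for the damage-induced discontinuity.
n = 1200
t = np.linspace(0.0, 1.0, n)
accel = np.sin(2.0 * np.pi * 3.0 * t)
accel[600:] += 0.3
coeffs = cwt(accel, scales=[2, 4, 8])
damage_index = np.abs(coeffs).sum(axis=0)     # simple per-sample indicator
interior = damage_index[150:-150]             # skip convolution edge artifacts
loc = 150 + int(np.argmax(interior))          # estimated damage location
```

The smooth component barely excites the small-scale wavelets, while the local discontinuity does, so the indicator peaks near the damage; repeated passes would then feed the statistical pattern recognition stage the abstract mentions.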
Abstract:
T cell immune responses to central nervous system-derived and other self-antigens are commonly described in both healthy and autoimmune individuals. However, in the case of the human prion protein (PrP), it has been argued that immunologic tolerance is uncommonly robust. Although development of an effective vaccine for prion disease requires breaking of tolerance to PrP, the extent of immune tolerance to PrP and the identity of immunodominant regions of the protein have not previously been determined in humans. We analyzed PrP T cell epitopes both by using a predictive algorithm and by measuring functional immune responses from healthy donors. Interestingly, clusters of epitopes were focused around the area of the polymorphic residue 129, previously identified as an indicator of susceptibility to prion disease, and in the C-terminal region. Moreover, responses were seen to PrP peptide 121-134 containing methionine at position 129, whereas PrP 121-134 [129V] was not immunogenic. The residue 129 polymorphism was also associated with distinct patterns of cytokine response: PrP 128-141 [129M] induced IL-4 and IL-6 production, which was not seen in response to PrP 128-141 [129V]. Our data suggest that the immunogenic regions of human PrP lie between residue 107 and the C-terminus and that, as with many other central nervous system antigens, healthy individuals carry responses to PrP within the T cell repertoire and yet do not experience deleterious autoimmune reactions.
Abstract:
Face recognition with unknown, partial distortion and occlusion is a practical problem, and has a wide range of applications, including security and multimedia information retrieval. The authors present a new approach to face recognition subject to unknown, partial distortion and occlusion. The new approach is based on a probabilistic decision-based neural network, enhanced by a statistical method called the posterior union model (PUM). PUM is an approach for ignoring severely mismatched local features and focusing the recognition mainly on the reliable local features. It thereby improves the robustness while assuming no prior information about the corruption. We call the new approach the posterior union decision-based neural network (PUDBNN). The new PUDBNN model has been evaluated on three face image databases (XM2VTS, AT&T and AR) using testing images subjected to various types of simulated and realistic partial distortion and occlusion. The new system has been compared to other approaches and has demonstrated improved performance.
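The intuition behind the posterior union idea, scoring identities on their reliable local features while ignoring severely mismatched ones, can be sketched roughly as below. The Gaussian local-feature likelihood, the fixed keep-fraction rule, and the synthetic data are simplifying assumptions, not the PUM's actual probabilistic formulation or the PUDBNN architecture:

```python
import numpy as np

def union_score(local_log_likes, keep=0.6):
    """Simplified union-style combination: rank local regions by match
    likelihood and keep only the best `keep` fraction, so severely
    mismatched (occluded / distorted) regions are ignored."""
    s = np.sort(local_log_likes)[::-1]
    m = max(1, int(round(len(s) * keep)))
    return float(s[:m].sum())

def identify(probe_regions, gallery, sigma=1.0, keep=0.6):
    """gallery maps identity -> (regions, dim) template array; returns the
    identity whose reliable regions best explain the probe."""
    best_id, best_score = None, -np.inf
    for pid, template in gallery.items():
        ll = -np.sum((probe_regions - template) ** 2, axis=1) / (2.0 * sigma ** 2)
        score = union_score(ll, keep)
        if score > best_score:
            best_id, best_score = pid, score
    return best_id

# Illustration: 5 identities, 10 local face regions each; the probe matches
# identity 2 but three of its regions are heavily occluded.
rng = np.random.default_rng(3)
gallery = {pid: rng.normal(size=(10, 4)) for pid in range(5)}
probe = gallery[2] + 0.05 * rng.normal(size=(10, 4))
probe[:3] += 5.0
who = identify(probe, gallery)
```

Because the occluded regions are dropped from every identity's score rather than modelled, no prior knowledge of where the corruption lies is needed, matching the abstract's claim.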
Abstract:
In this paper, we present the application of Hidden Conditional Random Fields (HCRFs) to modelling speech for visual speech recognition. HCRFs may be easily adapted to model long-range dependencies across an observation sequence; as a result, visual word recognition performance can be improved because the model can take a more contextual approach to generating state sequences. Results are presented for a speaker-dependent, isolated-digit, visual speech recognition task, with comparisons against a baseline HMM system. We first show that word recognition rates on clean video using HCRFs can be improved by increasing the number of past and future observations taken into account by each state. Second, we compare model performance at various levels of video compression on the test set. As far as we are aware, this is the first attempted use of HCRFs for visual speech recognition.
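The "past and future observations per state" idea amounts to widening the observation window each state conditions on. A toy sketch of such context stacking is below; this feature-level windowing is a common approximation for illustration, while an HCRF proper would express the window through its feature functions:

```python
import numpy as np

def stack_context(frames, past=2, future=2):
    """Build per-frame features that append `past` preceding and `future`
    following observations (edges padded by repetition), mimicking a state
    that conditions on a window of the observation sequence."""
    T, _ = frames.shape
    padded = np.vstack([np.repeat(frames[:1], past, axis=0),
                        frames,
                        np.repeat(frames[-1:], future, axis=0)])
    window = past + 1 + future
    return np.array([padded[t:t + window].ravel() for t in range(T)])

frames = np.arange(15.0).reshape(5, 3)     # 5 video frames, 3-dim features each
stacked = stack_context(frames, past=2, future=2)
```

Widening `past` and `future` grows the feature dimension linearly, which is the trade-off behind the recognition-rate improvements the abstract reports.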