954 resultados para recognition system


Relevância:

70.00% 70.00%

Publicador:

Resumo:

The QUT-NOISE-SRE protocol is designed to mix the large QUT-NOISE database, consisting of over 10 hours of back- ground noise, collected across 10 unique locations covering 5 common noise scenarios, with commonly used speaker recognition datasets such as Switchboard, Mixer and the speaker recognition evaluation (SRE) datasets provided by NIST. By allowing common, clean, speech corpora to be mixed with a wide variety of noise conditions, environmental reverberant responses, and signal-to-noise ratios, this protocol provides a solid basis for the development, evaluation and benchmarking of robust speaker recognition algorithms, and is freely available to download alongside the QUT-NOISE database. In this work, we use the QUT-NOISE-SRE protocol to evaluate a state-of-the-art PLDA i-vector speaker recognition system, demonstrating the importance of designing voice-activity-detection front-ends specifically for speaker recognition, rather than aiming for perfect coherence with the true speech/non-speech boundaries.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This work describes an online handwritten character recognition system working in combination with an offline recognition system. The online input data is also converted into an offline image, and parallely recognized by both online and offline strategies. Features are proposed for offline recognition and a disambiguation step is employed in the offline system for the samples for which the confidence level of the classifier is low. The outputs are then combined probabilistically resulting in a classifier out-performing both individual systems. Experiments are performed for Kannada, a South Indian Language, over a database of 295 classes. The accuracy of the online recognizer improves by 11% when the combination with offline system is used.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

3D Face Recognition is an active area of research for past several years. For a 3D face recognition system one would like to have an accurate as well as low cost setup for constructing 3D face model. In this paper, we use Profilometry approach to obtain a 3D face model.This method gives a low cost solution to the problem of acquiring 3D data and the 3D face models generated by this method are sufficiently accurate. We also develop an algorithm that can use the 3D face model generated by the above method for the recognition purpose.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this article, we aim at reducing the error rate of the online Tamil symbol recognition system by employing multiple experts to reevaluate certain decisions of the primary support vector machine classifier. Motivated by the relatively high percentage of occurrence of base consonants in the script, a reevaluation technique has been proposed to correct any ambiguities arising in the base consonants. Secondly, a dynamic time-warping method is proposed to automatically extract the discriminative regions for each set of confused characters. Class-specific features derived from these regions aid in reducing the degree of confusion. Thirdly, statistics of specific features are proposed for resolving any confusions in vowel modifiers. The reevaluation approaches are tested on two databases (a) the isolated Tamil symbols in the IWFHR test set, and (b) the symbols segmented from a set of 10,000 Tamil words. The recognition rate of the isolated test symbols of the IWFHR database improves by 1.9 %. For the word database, the incorporation of the reevaluation step improves the symbol recognition rate by 3.5 % (from 88.4 to 91.9 %). This, in turn, boosts the word recognition rate by 11.9 % (from 65.0 to 76.9 %). The reduction in the word error rate has been achieved using a generic approach, without the incorporation of language models.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We propose to develop a 3-D optical flow features based human action recognition system. Optical flow based features are employed here since they can capture the apparent movement in object, by design. Moreover, they can represent information hierarchically from local pixel level to global object level. In this work, 3-D optical flow based features a re extracted by combining the 2-1) optical flow based features with the depth flow features obtained from depth camera. In order to develop an action recognition system, we employ a Meta-Cognitive Neuro-Fuzzy Inference System (McFIS). The m of McFIS is to find the decision boundary separating different classes based on their respective optical flow based features. McFIS consists of a neuro-fuzzy inference system (cognitive component) and a self-regulatory learning mechanism (meta-cognitive component). During the supervised learning, self-regulatory learning mechanism monitors the knowledge of the current sample with respect to the existing knowledge in the network and controls the learning by deciding on sample deletion, sample learning or sample reserve strategies. The performance of the proposed action recognition system was evaluated on a proprietary data set consisting of eight subjects. The performance evaluation with standard support vector machine classifier and extreme learning machine indicates improved performance of McFIS is recognizing actions based of 3-D optical flow based features.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this paper, we propose a H.264/AVC compressed domain human action recognition system with projection based metacognitive learning classifier (PBL-McRBFN). The features are extracted from the quantization parameters and the motion vectors of the compressed video stream for a time window and used as input to the classifier. Since compressed domain analysis is done with noisy, sparse compression parameters, it is a huge challenge to achieve performance comparable to pixel domain analysis. On the positive side, compressed domain allows rapid analysis of videos compared to pixel level analysis. The classification results are analyzed for different values of Group of Pictures (GOP) parameter, time window including full videos. The functional relationship between the features and action labels are established using PBL-McRBFN with a cognitive and meta-cognitive component. The cognitive component is a radial basis function, while the meta-cognitive component employs self-regulation to achieve better performance in subject independent action recognition task. The proposed approach is faster and shows comparable performance with respect to the state-of-the-art pixel domain counterparts. It employs partial decoding, which rules out the complexity of full decoding, and minimizes computational load and memory usage. This results in reduced hardware utilization and increased speed of classification. The results are compared with two benchmark datasets and show more than 90% accuracy using the PBL-McRBFN. The performance for various GOP parameters and group of frames are obtained with twenty random trials and compared with other well-known classifiers in machine learning literature. (C) 2015 Elsevier B.V. All rights reserved.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this work, we describe a system, which recognises open vocabulary, isolated, online handwritten Tamil words and extend it to recognize a paragraph of writing. We explain in detail each step involved in the process: segmentation, preprocessing, feature extraction, classification and bigram-based post-processing. On our database of 45,000 handwritten words obtained through tablet PC, we have obtained symbol level accuracy of 78.5% and 85.3% without and with the usage of post-processing using symbol level language models, respectively. Word level accuracies for the same are 40.1% and 59.6%. A line and word level segmentation strategy is proposed, which gives promising results of 100% line segmentation and 98.1% word segmentation accuracies on our initial trials of 40 handwritten paragraphs. The two modules have been combined to obtain a full-fledged page recognition system for online handwritten Tamil data. To the knowledge of the authors, this is the first ever attempt on recognition of open vocabulary, online handwritten paragraphs in any Indian language.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper describes the development of the 2003 CU-HTK large vocabulary speech recognition system for Conversational Telephone Speech (CTS). The system was designed based on a multi-pass, multi-branch structure where the output of all branches is combined using system combination. A number of advanced modelling techniques such as Speaker Adaptive Training, Heteroscedastic Linear Discriminant Analysis, Minimum Phone Error estimation and specially constructed Single Pronunciation dictionaries were employed. The effectiveness of each of these techniques and their potential contribution to the result of system combination was evaluated in the framework of a state-of-the-art LVCSR system with sophisticated adaptation. The final 2003 CU-HTK CTS system constructed from some of these models is described and its performance on the DARPA/NIST 2003 Rich Transcription (RT-03) evaluation test set is discussed.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper reports our experiences with a phoneme recognition system for the TIMIT database which uses multiple mixture continuous density monophone HMMs trained using MMI. A comprehensive set of results are presented comparing the ML and MMI training criteria for both diagonal and full covariance models. These results using simple monophone HMMs show clear performance gains achieved by MMI training, and are comparable to the best reported by others including those which use context-dependent models. In addition, the paper discusses a number of performance and implementation issues which are crucial to successful MMI training.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In spite of over two decades of intense research, illumination and pose invariance remain prohibitively challenging aspects of face recognition for most practical applications. The objective of this work is to recognize faces using video sequences both for training and recognition input, in a realistic, unconstrained setup in which lighting, pose and user motion pattern have a wide variability and face images are of low resolution. In particular there are three areas of novelty: (i) we show how a photometric model of image formation can be combined with a statistical model of generic face appearance variation, learnt offline, to generalize in the presence of extreme illumination changes; (ii) we use the smoothness of geodesically local appearance manifold structure and a robust same-identity likelihood to achieve invariance to unseen head poses; and (iii) we introduce an accurate video sequence "reillumination" algorithm to achieve robustness to face motion patterns in video. We describe a fully automatic recognition system based on the proposed method and an extensive evaluation on 171 individuals and over 1300 video sequences with extreme illumination, pose and head motion variation. On this challenging data set our system consistently demonstrated a nearly perfect recognition rate (over 99.7%), significantly outperforming state-of-the-art commercial software and methods from the literature. © Springer-Verlag Berlin Heidelberg 2006.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In spite of over two decades of intense research, illumination and pose invariance remain prohibitively challenging aspects of face recognition for most practical applications. The objective of this work is to recognize faces using video sequences both for training and recognition input, in a realistic, unconstrained setup in which lighting, pose and user motion pattern have a wide variability and face images are of low resolution. The central contribution is an illumination invariant, which we show to be suitable for recognition from video of loosely constrained head motion. In particular there are three contributions: (i) we show how a photometric model of image formation can be combined with a statistical model of generic face appearance variation to exploit the proposed invariant and generalize in the presence of extreme illumination changes; (ii) we introduce a video sequence re-illumination algorithm to achieve fine alignment of two video sequences; and (iii) we use the smoothness of geodesically local appearance manifold structure and a robust same-identity likelihood to achieve robustness to unseen head poses. We describe a fully automatic recognition system based on the proposed method and an extensive evaluation on 323 individuals and 1474 video sequences with extreme illumination, pose and head motion variation. Our system consistently achieved a nearly perfect recognition rate (over 99.7% on all four databases). © 2012 Elsevier Ltd All rights reserved.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this paper, we propose a new scheme for omnidirectional object-recognition in free space. The proposed scheme divides above problem into several onmidirectional object-recognition with different depression angles. An onmidirectional object-recognition system with oblique observation directions based on a new recognition theory-Biomimetic Pattern Recognition (BPR) is discussed in detail. Based on it, we can get the size of training samples in the onmidirectional object-recognition system in free space. Omnidirection ally cognitive tests were done on various kinds of animal models of rather similar shapes. For the total 8400 tests, the correct recognition rate is 99.89%. The rejection rate is 0.11% and on the condition of zero error rates. Experimental results are presented to show that the proposed approach outperforms three types of SVMs with either a three degree polynomial kernel or a radial basis function kernel.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The Double Synapse Weighted Neuron (DSWN) is a kind of general-purpose neuron model, which with the ability of configuring Hyper-sausage neuron (HSN). After introducing the design method of hardware DSWN synapse, this paper proposed a DSWN-based specific purpose neural computing device-CASSANN-IIspr. As its application, a rigid body recognition system was developed on CASSANN-IIspr, which achieved better performance than RIBF-SVMs system.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Video-based facial expression recognition is a challenging problem in computer vision and human-computer interaction. To target this problem, texture features have been extracted and widely used, because they can capture image intensity changes raised by skin deformation. However, existing texture features encounter problems with albedo and lighting variations. To solve both problems, we propose a new texture feature called image ratio features. Compared with previously proposed texture features, e. g., high gradient component features, image ratio features are more robust to albedo and lighting variations. In addition, to further improve facial expression recognition accuracy based on image ratio features, we combine image ratio features with facial animation parameters (FAPs), which describe the geometric motions of facial feature points. The performance evaluation is based on the Carnegie Mellon University Cohn-Kanade database, our own database, and the Japanese Female Facial Expression database. Experimental results show that the proposed image ratio feature is more robust to albedo and lighting variations, and the combination of image ratio features and FAPs outperforms each feature alone. In addition, we study asymmetric facial expressions based on our own facial expression database and demonstrate the superior performance of our combined expression recognition system.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper presents a new method of grouping edges in order to recognize objects. This grouping method succeeds on images of both two- and three- dimensional objects. So that the recognition system can consider first the collections of edges most likely to lead to the correct recognition of objects, we order groups of edges based on the likelihood that a single object produced them. The grouping module estimates this likelihood using the distance that separates edges and their relative orientation. This ordering greatly reduces the amount of computation required to locate objects and improves the system's robustness to error.