133 results for task recognition
at Indian Institute of Science - Bangalore - India
Abstract:
Fallibility is inherent in human cognition and so a system that will monitor performance is indispensable. While behavioral evidence for such a system derives from the finding that subjects slow down after trials that are likely to produce errors, the neural and behavioral characterization that enables such control is incomplete. Here, we report a specific role for dopamine/basal ganglia in response conflict by assessing deficits in performance monitoring in patients with Parkinson's disease. To characterize such a deficit, we used a modification of the oculomotor countermanding task to show that slowing down of responses that generate robust response conflict, and not post-error per se, is deficient in Parkinson's disease patients. Poor performance adjustment could be either due to impaired ability to slow RT subsequent to conflicts or due to impaired response conflict recognition. If the latter hypothesis were true, then PD subjects should show evidence of impaired error detection/correction, which was found to be the case. These results make a strong case for impaired performance monitoring in Parkinson's patients.
Abstract:
In this paper, we consider the problem of time series classification. Using piecewise linear interpolation, we obtain various novel kernels that can be used with support vector machines to design classifiers capable of deciding the class of a given time series. The approach is general and applicable in many scenarios. We apply the method to the task of online Tamil handwritten character recognition with promising results.
Abstract:
In this paper, we use optical flow based complex-valued features extracted from video sequences to recognize human actions. The optical flow features between two image planes can be appropriately represented in the complex plane. Therefore, we argue that motion information that is used to model the human actions should be represented as complex-valued features and propose a fast learning fully complex-valued neural classifier to solve the action recognition task. The classifier, termed the "fast learning fully complex-valued neural (FLFCN) classifier", is a single hidden layer fully complex-valued neural network. The neurons in the hidden layer employ a fully complex-valued activation function of the hyperbolic secant type. The parameters of the hidden layer are chosen randomly and the output weights are estimated as the minimum-norm least-squares solution to a set of linear equations. The results indicate the superior performance of the FLFCN classifier in recognizing the actions compared to real-valued support vector machines and other existing results in the literature. Complex-valued representation of 2D motion and orthogonal decision boundaries boost the classification performance of the FLFCN classifier. (c) 2012 Elsevier B.V. All rights reserved.
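The training recipe described in this abstract (random hidden layer, hyperbolic secant activation, output weights from a minimum-norm least-squares solve) can be sketched roughly as follows. This is an illustrative sketch, not the authors' FLFCN implementation: the function names, the 0.1 scale on the random parameters, and the use of the real part of the outputs for the class decision are all assumptions made here.

```python
import numpy as np

def sech(z):
    # Hyperbolic secant, defined for complex arguments as 1 / cosh(z).
    return 1.0 / np.cosh(z)

def train_random_hidden_layer(X, T, n_hidden, rng):
    """Fit output weights by least squares for a fixed random hidden layer.

    X: (n_samples, n_features) complex-valued inputs
    T: (n_samples, n_classes) coded class targets
    """
    n_features = X.shape[1]
    # Hidden-layer parameters are drawn at random and never updated.
    # The 0.1 scale is an illustrative choice that keeps cosh away from its zeros.
    V = 0.1 * (rng.standard_normal((n_features, n_hidden))
               + 1j * rng.standard_normal((n_features, n_hidden)))
    b = 0.1 * (rng.standard_normal(n_hidden) + 1j * rng.standard_normal(n_hidden))
    H = sech(X @ V + b)  # hidden responses, shape (n_samples, n_hidden)
    # Minimum-norm least-squares solution of H W = T via the pseudoinverse.
    W = np.linalg.pinv(H) @ T
    return V, b, W

def predict(X, V, b, W):
    # Assumption: the class decision uses the real part of the complex outputs.
    return np.argmax((sech(X @ V + b) @ W).real, axis=1)
```

Because only the output weights are solved for, training reduces to a single linear-algebra step, which is where the "fast learning" in the name would come from in a scheme of this kind.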
Abstract:
Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as a part of many speech processing systems and it is the computationally dominant phase for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods comes from exploiting the structure of these matrices and from an efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both methods lead to similar speedups but the latter has a far smaller impact on the recognition accuracy. Experiments on the 1,138-word vocabulary RM1 task and the 6,224-word vocabulary TIMIT task using the Sphinx 3.7 system show that, for a typical case, the matrix multiplication based approach leads to an overall speedup of 46% on the RM1 task and 115% on the TIMIT task. Our low-rank approximation methods provide a way of trading off recognition accuracy for a further increase in computational performance, extending overall speedups up to 61% for RM1 and 119% for TIMIT, for an increase of word error rate (WER) from 3.2 to 3.5% for RM1 and for no increase in WER for TIMIT. We also express the pairwise Euclidean distance computation phase in Dynamic Time Warping (DTW) in terms of matrix multiplication leading to saving of approximately of computational operations. In our experiments using an efficient implementation of matrix multiplication, this leads to a speedup of 5.6 in computing the pairwise Euclidean distances and an overall speedup of up to 3.25 for DTW.
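One way to see how likelihood computation turns into a matrix product is the sketch below. It assumes diagonal-covariance Gaussians, which the abstract does not state, and the function names are hypothetical. Each frame x is augmented to [1, x, x²] and each Gaussian is packed into a matching parameter row, so a single product gives all frame-versus-Gaussian log-likelihoods. The last function applies the same idea to pairwise squared Euclidean distances, as used in DTW.

```python
import numpy as np

def gaussian_param_matrix(means, variances):
    """Pack diagonal-covariance Gaussians into one (n_gauss, 2*d + 1) matrix."""
    # Per-Gaussian constant: normalisation term plus the mean-only quadratic term.
    const = -0.5 * np.sum(np.log(2.0 * np.pi * variances) + means**2 / variances, axis=1)
    return np.hstack([const[:, None], means / variances, -0.5 / variances])

def augment_features(X):
    """Map each frame x to [1, x, x**2] so the log-likelihood is a dot product."""
    return np.hstack([np.ones((X.shape[0], 1)), X, X**2])

def log_likelihoods(X, means, variances):
    # One matrix product yields every frame-versus-Gaussian log-likelihood.
    return augment_features(X) @ gaussian_param_matrix(means, variances).T

def pairwise_sq_distances(A, B):
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, with the cross term as a matrix product.
    return (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
```

Once the computation is in this form, the Gaussian parameter matrix is an ordinary matrix that can be factorised, which is what makes a low-rank approximation of it meaningful.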
Abstract:
In this paper, we discuss the issues related to word recognition in born-digital word images. We introduce a novel method of power-law transformation on the word image for binarization. We show the improvement in image binarization and the consequent increase in the recognition performance of an OCR engine on the word image. The optimal value of gamma for a word image is automatically chosen by our algorithm with a fixed stroke width threshold. We have exhaustively tested our algorithm by varying the gamma and stroke width threshold values. By varying the gamma value, we found that our algorithm performed better than the results reported in the literature. On the ICDAR Robust Reading Systems Challenge-1: Word Recognition Task on the born-digital dataset, as compared to the recognition rate of 61.5% achieved by TH-OCR after suitable pre-processing by Yang et al. and 63.4% by ABBYY Fine Reader (used as baseline by the competition organizers without any preprocessing), we achieved 82.9% using Omnipage OCR applied on the images after being processed by our algorithm.
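The power-law (gamma) transform referred to here is the standard intensity mapping s = r^gamma on intensities scaled to [0, 1]. A minimal sketch for an 8-bit grayscale image is given below. The function name is hypothetical, and the paper's automatic choice of gamma using the stroke width threshold is not reproduced.

```python
import numpy as np

def power_law_transform(image, gamma):
    """Apply s = r**gamma to an 8-bit grayscale image, with r scaled to [0, 1]."""
    r = image.astype(np.float64) / 255.0
    # Rescale to 0..255 and round to the nearest integer level.
    return np.rint(255.0 * r**gamma).astype(np.uint8)
```

With gamma greater than 1 the mid-tones are pushed toward black, and with gamma less than 1 they are pushed toward white, which changes how cleanly a subsequent threshold separates text from background.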
Abstract:
In this paper, we report a breakthrough result on the difficult task of segmentation and recognition of coloured text from the word image dataset of ICDAR robust reading competition challenge 2: reading text in scene images. We split the word image into individual colour, gray and lightness planes and enhance the contrast of each of these planes independently by a power-law transform. The discrimination factor of each plane is computed as the maximum between-class variance used in Otsu thresholding. The plane that has the maximum discrimination factor is selected for segmentation. The trial version of Omnipage OCR is then used on the binarized words for recognition. Our recognition results on ICDAR 2011 and ICDAR 2003 word datasets are compared with those reported in the literature. As baseline, the images binarized by simple global and local thresholding techniques were also recognized. The word recognition rate obtained by our non-linear enhancement and selection of plane method is 72.8% and 66.2% for ICDAR 2011 and 2003 word datasets, respectively. We have created ground-truth for each image at the pixel level to benchmark these datasets using a toolkit developed by us. The recognition rate of benchmarked images is 86.7% and 83.9% for ICDAR 2011 and 2003 datasets, respectively.
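The plane-selection step can be sketched as follows: compute, for each candidate plane, the largest between-class variance over all thresholds (the quantity Otsu thresholding maximises) and keep the plane with the highest value. This is an illustrative sketch assuming 8-bit planes. The function names and the dictionary of named planes are hypothetical, and the contrast enhancement applied before this step is omitted.

```python
import numpy as np

def max_between_class_variance(plane):
    """Return the largest between-class variance over all 8-bit thresholds."""
    hist = np.bincount(plane.ravel(), minlength=256)
    levels = np.arange(256)
    total = hist.sum()
    n0 = np.cumsum(hist)            # pixel count at or below each threshold
    n1 = total - n0                 # pixel count above each threshold
    s0 = np.cumsum(hist * levels)   # intensity sum at or below each threshold
    valid = (n0 > 0) & (n1 > 0)     # both classes must be non-empty
    var = np.zeros(256)
    mean0 = s0[valid] / n0[valid]
    mean1 = (s0[-1] - s0[valid]) / n1[valid]
    var[valid] = (n0[valid] * n1[valid] / total**2) * (mean0 - mean1) ** 2
    return var.max()

def select_plane(planes):
    """Return the name of the plane with the largest discrimination factor."""
    return max(planes, key=lambda name: max_between_class_variance(planes[name]))
```

A plane whose pixels fall into two well-separated intensity groups scores high, so the selected plane is the one in which text and background are easiest to split with a single threshold.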
Abstract:
Facial expressions are the most expressive way to display emotions. Many algorithms have been proposed which employ a particular set of people (usually a database) to both train and test their model. This paper focuses on the challenging task of database-independent emotion recognition, which is a generalized case of subject-independent emotion recognition. The emotion recognition system employed in this work is a Meta-Cognitive Neuro-Fuzzy Inference System (McFIS). McFIS has two components: a neuro-fuzzy inference system, which is the cognitive component, and a self-regulatory learning mechanism, which is the meta-cognitive component. The meta-cognitive component monitors the knowledge in the neuro-fuzzy inference system and decides on what-to-learn, when-to-learn and how-to-learn the training samples, efficiently. For each sample, the McFIS decides whether to delete the sample without learning it, use it to add/prune or update the network parameters, or reserve it for future use. This helps the network avoid over-training and as a result improve its generalization performance over untrained databases. In this study, we extract pixel-based emotion features from the well-known JAFFE (Japanese Female Facial Expression) and TFEID (Taiwanese Female Expression Image) databases. Two sets of experiments are conducted. First, we study the individual performance of both databases on McFIS based on a 5-fold cross-validation study. Next, in order to study the generalization performance, McFIS trained on the JAFFE database is tested on TFEID and vice versa. The performance comparison in both experiments against an SVM classifier gives promising results.
Abstract:
In this paper, we propose an H.264/AVC compressed domain human action recognition system with a projection based metacognitive learning classifier (PBL-McRBFN). The features are extracted from the quantization parameters and the motion vectors of the compressed video stream for a time window and used as input to the classifier. Since compressed domain analysis is done with noisy, sparse compression parameters, it is a huge challenge to achieve performance comparable to pixel domain analysis. On the positive side, the compressed domain allows rapid analysis of videos compared to pixel level analysis. The classification results are analyzed for different values of the Group of Pictures (GOP) parameter and of the time window, including full videos. The functional relationship between the features and action labels is established using PBL-McRBFN with a cognitive and a meta-cognitive component. The cognitive component is a radial basis function network, while the meta-cognitive component employs self-regulation to achieve better performance in the subject-independent action recognition task. The proposed approach is faster and shows comparable performance with respect to the state-of-the-art pixel domain counterparts. It employs partial decoding, which rules out the complexity of full decoding, and minimizes computational load and memory usage. This results in reduced hardware utilization and increased speed of classification. The results are compared with two benchmark datasets and show more than 90% accuracy using the PBL-McRBFN. The performance for various GOP parameters and groups of frames is obtained with twenty random trials and compared with other well-known classifiers in the machine learning literature. (C) 2015 Elsevier B.V. All rights reserved.
Abstract:
Selective and discriminative detection of -NO2 containing high energy organic compounds such as picric acid (PA), 2,4,6-trinitrotoluene (TNT) and dinitrotoluene (DNT) has become a challenging task due to concerns over national security, criminal investigations and environmental protection. Among various known detection methods, fluorescence techniques have gained special attention in recent times. A wide variety of fluorescent chemosensors have been developed for nitroaromatic explosive detection. In this review article, we provide an overview of the recent developments made in small molecule-based turn-off fluorescent sensors for nitroaromatic explosives with special focus on organic and H-bonded supramolecular sensors. The fluorescent sensors discussed in this review are classified and organized according to their functionality and their recognition of nitroaromatics by fluorescence quenching.
Abstract:
Semi-rigid molecular tweezers 1, 3 and 4 bind picric acid with more than tenfold increment in tetrachloromethane as compared to chloroform.
Abstract:
The baculovirus expression system using the Autographa californica nuclear polyhedrosis virus (AcNPV) has been extensively utilized for high-level expression of cloned foreign genes, driven by the strong viral promoters of polyhedrin (polh) and p10 encoding genes. A parallel system using Bombyx mori nuclear polyhedrosis virus (BmNPV) is much less exploited because the choice and variety of BmNPV-based transfer vectors are limited. Using a transient expression assay, we have demonstrated here that the heterologous promoters of the very late genes polh and p10 from AcNPV function as efficiently in BmN cells as the BmNPV promoters. The location of the cloned foreign gene with respect to the promoter sequences was critical for achieving the highest levels of expression, following the order +35 > +1 > -3 > -8 nucleotides (nt) with respect to the polh or p10 start codons. We have successfully generated recombinant BmNPV harboring AcNPV promoters by homeologous recombination between AcNPV-based transfer vectors and BmNPV genomic DNA. Infection of BmN cell lines with recombinant BmNPV showed a temporal expression pattern, reaching very high levels at 60-72 h post infection. The recombinant BmNPV harboring the firefly luciferase-encoding gene under the control of AcNPV polh or p10 promoters, on infection of the silkworm larvae, led to the synthesis of large quantities of luciferase. Such larvae emitted significant luminescence instantaneously on administration of the substrate luciferin, resulting in 'glowing silkworms'. The virus-infected larvae continued to glow for several hours and revealed the most abundant distribution of virus in the fat bodies. In larval expression also, the highest levels were achieved when the reporter gene was located at +35 nt of the polh.
Abstract:
In recent years there has been considerable interest in developing new types of gelators of organic solvents [1]. Despite the recent advances, a priori design of a gelator for gelling a given solvent has remained a challenging task. Various noncovalent interactions like hydrogen-bonding [2], metal coordination [3] etc. have been used as the driving force for the gelation process. A special class of cholesterol-based gelators was reported by Weiss [4] and by Shinkai [5]. Gels derived from these molecules have been used for chiral recognition/sensing [6], for studying photo- and metal-responsive functions [7], and as templates to make hollow fiber silica [8]. Other types of organogels have been used for designing polymerized [9] and reverse aerogels [10], and in molecular imprinting [11]. Hanabusa's group has recently reported organogels with a bile acid derivative [12]. This has prompted us to disclose our results on a novel electron donor–acceptor (EDA) interaction mediated two-component [13] gelator system based on the bile acid [14] backbone.
Abstract:
The success of automatic speaker recognition in laboratory environments suggests applications in forensic science for establishing the identity of individuals on the basis of features extracted from speech. A theoretical model for such a verification scheme for continuous normally distributed features is developed. The three cases of using a) a single feature, b) multiple independent measurements of a single feature, and c) multiple independent features are explored. The number of independent features needed for a reliable personal identification is computed based on the theoretical model and an exploratory study of some speech features.
Abstract:
An adaptive learning scheme, based on a fuzzy approximation to the gradient descent method for training a pattern classifier using unlabeled samples, is described. The objective function defined for the fuzzy ISODATA clustering procedure is used as the loss function for computing the gradient. Learning is based on simultaneous fuzzy decision making and estimation. It uses conditional fuzzy measures on unlabeled samples. An exponential membership function is assumed for each class, and the parameters constituting these membership functions are estimated, using the gradient, in a recursive fashion. The induced possibility of occurrence of each class is useful for estimation and is computed using 1) the membership of the new sample in that class and 2) the previously computed average possibility of occurrence of the same class. An inductive entropy measure is defined in terms of the induced possibility distribution to measure the extent of learning. The method is illustrated with relevant examples.
Abstract:
The minimum cost classifier, when general cost functions are associated with the tasks of feature measurement and classification, is formulated as a decision graph which does not reject class labels at intermediate stages. Noting its complexities, a heuristic procedure to simplify this scheme to a binary decision tree is presented. The optimization of the binary tree in this context is carried out using dynamic programming. This technique is applied to the voiced-unvoiced-silence classification in speech processing.