62 results for Feature vectors

at Indian Institute of Science - Bangalore - India


Relevance: 70.00%

Abstract:

With huge amounts of video data available from various sources, efficient video retrieval tools are increasingly in demand. Because video is multi-modal data, the perceived "relevance" between the user-provided query video (in Query-By-Example video search) and the retrieved video clips is subjective. We present an efficient video retrieval method that takes the user's feedback on the relevance of retrieved videos and iteratively reformulates the input query feature vectors (QFV) for improved video retrieval. The QFV reformulation is done by a simple but powerful feature weight optimization method based on the Simultaneous Perturbation Stochastic Approximation (SPSA) technique. A video retrieval system with video indexing, searching and relevance feedback (RF) phases is built to demonstrate the performance of the proposed method. The query and database videos are indexed using conventional video features such as color and texture. However, we use comprehensive and novel feature representations and a spatio-temporal distance measure to retrieve the top M videos that are similar to the query. In the feedback phase, the user's iterative feedback on the previously retrieved videos is used to automatically reformulate the QFV weights (a measure of importance) so that they reflect the user's preference. We observe that a few iterations of such feedback are generally sufficient for retrieving the desired video clips. The novel application of SPSA-based RF to user-oriented feature weight optimization distinguishes the proposed method from existing ones. The experimental results show that the proposed RF-based video retrieval exhibits good performance.
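The SPSA-based weight update described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the loss function `feedback_loss`, the target weights `TARGET`, and the gain constants are hypothetical stand-ins for a score derived from user relevance feedback. The decay exponents 0.602 and 0.101 are values commonly used with SPSA.

```python
import random

def spsa_minimize(loss, weights, iterations=200, a=0.1, c=0.1, seed=0):
    """Minimize `loss` over `weights` with a basic SPSA loop."""
    rng = random.Random(seed)
    w = list(weights)
    for k in range(iterations):
        a_k = a / (k + 1) ** 0.602  # decaying step size
        c_k = c / (k + 1) ** 0.101  # decaying perturbation size
        # Perturb all weights simultaneously with random +/-1 signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in w]
        w_plus = [wi + c_k * di for wi, di in zip(w, delta)]
        w_minus = [wi - c_k * di for wi, di in zip(w, delta)]
        # Two loss evaluations give a gradient estimate for every weight.
        diff = loss(w_plus) - loss(w_minus)
        grad = [diff / (2.0 * c_k * di) for di in delta]
        w = [wi - a_k * gi for wi, gi in zip(w, grad)]
    return w

# Hypothetical stand-in for user feedback: distance to preferred weights.
TARGET = [0.6, 0.3, 0.1]

def feedback_loss(w):
    return sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))

initial = [1.0, 1.0, 1.0]
tuned = spsa_minimize(feedback_loss, initial)
```

The appeal of SPSA in this setting is that each iteration needs only two loss evaluations regardless of the number of feature weights, which matters when every evaluation corresponds to a round of user feedback.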

Relevance: 60.00%

Abstract:

Downscaling to station-scale hydrologic variables from large-scale atmospheric variables simulated by general circulation models (GCMs) is usually necessary to assess the hydrologic impact of climate change. This work presents CRF-downscaling, a new probabilistic downscaling method that represents the daily precipitation sequence as a conditional random field (CRF). The conditional distribution of the precipitation sequence at a site, given the daily atmospheric (large-scale) variable sequence, is modeled as a linear chain CRF. CRFs do not make assumptions on independence of observations, which gives them flexibility in using high-dimensional feature vectors. Maximum likelihood parameter estimation for the model is performed using limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization. Maximum a posteriori estimation is used to determine the most likely precipitation sequence for a given set of atmospheric input variables using the Viterbi algorithm. Direct classification of dry/wet days as well as precipitation amount is achieved within a single modeling framework. The model is used to project the future cumulative distribution function of precipitation. Uncertainty in precipitation prediction is addressed through a modified Viterbi algorithm that predicts the n most likely sequences. The model is applied for downscaling monsoon (June-September) daily precipitation at eight sites in the Mahanadi basin in Orissa, India, using the MIROC3.2 medium-resolution GCM. The predicted distributions at all sites show an increase in the number of wet days, and also an increase in wet day precipitation amounts. A comparison of current and future predicted probability density functions for daily precipitation shows a change in shape of the density function with decreasing probability of lower precipitation and increasing probability of higher precipitation.
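The MAP decoding step for a linear-chain model can be illustrated with a small Viterbi sketch. The two states (0 = dry, 1 = wet) and the score tables `unary` and `trans` are hypothetical toy values in the log domain, not parameters from the paper; in a real CRF the unary scores would come from the learned weights applied to the atmospheric feature vectors.

```python
def viterbi(unary, trans):
    """Return the highest-scoring state path and its score.

    unary[t][s]: score of state s at time t.
    trans[p][s]: score of moving from state p to state s.
    """
    n_states = len(trans)
    score = list(unary[0])
    back = []  # back[t-1][s] = best predecessor of state s at time t
    for t in range(1, len(unary)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: score[p] + trans[p][s])
            new_score.append(score[best_prev] + trans[best_prev][s] + unary[t][s])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]

# Toy example: three days, states 0 = dry and 1 = wet.
unary = [[2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
trans = [[0.5, -0.5], [-0.5, 0.5]]  # staying in the same state is favoured
path, best = viterbi(unary, trans)  # path == [0, 1, 1], best == 6.0
```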

Relevance: 60.00%

Abstract:

Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as part of many speech processing systems, and it is the computationally dominant phase for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods comes from exploiting the structure of these matrices and from an efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both methods lead to similar speedups, but the latter has a far smaller impact on recognition accuracy. Experiments on the 1,138-word vocabulary RM1 task and the 6,224-word vocabulary TIMIT task using the Sphinx 3.7 system show that, for a typical case, the matrix multiplication based approach leads to an overall speedup of 46% on the RM1 task and 115% on the TIMIT task. Our low-rank approximation methods provide a way of trading off recognition accuracy for a further increase in computational performance, extending overall speedups up to 61% for RM1 and 119% for TIMIT, for an increase in word error rate (WER) from 3.2% to 3.5% for RM1 and no increase in WER for TIMIT. We also express the pairwise Euclidean distance computation phase in Dynamic Time Warping (DTW) in terms of matrix multiplication, which reduces the number of computational operations. In our experiments using an efficient implementation of matrix multiplication, this leads to a speedup of 5.6 in computing the pairwise Euclidean distances and an overall speedup of up to 3.25 for DTW.
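One way to see how a Gaussian likelihood becomes a matrix product is sketched below. The construction assumes diagonal covariances and the augmentation [x², x, 1]; the abstract does not specify the exact form used, so the helper names (`augment`, `gaussian_row`, `log_likelihood_matrix`) and the layout are illustrative assumptions. Plain Python lists stand in for an optimized matrix library.

```python
import math

def augment(x):
    """Augmented feature vector [x_1^2..x_D^2, x_1..x_D, 1]."""
    return [xi * xi for xi in x] + list(x) + [1.0]

def gaussian_row(mean, var):
    """Parameter row whose dot product with augment(x) is log N(x; mean, diag(var))."""
    const = -0.5 * sum(m * m / v + math.log(2.0 * math.pi * v)
                       for m, v in zip(mean, var))
    return [-0.5 / v for v in var] + [m / v for m, v in zip(mean, var)] + [const]

def log_likelihood_matrix(frames, gaussians):
    """Entry [i][j] is the log-likelihood of frame i under Gaussian j."""
    X = [augment(x) for x in frames]
    G = [gaussian_row(m, v) for m, v in gaussians]
    return [[sum(a * b for a, b in zip(xr, gr)) for gr in G] for xr in X]

def direct_log_likelihood(x, mean, var):
    """Reference computation, term by term."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

frames = [[0.5, -1.0], [2.0, 0.0]]
gaussians = [([0.0, 0.0], [1.0, 1.0]), ([1.0, -1.0], [0.5, 2.0])]
L = log_likelihood_matrix(frames, gaussians)
```

Because all frame-versus-Gaussian scores reduce to one product of a frame matrix and a parameter matrix, the parameter matrix becomes a natural target for the low-rank approximations the abstract describes.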

Relevance: 60.00%

Abstract:

Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as part of many speech processing systems, and it is the computationally dominant phase for LVCSR systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods comes from exploiting the structure of these matrices and from an efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both methods lead to similar speedups, but the latter has a far smaller impact on recognition accuracy. Experiments on a 1,138-word vocabulary RM1 task using the Sphinx 3.7 system show that, for a typical case, the matrix multiplication approach leads to an overall speedup of 46%. Both low-rank approximation methods increase the speedup to around 60%, with the former method increasing the word error rate (WER) from 3.2% to 6.6%, while the latter increases the WER from 3.2% to 3.5%.

Relevance: 60.00%

Abstract:

In this paper, we present a fast learning neural network classifier for human action recognition. The proposed classifier is a fully complex-valued neural network with a single hidden layer. The neurons in the hidden layer employ the fully complex-valued hyperbolic secant as an activation function. The parameters of the hidden layer are chosen randomly, and the output weights are estimated analytically as the minimum-norm least-squares solution to a set of linear equations. The fast learning fully complex-valued neural classifier is used for recognizing human actions accurately. Optical flow-based features extracted from the video sequences are utilized to recognize 10 different human actions. The feature vectors are computationally simple first-order statistics of the optical flow vectors, obtained from coarse-to-fine rectangular patches centered around the object. The results indicate the superior performance of the complex-valued neural classifier for action recognition. This superior performance stems from the fact that motion, by nature, consists of two components, one along each of the axes.

Relevance: 60.00%

Abstract:

This paper presents an efficient approach to the modeling and classification of vehicles using the magnetic signature of the vehicle. A database was created using the magnetic signatures collected over a wide range of vehicles (cars). A vehicle is modeled as an array of magnetic dipoles. The strength of the magnetic dipoles and the separation between them vary for different vehicles and depend on the metallic composition and configuration of the vehicle. Based on the magnetic dipole data model, we present a novel method to extract a feature vector from the magnetic signature. A linear support vector machine configuration is used to classify the vehicles based on the obtained feature vectors.

Relevance: 60.00%

Abstract:

This paper presents an efficient approach to the modeling and classification of vehicles using the magnetic signature of the vehicle. A database was created using the magnetic signatures collected over a wide range of vehicles (cars). A sensor-dependent approach called the Magnetic Field Angle Model is proposed for modeling the obtained magnetic signature. Based on the data model, we present a novel method to extract the feature vector from the magnetic signature. A linear support vector machine configuration is used to classify the vehicles based on the obtained feature vectors.

Relevance: 60.00%

Abstract:

In this paper, we present a machine learning approach for subject-independent human action recognition using a depth camera, emphasizing the importance of depth in the recognition of actions. The proposed approach uses the flow information of all 3 dimensions to classify an action. In our approach, we obtain the 2-D optical flow and use it along with the depth image to obtain the depth flow (Z motion vectors). The obtained flow captures the dynamics of the actions in space-time. Feature vectors are obtained by averaging the 3-D motion over a grid laid over the silhouette in a hierarchical fashion. These hierarchical fine-to-coarse windows capture the motion dynamics of the object at various scales. The extracted features are used to train a Meta-cognitive Radial Basis Function Network (McRBFN) that uses a Projection Based Learning (PBL) algorithm, henceforth referred to as PBL-McRBFN. PBL-McRBFN begins with zero hidden neurons and builds the network based on the best human learning strategy, namely, self-regulated learning in a meta-cognitive environment. When a sample is used for learning, PBL-McRBFN uses the sample overlapping conditions and a projection based learning algorithm to estimate the parameters of the network. The performance of PBL-McRBFN is compared to that of Support Vector Machine (SVM) and Extreme Learning Machine (ELM) classifiers, with every person and action represented in the training and testing datasets. The performance study shows that PBL-McRBFN outperforms these classifiers in recognizing actions in 3-D. Further, a subject-independent study is conducted using a leave-one-subject-out strategy, and its generalization performance is tested. It is observed from the subject-independent study that McRBFN is capable of generalizing actions accurately. The performance of the proposed approach is benchmarked with the Video Analytics Lab (VAL) dataset and the Berkeley Multimodal Human Action Database (MHAD). (C) 2013 Elsevier Ltd. All rights reserved.
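The hierarchical grid averaging used to build the feature vectors can be sketched as below. This is only an illustration under stated assumptions: the flow field is taken to be already cropped to the silhouette's bounding box, the grid levels (1x1 and 2x2) are chosen arbitrarily, cells are emitted coarse level first, and the function name `grid_mean_features` is hypothetical. The flow field must have at least as many rows and columns as the finest grid level.

```python
def grid_mean_features(flow, levels=(1, 2)):
    """Average 3-D motion vectors over n x n grids, one grid per level.

    flow: H x W nested list of (u, v, w) motion vectors.
    Returns a flat feature vector with 3 values per grid cell.
    """
    height, width = len(flow), len(flow[0])
    features = []
    for n in levels:
        for gy in range(n):
            for gx in range(n):
                # Integer cell boundaries that tile the whole field.
                y0, y1 = gy * height // n, (gy + 1) * height // n
                x0, x1 = gx * width // n, (gx + 1) * width // n
                cell = [flow[y][x] for y in range(y0, y1) for x in range(x0, x1)]
                for axis in range(3):
                    features.append(sum(vec[axis] for vec in cell) / len(cell))
    return features

# Toy 2 x 2 flow field with one distinct vector per pixel.
flow = [[(1, 0, 0), (3, 0, 0)],
        [(0, 2, 0), (0, 0, 4)]]
features = grid_mean_features(flow)  # 3 * (1 + 4) = 15 values
```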

Relevance: 60.00%

Abstract:

Acoustic feature based speech (syllable) rate estimation and syllable nuclei detection are important problems in automatic speech recognition (ASR), computer assisted language learning (CALL) and fluency analysis. A typical solution for both problems consists of two stages. The first stage involves computing a short-time feature contour such that most of the peaks of the contour correspond to the syllabic nuclei. In the second stage, the peaks corresponding to the syllable nuclei are detected. In this work, instead of peak detection, we perform mode-shape classification, which is formulated as a supervised binary classification problem, with mode-shapes representing the syllabic nuclei as one class and the remaining mode-shapes as the other. We use the temporal correlation and selected sub-band correlation (TCSSBC) feature contour, and the mode-shapes in the TCSSBC feature contour are converted into a set of feature vectors using an interpolation technique. A support vector machine classifier is used for the classification. Experiments are performed separately using the Switchboard, TIMIT and CTIMIT corpora in a five-fold cross-validation setup. The average correlation coefficients for the syllable rate estimation are 0.6761, 0.6928 and 0.3604 for the three corpora respectively, which outperform those obtained by the best of the existing peak detection techniques. Similarly, the average F-scores (syllable level) for the syllable nuclei detection are 0.8917, 0.8200 and 0.7637 for the three corpora respectively. (C) 2016 Elsevier B.V. All rights reserved.

Relevance: 30.00%

Abstract:

This paper presents a multilevel inverter topology suitable for the generation of dodecagonal space vectors instead of hexagonal space vectors as in the case of conventional schemes. This feature eliminates all the 6n ± 1 (n odd) harmonics from the phase voltages and currents in the entire modulation range, with an increase in the linear modulation range. The topology is realized by flying capacitor-based three-level inverters feeding from two ends of an open-end winding induction motor with asymmetric dc links. The flying capacitor voltages are tightly controlled throughout the modulation range using redundant switching states for any load power factor. A simple and fast carrier-based space-vector pulsewidth modulation (PWM) scheme is also proposed for the topology, which utilizes only the sampled amplitudes of the reference wave for the PWM timing computation.

Relevance: 30.00%

Abstract:

In this paper, we propose a simple and effective approach to classifying H.264 compressed videos by capturing orientation information from the motion vectors. Our major contribution involves computing the Histogram of Oriented Motion Vectors (HOMV) for overlapping hierarchical Space-Time cubes. The Space-Time cubes selected are partially overlapped. HOMV is found to be very effective in defining the motion characteristics of these cubes. We then use the Bag of Features (BoF) approach to define the video as a histogram of HOMV keywords, obtained using k-means clustering. The video feature thus computed is found to be very effective in classifying videos. We demonstrate our results with experiments on two large publicly available video databases.
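A histogram of motion-vector orientations for one Space-Time cube might be computed along the following lines. The bin count, the count-based (rather than magnitude-weighted) voting, the skipping of zero vectors, and the function name `orientation_histogram` are all assumptions made for illustration; the abstract does not give these details.

```python
import math

def orientation_histogram(motion_vectors, n_bins=8):
    """Normalized histogram of motion-vector directions.

    motion_vectors: iterable of (dx, dy) pairs from one space-time cube.
    Zero vectors carry no direction and are skipped.
    """
    bin_width = 2.0 * math.pi / n_bins
    counts = [0] * n_bins
    for dx, dy in motion_vectors:
        if dx == 0 and dy == 0:
            continue
        angle = math.atan2(dy, dx) % (2.0 * math.pi)  # map to [0, 2*pi)
        counts[int(angle / bin_width) % n_bins] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins

# One vector in each quadrant plus a zero vector, binned into 4 directions.
vectors = [(1, 1), (-1, 1), (-1, -1), (1, -1), (0, 0)]
hist = orientation_histogram(vectors, n_bins=4)
```

Histograms like this, one per cube, would then be clustered (for example with k-means) to form the keyword vocabulary used by the Bag of Features step.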

Relevance: 20.00%

Abstract:

The minimum cost classifier when general cost functions are associated with the tasks of feature measurement and classification is formulated as a decision graph which does not reject class labels at intermediate stages. Noting its complexities, a heuristic procedure to simplify this scheme to a binary decision tree is presented. The optimization of the binary tree in this context is carried out using dynamic programming. This technique is applied to the voiced-unvoiced-silence classification in speech processing.

Relevance: 20.00%

Abstract:

In this paper, a novel 12-sided polygonal space vector structure is proposed for an induction motor drive. The space vector pattern presented in this paper consists of two 12-sided concentric polygons with the outer polygon having a radius double the inner one. As compared to previously reported 12-sided polygonal space vector structures, this paper subdivides the space vector plane into smaller sized triangles. This helps in reducing the switching frequency of the inverters without deteriorating the output voltage quality. It also reduces the device ratings and dv/dt stress on the devices to half. At the same time, other benefits obtained from the existing 12-sided space vector structure, such as increased linear modulation range and complete elimination of 5th and 7th order harmonics in the phase voltage, are also retained in this paper. The space vector structure is realized by feeding an open-end induction motor with two conventional three-level neutral point clamped (NPC) inverters with asymmetric isolated dc link voltage sources. The neutral point voltage fluctuations in the three-level NPC inverters are eliminated by utilizing the switching state multiplicities for a space vector point. The pulsewidth modulation timings are calculated using sampled reference waveform amplitudes and are explained in detail in this paper. Experimental verification on a laboratory prototype shows that this configuration may be considered suitable for high power drives.

Relevance: 20.00%

Abstract:

The concept of feature selection in a nonparametric unsupervised learning environment is practically undeveloped because no true measure for the effectiveness of a feature exists in such an environment. The lack of a feature selection phase preceding the clustering process seriously affects the reliability of such learning. New concepts such as significant features, level of significance of features, and immediate neighborhood are introduced, which implicitly meet the need for feature selection in the context of clustering techniques.
