11 results for Bag-of-visual Words
at Indian Institute of Science - Bangalore - India
Abstract:
How the brain maintains perceptual continuity across eye movements that yield discontinuous snapshots of the world is still poorly understood. In this study, we adapted a framework from the dual-task paradigm, well suited to reveal bottlenecks in mental processing, to study how information is processed across sequential saccades. The pattern of reaction times (RTs) allowed us to distinguish among three forms of trans-saccadic processing (no trans-saccadic processing; trans-saccadic visual processing; and trans-saccadic visual processing plus saccade planning). Using a cued double-step saccade task, we show that even though saccade execution is a processing bottleneck, limiting access to incoming visual information, partial visual and motor processing that occurs prior to saccade execution is used to guide the next eye movement. These results provide insights into how the oculomotor system is designed to process information across the multiple fixations that occur during natural scanning.
Abstract:
An action is typically composed of different parts of the object moving in particular sequences. The presence of different motions (represented as a 1D histogram) has been used in the traditional bag-of-words (BoW) approach for recognizing actions. However, the interactions among the motions also form a crucial part of an action. Different object parts have varying degrees of interaction with the other parts during an action cycle. It is these interactions that we want to quantify in order to bring in additional information about the actions. In this paper we propose a causality-based approach for quantifying the interactions to aid action classification. Granger causality is used to compute the cause-and-effect relationships for pairs of motion trajectories of a video. A 2D histogram descriptor for the video is constructed using these pairwise measures. Our proposed method of obtaining pairwise measures for videos is also applicable to large datasets. We have conducted experiments on challenging action recognition databases such as HMDB51 and UCF50 and shown that our causality descriptor encodes additional information about the actions and performs on par with state-of-the-art approaches. Due to its complementary nature, a further increase in performance can be observed by combining our approach with state-of-the-art approaches.
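As a rough illustration of the kind of descriptor described above, the sketch below computes pairwise Granger-causality F-statistics between motion trajectories and bins them into a 2D histogram. The trajectory representation, lag, and bin choices are illustrative assumptions, not the authors' exact pipeline.

```python
# Hedged sketch: pairwise Granger-causality descriptor for motion trajectories.
# Trajectory extraction, lag and binning are illustrative choices only.
import numpy as np
from itertools import combinations
from statsmodels.tsa.stattools import grangercausalitytests

def causality_descriptor(trajectories, maxlag=2, bins=8):
    """trajectories: list of equal-length 1D arrays (e.g. speed of a tracked point over time)."""
    measures = []
    for i, j in combinations(range(len(trajectories)), 2):
        pair = np.column_stack([trajectories[i], trajectories[j]])
        # F statistic for "trajectory j Granger-causes trajectory i"
        res = grangercausalitytests(pair, maxlag=maxlag, verbose=False)
        f_ij = res[maxlag][0]['ssr_ftest'][0]
        # and the reverse direction
        res_rev = grangercausalitytests(pair[:, ::-1], maxlag=maxlag, verbose=False)
        f_ji = res_rev[maxlag][0]['ssr_ftest'][0]
        measures.append((f_ij, f_ji))
    measures = np.array(measures)
    # 2D histogram over the two causal directions, flattened into the video descriptor
    hist, _, _ = np.histogram2d(measures[:, 0], measures[:, 1], bins=bins)
    return (hist / hist.sum()).ravel()
```

The resulting vector can then be concatenated with a standard BoW histogram, which is the complementary combination the abstract alludes to.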
Abstract:
Simple formalized rules are proposed for the automatic phonetic transcription of Tamil words into Roman script. These rules are syntax-directed and require only a one-symbol look-ahead facility, and hence are easily automated on a digital computer. Some suggestions are also put forth for the linearization of Tamil script so that it can be handled by modern machinery.
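A minimal sketch of a syntax-directed transcription loop with one-symbol look-ahead follows; the rule table is purely hypothetical and does not reproduce the paper's actual Tamil-to-Roman rules.

```python
# Hedged sketch of a transcription loop with one-symbol look-ahead.
# The rule table below is illustrative only, not the paper's rule set.
RULES = {
    # (current symbol, next symbol or None) -> Roman output
    ('க', 'ா'): 'kaa',   # hypothetical example: consonant + vowel sign
    ('க', None): 'ka',   # hypothetical example: bare consonant with inherent vowel
}

def transcribe(symbols):
    out = []
    i = 0
    while i < len(symbols):
        cur = symbols[i]
        nxt = symbols[i + 1] if i + 1 < len(symbols) else None
        if (cur, nxt) in RULES:          # rule that consumes the look-ahead symbol
            out.append(RULES[(cur, nxt)])
            i += 2
        else:                            # fall back to a single-symbol rule
            out.append(RULES.get((cur, None), cur))
            i += 1
    return ''.join(out)
```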
Abstract:
The neural network finds application in many image denoising tasks because of its inherent characteristics such as nonlinear mapping and self-adaptiveness. The design of filters largely depends on a priori knowledge about the type of noise; because of this, standard filters are application- and image-specific. Widely used filtering algorithms reduce noisy artifacts by smoothing, but this operation normally smooths the edges as well. On the other hand, sharpening filters enhance the high-frequency details, making the image non-smooth. An integrated general approach to designing a finite impulse response filter based on a principal component neural network (PCNN) is proposed in this study for image filtering, optimized in the sense of visual inspection and error metric. The algorithm exploits inter-pixel correlation by iteratively updating the filter coefficients using the PCNN, and performs optimal smoothing of the noisy image while preserving both high- and low-frequency features. Evaluation results show that the proposed filter is robust under various noise distributions. Further, the number of unknown parameters is small, and most of these parameters are adaptively obtained from the processed image.
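A minimal sketch of the general idea, assuming an Oja's-rule principal component update over local patches (the paper's exact update, normalization and stopping rule may differ):

```python
# Hedged sketch: learn FIR filter coefficients with a principal component
# neural network (Oja's rule) over overlapping patches, then filter by convolution.
import numpy as np
from scipy.ndimage import convolve

def pcnn_filter(image, ksize=3, lr=1e-3, epochs=5, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=ksize * ksize)
    w /= np.linalg.norm(w)
    H, W = image.shape
    # overlapping patches exploit the inter-pixel correlation mentioned above
    patches = np.array([image[r:r + ksize, c:c + ksize].ravel()
                        for r in range(H - ksize + 1)
                        for c in range(W - ksize + 1)])
    patches = patches - patches.mean(axis=0)
    for _ in range(epochs):
        for x in patches:
            y = w @ x
            w += lr * y * (x - y * w)   # Oja's rule: converges to the principal component
    kernel = (w / w.sum()).reshape(ksize, ksize)   # normalize so mean intensity is preserved
    return convolve(image, kernel, mode='reflect')
```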
Abstract:
Our everyday visual experience frequently involves searching for objects in clutter. Why are some searches easy and others hard? It is generally believed that the time taken to find a target increases as it becomes similar to its surrounding distractors. Here, I show that while this is qualitatively true, the exact relationship is in fact not linear. In a simple search experiment, when subjects searched for a bar differing in orientation from its distractors, search time was inversely proportional to the angular difference in orientation. Thus, rather than taking search reaction time (RT) to be a measure of target-distractor similarity, we can literally turn search time on its head (i.e. take its reciprocal, 1/RT) to obtain a measure of search dissimilarity that varies linearly over a large range of target-distractor differences. I show that this dissimilarity measure has the properties of a distance metric, and report two interesting insights that come from this measure. First, for a large number of searches, search asymmetries are relatively rare and, when they do occur, differ by a fixed distance. Second, search distances can be used to elucidate the object representations that underlie search; for example, these representations are roughly invariant to three-dimensional view. Finally, search distance has a straightforward interpretation in the context of accumulator models of search, where it is proportional to the discriminative signal that is integrated to produce a response. This is consistent with recent studies that have linked this distance to neuronal discriminability in visual cortex. Thus, while search time remains the more direct measure of visual search, its reciprocal also has the potential for interesting and novel insights. (C) 2012 Elsevier Ltd. All rights reserved.
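The reciprocal relationship described above can be written compactly. The symbols below are my own labels, not the paper's notation: Δθ for the target-distractor orientation difference, c for an accumulator threshold, and s for the discriminative (drift) signal.

```latex
% Sketch of the relations implied by the abstract (symbol names are illustrative).
\[
  RT \propto \frac{1}{\Delta\theta}
  \quad\Longrightarrow\quad
  d(T,D) = \frac{1}{RT} \propto \Delta\theta
\]
% In an accumulator model, the response is produced when the integrated signal
% reaches a threshold, so search distance tracks the drift:
\[
  RT \approx \frac{c}{s}
  \quad\Longrightarrow\quad
  \frac{1}{RT} \propto s
\]
```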
Abstract:
We propose a new paradigm for displaying comments: showing comments alongside the parts of the article they correspond to. We evaluate the effectiveness of various approaches for this task and show that a combination of bag-of-words and topic models performs best.
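One way such a combination could look in practice is sketched below: each comment is scored against every paragraph with both a bag-of-words similarity and a topic-distribution similarity, and assigned to the highest-scoring paragraph. The models and the weighting are illustrative assumptions, not the authors' system.

```python
# Hedged sketch: align a comment with the article paragraph it most likely refers to,
# combining a bag-of-words channel with a topic-model channel.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

def align_comment(comment, paragraphs, n_topics=20, alpha=0.5):
    texts = paragraphs + [comment]
    vec = CountVectorizer(stop_words='english')
    counts = vec.fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    topics = lda.fit_transform(counts)
    bow_sim = cosine_similarity(counts[-1], counts[:-1]).ravel()      # bag-of-words channel
    topic_sim = cosine_similarity(topics[-1:], topics[:-1]).ravel()   # topic-model channel
    score = alpha * bow_sim + (1 - alpha) * topic_sim                 # simple linear combination
    return int(np.argmax(score))   # index of the best-matching paragraph
```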
Abstract:
There are many popular models available for document classification, such as the Naïve Bayes classifier, k-Nearest Neighbors and the Support Vector Machine. In all these cases, the representation is based on the “Bag of Words” model. This model does not capture the actual semantic meaning of a word in a particular document; semantics are better captured by the proximity of words and their occurrence in the document. We propose a new “Bag of Phrases” model to capture this discriminative power of phrases for text classification. We present a novel algorithm to extract phrases from the corpus using the well-known topic model, Latent Dirichlet Allocation (LDA), and to integrate them into a vector space model for classification. Experiments show better classifier performance with the new Bag of Phrases model than with related representation models.
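The abstract does not spell out the phrase-extraction algorithm; the sketch below shows one plausible flavour, in which adjacent word pairs whose two words both rank highly in the same LDA topic are kept as phrases, and documents are then represented by counts over that phrase vocabulary. It is an assumption-laden illustration, not the paper's method.

```python
# Hedged sketch of a "Bag of Phrases" style representation built with LDA topics.
from collections import Counter
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def topic_top_words(docs, n_topics=10, top_k=30):
    vec = CountVectorizer(stop_words='english')
    counts = vec.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(counts)
    vocab = np.array(vec.get_feature_names_out())
    return [set(vocab[t.argsort()[-top_k:]]) for t in lda.components_]

def bag_of_phrases(doc, topic_words):
    tokens = doc.lower().split()
    phrases = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])
               if any(a in tw and b in tw for tw in topic_words)]
    return Counter(phrases)   # phrase counts feed a standard vector space model
```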
Abstract:
In this paper, we propose a simple and effective approach to classify H.264 compressed videos by capturing orientation information from the motion vectors. Our major contribution is the computation of a Histogram of Oriented Motion Vectors (HOMV) for overlapping, hierarchical space-time cubes; the selected space-time cubes are partially overlapped. HOMV is found to be very effective at defining the motion characteristics of these cubes. We then use a Bag of Features (BoF) approach to define the video as a histogram of HOMV keywords, obtained using k-means clustering. The video feature thus computed is found to be very effective for classifying videos. We demonstrate our results with experiments on two large publicly available video databases.
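A minimal sketch of the HOMV + BoF pipeline follows; extraction of motion vectors from the H.264 stream and the space-time cube layout are assumed to happen upstream, and the bin and codebook sizes are illustrative.

```python
# Hedged sketch: orientation histogram per space-time cube, then a bag-of-features
# video histogram over a k-means codebook of cube histograms.
import numpy as np
from sklearn.cluster import KMeans

def homv(mv_dx, mv_dy, n_bins=8):
    """Orientation histogram of the motion vectors inside one space-time cube."""
    angles = np.arctan2(mv_dy, mv_dx)                    # range (-pi, pi]
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)

def video_descriptors(cube_histograms_per_video, n_words=100):
    """cube_histograms_per_video: list of (n_cubes_i x n_bins) arrays, one per video."""
    all_cubes = np.vstack(cube_histograms_per_video)
    codebook = KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(all_cubes)
    descriptors = []
    for cubes in cube_histograms_per_video:
        words = codebook.predict(cubes)
        bof, _ = np.histogram(words, bins=np.arange(n_words + 1))
        descriptors.append(bof / max(bof.sum(), 1))      # normalized bag-of-features histogram
    return np.array(descriptors)
```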
Abstract:
Objective: The aim of this study is to validate the applicability of a PolyVinyliDene Fluoride (PVDF) nasal sensor to assess nasal airflow in healthy subjects and patients with nasal obstruction, and to correlate the results with the score of a Visual Analogue Scale (VAS). Methods: PVDF nasal sensor and VAS measurements were carried out in 50 subjects (25 healthy subjects and 25 patients). The VAS score of nasal obstruction and the peak-to-peak amplitude (Vp-p) of the nasal cycle measured by the PVDF nasal sensors were analyzed for the right nostril (RN) and left nostril (LN) in both groups. Spearman's rho correlation was calculated, and the relationship between PVDF nasal sensor measurements and the severity of nasal obstruction (VAS score) was assessed by ANOVA. Results: In the healthy group, the nasal airflow measured by the PVDF nasal sensor for RN and LN was 51.14 +/- 5.87% and 48.85 +/- 5.87%, respectively. In the patient group, the PVDF nasal sensor indicated less nasal airflow in the blocked nostrils (RN: 23.33 +/- 10.54% and LN: 32.24 +/- 11.54%). Moderate correlation was observed in the healthy group (r = 0.710, p < 0.001 for RN and r = 0.651, p < 0.001 for LN), and moderate to strong correlation in the patient group (r = 0.751, p < 0.01 for RN and r = 0.885, p < 0.0001 for LN). Conclusion: The PVDF nasal sensor method is a newly developed technique for measuring nasal airflow. Moderate to strong correlation was observed between PVDF nasal sensor data and VAS scores for nasal obstruction. In the present study, the PVDF nasal sensor technique successfully differentiated between healthy subjects and patients with nasal obstruction; additionally, it can assess the severity of nasal obstruction in comparison with VAS. We therefore propose that the PVDF nasal sensor technique could be used as a new diagnostic method to evaluate nasal obstruction in routine clinical practice. (C) 2015 Elsevier Inc. All rights reserved.
Abstract:
In optical character recognition of very old books, recognition accuracy drops mainly due to the merging or breaking of characters. In this paper, we propose the first algorithm to segment merged Kannada characters, using a hypothesis to select the positions at which to cut. The method searches for the best possible segmentation positions by taking into account the support vector machine classifier's recognition score and the validity of the aspect ratio (width-to-height ratio) of the segments between every pair of cut positions. The hypothesis for selecting a cut position is based on the fact that a concave surface exists above and below the touching portion. These concave surfaces are detected by tracing the valleys in the top contour of the image, and similarly for the image rotated upside down. The cut positions are then derived as closely matching valleys of the original and rotated images. The proposed segmentation algorithm works well across different font styles, shapes and sizes, and performs better than the existing vertical-projection-profile-based segmentation. It has been tested on 1125 word images, each containing multiple merged characters, from an old Kannada book; 89.6% correct segmentation is achieved, and the character recognition accuracy of merged words is 91.2%. A few merge points are still missed owing to the absence of a matched valley, caused by the specific shapes of the particular characters meeting at the merge.
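The valley-matching step can be sketched as follows. The SVM scoring and aspect-ratio validation from the paper are omitted, and a vertical flip is used in place of the full upside-down rotation so that column positions stay aligned; the tolerance value is an illustrative choice.

```python
# Hedged sketch: candidate cut positions as closely matching valleys of the top
# contour of the word image and of its flipped counterpart.
import numpy as np
from scipy.signal import find_peaks

def top_contour(binary_img):
    """Row index of the first ink pixel in each column (image height where a column is blank)."""
    H = binary_img.shape[0]
    ink = binary_img > 0
    first = np.argmax(ink, axis=0)
    first[~ink.any(axis=0)] = H
    return first

def candidate_cuts(binary_img, tol=3):
    top = top_contour(binary_img)
    bottom = top_contour(binary_img[::-1, :])   # stands in for the upside-down image's top contour
    top_valleys, _ = find_peaks(top)            # contour dips = larger first-ink row index
    bottom_valleys, _ = find_peaks(bottom)
    # keep top-contour valleys that have a closely matching valley in the flipped image
    return [c for c in top_valleys
            if np.any(np.abs(bottom_valleys - c) <= tol)]
```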
Abstract:
Among the human factors that influence safe driving, the visual skills of the driver can be considered fundamental. This study focuses on investigating the effect of the visual functions of drivers in India on their road crash involvement. Experiments were conducted to assess the vision functions of Indian licensed drivers belonging to various organizations, age groups and levels of driving experience. The test results were then related to the crash involvement histories of the drivers using statistical tools, and a generalized linear model was developed to ascertain the influence of these traits on the propensity for crash involvement. Among the sampled drivers, colour vision, vertical field of vision, depth perception, contrast sensitivity, acuity and phoria were found to influence crash involvement rates. India currently has no efficient standards or testing methods to assess the visual capabilities of drivers during the licensing process, and this study highlights the need for them.
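The abstract does not state the link function or the exact covariates of the generalized linear model; the sketch below assumes a Poisson model of crash counts with hypothetical column names, purely to illustrate the kind of analysis described.

```python
# Hedged sketch of a generalized linear model relating vision-test results to
# crash involvement. Column names and the Poisson family are assumptions.
import statsmodels.api as sm
import statsmodels.formula.api as smf

def fit_crash_glm(df):
    """df columns (hypothetical): crashes, colour_vision, field_of_vision,
    depth_perception, contrast_sensitivity, acuity, phoria, age, experience."""
    model = smf.glm(
        "crashes ~ colour_vision + field_of_vision + depth_perception"
        " + contrast_sensitivity + acuity + phoria + age + experience",
        data=df,
        family=sm.families.Poisson(),
    )
    return model.fit()   # coefficients indicate each trait's association with crash involvement
```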