6 results for Visual Object Identification Task
in DRUM (Digital Repository at the University of Maryland)
Abstract:
Early human development offers a unique perspective on the potential cognitive and social implications of action and perception. Specifically, during infancy, action production and action perception undergo foundational developments. One essential component in examining developments in action processing is the analysis of others’ actions as meaningful and goal-directed. Little research, however, has examined the underlying neural systems that may be associated with emerging action and perception abilities and infants’ learning of goal-directed actions. The current study examines the mu rhythm—a brain oscillation found in the electroencephalogram (EEG)—that has been associated with action and perception. Specifically, the present work investigates whether the mu signal is related to 9-month-olds’ learning of a novel goal-directed means-end task. The findings of this study demonstrate a relation between variations in mu rhythm activity and infants’ ability to learn a novel goal-directed means-end action task (compared to a visual pattern learning task used as a comparison task). Additionally, we examined the relations among standardized assessments of early motor competence, infants’ ability to learn a novel goal-directed task, and mu rhythm activity. We found that: 1a) mu rhythm activity during observation of a grasp uniquely predicted infants’ learning on the cane training task; 1b) mu rhythm activity during observation and execution of a grasp did not uniquely predict infants’ learning on the visual pattern learning task (the comparison learning task); 2) infants’ motor competence did not predict their learning on the cane training task; 3) mu rhythm activity during observation and execution was not related to infants’ motor competence; and 4) mu rhythm activity did not predict infants’ learning on the cane task above and beyond infants’ motor competence.
The results of this study demonstrate that mu rhythm activity is a sensitive measure for detecting individual differences in infants’ action and perception abilities, specifically their learning of a novel goal-directed action.
Abstract:
The goal of image retrieval and matching is to find and locate object instances in images from a large-scale image database. While visual features are abundant, how to combine them to improve on the performance of individual features remains a challenging task. In this work, we focus on leveraging multiple features for accurate and efficient image retrieval and matching. We first propose two graph-based approaches to rerank initially retrieved images for generic image retrieval. In the graph, vertices are images and edges are similarities between image pairs. Our first approach employs a mixture Markov model, based on random walks over multiple graphs, to fuse the graphs. We introduce a probabilistic model to compute the importance of each feature for graph fusion under a naive Bayesian formulation, which requires statistics of similarities from a manually labeled dataset containing irrelevant images. To reduce human labeling, we further propose a fully unsupervised reranking algorithm based on a submodular objective function that can be efficiently optimized by a greedy algorithm. By maximizing an information gain term over the graph, our submodular function favors a subset of database images that are similar to the query images and resemble each other. The function also exploits the rank relationships of images across multiple ranked lists obtained with different features. We then study a better-defined application, person re-identification, where the database contains labeled images of human bodies captured by multiple cameras. Re-identifications from multiple cameras are treated as related tasks to exploit shared information. We apply a novel multi-task learning algorithm using both low-level features and attributes. A low-rank attribute embedding is jointly learned within the multi-task learning formulation to map the original binary attributes into a continuous attribute space, where incorrect and incomplete attributes are rectified and recovered.
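The graph-fusion idea above can be illustrated with a minimal sketch (not the dissertation's implementation): each feature's similarity graph is row-normalized into a Markov transition matrix, the matrices are mixed with per-feature weights, and a random walk with restart from the query ranks the database images. The toy similarity values and mixture weights below are illustrative assumptions.

```python
# A toy mixture-Markov random walk over two similarity graphs.
# All numbers (similarities, weights, restart rate) are made up for
# illustration; a real system would estimate the weights from data.

def row_normalize(m):
    """Turn a similarity matrix into a row-stochastic transition matrix."""
    out = []
    for row in m:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out

def fuse_and_rank(graphs, weights, query, restart=0.15, iters=100):
    """Random walk with restart on a convex mixture of transition matrices."""
    n = len(graphs[0])
    ts = [row_normalize(g) for g in graphs]
    # Mixture Markov chain: weighted combination of the per-feature walks.
    mix = [[sum(w * t[i][j] for w, t in zip(weights, ts)) for j in range(n)]
           for i in range(n)]
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [restart * (1.0 if j == query else 0.0) +
             (1 - restart) * sum(p[i] * mix[i][j] for i in range(n))
             for j in range(n)]
    return sorted(range(n), key=lambda j: -p[j])

# Two toy similarity graphs over 4 images, from two hypothetical features.
g1 = [[0, 0.9, 0.1, 0.0], [0.9, 0, 0.2, 0.1],
      [0.1, 0.2, 0, 0.8], [0.0, 0.1, 0.8, 0]]
g2 = [[0, 0.8, 0.0, 0.1], [0.8, 0, 0.1, 0.0],
      [0.0, 0.1, 0, 0.9], [0.1, 0.0, 0.9, 0]]
ranking = fuse_and_rank([g1, g2], weights=[0.6, 0.4], query=0)
print(ranking)  # the query image ranks first, its strong neighbor next
```

Because both graphs agree that images 0 and 1 are similar, the fused walk ranks image 1 directly after the query, which is the behavior reranking relies on.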
To locate objects in images, we design an object detector based on object proposals and deep convolutional neural networks (CNNs), in view of the emergence of deep networks. We improve the Fast R-CNN framework and investigate two new strategies to detect objects accurately and efficiently: scale-dependent pooling (SDP) and cascaded rejection classifiers (CRC). SDP improves detection accuracy by exploiting convolutional features appropriate to the scale of each input object proposal. CRC makes effective use of convolutional features and rejects most negative proposals in a cascaded manner, while maintaining high recall for true objects. Together, the two strategies improve detection accuracy and reduce computational cost.
Abstract:
Increasing the size of the training data in many computer vision tasks has been shown to be very effective. Using large-scale image datasets (e.g., ImageNet) with simple learning techniques (e.g., linear classifiers), one can achieve state-of-the-art performance in object recognition compared to sophisticated learning techniques on smaller image sets. Semantic search on visual data has become very popular. There are billions of images on the internet, and the number grows every day. Dealing with large-scale image sets is demanding in itself: they take a significant amount of memory, which makes it impractical to process the images with complex algorithms on single-CPU machines. Finding an efficient image representation is therefore key to attacking this problem. Efficiency alone, however, is not enough for image understanding; the representation should also be comprehensive and rich in semantic information. In this proposal we develop an approach to computing binary codes that provide a rich and efficient image representation. We demonstrate several tasks in which binary features can be very effective. We show how binary features can speed up large-scale image classification, and we present techniques for learning the binary features from supervised image sets (with different types of semantic supervision: class labels, textual descriptions). We also pose several problems that are central to finding and using efficient image representations.
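To make the efficiency argument concrete, here is a minimal sketch of binary codes via random-hyperplane hashing. The dissertation learns its codes with supervision, so this unsupervised variant is only a stand-in for the representation's payoff: each image descriptor becomes a short bit string, and similarity reduces to cheap Hamming distance. The toy 4-dimensional descriptors are invented for illustration.

```python
# Random-hyperplane hashing (an unsupervised stand-in for learned binary
# codes): each bit records which side of a random hyperplane the image
# descriptor falls on, so nearby descriptors tend to share bits.
import random

def random_hyperplanes(dim, bits, seed=0):
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]

def encode(vec, planes):
    """Pack one bit per hyperplane into a single integer code."""
    code = 0
    for p in planes:
        dot = sum(a * b for a, b in zip(vec, p))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code

def hamming(a, b):
    """Distance between two codes: number of differing bits."""
    return bin(a ^ b).count("1")

planes = random_hyperplanes(dim=4, bits=16)
img_a = [1.0, 0.2, 0.1, 0.0]    # toy descriptor, similar to img_b
img_b = [0.9, 0.3, 0.1, 0.1]
img_c = [-1.0, 0.1, -0.5, 0.9]  # dissimilar descriptor
ca, cb, cc = (encode(v, planes) for v in (img_a, img_b, img_c))
print(hamming(ca, cb), hamming(ca, cc))  # similar pair should be closer
```

With 16-bit codes, comparing two images costs one XOR and a popcount, which is what makes billion-image search with binary features feasible.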
Abstract:
Sequences of timestamped events are currently being generated across nearly every domain of data analytics, from e-commerce web logging to the electronic health records used by doctors and medical researchers. Every day, this data type is reviewed by humans who apply statistical tests, hoping to learn everything they can about how these processes work, why they break, and how they can be improved. To uncover why these processes work the way they do, researchers often compare two groups, or cohorts, of event sequences to find the differences and similarities between outcomes and processes. With temporal event sequence data, this task is complex because of the variety of ways single events and sequences of events can differ between the two cohorts of records: the structure of the event sequences (e.g., event order, co-occurring events, or frequencies of events), the attributes of the events and records (e.g., the gender of a patient), or metrics about the timestamps themselves (e.g., the duration of an event). Running statistical tests to cover all these cases and determining which results are significant becomes cumbersome. Current visual analytics tools for comparing groups of event sequences emphasize either a purely statistical or a purely visual approach to comparison. Visual analytics tools leverage humans' ability to easily see patterns and anomalies they were not expecting, but are limited by uncertainty in their findings. Statistical tools emphasize finding significant differences in the data, but often require researchers to have a concrete question in mind and do not facilitate more general exploration of the data. Combining visual analytics tools with statistical methods leverages the benefits of both approaches for quicker and easier insight discovery.
Integrating statistics into a visualization tool presents many challenges on the frontend (e.g., displaying the results of many different metrics concisely) and in the backend (e.g., scalability challenges with running various metrics on multi-dimensional data at once). I begin by exploring the problem of comparing cohorts of event sequences and understanding the questions that analysts commonly ask in this task. From there, I demonstrate that combining automated statistics with an interactive user interface amplifies the benefits of both types of tools, thereby enabling analysts to conduct quicker and easier data exploration, hypothesis generation, and insight discovery. The direct contributions of this dissertation are: (1) a taxonomy of metrics for comparing cohorts of temporal event sequences, (2) a statistical framework for exploratory data analysis with a method I refer to as high-volume hypothesis testing (HVHT), (3) a family of visualizations and guidelines for interaction techniques that are useful for understanding and parsing the results, and (4) a user study, five long-term case studies, and five short-term case studies which demonstrate the utility and impact of these methods in various domains: four in the medical domain, one in web log analysis, two in education, and one each in social networks, sports analytics, and security. My dissertation contributes an understanding of how cohorts of temporal event sequences are commonly compared and the difficulties associated with applying and parsing the results of these metrics. It also contributes a set of visualizations, algorithms, and design guidelines for balancing automated statistics with user-driven analysis to guide users to significant, distinguishing features between cohorts. 
This work opens avenues for future research in comparing two or more groups of temporal event sequences, opening traditional machine learning and data mining techniques to user interaction, and extending the principles found in this dissertation to data types beyond temporal event sequences.
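The core statistical difficulty that high-volume hypothesis testing (HVHT) must handle can be sketched briefly: when many metrics are tested at once across two cohorts, raw p-values must be corrected for multiple comparisons before any difference is called significant. The sketch below uses the standard Benjamini-Hochberg procedure as one such correction; it is not the dissertation's HVHT method, and the p-values are illustrative, not from the studies described.

```python
# Benjamini-Hochberg false-discovery-rate control: sort p-values, find the
# largest rank k with p_(k) <= k * alpha / m, and reject the k smallest.

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return indices of tests judged significant at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            cutoff = rank  # largest rank passing the BH condition
    return sorted(order[:cutoff])

# One hypothetical p-value per comparison metric (event frequency,
# event order, durations, record attributes, ...).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.2, 0.7, 0.9, 0.95]
print(benjamini_hochberg(pvals))  # → [0, 1]
```

Note that three tests with p < 0.05 fail to survive the correction; surfacing exactly this distinction to analysts, rather than a wall of uncorrected test results, is the kind of problem the interface work above addresses.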
Abstract:
Object recognition has long been a core problem in computer vision. To improve object spatial support and speed up object localization, generating high-quality, category-independent object proposals as input to recognition systems has drawn attention recently. Given an image, we generate a limited number of high-quality, category-independent object proposals in advance, which are then used as inputs for many computer vision tasks. We also present an efficient dictionary-based model for the image classification task, and further extend the work to a discriminative dictionary learning method for tensor sparse coding. In the first part, a multi-scale, greedy object proposal generation approach is presented. Motivated by the multi-scale nature of objects in images, our approach is built on top of a hierarchical segmentation. We first identify representative and diverse exemplar clusters within each scale. Object proposals are then obtained by selecting a subset from the multi-scale segment pool via maximizing a submodular objective function, which consists of a weighted coverage term, a single-scale diversity term, and a multi-scale reward term. The weighted coverage term forces the selected set of object proposals to be representative and compact; the single-scale diversity term encourages choosing segments from different exemplar clusters so that they cover as many object patterns as possible; and the multi-scale reward term encourages the selected proposals to be discriminative and drawn from multiple layers of the hierarchical image segmentation. Experimental results on the Berkeley Segmentation Dataset and the PASCAL VOC2012 segmentation dataset demonstrate the accuracy and efficiency of our object proposal model. Additionally, we validate our object proposals on simultaneous segmentation and detection and surpass the state-of-the-art performance.
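The greedy submodular selection described above can be sketched in miniature. This toy version (not the dissertation's objective) keeps only a coverage term and a cluster-diversity term: a candidate set is scored by the pixels its segments cover plus a bonus for drawing from distinct exemplar clusters, and segments are added one at a time by best marginal gain. The segments, cluster labels, and diversity weight are illustrative assumptions.

```python
# Greedy maximization of a simple submodular objective over segments:
# coverage (union of covered pixels) + a diversity bonus per distinct
# exemplar cluster used. Both terms are monotone submodular, so the
# greedy algorithm enjoys the classic (1 - 1/e) approximation guarantee.

def objective(selected, segments, clusters, diversity_weight=5.0):
    covered = set()
    for i in selected:
        covered |= segments[i]
    used_clusters = {clusters[i] for i in selected}
    return len(covered) + diversity_weight * len(used_clusters)

def greedy_select(segments, clusters, budget):
    """Pick up to `budget` segments, each step adding the best marginal gain."""
    selected = []
    for _ in range(budget):
        base = objective(selected, segments, clusters)
        gains = [(objective(selected + [i], segments, clusters) - base, i)
                 for i in range(len(segments)) if i not in selected]
        best_gain, best = max(gains)
        if best_gain <= 0:
            break
        selected.append(best)
    return selected

# Toy segments as sets of pixel ids, grouped into exemplar clusters.
segments = [{1, 2, 3, 4}, {3, 4, 5}, {10, 11}, {10, 11, 12}, {20}]
clusters = [0, 0, 1, 1, 2]
print(greedy_select(segments, clusters, budget=3))  # → [0, 3, 4]
```

The diversity bonus is what pulls the third pick toward the small segment in cluster 2 rather than a redundant segment overlapping what is already covered, mirroring how the single-scale diversity term spreads proposals across exemplar clusters.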
To classify the object in an image, we design a discriminative, structural low-rank framework for image classification. We use a supervised learning method to construct a discriminative and reconstructive dictionary. By introducing an ideal regularization term, we perform low-rank matrix recovery on contaminated training data from all categories simultaneously without losing structural information. A discriminative low-rank representation of images with respect to the constructed dictionary is then obtained. With its semantic structure information and strong identification capability, this representation performs well on classification tasks even with a simple linear multi-class classifier.
Abstract:
According to a traditional rationalist proposal, it is possible to attain knowledge of certain necessary truths by means of insight—an epistemic mental act that combines the 'presentational' character of perception with the a priori status usually reserved for discursive reasoning. In this dissertation, I defend the insight proposal in relation to a specific subject matter: elementary Euclidean plane geometry, as set out in Book I of Euclid's Elements. In particular, I argue that visualizations and visual experiences of diagrams allow human subjects to grasp truths of geometry by means of visual insight. In the first two chapters, I provide an initial defense of the geometrical insight proposal, drawing on a novel interpretation of Plato's Meno to motivate the view and to reply to some objections. In the remaining three chapters, I provide an account of the psychological underpinnings of geometrical insight, a task that requires considering the psychology of visual imagery alongside the details of Euclid's geometrical system. One important challenge is to explain how basic features of human visual representations can serve to ground our intuitive grasp of Euclid's postulates and other initial assumptions. A second challenge is to explain how we are able to grasp general theorems by considering diagrams that depict only special cases. I argue that both of these challenges can be met by an account that regards geometrical insight as based in visual experiences involving the combined deployment of two varieties of 'dynamic' visual imagery: one that allows the subject to visually rehearse spatial transformations of a figure's parts, and another that allows the subject to entertain alternative ways of structurally integrating the figure as a whole. It is the interplay between these two forms of dynamic imagery that enables a visual experience of a diagram, suitably animated in visual imagination, to justify belief in the propositions of Euclid’s geometry. 
The upshot is a novel dynamic imagery account that explains how intuitive knowledge of elementary Euclidean plane geometry can be understood as grounded in visual insight.