919 results for Image-Based Visual Hull
Abstract:
Visual recognition is a fundamental research topic in computer vision. This dissertation explores datasets, features, learning, and models used for visual recognition. In order to train visual models and evaluate different recognition algorithms, this dissertation develops an approach to collect object image datasets on web pages using an analysis of text around the image and of image appearance. This method exploits established online knowledge resources (Wikipedia pages for text; Flickr and Caltech data sets for images). The resources provide rich text and object appearance information. This dissertation describes results on two datasets. The first is Berg’s collection of 10 animal categories; on this dataset, we significantly outperform previous approaches. On an additional set of 5 categories, experimental results show the effectiveness of the method. Images are represented as features for visual recognition. This dissertation introduces a text-based image feature and demonstrates that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the Internet. Image tags are noisy. The method obtains the text features of an unannotated image from the tags of its k-nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples. This text feature may not change, because the auxiliary dataset likely contains a similar picture. While the tags associated with images are noisy, they are more stable when appearance changes. The performance of this feature is tested using PASCAL VOC 2006 and 2007 datasets. This feature performs well; it consistently improves the performance of visual object classifiers, and is particularly effective when the training dataset is small. 
As more training data is collected, computational cost becomes a bottleneck, especially when training sophisticated classifiers such as kernelized SVMs. This dissertation proposes a fast training algorithm called Stochastic Intersection Kernel Machine (SIKMA). The proposed training method should be useful for many vision problems, as it can produce a kernel classifier that is more accurate than a linear classifier and can be trained on tens of thousands of examples in two minutes. It processes training examples one at a time in sequence, so memory cost is no longer the bottleneck when processing large-scale datasets. This dissertation applies this approach to train classifiers of Flickr groups, each with many training examples. The resulting Flickr group prediction scores can be used to measure the similarity between two images. Experimental results on the Corel dataset and a PASCAL VOC dataset show that the learned Flickr features perform better on image matching, retrieval, and classification than conventional visual features. Visual models are usually trained to best separate positive and negative training examples. However, when recognizing a large number of object categories, there may not be enough training examples for most objects, due to the intrinsic long-tailed distribution of objects in the real world. This dissertation proposes an approach that uses comparative object similarity. The key insight is that, given a set of object categories which are similar and a set of categories which are dissimilar, a good object model should respond more strongly to examples from similar categories than to examples from dissimilar categories. This dissertation develops a regularized kernel machine algorithm that uses this category-dependent similarity regularization. Experiments on hundreds of categories show that the method yields significant improvements for categories with few or even no positive examples.
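The abstract names the intersection kernel without giving its form. As a rough illustration only, the standard histogram intersection kernel sums the element-wise minima of two histograms. The sketch below shows that kernel alone, not the SIKMA training procedure; the function name and the sample histograms are assumptions made for the example.

```python
def intersection_kernel(x, y):
    """Histogram intersection kernel: sum of element-wise minima.

    x and y are equal-length sequences of non-negative bin counts.
    (Hypothetical helper for illustration; not taken from the dissertation.)
    """
    if len(x) != len(y):
        raise ValueError("histograms must have the same length")
    return sum(min(a, b) for a, b in zip(x, y))


# Two toy 3-bin histograms (made-up values).
h1 = [2, 5, 3]
h2 = [1, 6, 3]
similarity = intersection_kernel(h1, h2)  # min per bin: 1 + 5 + 3
```

The kernel is symmetric, and a histogram compared with itself scores its own total count, which is why it is often used as a similarity measure between bag-of-features histograms.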
Abstract:
Hand detection in images has important applications in recognizing human activities. This thesis focuses on the PASCAL Visual Object Classes (VOC) system for hand detection. VOC has become a popular system for object detection, based on twenty common objects, and was released with a successful deformable parts model in VOC2007. A hand is counted as detected when the system returns a bounding box whose overlap with any ground-truth bounding box for a hand in the image is at least 50%. The initial average precision of this detector is around 0.215, compared with a state-of-the-art value of 0.104; however, color and frequency features for detected bounding boxes contain important information for re-scoring, and the average precision can be improved to 0.218 with these features. Results show that these features help achieve higher precision at low recall, even though the average precision is similar.
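The 50% overlap criterion can be sketched as follows, assuming that "overlap" means intersection over union (IoU) of axis-aligned boxes given as (x_min, y_min, x_max, y_max). The abstract does not spell out the exact definition, so both the box format and the function names here are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_detection(predicted, ground_truths, threshold=0.5):
    """True if the predicted box overlaps any ground-truth box enough."""
    return any(iou(predicted, gt) >= threshold for gt in ground_truths)
```

For example, `is_detection((0, 0, 2, 2), [(0, 0, 2, 3)])` compares an intersection of area 4 with a union of area 6, which passes the 0.5 threshold.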
Abstract:
With the rise of smartphones, lifelogging devices (e.g. Google Glass) and the popularity of image-sharing websites (e.g. Flickr), users are capturing and sharing every aspect of their lives online, producing a wealth of visual content. The majority of these uploaded images are poorly annotated or exist in complete semantic isolation, which makes building retrieval systems difficult, as one must first understand the meaning of an image in order to retrieve it. To alleviate this problem, many image-sharing websites offer manual annotation tools which allow the user to “tag” their photos. However, these techniques are laborious and as a result have been poorly adopted; Sigurbjörnsson and van Zwol (2008) showed that 64% of images uploaded to Flickr are annotated with < 4 tags. Consequently, an entire body of research has focused on the automatic annotation of images (Hanbury, 2008; Smeulders et al., 2000; Zhang et al., 2012a), where one attempts to bridge the semantic gap between an image’s appearance and its meaning, e.g. the objects present. Despite two decades of research, the semantic gap still largely exists, and as a result automatic annotation models often offer unsatisfactory performance for industrial implementation. Further, these techniques can only annotate what they see, thus ignoring the “bigger picture” surrounding an image (e.g. its location, the event, the people present, etc.). Much work has therefore focused on building photo tag recommendation (PTR) methods which aid the user in the annotation process by suggesting tags related to those already present. These works have mainly focused on computing relationships between tags based on historical images, e.g. that NY and timessquare co-occur in many images and are therefore highly correlated. However, tags are inherently noisy, sparse and ill-defined, often resulting in poor PTR accuracy, e.g. does NY refer to New York or New Year?
This thesis proposes exploiting an image’s context which, unlike textual evidence, is always present, in order to alleviate this ambiguity in the tag recommendation process. Specifically, we exploit the “what, who, where, when and how” of the image capture process in order to complement textual evidence in various photo tag recommendation and retrieval scenarios. In part II, we combine text, content-based (e.g. # of faces present) and contextual (e.g. day-of-the-week taken) signals for tag recommendation purposes, achieving up to a 75% improvement in precision@5 in comparison to a text-only TF-IDF baseline. We then consider external knowledge sources (i.e. Wikipedia & Twitter) as an alternative to (slower moving) Flickr on which to build recommendation models, showing that similar accuracy could be achieved on these faster moving, yet entirely textual, datasets. In part II, we also highlight the merits of diversifying tag recommendation lists before discussing at length various problems with existing automatic image annotation and photo tag recommendation evaluation collections. In part III, we propose three new image retrieval scenarios, namely “visual event summarisation”, “image popularity prediction” and “lifelog summarisation”. In the first scenario, we attempt to produce a ranking of relevant and diverse images for various news events by (i) removing irrelevant images such as memes and visual duplicates (ii) before semantically clustering images based on the tweets in which they were originally posted. Using this approach, we were able to achieve over 50% precision for images in the top 5 ranks. In the second retrieval scenario, we show that by combining contextual and content-based features from images, we are able to predict whether an image will become “popular” (or not) with 74% accuracy, using an SVM classifier.
Finally, in chapter 9 we employ blur detection and perceptual-hash clustering in order to remove noisy images from lifelogs, before combining visual and geo-temporal signals in order to capture a user’s “key moments” within their day. We believe that the results of this thesis represent an important step towards building effective image retrieval models when sufficient textual content is lacking (i.e. a cold start).
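The precision@5 figures quoted in this abstract rest on a simple measure: the fraction of the top five suggestions that are relevant. A minimal sketch follows, with a hypothetical function name and made-up tags; the thesis's own evaluation protocol is not described in the abstract.

```python
def precision_at_k(recommended, relevant, k=5):
    """Fraction of the top-k recommended items that are relevant.

    recommended: ranked list of items (best first).
    relevant: set of items judged relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Made-up ranked tag suggestions and ground-truth tags for one photo.
suggested = ["nyc", "timessquare", "cat", "night", "taxi", "dog"]
truth = {"nyc", "timessquare", "taxi"}
score = precision_at_k(suggested, truth, k=5)  # 3 hits in the top 5
```

Dividing by k rather than by the number of items returned penalises lists shorter than k, which is one common convention; other evaluations divide by the list length instead.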
Abstract:
Most approaches to stereo visual odometry reconstruct the motion by tracking point features along a sequence of images. However, in low-textured scenes it is often difficult to find a large set of point features, or they may not be well distributed over the image, so that the performance of these algorithms deteriorates. This paper proposes a probabilistic approach to stereo visual odometry based on the combination of both point and line segment features that works robustly in a wide variety of scenarios. The camera motion is recovered through non-linear minimization of the projection errors of both point and line segment features. In order to effectively combine both types of features, their associated errors are weighted according to their covariance matrices, computed by propagating Gaussian-distributed errors in the sensor measurements. The method is computationally more expensive than using only one type of feature, but it can still run in real time on a standard computer and provides interesting advantages, including straightforward integration into any probabilistic framework commonly employed in mobile robotics.
Abstract:
The new generation of artificial satellites is providing a huge amount of Earth observation images whose exploitation can yield invaluable benefits, both economic and environmental. However, only a small fraction of this data volume has been analyzed, mainly due to the large human resources needed for that task. The development of unsupervised methodologies for the analysis of these images is therefore a priority. In this work, a new unsupervised segmentation algorithm for satellite images is proposed. This algorithm is based on rough-set theory, and it is inspired by a previous segmentation algorithm defined in the RGB color domain. The main contributions of the new algorithm are: (i) the original algorithm is extended to four spectral bands; (ii) the concept of the superpixel is used to define the neighborhood similarity of a pixel, adapted to the local characteristics of each image; and (iii) two new region-merging strategies are proposed and evaluated in order to establish the final number of regions in the segmented image. The experimental results show that the proposed approach improves on the results provided by the original method when both are applied to satellite images with different spectral and spatial resolutions.
Abstract:
In this thesis, an image enhancement application is developed for low-vision patients who use iPhones to view images and watch videos. The thesis makes two contributions. The first is a new image enhancement algorithm that incorporates human visual features. It is modified from a wavelet-transform-based image enhancement algorithm developed by Dr. Jinshan Tang. Unlike the original algorithm, the new algorithm incorporates human visual features, which makes it more effective. Experimental simulation results show that the proposed algorithm produces better visual results than the algorithm without visual features. The second contribution of this thesis is the development of a mobile image enhancement application. With this application, which I have developed, users with low vision can see clearer images on an iPhone on which it is installed.
Abstract:
Radio Simultaneous Localization and Mapping (SLAM) consists of simultaneously tracking the target and estimating the surrounding environment, in order to build a map and estimate the target's movements within it. It is an increasingly exploited technique for automotive applications, where it improves the localization of obstacles and of the target's movement relative to them; for emergency situations, for example when it is necessary to explore (with a drone or a robot) environments with limited visibility; and for personal radar applications, thanks to its versatility and low cost. Until now, these systems have been based on light detection and ranging (lidar) or visual cameras, which are high-accuracy but expensive approaches limited to specific environments and weather conditions. By contrast, radar-based systems operate in exactly the same way in smoke, fog or simple darkness. In this thesis, the Fourier-Mellin algorithm is analyzed and implemented to verify its applicability to Radio SLAM, in which the radar frames can be treated as images and the radar motion between consecutive frames can be recovered through image registration. Furthermore, a simplified version of that algorithm is proposed, in order to solve the problems the Fourier-Mellin algorithm has when working with real radar images and to improve performance. The INRAS RBK2, a MIMO 2x16 mmWave radar, is used for experimental acquisitions, consisting of multiple tests performed in Lab-E of the Cesena Campus, University of Bologna. The performance of Fourier-Mellin and of its simplified version is also compared with the MatchScan algorithm, a classic algorithm for SLAM systems.
Abstract:
The effects of ionic strength on ions in aqueous solutions are quite relevant, especially for biochemical systems, in which proteins and amino acids are involved. The teaching of this topic and more specifically, the Debye-Hückel limiting law, is central in chemistry undergraduate courses. In this work, we present a description of an experimental procedure based on the color change of aqueous solutions of bromocresol green (BCG), driven by addition of electrolyte. The contribution of charge product (z+|z-|) to the Debye-Hückel limiting law is demonstrated when the effects of NaCl and Na2SO4 on the color of BCG solutions are compared.
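As a worked illustration of the charge-product term, the Debye-Hückel limiting law can be written log10(γ±) = -A |z+ z-| √I, with ionic strength I = ½ Σ c_i z_i². The sketch below compares NaCl and Na2SO4 at the same salt concentration, assuming A ≈ 0.509 for water at 25 °C and an illustrative concentration of 0.001 mol/L. It computes the salts' own mean activity coefficients and is not the BCG color experiment described in the abstract.

```python
import math

# Debye-Hückel constant for water at 25 °C (log10 form); assumed value.
A_WATER_25C = 0.509


def ionic_strength(ions):
    """I = 1/2 * sum(c_i * z_i**2) for (concentration, charge) pairs."""
    return 0.5 * sum(c * z ** 2 for c, z in ions)


def mean_activity_coefficient(z_plus, z_minus, strength, a=A_WATER_25C):
    """Limiting law: log10(gamma) = -A * |z+ * z-| * sqrt(I)."""
    return 10 ** (-a * abs(z_plus * z_minus) * math.sqrt(strength))


c = 0.001  # mol/L of dissolved salt (illustrative value)

# NaCl -> Na+ + Cl-
i_nacl = ionic_strength([(c, +1), (c, -1)])
gamma_nacl = mean_activity_coefficient(+1, -1, i_nacl)

# Na2SO4 -> 2 Na+ + SO4^2-
i_na2so4 = ionic_strength([(2 * c, +1), (c, -2)])
gamma_na2so4 = mean_activity_coefficient(+1, -2, i_na2so4)
```

At equal salt concentration, Na2SO4 gives three times the ionic strength of NaCl and twice the charge product, so its mean activity coefficient departs further from 1.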
Abstract:
PURPOSE: To assess the perceptions and behavior of students with visual impairment regarding the devices and equipment used in the schooling process. METHODS: A cross-sectional descriptive study of a population of students aged 12 years or older with congenital or acquired visual impairment, enrolled in the public school system of Campinas during the year 2000. An interview questionnaire, developed on the basis of an exploratory study, was applied. RESULTS: A group of 26 students was obtained, 46.2% of them with low vision and 53.8% with blindness. Most of the students were in elementary education (65.4%), studying in schools with classrooms provided with devices (73.1%). Among the resources used in reading and writing activities, 94.1% of the students reported that they used the Braille system and 81.8% reported that the reading material was dictated by a classmate. Most of the students with low vision wore glasses (91.7%), and 33.3% used a magnifying glass as an optical device. Among the non-optical devices, the most common were environmental ones: moving closer to the blackboard (75.0%) and to the window (66.7%) for better lighting. CONCLUSIONS: It became evident that students with low vision made use of devices meant for blind people, such as the Braille system. Few low-vision students were observed making use of the optical and non-optical devices applicable to their problems, indicating a probable unawareness of their visual potential and of the appropriate devices to improve efficiency.
Abstract:
Universidade Estadual de Campinas. Faculdade de Educação Física