920 resultados para decoupled image-based visual servoing
Resumo:
Abstract : Images acquired from unmanned aerial vehicles (UAVs) can provide data with unprecedented spatial and temporal resolution for three-dimensional (3D) modeling. Solutions developed for this purpose are mainly operating based on photogrammetry concepts, namely UAV-Photogrammetry Systems (UAV-PS). Such systems are used in applications where both geospatial and visual information of the environment is required. These applications include, but are not limited to, natural resource management such as precision agriculture, military and police-related services such as traffic-law enforcement, precision engineering such as infrastructure inspection, and health services such as epidemic emergency management. UAV-photogrammetry systems can be differentiated based on their spatial characteristics in terms of accuracy and resolution. That is some applications, such as precision engineering, require high-resolution and high-accuracy information of the environment (e.g. 3D modeling with less than one centimeter accuracy and resolution). In other applications, lower levels of accuracy might be sufficient, (e.g. wildlife management needing few decimeters of resolution). However, even in those applications, the specific characteristics of UAV-PSs should be well considered in the steps of both system development and application in order to yield satisfying results. In this regard, this thesis presents a comprehensive review of the applications of unmanned aerial imagery, where the objective was to determine the challenges that remote-sensing applications of UAV systems currently face. This review also allowed recognizing the specific characteristics and requirements of UAV-PSs, which are mostly ignored or not thoroughly assessed in recent studies. Accordingly, the focus of the first part of this thesis is on exploring the methodological and experimental aspects of implementing a UAV-PS. The developed system was extensively evaluated for precise modeling of an open-pit gravel mine and performing volumetric-change measurements. This application was selected for two main reasons. Firstly, this case study provided a challenging environment for 3D modeling, in terms of scale changes, terrain relief variations as well as structure and texture diversities. Secondly, open-pit-mine monitoring demands high levels of accuracy, which justifies our efforts to improve the developed UAV-PS to its maximum capacities. The hardware of the system consisted of an electric-powered helicopter, a high-resolution digital camera, and an inertial navigation system. The software of the system included the in-house programs specifically designed for camera calibration, platform calibration, system integration, onboard data acquisition, flight planning and ground control point (GCP) detection. The detailed features of the system are discussed in the thesis, and solutions are proposed in order to enhance the system and its photogrammetric outputs. The accuracy of the results was evaluated under various mapping conditions, including direct georeferencing and indirect georeferencing with different numbers, distributions and types of ground control points. Additionally, the effects of imaging configuration and network stability on modeling accuracy were assessed. The second part of this thesis concentrates on improving the techniques of sparse and dense reconstruction. The proposed solutions are alternatives to traditional aerial photogrammetry techniques, properly adapted to specific characteristics of unmanned, low-altitude imagery. Firstly, a method was developed for robust sparse matching and epipolar-geometry estimation. The main achievement of this method was its capacity to handle a very high percentage of outliers (errors among corresponding points) with remarkable computational efficiency (compared to the state-of-the-art techniques). Secondly, a block bundle adjustment (BBA) strategy was proposed based on the integration of intrinsic camera calibration parameters as pseudo-observations to Gauss-Helmert model. The principal advantage of this strategy was controlling the adverse effect of unstable imaging networks and noisy image observations on the accuracy of self-calibration. The sparse implementation of this strategy was also performed, which allowed its application to data sets containing a lot of tie points. Finally, the concepts of intrinsic curves were revisited for dense stereo matching. The proposed technique could achieve a high level of accuracy and efficiency by searching only through a small fraction of the whole disparity search space as well as internally handling occlusions and matching ambiguities. These photogrammetric solutions were extensively tested using synthetic data, close-range images and the images acquired from the gravel-pit mine. Achieving absolute 3D mapping accuracy of 11±7 mm illustrated the success of this system for high-precision modeling of the environment.
Resumo:
PURPOSE: To analyze the outcomes of intracorneal ring segment (ICRS) implantation for the treatment of keratoconus based on preoperative visual impairment. DESIGN: Multicenter, retrospective, nonrandomized study. METHODS: A total of 611 eyes of 361 keratoconic patients were evaluated. Subjects were classified according to their preoperative corrected distance visual acuity (CDVA) into 5 different groups: grade I, CDVA of 0.90 or better; grade II, CDVA equal to or better than 0.60 and worse than 0.90; grade III, CDVA equal to or better than 0.40 and worse than 0.60; grade IV, CDVA equal to or better than 0.20 and worse than 0.40; and grade plus, CDVA worse than 0.20. Success and failure indices were defined based on visual, refractive, corneal topographic, and aberrometric data and evaluated in each group 6 months after ICRS implantation. RESULTS: Significant improvement after the procedure was observed regarding uncorrected distance visual acuity in all grades (P < .05). CDVA significantly decreased in grade I (P < .01) but significantly increased in all other grades (P < .05). A total of 37.9% of patients with preoperative CDVA 0.6 or better gained 1 or more lines of CDVA, whereas 82.8% of patients with preoperative CDVA 0.4 or worse gained 1 or more lines of CDVA (P < .01). Spherical equivalent and keratometry readings showed a significant reduction in all grades (P ≤ .02). Corneal higher-order aberrations did not change after the procedure (P ≥ .05). CONCLUSIONS: Based on preoperative visual impairment, ICRS implantation provides significantly better results in patients with a severe form of the disease. A notable loss of CDVA lines can be expected in patients with a milder form of keratoconus.
Resumo:
The goal of image retrieval and matching is to find and locate object instances in images from a large-scale image database. While visual features are abundant, how to combine them to improve performance by individual features remains a challenging task. In this work, we focus on leveraging multiple features for accurate and efficient image retrieval and matching. We first propose two graph-based approaches to rerank initially retrieved images for generic image retrieval. In the graph, vertices are images while edges are similarities between image pairs. Our first approach employs a mixture Markov model based on a random walk model on multiple graphs to fuse graphs. We introduce a probabilistic model to compute the importance of each feature for graph fusion under a naive Bayesian formulation, which requires statistics of similarities from a manually labeled dataset containing irrelevant images. To reduce human labeling, we further propose a fully unsupervised reranking algorithm based on a submodular objective function that can be efficiently optimized by greedy algorithm. By maximizing an information gain term over the graph, our submodular function favors a subset of database images that are similar to query images and resemble each other. The function also exploits the rank relationships of images from multiple ranked lists obtained by different features. We then study a more well-defined application, person re-identification, where the database contains labeled images of human bodies captured by multiple cameras. Re-identifications from multiple cameras are regarded as related tasks to exploit shared information. We apply a novel multi-task learning algorithm using both low level features and attributes. A low rank attribute embedding is joint learned within the multi-task learning formulation to embed original binary attributes to a continuous attribute space, where incorrect and incomplete attributes are rectified and recovered. To locate objects in images, we design an object detector based on object proposals and deep convolutional neural networks (CNN) in view of the emergence of deep networks. We improve a Fast RCNN framework and investigate two new strategies to detect objects accurately and efficiently: scale-dependent pooling (SDP) and cascaded rejection classifiers (CRC). The SDP improves detection accuracy by exploiting appropriate convolutional features depending on the scale of input object proposals. The CRC effectively utilizes convolutional features and greatly eliminates negative proposals in a cascaded manner, while maintaining a high recall for true objects. The two strategies together improve the detection accuracy and reduce the computational cost.
Resumo:
Context: Mobile applications support a set of user-interaction features that are independent of the application logic. Rotating the device, scrolling, or zooming are examples of such features. Some bugs in mobile applications can be attributed to user-interaction features. Objective: This paper proposes and evaluates a bug analyzer based on user-interaction features that uses digital image processing to find bugs. Method: Our bug analyzer detects bugs by comparing the similarity between images taken before and after a user-interaction. SURF, an interest point detector and descriptor, is used to compare the images. To evaluate the bug analyzer, we conducted a case study with 15 randomly selected mobile applications. First, we identified user-interaction bugs by manually testing the applications. Images were captured before and after applying each user-interaction feature. Then, image pairs were processed with SURF to obtain interest points, from which a similarity percentage was computed, to finally decide whether there was a bug. Results: We performed a total of 49 user-interaction feature tests. When manually testing the applications, 17 bugs were found, whereas when using image processing, 15 bugs were detected. Conclusions: 8 out of 15 mobile applications tested had bugs associated to user-interaction features. Our bug analyzer based on image processing was able to detect 88% (15 out of 17) of the user-interaction bugs found with manual testing.
Resumo:
Visual recognition is a fundamental research topic in computer vision. This dissertation explores datasets, features, learning, and models used for visual recognition. In order to train visual models and evaluate different recognition algorithms, this dissertation develops an approach to collect object image datasets on web pages using an analysis of text around the image and of image appearance. This method exploits established online knowledge resources (Wikipedia pages for text; Flickr and Caltech data sets for images). The resources provide rich text and object appearance information. This dissertation describes results on two datasets. The first is Berg’s collection of 10 animal categories; on this dataset, we significantly outperform previous approaches. On an additional set of 5 categories, experimental results show the effectiveness of the method. Images are represented as features for visual recognition. This dissertation introduces a text-based image feature and demonstrates that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the Internet. Image tags are noisy. The method obtains the text features of an unannotated image from the tags of its k-nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples. This text feature may not change, because the auxiliary dataset likely contains a similar picture. While the tags associated with images are noisy, they are more stable when appearance changes. The performance of this feature is tested using PASCAL VOC 2006 and 2007 datasets. This feature performs well; it consistently improves the performance of visual object classifiers, and is particularly effective when the training dataset is small. With more and more collected training data, computational cost becomes a bottleneck, especially when training sophisticated classifiers such as kernelized SVM. This dissertation proposes a fast training algorithm called Stochastic Intersection Kernel Machine (SIKMA). This proposed training method will be useful for many vision problems, as it can produce a kernel classifier that is more accurate than a linear classifier, and can be trained on tens of thousands of examples in two minutes. It processes training examples one by one in a sequence, so memory cost is no longer the bottleneck to process large scale datasets. This dissertation applies this approach to train classifiers of Flickr groups with many group training examples. The resulting Flickr group prediction scores can be used to measure image similarity between two images. Experimental results on the Corel dataset and a PASCAL VOC dataset show the learned Flickr features perform better on image matching, retrieval, and classification than conventional visual features. Visual models are usually trained to best separate positive and negative training examples. However, when recognizing a large number of object categories, there may not be enough training examples for most objects, due to the intrinsic long-tailed distribution of objects in the real world. This dissertation proposes an approach to use comparative object similarity. The key insight is that, given a set of object categories which are similar and a set of categories which are dissimilar, a good object model should respond more strongly to examples from similar categories than to examples from dissimilar categories. This dissertation develops a regularized kernel machine algorithm to use this category dependent similarity regularization. Experiments on hundreds of categories show that our method can make significant improvement for categories with few or even no positive examples.
Resumo:
Hand detection on images has important applications on person activities recognition. This thesis focuses on PASCAL Visual Object Classes (VOC) system for hand detection. VOC has become a popular system for object detection, based on twenty common objects, and has been released with a successful deformable parts model in VOC2007. A hand detection on an image is made when the system gets a bounding box which overlaps with at least 50% of any ground truth bounding box for a hand on the image. The initial average precision of this detector is around 0.215 compared with a state-of-art of 0.104; however, color and frequency features for detected bounding boxes contain important information for re-scoring, and the average precision can be improved to 0.218 with these features. Results show that these features help on getting higher precision for low recall, even though the average precision is similar.
Resumo:
With the rise of smart phones, lifelogging devices (e.g. Google Glass) and popularity of image sharing websites (e.g. Flickr), users are capturing and sharing every aspect of their life online producing a wealth of visual content. Of these uploaded images, the majority are poorly annotated or exist in complete semantic isolation making the process of building retrieval systems difficult as one must firstly understand the meaning of an image in order to retrieve it. To alleviate this problem, many image sharing websites offer manual annotation tools which allow the user to “tag” their photos, however, these techniques are laborious and as a result have been poorly adopted; Sigurbjörnsson and van Zwol (2008) showed that 64% of images uploaded to Flickr are annotated with < 4 tags. Due to this, an entire body of research has focused on the automatic annotation of images (Hanbury, 2008; Smeulders et al., 2000; Zhang et al., 2012a) where one attempts to bridge the semantic gap between an image’s appearance and meaning e.g. the objects present. Despite two decades of research the semantic gap still largely exists and as a result automatic annotation models often offer unsatisfactory performance for industrial implementation. Further, these techniques can only annotate what they see, thus ignoring the “bigger picture” surrounding an image (e.g. its location, the event, the people present etc). Much work has therefore focused on building photo tag recommendation (PTR) methods which aid the user in the annotation process by suggesting tags related to those already present. These works have mainly focused on computing relationships between tags based on historical images e.g. that NY and timessquare co-exist in many images and are therefore highly correlated. However, tags are inherently noisy, sparse and ill-defined often resulting in poor PTR accuracy e.g. does NY refer to New York or New Year? This thesis proposes the exploitation of an image’s context which, unlike textual evidences, is always present, in order to alleviate this ambiguity in the tag recommendation process. Specifically we exploit the “what, who, where, when and how” of the image capture process in order to complement textual evidences in various photo tag recommendation and retrieval scenarios. In part II, we combine text, content-based (e.g. # of faces present) and contextual (e.g. day-of-the-week taken) signals for tag recommendation purposes, achieving up to a 75% improvement to precision@5 in comparison to a text-only TF-IDF baseline. We then consider external knowledge sources (i.e. Wikipedia & Twitter) as an alternative to (slower moving) Flickr in order to build recommendation models on, showing that similar accuracy could be achieved on these faster moving, yet entirely textual, datasets. In part II, we also highlight the merits of diversifying tag recommendation lists before discussing at length various problems with existing automatic image annotation and photo tag recommendation evaluation collections. In part III, we propose three new image retrieval scenarios, namely “visual event summarisation”, “image popularity prediction” and “lifelog summarisation”. In the first scenario, we attempt to produce a rank of relevant and diverse images for various news events by (i) removing irrelevant images such memes and visual duplicates (ii) before semantically clustering images based on the tweets in which they were originally posted. Using this approach, we were able to achieve over 50% precision for images in the top 5 ranks. In the second retrieval scenario, we show that by combining contextual and content-based features from images, we are able to predict if it will become “popular” (or not) with 74% accuracy, using an SVM classifier. Finally, in chapter 9 we employ blur detection and perceptual-hash clustering in order to remove noisy images from lifelogs, before combining visual and geo-temporal signals in order to capture a user’s “key moments” within their day. We believe that the results of this thesis show an important step towards building effective image retrieval models when there lacks sufficient textual content (i.e. a cold start).
Resumo:
Most approaches to stereo visual odometry reconstruct the motion based on the tracking of point features along a sequence of images. However, in low-textured scenes it is often difficult to encounter a large set of point features, or it may happen that they are not well distributed over the image, so that the behavior of these algorithms deteriorates. This paper proposes a probabilistic approach to stereo visual odometry based on the combination of both point and line segment that works robustly in a wide variety of scenarios. The camera motion is recovered through non-linear minimization of the projection errors of both point and line segment features. In order to effectively combine both types of features, their associated errors are weighted according to their covariance matrices, computed from the propagation of Gaussian distribution errors in the sensor measurements. The method, of course, is computationally more expensive that using only one type of feature, but still can run in real-time on a standard computer and provides interesting advantages, including a straightforward integration into any probabilistic framework commonly employed in mobile robotics.
Resumo:
The new generation of artificial satellites is providing a huge amount of Earth observation images whose exploitation can report invaluable benefits, both economical and environmental. However, only a small fraction of this data volume has been analyzed, mainly due to the large human resources needed for that task. In this sense, the development of unsupervised methodologies for the analysis of these images is a priority. In this work, a new unsupervised segmentation algorithm for satellite images is proposed. This algorithm is based on the rough-set theory, and it is inspired by a previous segmentation algorithm defined in the RGB color domain. The main contributions of the new algorithm are: (i) extending the original algorithm to four spectral bands; (ii) the concept of the superpixel is used in order to define the neighborhood similarity of a pixel adapted to the local characteristics of each image; (iii) and two new region merged strategies are proposed and evaluated in order to establish the final number of regions in the segmented image. The experimental results show that the proposed approach improves the results provided by the original method when both are applied to satellite images with different spectral and spatial resolutions.
Resumo:
In this thesis, an image enhancement application is developed for low-vision patients when they use iPhones to see images/watch videos. The thesis has two contributions. The first contribution is the new image enhancement algorithm which combines human vision features. The new image enhancement algorithm is modified from a wavelet transform based image enhancement algorithm developed by Dr. Jinshan Tang. Different from the original algorithm, the new image enhancement algorithm combines human visual feature into the algorithm and thus can make the new algorithm more effective. Experimental simulation results show that the proposed algorithm has better visual results than the algorithm without combining visual features. The second contribution of this thesis is the development of a mobile image enhancement application. In this application, users with low-vision can see clearer images on an iPhone which is installed with the application I have developed.
Resumo:
Radio Simultaneous Location and Mapping (SLAM) consists of the simultaneous tracking of the target and estimation of the surrounding environment, to build a map and estimate the target movements within it. It is an increasingly exploited technique for automotive applications, in order to improve the localization of obstacles and the target relative movement with respect to them, for emergency situations, for example when it is necessary to explore (with a drone or a robot) environments with a limited visibility, or for personal radar applications, thanks to its versatility and cheapness. Until today, these systems were based on light detection and ranging (lidar) or visual cameras, high-accuracy and expensive approaches that are limited to specific environments and weather conditions. Instead, in case of smoke, fog or simply darkness, radar-based systems can operate exactly in the same way. In this thesis activity, the Fourier-Mellin algorithm is analyzed and implemented, to verify the applicability to Radio SLAM, in which the radar frames can be treated as images and the radar motion between consecutive frames can be covered with registration. Furthermore, a simplified version of that algorithm is proposed, in order to solve the problems of the Fourier-Mellin algorithm when working with real radar images and improve the performance. The INRAS RBK2, a MIMO 2x16 mmWave radar, is used for experimental acquisitions, consisting of multiple tests performed in Lab-E of the Cesena Campus, University of Bologna. The different performances of Fourier-Mellin and its simplified version are compared also with the MatchScan algorithm, a classic algorithm for SLAM systems.
Resumo:
The effects of ionic strength on ions in aqueous solutions are quite relevant, especially for biochemical systems, in which proteins and amino acids are involved. The teaching of this topic and more specifically, the Debye-Hückel limiting law, is central in chemistry undergraduate courses. In this work, we present a description of an experimental procedure based on the color change of aqueous solutions of bromocresol green (BCG), driven by addition of electrolyte. The contribution of charge product (z+|z-|) to the Debye-Hückel limiting law is demonstrated when the effects of NaCl and Na2SO4 on the color of BCG solutions are compared.
Resumo:
PURPOSE: To verify perceptions and conduct of students with visual impairment regarding devices and equipment utilized in schooling process. METHODS: A transversal descriptive study on a population of 12-year-old or older students in schooling process, affected by congenital or acquired visual impairment, inserted in the government teaching system of Campinas during the year 2000. An interview quiz, created based on an exploratory study was applied. RESULTS: A group of 26 students, 46% of them with low vision and 53.8% affected by blindness was obtained. Most of the students were from fundamental teaching courses (65.4%), studying in schools with classrooms provided with devices (73.1%). Among the resources used in reading and writing activities, 94.1% of the students reported they used the Braille system and 81.8% reported that the reading subject was dictated by a colleague. Most of the students with low vision wore glasses (91.7%), and 33.3% utilized a magnifying glass as optical devices. Among the non-optical devices, the most common were the environmental ones, getting closer to the blackboard (75.0%) and to the window (66.7%) for better lighting. CONCLUSIONS: It became evident that students with low vision eye-sight made use of devices meant for bearers of blindness, such as applying the Braille system. A reduced number of low vision students making use of optical and non-optical devices applicable to their problems were observed, indicating a probable unawareness of their visual potential and the appropriate devices to improve efficiency.