899 resultados para computer vision, geometric variations, congealing, unsupervised image alignment
Resumo:
AIRES, Kelson R. T. ; ARAÚJO, Hélder J. ; MEDEIROS, Adelardo A. D. . Plane Detection from Monocular Image Sequences. In: VISUALIZATION, IMAGING AND IMAGE PROCESSING, 2008, Palma de Mallorca, Spain. Proceedings..., Palma de Mallorca: VIIP, 2008
Resumo:
AIRES, Kelson R. T. ; ARAÚJO, Hélder J. ; MEDEIROS, Adelardo A. D. . Plane Detection from Monocular Image Sequences. In: VISUALIZATION, IMAGING AND IMAGE PROCESSING, 2008, Palma de Mallorca, Spain. Proceedings..., Palma de Mallorca: VIIP, 2008
Resumo:
Image (Video) retrieval is an interesting problem of retrieving images (videos) similar to the query. Images (Videos) are represented in an input (feature) space and similar images (videos) are obtained by finding nearest neighbors in the input representation space. Numerous input representations both in real valued and binary space have been proposed for conducting faster retrieval. In this thesis, we present techniques that obtain improved input representations for retrieval in both supervised and unsupervised settings for images and videos. Supervised retrieval is a well known problem of retrieving same class images of the query. We address the practical aspects of achieving faster retrieval with binary codes as input representations for the supervised setting in the first part, where binary codes are used as addresses into hash tables. In practice, using binary codes as addresses does not guarantee fast retrieval, as similar images are not mapped to the same binary code (address). We address this problem by presenting an efficient supervised hashing (binary encoding) method that aims to explicitly map all the images of the same class ideally to a unique binary code. We refer to the binary codes of the images as `Semantic Binary Codes' and the unique code for all same class images as `Class Binary Code'. We also propose a new class based Hamming metric that dramatically reduces the retrieval times for larger databases, where only hamming distance is computed to the class binary codes. We also propose a Deep semantic binary code model, by replacing the output layer of a popular convolutional Neural Network (AlexNet) with the class binary codes and show that the hashing functions learned in this way outperforms the state of the art, and at the same time provide fast retrieval times. In the second part, we also address the problem of supervised retrieval by taking into account the relationship between classes. For a given query image, we want to retrieve images that preserve the relative order i.e. we want to retrieve all same class images first and then, the related classes images before different class images. We learn such relationship aware binary codes by minimizing the similarity between inner product of the binary codes and the similarity between the classes. We calculate the similarity between classes using output embedding vectors, which are vector representations of classes. Our method deviates from the other supervised binary encoding schemes as it is the first to use output embeddings for learning hashing functions. We also introduce new performance metrics that take into account the related class retrieval results and show significant gains over the state of the art. High Dimensional descriptors like Fisher Vectors or Vector of Locally Aggregated Descriptors have shown to improve the performance of many computer vision applications including retrieval. In the third part, we will discuss an unsupervised technique for compressing high dimensional vectors into high dimensional binary codes, to reduce storage complexity. In this approach, we deviate from adopting traditional hyperplane hashing functions and instead learn hyperspherical hashing functions. The proposed method overcomes the computational challenges of directly applying the spherical hashing algorithm that is intractable for compressing high dimensional vectors. A practical hierarchical model that utilizes divide and conquer techniques using the Random Select and Adjust (RSA) procedure to compress such high dimensional vectors is presented. We show that our proposed high dimensional binary codes outperform the binary codes obtained using traditional hyperplane methods for higher compression ratios. In the last part of the thesis, we propose a retrieval based solution to the Zero shot event classification problem - a setting where no training videos are available for the event. To do this, we learn a generic set of concept detectors and represent both videos and query events in the concept space. We then compute similarity between the query event and the video in the concept space and videos similar to the query event are classified as the videos belonging to the event. We show that we significantly boost the performance using concept features from other modalities.
Resumo:
Dissertação de Mestrado, Engenharia Informática, Faculdade de Ciências e Tecnologia, Universidade do Algarve, 2014
Resumo:
To date, automatic recognition of semantic information such as salient objects and mid-level concepts from images is a challenging task. Since real-world objects tend to exist in a context within their environment, the computer vision researchers have increasingly incorporated contextual information for improving object recognition. In this paper, we present a method to build a visual contextual ontology from salient objects descriptions for image annotation. The ontologies include not only partOf/kindOf relations, but also spatial and co-occurrence relations. A two-step image annotation algorithm is also proposed based on ontology relations and probabilistic inference. Different from most of the existing work, we specially exploit how to combine representation of ontology, contextual knowledge and probabilistic inference. The experiments show that image annotation results are improved in the LabelMe dataset.
Resumo:
Light Detection and Ranging (LIDAR) has great potential to assist vegetation management in power line corridors by providing more accurate geometric information of the power line assets and vegetation along the corridors. However, the development of algorithms for the automatic processing of LIDAR point cloud data, in particular for feature extraction and classification of raw point cloud data, is in still in its infancy. In this paper, we take advantage of LIDAR intensity and try to classify ground and non-ground points by statistically analyzing the skewness and kurtosis of the intensity data. Moreover, the Hough transform is employed to detected power lines from the filtered object points. The experimental results show the effectiveness of our methods and indicate that better results were obtained by using LIDAR intensity data than elevation data.
Resumo:
Semi-automatic segmentation of still images has vast and varied practical applications. Recently, an approach "GrabCut" has managed to successfully build upon earlier approaches based on colour and gradient information in order to address the problem of efficient extraction of a foreground object in a complex environment. In this paper, we extend the GrabCut algorithm further by applying an unsupervised algorithm for modelling the Gaussian Mixtures that are used to define the foreground and background in the segmentation algorithm. We show examples where the optimisation of the GrabCut framework leads to further improvements in performance.
Resumo:
Competent navigation in an environment is a major requirement for an autonomous mobile robot to accomplish its mission. Nowadays, many successful systems for navigating a mobile robot use an internal map which represents the environment in a detailed geometric manner. However, building, maintaining and using such environment maps for navigation is difficult because of perceptual aliasing and measurement noise. Moreover, geometric maps require the processing of huge amounts of data which is computationally expensive. This thesis addresses the problem of vision-based topological mapping and localisation for mobile robot navigation. Topological maps are concise and graphical representations of environments that are scalable and amenable to symbolic manipulation. Thus, they are well-suited for basic robot navigation applications, and also provide a representational basis for the procedural and semantic information needed for higher-level robotic tasks. In order to make vision-based topological navigation suitable for inexpensive mobile robots for the mass market we propose to characterise key places of the environment based on their visual appearance through colour histograms. The approach for representing places using visual appearance is based on the fact that colour histograms change slowly as the field of vision sweeps the scene when a robot moves through an environment. Hence, a place represents a region of the environment rather than a single position. We demonstrate in experiments using an indoor data set, that a topological map in which places are characterised using visual appearance augmented with metric clues provides sufficient information to perform continuous metric localisation which is robust to the kidnapped robot problem. Many topological mapping methods build a topological map by clustering visual observations to places. However, due to perceptual aliasing observations from different places may be mapped to the same place representative in the topological map. A main contribution of this thesis is a novel approach for dealing with the perceptual aliasing problem in topological mapping. We propose to incorporate neighbourhood relations for disambiguating places which otherwise are indistinguishable. We present a constraint based stochastic local search method which integrates the approach for place disambiguation in order to induce a topological map. Experiments show that the proposed method is capable of mapping environments with a high degree of perceptual aliasing, and that a small map is found quickly. Moreover, the method of using neighbourhood information for place disambiguation is integrated into a framework for topological off-line simultaneous localisation and mapping which does not require an initial categorisation of visual observations. Experiments on an indoor data set demonstrate the suitability of our method to reliably localise the robot while building a topological map.
Resumo:
Machine vision represents a particularly attractive solution for sensing and detecting potential collision-course targets due to the relatively low cost, size, weight, and power requirements of the sensors involved (as opposed to radar). This paper describes the development and evaluation of a vision-based collision detection algorithm suitable for fixed-wing aerial robotics. The system was evaluated using highly realistic vision data of the moments leading up to a collision. Based on the collected data, our detection approaches were able to detect targets at distances ranging from 400m to about 900m. These distances (with some assumptions about closing speeds and aircraft trajectories) translate to an advanced warning of between 8-10 seconds ahead of impact, which approaches the 12.5 second response time recommended for human pilots. We make use of the enormous potential of graphic processing units to achieve processing rates of 30Hz (for images of size 1024-by- 768). Currently, integration in the final platform is under way.
Resumo:
The article described an open-source toolbox for machine vision called Machine Vision Toolbox (MVT). MVT includes more than 60 functions including image file reading and writing, acquisition, display, filtering, blob, point and line feature extraction, mathematical morphology, homographies, visual Jacobians, camera calibration, and color space conversion. MVT can be used for research into machine vision but is also versatile enough to be usable for real-time work and even control. MVT, combined with MATLAB and a model workstation computer, is a useful and convenient environment for the investigation of machine vision algorithms. The article illustrated the use of a subset of toolbox functions for some typical problems and described MVT operations including the simulation of a complete image-based visual servo system.
Resumo:
Uninhabited aerial vehicles (UAVs) are a cutting-edge technology that is at the forefront of aviation/aerospace research and development worldwide. Many consider their current military and defence applications as just a token of their enormous potential. Unlocking and fully exploiting this potential will see UAVs in a multitude of civilian applications and routinely operating alongside piloted aircraft. The key to realising the full potential of UAVs lies in addressing a host of regulatory, public relation, and technological challenges never encountered be- fore. Aircraft collision avoidance is considered to be one of the most important issues to be addressed, given its safety critical nature. The collision avoidance problem can be roughly organised into three areas: 1) Sense; 2) Detect; and 3) Avoid. Sensing is concerned with obtaining accurate and reliable information about other aircraft in the air; detection involves identifying potential collision threats based on available information; avoidance deals with the formulation and execution of appropriate manoeuvres to maintain safe separation. This thesis tackles the detection aspect of collision avoidance, via the development of a target detection algorithm that is capable of real-time operation onboard a UAV platform. One of the key challenges of the detection problem is the need to provide early warning. This translates to detecting potential threats whilst they are still far away, when their presence is likely to be obscured and hidden by noise. Another important consideration is the choice of sensors to capture target information, which has implications for the design and practical implementation of the detection algorithm. The main contributions of the thesis are: 1) the proposal of a dim target detection algorithm combining image morphology and hidden Markov model (HMM) filtering approaches; 2) the novel use of relative entropy rate (RER) concepts for HMM filter design; 3) the characterisation of algorithm detection performance based on simulated data as well as real in-flight target image data; and 4) the demonstration of the proposed algorithm's capacity for real-time target detection. We also consider the extension of HMM filtering techniques and the application of RER concepts for target heading angle estimation. In this thesis we propose a computer-vision based detection solution, due to the commercial-off-the-shelf (COTS) availability of camera hardware and the hardware's relatively low cost, power, and size requirements. The proposed target detection algorithm adopts a two-stage processing paradigm that begins with an image enhancement pre-processing stage followed by a track-before-detect (TBD) temporal processing stage that has been shown to be effective in dim target detection. We compare the performance of two candidate morphological filters for the image pre-processing stage, and propose a multiple hidden Markov model (MHMM) filter for the TBD temporal processing stage. The role of the morphological pre-processing stage is to exploit the spatial features of potential collision threats, while the MHMM filter serves to exploit the temporal characteristics or dynamics. The problem of optimising our proposed MHMM filter has been examined in detail. Our investigation has produced a novel design process for the MHMM filter that exploits information theory and entropy related concepts. The filter design process is posed as a mini-max optimisation problem based on a joint RER cost criterion. We provide proof that this joint RER cost criterion provides a bound on the conditional mean estimate (CME) performance of our MHMM filter, and this in turn establishes a strong theoretical basis connecting our filter design process to filter performance. Through this connection we can intelligently compare and optimise candidate filter models at the design stage, rather than having to resort to time consuming Monte Carlo simulations to gauge the relative performance of candidate designs. Moreover, the underlying entropy concepts are not constrained to any particular model type. This suggests that the RER concepts established here may be generalised to provide a useful design criterion for multiple model filtering approaches outside the class of HMM filters. In this thesis we also evaluate the performance of our proposed target detection algorithm under realistic operation conditions, and give consideration to the practical deployment of the detection algorithm onboard a UAV platform. Two fixed-wing UAVs were engaged to recreate various collision-course scenarios to capture highly realistic vision (from an onboard camera perspective) of the moments leading up to a collision. Based on this collected data, our proposed detection approach was able to detect targets out to distances ranging from about 400m to 900m. These distances, (with some assumptions about closing speeds and aircraft trajectories) translate to an advanced warning ahead of impact that approaches the 12.5 second response time recommended for human pilots. Furthermore, readily available graphic processing unit (GPU) based hardware is exploited for its parallel computing capabilities to demonstrate the practical feasibility of the proposed target detection algorithm. A prototype hardware-in- the-loop system has been found to be capable of achieving data processing rates sufficient for real-time operation. There is also scope for further improvement in performance through code optimisations. Overall, our proposed image-based target detection algorithm offers UAVs a cost-effective real-time target detection capability that is a step forward in ad- dressing the collision avoidance issue that is currently one of the most significant obstacles preventing widespread civilian applications of uninhabited aircraft. We also highlight that the algorithm development process has led to the discovery of a powerful multiple HMM filtering approach and a novel RER-based multiple filter design process. The utility of our multiple HMM filtering approach and RER concepts, however, extend beyond the target detection problem. This is demonstrated by our application of HMM filters and RER concepts to a heading angle estimation problem.