903 resultados para Computer vision industry
Resumo:
Il seguente elaborato di tesi tratta il problema della pianificazione di voli fotogrammetrici a bassa quota mediante l’uso di SAPR, in particolare è presentata una disamina delle principali applicazioni che permettono di programmare una copertura fotogrammetrica trasversale e longitudinale di un certo poligono con un drone commerciale. Il tema principale sviluppato è la gestione di un volo fotogrammetrico UAV mediante l’uso di applicativi software che permettono all’utente di inserire i parametri di volo in base alla tipologia di rilievo che vuole effettuare. L’obbiettivo finale è quello di ottenere una corretta presa fotogrammetrica da utilizzare per la creazione di un modello digitale del terreno o di un oggetto attraverso elaborazione dati in post-processing. La perfetta configurazione del volo non può prescindere dalle conoscenze base di fotogrammetria e delle meccaniche di un veicolo UAV. I capitoli introduttivi tratteranno infatti i principi della fotogrammetria analogica e digitale soffermandosi su temi utili alla comprensione delle problematiche relative al progetto di rilievo fotogrammetrico aereo. Una particolare attenzione è stata posta sulle nozioni di fotogrammetria digitale che, insieme agli algoritmi di Imagine Matching derivanti dalla Computer Vision, permette di definire il ramo della Fotogrammetria Moderna. Nei capitoli centrali verranno esaminate e confrontate una serie di applicazioni commerciali per smartphone e tablet, disponibili per sistemi Apple e Android, per trarne un breve resoconto conclusivo che le compari in termini di accessibilità, potenzialità e destinazione d’uso. Per una maggiore comprensione si determinano univocamente gli acronimi con cui i droni vengono chiamati nei diversi contesti: UAV (Unmanned Aerial Vehicle), SAPR (Sistemi Aeromobili a Pilotaggio Remoto), RPAS (Remotely Piloted Aicraft System), ARP (Aeromobili a Pilotaggio Remoto).
Resumo:
Abstract : Images acquired from unmanned aerial vehicles (UAVs) can provide data with unprecedented spatial and temporal resolution for three-dimensional (3D) modeling. Solutions developed for this purpose are mainly operating based on photogrammetry concepts, namely UAV-Photogrammetry Systems (UAV-PS). Such systems are used in applications where both geospatial and visual information of the environment is required. These applications include, but are not limited to, natural resource management such as precision agriculture, military and police-related services such as traffic-law enforcement, precision engineering such as infrastructure inspection, and health services such as epidemic emergency management. UAV-photogrammetry systems can be differentiated based on their spatial characteristics in terms of accuracy and resolution. That is some applications, such as precision engineering, require high-resolution and high-accuracy information of the environment (e.g. 3D modeling with less than one centimeter accuracy and resolution). In other applications, lower levels of accuracy might be sufficient, (e.g. wildlife management needing few decimeters of resolution). However, even in those applications, the specific characteristics of UAV-PSs should be well considered in the steps of both system development and application in order to yield satisfying results. In this regard, this thesis presents a comprehensive review of the applications of unmanned aerial imagery, where the objective was to determine the challenges that remote-sensing applications of UAV systems currently face. This review also allowed recognizing the specific characteristics and requirements of UAV-PSs, which are mostly ignored or not thoroughly assessed in recent studies. Accordingly, the focus of the first part of this thesis is on exploring the methodological and experimental aspects of implementing a UAV-PS. The developed system was extensively evaluated for precise modeling of an open-pit gravel mine and performing volumetric-change measurements. This application was selected for two main reasons. Firstly, this case study provided a challenging environment for 3D modeling, in terms of scale changes, terrain relief variations as well as structure and texture diversities. Secondly, open-pit-mine monitoring demands high levels of accuracy, which justifies our efforts to improve the developed UAV-PS to its maximum capacities. The hardware of the system consisted of an electric-powered helicopter, a high-resolution digital camera, and an inertial navigation system. The software of the system included the in-house programs specifically designed for camera calibration, platform calibration, system integration, onboard data acquisition, flight planning and ground control point (GCP) detection. The detailed features of the system are discussed in the thesis, and solutions are proposed in order to enhance the system and its photogrammetric outputs. The accuracy of the results was evaluated under various mapping conditions, including direct georeferencing and indirect georeferencing with different numbers, distributions and types of ground control points. Additionally, the effects of imaging configuration and network stability on modeling accuracy were assessed. The second part of this thesis concentrates on improving the techniques of sparse and dense reconstruction. The proposed solutions are alternatives to traditional aerial photogrammetry techniques, properly adapted to specific characteristics of unmanned, low-altitude imagery. Firstly, a method was developed for robust sparse matching and epipolar-geometry estimation. The main achievement of this method was its capacity to handle a very high percentage of outliers (errors among corresponding points) with remarkable computational efficiency (compared to the state-of-the-art techniques). Secondly, a block bundle adjustment (BBA) strategy was proposed based on the integration of intrinsic camera calibration parameters as pseudo-observations to Gauss-Helmert model. The principal advantage of this strategy was controlling the adverse effect of unstable imaging networks and noisy image observations on the accuracy of self-calibration. The sparse implementation of this strategy was also performed, which allowed its application to data sets containing a lot of tie points. Finally, the concepts of intrinsic curves were revisited for dense stereo matching. The proposed technique could achieve a high level of accuracy and efficiency by searching only through a small fraction of the whole disparity search space as well as internally handling occlusions and matching ambiguities. These photogrammetric solutions were extensively tested using synthetic data, close-range images and the images acquired from the gravel-pit mine. Achieving absolute 3D mapping accuracy of 11±7 mm illustrated the success of this system for high-precision modeling of the environment.
Resumo:
The estimating of the relative orientation and position of a camera is one of the integral topics in the field of computer vision. The accuracy of a certain Finnish technology company’s traffic sign inventory and localization process can be improved by utilizing the aforementioned concept. The company’s localization process uses video data produced by a vehicle installed camera. The accuracy of estimated traffic sign locations depends on the relative orientation between the camera and the vehicle. This thesis proposes a computer vision based software solution which can estimate a camera’s orientation relative to the movement direction of the vehicle by utilizing video data. The task was solved by using feature-based methods and open source software. When using simulated data sets, the camera orientation estimates had an absolute error of 0.31 degrees on average. The software solution can be integrated to be a part of the traffic sign localization pipeline of the company in question.
Resumo:
AIRES, Kelson R. T. ; ARAÚJO, Hélder J. ; MEDEIROS, Adelardo A. D. . Plane Detection from Monocular Image Sequences. In: VISUALIZATION, IMAGING AND IMAGE PROCESSING, 2008, Palma de Mallorca, Spain. Proceedings..., Palma de Mallorca: VIIP, 2008
Resumo:
In the last decade, research in Computer Vision has developed several algorithms to help botanists and non-experts to classify plants based on images of their leaves. LeafSnap is a mobile application that uses a multiscale curvature model of the leaf margin to classify leaf images into species. It has achieved high levels of accuracy on 184 tree species from Northeast US. We extend the research that led to the development of LeafSnap along two lines. First, LeafSnap’s underlying algorithms are applied to a set of 66 tree species from Costa Rica. Then, texture is used as an additional criterion to measure the level of improvement achieved in the automatic identification of Costa Rica tree species. A 25.6% improvement was achieved for a Costa Rican clean image dataset and 42.5% for a Costa Rican noisy image dataset. In both cases, our results show this increment as statistically significant. Further statistical analysis of visual noise impact, best algorithm combinations per species, and best value of , the minimal cardinality of the set of candidate species that the tested algorithms render as best matches is also presented in this research
Resumo:
This thesis deals with tensor completion for the solution of multidimensional inverse problems. We study the problem of reconstructing an approximately low rank tensor from a small number of noisy linear measurements. New recovery guarantees, numerical algorithms, non-uniform sampling strategies, and parameter selection algorithms are developed. We derive a fixed point continuation algorithm for tensor completion and prove its convergence. A restricted isometry property (RIP) based tensor recovery guarantee is proved. Probabilistic recovery guarantees are obtained for sub-Gaussian measurement operators and for measurements obtained by non-uniform sampling from a Parseval tight frame. We show how tensor completion can be used to solve multidimensional inverse problems arising in NMR relaxometry. Algorithms are developed for regularization parameter selection, including accelerated k-fold cross-validation and generalized cross-validation. These methods are validated on experimental and simulated data. We also derive condition number estimates for nonnegative least squares problems. Tensor recovery promises to significantly accelerate N-dimensional NMR relaxometry and related experiments, enabling previously impractical experiments. Our methods could also be applied to other inverse problems arising in machine learning, image processing, signal processing, computer vision, and other fields.
Resumo:
Increasing the size of training data in many computer vision tasks has shown to be very effective. Using large scale image datasets (e.g. ImageNet) with simple learning techniques (e.g. linear classifiers) one can achieve state-of-the-art performance in object recognition compared to sophisticated learning techniques on smaller image sets. Semantic search on visual data has become very popular. There are billions of images on the internet and the number is increasing every day. Dealing with large scale image sets is intense per se. They take a significant amount of memory that makes it impossible to process the images with complex algorithms on single CPU machines. Finding an efficient image representation can be a key to attack this problem. A representation being efficient is not enough for image understanding. It should be comprehensive and rich in carrying semantic information. In this proposal we develop an approach to computing binary codes that provide a rich and efficient image representation. We demonstrate several tasks in which binary features can be very effective. We show how binary features can speed up large scale image classification. We present learning techniques to learn the binary features from supervised image set (With different types of semantic supervision; class labels, textual descriptions). We propose several problems that are very important in finding and using efficient image representation.
Resumo:
A cor da superfície dos alimentos é o primeiro parâmetro de qualidade avaliado pelos consumidores, e é critico para a aceitação do produto, então a medição adequada da cor é uma importante ferramenta. Nesta pesquisa avaliou-se a variação da cor em corvina (Micropogonias furnieri) armazenada em gelo durante 16 dias; os parâmetros de luminosidade (L*), valor cromático a*, valor cromático b*, variação total da cor (ΔE) e croma (C*) foram obtidos por sistema de visão computacional, e por colorímetro Konica Minolta CR-400. O frescor da corvina baseada nas mudanças da cor das brânquias foi avaliado utilizando um sistema de visão computacional. Também se modelou a oxidação da mioglobina em files de burriquete (Pogonias cromis), utilizando os parâmetros de vermelho (valor a* e R). Para registrar as mudanças da cor durante 57,6 h utilizou-se um sistema de visão computacional, a análise química realizou-se determinando a concentração de metamioglobina (%). Na avaliação da cor de corvina armazenada em gelo, o sistema de visão computacional mostrou diferenças significativas para L*, a*, ΔE e C*, enquanto que o colorímetro mostrou diferenças significativas para L* e ΔE, o único parâmetro que não apresentou diferenças entre instrumentos foi ΔE durante a avaliação da corvina armazenada em gelo. O coeficiente de correlação entre os parâmetros da cor (L*, a* e b*) das brânquias da corvina armazenada em gelo pelo tempo de armazenamento foi de 0,9747. O sistema de visão computacional registrou as mudanças da cor em filés de burriquete e se modelaram as mudanças utilizando um modelo exponencial. O sistema de visão computacional mostrou ser mais sensível às mudanças da cor durante a avaliação da cor na corvina armazenada em gelo. É possível prognosticar o tempo de armazenamento da corvina em gelo em função da mudança da cor das brânquias. Assim, foi possível modelar a variação da mioglobina em filés de burriquete utilizando sistemas de visão computacional para registrar ditas mudanças. Os sistemas de visão computacional têm grande capacidade para registrar as mudanças da cor e é possível utiliza-los para avaliar os alimentos em função da cor.
Resumo:
Visual recognition is a fundamental research topic in computer vision. This dissertation explores datasets, features, learning, and models used for visual recognition. In order to train visual models and evaluate different recognition algorithms, this dissertation develops an approach to collect object image datasets on web pages using an analysis of text around the image and of image appearance. This method exploits established online knowledge resources (Wikipedia pages for text; Flickr and Caltech data sets for images). The resources provide rich text and object appearance information. This dissertation describes results on two datasets. The first is Berg’s collection of 10 animal categories; on this dataset, we significantly outperform previous approaches. On an additional set of 5 categories, experimental results show the effectiveness of the method. Images are represented as features for visual recognition. This dissertation introduces a text-based image feature and demonstrates that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the Internet. Image tags are noisy. The method obtains the text features of an unannotated image from the tags of its k-nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples. This text feature may not change, because the auxiliary dataset likely contains a similar picture. While the tags associated with images are noisy, they are more stable when appearance changes. The performance of this feature is tested using PASCAL VOC 2006 and 2007 datasets. This feature performs well; it consistently improves the performance of visual object classifiers, and is particularly effective when the training dataset is small. With more and more collected training data, computational cost becomes a bottleneck, especially when training sophisticated classifiers such as kernelized SVM. This dissertation proposes a fast training algorithm called Stochastic Intersection Kernel Machine (SIKMA). This proposed training method will be useful for many vision problems, as it can produce a kernel classifier that is more accurate than a linear classifier, and can be trained on tens of thousands of examples in two minutes. It processes training examples one by one in a sequence, so memory cost is no longer the bottleneck to process large scale datasets. This dissertation applies this approach to train classifiers of Flickr groups with many group training examples. The resulting Flickr group prediction scores can be used to measure image similarity between two images. Experimental results on the Corel dataset and a PASCAL VOC dataset show the learned Flickr features perform better on image matching, retrieval, and classification than conventional visual features. Visual models are usually trained to best separate positive and negative training examples. However, when recognizing a large number of object categories, there may not be enough training examples for most objects, due to the intrinsic long-tailed distribution of objects in the real world. This dissertation proposes an approach to use comparative object similarity. The key insight is that, given a set of object categories which are similar and a set of categories which are dissimilar, a good object model should respond more strongly to examples from similar categories than to examples from dissimilar categories. This dissertation develops a regularized kernel machine algorithm to use this category dependent similarity regularization. Experiments on hundreds of categories show that our method can make significant improvement for categories with few or even no positive examples.
Resumo:
One of the most significant research topics in computer vision is object detection. Most of the reported object detection results localise the detected object within a bounding box, but do not explicitly label the edge contours of the object. Since object contours provide a fundamental diagnostic of object shape, some researchers have initiated work on linear contour feature representations for object detection and localisation. However, linear contour feature-based localisation is highly dependent on the performance of linear contour detection within natural images, and this can be perturbed significantly by a cluttered background. In addition, the conventional approach to achieving rotation-invariant features is to rotate the feature receptive field to align with the local dominant orientation before computing the feature representation. Grid resampling after rotation adds extra computational cost and increases the total time consumption for computing the feature descriptor. Though it is not an expensive process if using current computers, it is appreciated that if each step of the implementation is faster to compute especially when the number of local features is increasing and the application is implemented on resource limited ”smart devices”, such as mobile phones, in real-time. Motivated by the above issues, a 2D object localisation system is proposed in this thesis that matches features of edge contour points, which is an alternative method that takes advantage of the shape information for object localisation. This is inspired by edge contour points comprising the basic components of shape contours. In addition, edge point detection is usually simpler to achieve than linear edge contour detection. Therefore, the proposed localization system could avoid the need for linear contour detection and reduce the pathological disruption from the image background. Moreover, since natural images usually comprise many more edge contour points than interest points (i.e. corner points), we also propose new methods to generate rotation-invariant local feature descriptors without pre-rotating the feature receptive field to improve the computational efficiency of the whole system. In detail, the 2D object localisation system is achieved by matching edge contour points features in a constrained search area based on the initial pose-estimate produced by a prior object detection process. The local feature descriptor obtains rotation invariance by making use of rotational symmetry of the hexagonal structure. Therefore, a set of local feature descriptors is proposed based on the hierarchically hexagonal grouping structure. Ultimately, the 2D object localisation system achieves a very promising performance based on matching the proposed features of edge contour points with the mean correct labelling rate of the edge contour points 0.8654 and the mean false labelling rate 0.0314 applied on the data from Amsterdam Library of Object Images (ALOI). Furthermore, the proposed descriptors are evaluated by comparing to the state-of-the-art descriptors and achieve competitive performances in terms of pose estimate with around half-pixel pose error.
Resumo:
International audience
Resumo:
Hand detection on images has important applications on person activities recognition. This thesis focuses on PASCAL Visual Object Classes (VOC) system for hand detection. VOC has become a popular system for object detection, based on twenty common objects, and has been released with a successful deformable parts model in VOC2007. A hand detection on an image is made when the system gets a bounding box which overlaps with at least 50% of any ground truth bounding box for a hand on the image. The initial average precision of this detector is around 0.215 compared with a state-of-art of 0.104; however, color and frequency features for detected bounding boxes contain important information for re-scoring, and the average precision can be improved to 0.218 with these features. Results show that these features help on getting higher precision for low recall, even though the average precision is similar.
Resumo:
Humans have a high ability to extract visual data information acquired by sight. Trought a learning process, which starts at birth and continues throughout life, image interpretation becomes almost instinctively. At a glance, one can easily describe a scene with reasonable precision, naming its main components. Usually, this is done by extracting low-level features such as edges, shapes and textures, and associanting them to high level meanings. In this way, a semantic description of the scene is done. An example of this, is the human capacity to recognize and describe other people physical and behavioral characteristics, or biometrics. Soft-biometrics also represents inherent characteristics of human body and behaviour, but do not allow unique person identification. Computer vision area aims to develop methods capable of performing visual interpretation with performance similar to humans. This thesis aims to propose computer vison methods which allows high level information extraction from images in the form of soft biometrics. This problem is approached in two ways, unsupervised and supervised learning methods. The first seeks to group images via an automatic feature extraction learning , using both convolution techniques, evolutionary computing and clustering. In this approach employed images contains faces and people. Second approach employs convolutional neural networks, which have the ability to operate on raw images, learning both feature extraction and classification processes. Here, images are classified according to gender and clothes, divided into upper and lower parts of human body. First approach, when tested with different image datasets obtained an accuracy of approximately 80% for faces and non-faces and 70% for people and non-person. The second tested using images and videos, obtained an accuracy of about 70% for gender, 80% to the upper clothes and 90% to lower clothes. The results of these case studies, show that proposed methods are promising, allowing the realization of automatic high level information image annotation. This opens possibilities for development of applications in diverse areas such as content-based image and video search and automatica video survaillance, reducing human effort in the task of manual annotation and monitoring.
Resumo:
The purpose of this work is to demonstrate and to assess a simple algorithm for automatic estimation of the most salient region in an image, that have possible application in computer vision. The algorithm uses the connection between color dissimilarities in the image and the image’s most salient region. The algorithm also avoids using image priors. Pixel dissimilarity is an informal function of the distance of a specific pixel’s color to other pixels’ colors in an image. We examine the relation between pixel color dissimilarity and salient region detection on the MSRA1K image dataset. We propose a simple algorithm for salient region detection through random pixel color dissimilarity. We define dissimilarity by accumulating the distance between each pixel and a sample of n other random pixels, in the CIELAB color space. An important result is that random dissimilarity between each pixel and just another pixel (n = 1) is enough to create adequate saliency maps when combined with median filter, with competitive average performance if compared with other related methods in the saliency detection research field. The assessment was performed by means of precision-recall curves. This idea is inspired on the human attention mechanism that is able to choose few specific regions to focus on, a biological system that the computer vision community aims to emulate. We also review some of the history on this topic of selective attention.