920 resultados para decoupled image-based visual servoing
Resumo:
Image categorization by means of bag of visual words has received increasing attention by the image processing and vision communities in the last years. In these approaches, each image is represented by invariant points of interest which are mapped to a Hilbert Space representing a visual dictionary which aims at comprising the most discriminative features in a set of images. Notwithstanding, the main problem of such approaches is to find a compact and representative dictionary. Finding such representative dictionary automatically with no user intervention is an even more difficult task. In this paper, we propose a method to automatically find such dictionary by employing a recent developed graph-based clustering algorithm called Optimum-Path Forest, which does not make any assumption about the visual dictionary's size and is more efficient and effective than the state-of-the-art techniques used for dictionary generation.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
In this letter, a semiautomatic method for road extraction in object space is proposed that combines a stereoscopic pair of low-resolution aerial images with a digital terrain model (DTM) structured as a triangulated irregular network (TIN). First, we formulate an objective function in the object space to allow the modeling of roads in 3-D. In this model, the TIN-based DTM allows the search for the optimal polyline to be restricted along a narrow band that is overlaid upon it. Finally, the optimal polyline for each road is obtained by optimizing the objective function using the dynamic programming optimization algorithm. A few seed points need to be supplied by an operator. To evaluate the performance of the proposed method, a set of experiments was designed using two stereoscopic pairs of low-resolution aerial images and a TIN-based DTM with an average resolution of 1 m. The experimental results showed that the proposed method worked properly, even when faced with anomalies along roads, such as obstructions caused by shadows and trees.
Resumo:
Relevance feedback approaches have been established as an important tool for interactive search, enabling users to express their needs. However, in view of the growth of multimedia collections available, the user efforts required by these methods tend to increase as well, demanding approaches for reducing the need of user interactions. In this context, this paper proposes a semi-supervised learning algorithm for relevance feedback to be used in image retrieval tasks. The proposed semi-supervised algorithm aims at using both supervised and unsupervised approaches simultaneously. While a supervised step is performed using the information collected from the user feedback, an unsupervised step exploits the intrinsic dataset structure, which is represented in terms of ranked lists of images. Several experiments were conducted for different image retrieval tasks involving shape, color, and texture descriptors and different datasets. The proposed approach was also evaluated on multimodal retrieval tasks, considering visual and textual descriptors. Experimental results demonstrate the effectiveness of the proposed approach.
Resumo:
The visual identity is based on a semantic relationship of several signs that make up a coherent system. A bimédia language formed by text and image complement to create an understandable message. This study aims the use of non-verbal communication in the corporate visual identity design project, contextualizing the role of the designer as mediator for informational corporate message to their audiences.
Resumo:
This paper presents an optimum user-steered boundary tracking approach for image segmentation, which simulates the behavior of water flowing through a riverbed. The riverbed approach was devised using the image foresting transform with a never-exploited connectivity function. We analyze its properties in the derived image graphs and discuss its theoretical relation with other popular methods such as live wire and graph cuts. Several experiments show that riverbed can significantly reduce the number of user interactions (anchor points), as compared to live wire for objects with complex shapes. This paper also includes a discussion about how to combine different methods in order to take advantage of their complementary strengths.
Resumo:
Dimensionality reduction is employed for visual data analysis as a way to obtaining reduced spaces for high dimensional data or to mapping data directly into 2D or 3D spaces. Although techniques have evolved to improve data segregation on reduced or visual spaces, they have limited capabilities for adjusting the results according to user's knowledge. In this paper, we propose a novel approach to handling both dimensionality reduction and visualization of high dimensional data, taking into account user's input. It employs Partial Least Squares (PLS), a statistical tool to perform retrieval of latent spaces focusing on the discriminability of the data. The method employs a training set for building a highly precise model that can then be applied to a much larger data set very effectively. The reduced data set can be exhibited using various existing visualization techniques. The training data is important to code user's knowledge into the loop. However, this work also devises a strategy for calculating PLS reduced spaces when no training data is available. The approach produces increasingly precise visual mappings as the user feeds back his or her knowledge and is capable of working with small and unbalanced training sets.
Resumo:
The strength and durability of materials produced from aggregates (e.g., concrete bricks, concrete, and ballast) are critically affected by the weathering of the particles, which is closely related to their mineral composition. It is possible to infer the degree of weathering from visual features derived from the surface of the aggregates. By using sound pattern recognition methods, this study shows that the characterization of the visual texture of particles, performed by using texture-related features of gray scale images, allows the effective differentiation between weathered and nonweathered aggregates. The selection of the most discriminative features is also performed by taking into account a feature ranking method. The evaluation of the methodology in the presence of noise suggests that it can be used in stone quarries for automatic detection of weathered materials.
Resumo:
During the last few years, several methods have been proposed in order to study and to evaluate characteristic properties of the human skin by using non-invasive approaches. Mostly, these methods cover aspects related to either dermatology, to analyze skin physiology and to evaluate the effectiveness of medical treatments in skin diseases, or dermocosmetics and cosmetic science to evaluate, for example, the effectiveness of anti-aging treatments. To these purposes a routine approach must be followed. Although very accurate and high resolution measurements can be achieved by using conventional methods, such as optical or mechanical profilometry for example, their use is quite limited primarily to the high cost of the instrumentation required, which in turn is usually cumbersome, highlighting some of the limitations for a routine based analysis. This thesis aims to investigate the feasibility of a noninvasive skin characterization system based on the analysis of capacitive images of the skin surface. The system relies on a CMOS portable capacitive device which gives 50 micron/pixel resolution capacitance map of the skin micro-relief. In order to extract characteristic features of the skin topography, image analysis techniques, such as watershed segmentation and wavelet analysis, have been used to detect the main structures of interest: wrinkles and plateau of the typical micro-relief pattern. In order to validate the method, the features extracted from a dataset of skin capacitive images acquired during dermatological examinations of a healthy group of volunteers have been compared with the age of the subjects involved, showing good correlation with the skin ageing effect. Detailed analysis of the output of the capacitive sensor compared with optical profilometry of silicone replica of the same skin area has revealed potentiality and some limitations of this technology. Also, applications to follow-up studies, as needed to objectively evaluate the effectiveness of treatments in a routine manner, are discussed.
Resumo:
A single picture provides a largely incomplete representation of the scene one is looking at. Usually it reproduces only a limited spatial portion of the scene according to the standpoint and the viewing angle, besides it contains only instantaneous information. Thus very little can be understood on the geometrical structure of the scene, the position and orientation of the observer with respect to it remaining also hard to guess. When multiple views, taken from different positions in space and time, observe the same scene, then a much deeper knowledge is potentially achievable. Understanding inter-views relations enables construction of a collective representation by fusing the information contained in every single image. Visual reconstruction methods confront with the formidable, and still unanswered, challenge of delivering a comprehensive representation of structure, motion and appearance of a scene from visual information. Multi-view visual reconstruction deals with the inference of relations among multiple views and the exploitation of revealed connections to attain the best possible representation. This thesis investigates novel methods and applications in the field of visual reconstruction from multiple views. Three main threads of research have been pursued: dense geometric reconstruction, camera pose reconstruction, sparse geometric reconstruction of deformable surfaces. Dense geometric reconstruction aims at delivering the appearance of a scene at every single point. The construction of a large panoramic image from a set of traditional pictures has been extensively studied in the context of image mosaicing techniques. An original algorithm for sequential registration suitable for real-time applications has been conceived. The integration of the algorithm into a visual surveillance system has lead to robust and efficient motion detection with Pan-Tilt-Zoom cameras. Moreover, an evaluation methodology for quantitatively assessing and comparing image mosaicing algorithms has been devised and made available to the community. Camera pose reconstruction deals with the recovery of the camera trajectory across an image sequence. A novel mosaic-based pose reconstruction algorithm has been conceived that exploit image-mosaics and traditional pose estimation algorithms to deliver more accurate estimates. An innovative markerless vision-based human-machine interface has also been proposed, so as to allow a user to interact with a gaming applications by moving a hand held consumer grade camera in unstructured environments. Finally, sparse geometric reconstruction refers to the computation of the coarse geometry of an object at few preset points. In this thesis, an innovative shape reconstruction algorithm for deformable objects has been designed. A cooperation with the Solar Impulse project allowed to deploy the algorithm in a very challenging real-world scenario, i.e. the accurate measurements of airplane wings deformations.
Resumo:
Images of a scene, static or dynamic, are generally acquired at different epochs from different viewpoints. They potentially gather information about the whole scene and its relative motion with respect to the acquisition device. Data from different (in the spatial or temporal domain) visual sources can be fused together to provide a unique consistent representation of the whole scene, even recovering the third dimension, permitting a more complete understanding of the scene content. Moreover, the pose of the acquisition device can be achieved by estimating the relative motion parameters linking different views, thus providing localization information for automatic guidance purposes. Image registration is based on the use of pattern recognition techniques to match among corresponding parts of different views of the acquired scene. Depending on hypotheses or prior information about the sensor model, the motion model and/or the scene model, this information can be used to estimate global or local geometrical mapping functions between different images or different parts of them. These mapping functions contain relative motion parameters between the scene and the sensor(s) and can be used to integrate accordingly informations coming from the different sources to build a wider or even augmented representation of the scene. Accordingly, for their scene reconstruction and pose estimation capabilities, nowadays image registration techniques from multiple views are increasingly stirring up the interest of the scientific and industrial community. Depending on the applicative domain, accuracy, robustness, and computational payload of the algorithms represent important issues to be addressed and generally a trade-off among them has to be reached. Moreover, on-line performance is desirable in order to guarantee the direct interaction of the vision device with human actors or control systems. This thesis follows a general research approach to cope with these issues, almost independently from the scene content, under the constraint of rigid motions. This approach has been motivated by the portability to very different domains as a very desirable property to achieve. A general image registration approach suitable for on-line applications has been devised and assessed through two challenging case studies in different applicative domains. The first case study regards scene reconstruction through on-line mosaicing of optical microscopy cell images acquired with non automated equipment, while moving manually the microscope holder. By registering the images the field of view of the microscope can be widened, preserving the resolution while reconstructing the whole cell culture and permitting the microscopist to interactively explore the cell culture. In the second case study, the registration of terrestrial satellite images acquired by a camera integral with the satellite is utilized to estimate its three-dimensional orientation from visual data, for automatic guidance purposes. Critical aspects of these applications are emphasized and the choices adopted are motivated accordingly. Results are discussed in view of promising future developments.
Resumo:
La radioterapia guidata da immagini (IGRT), grazie alle ripetute verifiche della posizione del paziente e della localizzazione del volume bersaglio, si è recentemente affermata come nuovo paradigma nella radioterapia, avendo migliorato radicalmente l’accuratezza nella somministrazione di dose a scopo terapeutico. Una promettente tecnica nel campo dell’IGRT è rappresentata dalla tomografia computerizzata a fascio conico (CBCT). La CBCT a kilovoltaggio, consente di fornire un’accurata mappatura tridimensionale dell’anatomia del paziente, in fase di pianificazione del trattamento e a ogni frazione del medisimo. Tuttavia, la dose da imaging attribuibile alle ripetute scansioni è diventata, negli ultimi anni, oggetto di una crescente preoccupazione nel contesto clinico. Lo scopo di questo lavoro è di valutare quantitativamente la dose addizionale somministrata da CBCT a kilovoltaggio, con riferimento a tre tipici protocolli di scansione per Varian OnBoard Imaging Systems (OBI, Palo Alto, California). A questo scopo sono state condotte simulazioni con codici Monte Carlo per il calcolo della dose, utilizzando il pacchetto gCTD, sviluppato sull’architettura della scheda grafica. L’utilizzo della GPU per sistemi server di calcolo ha permesso di raggiungere alte efficienze computazionali, accelerando le simulazioni Monte Carlo fino a raggiungere tempi di calcolo di ~1 min per un caso tipico. Inizialmente sono state condotte misure sperimentali di dose su un fantoccio d’acqua. I parametri necessari per la modellazione della sorgente di raggi X nel codice gCTD sono stati ottenuti attraverso un processo di validazione del codice al fine di accordare i valori di dose simulati in acqua con le misure nel fantoccio. Lo studio si concentra su cinquanta pazienti sottoposti a cicli di radioterapia a intensità modulata (IMRT). Venticinque pazienti con tumore al cervello sono utilizzati per studiare la dose nel protocollo standard-dose head e venticinque pazienti con tumore alla prostata sono selezionati per studiare la dose nei protocolli pelvis e pelvis spotlight. La dose media a ogni organo è calcolata. La dose media al 2% dei voxels con i valori più alti di dose è inoltre computata per ogni organo, al fine di caratterizzare l’omogeneità spaziale della distribuzione.
Resumo:
Generic object recognition is an important function of the human visual system and everybody finds it highly useful in their everyday life. For an artificial vision system it is a really hard, complex and challenging task because instances of the same object category can generate very different images, depending of different variables such as illumination conditions, the pose of an object, the viewpoint of the camera, partial occlusions, and unrelated background clutter. The purpose of this thesis is to develop a system that is able to classify objects in 2D images based on the context, and identify to which category the object belongs to. Given an image, the system can classify it and decide the correct categorie of the object. Furthermore the objective of this thesis is also to test the performance and the precision of different supervised Machine Learning algorithms in this specific task of object image categorization. Through different experiments the implemented application reveals good categorization performances despite the difficulty of the problem. However this project is open to future improvement; it is possible to implement new algorithms that has not been invented yet or using other techniques to extract features to make the system more reliable. This application can be installed inside an embedded system and after trained (performed outside the system), so it can become able to classify objects in a real-time. The information given from a 3D stereocamera, developed inside the department of Computer Engineering of the University of Bologna, can be used to improve the accuracy of the classification task. The idea is to segment a single object in a scene using the depth given from a stereocamera and in this way make the classification more accurate.
Resumo:
Craniosynostosis consists of a premature fusion of the sutures in an infant skull, which restricts the skull and brain growth. During the last decades there has been a rapid increase of fundamentally diverse surgical treatment methods. At present, the surgical outcome has been assessed using global variables such as cephalic index, head circumerence and intracranial volume. However, the variables have failed in describing the local deformations and morphological changes, which are proposed to more likely induce neurological disorders.
Resumo:
Non-linear image registration is an important tool in many areas of image analysis. For instance, in morphometric studies of a population of brains, free-form deformations between images are analyzed to describe the structural anatomical variability. Such a simple deformation model is justified by the absence of an easy expressible prior about the shape changes. Applying the same algorithms used in brain imaging to orthopedic images might not be optimal due to the difference in the underlying prior on the inter-subject deformations. In particular, using an un-informed deformation prior often leads to local minima far from the expected solution. To improve robustness and promote anatomically meaningful deformations, we propose a locally affine and geometry-aware registration algorithm that automatically adapts to the data. We build upon the log-domain demons algorithm and introduce a new type of OBBTree-based regularization in the registration with a natural multiscale structure. The regularization model is composed of a hierarchy of locally affine transformations via their logarithms. Experiments on mandibles show improved accuracy and robustness when used to initialize the demons, and even similar performance by direct comparison to the demons, with a significantly lower degree of freedom. This closes the gap between polyaffine and non-rigid registration and opens new ways to statistically analyze the registration results.