847 resultados para pose estimation
Resumo:
Mémoire numérisé par la Direction des bibliothèques de l'Université de Montréal.
Resumo:
This work explores the use of statistical methods in describing and estimating camera poses, as well as the information feedback loop between camera pose and object detection. Surging development in robotics and computer vision has pushed the need for algorithms that infer, understand, and utilize information about the position and orientation of the sensor platforms when observing and/or interacting with their environment.
The first contribution of this thesis is the development of a set of statistical tools for representing and estimating the uncertainty in object poses. A distribution for representing the joint uncertainty over multiple object positions and orientations is described, called the mirrored normal-Bingham distribution. This distribution generalizes both the normal distribution in Euclidean space, and the Bingham distribution on the unit hypersphere. It is shown to inherit many of the convenient properties of these special cases: it is the maximum-entropy distribution with fixed second moment, and there is a generalized Laplace approximation whose result is the mirrored normal-Bingham distribution. This distribution and approximation method are demonstrated by deriving the analytical approximation to the wrapped-normal distribution. Further, it is shown how these tools can be used to represent the uncertainty in the result of a bundle adjustment problem.
Another application of these methods is illustrated as part of a novel camera pose estimation algorithm based on object detections. The autocalibration task is formulated as a bundle adjustment problem using prior distributions over the 3D points to enforce the objects' structure and their relationship with the scene geometry. This framework is very flexible and enables the use of off-the-shelf computational tools to solve specialized autocalibration problems. Its performance is evaluated using a pedestrian detector to provide head and foot location observations, and it proves much faster and potentially more accurate than existing methods.
Finally, the information feedback loop between object detection and camera pose estimation is closed by utilizing camera pose information to improve object detection in scenarios with significant perspective warping. Methods are presented that allow the inverse perspective mapping traditionally applied to images to be applied instead to features computed from those images. For the special case of HOG-like features, which are used by many modern object detection systems, these methods are shown to provide substantial performance benefits over unadapted detectors while achieving real-time frame rates, orders of magnitude faster than comparable image warping methods.
The statistical tools and algorithms presented here are especially promising for mobile cameras, providing the ability to autocalibrate and adapt to the camera pose in real time. In addition, these methods have wide-ranging potential applications in diverse areas of computer vision, robotics, and imaging.
Resumo:
Mémoire numérisé par la Direction des bibliothèques de l'Université de Montréal.
Resumo:
In recent years, depth cameras have been widely utilized in camera tracking for augmented and mixed reality. Many of the studies focus on the methods that generate the reference model simultaneously with the tracking and allow operation in unprepared environments. However, methods that rely on predefined CAD models have their advantages. In such methods, the measurement errors are not accumulated to the model, they are tolerant to inaccurate initialization, and the tracking is always performed directly in reference model's coordinate system. In this paper, we present a method for tracking a depth camera with existing CAD models and the Iterative Closest Point (ICP) algorithm. In our approach, we render the CAD model using the latest pose estimate and construct a point cloud from the corresponding depth map. We construct another point cloud from currently captured depth frame, and find the incremental change in the camera pose by aligning the point clouds. We utilize a GPGPU-based implementation of the ICP which efficiently uses all the depth data in the process. The method runs in real-time, it is robust for outliers, and it does not require any preprocessing of the CAD models. We evaluated the approach using the Kinect depth sensor, and compared the results to a 2D edge-based method, to a depth-based SLAM method, and to the ground truth. The results show that the approach is more stable compared to the edge-based method and it suffers less from drift compared to the depth-based SLAM.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-08
Resumo:
Augmented Reality (AR) is currently gaining popularity in multiple different fields. However, the technology for AR still requires development in both hardware and software when considering industrial use. In order to create immersive AR applications, more accurate pose estimation techniques to define virtual camera location are required. The algorithms for pose estimation often require a lot of processing power, which makes robust pose estimation a difficult task when using mobile devices or designated AR tools. The difficulties are even larger in outdoor scenarios where the environment can vary a lot and is often unprepared for AR. This thesis aims to research different possibilities for creating AR applications for outdoor environments. Both hardware and software solutions are considered, but the focus is more on software. The majority of the thesis focuses on different visual pose estimation and tracking techniques for natural features. During the thesis, multiple different solutions were tested for outdoor AR. One commercial AR SDK was tested, and three different custom software solutions were developed for an Android tablet. The custom software solutions were an algorithm for combining data from magnetometer and a gyroscope, a natural feature tracker and a tracker based on panorama images. The tracker based on panorama images was implemented based on an existing scientific publication, and the presented tracker was further developed by integrating it to Unity 3D and adding a possibility for augmenting content. This thesis concludes that AR is very close to becoming a usable tool for professional use. The commercial solutions currently available are not yet ready for creating tools for professional use, but especially for different visualization tasks some custom solutions are capable of achieving a required robustness. The panorama tracker implemented in this thesis seems like a promising tool for robust pose estimation in unprepared outdoor environments.
Resumo:
Simultaneous Localization and Mapping (SLAM) is a procedure used to determine the location of a mobile vehicle in an unknown environment, while constructing a map of the unknown environment at the same time. Mobile platforms, which make use of SLAM algorithms, have industrial applications in autonomous maintenance, such as the inspection of flaws and defects in oil pipelines and storage tanks. A typical SLAM consists of four main components, namely, experimental setup (data gathering), vehicle pose estimation, feature extraction, and filtering. Feature extraction is the process of realizing significant features from the unknown environment such as corners, edges, walls, and interior features. In this work, an original feature extraction algorithm specific to distance measurements obtained through SONAR sensor data is presented. This algorithm has been constructed by combining the SONAR Salient Feature Extraction Algorithm and the Triangulation Hough Based Fusion with point-in-polygon detection. The reconstructed maps obtained through simulations and experimental data with the fusion algorithm are compared to the maps obtained with existing feature extraction algorithms. Based on the results obtained, it is suggested that the proposed algorithm can be employed as an option for data obtained from SONAR sensors in environment, where other forms of sensing are not viable. The algorithm fusion for feature extraction requires the vehicle pose estimation as an input, which is obtained from a vehicle pose estimation model. For the vehicle pose estimation, the author uses sensor integration to estimate the pose of the mobile vehicle. Different combinations of these sensors are studied (e.g., encoder, gyroscope, or encoder and gyroscope). The different sensor fusion techniques for the pose estimation are experimentally studied and compared. The vehicle pose estimation model, which produces the least amount of error, is used to generate inputs for the feature extraction algorithm fusion. In the experimental studies, two different environmental configurations are used, one without interior features and another one with two interior features. Numerical and experimental findings are discussed. Finally, the SLAM algorithm is implemented along with the algorithms for feature extraction and vehicle pose estimation. Three different cases are experimentally studied, with the floor of the environment intentionally altered to induce slipping. Results obtained for implementations with and without SLAM are compared and discussed. The present work represents a step towards the realization of autonomous inspection platforms for performing concurrent localization and mapping in harsh environments.
Resumo:
Safe collaboration between a robot and human operator forms a critical requirement for deploying a robotic system into a manufacturing and testing environment. In this dissertation, the safety requirement for is developed and implemented for the navigation system of the mobile manipulators. A methodology for human-robot co-existence through a 3d scene analysis is also investigated. The proposed approach exploits the advance in computing capability by relying on graphic processing units (GPU’s) for volumetric predictive human-robot contact checking. Apart from guaranteeing safety of operators, human-robot collaboration is also fundamental when cooperative activities are required, as in appliance test automation floor. To achieve this, a generalized hierarchical task controller scheme for collision avoidance is developed. This allows the robotic arm to safely approach and inspect the interior of the appliance without collision during the testing procedure. The unpredictable presence of the operators also forms dynamic obstacle that changes very fast, thereby requiring a quick reaction from the robot side. In this aspect, a GPU-accelarated distance field is computed to speed up reaction time to avoid collision between human operator and the robot. An automated appliance testing also involves robotized laundry loading and unloading during life cycle testing. This task involves Laundry detection, grasp pose estimation and manipulation in a container, inside the drum and during recovery grasping. A wrinkle and blob detection algorithms for grasp pose estimation are developed and grasp poses are calculated along the wrinkle and blobs to efficiently perform grasping task. By ranking the estimated laundry grasp poses according to a predefined cost function, the robotic arm attempt to grasp poses that are more comfortable from the robot kinematic side as well as collision free on the appliance side. This is achieved through appliance detection and full-model registration and collision free trajectory execution using online collision avoidance.
Resumo:
Collecting and analysing data is an important element in any field of human activity and research. Even in sports, collecting and analyzing statistical data is attracting a growing interest. Some exemplar use cases are: improvement of technical/tactical aspects for team coaches, definition of game strategies based on the opposite team play or evaluation of the performance of players. Other advantages are related to taking more precise and impartial judgment in referee decisions: a wrong decision can change the outcomes of important matches. Finally, it can be useful to provide better representations and graphic effects that make the game more engaging for the audience during the match. Nowadays it is possible to delegate this type of task to automatic software systems that can use cameras or even hardware sensors to collect images or data and process them. One of the most efficient methods to collect data is to process the video images of the sporting event through mixed techniques concerning machine learning applied to computer vision. As in other domains in which computer vision can be applied, the main tasks in sports are related to object detection, player tracking, and to the pose estimation of athletes. The goal of the present thesis is to apply different models of CNNs to analyze volleyball matches. Starting from video frames of a volleyball match, we reproduce a bird's eye view of the playing court where all the players are projected, reporting also for each player the type of action she/he is performing.
Resumo:
Voilà è un servizio dedicato ad allenatori e atleti di ginnastica ritmica. Questa disciplina è uno sport estetico che, idealmente, si pone tra la danza e la ginnastica artistica. In Italia è considerato uno sport minore, nonostante i grandi risultati in campo internazionale. Il progetto si sviluppa in risposta alle logiche di valutazione e alle richieste tecniche della ginnastica ritmica, con l'obiettivo di aumentare nei ginnasti la sicurezza delle proprie capacità, attraverso la scoperta e la costruzione di un’immagine corporea positiva. Ciò servirà a perfezionare il gesto tecnico che attraverso voilà può anche essere monitorato per superare l’eterna dicotomia tra risultati attuali e futuribilità. Non mancheranno strumenti di supporto all’allenatore dal punto di vista organizzativo, nella gestione degli errori e nella relazione con i propri atleti, con i quali è possibile condividere gli obiettivi e tenere traccia del raggiungimento degli stessi. Tutto questo avviene attraverso strumenti di registrazione e analisi video, grazie anche a un algoritmo di intelligenza artificiale. Questo servizio si impegna inoltre a colmare l’attuale frammentarietà delle fonti di informazione con cui si scontrano gli addetti ai lavori quasi quotidianamente: la condivisione o raccolta dei contenuti sui social e la successiva analisi fa ormai parte del lavoro di un buon tecnico, ma la chiusura del sito beatricevivaldi.com lascia un vuoto nel mondo della ginnastica ritmica, aprendo la strada a diverse possibilità progettuali. Infine, grazie a un sistema di gamification, possibile in un contesto competitivo come questo, l’esperienza di voilà coinvolgerà gli utenti in eventi e stage, contribuendo alla crescita omogenea di un movimento costituito da tecnici preparati e ginnasti motivati.
Resumo:
Miniaturized flying robotic platforms, called nano-drones, have the potential to revolutionize the autonomous robots industry sector thanks to their very small form factor. The nano-drones’ limited payload only allows for a sub-100mW microcontroller unit for the on-board computations. Therefore, traditional computer vision and control algorithms are too computationally expensive to be executed on board these palm-sized robots, and we are forced to rely on artificial intelligence to trade off accuracy in favor of lightweight pipelines for autonomous tasks. However, relying on deep learning exposes us to the problem of generalization since the deployment scenario of a convolutional neural network (CNN) is often composed by different visual cues and different features from those learned during training, leading to poor inference performances. Our objective is to develop and deploy and adaptation algorithm, based on the concept of latent replays, that would allow us to fine-tune a CNN to work in new and diverse deployment scenarios. To do so we start from an existing model for visual human pose estimation, called PULPFrontnet, which is used to identify the pose of a human subject in space through its 4 output variables, and we present the design of our novel adaptation algorithm, which features automatic data gathering and labeling and on-device deployment. We therefore showcase the ability of our algorithm to adapt PULP-Frontnet to new deployment scenarios, improving the R2 scores of the four network outputs, with respect to an unknown environment, from approximately [−0.2, 0.4, 0.0,−0.7] to [0.25, 0.45, 0.2, 0.1]. Finally we demonstrate how it is possible to fine-tune our neural network in real time (i.e., under 76 seconds), using the target parallel ultra-low power GAP 8 System-on-Chip on board the nano-drone, and we show how all adaptation operations can take place using less than 2mWh of energy, a small fraction of the available battery power.
Resumo:
The application of image-guided systems with or without support by surgical robots relies on the accuracy of the navigation process, including patient-to-image registration. The surgeon must carry out the procedure based on the information provided by the navigation system, usually without being able to verify its correctness beyond visual inspection. Misleading surrogate parameters such as the fiducial registration error are often used to describe the success of the registration process, while a lack of methods describing the effects of navigation errors, such as those caused by tracking or calibration, may prevent the application of image guidance in certain accuracy-critical interventions. During minimally invasive mastoidectomy for cochlear implantation, a direct tunnel is drilled from the outside of the mastoid to a target on the cochlea based on registration using landmarks solely on the surface of the skull. Using this methodology, it is impossible to detect if the drill is advancing in the correct direction and that injury of the facial nerve will be avoided. To overcome this problem, a tool localization method based on drilling process information is proposed. The algorithm estimates the pose of a robot-guided surgical tool during a drilling task based on the correlation of the observed axial drilling force and the heterogeneous bone density in the mastoid extracted from 3-D image data. We present here one possible implementation of this method tested on ten tunnels drilled into three human cadaver specimens where an average tool localization accuracy of 0.29 mm was observed.
Resumo:
Les préhenseurs robotiques sont largement utilisés en industrie et leur déploiement pourrait être encore plus important si ces derniers étaient plus intelligents. En leur conférant des capacités tactiles et une intelligence leur permettant d’estimer la pose d’un objet saisi, une plus vaste gamme de tâches pourraient être accomplies par les robots. Ce mémoire présente le développement d’algorithmes d’estimation de la pose d’objets saisis par un préhenseur robotique. Des algorithmes ont été développés pour trois systèmes robotisés différents, mais pour les mêmes considérations. Effectivement, pour les trois systèmes la pose est estimée uniquement à partir d’une saisie d’objet, de données tactiles et de la configuration du préhenseur. Pour chaque système, la performance atteignable pour le système minimaliste étudié est évaluée. Dans ce mémoire, les concepts généraux sur l’estimation de la pose sont d’abord exposés. Ensuite, un préhenseur plan à deux doigts comprenant deux phalanges chacun est modélisé dans un environnement de simulation et un algorithme permettant d’estimer la pose d’un objet saisi par le préhenseur est décrit. Cet algorithme est basé sur les arbres d’interprétation et l’algorithme de RANSAC. Par la suite, un système expérimental plan comprenant une phalange supplémentaire par doigt est modélisé et étudié pour le développement d’un algorithme approprié d’estimation de la pose. Les principes de ce dernier sont similaires au premier algorithme, mais les capteurs compris dans le système sont moins précis et des adaptations et améliorations ont dû être appliquées. Entre autres, les mesures des capteurs ont été mieux exploitées. Finalement, un système expérimental spatial composé de trois doigts comprenant trois phalanges chacun est étudié. Suite à la modélisation, l’algorithme développé pour ce système complexe est présenté. Des hypothèses partiellement aléatoires sont générées, complétées, puis évaluées. L’étape d’évaluation fait notamment appel à l’algorithme de Levenberg-Marquardt.
Resumo:
We present a Bayesian approach for estimating the relative frequencies of multi-single nucleotide polymorphism (SNP) haplotypes in populations of the malaria parasite Plasmodium falciparum by using microarray SNP data from human blood samples. Each sample comes from a malaria patient and contains one or several parasite clones that may genetically differ. Samples containing multiple parasite clones with different genetic markers pose a special challenge. The situation is comparable with a polyploid organism. The data from each blood sample indicates whether the parasites in the blood carry a mutant or a wildtype allele at various selected genomic positions. If both mutant and wildtype alleles are detected at a given position in a multiply infected sample, the data indicates the presence of both alleles, but the ratio is unknown. Thus, the data only partially reveals which specific combinations of genetic markers (i.e. haplotypes across the examined SNPs) occur in distinct parasite clones. In addition, SNP data may contain errors at non-negligible rates. We use a multinomial mixture model with partially missing observations to represent this data and a Markov chain Monte Carlo method to estimate the haplotype frequencies in a population. Our approach addresses both challenges, multiple infections and data errors.
Resumo:
Sensor-based robot control allows manipulation in dynamic environments with uncertainties. Vision is a versatile low-cost sensory modality, but low sample rate, high sensor delay and uncertain measurements limit its usability, especially in strongly dynamic environments. Force is a complementary sensory modality allowing accurate measurements of local object shape when a tooltip is in contact with the object. In multimodal sensor fusion, several sensors measuring different modalities are combined to give a more accurate estimate of the environment. As force and vision are fundamentally different sensory modalities not sharing a common representation, combining the information from these sensors is not straightforward. In this thesis, methods for fusing proprioception, force and vision together are proposed. Making assumptions of object shape and modeling the uncertainties of the sensors, the measurements can be fused together in an extended Kalman filter. The fusion of force and visual measurements makes it possible to estimate the pose of a moving target with an end-effector mounted moving camera at high rate and accuracy. The proposed approach takes the latency of the vision system into account explicitly, to provide high sample rate estimates. The estimates also allow a smooth transition from vision-based motion control to force control. The velocity of the end-effector can be controlled by estimating the distance to the target by vision and determining the velocity profile giving rapid approach and minimal force overshoot. Experiments with a 5-degree-of-freedom parallel hydraulic manipulator and a 6-degree-of-freedom serial manipulator show that integration of several sensor modalities can increase the accuracy of the measurements significantly.