983 results for Camera Pose Estimation


Relevance: 100.00%

Abstract:

In this paper we present a convolutional neural network (CNN)-based model for human head pose estimation in low-resolution multi-modal RGB-D data. We pose the problem as one of classification of human gazing direction. We further fine-tune a regressor based on the learned deep classifier. Next we combine the two models (classification and regression) to estimate approximate regression confidence. We present state-of-the-art results on datasets that span the range from high-resolution human-robot interaction data (close-up faces plus depth information) to challenging low-resolution outdoor surveillance data. We build upon our robust head-pose estimation and further introduce a new visual attention model to recover interaction with the environment. Using this probabilistic model, we show that many higher-level scene understanding tasks, such as human-human/scene interaction detection, can be achieved. Our solution runs in real-time on commercial hardware.

Relevance: 100.00%

Abstract:

In recent years, depth cameras have been widely utilized in camera tracking for augmented and mixed reality. Many studies focus on methods that generate the reference model simultaneously with the tracking and allow operation in unprepared environments. However, methods that rely on predefined CAD models have their advantages: measurement errors do not accumulate into the model, they tolerate inaccurate initialization, and tracking is always performed directly in the reference model's coordinate system. In this paper, we present a method for tracking a depth camera with existing CAD models and the Iterative Closest Point (ICP) algorithm. In our approach, we render the CAD model using the latest pose estimate and construct a point cloud from the corresponding depth map. We construct another point cloud from the currently captured depth frame, and find the incremental change in the camera pose by aligning the point clouds. We utilize a GPGPU-based implementation of the ICP which efficiently uses all the depth data in the process. The method runs in real-time, is robust to outliers, and does not require any preprocessing of the CAD models. We evaluated the approach using the Kinect depth sensor, and compared the results to a 2D edge-based method, to a depth-based SLAM method, and to the ground truth. The results show that the approach is more stable than the edge-based method and suffers less from drift than the depth-based SLAM.
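The incremental alignment step described above can be sketched with a minimal point-to-point ICP (brute-force nearest neighbors and a Kabsch solve, not the paper's GPGPU implementation; all names and parameters are illustrative):

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)          # cross-covariance of centered clouds
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=20):
    """Align point cloud src (N,3) to dst (M,3); returns accumulated R, t."""
    R_tot, t_tot = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        # brute-force nearest-neighbor correspondences
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        R_tot, t_tot = R @ R_tot, R @ t_tot + t
    return R_tot, t_tot
```

Each call returns the accumulated rotation and translation aligning the rendered-model cloud with the captured cloud; a real-time variant would replace the brute-force correspondence search with a projective or GPU-based lookup.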

Relevance: 90.00%

Abstract:

This paper demonstrates the application of a robust form of pose estimation and scene reconstruction using data from camera images. Our results suggest that the algorithm can rival RANSAC-based pose estimation refined by bundle adjustment in terms of solution robustness, speed and accuracy, even when given poor initialisations. Simulated results show the behaviour of the algorithm in a number of novel scenarios reflective of real-world cases, demonstrating its ability to handle large observation noise and difficult reconstruction scenes. These results have a number of implications for the vision and robotics community, and show that online visual motion estimation on robotic platforms is approaching real-world feasibility.

Relevance: 90.00%

Abstract:

Next-generation autonomous underwater vehicles (AUVs) will be required to robustly identify underwater targets for tasks such as inspection, localization, and docking. Given their often unstructured operating environments, vision offers enormous potential for underwater navigation over more traditional methods; however, these systems are often plagued by unreliable target segmentation. This paper addresses robust vision-based target recognition by presenting a novel scale- and rotation-invariant target design and a recognition routine based on self-similar landmarks that enables robust target pose estimation with respect to a single camera. These algorithms are applied to an AUV with controllers developed for vision-based docking with the target. Experimental results show that the system performs exceptionally well on limited processing power, and demonstrate how the combined vision and controller system enables robust target identification and docking in a variety of operating conditions.

Relevance: 90.00%

Abstract:

The ability to measure surface temperature and represent it on a metrically accurate 3D model has proven applications in many areas such as medical imaging, building energy auditing, and search and rescue. A system is proposed that enables this task to be performed with a handheld sensor and, for the first time, with results that can be visualized and analyzed in real time. A device comprising a thermal-infrared camera and a range sensor is calibrated geometrically and used for data capture. The device is localized using a combination of ICP and video-based pose estimation from the thermal-infrared video footage, which is shown to reduce the occurrence of failure modes. Furthermore, the problem of misregistration, which can introduce severe distortions in assigned surface temperatures, is avoided through the use of a risk-averse neighborhood weighting mechanism. Results demonstrate that the system is more stable and accurate than previous approaches, and can be used to accurately model complex objects and environments for practical tasks.

Relevance: 90.00%

Abstract:

We propose and evaluate a novel methodology to identify the rolling-shutter parameters of a real camera. We also present a model for the geometric distortion introduced when a moving camera with a rolling shutter views a scene. Unlike previous work, this model allows for arbitrary camera motion, including accelerations; it is exact rather than a linearization; and it allows for arbitrary camera projection models, for example fisheye or panoramic. We show the significance of the errors introduced by a rolling shutter for typical robot vision problems such as structure from motion, visual odometry and pose estimation.
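The distortion in question arises because each image row is exposed at a slightly different time. A toy model makes this concrete for a camera translating at constant velocity during readout (a deliberate simplification of the arbitrary-motion model above; all parameter values are illustrative):

```python
import numpy as np

# Illustrative pinhole + rolling-shutter parameters (not from the paper).
f = 500.0            # focal length in pixels
line_delay = 30e-6   # readout time per image row, in seconds
v = 2.0              # camera translation speed along x, in m/s
Z = 5.0              # depth of a vertical scene line, in meters

rows = np.arange(0, 480)
t = rows * line_delay          # each row is exposed at its own time
cam_x = v * t                  # camera x-position when that row is read
# A vertical world line at x = 0: its image-x depends on where the camera was
# at the moment the corresponding row was captured.
u = f * (0.0 - cam_x) / Z

skew = u[-1] - u[0]            # total horizontal skew over the frame, in pixels
```

Here a vertical scene line is skewed by about 2.9 pixels across the frame; faster motion or slower readout scales the skew proportionally, which is why the effect matters for structure from motion and visual odometry.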

Relevance: 90.00%

Abstract:

Multi-view head-pose estimation in low-resolution, dynamic scenes is difficult due to blurred facial appearance and perspective changes as targets move around freely in the environment. Under these conditions, acquiring sufficient training examples to learn the dynamic relationship between position, face appearance and head pose can be very expensive. Instead, a transfer learning approach is proposed in this work. Upon learning a weighted-distance function from many examples where the target position is fixed, we adapt these weights to the scenario where target positions vary. The adaptation framework incorporates the reliability of the different face regions for pose estimation under positional variation, by transforming the target appearance to a canonical appearance corresponding to a reference scene location. Experimental results confirm the effectiveness of the proposed approach, which outperforms the state of the art by 9.5% under relevant conditions. To aid further research on this topic, we also make DPOSE, a dynamic multi-view head-pose dataset with ground truth, publicly available with this paper.

Relevance: 90.00%

Abstract:

The utility of canonical correlation analysis (CCA) for domain adaptation (DA) in the context of multi-view head pose estimation is examined in this work. We consider the three problems studied in [1], where different DA approaches are explored to transfer head pose-related knowledge from an extensively labeled source dataset to a sparsely labeled target set, whose attributes are vastly different from the source. CCA is found to benefit DA for all three problems, and the use of a covariance profile-based diagonality score (DS) also improves classification performance with respect to a nearest neighbor (NN) classifier.
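The canonical correlations that CCA extracts can be computed directly from sample covariances; a minimal sketch (an SVD of the whitened cross-covariance, with synthetic two-view data standing in for the head-pose features):

```python
import numpy as np

def cca_correlations(X, Y):
    """Canonical correlations between data matrices X (n x p) and Y (n x q)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / n + 1e-9 * np.eye(X.shape[1])   # small ridge for stability
    Syy = Y.T @ Y / n + 1e-9 * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n

    def inv_sqrt(S):
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    M = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)          # whitened cross-covariance
    return np.linalg.svd(M, compute_uv=False)        # canonical correlations

# Two synthetic "views" sharing one latent variable z.
rng = np.random.default_rng(1)
z = rng.normal(size=(2000, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(2000, 1)), rng.normal(size=(2000, 1))])
Y = np.hstack([z + 0.1 * rng.normal(size=(2000, 1)), rng.normal(size=(2000, 1))])
r = cca_correlations(X, Y)
```

The singular values are the canonical correlations: the shared latent variable produces one correlation near 1, while the independent dimensions stay near 0, which is what lets CCA pick out the knowledge shared between source and target views.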

Relevance: 90.00%

Abstract:

This thesis presents a novel framework for state estimation in the context of robotic grasping and manipulation. The overall estimation approach is based on fusing various visual cues for manipulator tracking, namely appearance-based, feature-based, shape-based, and silhouette-based visual cues. Similarly, a framework is developed to fuse the above visual cues together with kinesthetic cues, such as force-torque and tactile measurements, for in-hand object pose estimation. The cues are extracted from multiple sensor modalities and are fused in a variety of Kalman filters.
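Fusing cues from several modalities in a Kalman filter amounts to applying the measurement update once per cue; a toy one-dimensional sketch (made-up visual and tactile noise levels, not the thesis's actual filters):

```python
import numpy as np

def kf_update(x, P, z, H, R):
    """Standard Kalman measurement update: fuse observation z into state (x, P)."""
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ (z - H @ x)                # corrected state
    P = (np.eye(len(x)) - K @ H) @ P       # corrected covariance
    return x, P

# Toy 1D object position, fused from a visual cue and a tactile cue.
x = np.array([0.0])
P = np.array([[1.0]])                      # weak prior
H = np.array([[1.0]])                      # both cues observe position directly
x, P = kf_update(x, P, np.array([0.52]), H, np.array([[0.04]]))  # visual cue
x, P = kf_update(x, P, np.array([0.48]), H, np.array([[0.01]]))  # tactile cue
```

Each update shrinks the posterior variance below that of either cue alone (here 1/126 ≈ 0.008, versus 0.04 and 0.01 for the individual measurements), which is the essential benefit of multi-modal fusion.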

A hybrid estimator is developed to estimate both a continuous state (robot and object states) and discrete states, called contact modes, which specify how each finger contacts a particular object surface. A static multiple model estimator is used to compute and maintain this mode probability. The thesis also develops an estimation framework for estimating model parameters associated with object grasping. Dual and joint state-parameter estimation is explored for parameter estimation of a grasped object's mass and center of mass. Experimental results demonstrate simultaneous object localization and center of mass estimation.

Dual-arm estimation is developed for two-arm robotic manipulation tasks. Two types of filters are explored: the first is an augmented filter that contains both arms in the state vector, while the second runs two filters in parallel, one for each arm. These two frameworks and their performance are compared in a dual-arm task of removing a wheel from a hub.

This thesis also presents a new method for action selection involving touch. This next-best-touch method selects the available action for interacting with an object that will gain the most information. The algorithm employs information theory to compute an information gain metric based on a probabilistic belief suitable for the task. An estimation framework is used to maintain this belief over time. Kinesthetic measurements such as contact and tactile measurements are used to update the state belief after every interactive action. Simulation and experimental results demonstrate next-best-touch for object localization, specifically of a door handle on a door. The next-best-touch theory is then extended to model parameter determination. Since many objects within a particular object category share the same rough shape, principal component analysis may be used to parametrize the object mesh models. These parameters can be estimated using the action selection technique, which selects the touch action that best both localizes the object and estimates these parameters. Simulation results are then presented involving localizing and determining a parameter of a screwdriver.

Lastly, the next-best-touch theory is further extended to model classes. Instead of estimating parameters, object class determination is incorporated into the information gain metric calculation. The best touching action is selected in order to best discern between the possible model classes. Simulation results are presented to validate the theory.
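The information-gain criterion underlying next best touch can be sketched for a discrete belief over object poses (a toy Bayesian example with made-up contact-sensing models, not the thesis implementation):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def expected_info_gain(belief, likelihoods):
    """likelihoods[a][o, s] = P(observation o | state s) under action a."""
    H0 = entropy(belief)
    gains = []
    for L in likelihoods:
        joint = L * belief                 # P(o, s) for this action
        p_o = joint.sum(axis=1)            # marginal P(o)
        H_post = 0.0
        for o in range(L.shape[0]):
            if p_o[o] > 0:                 # expected posterior entropy
                H_post += p_o[o] * entropy(joint[o] / p_o[o])
        gains.append(H0 - H_post)
    return np.array(gains)

# Toy: the object is in one of 4 poses; action 0 senses nothing useful,
# action 1 splits the hypotheses in half (contact / no-contact).
belief = np.full(4, 0.25)
uninformative = np.full((2, 4), 0.5)
decisive = np.array([[1.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 1.0]])
gains = expected_info_gain(belief, [uninformative, decisive])
best = int(gains.argmax())
```

The uninformative action leaves the belief unchanged (zero expected gain), while the decisive touch halves the hypothesis set (one bit), so it is the one selected.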

Relevance: 90.00%

Abstract:

Commercial far-range (>10 m) infrastructure spatial data collection methods are not completely automated. They require a significant amount of manual post-processing work and, in some cases, the equipment costs are substantial. This paper presents a method that is the first step of a stereo videogrammetric framework and holds the promise of addressing these issues. Under this method, video streams are initially collected from a calibrated set of two video cameras. For each pair of simultaneous video frames, visual feature points are detected and their spatial coordinates are then computed. The result, in the form of a sparse 3D point cloud, is the basis for the next steps in the framework (i.e., camera motion estimation and dense 3D reconstruction). A set of data, collected from an ongoing infrastructure project, is used to show the merits of the method. A comparison with existing tools is also shown, to indicate how the proposed method differs in level of automation and accuracy of results.
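Computing the spatial coordinates of a matched feature point from a calibrated stereo pair is classically done by linear (DLT) triangulation; a minimal sketch (the camera matrices and the 3D point below are illustrative, not from the paper's rig):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two 3x4 projection matrices
    P1, P2 and its pixel coordinates x1, x2 in the two images."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                    # null vector = homogeneous 3D point
    return X[:3] / X[3]           # dehomogenize

# Two illustrative cameras: identical intrinsics, 0.2 m baseline along x.
K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

Xw = np.array([0.3, -0.1, 4.0])                      # ground-truth 3D point
x1 = P1 @ np.append(Xw, 1); x1 = x1[:2] / x1[2]      # project into each view
x2 = P2 @ np.append(Xw, 1); x2 = x2[:2] / x2[2]
Xr = triangulate(P1, P2, x1, x2)
```

With noiseless correspondences the recovered point matches the ground truth; with real detections, the same least-squares solve returns the point minimizing the algebraic residual, and repeating it over all matched features yields the sparse 3D point cloud described above.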

Relevance: 90.00%

Abstract:

In many fields such as industry, aerospace, and medicine, it is often necessary to measure the relative pose between the coordinate frames of two objects in space. Pose measurement methods include sonar or laser ranging, GPS, and vision-based methods, among others; vision-based methods play an increasingly important role owing to their rich information content and fast processing. In model-based monocular visual pose measurement, the accuracy of the measurement system depends on many factors, such as camera calibration error, image-coordinate detection error, and target-model measurement error; among these, camera calibration error is a principal source of pose measurement error and is an unavoidable error inherent to the measurement system.

In practice, improving the measurement accuracy of the whole system is an important task in 3D visual measurement, and improving camera calibration accuracy effectively improves pose measurement accuracy. Addressing the problems that traditional camera calibration methods face in engineering applications, and aiming to improve the accuracy of model-based monocular visual pose measurement systems, this work combines theoretical derivation with simulation experiments. First, the camera's intrinsic and extrinsic parameters are solved from their geometric meaning, in order to understand the physical meaning of each camera parameter in depth. On this basis, the work focuses on the choice of the calibration volume and on the relationships between calibration-parameter errors and pose measurement errors, and between calibration-parameter errors and the measurement volume. Finally, based on these theoretical results, camera calibration strategies are proposed for engineering applications to improve the measurement accuracy of the whole system.

First, in the geometric solution of the camera's intrinsic and extrinsic parameters, the parameters are solved geometrically starting from the camera projection matrix; in the process, the constraints that the matrix describing the perspective projection transformation must satisfy are derived from geometric considerations. This aids a deep understanding of the physical meaning and geometric relationships of the camera parameters, and helps analyze, from an intuitive geometric viewpoint, how camera parameters affect pose accuracy.

Second, regarding the influence of the calibration volume on pose measurement accuracy, the relationship between the calibration volume and calibration-parameter error is derived, and on that basis the relationship between calibration-parameter error and pose measurement error is given. Experimental results show that, regardless of the imaging extent of the test target, the calibration error is smallest when calibration is performed over the full field of view, which yields higher pose measurement accuracy. These results provide a theoretical basis for camera calibration of measurement systems in practical engineering and offer guidance on the arrangement and placement of calibration points.

Finally, in the study of how camera calibration-parameter errors affect pose measurement accuracy, a mathematical model of the influence of calibration-parameter errors on pose measurement errors is established using error propagation, combining theoretical derivation with simulation experiments and geometric interpretation. The analysis leads to the following conclusions: position accuracy along the range direction is mainly affected by the focal-ratio error and by the error in the extrinsic translation along the optical axis, while attitude-angle accuracy is mainly affected by the principal-point error and by the error in the extrinsic rotation angles. As for the relationship between calibration-parameter errors and the measurement volume, intrinsic and extrinsic parameter errors have different influences at different measurement distances: at close range, the error in the extrinsic translation along the optical axis dominates, whereas at long range, the intrinsic focal-ratio error dominates. These conclusions can guide the choice of camera calibration method, so that different algorithms are adopted in different situations to meet practical accuracy requirements, thereby optimizing the system design and improving the performance of the whole positioning system; they are of practical significance for engineering applications of visual pose measurement systems.

Relevance: 90.00%

Abstract:

Camera localization based on a body of revolution is a relatively little-studied and difficult problem in monocular cooperative-target localization; traditional localization methods based on point, line, and curve primitives all run into problems when applied to bodies of revolution. This paper designs a geometric model composed of four mutually tangent ellipses wrapped around the surface of a cylinder. Using the property that the projection of a conic is still a conic, together with the relevant properties of ellipses, three coordinate points that uniquely determine the model's position can be obtained, thereby converting the localization of the body of revolution into a P3P problem. After analyzing the solution-mode regions of P3P, a criterion is derived that determines the solution mode of the P3P problem from the bending of the visible curves on the model, and a proof is given. Simulation experiments demonstrate the effectiveness of this model-based localization method. Finally, the model is used to guide a manipulator in completing a target-localization experiment.

Relevance: 90.00%

Abstract:

This paper describes several optimization methods used in model-based pose estimation. To improve pose estimation accuracy, the camera calibration parameters must be sufficiently accurate, which places high demands on the nonlinear optimization used in the calibration process. A new optimization objective function is adopted that minimizes the 3D reconstruction error of the control points, so that the calibration parameters are globally optimal. For two-camera pose estimation, a real-time genetic algorithm is introduced for global search, accelerating the convergence of the algorithm. Final experiments verify the correctness of these methods and show a considerable accuracy improvement over traditional methods.

Relevance: 90.00%

Abstract:

A method is proposed that can generate a ranked list of plausible three-dimensional hand configurations that best match an input image. Hand pose estimation is formulated as an image database indexing problem, where the closest matches for an input hand image are retrieved from a large database of synthetic hand images. In contrast to previous approaches, the system can function in the presence of clutter, thanks to two novel clutter-tolerant indexing methods. First, a computationally efficient approximation of the image-to-model chamfer distance is obtained by embedding binary edge images into a high-dimensional Euclidean space. Second, a general-purpose, probabilistic line matching method identifies those line segment correspondences between model and input images that are the least likely to have occurred by chance. The performance of this clutter-tolerant approach is demonstrated in quantitative experiments with hundreds of real hand images.
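The image-to-model chamfer distance that the embedding approximates can be written down directly for two sets of edge-pixel coordinates (a brute-force version; the paper's contribution is precisely a fast approximation of this quantity):

```python
import numpy as np

def chamfer(model_edges, image_edges):
    """Directed chamfer distance: mean distance from each model edge pixel
    to its nearest image edge pixel. Inputs are (N, 2) arrays of pixel coords."""
    d = np.linalg.norm(model_edges[:, None, :] - image_edges[None, :, :], axis=2)
    return d.min(axis=1).mean()

# Toy edge sets: a short vertical segment vs. the same segment shifted by one pixel.
model = np.array([[0, 0], [0, 1], [0, 2]], dtype=float)
image = np.array([[1, 0], [1, 1], [1, 2]], dtype=float)
score = chamfer(model, image)
```

Brute force costs O(NM) per model/image comparison, which is why embedding the edge images into a Euclidean space, as the paper does, pays off when ranking matches against a large database of synthetic hand images.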

Relevance: 90.00%

Abstract:

When developing software for autonomous mobile robots, one inevitably has to tackle some kind of perception. Moreover, when dealing with agents that possess some level of reasoning for executing their actions, there is the need to model the environment and the robot's internal state in a way that represents the scenario in which the robot operates. Inserted in the ATRI group, part of the IEETA research unit at Aveiro University, this work uses two of the group's projects as test beds, particularly in the scenario of robotic soccer with real robots. With the main objective of developing algorithms for sensor and information fusion that could be used effectively by these teams, several state-of-the-art approaches were studied, implemented and adapted to each of the robot types. Within the MSL RoboCup team CAMBADA, the main focus was the perception of ball and obstacles, with the creation of models capable of providing extended information so that the reasoning of the robot can be ever more effective. To achieve this, several methodologies were analyzed, implemented, compared and improved. Concerning the ball, an analysis of filtering methodologies for stabilization of its position and estimation of its velocity was performed. Also, with the goalkeeper in mind, work was done to provide it with information on aerial balls. As for obstacles, a new definition of the way they are perceived by the vision system and of the type of information provided was created, as well as a methodology for identifying which of the obstacles are teammates. A tracking algorithm was also developed, which ultimately assigns each obstacle a unique identifier. Associated with the improvement of obstacle perception, a new reactive obstacle-avoidance algorithm was created.

In the context of the SPL RoboCup team Portuguese Team, besides the inevitable adaptation of many of the algorithms already developed for sensor and information fusion, and considering that the team was recently created, the objective was to create a sustainable software architecture that could be the base for future modular development. The software architecture created is based on a series of different processes and the means of communication among them. All processes were created or adapted for the new architecture, and a base set of roles and behaviors was defined during this work to achieve a functional base framework. In terms of perception, the main focus was to define a projection model and camera pose extraction that could provide information in metric coordinates. The second main objective was to adapt the CAMBADA localization algorithm to work on the NAO robots, considering all the limitations they present compared to the MSL team, especially in terms of computational resources. A set of support tools was developed or improved to support testing and development in both teams. In general, the work developed during this thesis improved the performance of the teams during play, as well as the effectiveness of the development team during development and test phases.