970 results for Vision 3D
Abstract:
Tracking of project-related entities such as construction equipment, materials, and personnel is used to calculate productivity, detect travel-path conflicts, enhance safety on the site, and monitor the project. Radio-frequency tracking technologies (Wi-Fi, RFID, UWB) and GPS are commonly used for this purpose. However, on large-scale sites, deploying, maintaining, and removing such systems can be costly and time-consuming. In addition, privacy issues with personnel tracking often limit the usability of these technologies on construction sites. This paper presents a vision-based tracking framework that holds promise to address these limitations. The framework uses videos from a set of two or more static cameras placed on construction sites. In each camera view, the framework identifies and tracks construction entities, providing 2D image coordinates across frames. By combining the 2D coordinates according to the installed camera geometry (the distance between the cameras and their view angles), 3D coordinates are calculated at each frame. The results of each step are presented to illustrate the feasibility of the framework.
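The abstract does not give implementation details; below is a minimal sketch of the final triangulation step only, assuming two calibrated static cameras with known 3x4 projection matrices (the matrices and pixel coordinates are placeholders), using OpenCV:

```python
import numpy as np
import cv2

# Assumed: 3x4 projection matrices for the two static cameras, derived from
# the known camera placement (baseline and view angles); placeholder values.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])      # reference camera
P2 = np.array([[0.866, 0.0, 0.5, -1000.0],         # hypothetical second camera
               [0.0,   1.0, 0.0,     0.0],
               [-0.5,  0.0, 0.866,   0.0]])

# 2D image coordinates of the same tracked entity in each view (pixels),
# one column per frame; these would come from the 2D tracking step.
pts1 = np.array([[320.0, 322.5], [240.0, 241.0]])  # 2xN, camera 1
pts2 = np.array([[310.0, 313.0], [238.0, 239.5]])  # 2xN, camera 2

# Triangulate: returns 4xN homogeneous coordinates; dehomogenize to Nx3.
X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)
X = (X_h[:3] / X_h[3]).T
print(X)                                           # 3D position per frame
```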
Abstract:
Tracking applications provide real-time on-site information that can be used to detect travel-path conflicts, calculate crew productivity, and eliminate unnecessary processes at the site. This paper presents the validation of a novel vision-based tracking methodology at the Egnatia Odos Motorway in Thessaloniki, Greece. Egnatia Odos is a motorway that connects Turkey with Italy through Greece. Its multiple open construction sites serve as an ideal multi-site test bed for validating construction-site tracking methods. The vision-based tracking methodology uses video cameras and computer algorithms to calculate the 3D position of project-related entities (e.g., personnel, materials, and equipment) on construction sites. The approach provides an unobtrusive, inexpensive way of effectively identifying and tracking the 3D location of entities. The process followed in this study starts by acquiring video data from multiple synchronized cameras at several large-scale project sites of Egnatia Odos, such as tunnels, interchanges, and bridges under construction. Subsequent steps include the evaluation of the collected data and, finally, the 3D tracking of selected entities (heavy equipment and personnel). The accuracy and precision of the method's results are evaluated by comparing them with the actual 3D positions of the objects, thus assessing the 3D tracking method's effectiveness.
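A minimal sketch of one way to compute such accuracy and precision statistics; the abstract does not specify the exact metrics, so mean and standard deviation of the per-frame Euclidean error are assumed here, with placeholder data:

```python
import numpy as np

def accuracy_precision(estimated, ground_truth):
    """Per-frame Euclidean error between tracked and surveyed 3D positions.

    Accuracy is taken as the mean error, precision as its standard
    deviation, both in the units of the input coordinates (e.g. metres).
    """
    errors = np.linalg.norm(estimated - ground_truth, axis=1)
    return errors.mean(), errors.std()

# Hypothetical data: Nx3 arrays of tracked vs. surveyed positions.
est = np.array([[1.02, 0.98, 5.1], [2.05, 1.01, 5.0]])
gt  = np.array([[1.00, 1.00, 5.0], [2.00, 1.00, 5.0]])
acc, prec = accuracy_precision(est, gt)
print(f"accuracy (mean error): {acc:.3f} m, precision (std): {prec:.3f} m")
```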
Abstract:
Existing machine vision-based 3D reconstruction software programs provide a promising low-cost and, in some cases, automatic solution for infrastructure as-built documentation. However, in several steps of the reconstruction process, they rely solely on detecting and matching corner-like features in multiple views of a scene. Therefore, in infrastructure scenes that include uniform materials and poorly textured surfaces, these programs fail with high probability due to a lack of feature points. Moreover, except for a few programs that generate dense 3D models through significantly time-consuming algorithms, most of them provide only a sparse reconstruction that does not necessarily include required points such as corners or edges; hence, these points have to be matched manually across different views, which can make the process considerably laborious. To address these limitations, this paper presents a video-based as-built documentation method that automatically builds detailed 3D maps of a scene by aligning edge points between video frames. Compared to corner-like features, edge points are far more plentiful, even in untextured scenes, and often carry important semantic associations. The method has been tested on poorly textured infrastructure scenes, and the results indicate that a combination of edge and corner-like features would allow dealing with a broader range of scenes.
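To illustrate why edge points help, the sketch below extracts edge points from two synthetic consecutive frames and forms crude nearest-neighbour correspondences; the paper's actual alignment algorithm is not described in the abstract, so this is only a simplified stand-in:

```python
import numpy as np
import cv2
from scipy.spatial import cKDTree

# Synthetic stand-ins for two consecutive grayscale video frames: a bright
# rectangle that shifts by two pixels between frames.
frame_a = np.zeros((240, 320), np.uint8)
frame_a[60:180, 80:240] = 200
frame_b = np.roll(frame_a, 2, axis=1)

# Edge points are far denser than corner features on untextured surfaces.
pts_a = np.column_stack(np.nonzero(cv2.Canny(frame_a, 50, 150)))
pts_b = np.column_stack(np.nonzero(cv2.Canny(frame_b, 50, 150)))

# Crude association: nearest edge point in the next frame (a real system
# would add descriptors, epipolar constraints, and outlier rejection).
dist, idx = cKDTree(pts_b).query(pts_a, k=1)
keep = dist < 3.0
print(f"{keep.sum()} of {len(pts_a)} edge points tentatively matched")
```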
Abstract:
Most existing automated machine vision-based techniques for as-built documentation of civil infrastructure utilize only point features to recover the 3D structure of a scene. However, it is often the case in man-made structures (e.g., buildings and roofs) that not enough point features can be reliably detected; this can potentially lead to the failure of these techniques. To address this problem, this paper exploits the prominence of straight lines in infrastructure scenes. It presents a hybrid approach that benefits from both point and line features. A calibrated stereo set of video cameras is used to collect data. Point and line features are then detected and matched across video frames. Finally, the 3D structure of the scene is recovered by finding the 3D coordinates of the matched features. The proposed approach has been tested in realistic outdoor environments, and preliminary results indicate its capability to deal with a variety of scenes.
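The abstract does not name the detectors used; the sketch below pairs ORB corners with probabilistic Hough line segments as illustrative stand-ins for the point and line features, on a synthetic frame:

```python
import numpy as np
import cv2

# Synthetic stand-in for one frame: weak random texture plus a strong
# straight "roofline" of the kind prominent in man-made structures.
rng = np.random.default_rng(0)
gray = (rng.random((480, 640)) * 60).astype(np.uint8)
cv2.line(gray, (50, 400), (600, 100), color=255, thickness=3)

# Point features: ORB corners with binary descriptors.
orb = cv2.ORB_create(nfeatures=2000)
keypoints, descriptors = orb.detectAndCompute(gray, None)

# Line features: probabilistic Hough transform over an edge map.
edges = cv2.Canny(gray, 50, 150)
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=40, maxLineGap=5)   # Nx1x4 or None

print(f"{len(keypoints)} point features, "
      f"{0 if lines is None else len(lines)} line segments")
```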
Abstract:
This paper presents the first performance evaluation of interest points on scalar volumetric data. Such data encodes 3D shape, a fundamental property of objects. The use of another such property, texture (i.e., 2D surface colouration) or appearance, for object detection, recognition, and registration has been well studied; 3D shape less so. However, the increasing prevalence of 3D shape acquisition techniques and the diminishing returns to be had from appearance alone have seen a surge in 3D shape-based methods. In this work, we investigate the performance of several state-of-the-art interest point detectors on volumetric data in terms of the repeatability, number, and nature of interest points. Such methods form the first step in many shape-based applications. Our detailed comparison, with both quantitative and qualitative measures on synthetic and real 3D data, both point-based and volumetric, aids readers in selecting a method suitable for their application.
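Repeatability is the standard criterion for such comparisons; below is a minimal sketch under the usual definition (the fraction of points that reappear within a tolerance after the ground-truth transform), with placeholder data, since the paper's exact protocol is not given in the abstract:

```python
import numpy as np
from scipy.spatial import cKDTree

def repeatability(pts_a, pts_b, transform, eps=1.5):
    """Fraction of interest points detected in volume A that reappear in B.

    pts_a, pts_b : Nx3 detected interest point coordinates.
    transform    : ground-truth mapping from A's frame into B's frame.
    eps          : tolerance in voxels for counting a correspondence.
    """
    dist, _ = cKDTree(pts_b).query(transform(pts_a), k=1)
    return float(np.mean(dist < eps))

# Hypothetical rigid ground truth and detections: B re-detects 80 of A's
# 100 points (with voxel noise) plus 20 spurious ones.
rng = np.random.default_rng(0)
pts_a = rng.random((100, 3)) * 50
R, t = np.eye(3), np.array([2.0, 0.0, -1.0])
transform = lambda p: p @ R.T + t
pts_b = np.vstack([transform(pts_a[:80]) + rng.normal(0, 0.3, (80, 3)),
                   rng.random((20, 3)) * 50])
print(f"repeatability: {repeatability(pts_a, pts_b, transform):.2f}")
```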
Abstract:
This work addresses the challenging problem of unconstrained 3D human pose estimation (HPE) from a novel perspective. Existing approaches struggle to operate in realistic applications, mainly due to their scene-dependent priors, such as background segmentation and multi-camera networks, which restrict their use in unconstrained environments. We therefore present a framework that applies action detection and 2D pose estimation techniques to infer 3D poses in an unconstrained video. Action detection offers spatiotemporal priors to 3D human pose estimation by both recognising and localising actions in space-time. Instead of holistic features, e.g., silhouettes, we leverage the flexibility of the deformable part model to detect 2D body parts as features for estimating 3D poses. A new unconstrained pose dataset has been collected to demonstrate the feasibility of our method, which achieves promising results, significantly outperforming the relevant state-of-the-art approaches.
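The abstract does not describe the inference model; as a purely illustrative sketch of lifting detected 2D body parts to 3D, a ridge-regularized linear regressor is shown below (all data, the joint count, and the regressor itself are assumptions, not the paper's method):

```python
import numpy as np

J = 14                                    # hypothetical number of body parts
rng = np.random.default_rng(0)
train_2d = rng.normal(size=(500, J * 2))  # 2D part locations (flattened)
train_3d = rng.normal(size=(500, J * 3))  # corresponding 3D joint positions

# Ridge-regularized least squares: W maps 2D part features to 3D pose.
lam = 1e-2
A = train_2d.T @ train_2d + lam * np.eye(J * 2)
W = np.linalg.solve(A, train_2d.T @ train_3d)

pose_2d = rng.normal(size=(1, J * 2))     # detected parts for a test frame
pose_3d = (pose_2d @ W).reshape(J, 3)     # estimated 3D joint coordinates
print(pose_3d.shape)
```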
Abstract:
Large concrete structures need to be inspected in order to assess their current physical and functional state, to predict future conditions, to support investment planning and decision making, and to allocate limited maintenance and rehabilitation resources. Current procedures for the condition and safety assessment of large concrete structures are performed manually, leading to subjective and unreliable results, costly and time-consuming data collection, and safety issues. To address these limitations, automated machine vision-based inspection procedures have increasingly been proposed by the research community. This paper presents current achievements and open challenges in vision-based inspection of large concrete structures. First, the general concept of Building Information Modeling is introduced. Then, vision-based 3D reconstruction and as-built spatial modeling of concrete civil infrastructure are presented. Following that, the focus is set on structural member recognition as well as on concrete damage detection and assessment, exemplified for concrete columns. Although some challenges are still under investigation, it can be concluded that vision-based inspection methods have improved significantly over the last 10 years; as-built spatial modeling as well as damage detection and assessment of large concrete structures now have the potential to be fully automated.
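As a flavour of the damage-detection step, here is a hedged sketch of a simple crack-candidate detector on a synthetic concrete-surface image; the thresholds and the thinness test are illustrative, and the surveyed methods are far more sophisticated:

```python
import numpy as np
import cv2

# Synthetic stand-in for a concrete surface: grey noise with a thin dark
# diagonal line playing the role of a crack.
rng = np.random.default_rng(0)
img = (150 + rng.normal(0, 8, (200, 200))).clip(0, 255).astype(np.uint8)
cv2.line(img, (20, 30), (180, 170), color=60, thickness=3)

# Cracks appear as thin dark structures: adaptive thresholding flags locally
# dark pixels, and a morphological opening removes isolated noise.
binary = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY_INV, 35, 10)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
cleaned = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)

# Keep only large components that are thin relative to their bounding box,
# a crude crack-likeness test.
n, _, stats, _ = cv2.connectedComponentsWithStats(cleaned)
for x, y, w, h, area in stats[1:]:
    if area > 50 and area < 0.2 * w * h:
        print(f"crack candidate at ({x},{y}), bounding box {w}x{h}")
```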
Abstract:
A portable 3D laser scanning system has been designed and built for robot vision. By tilting the charge-coupled device (CCD) plane of the portable 3D scanning system according to the Scheimpflug condition, the depth of view is successfully extended from less than 40 mm to 100 mm. Based on the tilted camera model, the traditional two-step camera calibration method is modified by introducing the angle factor. Meanwhile, a novel segmental calibration approach, i.e., dividing the whole working range into two parts and calibrating each with its corresponding system parameters, is proposed to effectively improve the measurement accuracy of the large depth-of-view 3D laser scanner. In the 3D reconstruction process, different calibration parameters are used to transform 2D coordinates into 3D coordinates according to the position of the image in the CCD plane, and a measurement accuracy of 60 μm is obtained experimentally. Finally, an experiment in which a lamina is scanned by the large depth-of-view portable 3D laser scanner mounted on an IRB 4400 industrial robot demonstrates the effectiveness and high measurement accuracy of our scanning system.
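A minimal sketch of the segmental calibration idea: the working range is split in two, and the parameter set is selected by where the laser-stripe image falls on the CCD. The linear model and parameter values below are placeholders, not the paper's calibration:

```python
import numpy as np

# Placeholder parameter sets, one per calibrated segment of the work range.
params_near = {"scale": 0.05, "offset": np.array([0.0, 0.0, 40.0])}
params_far  = {"scale": 0.07, "offset": np.array([0.0, 0.0, 70.0])}

def pixel_to_3d(u, v, row_split=512):
    """Map a laser-stripe pixel (u, v) to a 3D point using the parameter
    set for the image half it falls in (simplified linear model)."""
    p = params_near if v < row_split else params_far
    return p["offset"] + p["scale"] * np.array([u, v, 0.0])

print(pixel_to_3d(300, 200))   # near segment
print(pixel_to_3d(300, 800))   # far segment
```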
Abstract:
Observation and manipulation at the microscale are essential technical means for micro/nano science and technology research, for discovering and exploiting micro/nano-scale properties, and for micro-manufacturing. The key technical problems of micro/nano manipulation therefore comprise two aspects: imaging for micro/nano manipulation, through which objects at the micro/nano scale can be perceived and observed; and using that perceptual information to guide mechanical manipulation control at the micro/nano scale. Depth information plays an important role in computer vision research, as it allows us to better understand the 3D relationships of objects in the real world. Consequently, 3D measurement based on depth information is increasingly applied to imaging for micro/nano manipulation. The microscope image of the working domain is the only feedback that reflects the motion and position of the manipulated target, and the depth information of objects can only be obtained from it. Although depth is difficult to obtain automatically from this planar image, the optical characteristics of the microscope's point-spread-function model make it possible to construct a reasonable blur criterion and thereby recover the object's depth information. Taking 3D visual observation and visualization at the microscale as the application background, the author analyzes the imaging laws of geometrical optics, establishes a blur criterion for images, and uses this criterion to accomplish 3D visual observation and visualization at the microscale. The main work includes: (1) analyzing the basic principles of optical imaging, including the conditions under which in-focus and defocused imaging occur and how they are described; analyzing the relationship between image sharpness/blur and scene depth, thereby providing the theoretical basis for microscopic depth measurement from optical image information; (2) analyzing, from the sharpness/blur variation across image sequences, the sensitivity and applicability of different measurement operators to sharpness/blur, and proposing a method for constructing a suitable blur-measure operator; (3) based on the blur-measure operator and a blur-measure distribution model, proposing an experimental method for obtaining a blur-depth relationship model between microscopic images and the actual scene, designing the experimental system and procedure, and completing microscopic 3D visual observation; (4) through research on obtaining microscopic scene depth from blur measures, proposing a 3D reconstruction method for microscopic scenes, realizing microscale 3D reconstruction and visualization, and completing experimental verification. This thesis analyzes the defocus phenomenon in microscopic imaging for micro/nano technology research, presents a method for establishing the relationship between defocus degree and scene depth at the micrometre scale based on blur measures, analyzes the different blur-measure responses of different gradient operators, and verifies experimentally that such blur measures can be used to recover the depth information of microscale scenes and to perform 3D reconstruction from that depth information.
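A minimal sketch of the blur-criterion idea in a depth-from-focus form: a Laplacian-based focus measure is evaluated over a synthetic through-focus stack, and per-pixel depth is taken where the measure peaks. The blur model, stage positions, and window size are assumptions, not the thesis's operators:

```python
import numpy as np
import cv2

# Synthetic through-focus stack: a textured plane imaged at K stage depths,
# blurred in proportion to its distance from each frame's focal plane.
rng = np.random.default_rng(0)
sharp = (rng.random((64, 64)) * 255).astype(np.float64)
z = np.linspace(0.0, 50.0, 11)             # stage positions, micrometres
true_depth = 20.0                          # the plane actually sits here
stack = [cv2.GaussianBlur(sharp, (0, 0), 0.3 + 0.2 * abs(zk - true_depth))
         for zk in z]

# Blur criterion: local variance of the Laplacian is high in focus and low
# when defocused, in line with the point-spread-function argument.
def focus_measure(img, ksize=9):
    lap = cv2.Laplacian(img, cv2.CV_64F)
    mean = cv2.blur(lap, (ksize, ksize))
    return cv2.blur(lap * lap, (ksize, ksize)) - mean * mean

fm = np.stack([focus_measure(f) for f in stack])   # (K, H, W)

# Per-pixel depth estimate: the stage position maximizing the criterion.
depth_map = z[np.argmax(fm, axis=0)]
print(f"median recovered depth: {np.median(depth_map):.1f} (true {true_depth})")
```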
Abstract:
Alignment is a prevalent approach for recognizing 3D objects in 2D images. A major problem with current implementations is how to robustly handle errors that propagate from uncertainties in the locations of image features. This thesis gives a technique for bounding these errors. The technique makes use of a new solution to the problem of recovering 3D pose from three matching point pairs under weak-perspective projection. Furthermore, the error bounds are used to demonstrate that using line segments for features instead of points significantly reduces the false positive rate, to the extent that alignment can remain reliable even in cluttered scenes.
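A hedged sketch of the verification step: project the 3D model under a candidate weak-perspective pose and accept the match only if residuals against detected image features stay within an error bound. The pose, model points, and fixed pixel bound below are placeholders for the thesis's three-point solution and propagated bounds:

```python
import numpy as np

def weak_perspective(points_3d, R, s, t):
    """Weak-perspective projection: rotate, drop z, scale, translate."""
    return s * (points_3d @ R.T)[:, :2] + t

# Hypothetical 3D model points and a candidate pose that would come from
# three matched point pairs (R, s, t are placeholders for that solution).
model = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [1., 1., 1.]])
R, s, t = np.eye(3), 100.0, np.array([320.0, 240.0])
projected = weak_perspective(model, R, s, t)

# Alignment verification: compare projected model features with detected
# image features, accepting the hypothesis only if every residual stays
# inside the error bound (a fixed pixel radius here, for simplicity).
detected = projected + np.random.default_rng(1).normal(0, 1.0, projected.shape)
residual = np.linalg.norm(projected - detected, axis=1)
print("match" if np.all(residual < 5.0) else "reject")
```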
Abstract:
Liu, Yonghuai. Automatic 3D free-form shape matching using the graduated assignment algorithm. Pattern Recognition, vol. 38, no. 10, pp. 1615–1631, 2005.
Abstract:
A fundamental task of vision systems is to infer the state of the world given some form of visual observations. From a computational perspective, this often involves facing an ill-posed problem; e.g., information is lost via projection of the 3D world into a 2D image. Solution of an ill-posed problem requires additional information, usually provided as a model of the underlying process. It is important that the model be both computationally feasible and theoretically well-founded. In this thesis, a probabilistic, nonlinear, supervised computational learning model is proposed: the Specialized Mappings Architecture (SMA). The SMA framework is demonstrated in a computer vision system that can estimate the articulated pose parameters of a human body or human hands, given images obtained via one or more uncalibrated cameras. The SMA consists of several specialized forward mapping functions that are estimated automatically from training data, and a possibly known feedback function. Each specialized function maps certain domains of the input space (e.g., image features) onto the output space (e.g., articulated body parameters). A probabilistic model for the architecture is first formalized. Solutions to key algorithmic problems are then derived: simultaneous learning of the specialized domains along with the mapping functions, as well as performing inference given inputs and a feedback function. The SMA employs a variant of the Expectation-Maximization algorithm and approximate inference. The approach allows the use of alternative conditional independence assumptions for learning and inference, which are derived from a forward model and a feedback model. Experimental validation of the proposed approach is conducted on the task of estimating articulated body pose from image silhouettes. The accuracy and stability of the SMA framework are tested using artificial data sets, as well as synthetic and real video sequences of human bodies and hands.
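A minimal sketch of the inference step as the abstract describes it: each specialized forward mapping proposes an output, and the known feedback function selects the proposal most consistent with the observed input. The linear mappings, dimensions, and selection rule below are placeholders for the learned functions and the probabilistic inference:

```python
import numpy as np

rng = np.random.default_rng(0)
D_in, D_out, M = 32, 20, 4                  # feature dim, pose dim, # experts
mappings = [rng.normal(size=(D_out, D_in)) for _ in range(M)]
feedback = rng.normal(size=(D_in, D_out))   # stand-in for the known feedback

def infer(x):
    """Pick the specialized mapping whose output, fed back into feature
    space, best reconstructs the observed input features."""
    best, best_err = None, np.inf
    for W in mappings:                       # each specialized function
        y = W @ x                            # forward map: features -> pose
        err = np.linalg.norm(feedback @ y - x)   # feedback consistency
        if err < best_err:
            best, best_err = y, err
    return best

pose = infer(rng.normal(size=D_in))          # silhouette features -> pose
print(pose.shape)
```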
Abstract:
How does the laminar organization of cortical circuitry in areas V1 and V2 give rise to 3D percepts of stratification, transparency, and neon color spreading in response to 2D pictures and 3D scenes? Psychophysical experiments have shown that such 3D percepts are sensitive to whether contiguous image regions have the same relative contrast polarity (dark-light or light-dark), yet long-range perceptual grouping is known to pool over opposite contrast polarities. The ocularity of contiguous regions is also critical for neon color spreading: different ocularity blocks the spread even when the contrast relationship favors it. In addition, half-visible points in a stereogram can induce near-depth transparency if the contrast relationship favors transparency in the half-visible areas. It thus seems critical to have the whole contrast relationship in a monocular configuration, since splitting it between the two stereogram images cancels the effect. What adaptive functions of perceptual grouping enable it both to preserve sensitivity to monocular contrast and to pool over opposite contrasts? Aspects of cortical development, grouping, attention, perceptual learning, stereopsis, and 3D planar surface perception have previously been analyzed using a 3D LAMINART model of cortical areas V1, V2, and V4. The present work consistently extends this model to show how like-polarity competition between V1 simple cells in layer 4 may be combined with other LAMINART grouping mechanisms, such as cooperative pooling of opposite polarities at layer 2/3 complex cells. The model also explains how the Metelli rules can lead to transparent percepts, how bistable transparency percepts can arise in which either surface can be perceived as transparent, and how such a transparency reversal can be facilitated by an attention shift. The like-polarity inhibition prediction is consistent with lateral masking experiments in which two flanking Gabor patches with the same contrast polarity as the target increase the target detection threshold as they approach the target. It is also consistent with LAMINART simulations of cortical development. Other model explanations and testable predictions will also be presented.
Abstract:
A neural model is presented of how cortical areas V1, V2, and V4 interact to convert a textured 2D image into a representation of curved 3D shape. Two basic problems are solved to achieve this: (1) Patterns of spatially discrete 2D texture elements are transformed into a spatially smooth surface representation of 3D shape. (2) Changes in the statistical properties of texture elements across space induce the perceived 3D shape of this surface representation. This is achieved in the model through multiple-scale filtering of a 2D image, followed by a cooperative-competitive grouping network that coherently binds texture elements into boundary webs at the appropriate depths using a scale-to-depth map and a subsequent depth competition stage. These boundary webs then gate filling-in of surface lightness signals in order to form a smooth 3D surface percept. The model quantitatively simulates challenging psychophysical data about perception of prolate ellipsoids (Todd and Akerstrom, 1987, J. Exp. Psych., 13, 242). In particular, the model represents a high degree of 3D curvature for a certain class of images, all of whose texture elements have the same degree of optical compression, in accordance with percepts of human observers. Simulations of 3D percepts of an elliptical cylinder, a slanted plane, and a photo of a golf ball are also presented.
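A hedged sketch of the model's first stage only, multiple-scale filtering of a 2D textured image with a Gabor filter bank; the scales, orientations, and synthetic input are illustrative, and the grouping, scale-to-depth, and filling-in stages are not modeled here:

```python
import numpy as np
import cv2

# Synthetic stand-in for a textured 2D image (random-dot texture).
rng = np.random.default_rng(0)
img = (rng.random((128, 128)) * 255).astype(np.float32)

# Multiple-scale, multiple-orientation filtering of the 2D image.
energy = []
for sigma in (2.0, 4.0, 8.0):                       # three spatial scales
    scale_sum = np.zeros_like(img)
    for theta in np.arange(0, np.pi, np.pi / 4):    # four orientations
        kern = cv2.getGaborKernel((31, 31), sigma, theta,
                                  4.0 * sigma, 0.5, 0)
        resp = cv2.filter2D(img, cv2.CV_32F, kern)
        scale_sum += resp * resp                    # pool orientation energy
    energy.append(scale_sum)

# In the full model, a scale-to-depth map would convert the balance of
# energy across these scales into the perceived depth of boundary webs.
energy = np.stack(energy)                           # (scales, H, W)
print(energy.shape)
```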