992 resultados para Multi-view geometry


Relevância:

100.00% 100.00%

Publicador:

Resumo:

There is great demand for easily-accessible, user-friendly dietary self-management applications. Yet accurate, fully-automatic estimation of nutritional intake using computer vision methods remains an open research problem. One key element of this problem is the volume estimation, which can be computed from 3D models obtained using multi-view geometry. The paper presents a computational system for volume estimation based on the processing of two meal images. A 3D model of the served meal is reconstructed using the acquired images and the volume is computed from the shape. The algorithm was tested on food models (dummy foods) with known volume and on real served food. Volume accuracy was in the order of 90 %, while the total execution time was below 15 seconds per image pair. The proposed system combines simple and computational affordable methods for 3D reconstruction, remained stable throughout the experiments, operates in near real time, and places minimum constraints on users.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The seminal multiple-view stereo benchmark evaluations from Middlebury and by Strecha et al. have played a major role in propelling the development of multi-view stereopsis (MVS) methodology. The somewhat small size and variability of these data sets, however, limit their scope and the conclusions that can be derived from them. To facilitate further development within MVS, we here present a new and varied data set consisting of 80 scenes, seen from 49 or 64 accurate camera positions. This is accompanied by accurate structured light scans for reference and evaluation. In addition all images are taken under seven different lighting conditions. As a benchmark and to validate the use of our data set for obtaining reasonable and statistically significant findings about MVS, we have applied the three state-of-the-art MVS algorithms by Campbell et al., Furukawa et al., and Tola et al. to the data set. To do this we have extended the evaluation protocol from the Middlebury evaluation, necessitated by the more complex geometry of some of our scenes. The data set and accompanying evaluation framework are made freely available online. Based on this evaluation, we are able to observe several characteristics of state-of-the-art MVS, e.g. that there is a tradeoff between the quality of the reconstructed 3D points (accuracy) and how much of an object’s surface is captured (completeness). Also, several issues that we hypothesized would challenge MVS, such as specularities and changing lighting conditions did not pose serious problems. Our study finds that the two most pressing issues for MVS are lack of texture and meshing (forming 3D points into closed triangulated surfaces).

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Affect is an important feature of multimedia content and conveys valuable information for multimedia indexing and retrieval. Most existing studies for affective content analysis are limited to low-level features or mid-level representations, and are generally criticized for their incapacity to address the gap between low-level features and high-level human affective perception. The facial expressions of subjects in images carry important semantic information that can substantially influence human affective perception, but have been seldom investigated for affective classification of facial images towards practical applications. This paper presents an automatic image emotion detector (IED) for affective classification of practical (or non-laboratory) data using facial expressions, where a lot of “real-world” challenges are present, including pose, illumination, and size variations etc. The proposed method is novel, with its framework designed specifically to overcome these challenges using multi-view versions of face and fiducial point detectors, and a combination of point-based texture and geometry. Performance comparisons of several key parameters of relevant algorithms are conducted to explore the optimum parameters for high accuracy and fast computation speed. A comprehensive set of experiments with existing and new datasets, shows that the method is effective despite pose variations, fast, and appropriate for large-scale data, and as accurate as the method with state-of-the-art performance on laboratory-based data. The proposed method was also applied to affective classification of images from the British Broadcast Corporation (BBC) in a task typical for a practical application providing some valuable insights.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Many visual datasets are traditionally used to analyze the performance of different learning techniques. The evaluation is usually done within each dataset, therefore it is questionable if such results are a reliable indicator of true generalization ability. We propose here an algorithm to exploit the existing data resources when learning on a new multiclass problem. Our main idea is to identify an image representation that decomposes orthogonally into two subspaces: a part specific to each dataset, and a part generic to, and therefore shared between, all the considered source sets. This allows us to use the generic representation as un-biased reference knowledge for a novel classification task. By casting the method in the multi-view setting, we also make it possible to use different features for different databases. We call the algorithm MUST, Multitask Unaligned Shared knowledge Transfer. Through extensive experiments on five public datasets, we show that MUST consistently improves the cross-datasets generalization performance. © 2013 Springer-Verlag.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

本文以水下机器人的遥操作作业为应用背景 ,提出并实现了虚拟现实技术和视觉感知信息辅助机器人遥操作实验系统 .该系统使用了 CAD模型和立体视觉信息完成遥操作机器人及其作业环境的几何建模和运动学建模 ,实现了虚拟作业环境的生成和实时动态图形显示 .采用了基于立体视觉的虚拟环境与真实环境的一致性校正、图形图像叠加、作业体与环境位姿关系建立、基于网络的监控通讯等关键技术 .在这个实验系统中 ,操作人员可利用所生成的虚拟环境 ,在多视点、多窗口作业状态图形和图像显示帮助下 ,实时动态地进行作业观测与机器人遥操作与运动规划 ,为先进遥操作机器人系统的实现提供了经验和关键技术 .

Relevância:

90.00% 90.00%

Publicador:

Resumo:

A method for reconstruction of 3D polygonal models from multiple views is presented. The method uses sampling techniques to construct a texture-mapped semi-regular polygonal mesh of the object in question. Given a set of views and segmentation of the object in each view, constructive solid geometry is used to build a visual hull from silhouette prisms. The resulting polygonal mesh is simplified and subdivided to produce a semi-regular mesh. Regions of model fit inaccuracy are found by projecting the reference images onto the mesh from different views. The resulting error images for each view are used to compute a probability density function, and several points are sampled from it. Along the epipolar lines corresponding to these sampled points, photometric consistency is evaluated. The mesh surface is then pulled towards the regions of higher photometric consistency using free-form deformations. This sampling-based approach produces a photometrically consistent solution in much less time than possible with previous multi-view algorithms given arbitrary camera placement.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Scene flow methods estimate the three-dimensional motion field for points in the world, using multi-camera video data. Such methods combine multi-view reconstruction with motion estimation approaches. This paper describes an alternative formulation for dense scene flow estimation that provides convincing results using only two cameras by fusing stereo and optical flow estimation into a single coherent framework. To handle the aperture problems inherent in the estimation task, a multi-scale method along with a novel adaptive smoothing technique is used to gain a regularized solution. This combined approach both preserves discontinuities and prevents over-regularization-two problems commonly associated with basic multi-scale approaches. Internally, the framework generates probability distributions for optical flow and disparity. Taking into account the uncertainty in the intermediate stages allows for more reliable estimation of the 3D scene flow than standard stereo and optical flow methods allow. Experiments with synthetic and real test data demonstrate the effectiveness of the approach.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Scene flow methods estimate the three-dimensional motion field for points in the world, using multi-camera video data. Such methods combine multi-view reconstruction with motion estimation. This paper describes an alternative formulation for dense scene flow estimation that provides reliable results using only two cameras by fusing stereo and optical flow estimation into a single coherent framework. Internally, the proposed algorithm generates probability distributions for optical flow and disparity. Taking into account the uncertainty in the intermediate stages allows for more reliable estimation of the 3D scene flow than previous methods allow. To handle the aperture problems inherent in the estimation of optical flow and disparity, a multi-scale method along with a novel region-based technique is used within a regularized solution. This combined approach both preserves discontinuities and prevents over-regularization – two problems commonly associated with the basic multi-scale approaches. Experiments with synthetic and real test data demonstrate the strength of the proposed approach.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper outlines a methodology to generate a distinctive object representation offline, using short-baseline stereo fundamentals to triangulate highly descriptive object features in multiple pairs of stereo images. A group of sparse 2.5D perspective views are built and the multiple views are then fused into a single sparse 3D model using a common 3D shape registration technique. Having prior knowledge, such as the proposed sparse feature model, is useful when detecting an object and estimating its pose for real-time systems like augmented reality.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

A single picture provides a largely incomplete representation of the scene one is looking at. Usually it reproduces only a limited spatial portion of the scene according to the standpoint and the viewing angle, besides it contains only instantaneous information. Thus very little can be understood on the geometrical structure of the scene, the position and orientation of the observer with respect to it remaining also hard to guess. When multiple views, taken from different positions in space and time, observe the same scene, then a much deeper knowledge is potentially achievable. Understanding inter-views relations enables construction of a collective representation by fusing the information contained in every single image. Visual reconstruction methods confront with the formidable, and still unanswered, challenge of delivering a comprehensive representation of structure, motion and appearance of a scene from visual information. Multi-view visual reconstruction deals with the inference of relations among multiple views and the exploitation of revealed connections to attain the best possible representation. This thesis investigates novel methods and applications in the field of visual reconstruction from multiple views. Three main threads of research have been pursued: dense geometric reconstruction, camera pose reconstruction, sparse geometric reconstruction of deformable surfaces. Dense geometric reconstruction aims at delivering the appearance of a scene at every single point. The construction of a large panoramic image from a set of traditional pictures has been extensively studied in the context of image mosaicing techniques. An original algorithm for sequential registration suitable for real-time applications has been conceived. The integration of the algorithm into a visual surveillance system has lead to robust and efficient motion detection with Pan-Tilt-Zoom cameras. Moreover, an evaluation methodology for quantitatively assessing and comparing image mosaicing algorithms has been devised and made available to the community. Camera pose reconstruction deals with the recovery of the camera trajectory across an image sequence. A novel mosaic-based pose reconstruction algorithm has been conceived that exploit image-mosaics and traditional pose estimation algorithms to deliver more accurate estimates. An innovative markerless vision-based human-machine interface has also been proposed, so as to allow a user to interact with a gaming applications by moving a hand held consumer grade camera in unstructured environments. Finally, sparse geometric reconstruction refers to the computation of the coarse geometry of an object at few preset points. In this thesis, an innovative shape reconstruction algorithm for deformable objects has been designed. A cooperation with the Solar Impulse project allowed to deploy the algorithm in a very challenging real-world scenario, i.e. the accurate measurements of airplane wings deformations.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper addresses the problem of obtaining 3d detailed reconstructions of human faces in real-time and with inexpensive hardware. We present an algorithm based on a monocular multi-spectral photometric-stereo setup. This system is known to capture high-detailed deforming 3d surfaces at high frame rates and without having to use any expensive hardware or synchronized light stage. However, the main challenge of such a setup is the calibration stage, which depends on the lights setup and how they interact with the specific material being captured, in this case, human faces. For this purpose we develop a self-calibration technique where the person being captured is asked to perform a rigid motion in front of the camera, maintaining a neutral expression. Rigidity constrains are then used to compute the head's motion with a structure-from-motion algorithm. Once the motion is obtained, a multi-view stereo algorithm reconstructs a coarse 3d model of the face. This coarse model is then used to estimate the lighting parameters with a stratified approach: In the first step we use a RANSAC search to identify purely diffuse points on the face and to simultaneously estimate this diffuse reflectance model. In the second step we apply non-linear optimization to fit a non-Lambertian reflectance model to the outliers of the previous step. The calibration procedure is validated with synthetic and real data.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Thesis (Ph.D.)--University of Washington, 2016-08

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Practice-led or multi modal theses (describing examinable outcomes of postgraduate study which comprise the practice of dancing/choreography with an accompanying exegesis) are an emerging strength of dance scholarship; a form of enquiry that has been gaining momentum for over a decade, particularly in Australia and the United Kingdom. It has been strongly argued that, in this form of research, legitimate claims to new knowledge are embodied predominantly within the practice itself (Pakes, 2003) and that these findings are emergent, contingent and often interstitial, contained within both the material form of the practice and in the symbolic languages surrounding the form. In a recent study on ‘dancing’ theses Phillips, Stock, Vincs (2009) found that there was general agreement from academics and artists that ‘there could be more flexibility in matching written language with conceptual thought expressed in practice’. The authors discuss how the seemingly intangible nature of danced / embodied research, reliant on what Melrose (2003) terms ‘performance mastery’ by the ‘expert practitioner’ (2006, Point 4) involving ‘expert’ intuition (2006, Point 5), might be accessed, articulated and validated in terms of alternative ways of knowing through exploring an ongoing dialogue in which the danced practice develops emergent theory. They also propose ways in which the danced thesis can be ‘converted’ into the required ‘durable’ artefact which the ephemerality of live performance denies, drawing on the work of Rye’s ‘multi-view’ digital record (2003) and Stapleton’s ‘multi-voiced audio visual document’(2006, 82). Building on a two-year research project (2007-2008) Dancing Between Diversity and Consistency: Refining Assessment in Postgraduate Degrees in Dance, which examined such issues in relation to assessment in an Australian context, the three researchers have further explored issues around interdisciplinarity, cultural differences and documentation through engaging with the following questions:  How do we represent research in which understandings, meanings and findings are situated within the body of the dancer/choreographer?  Do these need a form of ‘translating’ into textual form in order to be accessed as research?  What kind of language structures can be developed to effect this translation: metaphor, allusion, symbol?  How important is contextualising the creative practice?  How do we incorporate differing cultural inflections and practices into our reading and evaluation?  What kind of layered documentation can assist in producing a ‘durable’ research artefact from a non-reproduce-able live event?

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper proposes a new method of using foreground silhouette images for human pose estimation. Labels are introduced to the silhouette images, providing an extra layer of information that can be used in the model fitting process. The pixels in the silhouettes are labelled according to the corresponding body part in the model of the current fit, with the labels propagated into the silhouette of the next frame to be used in the fitting for the next frame. Both single and multi-view implementations are detailed, with results showing performance improvements over only using standard unlabelled silhouettes.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In semisupervised learning (SSL), a predictive model is learn from a collection of labeled data and a typically much larger collection of unlabeled data. These paper presented a framework called multi-view point cloud regularization (MVPCR), which unifies and generalizes several semisupervised kernel methods that are based on data-dependent regularization in reproducing kernel Hilbert spaces (RKHSs). Special cases of MVPCR include coregularized least squares (CoRLS), manifold regularization (MR), and graph-based SSL. An accompanying theorem shows how to reduce any MVPCR problem to standard supervised learning with a new multi-view kernel.