859 resultados para Computer vision teaching
Resumo:
Automatic identification and extraction of bone contours from X-ray images is an essential first step task for further medical image analysis. In this paper we propose a 3D statistical model based framework for the proximal femur contour extraction from calibrated X-ray images. The automatic initialization is solved by an estimation of Bayesian network algorithm to fit a multiple component geometrical model to the X-ray data. The contour extraction is accomplished by a non-rigid 2D/3D registration between a 3D statistical model and the X-ray images, in which bone contours are extracted by a graphical model based Bayesian inference. Preliminary experiments on clinical data sets verified its validity
Resumo:
Correspondence establishment is a key step in statistical shape model building. There are several automated methods for solving this problem in 3D, but they usually can only handle objects with simple topology, like that of a sphere or a disc. We propose an extension to correspondence establishment over a population based on the optimization of the minimal description length function, allowing considering objects with arbitrary topology. Instead of using a fixed structure of kernel placement on a sphere for the systematic manipulation of point landmark positions, we rely on an adaptive, hierarchical organization of surface patches. This hierarchy can be built on surfaces of arbitrary topology and the resulting patches are used as a basis for a consistent, multi-scale modification of the surfaces' parameterization, based on point distribution models. The feasibility of the approach is demonstrated on synthetic models with different topologies.
Resumo:
In this paper, we propose the use of specific system architecture, based on mobile device, for navigation in urban environments. The aim of this work is to assess how virtual and augmented reality interface paradigms can provide enhanced location based services using real-time techniques in the context of these two different technologies. The virtual reality interface is based on faithful graphical representation of the localities of interest, coupled with sensory information on the location and orientation of the user, while the augmented reality interface uses computer vision techniques to capture patterns from the real environment and overlay additional way-finding information, aligned with real imagery, in real-time. The knowledge obtained from the evaluation of the virtual reality navigational experience has been used to inform the design of the augmented reality interface. Initial results of the user testing of the experimental augmented reality system for navigation are presented.
Resumo:
Many applications, such as telepresence, virtual reality, and interactive walkthroughs, require a three-dimensional(3D)model of real-world environments. Methods, such as lightfields, geometric reconstruction and computer vision use cameras to acquire visual samples of the environment and construct a model. Unfortunately, obtaining models of real-world locations is a challenging task. In particular, important environments are often actively in use, containing moving objects, such as people entering and leaving the scene. The methods previously listed have difficulty in capturing the color and structure of the environment while in the presence of moving and temporary occluders. We describe a class of cameras called lag cameras. The main concept is to generalize a camera to take samples over space and time. Such a camera, can easily and interactively detect moving objects while continuously moving through the environment. Moreover, since both the lag camera and occluder are moving, the scene behind the occluder is captured by the lag camera even from viewpoints where the occluder lies in between the lag camera and the hidden scene. We demonstrate an implementation of a lag camera, complete with analysis and captured environments.
Resumo:
In this paper we present a hybrid method to track human motions in real-time. With simplified marker sets and monocular video input, the strength of both marker-based and marker-free motion capturing are utilized: A cumbersome marker calibration is avoided while the robustness of the marker-free tracking is enhanced by referencing the tracked marker positions. An improved inverse kinematics solver is employed for real-time pose estimation. A computer-visionbased approach is applied to refine the pose estimation and reduce the ambiguity of the inverse kinematics solutions. We use this hybrid method to capture typical table tennis upper body movements in a real-time virtual reality application.
Resumo:
Given arbitrary pictures, we explore the possibility of using new techniques from computer vision and artificial intelligence to create customized visual games on-the-fly. This includes coloring books, link-the-dot and spot-the-difference popular games. The feasibility of these systems is discussed and we describe prototype implementation that work well in practice in an automatic or semi-automatic way.
Resumo:
To master changing performance demands, autonomous transport vehicles are deployed to make inhouse material flow applications more flexible. The socalled cellular transport system consists of a multitude of small scale transport vehicles which shall be able to form a swarm. Therefore the vehicles need to detect each other, exchange information amongst each other and sense their environment. By provision of peripherally acquired information of other transport entities, more convenient decisions can be made in terms of navigation and collision avoidance. This paper is a contribution to collective utilization of sensor data in the swarm of cellular transport vehicles.
Resumo:
Images of an object under different illumination are known to provide strong cues about the object surface. A mathematical formalization of how to recover the normal map of such a surface leads to the so-called uncalibrated photometric stereo problem. In the simplest instance, this problem can be reduced to the task of identifying only three parameters: the so-called generalized bas-relief (GBR) ambiguity. The challenge is to find additional general assumptions about the object, that identify these parameters uniquely. Current approaches are not consistent, i.e., they provide different solutions when run multiple times on the same data. To address this limitation, we propose exploiting local diffuse reflectance (LDR) maxima, i.e., points in the scene where the normal vector is parallel to the illumination direction (see Fig. 1). We demonstrate several noteworthy properties of these maxima: a closed-form solution, computational efficiency and GBR consistency. An LDR maximum yields a simple closed-form solution corresponding to a semi-circle in the GBR parameters space (see Fig. 2); because as few as two diffuse maxima in different images identify a unique solution, the identification of the GBR parameters can be achieved very efficiently; finally, the algorithm is consistent as it always returns the same solution given the same data. Our algorithm is also remarkably robust: It can obtain an accurate estimate of the GBR parameters even with extremely high levels of outliers in the detected maxima (up to 80 % of the observations). The method is validated on real data and achieves state-of-the-art results.
Resumo:
Computer vision-based food recognition could be used to estimate a meal's carbohydrate content for diabetic patients. This study proposes a methodology for automatic food recognition, based on the Bag of Features (BoF) model. An extensive technical investigation was conducted for the identification and optimization of the best performing components involved in the BoF architecture, as well as the estimation of the corresponding parameters. For the design and evaluation of the prototype system, a visual dataset with nearly 5,000 food images was created and organized into 11 classes. The optimized system computes dense local features, using the scale-invariant feature transform on the HSV color space, builds a visual dictionary of 10,000 visual words by using the hierarchical k-means clustering and finally classifies the food images with a linear support vector machine classifier. The system achieved classification accuracy of the order of 78%, thus proving the feasibility of the proposed approach in a very challenging image dataset.
Resumo:
There is great demand for easily-accessible, user-friendly dietary self-management applications. Yet accurate, fully-automatic estimation of nutritional intake using computer vision methods remains an open research problem. One key element of this problem is the volume estimation, which can be computed from 3D models obtained using multi-view geometry. The paper presents a computational system for volume estimation based on the processing of two meal images. A 3D model of the served meal is reconstructed using the acquired images and the volume is computed from the shape. The algorithm was tested on food models (dummy foods) with known volume and on real served food. Volume accuracy was in the order of 90 %, while the total execution time was below 15 seconds per image pair. The proposed system combines simple and computational affordable methods for 3D reconstruction, remained stable throughout the experiments, operates in near real time, and places minimum constraints on users.