956 resultados para 3D-object recognition
Resumo:
Ce mémoire s'intéresse à la reconstruction d'un modèle 3D à partir de plusieurs images. Le modèle 3D est élaboré avec une représentation hiérarchique de voxels sous la forme d'un octree. Un cube englobant le modèle 3D est calculé à partir de la position des caméras. Ce cube contient les voxels et il définit la position de caméras virtuelles. Le modèle 3D est initialisé par une enveloppe convexe basée sur la couleur uniforme du fond des images. Cette enveloppe permet de creuser la périphérie du modèle 3D. Ensuite un coût pondéré est calculé pour évaluer la qualité de chaque voxel à faire partie de la surface de l'objet. Ce coût tient compte de la similarité des pixels provenant de chaque image associée à la caméra virtuelle. Finalement et pour chacune des caméras virtuelles, une surface est calculée basée sur le coût en utilisant la méthode de SGM. La méthode SGM tient compte du voisinage lors du calcul de profondeur et ce mémoire présente une variation de la méthode pour tenir compte des voxels précédemment exclus du modèle par l'étape d'initialisation ou de creusage par une autre surface. Par la suite, les surfaces calculées sont utilisées pour creuser et finaliser le modèle 3D. Ce mémoire présente une combinaison innovante d'étapes permettant de créer un modèle 3D basé sur un ensemble d'images existant ou encore sur une suite d'images capturées en série pouvant mener à la création d'un modèle 3D en temps réel.
Resumo:
Pedicle screw insertion technique has made revolution in the surgical treatment of spinal fractures and spinal disorders. Although X- ray fluoroscopy based navigation is popular, there is risk of prolonged exposure to X- ray radiation. Systems that have lower radiation risk are generally quite expensive. The position and orientation of the drill is clinically very important in pedicle screw fixation. In this paper, the position and orientation of the marker on the drill is determined using pattern recognition based methods, using geometric features, obtained from the input video sequence taken from CCD camera. A search is then performed on the video frames after preprocessing, to obtain the exact position and orientation of the drill. An animated graphics, showing the instantaneous position and orientation of the drill is then overlaid on the processed video for real time drill control and navigation
Resumo:
This paper describes a general, trainable architecture for object detection that has previously been applied to face and peoplesdetection with a new application to car detection in static images. Our technique is a learning based approach that uses a set of labeled training data from which an implicit model of an object class -- here, cars -- is learned. Instead of pixel representations that may be noisy and therefore not provide a compact representation for learning, our training images are transformed from pixel space to that of Haar wavelets that respond to local, oriented, multiscale intensity differences. These feature vectors are then used to train a support vector machine classifier. The detection of cars in images is an important step in applications such as traffic monitoring, driver assistance systems, and surveillance, among others. We show several examples of car detection on out-of-sample images and show an ROC curve that highlights the performance of our system.
Resumo:
There is general consensus that context can be a rich source of information about an object's identity, location and scale. In fact, the structure of many real-world scenes is governed by strong configurational rules akin to those that apply to a single object. Here we introduce a simple probabilistic framework for modeling the relationship between context and object properties based on the correlation between the statistics of low-level features across the entire scene and the objects that it contains. The resulting scheme serves as an effective procedure for object priming, context driven focus of attention and automatic scale-selection on real-world scenes.
Resumo:
To recognize a previously seen object, the visual system must overcome the variability in the object's appearance caused by factors such as illumination and pose. Developments in computer vision suggest that it may be possible to counter the influence of these factors, by learning to interpolate between stored views of the target object, taken under representative combinations of viewing conditions. Daily life situations, however, typically require categorization, rather than recognition, of objects. Due to the open-ended character both of natural kinds and of artificial categories, categorization cannot rely on interpolation between stored examples. Nonetheless, knowledge of several representative members, or prototypes, of each of the categories of interest can still provide the necessary computational substrate for the categorization of new instances. The resulting representational scheme based on similarities to prototypes appears to be computationally viable, and is readily mapped onto the mechanisms of biological vision revealed by recent psychophysical and physiological studies.
Resumo:
This report explores methods for determining the pose of a grasped object using only limited sensor information. The problem of pose determination is to find the position of an object relative to the hand. The information is useful when grasped objects are being manipulated. The problem is hard because of the large space of grasp configurations and the large amount of uncertainty inherent in dexterous hand control. By studying limited sensing approaches, the problem's inherent constraints can be better understood. This understanding helps to show how additional sensor data can be used to make recognition methods more effective and robust.
Resumo:
In this paper we present a component based person detection system that is capable of detecting frontal, rear and near side views of people, and partially occluded persons in cluttered scenes. The framework that is described here for people is easily applied to other objects as well. The motivation for developing a component based approach is two fold: first, to enhance the performance of person detection systems on frontal and rear views of people and second, to develop a framework that directly addresses the problem of detecting people who are partially occluded or whose body parts blend in with the background. The data classification is handled by several support vector machine classifiers arranged in two layers. This architecture is known as Adaptive Combination of Classifiers (ACC). The system performs very well and is capable of detecting people even when all components of a person are not found. The performance of the system is significantly better than a full body person detector designed along similar lines. This suggests that the improved performance is due to the components based approach and the ACC data classification structure.
Resumo:
We propose a probabilistic object classifier for outdoor scene analysis as a first step in solving the problem of scene context generation. The method begins with a top-down control, which uses the previously learned models (appearance and absolute location) to obtain an initial pixel-level classification. This information provides us the core of objects, which is used to acquire a more accurate object model. Therefore, their growing by specific active regions allows us to obtain an accurate recognition of known regions. Next, a stage of general segmentation provides the segmentation of unknown regions by a bottom-strategy. Finally, the last stage tries to perform a region fusion of known and unknown segmented objects. The result is both a segmentation of the image and a recognition of each segment as a given object class or as an unknown segmented object. Furthermore, experimental results are shown and evaluated to prove the validity of our proposal
Resumo:
The accuracy of a 3D reconstruction using laser scanners is significantly determined by the detection of the laser stripe. Since the energy pattern of such a stripe corresponds to a Gaussian profile, it makes sense to detect the point of maximum light intensity (or peak) by computing the zero-crossing point of the first derivative of such Gaussian profile. However, because noise is present in every physical process, such as electronic image formation, it is not sensitive to perform the derivative of the image of the stripe in almost any situation, unless a previous filtering stage is done. Considering that stripe scanning is an inherently row-parallel process, every row of a given image must be processed independently in order to compute its corresponding peak position in the row. This paper reports on the use of digital filtering techniques in order to cope with the scanning of different surfaces with different optical properties and different noise levels, leading to the proposal of a more accurate numerical peak detector, even at very low signal-to-noise ratios
Resumo:
In this paper we present a novel structure from motion (SfM) approach able to infer 3D deformable models from uncalibrated stereo images. Using a stereo setup dramatically improves the 3D model estimation when the observed 3D shape is mostly deforming without undergoing strong rigid motion. Our approach first calibrates the stereo system automatically and then computes a single metric rigid structure for each frame. Afterwards, these 3D shapes are aligned to a reference view using a RANSAC method in order to compute the mean shape of the object and to select the subset of points on the object which have remained rigid throughout the sequence without deforming. The selected rigid points are then used to compute frame-wise shape registration and to extract the motion parameters robustly from frame to frame. Finally, all this information is used in a global optimization stage with bundle adjustment which allows to refine the frame-wise initial solution and also to recover the non-rigid 3D model. We show results on synthetic and real data that prove the performance of the proposed method even when there is no rigid motion in the original sequence
Resumo:
We describe a model-based objects recognition system which is part of an image interpretation system intended to assist autonomous vehicles navigation. The system is intended to operate in man-made environments. Behavior-based navigation of autonomous vehicles involves the recognition of navigable areas and the potential obstacles. The recognition system integrates color, shape and texture information together with the location of the vanishing point. The recognition process starts from some prior scene knowledge, that is, a generic model of the expected scene and the potential objects. The recognition system constitutes an approach where different low-level vision techniques extract a multitude of image descriptors which are then analyzed using a rule-based reasoning system to interpret the image content. This system has been implemented using CEES, the C++ embedded expert system shell developed in the Systems Engineering and Automatic Control Laboratory (University of Girona) as a specific rule-based problem solving tool. It has been especially conceived for supporting cooperative expert systems, and uses the object oriented programming paradigm
Resumo:
The automatic interpretation of conventional traffic signs is very complex and time consuming. The paper concerns an automatic warning system for driving assistance. It does not interpret the standard traffic signs on the roadside; the proposal is to incorporate into the existing signs another type of traffic sign whose information will be more easily interpreted by a processor. The type of information to be added is profuse and therefore the most important object is the robustness of the system. The basic proposal of this new philosophy is that the co-pilot system for automatic warning and driving assistance can interpret with greater ease the information contained in the new sign, whilst the human driver only has to interpret the "classic" sign. One of the codings that has been tested with good results and which seems to us easy to implement is that which has a rectangular shape and 4 vertical bars of different colours. The size of these signs is equivalent to the size of the conventional signs (approximately 0.4 m2). The colour information from the sign can be easily interpreted by the proposed processor and the interpretation is much easier and quicker than the information shown by the pictographs of the classic signs
Resumo:
Coded structured light is an optical technique based on active stereovision that obtains the shape of objects. One shot techniques are based on projecting a unique light pattern with an LCD projector so that grabbing an image with a camera, a large number of correspondences can be obtained. Then, a 3D reconstruction of the illuminated object can be recovered by means of triangulation. The most used strategy to encode one-shot patterns is based on De Bruijn sequences. In This work a new way to design patterns using this type of sequences is presented. The new coding strategy minimises the number of required colours and maximises both the resolution and the accuracy
Resumo:
En este artículo mostramos un micromundo que llamaremos Espacio 3D para el Object-LOGO (sobre un ordenador Macintosh) implementado por nuestro equipo de trabajo. Se hace un estudio de la programación modular, justificando la conveniencia de utilizarla en nuestro micromundo y se ofrecen un conjunto de utilidades para facilitar la programación modular en el espacio tridimensional. A la vez pretendemos ilustrar con ejemplos el manejo del micromundo, el empleo de las utilidades y la integración de todas la herramientas anteriores en gráficos tridimensionales construidos a partir de la idea de sintonicidad corporal.
Resumo:
A new algorithm is described for refining the pose of a model of a rigid object, to conform more accurately to the image structure. Elemental 3D forces are considered to act on the model. These are derived from directional derivatives of the image local to the projected model features. The convergence properties of the algorithm is investigated and compared to a previous technique. Its use in a video sequence of a cluttered outdoor traffic scene is also illustrated and assessed.