892 results for SIFT, Computer Vision, Python, Object Recognition, Feature Detection, Descriptor Computation


Relevance:

100.00% 100.00%

Publisher:

Abstract:

The need to digitise music scores has led to the development of Optical Music Recognition (OMR) tools. Unfortunately, the performance of these systems is still far from providing acceptable results. As a result, the user must take part in the process to correct the mistakes made during recognition. However, this correction is performed on the output of the system, so these interventions are not exploited to improve recognition performance. This work sets out the scenario in which human and machine interact to accurately complete the OMR task with the least possible effort for the user.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

New low-cost sensors and free, open libraries for 3D image processing are making important advances in robot vision applications possible, such as three-dimensional object recognition, semantic mapping, navigation and localization of robots, human detection and/or gesture recognition for human-machine interaction. In this paper, a novel method for recognizing and tracking the fingers of a human hand is presented. This method is based on point clouds from range images captured by an RGBD sensor. It works in real time and does not require visual marks, camera calibration or previous knowledge of the environment. Moreover, it works successfully even when multiple objects appear in the scene or when the ambient light changes. Furthermore, this method was designed to develop a human interface to control domestic or industrial devices remotely. In this paper, the method was tested by operating a robotic hand. Firstly, the human hand was recognized and the fingers were detected. Secondly, the movement of the fingers was analysed and mapped to be imitated by a robotic hand.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper describes a study and analysis of surface normal-based descriptors for 3D object recognition. Specifically, we evaluate the behaviour of descriptors in the recognition process using virtual models of objects created with CAD software. Later, we test them in real scenes using synthetic objects created with a 3D printer from the virtual models. In both cases, the same virtual models are used in the matching process to find similarity. The difference between the two experiments lies in the type of views used in the tests. Our analysis evaluates three aspects: the effectiveness of the 3D descriptors depending on the camera viewpoint and the geometric complexity of the model, the runtime of the recognition process, and the success rate in recognizing a view of an object among the models saved in the database.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

During grasping and intelligent robotic manipulation tasks, the camera position relative to the scene changes dramatically because the robot is moving to adapt its path and correctly grasp objects. This is because the camera is mounted at the robot effector. For this reason, in this type of environment, a visual recognition system must be implemented to recognize and “automatically and autonomously” obtain the positions of objects in the scene. Furthermore, in industrial environments, all objects that are manipulated by robots are made of the same material and cannot be differentiated by features such as texture or color. In this work, first, a study and analysis of 3D recognition descriptors has been completed for application in these environments. Second, a visual recognition system built on a specific distributed client-server architecture has been proposed for recognizing industrial objects that lack these appearance features. Our system has been implemented to overcome recognition problems that arise when objects can only be recognized by geometric shape and the simplicity of the shapes could create ambiguity. Finally, some real tests are performed and illustrated to verify the satisfactory performance of the proposed system.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We introduce a new second-order method of texture analysis called Adaptive Multi-Scale Grey Level Co-occurrence Matrix (AMSGLCM), based on the well-known Grey Level Co-occurrence Matrix (GLCM) method. The method deviates significantly from GLCM in that features are extracted, not via a fixed 2D weighting function of co-occurrence matrix elements, but by a variable summation of matrix elements in 3D localized neighborhoods. We subsequently present a new methodology for extracting optimized, highly discriminant features from these localized areas using adaptive Gaussian weighting functions. Genetic Algorithm (GA) optimization is used to produce a set of features whose classification worth is evaluated by discriminatory power and feature correlation considerations. We critically appraised the performance of our method and GLCM in pairwise classification of images from visually similar texture classes, captured from Markov Random Field (MRF) synthesized, natural, and biological origins. In these cross-validated classification trials, our method demonstrated significant benefits over GLCM, including increased feature discriminatory power, automatic feature adaptability, and significantly improved classification performance.
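For orientation, the baseline that the abstract contrasts itself with can be sketched in a few lines. The following is a minimal, standard-library sketch of a plain Grey Level Co-occurrence Matrix and one fixed-weighting feature (contrast); it is not the AMSGLCM method, and the function names, the toy image and the choice of the contrast feature are illustrative assumptions rather than details taken from the paper.

```python
def glcm(image, dx, dy, levels):
    """Count how often grey level i is followed by grey level j at offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


def normalise(counts):
    """Turn raw co-occurrence counts into joint probabilities."""
    total = sum(sum(row) for row in counts)
    return [[value / total for value in row] for row in counts]


def contrast(p):
    """Classic fixed 2D weighting of matrix elements: sum of p(i, j) * (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Toy 4x4 image with 4 grey levels (hypothetical data).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
counts = glcm(image, dx=1, dy=0, levels=4)  # horizontal neighbour pairs
p = normalise(counts)
texture_contrast = contrast(p)
```

The weighting `(i - j) ** 2` is fixed in advance, which is the property the abstract says AMSGLCM replaces with adaptive, localized Gaussian weighting.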

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Beyond the inherent technical challenges, current research into the three dimensional surface correspondence problem is hampered by a lack of uniform terminology, an abundance of application specific algorithms, and the absence of a consistent model for comparing existing approaches and developing new ones. This paper addresses these challenges by presenting a framework for analysing, comparing, developing, and implementing surface correspondence algorithms. The framework uses five distinct stages to establish correspondence between surfaces. It is general, encompassing a wide variety of existing techniques, and flexible, facilitating the synthesis of new correspondence algorithms. This paper presents a review of existing surface correspondence algorithms, and shows how they fit into the correspondence framework. It also shows how the framework can be used to analyse and compare existing algorithms and develop new algorithms using the framework's modular structure. Six algorithms, four existing and two new, are implemented using the framework. Each implemented algorithm is used to match a number of surface pairs. Results demonstrate that the correspondence framework implementations are faithful implementations of existing algorithms, and that powerful new surface correspondence algorithms can be created. (C) 2004 Elsevier Inc. All rights reserved.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper defines the 3D reconstruction problem as the process of reconstructing a 3D scene from numerous 2D visual images of that scene. It is well known that this problem is ill-posed, and numerous constraints and assumptions are used in 3D reconstruction algorithms in order to reduce the solution space. Unfortunately, most constraints only work in a certain range of situations and often constraints are built into the most fundamental methods (e.g. Area Based Matching assumes that all the pixels in the window belong to the same object). This paper presents a novel formulation of the 3D reconstruction problem, using a voxel framework and first order logic equations, which does not contain any additional constraints or assumptions. Solving this formulation for a set of input images gives all the possible solutions for that set, rather than picking a solution that is deemed most likely. Using this formulation, this paper studies the problem of uniqueness in 3D reconstruction and how the solution space changes for different configurations of input images. It is found that it is not possible to guarantee a unique solution, no matter how many images are taken of the scene, their orientation or even how much color variation is in the scene itself. Results of using the formulation to reconstruct a few small voxel spaces are also presented. They show that the number of solutions is extremely large for even very small voxel spaces (a 5 x 5 voxel space gives 10 to 10^7 solutions). This shows the need for constraints to reduce the solution space to a reasonable size. Finally, it is noted that because of the discrete nature of the formulation, the solution space size can be easily calculated, making the formulation a useful tool to numerically evaluate the usefulness of any constraints that are added.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper describes the real-time global vision system for the robot soccer team the RoboRoos. It has a highly optimised pipeline that includes thresholding, segmenting, colour normalising, object recognition and perspective and lens correction. It has a fast ‘paint’ colour calibration system that can calibrate in any face of the YUV or HSI cube. It also autonomously selects both an appropriate camera gain and colour gains robot regions across the field to achieve colour uniformity. Camera geometry calibration is performed automatically from selection of keypoints on the field. The system achieves a position accuracy of better than 15mm over a 4m × 5.5m field, and orientation accuracy to within 1°. It processes 614 × 480 pixels at 60Hz on a 2.0GHz Pentium 4 microprocessor.
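The thresholding stage of such a pipeline can be illustrated with a small sketch. It assumes a simple axis-aligned box in YUV space and the common BT.601-style RGB-to-YUV conversion; the box limits, the sample pixels and the function names are hypothetical and are not taken from the RoboRoos system, whose actual calibration scheme the abstract does not specify in detail.

```python
def rgb_to_yuv(r, g, b):
    """Convert 8-bit RGB to YUV using BT.601 luma weights (assumed convention)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


def in_box(yuv, box):
    """True if every channel lies inside its (low, high) interval."""
    return all(low <= value <= high for value, (low, high) in zip(yuv, box))


def threshold(pixels, box):
    """Label each RGB pixel as inside or outside the colour class."""
    return [in_box(rgb_to_yuv(*pixel), box) for pixel in pixels]


# Hypothetical (Y, U, V) limits for an orange colour class.
ORANGE_BOX = ((60.0, 220.0), (-80.0, -20.0), (40.0, 140.0))

pixels = [(255, 120, 0), (255, 255, 255), (0, 0, 255)]  # orange, white, blue
mask = threshold(pixels, ORANGE_BOX)
```

A box per colour class keeps the per-pixel test to six comparisons, which is one plausible reason a real-time system would calibrate on faces of the colour cube.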

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A major impediment to developing real-time computer vision systems has been the computational power and level of skill required to process video streams in real time. This has meant that many researchers have either analysed video streams off-line or used expensive dedicated hardware acceleration techniques. Recent software and hardware developments have greatly eased the development burden of real-time image analysis, leading to the development of portable systems using cheap PC hardware and software exploiting the Multimedia Extension (MMX) instruction set of the Intel Pentium chip. This paper describes the implementation of a computationally efficient computer vision system for recognizing hand gestures using efficient coding and MMX acceleration to achieve real-time performance on low-cost hardware.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Feature detection is a crucial stage of visual processing. In previous feature-marking experiments we found that peaks in the 3rd derivative of the luminance profile can signify edges where there are no 1st derivative peaks nor 2nd derivative zero-crossings (Wallis and George 'Mach edges' (the edges of Mach bands) were nicely predicted by a new nonlinear model based on 3rd derivative filtering. As a critical test of the model, we now use a new class of stimuli, formed by adding a linear luminance ramp to the blurred triangle waves used previously. The ramp has no effect on the second or higher derivatives, but the nonlinear model predicts a shift from seeing two edges to seeing only one edge as the added ramp gradient increases. In experiment 1, subjects judged whether one or two edges were visible on each trial. In experiment 2, subjects used a cursor to mark perceived edges and bars. The position and polarity of the marked edges were close to model predictions. Both experiments produced the predicted shift from two to one Mach edge, but the shift was less complete than predicted. We conclude that the model is a useful predictor of edge perception, but needs some modification.
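The statement that a linear ramp leaves the second and higher derivatives untouched can be checked numerically. The sketch below uses finite differences on an unblurred triangle wave; the blur is omitted for brevity, and the sampling, period and ramp gradient are arbitrary illustrative choices rather than the stimulus parameters of the study.

```python
def triangle_wave(n, period):
    """Triangle wave in [-1, 1] sampled at integer positions."""
    samples = []
    for x in range(n):
        phase = (x % period) / period
        samples.append(4 * phase - 1 if phase < 0.5 else 3 - 4 * phase)
    return samples


def add_ramp(signal, gradient):
    """Add a linear luminance ramp with the given gradient per sample."""
    return [value + gradient * x for x, value in enumerate(signal)]


def diff(signal):
    """First-order finite difference, a discrete stand-in for the derivative."""
    return [b - a for a, b in zip(signal, signal[1:])]


gradient = 0.25
wave = triangle_wave(64, 16)
ramped = add_ramp(wave, gradient)

d1_wave, d1_ramped = diff(wave), diff(ramped)
d2_wave, d2_ramped = diff(d1_wave), diff(d1_ramped)
d3_wave, d3_ramped = diff(d2_wave), diff(d2_ramped)

# The ramp shifts every first-derivative sample by the same constant...
first_shift = [b - a for a, b in zip(d1_wave, d1_ramped)]
# ...and therefore cancels in the second and third derivatives.
second_change = max(abs(b - a) for a, b in zip(d2_wave, d2_ramped))
third_change = max(abs(b - a) for a, b in zip(d3_wave, d3_ramped))
```

Because the linear second- and third-derivative responses are identical with and without the ramp, any change in the number of perceived edges has to come from a nonlinear stage that also depends on the first derivative, which is consistent with the abstract's use of the ramp as a critical test.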

Relevance:

100.00% 100.00%

Publisher:

Abstract:

There have been two main approaches to feature detection in human and computer vision - based either on the luminance distribution and its spatial derivatives, or on the spatial distribution of local contrast energy. Thus, bars and edges might arise from peaks of luminance and luminance gradient respectively, or bars and edges might be found at peaks of local energy, where local phases are aligned across spatial frequency. This basic issue of definition is important because it guides more detailed models and interpretations of early vision. Which approach better describes the perceived positions of features in images? We used the class of 1-D images defined by Morrone and Burr in which the amplitude spectrum is that of a (partially blurred) square-wave and all Fourier components have a common phase. Observers used a cursor to mark where bars and edges were seen for different test phases (Experiment 1) or judged the spatial alignment of contours that had different phases (e.g. 0 degrees and 45 degrees; Experiment 2). The feature positions defined by both tasks shifted systematically to the left or right according to the sign of the phase offset, increasing with the degree of blur. These shifts were well predicted by the location of luminance peaks (bars) and gradient peaks (edges), but not by energy peaks which (by design) predicted no shift at all. These results encourage models based on a Gaussian-derivative framework, but do not support the idea that human vision uses points of phase alignment to find local, first-order features. Nevertheless, we argue that both approaches are presently incomplete and a better understanding of early vision may combine insights from both. (C)2004 Elsevier Ltd. All rights reserved.
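Why energy peaks predict no shift "by design" can be seen from a short sketch. It assumes a sine-phase convention in which a common phase of 0 gives a square wave, models the partial blur as a Gaussian attenuation of each harmonic, and computes local energy as the squared waveform plus its squared quadrature partner; these conventions and all names are illustrative assumptions, not the exact stimulus definition used in the study.

```python
import math


def components(x, phase, harmonics=25, blur=0.0):
    """Return the waveform and its quadrature (Hilbert) partner at position x.

    Odd harmonics k carry amplitude 1/k (square-wave spectrum), optionally
    attenuated by a Gaussian blur, and all share the same phase offset.
    """
    s = h = 0.0
    for k in range(1, 2 * harmonics, 2):
        amp = math.exp(-0.5 * (k * blur) ** 2) / k
        s += amp * math.sin(k * x + phase)
        h -= amp * math.cos(k * x + phase)
    return s, h


def luminance(x, phase, **kwargs):
    return components(x, phase, **kwargs)[0]


def local_energy(x, phase, **kwargs):
    s, h = components(x, phase, **kwargs)
    return s * s + h * h


positions = [i * 0.1 for i in range(-30, 31)]
energy_0 = [local_energy(x, 0.0, blur=0.2) for x in positions]
energy_45 = [local_energy(x, math.pi / 4, blur=0.2) for x in positions]
lum_0 = [luminance(x, 0.0, blur=0.2) for x in positions]
lum_45 = [luminance(x, math.pi / 4, blur=0.2) for x in positions]
```

The energy profiles are identical for every common phase, because the phase cancels when the two quadrature components are squared and summed, whereas the luminance profile itself changes with phase. That is the contrast the experiments exploit.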

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Marr's work offered guidelines on how to investigate vision (the theory - algorithm - implementation distinction), as well as specific proposals on how vision is done. Many of the latter have inevitably been superseded, but the approach was inspirational and remains so. Marr saw the computational study of vision as tightly linked to psychophysics and neurophysiology, but the last twenty years have seen some weakening of that integration. Because feature detection is a key stage in early human vision, we have returned to basic questions about representation of edges at coarse and fine scales. We describe an explicit model in the spirit of the primal sketch, but tightly constrained by psychophysical data. Results from two tasks (location-marking and blur-matching) point strongly to the central role played by second-derivative operators, as proposed by Marr and Hildreth. Edge location and blur are evaluated by finding the location and scale of the Gaussian-derivative 'template' that best matches the second-derivative profile ('signature') of the edge. The system is scale-invariant, and accurately predicts blur-matching data for a wide variety of 1-D and 2-D images. By finding the best-fitting scale, it implements a form of local scale selection and circumvents the knotty problem of integrating filter outputs across scales. [Supported by BBSRC and the Wellcome Trust]

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Ernst Mach observed that light or dark bands could be seen at abrupt changes of luminance gradient in the absence of peaks or troughs in luminance. Many models of feature detection share the idea that bars, lines, and Mach bands are found at peaks and troughs in the output of even-symmetric spatial filters. Our experiments assessed the appearance of Mach bands (position and width) and the probability of seeing them on a novel set of generalized Gaussian edges. Mach band probability was mainly determined by the shape of the luminance profile and increased with the sharpness of its corners, controlled by a single parameter (n). Doubling or halving the size of the images had no significant effect. Variations in contrast (20%-80%) and duration (50-300 ms) had relatively minor effects. These results rule out the idea that Mach bands depend simply on the amplitude of the second derivative, but a multiscale model, based on Gaussian-smoothed first- and second-derivative filtering, can account accurately for the probability and perceived spatial layout of the bands. A key idea is that Mach band visibility depends on the ratio of second- to first-derivative responses at peaks in the second-derivative scale-space map. This ratio is approximately scale-invariant and increases with the sharpness of the corners of the luminance ramp, as observed. The edges of Mach bands pose a surprisingly difficult challenge for models of edge detection, but a nonlinear third-derivative operation is shown to predict the locations of Mach band edges strikingly well. Mach bands thus shed new light on the role of multiscale filtering systems in feature coding. © 2012 ARVO.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Graph-based representations have been used with considerable success in computer vision in the abstraction and recognition of object shape and scene structure. Despite this, the methodology available for learning structural representations from sets of training examples is relatively limited. In this paper we take a simple yet effective Bayesian approach to attributed graph learning. We present a naïve node-observation model, where we make the important assumption that the observation of each node and each edge is independent of the others. We then propose an EM-like approach to learn a mixture of these models and a Minimum Message Length criterion for component selection. Moreover, in order to avoid the bias that could arise with a single estimation of the node correspondences, we estimate the sampling probability over all the possible matches. Finally, we show the utility of the proposed approach on popular computer vision tasks such as 2D and 3D shape recognition. © 2011 Springer-Verlag.
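The independence assumption makes the likelihood of an observed graph factorise into one term per node and per edge. The sketch below illustrates this with Bernoulli observation probabilities and a known node correspondence; node and edge attributes, the mixture, the EM-like learning and the averaging over correspondences described in the abstract are all left out, and the rule that an edge is only scored when both endpoints were observed is an assumption made here for illustration.

```python
import math


def log_likelihood(node_probs, edge_probs, observed_nodes, observed_edges):
    """Log-probability of an observed graph under independent Bernoulli terms.

    node_probs: model node -> probability that the node is observed
    edge_probs: (u, v) -> probability that the edge is observed,
                given that both endpoints were observed (assumption)
    """
    total = 0.0
    for node, p in node_probs.items():
        total += math.log(p if node in observed_nodes else 1.0 - p)
    for (u, v), p in edge_probs.items():
        if u in observed_nodes and v in observed_nodes:
            total += math.log(p if (u, v) in observed_edges else 1.0 - p)
    return total


# Hypothetical three-node model and one observed sample graph.
node_probs = {"a": 0.9, "b": 0.8, "c": 0.5}
edge_probs = {("a", "b"): 0.7, ("b", "c"): 0.4}

observed_nodes = {"a", "b"}
observed_edges = {("a", "b")}

ll = log_likelihood(node_probs, edge_probs, observed_nodes, observed_edges)
likelihood = math.exp(ll)  # 0.9 * 0.8 * (1 - 0.5) * 0.7
```

Because the log-likelihood is a plain sum, each node and edge parameter can be updated separately, which is what makes a simple EM-style mixture fit tractable under this kind of model.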