885 resultados para SIFT,Computer Vision,Python,Object Recognition,Feature Detection,Descriptor Computation


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Corner detection has shown its great importance in many computer vision tasks. However, in real-world applications, noise in the image strongly affects the performance of corner detectors. Few corner detectors have been designed to be robust to heavy noise by now, partly because the noise could be reduced by a denoising procedure. In this paper, we present a corner detector that could find discriminative corners in images contaminated by noise of different levels, without any denoising procedure. Candidate corners (i.e., features) are firstly detected by a modified SUSAN approach, and then false corners in noise are rejected based on their local characteristics. Features in flat regions are removed based on their intensity centroid, and features on edge structures are removed using the Harris response. The detector is self-adaptive to noise since the image signal-to-noise ratio (SNR) is automatically estimated to choose an appropriate threshold for refining features. Experimental results show that our detector has better performance at locating discriminative corners in images with strong noise than other widely used corner or keypoint detectors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The research reported here addresses the problem of detecting and tracking independently moving objects from a moving observer in real time, using corners as object tokens. Local image-plane constraints are employed to solve the correspondence problem removing the need for a 3D motion model. The approach relaxes the restrictive static-world assumption conventionally made, and is therefore capable of tracking independently moving and deformable objects. The technique is novel in that feature detection and tracking is restricted to areas likely to contain meaningful image structure. Feature instantiation regions are defined from a combination of odometry informatin and a limited knowledge of the operating scenario. The algorithms developed have been tested on real image sequences taken from typical driving scenarios. Preliminary experiments on a parallel (transputer) architecture indication that real-time operation is achievable.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a novel crop detection system applied to the challenging task of field sweet pepper (capsicum) detection. The field-grown sweet pepper crop presents several challenges for robotic systems such as the high degree of occlusion and the fact that the crop can have a similar colour to the background (green on green). To overcome these issues, we propose a two-stage system that performs per-pixel segmentation followed by region detection. The output of the segmentation is used to search for highly probable regions and declares these to be sweet pepper. We propose the novel use of the local binary pattern (LBP) to perform crop segmentation. This feature improves the accuracy of crop segmentation from an AUC of 0.10, for previously proposed features, to 0.56. Using the LBP feature as the basis for our two-stage algorithm, we are able to detect 69.2% of field grown sweet peppers in three sites. This is an impressive result given that the average detection accuracy of people viewing the same colour imagery is 66.8%.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Stationary processes are random variables whose value is a signal and whose distribution is invariant to translation in the domain of the signal. They are intimately connected to convolution, and therefore to the Fourier transform, since the covariance matrix of a stationary process is a Toeplitz matrix, and Toeplitz matrices are the expression of convolution as a linear operator. This thesis utilises this connection in the study of i) efficient training algorithms for object detection and ii) trajectory-based non-rigid structure-from-motion.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a novel coarse-to-fine global localization approach that is inspired by object recognition and text retrieval techniques. Harris-Laplace interest points characterized by SIFT descriptors are used as natural land-marks. These descriptors are indexed into two databases: an inverted index and a location database. The inverted index is built based on a visual vocabulary learned from the feature descriptors. In the location database, each location is directly represented by a set of scale invariant descriptors. The localization process consists of two stages: coarse localization and fine localization. Coarse localization from the inverted index is fast but not accurate enough; whereas localization from the location database using voting algorithm is relatively slow but more accurate. The combination of coarse and fine stages makes fast and reliable localization possible. In addition, if necessary, the localization result can be verified by epipolar geometry between the representative view in database and the view to be localized. Experimental results show that our approach is efficient and reliable. ©2005 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper tackles the novel challenging problem of 3D object phenotype recognition from a single 2D silhouette. To bridge the large pose (articulation or deformation) and camera viewpoint changes between the gallery images and query image, we propose a novel probabilistic inference algorithm based on 3D shape priors. Our approach combines both generative and discriminative learning. We use latent probabilistic generative models to capture 3D shape and pose variations from a set of 3D mesh models. Based on these 3D shape priors, we generate a large number of projections for different phenotype classes, poses, and camera viewpoints, and implement Random Forests to efficiently solve the shape and pose inference problems. By model selection in terms of the silhouette coherency between the query and the projections of 3D shapes synthesized using the galleries, we achieve the phenotype recognition result as well as a fast approximate 3D reconstruction of the query. To verify the efficacy of the proposed approach, we present new datasets which contain over 500 images of various human and shark phenotypes and motions. The experimental results clearly show the benefits of using the 3D priors in the proposed method over previous 2D-based methods. © 2011 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The problem of automatic face recognition is to visually identify a person in an input image. This task is performed by matching the input face against the faces of known people in a database of faces. Most existing work in face recognition has limited the scope of the problem, however, by dealing primarily with frontal views, neutral expressions, and fixed lighting conditions. To help generalize existing face recognition systems, we look at the problem of recognizing faces under a range of viewpoints. In particular, we consider two cases of this problem: (i) many example views are available of each person, and (ii) only one view is available per person, perhaps a driver's license or passport photograph. Ideally, we would like to address these two cases using a simple view-based approach, where a person is represented in the database by using a number of views on the viewing sphere. While the view-based approach is consistent with case (i), for case (ii) we need to augment the single real view of each person with synthetic views from other viewpoints, views we call 'virtual views'. Virtual views are generated using prior knowledge of face rotation, knowledge that is 'learned' from images of prototype faces. This prior knowledge is used to effectively rotate in depth the single real view available of each person. In this thesis, I present the view-based face recognizer, techniques for synthesizing virtual views, and experimental results using real and virtual views in the recognizer.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Object detection and recognition are important problems in computer vision. The challenges of these problems come from the presence of noise, background clutter, large within class variations of the object class and limited training data. In addition, the computational complexity in the recognition process is also a concern in practice. In this thesis, we propose one approach to handle the problem of detecting an object class that exhibits large within-class variations, and a second approach to speed up the classification processes. In the first approach, we show that foreground-background classification (detection) and within-class classification of the foreground class (pose estimation) can be jointly solved with using a multiplicative form of two kernel functions. One kernel measures similarity for foreground-background classification. The other kernel accounts for latent factors that control within-class variation and implicitly enables feature sharing among foreground training samples. For applications where explicit parameterization of the within-class states is unavailable, a nonparametric formulation of the kernel can be constructed with a proper foreground distance/similarity measure. Detector training is accomplished via standard Support Vector Machine learning. The resulting detectors are tuned to specific variations in the foreground class. They also serve to evaluate hypotheses of the foreground state. When the image masks for foreground objects are provided in training, the detectors can also produce object segmentation. Methods for generating a representative sample set of detectors are proposed that can enable efficient detection and tracking. In addition, because individual detectors verify hypotheses of foreground state, they can also be incorporated in a tracking-by-detection frame work to recover foreground state in image sequences. To run the detectors efficiently at the online stage, an input-sensitive speedup strategy is proposed to select the most relevant detectors quickly. The proposed approach is tested on data sets of human hands, vehicles and human faces. On all data sets, the proposed approach achieves improved detection accuracy over the best competing approaches. In the second part of the thesis, we formulate a filter-and-refine scheme to speed up recognition processes. The binary outputs of the weak classifiers in a boosted detector are used to identify a small number of candidate foreground state hypotheses quickly via Hamming distance or weighted Hamming distance. The approach is evaluated in three applications: face recognition on the face recognition grand challenge version 2 data set, hand shape detection and parameter estimation on a hand data set, and vehicle detection and estimation of the view angle on a multi-pose vehicle data set. On all data sets, our approach is at least five times faster than simply evaluating all foreground state hypotheses with virtually no loss in classification accuracy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The performance of different classification approaches is evaluated using a view-based approach for motion representation. The view-based approach uses computer vision and image processing techniques to register and process the video sequence. Two motion representations called Motion Energy Images and Motion History Image are then constructed. These representations collapse the temporal component in a way that no explicit temporal analysis or sequence matching is needed. Statistical descriptions are then computed using moment-based features and dimensionality reduction techniques. For these tests, we used 7 Hu moments, which are invariant to scale and translation. Principal Components Analysis is used to reduce the dimensionality of this representation. The system is trained using different subjects performing a set of examples of every action to be recognized. Given these samples, K-nearest neighbor, Gaussian, and Gaussian mixture classifiers are used to recognize new actions. Experiments are conducted using instances of eight human actions (i.e., eight classes) performed by seven different subjects. Comparisons in the performance among these classifiers under different conditions are analyzed and reported. Our main goals are to test this dimensionality-reduced representation of actions, and more importantly to use this representation to compare the advantages of different classification approaches in this recognition task.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nearest neighbor retrieval is the task of identifying, given a database of objects and a query object, the objects in the database that are the most similar to the query. Retrieving nearest neighbors is a necessary component of many practical applications, in fields as diverse as computer vision, pattern recognition, multimedia databases, bioinformatics, and computer networks. At the same time, finding nearest neighbors accurately and efficiently can be challenging, especially when the database contains a large number of objects, and when the underlying distance measure is computationally expensive. This thesis proposes new methods for improving the efficiency and accuracy of nearest neighbor retrieval and classification in spaces with computationally expensive distance measures. The proposed methods are domain-independent, and can be applied in arbitrary spaces, including non-Euclidean and non-metric spaces. In this thesis particular emphasis is given to computer vision applications related to object and shape recognition, where expensive non-Euclidean distance measures are often needed to achieve high accuracy. The first contribution of this thesis is the BoostMap algorithm for embedding arbitrary spaces into a vector space with a computationally efficient distance measure. Using this approach, an approximate set of nearest neighbors can be retrieved efficiently - often orders of magnitude faster than retrieval using the exact distance measure in the original space. The BoostMap algorithm has two key distinguishing features with respect to existing embedding methods. First, embedding construction explicitly maximizes the amount of nearest neighbor information preserved by the embedding. Second, embedding construction is treated as a machine learning problem, in contrast to existing methods that are based on geometric considerations. The second contribution is a method for constructing query-sensitive distance measures for the purposes of nearest neighbor retrieval and classification. In high-dimensional spaces, query-sensitive distance measures allow for automatic selection of the dimensions that are the most informative for each specific query object. It is shown theoretically and experimentally that query-sensitivity increases the modeling power of embeddings, allowing embeddings to capture a larger amount of the nearest neighbor structure of the original space. The third contribution is a method for speeding up nearest neighbor classification by combining multiple embedding-based nearest neighbor classifiers in a cascade. In a cascade, computationally efficient classifiers are used to quickly classify easy cases, and classifiers that are more computationally expensive and also more accurate are only applied to objects that are harder to classify. An interesting property of the proposed cascade method is that, under certain conditions, classification time actually decreases as the size of the database increases, a behavior that is in stark contrast to the behavior of typical nearest neighbor classification systems. The proposed methods are evaluated experimentally in several different applications: hand shape recognition, off-line character recognition, online character recognition, and efficient retrieval of time series. In all datasets, the proposed methods lead to significant improvements in accuracy and efficiency compared to existing state-of-the-art methods. In some datasets, the general-purpose methods introduced in this thesis even outperform domain-specific methods that have been custom-designed for such datasets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Hidden State Shape Models (HSSMs) [2], a variant of Hidden Markov Models (HMMs) [9], were proposed to detect shape classes of variable structure in cluttered images. In this paper, we formulate a probabilistic framework for HSSMs which provides two major improvements in comparison to the previous method [2]. First, while the method in [2] required the scale of the object to be passed as an input, the method proposed here estimates the scale of the object automatically. This is achieved by introducing a new term for the observation probability that is based on a object-clutter feature model. Second, a segmental HMM [6, 8] is applied to model the "duration probability" of each HMM state, which is learned from the shape statistics in a training set and helps obtain meaningful registration results. Using a segmental HMM provides a principled way to model dependencies between the scales of different parts of the object. In object localization experiments on a dataset of real hand images, the proposed method significantly outperforms the method of [2], reducing the incorrect localization rate from 40% to 15%. The improvement in accuracy becomes more significant if we consider that the method proposed here is scale-independent, whereas the method of [2] takes as input the scale of the object we want to localize.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A neural network model of synchronized oscillator activity in visual cortex is presented in order to account for recent neurophysiological findings that such synchronization may reflect global properties of the stimulus. In these recent experiments, it was reported that synchronization of oscillatory firing responses to moving bar stimuli occurred not only for nearby neurons, but also occurred between neurons separated by several cortical columns (several mm of cortex) when these neurons shared some receptive field preferences specific to the stimuli. These results were obtained not only for single bar stimuli but also across two disconnected, but colinear, bars moving in the same direction. Our model and computer simulations obtain these synchrony results across both single and double bar stimuli. For the double bar case, synchronous oscillations are induced in the region between the bars, but no oscillations are induced in the regions beyond the stimuli. These results were achieved with cellular units that exhibit limit cycle oscillations for a robust range of input values, but which approach an equilibrium state when undriven. Single and double bar synchronization of these oscillators was achieved by different, but formally related, models of preattentive visual boundary segmentation and attentive visual object recognition, as well as nearest-neighbor and randomly coupled models. In preattentive visual segmentation, synchronous oscillations may reflect the binding of local feature detectors into a globally coherent grouping. In object recognition, synchronous oscillations may occur during an attentive resonant state that triggers new learning. These modelling results support earlier theoretical predictions of synchronous visual cortical oscillations and demonstrate the robustness of the mechanisms capable of generating synchrony.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

One of the most widely used techniques in computer vision for foreground detection is to model each background pixel as a Mixture of Gaussians (MoG). While this is effective for a static camera with a fixed or a slowly varying background, it fails to handle any fast, dynamic movement in the background. In this paper, we propose a generalised framework, called region-based MoG (RMoG), that takes into consideration neighbouring pixels while generating the model of the observed scene. The model equations are derived from Expectation Maximisation theory for batch mode, and stochastic approximation is used for online mode updates. We evaluate our region-based approach against ten sequences containing dynamic backgrounds, and show that the region-based approach provides a performance improvement over the traditional single pixel MoG. For feature and region sizes that are equal, the effect of increasing the learning rate is to reduce both true and false positives. Comparison with four state-of-the art approaches shows that RMoG outperforms the others in reducing false positives whilst still maintaining reasonable foreground definition. Lastly, using the ChangeDetection (CDNet 2014) benchmark, we evaluated RMoG against numerous surveillance scenes and found it to amongst the leading performers for dynamic background scenes, whilst providing comparable performance for other commonly occurring surveillance scenes.