968 results for Scale invariant feature transform (SIFT)
Abstract:
We consider brightness/contrast-invariant and rotation-discriminating template matching that searches an image to analyze, A, for a query image Q. We propose to use the complex coefficients of the discrete Fourier transform of the radial projections to compute new rotation-invariant local features. These coefficients can be efficiently obtained via FFT. We classify templates into "stable" and "unstable" ones and argue that any local feature-based template matching may fail to find unstable templates. We extract several stable sub-templates of Q and find them in A by comparing the features. The matchings of the sub-templates are combined using the Hough transform. As the features of A are computed only once, the algorithm can quickly find many different sub-templates in A, and it is suitable for finding many query images in A, multi-scale searching and partial occlusion-robust template matching. (C) 2009 Elsevier Ltd. All rights reserved.
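A minimal sketch of the radial-projection idea described above, assuming nearest-neighbour sampling and a hypothetical ray count of 36 (the abstract gives neither, and the function names are illustrative). Rotating the patch by a multiple of the angular step circularly shifts the projection vector, so the DFT magnitudes stay the same while the phases carry the rotation angle.

```python
import numpy as np

def radial_projections(patch, num_angles=36):
    """Mean gray level along num_angles rays leaving the patch centre.

    Nearest-neighbour sampling is an assumption made for brevity.
    """
    h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radius = int(min(cy, cx))
    rho = np.arange(1, radius + 1)
    theta = 2.0 * np.pi * np.arange(num_angles) / num_angles
    ys = np.rint(cy + np.outer(np.sin(theta), rho)).astype(int)
    xs = np.rint(cx + np.outer(np.cos(theta), rho)).astype(int)
    return patch[ys, xs].mean(axis=1)

def projection_dft(projections):
    """Complex DFT coefficients of the radial projections, computed via FFT."""
    return np.fft.fft(projections)

rng = np.random.default_rng(0)
patch = rng.random((21, 21))
proj = radial_projections(patch)
coeffs = projection_dft(proj)

# A rotation by s angular steps is modelled here as a circular shift of the rays.
s = 5
coeffs_rotated = projection_dft(np.roll(proj, s))

# Magnitudes are unchanged by the shift; the phase of coefficient k moves by 2*pi*k*s/M.
same_magnitude = np.allclose(np.abs(coeffs), np.abs(coeffs_rotated))
```

Keeping the complex coefficients rather than only their magnitudes is what makes the matching rotation-discriminating: the phase difference of a low-order coefficient gives an estimate of the rotation angle.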
Abstract:
This paper proposes an automatic hand detection system that combines the Fourier-Mellin Transform (FMT) with other computer vision techniques to achieve hand detection in color images of cluttered scenes. The proposed system uses the Fourier-Mellin Transform as an invariant feature extractor to perform rotation, scale and translation (RST) invariant hand detection. In the first stage of the system, a simple non-adaptive skin color-based image segmentation and a corner-based interest point detector are used to identify regions of interest that contain possible matches. A sliding window algorithm is then used to scan the image at different scales, performing the FMT calculations only in the previously detected regions of interest and comparing the extracted FM descriptor of each window with a database of hand descriptors obtained from a training image set. The results of the experiments suggest that Fourier-Mellin invariant features are a promising approach for automatic hand detection.
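The descriptor step can be sketched roughly as follows. This is a generic Fourier-Mellin style pipeline (FFT magnitude, log-polar resampling, second FFT magnitude), not the paper's implementation; the grid sizes and the function name are assumptions, and the window is assumed to be at least a few pixels wide.

```python
import numpy as np

def fourier_mellin_descriptor(image, n_radii=32, n_angles=32):
    """Approximate RST-invariant descriptor of a grayscale window."""
    # Step 1: the FFT magnitude does not change under circular translation.
    spectrum = np.fft.fftshift(np.abs(np.fft.fft2(image)))
    h, w = spectrum.shape
    cy, cx = h // 2, w // 2          # zero-frequency position after fftshift
    r_max = min(cy, cx) - 1
    # Step 2: log-polar resampling turns rotation and scaling into shifts.
    # Only half the plane is sampled because the spectrum of a real image is symmetric.
    radii = np.exp(np.linspace(0.0, np.log(r_max), n_radii))
    angles = np.pi * np.arange(n_angles) / n_angles
    ys = np.rint(cy + np.outer(radii, np.sin(angles))).astype(int)
    xs = np.rint(cx + np.outer(radii, np.cos(angles))).astype(int)
    log_polar = spectrum[np.clip(ys, 0, h - 1), np.clip(xs, 0, w - 1)]
    # Step 3: a second FFT magnitude discards those shifts (only approximately for scale).
    return np.abs(np.fft.fft2(log_polar))

rng = np.random.default_rng(1)
window = rng.random((32, 32))
descriptor = fourier_mellin_descriptor(window)
# A circularly translated copy of the window yields the same descriptor.
descriptor_shifted = fourier_mellin_descriptor(np.roll(window, (3, 5), axis=(0, 1)))
```

In a sliding-window detector, a descriptor like this would be compared against the stored hand descriptors with some distance measure, for example the Euclidean distance; the abstract does not say which measure the authors use.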
Abstract:
Supervised learning of large-scale hierarchical networks is currently enjoying tremendous success. Despite this momentum, unsupervised learning still represents, according to many researchers, a key element of Artificial Intelligence, where agents must learn from a potentially limited amount of data. This thesis follows that line of thought and addresses several research topics related to the density estimation problem through Boltzmann machines (BMs), probabilistic graphical models at the heart of deep learning. Our contributions touch on sampling, partition function estimation, optimization, and the learning of invariant representations. The thesis begins by presenting a new adaptive sampling algorithm, which automatically adjusts the temperature of the Markov chains being simulated in order to maintain a high convergence speed throughout learning. When used in the context of stochastic maximum likelihood (SML) learning, our algorithm yields increased robustness to the choice of learning rate, as well as faster convergence. Our results are presented for BMs, but the method is general and applicable to the learning of any probabilistic model that relies on Markov chain sampling. While the maximum likelihood gradient can be approximated by sampling, evaluating the log-likelihood requires an estimate of the partition function. In contrast to traditional approaches that treat a given model as a black box, we instead propose to exploit the dynamics of learning by estimating the successive changes in log-partition incurred at each parameter update.
The estimation problem is reformulated as an inference problem similar to the Kalman filter, but on a two-dimensional graph whose dimensions correspond to the time axis and the temperature parameter. On the topic of optimization, we also present an algorithm for efficiently applying the natural gradient to Boltzmann machines with thousands of units. Until now, its adoption was limited by its high computational cost and memory requirements. Our algorithm, Metric-Free Natural Gradient (MFNG), avoids explicitly computing the Fisher information matrix (and its inverse) by combining a linear solver with an efficient matrix-vector product. The algorithm is promising: in terms of the number of function evaluations, MFNG converges faster than SML. Its implementation unfortunately remains inefficient in computation time. This work also explores the mechanisms underlying the learning of invariant representations. To this end, we use the family of "spike & slab" restricted Boltzmann machines (ssRBM), which we modify so that they can model binary and sparse distributions. The binary latent variables of the ssRBM can be made invariant to a vector subspace by associating with each of them a vector of continuous latent variables (called "slabs"). This translates into increased invariance at the representation level and a better classification rate when few labeled data are available. We conclude this thesis with an ambitious topic: learning representations that can separate the factors of variation present in the input signal. We propose a solution based on a bilinear ssRBM (with two groups of latent factors) and formulate the problem as one of "pooling" in complementary vector subspaces.
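The idea of avoiding an explicit Fisher matrix can be illustrated with a small sketch. It is not the MFNG implementation: it assumes the Fisher matrix is estimated as the covariance of per-sample statistics, adds a hypothetical damping term to keep the system well conditioned, and uses a hand-written conjugate gradient as the linear solver. All names are illustrative.

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=200):
    """Solve A x = b for symmetric positive-definite A, given only v -> A v."""
    x = np.zeros_like(b)
    r = b.copy()              # residual b - A x for the initial guess x = 0
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def natural_gradient(stats, grad, damping=1e-3):
    """Solve (F + damping * I) x = grad without ever forming F.

    stats: (n_samples, n_params) array of per-sample statistics (an assumption);
    F is taken to be their empirical covariance.
    """
    centered = stats - stats.mean(axis=0)
    n = stats.shape[0]

    def fisher_vector_product(v):
        # Two thin matrix-vector products replace the n_params x n_params matrix.
        return centered.T @ (centered @ v) / n + damping * v

    return conjugate_gradient(fisher_vector_product, grad)

rng = np.random.default_rng(0)
stats = rng.normal(size=(500, 8))
grad = rng.normal(size=8)
direction = natural_gradient(stats, grad)
```

Each solver iteration costs O(n_samples × n_params) instead of the O(n_params²) memory and O(n_params³) time needed to build and invert the matrix, which is why this style of approach scales to models with thousands of units.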
Abstract:
This paper presents an empirical study of affine invariant feature detectors to perform matching on video sequences of people with non-rigid surface deformation. Recent advances in feature detection and wide baseline matching have focused on static scenes. Video frames of human movement capture highly non-rigid deformation such as loose hair, cloth creases, skin stretching and free flowing clothing. This study evaluates the performance of six widely used feature detectors for sparse temporal correspondence on single view and multiple view video sequences. Quantitative evaluation is performed of both the number of features detected and their temporal matching, with and without ground truth correspondence. Recall-accuracy analysis of feature matching is reported for temporal correspondence on single view and multiple view sequences of people with variation in clothing and movement. This analysis identifies that existing feature detection and matching algorithms are unreliable for fast movement with common clothing.
Abstract:
The human visual system is able to effortlessly integrate local features to form our rich perception of patterns, despite the fact that visual information is discretely sampled by the retina and cortex. By using a novel perturbation technique, we show that the mechanisms by which features are integrated into coherent percepts are scale-invariant and nonlinear (phase and contrast polarity independent). They appear to operate by assigning position labels or “place tags” to each feature. Specifically, in the first series of experiments, we show that the positional tolerance of these place tags in foveal and peripheral vision is about half the separation of the features, suggesting that the neural mechanisms that bind features into forms are quite robust to topographical jitter. In the second series of experiments, we asked how many stimulus samples are required for pattern identification by human and ideal observers. In human foveal vision, only about half the features are needed for reliable pattern interpolation. In this regard, human vision is quite efficient (ratio of ideal to real ≈ 0.75). Peripheral vision, on the other hand, is rather inefficient, requiring more features, suggesting that the stimulus may be relatively underrepresented at the stage of feature integration.
Abstract:
Knowledge of the reflectivity of the sediment-covered seabed is of significant importance to marine seismic data acquisition and interpretation as it governs the generation of reverberations in the water layer. In this context pertinent, but largely unresolved, questions concern the importance of the typically very prominent vertical seismic velocity gradients as well as the potential presence and magnitude of anisotropy in soft surficial seabed sediments. To address these issues, we explore the seismic properties of granulometric end-member-type clastic sedimentary seabed models consisting of sand, silt, and clay as well as scale-invariant stochastic layer sequences of these components characterized by realistic vertical gradients of the P- and S-wave velocities. Using effective media theory, we then assess the nature and magnitude of seismic anisotropy associated with these models. Our results indicate that anisotropy is rather benign for P-waves, and that the S-wave velocities in the axial directions differ only slightly. Because of the very high P- to S-wave velocity ratios in the vicinity of the seabed our models nevertheless suggest that S-wave triplications may occur at very small incidence angles. To numerically evaluate the P-wave reflection coefficient of our seabed models, we apply a frequency-slowness technique to the corresponding synthetic seismic wavefields. Comparison with analytical plane-wave reflection coefficients calculated for corresponding isotropic elastic half-space models shows that the differences tend to be most pronounced in the vicinity of the elastic equivalent of the critical angle as well as in the post-critical range. We also find that the presence of intrinsic anisotropy in the clay component of our layered models tends to dramatically reduce the overall magnitude of the P-wave reflection coefficient as well as its variation with incidence angle.
Abstract:
We examine the scale invariants in the preparation of highly concentrated w/o emulsions at different scales and in varying conditions. The emulsions are characterized using rheological parameters, owing to their highly elastic behavior. We first construct and validate empirical models to describe the rheological properties. These models yield a reasonable prediction of experimental data. We then build an empirical scale-up model to predict the preparation and composition conditions that have to be kept constant at each scale to prepare the same emulsion. For this purpose, three preparation scales with geometric similarity are used. The parameter N·D^α, as a function of the stirring rate N, the scale (D, impeller diameter) and the exponent α (calculated empirically from the regression of all the experiments in the three scales), is defined as the scale invariant that needs to be optimized, once the dispersed phase of the emulsion, the surfactant concentration, and the dispersed phase addition time are set. As far as we know, no other study has obtained a scale invariant factor N·D^α for the preparation of highly concentrated emulsions that covers three different scales, different addition times and surfactant concentrations. The power law exponent obtained seems to indicate that the scale-up criterion for this system is the power input per unit volume (P/V).
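As a worked illustration of the scale-up rule: if N·D^α must stay constant between two geometrically similar vessels, then N2 = N1·(D1/D2)^α. The sketch below uses hypothetical numbers. The exponent α = 2/3 is the textbook value for constant P/V under a constant power number (P ∝ N³D⁵ and V ∝ D³, so P/V ∝ N³D²); it is not the fitted value from the study, which the abstract does not report.

```python
import math

def stirring_rate_at_scale(n_ref, d_ref, d_new, alpha):
    """Stirring rate at impeller diameter d_new that keeps N * D**alpha constant."""
    return n_ref * (d_ref / d_new) ** alpha

# Hypothetical example: 1000 rpm with a 5 cm impeller, scaled up to a 10 cm impeller.
alpha = 2.0 / 3.0                      # assumed constant-P/V exponent
n_small, d_small, d_large = 1000.0, 0.05, 0.10
n_large = stirring_rate_at_scale(n_small, d_small, d_large, alpha)

invariant_small = n_small * d_small ** alpha
invariant_large = n_large * d_large ** alpha
```

With α > 0 the required stirring rate drops as the impeller grows, and the two invariant values agree by construction.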
Abstract:
Patient-specific biomechanical models including local bone mineral density and anisotropy have gained importance for assessing musculoskeletal disorders. However, the trabecular bone anisotropy captured by high-resolution imaging is only available at the peripheral skeleton in clinical practice. In this work, we propose a supervised learning approach to predict trabecular bone anisotropy that builds on a novel set of pose invariant feature descriptors. The statistical relationship between trabecular bone anisotropy and feature descriptors was learned from a database of pairs of high resolution QCT and clinical QCT reconstructions. On a set of leave-one-out experiments, we compared the accuracy of the proposed approach to previous ones, and report a mean prediction error of 6% for the tensor norm, 6% for the degree of anisotropy and 19° for the principal tensor direction. These findings show the potential of the proposed approach to predict trabecular bone anisotropy from clinically available QCT images.
Abstract:
Symmetries have played an important role in a variety of problems in geology and geophysics. A large fraction of studies in mineralogy are devoted to the symmetry properties of crystals. In this paper, however, the emphasis will be on scale-invariant (fractal) symmetries. The earth’s topography is an example of both statistically self-similar and self-affine fractals. Landforms are also associated with drainage networks, which are statistical fractal trees. A universal feature of drainage networks and other growth networks is side branching. Deterministic space-filling networks with side-branching symmetries are illustrated. It is shown that naturally occurring drainage networks have symmetries similar to diffusion-limited aggregation clusters.
Abstract:
Marr's work offered guidelines on how to investigate vision (the theory - algorithm - implementation distinction), as well as specific proposals on how vision is done. Many of the latter have inevitably been superseded, but the approach was inspirational and remains so. Marr saw the computational study of vision as tightly linked to psychophysics and neurophysiology, but the last twenty years have seen some weakening of that integration. Because feature detection is a key stage in early human vision, we have returned to basic questions about representation of edges at coarse and fine scales. We describe an explicit model in the spirit of the primal sketch, but tightly constrained by psychophysical data. Results from two tasks (location-marking and blur-matching) point strongly to the central role played by second-derivative operators, as proposed by Marr and Hildreth. Edge location and blur are evaluated by finding the location and scale of the Gaussian-derivative 'template' that best matches the second-derivative profile ('signature') of the edge. The system is scale-invariant, and accurately predicts blur-matching data for a wide variety of 1-D and 2-D images. By finding the best-fitting scale, it implements a form of local scale selection and circumvents the knotty problem of integrating filter outputs across scales. [Supported by BBSRC and the Wellcome Trust]
Cross-orientation masking is speed invariant between ocular pathways but speed dependent within them
Abstract:
In human (D. H. Baker, T. S. Meese, & R. J. Summers, 2007b) and in cat (B. Li, M. R. Peterson, J. K. Thompson, T. Duong, & R. D. Freeman, 2005; F. Sengpiel & V. Vorobyov, 2005) there are at least two routes to cross-orientation suppression (XOS): a broadband, non-adaptable, monocular (within-eye) pathway and a more narrowband, adaptable interocular (between the eyes) pathway. We further characterized these two routes psychophysically by measuring the weight of suppression across spatio-temporal frequency for cross-oriented pairs of superimposed flickering Gabor patches. Masking functions were normalized to unmasked detection thresholds and fitted by a two-stage model of contrast gain control (T. S. Meese, M. A. Georgeson, & D. H. Baker, 2006) that was developed to accommodate XOS. The weight of monocular suppression was a power function of the scalar quantity ‘speed’ (temporal-frequency/spatial-frequency). This weight can be expressed as the ratio of non-oriented magno- and parvo-like mechanisms, permitting a fast-acting, early locus, as befits the urgency for action associated with high retinal speeds. In contrast, dichoptic-masking functions superimposed. Overall, this (i) provides further evidence for dissociation between the two forms of XOS in humans, and (ii) indicates that the monocular and interocular varieties of XOS are space/time scale-dependent and scale-invariant, respectively. This suggests an image-processing role for interocular XOS that is tailored to natural image statistics—very different from that of the scale-dependent (speed-dependent) monocular variety.
Abstract:
Ernst Mach observed that light or dark bands could be seen at abrupt changes of luminance gradient in the absence of peaks or troughs in luminance. Many models of feature detection share the idea that bars, lines, and Mach bands are found at peaks and troughs in the output of even-symmetric spatial filters. Our experiments assessed the appearance of Mach bands (position and width) and the probability of seeing them on a novel set of generalized Gaussian edges. Mach band probability was mainly determined by the shape of the luminance profile and increased with the sharpness of its corners, controlled by a single parameter (n). Doubling or halving the size of the images had no significant effect. Variations in contrast (20%-80%) and duration (50-300 ms) had relatively minor effects. These results rule out the idea that Mach bands depend simply on the amplitude of the second derivative, but a multiscale model, based on Gaussian-smoothed first- and second-derivative filtering, can account accurately for the probability and perceived spatial layout of the bands. A key idea is that Mach band visibility depends on the ratio of second- to first-derivative responses at peaks in the second-derivative scale-space map. This ratio is approximately scale-invariant and increases with the sharpness of the corners of the luminance ramp, as observed. The edges of Mach bands pose a surprisingly difficult challenge for models of edge detection, but a nonlinear third-derivative operation is shown to predict the locations of Mach band edges strikingly well. Mach bands thus shed new light on the role of multiscale filtering systems in feature coding. © 2012 ARVO.
Abstract:
In this paper, a modification of the high-order neural network (HONN) is presented. Third-order networks are considered for achieving translation, rotation and scale invariant pattern recognition. They require, however, considerable storage and computation power for the task. The proposed modified HONN takes into account a priori knowledge of the binary patterns that have to be learned, achieving a significant gain in computation time and memory requirements. This modification enables the efficient computation of HONNs for image fields of greater than 100 × 100 pixels without any loss of pattern information.
Abstract:
2000 Mathematics Subject Classification: 26A33 (main), 44A40, 44A35, 33E30, 45J05, 45D05