952 results for Contour Integration, Psychophysics, Humans, Object Recognition, Cue Summation
Abstract:
The aim of this work is to analyze principles of contour integration in the human visual system. The perceptual linking of neighboring parts of a visual scene into a whole is characterized by two propositions, grounded in Gestalt theory, that describe complementary local mechanisms of contour integration. The first principle of contour integration holds that local similarity of elements in a feature other than orientation is not sufficient for contour detection; instead, an additional statistical feature difference between contour elements and surround must be present for contours to be detected. The second principle of contour integration states that collinear alignment of contour elements is sufficient for contour integration, and that when it is present, contour integration is robust even if the local feature-carrying elements vary highly randomly in other features and therefore show no neighborhood similarity along the contour. As the empirical basis for the two proposed principles, three experiments are reported. They first demonstrate the subordinate role of global contour features such as closure in contour detection, and then examine the importance of local mechanisms for contour integration using the features collinearity and spatial frequency, as well as the specific kind of interaction between the two. The first experiment examines the global feature of closure and shows that closed contours are not detected more effectively than open contours.
The second experiment shows that contours defined by collinearity are robust to random variation of spatial frequency along the contour and in the background, and that contour integration is impossible with neighborhood similarity of contour elements when that similarity is realized through equal spatial frequencies rather than collinear orientation. The third experiment shows that a redundant combination of collinear orientation with a statistical difference in spatial frequency produces substantial gains in visibility for contour detection. Given the strength of this summation effect, it is proposed that combining multiple cues engages new cortical mechanisms that support contour detection. The results of the three experiments are placed in the context of current research on object perception, and their significance for the postulated general principles of visual grouping in contour integration is discussed. Phenomenological examples with features other than orientation and spatial frequency show that the principles found can claim generality for contour processing in the visual system.
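The third experiment's inference rests on the combined-cue gain exceeding what independent cues would predict. A minimal sketch of one standard benchmark, Minkowski pooling of single-cue sensitivities, is below; the pooling rule, the exponent `k`, the function names, and the sensitivities in the usage lines are illustrative assumptions, not the analysis or data of the thesis.

```python
def minkowski_pool(sensitivities, k):
    """Predicted sensitivity for independent cues combined by Minkowski pooling.

    k = 1 is linear summation; as k grows the prediction approaches the best
    single cue. Values of k around 3-4 are often used to approximate
    probability summation.
    """
    if k <= 0:
        raise ValueError("exponent k must be positive")
    return sum(s ** k for s in sensitivities) ** (1.0 / k)


def summation_index(combined, single_cues, k):
    """Measured combined sensitivity divided by the pooling prediction.

    Values well above 1 indicate stronger-than-predicted summation.
    """
    return combined / minkowski_pool(single_cues, k)


# Hypothetical sensitivities (1 / threshold), not data from the experiments.
orientation_only, frequency_only, both_cues = 2.0, 1.5, 4.0
print(summation_index(both_cues, [orientation_only, frequency_only], k=3))
```

Choosing `k` is the crux: the same measured gain can look super-additive against a probability-summation benchmark and merely additive against linear summation.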
Abstract:
There is evidence for the late development in humans of configural face and animal recognition. We show that the recognition of artificial three-dimensional (3D) objects from part configurations develops similarly late. We also demonstrate that the cross-modal integration of object information reinforces the development of configural recognition more than the intra-modal integration does. Multimodal object representations in the brain may therefore play a role in configural object recognition. © 2003 Elsevier B.V. All rights reserved.
Abstract:
Perceiving the world visually is a basic act for humans, but for computers it is still an unsolved problem. The variability present in natural environments is an obstacle to effective computer vision. The goal of invariant object recognition is to recognize objects in a digital image despite variations in, for example, pose, lighting, or occlusion. In this study, invariant object recognition is considered from the viewpoint of feature extraction. The differences between local and global features are studied, with emphasis on feature extraction based on the Hough transform and Gabor filtering. The methods are examined with respect to four capabilities: generality, invariance, stability, and efficiency. Invariant features are presented using both the Hough transform and Gabor filtering. A modified Hough transform technique is also presented in which distortion tolerance is increased by incorporating local information. In addition, methods for decreasing the computational cost of the Hough transform by employing parallel processing and local information are introduced.
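For readers unfamiliar with the Hough transform, the sketch below shows plain line voting in the normal form x cos θ + y sin θ = ρ. It is the textbook variant, not the modified, locally informed version the study describes; `hough_lines`, `strongest_line`, and the bin sizes are assumptions made for illustration.

```python
import math
from collections import Counter


def hough_lines(points, n_theta=180, rho_step=1.0):
    """Vote in (theta, rho) space for lines x*cos(theta) + y*sin(theta) = rho."""
    acc = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta  # theta sampled in [0, pi)
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(t, round(rho / rho_step))] += 1
    return acc


def strongest_line(points, n_theta=180, rho_step=1.0):
    """Return (theta, rho, votes) for the accumulator cell with the most votes."""
    acc = hough_lines(points, n_theta, rho_step)
    (t, r), votes = max(acc.items(), key=lambda item: item[1])
    return math.pi * t / n_theta, r * rho_step, votes


# Ten points on the vertical line x = 5 plus two unrelated points.
edge_points = [(5, y) for y in range(10)] + [(0, 0), (9, 2)]
print(strongest_line(edge_points))  # -> (0.0, 5.0, 10)
```

At this resolution a few neighbouring θ bins also collect all ten votes (ties are resolved by insertion order here), which is one reason practical implementations refine or suppress non-maximal peaks.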
Abstract:
This thesis describes the development of a model-based vision system that exploits hierarchies of both object structure and object scale. The focus of the research is to use these hierarchies to achieve robust recognition based on effective organization and indexing schemes for model libraries. The goal of the system is to recognize parameterized instances of non-rigid model objects contained in a large knowledge base despite the presence of noise and occlusion. Robustness is achieved by developing a system that can recognize viewed objects that are scaled or mirror-image instances of the known models, or that contain component sub-parts with different relative scaling, rotation, or translation than in the models. The approach taken in this thesis is to develop an object shape representation that incorporates a component sub-part hierarchy, to allow for efficient and correct indexing into an automatically generated model library as well as for relative parameterization among sub-parts, and a scale hierarchy, to allow for a general-to-specific recognition procedure. After an analysis of the issues and inherent trade-offs in the recognition process, a system is implemented using a representation based on significant changes in contour curvature and a recognition engine based on geometric constraints on feature properties. Examples of the system's performance are given, followed by an analysis of the results. In conclusion, the system's benefits and limitations are presented.
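One simple way to locate significant curvature changes on a polygonal contour is to threshold the turning angle at each vertex. The sketch assumes a closed, counter-clockwise contour and a hypothetical 30° threshold; the thesis's actual curvature representation may differ.

```python
import math


def turning_angles(contour):
    """Signed turning angle (radians) at each vertex of a closed polygonal contour."""
    n = len(contour)
    angles = []
    for i in range(n):
        x0, y0 = contour[i - 1]          # previous vertex (wraps around for i = 0)
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]    # next vertex
        heading_in = math.atan2(y1 - y0, x1 - x0)
        heading_out = math.atan2(y2 - y1, x2 - x1)
        d = heading_out - heading_in
        angles.append(math.atan2(math.sin(d), math.cos(d)))  # wrap into [-pi, pi]
    return angles


def significant_points(contour, threshold=math.radians(30)):
    """Indices of vertices whose absolute turning angle exceeds the threshold."""
    return [i for i, a in enumerate(turning_angles(contour)) if abs(a) > threshold]


# A square traced counter-clockwise, with extra points along each side.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
print(significant_points(square))  # -> [0, 2, 4, 6]
```

For a closed counter-clockwise contour the turning angles sum to 2π, which is a handy sanity check on the wrapping step.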
Abstract:
Many 3D objects in the world around us are strongly constrained. For instance, not only cultural artifacts but also many natural objects are bilaterally symmetric. Theoretical arguments suggest, and psychophysical experiments confirm, that humans may be better at recognizing symmetric objects. The hypothesis of symmetry-induced virtual views, together with a network model that successfully accounts for human recognition of generic 3D objects, leads to predictions that we have verified with psychophysical experiments.
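The idea of a symmetry-induced virtual view can be checked numerically in a simplified setting. Assuming orthographic projection, an object symmetric about the plane x = 0, and rotation about the vertical axis only, mirroring one image and swapping symmetric point pairs reproduces the image at the opposite rotation angle. The helper names and the toy object are hypothetical; this is not the network model referred to in the abstract.

```python
import math


def rotate_y(point, angle):
    """Rotate a 3D point about the vertical (y) axis."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project(points, angle):
    """Orthographic (x, y) image of each point after rotation about the y axis."""
    return [rotate_y(p, angle)[:2] for p in points]


def virtual_view(image, partner):
    """Virtual view from one image of an object symmetric about the plane x = 0.

    partner[i] is the index of the mirror-symmetric counterpart of point i.
    Mirroring the image (x -> -x) and swapping each point with its partner gives
    the image the object would produce at the opposite rotation angle.
    """
    return [(-image[j][0], image[j][1]) for j in partner]


# A toy symmetric object: points come in mirror pairs (x, y, z) / (-x, y, z).
obj = [(1.0, 0.0, 2.0), (-1.0, 0.0, 2.0), (0.5, 1.0, -1.0), (-0.5, 1.0, -1.0)]
partner = [1, 0, 3, 2]
seen = project(obj, 0.4)
virtual = virtual_view(seen, partner)
expected = project(obj, -0.4)
print(all(math.isclose(u, v, abs_tol=1e-12)
          for p, q in zip(virtual, expected) for u, v in zip(p, q)))  # -> True
```

In other words, a single learned view of a symmetric object supplies a second, "free" view at no extra viewing cost.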
Abstract:
Report for the scientific sojourn at the Swiss Federal Institute of Technology Zurich, Switzerland, between September and December 2007. To make robots useful assistants in our everyday life, the ability to learn and recognize objects is essential. However, object recognition in real scenes is one of the most challenging problems in computer vision, as many difficulties must be dealt with. Furthermore, mobile robotics adds a new challenge to the list: computational complexity. In a dynamic world, information about the objects in the scene can become obsolete before it is ready to be used if the detection algorithm is not fast enough. Two recent object recognition techniques have achieved notable results: the constellation approach proposed by Lowe and the bag-of-words approach proposed by Nistér and Stewénius. The Lowe constellation approach is the one currently used in the robot localization work of the COGNIRON project. This report is divided into two main sections. The first section briefly reviews the object recognition system currently in use, the Lowe approach, and brings to light the drawbacks found for object recognition in the context of indoor mobile robot navigation. Additionally, the proposed improvements to the algorithm are described. The second section reviews the alternative bag-of-words method, along with several experiments conducted to evaluate its performance on our own object databases. Furthermore, some modifications to the original algorithm are proposed to make it suitable for object detection in unsegmented images.
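As a rough illustration of the bag-of-words idea, each local descriptor is assigned to its nearest "visual word" and an image becomes a normalized word histogram. This sketch assumes a small flat vocabulary and plain L1 histogram distance; it omits the hierarchical vocabulary tree and tf-idf weighting of the Nistér and Stewénius method, and the descriptors and object names are made up.

```python
def quantize(descriptor, vocabulary):
    """Index of the nearest visual word (flat vocabulary, squared Euclidean distance)."""
    return min(
        range(len(vocabulary)),
        key=lambda w: sum((d - v) ** 2 for d, v in zip(descriptor, vocabulary[w])),
    )


def bow_histogram(descriptors, vocabulary):
    """L1-normalized histogram of visual-word occurrences in one image."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[quantize(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def l1_distance(h1, h2):
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Hypothetical 2D "descriptors"; real local descriptors (e.g. SIFT) are 128-dimensional.
vocab = [(0, 0), (10, 0), (0, 10)]
query = bow_histogram([(1, 1), (9, 1), (9, -1)], vocab)
db = {
    "mug": bow_histogram([(0, 1), (11, 0), (10, 2), (10, 1)], vocab),
    "book": bow_histogram([(0, 9), (1, 11), (0, 0)], vocab),
}
print(min(db, key=lambda name: l1_distance(query, db[name])))  # -> mug
```

Because only word counts are compared, no geometric verification happens here; that is both the speed advantage and the weakness of the approach.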
Abstract:
Despite myriad studies, neurophysiologic mechanisms mediating illusory contour (IC) sensitivity remain controversial. Among the competing models one favors feed-forward effects within lower-tier cortices (V1/V2). Another situates IC sensitivity first within higher-tier cortices, principally lateral-occipital cortices (LOC), with later feedback effects in V1/V2. Still others postulate that LOC are sensitive to salient regions demarcated by the inducing stimuli, whereas V1/V2 effects specifically support IC sensitivity. We resolved these discordances by using misaligned line gratings, oriented either horizontally or vertically, to induce ICs. Line orientation provides an established assay of V1/V2 modulations independently of IC presence, and gratings lack salient regions. Electrical neuroimaging analyses of visual evoked potentials (VEPs) disambiguated the relative timing and localization of IC sensitivity with respect to that for grating orientation. Millisecond-by-millisecond analyses of VEPs and distributed source estimations revealed a main effect of grating orientation beginning at 65 ms post-stimulus onset within the calcarine sulcus that was followed by a main effect of IC presence beginning at 85 ms post-stimulus onset within the LOC. There was no evidence for differential processing of ICs as a function of the orientation of the grating. These results support models wherein IC sensitivity occurs first within the LOC.
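Onsets such as "beginning at 65 ms" are typically read off as the first time point of a sustained run of significant samples. The sketch assumes 1 ms sampling, α = 0.05, and a minimum run of 20 consecutive samples; these criteria and the p-values in the usage lines are illustrative and not necessarily those of this study.

```python
def effect_onset(p_values, times_ms, alpha=0.05, min_run=20):
    """Return the first time at which p < alpha holds for at least min_run consecutive samples.

    Returns None if no sufficiently long run of significant samples exists.
    """
    run_start, run_len = None, 0
    for i, p in enumerate(p_values):
        if p < alpha:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len >= min_run:
                return times_ms[run_start]
        else:
            run_len = 0  # a non-significant sample breaks the run
    return None


# Hypothetical p-values sampled every 1 ms: a brief 5 ms blip and a sustained effect.
times = list(range(200))
p = [0.5] * 200
p[30:35] = [0.01] * 5
p[65:120] = [0.01] * 55
print(effect_onset(p, times))  # -> 65
```

The run-length requirement is what keeps isolated false positives, like the blip at 30 ms, from being reported as onsets.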
Abstract:
Summary: Recent technical advances in non-invasive brain imaging have improved our understanding of the brain's various functional systems. Multimodal approaches have become indispensable in research for studying, as a whole, the different characteristics of neuronal activity that underlie brain function. In this combined functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) study, we exploited the strengths of each technique, namely high spatial and high temporal resolution, respectively. Cognitive, perceptual, and motor processes require the recruitment of neuronal assemblies. In the first part of this thesis, we use combined fMRI and EEG to study the response of visual areas to a stimulus that requires grouping coherent elements from both visual hemifields into a single image. We use a synchronization measure (EEG coherence) to quantify interhemispheric spatial integration and the BOLD (Blood Oxygenation Level Dependent) response to assess the resulting brain activity. The increase in EEG coherence in the beta-gamma band measured at occipital electrodes, and its linear correlation with the BOLD response in areas VP/V4, reflects and visualizes a synchronized neuronal assembly that is probably involved in visual spatial grouping. These results allowed us to extend the research to the impact of the stimuli's frequency content on synchronization. Using the same approach, we identified the networks that show different sensitivity to integrating the global or detailed features of images. In particular, the data show that the involvement of the ventral and dorsal visual networks is modulated by the frequency content of the stimuli.
In the second part, we tested the hypothesis that the increase in brain activity during interhemispheric grouping depends on the activity of the callosal axons connecting the visual areas. Because the corpus callosum matures progressively over the first two decades of life, we analyzed the development of spatial integration in children aged 7 to 13 years and the role of callosal fiber myelination in the maturation of visual activity. We combined fMRI with Magnetization Transfer Imaging (MTI) to track signs of brain maturation in functional and morphological (myelination) terms, respectively. In children, the activations associated with integration across the visual hemifields are, as in adults, located in the ventral network, but are confined to a smaller region. The strong correlation between the BOLD signal and the myelination of splenial fibers indicates that the maturation of high-level visual functions depends on that of cortico-cortical connections. Abstract: Recent advances in non-invasive brain imaging allow the visualization of different aspects of complex brain dynamics. Approaches that combine imaging techniques make it possible to investigate, and link, multiple aspects of information processing, and they are becoming a leading tool for understanding the neural basis of various brain functions. Perception, motion, and cognition involve the formation of cooperative neuronal assemblies distributed over the cerebral cortex. In this research, we explore the characteristics of interhemispheric assemblies in the visual brain by taking advantage of the complementary characteristics of EEG (electroencephalography) and fMRI (functional magnetic resonance imaging).
These are the high temporal resolution of EEG and the high spatial resolution of fMRI. In the first part of this thesis we investigate the response of the visual areas to an interhemispheric perceptual grouping task. We use EEG coherence as a measure of synchronization and the BOLD (Blood Oxygenation Level Dependent) response as a measure of the related brain activation. The increase in interhemispheric EEG coherence, restricted to the occipital electrodes and to the EEG beta band, and its linear relation to the BOLD responses in area VP/V4 point to a trans-hemispheric synchronous neuronal assembly involved in early perceptual grouping. This result encouraged us to use the same multimodal approach to explore the formation of synchronous trans-hemispheric networks induced by stimuli of various spatial frequencies. We found that the involvement of ventral and medio-dorsal visual networks is modulated by the spatial frequency content of the stimulus. Thus, by combining EEG coherence and fMRI BOLD data, we identified visual networks with different sensitivity to integrating low vs. high spatial frequencies. In the second part of this work we test the hypothesis that the increase in brain activity during perceptual grouping depends on the activity of the callosal axons interconnecting the visual areas involved. To this end, in children aged 7-13 years, we investigated functional (activation measured with fMRI) and morphological (myelination of the corpus callosum measured with Magnetization Transfer Imaging, MTI) aspects of spatial integration. In children, the activation associated with spatial integration across visual fields was localized in the ventral visual stream and limited to part of the area activated in adults. The strong correlation between individual BOLD responses in this area and the myelination of the splenial fiber system points to myelination as a significant factor in the development of spatial integration ability.
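EEG coherence can be sketched as magnitude-squared coherence, |S_xy|² / (S_xx S_yy), with cross- and auto-spectra averaged over segments. This minimal version evaluates a single DFT bin without windowing or overlap; a real EEG analysis would typically average over a frequency band (for example, beta) and use a windowed Welch estimate. The function names are hypothetical.

```python
import cmath
import math


def dft_bin(segment, k):
    """k-th DFT coefficient of one segment (no windowing)."""
    n = len(segment)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(segment))


def coherence_at(x, y, fs, freq, seg_len):
    """Magnitude-squared coherence of x and y at one frequency.

    Cross- and auto-spectra are averaged over non-overlapping segments. With a
    single segment the estimate is always 1, so averaging is essential.
    """
    k = round(freq * seg_len / fs)  # DFT bin nearest to freq
    sxy, sxx, syy = 0j, 0.0, 0.0
    for start in range(0, min(len(x), len(y)) - seg_len + 1, seg_len):
        fx = dft_bin(x[start:start + seg_len], k)
        fy = dft_bin(y[start:start + seg_len], k)
        sxy += fx * fy.conjugate()
        sxx += abs(fx) ** 2
        syy += abs(fy) ** 2
    return abs(sxy) ** 2 / (sxx * syy)
```

For two channels sharing an oscillation plus independent noise, the averaged cross-spectrum stays aligned at the shared frequency (coherence near 1) while it largely cancels at noise-only frequencies.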
Abstract:
The report describes a recognition system called GROPER, which performs grouping by using distance and relative orientation constraints that estimate the likelihood of different edges in an image coming from the same object. The thesis presents both a theoretical analysis of the grouping problem and a practical implementation of a grouping system. GROPER also uses an indexing module to allow it to make use of knowledge of different objects, any of which might appear in an image. We test GROPER by comparing it to a similar recognition system that does not use grouping.
Abstract:
Two formulations of model-based object recognition are described. MAP Model Matching evaluates joint hypotheses of match and pose, while Posterior Marginal Pose Estimation evaluates the pose only. Local search in pose space is carried out with the Expectation-Maximization (EM) algorithm. Recognition experiments are described in which the EM algorithm is used to refine and evaluate pose hypotheses in 2D and 3D. Initial hypotheses for the 2D experiments were generated by a simple indexing method, Angle Pair Indexing. The Linear Combination of Views method of Ullman and Basri is employed as the projection model in the 3D experiments.
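To show the E and M steps concretely, the sketch below estimates only a 2D translation between model and image features. It assumes Gaussian feature noise with a known `sigma` and a constant `background` density for clutter; the formulations above work in richer pose spaces (including the Linear Combination of Views projection in 3D), which this toy does not attempt.

```python
import math


def em_translation(model, image, t0, sigma=0.3, background=1e-3, iters=20):
    """Refine a 2D translation pose with EM under a Gaussian-plus-clutter model."""
    tx, ty = t0
    norm = 1.0 / (2 * math.pi * sigma ** 2)  # 2D isotropic Gaussian normalizer
    for _ in range(iters):
        num_x = num_y = den = 0.0
        for yx, yy in image:
            # E-step: likelihood that this image feature comes from each model feature.
            lik = [
                norm * math.exp(-((yx - mx - tx) ** 2 + (yy - my - ty) ** 2) / (2 * sigma ** 2))
                for mx, my in model
            ]
            total = background + sum(lik)  # clutter competes with every match
            for (mx, my), l in zip(model, lik):
                w = l / total
                num_x += w * (yx - mx)
                num_y += w * (yy - my)
                den += w
        if den == 0.0:
            break  # no feature supports the current pose
        # M-step: weighted least-squares translation.
        tx, ty = num_x / den, num_y / den
    return tx, ty


model = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
image = [(5.0, -1.0), (7.0, -1.0), (5.0, 2.0), (20.0, 20.0)]  # last point is clutter
print(em_translation(model, image, t0=(4.5, -0.5)))  # converges to about (5.0, -1.0)
```

The `background` term is what lets the clutter point receive essentially zero weight instead of dragging the pose estimate toward it.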
Abstract:
A key problem in object recognition is selection, namely, the problem of identifying regions in an image within which to start the recognition process, ideally by isolating regions that are likely to come from a single object. Such a selection mechanism has been found to be crucial in reducing the combinatorial search involved in the matching stage of object recognition. Even though selection helps recognition, it has largely remained unsolved because of the difficulty of isolating regions belonging to objects under complex imaging conditions involving occlusion, changing illumination, and varying object appearance. This thesis presents a novel approach to the selection problem by proposing a computational model of visual attentional selection as a paradigm for selection in recognition. In particular, it proposes two modes of attentional selection, namely the attracted and pay-attention modes, as appropriate for data-driven and model-driven selection in recognition, respectively. An implementation of this model has led to new ways of extracting color, texture, and line-group information in images, and of using it to isolate areas of the scene likely to contain the model object. Among the specific results in this thesis are a method of specifying color by perceptual color categories for fast color region segmentation and color-based localization of objects, and a result showing that texture patterns on model objects can be recognized under changes in orientation and occlusion without detailed segmentation. The thesis also evaluates the proposed model by integrating it with a system that recognizes 3D objects from 2D images and measuring the improvement in performance. These results indicate that attentional selection can significantly overcome the computational bottleneck in object recognition, both by reducing the number of features and by reducing the number of matches during recognition through the information derived during selection.
Finally, these studies have revealed a surprising use of selection, namely, in the partial solution of the pose of a 3D object.
Abstract:
Object recognition is complicated by clutter, occlusion, and sensor error. Since pose hypotheses are based on image feature locations, these effects can lead to false negatives and positives. In a typical recognition algorithm, pose hypotheses are tested against the image, and a score is assigned to each hypothesis. We use a statistical model to determine the score distribution associated with correct and incorrect pose hypotheses, and use binary hypothesis testing techniques to distinguish between them. Using this approach we can compare algorithms and noise models, and automatically choose values for internal system thresholds to minimize the probability of making a mistake.
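Under the added assumption that scores for correct and incorrect pose hypotheses are Gaussian with equal variance, the threshold that minimizes the probability of error has a closed form: accept when p_right N(s; μ_right, σ) > p_wrong N(s; μ_wrong, σ). The sketch below computes that threshold and the resulting error; the distribution parameters and prior are hypothetical, and the study's own statistical model may differ.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bayes_threshold(mu_wrong, mu_right, sigma, p_right):
    """Score threshold minimizing error probability for equal-variance Gaussian scores.

    Solving p_right*N(s; mu_right) = p_wrong*N(s; mu_wrong) for s gives the
    midpoint between the means, shifted toward the rarer class.
    """
    p_wrong = 1.0 - p_right
    return (mu_wrong + mu_right) / 2 + sigma ** 2 * math.log(p_wrong / p_right) / (mu_right - mu_wrong)


def error_probability(threshold, mu_wrong, mu_right, sigma, p_right):
    """P(accept a wrong hypothesis) + P(reject a correct one), weighted by priors."""
    false_accept = (1 - normal_cdf(threshold, mu_wrong, sigma)) * (1 - p_right)
    false_reject = normal_cdf(threshold, mu_right, sigma) * p_right
    return false_accept + false_reject


# Hypothetical score model: correct hypotheses are rare (10%) and score higher.
t = bayes_threshold(mu_wrong=10.0, mu_right=20.0, sigma=3.0, p_right=0.1)
print(t, error_probability(t, 10.0, 20.0, 3.0, 0.1))
```

When correct hypotheses are rare, the optimal threshold moves above the midpoint of the two means, trading a few missed detections for many fewer false accepts.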
Abstract:
This thesis presents a statistical framework for object recognition. The framework is motivated by the pictorial structure models introduced by Fischler and Elschlager nearly 30 years ago. The basic idea is to model an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. The problem of detecting an object in an image and the problem of learning an object model using training examples are naturally formulated under a statistical approach. We present efficient algorithms to solve these problems in our framework. We demonstrate our techniques by training models to represent faces and human bodies. The models are then used to locate the corresponding objects in novel images.
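The energy behind a pictorial-structure match can be sketched for a star-shaped model: each part pays an appearance cost at its location plus a quadratic spring cost for deviating from its ideal offset to the root. This brute-force version only illustrates the energy being minimized; it is far slower than the efficient algorithms the thesis refers to, and all names and costs are hypothetical.

```python
def match_star_model(locations, root_cost, parts):
    """Minimize a star-shaped pictorial-structure energy by brute force.

    locations: candidate (x, y) positions shared by all parts.
    root_cost: dict mapping location -> appearance cost of the root part.
    parts: list of (cost_dict, (dx, dy), stiffness), one per child part; each
           child is tied to the root by a quadratic spring with rest offset (dx, dy).
    Returns (total_energy, [root_location, child_1_location, ...]).
    """
    best = None
    for r in locations:
        total = root_cost[r]
        placement = [r]
        for cost, (dx, dy), k in parts:
            # Appearance cost plus spring deformation cost for every child location.
            energies = {
                c: cost[c] + k * ((c[0] - r[0] - dx) ** 2 + (c[1] - r[1] - dy) ** 2)
                for c in locations
            }
            c_best = min(energies, key=energies.get)
            total += energies[c_best]
            placement.append(c_best)
        if best is None or total < best[0]:
            best = (total, placement)
    return best


grid = [(x, y) for x in range(5) for y in range(5)]
root = {c: 0.0 if c == (1, 1) else 5.0 for c in grid}
child = {c: 0.0 if c == (4, 1) else 5.0 for c in grid}
print(match_star_model(grid, root, [(child, (2, 0), 1.0)]))  # -> (1.0, [(1, 1), (4, 1)])
```

The child settles one cell beyond its rest offset: stretching the spring costs 1.0, which is cheaper than the appearance cost of 5.0 at the undeformed position.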
Abstract:
The HMAX model has recently been proposed by Riesenhuber & Poggio as a hierarchical model of position- and size-invariant object recognition in visual cortex. It has also turned out to successfully model a number of other properties of the ventral visual stream (the visual pathway thought to be crucial for object recognition in cortex), particularly of (view-tuned) neurons in macaque inferotemporal cortex, the brain area at the top of the ventral stream. The original modeling study used only "paperclip" stimuli, as in the corresponding physiology experiment, and did not systematically explore how the invariance properties of model units depended on model parameters. In this study, we aimed at a deeper understanding of the inner workings of HMAX and its performance for various parameter settings and "natural" stimulus classes. We systematically examined HMAX responses for different stimulus sizes and positions and found that model unit responses depend on stimulus position, for which we offer a quantitative description. Interestingly, we find that the scale invariance properties of hierarchical neural models are not independent of stimulus class, unlike translation invariance, even though both are affine transformations within the image plane.
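HMAX's tolerance to position comes from MAX pooling: a complex (C) unit takes the maximum over simple (S) units tuned to the same feature at nearby positions. A one-dimensional toy version, with a hypothetical template and pool range, is below; it says nothing about the scale-invariance findings above.

```python
def s1_responses(signal, template):
    """Template match (dot product) at every position: a 1D stand-in for S1 units."""
    n = len(template)
    return [
        sum(s * t for s, t in zip(signal[i:i + n], template))
        for i in range(len(signal) - n + 1)
    ]


def c1_response(signal, template, pool):
    """MAX over the S1 units at positions pool[0] .. pool[1] - 1 (HMAX-style pooling)."""
    start, stop = pool
    return max(s1_responses(signal, template)[start:stop])


template = [1, 2, 1]
a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 1, 2, 1, 0, 0, 0]  # same feature shifted right by two samples
print(c1_response(a, template, (0, 8)), c1_response(b, template, (0, 8)))  # -> 6 6
```

The response is unchanged as long as the shifted feature stays inside the pool; once it leaves, the C1 output drops, so invariance is bounded by the pooling range.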
Abstract:
A persistent issue of debate in the area of 3D object recognition concerns the nature of the experientially acquired object models in the primate visual system. One prominent proposal in this regard has advocated the use of object-centered models, such as representations of an object's 3D structure in a coordinate frame independent of the viewing parameters [Marr and Nishihara, 1978]. In contrast, another proposal suggests that the viewing parameters encountered during the learning phase might be inextricably linked to subsequent performance on a recognition task [Tarr and Pinker, 1989; Poggio and Edelman, 1990]. According to this idea, the "object model" is simply a collection of the sample views encountered during training. Because object-centered recognition strategies have the attractive feature of leading to viewpoint independence, they have garnered much of the research effort in computational vision. Furthermore, since human recognition performance seems remarkably robust to imaging variations [Ellis et al., 1989], it has often been implicitly assumed that the visual system employs an object-centered strategy. In the present study we examine this assumption more closely. Our experimental results with a class of novel 3D structures strongly suggest that the human visual system uses a view-based strategy even when it has the opportunity to construct and use object-centered models. In fact, for our chosen class of objects, the results seem to support a stronger claim: 3D object recognition is 2D view-based.