951 resultados para Visual Object Recognition
Resumo:
The report describes a recognition system called GROPER, which performs grouping by using distance and relative orientation constraints that estimate the likelihood of different edges in an image coming from the same object. The thesis presents both a theoretical analysis of the grouping problem and a practical implementation of a grouping system. GROPER also uses an indexing module to allow it to make use of knowledge of different objects, any of which might appear in an image. We test GROPER by comparing it to a similar recognition system that does not use grouping.
Resumo:
The HMAX model has recently been proposed by Riesenhuber & Poggio as a hierarchical model of position- and size-invariant object recognition in visual cortex. It has also turned out to model successfully a number of other properties of the ventral visual stream (the visual pathway thought to be crucial for object recognition in cortex), and particularly of (view-tuned) neurons in macaque inferotemporal cortex, the brain area at the top of the ventral stream. The original modeling study only used ``paperclip'' stimuli, as in the corresponding physiology experiment, and did not explore systematically how model units' invariance properties depended on model parameters. In this study, we aimed at a deeper understanding of the inner workings of HMAX and its performance for various parameter settings and ``natural'' stimulus classes. We examined HMAX responses for different stimulus sizes and positions systematically and found a dependence of model units' responses on stimulus position for which a quantitative description is offered. Interestingly, we find that scale invariance properties of hierarchical neural models are not independent of stimulus class, as opposed to translation invariance, even though both are affine transformations within the image plane.
Resumo:
Numerous psychophysical experiments have shown an important role for attentional modulations in vision. Behaviorally, allocation of attention can improve performance in object detection and recognition tasks. At the neural level, attention increases firing rates of neurons in visual cortex whose preferred stimulus is currently attended to. However, it is not yet known how these two phenomena are linked, i.e., how the visual system could be "tuned" in a task-dependent fashion to improve task performance. To answer this question, we performed simulations with the HMAX model of object recognition in cortex [45]. We modulated firing rates of model neurons in accordance with experimental results about effects of feature-based attention on single neurons and measured changes in the model's performance in a variety of object recognition tasks. It turned out that recognition performance could only be improved under very limited circumstances and that attentional influences on the process of object recognition per se tend to display a lack of specificity or raise false alarm rates. These observations lead us to postulate a new role for the observed attention-related neural response modulations.
Resumo:
This thesis presents there important results in visual object recognition based on shape. (1) A new algorithm (RAST; Recognition by Adaptive Sudivisions of Tranformation space) is presented that has lower average-case complexity than any known recognition algorithm. (2) It is shown, both theoretically and empirically, that representing 3D objects as collections of 2D views (the "View-Based Approximation") is feasible and affects the reliability of 3D recognition systems no more than other commonly made approximations. (3) The problem of recognition in cluttered scenes is considered from a Bayesian perspective; the commonly-used "bounded-error errorsmeasure" is demonstrated to correspond to an independence assumption. It is shown that by modeling the statistical properties of real-scenes better, objects can be recognized more reliably.
Resumo:
This workshop paper reports recent developments to a vision system for traffic interpretation which relies extensively on the use of geometrical and scene context. Firstly, a new approach to pose refinement is reported, based on forces derived from prominent image derivatives found close to an initial hypothesis. Secondly, a parameterised vehicle model is reported, able to represent different vehicle classes. This general vehicle model has been fitted to sample data, and subjected to a Principal Component Analysis to create a deformable model of common car types having 6 parameters. We show that the new pose recovery technique is also able to operate on the PCA model, to allow the structure of an initial vehicle hypothesis to be adapted to fit the prevailing context. We report initial experiments with the model, which demonstrate significant improvements to pose recovery.
Resumo:
Multisensory memory traces established via single-trial exposures can impact subsequent visual object recognition. This impact appears to depend on the meaningfulness of the initial multisensory pairing, implying that multisensory exposures establish distinct object representations that are accessible during later unisensory processing. Multisensory contexts may be particularly effective in influencing auditory discrimination, given the purportedly inferior recognition memory in this sensory modality. The possibility of this generalization and the equivalence of effects when memory discrimination was being performed in the visual vs. auditory modality were at the focus of this study. First, we demonstrate that visual object discrimination is affected by the context of prior multisensory encounters, replicating and extending previous findings by controlling for the probability of multisensory contexts during initial as well as repeated object presentations. Second, we provide the first evidence that single-trial multisensory memories impact subsequent auditory object discrimination. Auditory object discrimination was enhanced when initial presentations entailed semantically congruent multisensory pairs and was impaired after semantically incongruent multisensory encounters, compared to sounds that had been encountered only in a unisensory manner. Third, the impact of single-trial multisensory memories upon unisensory object discrimination was greater when the task was performed in the auditory vs. visual modality. Fourth, there was no evidence for correlation between effects of past multisensory experiences on visual and auditory processing, suggestive of largely independent object processing mechanisms between modalities. We discuss these findings in terms of the conceptual short term memory (CSTM) model and predictive coding. Our results suggest differential recruitment and modulation of conceptual memory networks according to the sensory task at hand.
Resumo:
Previous functional imaging studies have shown that facilitated processing of a visual object on repeated, relative to initial, presentation (i.e., repetition priming) is associated with reductions in neural activity in multiple regions, including fusiforin/lateral occipital cortex. Moreover, activity reductions have been found, at diminished levels, when a different exemplar of an object is presented on repetition. In one previous study, the magnitude of diminished priming across exemplars was greater in the right relative to the left fusiform, suggesting greater exemplar specificity in the right. Another previous study, however, observed fusiform lateralization modulated by object viewpoint, but not object exemplar. The present fMRI study sought to determine whether the result of differential fusiform responses for perceptually different exemplars could be replicated. Furthermore, the role of the left fusiform cortex in object recognition was investigated via the inclusion of a lexical/semantic manipulation. Right fusiform cortex showed a significantly greater effect of exemplar change than left fusiform, replicating the previous result of exemplar-specific fusiform lateralization. Right fusiform and lateral occipital cortex were not differentially engaged by the lexical/semantic manipulation, suggesting that their role in visual object recognition is predominantly in the. C visual discrimination of specific objects. Activation in left fusiform cortex, but not left lateral occipital cortex, was modulated by both exemplar change and lexical/semantic manipulation, with further analysis suggesting a posterior-to-anterior progression between regions involved in processing visuoperceptual and lexical/semantic information about objects. The results are consistent with the view that the right fusiform plays a greater role in processing specific visual form information about objects, whereas the left fusiform is also involved in lexical/semantic processing. (C) 2003 Elsevier Science (USA). All rights reserved.
Resumo:
Human object recognition is generally considered to tolerate changes of the stimulus position in the visual field. A number of recent studies, however, have cast doubt on the completeness of translation invariance. In a new series of experiments we tried to investigate whether positional specificity of short-term memory is a general property of visual perception. We tested same/different discrimination of computer graphics models that were displayed at the same or at different locations of the visual field, and found complete translation invariance, regardless of the similarity of the animals and irrespective of direction and size of the displacement (Exp. 1 and 2). Decisions were strongly biased towards same decisions if stimuli appeared at a constant location, while after translation subjects displayed a tendency towards different decisions. Even if the spatial order of animal limbs was randomized ("scrambled animals"), no deteriorating effect of shifts in the field of view could be detected (Exp. 3). However, if the influence of single features was reduced (Exp. 4 and 5) small but significant effects of translation could be obtained. Under conditions that do not reveal an influence of translation, rotation in depth strongly interferes with recognition (Exp. 6). Changes of stimulus size did not reduce performance (Exp. 7). Tolerance to these object transformations seems to rely on different brain mechanisms, with translation and scale invariance being achieved in principle, while rotation invariance is not.
Resumo:
Recent experimental studies have shown that development towards adult performance levels in configural processing in object recognition is delayed through middle childhood. Whilst partchanges to animal and artefact stimuli are processed with similar to adult levels of accuracy from 7 years of age, relative size changes to stimuli result in a significant decrease in relative performance for participants aged between 7 and 10. Two sets of computational experiments were run using the JIM3 artificial neural network with adult and 'immature' versions to simulate these results. One set progressively decreased the number of neurons involved in the representation of view-independent metric relations within multi-geon objects. A second set of computational experiments involved decreasing the number of neurons that represent view-dependent (nonrelational) object attributes in JIM3's Surface Map. The simulation results which show the best qualitative match to empirical data occurred when artificial neurons representing metric-precision relations were entirely eliminated. These results therefore provide further evidence for the late development of relational processing in object recognition and suggest that children in middle childhood may recognise objects without forming structural description representations.
Resumo:
Single-trial encounters with multisensory stimuli affect both memory performance and early-latency brain responses to visual stimuli. Whether and how auditory cortices support memory processes based on single-trial multisensory learning is unknown and may differ qualitatively and quantitatively from comparable processes within visual cortices due to purported differences in memory capacities across the senses. We recorded event-related potentials (ERPs) as healthy adults (n = 18) performed a continuous recognition task in the auditory modality, discriminating initial (new) from repeated (old) sounds of environmental objects. Initial presentations were either unisensory or multisensory; the latter entailed synchronous presentation of a semantically congruent or a meaningless image. Repeated presentations were exclusively auditory, thus differing only according to the context in which the sound was initially encountered. Discrimination abilities (indexed by d') were increased for repeated sounds that were initially encountered with a semantically congruent image versus sounds initially encountered with either a meaningless or no image. Analyses of ERPs within an electrical neuroimaging framework revealed that early stages of auditory processing of repeated sounds were affected by prior single-trial multisensory contexts. These effects followed from significantly reduced activity within a distributed network, including the right superior temporal cortex, suggesting an inverse relationship between brain activity and behavioural outcome on this task. The present findings demonstrate how auditory cortices contribute to long-term effects of multisensory experiences on auditory object discrimination. We propose a new framework for the efficacy of multisensory processes to impact both current multisensory stimulus processing and unisensory discrimination abilities later in time.
Resumo:
Local features are used in many computer vision tasks including visual object categorization, content-based image retrieval and object recognition to mention a few. Local features are points, blobs or regions in images that are extracted using a local feature detector. To make use of extracted local features the localized interest points are described using a local feature descriptor. A descriptor histogram vector is a compact representation of an image and can be used for searching and matching images in databases. In this thesis the performance of local feature detectors and descriptors is evaluated for object class detection task. Features are extracted from image samples belonging to several object classes. Matching features are then searched using random image pairs of a same class. The goal of this thesis is to find out what are the best detector and descriptor methods for such task in terms of detector repeatability and descriptor matching rate.
Resumo:
A key problem in object recognition is selection, namely, the problem of identifying regions in an image within which to start the recognition process, ideally by isolating regions that are likely to come from a single object. Such a selection mechanism has been found to be crucial in reducing the combinatorial search involved in the matching stage of object recognition. Even though selection is of help in recognition, it has largely remained unsolved because of the difficulty in isolating regions belonging to objects under complex imaging conditions involving occlusions, changing illumination, and object appearances. This thesis presents a novel approach to the selection problem by proposing a computational model of visual attentional selection as a paradigm for selection in recognition. In particular, it proposes two modes of attentional selection, namely, attracted and pay attention modes as being appropriate for data and model-driven selection in recognition. An implementation of this model has led to new ways of extracting color, texture and line group information in images, and their subsequent use in isolating areas of the scene likely to contain the model object. Among the specific results in this thesis are: a method of specifying color by perceptual color categories for fast color region segmentation and color-based localization of objects, and a result showing that the recognition of texture patterns on model objects is possible under changes in orientation and occlusions without detailed segmentation. The thesis also presents an evaluation of the proposed model by integrating with a 3D from 2D object recognition system and recording the improvement in performance. These results indicate that attentional selection can significantly overcome the computational bottleneck in object recognition, both due to a reduction in the number of features, and due to a reduction in the number of matches during recognition using the information derived during selection. Finally, these studies have revealed a surprising use of selection, namely, in the partial solution of the pose of a 3D object.
Resumo:
This thesis presents a statistical framework for object recognition. The framework is motivated by the pictorial structure models introduced by Fischler and Elschlager nearly 30 years ago. The basic idea is to model an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. The problem of detecting an object in an image and the problem of learning an object model using training examples are naturally formulated under a statistical approach. We present efficient algorithms to solve these problems in our framework. We demonstrate our techniques by training models to represent faces and human bodies. The models are then used to locate the corresponding objects in novel images.
Resumo:
A persistent issue of debate in the area of 3D object recognition concerns the nature of the experientially acquired object models in the primate visual system. One prominent proposal in this regard has expounded the use of object centered models, such as representations of the objects' 3D structures in a coordinate frame independent of the viewing parameters [Marr and Nishihara, 1978]. In contrast to this is another proposal which suggests that the viewing parameters encountered during the learning phase might be inextricably linked to subsequent performance on a recognition task [Tarr and Pinker, 1989; Poggio and Edelman, 1990]. The 'object model', according to this idea, is simply a collection of the sample views encountered during training. Given that object centered recognition strategies have the attractive feature of leading to viewpoint independence, they have garnered much of the research effort in the field of computational vision. Furthermore, since human recognition performance seems remarkably robust in the face of imaging variations [Ellis et al., 1989], it has often been implicitly assumed that the visual system employs an object centered strategy. In the present study we examine this assumption more closely. Our experimental results with a class of novel 3D structures strongly suggest the use of a view-based strategy by the human visual system even when it has the opportunity of constructing and using object-centered models. In fact, for our chosen class of objects, the results seem to support a stronger claim: 3D object recognition is 2D view-based.
Resumo:
This paper sketches a hypothetical cortical architecture for visual 3D object recognition based on a recent computational model. The view-centered scheme relies on modules for learning from examples, such as Hyperbf-like networks. Such models capture a class of explanations we call Memory-Based Models (MBM) that contains sparse population coding, memory-based recognition, and codebooks of prototypes. Unlike the sigmoidal units of some artificial neural networks, the units of MBMs are consistent with the description of cortical neurons. We describe how an example of MBM may be realized in terms of cortical circuitry and biophysical mechanisms, consistent with psychophysical and physiological data.