946 resultados para Invariant Object Recognition


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Different approaches to visual object recognition can be divided into two general classes: model-based vs. non model-based schemes. In this paper we establish some limitation on the class of non model-based recognition schemes. We show that every function that is invariant to viewing position of all objects is the trivial (constant) function. It follows that every consistent recognition scheme for recognizing all 3-D objects must in general be model based. The result is extended to recognition schemes that are imperfect (allowed to make mistakes) or restricted to certain classes of objects.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Anterior inferotemporal cortex (ITa) plays a key role in visual object recognition. Recognition is tolerant to object position, size, and view changes, yet recent neurophysiological data show ITa cells with high object selectivity often have low position tolerance, and vice versa. A neural model learns to simulate both this tradeoff and ITa responses to image morphs using large-scale and small-scale IT cells whose population properties may support invariant recognition.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we present an improved model for line and edge detection in cortical area V1. This model is based on responses of simple and complex cells, and it is multi-scale with no free parameters. We illustrate the use of the multi-scale line/edge representation in different processes: visual reconstruction or brightness perception, automatic scale selection and object segregation. A two-level object categorization scenario is tested in which pre-categorization is based on coarse scales only and final categorization on coarse plus fine scales. We also present a multi-scale object and face recognition model. Processing schemes are discussed in the framework of a complete cortical architecture. The fact that brightness perception and object recognition may be based on the same symbolic image representation is an indication that the entire (visual) cortex is involved in consciousness.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Adult monkeys (Macaca mulatta) with lesions of the hippocampal formation, perirhinal cortex, areas TH/TF, as well as controls were tested on tasks of object, spatial and contextual recognition memory. ^ Using a visual paired-comparison (VPC) task, all experimental groups showed a lack of object recognition relative to controls, although this impairment emerged at 10 sec with perirhinal lesions, 30 sec with areas TH/TF lesions and 60 sec with hippocampal lesions. In contrast, only perirhinal lesions impaired performance on delayed nonmatching-to-sample (DNMS), another task of object recognition memory. All groups were tested on DNMS with distraction (dDNMS) to examine whether the use of active cognitive strategies during the delay period could enable good performance on DNMS in spite of impaired recognition memory (revealed by the VPC task). Distractors affected performance of animals with perirhinal lesions at the 10-sec delay (the only delay in which their DNMS performance was above chance). They did not affect performance of animals with areas TH/TF lesions. Hippocampectomized animals were impaired at the 600-sec delay (the only delay at which prevention of active strategies would likely affect their behavior). ^ While lesions of areas TH/TF impaired spatial location memory and object-in-place memory, hippocampal lesions impaired only object-in-place memory. The pattern of results for perirhinal cortex lesions on the different task conditions indicated that this cortical area is not critical for spatial memory. ^ Finally, all three lesions impaired contextual recognition memory processes. The pattern of impairment appeared to result from the formation of only a global representation of the object and background, and suggests that all three areas are recruited for associating information across sources. ^ These results support the view that (1) the perirhinal cortex maintains storage of information about object and the context in which it is learned for a brief period of time, (2) areas TH/TF maintain information about spatial location and form associations between objects and their spatial relationship (a process that likely requires additional time) and (3) the hippocampal formation mediates associations between objects, their spatial relationship and the general context in which these associations are formed (an integrative function that requires additional time). ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Human object recognition is considered to be largely invariant to translation across the visual field. However, the origin of this invariance to positional changes has remained elusive, since numerous studies found that the ability to discriminate between visual patterns develops in a largely location-specific manner, with only a limited transfer to novel visual field positions. In order to reconcile these contradicting observations, we traced the acquisition of categories of unfamiliar grey-level patterns within an interleaved learning and testing paradigm that involved either the same or different retinal locations. Our results show that position invariance is an emergent property of category learning. Pattern categories acquired over several hours at a fixed location in either the peripheral or central visual field gradually become accessible at new locations without any position-specific feedback. Furthermore, categories of novel patterns presented in the left hemifield are distinctly faster learnt and better generalized to other locations than those learnt in the right hemifield. Our results suggest that during learning initially position-specific representations of categories based on spatial pattern structure become encoded in a relational, position-invariant format. Such representational shifts may provide a generic mechanism to achieve perceptual invariance in object recognition.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the visual perception literature, the recognition of faces has often been contrasted with that of non-face objects, in terms of differences with regard to the role of parts, part relations and holistic processing. However, recent evidence from developmental studies has begun to blur this sharp distinction. We review evidence for a protracted development of object recognition that is reminiscent of the well-documented slow maturation observed for faces. The prolonged development manifests itself in a retarded processing of metric part relations as opposed to that of individual parts and offers surprising parallels to developmental accounts of face recognition, even though the interpretation of the data is less clear with regard to holistic processing. We conclude that such results might indicate functional commonalities between the mechanisms underlying the recognition of faces and non-face objects, which are modulated by different task requirements in the two stimulus domains.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Probabilistic robotics, most often applied to the problem of simultaneous localisation and mapping (SLAM), requires measures of uncertainly to accompany observations of the environment. This paper describes how uncertainly can be characterised for a vision system that locates coloured landmark in a typical laboratory environment. The paper describes a model of the uncertainly in segmentation, the internal camera model and the mounting of the camera on the robot. It =plains the implementation of the system on a laboratory robot, and provides experimental results that show the coherence of the uncertainly model,

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper is concerned with the unsupervised learning of object representations by fusing visual and motor information. The problem is posed for a mobile robot that develops its representations as it incrementally gathers data. The scenario is problematic as the robot only has limited information at each time step with which it must generate and update its representations. Object representations are refined as multiple instances of sensory data are presented; however, it is uncertain whether two data instances are synonymous with the same object. This process can easily diverge from stability. The premise of the presented work is that a robot's motor information instigates successful generation of visual representations. An understanding of self-motion enables a prediction to be made before performing an action, resulting in a stronger belief of data association. The system is implemented as a data-driven partially observable semi-Markov decision process. Object representations are formed as the process's hidden states and are coordinated with motor commands through state transitions. Experiments show the prediction process is essential in enabling the unsupervised learning method to converge to a solution - improving precision and recall over using sensory data alone.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper investigates how neuronal activation for naming photographs of objects is influenced by the addition of appropriate colour or sound. Behaviourally, both colour and sound are known to facilitate object recognition from visual form. However, previous functional imaging studies have shown inconsistent effects. For example, the addition of appropriate colour has been shown to reduce antero-medial temporal activation whereas the addition of sound has been shown to increase posterior superior temporal activation. Here we compared the effect of adding colour or sound cues in the same experiment. We found that the addition of either the appropriate colour or sound increased activation for naming photographs of objects in bilateral occipital regions and the right anterior fusiform. Moreover, the addition of colour reduced left antero-medial temporal activation but this effect was not observed for the addition of object sound. We propose that activation in bilateral occipital and right fusiform areas precedes the integration of visual form with either its colour or associated sound. In contrast, left antero-medial temporal activation is reduced because object recognition is facilitated after colour and form have been integrated.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper presents visual detection and classification of light vehicles and personnel on a mine site.We capitalise on the rapid advances of ConvNet based object recognition but highlight that a naive black box approach results in a significant number of false positives. In particular, the lack of domain specific training data and the unique landscape in a mine site causes a high rate of errors. We exploit the abundance of background-only images to train a k-means classifier to complement the ConvNet. Furthermore, localisation of objects of interest and a reduction in computation is enabled through region proposals. Our system is tested on over 10km of real mine site data and we were able to detect both light vehicles and personnel. We show that the introduction of our background model can reduce the false positive rate by an order of magnitude.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The environment moderates behaviour using a subtle language of ‘affordances’ and ‘behaviour-settings’. Affordances are environmental offerings. They are objects that demand action; a cliff demands a leap and binoculars demand a peek. Behaviour-settings are ‘places;’ spaces encoded with expectations and meanings. Behaviour-settings work the opposite way to affordances; they demand inhibition; an introspective demeanour in a church or when under surveillance. Most affordances and behaviour-settings are designed, and as such, designers are effectively predicting brain reactions. • Affordances are nested within, and moderated by behaviour-settings. Both trigger automatic neural responses (excitation and inhibition). These, for the best part cancel each other out. This balancing enables object recognition and allows choice about what action should be taken (if any). But when excitation exceeds inhibition, instinctive action will automatically commence. In positive circumstances this may mean laughter or a smile. In negative circumstances, fleeing, screaming or other panic responses are likely. People with poor frontal function, due to immaturity (childhood or developmental disorders) or due to hypofrontality (schizophrenia, brain damage or dementia) have a reduced capacity to balance excitatory and inhibitory impulses. For these people, environmental behavioural demands increase with the decline of frontal brain function. • The world around us is not only encoded with symbols and sensory information. Opportunities and restrictions work on a much more primal level. Person/space interactions constantly take place at a molecular scale. Every space we enter has its own special dynamic, where individualism vies for supremacy between the opposing forces of affordance-related excitation and the inhibition intrinsic to behaviour-settings. And in this context, even a small change–the installation of a CCTV camera can turn a circus to a prison. • This paper draws on cutting-edge neurological theory to understand the psychological determinates of the everyday experience of the designed environment.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this paper, we tackle the problem of unsupervised domain adaptation for classification. In the unsupervised scenario where no labeled samples from the target domain are provided, a popular approach consists in transforming the data such that the source and target distributions be- come similar. To compare the two distributions, existing approaches make use of the Maximum Mean Discrepancy (MMD). However, this does not exploit the fact that prob- ability distributions lie on a Riemannian manifold. Here, we propose to make better use of the structure of this man- ifold and rely on the distance on the manifold to compare the source and target distributions. In this framework, we introduce a sample selection method and a subspace-based method for unsupervised domain adaptation, and show that both these manifold-based techniques outperform the cor- responding approaches based on the MMD. Furthermore, we show that our subspace-based approach yields state-of- the-art results on a standard object recognition benchmark.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The neural basis of visual perception can be understood only when the sequence of cortical activity underlying successful recognition is known. The early steps in this processing chain, from retina to the primary visual cortex, are highly local, and the perception of more complex shapes requires integration of the local information. In Study I of this thesis, the progression from local to global visual analysis was assessed by recording cortical magnetoencephalographic (MEG) responses to arrays of elements that either did or did not form global contours. The results demonstrated two spatially and temporally distinct stages of processing: The first, emerging 70 ms after stimulus onset around the calcarine sulcus, was sensitive to local features only, whereas the second, starting at 130 ms across the occipital and posterior parietal cortices, reflected the global configuration. To explore the links between cortical activity and visual recognition, Studies II III presented subjects with recognition tasks of varying levels of difficulty. The occipito-temporal responses from 150 ms onwards were closely linked to recognition performance, in contrast to the 100-ms mid-occipital responses. The averaged responses increased gradually as a function of recognition performance, and further analysis (Study III) showed the single response strengths to be graded as well. Study IV addressed the attention dependence of the different processing stages: Occipito-temporal responses peaking around 150 ms depended on the content of the visual field (faces vs. houses), whereas the later and more sustained activity was strongly modulated by the observers attention. Hemodynamic responses paralleled the pattern of the more sustained electrophysiological responses. Study V assessed the temporal processing capacity of the human object recognition system. Above sufficient luminance, contrast and size of the object, the processing speed was not limited by such low-level factors. Taken together, these studies demonstrate several distinct stages in the cortical activation sequence underlying the object recognition chain, reflecting the level of feature integration, difficulty of recognition, and direction of attention.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper presents a novel coarse-to-fine global localization approach that is inspired by object recognition and text retrieval techniques. Harris-Laplace interest points characterized by SIFT descriptors are used as natural land-marks. These descriptors are indexed into two databases: an inverted index and a location database. The inverted index is built based on a visual vocabulary learned from the feature descriptors. In the location database, each location is directly represented by a set of scale invariant descriptors. The localization process consists of two stages: coarse localization and fine localization. Coarse localization from the inverted index is fast but not accurate enough; whereas localization from the location database using voting algorithm is relatively slow but more accurate. The combination of coarse and fine stages makes fast and reliable localization possible. In addition, if necessary, the localization result can be verified by epipolar geometry between the representative view in database and the view to be localized. Experimental results show that our approach is efficient and reliable. ©2005 IEEE.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this paper we introduce a weighted complex networks model to investigate and recognize structures of patterns. The regular treating in pattern recognition models is to describe each pattern as a high-dimensional vector which however is insufficient to express the structural information. Thus, a number of methods are developed to extract the structural information, such as different feature extraction algorithms used in pre-processing steps, or the local receptive fields in convolutional networks. In our model, each pattern is attributed to a weighted complex network, whose topology represents the structure of that pattern. Based upon the training samples, we get several prototypal complex networks which could stand for the general structural characteristics of patterns in different categories. We use these prototypal networks to recognize the unknown patterns. It is an attempt to use complex networks in pattern recognition, and our result shows the potential for real-world pattern recognition. A spatial parameter is introduced to get the optimal recognition accuracy, and it remains constant insensitive to the amount of training samples. We have discussed the interesting properties of the prototypal networks. An approximate linear relation is found between the strength and color of vertexes, in which we could compare the structural difference between each category. We have visualized these prototypal networks to show that their topology indeed represents the common characteristics of patterns. We have also shown that the asymmetric strength distribution in these prototypal networks brings high robustness for recognition. Our study may cast a light on understanding the mechanism of the biologic neuronal systems in object recognition as well.