951 resultados para Visual Object Recognition


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Understanding how biological visual systems perform object recognition is one of the ultimate goals in computational neuroscience. Among the biological models of recognition the main distinctions are between feedforward and feedback and between object-centered and view-centered. From a computational viewpoint the different recognition tasks - for instance categorization and identification - are very similar, representing different trade-offs between specificity and invariance. Thus the different tasks do not strictly require different classes of models. The focus of the review is on feedforward, view-based models that are supported by psychophysical and physiological data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A new method for the automated selection of colour features is described. The algorithm consists of two stages of processing. In the first, a complete set of colour features is calculated for every object of interest in an image. In the second stage, each object is mapped into several n-dimensional feature spaces in order to select the feature set with the smallest variables able to discriminate the remaining objects. The evaluation of the discrimination power for each concrete subset of features is performed by means of decision trees composed of linear discrimination functions. This method can provide valuable help in outdoor scene analysis where no colour space has been demonstrated as being the most suitable. Experiment results recognizing objects in outdoor scenes are reported

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this project, we propose the implementation of a 3D object recognition system which will be optimized to operate under demanding time constraints. The system must be robust so that objects can be recognized properly in poor light conditions and cluttered scenes with significant levels of occlusion. An important requirement must be met: the system must exhibit a reasonable performance running on a low power consumption mobile GPU computing platform (NVIDIA Jetson TK1) so that it can be integrated in mobile robotics systems, ambient intelligence or ambient assisted living applications. The acquisition system is based on the use of color and depth (RGB-D) data streams provided by low-cost 3D sensors like Microsoft Kinect or PrimeSense Carmine. The range of algorithms and applications to be implemented and integrated will be quite broad, ranging from the acquisition, outlier removal or filtering of the input data and the segmentation or characterization of regions of interest in the scene to the very object recognition and pose estimation. Furthermore, in order to validate the proposed system, we will create a 3D object dataset. It will be composed by a set of 3D models, reconstructed from common household objects, as well as a handful of test scenes in which those objects appear. The scenes will be characterized by different levels of occlusion, diverse distances from the elements to the sensor and variations on the pose of the target objects. The creation of this dataset implies the additional development of 3D data acquisition and 3D object reconstruction applications. The resulting system has many possible applications, ranging from mobile robot navigation and semantic scene labeling to human-computer interaction (HCI) systems based on visual information.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Spatial generalization skills in school children aged 8-16 were studied with regard to unfamiliar objects that had been previously learned in a cross-modal priming and learning paradigm. We observed a developmental dissociation with younger children recognizing objects only from previously learnt perspectives whereas older children generalized acquired object knowledge to new viewpoints as well. Haptic and - to a lesser extent - visual priming improved spatial generalization in all but the youngest children. The data supports the idea of dissociable, view-dependent and view-invariant object representations with different developmental trajectories that are subject to modulatory effects of priming. Late-developing areas in the parietal or the prefrontal cortex may account for the retarded onset of view-invariant object recognition. © 2006 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

It has been suggested that the deleterious effect of contrast reversal on visual recognition is unique to faces, not objects. Here we show from priming, supervised category learning, and generalization that there is no such thing as general invariance of recognition of non-face objects against contrast reversal and, likewise, changes in direction of illumination. However, when recognition varies with rendering conditions, invariance may be restored, and effects of continuous learning may be reduced, by providing prior object knowledge from active sensation. Our findings suggest that the degree of contrast invariance achieved reflects functional characteristics of object representations learned in a task-dependent fashion.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Previous research (e.g., Jüttner et al, 2013, Developmental Psychology, 49, 161-176) has shown that object recognition may develop well into late childhood and adolescence. The present study extends that research and reveals novel di erences in holistic and analytic recognition performance in 7-11 year olds compared to that seen in adults. We interpret our data within Hummel’s hybrid model of object recognition (Hummel, 2001, Visual Cognition, 8, 489-517) that proposes two parallel routes for recognition (analytic vs. holistic) modulated by attention. Using a repetition-priming paradigm, we found in Experiment 1 that children showed no holistic priming, but only analytic priming. Given that holistic priming might be thought to be more ‘primitive’, we confirmed in Experiment 2 that our surprising finding was not because children’s analytic recognition was merely a result of name repetition. Our results suggest a developmental primacy of analytic object recognition. By contrast, holistic object recognition skills appear to emerge with a much more protracted trajectory extending into late adolescence

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Previous work examining context effects in children has been limited to semantic context. The current research examined the effects of grammatical priming of word-naming in fourth-grade children. In Experiment 1, children named both inflected and uninflected noun and verb target words faster when they were preceded by grammatically constraining primes than when they were preceded by neutral primes. Experiment 1 used a long stimulus onset asynchrony (SOA) interval of 750 msec. Experiment 2 replicated the grammatical priming effect at two SOA intervals (400 msec and 700 msec), suggesting that the grammatical priming effect does not reflect the operation of any gross strategic effects directly attributable to the long SOA interval employed in Experiment 1. Grammatical context appears to facilitate target word naming by constraining target word class. Further work is required to elucidate the loci of this effect.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Report for the scientific sojourn at the Swiss Federal Institute of Technology Zurich, Switzerland, between September and December 2007. In order to make robots useful assistants for our everyday life, the ability to learn and recognize objects is of essential importance. However, object recognition in real scenes is one of the most challenging problems in computer vision, as it is necessary to deal with difficulties. Furthermore, in mobile robotics a new challenge is added to the list: computational complexity. In a dynamic world, information about the objects in the scene can become obsolete before it is ready to be used if the detection algorithm is not fast enough. Two recent object recognition techniques have achieved notable results: the constellation approach proposed by Lowe and the bag of words approach proposed by Nistér and Stewénius. The Lowe constellation approach is the one currently being used in the robot localization project of the COGNIRON project. This report is divided in two main sections. The first section is devoted to briefly review the currently used object recognition system, the Lowe approach, and bring to light the drawbacks found for object recognition in the context of indoor mobile robot navigation. Additionally the proposed improvements for the algorithm are described. In the second section the alternative bag of words method is reviewed, as well as several experiments conducted to evaluate its performance with our own object databases. Furthermore, some modifications to the original algorithm to make it suitable for object detection in unsegmented images are proposed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Perceiving the world visually is a basic act for humans, but for computers it is still an unsolved problem. The variability present innatural environments is an obstacle for effective computer vision. The goal of invariant object recognition is to recognise objects in a digital image despite variations in, for example, pose, lighting or occlusion. In this study, invariant object recognition is considered from the viewpoint of feature extraction. Thedifferences between local and global features are studied with emphasis on Hough transform and Gabor filtering based feature extraction. The methods are examined with respect to four capabilities: generality, invariance, stability, and efficiency. Invariant features are presented using both Hough transform and Gabor filtering. A modified Hough transform technique is also presented where the distortion tolerance is increased by incorporating local information. In addition, methods for decreasing the computational costs of the Hough transform employing parallel processing and local information are introduced.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The number of digital images has been increasing exponentially in the last few years. People have problems managing their image collections and finding a specific image. An automatic image categorization system could help them to manage images and find specific images. In this thesis, an unsupervised visual object categorization system was implemented to categorize a set of unknown images. The system is unsupervised, and hence, it does not need known images to train the system which needs to be manually obtained. Therefore, the number of possible categories and images can be huge. The system implemented in the thesis extracts local features from the images. These local features are used to build a codebook. The local features and the codebook are then used to generate a feature vector for an image. Images are categorized based on the feature vectors. The system is able to categorize any given set of images based on the visual appearance of the images. Images that have similar image regions are grouped together in the same category. Thus, for example, images which contain cars are assigned to the same cluster. The unsupervised visual object categorization system can be used in many situations, e.g., in an Internet search engine. The system can categorize images for a user, and the user can then easily find a specific type of image.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The large and growing number of digital images is making manual image search laborious. Only a fraction of the images contain metadata that can be used to search for a particular type of image. Thus, the main research question of this thesis is whether it is possible to learn visual object categories directly from images. Computers process images as long lists of pixels that do not have a clear connection to high-level semantics which could be used in the image search. There are various methods introduced in the literature to extract low-level image features and also approaches to connect these low-level features with high-level semantics. One of these approaches is called Bag-of-Features which is studied in the thesis. In the Bag-of-Features approach, the images are described using a visual codebook. The codebook is built from the descriptions of the image patches using clustering. The images are described by matching descriptions of image patches with the visual codebook and computing the number of matches for each code. In this thesis, unsupervised visual object categorisation using the Bag-of-Features approach is studied. The goal is to find groups of similar images, e.g., images that contain an object from the same category. The standard Bag-of-Features approach is improved by using spatial information and visual saliency. It was found that the performance of the visual object categorisation can be improved by using spatial information of local features to verify the matches. However, this process is computationally heavy, and thus, the number of images must be limited in the spatial matching, for example, by using the Bag-of-Features method as in this study. Different approaches for saliency detection are studied and a new method based on the Hessian-Affine local feature detector is proposed. The new method achieves comparable results with current state-of-the-art. The visual object categorisation performance was improved by using foreground segmentation based on saliency information, especially when the background could be considered as clutter.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this thesis, three main questions were addressed using event-related potentials (ERPs): (1) the timing of lexical semantic access, (2) the influence of "top-down" processes on visual word processing, and (3) the influence of "bottom-up" factors on visual word processing. The timing of lexical semantic access was investigated in two studies using different designs. In Study 1,14 participants completed two tasks: a standard lexical decision (LD) task which required a word/nonword decision to each target stimulus, and a semantically primed version (LS) of it using the same category of words (e.g., animal) within each block following which participants made a category judgment. In Study 2, another 12 participants performed a standard semantic priming task, where target stimulus words (e.g., nurse) could be either semantically related or unrelated to their primes (e.g., doctor, tree) but the order of presentation was randomized. We found evidence in both ERP studies that lexical semantic access might occur early within the first 200 ms (at about 170 ms for Study 1 and at about 160 ms for Study 2). Our results were consistent with more recent ERP and eye-tracking studies and are in contrast with the traditional research focus on the N400 component. "Top-down" processes, such as a person's expectation and strategic decisions, were possible in Study 1 because of the blocked design, but they were not for Study 2 with a randomized design. Comparing results from two studies, we found that visual word processing could be affected by a person's expectation and the effect occurred early at a sensory/perceptual stage: a semantic task effect in the PI component at about 100 ms in the ERP was found in Study 1 , but not in Study 2. Furthermore, we found that such "top-down" influence on visual word processing might be mediated through separate mechanisms depending on whether the stimulus was a word or a nonword. "Bottom-up" factors involve inherent characteristics of particular words, such as bigram frequency (the total frequency of two-letter combinations of a word), word frequency (the frequency of the written form of a word), and neighborhood density (the number of words that can be generated by changing one letter of an original word or nonword). A bigram frequency effect was found when comparing the results from Studies 1 and 2, but it was examined more closely in Study 3. Fourteen participants performed a similar standard lexical decision task but the words and nonwords were selected systematically to provide a greater range in the aforementioned factors. As a result, a total of 18 word conditions were created with 18 nonword conditions matched on neighborhood density and neighborhood frequency. Using multiple regression analyses, we foimd that the PI amplitude was significantly related to bigram frequency for both words and nonwords, consistent with results from Studies 1 and 2. In addition, word frequency and neighborhood frequency were also able to influence the PI amplitude separately for words and for nonwords and there appeared to be a spatial dissociation between the two effects: for words, the word frequency effect in PI was found at the left electrode site; for nonwords, the neighborhood frequency effect in PI was fovind at the right elecfrode site. The implications of otir findings are discussed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis describes the development of a model-based vision system that exploits hierarchies of both object structure and object scale. The focus of the research is to use these hierarchies to achieve robust recognition based on effective organization and indexing schemes for model libraries. The goal of the system is to recognize parameterized instances of non-rigid model objects contained in a large knowledge base despite the presence of noise and occlusion. Robustness is achieved by developing a system that can recognize viewed objects that are scaled or mirror-image instances of the known models or that contain components sub-parts with different relative scaling, rotation, or translation than in models. The approach taken in this thesis is to develop an object shape representation that incorporates a component sub-part hierarchy- to allow for efficient and correct indexing into an automatically generated model library as well as for relative parameterization among sub-parts, and a scale hierarchy- to allow for a general to specific recognition procedure. After analysis of the issues and inherent tradeoffs in the recognition process, a system is implemented using a representation based on significant contour curvature changes and a recognition engine based on geometric constraints of feature properties. Examples of the system's performance are given, followed by an analysis of the results. In conclusion, the system's benefits and limitations are presented.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Two formulations of model-based object recognition are described. MAP Model Matching evaluates joint hypotheses of match and pose, while Posterior Marginal Pose Estimation evaluates the pose only. Local search in pose space is carried out with the Expectation--Maximization (EM) algorithm. Recognition experiments are described where the EM algorithm is used to refine and evaluate pose hypotheses in 2D and 3D. Initial hypotheses for the 2D experiments were generated by a simple indexing method: Angle Pair Indexing. The Linear Combination of Views method of Ullman and Basri is employed as the projection model in the 3D experiments.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Object recognition is complicated by clutter, occlusion, and sensor error. Since pose hypotheses are based on image feature locations, these effects can lead to false negatives and positives. In a typical recognition algorithm, pose hypotheses are tested against the image, and a score is assigned to each hypothesis. We use a statistical model to determine the score distribution associated with correct and incorrect pose hypotheses, and use binary hypothesis testing techniques to distinguish between them. Using this approach we can compare algorithms and noise models, and automatically choose values for internal system thresholds to minimize the probability of making a mistake.