979 results for Visual texture recognition


Relevance: 40.00%

Abstract:

Stimulus recognition in monkeys is severely impaired by destruction or dysfunction of the perirhinal cortex and also by systemic administration of the cholinergic-muscarinic receptor blocker, scopolamine. These two effects are shown here to be linked: Stimulus recognition was found to be significantly impaired after bilateral microinjection of scopolamine directly into the perirhinal cortex, but not after equivalent injections into the laterally adjacent visual area TE or into the dentate gyrus of the overlying hippocampal formation. The results suggest that the formation of stimulus memories depends critically on cholinergic-muscarinic activation of the perirhinal area, providing a new clue to how stimulus representations are stored.

Relevance: 40.00%

Abstract:

Probabilistic robotics, most often applied to the problem of simultaneous localisation and mapping (SLAM), requires measures of uncertainty to accompany observations of the environment. This paper describes how uncertainty can be characterised for a vision system that locates coloured landmarks in a typical laboratory environment. The paper describes a model of the uncertainty in segmentation, the internal camera model, and the mounting of the camera on the robot. It explains the implementation of the system on a laboratory robot, and provides experimental results that show the coherence of the uncertainty model.
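The core of such an uncertainty model is first-order (Jacobian) propagation of pixel-level segmentation noise through the camera model. A minimal sketch, with an assumed pinhole geometry and illustrative parameter values (not the paper's actual camera model):

```python
import math

def bearing_uncertainty(u_px, sigma_u, f_px=600.0, cx=320.0):
    """First-order propagation of pixel-centroid uncertainty to bearing
    uncertainty for a pinhole camera (hypothetical parameters).

    u_px:    horizontal pixel coordinate of the segmented landmark centroid
    sigma_u: standard deviation of that coordinate from segmentation noise
    f_px:    focal length in pixels (assumed value)
    cx:      principal point x-coordinate (assumed value)
    """
    x = (u_px - cx) / f_px
    bearing = math.atan(x)                  # bearing to the landmark (rad)
    # Jacobian of atan((u - cx) / f) with respect to u
    d_bearing_du = 1.0 / (f_px * (1.0 + x * x))
    sigma_bearing = abs(d_bearing_du) * sigma_u
    return bearing, sigma_bearing
```

The same linearisation extends to a full range/bearing covariance once a distance estimate is available; only the Jacobian grows.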

Relevance: 40.00%

Abstract:

Recovering position from sensor information is an important problem in mobile robotics, known as localisation. Localisation requires a map or some other description of the environment to provide the robot with a context in which to interpret sensor data. The mobile robot system under discussion uses an artificial neural representation of position. Building a geometrical map of the environment with a single camera and artificial neural networks is difficult; instead, it is simpler to learn position as a function of the visual input. Usually when learning images, an intermediate representation is employed. An appropriate starting point for a biologically plausible image representation is the complex cells of the visual cortex, which have invariance properties that appear useful for localisation. The effectiveness for localisation of two different complex cell models is evaluated. Finally, the ability of a simple neural network with single-shot learning to recognise these representations and localise a robot is examined.
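One widely used complex cell model is the "energy model", which pools a quadrature pair of Gabor filters; the pooled response is invariant to small shifts (phase changes) of the stimulus, which is the invariance property alluded to above. A minimal 1-D sketch of that model, as an assumed variant rather than the paper's exact implementations:

```python
import math

def complex_cell_response(signal, freq=0.2, sigma=4.0):
    """Energy-model sketch of a cortical complex cell: pool a quadrature
    pair of Gabor filters (cosine- and sine-phase) under one Gaussian
    envelope. The summed squares are invariant to the local phase of the
    stimulus. freq and sigma are illustrative values."""
    n = len(signal)
    c = n // 2
    even = odd = 0.0
    for i, v in enumerate(signal):
        g = math.exp(-((i - c) ** 2) / (2 * sigma ** 2))   # Gaussian envelope
        even += v * g * math.cos(2 * math.pi * freq * (i - c))
        odd += v * g * math.sin(2 * math.pi * freq * (i - c))
    return even * even + odd * odd   # phase-invariant "energy"
```

A sinusoid at the cell's preferred frequency evokes nearly the same energy regardless of its phase, whereas either filter alone would be strongly phase-sensitive.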

Relevance: 40.00%

Abstract:

Human object recognition is considered to be largely invariant to translation across the visual field. However, the origin of this invariance to positional changes has remained elusive, since numerous studies found that the ability to discriminate between visual patterns develops in a largely location-specific manner, with only limited transfer to novel visual field positions. To reconcile these contradictory observations, we traced the acquisition of categories of unfamiliar grey-level patterns within an interleaved learning and testing paradigm that involved either the same or different retinal locations. Our results show that position invariance is an emergent property of category learning. Pattern categories acquired over several hours at a fixed location in either the peripheral or central visual field gradually become accessible at new locations without any position-specific feedback. Furthermore, categories of novel patterns presented in the left hemifield are learnt distinctly faster and generalized better to other locations than those learnt in the right hemifield. Our results suggest that, during learning, initially position-specific representations of categories based on spatial pattern structure become encoded in a relational, position-invariant format. Such representational shifts may provide a generic mechanism for achieving perceptual invariance in object recognition.

Relevance: 40.00%

Abstract:

In this report we summarize the state of the art of speech emotion recognition from the signal processing point of view. On the basis of multi-corpus experiments with machine-learning classifiers, we observe that existing approaches to supervised machine learning lead to database-dependent classifiers that cannot be applied to multi-language speech emotion recognition without additional training, because they discriminate emotion classes according to the language used for training. As experimental results show that humans can perform language-independent categorisation, we drew a parallel between machine recognition and the cognitive process and tried to discover the sources of these divergent results. The analysis suggests that the main difference is that speech perception allows the extraction of language-independent features, even though language-dependent features are incorporated at all levels of the speech signal and play a strong discriminative role in human perception. Based on several results in related domains, we further suggest that the cognitive process of emotion recognition relies on categorisation, assisted by a hierarchical structure of emotional categories present in the cognitive space of all humans. We propose a strategy for developing language-independent machine emotion recognition, based on the identification of language-independent speech features and the use of additional information from visual (expression) features.

Relevance: 40.00%

Abstract:

We report an extension of the procedure devised by Weinstein and Shanks (Memory & Cognition 36:1415-1428, 2008) to study false recognition and priming of pictures. Participants viewed scenes with multiple embedded objects (seen items), then studied the names of these objects and the names of other objects (read items). Finally, participants completed a combined direct (recognition) and indirect (identification) memory test that included seen items, read items, and new items. In the direct test, participants recognized pictures of seen and read items more often than new pictures. In the indirect test, participants' speed at identifying those same pictures was improved for pictures that they had actually studied, and also for falsely recognized pictures whose names they had read. These data provide new evidence that a false-memory induction procedure can elicit memory-like representations that are difficult to distinguish from "true" memories of studied pictures. © 2012 Psychonomic Society, Inc.

Relevance: 40.00%

Abstract:

Recent experimental studies have shown that development towards adult performance levels in configural processing in object recognition is delayed through middle childhood. Whilst part changes to animal and artefact stimuli are processed with near-adult accuracy from 7 years of age, relative size changes to stimuli result in a significant decrease in relative performance for participants aged between 7 and 10. Two sets of computational experiments were run using the JIM3 artificial neural network, with adult and 'immature' versions, to simulate these results. One set progressively decreased the number of neurons involved in the representation of view-independent metric relations within multi-geon objects. A second set of computational experiments involved decreasing the number of neurons that represent view-dependent (non-relational) object attributes in JIM3's Surface Map. The simulation results showing the best qualitative match to the empirical data occurred when artificial neurons representing metric-precision relations were entirely eliminated. These results therefore provide further evidence for the late development of relational processing in object recognition and suggest that children in middle childhood may recognise objects without forming structural-description representations.

Relevance: 40.00%

Abstract:

In the last decade, research in Computer Vision has developed several algorithms to help botanists and non-experts classify plants based on images of their leaves. LeafSnap is a mobile application that uses a multiscale curvature model of the leaf margin to classify leaf images into species. It has achieved high levels of accuracy on 184 tree species from the Northeastern US. We extend the research that led to the development of LeafSnap along two lines. First, LeafSnap’s underlying algorithms are applied to a set of 66 tree species from Costa Rica. Then, texture is used as an additional criterion to measure the level of improvement achieved in the automatic identification of Costa Rican tree species. A 25.6% improvement was achieved for a Costa Rican clean image dataset and 42.5% for a Costa Rican noisy image dataset. In both cases, our results show this increment to be statistically significant. Further statistical analysis of the impact of visual noise, the best algorithm combinations per species, and the best value of k (the minimal cardinality of the set of candidate species that the tested algorithms render as best matches) is also presented in this research.
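Adding texture to a curvature-based identifier typically happens at the score level: each modality produces a distance to every candidate species, and a weighted sum ranks the candidates. A hedged sketch of that idea; the data layout, names, and the 50/50 weighting are illustrative assumptions, not LeafSnap's actual pipeline:

```python
def rank_species(query, db, w_texture=0.5):
    """Score-level fusion of a curvature distance and a texture distance.
    query: dict with "curvature" and "texture" feature vectors.
    db:    {species_name: (curvature_vector, texture_vector)}.
    Lower fused distance = better match."""
    def dist(a, b):
        # Euclidean distance between two feature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    scored = []
    for species, (curv, tex) in db.items():
        d = (1 - w_texture) * dist(query["curvature"], curv) \
            + w_texture * dist(query["texture"], tex)
        scored.append((d, species))
    scored.sort()
    return [s for _, s in scored]   # full ranking; take the top k as candidates
```

The "best value of k" analysis in the paper then amounts to choosing how many entries of this ranking to show the user.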

Relevance: 40.00%

Abstract:

Visual recognition is a fundamental research topic in computer vision. This dissertation explores the datasets, features, learning methods, and models used for visual recognition. In order to train visual models and evaluate different recognition algorithms, this dissertation develops an approach to collecting object image datasets from web pages using an analysis of the text around each image and of its appearance. The method exploits established online knowledge resources (Wikipedia pages for text; Flickr and Caltech datasets for images). These resources provide rich text and object appearance information. This dissertation describes results on two datasets. The first is Berg’s collection of 10 animal categories; on this dataset, we significantly outperform previous approaches. On an additional set of 5 categories, experimental results show the effectiveness of the method. Images are represented as features for visual recognition. This dissertation introduces a text-based image feature and demonstrates that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the Internet. Image tags are noisy. The method obtains the text features of an unannotated image from the tags of its k-nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples. The text feature, however, may not change, because the auxiliary dataset likely contains a similar picture. While the tags associated with images are noisy, they are more stable when appearance changes. The performance of this feature is tested using the PASCAL VOC 2006 and 2007 datasets. The feature performs well; it consistently improves the performance of visual object classifiers, and is particularly effective when the training dataset is small.
As more and more training data are collected, computational cost becomes a bottleneck, especially when training sophisticated classifiers such as kernelized SVMs. This dissertation proposes a fast training algorithm called the Stochastic Intersection Kernel Machine (SIKMA). The proposed training method will be useful for many vision problems, as it can produce a kernel classifier that is more accurate than a linear classifier and can be trained on tens of thousands of examples in two minutes. It processes training examples one by one in sequence, so memory cost is no longer the bottleneck when processing large-scale datasets. This dissertation applies this approach to train classifiers for Flickr groups, each with many training examples. The resulting Flickr group prediction scores can be used to measure the similarity between two images. Experimental results on the Corel dataset and a PASCAL VOC dataset show that the learned Flickr features perform better on image matching, retrieval, and classification than conventional visual features. Visual models are usually trained to best separate positive and negative training examples. However, when recognizing a large number of object categories, there may not be enough training examples for most objects, due to the intrinsic long-tailed distribution of objects in the real world. This dissertation proposes an approach that uses comparative object similarity. The key insight is that, given a set of object categories which are similar and a set of categories which are dissimilar, a good object model should respond more strongly to examples from similar categories than to examples from dissimilar categories. This dissertation develops a regularized kernel machine algorithm to use this category-dependent similarity regularization. Experiments on hundreds of categories show that our method yields significant improvements for categories with few or even no positive examples.
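The k-nearest-neighbour tag feature described above can be sketched as a pooling step: find the visually nearest images in the tagged auxiliary collection and normalise their tag counts into a histogram. The data layout and names below are illustrative assumptions; the dissertation's actual feature construction is richer:

```python
from collections import Counter

def text_feature(query_visual, auxiliary, k=3):
    """Build a text feature for an unannotated image from the tags of its
    k nearest visual neighbours in an auxiliary tagged collection.
    auxiliary: list of (visual_vector, [tags]) pairs."""
    def dist(a, b):
        # squared Euclidean distance between visual feature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b))
    neighbours = sorted(auxiliary, key=lambda item: dist(query_visual, item[0]))[:k]
    counts = Counter(tag for _, tags in neighbours for tag in tags)
    total = sum(counts.values()) or 1
    return {tag: c / total for tag, c in counts.items()}  # normalised tag histogram
```

Because several neighbours vote, individually noisy tags are averaged out, which is the stability property the text claims for this feature.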

Relevance: 40.00%

Abstract:

This thesis proposes a generic visual perception architecture for robotic clothes perception and manipulation. The proposed architecture is fully integrated with a stereo vision system and a dual-arm robot and is able to perform a number of autonomous laundering tasks. Clothes perception and manipulation is a novel research topic in robotics and has experienced rapid development in recent years. Compared to the task of perceiving and manipulating rigid objects, clothes perception and manipulation poses a greater challenge. This can be attributed to two reasons: firstly, deformable clothing requires precise (high-acuity) visual perception and dexterous manipulation; secondly, as clothing approximates a non-rigid 2-manifold in 3-space that can adopt a quasi-infinite configuration space, the potential variability in the appearance of clothing items makes them difficult for a machine to understand, identify uniquely, and interact with. From an applications perspective, and as part of the EU CloPeMa project, the integrated visual perception architecture refines a pre-existing clothing manipulation pipeline by completing pre-wash clothes (category) sorting (using single-shot or interactive perception for garment categorisation and manipulation) and post-wash dual-arm flattening. To the best of the author’s knowledge, the autonomous clothing perception and manipulation solutions investigated in this thesis were first proposed and reported by the author. All of the robot demonstrations reported in this work follow a perception-manipulation methodology in which visual and tactile feedback (in the form of surface wrinkledness captured by the high-accuracy depth sensor, i.e. the CloPeMa stereo head, or the predictive confidence modelled by Gaussian Processes) serve as the halting criteria in the flattening and sorting tasks, respectively.
From a scientific perspective, the proposed visual perception architecture addresses the above challenges by parsing and grouping 3D clothing configurations hierarchically from low-level curvatures, through mid-level surface shape representations (providing topological descriptions and 3D texture representations), to high-level semantic structures and statistical descriptions. A range of visual features such as the Shape Index, Surface Topologies Analysis and Local Binary Patterns have been adapted within this work to parse clothing surfaces and textures, and several novel features have been devised, including B-Spline Patches with Locality-Constrained Linear coding, and Topology Spatial Distance to describe and quantify generic landmarks (wrinkles and folds). The essence of the proposed architecture comprises 3D generic surface parsing and interpretation, which is critical to underpinning a number of laundering tasks and has the potential to be extended to other rigid and non-rigid object perception and manipulation tasks. The experimental results presented in this thesis demonstrate that: firstly, the proposed grasping approach achieves 84.7% accuracy on average; secondly, the proposed flattening approach is able to flatten towels, t-shirts and pants (shorts) within 9 iterations on average; thirdly, the proposed clothes recognition pipeline can recognise clothes categories from highly wrinkled configurations and advances the state of the art by 36% in terms of classification accuracy, achieving an 83.2% true-positive classification rate when discriminating between five categories of clothes; finally, the Gaussian-Process-based interactive perception approach exhibits a substantial improvement over single-shot perception. Accordingly, this thesis has advanced the state of the art of robot clothes perception and manipulation.
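Of the texture features named above, Local Binary Patterns are the simplest to illustrate: each pixel is encoded by thresholding its 8 neighbours against its own value, and histograms of the resulting codes describe local texture. A minimal sketch of the basic 8-neighbour variant, not necessarily the exact variant used in the thesis:

```python
def lbp_code(img, r, c):
    """Basic 8-neighbour Local Binary Pattern code for pixel (r, c).
    img is a 2-D list of grayscale values; (r, c) must not lie on the
    image border."""
    center = img[r][c]
    # neighbours visited clockwise from top-left; each contributes one bit
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code   # 0..255; histograms of these codes form the texture feature
```

A flat patch yields code 255 (every neighbour ties the centre), while an isolated bright pixel yields 0, so the code responds to local structure rather than absolute intensity.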

Relevance: 30.00%

Abstract:

Motivated by a recently proposed biologically inspired face recognition approach, we investigated the relation between human behavior and a computational model based on Fourier-Bessel (FB) spatial patterns. We measured human recognition performance on FB-filtered face images using an 8-alternative forced-choice method. Test stimuli were generated by converting the images from the spatial to the FB domain, filtering the resulting coefficients with a band-pass filter, and finally taking the inverse FB transformation of the filtered coefficients. The performance of the computational models was tested using a simulation of the psychophysical experiment. In the FB model, face images were first filtered by simulated V1-type neurons and later analyzed globally for their content of FB components. In general, human contrast sensitivity was higher for radially than for angularly filtered images, but both functions peaked in the 11.3-16 frequency interval. The FB-based model showed similar behavior with regard to peak position and relative sensitivity, but had a wider frequency bandwidth and a narrower response range. The response patterns of two alternative models, based on local FB analysis and on raw luminance, strongly diverged from the human behavior patterns. These results suggest that human performance can be constrained by the type of information conveyed by polar patterns, and consequently that humans might use FB-like spatial patterns in face processing.

Relevance: 30.00%

Abstract:

In studies of mirror self-recognition, subjects are usually surreptitiously marked on their head and then presented with a mirror. Scores of studies have established that by 18 to 24 months, children investigate their own head upon seeing the mark in the mirror. Scores of papers have debated what this means. Suggestions range from rich interpretations (e.g., the development of self-awareness) to lean accounts (e.g., the development of proprioceptive-visual matching), and include numerous more moderate proposals (e.g., the development of a concept of one's face). In Study 1, 18- to 24-month-old toddlers were given the standard test and a novel task in which they were marked on their legs rather than on their face. Toddlers performed equivalently on both tasks, suggesting that passing the test does not rely on information specific to facial features. In Study 2, toddlers were surreptitiously slipped into trouser legs that were prefixed to a highchair. Toddlers failed to retrieve the sticker now that their legs looked different from expectations. This finding, together with the findings of a third study which showed that self-recognition in live video feedback develops later than mirror self-recognition, suggests that performance is not solely the result of proprioceptive-visual matching.

Relevance: 30.00%

Abstract:

Background: Patients with early age-related maculopathy (ARM) do not necessarily show obvious morphological signs or functional impairment. Many have good visual acuity, yet complain of decreased visual performance. The aim of this study was to investigate the effects of aging on parafoveal letter recognition at reduced contrast, and the defects caused by early ARM or present in the normal fellow eyes of patients with unilateral age-related macular degeneration (nfAMD). Methods: Testing of the central visual field (8° radius) was performed with the Macular Mapping Test (MMT), using recognition of letters at 40 parafoveal target locations and four contrast levels (5, 10, 25 and 100%). Effects of aging were investigated in 64 healthy subjects aged 23 to 76 years (CTRL). In addition, 39 eyes (minimum visual acuity of 0.63; 20/30) from 39 patients with either no visible signs of ARM while the fellow eye had advanced age-related macular degeneration (nfAMD; n=12), or early signs of ARM (eARM; n=27), were examined. Performance was summarised as a "field score" (FS). Results: Performance in the MMT begins to decline linearly with age in normal subjects from the ages of 50 and 54 years onward, at 5% and 10% contrast respectively. The differentiation between patients and CTRLs was enhanced if the FS at 5% was analyzed along with the FS at 10% contrast. In 8/12 patients from group nfAMD and in 18/27 from group eARM, the FS was statistically significantly lower than in the CTRL group in at least one of the lower contrast levels. Conclusion: Using parafoveal test locations, a recognition task and diminished contrast increases the chance of early detection of functional defects due to eARM or nfAMD, and can differentiate them from those due to aging alone.

Relevance: 30.00%

Abstract:

The human nervous system constructs a Euclidean representation of near (personal) space by combining multiple sources of information (cues). We investigated the cues used for the representation of personal space in a patient with visual form agnosia (DF). Our results indicated that DF relies predominantly on binocular vergence information when determining the distance of a target, despite the presence of other (retinal) cues. Notably, DF was able to construct a Euclidean representation of personal space from vergence alone. This finding supports previous assertions that vergence provides the nervous system with veridical information for the construction of personal space. The results from the current study, together with those of others, suggest that: (i) the ventral stream is responsible for extracting depth and distance information from monocular retinal cues (i.e. from shading, texture, perspective) and (ii) the dorsal stream has access to binocular information (from horizontal image disparities and vergence). These results also indicate that DF was not able to use size information to gauge target distance, suggesting that an intact temporal cortex is necessary for learned size to influence distance processing. Our findings further suggest that in neurologically intact humans, object information extracted in the ventral pathway is combined with the products of dorsal stream processing for guiding prehension. Finally, we studied the size-distance paradox in visual form agnosia in order to explore the cognitive use of size information. The results of this experiment were consistent with a previous suggestion that the paradox is a cognitive phenomenon.
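The geometry that makes vergence a veridical distance cue is simple triangulation: with interocular separation I and vergence angle theta between the two lines of sight, the fixation distance is d = (I/2) / tan(theta/2). A sketch of that relation; the 6.5 cm interpupillary distance is a typical textbook value, not a figure taken from the paper:

```python
import math

def distance_from_vergence(vergence_deg, ipd_m=0.065):
    """Fixation distance (metres) from the vergence angle (degrees)
    between the two lines of sight, for an assumed interpupillary
    distance. Standard triangulation: d = (ipd / 2) / tan(theta / 2)."""
    theta = math.radians(vergence_deg)
    return (ipd_m / 2.0) / math.tan(theta / 2.0)
```

Because distance falls off steeply with angle, vergence is informative mainly within near (personal) space, consistent with the paper's focus.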

Relevance: 30.00%

Abstract:

The branching structure of neurones is thought to influence patterns of connectivity and how inputs are integrated within the arbor. Recent studies have revealed a remarkable degree of variation in the branching structure of pyramidal cells in the cerebral cortex of diurnal primates, suggesting regional specialization in neuronal function. Such specialization in pyramidal cell structure may be important for various aspects of visual function, such as object recognition and color processing. To better understand the functional role of regional variation in the pyramidal cell phenotype in visual processing, we determined the complexity of the dendritic branching pattern of pyramidal cells in visual cortex of the nocturnal New World owl monkey. We used the fractal dilation method to quantify the branching structure of pyramidal cells in the primary visual area (V1), the second visual area (V2) and the caudal and rostral subdivisions of inferotemporal cortex (ITc and ITr, respectively), which are often associated with color processing. We found that, as in diurnal monkeys, there was a trend for cells of increasing fractal dimension with progression through these cortical areas. The increasing complexity paralleled a trend for increasing symmetry. That we found a similar trend in both diurnal and nocturnal monkeys suggests that it was a feature of a common anthropoid ancestor.
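Fractal dimension estimators of this family measure how the number of occupied cells scales with cell size. The study used the fractal dilation method; the closely related box-counting estimator is easier to sketch, and is shown here as an illustration of the quantity being estimated, not of the study's exact procedure:

```python
import math

def box_counting_dimension(points, sizes=(1, 2, 4, 8)):
    """Box-counting estimate of the fractal dimension of a set of 2-D
    points (e.g. a digitised dendritic arbor): count occupied boxes N(s)
    at several box sizes s and fit the slope of log N against log(1/s)."""
    xs, ys = [], []
    for s in sizes:
        boxes = {(int(x // s), int(y // s)) for x, y in points}
        xs.append(math.log(1.0 / s))
        ys.append(math.log(len(boxes)))
    # least-squares slope of log N(s) versus log(1/s)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
```

A straight line of points yields a dimension near 1 and a filled region near 2; more elaborately branched arbors fall in between, which is the "increasing fractal dimension" trend reported across cortical areas.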