841 results for object representations
Abstract:
This paper is concerned with the unsupervised learning of object representations by fusing visual and motor information. The problem is posed for a mobile robot that develops its representations as it incrementally gathers data. The scenario is problematic as the robot only has limited information at each time step with which it must generate and update its representations. Object representations are refined as multiple instances of sensory data are presented; however, it is uncertain whether two data instances correspond to the same object. This process can easily diverge from stability. The premise of the presented work is that a robot's motor information instigates successful generation of visual representations. An understanding of self-motion enables a prediction to be made before performing an action, resulting in a stronger belief of data association. The system is implemented as a data-driven partially observable semi-Markov decision process. Object representations are formed as the process's hidden states and are coordinated with motor commands through state transitions. Experiments show the prediction process is essential in enabling the unsupervised learning method to converge to a solution, improving precision and recall over using sensory data alone.
Abstract:
The synchronization of neuronal activity, especially in the beta- (14-30 Hz) and gamma- (30-80 Hz) frequency bands, is thought to provide a means for the integration of anatomically distributed processing and for the formation of transient neuronal assemblies. Thus, non-stimulus-locked (i.e. induced) gamma-band oscillations are believed to underlie feature binding and the formation of neuronal object representations. On the other hand, the functional roles of neuronal oscillations in the slower theta- (4-8 Hz) and alpha- (8-14 Hz) frequency bands remain controversial. In addition, early stimulus-locked activity has been largely ignored, as it is believed to reflect merely the physical properties of sensory stimuli. With human neuromagnetic recordings, both the functional roles of gamma- and alpha-band oscillations and the significance of early stimulus-locked activity in neuronal processing were examined in this thesis. Study I of this thesis shows that even the stimulus-locked (evoked) gamma oscillations were sensitive to high-level stimulus features for speech and non-speech sounds, suggesting that they may underlie the formation of early neuronal object representations for stimuli with a behavioural relevance. Study II shows that neuronal processing for consciously perceived and unperceived stimuli differed as early as 30 ms after stimulus onset. This study also showed that the alpha-band oscillations selectively correlated with conscious perception. Study III, in turn, shows that prestimulus alpha-band oscillations influence the subsequent detection and processing of sensory stimuli. Further, in Study IV, we asked whether phase synchronization between distinct frequency bands is present in cortical circuits. This study revealed prominent task-sensitive phase synchrony between alpha and beta/gamma oscillations. Finally, the implications of Studies II, III, and IV for the broader scientific context are analysed in the last study of this thesis (V).
I suggest in this thesis that neuronal processing may be extremely fast and that the evoked response is important for cognitive processes. I also propose that alpha oscillations define the global neuronal workspace of perception, action, and consciousness and, further, that cross-frequency synchronization is required for the integration of neuronal object representations into the global neuronal workspace.
Abstract:
Visual search data are given a unified quantitative explanation by a model of how spatial maps in the parietal cortex and object recognition categories in the inferotemporal cortex deploy attentional resources as they reciprocally interact with visual representations in the prestriate cortex. The model visual representations are organized into multiple boundary and surface representations. Visual search in the model is initiated by organizing multiple items that lie within a given boundary or surface representation into a candidate search grouping. These items are compared with object recognition categories to test for matches or mismatches. Mismatches can trigger deeper searches and recursive selection of new groupings until a target object is identified. This search model is algorithmically specified to quantitatively simulate search data using a single set of parameters, as well as to qualitatively explain a still larger data base, including data of Aks and Enns (1992), Bravo and Blake (1990), Chelazzi, Miller, Duncan, and Desimone (1993), Egeth, Virzi, and Garbart (1984), Cohen and Ivry (1991), Enns and Rensink (1990), He and Nakayama (1992), Humphreys, Quinlan, and Riddoch (1989), Mordkoff, Yantis, and Egeth (1990), Nakayama and Silverman (1986), Treisman and Gelade (1980), Treisman and Sato (1990), Wolfe, Cave, and Franzel (1989), and Wolfe and Friedman-Hill (1992). The model hereby provides an alternative to recent variations on the Feature Integration and Guided Search models, and grounds the analysis of visual search in neural models of preattentive vision, attentive object learning and categorization, and attentive spatial localization and orientation.
Abstract:
The modulation of neural activity in visual cortex is thought to be a key mechanism of visual attention. The investigation of attentional modulation in high-level visual areas, however, is hampered by the lack of clear tuning or contrast response functions. In the present functional magnetic resonance imaging study we therefore systematically assessed how small voxel-wise biases in object preference across hundreds of voxels in the lateral occipital complex were affected when attention was directed to objects. We found that the strength of attentional modulation depended on a voxel's object preference in the absence of attention, a pattern indicative of an amplificatory mechanism. Our results show that such attentional modulation effectively increased the mutual information between voxel responses and object identity. Further, these local modulatory effects led to improved information-based object readout at the level of multi-voxel activation patterns and to an increased reproducibility of these patterns across repeated presentations. We conclude that attentional modulation enhances object coding in local and distributed object representations of the lateral occipital complex.
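The mutual information measure mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the study's analysis pipeline: it assumes voxel responses have already been discretized into bins, and the function name `mutual_information` and the toy labels are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(responses, labels):
    """Mutual information, in bits, between two equal-length discrete sequences.

    Assumption: `responses` are voxel responses already binned into discrete
    categories, and `labels` are the object identities shown on each trial.
    """
    if len(responses) != len(labels):
        raise ValueError("sequences must have the same length")
    n = len(responses)
    joint_counts = Counter(zip(responses, labels))
    response_counts = Counter(responses)
    label_counts = Counter(labels)
    mi = 0.0
    for (r, l), count in joint_counts.items():
        # p(r, l) * log2( p(r, l) / (p(r) * p(l)) ), written with raw counts
        mi += (count / n) * math.log2(
            count * n / (response_counts[r] * label_counts[l])
        )
    return mi


# Toy data (hypothetical): responses that track object identity perfectly
informative = mutual_information(
    ["low", "low", "high", "high"], ["face", "face", "house", "house"]
)
# Toy data (hypothetical): responses unrelated to object identity
uninformative = mutual_information(
    ["low", "high", "low", "high"], ["face", "face", "house", "house"]
)
```

In this toy case the first call yields one bit and the second yields zero, so a rise in the value under attention would correspond to responses becoming more diagnostic of object identity.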
Abstract:
Robots currently recognise and use objects through algorithms that are hand-coded or specifically trained. Such robots can operate in known, structured environments but cannot learn to recognise or use novel objects as they appear. This thesis demonstrates that a robot can develop meaningful object representations by learning the fundamental relationship between action and change in sensory state; the robot learns sensorimotor coordination. Methods based on Markov Decision Processes are experimentally validated on a mobile robot capable of gripping objects, and it is found that object recognition and manipulation can be learnt as an emergent property of sensorimotor coordination.
Abstract:
We propose a method for learning specific object representations that can be applied (and reused) in visual detection and identification tasks. A machine learning technique called Cartesian Genetic Programming (CGP) is used to create these models based on a series of images. Our research investigates how manipulation actions might allow for the development of better visual models and therefore better robot vision. This paper describes how visual object representations can be learned and improved by performing object manipulation actions, such as poke, push, and pick-up, with a humanoid robot. The improvement can be measured and allows the robot to select and perform the 'right' action, i.e. the action with the best possible improvement of the detector.
Abstract:
How do humans use predictive contextual information to facilitate visual search? How are consistently paired scenic objects and positions learned and used to more efficiently guide search in familiar scenes? For example, a certain combination of objects can define a context for a kitchen and trigger a more efficient search for a typical object, such as a sink, in that context. A neural model, ARTSCENE Search, is developed to illustrate the neural mechanisms of such memory-based contextual learning and guidance, and to explain challenging behavioral data on positive/negative, spatial/object, and local/distant global cueing effects during visual search. The model proposes how global scene layout at a first glance rapidly forms a hypothesis about the target location. This hypothesis is then incrementally refined by enhancing target-like objects in space as a scene is scanned with saccadic eye movements. The model clarifies the functional roles of neuroanatomical, neurophysiological, and neuroimaging data in visual search for a desired goal object. In particular, the model simulates the interactive dynamics of spatial and object contextual cueing in the cortical What and Where streams starting from early visual areas through medial temporal lobe to prefrontal cortex. After learning, model dorsolateral prefrontal cortical cells (area 46) prime possible target locations in posterior parietal cortex based on goal-modulated percepts of spatial scene gist represented in parahippocampal cortex, whereas model ventral prefrontal cortical cells (area 47/12) prime possible target object representations in inferior temporal cortex based on the history of viewed objects represented in perirhinal cortex. The model hereby predicts how the cortical What and Where streams cooperate during scene perception, learning, and memory to accumulate evidence over time to drive efficient visual search of familiar scenes.
Abstract:
A persistent issue of debate in the area of 3D object recognition concerns the nature of the experientially acquired object models in the primate visual system. One prominent proposal in this regard has expounded the use of object centered models, such as representations of the objects' 3D structures in a coordinate frame independent of the viewing parameters [Marr and Nishihara, 1978]. In contrast to this is another proposal which suggests that the viewing parameters encountered during the learning phase might be inextricably linked to subsequent performance on a recognition task [Tarr and Pinker, 1989; Poggio and Edelman, 1990]. The 'object model', according to this idea, is simply a collection of the sample views encountered during training. Given that object centered recognition strategies have the attractive feature of leading to viewpoint independence, they have garnered much of the research effort in the field of computational vision. Furthermore, since human recognition performance seems remarkably robust in the face of imaging variations [Ellis et al., 1989], it has often been implicitly assumed that the visual system employs an object centered strategy. In the present study we examine this assumption more closely. Our experimental results with a class of novel 3D structures strongly suggest the use of a view-based strategy by the human visual system even when it has the opportunity of constructing and using object-centered models. In fact, for our chosen class of objects, the results seem to support a stronger claim: 3D object recognition is 2D view-based.
Abstract:
Online geographic information systems provide the means to extract a subset of desired spatial information from a larger remote repository. Data retrieved representing real-world geographic phenomena are then manipulated to suit the specific needs of an end-user. Often this extraction requires the derivation of representations of objects specific to a particular resolution or scale from a single original stored version. Currently, standard spatial data handling techniques cannot support the multi-resolution representation of such features in a database. In this paper, a methodology to store and retrieve versions of spatial objects at different resolutions with respect to scale using standard database primitives and SQL is presented. The technique involves heavy fragmentation of spatial features that allows dynamic simplification into scale-specific object representations customised to the display resolution of the end-user's device. Experimental results comparing the new approach to traditional R-Tree indexing and external object simplification reveal the former performs notably better for mobile and WWW applications where client-side resources are limited and retrieved data loads are kept relatively small.
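The general idea of retrieving a scale-specific version of a fragmented feature with plain SQL might be sketched as follows. This is an illustrative toy, not the paper's schema or algorithm: the `vertex` table, the `min_detail` column, and the function names are all hypothetical, and the sketch simply assumes each stored vertex carries the coarsest detail level at which it should still be drawn.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding one fragmented line feature.

    Hypothetical schema: one row per vertex; `min_detail` is the lowest
    detail level at which the vertex is included (0 = always shown).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vertex ("
        " feature_id INTEGER, seq INTEGER, x REAL, y REAL,"
        " min_detail INTEGER)"
    )
    rows = [
        (1, 0, 0.0, 0.0, 0),
        (1, 1, 1.0, 0.2, 2),
        (1, 2, 2.0, 1.0, 1),
        (1, 3, 3.0, 0.9, 2),
        (1, 4, 4.0, 0.0, 0),
    ]
    conn.executemany("INSERT INTO vertex VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def fetch_feature(conn, feature_id, detail):
    """Return the vertices of a feature that are visible at `detail`."""
    cur = conn.execute(
        "SELECT x, y FROM vertex"
        " WHERE feature_id = ? AND min_detail <= ?"
        " ORDER BY seq",
        (feature_id, detail),
    )
    return cur.fetchall()


conn = build_demo_db()
coarse = fetch_feature(conn, 1, 0)  # endpoints only
medium = fetch_feature(conn, 1, 1)  # endpoints plus the main bend
full = fetch_feature(conn, 1, 2)    # every stored vertex
```

Under these assumptions the simplification happens inside the query itself, so a client with a small display only downloads the vertices it can actually render.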
Abstract:
Spatial generalization skills in school children aged 8-16 were studied with regard to unfamiliar objects that had been previously learned in a cross-modal priming and learning paradigm. We observed a developmental dissociation, with younger children recognizing objects only from previously learnt perspectives whereas older children generalized acquired object knowledge to new viewpoints as well. Haptic and, to a lesser extent, visual priming improved spatial generalization in all but the youngest children. The data support the idea of dissociable, view-dependent and view-invariant object representations with different developmental trajectories that are subject to modulatory effects of priming. Late-developing areas in the parietal or the prefrontal cortex may account for the retarded onset of view-invariant object recognition. © 2006 Elsevier B.V. All rights reserved.
Abstract:
It has been suggested that the deleterious effect of contrast reversal on visual recognition is unique to faces, not objects. Here we show from priming, supervised category learning, and generalization that there is no such thing as general invariance of recognition of non-face objects against contrast reversal and, likewise, changes in direction of illumination. However, when recognition varies with rendering conditions, invariance may be restored, and effects of continuous learning may be reduced, by providing prior object knowledge from active sensation. Our findings suggest that the degree of contrast invariance achieved reflects functional characteristics of object representations learned in a task-dependent fashion.
Abstract:
There is evidence for the late development in humans of configural face and animal recognition. We show that the recognition of artificial three-dimensional (3D) objects from part configurations develops similarly late. We also demonstrate that the cross-modal integration of object information reinforces the development of configural recognition more than the intra-modal integration does. Multimodal object representations in the brain may therefore play a role in configural object recognition. © 2003 Elsevier B.V. All rights reserved.
Abstract:
Spatial objects may not only be perceived visually but also by touch. We report recent experiments investigating to what extent prior object knowledge acquired in either the haptic or visual sensory modality transfers to a subsequent visual learning task. Results indicate that even mental object representations learnt in one sensory modality may attain a multi-modal quality. These findings seem incompatible with picture-based reasoning schemas but leave open the possibility of modality-specific reasoning mechanisms.
Abstract:
The present study investigated the behavioral and neuropsychological characteristics of decision-making behavior during a gambling task as well as how these characteristics may relate to the Somatic Marker Hypothesis and the Frequency of Gain model. The applicability to intertemporal choice was also discussed. Patterns of card selection during a computerized interpretation of the Iowa Gambling Task were assessed for 10 men and 10 women. Steady State Topography was employed to assess cortical processing throughout this task. Results supported the hypothesis that patterns of card selection were in line with both theories. As hypothesized, these 2 patterns of card selection were also associated with distinct patterns of cortical activity, suggesting that intertemporal choice may involve the recruitment of right dorsolateral prefrontal cortex for somatic labeling, left fusiform gyrus for object representations, and the left dorsolateral prefrontal cortex for an analysis of the associated frequency of gain or loss. It is suggested that processes contributing to intertemporal choice may include inhibition of negatively valenced options, guiding decisions away from those options, as well as computations favoring frequently rewarded options.
Abstract:
In a musical context, the pitch of sounds is encoded according to domain-general principles not confined to music or even to audition overall but common to other perceptual and cognitive processes (such as multiple pattern encoding and feature integration), and to domain-specific and culture-specific properties related to a particular musical system only (such as the pitch steps of the Western tonal system). The studies included in this thesis shed light on the processing stages during which pitch encoding occurs on the basis of both domain-general and music-specific properties, and elucidate the putative brain mechanisms underlying pitch-related music perception. Study I showed, in subjects without formal musical education, that the pitch and timbre of multiple sounds are integrated as unified object representations in sensory memory before attentional intervention. Similarly, multiple pattern pitches are simultaneously maintained in non-musicians' sensory memory (Study II). These findings demonstrate the degree of sophistication of pitch processing at the sensory memory stage, requiring neither attention nor any special expertise of the subjects. Furthermore, music- and culture-specific properties, such as the pitch steps of the equal-tempered musical scale, are automatically discriminated in sensory memory even by subjects without formal musical education (Studies III and IV). The cognitive processing of pitch according to culture-specific musical-scale schemata hence occurs as early as at the sensory-memory stage of pitch analysis. Exposure and cortical plasticity seem to be involved in musical pitch encoding. For instance, after only one hour of laboratory training, the neural representations of pitch in the auditory cortex are altered (Study V). However, faulty brain mechanisms for attentive processing of fine-grained pitch steps lead to inborn deficits in music perception and recognition such as those encountered in congenital amusia (Study VI). 
These findings suggest that predispositions for exact pitch-step discrimination together with long-term exposure to music govern the acquisition of the automatized schematic knowledge of the music of a particular culture that even non-musicians possess.