22 resultados para Bag-of-visual Words


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Some WWW image engines allow the user to form a query in terms of text keywords. To build the image index, keywords are extracted heuristically from HTML documents containing each image, and/or from the image URL and file headers. Unfortunately, text-based image engines have merely retro-fitted standard SQL database query methods, and it is difficult to include images cues within such a framework. On the other hand, visual statistics (e.g., color histograms) are often insufficient for helping users find desired images in a vast WWW index. By truly unifying textual and visual statistics, one would expect to get better results than either used separately. In this paper, we propose an approach that allows the combination of visual statistics with textual statistics in the vector space representation commonly used in query by image content systems. Text statistics are captured in vector form using latent semantic indexing (LSI). The LSI index for an HTML document is then associated with each of the images contained therein. Visual statistics (e.g., color, orientedness) are also computed for each image. The LSI and visual statistic vectors are then combined into a single index vector that can be used for content-based search of the resulting image database. By using an integrated approach, we are able to take advantage of possible statistical couplings between the topic of the document (latent semantic content) and the contents of images (visual statistics). This allows improved performance in conducting content-based search. This approach has been implemented in a WWW image search engine prototype.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The second-order statistics of neural activity was examined in a model of the cat LGN and V1 during free-viewing of natural images. In the model, the specific patterns of thalamocortical activity required for a Bebbian maturation of direction-selective cells in VI were found during the periods of visual fixation, when small eye movements occurred, but not when natural images were examined in the absence of fixational eye movements. In addition, simulations of stroboscopic reming that replicated the abnormal pattern of eye movements observed in kittens chronically exposed to stroboscopic illumination produced results consistent with the reported loss of direction selectivity and preservation of orientation selectivity. These results suggest the involvement of the oculomotor activity of visual fixation in the maturation of cortical direction selectivity.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A neural network model is presented to account for the three dimensional perception of visual space by way of an analog Gestalt-like perceptual mechanism.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A key goal of computational neuroscience is to link brain mechanisms to behavioral functions. The present article describes recent progress towards explaining how laminar neocortical circuits give rise to biological intelligence. These circuits embody two new and revolutionary computational paradigms: Complementary Computing and Laminar Computing. Circuit properties include a novel synthesis of feedforward and feedback processing, of digital and analog processing, and of pre-attentive and attentive processing. This synthesis clarifies the appeal of Bayesian approaches but has a far greater predictive range that naturally extends to self-organizing processes. Examples from vision and cognition are summarized. A LAMINART architecture unifies properties of visual development, learning, perceptual grouping, attention, and 3D vision. A key modeling theme is that the mechanisms which enable development and learning to occur in a stable way imply properties of adult behavior. It is noted how higher-order attentional constraints can influence multiple cortical regions, and how spatial and object attention work together to learn view-invariant object categories. In particular, a form-fitting spatial attentional shroud can allow an emerging view-invariant object category to remain active while multiple view categories are associated with it during sequences of saccadic eye movements. Finally, the chapter summarizes recent work on the LIST PARSE model of cognitive information processing by the laminar circuits of prefrontal cortex. LIST PARSE models the short-term storage of event sequences in working memory, their unitization through learning into sequence, or list, chunks, and their read-out in planned sequential performance that is under volitional control. LIST PARSE provides a laminar embodiment of Item and Order working memories, also called Competitive Queuing models, that have been supported by both psychophysical and neurobiological data. These examples show how variations of a common laminar cortical design can embody properties of visual and cognitive intelligence that seem, at least on the surface, to be mechanistically unrelated.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Under natural viewing conditions, a single depthful percept of the world is consciously seen. When dissimilar images are presented to corresponding regions of the two eyes, binocular rivalry may occur, during which the brain consciously perceives alternating percepts through time. Perceptual bistability can also occur in response to a single ambiguous figure. These percepts raise basic questions: What brain mechanisms generate a single depthful percept of the world? How do the same mechanisms cause perceptual bistability, notably binocular rivalry? What properties of brain representations correspond to consciously seen percepts? How do the dynamics of the layered circuits of visual cortex generate single and bistable percepts? A laminar cortical model of how cortical areas V1, V2, and V4 generate depthful percepts is developed to explain and quantitatively simulate binocular rivalry data. The model proposes how mechanisms of cortical development, perceptual grouping, and figure-ground perception lead to single and rivalrous percepts.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A computational model of visual processing in the vertebrate retina provides a unified explanation of a range of data previously treated by disparate models. Three results are reported here: the model proposes a functional explanation for the primary feed-forward retinal circuit found in vertebrate retinae, it shows how this retinal circuit combines nonlinear adaptation with the desirable properties of linear processing, and it accounts for the origin of parallel transient (nonlinear) and sustained (linear) visual processing streams as simple variants of the same retinal circuit. The retina, owing to its accessibility and to its fundamental role in the initial transduction of light into neural signals, is among the most extensively studied neural structures in the nervous system. Since the pioneering anatomical work by Ramón y Cajal at the turn of the last century[1], technological advances have abetted detailed descriptions of the physiological, pharmacological, and functional properties of many types of retinal cells. However, the relationship between structure and function in the retina is still poorly understood. This article outlines a computational model developed to address fundamental constraints of biological visual systems. Neurons that process nonnegative input signals-such as retinal illuminance-are subject to an inescapable tradeoff between accurate processing in the spatial and temporal domains. Accurate processing in both domains can be achieved with a model that combines nonlinear mechanisms for temporal and spatial adaptation within three layers of feed-forward processing. The resulting architecture is structurally similar to the feed-forward retinal circuit connecting photoreceptors to retinal ganglion cells through bipolar cells. This similarity suggests that the three-layer structure observed in all vertebrate retinae[2] is a required minimal anatomy for accurate spatiotemporal visual processing. This hypothesis is supported through computer simulations showing that the model's output layer accounts for many properties of retinal ganglion cells[3],[4],[5],[6]. Moreover, the model shows how the retina can extend its dynamic range through nonlinear adaptation while exhibiting seemingly linear behavior in response to a variety of spatiotemporal input stimuli. This property is the basis for the prediction that the same retinal circuit can account for both sustained (X) and transient (Y) cat ganglion cells[7] by simple morphological changes. The ability to generate distinct functional behaviors by simple changes in cell morphology suggests that different functional pathways originating in the retina may have evolved from a unified anatomy designed to cope with the constraints of low-level biological vision.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes a self-organizing neural model for eye-hand coordination. Called the DIRECT model, it embodies a solution of the classical motor equivalence problem. Motor equivalence computations allow humans and other animals to flexibly employ an arm with more degrees of freedom than the space in which it moves to carry out spatially defined tasks under conditions that may require novel joint configurations. During a motor babbling phase, the model endogenously generates movement commands that activate the correlated visual, spatial, and motor information that are used to learn its internal coordinate transformations. After learning occurs, the model is capable of controlling reaching movements of the arm to prescribed spatial targets using many different combinations of joints. When allowed visual feedback, the model can automatically perform, without additional learning, reaches with tools of variable lengths, with clamped joints, with distortions of visual input by a prism, and with unexpected perturbations. These compensatory computations occur within a single accurate reaching movement. No corrective movements are needed. Blind reaches using internal feedback have also been simulated. The model achieves its competence by transforming visual information about target position and end effector position in 3-D space into a body-centered spatial representation of the direction in 3-D space that the end effector must move to contact the target. The spatial direction vector is adaptively transformed into a motor direction vector, which represents the joint rotations that move the end effector in the desired spatial direction from the present arm configuration. Properties of the model are compared with psychophysical data on human reaching movements, neurophysiological data on the tuning curves of neurons in the monkey motor cortex, and alternative models of movement control.