843 resultados para Feature grouping
Resumo:
Perceptual grouping is well-known to be a fundamental process during visual perception, notably grouping across scenic regions that do not receive contrastive visual inputs. Illusory contours are a classical example of such groupings. Recent psychophysical and neurophysiological evidence have shown that the grouping process can facilitate rapid synchronization of the cells that are bound together by a grouping, even when the grouping must be completed across regions that receive no contrastive inputs. Synchronous grouping can hereby bind together different object parts that may have become desynchronized due to a variety of factors, and can enhance the efficiency of cortical transmission. Neural models of perceptual grouping have clarified how such fast synchronization may occur by using bipole grouping cells, whose predicted properties have been supported by psychophysical, anatomical, and neurophysiological experiments. These models have not, however, incorporated some of the realistic constraints on which groupings in the brain are conditioned, notably the measured spatial extent of long-range interactions in layer 2/3 of a grouping network, and realistic synaptic and axonal signaling delays within and across cells in different cortical layers. This work addresses the question: Can long-range interactions that obey the bipole constraint achieve fast synchronization under realistic anatomical and neurophysiological constraints that initially desynchronize grouping signals? Can the cells that synchronize retain their analog sensitivity to changing input amplitudes? Can the grouping process complete and synchronize illusory contours across gaps in bottom-up inputs? Our simulations show that the answer to these questions is Yes.
Resumo:
How do visual form and motion processes cooperate to compute object motion when each process separately is insufficient? A 3D FORMOTION model specifies how 3D boundary representations, which separate figures from backgrounds within cortical area V2, capture motion signals at the appropriate depths in MT; how motion signals in MT disambiguate boundaries in V2 via MT-to-Vl-to-V2 feedback; how sparse feature tracking signals are amplified; and how a spatially anisotropic motion grouping process propagates across perceptual space via MT-MST feedback to integrate feature-tracking and ambiguous motion signals to determine a global object motion percept. Simulated data include: the degree of motion coherence of rotating shapes observed through apertures, the coherent vs. element motion percepts separated in depth during the chopsticks illusion, and the rigid vs. non-rigid appearance of rotating ellipses.
Resumo:
When brain mechanism carry out motion integration and segmentation processes that compute unambiguous global motion percepts from ambiguous local motion signals? Consider, for example, a deer running at variable speeds behind forest cover. The forest cover is an occluder that creates apertures through which fragments of the deer's motion signals are intermittently experienced. The brain coherently groups these fragments into a trackable percept of the deer in its trajectory. Form and motion processes are needed to accomplish this using feedforward and feedback interactions both within and across cortical processing streams. All the cortical areas V1, V2, MT, and MST are involved in these interactions. Figure-ground processes in the form stream through V2, such as the seperation of occluding boundaries of the forest cover from the boundaries of the deer, select the motion signals which determine global object motion percepts in the motion stream through MT. Sparse, but unambiguous, feauture tracking signals are amplified before they propogate across position and are intergrated with far more numerous ambiguous motion signals. Figure-ground and integration processes together determine the global percept. A neural model predicts the processing stages that embody these form and motion interactions. Model concepts and data are summarized about motion grouping across apertures in response to a wide variety of displays, and probabilistic decision making in parietal cortex in response to random dot displays.
Resumo:
Grouping of collinear boundary contours is a fundamental process during visual perception. Illusory contour completion vividly illustrates how stable perceptual boundaries interpolate between pairs of contour inducers, but do not extrapolate from a single inducer. Neural models have simulated how perceptual grouping occurs in laminar visual cortical circuits. These models predicted the existence of grouping cells that obey a bipole property whereby grouping can occur inwardly between pairs or greater numbers of similarly oriented and co-axial inducers, but not outwardly from individual inducers. These models have not, however, incorporated spiking dynamics. Perceptual grouping is a challenge for spiking cells because its properties of collinear facilitation and analog sensitivity to inducer configurations occur despite irregularities in spike timing across all the interacting cells. Other models have demonstrated spiking dynamics in laminar neocortical circuits, but not how perceptual grouping occurs. The current model begins to unify these two modeling streams by implementing a laminar cortical network of spiking cells whose intracellular temporal dynamics interact with recurrent intercellular spiking interactions to quantitatively simulate data from neurophysiological experiments about perceptual grouping, the structure of non-classical visual receptive fields, and gamma oscillations.
Resumo:
A neural theory is proposed in which visual search is accomplished by perceptual grouping and segregation, which occurs simultaneous across the visual field, and object recognition, which is restricted to a selected region of the field. The theory offers an alternative hypothesis to recently developed variations on Feature Integration Theory (Treisman, and Sato, 1991) and Guided Search Model (Wolfe, Cave, and Franzel, 1989). A neural architecture and search algorithm is specified that quantitatively explains a wide range of psychophysical search data (Wolfe, Cave, and Franzel, 1989; Cohen, and lvry, 1991; Mordkoff, Yantis, and Egeth, 1990; Treisman, and Sato, 1991).
Resumo:
An improved Boundary Contour System (BCS) and Feature Contour System (FCS) neural network model of preattentive vision is applied to large images containing range data gathered by a synthetic aperture radar (SAR) sensor. The goal of processing is to make structures such as motor vehicles, roads, or buildings more salient and more interpretable to human observers than they are in the original imagery. Early processing by shunting center-surround networks compresses signal dynamic range and performs local contrast enhancement. Subsequent processing by filters sensitive to oriented contrast, including short-range competition and long-range cooperation, segments the image into regions. The segmentation is performed by three "copies" of the BCS and FCS, of small, medium, and large scales, wherein the "short-range" and "long-range" interactions within each scale occur over smaller or larger distances, corresponding to the size of the early filters of each scale. A diffusive filling-in operation within the segmented regions at each scale produces coherent surface representations. The combination of BCS and FCS helps to locate and enhance structure over regions of many pixels, without the resulting blur characteristic of approaches based on low spatial frequency filtering alone.
Resumo:
Recognition of objects in complex visual scenes is greatly simplified by the ability to segment features belonging to different objects while grouping features belonging to the same object. This feature-binding process can be driven by the local relations between visual contours. The standard method for implementing this process with neural networks uses a temporal code to bind features together. I propose a spatial coding alternative for the dynamic binding of visual contours, and demonstrate the spatial coding method for segmenting an image consisting of three overlapping objects.
Resumo:
Visual search data are given a unified quantitative explanation by a model of how spatial maps in the parietal cortex and object recognition categories in the inferotemporal cortex deploy attentional resources as they reciprocally interact with visual representations in the prestriate cortex. The model visual representations arc organized into multiple boundary and surface representations. Visual search in the model is initiated by organizing multiple items that lie within a given boundary or surface representation into a candidate search grouping. These items arc compared with object recognition categories to test for matches or mismatches. Mismatches can trigger deeper searches and recursive selection of new groupings until a target object io identified. This search model is algorithmically specified to quantitatively simulate search data using a single set of parameters, as well as to qualitatively explain a still larger data base, including data of Aks and Enns (1992), Bravo and Blake (1990), Chellazzi, Miller, Duncan, and Desimone (1993), Egeth, Viri, and Garbart (1984), Cohen and Ivry (1991), Enno and Rensink (1990), He and Nakayarna (1992), Humphreys, Quinlan, and Riddoch (1989), Mordkoff, Yantis, and Egeth (1990), Nakayama and Silverman (1986), Treisman and Gelade (1980), Treisman and Sato (1990), Wolfe, Cave, and Franzel (1989), and Wolfe and Friedman-Hill (1992). The model hereby provides an alternative to recent variations on the Feature Integration and Guided Search models, and grounds the analysis of visual search in neural models of preattentive vision, attentive object learning and categorization, and attentive spatial localization and orientation.
Resumo:
Adaptive Resonance Theory (ART) models are real-time neural networks for category learning, pattern recognition, and prediction. Unsupervised fuzzy ART and supervised fuzzy ARTMAP synthesize fuzzy logic and ART networks by exploiting the formal similarity between the computations of fuzzy subsethood and the dynamics of ART category choice, search, and learning. Fuzzy ART self-organizes stable recognition categories in response to arbitrary sequences of analog or binary input patterns. It generalizes the binary ART 1 model, replacing the set-theoretic: intersection (∩) with the fuzzy intersection (∧), or component-wise minimum. A normalization procedure called complement coding leads to a symmetric: theory in which the fuzzy inter:>ec:tion and the fuzzy union (∨), or component-wise maximum, play complementary roles. Complement coding preserves individual feature amplitudes while normalizing the input vector, and prevents a potential category proliferation problem. Adaptive weights :otart equal to one and can only decrease in time. A geometric interpretation of fuzzy AHT represents each category as a box that increases in size as weights decrease. A matching criterion controls search, determining how close an input and a learned representation must be for a category to accept the input as a new exemplar. A vigilance parameter (p) sets the matching criterion and determines how finely or coarsely an ART system will partition inputs. High vigilance creates fine categories, represented by small boxes. Learning stops when boxes cover the input space. With fast learning, fixed vigilance, and an arbitrary input set, learning stabilizes after just one presentation of each input. A fast-commit slow-recode option allows rapid learning of rare events yet buffers memories against recoding by noisy inputs. Fuzzy ARTMAP unites two fuzzy ART networks to solve supervised learning and prediction problems. A Minimax Learning Rule controls ARTMAP category structure, conjointly minimizing predictive error and maximizing code compression. Low vigilance maximizes compression but may therefore cause very different inputs to make the same prediction. When this coarse grouping strategy causes a predictive error, an internal match tracking control process increases vigilance just enough to correct the error. ARTMAP automatically constructs a minimal number of recognition categories, or "hidden units," to meet accuracy criteria. An ARTMAP voting strategy improves prediction by training the system several times using different orderings of the input set. Voting assigns confidence estimates to competing predictions given small, noisy, or incomplete training sets. ARPA benchmark simulations illustrate fuzzy ARTMAP dynamics. The chapter also compares fuzzy ARTMAP to Salzberg's Nested Generalized Exemplar (NGE) and to Simpson's Fuzzy Min-Max Classifier (FMMC); and concludes with a summary of ART and ARTMAP applications.
Resumo:
A neural model is presented of how cortical areas V1, V2, and V4 interact to convert a textured 2D image into a representation of curved 3D shape. Two basic problems are solved to achieve this: (1) Patterns of spatially discrete 2D texture elements are transformed into a spatially smooth surface representation of 3D shape. (2) Changes in the statistical properties of texture elements across space induce the perceived 3D shape of this surface representation. This is achieved in the model through multiple-scale filtering of a 2D image, followed by a cooperative-competitive grouping network that coherently binds texture elements into boundary webs at the appropriate depths using a scale-to-depth map and a subsequent depth competition stage. These boundary webs then gate filling-in of surface lightness signals in order to form a smooth 3D surface percept. The model quantitatively simulates challenging psychophysical data about perception of prolate ellipsoids (Todd and Akerstrom, 1987, J. Exp. Psych., 13, 242). In particular, the model represents a high degree of 3D curvature for a certain class of images, all of whose texture elements have the same degree of optical compression, in accordance with percepts of human observers. Simulations of 3D percepts of an elliptical cylinder, a slanted plane, and a photo of a golf ball are also presented.
Resumo:
How do visual form and motion processes cooperate to compute object motion when each process separately is insufficient? Consider, for example, a deer moving behind a bush. Here the partially occluded fragments of motion signals available to an observer must be coherently grouped into the motion of a single object. A 3D FORMOTION model comprises five important functional interactions involving the brain’s form and motion systems that address such situations. Because the model’s stages are analogous to areas of the primate visual system, we refer to the stages by corresponding anatomical names. In one of these functional interactions, 3D boundary representations, in which figures are separated from their backgrounds, are formed in cortical area V2. These depth-selective V2 boundaries select motion signals at the appropriate depths in MT via V2-to-MT signals. In another, motion signals in MT disambiguate locally incomplete or ambiguous boundary signals in V2 via MT-to-V1-to-V2 feedback. The third functional property concerns resolution of the aperture problem along straight moving contours by propagating the influence of unambiguous motion signals generated at contour terminators or corners. Here, sparse “feature tracking signals” from, e.g., line ends, are amplified to overwhelm numerically superior ambiguous motion signals along line segment interiors. In the fourth, a spatially anisotropic motion grouping process takes place across perceptual space via MT-MST feedback to integrate veridical feature-tracking and ambiguous motion signals to determine a global object motion percept. The fifth property uses the MT-MST feedback loop to convey an attentional priming signal from higher brain areas back to V1 and V2. The model's use of mechanisms such as divisive normalization, endstopping, cross-orientation inhibition, and longrange cooperation is described. Simulated data include: the degree of motion coherence of rotating shapes observed through apertures, the coherent vs. element motion percepts separated in depth during the chopsticks illusion, and the rigid vs. non-rigid appearance of rotating ellipses.
Resumo:
A neural model is proposed of how laminar interactions in the visual cortex may learn and recognize object texture and form boundaries. The model brings together five interacting processes: region-based texture classification, contour-based boundary grouping, surface filling-in, spatial attention, and object attention. The model shows how form boundaries can determine regions in which surface filling-in occurs; how surface filling-in interacts with spatial attention to generate a form-fitting distribution of spatial attention, or attentional shroud; how the strongest shroud can inhibit weaker shrouds; and how the winning shroud regulates learning of texture categories, and thus the allocation of object attention. The model can discriminate abutted textures with blurred boundaries and is sensitive to texture boundary attributes like discontinuities in orientation and texture flow curvature as well as to relative orientations of texture elements. The model quantitatively fits a large set of human psychophysical data on orientation-based textures. Object boundar output of the model is compared to computer vision algorithms using a set of human segmented photographic images. The model classifies textures and suppresses noise using a multiple scale oriented filterbank and a distributed Adaptive Resonance Theory (dART) classifier. The matched signal between the bottom-up texture inputs and top-down learned texture categories is utilized by oriented competitive and cooperative grouping processes to generate texture boundaries that control surface filling-in and spatial attention. Topdown modulatory attentional feedback from boundary and surface representations to early filtering stages results in enhanced texture boundaries and more efficient learning of texture within attended surface regions. Surface-based attention also provides a self-supervising training signal for learning new textures. Importance of the surface-based attentional feedback in texture learning and classification is tested using a set of textured images from the Brodatz micro-texture album. Benchmark studies vary from 95.1% to 98.6% with attention, and from 90.6% to 93.2% without attention.
Resumo:
An improved Boundary Contour System (BCS) neural network model of preattentive vision is applied to two images that produce strong "pop-out" of emergent groupings in humans. In humans these images generate groupings collinear with or perpendicular to image contrasts. Analogous groupings occur in computer simulations of the model. Long-range cooperative and short-range competitive processes of the BCS dynamically form the stable groupings of texture regions in response to the images.
Synchronized Oscillations During Cooperative Feature Lining in a Cortical Model of Visual Perception
Resumo:
A neural network model of synchronized oscillations in visual cortex is presented to account for recent neurophysiological findings that such synchronization may reflect global properties of the stimulus. In these experiments, synchronization of oscillatory firing responses to moving bar stimuli occurred not only for nearby neurons, but also occurred between neurons separated by several cortical columns (several mm of cortex) when these neurons shared some receptive field preferences specific to the stimuli. These results were obtained for single bar stimuli and also across two disconnected, but colinear, bars moving in the same direction. Our model and computer simulations obtain these synchrony results across both single and double bar stimuli using different, but formally related, models of preattentive visual boundary segmentation and attentive visual object recognition, as well as nearest-neighbor and randomly coupled models.
Resumo:
An improved Boundary Contour System (BCS) and Feature Contour System (FCS) neural network model of preattentive vision is applied to two large images containing range data gathered by a synthetic aperture radar (SAR) sensor. The goal of processing is to make structures such as motor vehicles, roads, or buildings more salient and more interpretable to human observers than they are in the original imagery. Early processing by shunting center-surround networks compresses signal dynamic range and performs local contrast enhancement. Subsequent processing by filters sensitive to oriented contrast, including short-range competition and long-range cooperation, segments the image into regions. Finally, a diffusive filling-in operation within the segmented regions produces coherent visible structures. The combination of BCS and FCS helps to locate and enhance structure over regions of many pixels, without the resulting blur characteristic of approaches based on low spatial frequency filtering alone.