36 resultados para Center for Night Vision


Relevância:

20.00% 20.00%

Publicador:

Resumo:

A fundamental task of vision systems is to infer the state of the world given some form of visual observations. From a computational perspective, this often involves facing an ill-posed problem; e.g., information is lost via projection of the 3D world into a 2D image. Solution of an ill-posed problem requires additional information, usually provided as a model of the underlying process. It is important that the model be both computationally feasible as well as theoretically well-founded. In this thesis, a probabilistic, nonlinear supervised computational learning model is proposed: the Specialized Mappings Architecture (SMA). The SMA framework is demonstrated in a computer vision system that can estimate the articulated pose parameters of a human body or human hands, given images obtained via one or more uncalibrated cameras. The SMA consists of several specialized forward mapping functions that are estimated automatically from training data, and a possibly known feedback function. Each specialized function maps certain domains of the input space (e.g., image features) onto the output space (e.g., articulated body parameters). A probabilistic model for the architecture is first formalized. Solutions to key algorithmic problems are then derived: simultaneous learning of the specialized domains along with the mapping functions, as well as performing inference given inputs and a feedback function. The SMA employs a variant of the Expectation-Maximization algorithm and approximate inference. The approach allows the use of alternative conditional independence assumptions for learning and inference, which are derived from a forward model and a feedback model. Experimental validation of the proposed approach is conducted in the task of estimating articulated body pose from image silhouettes. Accuracy and stability of the SMA framework is tested using artificial data sets, as well as synthetic and real video sequences of human bodies and hands.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A vision based technique for non-rigid control is presented that can be used for animation and video game applications. The user grasps a soft, squishable object in front of a camera that can be moved and deformed in order to specify motion. Active Blobs, a non-rigid tracking technique is used to recover the position, rotation and non-rigid deformations of the object. The resulting transformations can be applied to a texture mapped mesh, thus allowing the user to control it interactively. Our use of texture mapping hardware allows us to make the system responsive enough for interactive animation and video game character control.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Lehar's lively discussion builds on a critique of neural models of vision that is incorrect in its general and specific claims. He espouses a Gestalt perceptual approach, rather than one consistent with the "objective neurophysiological state of the visual system" (p. 1). Contemporary vision models realize his perceptual goals and also quantitatively explain neurophysiological and anatomical data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

How does the laminar organization of cortical circuitry in areas VI and V2 give rise to 3D percepts of stratification, transparency, and neon color spreading in response to 2D pictures and 3D scenes? Psychophysical experiments have shown that such 3D percepts are sensitive to whether contiguous image regions have the same relative contrast polarity (dark-light or lightdark), yet long-range perceptual grouping is known to pool over opposite contrast polarities. The ocularity of contiguous regions is also critical for neon color spreading: Having different ocularity despite the contrast relationship that favors neon spreading blocks the spread. In addition, half visible points in a stereogram can induce near-depth transparency if the contrast relationship favors transparency in the half visible areas. It thus seems critical to have the whole contrast relationship in a monocular configuration, since splitting it between two stereogram images cancels the effect. What adaptive functions of perceptual grouping enable it to both preserve sensitivity to monocular contrast and also to pool over opposite contrasts? Aspects of cortical development, grouping, attention, perceptual learning, stereopsis and 3D planar surface perception have previously been analyzed using a 3D LAMINART model of cortical areas VI, V2, and V4. The present work consistently extends this model to show how like-polarity competition between VI simple cells in layer 4 may be combined with other LAMINART grouping mechanisms, such as cooperative pooling of opposite polarities at layer 2/3 complex cells. The model also explains how the Metelli Rules can lead to transparent percepts, how bistable transparency percepts can arise in which either surface can be perceived as transparent, and how such a transparency reversal can be facilitated by an attention shift. The like-polarity inhibition prediction is consistent with lateral masking experiments in which two f1anking Gabor patches with the same contrast polarity as the target increase the target detection threshold when they approach the target. It is also consistent with LAMINART simulations of cortical development. Other model explanations and testable predictions will also be presented.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Ongoing research at Boston University has produced computational models of biological vision and learning that embody a growing corpus of scientific data and predictions. Vision models perform long-range grouping and figure/ground segmentation, and memory models create attentionally controlled recognition codes that intrinsically cornbine botton-up activation and top-down learned expectations. These two streams of research form the foundation of novel dynamically integrated systems for image understanding. Simulations using multispectral images illustrate road completion across occlusions in a cluttered scene and information fusion from incorrect labels that are simultaneously inconsistent and correct. The CNS Vision and Technology Labs (cns.bu.edulvisionlab and cns.bu.edu/techlab) are further integrating science and technology through analysis, testing, and development of cognitive and neural models for large-scale applications, complemented by software specification and code distribution.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Log-polar image architectures, motivated by the structure of the human visual field, have long been investigated in computer vision for use in estimating motion parameters from an optical flow vector field. Practical problems with this approach have been: (i) dependence on assumed alignment of the visual and motion axes; (ii) sensitivity to occlusion form moving and stationary objects in the central visual field, where much of the numerical sensitivity is concentrated; and (iii) inaccuracy of the log-polar architecture (which is an approximation to the central 20°) for wide-field biological vision. In the present paper, we show that an algorithm based on generalization of the log-polar architecture; termed the log-dipolar sensor, provides a large improvement in performance relative to the usual log-polar sampling. Specifically, our algorithm: (i) is tolerant of large misalignmnet of the optical and motion axes; (ii) is insensitive to significant occlusion by objects of unknown motion; and (iii) represents a more correct analogy to the wide-field structure of human vision. Using the Helmholtz-Hodge decomposition to estimate the optical flow vector field on a log-dipolar sensor, we demonstrate these advantages, using synthetic optical flow maps as well as natural image sequences.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

How do reactive and planned behaviors interact in real time? How are sequences of such behaviors released at appropriate times during autonomous navigation to realize valued goals? Controllers for both animals and mobile robots, or animats, need reactive mechanisms for exploration, and learned plans to reach goal objects once an environment becomes familiar. The SOVEREIGN (Self-Organizing, Vision, Expectation, Recognition, Emotion, Intelligent, Goaloriented Navigation) animat model embodies these capabilities, and is tested in a 3D virtual reality environment. SOVEREIGN includes several interacting subsystems which model complementary properties of cortical What and Where processing streams and which clarify similarities between mechanisms for navigation and arm movement control. As the animat explores an environment, visual inputs are processed by networks that are sensitive to visual form and motion in the What and Where streams, respectively. Position-invariant and sizeinvariant recognition categories are learned by real-time incremental learning in the What stream. Estimates of target position relative to the animat are computed in the Where stream, and can activate approach movements toward the target. Motion cues from animat locomotion can elicit head-orienting movements to bring a new target into view. Approach and orienting movements are alternately performed during animat navigation. Cumulative estimates of each movement are derived from interacting proprioceptive and visual cues. Movement sequences are stored within a motor working memory. Sequences of visual categories are stored in a sensory working memory. These working memories trigger learning of sensory and motor sequence categories, or plans, which together control planned movements. Predictively effective chunk combinations are selectively enhanced via reinforcement learning when the animat is rewarded. Selected planning chunks effect a gradual transition from variable reactive exploratory movements to efficient goal-oriented planned movement sequences. Volitional signals gate interactions between model subsystems and the release of overt behaviors. The model can control different motor sequences under different motivational states and learns more efficient sequences to rewarded goals as exploration proceeds.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This article applies a recent theory of 3-D biological vision, called FACADE Theory, to explain several percepts which Kanizsa pioneered. These include 3-D pop-out of an occluding form in front of an occluded form, leading to completion and recognition of the occluded form; 3-D transparent and opaque percepts of Kanizsa squares, with and without Varin wedges; and interactions between percepts of illusory contours, brightness, and depth in response to 2-D Kanizsa images. These explanations clarify how a partially occluded object representation can be completed for purposes of object recognition, without the completed part of the representation necessarily being seen. The theory traces these percepts to neural mechanisms that compensate for measurement uncertainty and complementarity at individual cortical processing stages by using parallel and hierarchical interactions among several cortical processing stages. These interactions are modelled by a Boundary Contour System (BCS) that generates emergent boundary segmentations and a complementary Feature Contour System (FCS) that fills-in surface representations of brightness, color, and depth. The BCS and FCS interact reciprocally with an Object Recognition System (ORS) that binds BCS boundary and FCS surface representations into attentive object representations. The BCS models the parvocellular LGNâInterblobâInterstripeâV4 cortical processing stream, the FCS models the parvocellular LGNâBlobâThin StripeâV4 cortical processing stream, and the ORS models inferotemporal cortex.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A neural network model of early visual processing offers an explanation of brightness effects often associated with illusory contours. Top-down feedback from the model's analog of visual cortical complex cells to model lateral geniculate nucleus (LGN) cells are used to enhance contrast at line ends and other areas of boundary discontinuity. The result is an increase in perceived brightness outside a dark line end, akin to what Kennedy (1979) termed "brightness buttons" in his analysis of visual illusions. When several lines form a suitable configuration, as in an Ehrenstein pattern, the perceptual effect of enhanced brightness can be quite strong. Model simulations show the generation of brightness buttons. With the LGN model circuitry embedded in a larger model of preattentive vision, simulations using complex inputs show the interaction of the brightness buttons with real and illusory contours.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An improved Boundary Contour System (BCS) and Feature Contour System (FCS) neural network model of preattentive vision is applied to large images containing range data gathered by a synthetic aperture radar (SAR) sensor. The goal of processing is to make structures such as motor vehicles, roads, or buildings more salient and more interpretable to human observers than they are in the original imagery. Early processing by shunting center-surround networks compresses signal dynamic range and performs local contrast enhancement. Subsequent processing by filters sensitive to oriented contrast, including short-range competition and long-range cooperation, segments the image into regions. The segmentation is performed by three "copies" of the BCS and FCS, of small, medium, and large scales, wherein the "short-range" and "long-range" interactions within each scale occur over smaller or larger distances, corresponding to the size of the early filters of each scale. A diffusive filling-in operation within the segmented regions at each scale produces coherent surface representations. The combination of BCS and FCS helps to locate and enhance structure over regions of many pixels, without the resulting blur characteristic of approaches based on low spatial frequency filtering alone.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Visual search data are given a unified quantitative explanation by a model of how spatial maps in the parietal cortex and object recognition categories in the inferotemporal cortex deploy attentional resources as they reciprocally interact with visual representations in the prestriate cortex. The model visual representations arc organized into multiple boundary and surface representations. Visual search in the model is initiated by organizing multiple items that lie within a given boundary or surface representation into a candidate search grouping. These items arc compared with object recognition categories to test for matches or mismatches. Mismatches can trigger deeper searches and recursive selection of new groupings until a target object io identified. This search model is algorithmically specified to quantitatively simulate search data using a single set of parameters, as well as to qualitatively explain a still larger data base, including data of Aks and Enns (1992), Bravo and Blake (1990), Chellazzi, Miller, Duncan, and Desimone (1993), Egeth, Viri, and Garbart (1984), Cohen and Ivry (1991), Enno and Rensink (1990), He and Nakayarna (1992), Humphreys, Quinlan, and Riddoch (1989), Mordkoff, Yantis, and Egeth (1990), Nakayama and Silverman (1986), Treisman and Gelade (1980), Treisman and Sato (1990), Wolfe, Cave, and Franzel (1989), and Wolfe and Friedman-Hill (1992). The model hereby provides an alternative to recent variations on the Feature Integration and Guided Search models, and grounds the analysis of visual search in neural models of preattentive vision, attentive object learning and categorization, and attentive spatial localization and orientation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A neural model of peripheral auditory processing is described and used to separate features of coarticulated vowels and consonants. After preprocessing of speech via a filterbank, the model splits into two parallel channels, a sustained channel and a transient channel. The sustained channel is sensitive to relatively stable parts of the speech waveform, notably synchronous properties of the vocalic portion of the stimulus it extends the dynamic range of eighth nerve filters using coincidence deteectors that combine operations of raising to a power, rectification, delay, multiplication, time averaging, and preemphasis. The transient channel is sensitive to critical features at the onsets and offsets of speech segments. It is built up from fast excitatory neurons that are modulated by slow inhibitory interneurons. These units are combined over high frequency and low frequency ranges using operations of rectification, normalization, multiplicative gating, and opponent processing. Detectors sensitive to frication and to onset or offset of stop consonants and vowels are described. Model properties are characterized by mathematical analysis and computer simulations. Neural analogs of model cells in the cochlear nucleus and inferior colliculus are noted, as are psychophysical data about perception of CV syllables that may be explained by the sustained transient channel hypothesis. The proposed sustained and transient processing seems to be an auditory analog of the sustained and transient processing that is known to occur in vision.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A neural model is presented of how cortical areas V1, V2, and V4 interact to convert a textured 2D image into a representation of curved 3D shape. Two basic problems are solved to achieve this: (1) Patterns of spatially discrete 2D texture elements are transformed into a spatially smooth surface representation of 3D shape. (2) Changes in the statistical properties of texture elements across space induce the perceived 3D shape of this surface representation. This is achieved in the model through multiple-scale filtering of a 2D image, followed by a cooperative-competitive grouping network that coherently binds texture elements into boundary webs at the appropriate depths using a scale-to-depth map and a subsequent depth competition stage. These boundary webs then gate filling-in of surface lightness signals in order to form a smooth 3D surface percept. The model quantitatively simulates challenging psychophysical data about perception of prolate ellipsoids (Todd and Akerstrom, 1987, J. Exp. Psych., 13, 242). In particular, the model represents a high degree of 3D curvature for a certain class of images, all of whose texture elements have the same degree of optical compression, in accordance with percepts of human observers. Simulations of 3D percepts of an elliptical cylinder, a slanted plane, and a photo of a golf ball are also presented.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A neural model is proposed of how laminar interactions in the visual cortex may learn and recognize object texture and form boundaries. The model brings together five interacting processes: region-based texture classification, contour-based boundary grouping, surface filling-in, spatial attention, and object attention. The model shows how form boundaries can determine regions in which surface filling-in occurs; how surface filling-in interacts with spatial attention to generate a form-fitting distribution of spatial attention, or attentional shroud; how the strongest shroud can inhibit weaker shrouds; and how the winning shroud regulates learning of texture categories, and thus the allocation of object attention. The model can discriminate abutted textures with blurred boundaries and is sensitive to texture boundary attributes like discontinuities in orientation and texture flow curvature as well as to relative orientations of texture elements. The model quantitatively fits a large set of human psychophysical data on orientation-based textures. Object boundar output of the model is compared to computer vision algorithms using a set of human segmented photographic images. The model classifies textures and suppresses noise using a multiple scale oriented filterbank and a distributed Adaptive Resonance Theory (dART) classifier. The matched signal between the bottom-up texture inputs and top-down learned texture categories is utilized by oriented competitive and cooperative grouping processes to generate texture boundaries that control surface filling-in and spatial attention. Topdown modulatory attentional feedback from boundary and surface representations to early filtering stages results in enhanced texture boundaries and more efficient learning of texture within attended surface regions. Surface-based attention also provides a self-supervising training signal for learning new textures. Importance of the surface-based attentional feedback in texture learning and classification is tested using a set of textured images from the Brodatz micro-texture album. Benchmark studies vary from 95.1% to 98.6% with attention, and from 90.6% to 93.2% without attention.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An improved Boundary Contour System (BCS) neural network model of preattentive vision is applied to two images that produce strong "pop-out" of emergent groupings in humans. In humans these images generate groupings collinear with or perpendicular to image contrasts. Analogous groupings occur in computer simulations of the model. Long-range cooperative and short-range competitive processes of the BCS dynamically form the stable groupings of texture regions in response to the images.