8 results for Subgrid Scale Model
in Boston University Digital Common
Abstract:
An extension to the orientational harmonic model is presented as a rotation, translation, and scale invariant representation of geometrical form in biological vision.
Abstract:
The proposed model, called the combinatorial and competitive spatio-temporal memory or CCSTM, provides an elegant solution to the general problem of having to store and recall spatio-temporal patterns in which states or sequences of states can recur in various contexts. For example, fig. 1 shows two state sequences that have a common subsequence, C and D. The CCSTM assumes that any state has a distributed representation as a collection of features. Each feature has an associated competitive module (CM) containing K cells. On any given occurrence of a particular feature, A, exactly one of the cells in CM_A will be chosen to represent it. It is the particular set of cells active on the previous time step that determines which cells are chosen to represent instances of their associated features on the current time step. If we assume that typically S features are active in any state, then any state has K^S different neural representations. This huge space of possible neural representations of any state is what underlies the model's ability to store and recall numerous context-sensitive state sequences. The purpose of this paper is simply to describe this mechanism.
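The K^S count in the abstract above can be checked with a minimal Python sketch. The function names `all_codes` and `choose_code` are hypothetical, and the context rule in `choose_code` is a toy stand-in assumed purely for illustration; the abstract does not specify how the CCSTM actually selects cells.

```python
from itertools import product


def all_codes(features, K):
    """Enumerate every way to pick one cell (0..K-1) per active feature."""
    features = sorted(features)
    return [dict(zip(features, cells))
            for cells in product(range(K), repeat=len(features))]


def choose_code(features, prev_code, K):
    """Toy context rule (an assumption, not the CCSTM mechanism):
    the cell chosen for each feature depends on which cells were
    active on the previous time step."""
    context = sum(prev_code.values()) + len(prev_code)
    return {f: (context + i) % K for i, f in enumerate(sorted(features))}


# A state with S = 3 active features and K = 4 cells per module
# has 4**3 = 64 possible neural codes.
codes = all_codes(["a", "b", "c"], K=4)

# The same state {C, D} receives different codes in different contexts.
code_after_ab = choose_code({"C", "D"}, {"A": 0, "B": 1}, K=4)
code_after_xy = choose_code({"C", "D"}, {"X": 2, "Y": 2}, K=4)
```

Because the code assigned to a state depends on the preceding code, a shared subsequence such as C, D can be stored separately for each sequence it occurs in, which is the property the abstract attributes to the large representation space.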
Abstract:
A neural model is presented of how cortical areas V1, V2, and V4 interact to convert a textured 2D image into a representation of curved 3D shape. Two basic problems are solved to achieve this: (1) Patterns of spatially discrete 2D texture elements are transformed into a spatially smooth surface representation of 3D shape. (2) Changes in the statistical properties of texture elements across space induce the perceived 3D shape of this surface representation. This is achieved in the model through multiple-scale filtering of a 2D image, followed by a cooperative-competitive grouping network that coherently binds texture elements into boundary webs at the appropriate depths using a scale-to-depth map and a subsequent depth competition stage. These boundary webs then gate filling-in of surface lightness signals in order to form a smooth 3D surface percept. The model quantitatively simulates challenging psychophysical data about perception of prolate ellipsoids (Todd and Akerstrom, 1987, J. Exp. Psych., 13, 242). In particular, the model represents a high degree of 3D curvature for a certain class of images, all of whose texture elements have the same degree of optical compression, in accordance with percepts of human observers. Simulations of 3D percepts of an elliptical cylinder, a slanted plane, and a photo of a golf ball are also presented.
Abstract:
Hidden State Shape Models (HSSMs) [2], a variant of Hidden Markov Models (HMMs) [9], were proposed to detect shape classes of variable structure in cluttered images. In this paper, we formulate a probabilistic framework for HSSMs which provides two major improvements in comparison to the previous method [2]. First, while the method in [2] required the scale of the object to be passed as an input, the method proposed here estimates the scale of the object automatically. This is achieved by introducing a new term for the observation probability that is based on an object-clutter feature model. Second, a segmental HMM [6, 8] is applied to model the "duration probability" of each HMM state, which is learned from the shape statistics in a training set and helps obtain meaningful registration results. Using a segmental HMM provides a principled way to model dependencies between the scales of different parts of the object. In object localization experiments on a dataset of real hand images, the proposed method significantly outperforms the method of [2], reducing the incorrect localization rate from 40% to 15%. The improvement in accuracy becomes more significant if we consider that the method proposed here is scale-independent, whereas the method of [2] takes as input the scale of the object we want to localize.
Abstract:
This article develops a neural model of how the visual system processes natural images under variable illumination conditions to generate surface lightness percepts. Previous models have clarified how the brain can compute the relative contrast of images from variably illuminated scenes. How the brain determines an absolute lightness scale that "anchors" percepts of surface lightness to use the full dynamic range of neurons remains an unsolved problem. Lightness anchoring properties include articulation, insulation, configuration, and area effects. The model quantitatively simulates these and other lightness data such as discounting the illuminant, the double brilliant illusion, lightness constancy and contrast, Mondrian contrast constancy, and the Craik-O'Brien-Cornsweet illusion. The model also clarifies the functional significance for lightness perception of anatomical and neurophysiological data, including gain control at retinal photoreceptors, and spatial contrast adaptation at the negative feedback circuit between the inner segment of photoreceptors and interacting horizontal cells. The model retina can thereby adjust its sensitivity to input intensities ranging from dim moonlight to dazzling sunlight. At later model cortical processing stages, boundary representations gate the filling-in of surface lightness via long-range horizontal connections. Variants of this filling-in mechanism run 100-1000 times faster than diffusion mechanisms of previous biological filling-in models, and show how filling-in can occur at realistic speeds. A new anchoring mechanism called the Blurred-Highest-Luminance-As-White (BHLAW) rule helps simulate how surface lightness becomes sensitive to the spatial scale of objects in a scene. The model is also able to process natural images under variable lighting conditions.
Abstract:
Multiple sound sources often contain harmonics that overlap and may be degraded by environmental noise. The auditory system is capable of teasing apart these sources into distinct mental objects, or streams. Such an "auditory scene analysis" enables the brain to solve the cocktail party problem. A neural network model of auditory scene analysis, called the AIRSTREAM model, is presented to propose how the brain accomplishes this feat. The model clarifies how the frequency components that correspond to a given acoustic source may be coherently grouped together into distinct streams based on pitch and spatial cues. The model also clarifies how multiple streams may be distinguished and separated by the brain. Streams are formed as spectral-pitch resonances that emerge through feedback interactions between frequency-specific spectral representation of a sound source and its pitch. First, the model transforms a sound into a spatial pattern of frequency-specific activation across a spectral stream layer. The sound has multiple parallel representations at this layer. A sound's spectral representation activates a bottom-up filter that is sensitive to harmonics of the sound's pitch. The filter activates a pitch category which, in turn, activates a top-down expectation that allows one voice or instrument to be tracked through a noisy multiple source environment. Spectral components are suppressed if they do not match harmonics of the top-down expectation that is read-out by the selected pitch, thereby allowing another stream to capture these components, as in the "old-plus-new-heuristic" of Bregman. Multiple simultaneously occurring spectral-pitch resonances can thereby emerge. These resonance and matching mechanisms are specialized versions of Adaptive Resonance Theory, or ART, which clarifies how pitch representations can self-organize during learning of harmonic bottom-up filters and top-down expectations.
The model also clarifies how spatial location cues can help to disambiguate two sources with similar spectral cues. Data are simulated from psychophysical grouping experiments, such as how a tone sweeping upwards in frequency creates a bounce percept by grouping with a downward sweeping tone due to proximity in frequency, even if noise replaces the tones at their intersection point. Illusory auditory percepts are also simulated, such as the auditory continuity illusion of a tone continuing through a noise burst even if the tone is not present during the noise, and the scale illusion of Deutsch whereby downward and upward scales presented alternately to the two ears are regrouped based on frequency proximity, leading to a bounce percept. Since related sorts of resonances have been used to quantitatively simulate psychophysical data about speech perception, the model strengthens the hypothesis that ART-like mechanisms are used at multiple levels of the auditory system. Proposals for developing the model to explain more complex streaming data are also provided.
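The harmonic matching step described in this abstract, in which components that do not fit the selected pitch are released to another stream, can be illustrated with a simple harmonic sieve in Python. This is only a sketch under stated assumptions: the function name `split_by_harmonics`, the relative `tolerance` parameter, and the example frequencies are hypothetical, and the sieve is a plain stand-in for the model's resonance dynamics, not the AIRSTREAM network itself.

```python
def split_by_harmonics(components, f0, tolerance=0.03):
    """Split component frequencies (Hz) into those lying near an integer
    multiple of the pitch f0 and those that do not.

    tolerance is a relative deviation allowed around each harmonic
    (an assumed parameter, not taken from the abstract).
    """
    matched, residual = [], []
    for f in components:
        n = max(1, round(f / f0))           # nearest harmonic number
        if abs(f - n * f0) <= tolerance * n * f0:
            matched.append(f)               # consistent with the expectation
        else:
            residual.append(f)              # left for another stream
    return matched, residual


# Two overlapping harmonic sources: one at 200 Hz, one at 330 Hz.
mixture = [200, 400, 600, 330, 660]
stream_1, leftover = split_by_harmonics(mixture, f0=200)
stream_2, unclaimed = split_by_harmonics(leftover, f0=330)
```

Applying the sieve twice mirrors the idea that components rejected by the first pitch expectation remain available for a second stream to capture.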
Abstract:
This study develops a neuromorphic model of human lightness perception that is inspired by how the mammalian visual system is designed for this function. It is known that biological visual representations can adapt to a billion-fold change in luminance. How such a system determines absolute lightness under varying illumination conditions to generate a consistent interpretation of surface lightness remains an unsolved problem. Such a process, called "anchoring" of lightness, has properties including articulation, insulation, configuration, and area effects. The model quantitatively simulates such psychophysical lightness data, as well as other data such as discounting the illuminant, the double brilliant illusion, and lightness constancy and contrast effects. The model retina embodies gain control at retinal photoreceptors, and spatial contrast adaptation at the negative feedback circuit between mechanisms that model the inner segment of photoreceptors and interacting horizontal cells. The model can thereby adjust its sensitivity to input intensities ranging from dim moonlight to dazzling sunlight. A new anchoring mechanism, called the Blurred-Highest-Luminance-As-White (BHLAW) rule, helps simulate how surface lightness becomes sensitive to the spatial scale of objects in a scene. The model is also able to process natural color images under variable lighting conditions, and is compared with the popular RETINEX model.
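One plausible reading of the Blurred-Highest-Luminance-As-White name is sketched below in Python: blur the luminance image, take the maximum of the blurred image as the white anchor, and scale luminances by that anchor. The box blur, the `radius` parameter, and the function names are assumptions made for illustration; the abstract does not give the rule's actual kernel or normalization.

```python
def box_blur(img, radius):
    """Mean filter over a (2*radius+1) square window, truncated at edges."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out


def anchor_lightness(img, radius=1, white=1.0):
    """Scale luminance so the highest BLURRED luminance maps to white.

    Assumed formulation: values above `white` correspond to regions that
    are small relative to the blur and exceed the anchor.
    """
    blurred = box_blur(img, radius)
    anchor = max(max(row) for row in blurred)
    if anchor <= 0:
        return [[0.0 for _ in row] for row in img]
    return [[white * v / anchor for v in row] for row in img]


uniform = anchor_lightness([[5.0] * 3 for _ in range(3)])
spot = anchor_lightness([[1.0, 1.0, 1.0],
                         [1.0, 9.0, 1.0],
                         [1.0, 1.0, 1.0]])
```

In this sketch a uniform field maps exactly to white, while a small bright spot exceeds the blurred anchor, which is one way the spatial scale of an object could influence its computed lightness.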
Abstract:
CONFIGR (CONtour FIgure GRound) is a computational model based on principles of biological vision that completes sparse and noisy image figures. Within an integrated vision/recognition system, CONFIGR posits an initial recognition stage which identifies figure pixels from spatially local input information. The resulting, and typically incomplete, figure is fed back to the “early vision” stage for long-range completion via filling-in. The reconstructed image is then re-presented to the recognition system for global functions such as object recognition. In the CONFIGR algorithm, the smallest independent image unit is the visible pixel, whose size defines a computational spatial scale. Once pixel size is fixed, the entire algorithm is fully determined, with no additional parameter choices. Multi-scale simulations illustrate the vision/recognition system. Open-source CONFIGR code is available online, but all examples can be derived analytically, and the design principles applied at each step are transparent. The model balances filling-in as figure against complementary filling-in as ground, which blocks spurious figure completions. Lobe computations occur on a subpixel spatial scale. Originally designed to fill-in missing contours in an incomplete image such as a dashed line, the same CONFIGR system connects and segments sparse dots, and unifies occluded objects from pieces locally identified as figure in the initial recognition stage. The model self-scales its completion distances, filling-in across gaps of any length, where unimpeded, while limiting connections among dense image-figure pixel groups that already have intrinsic form. Long-range image completion promises to play an important role in adaptive processors that reconstruct images from highly compressed video and still camera images.