18 resultados para Vision Tests

em Boston University Digital Common


Relevância:

20.00% 20.00%

Publicador:

Resumo:

A fundamental task of vision systems is to infer the state of the world given some form of visual observations. From a computational perspective, this often involves facing an ill-posed problem; e.g., information is lost via projection of the 3D world into a 2D image. Solution of an ill-posed problem requires additional information, usually provided as a model of the underlying process. It is important that the model be both computationally feasible as well as theoretically well-founded. In this thesis, a probabilistic, nonlinear supervised computational learning model is proposed: the Specialized Mappings Architecture (SMA). The SMA framework is demonstrated in a computer vision system that can estimate the articulated pose parameters of a human body or human hands, given images obtained via one or more uncalibrated cameras. The SMA consists of several specialized forward mapping functions that are estimated automatically from training data, and a possibly known feedback function. Each specialized function maps certain domains of the input space (e.g., image features) onto the output space (e.g., articulated body parameters). A probabilistic model for the architecture is first formalized. Solutions to key algorithmic problems are then derived: simultaneous learning of the specialized domains along with the mapping functions, as well as performing inference given inputs and a feedback function. The SMA employs a variant of the Expectation-Maximization algorithm and approximate inference. The approach allows the use of alternative conditional independence assumptions for learning and inference, which are derived from a forward model and a feedback model. Experimental validation of the proposed approach is conducted in the task of estimating articulated body pose from image silhouettes. Accuracy and stability of the SMA framework is tested using artificial data sets, as well as synthetic and real video sequences of human bodies and hands.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A vision based technique for non-rigid control is presented that can be used for animation and video game applications. The user grasps a soft, squishable object in front of a camera that can be moved and deformed in order to specify motion. Active Blobs, a non-rigid tracking technique is used to recover the position, rotation and non-rigid deformations of the object. The resulting transformations can be applied to a texture mapped mesh, thus allowing the user to control it interactively. Our use of texture mapping hardware allows us to make the system responsive enough for interactive animation and video game character control.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The performance of different classification approaches is evaluated using a view-based approach for motion representation. The view-based approach uses computer vision and image processing techniques to register and process the video sequence. Two motion representations called Motion Energy Images and Motion History Image are then constructed. These representations collapse the temporal component in a way that no explicit temporal analysis or sequence matching is needed. Statistical descriptions are then computed using moment-based features and dimensionality reduction techniques. For these tests, we used 7 Hu moments, which are invariant to scale and translation. Principal Components Analysis is used to reduce the dimensionality of this representation. The system is trained using different subjects performing a set of examples of every action to be recognized. Given these samples, K-nearest neighbor, Gaussian, and Gaussian mixture classifiers are used to recognize new actions. Experiments are conducted using instances of eight human actions (i.e., eight classes) performed by seven different subjects. Comparisons in the performance among these classifiers under different conditions are analyzed and reported. Our main goals are to test this dimensionality-reduced representation of actions, and more importantly to use this representation to compare the advantages of different classification approaches in this recognition task.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A key goal of behavioral and cognitive neuroscience is to link brain mechanisms to behavioral functions. The present article describes recent progress towards explaining how the visual cortex sees. Visual cortex, like many parts of perceptual and cognitive neocortex, is organized into six main layers of cells, as well as characteristic sub-lamina. Here it is proposed how these layered circuits help to realize the processes of developement, learning, perceptual grouping, attention, and 3D vision through a combination of bottom-up, horizontal, and top-down interactions. A key theme is that the mechanisms which enable developement and learning to occur in a stable way imply properties of adult behavior. These results thus begin to unify three fields: infant cortical developement, adult cortical neurophysiology and anatomy, and adult visual perception. The identified cortical mechanisms promise to generalize to explain how other perceptual and cognitive processes work.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Both animals and mobile robots, or animats, need adaptive control systems to guide their movements through a novel environment. Such control systems need reactive mechanisms for exploration, and learned plans to efficiently reach goal objects once the environment is familiar. How reactive and planned behaviors interact together in real time, and arc released at the appropriate times, during autonomous navigation remains a major unsolved problern. This work presents an end-to-end model to address this problem, named SOVEREIGN: A Self-Organizing, Vision, Expectation, Recognition, Emotion, Intelligent, Goal-oriented Navigation system. The model comprises several interacting subsystems, governed by systems of nonlinear differential equations. As the animat explores the environment, a vision module processes visual inputs using networks that arc sensitive to visual form and motion. Targets processed within the visual form system arc categorized by real-time incremental learning. Simultaneously, visual target position is computed with respect to the animat's body. Estimates of target position activate a motor system to initiate approach movements toward the target. Motion cues from animat locomotion can elicit orienting head or camera movements to bring a never target into view. Approach and orienting movements arc alternately performed during animat navigation. Cumulative estimates of each movement, based on both visual and proprioceptive cues, arc stored within a motor working memory. Sensory cues are stored in a parallel sensory working memory. These working memories trigger learning of sensory and motor sequence chunks, which together control planned movements. Effective chunk combinations arc selectively enhanced via reinforcement learning when the animat is rewarded. The planning chunks effect a gradual transition from reactive to planned behavior. The model can read-out different motor sequences under different motivational states and learns more efficient paths to rewarded goals as exploration proceeds. Several volitional signals automatically gate the interactions between model subsystems at appropriate times. A 3-D visual simulation environment reproduces the animat's sensory experiences as it moves through a simplified spatial environment. The SOVEREIGN model exhibits robust goal-oriented learning of sequential motor behaviors. Its biomimctic structure explicates a number of brain processes which are involved in spatial navigation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Air Force Office of Scientific Research (F49620-01-1-0423); National Geospatial-Intelligence Agency (NMA 201-01-1-2016); National Science Foundation (SBE-035437, DEG-0221680); Office of Naval Research (N00014-01-1-0624)

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Under natural viewing conditions, a single depthful percept of the world is consciously seen. When dissimilar images are presented to corresponding regions of the two eyes, binocular rivalyr may occur, during which the brain consciously perceives alternating percepts through time. How do the same brain mechanisms that generate a single depthful percept of the world also cause perceptual bistability, notably binocular rivalry? What properties of brain representations correspond to consciously seen percepts? A laminar cortical model of how cortical areas V1, V2, and V4 generate depthful percepts is developed to explain and quantitatively simulate binocualr rivalry data. The model proposes how mechanisms of cortical developement, perceptual grouping, and figure-ground perception lead to signle and rivalrous percepts. Quantitative model simulations include influences of contrast changes that are synchronized with switches in the dominant eye percept, gamma distribution of dominant phase durations, piecemeal percepts, and coexistence of eye-based and stimulus-based rivalry. The model also quantitatively explains data about multiple brain regions involved in rivalry, effects of object attention on switching between superimposed transparent surfaces, and monocular rivalry. These data explanations are linked to brain mechanisms that assure non-rivalrous conscious percepts. To our knowledge, no existing model can explain all of these phenomena.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

CONFIGR (CONtour FIgure GRound) is a computational model based on principles of biological vision that completes sparse and noisy image figures. Within an integrated vision/recognition system, CONFIGR posits an initial recognition stage which identifies figure pixels from spatially local input information. The resulting, and typically incomplete, figure is fed back to the “early vision” stage for long-range completion via filling-in. The reconstructed image is then re-presented to the recognition system for global functions such as object recognition. In the CONFIGR algorithm, the smallest independent image unit is the visible pixel, whose size defines a computational spatial scale. Once pixel size is fixed, the entire algorithm is fully determined, with no additional parameter choices. Multi-scale simulations illustrate the vision/recognition system. Open-source CONFIGR code is available online, but all examples can be derived analytically, and the design principles applied at each step are transparent. The model balances filling-in as figure against complementary filling-in as ground, which blocks spurious figure completions. Lobe computations occur on a subpixel spatial scale. Originally designed to fill-in missing contours in an incomplete image such as a dashed line, the same CONFIGR system connects and segments sparse dots, and unifies occluded objects from pieces locally identified as figure in the initial recognition stage. The model self-scales its completion distances, filling-in across gaps of any length, where unimpeded, while limiting connections among dense image-figure pixel groups that already have intrinsic form. Long-range image completion promises to play an important role in adaptive processors that reconstruct images from highly compressed video and still camera images.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

— Consideration of how people respond to the question What is this? has suggested new problem frontiers for pattern recognition and information fusion, as well as neural systems that embody the cognitive transformation of declarative information into relational knowledge. In contrast to traditional classification methods, which aim to find the single correct label for each exemplar (This is a car), the new approach discovers rules that embody coherent relationships among labels which would otherwise appear contradictory to a learning system (This is a car, that is a vehicle, over there is a sedan). This talk will describe how an individual who experiences exemplars in real time, with each exemplar trained on at most one category label, can autonomously discover a hierarchy of cognitive rules, thereby converting local information into global knowledge. Computational examples are based on the observation that sensors working at different times, locations, and spatial scales, and experts with different goals, languages, and situations, may produce apparently inconsistent image labels, which are reconciled by implicit underlying relationships that the network’s learning process discovers. The ARTMAP information fusion system can, moreover, integrate multiple separate knowledge hierarchies, by fusing independent domains into a unified structure. In the process, the system discovers cross-domain rules, inferring multilevel relationships among groups of output classes, without any supervised labeling of these relationships. In order to self-organize its expert system, the ARTMAP information fusion network features distributed code representations which exploit the model’s intrinsic capacity for one-to-many learning (This is a car and a vehicle and a sedan) as well as many-to-one learning (Each of those vehicles is a car). Fusion system software, testbed datasets, and articles are available from http://cns.bu.edu/techlab.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Boundary Contour System neural vision model reproduces perceptual illusory boundary formation by a conjunctive boundary completion process within a large cellular receptive field. The conjunctive chain allows the same kind of conjunction to occur across multiple receptive fields, which allows for sharper, more flexible boundary completion.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An extension to the Boundary Contour System model is proposed to account for boundary completion through vertices with arbitrary numbers of orientations, in a manner consistent with psychophysical observartions, by way of harmonic resonance in a neural architecture.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An extension to the orientational harmonic model is presented as a rotation, translation, and scale invariant representation of geometrical form in biological vision.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The proposed model, called the combinatorial and competitive spatio-temporal memory or CCSTM, provides an elegant solution to the general problem of having to store and recall spatio-temporal patterns in which states or sequences of states can recur in various contexts. For example, fig. 1 shows two state sequences that have a common subsequence, C and D. The CCSTM assumes that any state has a distributed representation as a collection of features. Each feature has an associated competitive module (CM) containing K cells. On any given occurrence of a particular feature, A, exactly one of the cells in CMA will be chosen to represent it. It is the particular set of cells active on the previous time step that determines which cells are chosen to represent instances of their associated features on the current time step. If we assume that typically S features are active in any state then any state has K^S different neural representations. This huge space of possible neural representations of any state is what underlies the model's ability to store and recall numerous context-sensitive state sequences. The purpose of this paper is simply to describe this mechanism.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A key goal of computational neuroscience is to link brain mechanisms to behavioral functions. The present article describes recent progress towards explaining how laminar neocortical circuits give rise to biological intelligence. These circuits embody two new and revolutionary computational paradigms: Complementary Computing and Laminar Computing. Circuit properties include a novel synthesis of feedforward and feedback processing, of digital and analog processing, and of pre-attentive and attentive processing. This synthesis clarifies the appeal of Bayesian approaches but has a far greater predictive range that naturally extends to self-organizing processes. Examples from vision and cognition are summarized. A LAMINART architecture unifies properties of visual development, learning, perceptual grouping, attention, and 3D vision. A key modeling theme is that the mechanisms which enable development and learning to occur in a stable way imply properties of adult behavior. It is noted how higher-order attentional constraints can influence multiple cortical regions, and how spatial and object attention work together to learn view-invariant object categories. In particular, a form-fitting spatial attentional shroud can allow an emerging view-invariant object category to remain active while multiple view categories are associated with it during sequences of saccadic eye movements. Finally, the chapter summarizes recent work on the LIST PARSE model of cognitive information processing by the laminar circuits of prefrontal cortex. LIST PARSE models the short-term storage of event sequences in working memory, their unitization through learning into sequence, or list, chunks, and their read-out in planned sequential performance that is under volitional control. LIST PARSE provides a laminar embodiment of Item and Order working memories, also called Competitive Queuing models, that have been supported by both psychophysical and neurobiological data. These examples show how variations of a common laminar cortical design can embody properties of visual and cognitive intelligence that seem, at least on the surface, to be mechanistically unrelated.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Under natural viewing conditions, a single depthful percept of the world is consciously seen. When dissimilar images are presented to corresponding regions of the two eyes, binocular rivalry may occur, during which the brain consciously perceives alternating percepts through time. Perceptual bistability can also occur in response to a single ambiguous figure. These percepts raise basic questions: What brain mechanisms generate a single depthful percept of the world? How do the same mechanisms cause perceptual bistability, notably binocular rivalry? What properties of brain representations correspond to consciously seen percepts? How do the dynamics of the layered circuits of visual cortex generate single and bistable percepts? A laminar cortical model of how cortical areas V1, V2, and V4 generate depthful percepts is developed to explain and quantitatively simulate binocular rivalry data. The model proposes how mechanisms of cortical development, perceptual grouping, and figure-ground perception lead to single and rivalrous percepts.