986 results for National Science Foundation (U.S.). Office of Polar Programs


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Nearest neighbor classification using shape context can yield highly accurate results in a number of recognition problems. Unfortunately, the approach can be too slow for practical applications, and thus approximation strategies are needed to make shape context practical. This paper proposes a method for efficient and accurate nearest neighbor classification in non-Euclidean spaces, such as the space induced by the shape context measure. First, a method is introduced for constructing a Euclidean embedding that is optimized for nearest neighbor classification accuracy. Using that embedding, multiple approximations of the underlying non-Euclidean similarity measure are obtained, at different levels of accuracy and efficiency. The approximations are automatically combined to form a cascade classifier, which applies the slower approximations only to the hardest cases. Unlike typical cascade-of-classifiers approaches, which are applied to binary classification problems, our method constructs a cascade for a multiclass problem. Experiments with a standard shape data set indicate that a speed-up of two to three orders of magnitude is gained over the standard shape context classifier, with minimal losses in classification accuracy.
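
The cascade idea in this abstract can be illustrated with a generic filter-and-refine sketch. It is not the paper's learned embedding or its trained cascade: the function name `cascade_classify`, the label-agreement rule for deciding which cases are "hard", and the parameters `vote_size` and `shortlist_size` are all assumptions made for illustration. A cheap distance ranks every database item, and the expensive distance is computed only when the cheap neighbors disagree.

```python
from typing import Callable, Sequence, Tuple


def cascade_classify(
    query,
    database: Sequence,
    labels: Sequence[str],
    cheap_dist: Callable,
    exact_dist: Callable,
    vote_size: int = 2,
    shortlist_size: int = 3,
) -> Tuple[str, bool]:
    """Return (predicted label, whether the expensive distance was needed).

    Hypothetical two-stage cascade: stage 1 uses only cheap_dist; stage 2
    re-ranks a short list with exact_dist when stage 1 is not unanimous.
    """
    # Stage 1: rank every database item with the cheap approximation.
    ranked = sorted(range(len(database)),
                    key=lambda i: cheap_dist(query, database[i]))

    # Easy case: the closest cheap neighbors all carry the same label.
    top_labels = {labels[i] for i in ranked[:vote_size]}
    if len(top_labels) == 1:
        return labels[ranked[0]], False

    # Hard case: spend the expensive distance on a short list only.
    shortlist = ranked[:shortlist_size]
    best = min(shortlist, key=lambda i: exact_dist(query, database[i]))
    return labels[best], True


# Toy data: the cheap distance looks at the first coordinate only,
# standing in for a low-dimensional embedding (an assumption).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (1.0, 10.0)]
point_labels = ["a", "a", "b", "b", "b"]


def cheap(p, q):
    return abs(p[0] - q[0])


def exact(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


easy = cascade_classify((0.2, 0.5), points, point_labels, cheap, exact)
hard = cascade_classify((0.9, 0.0), points, point_labels, cheap, exact)
```

In this toy setup the first query is settled by the cheap stage alone, while the second is misled by the projection and is resolved only after the exact distance is applied to three candidates.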

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper proposes a method for detecting shapes of variable structure in images with clutter. The term "variable structure" means that some shape parts can be repeated an arbitrary number of times, some parts can be optional, and some parts can have several alternative appearances. The particular variation of the shape structure that occurs in a given image is not known a priori. Existing computer vision methods, including deformable model methods, were not designed to detect shapes of variable structure; they may only be used to detect shapes that can be decomposed into a fixed, a priori known, number of parts. The proposed method can handle both variations in shape structure and variations in the appearance of individual shape parts. A new class of shape models is introduced, called Hidden State Shape Models, that can naturally represent shapes of variable structure. A detection algorithm is described that finds instances of such shapes in images with large amounts of clutter by finding globally optimal correspondences between image features and shape models. Experiments with real images demonstrate that our method can localize plant branches that consist of an a priori unknown number of leaves and can detect hands more accurately than a hand detector based on the chamfer distance.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Nearest neighbor search is commonly employed in face recognition but it does not scale well to large dataset sizes. A strategy to combine rejection classifiers into a cascade for face identification is proposed in this paper. A rejection classifier for a pair of classes is defined to reject at least one of the classes with high confidence. These rejection classifiers are able to share discriminants in feature space and at the same time have high confidence in the rejection decision. In the face identification problem, it is possible that a pair of known individual faces are very dissimilar. It is very unlikely that both of them are close to an unknown face in the feature space. Hence, only one of them needs to be considered. Using a cascade structure of rejection classifiers, the scope of nearest neighbor search can be reduced significantly. Experiments on Face Recognition Grand Challenge (FRGC) version 1 data demonstrate that the proposed method achieves significant speed up and an accuracy comparable with the brute force Nearest Neighbor method. In addition, a graph cut based clustering technique is employed to demonstrate that the pairwise separability of these rejection classifiers is capable of semantic grouping.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A learning-based framework is proposed for estimating human body pose from a single image. Given a differentiable function that maps from pose space to image feature space, the goal is to invert the process: estimate the pose given only image features. The inversion is an ill-posed problem as the inverse mapping is a one-to-many process. Hence multiple solutions exist, and it is desirable to restrict the solution space to a smaller subset of feasible solutions. For example, not all human body poses are feasible due to anthropometric constraints. Since the space of feasible solutions may not admit a closed form description, the proposed framework seeks to exploit machine learning techniques to learn an approximation that is smoothly parameterized over such a space. One such technique is Gaussian Process Latent Variable Modelling. Scaled conjugate gradient is then used to find the best matching pose in the space of feasible solutions when given an input image. The formulation allows easy incorporation of various constraints, e.g. temporal consistency and anthropometric constraints. The performance of the proposed approach is evaluated in the task of upper-body pose estimation from silhouettes and compared with the Specialized Mapping Architecture. The estimation accuracy of the Specialized Mapping Architecture is at least one standard deviation worse than the proposed approach in the experiments with synthetic data. In experiments with real video of humans performing gestures, the proposed approach produces qualitatively better estimation results.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A model of laminar visual cortical dynamics proposes how 3D boundary and surface representations of slanted and curved 3D objects and 2D images arise. The 3D boundary representations emerge from interactions between non-classical horizontal receptive field interactions and intracortical and intercortical feedback circuits. Such non-classical interactions contextually disambiguate classical receptive field responses to ambiguous visual cues using cells that are sensitive to angles and disparity gradients within cortical areas V1 and V2. These cells are all variants of bipole grouping cells. Model simulations show how horizontal connections can develop selectively to angles, how slanted surfaces can activate 3D boundary representations that are sensitive to angles and disparity gradients, how 3D filling-in occurs across slanted surfaces, how a 2D Necker cube image can be represented in 3D, and how bistable Necker cube percepts occur. The model also explains data about slant aftereffects and 3D neon color spreading. It shows how habituative transmitters that help to control development also help to trigger bistable 3D percepts and slant aftereffects, and how attention can influence which of these percepts is perceived by propagating along some object boundaries.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The perception of a glossy surface in a static monochromatic image can occur when a bright highlight is embedded in a compatible context of shading and a bounding contour. Some images naturally give rise to the impression that a surface has a uniform reflectance, characteristic of a shiny object, even though the highlight may only cover a small portion of the surface. Nonetheless, an observer may adopt an attitude of scrutiny in viewing a glossy surface, whereby the impression of gloss is partial and nonuniform at image regions outside of a highlight. Using a rating scale and small probe points to indicate image locations, differential perception of gloss within a single object is investigated in the present study. Observers' gloss ratings are not uniform across the surface, but decrease as a function of distance from the highlight. When, by design, the distance from a highlight is uncoupled from the luminance value at corresponding probe points, the decrease in rated gloss correlates more with the distance than with the luminance change. Experiments also indicate that gloss ratings change as a function of estimated surface distance, rather than as a function of image distance. Surface continuity affects gloss ratings, suggesting that apprehension of 3D surface structure is crucial for gloss perception.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Multiple sound sources often contain harmonics that overlap and may be degraded by environmental noise. The auditory system is capable of teasing apart these sources into distinct mental objects, or streams. Such an "auditory scene analysis" enables the brain to solve the cocktail party problem. A neural network model of auditory scene analysis, called the AIRSTREAM model, is presented to propose how the brain accomplishes this feat. The model clarifies how the frequency components that correspond to a given acoustic source may be coherently grouped together into distinct streams based on pitch and spatial cues. The model also clarifies how multiple streams may be distinguished and separated by the brain. Streams are formed as spectral-pitch resonances that emerge through feedback interactions between the frequency-specific spectral representation of a sound source and its pitch. First, the model transforms a sound into a spatial pattern of frequency-specific activation across a spectral stream layer. The sound has multiple parallel representations at this layer. A sound's spectral representation activates a bottom-up filter that is sensitive to harmonics of the sound's pitch. The filter activates a pitch category which, in turn, activates a top-down expectation that allows one voice or instrument to be tracked through a noisy multiple source environment. Spectral components are suppressed if they do not match harmonics of the top-down expectation that is read out by the selected pitch, thereby allowing another stream to capture these components, as in the "old-plus-new heuristic" of Bregman. Multiple simultaneously occurring spectral-pitch resonances can thereby emerge. These resonance and matching mechanisms are specialized versions of Adaptive Resonance Theory, or ART, which clarifies how pitch representations can self-organize during learning of harmonic bottom-up filters and top-down expectations. The model also clarifies how spatial location cues can help to disambiguate two sources with similar spectral cues. Data are simulated from psychophysical grouping experiments, such as how a tone sweeping upwards in frequency creates a bounce percept by grouping with a downward sweeping tone due to proximity in frequency, even if noise replaces the tones at their intersection point. Illusory auditory percepts are also simulated, such as the auditory continuity illusion of a tone continuing through a noise burst even if the tone is not present during the noise, and the scale illusion of Deutsch whereby downward and upward scales presented alternately to the two ears are regrouped based on frequency proximity, leading to a bounce percept. Since related sorts of resonances have been used to quantitatively simulate psychophysical data about speech perception, the model strengthens the hypothesis that ART-like mechanisms are used at multiple levels of the auditory system. Proposals for developing the model to explain more complex streaming data are also provided.
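
The matching step described in this abstract, where components that do not fit the harmonics of a selected pitch are released for another stream to capture, can be sketched as a simple harmonic sieve. This is a signal-level illustration only, not the AIRSTREAM neural network: the function name `split_by_harmonics` and the relative `tolerance` parameter are assumptions.

```python
from typing import List, Sequence, Tuple


def split_by_harmonics(
    components_hz: Sequence[float],
    pitch_hz: float,
    tolerance: float = 0.03,
) -> Tuple[List[float], List[float]]:
    """Split components into those near a harmonic of pitch_hz and the rest.

    tolerance is a hypothetical relative mismatch allowed around each
    harmonic (3% by default).
    """
    matched, residual = [], []
    for freq in components_hz:
        # Nearest harmonic number, never below the fundamental.
        harmonic = max(1, round(freq / pitch_hz))
        target = harmonic * pitch_hz
        if abs(freq - target) <= tolerance * target:
            matched.append(freq)   # consistent with the selected pitch
        else:
            residual.append(freq)  # left for another stream to capture
    return matched, residual


# Two interleaved harmonic sources with fundamentals at 200 Hz and 300 Hz.
mixture = [200.0, 300.0, 400.0, 600.0, 900.0]
stream_one, leftover = split_by_harmonics(mixture, pitch_hz=200.0)
stream_two, unexplained = split_by_harmonics(leftover, pitch_hz=300.0)
```

Note that 600 Hz is a harmonic of both fundamentals; this sketch assigns it to whichever pitch is selected first, whereas the model described above resolves such conflicts through resonance and spatial cues.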

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Air Force Office of Scientific Research (F49620-01-1-0423); National Geospatial-Intelligence Agency (NMA 201-01-1-2016); National Science Foundation (SBE-035437, DEG-0221680); Office of Naval Research (N00014-01-1-0624)

Relevance:

100.00% 100.00%

Publisher:

Abstract:

How do visual form and motion processes cooperate to compute object motion when each process separately is insufficient? A 3D FORMOTION model specifies how 3D boundary representations, which separate figures from backgrounds within cortical area V2, capture motion signals at the appropriate depths in MT; how motion signals in MT disambiguate boundaries in V2 via MT-to-V1-to-V2 feedback; how sparse feature tracking signals are amplified; and how a spatially anisotropic motion grouping process propagates across perceptual space via MT-MST feedback to integrate feature-tracking and ambiguous motion signals to determine a global object motion percept. Simulated data include: the degree of motion coherence of rotating shapes observed through apertures, the coherent vs. element motion percepts separated in depth during the chopsticks illusion, and the rigid vs. non-rigid appearance of rotating ellipses.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

How does the brain make decisions? Speed and accuracy of perceptual decisions covary with certainty in the input, and correlate with the rate of evidence accumulation in parietal and frontal cortical "decision neurons." A biophysically realistic model of interactions within and between Retina/LGN and cortical areas V1, MT, MST, and LIP, gated by basal ganglia, simulates dynamic properties of decision-making in response to ambiguous visual motion stimuli used by Newsome, Shadlen, and colleagues in their neurophysiological experiments. The model clarifies how brain circuits that solve the aperture problem interact with a recurrent competitive network with self-normalizing choice properties to carry out probabilistic decisions in real time. Some scientists claim that perception and decision-making can be described using Bayesian inference or related general statistical ideas that estimate the optimal interpretation of the stimulus given priors and likelihoods. However, such concepts do not propose the neocortical mechanisms that enable perception and make decisions. The present model explains behavioral and neurophysiological decision-making data without an appeal to Bayesian concepts and, unlike other existing models of these data, generates perceptual representations and choice dynamics in response to the experimental visual stimuli. Quantitative model simulations include the time course of LIP neuronal dynamics, as well as behavioral accuracy and reaction time properties, during both correct and error trials at different levels of input ambiguity in both fixed duration and reaction time tasks. Model MT/MST interactions compute the global direction of random dot motion stimuli, while model LIP computes the stochastic perceptual decision that leads to a saccadic eye movement.
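
To make the phrase "recurrent competitive network with self-normalizing choice properties" concrete, here is a minimal sketch of a generic shunting on-center off-surround network. The equation form, the quadratic signal function, and every parameter value are assumptions chosen for illustration; they are not the equations or parameters of the model described in this abstract. Each cell excites itself and is inhibited by the others, and the shunting terms keep total activity below the ceiling.

```python
from typing import List, Sequence


def simulate_competition(
    inputs: Sequence[float],
    decay: float = 1.0,
    ceiling: float = 1.0,
    dt: float = 0.01,
    steps: int = 2000,
) -> List[float]:
    """Euler-integrate a shunting recurrent competitive network.

    Assumed dynamics for cell i, with signal function f(x) = x**2:
        dx_i/dt = -decay * x_i
                  + (ceiling - x_i) * (f(x_i) + I_i)
                  - x_i * sum over k != i of (f(x_k) + I_k)
    """
    x = [0.0] * len(inputs)
    for _ in range(steps):
        # Self-excitation plus external input for every cell.
        drive = [xi * xi + inp for xi, inp in zip(x, inputs)]
        total = sum(drive)
        x = [
            xi + dt * (-decay * xi + (ceiling - xi) * d - xi * (total - d))
            for xi, d in zip(x, drive)
        ]
    return x


# Two competing populations; the first receives slightly stronger evidence.
activities = simulate_competition([0.5, 0.3])
```

Summing the assumed equation over all cells gives dS/dt = -decay * S + (ceiling - S) * F, where S is total activity and F is the total drive, so S can never reach the ceiling. That bound is the normalization property the sketch is meant to show.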

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A neural model is developed to explain how humans can approach a goal object on foot while steering around obstacles to avoid collisions in a cluttered environment. The model uses optic flow from a 3D virtual reality environment to determine the position of objects based on motion discontinuities, and computes heading direction, or the direction of self-motion, from global optic flow. The cortical representation of heading interacts with the representations of a goal and obstacles such that the goal acts as an attractor of heading, while obstacles act as repellers. In addition, the model maintains fixation on the goal object by generating smooth pursuit eye movements. Eye rotations can distort the optic flow field, complicating heading perception, and the model uses extraretinal signals to correct for this distortion and accurately represent heading. The model explains how motion processing mechanisms in cortical areas MT, MST, and VIP can be used to guide steering. The model quantitatively simulates human psychophysical data about visually-guided steering, obstacle avoidance, and route selection.
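
The attractor and repeller roles described in this abstract can be illustrated with a simplified first-order steering sketch. This is not the neural model itself: the function names, the exponential fall-off of obstacle influence, and the gains `k_goal`, `k_obs`, and `decay` are assumptions. Angles are in radians, measured in a fixed world frame.

```python
import math
from typing import Sequence


def heading_rate(
    heading: float,
    goal_dir: float,
    obstacle_dirs: Sequence[float],
    k_goal: float = 1.0,
    k_obs: float = 4.0,
    decay: float = 4.0,
) -> float:
    """Rate of change of heading under assumed attractor/repeller terms."""
    # Goal term: pulls heading toward the goal direction (stable fixed point).
    rate = -k_goal * (heading - goal_dir)
    for obs in obstacle_dirs:
        offset = heading - obs
        # Obstacle term: pushes heading away from the obstacle direction,
        # with influence fading as the angular offset grows.
        rate += k_obs * offset * math.exp(-decay * abs(offset))
    return rate


def simulate_heading(
    heading: float,
    goal_dir: float,
    obstacle_dirs: Sequence[float],
    dt: float = 0.05,
    steps: int = 400,
) -> float:
    """Euler-integrate the heading for a fixed number of steps."""
    for _ in range(steps):
        heading += dt * heading_rate(heading, goal_dir, obstacle_dirs)
    return heading


# With no obstacles the heading settles on the goal direction.
final_heading = simulate_heading(0.0, 0.5, [])
# An obstacle slightly to the left (positive angle) turns the walker right.
turn = heading_rate(0.0, 0.0, [0.1])
```

A fuller treatment would make both terms depend on distance to the goal and obstacles and would give heading second-order (inertial) dynamics; those refinements are omitted here.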

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Under natural viewing conditions, a single depthful percept of the world is consciously seen. When dissimilar images are presented to corresponding regions of the two eyes, binocular rivalry may occur, during which the brain consciously perceives alternating percepts through time. How do the same brain mechanisms that generate a single depthful percept of the world also cause perceptual bistability, notably binocular rivalry? What properties of brain representations correspond to consciously seen percepts? A laminar cortical model of how cortical areas V1, V2, and V4 generate depthful percepts is developed to explain and quantitatively simulate binocular rivalry data. The model proposes how mechanisms of cortical development, perceptual grouping, and figure-ground perception lead to single and rivalrous percepts. Quantitative model simulations include influences of contrast changes that are synchronized with switches in the dominant eye percept, gamma distribution of dominant phase durations, piecemeal percepts, and coexistence of eye-based and stimulus-based rivalry. The model also quantitatively explains data about multiple brain regions involved in rivalry, effects of object attention on switching between superimposed transparent surfaces, and monocular rivalry. These data explanations are linked to brain mechanisms that assure non-rivalrous conscious percepts. To our knowledge, no existing model can explain all of these phenomena.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Advanced Research Projects Agency (ONR N00014-92-J-4015); National Science Foundation (IRI-90-24877); Office of Naval Research (N00014-91-J-4100)

Relevance:

100.00% 100.00%

Publisher:

Abstract:

1) A large body of behavioral data concerning animal and human gaits and gait transitions is simulated as emergent properties of a central pattern generator (CPG) model. The CPG model incorporates neurons obeying Hodgkin-Huxley type dynamics that interact via an on-center off-surround anatomy whose excitatory signals operate on a faster time scale than their inhibitory signals. A descending command or arousal signal called a GO signal activates the gaits and controls their transitions. The GO signal and the CPG model are compared with neural data from globus pallidus and spinal cord, among other brain structures. 2) Data from human bimanual finger coordination tasks are simulated in which anti-phase oscillations at low frequencies spontaneously switch to in-phase oscillations at high frequencies, in-phase oscillations can be performed both at low and high frequencies, phase fluctuations occur at the anti-phase in-phase transition, and a "seagull effect" of larger errors occurs at intermediate phases. When driven by environmental patterns with intermediate phase relationships, the model's output exhibits a tendency to slip toward purely in-phase and anti-phase relationships as observed in human subjects. 3) Quadruped vertebrate gaits, including the amble, the walk, all three pairwise gaits (trot, pace, and gallop) and the pronk are simulated. Rapid gait transitions are simulated in the order--walk, trot, pace, and gallop--that occurs in the cat, along with the observed increase in oscillation frequency. 4) Precise control of quadruped gait switching is achieved in the model by using GO-dependent modulation of the model's inhibitory interactions. This generates a different functional connectivity in a single CPG at different arousal levels. Such task-specific modulation of functional connectivity in neural pattern generators has been experimentally reported in invertebrates. Phase-dependent modulation of reflex gain has been observed in cats. A role for state-dependent modulation is herein predicted to occur in vertebrates for precise control of phase transitions from one gait to another. 5) The primary human gaits (the walk and the run) and elephant gaits (the amble and the walk) are simulated. Although these two gaits are qualitatively different, they both have the same limb order and may exhibit oscillation frequencies that overlap. The CPG model simulates the walk and the run by generating oscillations which exhibit the same phase relationships, but qualitatively different waveform shapes, at different GO signal levels. The fraction of each cycle that activity is above threshold quantitatively distinguishes the two gaits, much as the duty cycles of the feet are longer in the walk than in the run. 6) A key model property concerns the ability of a single model CPG that obeys a fixed set of opponent processing equations to generate both in-phase and anti-phase oscillations at different arousal levels. Phase transitions from either in-phase to anti-phase oscillations, or from anti-phase to in-phase oscillations, can occur in different parameter ranges, as the GO signal increases.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper describes a self-organizing neural network that rapidly learns a body-centered representation of 3-D target positions. This representation remains invariant under head and eye movements, and is a key component of sensory-motor systems for producing motor equivalent reaches to targets (Bullock, Grossberg, and Guenther, 1993).