42 resultados para Motion Representation
Resumo:
A specialized formulation of Azarbayejani and Pentland's framework for recursive recovery of motion, structure and focal length from feature correspondences tracked through an image sequence is presented. The specialized formulation addresses the case where all tracked points lie on a plane. This planarity constraint reduces the dimension of the original state vector, and consequently the number of feature points needed to estimate the state. Experiments with synthetic data and real imagery illustrate the system performance. The experiments confirm that the specialized formulation provides improved accuracy, stability to observation noise, and rate of convergence in estimation for the case where the tracked points lie on a plane.
Resumo:
Standard structure from motion algorithms recover 3D structure of points. If a surface representation is desired, for example a piece-wise planar representation, then a two-step procedure typically follows: in the first step the plane-membership of points is first determined manually, and in a subsequent step planes are fitted to the sets of points thus determined, and their parameters are recovered. This paper presents an approach for automatically segmenting planar structures from a sequence of images, and simultaneously estimating their parameters. In the proposed approach the plane-membership of points is determined automatically, and the planar structure parameters are recovered directly in the algorithm rather than indirectly in a post-processing stage. Simulated and real experimental results show the efficacy of this approach.
Resumo:
We introduce a view-point invariant representation of moving object trajectories that can be used in video database applications. It is assumed that trajectories lie on a surface that can be locally approximated with a plane. Raw trajectory data is first locally approximated with a cubic spline via least squares fitting. For each sampled point of the obtained curve, a projective invariant feature is computed using a small number of points in its neighborhood. The resulting sequence of invariant features computed along the entire trajectory forms the view invariant descriptor of the trajectory itself. Time parametrization has been exploited to compute cross ratios without ambiguity due to point ordering. Similarity between descriptors of different trajectories is measured with a distance that takes into account the statistical properties of the cross ratio, and its symmetry with respect to the point at infinity. In experiments, an overall correct classification rate of about 95% has been obtained on a dataset of 58 trajectories of players in soccer video, and an overall correct classification rate of about 80% has been obtained on matching partial segments of trajectories collected from two overlapping views of outdoor scenes with moving people and cars.
Resumo:
This study develops a neuromorphic model of human lightness perception that is inspired by how the mammalian visual system is designed for this function. It is known that biological visual representations can adapt to a billion-fold change in luminance. How such a system determines absolute lightness under varying illumination conditions to generate a consistent interpretation of surface lightness remains an unsolved problem. Such a process, called "anchoring" of lightness, has properties including articulation, insulation, configuration, and area effects. The model quantitatively simulates such psychophysical lightness data, as well as other data such as discounting the illuminant, the double brilliant illusion, and lightness constancy and contrast effects. The model retina embodies gain control at retinal photoreceptors, and spatial contrast adaptation at the negative feedback circuit between mechanisms that model the inner segment of photoreceptors and interacting horizontal cells. The model can thereby adjust its sensitivity to input intensities ranging from dim moonlight to dazzling sunlight. A new anchoring mechanism, called the Blurred-Highest-Luminance-As-White (BHLAW) rule, helps simulate how surface lightness becomes sensitive to the spatial scale of objects in a scene. The model is also able to process natural color images under variable lighting conditions, and is compared with the popular RETINEX model.
Resumo:
How do visual form and motion processes cooperate to compute object motion when each process separately is insufficient? A 3D FORMOTION model specifies how 3D boundary representations, which separate figures from backgrounds within cortical area V2, capture motion signals at the appropriate depths in MT; how motion signals in MT disambiguate boundaries in V2 via MT-to-Vl-to-V2 feedback; how sparse feature tracking signals are amplified; and how a spatially anisotropic motion grouping process propagates across perceptual space via MT-MST feedback to integrate feature-tracking and ambiguous motion signals to determine a global object motion percept. Simulated data include: the degree of motion coherence of rotating shapes observed through apertures, the coherent vs. element motion percepts separated in depth during the chopsticks illusion, and the rigid vs. non-rigid appearance of rotating ellipses.
Resumo:
Log-polar image architectures, motivated by the structure of the human visual field, have long been investigated in computer vision for use in estimating motion parameters from an optical flow vector field. Practical problems with this approach have been: (i) dependence on assumed alignment of the visual and motion axes; (ii) sensitivity to occlusion form moving and stationary objects in the central visual field, where much of the numerical sensitivity is concentrated; and (iii) inaccuracy of the log-polar architecture (which is an approximation to the central 20°) for wide-field biological vision. In the present paper, we show that an algorithm based on generalization of the log-polar architecture; termed the log-dipolar sensor, provides a large improvement in performance relative to the usual log-polar sampling. Specifically, our algorithm: (i) is tolerant of large misalignmnet of the optical and motion axes; (ii) is insensitive to significant occlusion by objects of unknown motion; and (iii) represents a more correct analogy to the wide-field structure of human vision. Using the Helmholtz-Hodge decomposition to estimate the optical flow vector field on a log-dipolar sensor, we demonstrate these advantages, using synthetic optical flow maps as well as natural image sequences.
Resumo:
How does the brain make decisions? Speed and accuracy of perceptual decisions covary with certainty in the input, and correlate with the rate of evidence accumulation in parietal and frontal cortical "decision neurons." A biophysically realistic model of interactions within and between Retina/LGN and cortical areas V1, MT, MST, and LIP, gated by basal ganglia, simulates dynamic properties of decision-making in response to ambiguous visual motion stimuli used by Newsome, Shadlen, and colleagues in their neurophysiological experiments. The model clarifies how brain circuits that solve the aperture problem interact with a recurrent competitive network with self-normalizing choice properties to carry out probablistic decisions in real time. Some scientists claim that perception and decision-making can be described using Bayesian inference or related general statistical ideas, that estimate the optimal interpretation of the stimulus given priors and likelihoods. However, such concepts do not propose the neocortical mechanisms that enable perception, and make decisions. The present model explains behavioral and neurophysiological decision-making data without an appeal to Bayesian concepts and, unlike other existing models of these data, generates perceptual representations and choice dynamics in response to the experimental visual stimuli. Quantitative model simulations include the time course of LIP neuronal dynamics, as well as behavioral accuracy and reaction time properties, during both correct and error trials at different levels of input ambiguity in both fixed duration and reaction time tasks. Model MT/MST interactions compute the global direction of random dot motion stimuli, while model LIP computes the stochastic perceptual decision that leads to a saccadic eye movement.
Resumo:
A neural model is developed to explain how humans can approach a goal object on foot while steering around obstacles to avoid collisions in a cluttered environment. The model uses optic flow from a 3D virtual reality environment to determine the position of objects based on motion discotinuities, and computes heading direction, or the direction of self-motion, from global optic flow. The cortical representation of heading interacts with the representations of a goal and obstacles such that the goal acts as an attractor of heading, while obstacles act as repellers. In addition the model maintains fixation on the goal object by generating smooth pursuit eye movements. Eye rotations can distort the optic flow field, complicating heading perception, and the model uses extraretinal signals to correct for this distortion and accurately represent heading. The model explains how motion processing mechanisms in cortical areas MT, MST, and VIP can be used to guide steering. The model quantitatively simulates human psychophysical data about visually-guided steering, obstacle avoidance, and route selection.
Resumo:
When brain mechanism carry out motion integration and segmentation processes that compute unambiguous global motion percepts from ambiguous local motion signals? Consider, for example, a deer running at variable speeds behind forest cover. The forest cover is an occluder that creates apertures through which fragments of the deer's motion signals are intermittently experienced. The brain coherently groups these fragments into a trackable percept of the deer in its trajectory. Form and motion processes are needed to accomplish this using feedforward and feedback interactions both within and across cortical processing streams. All the cortical areas V1, V2, MT, and MST are involved in these interactions. Figure-ground processes in the form stream through V2, such as the seperation of occluding boundaries of the forest cover from the boundaries of the deer, select the motion signals which determine global object motion percepts in the motion stream through MT. Sparse, but unambiguous, feauture tracking signals are amplified before they propogate across position and are intergrated with far more numerous ambiguous motion signals. Figure-ground and integration processes together determine the global percept. A neural model predicts the processing stages that embody these form and motion interactions. Model concepts and data are summarized about motion grouping across apertures in response to a wide variety of displays, and probabilistic decision making in parietal cortex in response to random dot displays.
Resumo:
A neural model is developed to explain how humans can approach a goal object on foot while steering around obstacles to avoid collisions in a cluttered environment. The model uses optic flow from a 3D virtual reality environment to determine the position of objects based on motion discontinuities, and computes heading direction, or the direction of self-motion, from global optic flow. The cortical representation of heading interacts with the representations of a goal and obstacles such that the goal acts as an attractor of heading, while obstacles act as repellers. In addition the model maintains fixation on the goal object by generating smooth pursuit eye movements. Eye rotations can distort the optic flow field, complicating heading perception, and the model uses extraretinal signals to correct for this distortion and accurately represent heading. The model explains how motion processing mechanisms in cortical areas MT, MST, and posterior parietal cortex can be used to guide steering. The model quantitatively simulates human psychophysical data about visually-guided steering, obstacle avoidance, and route selection.
Resumo:
Studies of perceptual learning have focused on aspects of learning that are related to early stages of sensory processing. However, conclusions that perceptual learning results in low-level sensory plasticity are of great controversy, largely because such learning can often be attributed to plasticity in later stages of sensory processing or in the decision processes. To address this controversy, we developed a novel random dot motion (RDM) stimulus to target motion cells selective to contrast polarity, by ensuring the motion direction information arises only from signal dot onsets and not their offsets, and used these stimuli in conjunction with the paradigm of task-irrelevant perceptual learning (TIPL). In TIPL, learning is achieved in response to a stimulus by subliminally pairing that stimulus with the targets of an unrelated training task. In this manner, we are able to probe learning for an aspect of motion processing thought to be a function of directional V1 simple cells with a learning procedure that dissociates the learned stimulus from the decision processes relevant to the training task. Our results show learning for the exposed contrast polarity and that this learning does not transfer to the unexposed contrast polarity. These results suggest that TIPL for motion stimuli may occur at the stage of directional V1 simple cells.
Resumo:
Advanced Research Projects Agency (ONR N00014-92-J-4015); National Science Foundation (IRI-90-24877); Office of Naval Research (N00014-91-J-4100)
Resumo:
This article applies a recent theory of 3-D biological vision, called FACADE Theory, to explain several percepts which Kanizsa pioneered. These include 3-D pop-out of an occluding form in front of an occluded form, leading to completion and recognition of the occluded form; 3-D transparent and opaque percepts of Kanizsa squares, with and without Varin wedges; and interactions between percepts of illusory contours, brightness, and depth in response to 2-D Kanizsa images. These explanations clarify how a partially occluded object representation can be completed for purposes of object recognition, without the completed part of the representation necessarily being seen. The theory traces these percepts to neural mechanisms that compensate for measurement uncertainty and complementarity at individual cortical processing stages by using parallel and hierarchical interactions among several cortical processing stages. These interactions are modelled by a Boundary Contour System (BCS) that generates emergent boundary segmentations and a complementary Feature Contour System (FCS) that fills-in surface representations of brightness, color, and depth. The BCS and FCS interact reciprocally with an Object Recognition System (ORS) that binds BCS boundary and FCS surface representations into attentive object representations. The BCS models the parvocellular LGN→Interblob→Interstripe→V4 cortical processing stream, the FCS models the parvocellular LGN→Blob→Thin Stripe→V4 cortical processing stream, and the ORS models inferotemporal cortex.
Resumo:
This paper describes a self-organizing neural network that rapidly learns a body-centered representation of 3-D target positions. This representation remains invariant under head and eye movements, and is a key component of sensory-motor systems for producing motor equivalent reaches to targets (Bullock, Grossberg, and Guenther, 1993).
Resumo:
This article presents a new neural pattern recognition architecture on multichannel data representation. The architecture emploies generalized ART modules as building blocks to construct a supervised learning system generating recognition codes on channels dynamically selected in context using serial and parallel match trackings led by inter-ART vigilance signals.