11 results for Alternative body culture
in Boston University Digital Commons
Abstract:
A fundamental task of vision systems is to infer the state of the world given some form of visual observations. From a computational perspective, this often involves facing an ill-posed problem; e.g., information is lost via projection of the 3D world into a 2D image. Solution of an ill-posed problem requires additional information, usually provided as a model of the underlying process. It is important that the model be both computationally feasible and theoretically well-founded. In this thesis, a probabilistic, nonlinear supervised computational learning model is proposed: the Specialized Mappings Architecture (SMA). The SMA framework is demonstrated in a computer vision system that can estimate the articulated pose parameters of a human body or human hands, given images obtained via one or more uncalibrated cameras. The SMA consists of several specialized forward mapping functions that are estimated automatically from training data, and a possibly known feedback function. Each specialized function maps certain domains of the input space (e.g., image features) onto the output space (e.g., articulated body parameters). A probabilistic model for the architecture is first formalized. Solutions to key algorithmic problems are then derived: simultaneous learning of the specialized domains along with the mapping functions, as well as performing inference given inputs and a feedback function. The SMA employs a variant of the Expectation-Maximization algorithm and approximate inference. The approach allows the use of alternative conditional independence assumptions for learning and inference, which are derived from a forward model and a feedback model. Experimental validation of the proposed approach is conducted in the task of estimating articulated body pose from image silhouettes. Accuracy and stability of the SMA framework are tested using artificial data sets, as well as synthetic and real video sequences of human bodies and hands.
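For readers who want the gist of the inference step in code: a minimal sketch, assuming each specialized mapping phi maps image features to pose parameters and the feedback function renders a pose hypothesis back into feature space. The names and the error metric are illustrative, not the thesis's implementation.

```python
import numpy as np

def sma_inference(x, mappings, feedback):
    """Evaluate every specialized forward mapping on input features x,
    then score each pose hypothesis by mapping it back through the
    feedback function and comparing with the input."""
    hypotheses = [phi(x) for phi in mappings]               # one hypothesis per mapping
    errors = [np.linalg.norm(feedback(h) - x) for h in hypotheses]
    return hypotheses[int(np.argmin(errors))]               # best-matching hypothesis
```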
Abstract:
M.A. Thesis / University of Pretoria / Department of Practical Theology / Advised by Prof M Masango
Abstract:
An approach for estimating 3D body pose from multiple, uncalibrated views is proposed. First, a mapping from image features to 2D body joint locations is computed using a statistical framework that yields a set of several body pose hypotheses. The concept of a "virtual camera" is introduced that makes this mapping invariant to translation, image-plane rotation, and scaling of the input. As a consequence, the calibration matrices (intrinsics) of the virtual cameras can be considered completely known, and their poses are known up to a single angular displacement parameter. Given pose hypotheses obtained in the multiple virtual camera views, the recovery of 3D body pose and camera relative orientations is formulated as a stochastic optimization problem. An Expectation-Maximization algorithm is derived that can obtain the locally most likely (self-consistent) combination of body pose hypotheses. Performance of the approach is evaluated with synthetic sequences as well as real video sequences of human motion.
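The invariance claim can be made concrete with a similarity normalization of 2D joint locations. The sketch below is only analogous in spirit to the paper's virtual-camera construction, which is defined through camera geometry rather than data statistics.

```python
import numpy as np

def normalize_2d(joints):
    """Canonicalize an N x 2 array of 2D joint locations: remove
    translation, scale, and image-plane rotation."""
    p = joints - joints.mean(axis=0)          # remove translation
    p = p / np.sqrt((p ** 2).sum())           # remove scale
    # rotate into the principal-axis frame to remove image-plane rotation
    _, vecs = np.linalg.eigh(p.T @ p)
    return p @ vecs[:, ::-1]                  # major axis first
```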
Abstract:
A novel approach for estimating articulated body posture and motion from monocular video sequences is proposed. Human pose is defined as the instantaneous two-dimensional configuration (i.e., the projection onto the image plane) of a single articulated body in terms of the position of a predetermined set of joints. First, statistical segmentation of the human bodies from the background is performed and low-level visual features are found given the segmented body shape. The goal is to map these generally low-level visual features to body configurations. The system estimates different mappings, each one associated with a specific cluster in the visual feature space. Given a set of body motion sequences for training, unsupervised clustering is obtained via the Expectation-Maximization algorithm. Then, for each of the clusters, a function is estimated to map low-level features to 3D pose. Currently this mapping is modeled by a neural network. Given new visual features, a mapping from each cluster is performed to yield a set of possible poses. From this set, the system selects the most likely pose given the learned probability distribution and the visual feature similarity between each hypothesis and the input. Performance of the proposed approach is characterized using a new set of known body postures, showing promising results.
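A minimal sketch of the hypothesize-and-select step, assuming one trained regressor per feature-space cluster, a learned cluster prior, and a render function that maps a pose hypothesis back to visual features (all names are illustrative placeholders):

```python
import numpy as np

def estimate_pose(features, cluster_priors, regressors, render):
    """One pose hypothesis per cluster; pick the one whose rendered
    features best match the input, weighted by the cluster prior."""
    best, best_score = None, -np.inf
    for prior, f in zip(cluster_priors, regressors):
        pose = f(features)                              # per-cluster mapping
        sim = -np.linalg.norm(render(pose) - features)  # feature similarity
        score = np.log(prior) + sim
        if score > best_score:
            best, best_score = pose, score
    return best
```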
Abstract:
A non-linear supervised learning architecture, the Specialized Mapping Architecture (SMA), and its application to articulated body pose reconstruction from single monocular images are described. The architecture is formed by a number of specialized mapping functions, each intended to map certain portions (connected or not) of the input space, and a feedback matching process. A probabilistic model for the architecture is described along with a mechanism for learning its parameters. The learning problem is approached using a maximum likelihood estimation framework; we present Expectation-Maximization (EM) algorithms for two different instances of the likelihood probability. Performance is characterized by estimating human body postures from low-level visual features, showing promising results.
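As a toy illustration of the maximum-likelihood learning problem, EM for a mixture of linear mappings is sketched below. The SMA's specialized functions are more general, and the paper derives two EM variants, so this is only a simplified stand-in under those assumptions.

```python
import numpy as np

def em_mixture_of_regressors(X, Y, K, iters=50, seed=0):
    """Toy EM for a mixture of K linear mappings y ~ W_k x: specialized
    domains (responsibilities) and mappings are learned jointly."""
    rng = np.random.default_rng(seed)
    n, dx = X.shape
    dy = Y.shape[1]
    W = rng.normal(size=(K, dy, dx)) * 0.1
    pi = np.full(K, 1.0 / K)
    sigma2 = np.ones(K)
    for _ in range(iters):
        # E-step: responsibility of each specialized mapping for each sample
        logr = np.stack(
            [np.log(pi[k])
             - 0.5 * ((Y - X @ W[k].T) ** 2).sum(axis=1) / sigma2[k]
             - 0.5 * dy * np.log(sigma2[k])
             for k in range(K)], axis=1)
        r = np.exp(logr - logr.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per mapping, then mixing weights
        for k in range(K):
            sw = np.sqrt(r[:, k:k + 1])
            A, *_ = np.linalg.lstsq(X * sw, Y * sw, rcond=None)
            W[k] = A.T
            resid = ((Y - X @ W[k].T) ** 2).sum(axis=1)
            sigma2[k] = (r[:, k] @ resid) / (r[:, k].sum() * dy) + 1e-9
        pi = r.mean(axis=0)
    return W, pi, sigma2
```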
Abstract:
Particle filtering is a popular method used in systems for tracking human body pose in video. One key difficulty in using particle filtering is caused by the curse of dimensionality: generally a very large number of particles is required to adequately approximate the underlying pose distribution in a high-dimensional state space. Although the number of degrees of freedom in the human body is quite large, in reality, the subset of allowable configurations in state space is generally restricted by human biomechanics, and the trajectories in this allowable subspace tend to be smooth. Therefore, a framework is proposed to learn a low-dimensional representation of the high-dimensional human pose state space. This mapping can be learned using a Gaussian Process Latent Variable Model (GPLVM) framework. One important advantage of the GPLVM framework is that both the mapping to and the mapping from the embedded space are smooth; this facilitates sampling in the low-dimensional space, and samples generated in the low-dimensional embedded space are easily mapped back into the original high-dimensional space. Moreover, human body poses that are similar in the original space tend to be mapped close to each other in the embedded space; this property can be exploited when sampling in the embedded space. The proposed framework is tested in tracking 2D human body pose using a Scaled Prismatic Model. Experiments on real-life video sequences demonstrate the strength of the approach. In comparison with the Multiple Hypothesis Tracking and the standard Condensation algorithm, the proposed algorithm is able to maintain tracking reliably throughout the long test sequences. It also handles singularities and self-occlusion robustly.
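A sketch of one filtering step under these assumptions, where decode stands in for the smooth GPLVM mapping from the embedded space back to full pose, and likelihood for the image-based observation model (both are hypothetical placeholders, not the paper's API):

```python
import numpy as np

def propagate_particles(z, decode, likelihood, noise=0.05, rng=None):
    """One particle-filter step in a learned low-dimensional embedding:
    diffuse particles in latent space, decode to full pose, reweight,
    and resample."""
    rng = rng or np.random.default_rng()
    z_new = z + noise * rng.standard_normal(z.shape)    # sample in latent space
    poses = np.stack([decode(zi) for zi in z_new])      # smooth map back to pose space
    w = np.array([likelihood(p) for p in poses])
    w = w / w.sum()
    idx = rng.choice(len(z_new), size=len(z_new), p=w)  # resample by weight
    return z_new[idx], poses[idx]
```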
Abstract:
Many people suffer from conditions that lead to deterioration of motor control and make access to the computer via traditional input devices difficult. In particular, they may lose control of hand movement to the extent that the standard mouse cannot be used as a pointing device. Most current alternatives use markers or specialized hardware to track and translate a user's movement to pointer movement. These approaches may be perceived as intrusive; wearable devices are one example. Camera-based assistive systems that use visual tracking of features on the user's body often require cumbersome manual adjustment. This paper introduces an enhanced computer vision based strategy where features, for example on a user's face, viewed through an inexpensive USB camera, are tracked and translated to pointer movement. The main contributions of this paper are (1) enhancing a video based interface with a mechanism for mapping feature movement to pointer movement, which allows users to navigate to all areas of the screen even with very limited physical movement, and (2) providing a customizable, hierarchical navigation framework for human computer interaction (HCI). This framework provides effective use of the vision-based interface system for accessing multiple applications in an autonomous setting. Experiments with several users show the effectiveness of the mapping strategy and its usage within the application framework as a practical tool for desktop users with disabilities.
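Contribution (1) can be illustrated with a simple relative-motion mapping with per-axis gains, so limited physical movement can still span the whole screen; the paper's actual mapping mechanism may differ from this sketch.

```python
def feature_to_pointer(dx, dy, gain_x, gain_y, pointer, screen_w, screen_h):
    """Map a tracked-feature displacement (dx, dy) to pointer movement,
    amplified by per-axis gains and clamped to the screen bounds."""
    x = min(max(pointer[0] + gain_x * dx, 0), screen_w - 1)
    y = min(max(pointer[1] + gain_y * dy, 0), screen_h - 1)
    return (x, y)
```

Large gains trade precision for reach; a hierarchical navigation framework like the one in contribution (2) can compensate by reducing how precisely the pointer must land.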
Abstract:
Lehar's lively discussion builds on a critique of neural models of vision that is incorrect in its general and specific claims. He espouses a Gestalt perceptual approach, rather than one consistent with the "objective neurophysiological state of the visual system" (p. 1). Contemporary vision models realize his perceptual goals and also quantitatively explain neurophysiological and anatomical data.
Abstract:
This paper describes a self-organizing neural network that rapidly learns a body-centered representation of 3-D target positions. This representation remains invariant under head and eye movements, and is a key component of sensory-motor systems for producing motor equivalent reaches to targets (Bullock, Grossberg, and Guenther, 1993).
Abstract:
A neural model is described of how the brain may autonomously learn a body-centered representation of 3-D target position by combining information about retinal target position, eye position, and head position in real time. Such a body-centered spatial representation enables accurate movement commands to the limbs to be generated despite changes in the spatial relationships between the eyes, head, body, and limbs through time. The model learns a vector representation--otherwise known as a parcellated distributed representation--of target vergence with respect to the two eyes, and of the horizontal and vertical spherical angles of the target with respect to a cyclopean egocenter. Such a vergence-spherical representation has been reported in the caudal midbrain and medulla of the frog, as well as in psychophysical movement studies in humans. A head-centered vergence-spherical representation of foveated target position can be generated by two stages of opponent processing that combine corollary discharges of outflow movement signals to the two eyes. Sums and differences of opponent signals define angular and vergence coordinates, respectively. The head-centered representation interacts with a binocular visual representation of non-foveated target position to learn a visuomotor representation of both foveated and non-foveated target position that is capable of commanding yoked eye movements. This head-centered vector representation also interacts with representations of neck movement commands to learn a body-centered estimate of target position that is capable of commanding coordinated arm movements. Learning occurs during head movements made while gaze remains fixed on a foveated target. An initial estimate is stored, and a VOR-mediated gating signal prevents the stored estimate from being reset during a gaze-maintaining head movement. As the head moves, new estimates are compared with the stored estimate to compute difference vectors, which act as error signals that drive the learning process, as well as control the on-line merging of multimodal information.
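To make the opponent-processing step concrete, the sums and differences can be written as follows (notation assumed here, not necessarily the paper's): with $\theta_L$ and $\theta_R$ the horizontal rotation angles of the left and right eyes,

$$\alpha = \tfrac{1}{2}\left(\theta_L + \theta_R\right), \qquad \gamma = \theta_L - \theta_R,$$

where the sum $\alpha$ gives the cyclopean (angular) direction of the foveated target and the difference $\gamma$ gives its vergence, which covaries with target distance.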
Abstract:
This paper describes a self-organizing neural model for eye-hand coordination. Called the DIRECT model, it embodies a solution of the classical motor equivalence problem. Motor equivalence computations allow humans and other animals to flexibly employ an arm with more degrees of freedom than the space in which it moves to carry out spatially defined tasks under conditions that may require novel joint configurations. During a motor babbling phase, the model endogenously generates movement commands that activate the correlated visual, spatial, and motor information that is used to learn its internal coordinate transformations. After learning occurs, the model is capable of controlling reaching movements of the arm to prescribed spatial targets using many different combinations of joints. When allowed visual feedback, the model can automatically perform, without additional learning, reaches with tools of variable lengths, with clamped joints, with distortions of visual input by a prism, and with unexpected perturbations. These compensatory computations occur within a single accurate reaching movement. No corrective movements are needed. Blind reaches using internal feedback have also been simulated. The model achieves its competence by transforming visual information about target position and end effector position in 3-D space into a body-centered spatial representation of the direction in 3-D space that the end effector must move to contact the target. The spatial direction vector is adaptively transformed into a motor direction vector, which represents the joint rotations that move the end effector in the desired spatial direction from the present arm configuration. Properties of the model are compared with psychophysical data on human reaching movements, neurophysiological data on the tuning curves of neurons in the monkey motor cortex, and alternative models of movement control.
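The spatial-to-motor direction transform has a familiar hand-built analogue in Jacobian-based control. The sketch below is that analogue only, not the DIRECT model itself, which learns the transform during motor babbling rather than computing it from known kinematics.

```python
import numpy as np

def direction_to_joint_rotations(jacobian, spatial_dir, step=0.1):
    """Map a desired 3-D movement direction of the end effector into
    joint rotations via the Jacobian transpose; a redundant arm admits
    many joint combinations that realize the same spatial direction."""
    d = spatial_dir / np.linalg.norm(spatial_dir)   # unit spatial direction vector
    return step * jacobian.T @ d                    # motor direction vector
```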