78 resultados para Stan Brakhage

em Boston University Digital Common


Relevância:

10.00% 10.00%

Publicador:

Resumo:

Handshape is a key articulatory parameter in sign language, and thus handshape recognition from signing video is essential for sign recognition and retrieval. Handshape transitions within monomorphemic lexical signs (the largest class of signs in signed languages) are governed by phonological rules. For example, such transitions normally involve either closing or opening of the hand (i.e., to exclusively use either folding or unfolding of the palm and one or more fingers). Furthermore, akin to allophonic variations in spoken languages, both inter- and intra- signer variations in the production of specific handshapes are observed. We propose a Bayesian network formulation to exploit handshape co-occurrence constraints, also utilizing information about allophonic variations to aid in handshape recognition. We propose a fast non-rigid image alignment method to gain improved robustness to handshape appearance variations during computation of observation likelihoods in the Bayesian network. We evaluate our handshape recognition approach on a large dataset of monomorphemic lexical signs. We demonstrate that leveraging linguistic constraints on handshapes results in improved handshape recognition accuracy. As part of the overall project, we are collecting and preparing for dissemination a large corpus (three thousand signs from three native signers) of American Sign Language (ASL) video. The video have been annotated using SignStream® [Neidle et al.] with labels for linguistic information such as glosses, morphological properties and variations, and start/end handshapes associated with each ASL sign.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We describe our work on shape-based image database search using the technique of modal matching. Modal matching employs a deformable shape decomposition that allows users to select example objects and have the computer efficiently sort the set of objects based on the similarity of their shape. Shapes are compared in terms of the types of nonrigid deformations (differences) that relate them. The modal decomposition provides deformation "control knobs" for flexible matching and thus allows for selecting weighted subsets of shape parameters that are deemed significant for a particular category or context. We demonstrate the utility of this approach for shape comparison in 2-D image databases; however, the general formulation is applicable to signals of any dimensionality.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Nonrigid motion can be described as morphing or blending between extremal shapes, e.g., heart motion can be described as transitioning between the systole and diastole states. Using physically-based modeling techniques, shape similarity can be measured in terms of forces and strain. This provides a physically-based coordinate system in which motion is characterized in terms of physical similarity to a set of extremal shapes. Having such a low-dimensional characterization of nonrigid motion allows for the recognition and the comparison of different types of nonrigid motion.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A novel approach for real-time skin segmentation in video sequences is described. The approach enables reliable skin segmentation despite wide variation in illumination during tracking. An explicit second order Markov model is used to predict evolution of the skin-color (HSV) histogram over time. Histograms are dynamically updated based on feedback from the current segmentation and predictions of the Markov model. The evolution of the skin-color distribution at each frame is parameterized by translation, scaling and rotation in color space. Consequent changes in geometric parameterization of the distribution are propagated by warping and resampling the histogram. The parameters of the discrete-time dynamic Markov model are estimated using Maximum Likelihood Estimation, and also evolve over time. The accuracy of the new dynamic skin color segmentation algorithm is compared to that obtained via a static color model. Segmentation accuracy is evaluated using labeled ground-truth video sequences taken from staged experiments and popular movies. An overall increase in segmentation accuracy of up to 24% is observed in 17 out of 21 test sequences. In all but one case the skin-color classification rates for our system were higher, with background classification rates comparable to those of the static segmentation.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A probabilistic, nonlinear supervised learning model is proposed: the Specialized Mappings Architecture (SMA). The SMA employs a set of several forward mapping functions that are estimated automatically from training data. Each specialized function maps certain domains of the input space (e.g., image features) onto the output space (e.g., articulated body parameters). The SMA can model ambiguous, one-to-many mappings that may yield multiple valid output hypotheses. Once learned, the mapping functions generate a set of output hypotheses for a given input via a statistical inference procedure. The SMA inference procedure incorporates an inverse mapping or feedback function in evaluating the likelihood of each of the hypothesis. Possible feedback functions include computer graphics rendering routines that can generate images for given hypotheses. The SMA employs a variant of the Expectation-Maximization algorithm for simultaneous learning of the specialized domains along with the mapping functions, and approximate strategies for inference. The framework is demonstrated in a computer vision system that can estimate the articulated pose parameters of a human’s body or hands, given silhouettes from a single image. The accuracy and stability of the SMA are also tested using synthetic images of human bodies and hands, where ground truth is known.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A new approach is proposed for clustering time-series data. The approach can be used to discover groupings of similar object motions that were observed in a video collection. A finite mixture of hidden Markov models (HMMs) is fitted to the motion data using the expectation-maximization (EM) framework. Previous approaches for HMM-based clustering employ a k-means formulation, where each sequence is assigned to only a single HMM. In contrast, the formulation presented in this paper allows each sequence to belong to more than a single HMM with some probability, and the hard decision about the sequence class membership can be deferred until a later time when such a decision is required. Experiments with simulated data demonstrate the benefit of using this EM-based approach when there is more "overlap" in the processes generating the data. Experiments with real data show the promising potential of HMM-based motion clustering in a number of applications.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A method is proposed that can generate a ranked list of plausible three-dimensional hand configurations that best match an input image. Hand pose estimation is formulated as an image database indexing problem, where the closest matches for an input hand image are retrieved from a large database of synthetic hand images. In contrast to previous approaches, the system can function in the presence of clutter, thanks to two novel clutter-tolerant indexing methods. First, a computationally efficient approximation of the image-to-model chamfer distance is obtained by embedding binary edge images into a high-dimensional Euclide an space. Second, a general-purpose, probabilistic line matching method identifies those line segment correspondences between model and input images that are the least likely to have occurred by chance. The performance of this clutter-tolerant approach is demonstrated in quantitative experiments with hundreds of real hand images.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Estimation of 3D hand pose is useful in many gesture recognition applications, ranging from human-computer interaction to automated recognition of sign languages. In this paper, 3D hand pose estimation is treated as a database indexing problem. Given an input image of a hand, the most similar images in a large database of hand images are retrieved. The hand pose parameters of the retrieved images are used as estimates for the hand pose in the input image. Lipschitz embeddings of edge images into a Euclidean space are used to improve the efficiency of database retrieval. In order to achieve interactive retrieval times, similarity queries are initially performed in this Euclidean space. The paper describes ongoing work that focuses on how to best choose reference images, in order to improve retrieval accuracy.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A method for reconstruction of 3D polygonal models from multiple views is presented. The method uses sampling techniques to construct a texture-mapped semi-regular polygonal mesh of the object in question. Given a set of views and segmentation of the object in each view, constructive solid geometry is used to build a visual hull from silhouette prisms. The resulting polygonal mesh is simplified and subdivided to produce a semi-regular mesh. Regions of model fit inaccuracy are found by projecting the reference images onto the mesh from different views. The resulting error images for each view are used to compute a probability density function, and several points are sampled from it. Along the epipolar lines corresponding to these sampled points, photometric consistency is evaluated. The mesh surface is then pulled towards the regions of higher photometric consistency using free-form deformations. This sampling-based approach produces a photometrically consistent solution in much less time than possible with previous multi-view algorithms given arbitrary camera placement.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

An iterative method for reconstructing a 3D polygonal mesh and color texture map from multiple views of an object is presented. In each iteration, the method first estimates a texture map given the current shape estimate. The texture map and its associated residual error image are obtained via maximum a posteriori estimation and reprojection of the multiple views into texture space. Next, the surface shape is adjusted to minimize residual error in texture space. The surface is deformed towards a photometrically-consistent solution via a series of 1D epipolar searches at randomly selected surface points. The texture space formulation has improved computational complexity over standard image-based error approaches, and allows computation of the reprojection error and uncertainty for any point on the surface. Moreover, shape adjustments can be constrained such that the recovered model's silhouette matches those of the input images. Experiments with real world imagery demonstrate the validity of the approach.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The algorithm presented in this paper aims to segment the foreground objects in video (e.g., people) given time-varying, textured backgrounds. Examples of time-varying backgrounds include waves on water, clouds moving, trees waving in the wind, automobile traffic, moving crowds, escalators, etc. We have developed a novel foreground-background segmentation algorithm that explicitly accounts for the non-stationary nature and clutter-like appearance of many dynamic textures. The dynamic texture is modeled by an Autoregressive Moving Average Model (ARMA). A robust Kalman filter algorithm iteratively estimates the intrinsic appearance of the dynamic texture, as well as the regions of the foreground objects. Preliminary experiments with this method have demonstrated promising results.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper introduces BoostMap, a method that can significantly reduce retrieval time in image and video database systems that employ computationally expensive distance measures, metric or non-metric. Database and query objects are embedded into a Euclidean space, in which similarities can be rapidly measured using a weighted Manhattan distance. Embedding construction is formulated as a machine learning task, where AdaBoost is used to combine many simple, 1D embeddings into a multidimensional embedding that preserves a significant amount of the proximity structure in the original space. Performance is evaluated in a hand pose estimation system, and a dynamic gesture recognition system, where the proposed method is used to retrieve approximate nearest neighbors under expensive image and video similarity measures. In both systems, BoostMap significantly increases efficiency, with minimal losses in accuracy. Moreover, the experiments indicate that BoostMap compares favorably with existing embedding methods that have been employed in computer vision and database applications, i.e., FastMap and Bourgain embeddings.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Space carving has emerged as a powerful method for multiview scene reconstruction. Although a wide variety of methods have been proposed, the quality of the reconstruction remains highly-dependent on the photometric consistency measure, and the threshold used to carve away voxels. In this paper, we present a novel photo-consistency measure that is motivated by a multiset variant of the chamfer distance. The new measure is robust to high amounts of within-view color variance and also takes into account the projection angles of back-projected pixels. Another critical issue in space carving is the selection of the photo-consistency threshold used to determine what surface voxels are kept or carved away. In this paper, a reliable threshold selection technique is proposed that examines the photo-consistency values at contour generator points. Contour generators are points that lie on both the surface of the object and the visual hull. To determine the threshold, a percentile ranking of the photo-consistency values of these generator points is used. This improved technique is applicable to a wide variety of photo-consistency measures, including the new measure presented in this paper. Also presented in this paper is a method to choose between photo-consistency measures, and voxel array resolutions prior to carving using receiver operating characteristic (ROC) curves.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In many multi-camera vision systems the effect of camera locations on the task-specific quality of service is ignored. Researchers in Computational Geometry have proposed elegant solutions for some sensor location problem classes. Unfortunately, these solutions utilize unrealistic assumptions about the cameras' capabilities that make these algorithms unsuitable for many real-world computer vision applications: unlimited field of view, infinite depth of field, and/or infinite servo precision and speed. In this paper, the general camera placement problem is first defined with assumptions that are more consistent with the capabilities of real-world cameras. The region to be observed by cameras may be volumetric, static or dynamic, and may include holes that are caused, for instance, by columns or furniture in a room that can occlude potential camera views. A subclass of this general problem can be formulated in terms of planar regions that are typical of building floorplans. Given a floorplan to be observed, the problem is then to efficiently compute a camera layout such that certain task-specific constraints are met. A solution to this problem is obtained via binary optimization over a discrete problem space. In preliminary experiments the performance of the resulting system is demonstrated with different real floorplans.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper introduces an algorithm that uses boosting to learn a distance measure for multiclass k-nearest neighbor classification. Given a family of distance measures as input, AdaBoost is used to learn a weighted distance measure, that is a linear combination of the input measures. The proposed method can be seen both as a novel way to learn a distance measure from data, and as a novel way to apply boosting to multiclass recognition problems, that does not require output codes. In our approach, multiclass recognition of objects is reduced into a single binary recognition task, defined on triples of objects. Preliminary experiments with eight UCI datasets yield no clear winner among our method, boosting using output codes, and k-nn classification using an unoptimized distance measure. Our algorithm did achieve lower error rates in some of the datasets, which indicates that, in some domains, it may lead to better results than existing methods.