19 results for Programmazione video-giochi, iOS, Game Engine, Cocos2D
in Boston University Digital Common
Abstract:
Handshape is a key articulatory parameter in sign language, and thus handshape recognition from signing video is essential for sign recognition and retrieval. Handshape transitions within monomorphemic lexical signs (the largest class of signs in signed languages) are governed by phonological rules. For example, such transitions normally involve either closing or opening of the hand (i.e., exclusively folding or exclusively unfolding the palm and one or more fingers). Furthermore, akin to allophonic variations in spoken languages, both inter- and intra-signer variations in the production of specific handshapes are observed. We propose a Bayesian network formulation to exploit handshape co-occurrence constraints, also utilizing information about allophonic variations to aid in handshape recognition. We propose a fast non-rigid image alignment method to gain improved robustness to handshape appearance variations during computation of observation likelihoods in the Bayesian network. We evaluate our handshape recognition approach on a large dataset of monomorphemic lexical signs. We demonstrate that leveraging linguistic constraints on handshapes results in improved handshape recognition accuracy. As part of the overall project, we are collecting and preparing for dissemination a large corpus (three thousand signs from three native signers) of American Sign Language (ASL) video. The videos have been annotated using SignStream® [Neidle et al.] with labels for linguistic information such as glosses, morphological properties and variations, and start/end handshapes associated with each ASL sign.
Abstract:
A novel approach for real-time skin segmentation in video sequences is described. The approach enables reliable skin segmentation despite wide variation in illumination during tracking. An explicit second order Markov model is used to predict evolution of the skin-color (HSV) histogram over time. Histograms are dynamically updated based on feedback from the current segmentation and predictions of the Markov model. The evolution of the skin-color distribution at each frame is parameterized by translation, scaling and rotation in color space. Consequent changes in geometric parameterization of the distribution are propagated by warping and resampling the histogram. The parameters of the discrete-time dynamic Markov model are estimated using Maximum Likelihood Estimation, and also evolve over time. The accuracy of the new dynamic skin color segmentation algorithm is compared to that obtained via a static color model. Segmentation accuracy is evaluated using labeled ground-truth video sequences taken from staged experiments and popular movies. An overall increase in segmentation accuracy of up to 24% is observed in 17 out of 21 test sequences. In all but one case the skin-color classification rates for our system were higher, with background classification rates comparable to those of the static segmentation.
Abstract:
A system is described that tracks moving objects in a video dataset so as to extract a representation of the objects' 3D trajectories. The system then finds hierarchical clusters of similar trajectories in the video dataset. Objects' motion trajectories are extracted via an extended Kalman filter (EKF) formulation that provides each object's 3D trajectory up to a constant factor. To increase accuracy when occlusions occur, multiple tracking hypotheses are followed. For trajectory-based clustering and retrieval, a modified version of edit distance, called longest common subsequence (LCSS), is employed. Similarities are computed between projections of trajectories on coordinate axes. Trajectories are grouped using an agglomerative clustering algorithm. To check the validity of the approach, experiments using real data were performed.
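As a rough illustration of an LCSS-style similarity on one coordinate axis, the sketch below uses the common thresholded formulation: two samples match when their values differ by at most `eps` and their time indices by at most `delta`. The abstract does not specify its modified variant, so the function names, both thresholds, and the normalization by the shorter length are assumptions made only for this example.

```python
def lcss_length(a, b, eps, delta):
    """Length of the longest common subsequence of two 1D series.

    Samples a[i] and b[j] count as a match when |a[i] - b[j]| <= eps
    and |i - j| <= delta (both thresholds are illustrative assumptions).
    """
    n, m = len(a), len(b)
    # table[i][j] holds the LCSS length of the prefixes a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps and abs(i - j) <= delta:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcss_similarity(a, b, eps=0.5, delta=2):
    """Normalize the LCSS length to [0, 1] by the shorter series length."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, eps, delta) / min(len(a), len(b))
```

One similarity per coordinate axis could then be combined (for example, averaged) and converted to a distance such as `1 - similarity` before being passed to an agglomerative clustering routine; that combination step is also an assumption, not something the abstract states.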
Abstract:
In gesture and sign language video sequences, hand motion tends to be rapid, and hands frequently appear in front of each other or in front of the face. Thus, hand location is often ambiguous, and naive color-based hand tracking is insufficient. To improve tracking accuracy, some methods employ a prediction-update framework, but such methods require careful initialization of model parameters, and tend to drift and lose track in extended sequences. In this paper, a temporal filtering framework for hand tracking is proposed that can initialize and reset itself without human intervention. In each frame, simple features like color and motion residue are exploited to identify multiple candidate hand locations. The temporal filter then uses the Viterbi algorithm to select among the candidates from frame to frame. The resulting tracking system can automatically identify video trajectories of unambiguous hand motion, and detect frames where tracking becomes ambiguous because of occlusions or overlaps. Experiments on video sequences of several hundred frames in duration demonstrate the system's ability to track hands robustly, to detect and handle tracking ambiguities, and to extract the trajectories of unambiguous hand motion.
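The candidate-selection step can be pictured as a shortest-path problem solved with the Viterbi recursion. The sketch below is a minimal illustration under assumed costs: each candidate is a hypothetical `(x, y, score)` tuple, the per-candidate cost is the negated score, and the transition cost is the squared displacement scaled by `motion_weight`. The paper's actual features, costs, and ambiguity detection are not given in the abstract and are not reproduced here.

```python
def viterbi_track(frames, motion_weight=1.0):
    """Pick one candidate index per frame that minimizes the total path cost.

    frames: list of frames, each a non-empty list of (x, y, score) tuples,
            where a higher score means "more hand-like" (assumed convention).
    Returns a list with one chosen candidate index per frame.
    """
    if not frames or any(len(f) == 0 for f in frames):
        raise ValueError("every frame needs at least one candidate")

    # cost[k]: best total cost of any path ending at candidate k of the current frame.
    cost = [-score for (_, _, score) in frames[0]]
    backpointers = []

    for prev, cur in zip(frames, frames[1:]):
        new_cost, pointers = [], []
        for (x, y, score) in cur:
            # Cost of reaching this candidate from each candidate of the previous frame.
            step = [
                cost[k] + motion_weight * ((x - px) ** 2 + (y - py) ** 2)
                for k, (px, py, _) in enumerate(prev)
            ]
            best_k = min(range(len(prev)), key=step.__getitem__)
            new_cost.append(step[best_k] - score)
            pointers.append(best_k)
        cost = new_cost
        backpointers.append(pointers)

    # Backtrack from the cheapest final candidate.
    best = min(range(len(cost)), key=cost.__getitem__)
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path
```

Because the whole sequence is scored jointly, a candidate with a slightly lower score can still be chosen when it keeps the trajectory smooth, which a frame-by-frame greedy choice would miss.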
Abstract:
Hand signals are commonly used in applications such as giving instructions to a pilot for airplane takeoff or directing a crane operator from the ground. A new algorithm for recognizing hand signals from a single camera is proposed. Typically, tracked 2D feature positions of hand signals are matched to 2D training images. In contrast, our approach matches the 2D feature positions to an archive of 3D motion capture sequences. The method avoids explicit reconstruction of the 3D articulated motion from 2D image features. Instead, the matching between the 2D and 3D sequences is done by backprojecting the 3D motion capture data onto 2D. Experiments demonstrate the effectiveness of the approach in an example application: recognizing six classes of basketball referee hand signals in video.
Abstract:
This report describes our attempt to add animation as another data type to be used on the World Wide Web. Our current network infrastructure, the Internet, is incapable of carrying video and audio streams for use in presentations on the web. In contrast, object-oriented animation proves to be efficient in terms of network resource requirements. We defined an animation model to support drawing-based and frame-based animation. We also extended the HyperText Markup Language in order to include this animation mode. BU-NCSA Mosanim, a modified version of the NCSA Mosaic for X (v2.5), is available to demonstrate the concept and potential of animation in presentations and interactive game playing over the web.
Abstract:
We generalize the well-known pebble game to infinite DAGs, and we use this generalization to give new and shorter proofs of results in different areas of computer science (as diverse as "logic of programs" and "formal language theory"). Our applications here include a proof of a theorem due to Salomaa, asserting the existence of a context-free language with infinite index, and a proof of a theorem due to Tiuryn and Erimbetov, asserting that unbounded memory increases the power of logics of programs. The original proofs by Salomaa, Tiuryn, and Erimbetov are fairly technical. The proofs by Tiuryn and Erimbetov also involve advanced techniques of model theory, namely, back-and-forth constructions based on a variant of Ehrenfeucht-Fraïssé games. By contrast, our proofs are not only shorter, but also elementary. All we need is essentially finite induction and, in the case of the Tiuryn-Erimbetov result, the compactness and completeness of first-order logic.
Abstract:
A new region-based approach to nonrigid motion tracking is described. Shape is defined in terms of a deformable triangular mesh that captures object shape plus a color texture map that captures object appearance. Photometric variations are also modeled. Nonrigid shape registration and motion tracking are achieved by posing the problem as an energy-based, robust minimization procedure. The approach provides robustness to occlusions, wrinkles, shadows, and specular highlights. The formulation is tailored to take advantage of texture mapping hardware available in many workstations, PCs, and game consoles. This enables nonrigid tracking at speeds approaching video rate.
Abstract:
ImageRover is a search-by-image-content navigation tool for the World Wide Web. The staggering size of the WWW dictates certain strategies and algorithms for image collection, digestion, indexing, and user interface. This paper describes two key components of the ImageRover strategy: image digestion and relevance feedback. Image digestion occurs during image collection; robots digest the images they find, computing image decompositions and indices, and storing this extracted information in vector form for searches based on image content. Relevance feedback occurs during index search; users can iteratively guide the search through the selection of relevant examples. ImageRover employs a novel relevance feedback algorithm to determine the weighted combination of image similarity metrics appropriate for a particular query. ImageRover is available and running on the web site.
Abstract:
The advent of virtualization and cloud computing technologies necessitates the development of effective mechanisms for the estimation and reservation of resources needed by content providers to deliver large numbers of video-on-demand (VOD) streams through the cloud. Unfortunately, capacity planning for the QoS-constrained delivery of a large number of VOD streams is inherently difficult as VBR encoding schemes exhibit significant bandwidth variability. In this paper, we present a novel resource management scheme to make such allocation decisions using a mixture of per-stream reservations and an aggregate reservation, shared across all streams to accommodate peak demands. The shared reservation provides capacity slack that enables statistical multiplexing of peak rates, while assuring analytically bounded frame-drop probabilities, which can be adjusted by trading off buffer space (and consequently delay) and bandwidth. Our two-tiered bandwidth allocation scheme enables the delivery of any set of streams with less bandwidth (or equivalently with higher link utilization) than state-of-the-art deterministic smoothing approaches. The algorithm underlying our proposed framework uses three per-stream parameters and is linear in the number of servers, making it particularly well suited for use in an on-line setting. We present results from extensive trace-driven simulations, which confirm the efficiency of our scheme especially for small buffer sizes and delay bounds, and which underscore the significant realizable bandwidth savings, typically yielding losses that are an order of magnitude or more below our analytically derived bounds.
Abstract:
Locating hands in sign language video is challenging due to a number of factors. Hand appearance varies widely across signers due to anthropometric variations and varying levels of signer proficiency. Video can be captured under varying illumination, camera resolutions, and levels of scene clutter, e.g., high-res video captured in a studio vs. low-res video gathered by a webcam in a user’s home. Moreover, the signers’ clothing varies, e.g., skin-toned clothing vs. contrasting clothing, short-sleeved vs. long-sleeved shirts, etc. In this work, the hand detection problem is addressed in an appearance matching framework. The Histogram of Oriented Gradients (HOG) based matching score function is reformulated to allow non-rigid alignment between pairs of images to account for hand shape variation. The resulting alignment score is used within a Support Vector Machine hand/not-hand classifier for hand detection. The new matching score function yields improved performance (in ROC area and hand detection rate) over the Vocabulary Guided Pyramid Match Kernel (VGPMK) and the traditional, rigid HOG distance on American Sign Language video gestured by expert signers. The proposed match score function is computationally less expensive (for training and testing), has fewer parameters, and is less sensitive to parameter settings than VGPMK. The proposed detector works well on test sequences from an inexpert signer in a non-studio setting with cluttered background.
Abstract:
Establishing correspondences among object instances is still challenging in multi-camera surveillance systems, especially when the cameras’ fields of view are non-overlapping. Spatiotemporal constraints can help in solving the correspondence problem but still leave a wide margin of uncertainty. One way to reduce this uncertainty is to use appearance information about the moving objects in the site. In this paper we present the preliminary results of a new method that can capture salient appearance characteristics at each camera node in the network. A Latent Dirichlet Allocation (LDA) model is created and maintained at each node in the camera network. Each object is encoded in terms of the LDA bag-of-words model for appearance. The encoded appearance is then used to establish probable matching across cameras. Preliminary experiments are conducted on a dataset of 20 individuals and comparison against Madden’s I-MCHR is reported.
Abstract:
A vision-based technique for non-rigid control is presented that can be used for animation and video game applications. In front of a camera, the user grasps a soft, squishable object that can be moved and deformed in order to specify motion. Active Blobs, a non-rigid tracking technique, is used to recover the position, rotation, and non-rigid deformations of the object. The resulting transformations can be applied to a texture-mapped mesh, thus allowing the user to control it interactively. Our use of texture mapping hardware allows us to make the system responsive enough for interactive animation and video game character control.
Abstract:
Dynamic service aggregation techniques can exploit skewed access popularity patterns to reduce the costs of building interactive VoD systems. These schemes seek to cluster and merge users into single streams by bridging the temporal skew between them, thus improving server and network utilization. Rate adaptation and secondary content insertion are two such schemes. In this paper, we present and evaluate an optimal scheduling algorithm for inserting secondary content in this scenario. The algorithm runs in polynomial time, and is optimal with respect to the total bandwidth usage over the merging interval. We present constraints on content insertion which make the overall QoS of the delivered stream acceptable, and show how our algorithm can satisfy these constraints. We report simulation results which quantify the excellent gains due to content insertion. We discuss dynamic scenarios with user arrivals and interactions, and show that content insertion reduces the channel bandwidth requirement to almost half. We also discuss differentiated service techniques, such as N-VoD and premium no-advertisement service, and show how our algorithm can support these as well.
Abstract:
We introduce a viewpoint-invariant representation of moving object trajectories that can be used in video database applications. It is assumed that trajectories lie on a surface that can be locally approximated with a plane. Raw trajectory data is first locally approximated with a cubic spline via least squares fitting. For each sampled point of the obtained curve, a projective invariant feature is computed using a small number of points in its neighborhood. The resulting sequence of invariant features computed along the entire trajectory forms the view-invariant descriptor of the trajectory itself. Time parametrization has been exploited to compute cross ratios without ambiguity due to point ordering. Similarity between descriptors of different trajectories is measured with a distance that takes into account the statistical properties of the cross ratio, and its symmetry with respect to the point at infinity. In experiments, an overall correct classification rate of about 95% has been obtained on a dataset of 58 trajectories of players in soccer video, and an overall correct classification rate of about 80% has been obtained on matching partial segments of trajectories collected from two overlapping views of outdoor scenes with moving people and cars.
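The projective invariance of the cross ratio, on which this kind of descriptor relies, can be checked numerically. The sketch below covers only the textbook case of four collinear points and is not the descriptor from the paper; the function names, the sample points, and the homography matrix `H` are arbitrary choices made for illustration.

```python
import math


def cross_ratio(p1, p2, p3, p4):
    """Cross ratio (|p1p3| * |p2p4|) / (|p2p3| * |p1p4|) of four collinear points.

    Unsigned distances are used, so this is the absolute value of the
    signed cross ratio.
    """
    d = math.dist
    return (d(p1, p3) * d(p2, p4)) / (d(p2, p3) * d(p1, p4))


def apply_homography(h, p):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (
        (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
        (h[1][0] * x + h[1][1] * y + h[1][2]) / w,
    )


# Four points on the line y = 2x + 1 (illustrative data).
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (4.0, 9.0)]

# An arbitrary non-singular homography standing in for a change of viewpoint.
H = [
    [1.0, 0.2, 3.0],
    [0.1, 0.9, -2.0],
    [0.001, 0.002, 1.0],
]

warped = [apply_homography(H, p) for p in points]
before = cross_ratio(*points)
after = cross_ratio(*warped)
```

Here `before` and `after` agree up to floating-point error, even though the individual distances between the points change under `H`. The ordering of the four points matters for the value obtained, which is consistent with the abstract's remark that time parametrization is used to fix the point ordering.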