21 resultados para Mobile Video
em Boston University Digital Common
Resumo:
Handshape is a key articulatory parameter in sign language, and thus handshape recognition from signing video is essential for sign recognition and retrieval. Handshape transitions within monomorphemic lexical signs (the largest class of signs in signed languages) are governed by phonological rules. For example, such transitions normally involve either closing or opening of the hand (i.e., to exclusively use either folding or unfolding of the palm and one or more fingers). Furthermore, akin to allophonic variations in spoken languages, both inter- and intra- signer variations in the production of specific handshapes are observed. We propose a Bayesian network formulation to exploit handshape co-occurrence constraints, also utilizing information about allophonic variations to aid in handshape recognition. We propose a fast non-rigid image alignment method to gain improved robustness to handshape appearance variations during computation of observation likelihoods in the Bayesian network. We evaluate our handshape recognition approach on a large dataset of monomorphemic lexical signs. We demonstrate that leveraging linguistic constraints on handshapes results in improved handshape recognition accuracy. As part of the overall project, we are collecting and preparing for dissemination a large corpus (three thousand signs from three native signers) of American Sign Language (ASL) video. The video have been annotated using SignStream® [Neidle et al.] with labels for linguistic information such as glosses, morphological properties and variations, and start/end handshapes associated with each ASL sign.
Resumo:
A novel approach for real-time skin segmentation in video sequences is described. The approach enables reliable skin segmentation despite wide variation in illumination during tracking. An explicit second order Markov model is used to predict evolution of the skin-color (HSV) histogram over time. Histograms are dynamically updated based on feedback from the current segmentation and predictions of the Markov model. The evolution of the skin-color distribution at each frame is parameterized by translation, scaling and rotation in color space. Consequent changes in geometric parameterization of the distribution are propagated by warping and resampling the histogram. The parameters of the discrete-time dynamic Markov model are estimated using Maximum Likelihood Estimation, and also evolve over time. The accuracy of the new dynamic skin color segmentation algorithm is compared to that obtained via a static color model. Segmentation accuracy is evaluated using labeled ground-truth video sequences taken from staged experiments and popular movies. An overall increase in segmentation accuracy of up to 24% is observed in 17 out of 21 test sequences. In all but one case the skin-color classification rates for our system were higher, with background classification rates comparable to those of the static segmentation.
Resumo:
A system is described that tracks moving objects in a video dataset so as to extract a representation of the objects' 3D trajectories. The system then finds hierarchical clusters of similar trajectories in the video dataset. Objects' motion trajectories are extracted via an EKF formulation that provides each object's 3D trajectory up to a constant factor. To increase accuracy when occlusions occur, multiple tracking hypotheses are followed. For trajectory-based clustering and retrieval, a modified version of edit distance, called longest common subsequence (LCSS) is employed. Similarities are computed between projections of trajectories on coordinate axes. Trajectories are grouped based, using an agglomerative clustering algorithm. To check the validity of the approach, experiments using real data were performed.
Resumo:
In gesture and sign language video sequences, hand motion tends to be rapid, and hands frequently appear in front of each other or in front of the face. Thus, hand location is often ambiguous, and naive color-based hand tracking is insufficient. To improve tracking accuracy, some methods employ a prediction-update framework, but such methods require careful initialization of model parameters, and tend to drift and lose track in extended sequences. In this paper, a temporal filtering framework for hand tracking is proposed that can initialize and reset itself without human intervention. In each frame, simple features like color and motion residue are exploited to identify multiple candidate hand locations. The temporal filter then uses the Viterbi algorithm to select among the candidates from frame to frame. The resulting tracking system can automatically identify video trajectories of unambiguous hand motion, and detect frames where tracking becomes ambiguous because of occlusions or overlaps. Experiments on video sequences of several hundred frames in duration demonstrate the system's ability to track hands robustly, to detect and handle tracking ambiguities, and to extract the trajectories of unambiguous hand motion.
Resumo:
Hand signals are commonly used in applications such as giving instructions to a pilot for airplane take off or direction of a crane operator by a foreman on the ground. A new algorithm for recognizing hand signals from a single camera is proposed. Typically, tracked 2D feature positions of hand signals are matched to 2D training images. In contrast, our approach matches the 2D feature positions to an archive of 3D motion capture sequences. The method avoids explicit reconstruction of the 3D articulated motion from 2D image features. Instead, the matching between the 2D and 3D sequence is done by backprojecting the 3D motion capture data onto 2D. Experiments demonstrate the effectiveness of the approach in an example application: recognizing six classes of basketball referee hand signals in video.
Resumo:
We discuss the design principles of TCP within the context of heterogeneous wired/wireless networks and mobile networking. We identify three shortcomings in TCP's behavior: (i) the protocol's error detection mechanism, which does not distinguish different types of errors and thus does not suffice for heterogeneous wired/wireless environments, (ii) the error recovery, which is not responsive to the distinctive characteristics of wireless networks such as transient or burst errors due to handoffs and fading channels, and (iii) the protocol strategy, which does not control the tradeoff between performance measures such as goodput and energy consumption, and often entails a wasteful effort of retransmission and energy expenditure. We discuss a solution-framework based on selected research proposals and the associated evaluation criteria for the suggested modifications. We highlight an important angle that did not attract the required attention so far: the need for new performance metrics, appropriate for evaluating the impact of protocol strategies on battery-powered devices.
Resumo:
Personal communication devices are increasingly equipped with sensors for passive monitoring of encounters and surroundings. We envision the emergence of services that enable a community of mobile users carrying such resource-limited devices to query such information at remote locations in the field in which they collectively roam. One approach to implement such a service is directed placement and retrieval (DPR), whereby readings/queries about a specific location are routed to a node responsible for that location. In a mobile, potentially sparse setting, where end-to-end paths are unavailable, DPR is not an attractive solution as it would require the use of delay-tolerant (flooding-based store-carry-forward) routing of both readings and queries, which is inappropriate for applications with data freshness constraints, and which is incompatible with stringent device power/memory constraints. Alternatively, we propose the use of amorphous placement and retrieval (APR), in which routing and field monitoring are integrated through the use of a cache management scheme coupled with an informed exchange of cached samples to diffuse sensory data throughout the network, in such a way that a query answer is likely to be found close to the query origin. We argue that knowledge of the distribution of query targets could be used effectively by an informed cache management policy to maximize the utility of collective storage of all devices. Using a simple analytical model, we show that the use of informed cache management is particularly important when the mobility model results in a non-uniform distribution of users over the field. We present results from extensive simulations which show that in sparsely-connected networks, APR is more cost-effective than DPR, that it provides extra resilience to node failure and packet losses, and that its use of informed cache management yields superior performance.
Resumo:
We consider a mobile sensor network monitoring a spatio-temporal field. Given limited cache sizes at the sensor nodes, the goal is to develop a distributed cache management algorithm to efficiently answer queries with a known probability distribution over the spatial dimension. First, we propose a novel distributed information theoretic approach in which the nodes locally update their caches based on full knowledge of the space-time distribution of the monitored phenomenon. At each time instant, local decisions are made at the mobile nodes concerning which samples to keep and whether or not a new sample should be acquired at the current location. These decisions account for minimizing an entropic utility function that captures the average amount of uncertainty in queries given the probability distribution of query locations. Second, we propose a different correlation-based technique, which only requires knowledge of the second-order statistics, thus relaxing the stringent constraint of having a priori knowledge of the query distribution, while significantly reducing the computational overhead. It is shown that the proposed approaches considerably improve the average field estimation error by maintaining efficient cache content. It is further shown that the correlation-based technique is robust to model mismatch in case of imperfect knowledge of the underlying generative correlation structure.
Resumo:
The advent of virtualization and cloud computing technologies necessitates the development of effective mechanisms for the estimation and reservation of resources needed by content providers to deliver large numbers of video-on-demand (VOD) streams through the cloud. Unfortunately, capacity planning for the QoS-constrained delivery of a large number of VOD streams is inherently difficult as VBR encoding schemes exhibit significant bandwidth variability. In this paper, we present a novel resource management scheme to make such allocation decisions using a mixture of per-stream reservations and an aggregate reservation, shared across all streams to accommodate peak demands. The shared reservation provides capacity slack that enables statistical multiplexing of peak rates, while assuring analytically bounded frame-drop probabilities, which can be adjusted by trading off buffer space (and consequently delay) and bandwidth. Our two-tiered bandwidth allocation scheme enables the delivery of any set of streams with less bandwidth (or equivalently with higher link utilization) than state-of-the-art deterministic smoothing approaches. The algorithm underlying our proposed frame-work uses three per-stream parameters and is linear in the number of servers, making it particularly well suited for use in an on-line setting. We present results from extensive trace-driven simulations, which confirm the efficiency of our scheme especially for small buffer sizes and delay bounds, and which underscore the significant realizable bandwidth savings, typically yielding losses that are an order of magnitude or more below our analytically derived bounds.
Resumo:
The pervasiveness of personal computing platforms offers an unprecedented opportunity to deploy large-scale services that are distributed over wide physical spaces. Two major challenges face the deployment of such services: the often resource-limited nature of these platforms, and the necessity of preserving the autonomy of the owner of these devices. These challenges preclude using centralized control and preclude considering services that are subject to performance guarantees. To that end, this thesis advances a number of new distributed resource management techniques that are shown to be effective in such settings, focusing on two application domains: distributed Field Monitoring Applications (FMAs), and Message Delivery Applications (MDAs). In the context of FMA, this thesis presents two techniques that are well-suited to the fairly limited storage and power resources of autonomously mobile sensor nodes. The first technique relies on amorphous placement of sensory data through the use of novel storage management and sample diffusion techniques. The second approach relies on an information-theoretic framework to optimize local resource management decisions. Both approaches are proactive in that they aim to provide nodes with a view of the monitored field that reflects the characteristics of queries over that field, enabling them to handle more queries locally, and thus reduce communication overheads. Then, this thesis recognizes node mobility as a resource to be leveraged, and in that respect proposes novel mobility coordination techniques for FMAs and MDAs. Assuming that node mobility is governed by a spatio-temporal schedule featuring some slack, this thesis presents novel algorithms of various computational complexities to orchestrate the use of this slack to improve the performance of supported applications. The findings in this thesis, which are supported by analysis and extensive simulations, highlight the importance of two general design principles for distributed systems. First, a-priori knowledge (e.g., about the target phenomena of FMAs and/or the workload of either FMAs or DMAs) could be used effectively for local resource management. Second, judicious leverage and coordination of node mobility could lead to significant performance gains for distributed applications deployed over resource-impoverished infrastructures.
Resumo:
Recent advances in processor speeds, mobile communications and battery life have enabled computers to evolve from completely wired to completely mobile. In the most extreme case, all nodes are mobile and communication takes place at available opportunities – using both traditional communication infrastructure as well as the mobility of intermediate nodes. These are mobile opportunistic networks. Data communication in such networks is a difficult problem, because of the dynamic underlying topology, the scarcity of network resources and the lack of global information. Establishing end-to-end routes in such networks is usually not feasible. Instead a store-and-carry forwarding paradigm is better suited for such networks. This dissertation describes and analyzes algorithms for forwarding of messages in such networks. In order to design effective forwarding algorithms for mobile opportunistic networks, we start by first building an understanding of the set of all paths between nodes, which represent the available opportunities for any forwarding algorithm. Relying on real measurements, we enumerate paths between nodes and uncover what we refer to as the path explosion effect. The term path explosion refers to the fact that the number of paths between a randomly selected pair of nodes increases exponentially with time. We draw from the theory of epidemics to model and explain the path explosion effect. This is the first contribution of the thesis, and is a key observation that underlies subsequent results. Our second contribution is the study of forwarding algorithms. For this, we rely on trace driven simulations of different algorithms that span a range of design dimensions. We compare the performance (success rate and average delay) of these algorithms. We make the surprising observation that most algorithms we consider have roughly similar performance. We explain this result in light of the path explosion phenomenon. While the performance of most algorithms we studied was roughly the same, these algorithms differed in terms of cost. This prompted us to focus on designing algorithms with the explicit intent of reducing costs. For this, we cast the problem of forwarding as an optimal stopping problem. Our third main contribution is the design of strategies based on optimal stopping principles which we refer to as Delegation schemes. Our analysis shows that using a delegation scheme reduces cost over naive forwarding by a factor of O(√N), where N is the number of nodes in the network. We further validate this result on real traces, where the cost reduction observed is even greater. Our results so far include a key assumption, which is unbounded buffers on nodes. Next, we relax this assumption, so that the problem shifts to one of prioritization of messages for transmission and dropping. Our fourth contribution is the study of message prioritization schemes, combined with forwarding. Our main result is that one achieves higher performance by assigning higher priorities to young messages in the network. We again interpret this result in light of the path explosion effect.
Resumo:
Locating hands in sign language video is challenging due to a number of factors. Hand appearance varies widely across signers due to anthropometric variations and varying levels of signer proficiency. Video can be captured under varying illumination, camera resolutions, and levels of scene clutter, e.g., high-res video captured in a studio vs. low-res video gathered by a web cam in a user’s home. Moreover, the signers’ clothing varies, e.g., skin-toned clothing vs. contrasting clothing, short-sleeved vs. long-sleeved shirts, etc. In this work, the hand detection problem is addressed in an appearance matching framework. The Histogram of Oriented Gradient (HOG) based matching score function is reformulated to allow non-rigid alignment between pairs of images to account for hand shape variation. The resulting alignment score is used within a Support Vector Machine hand/not-hand classifier for hand detection. The new matching score function yields improved performance (in ROC area and hand detection rate) over the Vocabulary Guided Pyramid Match Kernel (VGPMK) and the traditional, rigid HOG distance on American Sign Language video gestured by expert signers. The proposed match score function is computationally less expensive (for training and testing), has fewer parameters and is less sensitive to parameter settings than VGPMK. The proposed detector works well on test sequences from an inexpert signer in a non-studio setting with cluttered background.
Resumo:
Establishing correspondences among object instances is still challenging in multi-camera surveillance systems, especially when the cameras’ fields of view are non-overlapping. Spatiotemporal constraints can help in solving the correspondence problem but still leave a wide margin of uncertainty. One way to reduce this uncertainty is to use appearance information about the moving objects in the site. In this paper we present the preliminary results of a new method that can capture salient appearance characteristics at each camera node in the network. A Latent Dirichlet Allocation (LDA) model is created and maintained at each node in the camera network. Each object is encoded in terms of the LDA bag-of-words model for appearance. The encoded appearance is then used to establish probable matching across cameras. Preliminary experiments are conducted on a dataset of 20 individuals and comparison against Madden’s I-MCHR is reported.
Resumo:
Dynamic service aggregation techniques can exploit skewed access popularity patterns to reduce the costs of building interactive VoD systems. These schemes seek to cluster and merge users into single streams by bridging the temporal skew between them, thus improving server and network utilization. Rate adaptation and secondary content insertion are two such schemes. In this paper, we present and evaluate an optimal scheduling algorithm for inserting secondary content in this scenario. The algorithm runs in polynomial time, and is optimal with respect to the total bandwidth usage over the merging interval. We present constraints on content insertion which make the overall QoS of the delivered stream acceptable, and show how our algorithm can satisfy these constraints. We report simulation results which quantify the excellent gains due to content insertion. We discuss dynamic scenarios with user arrivals and interactions, and show that content insertion reduces the channel bandwidth requirement to almost half. We also discuss differentiated service techniques, such as N-VoD and premium no-advertisement service, and show how our algorithm can support these as well.
Resumo:
We introduce a view-point invariant representation of moving object trajectories that can be used in video database applications. It is assumed that trajectories lie on a surface that can be locally approximated with a plane. Raw trajectory data is first locally approximated with a cubic spline via least squares fitting. For each sampled point of the obtained curve, a projective invariant feature is computed using a small number of points in its neighborhood. The resulting sequence of invariant features computed along the entire trajectory forms the view invariant descriptor of the trajectory itself. Time parametrization has been exploited to compute cross ratios without ambiguity due to point ordering. Similarity between descriptors of different trajectories is measured with a distance that takes into account the statistical properties of the cross ratio, and its symmetry with respect to the point at infinity. In experiments, an overall correct classification rate of about 95% has been obtained on a dataset of 58 trajectories of players in soccer video, and an overall correct classification rate of about 80% has been obtained on matching partial segments of trajectories collected from two overlapping views of outdoor scenes with moving people and cars.