431 resultados para Technical Report, Computer Science
Resumo:
We introduce a method for recovering the spatial and temporal alignment between two or more views of objects moving over a ground plane. Existing approaches either assume that the streams are globally synchronized, so that only solving the spatial alignment is needed, or that the temporal misalignment is small enough so that exhaustive search can be performed. In contrast, our approach can recover both the spatial and temporal alignment. We compute for each trajectory a number of interesting segments, and we use their description to form putative matches between trajectories. Each pair of corresponding interesting segments induces a temporal alignment, and defines an interval of common support across two views of an object that is used to recover the spatial alignment. Interesting segments and their descriptors are defined using algebraic projective invariants measured along the trajectories. Similarity between interesting segments is computed taking into account the statistics of such invariants. Candidate alignment parameters are verified checking the consistency, in terms of the symmetric transfer error, of all the putative pairs of corresponding interesting segments. Experiments are conducted with two different sets of data, one with two views of an outdoor scene featuring moving people and cars, and one with four views of a laboratory sequence featuring moving radio-controlled cars.
Resumo:
Moving cameras are needed for a wide range of applications in robotics, vehicle systems, surveillance, etc. However, many foreground object segmentation methods reported in the literature are unsuitable for such settings; these methods assume that the camera is fixed and the background changes slowly, and are inadequate for segmenting objects in video if there is significant motion of the camera or background. To address this shortcoming, a new method for segmenting foreground objects is proposed that utilizes binocular video. The method is demonstrated in the application of tracking and segmenting people in video who are approximately facing the binocular camera rig. Given a stereo image pair, the system first tries to find faces. Starting at each face, the region containing the person is grown by merging regions from an over-segmented color image. The disparity map is used to guide this merging process. The system has been implemented on a consumer-grade PC, and tested on video sequences of people indoors obtained from a moving camera rig. As can be expected, the proposed method works well in situations where other foreground-background segmentation methods typically fail. We believe that this superior performance is partly due to the use of object detection to guide region merging in disparity/color foreground segmentation, and partly due to the use of disparity information available with a binocular rig, in contrast with most previous methods that assumed monocular sequences.
Resumo:
Nearest neighbor classifiers are simple to implement, yet they can model complex non-parametric distributions, and provide state-of-the-art recognition accuracy in OCR databases. At the same time, they may be too slow for practical character recognition, especially when they rely on similarity measures that require computationally expensive pairwise alignments between characters. This paper proposes an efficient method for computing an approximate similarity score between two characters based on their exact alignment to a small number of prototypes. The proposed method is applied to both online and offline character recognition, where similarity is based on widely used and computationally expensive alignment methods, i.e., Dynamic Time Warping and the Hungarian method respectively. In both cases significant recognition speedup is obtained at the expense of only a minor increase in recognition error.
Resumo:
Gesture spotting is the challenging task of locating the start and end frames of the video stream that correspond to a gesture of interest, while at the same time rejecting non-gesture motion patterns. This paper proposes a new gesture spotting and recognition algorithm that is based on the continuous dynamic programming (CDP) algorithm, and runs in real-time. To make gesture spotting efficient a pruning method is proposed that allows the system to evaluate a relatively small number of hypotheses compared to CDP. Pruning is implemented by a set of model-dependent classifiers, that are learned from training examples. To make gesture spotting more accurate a subgesture reasoning process is proposed that models the fact that some gesture models can falsely match parts of other longer gestures. In our experiments, the proposed method with pruning and subgesture modeling is an order of magnitude faster and 18% more accurate compared to the original CDP algorithm.
Resumo:
This paper proposes a method for detecting shapes of variable structure in images with clutter. The term "variable structure" means that some shape parts can be repeated an arbitrary number of times, some parts can be optional, and some parts can have several alternative appearances. The particular variation of the shape structure that occurs in a given image is not known a priori. Existing computer vision methods, including deformable model methods, were not designed to detect shapes of variable structure; they may only be used to detect shapes that can be decomposed into a fixed, a priori known, number of parts. The proposed method can handle both variations in shape structure and variations in the appearance of individual shape parts. A new class of shape models is introduced, called Hidden State Shape Models, that can naturally represent shapes of variable structure. A detection algorithm is described that finds instances of such shapes in images with large amounts of clutter by finding globally optimal correspondences between image features and shape models. Experiments with real images demonstrate that our method can localize plant branches that consist of an a priori unknown number of leaves and can detect hands more accurately than a hand detector based on the chamfer distance.
Resumo:
Nearest neighbor search is commonly employed in face recognition but it does not scale well to large dataset sizes. A strategy to combine rejection classifiers into a cascade for face identification is proposed in this paper. A rejection classifier for a pair of classes is defined to reject at least one of the classes with high confidence. These rejection classifiers are able to share discriminants in feature space and at the same time have high confidence in the rejection decision. In the face identification problem, it is possible that a pair of known individual faces are very dissimilar. It is very unlikely that both of them are close to an unknown face in the feature space. Hence, only one of them needs to be considered. Using a cascade structure of rejection classifiers, the scope of nearest neighbor search can be reduced significantly. Experiments on Face Recognition Grand Challenge (FRGC) version 1 data demonstrate that the proposed method achieves significant speed up and an accuracy comparable with the brute force Nearest Neighbor method. In addition, a graph cut based clustering technique is employed to demonstrate that the pairwise separability of these rejection classifiers is capable of semantic grouping.
Resumo:
Facial features play an important role in expressing grammatical information in signed languages, including American Sign Language(ASL). Gestures such as raising or furrowing the eyebrows are key indicators of constructions such as yes-no questions. Periodic head movements (nods and shakes) are also an essential part of the expression of syntactic information, such as negation (associated with a side-to-side headshake). Therefore, identification of these facial gestures is essential to sign language recognition. One problem with detection of such grammatical indicators is occlusion recovery. If the signer's hand blocks his/her eyebrows during production of a sign, it becomes difficult to track the eyebrows. We have developed a system to detect such grammatical markers in ASL that recovers promptly from occlusion. Our system detects and tracks evolving templates of facial features, which are based on an anthropometric face model, and interprets the geometric relationships of these templates to identify grammatical markers. It was tested on a variety of ASL sentences signed by various Deaf native signers and detected facial gestures used to express grammatical information, such as raised and furrowed eyebrows as well as headshakes.
Resumo:
A learning based framework is proposed for estimating human body pose from a single image. Given a differentiable function that maps from pose space to image feature space, the goal is to invert the process: estimate the pose given only image features. The inversion is an ill-posed problem as the inverse mapping is a one to many process. Hence multiple solutions exist, and it is desirable to restrict the solution space to a smaller subset of feasible solutions. For example, not all human body poses are feasible due to anthropometric constraints. Since the space of feasible solutions may not admit a closed form description, the proposed framework seeks to exploit machine learning techniques to learn an approximation that is smoothly parameterized over such a space. One such technique is Gaussian Process Latent Variable Modelling. Scaled conjugate gradient is then used find the best matching pose in the space of feasible solutions when given an input image. The formulation allows easy incorporation of various constraints, e.g. temporal consistency and anthropometric constraints. The performance of the proposed approach is evaluated in the task of upper-body pose estimation from silhouettes and compared with the Specialized Mapping Architecture. The estimation accuracy of the Specialized Mapping Architecture is at least one standard deviation worse than the proposed approach in the experiments with synthetic data. In experiments with real video of humans performing gestures, the proposed approach produces qualitatively better estimation results.
Resumo:
Although cooperation generally increases the amount of resources available to a community of nodes, thus improving individual and collective performance, it also allows for the appearance of potential mistreatment problems through the exposition of one node’s resources to others. We study such concerns by considering a group of independent, rational, self-aware nodes that cooperate using on-line caching algorithms, where the exposed resource is the storage of each node. Motivated by content networking applications – including web caching, CDNs, and P2P – this paper extends our previous work on the off-line version of the problem, which was limited to object replication and was conducted under a game-theoretic framework. We identify and investigate two causes of mistreatment: (1) cache state interactions (due to the cooperative servicing of requests) and (2) the adoption of a common scheme for cache replacement/redirection/admission policies. Using analytic models, numerical solutions of these models, as well as simulation experiments, we show that online cooperation schemes using caching are fairly robust to mistreatment caused by state interactions. When this becomes possible, the interaction through the exchange of miss-streams has to be very intense, making it feasible for the mistreated nodes to detect and react to the exploitation. This robustness ceases to exist when nodes fetch and store objects in response to remote requests, i.e., when they operate as Level-2 caches (or proxies) for other nodes. Regarding mistreatment due to a common scheme, we show that this can easily take place when the “outlier” characteristics of some of the nodes get overlooked. This finding underscores the importance of allowing cooperative caching nodes the flexibility of choosing from a diverse set of schemes to fit the peculiarities of individual nodes. To that end, we outline an emulation-based framework for the development of mistreatment-resilient distributed selfish caching schemes.
Resumo:
A difficulty in lung image registration is accounting for changes in the size of the lungs due to inspiration. We propose two methods for computing a uniform scale parameter for use in lung image registration that account for size change. A scaled rigid-body transformation allows analysis of corresponding lung CT scans taken at different times and can serve as a good low-order transformation to initialize non-rigid registration approaches. Two different features are used to compute the scale parameter. The first method uses lung surfaces. The second uses lung volumes. Both approaches are computationally inexpensive and improve the alignment of lung images over rigid registration. The two methods produce different scale parameters and may highlight different functional information about the lungs.
Resumo:
The Border Gateway Protocol (BGP) is the current inter-domain routing protocol used to exchange reachability information between Autonomous Systems (ASes) in the Internet. BGP supports policy-based routing which allows each AS to independently adopt a set of local policies that specify which routes it accepts and advertises from/to other networks, as well as which route it prefers when more than one route becomes available. However, independently chosen local policies may cause global conflicts, which result in protocol divergence. In this paper, we propose a new algorithm, called Adaptive Policy Management Scheme (APMS), to resolve policy conflicts in a distributed manner. Akin to distributed feedback control systems, each AS independently classifies the state of the network as either conflict-free or potentially-conflicting by observing its local history only (namely, route flaps). Based on the degree of measured conflicts (policy conflict-avoidance vs. -control mode), each AS dynamically adjusts its own path preferences—increasing its preference for observably stable paths over flapping paths. APMS also includes a mechanism to distinguish route flaps due to topology changes, so as not to confuse them with those due to policy conflicts. A correctness and convergence analysis of APMS based on the substability property of chosen paths is presented. Implementation in the SSF network simulator is performed, and simulation results for different performance metrics are presented. The metrics capture the dynamic performance (in terms of instantaneous throughput, delay, routing load, etc.) of APMS and other competing solutions, thus exposing the often neglected aspects of performance.
Resumo:
Particle filtering is a popular method used in systems for tracking human body pose in video. One key difficulty in using particle filtering is caused by the curse of dimensionality: generally a very large number of particles is required to adequately approximate the underlying pose distribution in a high-dimensional state space. Although the number of degrees of freedom in the human body is quite large, in reality, the subset of allowable configurations in state space is generally restricted by human biomechanics, and the trajectories in this allowable subspace tend to be smooth. Therefore, a framework is proposed to learn a low-dimensional representation of the high-dimensional human poses state space. This mapping can be learned using a Gaussian Process Latent Variable Model (GPLVM) framework. One important advantage of the GPLVM framework is that both the mapping to, and mapping from the embedded space are smooth; this facilitates sampling in the low-dimensional space, and samples generated in the low-dimensional embedded space are easily mapped back into the original highdimensional space. Moreover, human body poses that are similar in the original space tend to be mapped close to each other in the embedded space; this property can be exploited when sampling in the embedded space. The proposed framework is tested in tracking 2D human body pose using a Scaled Prismatic Model. Experiments on real life video sequences demonstrate the strength of the approach. In comparison with the Multiple Hypothesis Tracking and the standard Condensation algorithm, the proposed algorithm is able to maintain tracking reliably throughout the long test sequences. It also handles singularity and self occlusion robustly.
Resumo:
Weak references are references that do not prevent the object they point to from being garbage collected. Most realistic languages, including Java, SML/NJ, and OCaml to name a few, have some facility for programming with weak references. Weak references are used in implementing idioms like memoizing functions and hash-consing in order to avoid potential memory leaks. However, the semantics of weak references in many languages are not clearly specified. Without a formal semantics for weak references it becomes impossible to prove the correctness of implementations making use of this feature. Previous work by Hallett and Kfoury extends λgc, a language for modeling garbage collection, to λweak, a similar language with weak references. Using this previously formalized semantics for weak references, we consider two issues related to well-behavedness of programs. Firstly, we provide a new, simpler proof of the well-behavedness of the syntactically restricted fragment of λweak defined previously. Secondly, we give a natural semantic criterion for well-behavedness much broader than the syntactic restriction, which is useful as principle for programming with weak references. Furthermore we extend the result, proved in previously of λgc, which allows one to use type-inference to collect some reachable objects that are never used. We prove that this result holds of our language, and we extend this result to allow the collection of weakly-referenced reachable garbage without incurring the computational overhead sometimes associated with collecting weak bindings (e.g. the need to recompute a memoized function). Lastly we use extend the semantic framework to model the key/value weak references found in Haskell and we prove the Haskell is semantics equivalent to a simpler semantics due to the lack of side-effects in our language.
Resumo:
A weak reference is a reference to an object that is not followed by the pointer tracer when garbage collection is called. That is, a weak reference cannot prevent the object it references from being garbage collected. Weak references remain a troublesome programming feature largely because there is not an accepted, precise semantics that describes their behavior (in fact, we are not aware of any formalization of their semantics). The trouble is that weak references allow reachable objects to be garbage collected, therefore allowing garbage collection to influence the result of a program. Despite this difficulty, weak references continue to be used in practice for reasons related to efficient storage management, and are included in many popular programming languages (Standard ML, Haskell, OCaml, and Java). We give a formal semantics for a calculus called λweak that includes weak references and is derived from Morrisett, Felleisen, and Harper’s λgc. λgc formalizes the notion of garbage collection by means of a rewrite rule. Such a formalization is required to precisely characterize the semantics of weak references. However, the inclusion of a garbage-collection rewrite-rule in a language with weak references introduces non-deterministic evaluation, even if the parameter-passing mechanism is deterministic (call-by-value in our case). This raises the question of confluence for our rewrite system. We discuss natural restrictions under which our rewrite system is confluent, thus guaranteeing uniqueness of program result. We define conditions that allow other garbage collection algorithms to co-exist with our semantics of weak references. We also introduce a polymorphic type system to prove the absence of erroneous program behavior (i.e., the absence of “stuck evaluation”) and a corresponding type inference algorithm. We prove the type system sound and the inference algorithm sound and complete.
Resumo:
The therapeutic effects of playing music are being recognized increasingly in the field of rehabilitation medicine. People with physical disabilities, however, often do not have the motor dexterity needed to play an instrument. We developed a camera-based human-computer interface called "Music Maker" to provide such people with a means to make music by performing therapeutic exercises. Music Maker uses computer vision techniques to convert the movements of a patient's body part, for example, a finger, hand, or foot, into musical and visual feedback using the open software platform EyesWeb. It can be adjusted to a patient's particular therapeutic needs and provides quantitative tools for monitoring the recovery process and assessing therapeutic outcomes. We tested the potential of Music Maker as a rehabilitation tool with six subjects who responded to or created music in various movement exercises. In these proof-of-concept experiments, Music Maker has performed reliably and shown its promise as a therapeutic device.