78 resultados para Stan


Relevância:

10.00% 10.00%

Publicador:

Resumo:

An improved method for deformable shape-based image indexing and retrieval is described. A pre-computed index tree is used to improve the speed of our previously reported on-line model fitting method; simple shape features are used as keys in a pre-generated index tree of model instances. In addition, a coarse to fine indexing scheme is used at different levels of the tree to further improve speed while maintaining matching accuracy. Experimental results show that the speedup is significant, while accuracy of shape-based indexing is maintained. A method for shape population-based retrieval is also described. The method allows query formulation based on the population distributions of shapes in each image. Results of population-based image queries for a database of blood cell micrographs are shown.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A non-linear supervised learning architecture, the Specialized Mapping Architecture (SMA) and its application to articulated body pose reconstruction from single monocular images is described. The architecture is formed by a number of specialized mapping functions, each of them with the purpose of mapping certain portions (connected or not) of the input space, and a feedback matching process. A probabilistic model for the architecture is described along with a mechanism for learning its parameters. The learning problem is approached using a maximum likelihood estimation framework; we present Expectation Maximization (EM) algorithms for two different instances of the likelihood probability. Performance is characterized by estimating human body postures from low level visual features, showing promising results.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A system for recovering 3D hand pose from monocular color sequences is proposed. The system employs a non-linear supervised learning framework, the specialized mappings architecture (SMA), to map image features to likely 3D hand poses. The SMA's fundamental components are a set of specialized forward mapping functions, and a single feedback matching function. The forward functions are estimated directly from training data, which in our case are examples of hand joint configurations and their corresponding visual features. The joint angle data in the training set is obtained via a CyberGlove, a glove with 22 sensors that monitor the angular motions of the palm and fingers. In training, the visual features are generated using a computer graphics module that renders the hand from arbitrary viewpoints given the 22 joint angles. We test our system both on synthetic sequences and on sequences taken with a color camera. The system automatically detects and tracks both hands of the user, calculates the appropriate features, and estimates the 3D hand joint angles from those features. Results are encouraging given the complexity of the task.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Standard structure from motion algorithms recover 3D structure of points. If a surface representation is desired, for example a piece-wise planar representation, then a two-step procedure typically follows: in the first step the plane-membership of points is first determined manually, and in a subsequent step planes are fitted to the sets of points thus determined, and their parameters are recovered. This paper presents an approach for automatically segmenting planar structures from a sequence of images, and simultaneously estimating their parameters. In the proposed approach the plane-membership of points is determined automatically, and the planar structure parameters are recovered directly in the algorithm rather than indirectly in a post-processing stage. Simulated and real experimental results show the efficacy of this approach.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

An improved method for deformable shape-based image segmentation is described. Image regions are merged together and/or split apart, based on their agreement with an a priori distribution on the global deformation parameters for a shape template. The quality of a candidate region merging is evaluated by a cost measure that includes: homogeneity of image properties within the combined region, degree of overlap with a deformed shape model, and a deformation likelihood term. Perceptually-motivated criteria are used to determine where/how to split regions, based on the local shape properties of the region group's bounding contour. A globally consistent interpretation is determined in part by the minimum description length principle. Experiments show that the model-based splitting strategy yields a significant improvement in segmention over a method that uses merging alone.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We propose a new technique for efficiently delivering popular content from information repositories with bounded file caches. Our strategy relies on the use of fast erasure codes (a.k.a. forward error correcting codes) to generate encodings of popular files, of which only a small sliding window is cached at any time instant, even to satisfy an unbounded number of asynchronous requests for the file. Our approach capitalizes on concurrency to maximize sharing of state across different request threads while minimizing cache memory utilization. Additional reduction in resource requirements arises from providing for a lightweight version of the network stack. In this paper, we describe the design and implementation of our Cyclone server as a Linux kernel subsystem.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Nearest neighbor classification using shape context can yield highly accurate results in a number of recognition problems. Unfortunately, the approach can be too slow for practical applications, and thus approximation strategies are needed to make shape context practical. This paper proposes a method for efficient and accurate nearest neighbor classification in non-Euclidean spaces, such as the space induced by the shape context measure. First, a method is introduced for constructing a Euclidean embedding that is optimized for nearest neighbor classification accuracy. Using that embedding, multiple approximations of the underlying non-Euclidean similarity measure are obtained, at different levels of accuracy and efficiency. The approximations are automatically combined to form a cascade classifier, which applies the slower approximations only to the hardest cases. Unlike typical cascade-of-classifiers approaches, that are applied to binary classification problems, our method constructs a cascade for a multiclass problem. Experiments with a standard shape data set indicate that a two-to-three order of magnitude speed up is gained over the standard shape context classifier, with minimal losses in classification accuracy.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A common problem in many types of databases is retrieving the most similar matches to a query object. Finding those matches in a large database can be too slow to be practical, especially in domains where objects are compared using computationally expensive similarity (or distance) measures. This paper proposes a novel method for approximate nearest neighbor retrieval in such spaces. Our method is embedding-based, meaning that it constructs a function that maps objects into a real vector space. The mapping preserves a large amount of the proximity structure of the original space, and it can be used to rapidly obtain a short list of likely matches to the query. The main novelty of our method is that it constructs, together with the embedding, a query-sensitive distance measure that should be used when measuring distances in the vector space. The term "query-sensitive" means that the distance measure changes depending on the current query object. We report experiments with an image database of handwritten digits, and a time-series database. In both cases, the proposed method outperforms existing state-of-the-art embedding methods, meaning that it provides significantly better trade-offs between efficiency and retrieval accuracy.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We introduce a view-point invariant representation of moving object trajectories that can be used in video database applications. It is assumed that trajectories lie on a surface that can be locally approximated with a plane. Raw trajectory data is first locally approximated with a cubic spline via least squares fitting. For each sampled point of the obtained curve, a projective invariant feature is computed using a small number of points in its neighborhood. The resulting sequence of invariant features computed along the entire trajectory forms the view invariant descriptor of the trajectory itself. Time parametrization has been exploited to compute cross ratios without ambiguity due to point ordering. Similarity between descriptors of different trajectories is measured with a distance that takes into account the statistical properties of the cross ratio, and its symmetry with respect to the point at infinity. In experiments, an overall correct classification rate of about 95% has been obtained on a dataset of 58 trajectories of players in soccer video, and an overall correct classification rate of about 80% has been obtained on matching partial segments of trajectories collected from two overlapping views of outdoor scenes with moving people and cars.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We introduce a method for recovering the spatial and temporal alignment between two or more views of objects moving over a ground plane. Existing approaches either assume that the streams are globally synchronized, so that only solving the spatial alignment is needed, or that the temporal misalignment is small enough so that exhaustive search can be performed. In contrast, our approach can recover both the spatial and temporal alignment. We compute for each trajectory a number of interesting segments, and we use their description to form putative matches between trajectories. Each pair of corresponding interesting segments induces a temporal alignment, and defines an interval of common support across two views of an object that is used to recover the spatial alignment. Interesting segments and their descriptors are defined using algebraic projective invariants measured along the trajectories. Similarity between interesting segments is computed taking into account the statistics of such invariants. Candidate alignment parameters are verified checking the consistency, in terms of the symmetric transfer error, of all the putative pairs of corresponding interesting segments. Experiments are conducted with two different sets of data, one with two views of an outdoor scene featuring moving people and cars, and one with four views of a laboratory sequence featuring moving radio-controlled cars.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Moving cameras are needed for a wide range of applications in robotics, vehicle systems, surveillance, etc. However, many foreground object segmentation methods reported in the literature are unsuitable for such settings; these methods assume that the camera is fixed and the background changes slowly, and are inadequate for segmenting objects in video if there is significant motion of the camera or background. To address this shortcoming, a new method for segmenting foreground objects is proposed that utilizes binocular video. The method is demonstrated in the application of tracking and segmenting people in video who are approximately facing the binocular camera rig. Given a stereo image pair, the system first tries to find faces. Starting at each face, the region containing the person is grown by merging regions from an over-segmented color image. The disparity map is used to guide this merging process. The system has been implemented on a consumer-grade PC, and tested on video sequences of people indoors obtained from a moving camera rig. As can be expected, the proposed method works well in situations where other foreground-background segmentation methods typically fail. We believe that this superior performance is partly due to the use of object detection to guide region merging in disparity/color foreground segmentation, and partly due to the use of disparity information available with a binocular rig, in contrast with most previous methods that assumed monocular sequences.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Nearest neighbor classifiers are simple to implement, yet they can model complex non-parametric distributions, and provide state-of-the-art recognition accuracy in OCR databases. At the same time, they may be too slow for practical character recognition, especially when they rely on similarity measures that require computationally expensive pairwise alignments between characters. This paper proposes an efficient method for computing an approximate similarity score between two characters based on their exact alignment to a small number of prototypes. The proposed method is applied to both online and offline character recognition, where similarity is based on widely used and computationally expensive alignment methods, i.e., Dynamic Time Warping and the Hungarian method respectively. In both cases significant recognition speedup is obtained at the expense of only a minor increase in recognition error.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Gesture spotting is the challenging task of locating the start and end frames of the video stream that correspond to a gesture of interest, while at the same time rejecting non-gesture motion patterns. This paper proposes a new gesture spotting and recognition algorithm that is based on the continuous dynamic programming (CDP) algorithm, and runs in real-time. To make gesture spotting efficient a pruning method is proposed that allows the system to evaluate a relatively small number of hypotheses compared to CDP. Pruning is implemented by a set of model-dependent classifiers, that are learned from training examples. To make gesture spotting more accurate a subgesture reasoning process is proposed that models the fact that some gesture models can falsely match parts of other longer gestures. In our experiments, the proposed method with pruning and subgesture modeling is an order of magnitude faster and 18% more accurate compared to the original CDP algorithm.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper proposes a method for detecting shapes of variable structure in images with clutter. The term "variable structure" means that some shape parts can be repeated an arbitrary number of times, some parts can be optional, and some parts can have several alternative appearances. The particular variation of the shape structure that occurs in a given image is not known a priori. Existing computer vision methods, including deformable model methods, were not designed to detect shapes of variable structure; they may only be used to detect shapes that can be decomposed into a fixed, a priori known, number of parts. The proposed method can handle both variations in shape structure and variations in the appearance of individual shape parts. A new class of shape models is introduced, called Hidden State Shape Models, that can naturally represent shapes of variable structure. A detection algorithm is described that finds instances of such shapes in images with large amounts of clutter by finding globally optimal correspondences between image features and shape models. Experiments with real images demonstrate that our method can localize plant branches that consist of an a priori unknown number of leaves and can detect hands more accurately than a hand detector based on the chamfer distance.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Nearest neighbor search is commonly employed in face recognition but it does not scale well to large dataset sizes. A strategy to combine rejection classifiers into a cascade for face identification is proposed in this paper. A rejection classifier for a pair of classes is defined to reject at least one of the classes with high confidence. These rejection classifiers are able to share discriminants in feature space and at the same time have high confidence in the rejection decision. In the face identification problem, it is possible that a pair of known individual faces are very dissimilar. It is very unlikely that both of them are close to an unknown face in the feature space. Hence, only one of them needs to be considered. Using a cascade structure of rejection classifiers, the scope of nearest neighbor search can be reduced significantly. Experiments on Face Recognition Grand Challenge (FRGC) version 1 data demonstrate that the proposed method achieves significant speed up and an accuracy comparable with the brute force Nearest Neighbor method. In addition, a graph cut based clustering technique is employed to demonstrate that the pairwise separability of these rejection classifiers is capable of semantic grouping.