161 results for naval architect knowledgebase
Abstract:
Scene flow methods estimate the three-dimensional motion field for points in the world, using multi-camera video data. Such methods combine multi-view reconstruction with motion estimation approaches. This paper describes an alternative formulation for dense scene flow estimation that provides convincing results using only two cameras by fusing stereo and optical flow estimation into a single coherent framework. To handle the aperture problems inherent in the estimation task, a multi-scale method along with a novel adaptive smoothing technique is used to gain a regularized solution. This combined approach both preserves discontinuities and prevents over-regularization, two problems commonly associated with basic multi-scale approaches. Internally, the framework generates probability distributions for optical flow and disparity. Taking into account the uncertainty in the intermediate stages allows for more reliable estimation of the 3D scene flow than standard stereo and optical flow methods allow. Experiments with synthetic and real test data demonstrate the effectiveness of the approach.
Abstract:
In gesture and sign language video sequences, hand motion tends to be rapid, and hands frequently appear in front of each other or in front of the face. Thus, hand location is often ambiguous, and naive color-based hand tracking is insufficient. To improve tracking accuracy, some methods employ a prediction-update framework, but such methods require careful initialization of model parameters, and tend to drift and lose track in extended sequences. In this paper, a temporal filtering framework for hand tracking is proposed that can initialize and reset itself without human intervention. In each frame, simple features like color and motion residue are exploited to identify multiple candidate hand locations. The temporal filter then uses the Viterbi algorithm to select among the candidates from frame to frame. The resulting tracking system can automatically identify video trajectories of unambiguous hand motion, and detect frames where tracking becomes ambiguous because of occlusions or overlaps. Experiments on video sequences of several hundred frames in duration demonstrate the system's ability to track hands robustly, to detect and handle tracking ambiguities, and to extract the trajectories of unambiguous hand motion.
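The candidate-selection step can be illustrated with a minimal Viterbi sketch. The abstract does not give the cost terms, so this example assumes a per-candidate detection cost (lower is better) and a transition cost proportional to the Euclidean distance between candidates in consecutive frames. The function name `viterbi_track` and the parameter `jump_penalty` are hypothetical.

```python
import math

def viterbi_track(candidates, scores, jump_penalty=1.0):
    """Pick one candidate per frame so that the summed cost is minimal.

    candidates[t] is a list of (x, y) hand-location hypotheses in frame t.
    scores[t][j] is an assumed detection cost for candidate j (lower is better).
    Returns the list of chosen candidate indices, one per frame.
    """
    # cost[t][j]: cheapest cumulative cost of any path ending at candidate j
    cost = [list(scores[0])]
    back = [[-1] * len(candidates[0])]
    for t in range(1, len(candidates)):
        row_cost, row_back = [], []
        for j, (xj, yj) in enumerate(candidates[t]):
            best_i, best_c = 0, math.inf
            for i, (xi, yi) in enumerate(candidates[t - 1]):
                # Assumed transition cost: distance moved between frames.
                c = cost[t - 1][i] + jump_penalty * math.hypot(xj - xi, yj - yi)
                if c < best_c:
                    best_i, best_c = i, c
            row_cost.append(best_c + scores[t][j])
            row_back.append(best_i)
        cost.append(row_cost)
        back.append(row_back)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path = [j]
    for t in range(len(candidates) - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path

frames = [[(0, 0), (10, 10)], [(1, 0), (10, 11)], [(2, 0), (50, 50)]]
costs = [[0.1, 0.1], [0.5, 0.1], [0.1, 0.1]]
path = viterbi_track(frames, costs)
```

In this toy input the second candidate looks better in the middle frame when judged alone, but the smooth trajectory through the first candidate has the lower total cost, which is the kind of frame-to-frame consistency a temporal filter is meant to enforce.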
Abstract:
Hand signals are commonly used in applications such as giving instructions to a pilot for airplane takeoff or directing a crane operator from the ground. A new algorithm for recognizing hand signals from a single camera is proposed. Typically, tracked 2D feature positions of hand signals are matched to 2D training images. In contrast, our approach matches the 2D feature positions to an archive of 3D motion capture sequences. The method avoids explicit reconstruction of the 3D articulated motion from 2D image features. Instead, the matching between the 2D and 3D sequence is done by backprojecting the 3D motion capture data onto 2D. Experiments demonstrate the effectiveness of the approach in an example application: recognizing six classes of basketball referee hand signals in video.
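The idea of comparing a 2D track against projected 3D motion capture data can be sketched as follows. The abstract does not describe the camera model, temporal alignment, or distance measure. This example therefore assumes an orthographic camera, sequences of equal length that are already time-aligned, a search over a few candidate viewing angles, and mean Euclidean distance as the match score. All function names are hypothetical.

```python
import math

def project(points3d, yaw):
    """Rotate points about the vertical (y) axis by yaw radians, then drop depth."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x + s * z, y) for (x, y, z) in points3d]

def sequence_distance(track2d, seq3d, yaw):
    """Mean distance between observed 2D features and the projected 3D features."""
    total, count = 0.0, 0
    for observed, mocap in zip(track2d, seq3d):
        for (u, v), (pu, pv) in zip(observed, project(mocap, yaw)):
            total += math.hypot(u - pu, v - pv)
            count += 1
    return total / count

def classify(track2d, archive, yaws):
    """Return (label, yaw) of the archived 3D sequence whose projection fits best."""
    best = None
    for label, seq3d in archive.items():
        for yaw in yaws:
            d = sequence_distance(track2d, seq3d, yaw)
            if best is None or d < best[0]:
                best = (d, label, yaw)
    return best[1], best[2]

# Toy archive: one tracked feature per frame, two frames per sequence.
archive = {
    "raise": [[(0.0, 0.0, 0.0)], [(0.0, 1.0, 0.0)]],
    "push": [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 1.0)]],
}
observed = [[(0.0, 0.0)], [(0.0, 1.0)]]
label, yaw = classify(observed, archive, yaws=[0.0, 1.5708])
```

No 3D pose is reconstructed from the image here. Only the known 3D sequences are projected and compared in 2D, which mirrors the direction of matching the abstract describes.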
Abstract:
We describe a method for shape-based image database search that uses deformable prototypes to represent categories. Rather than directly comparing a candidate shape with all shape entries in the database, shapes are compared in terms of the types of nonrigid deformations (differences) that relate them to a small subset of representative prototypes. To solve the shape correspondence and alignment problem, we employ the technique of modal matching, an information-preserving shape decomposition for matching, describing, and comparing shapes despite sensor variations and nonrigid deformations. In modal matching, shape is decomposed into an ordered basis of orthogonal principal components. We demonstrate the utility of this approach for shape comparison in 2-D image databases.
Abstract:
A new region-based approach to nonrigid motion tracking is described. Shape is defined in terms of a deformable triangular mesh that captures object shape plus a color texture map that captures object appearance. Photometric variations are also modeled. Nonrigid shape registration and motion tracking are achieved by posing the problem as an energy-based, robust minimization procedure. The approach provides robustness to occlusions, wrinkles, shadows, and specular highlights. The formulation is tailored to take advantage of texture mapping hardware available in many workstations, PCs, and game consoles. This enables nonrigid tracking at speeds approaching video rate.
Abstract:
We developed an automated system that registers chest CT scans temporally. Our registration method matches corresponding anatomical landmarks to obtain initial registration parameters. The initial point-to-point registration is then generalized to an iterative surface-to-surface registration method. Our "goodness-of-fit" measure is evaluated at each step in the iterative scheme until the registration performance is sufficient. We applied our method to register the 3D lung surfaces of 11 pairs of chest CT scans and report promising registration performance.
Abstract:
Based on our previous work in deformable shape model-based object detection, a new method is proposed that uses index trees for organizing shape features to support content-based retrieval applications. In the proposed strategy, different shape feature sets can be used in index trees constructed for object detection and shape similarity comparison respectively. There is a direct correspondence between the two shape feature sets. As a result, application-specific features can be obtained efficiently for shape-based retrieval after object detection. A novel approach is proposed that allows retrieval of images based on the population distribution of deformed shapes in each image. Experiments testing these new approaches have been conducted using an image database that contains blood cell micrographs. The precision vs. recall performance measure shows that our method is superior to previous methods.
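For readers unfamiliar with the precision vs. recall measure mentioned above, a minimal sketch of how the two quantities are computed for a single query follows. The function name and the toy image identifiers are illustrative only and do not come from the paper's experiments.

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy query: four images returned, three images truly relevant, two in common.
p, r = precision_recall([1, 2, 3, 4], [2, 4, 6])
```

A precision vs. recall curve is obtained by repeating this computation while varying how many results are returned per query.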
Abstract:
A method for reconstructing 3D rational B-spline surfaces from multiple views is proposed. The method takes advantage of the projective invariance properties of rational B-splines. Given feature correspondences in multiple views, the 3D surface is reconstructed via a four-step framework. First, corresponding features in each view are given an initial surface parameter value (s; t), and a 2D B-spline is fitted in each view. After this initialization, an iterative minimization procedure alternates between updating the 2D B-spline control points and re-estimating each feature's (s; t). Next, a non-linear minimization method is used to upgrade the 2D B-splines to 2D rational B-splines, and obtain a better fit. Finally, a factorization method is used to reconstruct the 3D B-spline surface given 2D B-splines in each view. This surface recovery method can be applied in both the perspective and orthographic case. The orthographic case allows the use of additional constraints in the recovery. Experiments with real and synthetic imagery demonstrate the efficacy of the approach for the orthographic case.
Abstract:
A number of problems in network operations and engineering call for new methods of traffic analysis. While most existing traffic analysis methods are fundamentally temporal, there is a clear need for the analysis of traffic across multiple network links — that is, for spatial traffic analysis. In this paper we give examples of problems that can be addressed via spatial traffic analysis. We then propose a formal approach to spatial traffic analysis based on the wavelet transform. Our approach (graph wavelets) generalizes the traditional wavelet transform so that it can be applied to data elements connected via an arbitrary graph topology. We explore the necessary and desirable properties of this approach and consider some of its possible realizations. We then apply graph wavelets to measurements from an operating network. Our results show that graph wavelets are very useful for our motivating problems; for example, they can be used to form highly summarized views of an entire network's traffic load, to gain insight into a network's global traffic response to a link failure, and to localize the extent of a failure event within the network.
Abstract:
We designed the Eyebrow-Clicker, a camera-based human-computer interface system that implements a new form of binary switch. When the user raises his or her eyebrows, the binary switch is activated and a selection command is issued. The Eyebrow-Clicker thus replaces the "click" functionality of a mouse. The system initializes itself by detecting the user's eyes and eyebrows, tracks these features at frame rate, and recovers in the event of errors. The initialization uses the natural blinking of the human eye to select suitable templates for tracking. Once execution has begun, a user therefore never has to restart the program or even touch the computer. In our experiments with human-computer interaction software, the system successfully determined 93% of the time when a user raised his or her eyebrows.
Abstract:
Scene flow methods estimate the three-dimensional motion field for points in the world, using multi-camera video data. Such methods combine multi-view reconstruction with motion estimation. This paper describes an alternative formulation for dense scene flow estimation that provides reliable results using only two cameras by fusing stereo and optical flow estimation into a single coherent framework. Internally, the proposed algorithm generates probability distributions for optical flow and disparity. Taking into account the uncertainty in the intermediate stages allows for more reliable estimation of the 3D scene flow than previous methods allow. To handle the aperture problems inherent in the estimation of optical flow and disparity, a multi-scale method along with a novel region-based technique is used within a regularized solution. This combined approach both preserves discontinuities and prevents over-regularization – two problems commonly associated with the basic multi-scale approaches. Experiments with synthetic and real test data demonstrate the strength of the proposed approach.
Abstract:
A method for deformable shape detection and recognition is described. Deformable shape templates are used to partition the image into a globally consistent interpretation, determined in part by the minimum description length principle. Statistical shape models enforce the prior probabilities on global, parametric deformations for each object class. Once trained, the system autonomously segments deformed shapes from the background, while not merging them with adjacent objects or shadows. The formulation can be used to group image regions based on any image homogeneity predicate; e.g., texture, color, or motion. The recovered shape models can be used directly in object recognition. Experiments with color imagery are reported.
Abstract:
An improved technique for 3D head tracking under varying illumination conditions is proposed. The head is modeled as a texture mapped cylinder. Tracking is formulated as an image registration problem in the cylinder's texture map image. To solve the registration problem in the presence of lighting variation and head motion, the residual error of registration is modeled as a linear combination of texture warping templates and orthogonal illumination templates. Fast and stable on-line tracking is then achieved via regularized, weighted least squares minimization of the registration error. The regularization term tends to limit potential ambiguities that arise in the warping and illumination templates. It enables stable tracking over extended sequences. Tracking does not require a precise initial fit of the model; the system is initialized automatically using a simple 2-D face detector. The only assumption is that the target is facing the camera in the first frame of the sequence. The warping templates are computed at the first frame of the sequence. Illumination templates are precomputed off-line over a training set of face images collected under varying lighting conditions. Experiments in tracking are reported.
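The minimization step can be illustrated with a small regularized, weighted least squares sketch. The abstract does not state the form of the regularizer or how the weights are chosen. This example assumes a ridge (Tikhonov) penalty `lam` on the template coefficients and per-pixel weights supplied by the caller. The names `solve_linear` and `regularized_wls` are hypothetical, and the templates are toy vectors rather than real warping or illumination templates.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def regularized_wls(templates, residual, weights, lam):
    """Minimize sum_i w_i (r_i - sum_j q_j T_j[i])^2 + lam * ||q||^2 over q.

    templates: list of template vectors T_j, each the same length as residual.
    Solves the normal equations (T^T W T + lam I) q = T^T W r.
    """
    M = [[sum(w * ta[i] * tb[i] for i, w in enumerate(weights))
          + (lam if a == b else 0.0)
          for b, tb in enumerate(templates)]
         for a, ta in enumerate(templates)]
    rhs = [sum(w * t[i] * residual[i] for i, w in enumerate(weights))
           for t in templates]
    return solve_linear(M, rhs)

templates = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
residual = [2.0, 3.0, 2.0, 100.0]          # last pixel is an outlier
q_robust = regularized_wls(templates, residual, [1.0, 1.0, 1.0, 0.0], 0.0)
q_shrunk = regularized_wls(templates, [2.0, 3.0, 2.0, 3.0], [1.0] * 4, 2.0)
```

Setting a pixel's weight to zero removes its influence on the fit, and a positive `lam` shrinks the coefficients toward zero. Both effects are shown only for this toy problem and are not a reproduction of the tracker.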
Abstract:
A specialized formulation of Azarbayejani and Pentland's framework for recursive recovery of motion, structure and focal length from feature correspondences tracked through an image sequence is presented. The specialized formulation addresses the case where all tracked points lie on a plane. This planarity constraint reduces the dimension of the original state vector, and consequently the number of feature points needed to estimate the state. Experiments with synthetic data and real imagery illustrate the system performance. The experiments confirm that the specialized formulation provides improved accuracy, stability to observation noise, and rate of convergence in estimation for the case where the tracked points lie on a plane.
Abstract:
A novel approach for estimating articulated body posture and motion from monocular video sequences is proposed. Human pose is defined as the instantaneous two-dimensional configuration (i.e., the projection onto the image plane) of a single articulated body in terms of the position of a predetermined set of joints. First, statistical segmentation of the human bodies from the background is performed and low-level visual features are found given the segmented body shape. The goal is to be able to map these, generally low-level, visual features to body configurations. The system estimates different mappings, each one associated with a specific cluster in the visual feature space. Given a set of body motion sequences for training, unsupervised clustering is obtained via the Expectation Maximization algorithm. Then, for each of the clusters, a function is estimated to build the mapping from low-level features to 3D pose. Currently this mapping is modeled by a neural network. Given new visual features, a mapping from each cluster is performed to yield a set of possible poses. From this set, the system selects the most likely pose given the learned probability distribution and the visual feature similarity between hypothesis and input. Performance of the proposed approach is characterized using a new set of known body postures, showing promising results.