992 resultados para Multi-view geometry


Relevância:

80.00% 80.00%

Publicador:

Resumo:

In the multi-view approach to semisupervised learning, we choose one predictor from each of multiple hypothesis classes, and we co-regularize our choices by penalizing disagreement among the predictors on the unlabeled data. We examine the co-regularization method used in the co-regularized least squares (CoRLS) algorithm, in which the views are reproducing kernel Hilbert spaces (RKHS's), and the disagreement penalty is the average squared difference in predictions. The final predictor is the pointwise average of the predictors from each view. We call the set of predictors that can result from this procedure the co-regularized hypothesis class. Our main result is a tight bound on the Rademacher complexity of the co-regularized hypothesis class in terms of the kernel matrices of each RKHS. We find that the co-regularization reduces the Rademacher complexity by an amount that depends on the distance between the two views, as measured by a data dependent metric. We then use standard techniques to bound the gap between training error and test error for the CoRLS algorithm. Experimentally, we find that the amount of reduction in complexity introduced by co regularization correlates with the amount of improvement that co-regularization gives in the CoRLS algorithm.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Gait energy images (GEIs) and its variants form the basis of many recent appearance-based gait recognition systems. The GEI combines good recognition performance with a simple implementation, though it suffers problems inherent to appearance-based approaches, such as being highly view dependent. In this paper, we extend the concept of the GEI to 3D, to create what we call the gait energy volume, or GEV. A basic GEV implementation is tested on the CMU MoBo database, showing improvements over both the GEI baseline and a fused multi-view GEI approach. We also demonstrate the efficacy of this approach on partial volume reconstructions created from frontal depth images, which can be more practically acquired, for example, in biometric portals implemented with stereo cameras, or other depth acquisition systems. Experiments on frontal depth images are evaluated on an in-house developed database captured using the Microsoft Kinect, and demonstrate the validity of the proposed approach.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We present an iterative hierarchical algorithm for multi-view stereo. The algorithm attempts to utilise as much contextual information as is available to compute highly accurate and robust depth maps. There are three novel aspects to the approach: 1) firstly we incrementally improve the depth fidelity as the algorithm progresses through the image pyramid; 2) secondly we show how to incorporate visual hull information (when available) to constrain depth searches; and 3) we show how to simultaneously enforce the consistency of the depth-map by continual comparison with neighbouring depth-maps. We show that this approach produces highly accurate depth-maps and, since it is essentially a local method, is both extremely fast and simple to implement.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The core aim of machine learning is to make a computer program learn from the experience. Learning from data is usually defined as a task of learning regularities or patterns in data in order to extract useful information, or to learn the underlying concept. An important sub-field of machine learning is called multi-view learning where the task is to learn from multiple data sets or views describing the same underlying concept. A typical example of such scenario would be to study a biological concept using several biological measurements like gene expression, protein expression and metabolic profiles, or to classify web pages based on their content and the contents of their hyperlinks. In this thesis, novel problem formulations and methods for multi-view learning are presented. The contributions include a linear data fusion approach during exploratory data analysis, a new measure to evaluate different kinds of representations for textual data, and an extension of multi-view learning for novel scenarios where the correspondence of samples in the different views or data sets is not known in advance. In order to infer the one-to-one correspondence of samples between two views, a novel concept of multi-view matching is proposed. The matching algorithm is completely data-driven and is demonstrated in several applications such as matching of metabolites between humans and mice, and matching of sentences between documents in two languages.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents a volumetric formulation for the multi-view stereo problem which is amenable to a computationally tractable global optimisation using Graph-cuts. Our approach is to seek the optimal partitioning of 3D space into two regions labelled as "object" and "empty" under a cost functional consisting of the following two terms: (1) A term that forces the boundary between the two regions to pass through photo-consistent locations and (2) a ballooning term that inflates the "object" region. To take account of the effect of occlusion on the first term we use an occlusion robust photo-consistency metric based on Normalised Cross Correlation, which does not assume any geometric knowledge about the reconstructed object. The globally optimal 3D partitioning can be obtained as the minimum cut solution of a weighted graph.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We present a new co-clustering problem of images and visual features. The problem involves a set of non-object images in addition to a set of object images and features to be co-clustered. Co-clustering is performed in a way that maximises discrimination of object images from non-object images, thus emphasizing discriminative features. This provides a way of obtaining perceptual joint-clusters of object images and features. We tackle the problem by simultaneously boosting multiple strong classifiers which compete for images by their expertise. Each boosting classifier is an aggregation of weak-learners, i.e. simple visual features. The obtained classifiers are useful for object detection tasks which exhibit multimodalities, e.g. multi-category and multi-view object detection tasks. Experiments on a set of pedestrian images and a face data set demonstrate that the method yields intuitive image clusters with associated features and is much superior to conventional boosting classifiers in object detection tasks.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We present an image-based approach to infer 3D structure parameters using a probabilistic "shape+structure'' model. The 3D shape of a class of objects may be represented by sets of contours from silhouette views simultaneously observed from multiple calibrated cameras. Bayesian reconstructions of new shapes can then be estimated using a prior density constructed with a mixture model and probabilistic principal components analysis. We augment the shape model to incorporate structural features of interest; novel examples with missing structure parameters may then be reconstructed to obtain estimates of these parameters. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction. Our shape model enables accurate estimation of structure despite segmentation errors or missing views in the input silhouettes, and works even with only a single input view. Using a dataset of thousands of pedestrian images generated from a synthetic model, we can perform accurate inference of the 3D locations of 19 joints on the body based on observed silhouette contours from real images.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Statistical shape and texture appearance models are powerful image representations, but previously had been restricted to 2D or simple 3D shapes. In this paper we present a novel 3D morphable model based on image-based rendering techniques, which can represent complex lighting conditions, structures, and surfaces. We describe how to construct a manifold of the multi-view appearance of an object class using light fields and show how to match a 2D image of an object to a point on this manifold. In turn we use the reconstructed light field to render novel views of the object. Our technique overcomes the limitations of polygon based appearance models and uses light fields that are acquired in real-time.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Recovering a volumetric model of a person, car, or other object of interest from a single snapshot would be useful for many computer graphics applications. 3D model estimation in general is hard, and currently requires active sensors, multiple views, or integration over time. For a known object class, however, 3D shape can be successfully inferred from a single snapshot. We present a method for generating a ``virtual visual hull''-- an estimate of the 3D shape of an object from a known class, given a single silhouette observed from an unknown viewpoint. For a given class, a large database of multi-view silhouette examples from calibrated, though possibly varied, camera rigs are collected. To infer a novel single view input silhouette's virtual visual hull, we search for 3D shapes in the database which are most consistent with the observed contour. The input is matched to component single views of the multi-view training examples. A set of viewpoint-aligned virtual views are generated from the visual hulls corresponding to these examples. The 3D shape estimate for the input is then found by interpolating between the contours of these aligned views. When the underlying shape is ambiguous given a single view silhouette, we produce multiple visual hull hypotheses; if a sequence of input images is available, a dynamic programming approach is applied to find the maximum likelihood path through the feasible hypotheses over time. We show results of our algorithm on real and synthetic images of people.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We propose a multi-object multi-camera framework for tracking large numbers of tightly-spaced objects that rapidly move in three dimensions. We formulate the problem of finding correspondences across multiple views as a multidimensional assignment problem and use a greedy randomized adaptive search procedure to solve this NP-hard problem efficiently. To account for occlusions, we relax the one-to-one constraint that one measurement corresponds to one object and iteratively solve the relaxed assignment problem. After correspondences are established, object trajectories are estimated by stereoscopic reconstruction using an epipolar-neighborhood search. We embedded our method into a tracker-to-tracker multi-view fusion system that not only obtains the three-dimensional trajectories of closely-moving objects but also accurately settles track uncertainties that could not be resolved from single views due to occlusion. We conducted experiments to validate our greedy assignment procedure and our technique to recover from occlusions. We successfully track hundreds of flying bats and provide an analysis of their group behavior based on 150 reconstructed 3D trajectories.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A common design of an object recognition system has two steps, a detection step followed by a foreground within-class classification step. For example, consider face detection by a boosted cascade of detectors followed by face ID recognition via one-vs-all (OVA) classifiers. Another example is human detection followed by pose recognition. Although the detection step can be quite fast, the foreground within-class classification process can be slow and becomes a bottleneck. In this work, we formulate a filter-and-refine scheme, where the binary outputs of the weak classifiers in a boosted detector are used to identify a small number of candidate foreground state hypotheses quickly via Hamming distance or weighted Hamming distance. The approach is evaluated in three applications: face recognition on the FRGC V2 data set, hand shape detection and parameter estimation on a hand data set and vehicle detection and view angle estimation on a multi-view vehicle data set. On all data sets, our approach has comparable accuracy and is at least five times faster than the brute force approach.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

For the first time in this paper the authors present results showing the effect of out of plane speaker head pose variation on a lip biometric based speaker verification system. Using appearance DCT based features, they adopt a Mutual Information analysis technique to highlight the class discriminant DCT components most robust to changes in out of plane pose. Experiments are conducted using the initial phase of a new multi view Audio-Visual database designed for research and development of pose-invariant speech and speaker recognition. They show that verification performance can be improved by substituting higher order horizontal DCT components for vertical, particularly in the case of a train/test pose angle mismatch.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Die stereoskopische 3-D-Darstellung beruht auf der naturgetreuen Präsentation verschiedener Perspektiven für das rechte und linke Auge. Sie erlangt in der Medizin, der Architektur, im Design sowie bei Computerspielen und im Kino, zukünftig möglicherweise auch im Fernsehen, eine immer größere Bedeutung. 3-D-Displays dienen der zusätzlichen Wiedergabe der räumlichen Tiefe und lassen sich grob in die vier Gruppen Stereoskope und Head-mounted-Displays, Brillensysteme, autostereoskopische Displays sowie echte 3-D-Displays einteilen. Darunter besitzt der autostereoskopische Ansatz ohne Brillen, bei dem N≥2 Perspektiven genutzt werden, ein hohes Potenzial. Die beste Qualität in dieser Gruppe kann mit der Methode der Integral Photography, die sowohl horizontale als auch vertikale Parallaxe kodiert, erreicht werden. Allerdings ist das Verfahren sehr aufwendig und wird deshalb wenig genutzt. Den besten Kompromiss zwischen Leistung und Preis bieten präzise gefertigte Linsenrasterscheiben (LRS), die hinsichtlich Lichtausbeute und optischen Eigenschaften den bereits früher bekannten Barrieremasken überlegen sind. Insbesondere für die ergonomisch günstige Multiperspektiven-3-D-Darstellung wird eine hohe physikalische Monitorauflösung benötigt. Diese ist bei modernen TFT-Displays schon recht hoch. Eine weitere Verbesserung mit dem theoretischen Faktor drei erreicht man durch gezielte Ansteuerung der einzelnen, nebeneinander angeordneten Subpixel in den Farben Rot, Grün und Blau. Ermöglicht wird dies durch die um etwa eine Größenordnung geringere Farbauflösung des menschlichen visuellen Systems im Vergleich zur Helligkeitsauflösung. Somit gelingt die Implementierung einer Subpixel-Filterung, welche entsprechend den physiologischen Gegebenheiten mit dem in Luminanz und Chrominanz trennenden YUV-Farbmodell arbeitet. Weiterhin erweist sich eine Schrägstellung der Linsen im Verhältnis von 1:6 als günstig. Farbstörungen werden minimiert, und die Schärfe der Bilder wird durch eine weniger systematische Vergrößerung der technologisch unvermeidbaren Trennelemente zwischen den Subpixeln erhöht. Der Grad der Schrägstellung ist frei wählbar. In diesem Sinne ist die Filterung als adaptiv an den Neigungswinkel zu verstehen, obwohl dieser Wert für einen konkreten 3-D-Monitor eine Invariante darstellt. Die zu maximierende Zielgröße ist der Parameter Perspektiven-Pixel als Produkt aus Anzahl der Perspektiven N und der effektiven Auflösung pro Perspektive. Der Idealfall einer Verdreifachung wird praktisch nicht erreicht. Messungen mit Hilfe von Testbildern sowie Schrifterkennungstests lieferten einen Wert von knapp über 2. Dies ist trotzdem als eine signifikante Verbesserung der Qualität der 3-D-Darstellung anzusehen. In der Zukunft sind weitere Verbesserungen hinsichtlich der Zielgröße durch Nutzung neuer, feiner als TFT auflösender Technologien wie LCoS oder OLED zu erwarten. Eine Kombination mit der vorgeschlagenen Filtermethode wird natürlich weiterhin möglich und ggf. auch sinnvoll sein.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This chapter presents techniques used for the generation of 3D digital elevation models (DEMs) from remotely sensed data. Three methods are explored and discussed—optical stereoscopic imagery, Interferometric Synthetic Aperture Radar (InSAR), and LIght Detection and Ranging (LIDAR). For each approach, the state-of-the-art presented in the literature is reviewed. Techniques involved in DEM generation are presented with accuracy evaluation. Results of DEMs reconstructed from remotely sensed data are illustrated. While the processes of DEM generation from satellite stereoscopic imagery represents a good example of passive, multi-view imaging technology, discussed in Chap. 2 of this book, InSAR and LIDAR use different principles to acquire 3D information. With regard to InSAR and LIDAR, detailed discussions are conducted in order to convey the fundamentals of both technologies.