920 resultados para Computer vision teaching
Resumo:
This report mainly summarizes the Project MAC A.I. Group work between July 1968 and June 1969 but covers some work up to February 1970. The work on computer vision is described in detail. This summary should be read in conjunction with last year's A.I. Group Report which is included at the end of this Memo.
Resumo:
Three-dimensional models which contain both geometry and texture have numerous applications such as urban planning, physical simulation, and virtual environments. A major focus of computer vision (and recently graphics) research is the automatic recovery of three-dimensional models from two-dimensional images. After many years of research this goal is yet to be achieved. Most practical modeling systems require substantial human input and unlike automatic systems are not scalable. This thesis presents a novel method for automatically recovering dense surface patches using large sets (1000's) of calibrated images taken from arbitrary positions within the scene. Physical instruments, such as Global Positioning System (GPS), inertial sensors, and inclinometers, are used to estimate the position and orientation of each image. Essentially, the problem is to find corresponding points in each of the images. Once a correspondence has been established, calculating its three-dimensional position is simply a matter of geometry. Long baseline images improve the accuracy. Short baseline images and the large number of images greatly simplifies the correspondence problem. The initial stage of the algorithm is completely local and scales linearly with the number of images. Subsequent stages are global in nature, exploit geometric constraints, and scale quadratically with the complexity of the underlying scene. We describe techniques for: 1) detecting and localizing surface patches; 2) refining camera calibration estimates and rejecting false positive surfels; and 3) grouping surface patches into surfaces and growing the surface along a two-dimensional manifold. We also discuss a method for producing high quality, textured three-dimensional models from these surfaces. Some of the most important characteristics of this approach are that it: 1) uses and refines noisy calibration estimates; 2) compensates for large variations in illumination; 3) tolerates significant soft occlusion (e.g. tree branches); and 4) associates, at a fundamental level, an estimated normal (i.e. no frontal-planar assumption) and texture with each surface patch.
Resumo:
The problem of automatic face recognition is to visually identify a person in an input image. This task is performed by matching the input face against the faces of known people in a database of faces. Most existing work in face recognition has limited the scope of the problem, however, by dealing primarily with frontal views, neutral expressions, and fixed lighting conditions. To help generalize existing face recognition systems, we look at the problem of recognizing faces under a range of viewpoints. In particular, we consider two cases of this problem: (i) many example views are available of each person, and (ii) only one view is available per person, perhaps a driver's license or passport photograph. Ideally, we would like to address these two cases using a simple view-based approach, where a person is represented in the database by using a number of views on the viewing sphere. While the view-based approach is consistent with case (i), for case (ii) we need to augment the single real view of each person with synthetic views from other viewpoints, views we call 'virtual views'. Virtual views are generated using prior knowledge of face rotation, knowledge that is 'learned' from images of prototype faces. This prior knowledge is used to effectively rotate in depth the single real view available of each person. In this thesis, I present the view-based face recognizer, techniques for synthesizing virtual views, and experimental results using real and virtual views in the recognizer.
Resumo:
This thesis presents a learning based approach for detecting classes of objects and patterns with variable image appearance but highly predictable image boundaries. It consists of two parts. In part one, we introduce our object and pattern detection approach using a concrete human face detection example. The approach first builds a distribution-based model of the target pattern class in an appropriate feature space to describe the target's variable image appearance. It then learns from examples a similarity measure for matching new patterns against the distribution-based target model. The approach makes few assumptions about the target pattern class and should therefore be fairly general, as long as the target class has predictable image boundaries. Because our object and pattern detection approach is very much learning-based, how well a system eventually performs depends heavily on the quality of training examples it receives. The second part of this thesis looks at how one can select high quality examples for function approximation learning tasks. We propose an {em active learning} formulation for function approximation, and show for three specific approximation function classes, that the active example selection strategy learns its target with fewer data samples than random sampling. We then simplify the original active learning formulation, and show how it leads to a tractable example selection paradigm, suitable for use in many object and pattern detection problems.
Resumo:
This research project is a study of the role of fixation and visual attention in object recognition. In this project, we build an active vision system which can recognize a target object in a cluttered scene efficiently and reliably. Our system integrates visual cues like color and stereo to perform figure/ground separation, yielding candidate regions on which to focus attention. Within each image region, we use stereo to extract features that lie within a narrow disparity range about the fixation position. These selected features are then used as input to an alignment-style recognition system. We show that visual attention and fixation significantly reduce the complexity and the false identifications in model-based recognition using Alignment methods. We also demonstrate that stereo can be used effectively as a figure/ground separator without the need for accurate camera calibration.
Resumo:
Methods for fusing two computer vision methods are discussed and several example algorithms are presented to illustrate the variational method of fusing algorithms. The example algorithms seek to determine planet topography given two images taken from two different locations with two different lighting conditions. The algorithms each employ assingle cost function that combines the computer vision methods of shape-from-shading and stereo in different ways. The algorithms are closely coupled and take into account all the constraints of the photo-topography problem. The algorithms are run on four synthetic test image sets of varying difficulty.
Resumo:
Alignment is a prevalent approach for recognizing 3D objects in 2D images. A major problem with current implementations is how to robustly handle errors that propagate from uncertainties in the locations of image features. This thesis gives a technique for bounding these errors. The technique makes use of a new solution to the problem of recovering 3D pose from three matching point pairs under weak-perspective projection. Furthermore, the error bounds are used to demonstrate that using line segments for features instead of points significantly reduces the false positive rate, to the extent that alignment can remain reliable even in cluttered scenes.
Resumo:
How the visual system extracts shape information from a single grey-level image can be approached by examining how the information about shape is contained in the image. This technical report considers the characteristic equations derived by Horn as a dynamical system. Certain image critical points generate dynamical system critical points. The stable and unstable manifolds of these critical points correspond to convex and concave solution surfaces, giving more general existence and uniqueness results. A new kind of highly parallel, robust shape from shading algorithm is suggested on neighborhoods of these critical points. The information at bounding contours in the image is also analyzed.
Resumo:
This thesis addresses the problem of recognizing solid objects in the three-dimensional world, using two-dimensional shape information extracted from a single image. Objects can be partly occluded and can occur in cluttered scenes. A model based approach is taken, where stored models are matched to an image. The matching problem is separated into two stages, which employ different representations of objects. The first stage uses the smallest possible number of local features to find transformations from a model to an image. This minimizes the amount of search required in recognition. The second stage uses the entire edge contour of an object to verify each transformation. This reduces the chance of finding false matches.
Resumo:
Reconstructing a surface from sparse sensory data is a well known problem in computer vision. Early vision modules typically supply sparse depth, orientation and discontinuity information. The surface reconstruction module incorporates these sparse and possibly conflicting measurements of a surface into a consistent, dense depth map. The coupled depth/slope model developed here provides a novel computational solution to the surface reconstruction problem. This method explicitly computes dense slope representation as well as dense depth representations. This marked change from previous surface reconstruction algorithms allows a natural integration of orientation constraints into the surface description, a feature not easily incorporated into earlier algorithms. In addition, the coupled depth/ slope model generalizes to allow for varying amounts of smoothness at different locations on the surface. This computational model helps conceptualize the problem and leads to two possible implementations- analog and digital. The model can be implemented as an electrical or biological analog network since the only computations required at each locally connected node are averages, additions and subtractions. A parallel digital algorithm can be derived by using finite difference approximations. The resulting system of coupled equations can be solved iteratively on a mesh-pf-processors computer, such as the Connection Machine. Furthermore, concurrent multi-grid methods are designed to speed the convergence of this digital algorithm.
Resumo:
In this thesis we study the general problem of reconstructing a function, defined on a finite lattice from a set of incomplete, noisy and/or ambiguous observations. The goal of this work is to demonstrate the generality and practical value of a probabilistic (in particular, Bayesian) approach to this problem, particularly in the context of Computer Vision. In this approach, the prior knowledge about the solution is expressed in the form of a Gibbsian probability distribution on the space of all possible functions, so that the reconstruction task is formulated as an estimation problem. Our main contributions are the following: (1) We introduce the use of specific error criteria for the design of the optimal Bayesian estimators for several classes of problems, and propose a general (Monte Carlo) procedure for approximating them. This new approach leads to a substantial improvement over the existing schemes, both regarding the quality of the results (particularly for low signal to noise ratios) and the computational efficiency. (2) We apply the Bayesian appraoch to the solution of several problems, some of which are formulated and solved in these terms for the first time. Specifically, these applications are: teh reconstruction of piecewise constant surfaces from sparse and noisy observationsl; the reconstruction of depth from stereoscopic pairs of images and the formation of perceptual clusters. (3) For each one of these applications, we develop fast, deterministic algorithms that approximate the optimal estimators, and illustrate their performance on both synthetic and real data. (4) We propose a new method, based on the analysis of the residual process, for estimating the parameters of the probabilistic models directly from the noisy observations. This scheme leads to an algorithm, which has no free parameters, for the restoration of piecewise uniform images. (5) We analyze the implementation of the algorithms that we develop in non-conventional hardware, such as massively parallel digital machines, and analog and hybrid networks.
Resumo:
Handshape is a key articulatory parameter in sign language, and thus handshape recognition from signing video is essential for sign recognition and retrieval. Handshape transitions within monomorphemic lexical signs (the largest class of signs in signed languages) are governed by phonological rules. For example, such transitions normally involve either closing or opening of the hand (i.e., to exclusively use either folding or unfolding of the palm and one or more fingers). Furthermore, akin to allophonic variations in spoken languages, both inter- and intra- signer variations in the production of specific handshapes are observed. We propose a Bayesian network formulation to exploit handshape co-occurrence constraints, also utilizing information about allophonic variations to aid in handshape recognition. We propose a fast non-rigid image alignment method to gain improved robustness to handshape appearance variations during computation of observation likelihoods in the Bayesian network. We evaluate our handshape recognition approach on a large dataset of monomorphemic lexical signs. We demonstrate that leveraging linguistic constraints on handshapes results in improved handshape recognition accuracy. As part of the overall project, we are collecting and preparing for dissemination a large corpus (three thousand signs from three native signers) of American Sign Language (ASL) video. The video have been annotated using SignStream® [Neidle et al.] with labels for linguistic information such as glosses, morphological properties and variations, and start/end handshapes associated with each ASL sign.
Resumo:
A probabilistic, nonlinear supervised learning model is proposed: the Specialized Mappings Architecture (SMA). The SMA employs a set of several forward mapping functions that are estimated automatically from training data. Each specialized function maps certain domains of the input space (e.g., image features) onto the output space (e.g., articulated body parameters). The SMA can model ambiguous, one-to-many mappings that may yield multiple valid output hypotheses. Once learned, the mapping functions generate a set of output hypotheses for a given input via a statistical inference procedure. The SMA inference procedure incorporates an inverse mapping or feedback function in evaluating the likelihood of each of the hypothesis. Possible feedback functions include computer graphics rendering routines that can generate images for given hypotheses. The SMA employs a variant of the Expectation-Maximization algorithm for simultaneous learning of the specialized domains along with the mapping functions, and approximate strategies for inference. The framework is demonstrated in a computer vision system that can estimate the articulated pose parameters of a human’s body or hands, given silhouettes from a single image. The accuracy and stability of the SMA are also tested using synthetic images of human bodies and hands, where ground truth is known.
Resumo:
This paper introduces BoostMap, a method that can significantly reduce retrieval time in image and video database systems that employ computationally expensive distance measures, metric or non-metric. Database and query objects are embedded into a Euclidean space, in which similarities can be rapidly measured using a weighted Manhattan distance. Embedding construction is formulated as a machine learning task, where AdaBoost is used to combine many simple, 1D embeddings into a multidimensional embedding that preserves a significant amount of the proximity structure in the original space. Performance is evaluated in a hand pose estimation system, and a dynamic gesture recognition system, where the proposed method is used to retrieve approximate nearest neighbors under expensive image and video similarity measures. In both systems, BoostMap significantly increases efficiency, with minimal losses in accuracy. Moreover, the experiments indicate that BoostMap compares favorably with existing embedding methods that have been employed in computer vision and database applications, i.e., FastMap and Bourgain embeddings.
Resumo:
In many multi-camera vision systems the effect of camera locations on the task-specific quality of service is ignored. Researchers in Computational Geometry have proposed elegant solutions for some sensor location problem classes. Unfortunately, these solutions utilize unrealistic assumptions about the cameras' capabilities that make these algorithms unsuitable for many real-world computer vision applications: unlimited field of view, infinite depth of field, and/or infinite servo precision and speed. In this paper, the general camera placement problem is first defined with assumptions that are more consistent with the capabilities of real-world cameras. The region to be observed by cameras may be volumetric, static or dynamic, and may include holes that are caused, for instance, by columns or furniture in a room that can occlude potential camera views. A subclass of this general problem can be formulated in terms of planar regions that are typical of building floorplans. Given a floorplan to be observed, the problem is then to efficiently compute a camera layout such that certain task-specific constraints are met. A solution to this problem is obtained via binary optimization over a discrete problem space. In preliminary experiments the performance of the resulting system is demonstrated with different real floorplans.