14 resultados para Morphing Alteration Detection Image Warping
em Massachusetts Institute of Technology
Resumo:
The ability to detect faces in images is of critical ecological significance. It is a pre-requisite for other important face perception tasks such as person identification, gender classification and affect analysis. Here we address the question of how the visual system classifies images into face and non-face patterns. We focus on face detection in impoverished images, which allow us to explore information thresholds required for different levels of performance. Our experimental results provide lower bounds on image resolution needed for reliable discrimination between face and non-face patterns and help characterize the nature of facial representations used by the visual system under degraded viewing conditions. Specifically, they enable an evaluation of the contribution of luminance contrast, image orientation and local context on face-detection performance.
Resumo:
We present a new method for rendering novel images of flexible 3D objects from a small number of example images in correspondence. The strength of the method is the ability to synthesize images whose viewing position is significantly far away from the viewing cone of the example images ("view extrapolation"), yet without ever modeling the 3D structure of the scene. The method relies on synthesizing a chain of "trilinear tensors" that governs the warping function from the example images to the novel image, together with a multi-dimensional interpolation function that synthesizes the non-rigid motions of the viewed object from the virtual camera position. We show that two closely spaced example images alone are sufficient in practice to synthesize a significant viewing cone, thus demonstrating the ability of representing an object by a relatively small number of model images --- for the purpose of cheap and fast viewers that can run on standard hardware.
Resumo:
Certain salient structures in images attract our immediate attention without requiring a systematic scan. We present a method for computing saliency by a simple iterative scheme, using a uniform network of locally connected processing elements. The network uses an optimization approach to produce a "saliency map," a representation of the image emphasizing salient locations. The main properties of the network are: (i) the computations are simple and local, (ii) globally salient structures emerge with a small number of iterations, and (iii) as a by-product of the computations, contours are smoothed and gaps are filled in.
Resumo:
We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (run-time) computational complexity, and the (training-time) sample complexity, scales linearly with the number of classes to be detected. It seems unlikely that such an approach will scale up to allow recognition of hundreds or thousands of objects. We present a multi-class boosting procedure (joint boosting) that reduces the computational and sample complexity, by finding common features that can be shared across the classes (and/or views). The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required, and therefore the computational cost, is observed to scale approximately logarithmically with the number of classes. The features selected jointly are closer to edges and generic features typical of many natural structures instead of finding specific object parts. Those generic features generalize better and reduce considerably the computational cost of an algorithm for multi-class object detection.
Resumo:
We seek to both detect and segment objects in images. To exploit both local image data as well as contextual information, we introduce Boosted Random Fields (BRFs), which uses Boosting to learn the graph structure and local evidence of a conditional random field (CRF). The graph structure is learned by assembling graph fragments in an additive model. The connections between individual pixels are not very informative, but by using dense graphs, we can pool information from large regions of the image; dense models also support efficient inference. We show how contextual information from other objects can improve detection performance, both in terms of accuracy and speed, by using a computational cascade. We apply our system to detect stuff and things in office and street scenes.
Resumo:
This thesis presents a learning based approach for detecting classes of objects and patterns with variable image appearance but highly predictable image boundaries. It consists of two parts. In part one, we introduce our object and pattern detection approach using a concrete human face detection example. The approach first builds a distribution-based model of the target pattern class in an appropriate feature space to describe the target's variable image appearance. It then learns from examples a similarity measure for matching new patterns against the distribution-based target model. The approach makes few assumptions about the target pattern class and should therefore be fairly general, as long as the target class has predictable image boundaries. Because our object and pattern detection approach is very much learning-based, how well a system eventually performs depends heavily on the quality of training examples it receives. The second part of this thesis looks at how one can select high quality examples for function approximation learning tasks. We propose an {em active learning} formulation for function approximation, and show for three specific approximation function classes, that the active example selection strategy learns its target with fewer data samples than random sampling. We then simplify the original active learning formulation, and show how it leads to a tractable example selection paradigm, suitable for use in many object and pattern detection problems.
Resumo:
Information representation is a critical issue in machine vision. The representation strategy in the primitive stages of a vision system has enormous implications for the performance in subsequent stages. Existing feature extraction paradigms, like edge detection, provide sparse and unreliable representations of the image information. In this thesis, we propose a novel feature extraction paradigm. The features consist of salient, simple parts of regions bounded by zero-crossings. The features are dense, stable, and robust. The primary advantage of the features is that they have abstract geometric attributes pertaining to their size and shape. To demonstrate the utility of the feature extraction paradigm, we apply it to passive navigation. We argue that the paradigm is applicable to other early vision problems.
Resumo:
This report describes the implementation of a theory of edge detection, proposed by Marr and Hildreth (1979). According to this theory, the image is first processed independently through a set of different size filters, whose shape is the Laplacian of a Gaussian, ***. Zero-crossings in the output of these filters mark the positions of intensity changes at different resolutions. Information about these zero-crossings is then used for deriving a full symbolic description of changes in intensity in the image, called the raw primal sketch. The theory is closely tied with early processing in the human visual systems. In this report, we first examine the critical properties of the initial filters used in the edge detection process, both from a theoretical and practical standpoint. The implementation is then used as a test bed for exploring aspects of the human visual system; in particular, acuity and hyperacuity. Finally, we present some preliminary results concerning the relationship between zero-crossings detected at different resolutions, and some observations relevant to the process by which the human visual system integrates descriptions of intensity changes obtained at different resolutions.
Resumo:
This thesis explores how to represent image texture in order to obtain information about the geometry and structure of surfaces, with particular emphasis on locating surface discontinuities. Theoretical and psychophysical results lead to the following conclusions for the representation of image texture: (1) A texture edge primitive is needed to identify texture change contours, which are formed by an abrupt change in the 2-D organization of similar items in an image. The texture edge can be used for locating discontinuities in surface structure and surface geometry and for establishing motion correspondence. (2) Abrupt changes in attributes that vary with changing surface geometry ??ientation, density, length, and width ??ould be used to identify discontinuities in surface geometry and surface structure. (3) Texture tokens are needed to separate the effects of different physical processes operating on a surface. They represent the local structure of the image texture. Their spatial variation can be used in the detection of texture discontinuities and texture gradients, and their temporal variation may be used for establishing motion correspondence. What precisely constitutes the texture tokens is unknown; it appears, however, that the intensity changes alone will not suffice, but local groupings of them may. (4) The above primitives need to be assigned rapidly over a large range in an image.
Resumo:
We present a set of techniques that can be used to represent and detect shapes in images. Our methods revolve around a particular shape representation based on the description of objects using triangulated polygons. This representation is similar to the medial axis transform and has important properties from a computational perspective. The first problem we consider is the detection of non-rigid objects in images using deformable models. We present an efficient algorithm to solve this problem in a wide range of situations, and show examples in both natural and medical images. We also consider the problem of learning an accurate non-rigid shape model for a class of objects from examples. We show how to learn good models while constraining them to the form required by the detection algorithm. Finally, we consider the problem of low-level image segmentation and grouping. We describe a stochastic grammar that generates arbitrary triangulated polygons while capturing Gestalt principles of shape regularity. This grammar is used as a prior model over random shapes in a low level algorithm that detects objects in images.
Resumo:
In this report, a face recognition system that is capable of detecting and recognizing frontal and rotated faces was developed. Two face recognition methods focusing on the aspect of pose invariance are presented and evaluated - the whole face approach and the component-based approach. The main challenge of this project is to develop a system that is able to identify faces under different viewing angles in realtime. The development of such a system will enhance the capability and robustness of current face recognition technology. The whole-face approach recognizes faces by classifying a single feature vector consisting of the gray values of the whole face image. The component-based approach first locates the facial components and extracts them. These components are normalized and combined into a single feature vector for classification. The Support Vector Machine (SVM) is used as the classifier for both approaches. Extensive tests with respect to the robustness against pose changes are performed on a database that includes faces rotated up to about 40 degrees in depth. The component-based approach clearly outperforms the whole-face approach on all tests. Although this approach isproven to be more reliable, it is still too slow for real-time applications. That is the reason why a real-time face recognition system using the whole-face approach is implemented to recognize people in color video sequences.
Resumo:
We present an example-based learning approach for locating vertical frontal views of human faces in complex scenes. The technique models the distribution of human face patterns by means of a few view-based "face'' and "non-face'' prototype clusters. At each image location, the local pattern is matched against the distribution-based model, and a trained classifier determines, based on the local difference measurements, whether or not a human face exists at the current image location. We provide an analysis that helps identify the critical components of our system.
Resumo:
We present a trainable system for detecting frontal and near-frontal views of faces in still gray images using Support Vector Machines (SVMs). We first consider the problem of detecting the whole face pattern by a single SVM classifer. In this context we compare different types of image features, present and evaluate a new method for reducing the number of features and discuss practical issues concerning the parameterization of SVMs and the selection of training data. The second part of the paper describes a component-based method for face detection consisting of a two-level hierarchy of SVM classifers. On the first level, component classifers independently detect components of a face, such as the eyes, the nose, and the mouth. On the second level, a single classifer checks if the geometrical configuration of the detected components in the image matches a geometrical model of a face.
Resumo:
There is general consensus that context can be a rich source of information about an object's identity, location and scale. In fact, the structure of many real-world scenes is governed by strong configurational rules akin to those that apply to a single object. Here we introduce a simple probabilistic framework for modeling the relationship between context and object properties based on the correlation between the statistics of low-level features across the entire scene and the objects that it contains. The resulting scheme serves as an effective procedure for object priming, context driven focus of attention and automatic scale-selection on real-world scenes.