8 resultados para translation invariant

em Massachusetts Institute of Technology


Relevância:

30.00% 30.00%

Publicador:

Resumo:

The HMAX model has recently been proposed by Riesenhuber & Poggio as a hierarchical model of position- and size-invariant object recognition in visual cortex. It has also turned out to model successfully a number of other properties of the ventral visual stream (the visual pathway thought to be crucial for object recognition in cortex), and particularly of (view-tuned) neurons in macaque inferotemporal cortex, the brain area at the top of the ventral stream. The original modeling study only used ``paperclip'' stimuli, as in the corresponding physiology experiment, and did not explore systematically how model units' invariance properties depended on model parameters. In this study, we aimed at a deeper understanding of the inner workings of HMAX and its performance for various parameter settings and ``natural'' stimulus classes. We examined HMAX responses for different stimulus sizes and positions systematically and found a dependence of model units' responses on stimulus position for which a quantitative description is offered. Interestingly, we find that scale invariance properties of hierarchical neural models are not independent of stimulus class, as opposed to translation invariance, even though both are affine transformations within the image plane.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In order to recognize an object in an image, we must determine the best transformation from object model to the image. In this paper, we show that for features from coplanar surfaces which undergo linear transformations in space, there exist projections invariant to the surface motions up to rotations in the image field. To use this property, we propose a new alignment approach to object recognition based on centroid alignment of corresponding feature groups. This method uses only a single pair of 2D model and data. Experimental results show the robustness of the proposed method against perturbations of feature positions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The problem of automatic face recognition is to visually identify a person in an input image. This task is performed by matching the input face against the faces of known people in a database of faces. Most existing work in face recognition has limited the scope of the problem, however, by dealing primarily with frontal views, neutral expressions, and fixed lighting conditions. To help generalize existing face recognition systems, we look at the problem of recognizing faces under a range of viewpoints. In particular, we consider two cases of this problem: (i) many example views are available of each person, and (ii) only one view is available per person, perhaps a driver's license or passport photograph. Ideally, we would like to address these two cases using a simple view-based approach, where a person is represented in the database by using a number of views on the viewing sphere. While the view-based approach is consistent with case (i), for case (ii) we need to augment the single real view of each person with synthetic views from other viewpoints, views we call 'virtual views'. Virtual views are generated using prior knowledge of face rotation, knowledge that is 'learned' from images of prototype faces. This prior knowledge is used to effectively rotate in depth the single real view available of each person. In this thesis, I present the view-based face recognizer, techniques for synthesizing virtual views, and experimental results using real and virtual views in the recognizer.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Machine translation has been a particularly difficult problem in the area of Natural Language Processing for over two decades. Early approaches to translation failed since interaction effects of complex phenomena in part made translation appear to be unmanageable. Later approaches to the problem have succeeded (although only bilingually), but are based on many language-specific rules of a context-free nature. This report presents an alternative approach to natural language translation that relies on principle-based descriptions of grammar rather than rule-oriented descriptions. The model that has been constructed is based on abstract principles as developed by Chomsky (1981) and several other researchers working within the "Government and Binding" (GB) framework. Thus, the grammar is viewed as a modular system of principles rather than a large set of ad hoc language-specific rules.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this report, a face recognition system that is capable of detecting and recognizing frontal and rotated faces was developed. Two face recognition methods focusing on the aspect of pose invariance are presented and evaluated - the whole face approach and the component-based approach. The main challenge of this project is to develop a system that is able to identify faces under different viewing angles in realtime. The development of such a system will enhance the capability and robustness of current face recognition technology. The whole-face approach recognizes faces by classifying a single feature vector consisting of the gray values of the whole face image. The component-based approach first locates the facial components and extracts them. These components are normalized and combined into a single feature vector for classification. The Support Vector Machine (SVM) is used as the classifier for both approaches. Extensive tests with respect to the robustness against pose changes are performed on a database that includes faces rotated up to about 40 degrees in depth. The component-based approach clearly outperforms the whole-face approach on all tests. Although this approach isproven to be more reliable, it is still too slow for real-time applications. That is the reason why a real-time face recognition system using the whole-face approach is implemented to recognize people in color video sequences.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Human object recognition is generally considered to tolerate changes of the stimulus position in the visual field. A number of recent studies, however, have cast doubt on the completeness of translation invariance. In a new series of experiments we tried to investigate whether positional specificity of short-term memory is a general property of visual perception. We tested same/different discrimination of computer graphics models that were displayed at the same or at different locations of the visual field, and found complete translation invariance, regardless of the similarity of the animals and irrespective of direction and size of the displacement (Exp. 1 and 2). Decisions were strongly biased towards same decisions if stimuli appeared at a constant location, while after translation subjects displayed a tendency towards different decisions. Even if the spatial order of animal limbs was randomized ("scrambled animals"), no deteriorating effect of shifts in the field of view could be detected (Exp. 3). However, if the influence of single features was reduced (Exp. 4 and 5) small but significant effects of translation could be obtained. Under conditions that do not reveal an influence of translation, rotation in depth strongly interferes with recognition (Exp. 6). Changes of stimulus size did not reduce performance (Exp. 7). Tolerance to these object transformations seems to rely on different brain mechanisms, with translation and scale invariance being achieved in principle, while rotation invariance is not.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Local descriptors are increasingly used for the task of object recognition because of their perceived robustness with respect to occlusions and to global geometrical deformations. Such a descriptor--based on a set of oriented Gaussian derivative filters-- is used in our recognition system. We report here an evaluation of several techniques for orientation estimation to achieve rotation invariance of the descriptor. We also describe feature selection based on a single training image. Virtual images are generated by rotating and rescaling the image and robust features are selected. The results confirm robust performance in cluttered scenes, in the presence of partial occlusions, and when the object is embedded in different backgrounds.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present a new method to perform reliable matching between different images. This method exploits a projective invariant property between concentric circles and the corresponding projected ellipses to find complete region correspondences centered on interest points. The method matches interest points allowing for a full perspective transformation and exploiting all the available luminance information in the regions. Experiments have been conducted on many different data sets to compare our approach to SIFT local descriptors. The results show the new method offers increased robustness to partial visibility, object rotation in depth, and viewpoint angle change.