984 results for NATURAL IMAGES


Relevance:

100.00%

Publisher:

Abstract:

The paradigm of computational vision hypothesizes that any visual function, such as the recognition of your grandparent, can be replicated by computational processing of the visual input. What are these computations that the brain performs? What should or could they be? Working on the latter question, this dissertation takes the statistical approach, in which we attempt to learn suitable computations from the natural visual data itself. In particular, we empirically study the computational processing that emerges from the statistical properties of the visual world and the constraints and objectives specified for the learning process. This thesis consists of an introduction and 7 peer-reviewed publications, where the purpose of the introduction is to illustrate the area of study to a reader who is not familiar with computational vision research. In the introduction, we briefly review the primary challenges to visual processing and recall some of the current opinions on visual processing in the early visual systems of animals. Next, we describe the methodology we have used in our research, and discuss the presented results. We have included in this discussion some additional remarks, speculations and conclusions that were not featured in the original publications. We present the following results in the publications of this thesis. First, we empirically demonstrate that luminance and contrast are strongly dependent in natural images, contradicting previous theories suggesting that luminance and contrast were processed separately in natural systems due to their independence in the visual data. Second, we show that simple-cell-like receptive fields of the primary visual cortex can be learned in the nonlinear contrast domain by maximization of independence.
Further, we provide first-time reports of the emergence of conjunctive (corner-detecting) and subtractive (opponent orientation) processing due to nonlinear projection pursuit with simple objective functions related to sparseness and response energy optimization. Then, we show that attempting to extract independent components of nonlinear histogram statistics of a biologically plausible representation leads to projection directions that appear to differentiate between visual contexts. Such processing might be applicable for priming, i.e., the selection and tuning of later visual processing. We continue by showing that a different kind of thresholded low-frequency priming can be learned and used to make object detection faster with little loss in accuracy. Finally, we show that in a computational object detection setting, nonlinearly gain-controlled visual features of medium complexity can be acquired sequentially as images are encountered and discarded. We present two online algorithms to perform this feature selection, and propose the idea that for artificial systems, some processing mechanisms could be selectable from the environment without optimizing the mechanisms themselves. In summary, this thesis explores learning visual processing on several levels. The learning can be understood as an interplay of input data, model structures, learning objectives, and estimation algorithms. The presented work adds to the growing body of evidence showing that statistical methods can be used to acquire intuitively meaningful visual processing mechanisms. The work also presents some predictions and ideas regarding biological visual processing.

Relevance:

100.00%

Publisher:

Abstract:

This study develops a neuromorphic model of human lightness perception that is inspired by how the mammalian visual system is designed for this function. It is known that biological visual representations can adapt to a billion-fold change in luminance. How such a system determines absolute lightness under varying illumination conditions to generate a consistent interpretation of surface lightness remains an unsolved problem. Such a process, called "anchoring" of lightness, has properties including articulation, insulation, configuration, and area effects. The model quantitatively simulates such psychophysical lightness data, as well as other data such as discounting the illuminant, the double brilliant illusion, and lightness constancy and contrast effects. The model retina embodies gain control at retinal photoreceptors, and spatial contrast adaptation at the negative feedback circuit between mechanisms that model the inner segment of photoreceptors and interacting horizontal cells. The model can thereby adjust its sensitivity to input intensities ranging from dim moonlight to dazzling sunlight. A new anchoring mechanism, called the Blurred-Highest-Luminance-As-White (BHLAW) rule, helps simulate how surface lightness becomes sensitive to the spatial scale of objects in a scene. The model is also able to process natural color images under variable lighting conditions, and is compared with the popular RETINEX model.

Relevance:

100.00%

Publisher:

Abstract:

The paper reports an interactive tool for calibrating a camera, suitable for use in outdoor scenes. The motivation for the tool was the need to obtain an approximate calibration for images taken with no explicit calibration data. Such images are frequently presented to research laboratories, especially in surveillance applications, with a request to demonstrate algorithms. The method decomposes the calibration parameters into intuitively simple components, and relies on the operator interactively adjusting the parameter settings to achieve a visually acceptable agreement between a rectilinear calibration model and his own perception of the scene. Using the tool, we have been able to calibrate images of unknown scenes, taken with unknown cameras, in a matter of minutes. The standard of calibration has proved to be sufficient for model-based pose recovery and tracking of vehicles.

Relevance:

100.00%

Publisher:

Abstract:

Ecological approaches to perception have demonstrated that information encoding by the visual system is informed by the natural environment, both in terms of simple image attributes like luminance and contrast, and more complex relationships corresponding to Gestalt principles of perceptual organization. Here, we ask if this optimization biases perception of visual inputs that are perceptually bistable. Using the binocular rivalry paradigm, we designed stimuli that varied in either their spatiotemporal amplitude spectra or their phase spectra. We found that noise stimuli with “natural” amplitude spectra (i.e., amplitude content proportional to 1/f, where f is spatial or temporal frequency) dominate over those with any other systematic spectral slope, along both spatial and temporal dimensions. This could not be explained by perceived contrast measurements, and occurred even though all stimuli had equal energy. Calculating the effective contrast following attenuation by a model contrast sensitivity function suggested that the strong contrast dependency of rivalry provides the mechanism by which binocular vision is optimized for viewing natural images. We also compared rivalry between natural and phase-scrambled images and found a strong preference for natural phase spectra that could not be accounted for by observer biases in a control task. We propose that this phase specificity relates to contour information, and arises either from the activity of V1 complex cells, or from later visual areas, consistent with recent neuroimaging and single-cell work. Our findings demonstrate that human vision integrates information across space, time, and phase to select the input most likely to hold behavioral relevance.
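
The "natural" 1/f amplitude spectrum mentioned in this abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' stimulus-generation code: the function names, the naive DFT, the uniform random phases, and the RMS normalisation (standing in for "equal energy") are all assumptions made for illustration.

```python
import cmath
import math
import random


def power_law_noise(n, slope=1.0, seed=0):
    """Return n real samples whose amplitude spectrum falls as 1 / f**slope."""
    rng = random.Random(seed)
    spectrum = [0j] * n  # the DC term stays zero, so the signal has zero mean
    for k in range(1, n // 2 + 1):
        amplitude = 1.0 / k ** slope
        if 2 * k == n:
            # The Nyquist bin must be real for the signal to be real-valued.
            spectrum[k] = complex(amplitude, 0.0)
        else:
            phase = rng.uniform(0.0, 2.0 * math.pi)
            spectrum[k] = cmath.rect(amplitude, phase)
            # Hermitian symmetry keeps the inverse transform real.
            spectrum[n - k] = spectrum[k].conjugate()
    # Naive inverse DFT (O(n^2)); adequate for a short illustrative signal.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def amplitude_spectrum(signal):
    """Magnitudes of the DFT for frequencies 0 .. n/2."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def normalise_rms(signal):
    """Scale the signal to unit RMS so that different slopes have equal energy."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return [x / rms for x in signal]


natural = normalise_rms(power_law_noise(64, slope=1.0, seed=1))
steeper = normalise_rms(power_law_noise(64, slope=2.0, seed=1))
```

With `slope=1.0`, doubling the frequency halves the amplitude. Other slopes give the "systematic spectral slopes" that the abstract compares against, while the normalisation keeps total energy matched across conditions.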

Relevance:

100.00%

Publisher:

Abstract:

Fourier-phase information is important in determining the appearance of natural scenes, but the structure of natural-image phase spectra is highly complex and difficult to relate directly to human perceptual processes. This problem is addressed by extending previous investigations of human visual sensitivity to the randomisation and quantisation of Fourier phase in natural images. The salience of the image changes induced by these physical processes is shown to depend critically on the nature of the original phase spectrum of each image, and the processes of randomisation and quantisation are shown to be perceptually equivalent provided that they shift image phase components by the same average amount. These results are explained by assuming that the visual system is sensitive to those phase-domain image changes which also alter certain global higher-order image statistics. This assumption may be used to place constraints on the likely nature of cortical processing: mechanisms which correlate the outputs of a bank of relative-phase-sensitive units are found to be consistent with the patterns of sensitivity reported here.
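
The idea that randomisation and quantisation can shift phase "by the same average amount" can be made concrete with a small sketch. It operates directly on a list of phase angles rather than on an image. The uniform-offset model of randomisation and all function names are assumptions for illustration, not the procedure used in the study.

```python
import math
import random


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def quantise_phase(phase, levels):
    """Snap a phase angle to the nearest of `levels` equally spaced values."""
    step = 2.0 * math.pi / levels
    return wrap(round(phase / step) * step)


def randomise_phase(phase, max_shift, rng):
    """Add a uniform random offset drawn from [-max_shift, max_shift]."""
    return wrap(phase + rng.uniform(-max_shift, max_shift))


def mean_abs_shift(original, altered):
    """Average absolute phase change, measured around the circle."""
    return sum(abs(wrap(a - o)) for o, a in zip(original, altered)) / len(original)


rng = random.Random(0)
phases = [rng.uniform(-math.pi, math.pi) for _ in range(20000)]
quantised = [quantise_phase(p, 4) for p in phases]
randomised = [randomise_phase(p, math.pi / 4, rng) for p in phases]
```

For uniformly distributed phases, quantising to 4 levels and randomising with offsets up to π/4 both have an expected mean absolute shift of π/8. Under this toy model the two manipulations are matched in the sense the abstract describes.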

Relevance:

70.00%

Publisher:

Abstract:

In this paper, we describe a method for feature extraction and classification of characters manually isolated from scene or natural images. Characters in a scene image may be affected by low resolution, uneven illumination or occlusion. We propose a novel method to perform binarization on gray scale images by minimizing an energy functional. Discrete Cosine Transform and Angular Radial Transform are used to extract the features from characters after normalization for scale and translation. We have evaluated our method on the complete test set of the Chars74k dataset for English and Kannada scripts, consisting of handwritten and synthesized characters as well as characters extracted from camera-captured images. We utilize only synthesized and handwritten characters from this dataset as the training set. Nearest neighbor classification is used in our experiments.
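
The combination of transform-domain features and nearest neighbor classification can be sketched as follows. This is a toy illustration only: it uses low-frequency DCT coefficients, omits the Angular Radial Transform, binarization and normalization steps, and the function names and the 4x4 example patterns are hypothetical.

```python
import math


def dct2(block):
    """Unnormalised 2-D DCT-II of a square grayscale block (list of rows)."""
    n = len(block)
    basis = [
        [math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
        for u in range(n)
    ]
    return [
        [
            sum(block[y][x] * basis[v][y] * basis[u][x]
                for y in range(n) for x in range(n))
            for u in range(n)
        ]
        for v in range(n)
    ]


def low_frequency_features(block, keep=3):
    """Keep the keep x keep lowest-frequency DCT coefficients as a feature vector."""
    coeffs = dct2(block)
    return [coeffs[v][u] for v in range(keep) for u in range(keep)]


def nearest_neighbour(features, training_set):
    """Return the label of the closest training example (squared Euclidean distance)."""
    def distance(item):
        return sum((a - b) ** 2 for a, b in zip(features, item[0]))
    return min(training_set, key=distance)[1]


# Hypothetical 4x4 "characters": a vertical bar and a horizontal bar.
vertical = [[0, 1, 1, 0] for _ in range(4)]
horizontal = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
training = [
    (low_frequency_features(vertical), "vertical"),
    (low_frequency_features(horizontal), "horizontal"),
]

# A slightly degraded vertical bar, standing in for a camera-captured sample.
query = [[0, 1, 0.8, 0]] + [[0, 1, 1, 0] for _ in range(3)]
label = nearest_neighbour(low_frequency_features(query), training)
```

Training on clean patterns and querying with a degraded one loosely mirrors the abstract's setup, in which only synthesized and handwritten characters are used for training.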

Relevance:

70.00%

Publisher:

Abstract:

Humans recognize optical reflectance properties of surfaces such as metal, plastic, or paper from a single image without knowledge of illumination. We develop a machine vision system to perform similar recognition tasks automatically. Reflectance estimation under unknown, arbitrary illumination proves highly underconstrained due to the variety of potential illumination distributions and surface reflectance properties. We have found that the spatial structure of real-world illumination possesses some of the statistical regularities observed in the natural image statistics literature. A human or computer vision system may be able to exploit this prior information to determine the most likely surface reflectance given an observed image. We develop an algorithm for reflectance classification under unknown real-world illumination, which learns relationships between surface reflectance and certain features (statistics) computed from a single observed image. We also develop an automatic feature selection method.

Relevance:

70.00%

Publisher:

Abstract:

Under natural viewing conditions, small movements of the eye, head, and body prevent the maintenance of a steady direction of gaze. It is known that stimuli tend to fade when they are stabilized on the retina for several seconds. However, it is unclear whether the physiological motion of the retinal image serves a visual purpose during the brief periods of natural visual fixation. This study examines the impact of fixational instability on the statistics of the visual input to the retina and on the structure of neural activity in the early visual system. We show that fixational instability introduces a component in the retinal input signals that, in the presence of natural images, lacks spatial correlations. This component strongly influences neural activity in a model of the LGN. It decorrelates cell responses even if the contrast sensitivity functions of simulated cells are not perfectly tuned to counterbalance the power-law spectrum of natural images. A decorrelation of neural activity at the early stages of the visual system has been proposed to be beneficial for discarding statistical redundancies in the input signals. The results of this study suggest that fixational instability might contribute to establishing efficient representations of natural stimuli.

Relevance:

70.00%

Publisher:

Abstract:

Texture provides one cue for identifying the physical cause of an intensity edge, such as occlusion, shadow, surface orientation or reflectance change. Marr, Julesz, and others have proposed that texture is represented by small lines or blobs, called 'textons' by Julesz [1981a], together with their attributes, such as orientation, elongation, and intensity. Psychophysical studies suggest that texture boundaries are perceived where distributions of attributes over neighborhoods of textons differ significantly. However, these studies, which deal with synthetic images, neglect to consider two important questions: How can these textons be extracted from images of natural scenes? And how, exactly, are texture boundaries then found? This thesis proposes answers to these questions by presenting an algorithm for computing blobs from natural images and a statistic for measuring the difference between two sample distributions of blob attributes. As part of the blob detection algorithm, methods for estimating image noise are presented, which are applicable to edge detection as well.

Relevance:

70.00%

Publisher:

Abstract:

Scale features are useful for a great number of applications in computer vision. However, parametric methods struggle to accommodate the diversity of features in natural scenes. Empirical studies show that object frequencies and segment sizes follow power-law distributions, which Pitman-Yor (PY) processes reproduce well. Based on mid-level segments, we propose a hierarchical sequence of images to obtain scale information, stored in a hierarchical structure through the hierarchical Pitman-Yor (HPY) model, which is expected to tolerate the uncertainty of natural images. We also evaluate our representation by applying it to segmentation.
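
To illustrate why Pitman-Yor processes suit power-law data, here is a minimal sketch of the standard Chinese-restaurant sampling scheme for a single (non-hierarchical) PY process. It is not the HPY model of the paper. The function name and parameter defaults are assumptions, and the parameters are assumed to satisfy 0 <= discount < 1 and concentration > 0.

```python
import random


def pitman_yor_table_sizes(n_customers, discount=0.5, concentration=1.0, seed=0):
    """Sample cluster sizes from the Chinese-restaurant form of a PY process."""
    rng = random.Random(seed)
    sizes = []
    for n in range(n_customers):
        # A new cluster opens with probability
        # (concentration + discount * K) / (n + concentration),
        # where K is the current number of clusters.
        p_new = (concentration + discount * len(sizes)) / (n + concentration)
        if rng.random() < p_new:
            sizes.append(1)
        else:
            # An existing cluster k is chosen with weight (size_k - discount),
            # so large clusters grow faster ("rich get richer").
            weights = [s - discount for s in sizes]
            k = rng.choices(range(len(sizes)), weights=weights)[0]
            sizes[k] += 1
    return sorted(sizes, reverse=True)


segment_sizes = pitman_yor_table_sizes(2000, discount=0.5, concentration=1.0, seed=42)
```

With a positive discount, the sampler tends to produce a few very large clusters and a long tail of small ones, the heavy-tailed behaviour reported for object frequencies and segment sizes. Setting `discount=0` reduces it to a Dirichlet process, which lacks the power-law tail.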

Relevance:

70.00%

Publisher:

Abstract:

To understand how the human visual system analyzes images, it is essential to know the structure of the visual environment. In particular, natural images display consistent statistical properties that distinguish them from random luminance distributions. We have studied the geometric regularities of oriented elements (edges or line segments) present in an ensemble of visual scenes, asking how much information the presence of a segment in a particular location of the visual scene carries about the presence of a second segment at different relative positions and orientations. We observed strong long-range correlations in the distribution of oriented segments that extend over the whole visual field. We further show that a very simple geometric rule, cocircularity, predicts the arrangement of segments in natural scenes, and that different geometrical arrangements show relevant differences in their scaling properties. Our results show similarities to geometric features of previous physiological and psychophysical studies. We discuss the implications of these findings for theories of early vision.
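
The cocircularity rule can be stated compactly: two oriented segments are cocircular when both are tangent to a common circle, which holds when their orientations sum to twice the angle of the line joining them (modulo π). The sketch below measures the deviation from that condition. The function name and the choice of deviation measure are assumptions for illustration, not the statistic used in the paper.

```python
import math


def cocircularity_deviation(p1, theta1, p2, theta2):
    """Angular deviation (radians, in [0, pi/2]) from perfect cocircularity.

    p1 and p2 are (x, y) positions; theta1 and theta2 are segment
    orientations in radians, defined modulo pi.
    """
    # Direction of the line joining the two segment centres.
    phi = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    # Tangents to a common circle satisfy theta1 + theta2 = 2 * phi (mod pi).
    residual = (theta1 + theta2 - 2.0 * phi) % math.pi
    return min(residual, math.pi - residual)


# Two tangents to the unit circle, at polar angles 0.3 and 1.7 radians.
t1, t2 = 0.3, 1.7
on_circle = cocircularity_deviation(
    (math.cos(t1), math.sin(t1)), t1 + math.pi / 2,
    (math.cos(t2), math.sin(t2)), t2 + math.pi / 2,
)

# A horizontal segment next to a vertical one: maximally non-cocircular.
perpendicular = cocircularity_deviation((0.0, 0.0), 0.0, (1.0, 0.0), math.pi / 2)
```

Collinear segments are the limiting case of an infinitely large circle and also give a deviation of zero.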

Relevance:

60.00%

Publisher:

Abstract:

What can the statistical structure of natural images teach us about the human brain? Even though the visual cortex is one of the most studied parts of the brain, surprisingly little is known about how exactly images are processed to leave us with a coherent percept of the world around us, so we can recognize a friend or drive on a crowded street without any effort. By constructing probabilistic models of natural images, the goal of this thesis is to understand the structure of the stimulus that is the raison d'être for the visual system. Following the hypothesis that the optimal processing has to be matched to the structure of that stimulus, we attempt to derive computational principles, features that the visual system should compute, and properties that cells in the visual system should have. Starting from machine learning techniques such as principal component analysis and independent component analysis, we construct a variety of statistical models to discover structure in natural images that can be linked to receptive field properties of neurons in primary visual cortex such as simple and complex cells. We show that by representing images with phase-invariant, complex-cell-like units, a better statistical description of the visual environment is obtained than with linear simple cell units, and that complex cell pooling can be learned by estimating both layers of a two-layer model of natural images. We investigate how a simplified model of the processing in the retina, where adaptation and contrast normalization take place, is connected to the natural stimulus statistics. Analyzing the effect that retinal gain control has on later cortical processing, we propose a novel method to perform gain control in a data-driven way. Finally, we show how models like those presented here can be extended to capture whole visual scenes rather than just small image patches.
By using a Markov random field approach we can model images of arbitrary size, while still being able to estimate the model parameters from the data.

Relevance:

60.00%

Publisher:

Abstract:

Computational models of visual cortex, and in particular those based on sparse coding, have enjoyed much recent attention. Despite this currency, the question of how sparse or how over-complete a sparse representation should be has gone without a principled answer. Here, we use Bayesian model-selection methods to address these questions for a sparse-coding model based on a Student-t prior. Having validated our methods on toy data, we find that natural images are indeed best modelled by extremely sparse distributions, although for the Student-t prior the associated optimal basis size is only modestly over-complete.

Relevance:

60.00%

Publisher:

Abstract:

This paper presents a new region-based unified tensor level set model for image segmentation. The model introduces a third-order tensor to comprehensively depict features of pixels, e.g., gray value and local geometrical features such as orientation and gradient. Then, by defining a weighted distance, we generalize the representative region-based level set method from scalar to tensor. The proposed model has four main advantages over the traditional representative method. First, by involving a Gaussian filter bank, the model is robust against noise, particularly salt-and-pepper noise. Second, by considering local geometrical features, e.g., orientation and gradient, the model pays more attention to boundaries and makes the evolving curve stop more easily at the boundary location. Third, owing to the unified tensor representation of pixels, the model segments images more accurately and naturally. Fourth, based on the weighted distance definition, the model can cope with data ranging from scalars to vectors to high-order tensors. We apply the proposed method to synthetic, medical, and natural images, and the results suggest that it is superior to the available representative region-based level set method.