916 resultados para Human vision
Resumo:
Tese de dout., Engenharia Electrónica e de Computadores, Faculdade de Ciência e Tecnologia, Universidade do Algarve, 2007
Resumo:
The perceived displacement of motion-defined contours in peripheral vision was examined in four experiments. In Experiment 1, in line with Ramachandran and Anstis' finding [Ramachandran, V. S., & Anstis, S. M. (1990). Illusory displacement of equiluminous kinetic edges. Perception, 19, 611-616], the border between a field of drifting dots and a static dot pattern was apparently displaced in the same direction as the movement of the dots. When a uniform dark area was substituted for the static dots, a similar displacement was found, but this was smaller and statistically insignificant. In Experiment 2, the border between two fields of dots moving in opposite directions was displaced in the direction of motion of the dots in the more eccentric field, so that the location of a boundary defined by a diverging pattern is perceived as more eccentric, and that defined by a converging as less eccentric. Two explanations for this effect (that the displacement reflects a greater weight given to the more eccentric motion, or that the region containing stronger centripetal motion components expands perceptually into that containing centrifugal motion) were tested in Experiment 3, by varying the velocity of the more eccentric region. The results favoured the explanation based on the expansion of an area in centripetal motion. Experiment 4 showed that the difference in perceived location was unlikely to be due to differences in the discriminability of contours in diverging and converging pattems, and confirmed that this effect is due to a difference between centripetal and centrifugal motion rather than motion components in other directions. Our result provides new evidence for a bias towards centripetal motion in human vision, and suggests that the direction of motion-induced displacement of edges is not always in the direction of an adjacent moving pattern. (C) 2008 Elsevier Ltd. All rights reserved.
Resumo:
The neural basis for perceptual grouping operations in the human visual system, including the processes which generate illusory contours, is fundamental to understanding human vision. We have employed functional magnetic resonance imaging to investigate these processes noninvasively. Images were acquired on a GE Signa 1.5T scanner equipped for echo planar imaging with an in-plane resolution of 1.5 x 1.5 mm and slice thicknesses of 3.0 or 5.0 mm. Visual stimuli included nonaligned inducers (pacmen) that created no perceptual contours, similar inducers at the corners of a Kanizsa square that created illusory contours, and a real square formed by continuous contours. Multiple contiguous axial slices were acquired during baseline, visual stimulation, and poststimulation periods. Activated regions were identified by a multistage statistical analysis of the activation for each volume element sampled and were compared across conditions. Specific brain regions were activated in extrastriate cortex when the illusory contours were perceived but not during conditions when the illusory contours were absent. These unique regions were found primarily in the right hemisphere for all four subjects and demonstrate that specific brain regions are activated during the kind of perceptual grouping operations involved in illusory contour perception.
Resumo:
A fundamental problem for any visual system with binocular overlap is the combination of information from the two eyes. Electrophysiology shows that binocular integration of luminance contrast occurs early in visual cortex, but a specific systems architecture has not been established for human vision. Here, we address this by performing binocular summation and monocular, binocular, and dichoptic masking experiments for horizontal 1 cycle per degree test and masking gratings. These data reject three previously published proposals, each of which predict too little binocular summation and insufficient dichoptic facilitation. However, a simple development of one of the rejected models (the twin summation model) and a completely new model (the two-stage model) provide very good fits to the data. Two features common to both models are gently accelerating (almost linear) contrast transduction prior to binocular summation and suppressive ocular interactions that contribute to contrast gain control. With all model parameters fixed, both models correctly predict (1) systematic variation in psychometric slopes, (2) dichoptic contrast matching, and (3) high levels of binocular summation for various levels of binocular pedestal contrast. A review of evidence from elsewhere leads us to favor the two-stage model. © 2006 ARVO.
Functional neuroimaging and behavioural studies on global form processing in the human visual system
Resumo:
Magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI) and behavioural experiments were used to investigate the neural processes underlying global form perception in human vision. Behavioural studies using Glass patterns examined sensitivity for detecting radial, rotational and horizontal structure. Neuroimaging experiments using either Glass patterns or arrays of Gabor patches determined the spatio-temporal neural responseto global form. MEG data were analysed using synthetic aperture magnetometry (SAM) to spatially map event-related cortical oscillatory power changes: the temporal sequencing of activity within a discrete cortical area was determined using a Morlet wavelet transform. A case study was conducted to determine the effects of strbismic amblyopia on global form processing: all other observers were normally-sighted. The main findings from normally-sighted observers were: 1) sensitivity to horizontal structure was less than for radial or rotational structure; 2) the neural response to global structure was a reduction in cortical oscillatory power (10-30 Hz) within a network of extrastriate areas, including V4 and V3a; 3) the extend of reduced cortical power was least for horizontal patters; 4) V1 was not identified as a region of peak activity with either MEG or fMRI. The main findings with the strabismic amblyope were: 1) sensitivity for detection of radial, rotational, and horizontal structure was reduced when viewed with the amblyopic- relative to the fellow- eye; 2) cortical power changes within V4 to the presentation of rotational Glass patterns were less when viewed with the amblyopic- compared with the fellow- eye. The main conclusions are: 1) a network of extrastriate cortical areas are involved in the analysis of global form, with the most prominent change in neural activity being a reduction in oscillatory power within the 10-30 Hz band; 2) in strabismic amblyopia, the neuronal assembly associated with form perception in extrastriate cortex may be dysfunctional, the nature of this dysfunction may be a change in the normal temporal pattern of neuronal discharges; 3) MEG, fMRI and behavioural measures support the notion that different neural processes underlie the perception of horizontal as opposed to radial or rotational structure.
Resumo:
The main goal of this research is to design an efficient compression al~ gorithm for fingerprint images. The wavelet transform technique is the principal tool used to reduce interpixel redundancies and to obtain a parsimonious representation for these images. A specific fixed decomposition structure is designed to be used by the wavelet packet in order to save on the computation, transmission, and storage costs. This decomposition structure is based on analysis of information packing performance of several decompositions, two-dimensional power spectral density, effect of each frequency band on the reconstructed image, and the human visual sensitivities. This fixed structure is found to provide the "most" suitable representation for fingerprints, according to the chosen criteria. Different compression techniques are used for different subbands, based on their observed statistics. The decision is based on the effect of each subband on the reconstructed image according to the mean square criteria as well as the sensitivities in human vision. To design an efficient quantization algorithm, a precise model for distribution of the wavelet coefficients is developed. The model is based on the generalized Gaussian distribution. A least squares algorithm on a nonlinear function of the distribution model shape parameter is formulated to estimate the model parameters. A noise shaping bit allocation procedure is then used to assign the bit rate among subbands. To obtain high compression ratios, vector quantization is used. In this work, the lattice vector quantization (LVQ) is chosen because of its superior performance over other types of vector quantizers. The structure of a lattice quantizer is determined by its parameters known as truncation level and scaling factor. In lattice-based compression algorithms reported in the literature the lattice structure is commonly predetermined leading to a nonoptimized quantization approach. In this research, a new technique for determining the lattice parameters is proposed. In the lattice structure design, no assumption about the lattice parameters is made and no training and multi-quantizing is required. The design is based on minimizing the quantization distortion by adapting to the statistical characteristics of the source in each subimage. 11 Abstract Abstract Since LVQ is a multidimensional generalization of uniform quantizers, it produces minimum distortion for inputs with uniform distributions. In order to take advantage of the properties of LVQ and its fast implementation, while considering the i.i.d. nonuniform distribution of wavelet coefficients, the piecewise-uniform pyramid LVQ algorithm is proposed. The proposed algorithm quantizes almost all of source vectors without the need to project these on the lattice outermost shell, while it properly maintains a small codebook size. It also resolves the wedge region problem commonly encountered with sharply distributed random sources. These represent some of the drawbacks of the algorithm proposed by Barlaud [26). The proposed algorithm handles all types of lattices, not only the cubic lattices, as opposed to the algorithms developed by Fischer [29) and Jeong [42). Furthermore, no training and multiquantizing (to determine lattice parameters) is required, as opposed to Powell's algorithm [78). For coefficients with high-frequency content, the positive-negative mean algorithm is proposed to improve the resolution of reconstructed images. For coefficients with low-frequency content, a lossless predictive compression scheme is used to preserve the quality of reconstructed images. A method to reduce bit requirements of necessary side information is also introduced. Lossless entropy coding techniques are subsequently used to remove coding redundancy. The algorithms result in high quality reconstructed images with better compression ratios than other available algorithms. To evaluate the proposed algorithms their objective and subjective performance comparisons with other available techniques are presented. The quality of the reconstructed images is important for a reliable identification. Enhancement and feature extraction on the reconstructed images are also investigated in this research. A structural-based feature extraction algorithm is proposed in which the unique properties of fingerprint textures are used to enhance the images and improve the fidelity of their characteristic features. The ridges are extracted from enhanced grey-level foreground areas based on the local ridge dominant directions. The proposed ridge extraction algorithm, properly preserves the natural shape of grey-level ridges as well as precise locations of the features, as opposed to the ridge extraction algorithm in [81). Furthermore, it is fast and operates only on foreground regions, as opposed to the adaptive floating average thresholding process in [68). Spurious features are subsequently eliminated using the proposed post-processing scheme.
Resumo:
Time-varying bispectra, computed using a classical sliding window short-time Fourier approach, are analyzed for scalp EEG potentials evoked by an auditory stimulus and new observations are presented. A single, short duration tone is presented from the left or the right, direction unknown to the test subject. The subject responds by moving the eyes to the direction of the sound. EEG epochs sampled at 200 Hz for repeated trials are processed between -70 ms and +1200 ms with reference to the stimulus. It is observed that for an ensemble of correctly recognized cases, the best matching timevarying bispectra at (8 Hz, 8Hz) are for PZ-FZ channels and this is also largely the case for grand averages but not for power spectra at 8 Hz. Out of 11 subjects, the only exception for time-varying bispectral match was a subject with family history of Alzheimer’s disease and the difference was in bicoherence, not biphase.
Resumo:
The time consuming and labour intensive task of identifying individuals in surveillance video is often challenged by poor resolution and the sheer volume of stored video. Faces or identifying marks such as tattoos are often too coarse for direct matching by machine or human vision. Object tracking and super-resolution can then be combined to facilitate the automated detection and enhancement of areas of interest. The object tracking process enables the automatic detection of people of interest, greatly reducing the amount of data for super-resolution. Smaller regions such as faces can also be tracked. A number of instances of such regions can then be utilized to obtain a super-resolved version for matching. Performance improvement from super-resolution is demonstrated using a face verification task. It is shown that there is a consistent improvement of approximately 7% in verification accuracy, using both Eigenface and Elastic Bunch Graph Matching approaches for automatic face verification, starting from faces with an eye to eye distance of 14 pixels. Visual improvement in image fidelity from super-resolved images over low-resolution and interpolated images is demonstrated on a small database. Current research and future directions in this area are also summarized.
Resumo:
Theoretical foundations of higher order spectral analysis are revisited to examine the use of time-varying bicoherence on non-stationary signals using a classical short-time Fourier approach. A methodology is developed to apply this to evoked EEG responses where a stimulus-locked time reference is available. Short-time windowed ensembles of the response at the same offset from the reference are considered as ergodic cyclostationary processes within a non-stationary random process. Bicoherence can be estimated reliably with known levels at which it is significantly different from zero and can be tracked as a function of offset from the stimulus. When this methodology is applied to multi-channel EEG, it is possible to obtain information about phase synchronization at different regions of the brain as the neural response develops. The methodology is applied to analyze evoked EEG response to flash visual stimulii to the left and right eye separately. The EEG electrode array is segmented based on bicoherence evolution with time using the mean absolute difference as a measure of dissimilarity. Segment maps confirm the importance of the occipital region in visual processing and demonstrate a link between the frontal and occipital regions during the response. Maps are constructed using bicoherence at bifrequencies that include the alpha band frequency of 8Hz as well as 4 and 20Hz. Differences are observed between responses from the left eye and the right eye, and also between subjects. The methodology shows potential as a neurological functional imaging technique that can be further developed for diagnosis and monitoring using scalp EEG which is less invasive and less expensive than magnetic resonance imaging.
Resumo:
At present, the most reliable method to obtain end-user perceived quality is through subjective tests. In this paper, the impact of automatic region-of-interest (ROI) coding on perceived quality of mobile video is investigated. The evidence, which is based on perceptual comparison analysis, shows that the coding strategy improves perceptual quality. This is particularly true in low bit rate situations. The ROI detection method used in this paper is based on two approaches: - (1) automatic ROI by analyzing the visual contents automatically, and; - (2) eye-tracking based ROI by aggregating eye-tracking data across many users, used to both evaluate the accuracy of automatic ROI detection and the subjective quality of automatic ROI encoded video. The perceptual comparison analysis is based on subjective assessments with 54 participants, across different content types, screen resolutions, and target bit rates while comparing the two ROI detection methods. The results from the user study demonstrate that ROI-based video encoding has higher perceived quality compared to normal video encoded at a similar bit rate, particularly in the lower bit rate range.
Resumo:
Rotations in depth are challenging for object vision because features can appear, disappear, be stretched or compressed. Yet we easily recognize objects across views. Are the underlying representations view invariant or dependent? This question has been intensely debated in human vision, but the neuronal representations remain poorly understood. Here, we show that for naturalistic objects, neurons in the monkey inferotemporal (IT) cortex undergo a dynamic transition in time, whereby they are initially sensitive to viewpoint and later encode view-invariant object identity. This transition depended on two aspects of object structure: it was strongest when objects foreshortened strongly across views and were similar to each other. View invariance in IT neurons was present even when objects were reduced to silhouettes, suggesting that it can arise through similarity between external contours of objects across views. Our results elucidate the viewpoint debate by showing that view invariance arises dynamically in IT neurons out of a representation that is initially view dependent.
Resumo:
Hyper-spectral data allows the construction of more robust statistical models to sample the material properties than the standard tri-chromatic color representation. However, because of the large dimensionality and complexity of the hyper-spectral data, the extraction of robust features (image descriptors) is not a trivial issue. Thus, to facilitate efficient feature extraction, decorrelation techniques are commonly applied to reduce the dimensionality of the hyper-spectral data with the aim of generating compact and highly discriminative image descriptors. Current methodologies for data decorrelation such as principal component analysis (PCA), linear discriminant analysis (LDA), wavelet decomposition (WD), or band selection methods require complex and subjective training procedures and in addition the compressed spectral information is not directly related to the physical (spectral) characteristics associated with the analyzed materials. The major objective of this article is to introduce and evaluate a new data decorrelation methodology using an approach that closely emulates the human vision. The proposed data decorrelation scheme has been employed to optimally minimize the amount of redundant information contained in the highly correlated hyper-spectral bands and has been comprehensively evaluated in the context of non-ferrous material classification