678 results for Human vision
in Queensland University of Technology - ePrints Archive
Abstract:
Modern computer graphics systems are able to construct renderings of such high quality that viewers are deceived into regarding the images as coming from a photographic source. Large amounts of computing resources are expended in this rendering process, using complex mathematical models of lighting and shading. However, psychophysical experiments have revealed that viewers regard only certain informative regions within a presented image. Furthermore, it has been shown that these visually important regions contain low-level visual feature differences that attract the attention of the viewer. This thesis will present a new approach to image synthesis that exploits these experimental findings by modulating the spatial quality of image regions by their visual importance. Efficiency gains are therefore reaped, without sacrificing much of the perceived quality of the image. Two tasks must be undertaken to achieve this goal: first, the design of an appropriate region-based model of visual importance, and second, the modification of progressive rendering techniques to effect an importance-based rendering approach. A rule-based fuzzy logic model is presented that computes, using spatial feature differences, the relative visual importance of regions in an image. This model improves upon previous work by incorporating threshold effects induced by global feature difference distributions and by using texture concentration measures. A modified approach to progressive ray-tracing is also presented. This new approach uses the visual importance model to guide the progressive refinement of an image. In addition, this concept of visual importance has been incorporated into supersampling, texture mapping and computer animation techniques. Experimental results are presented, illustrating the efficiency gains reaped from using this method of progressive rendering.
This visual importance-based rendering approach is expected to have applications in the entertainment industry, where image fidelity may be sacrificed for efficiency purposes, as long as the overall visual impression of the scene is maintained. Different aspects of the approach should find many other applications in image compression, image retrieval, progressive data transmission and active robotic vision.
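As a rough illustration of modulating rendering effort by visual importance, the sketch below splits a fixed sample budget across image regions in proportion to an importance score. It is only a minimal sketch: the function name `allocate_samples`, the proportional rule and the per-region minimum are assumptions made for illustration, not the fuzzy logic model or the progressive ray-tracing scheme of the thesis.

```python
def allocate_samples(importance, total_samples, min_per_region=1):
    """Split a sample budget across regions in proportion to importance.

    `importance` holds one non-negative score per region (hypothetical input;
    the thesis derives such scores from a fuzzy logic model not shown here).
    """
    n = len(importance)
    if total_samples < n * min_per_region:
        raise ValueError("budget too small for the per-region minimum")
    spare = total_samples - n * min_per_region
    total_importance = sum(importance)
    if total_importance <= 0:
        # No region stands out, so share the spare budget evenly.
        shares = [spare / n] * n
    else:
        shares = [spare * w / total_importance for w in importance]
    counts = [min_per_region + int(s) for s in shares]
    # Hand out samples lost to rounding, largest fractional remainder first.
    leftover = total_samples - sum(counts)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

A renderer could call this once per refinement pass, so that regions with higher scores receive more rays while every region still receives at least the minimum.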
Abstract:
The main goal of this research is to design an efficient compression algorithm for fingerprint images. The wavelet transform technique is the principal tool used to reduce interpixel redundancies and to obtain a parsimonious representation for these images. A specific fixed decomposition structure is designed to be used by the wavelet packet in order to save on the computation, transmission, and storage costs. This decomposition structure is based on analysis of information packing performance of several decompositions, two-dimensional power spectral density, effect of each frequency band on the reconstructed image, and the human visual sensitivities. This fixed structure is found to provide the "most" suitable representation for fingerprints, according to the chosen criteria. Different compression techniques are used for different subbands, based on their observed statistics. The decision is based on the effect of each subband on the reconstructed image according to the mean square criterion as well as the sensitivities in human vision. To design an efficient quantization algorithm, a precise model for the distribution of the wavelet coefficients is developed. The model is based on the generalized Gaussian distribution. A least squares algorithm on a nonlinear function of the distribution model shape parameter is formulated to estimate the model parameters. A noise shaping bit allocation procedure is then used to assign the bit rate among subbands. To obtain high compression ratios, vector quantization is used. In this work, the lattice vector quantization (LVQ) is chosen because of its superior performance over other types of vector quantizers. The structure of a lattice quantizer is determined by its parameters known as truncation level and scaling factor. In lattice-based compression algorithms reported in the literature, the lattice structure is commonly predetermined, leading to a nonoptimized quantization approach.
In this research, a new technique for determining the lattice parameters is proposed. In the lattice structure design, no assumption about the lattice parameters is made and no training or multi-quantizing is required. The design is based on minimizing the quantization distortion by adapting to the statistical characteristics of the source in each subimage. Since LVQ is a multidimensional generalization of uniform quantizers, it produces minimum distortion for inputs with uniform distributions. In order to take advantage of the properties of LVQ and its fast implementation, while considering the i.i.d. nonuniform distribution of wavelet coefficients, the piecewise-uniform pyramid LVQ algorithm is proposed. The proposed algorithm quantizes almost all source vectors without the need to project them onto the outermost lattice shell, while it properly maintains a small codebook size. It also resolves the wedge region problem commonly encountered with sharply distributed random sources. These represent some of the drawbacks of the algorithm proposed by Barlaud [26]. The proposed algorithm handles all types of lattices, not only the cubic lattices, as opposed to the algorithms developed by Fischer [29] and Jeong [42]. Furthermore, no training or multi-quantizing (to determine lattice parameters) is required, as opposed to Powell's algorithm [78]. For coefficients with high-frequency content, the positive-negative mean algorithm is proposed to improve the resolution of reconstructed images. For coefficients with low-frequency content, a lossless predictive compression scheme is used to preserve the quality of reconstructed images. A method to reduce bit requirements of necessary side information is also introduced. Lossless entropy coding techniques are subsequently used to remove coding redundancy. The algorithms result in high quality reconstructed images with better compression ratios than other available algorithms.
To evaluate the proposed algorithms, their objective and subjective performance comparisons with other available techniques are presented. The quality of the reconstructed images is important for a reliable identification. Enhancement and feature extraction on the reconstructed images are also investigated in this research. A structural-based feature extraction algorithm is proposed in which the unique properties of fingerprint textures are used to enhance the images and improve the fidelity of their characteristic features. The ridges are extracted from enhanced grey-level foreground areas based on the local ridge dominant directions. The proposed ridge extraction algorithm properly preserves the natural shape of grey-level ridges as well as precise locations of the features, as opposed to the ridge extraction algorithm in [81]. Furthermore, it is fast and operates only on foreground regions, as opposed to the adaptive floating average thresholding process in [68]. Spurious features are subsequently eliminated using the proposed post-processing scheme.
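To make the generalized Gaussian modelling step more concrete, the sketch below estimates the shape parameter of a zero-mean generalized Gaussian, with density proportional to exp(-(|x|/alpha)^beta), from a set of coefficients. The abstract does not give its least-squares formulation, so this sketch substitutes a simpler moment-matching estimate: it matches the sample ratio E|x| / sqrt(E[x^2]) to its theoretical value Gamma(2/beta) / sqrt(Gamma(1/beta) Gamma(3/beta)). The function names and the search interval are assumptions.

```python
import math


def ggd_ratio(beta):
    """E|x| / sqrt(E[x^2]) for a zero-mean generalized Gaussian with shape beta."""
    return math.gamma(2.0 / beta) / math.sqrt(
        math.gamma(1.0 / beta) * math.gamma(3.0 / beta)
    )


def estimate_ggd_shape(samples, lo=0.1, hi=10.0, iters=60):
    """Moment-matching estimate of the shape parameter (not the thesis's method).

    The search interval [lo, hi] is an assumed range; results are clamped to it.
    """
    mean_abs = sum(abs(x) for x in samples) / len(samples)
    mean_sq = sum(x * x for x in samples) / len(samples)
    if mean_sq == 0.0:
        raise ValueError("samples are all zero; shape is undefined")
    target = mean_abs / math.sqrt(mean_sq)
    # ggd_ratio increases with beta, so bisection on [lo, hi] finds the match.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ggd_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under this parameterization a shape of 2 corresponds to a Gaussian and a shape of 1 to a Laplacian, so sharply peaked subband coefficients would be expected to give estimates below 2.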
Abstract:
Time-varying bispectra, computed using a classical sliding window short-time Fourier approach, are analyzed for scalp EEG potentials evoked by an auditory stimulus and new observations are presented. A single, short duration tone is presented from the left or the right, direction unknown to the test subject. The subject responds by moving the eyes to the direction of the sound. EEG epochs sampled at 200 Hz for repeated trials are processed between -70 ms and +1200 ms with reference to the stimulus. It is observed that for an ensemble of correctly recognized cases, the best matching time-varying bispectra at (8 Hz, 8 Hz) are for PZ-FZ channels and this is also largely the case for grand averages but not for power spectra at 8 Hz. Out of 11 subjects, the only exception for time-varying bispectral match was a subject with family history of Alzheimer’s disease and the difference was in bicoherence, not biphase.
Abstract:
The time consuming and labour intensive task of identifying individuals in surveillance video is often challenged by poor resolution and the sheer volume of stored video. Faces or identifying marks such as tattoos are often too coarse for direct matching by machine or human vision. Object tracking and super-resolution can then be combined to facilitate the automated detection and enhancement of areas of interest. The object tracking process enables the automatic detection of people of interest, greatly reducing the amount of data for super-resolution. Smaller regions such as faces can also be tracked. A number of instances of such regions can then be utilized to obtain a super-resolved version for matching. Performance improvement from super-resolution is demonstrated using a face verification task. It is shown that there is a consistent improvement of approximately 7% in verification accuracy, using both Eigenface and Elastic Bunch Graph Matching approaches for automatic face verification, starting from faces with an eye to eye distance of 14 pixels. Visual improvement in image fidelity from super-resolved images over low-resolution and interpolated images is demonstrated on a small database. Current research and future directions in this area are also summarized.
Abstract:
Theoretical foundations of higher order spectral analysis are revisited to examine the use of time-varying bicoherence on non-stationary signals using a classical short-time Fourier approach. A methodology is developed to apply this to evoked EEG responses where a stimulus-locked time reference is available. Short-time windowed ensembles of the response at the same offset from the reference are considered as ergodic cyclostationary processes within a non-stationary random process. Bicoherence can be estimated reliably with known levels at which it is significantly different from zero and can be tracked as a function of offset from the stimulus. When this methodology is applied to multi-channel EEG, it is possible to obtain information about phase synchronization at different regions of the brain as the neural response develops. The methodology is applied to analyze evoked EEG response to flash visual stimuli to the left and right eye separately. The EEG electrode array is segmented based on bicoherence evolution with time using the mean absolute difference as a measure of dissimilarity. Segment maps confirm the importance of the occipital region in visual processing and demonstrate a link between the frontal and occipital regions during the response. Maps are constructed using bicoherence at bifrequencies that include the alpha band frequency of 8 Hz as well as 4 and 20 Hz. Differences are observed between responses from the left eye and the right eye, and also between subjects. The methodology shows potential as a neurological functional imaging technique that can be further developed for diagnosis and monitoring using scalp EEG which is less invasive and less expensive than magnetic resonance imaging.
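The ensemble bicoherence estimate described above can be sketched for a single bifrequency as follows. This is a minimal sketch under stated assumptions: each epoch is a short window taken at the same offset from the stimulus, a Hann window is applied, and the squared bicoherence uses one common normalization, |sum X(k1) X(k2) X*(k1+k2)|^2 / (sum |X(k1) X(k2)|^2 * sum |X(k1+k2)|^2). The abstract does not state which normalization or window the authors use, and the function names are hypothetical.

```python
import cmath
import math


def dft_bin(x, k):
    """Single DFT coefficient at bin k of a Hann-windowed real sequence."""
    n = len(x)
    total = 0j
    for t, v in enumerate(x):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1))  # Hann window
        total += w * v * cmath.exp(-2j * math.pi * k * t / n)
    return total


def bicoherence(epochs, k1, k2):
    """Squared bicoherence at DFT bins (k1, k2) over an ensemble of epochs.

    Values lie in [0, 1]; values near 1 suggest consistent phase coupling
    between bins k1, k2 and k1 + k2 across the ensemble.
    """
    num = 0j
    p12 = 0.0
    p3 = 0.0
    for x in epochs:
        a, b, c = dft_bin(x, k1), dft_bin(x, k2), dft_bin(x, k1 + k2)
        num += a * b * c.conjugate()
        p12 += abs(a * b) ** 2
        p3 += abs(c) ** 2
    if p12 == 0.0 or p3 == 0.0:
        return 0.0
    return abs(num) ** 2 / (p12 * p3)
```

For a window of `n` samples at sampling rate `fs`, a frequency `f` maps to bin `k = f * n / fs`. Repeating the estimate for windows at successive offsets from the stimulus would give bicoherence as a function of time.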
Abstract:
At present, the most reliable method to obtain end-user perceived quality is through subjective tests. In this paper, the impact of automatic region-of-interest (ROI) coding on perceived quality of mobile video is investigated. The evidence, which is based on perceptual comparison analysis, shows that the coding strategy improves perceptual quality. This is particularly true in low bit rate situations. The ROI detection method used in this paper is based on two approaches: (1) automatic ROI detection, which analyzes the visual content, and (2) eye-tracking based ROI detection, which aggregates eye-tracking data across many users and is used to evaluate both the accuracy of automatic ROI detection and the subjective quality of automatic ROI encoded video. The perceptual comparison analysis is based on subjective assessments with 54 participants, across different content types, screen resolutions, and target bit rates while comparing the two ROI detection methods. The results from the user study demonstrate that ROI-based video encoding has higher perceived quality compared to normal video encoded at a similar bit rate, particularly in the lower bit rate range.
Abstract:
For robots to operate in human environments they must be able to make their own maps because it is unrealistic to expect a user to enter a map into the robot’s memory; existing floorplans are often incorrect; and human environments tend to change. Traditionally robots have used sonar, infra-red or laser range finders to perform the mapping task. Digital cameras have become very cheap in recent years and they have opened up new possibilities as a sensor for robot perception. Any robot that must interact with humans can reasonably be expected to have a camera for tasks such as face recognition, so it makes sense to also use the camera for navigation. Cameras have advantages over other sensors such as colour information (not available with any other sensor), better immunity to noise (compared to sonar), and not being restricted to operating in a plane (like laser range finders). However, there are disadvantages too, with the principal one being the effect of perspective. This research investigated ways to use a single colour camera as a range sensor to guide an autonomous robot and allow it to build a map of its environment, a process referred to as Simultaneous Localization and Mapping (SLAM). An experimental system was built using a robot controlled via a wireless network connection. Using the on-board camera as the only sensor, the robot successfully explored and mapped indoor office environments. The quality of the resulting maps is comparable to those that have been reported in the literature for sonar or infra-red sensors. Although the maps are not as accurate as ones created with a laser range finder, the solution using a camera is significantly cheaper and is more appropriate for toys and early domestic robots.
Abstract:
Automatic detection of suspicious activities in CCTV camera feeds is crucial to the success of video surveillance systems. Such a capability can help transform the dumb CCTV cameras into smart surveillance tools for fighting crime and terror. Learning and classification of basic human actions is a precursor to detecting suspicious activities. Most of the current approaches rely on a non-realistic assumption that a complete dataset of normal human actions is available. This paper presents a different approach to deal with the problem of understanding human actions in video when no prior information is available. This is achieved by working with an incomplete dataset of basic actions which is continuously updated. Initially, all video segments are represented by the Bag-of-Words (BoW) method using only Term Frequency-Inverse Document Frequency (TF-IDF) features. Then, a data-stream clustering algorithm is applied for updating the system's knowledge from the incoming video feeds. Finally, all the actions are classified into different sets. Experiments and comparisons are conducted on the well-known Weizmann and KTH datasets to show the efficacy of the proposed approach.
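The TF-IDF weighting of a Bag-of-Words representation can be sketched as below. This is a minimal sketch, assuming each video segment has already been quantized into a list of visual-word IDs (the feature extraction and codebook are not described in the abstract) and using the plain variant tf = count / segment length, idf = log(N / document frequency). The paper may use a different variant, and the function name `tfidf` is hypothetical.

```python
import math
from collections import Counter


def tfidf(segments):
    """Return one {word: weight} dict per segment of visual-word IDs."""
    n = len(segments)
    # Document frequency: in how many segments each word appears at least once.
    df = Counter()
    for words in segments:
        df.update(set(words))
    vectors = []
    for words in segments:
        counts = Counter(words)
        total = len(words)
        vectors.append(
            {w: (c / total) * math.log(n / df[w]) for w, c in counts.items()}
        )
    return vectors
```

With this variant, a word that occurs in every segment receives weight zero, so only words that discriminate between segments contribute to a later clustering step.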
Abstract:
This paper describes the current status of a program to develop an automated forced landing system for a fixed-wing Unmanned Aerial Vehicle (UAV). This automated system seeks to emulate human pilot thought processes when planning for and conducting an engine-off emergency landing. Firstly, a path planning algorithm that extends Dubins curves to 3D space is presented. This planning element is then combined with a nonlinear guidance and control logic, and simulated test results demonstrate the robustness of this approach to strong winds during a glided descent. The average path deviation errors incurred are comparable to or even better than those of manned, powered aircraft. Secondly, a study into suitable multi-criteria decision making approaches and the problems that confront the decision-maker is presented. From this study, it is believed that decision processes that utilize human expert knowledge and fuzzy logic reasoning are most suited to the problem at hand, and further investigations will be conducted to identify the particular technique(s) to be implemented in simulations and field tests. The automated UAV forced landing approach presented in this paper is promising, and will allow the progression of this technology from the development and simulation stages through to a prototype system.
Abstract:
Purpose: To determine the subbasal nerve density and tortuosity at 5 corneal locations and to investigate whether these microstructural observations correlate with corneal sensitivity. Method: Sixty eyes of 60 normal human subjects were recruited into 1 of 3 age groups, group 1: aged <35 years, group 2: aged 35–50 years, and group 3: aged >50 years. All eyes were examined using slit-lamp biomicroscopy, noncontact corneal esthesiometry, and slit scanning in vivo confocal microscopy. Results: The mean subbasal nerve density and the mean corneal sensitivity were greatest centrally (14,731 ± 6056 μm/mm² and 0.38 ± 0.21 millibars, respectively) and lowest in the nasal mid periphery (7850 ± 4947 μm/mm² and 0.49 ± 0.25 millibars, respectively). The mean subbasal nerve tortuosity coefficient was greatest in the temporal mid periphery (27.3 ± 6.4) and lowest in the superior mid periphery (19.3 ± 14.1). There was no significant difference in mean total subbasal nerve density between age groups. However, corneal sensation (P = 0.001) and subbasal nerve tortuosity (P = 0.004) demonstrated significant differences between age groups. Subbasal nerve density only showed significant correlations with corneal sensitivity threshold in the temporal cornea and with subbasal nerve tortuosity in the inferior and nasal cornea. However, these correlations were weak. Conclusions: This study quantitatively analyzes living human corneal nerve structure and an aspect of nerve function. There is no strong correlation between subbasal nerve density and corneal sensation. This study provides useful baseline data for the normal living human cornea at central and mid-peripheral locations.
Abstract:
Investigated human visual processing of simple two-colour patterns using a delayed match to sample paradigm with positron emission tomography (PET). This study is unique in that the authors specifically designed the visual stimuli to be the same for both pattern and colour recognition with all patterns being abstract shapes not easily verbally coded composed of two-colour combinations. The authors did this to explore those brain regions required for both colour and pattern processing and to separate those areas of activation required for one or the other. 10 right-handed male volunteers aged 18–35 yrs were recruited. The authors found that both tasks activated similar occipital regions, the major difference being more extensive activation in pattern recognition. A right-sided network that involved the inferior parietal lobule, the head of the caudate nucleus, and the pulvinar nucleus of the thalamus was common to both paradigms. Pattern recognition also activated the left temporal pole and right lateral orbital gyrus, whereas colour recognition activated the left fusiform gyrus and several right frontal regions.
Abstract:
We developed orthogonal least-squares techniques for fitting crystalline lens shapes, and used the bootstrap method to determine uncertainties associated with the estimated vertex radii of curvature and asphericities of five different models. Three existing models were investigated including one that uses two separate conics for the anterior and posterior surfaces, and two whole lens models based on a modulated hyperbolic cosine function and on a generalized conic function. Two new models were proposed including one that uses two interdependent conics and a polynomial based whole lens model. The models were used to describe the in vitro shape for a data set of twenty human lenses with ages 7–82 years. The two-conic-surface model (7 mm zone diameter) and the interdependent surfaces model had significantly lower merit functions than the other three models for the data set, indicating that most likely they can describe human lens shape over a wide age range better than the other models (although with the two-conic-surfaces model being unable to describe the lens equatorial region). Considerable differences were found between some models regarding estimates of radii of curvature and surface asphericities. The hyperbolic cosine model and the new polynomial based whole lens model had the best precision in determining the radii of curvature and surface asphericities across the five considered models. Most models found significant increase in anterior, but not posterior, radius of curvature with age. Most models found a wide scatter of asphericities, but with the asphericities usually being positive and not significantly related to age. As the interdependent surfaces model had lower merit function than three whole lens models, there is further scope to develop an accurate model of the complete shape of human lenses of all ages. The results highlight the continued difficulty in selecting an appropriate model for the crystalline lens shape.
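For readers unfamiliar with the conic surface description, the sketch below evaluates the sag of a conicoid from a vertex radius of curvature and an asphericity. It assumes the common convention z = c r^2 / (1 + sqrt(1 - (1 + Q) c^2 r^2)) with c = 1/R, in which Q = 0 gives a sphere and Q = -1 a paraboloid. The abstract does not say which asphericity convention the authors use, so the formula and the function name `conic_sag` are assumptions, and the orthogonal least-squares fitting itself is not shown.

```python
import math


def conic_sag(r, radius, q):
    """Sag z(r) of a conicoid with vertex radius `radius` and asphericity `q`.

    Assumed convention: q = 0 is a sphere, q = -1 a paraboloid.
    """
    c = 1.0 / radius  # vertex curvature
    inside = 1.0 - (1.0 + q) * c * c * r * r
    if inside < 0.0:
        raise ValueError("r lies outside the surface for this radius and asphericity")
    return c * r * r / (1.0 + math.sqrt(inside))
```

A two-conic model of the kind mentioned above would use one such surface for the anterior lens profile and another for the posterior profile, each with its own radius and asphericity.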