962 results for Pixel
Abstract:
In this paper we use a sequence-based visual localization algorithm to reveal surprising answers to the question of how much visual information is actually needed to conduct effective navigation. The algorithm actively searches for the best local image matches within a sliding window of short route segments or 'sub-routes', and matches sub-routes by searching for coherent sequences of local image matches. In contrast to many existing techniques, the technique requires no pre-training or camera parameter calibration. We compare the algorithm's performance to the state-of-the-art FAB-MAP 2.0 algorithm on a 70 km benchmark dataset. Performance matches or exceeds the state-of-the-art feature-based localization technique using images as small as 4 pixels, fields of view reduced by a factor of 250, and pixel bit depths reduced to 2 bits. We present further results demonstrating the system localizing in an office environment with near 100% precision using two 7-bit Lego light sensors, as well as using 16 and 32 pixel images from a motorbike race and a mountain rally car stage. By demonstrating how little image information is required to achieve localization along a route, we hope to stimulate future 'low fidelity' approaches to visual navigation that complement probabilistic feature-based techniques.
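As an illustration of the sequence-matching idea described above, the following is a minimal NumPy sketch (hypothetical function and array names, not the authors' implementation): each query frame in a short sub-route is compared to every reference frame by sum of absolute differences on heavily downsampled images, and the best route offset is the coherent, constant-velocity sequence of local matches with the lowest total cost.

```python
import numpy as np

def difference_matrix(query, reference):
    """SAD between every pair of tiny query/reference frames."""
    q = query.reshape(len(query), -1).astype(float)
    r = reference.reshape(len(reference), -1).astype(float)
    return np.abs(q[:, None, :] - r[None, :, :]).sum(axis=2)

def best_matching_offset(diff, seq_len=10):
    """Find the constant-velocity (slope-1) sequence of local matches with the
    lowest summed cost; returns the reference index aligned with the start of
    the query sub-route."""
    n_q, n_r = diff.shape
    seq_len = min(seq_len, n_q)
    best_cost, best_start = np.inf, -1
    for start in range(n_r - seq_len + 1):
        cost = sum(diff[i, start + i] for i in range(seq_len))
        if cost < best_cost:
            best_cost, best_start = cost, start
    return best_start, best_cost

# Toy usage with 4-pixel (2 x 2) frames: a noisy revisit of reference frames 40-49.
rng = np.random.default_rng(0)
reference = rng.random((100, 2, 2))
query = reference[40:50] + 0.05 * rng.random((10, 2, 2))
print(best_matching_offset(difference_matrix(query, reference)))  # start index near 40
```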
In the pursuit of effective affective computing: the relationship between features and registration
Abstract:
For facial expression recognition systems to be applicable in the real world, they need to be able to detect and track a previously unseen person's face and its facial movements accurately in realistic environments. A highly plausible solution involves performing a "dense" form of alignment, where 60-70 fiducial facial points are tracked with high accuracy. The problem is that, in practice, this type of dense alignment had so far been impossible to achieve in a generic sense, mainly due to poor reliability and robustness. Instead, many expression detection methods have opted for a "coarse" form of face alignment, followed by an application of a biologically inspired appearance descriptor such as the histogram of oriented gradients or Gabor magnitudes. Encouragingly, recent advances to a number of dense alignment algorithms have demonstrated both high reliability and accuracy for unseen subjects [e.g., constrained local models (CLMs)]. This begs the question: Aside from countering illumination variation, what do these appearance descriptors do that standard pixel representations do not? In this paper, we show that, when close to perfect alignment is obtained, there is no real benefit in employing these different appearance-based representations (under consistent illumination conditions). In fact, when misalignment does occur, we show that these appearance descriptors do work well by encoding robustness to alignment error. For this work, we compared two popular methods for dense alignment (subject-dependent active appearance models versus subject-independent CLMs) on the task of action-unit detection. These comparisons were conducted through a battery of experiments across various publicly available data sets (i.e., CK+, Pain, M3, and GEMEP-FERA). We also report our performance in the recent 2011 Facial Expression Recognition and Analysis Challenge for the subject-independent task.
Abstract:
Traditional area-based matching techniques make use of similarity metrics such as the Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD) and Normalised Cross Correlation (NCC). Non-parametric matching algorithms such as the rank and census transforms rely on the relative ordering of pixel values rather than the pixel values themselves as a similarity measure. Both traditional area-based and non-parametric stereo matching techniques have an algorithmic structure which is amenable to fast hardware realisation. This investigation undertakes a performance assessment of these two families of algorithms for robustness to radiometric distortion and random noise. A generic implementation framework is presented for the stereo matching problem, and the relative hardware requirements for the various metrics are investigated.
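For reference, the matching costs named above can be written directly in NumPy; the census variant below is a simplified single-window sketch rather than the hardware-oriented formulation assessed in the paper.

```python
import numpy as np

def sad(a, b):
    """Sum of Absolute Differences between two equally sized windows."""
    return np.abs(a.astype(float) - b.astype(float)).sum()

def ssd(a, b):
    """Sum of Squared Differences."""
    d = a.astype(float) - b.astype(float)
    return (d * d).sum()

def ncc(a, b):
    """Normalised Cross Correlation (1.0 indicates a perfect match)."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    return (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())

def census(window):
    """Census transform: bit vector of centre-versus-neighbour comparisons."""
    centre = window[window.shape[0] // 2, window.shape[1] // 2]
    return (window.ravel() < centre).astype(np.uint8)

def census_cost(a, b):
    """Hamming distance of census transforms; insensitive to monotonic
    radiometric distortion because only pixel ordering matters."""
    return int(np.count_nonzero(census(a) != census(b)))
```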
Abstract:
There is a growing interest in the use of megavoltage cone-beam computed tomography (MV CBCT) data for radiotherapy treatment planning. To calculate accurate dose distributions, knowledge of the electron density (ED) of the tissues being irradiated is required. In the case of MV CBCT, it is necessary to determine a calibration relating CT number to ED, utilizing the photon beam produced for MV CBCT. A number of different parameters can affect this calibration. This study was undertaken on the Siemens MV CBCT system, MVision, to evaluate the effect of the following parameters on the reconstructed CT pixel value to ED calibration: the number of monitor units (MUs) used (5, 8, 15 and 60 MUs), the image reconstruction filter (head and neck, and pelvis), reconstruction matrix size (256 by 256 and 512 by 512), and the addition of extra solid water surrounding the ED phantom. A Gammex electron density CT phantom containing EDs from 0.292 to 1.707 was imaged under each of these conditions. The linear relationship between MV CBCT pixel value and ED was demonstrated for all MU settings and over the range of EDs. Changes in MU number did not dramatically alter the MV CBCT ED calibration. The use of different reconstruction filters was found to affect the MV CBCT ED calibration, as was the addition of solid water surrounding the phantom. Dose distributions from treatment plans calculated on simulated image data from a 15 MU head and neck reconstruction filter MV CBCT image, using either the ED calibration curve matching those image parameters or one derived with a 15 MU pelvis reconstruction filter, showed small and clinically insignificant differences. Thus, the use of a single MV CBCT ED calibration curve is unlikely to result in any clinical differences. However, to ensure minimal uncertainties in dose reporting, MV CBCT ED calibration could be carried out using parameter-specific calibration measurements.
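The pixel-value-to-ED calibration itself is simply a curve fit through the phantom insert measurements; a minimal sketch of a linear calibration and its use (the pixel values below are placeholders, not the measured Gammex data):

```python
import numpy as np

# Mean MV CBCT pixel value in each phantom insert (hypothetical) and the
# corresponding relative electron densities of the inserts.
pixel_values = np.array([-690.0, -480.0, -90.0, 0.0, 250.0, 620.0])
electron_density = np.array([0.292, 0.50, 0.95, 1.00, 1.28, 1.707])

# Linear calibration: ED = slope * pixel_value + intercept
slope, intercept = np.polyfit(pixel_values, electron_density, deg=1)

def pixel_to_ed(pixel_value):
    """Convert a reconstructed MV CBCT pixel value to relative electron density."""
    return slope * pixel_value + intercept

print(pixel_to_ed(100.0))
```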
Abstract:
Background subtraction is a fundamental low-level processing task in numerous computer vision applications. The vast majority of algorithms process images on a pixel-by-pixel basis, where an independent decision is made for each pixel. A general limitation of such processing is that rich contextual information is not taken into account. We propose a block-based method capable of dealing with noise, illumination variations, and dynamic backgrounds, while still obtaining smooth contours of foreground objects. Specifically, image sequences are analyzed on an overlapping block-by-block basis. A low-dimensional texture descriptor obtained from each block is passed through an adaptive classifier cascade, where each stage handles a distinct problem. A probabilistic foreground mask generation approach then exploits block overlaps to integrate interim block-level decisions into final pixel-level foreground segmentation. Unlike many pixel-based methods, ad-hoc postprocessing of foreground masks is not required. Experiments on the difficult Wallflower and I2R datasets show that the proposed approach obtains on average better results (both qualitatively and quantitatively) than several prominent methods. We furthermore propose the use of tracking performance as an unbiased approach for assessing the practical usefulness of foreground segmentation methods, and show that the proposed approach leads to considerable improvements in tracking accuracy on the CAVIAR dataset.
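The block-overlap idea can be sketched as follows (a heavily simplified stand-in: the texture descriptor and classifier cascade are replaced by a placeholder per-block classifier, and overlapping block decisions are fused into a per-pixel foreground probability):

```python
import numpy as np

def block_foreground_mask(frame, block_classifier, block=16, step=8, thresh=0.5):
    """Classify overlapping blocks as foreground/background and fuse the
    overlapping decisions into a per-pixel probability before thresholding."""
    h, w = frame.shape
    votes = np.zeros((h, w))
    counts = np.zeros((h, w))
    for y in range(0, h - block + 1, step):
        for x in range(0, w - block + 1, step):
            p_fg = block_classifier(frame[y:y + block, x:x + block])
            votes[y:y + block, x:x + block] += p_fg
            counts[y:y + block, x:x + block] += 1.0
    return (votes / np.maximum(counts, 1.0)) > thresh

# Toy usage: flag blocks that deviate strongly from an all-zero background model.
frame = np.zeros((120, 160))
frame[40:80, 60:100] = 1.0                                  # synthetic foreground object
mask = block_foreground_mask(frame, lambda patch: float(patch.mean() > 0.2))
print(int(mask.sum()), "foreground pixels")
```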
Abstract:
High magnification and large depth of field with a temporal resolution of less than 100 microseconds are possible using the present invention, which combines a linear electron beam produced by a tungsten filament from an SX-40A Scanning Electron Microscope (SEM); a magnetic deflection coil with lower inductance, achieved by reducing the number of turns of the saddle-coil wires while increasing the diameter of the wires; a fast scintillator, photomultiplier tube, photomultiplier tube base and signal amplifiers; and a high speed data acquisition system which allows for a scan rate of 381 frames per second and a 256 × 128 pixel density in the SEM image at a data acquisition rate of 25 MHz. The data acquisition and scan position are fully coordinated. A digitizer and a digital waveform generator, which generates the sweep signals to the scan coils, run off the same clock to acquire the signal in real-time.
Abstract:
This paper looks at the accuracy of using the built-in camera of smart phones and free software as an economical way to quantify and analyse light exposure by producing luminance maps from High Dynamic Range (HDR) images. HDR images were captured with an Apple iPhone 4S to record a wide variation of luminance within an indoor and an outdoor scene. The HDR images were then processed using Photosphere software (Ward, 2010) to produce luminance maps, where individual pixel values were compared with calibrated luminance meter readings. This comparison has shown an average luminance error of ~8% between the HDR image pixel values and luminance meter readings, when the range of luminances in the image is limited to approximately 1,500 cd/m2.
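The reported figure amounts to a per-point comparison of HDR map pixel luminance against spot meter readings; a trivial sketch with made-up readings:

```python
import numpy as np

# Luminance meter readings (cd/m2) and the HDR map pixel values sampled at the
# same scene points -- illustrative numbers only.
meter = np.array([12.0, 85.0, 240.0, 610.0, 1450.0])
hdr_pixels = np.array([13.1, 79.0, 255.0, 575.0, 1520.0])

relative_error = np.abs(hdr_pixels - meter) / meter
print(f"mean luminance error: {100 * relative_error.mean():.1f}%")
```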
Abstract:
The aim of this work is to develop software that is capable of back projecting primary fluence images obtained from EPID measurements through phantom and patient geometries in order to calculate 3D dose distributions. In the first instance, we aim to develop a tool for pre-treatment verification in IMRT. In our approach, a Geant4 application is used to back project primary fluence values from each EPID pixel towards the source. Each beam is considered to be polyenergetic, with a spectrum obtained from Monte Carlo calculations for the LINAC in question. At each step of the ray tracing process, the energy differential fluence is corrected for attenuation and beam divergence. Subsequently, the TERMA is calculated and accumulated to an energy differential 3D TERMA distribution. This distribution is then convolved with monoenergetic point spread kernels, thus generating energy differential 3D dose distributions. The resulting dose distributions are accumulated to yield the total dose distribution, which can then be used for pre-treatment verification of IMRT plans. Preliminary results were obtained for a test EPID image comprised of 100 × 100 pixels of unity fluence. Back projection of this field into a 30 cm × 30 cm × 30 cm water phantom was performed, with TERMA distributions obtained in approximately 10 min (running on a single core of a 3 GHz processor). Point spread kernels for monoenergetic photons in water were calculated using a separate Geant4 application. Following convolution and summation, the resulting 3D dose distribution produced familiar build-up and penumbral features. In order to validate the dose model we will use EPID images recorded without any attenuating material in the beam for a number of MLC defined square fields. The dose distributions in water will be calculated and compared to TPS predictions.
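The convolution/superposition step at the end of that pipeline can be sketched as follows (a toy monoenergetic version using FFT convolution; the actual application accumulates energy differential distributions per spectral bin and uses Monte Carlo generated kernels):

```python
import numpy as np

def dose_from_terma(terma, kernel):
    """Convolve a 3D TERMA grid with a point spread (dose deposition) kernel
    via zero-padded FFTs, returning a dose grid the same size as `terma`."""
    shape = [t + k - 1 for t, k in zip(terma.shape, kernel.shape)]
    full = np.fft.irfftn(np.fft.rfftn(terma, shape) * np.fft.rfftn(kernel, shape), shape)
    start = [k // 2 for k in kernel.shape]
    crop = tuple(slice(s, s + t) for s, t in zip(start, terma.shape))
    return full[crop]

# Toy example: a uniform TERMA column in a small water grid, isotropic kernel.
terma = np.zeros((40, 40, 40))
terma[:, 18:22, 18:22] = 1.0
z, y, x = np.mgrid[-5:6, -5:6, -5:6]
kernel = np.exp(-(x**2 + y**2 + z**2) / 4.0)
kernel /= kernel.sum()
print(dose_from_terma(terma, kernel).max())
```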
Abstract:
This paper presents an alternative approach to image segmentation, using the spatial distribution of edge pixels as opposed to pixel intensities. The segmentation is achieved by a multi-layered approach and is intended to find suitable landing areas for an aircraft emergency landing. We combine standard techniques (edge detectors) with newly developed algorithms (line expansion and geometry test) to design an original segmentation algorithm. Our approach removes the dependency on environmental factors that traditionally influence lighting conditions, which in turn have a negative impact on pixel-based segmentation techniques. We present test outcomes on realistic visual data collected from an aircraft, reporting preliminary feedback on the performance of the detection, and demonstrate consistent performance with a detection rate above 97%.
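A simplified illustration of segmenting by edge-pixel distribution rather than intensity (a crude gradient edge detector and block edge-density test, not the authors' line-expansion and geometry-test layers):

```python
import numpy as np

def edge_map(gray, thresh=0.1):
    """Crude gradient-magnitude edge detector (stand-in for a proper detector)."""
    gy, gx = np.gradient(gray.astype(float))
    return np.hypot(gx, gy) > thresh

def candidate_landing_blocks(gray, block=32, max_edge_density=0.02):
    """Return (row, col) origins of blocks whose edge density is low enough to
    be considered homogeneous, i.e. candidate emergency landing areas."""
    edges = edge_map(gray)
    h, w = gray.shape
    candidates = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            if edges[y:y + block, x:x + block].mean() < max_edge_density:
                candidates.append((y, x))
    return candidates
```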
Abstract:
In this study x-ray CT has been used to produce a 3D image of an irradiated PAGAT gel sample, with noise-reduction achieved using the ‘zero-scan’ method. The gel was repeatedly CT scanned and a linear fit to the varying Hounsfield unit of each pixel in the 3D volume was evaluated across the repeated scans, allowing a zero-scan extrapolation of the image to be obtained. To minimise heating of the CT scanner’s x-ray tube, this study used a large slice thickness (1 cm), to provide image slices across the irradiated region of the gel, and a relatively small number of CT scans (63), to extrapolate the zero-scan image. The resulting set of transverse images shows reduced noise compared to images from the initial CT scan of the gel, without being degraded by the additional radiation dose delivered to the gel during the repeated scanning. The full, 3D image of the gel has a low spatial resolution in the longitudinal direction, due to the selected scan parameters. Nonetheless, important features of the dose distribution are apparent in the 3D x-ray CT scan of the gel. The results of this study demonstrate that the zero-scan extrapolation method can be applied to the reconstruction of multiple x-ray CT slices, to provide useful 2D and 3D images of irradiated dosimetry gels.
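The zero-scan extrapolation reduces to a per-voxel linear fit of Hounsfield unit against scan number, evaluated at scan zero; a minimal sketch over a stack of repeated scans (synthetic data, illustrative only):

```python
import numpy as np

def zero_scan_image(scans):
    """Fit HU = a * scan_index + b independently for every voxel of the
    repeatedly scanned volume (scans has shape [n_scans, z, y, x]) and return
    the extrapolated image at scan index 0, i.e. the intercept b."""
    n = scans.shape[0]
    design = np.stack([np.arange(n, dtype=float), np.ones(n)], axis=1)   # n x 2
    coeffs, *_ = np.linalg.lstsq(design, scans.reshape(n, -1), rcond=None)
    return coeffs[1].reshape(scans.shape[1:])

# Toy usage: 63 noisy repeat scans of a volume whose HU drifts with scan number.
rng = np.random.default_rng(1)
truth = rng.normal(0.0, 50.0, size=(8, 64, 64))
scans = truth + 0.5 * np.arange(63)[:, None, None, None] + rng.normal(0.0, 20.0, (63, 8, 64, 64))
print(np.abs(zero_scan_image(scans) - truth).mean())   # well below the per-scan noise
```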
Abstract:
This study presents a segmentation pipeline that fuses colour and depth information to automatically separate objects of interest in video sequences captured from a quadcopter. Many approaches assume that cameras are static with known position, a condition which cannot be preserved in most outdoor robotic applications. In this study, the authors compute depth information and camera positions from a monocular video sequence using structure from motion and use this information as an additional cue to colour for accurate segmentation. The authors model the problem similarly to standard segmentation routines as a Markov random field and perform the segmentation using graph cuts optimisation. Manual intervention is minimised and is only required to determine pixel seeds in the first frame which are then automatically reprojected into the remaining frames of the sequence. The authors also describe an automated method to adjust the relative weights for colour and depth according to their discriminative properties in each frame. Experimental results are presented for two video sequences captured using a quadcopter. The quality of the segmentation is compared to a ground truth and other state-of-the-art methods with consistently accurate results.
Abstract:
In this paper we use the SeqSLAM algorithm to address the question of how little, and what quality of, visual information is needed to localize along a familiar route. We conduct a comprehensive investigation of place recognition performance on seven datasets while varying image resolution (primarily 1 to 512 pixel images), pixel bit depth, field of view, motion blur, image compression and matching sequence length. Results confirm that place recognition using single images or short image sequences is poor, but improves to match or exceed current benchmarks as the matching sequence length increases. We then present place recognition results from two experiments where low-quality imagery is directly caused by sensor limitations; in one, place recognition is achieved along an unlit mountain road by using noisy, long-exposure blurred images, and in the other, two single-pixel light sensors are used to localize in an indoor environment. We also show failure modes caused by pose variance and sequence aliasing, and discuss ways in which they may be overcome. By showing how place recognition along a route is feasible even with severely degraded image sequences, we hope to provoke a re-examination of how we develop and test future localization and mapping systems.
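A small sketch of the kind of image degradation swept in these experiments, namely resolution reduction by block averaging and bit-depth reduction by quantisation (plain NumPy, illustrative only):

```python
import numpy as np

def downsample(image, out_h, out_w):
    """Reduce resolution by averaging equal blocks (dimensions must divide evenly)."""
    h, w = image.shape
    return image.reshape(out_h, h // out_h, out_w, w // out_w).mean(axis=(1, 3))

def reduce_bit_depth(image, bits):
    """Quantise an image in [0, 1] to the requested number of bits per pixel."""
    levels = 2 ** bits
    return np.floor(image * levels).clip(0, levels - 1) / (levels - 1)

rng = np.random.default_rng(0)
frame = rng.random((64, 64))
tiny = reduce_bit_depth(downsample(frame, 2, 2), bits=2)   # a 4-pixel, 2-bit "image"
print(tiny)
```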
Abstract:
Uncooperative iris identification systems at a distance suffer from poor resolution of the acquired iris images, which significantly degrades iris recognition performance. Super-resolution techniques have been employed to enhance the resolution of iris images and improve the recognition performance. However, most existing super-resolution approaches proposed for the iris biometric super-resolve pixel intensity values, rather than the actual features used for recognition. This paper thoroughly investigates transferring super-resolution of iris images from the intensity domain to the feature domain. By directly super-resolving only the features essential for recognition, and by incorporating domain specific information from iris models, improved recognition performance compared to pixel domain super-resolution can be achieved. A framework for applying super-resolution to nonlinear features in the feature-domain is proposed. Based on this framework, a novel feature-domain super-resolution approach for the iris biometric employing 2D Gabor phase-quadrant features is proposed. The approach is shown to outperform its pixel domain counterpart, as well as other feature domain super-resolution approaches and fusion techniques.
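One way to picture feature-domain fusion (a loose sketch, not the proposed algorithm): several aligned low-resolution frames are filtered with a complex Gabor kernel, the complex responses are averaged in the feature domain, and only then quantised to a 2-bit phase-quadrant code.

```python
import numpy as np

def gabor_kernel(size=15, wavelength=6.0, sigma=3.0):
    """Complex 2D Gabor kernel with a horizontal carrier (for illustration)."""
    y, x = np.mgrid[-(size // 2):size // 2 + 1, -(size // 2):size // 2 + 1]
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.exp(1j * 2 * np.pi * x / wavelength)

def gabor_response(image, kernel):
    """Complex filter response via zero-padded FFT convolution, cropped to image size."""
    shape = [i + k - 1 for i, k in zip(image.shape, kernel.shape)]
    full = np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape))
    sy, sx = kernel.shape[0] // 2, kernel.shape[1] // 2
    return full[sy:sy + image.shape[0], sx:sx + image.shape[1]]

def phase_quadrant_code(response):
    """2-bit phase-quadrant code: signs of the real and imaginary parts."""
    return np.stack([response.real >= 0, response.imag >= 0]).astype(np.uint8)

def feature_domain_fusion(frames, kernel):
    """Average the complex Gabor responses of aligned frames (feature domain)
    before quantising, instead of averaging pixel intensities first."""
    fused = np.mean([gabor_response(f, kernel) for f in frames], axis=0)
    return phase_quadrant_code(fused)
```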
Abstract:
We propose a computationally efficient image border pixel based watermark embedding scheme for medical images. We consider the border pixels of a medical image as the RONI (region of non-interest), since those pixels are of no or little interest to doctors and medical professionals, irrespective of the image modality. Although the RONI is used for embedding, our proposed scheme still keeps distortion in the embedding region at a minimum level by using the optimum number of least significant bit-planes for the border pixels. This not only ensures that a watermarked image is safe for diagnosis, but also helps minimize the legal and ethical concerns of altering all pixels of medical images in any manner (e.g., reversible or irreversible). The proposed scheme avoids the need for RONI segmentation, which incurs capacity and computational overheads. The performance of the proposed scheme has been compared with a relevant scheme in terms of embedding capacity, image perceptual quality (measured by SSIM and PSNR), and computational efficiency. Our experimental results show that the proposed scheme is computationally efficient, offers an image-content-independent embedding capacity, and maintains good image quality.
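A minimal sketch of border-pixel LSB embedding in the spirit of the scheme (single bit-plane, fixed border width; the actual scheme selects the optimum number of bit-planes):

```python
import numpy as np

def border_indices(shape, width=2):
    """Flat indices of the border pixels used as the RONI for embedding."""
    mask = np.zeros(shape, dtype=bool)
    mask[:width, :] = mask[-width:, :] = True
    mask[:, :width] = mask[:, -width:] = True
    return np.flatnonzero(mask)

def embed_watermark(image, bits, border_width=2):
    """Write watermark bits into the least significant bit of border pixels,
    leaving the diagnostically relevant interior untouched."""
    out = image.copy()
    idx = border_indices(image.shape, border_width)
    if len(bits) > len(idx):
        raise ValueError("watermark exceeds border embedding capacity")
    idx = idx[:len(bits)]
    flat = out.ravel()
    flat[idx] = (flat[idx] & np.uint8(0xFE)) | np.asarray(bits, dtype=np.uint8)
    return out

def extract_watermark(image, n_bits, border_width=2):
    """Read the embedded bits back from the border-pixel LSBs."""
    idx = border_indices(image.shape, border_width)[:n_bits]
    return image.ravel()[idx] & np.uint8(1)

# Toy usage on an 8-bit image.
img = np.arange(64 * 64, dtype=np.uint8).reshape(64, 64)
bits = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
print(extract_watermark(embed_watermark(img, bits), len(bits)))
```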
Abstract:
In outdoor environments shadows are common. These typically strong visual features cause considerable change in the appearance of a place, and therefore confound vision-based localisation approaches. In this paper we describe how to convert a colour image of the scene to a greyscale invariant image in which pixel values are a function of the underlying material properties rather than the lighting. We summarise the theory of shadow invariant images and discuss the modelling and calibration issues which are important for non-ideal off-the-shelf colour cameras. We evaluate the technique with a commonly used robotic camera and an autonomous car operating in an outdoor environment, and show that it can outperform the use of ordinary greyscale images for the task of visual localisation.
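A minimal sketch of the standard 1D log-chromaticity form of such an invariant image (the invariant direction theta is a per-camera calibration constant; the value used below is only a placeholder, and real cameras need the modelling and calibration care discussed above):

```python
import numpy as np

def shadow_invariant_image(rgb, theta):
    """Project 2D log-chromaticities onto the calibrated invariant direction
    `theta` (radians), yielding a greyscale image that is, ideally, a function
    of surface material rather than illumination. `rgb` is linear and > 0."""
    eps = 1e-6
    log_rg = np.log(rgb[..., 0] + eps) - np.log(rgb[..., 1] + eps)
    log_bg = np.log(rgb[..., 2] + eps) - np.log(rgb[..., 1] + eps)
    return np.cos(theta) * log_rg + np.sin(theta) * log_bg

# theta would normally come from camera calibration (e.g. entropy minimisation).
rng = np.random.default_rng(0)
frame = rng.random((4, 4, 3)) + 0.1
print(shadow_invariant_image(frame, theta=0.7))
```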