915 resultados para Processing image
Resumo:
This thesis addresses the problem of detecting and describing the same scene points in different wide-angle images taken by the same camera at different viewpoints. This is a core competency of many vision-based localisation tasks including visual odometry and visual place recognition. Wide-angle cameras have a large field of view that can exceed a full hemisphere, and the images they produce contain severe radial distortion. When compared to traditional narrow field of view perspective cameras, more accurate estimates of camera egomotion can be found using the images obtained with wide-angle cameras. The ability to accurately estimate camera egomotion is a fundamental primitive of visual odometry, and this is one of the reasons for the increased popularity in the use of wide-angle cameras for this task. Their large field of view also enables them to capture images of the same regions in a scene taken at very different viewpoints, and this makes them suited for visual place recognition. However, the ability to estimate the camera egomotion and recognise the same scene in two different images is dependent on the ability to reliably detect and describe the same scene points, or ‘keypoints’, in the images. Most algorithms used for this purpose are designed almost exclusively for perspective images. Applying algorithms designed for perspective images directly to wide-angle images is problematic as no account is made for the image distortion. The primary contribution of this thesis is the development of two novel keypoint detectors, and a method of keypoint description, designed for wide-angle images. Both reformulate the Scale- Invariant Feature Transform (SIFT) as an image processing operation on the sphere. As the image captured by any central projection wide-angle camera can be mapped to the sphere, applying these variants to an image on the sphere enables keypoints to be detected in a manner that is invariant to image distortion. Each of the variants is required to find the scale-space representation of an image on the sphere, and they differ in the approaches they used to do this. Extensive experiments using real and synthetically generated wide-angle images are used to validate the two new keypoint detectors and the method of keypoint description. The best of these two new keypoint detectors is applied to vision based localisation tasks including visual odometry and visual place recognition using outdoor wide-angle image sequences. As part of this work, the effect of keypoint coordinate selection on the accuracy of egomotion estimates using the Direct Linear Transform (DLT) is investigated, and a simple weighting scheme is proposed which attempts to account for the uncertainty of keypoint positions during detection. A word reliability metric is also developed for use within a visual ‘bag of words’ approach to place recognition.
Resumo:
Camera calibration information is required in order for multiple camera networks to deliver more than the sum of many single camera systems. Methods exist for manually calibrating cameras with high accuracy. Manually calibrating networks with many cameras is, however, time consuming, expensive and impractical for networks that undergo frequent change. For this reason, automatic calibration techniques have been vigorously researched in recent years. Fully automatic calibration methods depend on the ability to automatically find point correspondences between overlapping views. In typical camera networks, cameras are placed far apart to maximise coverage. This is referred to as a wide base-line scenario. Finding sufficient correspondences for camera calibration in wide base-line scenarios presents a significant challenge. This thesis focuses on developing more effective and efficient techniques for finding correspondences in uncalibrated, wide baseline, multiple-camera scenarios. The project consists of two major areas of work. The first is the development of more effective and efficient view covariant local feature extractors. The second area involves finding methods to extract scene information using the information contained in a limited set of matched affine features. Several novel affine adaptation techniques for salient features have been developed. A method is presented for efficiently computing the discrete scale space primal sketch of local image features. A scale selection method was implemented that makes use of the primal sketch. The primal sketch-based scale selection method has several advantages over the existing methods. It allows greater freedom in how the scale space is sampled, enables more accurate scale selection, is more effective at combining different functions for spatial position and scale selection, and leads to greater computational efficiency. Existing affine adaptation methods make use of the second moment matrix to estimate the local affine shape of local image features. In this thesis, it is shown that the Hessian matrix can be used in a similar way to estimate local feature shape. The Hessian matrix is effective for estimating the shape of blob-like structures, but is less effective for corner structures. It is simpler to compute than the second moment matrix, leading to a significant reduction in computational cost. A wide baseline dense correspondence extraction system, called WiDense, is presented in this thesis. It allows the extraction of large numbers of additional accurate correspondences, given only a few initial putative correspondences. It consists of the following algorithms: An affine region alignment algorithm that ensures accurate alignment between matched features; A method for extracting more matches in the vicinity of a matched pair of affine features, using the alignment information contained in the match; An algorithm for extracting large numbers of highly accurate point correspondences from an aligned pair of feature regions. Experiments show that the correspondences generated by the WiDense system improves the success rate of computing the epipolar geometry of very widely separated views. This new method is successful in many cases where the features produced by the best wide baseline matching algorithms are insufficient for computing the scene geometry.
Resumo:
This paper considers the use of servo-mechanisms as part of a tightly integrated homogeneous Wireless Multi- media Sensor Network (WMSN). We describe the design of our second generation WMSN node platform, which has increased image resolution, in-built audio sensors, PIR sensors, and servo- mechanisms. These devices have a wide disparity in their energy consumption and in the information quality they return. As a result, we propose a framework that establishes a hierarchy of devices (sensors and actuators) within the node and uses frequent sampling of cheaper devices to trigger the activation of more energy-hungry devices. Within this framework, we consider the suitability of servos for WMSNs by examining the functional characteristics and by measuring the energy consumption of 2 analog and 2 digital servos, in order to determine their impact on overall node energy cost. We also implement a simple version of our hierarchical sampling framework to evaluate the energy consumption of servos relative to other node components. The evaluation results show that: (1) the energy consumption of servos is small relative to audio/image signal processing energy cost in WMSN nodes; (2) digital servos do not necessarily consume as much energy as is currently believed; and (3) the energy cost per degree panning is lower for larger panning angles.
Resumo:
The following paper proposes a novel application of Skid-to-Turn maneuvers for fixed wing Unmanned Aerial Vehicles (UAVs) inspecting locally linear infrastructure. Fixed wing UAVs, following the design of manned aircraft, commonly employ Bank-to-Turn ma- neuvers to change heading and thus direction of travel. Whilst effective, banking an aircraft during the inspection of ground based features hinders data collection, with body fixed sen- sors angled away from the direction of turn and a panning motion induced through roll rate that can reduce data quality. By adopting Skid-to-Turn maneuvers, the aircraft can change heading whilst maintaining wings level flight, thus allowing body fixed sensors to main- tain a downward facing orientation. An Image-Based Visual Servo controller is developed to directly control the position of features as captured by onboard inspection sensors. This improves on the indirect approach taken by other tracking controllers where a course over ground directly above the feature is assumed to capture it centered in the field of view. Performance of the proposed controller is compared against that of a Bank-to-Turn tracking controller driven by GPS derived cross track error in a simulation environment developed to replicate the field of view of a body fixed camera.
Resumo:
Machine vision represents a particularly attractive solution for sensing and detecting potential collision-course targets due to the relatively low cost, size, weight, and power requirements of the sensors involved. This paper describes the development of detection algorithms and the evaluation of a real-time flight ready hardware implementation of a vision-based collision detection system suitable for fixed-wing small/medium size UAS. In particular, this paper demonstrates the use of Hidden Markov filter to track and estimate the elevation (β) and bearing (α) of the target, compares several candidate graphic processing hardware choices, and proposes an image based visual servoing approach to achieve collision avoidance
Resumo:
Uncooperative iris identification systems at a distance and on the move often suffer from poor resolution and poor focus of the captured iris images. The lack of pixel resolution and well-focused images significantly degrades the iris recognition performance. This paper proposes a new approach to incorporate the focus score into a reconstruction-based super-resolution process to generate a high resolution iris image from a low resolution and focus inconsistent video sequence of an eye. A reconstruction-based technique, which can incorporate middle and high frequency components from multiple low resolution frames into one desired super-resolved frame without introducing false high frequency components, is used. A new focus assessment approach is proposed for uncooperative iris at a distance and on the move to improve performance for variations in lighting, size and occlusion. A novel fusion scheme is then proposed to incorporate the proposed focus score into the super-resolution process. The experiments conducted on the The Multiple Biometric Grand Challenge portal database shows that our proposed approach achieves an EER of 2.1%, outperforming the existing state-of-the-art averaging signal-level fusion approach by 19.2% and the robust mean super-resolution approach by 8.7%.
Resumo:
For a mobile robot to operate autonomously in real-world environments, it must have an effective control system and a navigation system capable of providing robust localization, path planning and path execution. In this paper we describe the work investigating synergies between mapping and control systems. We have integrated development of a control system for navigating mobile robots and a robot SLAM system. The control system is hybrid in nature and tightly coupled with the SLAM system; it uses a combination of high and low level deliberative and reactive control processes to perform obstacle avoidance, exploration, global navigation and recharging, and draws upon the map learning and localization capabilities of the SLAM system. The effectiveness of this hybrid, multi-level approach was evaluated in the context of a delivery robot scenario. Over a period of two weeks the robot performed 1143 delivery tasks to 11 different locations with only one delivery failure (from which it recovered), travelled a total distance of more than 40km, and recharged autonomously a total of 23 times. In this paper we describe the combined control and SLAM system and discuss insights gained from its successful application in a real-world context.
Resumo:
This paper presents an extended study on the implementation of support vector machine(SVM) based speaker verification in systems that employ continuous progressive model adaptation using the weight-based factor analysis model. The weight-based factor analysis model compensates for session variations in unsupervised scenarios by incorporating trial confidence measures in the general statistics used in the inter-session variability modelling process. Employing weight-based factor analysis in Gaussian mixture models (GMM) was recently found to provide significant performance gains to unsupervised classification. Further improvements in performance were found through the integration of SVM-based classification in the system by means of GMM supervectors. This study focuses particularly on the way in which a client is represented in the SVM kernel space using single and multiple target supervectors. Experimental results indicate that training client SVMs using a single target supervector maximises performance while exhibiting a certain robustness to the inclusion of impostor training data in the model. Furthermore, the inclusion of low-scoring target trials in the adaptation process is investigated where they were found to significantly aid performance.
Resumo:
Symmetric multi-processor (SMP) systems, or multiple-CPU servers, are suitable for implementing parallel algorithms because they employ dedicated communication devices to enhance the inter-processor communication bandwidth, so that a better performance can be obtained. However, the cost for a multiple-CPU server is high and therefore, the server is usually shared among many users. The work-load due to other users will certainly affect the performance of the parallel programs so it is desirable to derive a method to optimize parallel programs under different loading conditions. In this paper, we present a simple method, which can be applied in SPMD type parallel programs, to improve the speedup by controlling the number of threads within the programs.
Resumo:
Within a surveillance video, occlusions are commonplace, and accurately resolving these occlusions is key when seeking to accurately track objects. The challenge of accurately segmenting objects is further complicated by the fact that within many real-world surveillance environments, the objects appear very similar. For example, footage of pedestrians in a city environment will consist of many people wearing dark suits. In this paper, we propose a novel technique to segment groups and resolve occlusions using optical flow discontinuities. We demonstrate that the ratio of continuous to discontinuous pixels within a region can be used to locate the overlapping edges, and incorporate this into an object tracking framework. Results on a portion of the ETISEO database show that the proposed algorithm results in improved tracking performance overall, and improved tracking within occlusions.
Resumo:
The use of appropriate features to characterize an output class or object is critical for all classification problems. This paper evaluates the capability of several spectral and texture features for object-based vegetation classification at the species level using airborne high resolution multispectral imagery. Image-objects as the basic classification unit were generated through image segmentation. Statistical moments extracted from original spectral bands and vegetation index image are used as feature descriptors for image objects (i.e. tree crowns). Several state-of-art texture descriptors such as Gray-Level Co-Occurrence Matrix (GLCM), Local Binary Patterns (LBP) and its extensions are also extracted for comparison purpose. Support Vector Machine (SVM) is employed for classification in the object-feature space. The experimental results showed that incorporating spectral vegetation indices can improve the classification accuracy and obtained better results than in original spectral bands, and using moments of Ratio Vegetation Index obtained the highest average classification accuracy in our experiment. The experiments also indicate that the spectral moment features also outperform or can at least compare with the state-of-art texture descriptors in terms of classification accuracy.
Resumo:
A good object representation or object descriptor is one of the key issues in object based image analysis. To effectively fuse color and texture as a unified descriptor at object level, this paper presents a novel method for feature fusion. Color histogram and the uniform local binary patterns are extracted from arbitrary-shaped image-objects, and kernel principal component analysis (kernel PCA) is employed to find nonlinear relationships of the extracted color and texture features. The maximum likelihood approach is used to estimate the intrinsic dimensionality, which is then used as a criterion for automatic selection of optimal feature set from the fused feature. The proposed method is evaluated using SVM as the benchmark classifier and is applied to object-based vegetation species classification using high spatial resolution aerial imagery. Experimental results demonstrate that great improvement can be achieved by using proposed feature fusion method.
Resumo:
Advances in digital technology have caused a radical shift in moving image culture. This has occurred in both modes of production and sites of exhibition, resulting in a blurring of boundaries that previously defined a range of creative disciplines. Re-Imagining Animation: The Changing Face of the Moving Image, by Paul Wells and Johnny Hardstaff, argues that as a result of these blurred disciplinary boundaries, the term “animation” has become a “catch all” for describing any form of manipulated moving image practice. Understanding animation predicates the need to (re)define the medium within contemporary moving image culture. Via a series of case studies, the book engages with a range of moving image works, interrogating “how the many and varied approaches to making film, graphics, visual artefacts, multimedia and other intimations of motion pictures can now be delineated and understood” (p. 7). The structure and clarity of content make this book ideally suited to any serious study of contemporary animation which accepts animation as a truly interdisciplinary medium.
Resumo:
Despite the global financial downturn, the Australian rail industry is in a period of expansion. Reports indicate that the industry is not attracting sufficient entry level and mid-career engineers and skilled technicians from within the Australian labour market and is facing widespread retirements from an ageing workforce. This paper reports on a completed qualitative study that explores the perceptions of engineering students, their lecturers, careers advisors and recruitment consultants regarding rail as a brand and of careers in the rail industry. Findings are presented about career knowledge, job characteristic preferences, branding and image and indicate that rail as a brand has a dated image, that young people and their influencers have little knowledge of rail careers and that rail could better focus its image and recruitment strategies. Conclusions include suggestions for more effective attraction and image strategies for the industry and for further research.
Resumo:
With regard to the long-standing problem of the semantic gap between low-level image features and high-level human knowledge, the image retrieval community has recently shifted its emphasis from low-level features analysis to high-level image semantics extrac- tion. User studies reveal that users tend to seek information using high-level semantics. Therefore, image semantics extraction is of great importance to content-based image retrieval because it allows the users to freely express what images they want. Semantic content annotation is the basis for semantic content retrieval. The aim of image anno- tation is to automatically obtain keywords that can be used to represent the content of images. The major research challenges in image semantic annotation are: what is the basic unit of semantic representation? how can the semantic unit be linked to high-level image knowledge? how can the contextual information be stored and utilized for image annotation? In this thesis, the Semantic Web technology (i.e. ontology) is introduced to the image semantic annotation problem. Semantic Web, the next generation web, aims at mak- ing the content of whatever type of media not only understandable to humans but also to machines. Due to the large amounts of multimedia data prevalent on the Web, re- searchers and industries are beginning to pay more attention to the Multimedia Semantic Web. The Semantic Web technology provides a new opportunity for multimedia-based applications, but the research in this area is still in its infancy. Whether ontology can be used to improve image annotation and how to best use ontology in semantic repre- sentation and extraction is still a worth-while investigation. This thesis deals with the problem of image semantic annotation using ontology and machine learning techniques in four phases as below. 1) Salient object extraction. A salient object servers as the basic unit in image semantic extraction as it captures the common visual property of the objects. Image segmen- tation is often used as the �rst step for detecting salient objects, but most segmenta- tion algorithms often fail to generate meaningful regions due to over-segmentation and under-segmentation. We develop a new salient object detection algorithm by combining multiple homogeneity criteria in a region merging framework. 2) Ontology construction. Since real-world objects tend to exist in a context within their environment, contextual information has been increasingly used for improving object recognition. In the ontology construction phase, visual-contextual ontologies are built from a large set of fully segmented and annotated images. The ontologies are composed of several types of concepts (i.e. mid-level and high-level concepts), and domain contextual knowledge. The visual-contextual ontologies stand as a user-friendly interface between low-level features and high-level concepts. 3) Image objects annotation. In this phase, each object is labelled with a mid-level concept in ontologies. First, a set of candidate labels are obtained by training Support Vectors Machines with features extracted from salient objects. After that, contextual knowledge contained in ontologies is used to obtain the �nal labels by removing the ambiguity concepts. 4) Scene semantic annotation. The scene semantic extraction phase is to get the scene type by using both mid-level concepts and domain contextual knowledge in ontologies. Domain contextual knowledge is used to create scene con�guration that describes which objects co-exist with which scene type more frequently. The scene con�guration is represented in a probabilistic graph model, and probabilistic inference is employed to calculate the scene type given an annotated image. To evaluate the proposed methods, a series of experiments have been conducted in a large set of fully annotated outdoor scene images. These include a subset of the Corel database, a subset of the LabelMe dataset, the evaluation dataset of localized semantics in images, the spatial context evaluation dataset, and the segmented and annotated IAPR TC-12 benchmark.