876 resultados para Feature Extraction
Resumo:
Light Detection and Ranging (LIDAR) has great potential to assist vegetation management in power line corridors by providing more accurate geometric information of the power line assets and vegetation along the corridors. However, the development of algorithms for the automatic processing of LIDAR point cloud data, in particular for feature extraction and classification of raw point cloud data, is in still in its infancy. In this paper, we take advantage of LIDAR intensity and try to classify ground and non-ground points by statistically analyzing the skewness and kurtosis of the intensity data. Moreover, the Hough transform is employed to detected power lines from the filtered object points. The experimental results show the effectiveness of our methods and indicate that better results were obtained by using LIDAR intensity data than elevation data.
Resumo:
With the size and state of the Internet today, a good quality approach to organizing this mass of information is of great importance. Clustering web pages into groups of similar documents is one approach, but relies heavily on good feature extraction and document representation as well as a good clustering approach and algorithm. Due to the changing nature of the Internet, resulting in a dynamic dataset, an incremental approach is preferred. In this work we propose an enhanced incremental clustering approach to develop a better clustering algorithm that can help to better organize the information available on the Internet in an incremental fashion. Experiments show that the enhanced algorithm outperforms the original histogram based algorithm by up to 7.5%.
Resumo:
The cascading appearance-based (CAB) feature extraction technique has established itself as the state of the art in extracting dynamic visual speech features for speech recognition. In this paper, we will focus on investigating the effectiveness of this technique for the related speaker verification application. By investigating the speaker verification ability of each stage of the cascade we will demonstrate that the same steps taken to reduce static speaker and environmental information for the speech recognition application also provide similar improvements for speaker recognition. These results suggest that visual speaker recognition can improve considerable when conducted solely through a consideration of the dynamic speech information rather than the static appearance of the speaker's mouth region.
Resumo:
This paper proposes the validity of a Gabor filter bank for feature extraction of solder joint images on Printed Circuit Boards (PCBs). A distance measure based on the Mahalanobis Cosine metric is also presented for classification of five different types of solder joints. From the experimental results, this methodology achieved high accuracy and a well generalised performance. This can be an effective method to reduce cost and improve quality in the production of PCBs in the manufacturing industry.
Resumo:
Road features extraction from remote sensed imagery has been a long-term topic of great interest within the photogrammetry and remote sensing communities for over three decades. The majority of the early work only focused on linear feature detection approaches, with restrictive assumption on image resolution and road appearance. The widely available of high resolution digital aerial images makes it possible to extract sub-road features, e.g. road pavement markings. In this paper, we will focus on the automatic extraction of road lane markings, which are required by various lane-based vehicle applications, such as, autonomous vehicle navigation, and lane departure warning. The proposed approach consists of three phases: i) road centerline extraction from low resolution image, ii) road surface detection in the original image, and iii) pavement marking extraction on the generated road surface. The proposed method was tested on the aerial imagery dataset of the Bruce Highway, Queensland, and the results demonstrate the efficiency of our approach.
Resumo:
Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In “noise-free” environments, word recognition performance of these systems has been shown to approach 100%, however this performance degrades rapidly as the level of background noise is increased. Speech enhancement is a popular method for making ASR systems more ro- bust. Single-channel spectral subtraction was originally designed to improve hu- man speech intelligibility and many attempts have been made to optimise this algorithm in terms of signal-based metrics such as maximised Signal-to-Noise Ratio (SNR) or minimised speech distortion. Such metrics are used to assess en- hancement performance for intelligibility not speech recognition, therefore mak- ing them sub-optimal ASR applications. This research investigates two methods for closely coupling subtractive-type enhancement algorithms with ASR: (a) a computationally-efficient Mel-filterbank noise subtraction technique based on likelihood-maximisation (LIMA), and (b) in- troducing phase spectrum information to enable spectral subtraction in the com- plex frequency domain. Likelihood-maximisation uses gradient-descent to optimise parameters of the enhancement algorithm to best fit the acoustic speech model given a word se- quence known a priori. Whilst this technique is shown to improve the ASR word accuracy performance, it is also identified to be particularly sensitive to non-noise mismatches between the training and testing data. Phase information has long been ignored in spectral subtraction as it is deemed to have little effect on human intelligibility. In this work it is shown that phase information is important in obtaining highly accurate estimates of clean speech magnitudes which are typically used in ASR feature extraction. Phase Estimation via Delay Projection is proposed based on the stationarity of sinusoidal signals, and demonstrates the potential to produce improvements in ASR word accuracy in a wide range of SNR. Throughout the dissertation, consideration is given to practical implemen- tation in vehicular environments which resulted in two novel contributions – a LIMA framework which takes advantage of the grounding procedure common to speech dialogue systems, and a resource-saving formulation of frequency-domain spectral subtraction for realisation in field-programmable gate array hardware. The techniques proposed in this dissertation were evaluated using the Aus- tralian English In-Car Speech Corpus which was collected as part of this work. This database is the first of its kind within Australia and captures real in-car speech of 50 native Australian speakers in seven driving conditions common to Australian environments.
Resumo:
The detection of voice activity is a challenging problem, especially when the level of acoustic noise is high. Most current approaches only utilise the audio signal, making them susceptible to acoustic noise. An obvious approach to overcome this is to use the visual modality. The current state-of-the-art visual feature extraction technique is one that uses a cascade of visual features (i.e. 2D-DCT, feature mean normalisation, interstep LDA). In this paper, we investigate the effectiveness of this technique for the task of visual voice activity detection (VAD), and analyse each stage of the cascade and quantify the relative improvement in performance gained by each successive stage. The experiments were conducted on the CUAVE database and our results highlight that the dynamics of the visual modality can be used to good effect to improve visual voice activity detection performance.
Resumo:
The article described an open-source toolbox for machine vision called Machine Vision Toolbox (MVT). MVT includes more than 60 functions including image file reading and writing, acquisition, display, filtering, blob, point and line feature extraction, mathematical morphology, homographies, visual Jacobians, camera calibration, and color space conversion. MVT can be used for research into machine vision but is also versatile enough to be usable for real-time work and even control. MVT, combined with MATLAB and a model workstation computer, is a useful and convenient environment for the investigation of machine vision algorithms. The article illustrated the use of a subset of toolbox functions for some typical problems and described MVT operations including the simulation of a complete image-based visual servo system.
Resumo:
Robust image hashing seeks to transform a given input image into a shorter hashed version using a key-dependent non-invertible transform. These image hashes can be used for watermarking, image integrity authentication or image indexing for fast retrieval. This paper introduces a new method of generating image hashes based on extracting Higher Order Spectral features from the Radon projection of an input image. The feature extraction process is non-invertible, non-linear and different hashes can be produced from the same image through the use of random permutations of the input. We show that the transform is robust to typical image transformations such as JPEG compression, noise, scaling, rotation, smoothing and cropping. We evaluate our system using a verification-style framework based on calculating false match, false non-match likelihoods using the publicly available Uncompressed Colour Image database (UCID) of 1320 images. We also compare our results to Swaminathan’s Fourier-Mellin based hashing method with at least 1% EER improvement under noise, scaling and sharpening.
Resumo:
Camera calibration information is required in order for multiple camera networks to deliver more than the sum of many single camera systems. Methods exist for manually calibrating cameras with high accuracy. Manually calibrating networks with many cameras is, however, time consuming, expensive and impractical for networks that undergo frequent change. For this reason, automatic calibration techniques have been vigorously researched in recent years. Fully automatic calibration methods depend on the ability to automatically find point correspondences between overlapping views. In typical camera networks, cameras are placed far apart to maximise coverage. This is referred to as a wide base-line scenario. Finding sufficient correspondences for camera calibration in wide base-line scenarios presents a significant challenge. This thesis focuses on developing more effective and efficient techniques for finding correspondences in uncalibrated, wide baseline, multiple-camera scenarios. The project consists of two major areas of work. The first is the development of more effective and efficient view covariant local feature extractors. The second area involves finding methods to extract scene information using the information contained in a limited set of matched affine features. Several novel affine adaptation techniques for salient features have been developed. A method is presented for efficiently computing the discrete scale space primal sketch of local image features. A scale selection method was implemented that makes use of the primal sketch. The primal sketch-based scale selection method has several advantages over the existing methods. It allows greater freedom in how the scale space is sampled, enables more accurate scale selection, is more effective at combining different functions for spatial position and scale selection, and leads to greater computational efficiency. Existing affine adaptation methods make use of the second moment matrix to estimate the local affine shape of local image features. In this thesis, it is shown that the Hessian matrix can be used in a similar way to estimate local feature shape. The Hessian matrix is effective for estimating the shape of blob-like structures, but is less effective for corner structures. It is simpler to compute than the second moment matrix, leading to a significant reduction in computational cost. A wide baseline dense correspondence extraction system, called WiDense, is presented in this thesis. It allows the extraction of large numbers of additional accurate correspondences, given only a few initial putative correspondences. It consists of the following algorithms: An affine region alignment algorithm that ensures accurate alignment between matched features; A method for extracting more matches in the vicinity of a matched pair of affine features, using the alignment information contained in the match; An algorithm for extracting large numbers of highly accurate point correspondences from an aligned pair of feature regions. Experiments show that the correspondences generated by the WiDense system improves the success rate of computing the epipolar geometry of very widely separated views. This new method is successful in many cases where the features produced by the best wide baseline matching algorithms are insufficient for computing the scene geometry.