411 resultados para image classification


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Local spatio-temporal features with a Bag-of-visual words model is a popular approach used in human action recognition. Bag-of-features methods suffer from several challenges such as extracting appropriate appearance and motion features from videos, converting extracted features appropriate for classification and designing a suitable classification framework. In this paper we address the problem of efficiently representing the extracted features for classification to improve the overall performance. We introduce two generative supervised topic models, maximum entropy discrimination LDA (MedLDA) and class- specific simplex LDA (css-LDA), to encode the raw features suitable for discriminative SVM based classification. Unsupervised LDA models disconnect topic discovery from the classification task, hence yield poor results compared to the baseline Bag-of-words framework. On the other hand supervised LDA techniques learn the topic structure by considering the class labels and improve the recognition accuracy significantly. MedLDA maximizes likelihood and within class margins using max-margin techniques and yields a sparse highly discriminative topic structure; while in css-LDA separate class specific topics are learned instead of common set of topics across the entire dataset. In our representation first topics are learned and then each video is represented as a topic proportion vector, i.e. it can be comparable to a histogram of topics. Finally SVM classification is done on the learned topic proportion vector. We demonstrate the efficiency of the above two representation techniques through the experiments carried out in two popular datasets. Experimental results demonstrate significantly improved performance compared to the baseline Bag-of-features framework which uses kmeans to construct histogram of words from the feature vectors.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Deep convolutional network models have dominated recent work in human action recognition as well as image classification. However, these methods are often unduly influenced by the image background, learning and exploiting the presence of cues in typical computer vision datasets. For unbiased robotics applications, the degree of variation and novelty in action backgrounds is far greater than in computer vision datasets. To address this challenge, we propose an “action region proposal” method that, informed by optical flow, extracts image regions likely to contain actions for input into the network both during training and testing. In a range of experiments, we demonstrate that manually segmenting the background is not enough; but through active action region proposals during training and testing, state-of-the-art or better performance can be achieved on individual spatial and temporal video components. Finally, we show by focusing attention through action region proposals, we can further improve upon the existing state-of-the-art in spatio-temporally fused action recognition performance.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents an effective classification method based on Support Vector Machines (SVM) in the context of activity recognition. Local features that capture both spatial and temporal information in activity videos have made significant progress recently. Efficient and effective features, feature representation and classification plays a crucial role in activity recognition. For classification, SVMs are popularly used because of their simplicity and efficiency; however the common multi-class SVM approaches applied suffer from limitations including having easily confused classes and been computationally inefficient. We propose using a binary tree SVM to address the shortcomings of multi-class SVMs in activity recognition. We proposed constructing a binary tree using Gaussian Mixture Models (GMM), where activities are repeatedly allocated to subnodes until every new created node contains only one activity. Then, for each internal node a separate SVM is learned to classify activities, which significantly reduces the training time and increases the speed of testing compared to popular the `one-against-the-rest' multi-class SVM classifier. Experiments carried out on the challenging and complex Hollywood dataset demonstrates comparable performance over the baseline bag-of-features method.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The most difficult operation in the flood inundation mapping using optical flood images is to separate fully inundated areas from the ‘wet’ areas where trees and houses are partly covered by water. This can be referred as a typical problem the presence of mixed pixels in the images. A number of automatic information extraction image classification algorithms have been developed over the years for flood mapping using optical remote sensing images. Most classification algorithms generally, help in selecting a pixel in a particular class label with the greatest likelihood. However, these hard classification methods often fail to generate a reliable flood inundation mapping because the presence of mixed pixels in the images. To solve the mixed pixel problem advanced image processing techniques are adopted and Linear Spectral unmixing method is one of the most popular soft classification technique used for mixed pixel analysis. The good performance of linear spectral unmixing depends on two important issues, those are, the method of selecting endmembers and the method to model the endmembers for unmixing. This paper presents an improvement in the adaptive selection of endmember subset for each pixel in spectral unmixing method for reliable flood mapping. Using a fixed set of endmembers for spectral unmixing all pixels in an entire image might cause over estimation of the endmember spectra residing in a mixed pixel and hence cause reducing the performance level of spectral unmixing. Compared to this, application of estimated adaptive subset of endmembers for each pixel can decrease the residual error in unmixing results and provide a reliable output. In this current paper, it has also been proved that this proposed method can improve the accuracy of conventional linear unmixing methods and also easy to apply. Three different linear spectral unmixing methods were applied to test the improvement in unmixing results. Experiments were conducted in three different sets of Landsat-5 TM images of three different flood events in Australia to examine the method on different flooding conditions and achieved satisfactory outcomes in flood mapping.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The most difficult operation in flood inundation mapping using optical flood images is to map the ‘wet’ areas where trees and houses are partly covered by water. This can be referred to as a typical problem of the presence of mixed pixels in the images. A number of automatic information extracting image classification algorithms have been developed over the years for flood mapping using optical remote sensing images, with most labelling a pixel as a particular class. However, they often fail to generate reliable flood inundation mapping because of the presence of mixed pixels in the images. To solve this problem, spectral unmixing methods have been developed. In this thesis, methods for selecting endmembers and the method to model the primary classes for unmixing, the two most important issues in spectral unmixing, are investigated. We conduct comparative studies of three typical spectral unmixing algorithms, Partial Constrained Linear Spectral unmixing, Multiple Endmember Selection Mixture Analysis and spectral unmixing using the Extended Support Vector Machine method. They are analysed and assessed by error analysis in flood mapping using MODIS, Landsat and World View-2 images. The Conventional Root Mean Square Error Assessment is applied to obtain errors for estimated fractions of each primary class. Moreover, a newly developed Fuzzy Error Matrix is used to obtain a clear picture of error distributions at the pixel level. This thesis shows that the Extended Support Vector Machine method is able to provide a more reliable estimation of fractional abundances and allows the use of a complete set of training samples to model a defined pure class. Furthermore, it can be applied to analysis of both pure and mixed pixels to provide integrated hard-soft classification results. Our research also identifies and explores a serious drawback in relation to endmember selections in current spectral unmixing methods which apply fixed sets of endmember classes or pure classes for mixture analysis of every pixel in an entire image. However, as it is not accurate to assume that every pixel in an image must contain all endmember classes, these methods usually cause an over-estimation of the fractional abundances in a particular pixel. In this thesis, a subset of adaptive endmembers in every pixel is derived using the proposed methods to form an endmember index matrix. The experimental results show that using the pixel-dependent endmembers in unmixing significantly improves performance.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper reports on the empirical comparison of seven machine learning algorithms in texture classification with application to vegetation management in power line corridors. Aiming at classifying tree species in power line corridors, object-based method is employed. Individual tree crowns are segmented as the basic classification units and three classic texture features are extracted as the input to the classification algorithms. Several widely used performance metrics are used to evaluate the classification algorithms. The experimental results demonstrate that the classification performance depends on the performance matrix, the characteristics of datasets and the feature used.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper, we propose a new multi-class steganalysis for binary image. The proposed method can identify the type of steganographic technique used by examining on the given binary image. In addition, our proposed method is also capable of differentiating an image with hidden message from the one without hidden message. In order to do that, we will extract some features from the binary image. The feature extraction method used is a combination of the method extended from our previous work and some new methods proposed in this paper. Based on the extracted feature sets, we construct our multi-class steganalysis from the SVM classifier. We also present the empirical works to demonstrate that the proposed method can effectively identify five different types of steganography.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Existing multi-model approaches for image set classification extract local models by clustering each image set individually only once, with fixed clusters used for matching with other image sets. However, this may result in the two closest clusters to represent different characteristics of an object, due to different undesirable environmental conditions (such as variations in illumination and pose). To address this problem, we propose to constrain the clustering of each query image set by forcing the clusters to have resemblance to the clusters in the gallery image sets. We first define a Frobenius norm distance between subspaces over Grassmann manifolds based on reconstruction error. We then extract local linear subspaces from a gallery image set via sparse representation. For each local linear subspace, we adaptively construct the corresponding closest subspace from the samples of a probe image set by joint sparse representation. We show that by minimising the sparse representation reconstruction error, we approach the nearest point on a Grassmann manifold. Experiments on Honda, ETH-80 and Cambridge-Gesture datasets show that the proposed method consistently outperforms several other recent techniques, such as Affine Hull based Image Set Distance (AHISD), Sparse Approximated Nearest Points (SANP) and Manifold Discriminant Analysis (MDA).

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Frog species have been declining worldwide at unprecedented rates in the past decades. There are many reasons for this decline including pollution, habitat loss, and invasive species [1]. To preserve, protect, and restore frog biodiversity, it is important to monitor and assess frog species. In this paper, a novel method using image processing techniques for analyzing Australian frog vocalisations is proposed. An FFT is applied to audio data to produce a spectrogram. Then, acoustic events are detected and isolated into corresponding segments through image processing techniques applied to the spectrogram. For each segment, spectral peak tracks are extracted with selected seeds and a region growing technique is utilised to obtain the contour of each frog vocalisation. Based on spectral peak tracks and the contour of each frog vocalisation, six feature sets are extracted. Principal component analysis reduces each feature set down to six principal components which are tested for classification performance with a k-nearest neighbor classifier. This experiment tests the proposed method of classification on fourteen frog species which are geographically well distributed throughout Queensland, Australia. The experimental results show that the best average classification accuracy for the fourteen frog species can be up to 87%.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Frogs have received increasing attention due to their effectiveness for indicating the environment change. Therefore, it is important to monitor and assess frogs. With the development of sensor techniques, large volumes of audio data (including frog calls) have been collected and need to be analysed. After transforming the audio data into its spectrogram representation using short-time Fourier transform, the visual inspection of this representation motivates us to use image processing techniques for analysing audio data. Applying acoustic event detection (AED) method to spectrograms, acoustic events are firstly detected from which ridges are extracted. Three feature sets, Mel-frequency cepstral coefficients (MFCCs), AED feature set and ridge feature set, are then used for frog call classification with a support vector machine classifier. Fifteen frog species widely spread in Queensland, Australia, are selected to evaluate the proposed method. The experimental results show that ridge feature set can achieve an average classification accuracy of 74.73% which outperforms the MFCCs (38.99%) and AED feature set (67.78%).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

It has been proposed that body image disturbance is a form of cognitive bias wherein schemas for self-relevant information guide the selective processing of appearancerelated information in the environment. This threatening information receives disproportionately more attention and memory, as measured by an Emotional Stroop and incidental recall task. The aim of this thesis was to expand the literature on cognitive processing biases in non-clinical males and females by incorporating a number of significant methodological refinements. To achieve this aim, three phases of research were conducted. The initial two phases of research provided preliminary data to inform the development of the main study. Phase One was a qualitative exploration of body image concerns amongst males and females recruited through the general community and from a university. Seventeen participants (eight male; nine female) provided information on their body image and what factors they saw as positively and negatively impacting on their self evaluations. The importance of self esteem, mood, health and fitness, and recognition of the social ideal were identified as key themes. These themes were incorporated as psycho-social measures and Stroop word stimuli in subsequent phases of the research. Phase Two involved the selection and testing of stimuli to be used in the Emotional Stroop task. Six experimental categories of words were developed that reflected a broad range of health and body image concerns for males and females. These categories were high and low calorie food words, positive and negative appearance words, negative emotion words, and physical activity words. Phase Three addressed the central aim of the project by examining cognitive biases for body image information in empirically defined sub-groups. A National sample of males (N = 55) and females (N = 144), recruited from the general community and universities, completed an Emotional Stroop task, incidental memory test, and a collection of psycho-social questionnaires. Sub-groups of body image disturbance were sought using a cluster analysis, which identified three sub-groups in males (Normal, Dissatisfied, and Athletic) and four sub-groups in females (Normal, Health Conscious, Dissatisfied, and Symptomatic). No differences were noted between the groups in selective attention, although time taken to colour name the words was associated with some of the psycho-social variables. Memory biases found across the whole sample for negative emotion, low calorie food, and negative appearance words were interpreted as reflecting the current focus on health and stigma against being unattractive. Collectively these results have expanded our understanding of processing biases in the general community by demonstrating that the processing biases are found within non-clinical samples and that not all processing biases are associated with negative functionality

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Light Detection and Ranging (LIDAR) has great potential to assist vegetation management in power line corridors by providing more accurate geometric information of the power line assets and vegetation along the corridors. However, the development of algorithms for the automatic processing of LIDAR point cloud data, in particular for feature extraction and classification of raw point cloud data, is in still in its infancy. In this paper, we take advantage of LIDAR intensity and try to classify ground and non-ground points by statistically analyzing the skewness and kurtosis of the intensity data. Moreover, the Hough transform is employed to detected power lines from the filtered object points. The experimental results show the effectiveness of our methods and indicate that better results were obtained by using LIDAR intensity data than elevation data.