24 resultados para Feature Felection
Resumo:
Classification of a large document collection involves dealing with a huge feature space where each distinct word is a feature. In such an environment, classification is a costly task both in terms of running time and computing resources. Further it will not guarantee optimal results because it is likely to overfit by considering every feature for classification. In such a context, feature selection is inevitable. This work analyses the feature selection methods, explores the relations among them and attempts to find a minimal subset of features which are discriminative for document classification.
Resumo:
In this paper, we present a methodology for identifying best features from a large feature space. In high dimensional feature space nearest neighbor search is meaningless. In this feature space we see quality and performance issue with nearest neighbor search. Many data mining algorithms use nearest neighbor search. So instead of doing nearest neighbor search using all the features we need to select relevant features. We propose feature selection using Non-negative Matrix Factorization(NMF) and its application to nearest neighbor search. Recent clustering algorithm based on Locally Consistent Concept Factorization(LCCF) shows better quality of document clustering by using local geometrical and discriminating structure of the data. By using our feature selection method we have shown further improvement of performance in the clustering.
Resumo:
Outlier detection in high dimensional categorical data has been a problem of much interest due to the extensive use of qualitative features for describing the data across various application areas. Though there exist various established methods for dealing with the dimensionality aspect through feature selection on numerical data, the categorical domain is actively being explored. As outlier detection is generally considered as an unsupervised learning problem due to lack of knowledge about the nature of various types of outliers, the related feature selection task also needs to be handled in a similar manner. This motivates the need to develop an unsupervised feature selection algorithm for efficient detection of outliers in categorical data. Addressing this aspect, we propose a novel feature selection algorithm based on the mutual information measure and the entropy computation. The redundancy among the features is characterized using the mutual information measure for identifying a suitable feature subset with less redundancy. The performance of the proposed algorithm in comparison with the information gain based feature selection shows its effectiveness for outlier detection. The efficacy of the proposed algorithm is demonstrated on various high-dimensional benchmark data sets employing two existing outlier detection methods.
Resumo:
Clustering has been the most popular method for data exploration. Clustering is partitioning the data set into sub-partitions based on some measures say the distance measure, each partition has its own significant information. There are a number of algorithms explored for this purpose, one such algorithm is the Particle Swarm Optimization(PSO) which is a population based heuristic search technique derived from swarm intelligence. In this paper we present an improved version of the Particle Swarm Optimization where, each feature of the data set is given significance accordingly by adding some random weights, which also minimizes the distortions in the dataset if any. The performance of the above proposed algorithm is evaluated using some benchmark datasets from Machine Learning Repository. The experimental results shows that our proposed methodology performs significantly better than the previously performed experiments.
Resumo:
We consider the problem of finding the best features for value function approximation in reinforcement learning and develop an online algorithm to optimize the mean square Bellman error objective. For any given feature value, our algorithm performs gradient search in the parameter space via a residual gradient scheme and, on a slower timescale, also performs gradient search in the Grassman manifold of features. We present a proof of convergence of our algorithm. We show empirical results using our algorithm as well as a similar algorithm that uses temporal difference learning in place of the residual gradient scheme for the faster timescale updates.
Resumo:
Data clustering groups data so that data which are similar to each other are in the same group and data which are dissimilar to each other are in different groups. Since generally clustering is a subjective activity, it is possible to get different clusterings of the same data depending on the need. This paper attempts to find the best clustering of the data by first carrying out feature selection and using only the selected features, for clustering. A PSO (Particle Swarm Optimization)has been used for clustering but feature selection has also been carried out simultaneously. The performance of the above proposed algorithm is evaluated on some benchmark data sets. The experimental results shows the proposed methodology outperforms the previous approaches such as basic PSO and Kmeans for the clustering problem.
Resumo:
Acoustic signal variation and female preference for different signal components constitute the prerequisite framework to study the mechanisms of sexual selection that shape acoustic communication. Despite several studies of acoustic communication in crickets, information on both male calling song variation in the field and female preference in the same system is lacking for most species. Previous studies on acoustic signal variation either were carried out on populations maintained in the laboratory or did not investigate signal repeatability. We therefore used repeatability analysis to quantify variation in the spectral, temporal and amplitudinal characteristics of the male calling song of the field cricket Plebeiogryllus guttiventris in a wild population, at two temporal scales, within and across nights. Carrier frequency (CF) was the most repeatable character across nights, whereas chirp period (CP) had low repeatability across nights. We investigated whether female preferences were more likely to be based on features with high (CF) or low (CP) repeatability. Females showed no consistent preferences for CF but were significantly more attracted towards signals with short CPs. The attractiveness of lower CP calls disappeared, however, when traded off with sound pressure level (SPL). SPL was the only acoustic feature that was significantly positively correlated with male body size. Since relative SPL affects female phonotaxis strongly and can vary unpredictably based on male spacing, our results suggest that even strong female preferences for acoustic features may not necessarily translate into greater advantage for males possessing these features in the field. (C) 2013 The Association for the Study of Animal Behaviour. Published by Elsevier Ltd. All rights reserved.
Resumo:
We propose to develop a 3-D optical flow features based human action recognition system. Optical flow based features are employed here since they can capture the apparent movement in object, by design. Moreover, they can represent information hierarchically from local pixel level to global object level. In this work, 3-D optical flow based features a re extracted by combining the 2-1) optical flow based features with the depth flow features obtained from depth camera. In order to develop an action recognition system, we employ a Meta-Cognitive Neuro-Fuzzy Inference System (McFIS). The m of McFIS is to find the decision boundary separating different classes based on their respective optical flow based features. McFIS consists of a neuro-fuzzy inference system (cognitive component) and a self-regulatory learning mechanism (meta-cognitive component). During the supervised learning, self-regulatory learning mechanism monitors the knowledge of the current sample with respect to the existing knowledge in the network and controls the learning by deciding on sample deletion, sample learning or sample reserve strategies. The performance of the proposed action recognition system was evaluated on a proprietary data set consisting of eight subjects. The performance evaluation with standard support vector machine classifier and extreme learning machine indicates improved performance of McFIS is recognizing actions based of 3-D optical flow based features.
Resumo:
Selection of relevant features is an open problem in Brain-computer interfacing (BCI) research. Sometimes, features extracted from brain signals are high dimensional which in turn affects the accuracy of the classifier. Selection of the most relevant features improves the performance of the classifier and reduces the computational cost of the system. In this study, we have used a combination of Bacterial Foraging Optimization and Learning Automata to determine the best subset of features from a given motor imagery electroencephalography (EEG) based BCI dataset. Here, we have employed Discrete Wavelet Transform to obtain a high dimensional feature set and classified it by Distance Likelihood Ratio Test. Our proposed feature selector produced an accuracy of 80.291% in 216 seconds.