919 results for correlation-based feature selection


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Selection of relevant features is an open problem in brain-computer interfacing (BCI) research. Features extracted from brain signals are sometimes high-dimensional, which in turn affects the accuracy of the classifier. Selecting the most relevant features improves the performance of the classifier and reduces the computational cost of the system. In this study, we use a combination of Bacterial Foraging Optimization and Learning Automata to determine the best subset of features from a given motor imagery electroencephalography (EEG) based BCI dataset. We employ the Discrete Wavelet Transform to obtain a high-dimensional feature set and classify it with the Distance Likelihood Ratio Test. Our proposed feature selector produced an accuracy of 80.291% in 216 seconds.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Feature selection aims to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features. Rough set theory (RST) has been used as such a tool with much success. RST enables the discovery of data dependencies and the reduction of the number of attributes contained in a dataset using the data alone, requiring no additional information. This chapter describes the fundamental ideas behind RST-based approaches and reviews related feature selection methods that build on these ideas. Extensions to the traditional rough set approach are discussed, including recent selection methods based on tolerance rough sets, variable precision rough sets and fuzzy-rough sets. Alternative search mechanisms are also highly important in rough set feature selection. The chapter includes the latest developments in this area, including RST strategies based on hill-climbing, genetic algorithms and ant colony optimization.
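
The dependency-driven reduction described in this abstract can be illustrated with a small sketch. It assumes the classical rough set dependency degree (the fraction of objects whose equivalence class under the chosen attributes maps to a single decision value) and a greedy hill-climbing search in the style of QuickReduct. The function names `partition`, `dependency` and `quickreduct` are illustrative and not taken from the chapter.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows lying in the positive region."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        # A block is in the positive region if all its rows share one label.
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quickreduct(rows, labels):
    """Greedy hill-climbing: add the attribute that raises dependency most
    until the subset matches the dependency of the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    selected = []
    best = dependency(rows, labels, selected)
    while best < target:
        remaining = [a for a in all_attrs if a not in selected]
        # Ties are broken in favour of the lowest attribute index.
        best, attr = max(
            ((dependency(rows, labels, selected + [a]), a) for a in remaining),
            key=lambda pair: (pair[0], -pair[1]),
        )
        selected.append(attr)
    return selected, best
```

Greedy search of this kind finds a subset that preserves the dependency degree but does not guarantee a minimal one, which is one reason alternative search mechanisms such as genetic algorithms or ant colony optimization are of interest.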

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Q. Shen and R. Jensen, 'Approximation-based feature selection and application for algae population estimation,' Applied Intelligence, vol. 28, no. 2, pp. 167-181, 2008. Sponsorship: EPSRC RONO: EP/E058388/1

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Elliott, G. N., Worgan, H., Broadhurst, D. I., Draper, J. H., Scullion, J. (2007). Soil differentiation using fingerprint Fourier transform infrared spectroscopy, chemometrics and genetic algorithm-based feature selection. Soil Biology & Biochemistry, 39 (11), 2888-2896. Sponsorship: BBSRC / NERC RAE2008

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper proposes a filter-based algorithm for feature selection. The filter is based on the partitioning of the set of features into clusters. The number of clusters, and consequently the cardinality of the subset of selected features, is automatically estimated from data. The computational complexity of the proposed algorithm is also investigated. A variant of this filter that considers feature-class correlations is also proposed for classification problems. Empirical results involving ten datasets illustrate the performance of the developed algorithm, which in general has obtained competitive results in terms of classification accuracy when compared to state-of-the-art algorithms that find clusters of features. We show that, if computational efficiency is an important issue, then the proposed filter may be preferred over its counterparts, thus becoming eligible to join a pool of feature selection algorithms to be used in practice. As an additional contribution of this work, a theoretical framework is used to formally analyze some properties of feature selection methods that rely on finding clusters of features. (C) 2011 Elsevier Inc. All rights reserved.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Automatic identification of Parkinson's disease (PD) has been actively pursued in several works in the literature. In this paper, we address this problem by applying evolutionary-based techniques to find the subset of features that maximizes the accuracy of the Optimum-Path Forest (OPF) classifier. This classifier was chosen for its fast training phase, given that each candidate solution to be optimized is guided by the OPF accuracy. We also show results that improve on those recently obtained in the context of automatic PD identification. © 2011 IEEE.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Although the automatic identification of nontechnical losses has been extensively studied, the problem of selecting the most representative features in order to boost the identification accuracy and to characterize possible illegal consumers has not attracted much attention in this context. In this paper, we focus on this problem by reviewing three evolutionary-based techniques for feature selection, and we also introduce one of them in this context. The results demonstrate that selecting the most representative features can considerably improve the classification accuracy of possible frauds in datasets composed of industrial and commercial profiles.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This research proposes a generic methodology for dimensionality reduction upon time-frequency representations applied to the classification of different types of biosignals. The methodology directly deals with the highly redundant and irrelevant data contained in these representations, combining a first stage of irrelevant data removal by variable selection with a second stage of redundancy reduction using methods based on linear transformations. The study addresses two techniques that provided a similar performance: the first one is based on the selection of a set of the most relevant time-frequency points, whereas the second one selects the most relevant frequency bands. The first technique needs fewer components, leading to a smaller feature space, but the second improves the capture of the time-varying dynamics of the signal and therefore provides a more stable performance. To evaluate the generalization capabilities of the proposed methodology, it has been applied to two types of biosignals with different kinds of non-stationary behaviors: electroencephalographic and phonocardiographic biosignals. Even though these two databases contain samples with different degrees of complexity and a wide variety of characterizing patterns, the results demonstrate a good accuracy for the detection of pathologies, over 98%. The results open the possibility of extrapolating the methodology to the study of other biosignals.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

X. Wang, J. Yang, X. Teng, W. Xia, and R. Jensen. Feature Selection based on Rough Sets and Particle Swarm Optimization. Pattern Recognition Letters, vol. 28, no. 4, pp. 459-471, 2007.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Continuous user authentication with keystroke dynamics uses character sequences as features. Since users can type characters in any order, it is imperative to find character sequences (n-graphs) that are representative of user typing behavior. Contemporary feature selection approaches do not guarantee the selection of frequently typed features, which may lead to a less accurate statistical representation of the user. Furthermore, the selected features do not inherently reflect user typing behavior. We propose four statistical-based feature selection techniques that mitigate the limitations of existing approaches. The first technique selects the most frequently occurring features. The other three consider different user typing behaviors by selecting: n-graphs that are typed quickly; n-graphs that are typed with consistent time; and n-graphs that have large time variance among users. We use Gunetti’s keystroke dataset and the k-means clustering algorithm for our experiments. The results show that, among the proposed techniques, the most-frequent feature selection technique can effectively find user-representative features. We further substantiate our results by comparing the most-frequent feature selection technique with three existing approaches (popular Italian words, common n-graphs, and least frequent n-graphs). We find that it performs better than the existing approaches after selecting a certain number of most-frequent n-graphs.
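
The first of the four techniques, selecting the most frequently occurring n-graphs, can be sketched as follows. This is a minimal illustration that counts n-graphs in raw typed text and ignores the timing information a real keystroke dataset would carry. The names `ngraphs` and `most_frequent_ngraphs` and the parameters `n` and `k` are assumptions made for the example, not the authors' implementation.

```python
from collections import Counter


def ngraphs(text, n=2):
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def most_frequent_ngraphs(samples, n=2, k=5):
    """Select the k n-graphs that occur most often across all typing samples."""
    counts = Counter()
    for sample in samples:
        counts.update(ngraphs(sample, n))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [graph for graph, _ in ranked[:k]]
```

Frequent n-graphs yield more timing observations per user, which is the intuition behind preferring them when building a statistical profile.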

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Outlier detection in high-dimensional categorical data has been a problem of much interest due to the extensive use of qualitative features for describing the data across various application areas. Though there exist various established methods for dealing with the dimensionality aspect through feature selection on numerical data, the categorical domain is still being actively explored. As outlier detection is generally considered an unsupervised learning problem due to the lack of knowledge about the nature of various types of outliers, the related feature selection task also needs to be handled in a similar manner. This motivates the need to develop an unsupervised feature selection algorithm for efficient detection of outliers in categorical data. Addressing this aspect, we propose a novel feature selection algorithm based on the mutual information measure and the entropy computation. The redundancy among the features is characterized using the mutual information measure for identifying a suitable feature subset with less redundancy. The performance of the proposed algorithm in comparison with the information gain based feature selection shows its effectiveness for outlier detection. The efficacy of the proposed algorithm is demonstrated on various high-dimensional benchmark data sets employing two existing outlier detection methods.
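
The abstract does not spell out the algorithm, so the following is only a sketch of the general idea of using entropy and mutual information without class labels. It assumes a greedy scheme: features are visited in order of decreasing entropy, and a feature is kept only if its normalized mutual information with every feature already kept stays at or below a threshold. The function `select_low_redundancy` and the parameter `max_redundancy` are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two categorical columns."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def select_low_redundancy(columns, max_redundancy=0.5):
    """Greedy unsupervised selection over a dict of name -> column values.

    Assumed scheme: prefer high-entropy features and drop any feature whose
    mutual information with a kept feature, normalized by the smaller of the
    two entropies, exceeds max_redundancy.
    """
    order = sorted(columns, key=lambda name: (-entropy(columns[name]), name))
    kept = []
    for name in order:
        h_new = entropy(columns[name])
        if h_new == 0:
            continue  # a constant feature carries no information
        redundant = False
        for other in kept:
            mi = mutual_information(columns[name], columns[other])
            if mi / min(h_new, entropy(columns[other])) > max_redundancy:
                redundant = True
                break
        if not redundant:
            kept.append(name)
    return kept
```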

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents a feature selection method for data classification, which combines a model-based variable selection technique and a fast two-stage subset selection algorithm. The relationship between a specified (and complete) set of candidate features and the class label is modelled using a non-linear full regression model which is linear-in-the-parameters. The performance of a sub-model, measured by the sum of squared errors (SSE), is used to score the informativeness of the subset of features involved in the sub-model. The two-stage subset selection algorithm approaches a solution sub-model with the SSE being locally minimized. The features involved in the solution sub-model are selected as inputs to support vector machines (SVMs) for classification. The memory requirement of this algorithm is independent of the number of training patterns. This property makes the method suitable for applications running on mobile devices, where physical RAM is very limited. An application was developed for activity recognition, which implements the proposed feature selection algorithm and an SVM training procedure. Experiments are carried out with the application running on a PDA for human activity recognition using accelerometer data. A comparison with an information gain based feature selection method demonstrates the effectiveness and efficiency of the proposed algorithm.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Phishing emails are increasingly dynamic and pose a high risk of significant data, brand and financial loss to average computer users and organizations. To address this problem, we propose a hybrid feature selection approach based on a combination of content-based and behavior-based features. Our proposed hybrid feature selection achieves a 93% accuracy rate as compared to other approaches. In addition, we successfully tested the quality of our proposed behavior-based feature using Information Gain, Gain Ratio and Symmetrical Uncertainty.
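
The three measures named at the end of this abstract have standard definitions for discrete features, which the sketch below follows: Information Gain is IG = H(C) - H(C|X), Gain Ratio divides IG by the feature entropy H(X), and Symmetrical Uncertainty is 2·IG / (H(X) + H(C)). The function names are illustrative, and nothing here reflects the paper's own features or data.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """IG = H(C) - H(C|X), computed as H(C) + H(X) - H(X, C)."""
    return entropy(labels) + entropy(feature) - entropy(list(zip(feature, labels)))


def gain_ratio(feature, labels):
    """Information gain normalized by the entropy of the feature."""
    h_feature = entropy(feature)
    return information_gain(feature, labels) / h_feature if h_feature > 0 else 0.0


def symmetrical_uncertainty(feature, labels):
    """SU = 2 * IG / (H(X) + H(C)), which lies between 0 and 1."""
    denom = entropy(feature) + entropy(labels)
    return 2 * information_gain(feature, labels) / denom if denom > 0 else 0.0
```

Gain Ratio and Symmetrical Uncertainty both correct the bias of Information Gain toward features with many distinct values, which is why the three are often reported together.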