959 resultados para feature selection


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Accurate detection of depression at an individual level using structural magnetic resonance imaging (sMRI) remains a challenge. Brain volumetric changes at a structural level appear to have importance in depression biomarkers studies. An automated algorithm is developed to select brain sMRI volumetric features for the detection of depression.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Modern healthcare is getting reshaped by growing Electronic Medical Records (EMR). Recently, these records have been shown of great value towards building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnoses and procedures codes. These codes are usually represented in a tree form (e.g. ICD-10 tree) and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model and an appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up having a long feature list. Recently, Lasso and related l1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to have problems of selecting one feature of many correlated features randomly. This hinders the clinicians to arrive at a stable feature set, which is crucial for clinical decision making process. In this paper, we solve this problem by using a recently proposed Tree-Lasso model. Since, the stability behavior of Tree-Lasso is not well understood, we study the stability behavior of Tree-Lasso and compare it with other feature selection methods. Using a synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our result has implications in identifying stable risk factors for many healthcare problems and therefore can potentially assist clinical decision making for accurate medical prognosis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The support vector machine (SVM) is a popular method for classification, well known for finding the maximum-margin hyperplane. Combining SVM with l1-norm penalty further enables it to simultaneously perform feature selection and margin maximization within a single framework. However, l1-norm SVM shows instability in selecting features in presence of correlated features. We propose a new method to increase the stability of l1-norm SVM by encouraging similarities between feature weights based on feature correlations, which is captured via a feature covariance matrix. Our proposed method can capture both positive and negative correlations between features. We formulate the model as a convex optimization problem and propose a solution based on alternating minimization. Using both synthetic and real-world datasets, we show that our model achieves better stability and classification accuracy compared to several state-of-the-art regularized classification methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Feature selection is an important step in building predictive models for most real-world problems. One of the popular methods in feature selection is Lasso. However, it shows instability in selecting features when dealing with correlated features. In this work, we propose a new method that aims to increase the stability of Lasso by encouraging similarities between features based on their relatedness, which is captured via a feature covariance matrix. Besides modeling positive feature correlations, our method can also identify negative correlations between features. We propose a convex formulation for our model along with an alternating optimization algorithm that can learn the weights of the features as well as the relationship between them. Using both synthetic and real-world data, we show that the proposed method is more stable than Lasso and many state-of-the-art shrinkage and feature selection methods. Also, its predictive performance is comparable to other methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we deal with the problem of feature selection by introducing a new approach based on Gravitational Search Algorithm (GSA). The proposed algorithm combines the optimization behavior of GSA together with the speed of Optimum-Path Forest (OPF) classifier in order to provide a fast and accurate framework for feature selection. Experiments on datasets obtained from a wide range of applications, such as vowel recognition, image classification and fraud detection in power distribution systems are conducted in order to asses the robustness of the proposed technique against Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and a Particle Swarm Optimization (PSO)-based algorithm for feature selection.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Parkinson's disease (PD) automatic identification has been actively pursued over several works in the literature. In this paper, we deal with this problem by applying evolutionary-based techniques in order to find the subset of features that maximize the accuracy of the Optimum-Path Forest (OPF) classifier. The reason for the choice of this classifier relies on its fast training phase, given that each possible solution to be optimized is guided by the OPF accuracy. We also show results that improved other ones recently obtained in the context of PD automatic identification. © 2011 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work has as objectives the implementation of a intelligent computational tool to identify the non-technical losses and to select its most relevant features, considering information from the database with industrial consumers profiles of a power company. The solution to this problem is not trivial and not of regional character, the minimization of non-technical loss represents the guarantee of investments in product quality and maintenance of power systems, introduced by a competitive environment after the period of privatization in the national scene. This work presents using the WEKA software to the proposed objective, comparing various classification techniques and optimization through intelligent algorithms, this way, can be possible to automate applications on Smart Grids. © 2012 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Feature selection aims to find the most important information from a given set of features. As this task can be seen as an optimization problem, the combinatorial growth of the possible solutions may be in-viable for a exhaustive search. In this paper we propose a new nature-inspired feature selection technique based on the bats behaviour, which has never been applied to this context so far. The wrapper approach combines the power of exploration of the bats together with the speed of the Optimum-Path Forest classifier to find the set of features that maximizes the accuracy in a validating set. Experiments conducted in five public datasets have demonstrated that the proposed approach can outperform some well-known swarm-based techniques. © 2012 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Feature selection aims to find the most important information to save computational efforts and data storage. We formulated this task as a combinatorial optimization problem since the exponential growth of possible solutions makes an exhaustive search infeasible. In this work, we propose a new nature-inspired feature selection technique based on bats behavior, namely, binary bat algorithm The wrapper approach combines the power of exploration of the bats together with the speed of the optimum-path forest classifier to find a better data representation. Experiments in public datasets have shown that the proposed technique can indeed improve the effectiveness of the optimum-path forest and outperform some well-known swarm-based techniques. © 2013 Copyright © 2013 Elsevier Inc. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Feature selection has been actively pursued in the last years, since to find the most discriminative set of features can enhance the recognition rates and also to make feature extraction faster. In this paper, the propose a new feature selection called Binary Cuckoo Search, which is based on the behavior of cuckoo birds. The experiments were carried out in the context of theft detection in power distribution systems in two datasets obtained from a Brazilian electrical power company, and have demonstrated the robustness of the proposed technique against with several others nature-inspired optimization techniques. © 2013 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Feature selection aims to find the most important information from a given set of features. As this task can be seen as an optimization problem, the combinatorial growth of the possible solutions may be inviable for a exhaustive search. In this paper we propose a new nature-inspired feature selection technique based on the Charged System Search (CSS), which has never been applied to this context so far. The wrapper approach combines the power of exploration of CSS together with the speed of the Optimum-Path Forest classifier to find the set of features that maximizes the accuracy in a validating set. Experiments conducted in four public datasets have demonstrated the validity of the proposed approach can outperform some well-known swarm-based techniques. © 2013 Springer-Verlag.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Besides optimizing classifier predictive performance and addressing the curse of the dimensionality problem, feature selection techniques support a classification model as simple as possible. In this paper, we present a wrapper feature selection approach based on Bat Algorithm (BA) and Optimum-Path Forest (OPF), in which we model the problem of feature selection as an binary-based optimization technique, guided by BA using the OPF accuracy over a validating set as the fitness function to be maximized. Moreover, we present a methodology to better estimate the quality of the reduced feature set. Experiments conducted over six public datasets demonstrated that the proposed approach provides statistically significant more compact sets and, in some cases, it can indeed improve the classification effectiveness. © 2013 Elsevier Ltd. All rights reserved.