58 resultados para correlation-based feature selection

em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes a filter-based algorithm for feature selection. The filter is based on the partitioning of the set of features into clusters. The number of clusters, and consequently the cardinality of the subset of selected features, is automatically estimated from data. The computational complexity of the proposed algorithm is also investigated. A variant of this filter that considers feature-class correlations is also proposed for classification problems. Empirical results involving ten datasets illustrate the performance of the developed algorithm, which in general has obtained competitive results in terms of classification accuracy when compared to state of the art algorithms that find clusters of features. We show that, if computational efficiency is an important issue, then the proposed filter May be preferred over their counterparts, thus becoming eligible to join a pool of feature selection algorithms to be used in practice. As an additional contribution of this work, a theoretical framework is used to formally analyze some properties of feature selection methods that rely on finding clusters of features. (C) 2011 Elsevier Inc. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents the formulation of a combinatorial optimization problem with the following characteristics: (i) the search space is the power set of a finite set structured as a Boolean lattice; (ii) the cost function forms a U-shaped curve when applied to any lattice chain. This formulation applies for feature selection in the context of pattern recognition. The known approaches for this problem are branch-and-bound algorithms and heuristics that explore partially the search space. Branch-and-bound algorithms are equivalent to the full search, while heuristics are not. This paper presents a branch-and-bound algorithm that differs from the others known by exploring the lattice structure and the U-shaped chain curves of the search space. The main contribution of this paper is the architecture of this algorithm that is based on the representation and exploration of the search space by new lattice properties proven here. Several experiments, with well known public data, indicate the superiority of the proposed method to the sequential floating forward selection (SFFS), which is a popular heuristic that gives good results in very short computational time. In all experiments, the proposed method got better or equal results in similar or even smaller computational time. (C) 2009 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Feature selection is a pattern recognition approach to choose important variables according to some criteria in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction). There are many genomic and proteomic applications that rely on feature selection to answer questions such as selecting signature genes which are informative about some biological state, e. g., normal tissues and several types of cancer; or inferring a prediction network among elements such as genes, proteins and external stimuli. In these applications, a recurrent problem is the lack of samples to perform an adequate estimate of the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions have been proposed, although it is difficult to point the best solution for each application. Results: The intent of this work is to provide an open-source multiplataform graphical environment for bioinformatics problems, which supports many feature selection algorithms, criterion functions and graphic visualization tools such as scatterplots, parallel coordinates and graphs. A feature selection approach for growing genetic networks from seed genes ( targets or predictors) is also implemented in the system. Conclusion: The proposed feature selection environment allows data analysis using several algorithms, criterion functions and graphic visualization tools. Our experiments have shown the software effectiveness in two distinct types of biological problems. Besides, the environment can be used in different pattern recognition applications, although the main concern regards bioinformatics tasks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this work, we take advantage of association rule mining to support two types of medical systems: the Content-based Image Retrieval (CBIR) systems and the Computer-Aided Diagnosis (CAD) systems. For content-based retrieval, association rules are employed to reduce the dimensionality of the feature vectors that represent the images and to improve the precision of the similarity queries. We refer to the association rule-based method to improve CBIR systems proposed here as Feature selection through Association Rules (FAR). To improve CAD systems, we propose the Image Diagnosis Enhancement through Association rules (IDEA) method. Association rules are employed to suggest a second opinion to the radiologist or a preliminary diagnosis of a new image. A second opinion automatically obtained can either accelerate the process of diagnosing or to strengthen a hypothesis, increasing the probability of a prescribed treatment be successful. Two new algorithms are proposed to support the IDEA method: to pre-process low-level features and to propose a preliminary diagnosis based on association rules. We performed several experiments to validate the proposed methods. The results indicate that association rules can be successfully applied to improve CBIR and CAD systems, empowering the arsenal of techniques to support medical image analysis in medical systems. (C) 2009 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we propose a method based on association rule-mining to enhance the diagnosis of medical images (mammograms). It combines low-level features automatically extracted from images and high-level knowledge from specialists to search for patterns. Our method analyzes medical images and automatically generates suggestions of diagnoses employing mining of association rules. The suggestions of diagnosis are used to accelerate the image analysis performed by specialists as well as to provide them an alternative to work on. The proposed method uses two new algorithms, PreSAGe and HiCARe. The PreSAGe algorithm combines, in a single step, feature selection and discretization, and reduces the mining complexity. Experiments performed on PreSAGe show that this algorithm is highly suitable to perform feature selection and discretization in medical images. HiCARe is a new associative classifier. The HiCARe algorithm has an important property that makes it unique: it assigns multiple keywords per image to suggest a diagnosis with high values of accuracy. Our method was applied to real datasets, and the results show high sensitivity (up to 95%) and accuracy (up to 92%), allowing us to claim that the use of association rules is a powerful means to assist in the diagnosing task.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Thanks to recent advances in molecular biology, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such a knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drugs design, as well as for planning high-throughput new experiments. Methods have been developed for gene networks modeling and identification from expression profiles. However, an important open problem regards how to validate such approaches and its results. This work presents an objective approach for validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Networks (AGNs) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdos-Renyi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabasi-Albert (BA), and geographical networks (GG). The experimental results indicate that the inference method was sensitive to average degree k variation, decreasing its network recovery rate with the increase of k. The signal size was important for the inference method to get better accuracy in the network identification rate, presenting very good results with small expression profiles. However, the adopted inference method was not sensible to recognize distinct structures of interaction among genes, presenting a similar behavior when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for the validation of the inferred networks by identifying some properties of the evaluated method, which can be extended to other inference methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: The inference of gene regulatory networks (GRNs) from large-scale expression profiles is one of the most challenging problems of Systems Biology nowadays. Many techniques and models have been proposed for this task. However, it is not generally possible to recover the original topology with great accuracy, mainly due to the short time series data in face of the high complexity of the networks and the intrinsic noise of the expression measurements. In order to improve the accuracy of GRNs inference methods based on entropy (mutual information), a new criterion function is here proposed. Results: In this paper we introduce the use of generalized entropy proposed by Tsallis, for the inference of GRNs from time series expression profiles. The inference process is based on a feature selection approach and the conditional entropy is applied as criterion function. In order to assess the proposed methodology, the algorithm is applied to recover the network topology from temporal expressions generated by an artificial gene network (AGN) model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex networks and its gene transference function is obtained from random drawing on the set of possible Boolean functions, thus creating its dynamics. On the other hand, DREAM time series data presents variation of network size and its topologies are based on real networks. The dynamics are generated by continuous differential equations with noise and perturbation. By adopting both data sources, it is possible to estimate the average quality of the inference with respect to different network topologies, transfer functions and network sizes. Conclusions: A remarkable improvement of accuracy was observed in the experimental results by reducing the number of false connections in the inferred topology by the non-Shannon entropy. The obtained best free parameter of the Tsallis entropy was on average in the range 2.5 <= q <= 3.5 (hence, subextensive entropy), which opens new perspectives for GRNs inference methods based on information theory and for investigation of the nonextensivity of such networks. The inference algorithm and criterion function proposed here were implemented and included in the DimReduction software, which is freely available at http://sourceforge.net/projects/dimreduction and http://code.google.com/p/dimreduction/.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The design of translation invariant and locally defined binary image operators over large windows is made difficult by decreased statistical precision and increased training time. We present a complete framework for the application of stacked design, a recently proposed technique to create two-stage operators that circumvents that difficulty. We propose a novel algorithm, based on Information Theory, to find groups of pixels that should be used together to predict the Output Value. We employ this algorithm to automate the process of creating a set of first-level operators that are later combined in a global operator. We also propose a principled way to guide this combination, by using feature selection and model comparison. Experimental results Show that the proposed framework leads to better results than single stage design. (C) 2009 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Breast weight has great economic importance in poultry industry, and may be associated with other variables. This work aimed to estimate phenotypic correlations between performance (live body weight at 7 and 28 days, and at slaughter, and depth of the breast muscle measured by ultrasonography), carcass (eviscerated body weight and leg weight) and body composition (heart, liver and abdominal fat weight) traits in a broiler line, and quantify the direct and indirect influence of these traits on breast weight. Path analysis was used by expanding the matrix of partial correlation in coefficients which give the direct influence of one trait on another, regardless the effect of the other traits. The simultaneous maintenance of live body weight at slaughter and eviscerated body weight in the matrix of correlations might be harmful for statistical analysis involving systems of normal equations, like path analysis, due to the observed multicollinearity. The live body weight at slaughter and the depth of the breast muscle as measured by ultrasonography directly affected breast weight and were identified as the most responsible factors for the magnitude of the correlation coefficients obtained between the studied traits and breast weight. Individual pre-selection for these traits could favor an increased breast weight in the future reproducer candidates of this line if the broilers' environmental conditions and housing are maintained, since the live body weight at slaughter and the depth of breast muscle measured by ultrasonography were directly related to breast weight.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Age-related changes in running kinematics have been reported in the literature using classical inferential statistics. However, this approach has been hampered by the increased number of biomechanical gait variables reported and subsequently the lack of differences presented in these studies. Data mining techniques have been applied in recent biomedical studies to solve this problem using a more general approach. In the present work, we re-analyzed lower extremity running kinematic data of 17 young and 17 elderly male runners using the Support Vector Machine (SVM) classification approach. In total, 31 kinematic variables were extracted to train the classification algorithm and test the generalized performance. The results revealed different accuracy rates across three different kernel methods adopted in the classifier, with the linear kernel performing the best. A subsequent forward feature selection algorithm demonstrated that with only six features, the linear kernel SVM achieved 100% classification performance rate, showing that these features provided powerful combined information to distinguish age groups. The results of the present work demonstrate potential in applying this approach to improve knowledge about the age-related differences in running gait biomechanics and encourages the use of the SVM in other clinical contexts. (C) 2010 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Successful classification, information retrieval and image analysis tools are intimately related with the quality of the features employed in the process. Pixel intensities, color, texture and shape are, generally, the basis from which most of the features are Computed and used in such fields. This papers presents a novel shape-based feature extraction approach where an image is decomposed into multiple contours, and further characterized by Fourier descriptors. Unlike traditional approaches we make use of topological knowledge to generate well-defined closed contours, which are efficient signatures for image retrieval. The method has been evaluated in the CBIR context and image analysis. The results have shown that the multi-contour decomposition, as opposed to a single shape information, introduced a significant improvement in the discrimination power. (c) 2008 Elsevier B.V. All rights reserved,

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Survival or longevity is an economically important trait in beef cattle. The main inconvenience for its inclusion in selection criteria is delayed recording of phenotypic data and the high computational demand for including survival in proportional hazard models. Thus, identification of a longevity-correlated trait that could be recorded early in life would be very useful for selection purposes. We estimated the genetic relationship of survival with productive and reproductive traits in Nellore cattle, including weaning weight (WW), post-weaning growth (PWG), muscularity (MUSC), scrotal circumference at 18 months (SC18), and heifer pregnancy (HP). Survival was measured in discrete time intervals and modeled through a sequential threshold model. Five independent bivariate Bayesian analyses were performed, accounting for cow survival and the five productive and reproductive traits. Posterior mean estimates for heritability (standard deviation in parentheses) were 0.55 (0.01) for WW, 0.25 (0.01) for PWG, 0.23 (0.01) for MUSC, and 0.48 (0.01) for SC18. The posterior mean estimates (95% confidence interval in parentheses) for the genetic correlation with survival were 0.16 (0.13-0.19), 0.30 (0.25-0.34), 0.31 (0.25-0.36), 0.07 (0.02-0.12), and 0.82 (0.78-0.86) for WW, PWG, MUSC, SC18, and HP, respectively. Based on the high genetic correlation and heritability (0.54) posterior mean estimates for HP, the expected progeny difference for HP can be used to select bulls for longevity, as well as for post-weaning gain and muscle score.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This study presents a decision-making method for maintenance policy selection of power plants equipment. The method is based on risk analysis concepts. The method first step consists in identifying critical equipment both for power plant operational performance and availability based on risk concepts. The second step involves the proposal of a potential maintenance policy that could be applied to critical equipment in order to increase its availability. The costs associated with each potential maintenance policy must be estimated, including the maintenance costs and the cost of failure that measures the critical equipment failure consequences for the power plant operation. Once the failure probabilities and the costs of failures are estimated, a decision-making procedure is applied to select the best maintenance policy. The decision criterion is to minimize the equipment cost of failure, considering the costs and likelihood of occurrence of failure scenarios. The method is applied to the analysis of a lubrication oil system used in gas turbines journal bearings. The turbine has more than 150 MW nominal output, installed in an open cycle thermoelectric power plant. A design modification with the installation of a redundant oil pump is proposed for lubricating oil system availability improvement. (C) 2009 Elsevier Ltd. All rights reserved.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Object. The goal of this paper is to analyze the extension and relationships of glomus jugulare tumor with the temporal bone and the results of its surgical treatment aiming at preservation of the facial nerve. Based on the tumor extension and its relationships with the facial nerve, new criteria to be used in the selection of different surgical approaches are proposed. Methods. Between December 1997 and December 2007, 34 patients (22 female and 12 male) with glomus jugulare tumors were treated. Their mean age was 48 years. The mean follow-up was 52.5 months. Clinical findings included hearing loss in 88%, swallowing disturbance in 50%, and facial nerve palsy in 41%. Magnetic resonance imaging demonstrated a mass in the jugular foramen in all cases, a mass in the middle ear in 97%, a cervical mass in 85%, and an intradural mass in 41%. The tumor was supplied by the external carotid artery in all cases, the internal carotid artery in 44%, and the vertebral artery in 32%. Preoperative embolization was performed in 15 cases. The approach was tailored to each patient, and 4 types of approaches were designed. The infralabyrinthine retrofacial approach (Type A) was used in 32.5%; infralabyrinthine pre- and retrofacial approach without occlusion of the external acoustic meatus (Type B) in 20.5%; infralabyrinthine pre- and retrofacial approach with occlusion of the external acoustic meatus (Type C) in 41 W. and the infralabyrinthine approach with transposition of the facial nerve and removal of the middle ear structures (Type D) in 6% of the patients. Results. Radical removal was achieved in 91% of the cases and partial removal in 9%. Among 20 patients without preoperative facial nerve dysfunction, the nerve was kept in anatomical position in 19 (95%), and facial nerve function was normal during the immediate postoperative period in 17 (85%). Six patients (17.6%) had a new lower cranial nerve deficit, but recovery of swallowing function was adequate in all cases. Voice disturbance remained in all 6 cases. Cerebrospinal fluid leakage occurred in 6 patients (17.6%), with no need for reoperation in any of them. One patient died in the postoperative period due to pulmonary complications. The global recovery, based on the Karnofsky Performance Scale (KPS), was 100% in 15% of the patients, 90% in 45%, 80% in 33%, and 70% in 6%. Conclusions. Radical removal of glomus jugulare tumor can be achieved without anterior transposition of the facial nerve. The extension of dissection, however, should be tailored to each case based on tumor blood supply, preoperative symptoms, and tumor extension. The operative field provided by the retrofacial infralabyrinthine approach, or the pre- and retrofacial approaches. with or without Closure of the external acoustic meatus, allows a wide exposure of the jugular foramen area. Global functional recovery based on the KPS is acceptable in 94% of the patients. (DOI: 10.3171/2008.10.JNS08612)

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Attention is a critical mechanism for visual scene analysis. By means of attention, it is possible to break down the analysis of a complex scene to the analysis of its parts through a selection process. Empirical studies demonstrate that attentional selection is conducted on visual objects as a whole. We present a neurocomputational model of object-based selection in the framework of oscillatory correlation. By segmenting an input scene and integrating the segments with their conspicuity obtained from a saliency map, the model selects salient objects rather than salient locations. The proposed system is composed of three modules: a saliency map providing saliency values of image locations, image segmentation for breaking the input scene into a set of objects, and object selection which allows one of the objects of the scene to be selected at a time. This object selection system has been applied to real gray-level and color images and the simulation results show the effectiveness of the system. (C) 2010 Elsevier Ltd. All rights reserved.