Biblioteca Digital

945 resultados para DECISION-TREE INDUCTION

A survey of evolutionary algorithms for decision-tree induction

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a survey of evolutionary algorithms that are designed for decision-tree induction. In this context, most of the paper focuses on approaches that evolve decision trees as an alternate heuristics to the traditional top-down divide-and-conquer approach. Additionally, we present some alternative methods that make use of evolutionary algorithms to improve particular components of decision-tree classifiers. The paper's original contributions are the following. First, it provides an up-to-date overview that is fully focused on evolutionary algorithms and decision trees and does not concentrate on any specific evolutionary approach. Second, it provides a taxonomy, which addresses works that evolve decision trees and works that design decision-tree components by the use of evolutionary algorithms. Finally, a number of references are provided that describe applications of evolutionary algorithms for decision-tree induction in different domains. At the end of this paper, we address some important issues and open questions that can be the subject of future research.

Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, specially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general-use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible-receptor.

DECISION TREE CLASSIFIERS FOR STAR/GALAXY SEPARATION

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: 14 <= r <= 21 (85.2%) and r >= 19 (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT, and Ball et al. We find that our FT classifier is comparable to or better in completeness over the full magnitude range 15 <= r <= 21, with much lower contamination than all but the Ball et al. classifier. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (> 80%) while simultaneously achieving low contamination (similar to 2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 <= r <= 21.

Novel benzofuroxan derivatives against multidrug-resistant Staphylococcus aureus strains: Design using Topliss` decision tree, synthesis and biological assay

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of this study was the design of a set of benzofuroxan derivatives as antimicrobial agents exploring the physicochemical properties of the related substituents. Topliss` decision tree approach was applied to select the substituent groups. Hierarchical cluster analysis was also performed to emphasize natural clusters and patterns. The compounds were obtained using two synthetic approaches for reducing the synthetic steps as well as improving the yield. The minimal inhibitory concentration method was employed to evaluate the activity against multidrug-resistant Staphylococcus aureus strains. The most active compound was 4-nitro-3-(trifluoromethyl)[N`-(benzofuroxan-5-yl) methylene] benzhydrazide (MIC range 12.7-11.4 mu g/mL), pointing out that the antimicrobial activity was indeed influenced by the hydrophobic and electron-withdrawing property of the substituent groups 3-CF(3) and 4-NO(2), respectively. (C) 2011 Elsevier Ltd. All rights reserved.

Research on using the decision tree method in order to select the best alternative for the pharmacies supply

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Introduction / Aims: Adopting the important decisions represents a specific task of the manager. An efficient manager takes these decisions during a sistematic process with well-defined elements, each with a precise order. In the pharmaceutical practice and business, in the supply process of the pharmacies, there are situations when the medicine distributors offer a certain discount, but require payment in a shorter period of time. In these cases, the analysis of the offer can be made with the help of the decision tree method, which permits identifying the decision offering the best possible result in a given situation. The aims of the research have been the analysis of the product offers of many different suppliers and the establishing of the most advantageous ways of pharmacy supplying. Material / Methods: There have been studied the general product offers of the following medical stores: A&G Med, Farmanord, Farmexim, Mediplus, Montero and Relad. In the case of medicine offers including a discount, the decision tree method has been applied in order to select the most advantageous offers. The Decision Tree is a management method used in taking the right decisions and it is generally used when one needs to evaluate the decisions that involve a series of stages. The tree diagram is used in order to look for the most efficient means to attain a specific goal. The decision trees are the most probabilistic methods, useful when adopting risk taking decisions. Results: The results of the analysis on the tree diagrams have indicated the fact that purchasing medicines with discount (1%, 10%, 15%) and payment in a shorter time interval (120 days) is more profitable than purchasing without a discount and payment in a longer time interval (160 days). Discussion / Conclusion: Depending on the results of the tree diagram analysis, the pharmacies would purchase from the selected suppliers. The research has shown that the decision tree method represents a valuable work instrument in choosing the best ways for supplying pharmacies and it is very useful to the specialists from the pharmaceutical field, pharmaceutical management, to medicine suppliers, pharmacy practitioners from the community pharmacies and especially to pharmacy managers, chief – pharmacists.

Schistosomiasis risk mapping in the state of Minas Gerais, Brazil, using a decision tree approach, remote sensing data and sociological indicators

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Schistosomiasis mansoni is not just a physical disease, but is related to social and behavioural factors as well. Snails of the Biomphalaria genus are an intermediate host for Schistosoma mansoni and infect humans through water. The objective of this study is to classify the risk of schistosomiasis in the state of Minas Gerais (MG). We focus on socioeconomic and demographic features, basic sanitation features, the presence of accumulated water bodies, dense vegetation in the summer and winter seasons and related terrain characteristics. We draw on the decision tree approach to infection risk modelling and mapping. The model robustness was properly verified. The main variables that were selected by the procedure included the terrain's water accumulation capacity, temperature extremes and the Human Development Index. In addition, the model was used to generate two maps, one that included risk classification for the entire of MG and another that included classification errors. The resulting map was 62.9% accurate.

Evaluation cardiologique préopératoire avant chirurgie non cardiaque: du choix cornélien a l'arbre décisionnel [Pre-operative cardiac assessment in non-cardiac surgery: a frequent dilemma simplified by a decision tree].

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In patients undergoing non-cardiac surgery, cardiac events are the most common cause of perioperative morbidity and mortality. It is often difficult to choose adequate cardiologic examinations before surgery. This paper, inspired by the guidelines of the European and American societies of cardiology (ESC, AHA, ACC), discusses the place of standard ECG, echocardiography, treadmill or bicycle ergometer and pharmacological stress testing in preoperative evaluations. The role of coronary angiography and prophylactic revascularization will also be discussed. Finally, we provide a decision tree which will be helpful to both general practitioners and specialists.

A Novel Decision Tree Algorithm for Numeric Datasets - C 4.5*Stat

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Decision trees are very powerful tools for classification in data mining tasks that involves different types of attributes. When coming to handling numeric data sets, usually they are converted first to categorical types and then classified using information gain concepts. Information gain is a very popular and useful concept which tells you, whether any benefit occurs after splitting with a given attribute as far as information content is concerned. But this process is computationally intensive for large data sets. Also popular decision tree algorithms like ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain as well as statistical mean to split attributes in completely numerical data sets. The new algorithm has been proved to be competent with respect to its information gain counterpart C4.5 and competent with many existing decision tree algorithms against the standard UCI benchmarking datasets using the ANOVA test in statistics. The specific advantages of this proposed new algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, as well as it avoids the conversion to categorical data from huge numeric data sets which also is a time consuming task. So as a summary, huge numeric datasets can be directly submitted to this algorithm without any attribute mappings or information gain computations. It also blends the two closely related fields statistics and data mining

Prediction of Learning Disabilities in School Age Children using SVM and Decision Tree

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper highlights the prediction of Learning Disabilities (LD) in school-age children using two classification methods, Support Vector Machine (SVM) and Decision Tree (DT), with an emphasis on applications of data mining. About 10% of children enrolled in school have a learning disability. Learning disability prediction in school age children is a very complicated task because it tends to be identified in elementary school where there is no one sign to be identified. By using any of the two classification methods, SVM and DT, we can easily and accurately predict LD in any child. Also, we can determine the merits and demerits of these two classifiers and the best one can be selected for the use in the relevant field. In this study, Sequential Minimal Optimization (SMO) algorithm is used in performing SVM and J48 algorithm is used in constructing decision trees.

Early diagnosis of myocardial damage in Chagas disease: a decision tree application

Relevância:

100.00% 100.00%

Publicador:

Induction of modular classification rules: using Jmax-pruning

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Prism family of algorithms induces modular classification rules which, in contrast to decision tree induction algorithms, do not necessarily fit together into a decision tree structure. Classifiers induced by Prism algorithms achieve a comparable accuracy compared with decision trees and in some cases even outperform decision trees. Both kinds of algorithms tend to overfit on large and noisy datasets and this has led to the development of pruning methods. Pruning methods use various metrics to truncate decision trees or to eliminate whole rules or single rule terms from a Prism rule set. For decision trees many pre-pruning and postpruning methods exist, however for Prism algorithms only one pre-pruning method has been developed, J-pruning. Recent work with Prism algorithms examined J-pruning in the context of very large datasets and found that the current method does not use its full potential. This paper revisits the J-pruning method for the Prism family of algorithms and develops a new pruning method Jmax-pruning, discusses it in theoretical terms and evaluates it empirically.

Decision Tree Classification Model In Water Supply Network

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the service life of water supply network (WSN) growth, the growing phenomenon of aging pipe network has become exceedingly serious. As urban water supply network is hidden underground asset, it is difficult for monitoring staff to make a direct classification towards the faults of pipe network by means of the modern detecting technology. In this paper, based on the basic property data (e.g. diameter, material, pressure, distance to pump, distance to tank, load, etc.) of water supply network, decision tree algorithm (C4.5) has been carried out to classify the specific situation of water supply pipeline. Part of the historical data was used to establish a decision tree classification model, and the remaining historical data was used to validate this established model. Adopting statistical methods were used to access the decision tree model including basic statistical method, Receiver Operating Characteristic (ROC) and Recall-Precision Curves (RPC). These methods has been successfully used to assess the accuracy of this established classification model of water pipe network. The purpose of classification model was to classify the specific condition of water pipe network. It is important to maintain the pipeline according to the classification results including asset unserviceable (AU), near perfect condition (NPC) and serious deterioration (SD). Finally, this research focused on pipe classification which plays a significant role in maintaining water supply networks in the future.

Spatial Distribution of Seropositive Dogs to Leptospira spp. and Evaluation of Leptospirosis Risk Factors Using a Decision Tree

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Leptospirosis is an important zoonotic disease associated with poor areas of urban settings of developing countries and early diagnosis and prompt treatment may prevent disease. Although rodents are reportedly considered the main reservoirs of leptospirosis, dogs may develop the disease, may become asymptomatic carriers and may be used as sentinels for disease epidemiology. The use of Geographical Information Systems (GIS) combined with spatial analysis techniques allows the mapping of the disease and the identification and assessment of health risk factors. Besides the use of GIS and spatial analysis, the technique of data mining, decision tree, can provide a great potential to find a pattern in the behavior of the variables that determine the occurrence of leptospirosis. The objective of the present study was to apply Geographical Information Systems and data prospection (decision tree) to evaluate the risk factors for canine leptospirosis in an area of Curitiba, PR.Materials, Methods & Results: The present study was performed on the Vila Pantanal, a urban poor community in the city of Curitiba. A total of 287 dog blood samples were randomly obtained house-by-house in a two-day sampling on January 2010. In addition, a questionnaire was applied to owners at the time of sampling. Geographical coordinates related to each household of tested dog were obtained using a Global Positioning System (GPS) for mapping the spatial distribution of reagent and non-reagent dogs to leptospirosis. For the decision tree, risk factors included results of microagglutination test (MAT) from the serum of dogs, previous disease on the household, contact with rats or other dogs, dog breed, outdoors access, feeding, trash around house or backyard, open sewer proximity and flooding. A total of 189 samples (about 2/3 of overall samples) were randomly selected for the training file and consequent decision rules. The remained 98 samples were used for the testing file. The seroprevalence showed a pattern of spatial distribution that involved all the Pantanal area, without agglomeration of reagent animals. In relation to data mining, from 189 samples used in decision tree, a total of 165 (87.3%) animal samples were correctly classified, generating a Kappa index of 0.413. A total of 154 out of 159 (96.8%) samples were considered non-reagent and were correctly classified and only 5/159 (3.2%) were wrongly identified. on the other hand, only 11 (36.7%) reagent samples were correctly classified, with 19 (63.3%) samples failing diagnosis.Discussion: The spatial distribution that involved all the Pantanal area showed that all the animals in the area are at risk of contamination by Leptospira spp. Although most samples had been classified correctly by the decision tree, a degree of difficulty of separability related to seropositive animals was observed, with only 36.7% of the samples classified correctly. This can occur due to the fact of seronegative animals number is superior to the number of seropositive ones, taking the differences in the pattern of variable behavior. The data mining helped to evaluate the most important risk factors for leptospirosis in an urban poor community of Curitiba. The variables selected by decision tree reflected the important factors about the existence of the disease (default of sewer, presence of rats and rubbish and dogs with free access to street). The analyses showed the multifactorial character of the epidemiology of canine leptospirosis.

Evolving decision trees with beam search-based initialization and lexicographic multi-objective evaluation

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Decision tree induction algorithms represent one of the most popular techniques for dealing with classification problems. However, traditional decision-tree induction algorithms implement a greedy approach for node splitting that is inherently susceptible to local optima convergence. Evolutionary algorithms can avoid the problems associated with a greedy search and have been successfully employed to the induction of decision trees. Previously, we proposed a lexicographic multi-objective genetic algorithm for decision-tree induction, named LEGAL-Tree. In this work, we propose extending this approach substantially, particularly w.r.t. two important evolutionary aspects: the initialization of the population and the fitness function. We carry out a comprehensive set of experiments to validate our extended algorithm. The experimental results suggest that it is able to outperform both traditional algorithms for decision-tree induction and another evolutionary algorithm in a variety of application domains.

Reduction of Transient Particulate Matter Spikes with Decision Tree Based Control

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Decision trees have been proposed as a basis for modifying table based injection to reduce transient particulate spikes during the turbocharger lag period. It has been shown that decision trees can detect particulate spikes in real time. In well calibrated electronically controlled diesel engines these spikes are narrow and are encompassed by a wider NOx spike. Decision trees have been shown to pinpoint the exact location of measured opacity spikes in real time thus enabling targeted PM reduction with near zero NOx penalty. A calibrated dimensional model has been used to demonstrate the possible reduction of particulate matter with targeted injection pressure pulses. Post injection strategy optimized for near stoichiometric combustion has been shown to provide additional benefits. Empirical models have been used to calculate emission tradeoffs over the entire FTP cycle. An empirical model based transient calibration has been used to demonstrate that such targeted transient modifiers are more beneficial at lower engine-out NOx levels.

«
1
2
3
4
5
6
7
8
...
62
63
»