949 results for decision tree
Abstract:
We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: 14 <= r <= 21 (85.2%) and r >= 19 (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT, and Ball et al. We find that our FT classifier is comparable to or better in completeness over the full magnitude range 15 <= r <= 21, with much lower contamination than all but the Ball et al. classifier. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (> 80%) while simultaneously achieving low contamination (~2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 <= r <= 21.
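The abstract above reports completeness and contamination without defining them. As a minimal sketch, assuming the commonly used definitions (completeness is the fraction of true galaxies recovered as galaxies; contamination is the fraction of objects labelled as galaxies that are really stars), the two quantities could be computed as follows. The function name and the toy labels are illustrative assumptions and do not come from the paper.

```python
def completeness_and_contamination(true_labels, predicted_labels, target="galaxy"):
    """Return (completeness, contamination) for the `target` class.

    Assumed definitions (not taken from the abstract):
      completeness  = correctly recovered targets / all true targets
      contamination = non-targets labelled as target / all labelled as target
    """
    n_true = sum(1 for t in true_labels if t == target)
    n_pred = sum(1 for p in predicted_labels if p == target)
    n_correct = sum(
        1 for t, p in zip(true_labels, predicted_labels)
        if t == target and p == target
    )
    completeness = n_correct / n_true if n_true else float("nan")
    contamination = (n_pred - n_correct) / n_pred if n_pred else float("nan")
    return completeness, contamination


# Toy data, purely illustrative: 4 true galaxies and 2 true stars.
true_labels = ["galaxy", "galaxy", "galaxy", "galaxy", "star", "star"]
predicted_labels = ["galaxy", "galaxy", "galaxy", "star", "galaxy", "star"]

comp, cont = completeness_and_contamination(true_labels, predicted_labels)
print(f"completeness={comp:.2f}, contamination={cont:.2f}")
```

In this toy case three of the four true galaxies are recovered and one of the four objects labelled as galaxies is a star, so a classifier can trade one quantity against the other by moving its decision boundary.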
Abstract:
The aim of this study was the design of a set of benzofuroxan derivatives as antimicrobial agents, exploring the physicochemical properties of the related substituents. Topliss' decision tree approach was applied to select the substituent groups. Hierarchical cluster analysis was also performed to emphasize natural clusters and patterns. The compounds were obtained using two synthetic approaches for reducing the synthetic steps as well as improving the yield. The minimal inhibitory concentration method was employed to evaluate the activity against multidrug-resistant Staphylococcus aureus strains. The most active compound was 4-nitro-3-(trifluoromethyl)[N'-(benzofuroxan-5-yl) methylene] benzhydrazide (MIC range 12.7-11.4 µg/mL), pointing out that the antimicrobial activity was indeed influenced by the hydrophobic and electron-withdrawing properties of the substituent groups 3-CF3 and 4-NO2, respectively. © 2011 Elsevier Ltd. All rights reserved.
Abstract:
Introduction / Aims: Making important decisions is a specific task of the manager. An efficient manager makes these decisions through a systematic process with well-defined elements, each in a precise order. In pharmaceutical practice and business, in the supply process of pharmacies, there are situations in which medicine distributors offer a certain discount but require payment within a shorter period of time. In these cases, the offer can be analysed with the decision tree method, which identifies the decision offering the best possible result in a given situation. The aims of the research were to analyse the product offers of several suppliers and to establish the most advantageous ways of supplying pharmacies. Material / Methods: The general product offers of the following medical stores were studied: A&G Med, Farmanord, Farmexim, Mediplus, Montero and Relad. For medicine offers that included a discount, the decision tree method was applied in order to select the most advantageous offers. The decision tree is a management method for decision making, generally used to evaluate decisions that involve a series of stages. The tree diagram is used to look for the most efficient means of attaining a specific goal. Decision trees are probabilistic methods, useful when making decisions that involve risk. Results: The analysis of the tree diagrams indicated that purchasing medicines with a discount (1%, 10%, 15%) and payment in a shorter time interval (120 days) is more profitable than purchasing without a discount and with payment in a longer time interval (160 days). Discussion / Conclusion: Depending on the results of the tree diagram analysis, the pharmacies would purchase from the selected suppliers. 
The research has shown that the decision tree method is a valuable working instrument for choosing the best ways of supplying pharmacies, and that it is very useful to specialists in the pharmaceutical field and in pharmaceutical management, to medicine suppliers, to practitioners in community pharmacies and especially to pharmacy managers and chief pharmacists.
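The abstract describes the decision tree as a probabilistic management method but gives no model details. The following is only a generic sketch of how such a tree is usually evaluated: each option is a chance node with (probability, payoff) outcomes, and the decision node picks the option with the highest expected value. The option names, probabilities and payoffs are hypothetical and are not taken from the study.

```python
def expected_value(outcomes):
    """Expected payoff of a chance node given (probability, payoff) pairs."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities of a chance node must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


def best_option(options):
    """Evaluate every option of a decision node and return the best one."""
    values = {name: expected_value(outs) for name, outs in options.items()}
    best = max(values, key=values.get)
    return best, values


# Hypothetical numbers: payoffs are net savings in arbitrary currency units.
options = {
    # Discount taken, but early payment may strain cash flow in some cases.
    "discount_short_term": [(0.8, 100.0), (0.2, -50.0)],
    # No discount, later payment: baseline with no gain or loss.
    "no_discount_long_term": [(1.0, 0.0)],
}

best, values = best_option(options)
print(best, values)
```

With these made-up inputs the discounted offer has the higher expected value; with real supplier terms the probabilities and payoffs would have to come from the pharmacy's own data.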
Abstract:
Schistosomiasis mansoni is not just a physical disease, but is related to social and behavioural factors as well. Snails of the Biomphalaria genus are an intermediate host for Schistosoma mansoni, which infects humans through water. The objective of this study is to classify the risk of schistosomiasis in the state of Minas Gerais (MG). We focus on socioeconomic and demographic features, basic sanitation features, the presence of accumulated water bodies, dense vegetation in the summer and winter seasons and related terrain characteristics. We draw on the decision tree approach to infection risk modelling and mapping. The model robustness was properly verified. The main variables that were selected by the procedure included the terrain's water accumulation capacity, temperature extremes and the Human Development Index. In addition, the model was used to generate two maps, one that included risk classification for the entirety of MG and another that included classification errors. The resulting map was 62.9% accurate.
Abstract:
In patients undergoing non-cardiac surgery, cardiac events are the most common cause of perioperative morbidity and mortality. It is often difficult to choose adequate cardiologic examinations before surgery. This paper, inspired by the guidelines of the European and American societies of cardiology (ESC, AHA, ACC), discusses the place of standard ECG, echocardiography, treadmill or bicycle ergometer and pharmacological stress testing in preoperative evaluations. The role of coronary angiography and prophylactic revascularization will also be discussed. Finally, we provide a decision tree which will be helpful to both general practitioners and specialists.
Abstract:
Decision trees are very powerful tools for classification in data mining tasks that involve different types of attributes. When handling numeric data sets, the data are usually first converted to categorical types and then classified using information gain concepts. Information gain is a very popular and useful concept that indicates whether splitting on a given attribute yields any benefit in terms of information content. However, this process is computationally intensive for large data sets. Moreover, popular decision tree algorithms such as ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain, together with the statistical mean, to split attributes in completely numerical data sets. The new algorithm has been shown to be competitive with its information gain counterpart C4.5 and with many existing decision tree algorithms on the standard UCI benchmarking datasets, using the ANOVA test in statistics. The specific advantages of the proposed algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, and that it avoids the conversion of huge numeric data sets to categorical data, which is also a time-consuming task. In summary, huge numeric datasets can be submitted directly to this algorithm without any attribute mappings or information gain computations. It also blends the two closely related fields of statistics and data mining.
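The abstract does not spell out how variance and mean are combined, so the following is one possible reading rather than the authors' algorithm: each numeric attribute is split at its mean, and the attribute whose split leaves the lowest weighted variance of the (numerically coded) class labels is chosen. The function names and the toy data are assumptions made for illustration.

```python
from statistics import mean, pvariance


def split_score(values, labels):
    """Split one numeric attribute at its mean and score the split.

    The score is the size-weighted population variance of the class labels
    in the two children (lower is better). This criterion is an assumption.
    """
    threshold = mean(values)
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        # A split that sends every row to one side is useless.
        return threshold, float("inf")
    n = len(labels)
    score = (len(left) * pvariance(left) + len(right) * pvariance(right)) / n
    return threshold, score


def choose_attribute(columns, labels):
    """Return (attribute name, threshold) of the best mean-based split."""
    scored = {name: split_score(vals, labels) for name, vals in columns.items()}
    best = min(scored, key=lambda name: scored[name][1])
    return best, scored[best][0]


# Toy numeric data set with two attributes and binary labels coded as 0/1.
columns = {
    "a": [1.0, 2.0, 8.0, 9.0],
    "b": [5.0, 1.0, 6.0, 2.0],
}
labels = [0, 0, 1, 1]

best, threshold = choose_attribute(columns, labels)
print(best, threshold)
```

No discretisation step and no entropy computation appear anywhere in the sketch, which is the saving the abstract emphasises; a full tree would apply the same choice recursively to each child.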
Abstract:
This paper highlights the prediction of Learning Disabilities (LD) in school-age children using two classification methods, Support Vector Machine (SVM) and Decision Tree (DT), with an emphasis on applications of data mining. About 10% of children enrolled in school have a learning disability. Predicting learning disabilities in school-age children is a very complicated task because they tend to be identified in elementary school, where there is no single sign to look for. By using either of the two classification methods, SVM and DT, we can easily and accurately predict LD in any child. We can also determine the merits and demerits of the two classifiers, so that the best one can be selected for use in the relevant field. In this study, the Sequential Minimal Optimization (SMO) algorithm is used to perform SVM and the J48 algorithm is used to construct decision trees.
Abstract:
As the service life of a water supply network (WSN) increases, the ageing of the pipe network becomes an increasingly serious problem. Because an urban water supply network is a hidden underground asset, it is difficult for monitoring staff to classify pipe network faults directly by means of modern detection technology. In this paper, based on basic property data of the water supply network (e.g. diameter, material, pressure, distance to pump, distance to tank, load, etc.), the decision tree algorithm C4.5 was applied to classify the specific condition of water supply pipelines. Part of the historical data was used to establish a decision tree classification model, and the remaining historical data was used to validate the established model. Statistical methods were used to assess the decision tree model, including basic statistical methods, Receiver Operating Characteristic (ROC) and Recall-Precision Curves (RPC). These methods were used successfully to assess the accuracy of the established classification model of the water pipe network. The purpose of the classification model was to classify the specific condition of the water pipe network. It is important to maintain the pipeline according to the classification results, which include asset unserviceable (AU), near perfect condition (NPC) and serious deterioration (SD). Finally, this research focused on pipe classification, which plays a significant role in maintaining water supply networks in the future.
Abstract:
Background: Leptospirosis is an important zoonotic disease associated with poor urban areas of developing countries, and early diagnosis and prompt treatment may prevent disease. Although rodents are reportedly considered the main reservoirs of leptospirosis, dogs may develop the disease, may become asymptomatic carriers and may be used as sentinels for disease epidemiology. The use of Geographical Information Systems (GIS) combined with spatial analysis techniques allows the mapping of the disease and the identification and assessment of health risk factors. Besides GIS and spatial analysis, the data mining technique of decision trees offers great potential for finding patterns in the behavior of the variables that determine the occurrence of leptospirosis. The objective of the present study was to apply Geographical Information Systems and data mining (decision tree) to evaluate the risk factors for canine leptospirosis in an area of Curitiba, PR. Materials, Methods & Results: The present study was performed in Vila Pantanal, a poor urban community in the city of Curitiba. A total of 287 dog blood samples were randomly obtained house by house in a two-day sampling in January 2010. In addition, a questionnaire was administered to owners at the time of sampling. Geographical coordinates of the household of each tested dog were obtained using a Global Positioning System (GPS) in order to map the spatial distribution of dogs that were reagent and non-reagent to leptospirosis. For the decision tree, risk factors included results of the microagglutination test (MAT) on dog serum, previous disease in the household, contact with rats or other dogs, dog breed, outdoor access, feeding, trash around the house or backyard, proximity to an open sewer and flooding. A total of 189 samples (about 2/3 of all samples) were randomly selected for the training file and the resulting decision rules. The remaining 98 samples were used for the testing file. 
The seroprevalence showed a pattern of spatial distribution that involved the whole Pantanal area, without agglomeration of reagent animals. In relation to data mining, of the 189 samples used in the decision tree, a total of 165 (87.3%) animal samples were correctly classified, generating a Kappa index of 0.413. A total of 154 out of 159 (96.8%) non-reagent samples were correctly classified and only 5/159 (3.2%) were wrongly identified. On the other hand, only 11 (36.7%) reagent samples were correctly classified, with 19 (63.3%) samples failing diagnosis. Discussion: The spatial distribution, which involved the whole Pantanal area, showed that all the animals in the area are at risk of contamination by Leptospira spp. Although most samples were classified correctly by the decision tree, seropositive animals proved difficult to separate, with only 36.7% of those samples classified correctly. This may occur because the number of seronegative animals exceeds the number of seropositive ones, given the differences in the pattern of variable behavior. Data mining helped to evaluate the most important risk factors for leptospirosis in a poor urban community of Curitiba. The variables selected by the decision tree reflected the important factors behind the existence of the disease (lack of sewerage, presence of rats and rubbish, and dogs with free access to the street). The analyses showed the multifactorial character of the epidemiology of canine leptospirosis.
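The reported Kappa index can be checked against the counts given in the abstract. The sketch below assumes that "Kappa index" means Cohen's kappa and that the four counts form the usual 2x2 confusion matrix for the 189 training samples (154 non-reagent correct, 5 non-reagent wrong, 11 reagent correct, 19 reagent wrong); the function name is an illustrative choice.

```python
def cohen_kappa(tn, fp, fn, tp):
    """Cohen's kappa for a 2x2 confusion matrix.

    tn/fp: actual negatives classified as negative/positive
    fn/tp: actual positives classified as negative/positive
    """
    n = tn + fp + fn + tp
    observed = (tn + tp) / n
    # Chance agreement from the row (actual) and column (predicted) totals.
    expected = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / (n * n)
    return (observed - expected) / (1 - expected)


# Counts taken from the abstract (non-reagent = negative, reagent = positive).
kappa = cohen_kappa(tn=154, fp=5, fn=19, tp=11)
accuracy = (154 + 11) / 189
sensitivity = 11 / (11 + 19)

print(f"accuracy={accuracy:.3f}, kappa={kappa:.4f}, sensitivity={sensitivity:.3f}")
```

Under these assumptions the counts give an accuracy of about 0.873 and a kappa of about 0.413, in line with the reported figures. Kappa is far lower than accuracy because the classes are imbalanced (159 non-reagent versus 30 reagent samples), so a tree that labels most animals as non-reagent already scores well on accuracy alone.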
Abstract:
This paper presents a survey of evolutionary algorithms that are designed for decision-tree induction. In this context, most of the paper focuses on approaches that evolve decision trees as an alternative heuristic to the traditional top-down divide-and-conquer approach. Additionally, we present some alternative methods that make use of evolutionary algorithms to improve particular components of decision-tree classifiers. The paper's original contributions are the following. First, it provides an up-to-date overview that is fully focused on evolutionary algorithms and decision trees and does not concentrate on any specific evolutionary approach. Second, it provides a taxonomy, which addresses works that evolve decision trees and works that design decision-tree components by the use of evolutionary algorithms. Finally, a number of references are provided that describe applications of evolutionary algorithms for decision-tree induction in different domains. At the end of this paper, we address some important issues and open questions that can be the subject of future research.
Abstract:
Background: This paper addresses the prediction of the free energy of binding of a drug candidate with the enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, especially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible receptor.
Abstract:
Decision trees have been proposed as a basis for modifying table based injection to reduce transient particulate spikes during the turbocharger lag period. It has been shown that decision trees can detect particulate spikes in real time. In well calibrated electronically controlled diesel engines these spikes are narrow and are encompassed by a wider NOx spike. Decision trees have been shown to pinpoint the exact location of measured opacity spikes in real time thus enabling targeted PM reduction with near zero NOx penalty. A calibrated dimensional model has been used to demonstrate the possible reduction of particulate matter with targeted injection pressure pulses. Post injection strategy optimized for near stoichiometric combustion has been shown to provide additional benefits. Empirical models have been used to calculate emission tradeoffs over the entire FTP cycle. An empirical model based transient calibration has been used to demonstrate that such targeted transient modifiers are more beneficial at lower engine-out NOx levels.
Abstract:
Smoke spikes occurring during transient engine operation have detrimental health effects and increase fuel consumption by requiring more frequent regeneration of the diesel particulate filter. This paper proposes a decision tree approach to real-time detection of smoke spikes for control and on-board diagnostics purposes. A contemporary, electronically controlled heavy-duty diesel engine was used to investigate the deficiencies of smoke control based on the fuel-to-oxygen-ratio limit. With the aid of transient and steady state data analysis and empirical as well as dimensional modeling, it was shown that the fuel-to-oxygen ratio was not estimated correctly during the turbocharger lag period. This inaccuracy was attributed to the large manifold pressure ratios and low exhaust gas recirculation flows recorded during the turbocharger lag period, which meant that engine control module correlations for the exhaust gas recirculation flow and the volumetric efficiency had to be extrapolated. The engine control module correlations were based on steady state data and it was shown that, unless the turbocharger efficiency is artificially reduced, the large manifold pressure ratios observed during the turbocharger lag period cannot be achieved at steady state. Additionally, the cylinder-to-cylinder variations during this period were shown to be sufficiently significant to make the average fuel-to-oxygen ratio a poor predictor of the transient smoke emissions. The steady state data also showed higher smoke emissions with higher exhaust gas recirculation fractions at constant fuel-to-oxygen-ratio levels. This suggests that, even if the fuel-to-oxygen ratios were to be estimated accurately for each cylinder, they would still be ineffective as smoke limiters. 
A decision tree trained on snap throttle data and pruned with engineering knowledge was able to use the inaccurate engine control module estimates of the fuel-to-oxygen ratio together with information on the engine control module estimate of the exhaust gas recirculation fraction, the engine speed, and the manifold pressure ratio to predict 94% of all spikes occurring over the Federal Test Procedure cycle. The advantages of this non-parametric approach over other commonly used parametric empirical methods such as regression were described. An application of accurate smoke spike detection in which the injection pressure is increased at points with a high opacity to reduce the cumulative particulate matter emissions substantially with a minimum increase in the cumulative nitrogen oxide emissions was illustrated with dimensional and empirical modeling.