897 resultados para Decision tree method
Resumo:
Usually, data mining projects that are based on decision trees for classifying test cases will use the probabilities provided by these decision trees for ranking classified test cases. We have a need for a better method for ranking test cases that have already been classified by a binary decision tree because these probabilities are not always accurate and reliable enough. A reason for this is that the probability estimates computed by existing decision tree algorithms are always the same for all the different cases in a particular leaf of the decision tree. This is only one reason why the probability estimates given by decision tree algorithms can not be used as an accurate means of deciding if a test case has been correctly classified. Isabelle Alvarez has proposed a new method that could be used to rank the test cases that were classified by a binary decision tree [Alvarez, 2004]. In this paper we will give the results of a comparison of different ranking methods that are based on the probability estimate, the sensitivity of a particular case or both.
Predictive models for chronic renal disease using decision trees, naïve bayes and case-based methods
Resumo:
Data mining can be used in healthcare industry to “mine” clinical data to discover hidden information for intelligent and affective decision making. Discovery of hidden patterns and relationships often goes intact, yet advanced data mining techniques can be helpful as remedy to this scenario. This thesis mainly deals with Intelligent Prediction of Chronic Renal Disease (IPCRD). Data covers blood, urine test, and external symptoms applied to predict chronic renal disease. Data from the database is initially transformed to Weka (3.6) and Chi-Square method is used for features section. After normalizing data, three classifiers were applied and efficiency of output is evaluated. Mainly, three classifiers are analyzed: Decision Tree, Naïve Bayes, K-Nearest Neighbour algorithm. Results show that each technique has its unique strength in realizing the objectives of the defined mining goals. Efficiency of Decision Tree and KNN was almost same but Naïve Bayes proved a comparative edge over others. Further sensitivity and specificity tests are used as statistical measures to examine the performance of a binary classification. Sensitivity (also called recall rate in some fields) measures the proportion of actual positives which are correctly identified while Specificity measures the proportion of negatives which are correctly identified. CRISP-DM methodology is applied to build the mining models. It consists of six major phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
Resumo:
Four bar mechanisms are basic components of many important mechanical devices. The kinematic synthesis of four bar mechanisms is a difficult design problem. A novel method that combines the genetic programming and decision tree learning methods is presented. We give a structural description for the class of mechanisms that produce desired coupler curves. Constructive induction is used to find and characterize feasible regions of the design space. Decision trees constitute the learning engine, and the new features are created by genetic programming.
Resumo:
Transition P Systems are a parallel and distributed computational model based on the notion of the cellular membrane structure. Each membrane determines a region that encloses a multiset of objects and evolution rules. Transition P Systems evolve through transitions between two consecutive configurations that are determined by the membrane structure and multisets present inside membranes. Moreover, transitions between two consecutive configurations are provided by an exhaustive non-deterministic and parallel application of active evolution rules subset inside each membrane of the P system. But, to establish the active evolution rules subset, it is required the previous calculation of useful and applicable rules. Hence, computation of applicable evolution rules subset is critical for the whole evolution process efficiency, because it is performed in parallel inside each membrane in every evolution step. The work presented here shows advantages of incorporating decision trees in the evolution rules applicability algorithm. In order to it, necessary formalizations will be presented to consider this as a classification problem, the method to obtain the necessary decision tree automatically generated and the new algorithm for applicability based on it.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-08
Resumo:
This paper presents new insights and novel algorithms for strategy selection in sequential decision making with partially ordered preferences; that is, where some strategies may be incomparable with respect to expected utility. We assume that incomparability amongst strategies is caused by indeterminacy/imprecision in probability values. We investigate six criteria for consequentialist strategy selection: Gamma-Maximin, Gamma-Maximax, Gamma-Maximix, Interval Dominance, Maximality and E-admissibility. We focus on the popular decision tree and influence diagram representations. Algorithms resort to linear/multilinear programming; we describe implementation and experiments. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
The collection of spatial information to quantify changes to the state and condition of the environment is a fundamental component of conservation or sustainable utilization of tropical and subtropical forests, Age is an important structural attribute of old-growth forests influencing biological diversity in Australia eucalypt forests. Aerial photograph interpretation has traditionally been used for mapping the age and structure of forest stands. However this method is subjective and is not able to accurately capture fine to landscape scale variation necessary for ecological studies. Identification and mapping of fine to landscape scale vegetative structural attributes will allow the compilation of information associated with Montreal Process indicators lb and ld, which seek to determine linkages between age structure and the diversity and abundance of forest fauna populations. This project integrated measurements of structural attributes derived from a canopy-height elevation model with results from a geometrical-optical/spectral mixture analysis model to map forest age structure at a landscape scale. The availability of multiple-scale data allows the transfer of high-resolution attributes to landscape scale monitoring. Multispectral image data were obtained from a DMSV (Digital Multi-Spectral Video) sensor over St Mary's State Forest in Southeast Queensland, Australia. Local scene variance levels for different forest tapes calculated from the DMSV data were used to optimize the tree density and canopy size output in a geometric-optical model applied to a Landsat Thematic Mapper (TU) data set. Airborne laser scanner data obtained over the project area were used to calibrate a digital filter to extract tree heights from a digital elevation model that was derived from scanned colour stereopairs. The modelled estimates of tree height, crown size, and tree density were used to produce a decision-tree classification of forest successional stage at a landscape scale. The results obtained (72% accuracy), were limited in validation, but demonstrate potential for using the multi-scale methodology to provide spatial information for forestry policy objectives (ie., monitoring forest age structure).
Resumo:
PURPOSE: Fatty liver disease (FLD) is an increasing prevalent disease that can be reversed if detected early. Ultrasound is the safest and ubiquitous method for identifying FLD. Since expert sonographers are required to accurately interpret the liver ultrasound images, lack of the same will result in interobserver variability. For more objective interpretation, high accuracy, and quick second opinions, computer aided diagnostic (CAD) techniques may be exploited. The purpose of this work is to develop one such CAD technique for accurate classification of normal livers and abnormal livers affected by FLD. METHODS: In this paper, the authors present a CAD technique (called Symtosis) that uses a novel combination of significant features based on the texture, wavelet transform, and higher order spectra of the liver ultrasound images in various supervised learning-based classifiers in order to determine parameters that classify normal and FLD-affected abnormal livers. RESULTS: On evaluating the proposed technique on a database of 58 abnormal and 42 normal liver ultrasound images, the authors were able to achieve a high classification accuracy of 93.3% using the decision tree classifier. CONCLUSIONS: This high accuracy added to the completely automated classification procedure makes the authors' proposed technique highly suitable for clinical deployment and usage.
Resumo:
In this work the identification and diagnosis of various stages of chronic liver disease is addressed. The classification results of a support vector machine, a decision tree and a k-nearest neighbor classifier are compared. Ultrasound image intensity and textural features are jointly used with clinical and laboratorial data in the staging process. The classifiers training is performed by using a population of 97 patients at six different stages of chronic liver disease and a leave-one-out cross-validation strategy. The best results are obtained using the support vector machine with a radial-basis kernel, with 73.20% of overall accuracy. The good performance of the method is a promising indicator that it can be used, in a non invasive way, to provide reliable information about the chronic liver disease staging.
Resumo:
Dissertação apresentada como requisito parcial para obtenção do grau de Mestre em Estatística e Gestão de Informação.
Resumo:
Hospitals are nowadays collecting vast amounts of data related with patient records. All this data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at the extraction of useful knowledge from raw data. This work describes an implementation of a medical data mining project approach based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, were collected from a Portuguese hospital and related with inpatient hospitalization. The goal was to predict generic hospital Length Of Stay based on indicators that are commonly available at the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted, where six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best learning model was obtained by the Random Forest method, which presents a high quality coefficient of determination value (0.81). This model was then opened by using a sensitivity analysis procedure that revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. Such extracted knowledge confirmed that the obtained predictive model is credible and with potential value for supporting decisions of hospital managers.
Resumo:
INTRODUCTION: Hip fractures are responsible for excessive mortality, decreasing the 5-year survival rate by about 20%. From an economic perspective, they represent a major source of expense, with direct costs in hospitalization, rehabilitation, and institutionalization. The incidence rate sharply increases after the age of 70, but it can be reduced in women aged 70-80 years by therapeutic interventions. Recent analyses suggest that the most efficient strategy is to implement such interventions in women at the age of 70 years. As several guidelines recommend bone mineral density (BMD) screening of postmenopausal women with clinical risk factors, our objective was to assess the cost-effectiveness of two screening strategies applied to elderly women aged 70 years and older. METHODS: A cost-effectiveness analysis was performed using decision-tree analysis and a Markov model. Two alternative strategies, one measuring BMD of all women, and one measuring BMD only of those having at least one risk factor, were compared with the reference strategy "no screening". Cost-effectiveness ratios were measured as cost per year gained without hip fracture. Most probabilities were based on data observed in EPIDOS, SEMOF and OFELY cohorts. RESULTS: In this model, which is mostly based on observed data, the strategy "screen all" was more cost effective than "screen women at risk." For one woman screened at the age of 70 and followed for 10 years, the incremental (additional) cost-effectiveness ratio of these two strategies compared with the reference was 4,235 euros and 8,290 euros, respectively. CONCLUSION: The results of this model, under the assumptions described in the paper, suggest that in women aged 70-80 years, screening all women with dual-energy X-ray absorptiometry (DXA) would be more effective than no screening or screening only women with at least one risk factor. Cost-effectiveness studies based on decision-analysis trees maybe useful tools for helping decision makers, and further models based on different assumptions should be performed to improve the level of evidence on cost-effectiveness ratios of the usual screening strategies for osteoporosis.
Resumo:
Percutaneous transluminal renal angioplasty (PTRA) is an invasive technique that is costly and involves the risk of complications and renal failure. The ability of PTRA to reduce the administration of antihypertensive drugs has been demonstrated. A potentially greater benefit, which nevertheless remains to be proven, is the deferral of the need for chronic dialysis. The aim of the study (ANPARIA) was to assess the appropriateness of PTRA to impact on the evolution of renal function. A standardized expert panel method was used to assess the appropriateness of medical treatment alone or medical treatment with revascularization in various clinical situations. The choice of revascularization by either PTRA or surgery was examined for each clinical situation. Analysis was based on a detailed literature review and on systematically elicited expert opinion, which were obtained during a two-round modified Delphi process. The study provides detailed responses on the appropriateness of PTRA for 1848 distinct clinical scenarios. Depending on the major clinical presentation, appropriateness of revascularization varied from 32% to 75% for individual scenarios (overal 48%). Uncertainty as to revascularization was 41% overall. When revascularization was appropriate, PTRA was favored over surgery in 94% of the scenarios, except in certain cases of aortic atheroma where sugery was the preferred choice. Kidney size [7 cm, absence of coexisting disease, acute renal failure, a high degree of stenosis (C70%), and absence of multiple arteries were identified as predictive variables of favorable appropriateness ratings. Situations such as cardiac failure with pulmonary edema or acute thrombosis of the renal artery were defined as indications for PTRA. This study identified clinical situations in which PTRA or surgery are appropriate for renal artery disease. We built a decision tree which can be used via Internet: the ANPARIA software (http://www.chu-clermontferrand.fr/anparia/). In numerous clinical situations uncertainty remains as to whether PTRA prevents deterioration of renal function.
Resumo:
Single amino acid substitution is the type of protein alteration most related to human diseases. Current studies seek primarily to distinguish neutral mutations from harmful ones. Very few methods offer an explanation of the final prediction result in terms of the probable structural or functional effect on the protein. In this study, we describe the use of three novel parameters to identify experimentally-verified critical residues of the TP53 protein (p53). The first two parameters make use of a surface clustering method to calculate the protein surface area of highly conserved regions or regions with high nonlocal atomic interaction energy (ANOLEA) score. These parameters help identify important functional regions on the surface of a protein. The last parameter involves the use of a new method for pseudobinding free-energy estimation to specifically probe the importance of residue side-chains to the stability of protein fold. A decision tree was designed to optimally combine these three parameters. The result was compared to the functional data stored in the International Agency for Research on Cancer (IARC) TP53 mutation database. The final prediction achieved a prediction accuracy of 70% and a Matthews correlation coefficient of 0.45. It also showed a high specificity of 91.8%. Mutations in the 85 correctly identified important residues represented 81.7% of the total mutations recorded in the database. In addition, the method was able to correctly assign a probable functional or structural role to the residues. Such information could be critical for the interpretation and prediction of the effect of missense mutations, as it not only provided the fundamental explanation of the observed effect, but also helped design the most appropriate laboratory experiment to verify the prediction results.
Resumo:
In order to broaden our knowledge and understanding of the decision steps in the criminal investigation process, we started by evaluating the decision to analyse a trace and the factors involved in this decision step. This decision step is embedded in the complete criminal investigation process, involving multiple decision and triaging steps. Considering robbery cases occurring in a geographic region during a 2-year-period, we have studied the factors influencing the decision to submit biological traces, directly sampled on the scene of the robbery or on collected objects, for analysis. The factors were categorised into five knowledge dimensions: strategic, immediate, physical, criminal and utility and decision tree analysis was carried out. Factors in each category played a role in the decision to analyse a biological trace. Interestingly, factors involving information available prior to the analysis are of importance, such as the fact that a positive result (a profile suitable for comparison) is already available in the case, or that a suspect has been identified through traditional police work before analysis. One factor that was taken into account, but was not significant, is the matrix of the trace. Hence, the decision to analyse a trace is not influenced by this variable. The decision to analyse a trace first is very complex and many of the tested variables were taken into account. The decisions are often made on a case-by-case basis.