906 results for decision tree
Abstract:
INTRODUCTION: Hip fractures are responsible for excessive mortality, decreasing the 5-year survival rate by about 20%. From an economic perspective, they represent a major source of expense, with direct costs in hospitalization, rehabilitation, and institutionalization. The incidence rate sharply increases after the age of 70, but it can be reduced in women aged 70-80 years by therapeutic interventions. Recent analyses suggest that the most efficient strategy is to implement such interventions in women at the age of 70 years. As several guidelines recommend bone mineral density (BMD) screening of postmenopausal women with clinical risk factors, our objective was to assess the cost-effectiveness of two screening strategies applied to elderly women aged 70 years and older. METHODS: A cost-effectiveness analysis was performed using decision-tree analysis and a Markov model. Two alternative strategies, one measuring BMD of all women and one measuring BMD only of those having at least one risk factor, were compared with the reference strategy "no screening". Cost-effectiveness ratios were measured as cost per year gained without hip fracture. Most probabilities were based on data observed in the EPIDOS, SEMOF and OFELY cohorts. RESULTS: In this model, which is mostly based on observed data, the strategy "screen all" was more cost-effective than "screen women at risk". For one woman screened at the age of 70 and followed for 10 years, the incremental (additional) cost-effectiveness ratios of these two strategies compared with the reference were 4,235 euros and 8,290 euros, respectively. CONCLUSION: The results of this model, under the assumptions described in the paper, suggest that in women aged 70-80 years, screening all women with dual-energy X-ray absorptiometry (DXA) would be more effective than no screening or screening only women with at least one risk factor. Cost-effectiveness studies based on decision-analysis trees may be useful tools for helping decision makers, and further models based on different assumptions should be performed to improve the level of evidence on cost-effectiveness ratios of the usual screening strategies for osteoporosis.
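As a quick illustration of the abstract's central quantity, the sketch below computes an incremental cost-effectiveness ratio (cost per extra year gained without hip fracture) relative to a "no screening" reference. The figures and variable names are invented placeholders, not the EPIDOS/SEMOF/OFELY estimates.

```python
# Hypothetical illustration of an incremental cost-effectiveness ratio (ICER):
# ICER = (cost_strategy - cost_reference) / (effect_strategy - effect_reference),
# with effectiveness measured in years gained without hip fracture.

def icer(cost_strategy, cost_reference, effect_strategy, effect_reference):
    """Return the incremental cost per extra hip-fracture-free year."""
    return (cost_strategy - cost_reference) / (effect_strategy - effect_reference)

# Placeholder numbers (NOT the values from the cited cohorts).
no_screening = {"cost": 1_000.0, "fracture_free_years": 9.20}
screen_all = {"cost": 1_250.0, "fracture_free_years": 9.26}

print(icer(screen_all["cost"], no_screening["cost"],
           screen_all["fracture_free_years"], no_screening["fracture_free_years"]))
```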
Predictive models for chronic renal disease using decision trees, naïve Bayes and case-based methods
Abstract:
Data mining can be used in the healthcare industry to "mine" clinical data and discover hidden information for intelligent and effective decision making. The discovery of hidden patterns and relationships often goes unexploited, and advanced data mining techniques can help remedy this. This thesis mainly deals with Intelligent Prediction of Chronic Renal Disease (IPCRD). The data cover blood tests, urine tests, and external symptoms used to predict chronic renal disease. Data from the database are initially transferred to Weka (3.6), and the Chi-Square method is used for feature selection. After normalizing the data, three classifiers were applied and the efficiency of the output was evaluated. The three classifiers analyzed are Decision Tree, Naïve Bayes, and the K-Nearest Neighbour algorithm. Results show that each technique has its unique strengths in realizing the objectives of the defined mining goals. The efficiency of the Decision Tree and KNN was almost the same, but Naïve Bayes showed a comparative edge over the others. Sensitivity and specificity are further used as statistical measures to examine the performance of the binary classification. Sensitivity (also called recall in some fields) measures the proportion of actual positives that are correctly identified, while specificity measures the proportion of negatives that are correctly identified. The CRISP-DM methodology is applied to build the mining models; it consists of six major phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
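A rough scikit-learn analogue of the workflow described in this abstract (the thesis itself used Weka 3.6): chi-square feature selection after normalization, then Decision Tree, Naïve Bayes, and KNN classifiers evaluated by sensitivity and specificity. The data and parameter choices below are synthetic assumptions for illustration.

```python
# Synthetic stand-in for the blood/urine/symptom data; the split and k values are assumptions.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X = MinMaxScaler().fit_transform(X)              # normalise (chi2 needs non-negative values)
X = SelectKBest(chi2, k=10).fit_transform(X, y)  # chi-square feature selection
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [("Decision Tree", DecisionTreeClassifier(random_state=0)),
                  ("Naive Bayes", GaussianNB()),
                  ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    tn, fp, fn, tp = confusion_matrix(y_te, clf.fit(X_tr, y_tr).predict(X_te)).ravel()
    sensitivity = tp / (tp + fn)   # recall: proportion of actual positives identified
    specificity = tn / (tn + fp)   # proportion of actual negatives identified
    print(f"{name}: sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```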
Abstract:
We are investigating the combination of wavelets and decision trees to detect ships and other maritime surveillance targets in medium-resolution SAR images. Wavelets have inherent advantages for extracting image descriptors, while decision trees are able to handle different data sources. In addition, our work aims to consider oceanic features such as ship wakes and ocean spills. In this initial work, Haar and Cohen-Daubechies-Feauveau 9/7 wavelets obtain detailed descriptors from targets and ocean features, which are fed, together with other statistical parameters, into an oblique decision tree. © 2011 Springer-Verlag.
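The sketch below is one plausible reading of such a pipeline: wavelet-energy descriptors from Haar and CDF 9/7 (PyWavelets' 'bior4.4') decompositions of image chips, fed to a decision tree. The image chips are random stand-ins for SAR data, and a standard axis-parallel tree substitutes for the oblique tree used in the paper.

```python
# Assumes PyWavelets and scikit-learn; all data here is synthetic.
import numpy as np
import pywt
from sklearn.tree import DecisionTreeClassifier

def wavelet_descriptors(chip, wavelet):
    """Mean absolute energy of each detail subband over a 2-level decomposition."""
    coeffs = pywt.wavedec2(chip, wavelet, level=2)
    return [np.mean(np.abs(band)) for level in coeffs[1:] for band in level]

rng = np.random.default_rng(0)
chips = rng.gamma(shape=1.0, scale=1.0, size=(60, 32, 32))   # fake SAR-like tiles
labels = rng.integers(0, 2, size=60)                         # 1 = target, 0 = ocean (fake)

features = [wavelet_descriptors(c, "haar") + wavelet_descriptors(c, "bior4.4")
            for c in chips]
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(features, labels)
print(tree.predict(features[:5]))
```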
Abstract:
The identification of tree species is a key step in sustainable management plans for forest resources, as well as in several other applications based on such surveys. However, the presently available techniques depend on the presence of tree structures, such as flowers, fruits, and leaves, limiting the identification process to certain periods of the year. Therefore, this article introduces a study on the application of statistical parameters for texture classification of tree trunk images. For that, 540 samples from five Brazilian native deciduous species were acquired, and measures of entropy, uniformity, smoothness, asymmetry (third moment), mean, and standard deviation were obtained from the presented textures. Using a decision tree, a biometric species identification system was constructed, resulting in a 0.84 average precision rate for species classification with 0.83 accuracy and 0.79 agreement. Thus, the use of textures present in trunk images can represent an important advance in tree identification, since the limitations of the current techniques can be overcome.
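The histogram-based texture measures listed above (mean, standard deviation, smoothness, third moment, uniformity, entropy) can be computed as in the sketch below and passed to a decision tree; the trunk images and species labels here are random placeholders, so the output is illustrative only.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def texture_features(gray_img, levels=256):
    """Classic statistical texture descriptors from the grey-level histogram."""
    hist, _ = np.histogram(gray_img, bins=levels, range=(0, levels), density=True)
    z = np.arange(levels)
    mean = np.sum(z * hist)
    var = np.sum((z - mean) ** 2 * hist)
    std = np.sqrt(var)
    smoothness = 1.0 - 1.0 / (1.0 + var)            # R = 1 - 1/(1 + sigma^2)
    third_moment = np.sum((z - mean) ** 3 * hist)   # asymmetry
    uniformity = np.sum(hist ** 2)
    entropy = -np.sum(hist[hist > 0] * np.log2(hist[hist > 0]))
    return [mean, std, smoothness, third_moment, uniformity, entropy]

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(50, 64, 64))    # placeholder trunk patches
species = rng.integers(0, 5, size=50)               # five (fake) species labels
clf = DecisionTreeClassifier(random_state=0).fit(
    [texture_features(img) for img in images], species)
print(clf.predict([texture_features(images[0])]))
```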
Abstract:
Decision tree induction algorithms represent one of the most popular techniques for dealing with classification problems. However, traditional decision-tree induction algorithms implement a greedy approach to node splitting that is inherently susceptible to convergence to local optima. Evolutionary algorithms can avoid the problems associated with a greedy search and have been successfully employed for the induction of decision trees. Previously, we proposed a lexicographic multi-objective genetic algorithm for decision-tree induction, named LEGAL-Tree. In this work, we propose extending this approach substantially, particularly with respect to two important evolutionary aspects: the initialization of the population and the fitness function. We carry out a comprehensive set of experiments to validate our extended algorithm. The experimental results suggest that it is able to outperform both traditional algorithms for decision-tree induction and another evolutionary algorithm in a variety of application domains.
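For readers unfamiliar with lexicographic multi-objective comparison, the sketch below shows the general idea: objectives are ordered by priority, and a lower-priority objective only decides near-ties on higher-priority ones. The objectives and tolerances are generic placeholders, not the actual LEGAL-Tree fitness definition.

```python
# Generic lexicographic comparison of two individuals' fitness tuples.
def lexicographic_better(a, b, tolerances):
    """Return True if individual `a` beats `b`.

    `a` and `b` are tuples of objective values (higher is better), ordered from the
    most to the least important objective; `tolerances` gives a tie threshold per objective.
    """
    for va, vb, tol in zip(a, b, tolerances):
        if abs(va - vb) > tol:          # difference is meaningful at this priority level
            return va > vb
    return False                        # tied on every objective

# Example: (validation accuracy, -tree size); accuracy dominates, size breaks near-ties.
print(lexicographic_better((0.91, -12), (0.90, -40), tolerances=(0.02, 0)))  # True: smaller tree wins the near-tie
```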
Abstract:
Background: Consensus-based approaches provide an alternative to evidence-based decision making, especially in situations where high-level evidence is limited. Our aim was to demonstrate a novel source of information: objective consensus based on recommendations in decision tree format from multiple sources. Methods: Based on nine sample recommendations in decision tree format, a representative analysis was performed. The most common (mode) recommendations for each eventuality (each permutation of parameters) were determined. The same procedure was applied to real clinical recommendations for primary radiotherapy for prostate cancer. Data were collected from 16 radiation oncology centres, converted into decision tree format, and analyzed in order to determine the objective consensus. Results: Based on information from multiple sources in decision tree format, treatment recommendations can be assessed for every parameter combination. An objective consensus can be determined by means of mode recommendations without compromise or confrontation among the parties. In the clinical example involving prostate cancer therapy, three parameters were used with two cut-off values each (Gleason score, PSA, T-stage), resulting in a total of 27 possible combinations per decision tree. Despite significant variations among the recommendations, a mode recommendation could be found for specific combinations of parameters. Conclusion: Recommendations represented as decision trees can serve as a basis for objective consensus among multiple parties.
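A minimal sketch of the objective-consensus procedure described above: enumerate every parameter combination and take the mode recommendation across centres. The parameter cut-offs, centre rules, and recommendation labels are invented for illustration.

```python
from itertools import product
from collections import Counter

gleason = ["<7", "7", ">7"]
psa = ["<10", "10-20", ">20"]
t_stage = ["T1", "T2", "T3"]

# Three hypothetical centre recommendations expressed as simple decision rules.
def centre_a(g, p, t): return "RT alone" if g == "<7" and p == "<10" else "RT + ADT"
def centre_b(g, p, t): return "RT alone" if t == "T1" and g != ">7" else "RT + ADT"
def centre_c(g, p, t): return "RT + ADT"
centres = [centre_a, centre_b, centre_c]

for combo in product(gleason, psa, t_stage):          # 3 x 3 x 3 = 27 combinations
    votes = Counter(rec(*combo) for rec in centres)
    mode_rec, count = votes.most_common(1)[0]
    print(combo, "->", mode_rec, f"({count}/{len(centres)})")
```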
Abstract:
Data mining, and in particular decision trees, have been used in different fields (engineering, medicine, banking and finance, etc.) to analyze a target variable through decision variables. This article examines the use of the decision tree algorithm as a tool in territorial logistic planning. The decision tree built estimates population density indexes for territorial units with similar logistics characteristics in a concise and practical way.
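A minimal sketch of the kind of model the abstract describes, assuming a regression tree that relates logistics characteristics of territorial units to a population density index; the feature names and data are invented.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
# Hypothetical logistics characteristics: road density, distance to port, warehouse count.
X = rng.uniform(size=(200, 3))
density_index = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, density_index)
print(export_text(tree, feature_names=["road_density", "dist_to_port", "warehouses"]))
```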
Abstract:
This study demonstrates a quantitative approach to construction risk management through the analytic hierarchy process and decision tree analysis. All the risk factors are identified, their effects are quantified by determining probability and severity, and various alternative responses are generated, with cost implications, for mitigating the quantified risks. The expected monetary values are then derived for each alternative in a decision tree framework, and subsequent probability analysis aids the decision process in managing risks. The entire methodology is explained through a case application of a cross-country petroleum pipeline project in India, and its effectiveness in project management is demonstrated.
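A worked sketch of the expected-monetary-value step: each response alternative carries a mitigation cost plus probability-weighted consequence costs, and the lowest-EMV alternative is preferred. All figures are illustrative, not taken from the pipeline case study.

```python
# Hypothetical response alternatives for a single quantified risk.
alternatives = {
    "do nothing":        {"mitigation_cost": 0,       "outcomes": [(0.30, 500_000), (0.70, 0)]},
    "reroute pipeline":  {"mitigation_cost": 120_000, "outcomes": [(0.05, 500_000), (0.95, 0)]},
    "extra inspections": {"mitigation_cost": 40_000,  "outcomes": [(0.15, 500_000), (0.85, 0)]},
}

def emv(alt):
    """Mitigation cost plus probability-weighted consequence costs."""
    return alt["mitigation_cost"] + sum(p * cost for p, cost in alt["outcomes"])

for name, alt in alternatives.items():
    print(f"{name}: EMV = {emv(alt):,.0f}")
print("preferred:", min(alternatives, key=lambda k: emv(alternatives[k])))
```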
Abstract:
Four-bar mechanisms are basic components of many important mechanical devices. The kinematic synthesis of four-bar mechanisms is a difficult design problem. A novel method that combines genetic programming and decision tree learning is presented. We give a structural description for the class of mechanisms that produce desired coupler curves. Constructive induction is used to find and characterize feasible regions of the design space. Decision trees constitute the learning engine, and the new features are created by genetic programming.
Abstract:
Transition P Systems are a parallel and distributed computational model based on the notion of the cellular membrane structure. Each membrane determines a region that encloses a multiset of objects and evolution rules. Transition P Systems evolve through transitions between two consecutive configurations, which are determined by the membrane structure and the multisets present inside the membranes. Moreover, transitions between two consecutive configurations are produced by an exhaustive, non-deterministic, and parallel application of the subset of active evolution rules inside each membrane of the P system. To establish this subset of active evolution rules, however, the useful and applicable rules must first be computed. Hence, computing the subset of applicable evolution rules is critical for the efficiency of the whole evolution process, because it is performed in parallel inside each membrane at every evolution step. The work presented here shows the advantages of incorporating decision trees into the evolution rules applicability algorithm. To that end, we present the formalizations needed to treat this as a classification problem, the method for automatically generating the necessary decision tree, and the new applicability algorithm based on it.
Abstract:
Usually, data mining projects that are based on decision trees for classifying test cases will use the probabilities provided by these decision trees for ranking classified test cases. There is a need for a better method for ranking test cases that have already been classified by a binary decision tree, because these probabilities are not always accurate and reliable enough. One reason for this is that the probability estimates computed by existing decision tree algorithms are always the same for all the different cases in a particular leaf of the decision tree. This is only one reason why the probability estimates given by decision tree algorithms cannot be used as an accurate means of deciding whether a test case has been correctly classified. Isabelle Alvarez has proposed a new method that could be used to rank the test cases that were classified by a binary decision tree [Alvarez, 2004]. In this paper we give the results of a comparison of different ranking methods that are based on the probability estimate, the sensitivity of a particular case, or both.
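The limitation discussed above is easy to demonstrate: a decision tree assigns the same probability estimate to every case that lands in the same leaf, so those cases cannot be ranked against one another. The sketch below shows this on synthetic data; it does not implement the geometric, sensitivity-based ranking proposed by Alvarez.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=1)
tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)

leaves = tree.apply(X)                     # leaf id of each case
proba = tree.predict_proba(X)[:, 1]        # estimated P(class = 1)
for leaf in np.unique(leaves)[:3]:
    scores = np.unique(proba[leaves == leaf])
    print(f"leaf {leaf}: {np.sum(leaves == leaf)} cases, distinct scores = {scores}")
```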
Abstract:
The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by the increased knowledge of risk factors, effective preventative measures, and proven therapeutic interventions. However, significant CVD morbidity remains, and sudden cardiac death continues to be a presenting feature for some patients subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia-related complications. Stress electrocardiography/exercise testing is predictive of 10-year risk of CVD events, and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to these data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers and calculation of moving averages, as well as data summarisation and data abstraction methods. Feature selection methods of both wrapper and filter types are applied to derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of highly correlated but non-redundant variables is assessed. The major dataset is derived from the entire anaesthesia population, and subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads). Because of the unbalanced class distribution in the data, majority-class under-sampling and the Kappa statistic, together with misclassification rate and area under the ROC curve (AUC), are used for evaluation of models generated using different prediction algorithms. The performance of models derived from feature-reduced datasets reveals the filter method, Cfs subset evaluation, to be most consistently effective, although Consistency-derived subsets tended to slightly increase accuracy but markedly increase complexity. The use of misclassification rate (MR) for model performance evaluation is influenced by class distribution. This could be eliminated by consideration of the AUC or Kappa statistic, as well as by evaluation of subsets with an under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62, with the lowest value for data from which both outliers and noise were removed (MR 10.69). For the raw time-series dataset, MR is 12.34. Feature selection results in a reduction in MR to 9.8-10.16, with the time-segmented summary data (dataset F) MR being 9.8 and the raw time-series summary data (dataset A) being 9.92. However, for all time-series-only datasets, the complexity is high. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series-alone datasets, but models derived from these subsets have one leaf only. MR values are consistent with the class distribution in the subset folds evaluated in the n-fold cross-validation method.
For models based on Cfs-selected time-series-derived and risk factor (RF) variables, the MR ranges from 8.83 to 10.36, with dataset RF_A (raw time-series data and RF) being 8.85 and dataset RF_F (time-segmented time-series variables and RF) being 9.09. The models based on counts of outliers and counts of data points outside the normal range (dataset RF_E), and on derived variables based on time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (dataset RF_G), perform the least well, with MR of 10.25 and 10.36 respectively. For coronary vascular disease prediction, nearest neighbour (NNge) and the support vector machine based method, SMO, have the highest MR of 10.1 and 10.28, while logistic regression (LR) and the decision tree (DT) method, J48, have MR of 8.85 and 9.0 respectively. DT rules are the most comprehensible and clinically relevant. The predictive accuracy increase achieved by adding risk factor variables to time-series-variable-based models is significant. The addition of time-series-derived variables to models based on risk factor variables alone is associated with a trend towards improved performance. Data mining of feature-reduced anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules which are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables, when compared with the use of risk factors alone, is similar to recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The pre-processing used in this study had limited effect when time-series variables and risk factor variables are used as model input. In the absence of risk factor input, the use of time-series variables after outlier removal, and of time-series variables based on physiological variable values being outside the accepted normal range, is associated with some improvement in model performance.
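A hedged sketch of the evaluation set-up described in this abstract: under-sample the majority class, then report misclassification rate (MR), Cohen's Kappa, and AUC so that performance is not flattered by the skewed class distribution. The data and the choice of classifier are placeholders, not the thesis datasets or Weka algorithms.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score, roc_auc_score, accuracy_score

# Imbalanced synthetic data standing in for the anaesthesia population.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

# Majority-class under-sampling to balance the classes.
rng = np.random.default_rng(0)
minority = np.where(y == 1)[0]
majority = rng.choice(np.where(y == 0)[0], size=len(minority), replace=False)
idx = np.concatenate([minority, majority])
X_bal, y_bal = X[idx], y[idx]

X_tr, X_te, y_tr, y_te = train_test_split(X_bal, y_bal, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = model.predict(X_te)

print("MR   :", 1 - accuracy_score(y_te, pred))
print("Kappa:", cohen_kappa_score(y_te, pred))
print("AUC  :", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```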
Abstract:
Distraction whilst driving on the approach to a signalized intersection is particularly dangerous, as potential vehicular conflicts and the resulting angle collisions tend to be severe. This study examines the decisions of distracted drivers at the onset of amber lights. Driving simulator data were obtained from a sample of 58 drivers under baseline and handheld mobile phone conditions at the University of Iowa's National Advanced Driving Simulator. Explanatory variables include age, gender, cell phone use, distance to stop-line, and speed. An iterative combination of decision tree and logistic regression analyses is employed to identify main effects, non-linearities, and interaction effects. Results show that novice (16-17 years) and younger (18-25 years) drivers had a heightened risk of amber light running while distracted by a cell phone, and that speed and distance thresholds yielded significant interaction effects. Driver experience, captured by age, has a multiplicative effect with distraction, making the combination of being inexperienced and distracted particularly risky. Solutions are needed to combat the use of mobile phones whilst driving.
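One plausible reading of the "iterative combination" described above, sketched on synthetic data: a shallow decision tree surfaces thresholds and interactions (for example, phone use combined with driver age group), which are then entered as terms in a logistic regression on the stop/go outcome. Variable names and the data-generating process are assumptions, not the authors' model specification.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "speed": rng.uniform(30, 80, n),       # km/h at amber onset (placeholder)
    "distance": rng.uniform(10, 90, n),    # metres to stop-line (placeholder)
    "phone": rng.integers(0, 2, n),        # handheld phone condition
    "novice": rng.integers(0, 2, n),       # novice age-group indicator
})
logit = -3 + 0.04 * df.speed - 0.03 * df.distance + 0.8 * df.phone * df.novice
df["ran_amber"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Step 1: a shallow decision tree suggests thresholds and interactions.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(
    df[["speed", "distance", "phone", "novice"]], df["ran_amber"])
print(export_text(tree, feature_names=["speed", "distance", "phone", "novice"]))

# Step 2: logistic regression including the interaction the tree pointed to.
X2 = np.column_stack([df.speed, df.distance, df.phone * df.novice])
lr = LogisticRegression(max_iter=1000).fit(X2, df["ran_amber"])
print(dict(zip(["speed", "distance", "phone_x_novice"], lr.coef_[0])))
```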
Abstract:
Our daily lives become more and more dependent upon smartphones due to their increased capabilities. Smartphones are used in various ways, from payment systems to assisting the lives of elderly or disabled people. Security threats for these devices become increasingly dangerous since there is still a lack of proper security tools for protection. Android emerges as an open smartphone platform which allows modification even at the operating system level. Therefore, third-party developers have the opportunity to develop kernel-based low-level security tools, which is unusual for smartphone platforms. Android quickly gained popularity among smartphone developers and even beyond, since it is based on Java on top of "open" Linux, in comparison to former proprietary platforms which have very restrictive SDKs and corresponding APIs. Symbian OS, for example, holding the greatest market share among all smartphone OSs, was closing critical APIs to common developers and introduced application certification, since this OS was the main target for smartphone malware in the past. In fact, more than 290 malware samples designed for Symbian OS appeared from July 2004 to July 2008. Android, in turn, promises to be completely open source. Together with the Linux-based smartphone OS OpenMoko, open smartphone platforms may attract malware writers to create malicious applications endangering critical smartphone applications and owners' privacy. In this work, we present our current results in analyzing the security of Android smartphones with a focus on their Linux side. Our results are not limited to Android; they are also applicable to Linux-based smartphones such as the OpenMoko Neo FreeRunner. Our contribution in this work is three-fold. First, we analyze the Android framework and the Linux kernel to check security functionalities. We survey well-accepted security mechanisms and tools which can increase device security. We provide descriptions of how to adopt these security tools on the Android kernel, and provide their overhead analysis in terms of resource usage. As open smartphones are released and may increase their market share similarly to Symbian, they may attract the attention of malware writers. Therefore, our second contribution focuses on malware detection techniques at the kernel level. We test the applicability of existing signature and intrusion detection methods in the Android environment. We focus on monitoring events on the kernel; that is, identifying critical kernel, log file, file system, and network activity events, and devising efficient mechanisms to monitor them in a resource-limited environment. Our third contribution involves initial results of our malware detection mechanism based on static function call analysis. We identified approximately 105 Executable and Linking Format (ELF) executables installed on the Linux side of Android. We perform a statistical analysis of the function calls used by these applications. The results of the analysis can be compared to those of newly installed applications to detect significant differences. Additionally, certain function calls indicate malicious activity. Therefore, we present a simple decision tree for deciding the suspiciousness of the corresponding application. Our results present a first step towards detecting malicious applications on Android-based devices.
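As an illustration only, the sketch below shows what a hand-written "simple decision tree" over static function-call counts might look like; the call names, thresholds, and rules are invented placeholders rather than the authors' actual criteria.

```python
def suspiciousness(call_counts, baseline_mean, baseline_std):
    """Classify an executable from its static function-call count profile.

    call_counts:  dict mapping call name -> count for the new executable.
    baseline_*:   per-call statistics from the pre-installed ELF binaries.
    """
    # Branch 1: calls taken here to directly indicate potentially malicious activity.
    if call_counts.get("ptrace", 0) > 0 or call_counts.get("execve", 0) > 3:
        return "suspicious"
    # Branch 2: profile deviates strongly from the baseline population.
    for name, count in call_counts.items():
        mean, std = baseline_mean.get(name, 0.0), baseline_std.get(name, 1.0)
        if std > 0 and abs(count - mean) > 3 * std:
            return "needs review"
    return "benign"

print(suspiciousness({"execve": 5, "fork": 1},
                     baseline_mean={"execve": 1.0, "fork": 0.5},
                     baseline_std={"execve": 0.8, "fork": 0.5}))
```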
Abstract:
Background: Falls are one of the most frequently occurring adverse events that impact upon the recovery of older hospital inpatients. Falls can threaten both immediate and longer-term health and independence. There is a need to identify cost-effective means of preventing falls in hospitals. Hospital-based falls prevention interventions tested in randomized trials have not yet been subjected to economic evaluation. Methods: Incremental cost-effectiveness analysis was undertaken from the health service provider perspective, over the period of hospitalization (time horizon), using the Australian Dollar (A$) at 2008 values. Analyses were based on data from a randomized trial among n = 1,206 acute and rehabilitation inpatients. Decision tree modeling with three-way sensitivity analyses was conducted using burden of disease estimates developed from trial data and previous research. The intervention was a multimedia patient education program, provided with trained health professional follow-up, shown to reduce falls among cognitively intact hospital patients. Results: The short-term cost to a health service of one cognitively intact patient becoming a faller could be as high as A$14,591 (2008). The education program cost A$526 (2008) to prevent one cognitively intact patient from becoming a faller and A$294 (2008) to prevent one fall, based on primary trial data. These estimates were unstable due to high variability in the hospital costs accrued by individual patients involved in the trial. There was a 52% probability that the complete program was both more effective and less costly (from the health service perspective) than providing usual care alone. Decision tree modeling sensitivity analyses identified that, when provided in real-life contexts, the program would be both more effective in preventing falls among cognitively intact inpatients and cost saving where the proportion of these patients who would otherwise fall under usual care conditions is at least 4.0%. Conclusions: This economic evaluation was designed to assist health care providers in deciding in what circumstances this intervention should be provided. If the proportion of cognitively intact patients falling on a ward under usual care conditions is 4% or greater, then provision of the complete program in addition to usual care will likely both prevent falls and reduce costs for a health service.
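A worked sketch of the break-even reasoning behind the 4% threshold: the programme saves money when the expected faller cost averted per patient exceeds the programme cost per patient. The per-faller cost comes from the abstract; the programme cost per patient and the relative risk reduction are hypothetical placeholders chosen only so the break-even lands near the quoted 4%.

```python
cost_per_faller = 14_591                 # A$ (2008), upper estimate quoted in the abstract
programme_cost_per_patient = 230         # hypothetical placeholder (A$ per patient treated)
risk_reduction = 0.4                     # hypothetical relative reduction in faller proportion

# Break-even usual-care faller rate: programme cost / (risk reduction * cost per faller).
break_even_rate = programme_cost_per_patient / (risk_reduction * cost_per_faller)
print(f"programme is cost saving when the usual-care faller rate exceeds {break_even_rate:.1%}")

for baseline_faller_rate in (0.02, 0.04, 0.08):
    expected_saving = baseline_faller_rate * risk_reduction * cost_per_faller
    verdict = "cost saving" if expected_saving > programme_cost_per_patient else "net cost"
    print(f"  faller rate {baseline_faller_rate:.0%}: averted cost A${expected_saving:,.0f} -> {verdict}")
```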