963 resultados para Classification Trees


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Development of three classification trees (CT) based on the CART (Classification and Regression Trees), CHAID (Chi-Square Automatic Interaction Detection) and C4.5 methodologies for the calculation of probability of hospital mortality; the comparison of the results with the APACHE II, SAPS II and MPM II-24 scores, and with a model based on multiple logistic regression (LR). Methods: Retrospective study of 2864 patients. Random partition (70:30) into a Development Set (DS) n = 1808 and Validation Set (VS) n = 808. Their properties of discrimination are compared with the ROC curve (AUC CI 95%), Percent of correct classification (PCC CI 95%); and the calibration with the Calibration Curve and the Standardized Mortality Ratio (SMR CI 95%). Results: CTs are produced with a different selection of variables and decision rules: CART (5 variables and 8 decision rules), CHAID (7 variables and 15 rules) and C4.5 (6 variables and 10 rules). The common variables were: inotropic therapy, Glasgow, age, (A-a)O2 gradient and antecedent of chronic illness. In VS: all the models achieved acceptable discrimination with AUC above 0.7. CT: CART (0.75(0.71-0.81)), CHAID (0.76(0.72-0.79)) and C4.5 (0.76(0.73-0.80)). PCC: CART (72(69- 75)), CHAID (72(69-75)) and C4.5 (76(73-79)). Calibration (SMR) better in the CT: CART (1.04(0.95-1.31)), CHAID (1.06(0.97-1.15) and C4.5 (1.08(0.98-1.16)). Conclusion: With different methodologies of CTs, trees are generated with different selection of variables and decision rules. The CTs are easy to interpret, and they stratify the risk of hospital mortality. The CTs should be taken into account for the classification of the prognosis of critically ill patients.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract Background Smear negative pulmonary tuberculosis (SNPT) accounts for 30% of pulmonary tuberculosis cases reported yearly in Brazil. This study aimed to develop a prediction model for SNPT for outpatients in areas with scarce resources. Methods The study enrolled 551 patients with clinical-radiological suspicion of SNPT, in Rio de Janeiro, Brazil. The original data was divided into two equivalent samples for generation and validation of the prediction models. Symptoms, physical signs and chest X-rays were used for constructing logistic regression and classification and regression tree models. From the logistic regression, we generated a clinical and radiological prediction score. The area under the receiver operator characteristic curve, sensitivity, and specificity were used to evaluate the model's performance in both generation and validation samples. Results It was possible to generate predictive models for SNPT with sensitivity ranging from 64% to 71% and specificity ranging from 58% to 76%. Conclusion The results suggest that those models might be useful as screening tools for estimating the risk of SNPT, optimizing the utilization of more expensive tests, and avoiding costs of unnecessary anti-tuberculosis treatment. Those models might be cost-effective tools in a health care network with hierarchical distribution of scarce resources.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Most of the modem developments with classification trees are aimed at improving their predictive capacity. This article considers a curiously neglected aspect of classification trees, namely the reliability of predictions that come from a given classification tree. In the sense that a node of a tree represents a point in the predictor space in the limit, the aim of this article is the development of localized assessment of the reliability of prediction rules. A classification tree may be used either to provide a probability forecast, where for each node the membership probabilities for each class constitutes the prediction, or a true classification where each new observation is predictively assigned to a unique class. Correspondingly, two types of reliability measure will be derived-namely, prediction reliability and classification reliability. We use bootstrapping methods as the main tool to construct these measures. We also provide a suite of graphical displays by which they may be easily appreciated. In addition to providing some estimate of the reliability of specific forecasts of each type, these measures can also be used to guide future data collection to improve the effectiveness of the tree model. The motivating example we give has a binary response, namely the presence or absence of a species of Eucalypt, Eucalyptus cloeziana, at a given sampling location in response to a suite of environmental covariates, (although the methods are not restricted to binary response data).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Risk assessment systems for introduced species are being developed and applied globally, but methods for rigorously evaluating them are still in their infancy. We explore classification and regression tree models as an alternative to the current Australian Weed Risk Assessment system, and demonstrate how the performance of screening tests for unwanted alien species may be quantitatively compared using receiver operating characteristic (ROC) curve analysis. The optimal classification tree model for predicting weediness included just four out of a possible 44 attributes of introduced plants examined, namely: (i) intentional human dispersal of propagules; (ii) evidence of naturalization beyond native range; (iii) evidence of being a weed elsewhere; and (iv) a high level of domestication. Intentional human dispersal of propagules in combination with evidence of naturalization beyond a plants native range led to the strongest prediction of weediness. A high level of domestication in combination with no evidence of naturalization mitigated the likelihood of an introduced plant becoming a weed resulting from intentional human dispersal of propagules. Unlikely intentional human dispersal of propagules combined with no evidence of being a weed elsewhere led to the lowest predicted probability of weediness. The failure to include intrinsic plant attributes in the model suggests that either these attributes are not useful general predictors of weediness, or data and analysis were inadequate to elucidate the underlying relationship(s). This concurs with the historical pessimism that we will ever be able to accurately predict invasive plants. Given the apparent importance of propagule pressure (the number of individuals of an species released), future attempts at evaluating screening model performance for identifying unwanted plants need to account for propagule pressure when collating and/or analysing datasets. The classification tree had a cross-validated sensitivity of 93.6% and specificity of 36.7%. Based on the area under the ROC curve, the performance of the classification tree in correctly classifying plants as weeds or non-weeds was slightly inferior (Area under ROC curve = 0.83 +/- 0.021 (+/- SE)) to that of the current risk assessment system in use (Area under ROC curve = 0.89 +/- 0.018 (+/- SE)), although requires many fewer questions to be answered.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Intrusion detection is a critical component of security information systems. The intrusion detection process attempts to detect malicious attacks by examining various data collected during processes on the protected system. This paper examines the anomaly-based intrusion detection based on sequences of system calls. The point is to construct a model that describes normal or acceptable system activity using the classification trees approach. The created database is utilized as a basis for distinguishing the intrusive activity from the legal one using string metric algorithms. The major results of the implemented simulation experiments are presented and discussed as well.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

BACKGROUND: We sought to improve upon previously published statistical modeling strategies for binary classification of dyslipidemia for general population screening purposes based on the waist-to-hip circumference ratio and body mass index anthropometric measurements. METHODS: Study subjects were participants in WHO-MONICA population-based surveys conducted in two Swiss regions. Outcome variables were based on the total serum cholesterol to high density lipoprotein cholesterol ratio. The other potential predictor variables were gender, age, current cigarette smoking, and hypertension. The models investigated were: (i) linear regression; (ii) logistic classification; (iii) regression trees; (iv) classification trees (iii and iv are collectively known as "CART"). Binary classification performance of the region-specific models was externally validated by classifying the subjects from the other region. RESULTS: Waist-to-hip circumference ratio and body mass index remained modest predictors of dyslipidemia. Correct classification rates for all models were 60-80%, with marked gender differences. Gender-specific models provided only small gains in classification. The external validations provided assurance about the stability of the models. CONCLUSIONS: There were no striking differences between either the algebraic (i, ii) vs. non-algebraic (iii, iv), or the regression (i, iii) vs. classification (ii, iv) modeling approaches. Anticipated advantages of the CART vs. simple additive linear and logistic models were less than expected in this particular application with a relatively small set of predictor variables. CART models may be more useful when considering main effects and interactions between larger sets of predictor variables.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The usefulness of motor subtypes of delirium is unclear due to inconsistency in subtyping methods and a lack of validation with objective measures of activity. The activity of 40 patients was measured over 24 h with a commercial accelerometer-based activity monitor. Accelerometry data from patients with DSM-IV delirium that were readily divided into hyperactive, hypoactive and mixed motor subtypes, were used to create classification trees that were Subsequently applied to the remaining cohort to define motoric subtypes. The classification trees used the periods of sitting/lying, standing, stepping and number of postural transitions as measured by the activity monitor as determining factors from which to classify the delirious cohort. The use of a classification system shows how delirium subtypes can be categorised in relation to overall activity and postural changes, which was one of the most discriminating measures examined. The classification system was also implemented to successfully define other patient motoric subtypes. Motor subtypes of delirium defined by observed ward behaviour differ in electronically measured activity levels. Crown Copyright (C) 2009 Published by Elsevier B.V. All rights reserved.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Coral reefs represent major accumulations of calcium carbonate (CaCO3). The particularly labyrinthine network of reefs in Torres Strait, north of the Great Barrier Reef (GBR), has been examined in order to estimate their gross CaCO3 productivity. The approach involved a two-step procedure, first characterising and classifying the morphology of reefs based on a classification scheme widely employed on the GBR and then estimating gross CaCO3 productivity rates across the region using a regional census-based approach. This was undertaken by independently verifying published rates of coral reef community gross production for use in Torres Strait, based on site-specific ecological and morphological data. A total of 606 reef platforms were mapped and classified using classification trees. Despite the complexity of the maze of reefs in Torres Strait, there are broad morphological similarities with reefs in the GBR. The spatial distribution and dimensions of reef types across both regions are underpinned by similar geological processes, sea-level history in the Holocene and exposure to the same wind/wave energetic regime, resulting in comparable geomorphic zonation. However, the presence of strong tidal currents flowing through Torres Strait and the relatively shallow and narrow dimensions of the shelf exert a control on local morphology and spatial distribution of the reef platforms. A total amount of 8.7 million tonnes of CaCO3 per year, at an average rate of 3.7 kg CaCO3 m-2 yr-1 (G), were estimated for the studied area. Extrapolated production rates based on detailed and regional census-based approaches for geomorphic zones across Torres Strait were comparable to those reported elsewhere, particularly values for the GBR based on alkalinity-reduction methods. However, differences in mapping methodologies and the impact of reduced calcification due to global trends in coral reef ecological decline and changing oceanic physical conditions warrant further research. The novel method proposed in this study to characterise the geomorphology of reef types based on classification trees provides an objective and repeatable data-driven approach that combined with regional census-based approaches has the potential to be adapted and transferred to different coral reef regions, depicting a more accurate picture of interactions between reef ecology and geomorphology.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In the spinal cord of the anesthetized cat, spontaneous cord dorsum potentials (CDPs) appear synchronously along the lumbo-sacral segments. These CDPs have different shapes and magnitudes. Previous work has indicated that some CDPs appear to be specially associated with the activation of spinal pathways that lead to primary afferent depolarization and presynaptic inhibition. Visual detection and classification of these CDPs provides relevant information on the functional organization of the neural networks involved in the control of sensory information and allows the characterization of the changes produced by acute nerve and spinal lesions. We now present a novel feature extraction approach for signal classification, applied to CDP detection. The method is based on an intuitive procedure. We first remove by convolution the noise from the CDPs recorded in each given spinal segment. Then, we assign a coefficient for each main local maximum of the signal using its amplitude and distance to the most important maximum of the signal. These coefficients will be the input for the subsequent classification algorithm. In particular, we employ gradient boosting classification trees. This combination of approaches allows a faster and more accurate discrimination of CDPs than is obtained by other methods.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Precise classification of tumors is critically important for cancer diagnosis and treatment. It is also a scientifically challenging task. Recently, efforts have been made to use gene expression profiles to improve the precision of classification, with limited success. Using a published data set for purposes of comparison, we introduce a methodology based on classification trees and demonstrate that it is significantly more accurate for discriminating among distinct colon cancer tissues than other statistical approaches used heretofore. In addition, competing classification trees are displayed, which suggest that different genes may coregulate colon cancers.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The problem of recognition on finite set of events is considered. The generalization ability of classifiers for this problem is studied within the Bayesian approach. The method for non-uniform prior distribution specification on recognition tasks is suggested. It takes into account the assumed degree of intersection between classes. The results of the analysis are applied for pruning of classification trees.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

ACM Computing Classification System (1998): I.4.9, I.4.10.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: 14 <= r <= 21 (85.2%) and r >= 19 (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT, and Ball et al. We find that our FT classifier is comparable to or better in completeness over the full magnitude range 15 <= r <= 21, with much lower contamination than all but the Ball et al. classifier. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (> 80%) while simultaneously achieving low contamination (similar to 2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 <= r <= 21.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Background -: Beta-2 adrenergic receptor gene polymorphisms Gln27Glu, Arg16Gly and Thr164Ile were suggested to have an effect in heart failure. We evaluated these polymorphisms relative to clinical characteristics and prognosis of alarge cohort of patients with heart failure of different etiologies. Methods -: We studied 501 patients with heart failure of different etiologies. Mean age was 58 years (standard deviation 14.4 years), 298 (60%) were men. Polymorphisms were identified by polymerase chain reaction-restriction fragment length polymorphism. Results -: During the mean follow-up of 12.6 months (standard deviation 10.3 months), 188 (38%) patients died. Distribution of genotypes of polymorphism Arg16Gly was different relative to body mass index (chi(2) = 9.797; p = 0.04). Overall the probability of survival was not significantly predicted by genotypes of Gln27Glu, Arg16Gly, or Thr164Ile. Allele and haplotype analysis also did not disclose any significant difference regarding mortality. Exploratory analysis through classification trees pointed towards a potential association between the Gln27Glu polymorphism and mortality in older individuals. Conclusion -: In this study sample, we were not able to demonstrate an overall influence of polymorphisms Gln27Glu and Arg16Gly of beta-2 receptor gene on prognosis. Nevertheless, Gln27Glu polymorphism may have a potential predictive value in older individuals.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper proposes a template for modelling complex datasets that integrates traditional statistical modelling approaches with more recent advances in statistics and modelling through an exploratory framework. Our approach builds on the well-known and long standing traditional idea of 'good practice in statistics' by establishing a comprehensive framework for modelling that focuses on exploration, prediction, interpretation and reliability assessment, a relatively new idea that allows individual assessment of predictions. The integrated framework we present comprises two stages. The first involves the use of exploratory methods to help visually understand the data and identify a parsimonious set of explanatory variables. The second encompasses a two step modelling process, where the use of non-parametric methods such as decision trees and generalized additive models are promoted to identify important variables and their modelling relationship with the response before a final predictive model is considered. We focus on fitting the predictive model using parametric, non-parametric and Bayesian approaches. This paper is motivated by a medical problem where interest focuses on developing a risk stratification system for morbidity of 1,710 cardiac patients given a suite of demographic, clinical and preoperative variables. Although the methods we use are applied specifically to this case study, these methods can be applied across any field, irrespective of the type of response.