909 results for Classification and Regression Trees
Abstract:
Background Individual signs and symptoms are of limited value for the diagnosis of influenza. Objective To develop a decision tree for the diagnosis of influenza based on a classification and regression tree (CART) analysis. Methods Data from two previous similar cohort studies were assembled into a single dataset. The data were randomly divided into a development set (70%) and a validation set (30%). We used CART analysis to develop three models that maximize the number of patients who do not require diagnostic testing prior to treatment decisions. The validation set was used to evaluate overfitting of the model to the training set. Results Model 1 has seven terminal nodes based on temperature, the onset of symptoms and the presence of chills, cough and myalgia. Model 2 was a simpler tree with only two splits based on temperature and the presence of chills. Model 3 was developed with temperature as a dichotomous variable (≥38°C) and had only two splits based on the presence of fever and myalgia. The area under the receiver operating characteristic curves (AUROCC) for the development and validation sets, respectively, were 0.82 and 0.80 for Model 1, 0.75 and 0.76 for Model 2 and 0.76 and 0.77 for Model 3. Model 2 classified 67% of patients in the validation group into a high- or low-risk group compared with only 38% for Model 1 and 54% for Model 3. Conclusions A simple decision tree (Model 2) classified two-thirds of patients as low or high risk and had an AUROCC of 0.76. After further validation in an independent population, this CART model could support clinical decision making regarding influenza, with low-risk patients requiring no further evaluation for influenza and high-risk patients being candidates for empiric symptomatic or drug therapy.
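The evaluation design described above (a random 70/30 split, then comparing AUROCC on the development and validation sets) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the 38.0 cut-off, and the leaf scores in `two_split_tree` are hypothetical placeholders, since the abstract does not report the fitted thresholds of Model 2.

```python
import random


def split_70_30(records, seed=0):
    """Randomly divide records into a development set (70%) and a validation set (30%)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auroc(scores, labels):
    """Chance that a random positive case outscores a random negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def two_split_tree(temp_c, chills, temp_cut=38.0):
    """Hypothetical tree with two splits (temperature, then chills).

    The cut-off and the leaf scores are illustrative assumptions,
    not values reported in the abstract.
    """
    if temp_c >= temp_cut:
        return 0.8 if chills else 0.5
    return 0.1
```

Computing `auroc` separately on the two sets and comparing the results is one way to check for overfitting: a validation value close to the development value, as reported for all three models, suggests the tree generalizes beyond the data it was grown on.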
Abstract:
Background: Tuberculosis (TB) remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs, without the risk of hospital transmission. We present a model for predicting pulmonary TB in hospitalized patients in a high-prevalence area in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Methods: Cross-sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART) model was generated and validated. The area under the ROC curve (AUC), sensitivity, specificity, and positive and negative predictive values were used to evaluate the performance of the model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. Results: We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear) and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. Conclusions: The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for prediction of TB diagnosis was chest radiograph results.
Prospective validation is still necessary, but our model offers an alternative for deciding whether to isolate patients with clinical suspicion of TB in tertiary health facilities in countries with limited resources.
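The four performance measures reported above all derive from one 2×2 table of predicted versus confirmed diagnoses. The sketch below shows the arithmetic. The counts are illustrative assumptions, not data from the study: they were chosen only because they round to the reported percentages.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Compute the standard 2x2-table measures from raw counts."""
    return {
        "sensitivity": tp / (tp + fn),  # confirmed cases flagged by the model
        "specificity": tn / (tn + fp),  # non-cases cleared by the model
        "ppv": tp / (tp + fp),          # flagged patients who truly have the disease
        "npv": tn / (tn + fn),          # cleared patients who truly do not
    }


# Illustrative counts, NOT taken from the study.
metrics = diagnostic_metrics(tp=18, fn=12, fp=36, tn=115)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

With these counts the script prints 60.00%, 76.16%, 33.33% and 90.55%. The pattern of a low positive predictive value alongside a high negative predictive value is what one expects when most suspected patients do not have the disease.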
Abstract:
Background: Development of three classification trees (CT) based on the CART (Classification and Regression Trees), CHAID (Chi-Square Automatic Interaction Detection) and C4.5 methodologies for the calculation of probability of hospital mortality; the comparison of the results with the APACHE II, SAPS II and MPM II-24 scores, and with a model based on multiple logistic regression (LR). Methods: Retrospective study of 2864 patients. Random partition (70:30) into a Development Set (DS) n = 1808 and Validation Set (VS) n = 808. Their discrimination is compared using the ROC curve (AUC, 95% CI) and the percent of correct classification (PCC, 95% CI), and their calibration using the calibration curve and the Standardized Mortality Ratio (SMR, 95% CI). Results: CTs are produced with a different selection of variables and decision rules: CART (5 variables and 8 decision rules), CHAID (7 variables and 15 rules) and C4.5 (6 variables and 10 rules). The common variables were: inotropic therapy, Glasgow, age, (A-a)O2 gradient and antecedent of chronic illness. In VS: all the models achieved acceptable discrimination with AUC above 0.7. CT: CART (0.75(0.71-0.81)), CHAID (0.76(0.72-0.79)) and C4.5 (0.76(0.73-0.80)). PCC: CART (72(69-75)), CHAID (72(69-75)) and C4.5 (76(73-79)). Calibration (SMR) better in the CT: CART (1.04(0.95-1.31)), CHAID (1.06(0.97-1.15)) and C4.5 (1.08(0.98-1.16)). Conclusion: With different methodologies of CTs, trees are generated with different selection of variables and decision rules. The CTs are easy to interpret, and they stratify the risk of hospital mortality. The CTs should be taken into account for the classification of the prognosis of critically ill patients.
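Two of the measures named above are simple to state in code. The sketch below uses the usual definition of the SMR (observed deaths divided by the deaths expected from the model's predicted probabilities) and a percent-correct score at an assumed cut-off of 0.5. The abstract does not state which cut-off the authors used, and the example data are invented for illustration.

```python
def standardized_mortality_ratio(died, predicted_prob):
    """Observed deaths divided by expected deaths (sum of predicted probabilities)."""
    observed = sum(died)
    expected = sum(predicted_prob)
    return observed / expected


def percent_correct(died, predicted_prob, cutoff=0.5):
    """Percentage of patients whose outcome matches the thresholded prediction.

    The 0.5 cut-off is an assumption made for this sketch.
    """
    correct = sum((p >= cutoff) == bool(d) for d, p in zip(died, predicted_prob))
    return 100.0 * correct / len(died)


# Invented example: 1 = died in hospital, 0 = survived.
died = [1, 0, 0, 1, 0]
prob = [0.7, 0.2, 0.1, 0.4, 0.6]
print(standardized_mortality_ratio(died, prob), percent_correct(died, prob))
```

An SMR near 1 means the model predicts about as many deaths as actually occurred, which is why values such as 1.04 to 1.08 with confidence intervals that include 1 are read as good calibration.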
Abstract:
Objective: We used demographic and clinical data to design practical classification models for prediction of neurocognitive impairment (NCI) in people with HIV infection. Methods: The study population comprised 331 HIV-infected patients with available demographic, clinical, and neurocognitive data collected using a comprehensive battery of neuropsychological tests. Classification and regression trees (CART) were developed to obtain detailed and reliable models to predict NCI. Following a practical clinical approach, NCI was considered the main variable for study outcomes, and analyses were performed separately in treatment-naïve and treatment-experienced patients. Results: The study sample comprised 52 treatment-naïve and 279 experienced patients. In the first group, the variables identified as better predictors of NCI were CD4 cell count and age (correct classification [CC]: 79.6%, 3 final nodes). In treatment-experienced patients, the variables most closely related to NCI were years of education, nadir CD4 cell count, central nervous system penetration-effectiveness score, age, employment status, and confounding comorbidities (CC: 82.1%, 7 final nodes). In patients with an undetectable viral load and no comorbidities, we obtained a fairly accurate model in which the main variables were nadir CD4 cell count, current CD4 cell count, time on current treatment, and past highest viral load (CC: 88%, 6 final nodes). Conclusion: Practical classification models to predict NCI in HIV infection can be obtained using demographic and clinical variables. An approach based on CART analyses may facilitate screening for HIV-associated neurocognitive disorders and complement clinical information about risk and protective factors for NCI in HIV-infected patients.
Abstract:
Dissertation submitted for the degree of Master in Biomedical Engineering
Abstract:
Two types of ecological thresholds are now being widely used to develop conservation targets: breakpoint-based thresholds represent tipping points where system properties change dramatically, whereas classification thresholds identify groups of data points with contrasting properties. Both breakpoint-based and classification thresholds are useful tools in evidence-based conservation. However, it is critical that the type of threshold to be estimated corresponds with the question of interest and that appropriate statistical procedures are used to determine its location. On the basis of their statistical properties, we recommend using piecewise regression methods to identify breakpoint-based thresholds and discriminant analysis or classification and regression trees to identify classification thresholds.
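To make the idea of a classification threshold concrete, the sketch below finds the single split of a predictor that best separates the response into two contrasting groups, which is the basic step a regression tree repeats at every node. It is a minimal illustration under assumed data: the variable meanings (for example habitat cover and species richness) and the numbers are hypothetical, and the function name is not from any library.

```python
def best_split(x, y):
    """Return (threshold, sse) for the one split of x that minimizes
    the within-group sum of squared errors of y, or None if x is constant."""

    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical data: the response jumps between x = 3 and x = 4.
x = [1, 2, 3, 4, 5, 6]
y = [10, 11, 10, 20, 21, 20]
print(best_split(x, y)[0])
```

The threshold found this way marks where two groups of observations differ most. It does not by itself show that the system changes behaviour at that point, which is the question a breakpoint analysis such as piecewise regression addresses.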
Abstract:
A study was conducted to evaluate in vitro the effect of root surface conditioning with basic fibroblast growth factor (b-FGF) on morphology and proliferation of fibroblasts. Three experimental groups were used: non-treated, and treated with 50 µg or 125 µg b-FGF/ml. The dentin samples in each group were divided into subgroups according to the chemical treatment received before application of b-FGF: none, or conditioned with tetracycline-HCl or EDTA. After contact with b-FGF for 5 min, the samples were incubated for 24 h with 1 ml of culture medium containing 1 × 10^5 cells/ml plus 1 ml of culture medium alone. The samples were then subjected to routine preparation for SEM, and random fields were photographed. Three calibrated and blinded examiners assessed morphology and density according to two index systems. Classification and regression trees indicated that the root surfaces treated with 125 µg b-FGF and previously conditioned with tetracycline-HCl or EDTA presented a morphology more suggestive of cellular adhesion and viability (P = 0.004). The density of fibroblasts on samples previously conditioned with EDTA, regardless of treatment with b-FGF, was significantly higher than in the other groups (P < 0.001). The present findings suggest that topical application of b-FGF has a positive influence on both the density and morphology of fibroblasts.
Abstract:
Survival, T-cell functions, and postmortem histopathology were studied in H-2 congenic strains of mice bearing H-2b, H-2k, and H-2d haplotypes. Males lived longer than females in all homozygous and heterozygous combinations except for H-2d homozygotes, which showed no differences between males and females. Association of heterozygosity with longer survival was observed only with H-2b/H-2b and H-2b/H-2d mice. Analysis using classification and regression trees (CART) showed that both males and females of H-2b homozygous and H-2k/H-2b mice had the shortest life-span of the strains studied. In histopathological analyses, lymphomas were noted to be more frequent in females, while hemangiosarcomas and hepatomas were more frequent in males. Lymphomas appeared earlier than hepatomas or hemangiosarcomas. The incidence of lymphomas was associated with the H-2 haplotype--e.g., H-2b homozygous mice had more lymphomas than did mice of the H-2d haplotype. More vigorous T-cell function was maintained with age (27 months) in H-2d, H-2b/H-2d, and H-2d/H-2k mice as compared with H-2b, H-2k, and H-2b/H-2k mice, which showed a decline of T-cell responses with age.
Abstract:
Traditional vegetation mapping methods use high-cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation, and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns with transitional gradients from one vegetation community to another. Arbitrary, though often unrealistic, sharp boundaries can be imposed on the model by the application of statistical methods. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of Northeastern Australia. The paper presents the full cycle of this vegetation modelling approach, including sampling sites, variable selection, model selection, model implementation, internal model assessment, model prediction assessments, integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, model validation with an independent data set, and scale assessment of model predictions. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated (r² = 0.83) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of vital information for conservation planning and management, a scientific basis for rehabilitation of disturbed and cleared areas, and a viable method for the production of adequate vegetation maps for conservation and forestry planning of poorly studied areas. © 2006 Elsevier B.V. All rights reserved.
Abstract:
Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point process methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are second-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.
Abstract:
Endogenous and environmental variables are fundamental in explaining variations in fish condition. Based on more than 20 yr of fish weight and length data, relative condition indices were computed for anchovy and sardine caught in the Gulf of Lions. Classification and regression trees (CART) were used to identify endogenous factors affecting fish condition, and to group years of similar condition. Both species showed a similar annual cycle with condition being minimal in February and maximal in July. CART identified 3 groups of years where the fish populations generally showed poor, average and good condition and within which condition differed between age classes but not according to sex. In particular, during the period of poor condition (mostly recent years), sardines older than 1 yr appeared to be more strongly affected than younger individuals. Time-series were analyzed using generalized linear models (GLMs) to examine the effects of oceanographic abiotic (temperature, Western Mediterranean Oscillation [WeMO] and Rhone outflow) and biotic (chlorophyll a and 6 plankton classes) factors on fish condition. The selected models explained 48 and 35% of the variance of anchovy and sardine condition, respectively. Sardine condition was negatively related to temperature but positively related to the WeMO and mesozooplankton and diatom concentrations. A positive effect of mesozooplankton and Rhone runoff on anchovy condition was detected. The importance of increasing temperatures and reduced water mixing in the NW Mediterranean Sea, affecting planktonic productivity and thus fish condition by bottom-up control processes, was highlighted by these results. Changes in plankton quality, quantity and phenology could lead to insufficient or inadequate food supply for both species.
Abstract:
This study analyzes data from the 2004 entrance examination (vestibular) of the Universidade Federal de Minas Gerais using a nonparametric regression model, Classification and Regression Trees. Its aim was to identify the main factors associated with admission and to check whether these factors were the same for daytime and evening courses. Answering these questions would make it possible to assess whether the university's expansion of evening courses was promoting greater social inclusion. In general, completing secondary education at a federal public school or a private school, knowledge of a foreign language, and belonging to a high socioeconomic group were strongly associated with a candidate's admission. In addition, socioeconomic variables carried more weight in evening courses, whereas the candidate's educational background carried more weight in daytime courses. Finally, the mean socioeconomic factor tends to be higher for admitted candidates.
Abstract:
The aim of this thesis is to explain the prolific offending of certain offenders. We argue that prolific offending is explained by the more frequent formation of criminogenic situations. These situations refer to the moment when an offender interacts with a criminal opportunity in a context favorable to crime. More precisely, it is the moment when the offender faces this opportunity but the crime has not yet been committed. The formation of criminogenic situations is facilitated by the interaction and interdependence of three elements: the person's propensity for offending, their criminalized social circle, and their lifestyle. Thus, prolific offending cannot be adequately explained without taking into account the interaction between individual risk and contextual risk. The general objective of this thesis is to demonstrate the importance of modeling the interaction between individual risk and contextual risk in order to explain the more prolific offending of certain offenders. To this end, 155 offenders under the responsibility of two facilities of the Services correctionnels du Québec and four Quebec youth centres (centres jeunesse) completed an assessment protocol of self-administered questionnaires. First (chapter three), we described and compared the nature of the self-reported offending of the offenders in our sample. This first results chapter showed that this pool of offenders is similar to other offender samples with respect to the nature of their offending, in particular the volume, variety, and seriousness of their crimes.
Indeed, the majority of participants report a low volume of crimes against persons and against property, while a small group stands out with a very high lambda (13.1% of the offenders in the sample are responsible for 60.3% of all reported crimes). About four in five offenders report having committed at least one crime against a person and one crime against property. Moreover, more than 50% of the latter report crimes in at least four subcategories. Finally, although the offenders in our sample have a relatively low average crime severity index (IGC, indice de gravité de la criminalité; median = 77), nearly 40% of the offenders report having committed at least one of the two most serious crimes recorded in this study (discharging a firearm and robbery). The second specific objective was to explore, in chapter four, the interaction among offenders' personal characteristics, social circle, and lifestyle in the formation of criminogenic situations. People with a higher propensity for offending seem to be more often surrounded by criminalized people and to have a more idle lifestyle. The criminalized social circle also seems to influence these offenders' lifestyle. Thus, the interdependence among these three elements facilitates the more frequent formation of criminogenic situations and creates conditions conducive to the emergence of prolific offending. The last specific objective of the thesis, covered in chapter five, was to analyze the impact of the formation of criminogenic situations on the nature of offending. Multiple linear regression analyses and regression trees highlighted the contribution of personal characteristics, social circle, and lifestyle in explaining the nature of offending.
On the one hand, the regression analyses (additive models) suggest that each of the elements favoring the formation of criminogenic situations makes a unique contribution to explaining offending. On the other hand, the regression trees allowed us to better understand the interaction among the elements in explaining prolific offending. Indeed, a lower standing on some elements can be offset by a higher standing on others. Moreover, the accumulation of elements favoring the formation of criminogenic situations does not occur linearly. These conclusions are supported by proportions of explained variance higher than those of the multiple linear regressions. In conclusion, focusing on only one element (the person and their propensity for offending, or the context and its opportunities), or on their simply additive combination, does not do justice to the complexity of the emergence of prolific offending. By empirically testing this generally accepted idea, this thesis underscores the importance of considering the interaction between individual risk and contextual risk in explaining prolific offending.