908 results for CLASSIFICATION AND REGRESSION TREE
Abstract:
Background: Tuberculosis (TB) remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs, without the risk of hospital transmission. We present a model for predicting pulmonary TB in hospitalized patients in a high-prevalence area, in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Methods: Cross-sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART) model was generated and validated. The area under the ROC curve (AUC), sensitivity, specificity, and positive and negative predictive values were used to evaluate the performance of the model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. Results: We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear) and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. Conclusions: The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for prediction of TB diagnosis was the chest radiograph result.
Prospective validation is still necessary, but our model offers an alternative for decision making on whether to isolate patients with clinical suspicion of TB in tertiary health facilities in countries with limited resources.
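The four metrics reported in the abstract above all come from a single 2×2 confusion matrix. As a minimal sketch (the counts below are hypothetical, chosen only because they roughly reproduce the reported percentages; they are not the study's data):

```python
# Minimal sketch: sensitivity, specificity, PPV and NPV from a 2x2
# confusion matrix. The counts used below are hypothetical, not taken
# from the study.
def diagnostic_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, PPV, NPV) for a 2x2 table."""
    sensitivity = tp / (tp + fn)  # diseased patients correctly flagged
    specificity = tn / (tn + fp)  # non-diseased patients correctly cleared
    ppv = tp / (tp + fp)          # positive predictive value
    npv = tn / (tn + fn)          # negative predictive value
    return sensitivity, specificity, ppv, npv

# Hypothetical counts that roughly match the reported 60% / 76% / 33% / 91%
sens, spec, ppv, npv = diagnostic_metrics(tp=30, fp=60, fn=20, tn=192)
print(f"{sens:.2f} {spec:.2f} {ppv:.2f} {npv:.2f}")  # 0.60 0.76 0.33 0.91
```

The asymmetry between the low PPV and the high NPV is typical when disease prevalence in the sample is low: even a fairly specific rule produces many false positives relative to true positives.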
Abstract:
Background Individual signs and symptoms are of limited value for the diagnosis of influenza. Objective To develop a decision tree for the diagnosis of influenza based on a classification and regression tree (CART) analysis. Methods Data from two previous similar cohort studies were assembled into a single dataset. The data were randomly divided into a development set (70%) and a validation set (30%). We used CART analysis to develop three models that maximize the number of patients who do not require diagnostic testing prior to treatment decisions. The validation set was used to evaluate overfitting of the model to the training set. Results Model 1 has seven terminal nodes based on temperature, the onset of symptoms and the presence of chills, cough and myalgia. Model 2 was a simpler tree with only two splits based on temperature and the presence of chills. Model 3 was developed with temperature as a dichotomous variable (≥38°C) and had only two splits based on the presence of fever and myalgia. The area under the receiver operating characteristic curves (AUROCC) for the development and validation sets, respectively, were 0.82 and 0.80 for Model 1, 0.75 and 0.76 for Model 2 and 0.76 and 0.77 for Model 3. Model 2 classified 67% of patients in the validation group into a high- or low-risk group compared with only 38% for Model 1 and 54% for Model 3. Conclusions A simple decision tree (Model 2) classified two-thirds of patients as low or high risk and had an AUROCC of 0.76. After further validation in an independent population, this CART model could support clinical decision making regarding influenza, with low-risk patients requiring no further evaluation for influenza and high-risk patients being candidates for empiric symptomatic or drug therapy.
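CART builds trees like Models 1–3 above by repeatedly choosing the split that leaves the purest child groups. A minimal sketch of that core step, using the Gini impurity criterion (the toy temperature data and the exhaustive threshold search are illustrative assumptions, not the study's algorithm or data):

```python
# Illustrative sketch of one CART split: pick the feature threshold that
# minimizes the weighted Gini impurity of the two child nodes.
# Toy data only -- not the study's dataset.
def gini(labels):
    """Gini impurity of a set of binary labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of positives
    return 2 * p * (1 - p)

def best_split(x, y):
    """Return (threshold, weighted impurity) of the best binary split on x."""
    best = (None, float("inf"))
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        if not left or not right:
            continue  # degenerate split, skip
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if w < best[1]:
            best = (t, w)
    return best

# Toy data: temperature (deg C) vs influenza status (1 = positive)
temp = [36.5, 37.0, 37.2, 38.1, 38.4, 39.0, 39.5, 36.8]
flu = [0, 0, 0, 1, 1, 1, 1, 0]
print(best_split(temp, flu))  # -> (37.2, 0.0): a perfectly pure split
```

A full CART implementation applies this search recursively to each child node and then prunes the tree, which is how the simple two-split trees (Models 2 and 3) arise from the same procedure as the larger Model 1.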
Abstract:
Abstract Background Smear-negative pulmonary tuberculosis (SNPT) accounts for 30% of pulmonary tuberculosis cases reported yearly in Brazil. This study aimed to develop a prediction model for SNPT for outpatients in areas with scarce resources. Methods The study enrolled 551 patients with clinical-radiological suspicion of SNPT in Rio de Janeiro, Brazil. The original data were divided into two equivalent samples for generation and validation of the prediction models. Symptoms, physical signs and chest X-rays were used for constructing logistic regression and classification and regression tree models. From the logistic regression, we generated a clinical and radiological prediction score. The area under the receiver operating characteristic curve, sensitivity, and specificity were used to evaluate the models' performance in both generation and validation samples. Results It was possible to generate predictive models for SNPT with sensitivity ranging from 64% to 71% and specificity ranging from 58% to 76%. Conclusion The results suggest that these models might be useful as screening tools for estimating the risk of SNPT, optimizing the utilization of more expensive tests, and avoiding the costs of unnecessary anti-tuberculosis treatment. These models might be cost-effective tools in a health care network with hierarchical distribution of scarce resources.
Abstract:
Risk assessment systems for introduced species are being developed and applied globally, but methods for rigorously evaluating them are still in their infancy. We explore classification and regression tree models as an alternative to the current Australian Weed Risk Assessment system, and demonstrate how the performance of screening tests for unwanted alien species may be quantitatively compared using receiver operating characteristic (ROC) curve analysis. The optimal classification tree model for predicting weediness included just four out of a possible 44 attributes of introduced plants examined, namely: (i) intentional human dispersal of propagules; (ii) evidence of naturalization beyond the native range; (iii) evidence of being a weed elsewhere; and (iv) a high level of domestication. Intentional human dispersal of propagules in combination with evidence of naturalization beyond a plant's native range led to the strongest prediction of weediness. A high level of domestication in combination with no evidence of naturalization mitigated the likelihood of an introduced plant becoming a weed resulting from intentional human dispersal of propagules. Unlikely intentional human dispersal of propagules combined with no evidence of being a weed elsewhere led to the lowest predicted probability of weediness. The failure to include intrinsic plant attributes in the model suggests either that these attributes are not useful general predictors of weediness, or that the data and analysis were inadequate to elucidate the underlying relationship(s). This concurs with historical pessimism about whether we will ever be able to accurately predict invasive plants. Given the apparent importance of propagule pressure (the number of individuals of a species released), future attempts at evaluating screening model performance for identifying unwanted plants need to account for propagule pressure when collating and/or analysing datasets.
The classification tree had a cross-validated sensitivity of 93.6% and specificity of 36.7%. Based on the area under the ROC curve, the performance of the classification tree in correctly classifying plants as weeds or non-weeds was slightly inferior (area under ROC curve = 0.83 ± 0.021 (± SE)) to that of the current risk assessment system in use (area under ROC curve = 0.89 ± 0.018 (± SE)), although it requires many fewer questions to be answered.
Abstract:
Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point process methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are 2nd-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.
Impact of cancer-related symptom synergisms on health-related quality of life and performance status
Abstract:
To identify the impact of multiple symptoms and their co-occurrence on health-related quality of life (HRQOL) dimensions and performance status (PS), 115 outpatients with cancer, who were not receiving active cancer treatment and were recruited from a university hospital in São Paulo, Brazil, completed the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire-C30, the Beck Depression Inventory, and the Brief Pain Inventory. Karnofsky Performance Status scores were also completed. Application of TwoStep Cluster analysis resulted in two distinct patient subgroups based on 113 patients' experiences with pain, depression, fatigue, insomnia, constipation, lack of appetite, dyspnea, nausea, vomiting, and diarrhea. One subgroup had multiple and severe symptoms; the other had fewer symptoms of lower severity. The multiple and severe symptom subgroup had worse PS, role functioning, and physical, emotional, cognitive, social, and overall HRQOL. This subgroup was also six times as likely as the lower-severity subgroup to have poor role functioning; five times more likely to have poor emotional HRQOL; four times more likely to have poor PS, physical, and overall HRQOL; and three times as likely to have poor cognitive and social HRQOL, independent of gender, age, level of education, and economic condition. Classification and Regression Tree analyses were undertaken to identify which co-occurring symptoms best determined reductions in HRQOL and PS. Pain and fatigue were identified as indicators of reduced physical HRQOL and PS. Fatigue and insomnia were associated with reduction in cognitive HRQOL; depression and pain with reduction in social HRQOL; and fatigue and constipation with reduction in role functioning. Only depression was associated with reduction in overall HRQOL. These data demonstrate a synergistic effect among distinct cancer symptoms that results in reductions in HRQOL dimensions and PS.
Abstract:
Objectives: To measure the health-related quality of life (HRQoL) of multiple sclerosis (MS) patients and their caregivers, and to assess which factors can best describe HRQoL. Methods: A cross-sectional multicenter study of nine hospitals enrolled MS patients and their caregivers who attended outpatient clinics consecutively. The instruments used were the SF-36 for patients and the SF-12 and GHQ-12 for caregivers. Classification and regression tree analysis was used to analyze the explanatory factors of HRQoL. Results: A total of 705 patients (mean age 40.4 years, median Expanded Disability Status Scale 2.5, 77.8% with relapsing-remitting MS) and 551 caregivers (mean age 45.4 years) participated in the study. MS patients had significantly lower HRQoL than the general population (physical SF-36: 39.9; 95% confidence interval [CI]: 39.1–40.6; mental SF-36: 44.4; 95% CI: 43.5–45.3). Caregivers also presented lower HRQoL than the general population, especially in the mental domain (mental SF-12: 46.4; 95% CI: 45.5–47.3). Moreover, according to the GHQ-12, 27% of caregivers presented probable psychological distress. Disability and co-morbidity in patients, and co-morbidity and employment status in caregivers, were the most important explanatory factors of their HRQoL. Conclusions: Not only the HRQoL of patients with MS, but also that of their caregivers, is notably affected. Caregivers' HRQoL is close to that of populations with chronic illness, even though the patient sample had mild clinical severity and caregiving is a customary role in the study context.
Abstract:
INTRODUCTION: Optimal identification of subtle cognitive impairment in the primary care setting requires a very brief tool combining (a) patients' subjective impairments, (b) cognitive testing, and (c) information from informants. The present study developed a new, very quick and easily administered case-finding tool combining these assessments ('BrainCheck') and tested the feasibility and validity of this instrument in two independent studies. METHODS: We developed a case-finding tool comprising patient-directed (a) questions about memory and depression and (b) clock drawing, and (c) the informant-directed 7-item version of the Informant Questionnaire on Cognitive Decline in the Elderly (IQCODE). Feasibility study: 52 general practitioners rated the feasibility and acceptance of the patient-directed tool. Validation study: An independent group of 288 Memory Clinic patients (mean ± SD age = 76.6 ± 7.9, education = 12.0 ± 2.6; 53.8% female) with diagnoses of mild cognitive impairment (n = 80), probable Alzheimer's disease (n = 185), or major depression (n = 23) and 126 demographically matched, cognitively healthy volunteer participants (age = 75.2 ± 8.8, education = 12.5 ± 2.7; 40% female) took part. All patient and healthy control participants were administered the patient-directed tool, and informants of 113 patient and 70 healthy control participants completed the very short IQCODE. RESULTS: Feasibility study: General practitioners rated the patient-directed tool as highly feasible and acceptable. Validation study: A Classification and Regression Tree analysis generated an algorithm to categorize patient-directed data which resulted in a correct classification rate (CCR) of 81.2% (sensitivity = 83.0%, specificity = 79.4%). Critically, the CCR of the combined patient- and informant-directed instruments (BrainCheck) reached nearly 90% (that is, 89.4%; sensitivity = 97.4%, specificity = 81.6%).
CONCLUSION: A new and very brief instrument for general practitioners, 'BrainCheck', combined three sources of information deemed critical for effective case-finding (that is, patients' subjective impairments, cognitive testing, informant information) and resulted in a nearly 90% CCR. Thus, it provides a very efficient and valid tool to aid general practitioners in deciding whether patients with suspected cognitive impairments should be further evaluated or not ('watchful waiting').
Abstract:
Two types of ecological thresholds are now being widely used to develop conservation targets: breakpoint-based thresholds represent tipping points where system properties change dramatically, whereas classification thresholds identify groups of data points with contrasting properties. Both breakpoint-based and classification thresholds are useful tools in evidence-based conservation. However, it is critical that the type of threshold to be estimated corresponds with the question of interest and that appropriate statistical procedures are used to determine its location. On the basis of their statistical properties, we recommend using piecewise regression methods to identify breakpoint-based thresholds and discriminant analysis or classification and regression trees to identify classification thresholds.
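As an illustrative sketch of the breakpoint-based idea (not the authors' procedure): a simple piecewise breakpoint can be located by scanning candidate split points along the gradient and minimizing the within-segment squared error of a two-segment fit. The synthetic data below are an assumption chosen only to make the tipping point obvious.

```python
# Hedged sketch: locate a breakpoint-based ecological threshold by
# scanning candidate split points and minimizing within-segment squared
# error of a two-segment (piecewise-constant) fit. Synthetic data only.
def sse(values):
    """Sum of squared deviations from the segment mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def breakpoint(x, y):
    """Return the x value that best splits y into two flat segments."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[0]:
            best = (total, pairs[i][0])
    return best[1]

# Synthetic gradient: the response jumps once the driver reaches 6
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0.1, 0.2, 0.1, 0.2, 0.1, 2.0, 2.1, 2.2, 1.9, 2.0]
print(breakpoint(xs, ys))  # -> 6
```

Proper piecewise regression additionally fits slopes within each segment and provides confidence intervals for the breakpoint; this sketch only shows why an exhaustive scan over candidate split points identifies the tipping point.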
Abstract:
Background: In Cambodia, malaria transmission is low and most cases occur in forested areas. Seroepidemiological techniques can be used to identify both areas of ongoing transmission and high-risk groups to be targeted by control interventions. This study utilizes repeated cross-sectional data to assess the risk of being malaria sero-positive at two consecutive time points during the rainy season and investigates who is most likely to sero-convert over the transmission season. Methods: In 2005, two cross-sectional surveys, one in the middle and the other at the end of the malaria transmission season, were carried out in two ecologically distinct regions in Cambodia. Parasitological and serological data were collected in four districts. Antibodies to Plasmodium falciparum Glutamate Rich Protein (GLURP) and Plasmodium vivax Merozoite Surface Protein-1₁₉ (MSP-1₁₉) were detected using Enzyme Linked Immunosorbent Assay (ELISA). The force of infection was estimated using a simple catalytic model fitted using maximum likelihood methods. Risks for sero-converting during the rainy season were analysed using the Classification and Regression Tree (CART) method. Results: A total of 804 individuals participating in both surveys were analysed. The overall parasite prevalence was low (4.6% and 2.0% for P. falciparum and 7.9% and 6.0% for P. vivax in August and November respectively). P. falciparum force of infection was higher in the eastern region and increased between August and November, whilst P. vivax force of infection was higher in the western region and remained similar in both surveys. In the western region, malaria transmission changed very little across the season (for both species). CART analysis for P. falciparum in the east highlighted age, ethnicity, village of residence and forest work as important predictors for malaria exposure during the rainy season. Adults were more likely to increase their antibody responses to P.
falciparum during the transmission season than children, whilst members of the Charay ethnic group demonstrated the largest increases. Discussion: In areas of low transmission intensity, such as in Cambodia, the analysis of longitudinal serological data enables a sensitive evaluation of transmission dynamics. Consecutive serological surveys allow an insight into spatio-temporal patterns of malaria transmission. The use of CART enabled multiple interactions to be accounted for simultaneously and permitted risk factors for exposure to be clearly identified.
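For context, a standard formulation of the simple catalytic model mentioned above (this is the textbook form, not a quotation of the paper's exact fit): with a constant force of infection λ and sero-reversion rate ρ, the expected seroprevalence at age a is

```latex
% Simple reversible catalytic model (standard form; a sketch, not the
% paper's exact specification). P(a) is expected seroprevalence at age a,
% \lambda the force of infection, \rho the sero-reversion rate.
P(a) = \frac{\lambda}{\lambda + \rho}\left(1 - e^{-(\lambda + \rho)\,a}\right)
% With no sero-reversion (\rho = 0) this reduces to
% P(a) = 1 - e^{-\lambda a}.
```

λ is then estimated by maximizing the binomial likelihood of the observed numbers of sero-positives across age groups, which is why age-stratified cross-sectional serology can recover transmission intensity in low-transmission settings like the one studied here.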
Abstract:
Objective: The most difficult thyroid tumors to diagnose by cytology and histology are conventional follicular carcinomas (cFTCs) and oncocytic follicular carcinomas (oFTCs). Several microRNAs (miRNAs) have previously been found to be consistently deregulated in papillary thyroid carcinomas; however, very limited information is available for cFTC and oFTC. The aim of this study was to explore miRNA deregulation and find candidate miRNA markers for follicular carcinomas that can be used diagnostically. Design: Thirty-eight follicular thyroid carcinomas (21 cFTCs, 17 oFTCs) and 10 normal thyroid tissue samples were studied for expression of 381 miRNAs using human microarray assays. Expression of deregulated miRNAs was confirmed by individual RT-PCR assays in all samples. In addition, 11 follicular adenomas, two hyperplastic nodules (HNs), and 19 fine-needle aspiration samples were studied for expression of novel miRNA markers detected in this study. Results: The unsupervised hierarchical clustering analysis demonstrated individual clusters for cFTC and oFTC, indicating the difference in miRNA expression between these tumor types. Both cFTCs and oFTCs showed an up-regulation of miR-182/-183/-221/-222/-125a-3p and a down-regulation of miR-542-5p/-574-3p/-455/-199a. A novel miRNA (miR-885-5p) was found to be strongly up-regulated (>40-fold) in oFTCs but not in cFTCs, follicular adenomas, or HNs. The classification and regression tree algorithm applied to fine-needle aspiration samples demonstrated that three dysregulated miRNAs (miR-885-5p/-221/-574-3p) allowed follicular thyroid carcinomas to be distinguished from benign HNs with high accuracy. Conclusions: In this study we demonstrate that different histopathological types of follicular thyroid carcinomas have distinct miRNA expression profiles. MiR-885-5p is highly up-regulated in oncocytic follicular carcinomas and may serve as a diagnostic marker for these tumors.
A small set of deregulated miRNAs allows for an accurate discrimination between follicular carcinomas and hyperplastic nodules and can be used diagnostically in fine-needle aspiration biopsies.
Abstract:
Aim Our aim was to discriminate different species of Pinus via pollen analysis in order to assess the responses of particular pine species to orbital and millennial-scale climate changes, particularly during the last glacial period. Location Modern pollen grains were collected from current pine populations along transects from the Pyrenees to southern Iberia and the Balearic Islands. Fossil pine pollen was recovered from the south-western Iberian margin core MD95-2042. Methods We measured a set of morphological traits of modern pollen from the Iberian pine species Pinus nigra, P. sylvestris, P. halepensis, P. pinea and P. pinaster and of fossil pine pollen from selected samples of the last glacial period and the early to mid-Holocene. Classification and regression tree (CART) analysis was used to establish a model from the modern dataset that discriminates pollen from the different pine species and allows identification of fossil pine pollen at the species level. Results The CART model was effective in separating pollen of P. nigra and P. sylvestris from that of the Mediterranean pine group (P. halepensis, P. pinea and P. pinaster). The pollen of Pinus nigra diverged from that of P. sylvestris by having a more flattened corpus. Predictions using this model suggested that fossil pine pollen is mainly from P. nigra in all the samples analysed. Pinus sylvestris was more abundant in samples from Greenland stadials than Heinrich stadials, whereas Mediterranean pines increased in samples from Greenland interstadials and during the early to mid-Holocene. Main conclusions Morphological parameters can be successfully used to increase the taxonomic resolution of fossil pine pollen at the species level for the highland pines (P. nigra and P. sylvestris) and at the group of species level for the Mediterranean pines. Our study indicates that P. 
nigra was the dominant component of the last glacial south-western/central Iberian pinewoods, although the species composition of these woodlands varied in response to abrupt climate changes.