71 results for Multiple Additive Regression Trees (MART)
at Université de Lausanne, Switzerland
Abstract:
PURPOSE: According to estimates, around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. METHOD: About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). RESULTS: The automated classification groups lithological units well in terms of their IRC characteristics. In particular, the method clearly reveals the IRC differences in metamorphic rocks such as gneiss. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the variable importance evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. CONCLUSION: Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication.
Future approaches should consider taking into account further variables like soil gas radon measurements as well as more detailed geological information.
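The clustering step described in this abstract (k-medoids on pair-wise Kolmogorov distances between IRC distributions) can be illustrated with a minimal sketch. It is not the authors' implementation: the unit names, the IRC values, the farthest-first initialisation, and the simple alternating update are all assumptions made for illustration.

```python
import bisect

def kolmogorov_distance(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

def assign(dist, medoids):
    """Label each item with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]

def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    Initialisation is deterministic farthest-first (an assumption of this
    sketch, not something stated in the abstract).
    """
    n = len(dist)
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                updated.append(medoids[c])
                continue
            updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)

# Hypothetical IRC samples (Bq/m^3) for four made-up lithological units.
irc_by_unit = {
    "granite": [120, 180, 250, 300, 410, 520],
    "gneiss": [130, 170, 260, 310, 400, 500],
    "molasse": [20, 35, 50, 60, 80, 95],
    "moraine": [25, 30, 55, 65, 75, 90],
}
names = list(irc_by_unit)
dist = [[kolmogorov_distance(irc_by_unit[p], irc_by_unit[q]) for q in names]
        for p in names]
medoids, labels = k_medoids(dist, k=2)
print(dict(zip(names, labels)))
```

Working from a precomputed distance matrix is what makes k-medoids a natural fit here: the cluster centres are actual lithological units, so no "average distribution" has to be defined.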
Abstract:
Background Individual signs and symptoms are of limited value for the diagnosis of influenza. Objective To develop a decision tree for the diagnosis of influenza based on a classification and regression tree (CART) analysis. Methods Data from two previous similar cohort studies were assembled into a single dataset. The data were randomly divided into a development set (70%) and a validation set (30%). We used CART analysis to develop three models that maximize the number of patients who do not require diagnostic testing prior to treatment decisions. The validation set was used to evaluate overfitting of the model to the training set. Results Model 1 has seven terminal nodes based on temperature, the onset of symptoms and the presence of chills, cough and myalgia. Model 2 was a simpler tree with only two splits based on temperature and the presence of chills. Model 3 was developed with temperature as a dichotomous variable (≥38°C) and had only two splits based on the presence of fever and myalgia. The area under the receiver operating characteristic curves (AUROCC) for the development and validation sets, respectively, were 0.82 and 0.80 for Model 1, 0.75 and 0.76 for Model 2 and 0.76 and 0.77 for Model 3. Model 2 classified 67% of patients in the validation group into a high- or low-risk group compared with only 38% for Model 1 and 54% for Model 3. Conclusions A simple decision tree (Model 2) classified two-thirds of patients as low or high risk and had an AUROCC of 0.76. After further validation in an independent population, this CART model could support clinical decision making regarding influenza, with low-risk patients requiring no further evaluation for influenza and high-risk patients being candidates for empiric symptomatic or drug therapy.
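The evaluation workflow in this abstract (a random 70%/30% development/validation split, then the area under the ROC curve for a small tree) can be sketched as follows. The patient records, the cut-points, and the leaf scores are hypothetical; the abstract does not report the actual thresholds of any of the three models.

```python
import random

def split_development_validation(records, dev_fraction=0.7, seed=0):
    """Randomly split records into development and validation sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(dev_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def auroc(scores, labels):
    """AUROC as the share of (case, non-case) pairs ranked correctly.

    Ties count as half a correctly ranked pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def toy_tree_score(temperature_c, chills):
    """A two-split tree on temperature and chills (hypothetical values)."""
    if temperature_c >= 38.0:
        return 0.8 if chills else 0.5
    return 0.1

# Hypothetical patients: (temperature in degrees C, chills, influenza confirmed).
patients = [
    (39.1, True, 1), (38.5, True, 1), (38.2, False, 1), (37.2, False, 0),
    (36.9, True, 0), (38.0, False, 0), (37.5, False, 1), (36.8, False, 0),
    (38.8, True, 1), (37.0, False, 0),
]
development, validation = split_development_validation(patients)
scores = [toy_tree_score(t, c) for t, c, _ in patients]
outcomes = [y for _, _, y in patients]
auc_all = auroc(scores, outcomes)
print(len(development), len(validation), auc_all)
```

In a real analysis the tree would be fitted on the development set only, and the AUROC would be computed separately on each set; a large gap between the two values is the overfitting signal the abstract refers to.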
Abstract:
An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology, as well as provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of their application to ecological modeling.
Abstract:
BACKGROUND: We sought to improve upon previously published statistical modeling strategies for binary classification of dyslipidemia for general population screening purposes based on the waist-to-hip circumference ratio and body mass index anthropometric measurements. METHODS: Study subjects were participants in WHO-MONICA population-based surveys conducted in two Swiss regions. Outcome variables were based on the total serum cholesterol to high density lipoprotein cholesterol ratio. The other potential predictor variables were gender, age, current cigarette smoking, and hypertension. The models investigated were: (i) linear regression; (ii) logistic classification; (iii) regression trees; (iv) classification trees (iii and iv are collectively known as "CART"). Binary classification performance of the region-specific models was externally validated by classifying the subjects from the other region. RESULTS: Waist-to-hip circumference ratio and body mass index remained modest predictors of dyslipidemia. Correct classification rates for all models were 60-80%, with marked gender differences. Gender-specific models provided only small gains in classification. The external validations provided assurance about the stability of the models. CONCLUSIONS: There were no striking differences between either the algebraic (i, ii) vs. non-algebraic (iii, iv), or the regression (i, iii) vs. classification (ii, iv) modeling approaches. Anticipated advantages of the CART vs. simple additive linear and logistic models were less than expected in this particular application with a relatively small set of predictor variables. CART models may be more useful when considering main effects and interactions between larger sets of predictor variables.
Abstract:
OBJECTIVE: To provide information on the effects of alcohol and tobacco on laryngeal cancer and its subsites. METHODS: This was a case-control study conducted between 1992 and 2000 in northern Italy and Switzerland. A total of 527 cases of incident squamous-cell carcinoma of the larynx and 1297 hospital controls frequency-matched with cases on age, sex, and area of residence were included. Odds ratios (ORs) and corresponding 95% confidence intervals (CIs) were estimated using multiple logistic regression. RESULTS: In comparison with never smokers, ORs were 19.8 for current smokers and 7.0 for ex-smokers. The risk increased in relation to the number of cigarettes (OR = 42.9 for ≥ 25 cigarettes/day) and for duration of smoking (OR = 37.2 for ≥ 40 years). For alcohol, the risk increased in relation to number of drinks (OR = 5.9 for ≥ 56 drinks per week). Combined alcohol and tobacco consumption showed a multiplicative (OR = 177) rather than an additive risk. For current smokers and current drinkers the risk was higher for supraglottis (ORs 54.9 and 2.6, respectively) than for glottis (ORs 7.4 and 1.8) and other subsites (ORs 10.9 and 1.9). CONCLUSIONS: Our study shows that both cigarette smoking and alcohol drinking are independent risk factors for laryngeal cancer. Heavy consumption of alcohol and cigarettes determined a multiplicative risk increase, possibly suggesting biological synergy.
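The distinction between an additive and a multiplicative joint effect can be written out using the usual textbook definitions. The symbols are assumptions introduced here: $\mathrm{OR}_{T}$ and $\mathrm{OR}_{A}$ denote the odds ratios for tobacco alone and alcohol alone, and $\mathrm{OR}_{TA}$ the odds ratio for both exposures, all relative to the same unexposed reference group. The abstract reports only the combined value (OR = 177), not the single-exposure values from the joint analysis.

```latex
\begin{align*}
\text{additive joint effect:}\quad
  & \mathrm{OR}_{TA} \approx \mathrm{OR}_{T} + \mathrm{OR}_{A} - 1 \\
\text{multiplicative joint effect:}\quad
  & \mathrm{OR}_{TA} \approx \mathrm{OR}_{T} \times \mathrm{OR}_{A}
\end{align*}
```

When both single-exposure odds ratios exceed 2, the product is larger than the additive expectation, which is why a very large combined odds ratio is read as evidence for the multiplicative pattern.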
Abstract:
RATIONALE: A dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis is a well-documented neurobiological finding in major depression. Moreover, clinically effective therapy with antidepressant drugs may normalize the HPA axis activity. OBJECTIVE: The aim of this study was to test whether citalopram (R/S-CIT) affects the function of the HPA axis in patients with major depression (DSM IV). METHODS: Twenty depressed patients (11 women and 9 men) were challenged with a combined dexamethasone (DEX) suppression and corticotropin-releasing hormone (CRH) stimulation test (DEX/CRH test) following a placebo week and after 2, 4, and 16 weeks of 40 mg/day R/S-CIT treatment. RESULTS: The results show a time-dependent reduction of adrenocorticotrophic hormone (ACTH) and cortisol response during the DEX/CRH test both in treatment responders and nonresponders within 16 weeks. There was a significant relationship between post-DEX baseline cortisol levels (measured before administration of CRH) and severity of depression at pretreatment baseline. Multiple linear regression analyses were performed to identify the impact of psychopathology and hormonal stress responsiveness and R/S-CIT concentrations in plasma and cerebrospinal fluid (CSF). The magnitude of decrease in cortisol responsivity from pretreatment baseline to week 4 on drug [delta-area under the curve (AUC) cortisol] was a significant predictor (p<0.0001) of the degree of symptom improvement following 16 weeks on drug (i.e., decrease in HAM-D21 total score). The model demonstrated that the interaction of CSF S-CIT concentrations and clinical improvement was the most powerful predictor of AUC cortisol responsiveness. CONCLUSION: The present study shows that decreased AUC cortisol was highly associated with S-CIT concentrations in plasma and CSF. 
Therefore, our data suggest that the CSF or plasma S-CIT concentrations, rather than the R/S-CIT dose, should be considered as an indicator of the effect of selective serotonin reuptake inhibitors (SSRIs) on HPA axis responsiveness as measured by AUC cortisol response.
Abstract:
This study aimed at identifying clinical factors for predicting hematologic toxicity after radioimmunotherapy with ⁹⁰Y-ibritumomab tiuxetan or ¹³¹I-tositumomab in clinical practice. Hematologic data were available from 14 non-Hodgkin lymphoma patients treated with ⁹⁰Y-ibritumomab tiuxetan and 18 who received ¹³¹I-tositumomab. The percentage baseline at nadir and 4 wk post nadir and the time to nadir were selected as the toxicity indicators for both platelets and neutrophils. Multiple linear regression analysis was performed to identify significant predictors (P < 0.05) of each indicator. For both platelets and neutrophils, pooled and separate analyses of ⁹⁰Y-ibritumomab tiuxetan and ¹³¹I-tositumomab data yielded the time elapsed since the last chemotherapy as the only significant predictor of the percentage baseline at nadir. The extent of bone marrow involvement was not a significant factor in this study, possibly because of the short time elapsed since the last chemotherapy of the 7 patients with bone marrow involvement. Because both treatments were designed to deliver a comparable bone marrow dose, this factor also was not significant. None of the 14 factors considered was predictive of the time to nadir. The R² value for the model predicting percentage baseline at nadir was 0.60 for platelets and 0.40 for neutrophils. This model predicted the platelet and neutrophil toxicity grade to within ±1 for 28 and 30 of the 32 patients, respectively. For the 7 patients predicted with grade I thrombocytopenia, 6 of whom had actual grade I-II, dosing might be increased to improve treatment efficacy. The elapsed time since the last chemotherapy can be used to predict hematologic toxicity and customize the current dosing method in radioimmunotherapy.
Abstract:
PURPOSE: Bioaerosols and their constituents, such as endotoxins, are capable of causing an inflammatory reaction at the level of the lung-blood barrier, which becomes more permeable. Thus, it was hypothesized that occupational exposure to bioaerosols can increase leakage of surfactant protein-D (SP-D), a lung-specific protein, into the bloodstream. METHODS: SP-D was determined by ELISA in 316 wastewater workers, 67 garbage collectors, and 395 control subjects. Exposure was assessed with four interview-based indicators and by preliminary endotoxin measurements using the Limulus amoebocyte lysate assay. Influence of exposure on serum SP-D was assessed by multiple linear regression considering smoking, glomerular function, lung diseases, obesity, and other confounders. RESULTS: Overall, mean exposure levels to endotoxins were below 100 EU/m³. However, special tasks of wastewater workers caused higher endotoxin exposure. SP-D concentration was slightly increased in this occupational group and associated with the occurrence of splashes and contact with raw sewage. No effect was found in garbage collectors. Smoking increased serum SP-D. No clinically relevant correlation between spirometry results and SP-D concentrations appeared. CONCLUSIONS: These results support the hypothesis that inhalation of bioaerosols, even at low concentrations, has a subclinical effect on the lung-blood barrier, the permeability of which increases without associated spirometric changes.
Abstract:
An online algorithm for determining respiratory mechanics in patients using non-invasive ventilation (NIV) in pressure support mode was developed and embedded in a ventilator system. Based on multiple linear regression (MLR) of respiratory data, the algorithm was tested on a patient bench model under conditions with and without leak and simulating a variety of mechanics. Bland-Altman analysis indicates reliable measures of compliance across the clinical range of interest (± 11-18% limits of agreement). Resistance measures showed large quantitative errors (30-50%); however, it was still possible to qualitatively distinguish between normal and obstructive resistances. This outcome provides clinically significant information for ventilator titration and patient management.
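How multiple linear regression can yield resistance and compliance is easiest to see on synthetic data. The sketch below assumes the single-compartment equation of motion, pressure = R × flow + volume / C + P0, which the abstract does not state explicitly. The waveform, the parameter values, and the function names are hypothetical, and leak is ignored.

```python
import math

def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_mechanics(pressure, flow, volume):
    """Least-squares fit of pressure = R*flow + E*volume + P0.

    Returns (resistance R, compliance C = 1/E, offset P0).
    """
    rows = [[f, v, 1.0] for f, v in zip(flow, volume)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * p for r, p in zip(rows, pressure)) for i in range(3)]
    resistance, elastance, offset = solve_linear_system(xtx, xty)
    return resistance, 1.0 / elastance, offset

# Synthetic, noise-free inspiration with assumed "true" values:
# R = 10 cmH2O.s/L, C = 0.05 L/cmH2O, P0 = 5 cmH2O.
times = [i * 0.01 for i in range(101)]
flow = [0.5 * math.sin(math.pi * t) for t in times]                    # L/s
volume = [0.5 / math.pi * (1 - math.cos(math.pi * t)) for t in times]  # L
pressure = [10.0 * f + v / 0.05 + 5.0 for f, v in zip(flow, volume)]   # cmH2O

r_hat, c_hat, p0_hat = fit_mechanics(pressure, flow, volume)
print(r_hat, c_hat, p0_hat)
```

On noise-free data the fit recovers the assumed values. The regression only works when flow, volume, and the constant term are not linearly dependent over the fitted samples, which is one reason the choice of fitting window matters in practice.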
Abstract:
1. Species distribution modelling is used increasingly in both applied and theoretical research to predict how species are distributed and to understand attributes of species' environmental requirements. In species distribution modelling, various statistical methods are used that combine species occurrence data with environmental spatial data layers to predict the suitability of any site for that species. While the number of data sharing initiatives involving species' occurrences in the scientific community has increased dramatically over the past few years, various data quality and methodological concerns related to using these data for species distribution modelling have not been addressed adequately. 2. We evaluated how uncertainty in georeferences and associated locational error in occurrences influence species distribution modelling using two treatments: (1) a control treatment where models were calibrated with original, accurate data and (2) an error treatment where data were first degraded spatially to simulate locational error. To incorporate error into the coordinates, we moved each coordinate with a random number drawn from the normal distribution with a mean of zero and a standard deviation of 5 km. We evaluated the influence of error on the performance of 10 commonly used distributional modelling techniques applied to 40 species in four distinct geographical regions. 3. Locational error in occurrences reduced model performance in three of these regions; relatively accurate predictions of species distributions were possible for most species, even with degraded occurrences. Two species distribution modelling techniques, boosted regression trees and maximum entropy, were the best performing models in the face of locational errors. The results obtained with boosted regression trees were only slightly degraded by errors in location, and the results obtained with the maximum entropy approach were not affected by such errors. 4. Synthesis and applications. 
To use the vast array of occurrence data that exists currently for research and management relating to the geographical ranges of species, modellers need to know the influence of locational error on model quality and whether some modelling techniques are particularly robust to error. We show that certain modelling techniques are particularly robust to a moderate level of locational error and that useful predictions of species distributions can be made even when occurrence data include some error.
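The error treatment in this abstract (shifting each coordinate by a normally distributed random number with mean zero and a standard deviation of 5 km) can be sketched in a few lines. The sketch assumes projected coordinates in kilometres and an independent draw for each axis; the function name, the seed, and the test points are hypothetical.

```python
import random
import statistics

def degrade_locations(points_km, sd_km=5.0, seed=42):
    """Add independent N(0, sd_km) noise to the x and y of every point."""
    rng = random.Random(seed)
    return [(x + rng.gauss(0.0, sd_km), y + rng.gauss(0.0, sd_km))
            for x, y in points_km]

# Hypothetical occurrence records on a 1 km lattice.
original = [(float(i % 50), float(i // 50)) for i in range(2000)]
degraded = degrade_locations(original)

# Empirical check that the simulated shifts have the intended spread.
dx = [d[0] - o[0] for d, o in zip(degraded, original)]
print(statistics.mean(dx), statistics.stdev(dx))
```

Models calibrated on `original` and on `degraded` can then be compared on the same evaluation data, which is the control-versus-error design the abstract describes.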
Abstract:
OBJECTIVES: Data on the frequency of extraintestinal manifestations (EIMs) in Crohn's disease (CD) and ulcerative colitis (UC) and analyses of their risk factors are scarce. We evaluated their prevalence and risk factors in a large nationwide cohort of inflammatory bowel disease (IBD) patients. METHODS: IBD patients from an adult clinical cohort in Switzerland (Swiss IBD cohort study) were prospectively included. Data from validated physician enrolment questionnaires were analyzed. RESULTS: A total of 950 patients were included, 580 (61%) with CD (mean age 41 years) and 370 (39%) with UC (mean age 42 years). Of these, 249 (43%) of CD and 113 (31%) of UC patients had one to five EIMs. The following EIMs were found: arthritis (CD 33%, UC 21%), aphthous stomatitis (CD 10%, UC 4%), uveitis (CD 6%, UC 4%), erythema nodosum (CD 6%, UC 3%), ankylosing spondylitis (CD 6%, UC 2%), psoriasis (CD 2%, UC 1%), pyoderma gangrenosum (CD and UC each 2%), and primary sclerosing cholangitis (CD 1%, UC 4%). Multiple logistic regression identified the following risk factors for ongoing EIM in CD: active disease (odds ratio (OR)=1.95, 95% confidence interval (CI)=1.17-3.23, P=0.01), and positive IBD family history (OR=1.77, 95% CI=1.07-2.92, P=0.025). No risk factors were identified in UC patients. CONCLUSIONS: EIMs are a frequent problem in CD and UC patients. Active disease and positive IBD family history are associated with ongoing EIM in CD patients. Identification of EIM prevalence and associated risk factors may result in increased awareness of this problem and thereby facilitate their diagnosis and therapeutic management.
Abstract:
OBJECTIVE: To investigate the evolution of delirium in nursing home (NH) residents and its possible predictors. DESIGN: Post-hoc analysis of a prospective cohort assessment. SETTING: Ninety NHs in Switzerland. PARTICIPANTS: 14,771 NH residents. MEASUREMENTS: The Resident Assessment Instrument Minimum Data Set and the Nursing Home Confusion Assessment Method were used to determine follow-up of subsyndromal or full delirium in NH residents using discrete Markov chain modeling to describe long-term trajectories and multiple logistic regression analyses to determine predictors of the trajectories. RESULTS: We identified four major types of delirium time courses in NH. Increasing severity of cognitive impairment and of depressive symptoms at the initial assessment predicted the different delirium time courses. CONCLUSION: More pronounced cognitive impairment and depressive symptoms at the initial assessment are associated with different subsequent evolutions of delirium. The presence and evolution of delirium in the first year after NH admission predicted the subsequent course of delirium until death.
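The core of discrete Markov chain modeling is a transition matrix estimated from consecutive assessments. The sketch below shows that step only. The three state names and the resident sequences are assumptions for illustration; the abstract does not list the states or report the transition probabilities actually used.

```python
from collections import Counter

STATES = ["none", "subsyndromal", "full"]  # assumed state labels

def transition_matrix(sequences, states=STATES):
    """Row-normalised counts of transitions between consecutive assessments."""
    counts = Counter()
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[(current, following)] += 1
    matrix = {}
    for a in states:
        total = sum(counts[(a, b)] for b in states)
        matrix[a] = {b: (counts[(a, b)] / total if total else 0.0) for b in states}
    return matrix

# Hypothetical assessment histories for four residents.
histories = [
    ["none", "none", "subsyndromal", "none"],
    ["subsyndromal", "full", "full", "subsyndromal"],
    ["none", "none", "none", "none"],
    ["full", "subsyndromal", "subsyndromal", "none"],
]
matrix = transition_matrix(histories)
for state in STATES:
    print(state, matrix[state])
```

Each row gives the estimated probability of moving from one state to each state at the next assessment; repeated multiplication of this matrix is what produces long-term trajectories.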
Abstract:
Predictive species distribution modelling (SDM) has become an essential tool in biodiversity conservation and management. The choice of grain size (resolution) of environmental layers used in modelling is one important factor that may affect predictions. We applied 10 distinct modelling techniques to presence-only data for 50 species in five different regions, to test whether: (1) a 10-fold coarsening of resolution affects predictive performance of SDMs, and (2) any observed effects are dependent on the type of region, modelling technique, or species considered. Results show that a 10 times change in grain size does not severely affect predictions from species distribution models. The overall trend is towards degradation of model performance, but improvement can also be observed. Changing grain size does not equally affect models across regions, techniques, and species types. The strongest effect is on regions and species types, with tree species in the data sets (regions) with highest locational accuracy being most affected. Changing grain size had little influence on the ranking of techniques: boosted regression trees remain best at both resolutions. The number of occurrences used for model training had an important effect, with larger sample sizes resulting in better models, which tended to be more sensitive to grain. Effect of grain change was only noticeable for models reaching sufficient performance and/or with initial data that have an intrinsic error smaller than the coarser grain size.
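A 10-fold coarsening of an environmental layer can be illustrated by aggregating blocks of 10 × 10 cells into one cell. Aggregating by the block mean is an assumption of this sketch; the abstract does not say how the coarser layers were derived, and the grid values are made up.

```python
def coarsen(grid, factor=10):
    """Aggregate a 2-D grid into blocks of factor x factor cells by their mean."""
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be multiples of the factor")
    coarse = []
    for r in range(0, n_rows, factor):
        row = []
        for c in range(0, n_cols, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse

# Hypothetical 20 x 20 layer with a smooth gradient (value = row + column).
fine = [[i + j for j in range(20)] for i in range(20)]
coarse = coarsen(fine, factor=10)
print(coarse)
```

For categorical layers such as land cover, a majority rule would be a more natural aggregation than the mean.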
Abstract:
BACKGROUND: Serosorting is practiced by men who have sex with men (MSM) to reduce human immunodeficiency virus (HIV) transmission. This study evaluates the prevalence of serosorting with casual partners, and analyses the characteristics and estimated numbers of serosorters in Switzerland 2007-2009. METHODS: Data were extracted from cross-sectional surveys conducted in 2007 and 2009 among self-selected MSM recruited online, through gay newspapers, and through gay organizations. Nested models were fitted to ascertain the appropriateness of pooling the datasets. Multiple logistic regression analysis was performed on pooled data to determine the association between serosorting and demographic, lifestyle-related, and health-related factors. Extrapolations were performed by applying proportions of various types of serosorters to Swiss population data collected in 2007. RESULTS: A significant and stable number of MSM (approximately 39% in 2007 and 2009) intentionally engage in serosorting with casual partners in Switzerland. Variables significantly associated with serosorting were: gay organization membership (aOR = 1.67), frequent internet use for sexual encounters (aOR = 1.71), having had a sexually transmitted infection (STI) at any time in the past 12 months (aOR = 1.70), HIV-positive status (aOR = 0.52), regularly frequenting sex-on-premises venues (aOR = 0.42), and unprotected anal intercourse (UAI) with partners of different or unknown HIV status in the past 12 months (aOR = 0.22). Approximately one-fifth of serosorters declared HIV negativity without being tested in the past 12 months; 15.8% reported not knowing their own HIV status. CONCLUSION: The particular risk profile of serosorters having UAI with casual partners (multiple partners, STI history, and inadequate testing frequency) requires specific preventive interventions tailored to HIV status.
Abstract:
The plasma concentrations of α1-acid glycoprotein (AAG), albumin, triglycerides, cholesterol, and total proteins, as well as the plasma binding of racemic methadone, d-methadone, and l-methadone, were measured in 45 healthy subjects. The AAG phenotypes and the concentrations of AAG variants were also determined. The measured free fractions for racemic methadone, d-methadone, and l-methadone were, respectively, 12.7% ± 3.3%, 10.0% ± 2.9%, and 14.2% ± 3.2% (mean ± SD). A significant correlation was obtained between the binding ratio (B/F) for dl-methadone and the total AAG concentration (r = 0.724; p < 0.001). A multiple stepwise regression analysis showed that AAG was the main explanatory variable for the binding of the racemate. When concentrations of AAG variants were considered, a significant correlation was obtained between the binding ratio of dl-methadone and orosomucoid2 A concentration (r = 0.715; p < 0.001), a weak correlation between dl-methadone and orosomucoid1 S concentration (r = 0.494; p < 0.001), and no correlation between dl-methadone and orosomucoid1 F1 concentration (r = 0.049; not significant). Similar findings were obtained with the enantiomers. This study shows the importance of considering not only total AAG but also concentrations of AAG variants when measuring the binding of methadone and possibly of other drugs in plasma.