67 results for Atheoretical regression trees
at Université de Lausanne, Switzerland
Abstract:
Background: Individual signs and symptoms are of limited value for the diagnosis of influenza. Objective: To develop a decision tree for the diagnosis of influenza based on a classification and regression tree (CART) analysis. Methods: Data from two previous similar cohort studies were assembled into a single dataset. The data were randomly divided into a development set (70%) and a validation set (30%). We used CART analysis to develop three models that maximize the number of patients who do not require diagnostic testing prior to treatment decisions. The validation set was used to evaluate overfitting of the models to the development set. Results: Model 1 had seven terminal nodes based on temperature, the onset of symptoms and the presence of chills, cough and myalgia. Model 2 was a simpler tree with only two splits, based on temperature and the presence of chills. Model 3 was developed with temperature as a dichotomous variable (≥38°C) and had only two splits, based on the presence of fever and myalgia. The area under the receiver operating characteristic curve (AUROCC) in the development and validation sets, respectively, was 0.82 and 0.80 for Model 1, 0.75 and 0.76 for Model 2, and 0.76 and 0.77 for Model 3. Model 2 classified 67% of patients in the validation group into a high- or low-risk group, compared with only 38% for Model 1 and 54% for Model 3. Conclusions: A simple decision tree (Model 2) classified two-thirds of patients as low or high risk and had an AUROCC of 0.76. After further validation in an independent population, this CART model could support clinical decision making regarding influenza, with low-risk patients requiring no further evaluation for influenza and high-risk patients being candidates for empiric symptomatic or drug therapy.
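AUROCC values like those reported above summarize how well a model's risk scores separate patients with and without influenza. As a minimal sketch (not the authors' code), the function below computes the area under the ROC curve through its rank formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The function name `auroc` and the toy scores are hypothetical. Ties count as one half, which matters for trees, because every patient who lands in the same terminal node receives the same score.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the Mann-Whitney formulation (labels: 1 = positive, 0 = negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # tie, e.g. both patients in the same terminal node
    return wins / (len(pos) * len(neg))


# Hypothetical terminal-node risks from a two-split tree, one value per patient.
risk = [0.10, 0.10, 0.45, 0.45, 0.80, 0.80]
flu = [0, 0, 1, 0, 1, 1]
print(auroc(risk, flu))
```

The pairwise loop is quadratic and only meant to make the definition explicit; a sort-based rank computation gives the same value in O(n log n).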
Abstract:
PURPOSE: According to estimates, around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools to improve local radon prediction. METHOD: About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering of pairwise Kolmogorov distances between the IRC distributions of lithological units. For IRC mapping and prediction, we used random forests and Bayesian additive regression trees (BART). RESULTS: The automated classification groups lithological units well in terms of their IRC characteristics. In particular, the method clearly reveals IRC differences within metamorphic rocks such as gneiss. The maps produced by random forests represent the regional differences in IRC across Switzerland well and improve spatial detail compared with existing approaches. Random forests explained 33% of the variation in the IRC data. Additionally, variable importance as evaluated by random forests shows that building characteristics are less important predictors of IRC than spatial and geological influences. BART explained 29% of IRC variability and produced maps that indicate prediction uncertainty. CONCLUSION: Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRC. Automatic clustering of lithological units complements this method by facilitating the interpretation of the radon properties of rock types. This study provides an important element for radon risk communication.
Future approaches should consider additional variables, such as soil gas radon measurements, as well as more detailed geological information.
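The classification step above groups lithological units with k-medoids on pairwise Kolmogorov distances between their IRC distributions. A minimal sketch, assuming each unit is represented simply by its list of IRC measurements; it uses the simple alternating variant of k-medoids rather than full PAM, because the abstract does not state which algorithm was used. The function names, `k` and the toy data are hypothetical. k-medoids suits this setting because it needs only a distance matrix, whereas a "mean distribution" for k-means would be awkward to define.

```python
import bisect
import random
from typing import List, Sequence


def ks_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov distance: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    # Both ECDFs are right-continuous step functions, so the largest gap occurs at a sample value.
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap


def assign(dist: List[List[float]], medoids: List[int]) -> List[int]:
    """Label each item with the index of its nearest medoid (medoids label themselves)."""
    labels = []
    for i in range(len(dist)):
        if i in medoids:
            labels.append(medoids.index(i))
        else:
            labels.append(min(range(len(medoids)), key=lambda c: dist[i][medoids[c]]))
    return labels


def k_medoids(dist: List[List[float]], k: int, n_iter: int = 100, seed: int = 0) -> List[int]:
    """Alternating k-medoids on a precomputed distance matrix; returns cluster labels."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(dist)), k)
    for _ in range(n_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # New medoid: the member with the smallest total distance to the rest of its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return assign(dist, medoids)


# Hypothetical IRC samples (Bq/m3) for four lithological units.
units = {
    "unit_A": [10, 12, 15, 20],
    "unit_B": [11, 13, 16, 19],
    "unit_C": [200, 250, 300, 280],
    "unit_D": [210, 240, 310, 290],
}
names = list(units)
dist = [[ks_distance(units[p], units[q]) for q in names] for p in names]
labels = k_medoids(dist, k=2)
print(dict(zip(names, labels)))
```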
Abstract:
1. Species distribution modelling is used increasingly in both applied and theoretical research to predict how species are distributed and to understand attributes of species' environmental requirements. In species distribution modelling, various statistical methods are used that combine species occurrence data with environmental spatial data layers to predict the suitability of any site for that species. While the number of data-sharing initiatives involving species' occurrences in the scientific community has increased dramatically over the past few years, various data quality and methodological concerns related to using these data for species distribution modelling have not been addressed adequately. 2. We evaluated how uncertainty in georeferences and the associated locational error in occurrences influence species distribution modelling using two treatments: (1) a control treatment in which models were calibrated with original, accurate data and (2) an error treatment in which data were first degraded spatially to simulate locational error. To incorporate error into the coordinates, we shifted each coordinate by a random value drawn from a normal distribution with a mean of zero and a standard deviation of 5 km. We evaluated the influence of error on the performance of 10 commonly used distributional modelling techniques applied to 40 species in four distinct geographical regions. 3. Locational error in occurrences reduced model performance in three of these regions; nevertheless, relatively accurate predictions of species distributions were possible for most species, even with degraded occurrences. Two species distribution modelling techniques, boosted regression trees and maximum entropy, performed best in the face of locational error. The results obtained with boosted regression trees were only slightly degraded by errors in location, and the results obtained with the maximum entropy approach were not affected by such errors. 4. Synthesis and applications.
To use the vast array of occurrence data that currently exists for research and management relating to the geographical ranges of species, modellers need to know how locational error influences model quality and whether some modelling techniques are particularly robust to such error. We show that certain modelling techniques are robust to a moderate level of locational error and that useful predictions of species distributions can be made even when occurrence data include some error.
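The error treatment described in this abstract can be sketched as below, assuming projected coordinates in kilometres and an independent normal draw for each of the x and y coordinates, as "each coordinate" suggests. The function name `degrade` and the seed are hypothetical. One consequence worth noting: a per-axis standard deviation of 5 km gives a Rayleigh-distributed straight-line displacement with mean 5·√(π/2), about 6.3 km, so a typical record moves somewhat more than 5 km.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def degrade(points: List[Point], sd_km: float = 5.0, seed: Optional[int] = None) -> List[Point]:
    """Add independent N(0, sd_km^2) noise to the x and the y coordinate of each point."""
    rng = random.Random(seed)
    return [(x + rng.gauss(0.0, sd_km), y + rng.gauss(0.0, sd_km)) for x, y in points]


# Hypothetical check: 20,000 occurrences at the origin of a km-based projection.
original = [(0.0, 0.0)] * 20_000
shifted = degrade(original, seed=42)
mean_shift = sum(math.hypot(x, y) for x, y in shifted) / len(shifted)
# Theoretical expectation: 5 * sqrt(pi / 2), about 6.27 km.
print(f"mean displacement: {mean_shift:.2f} km")
```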
Abstract:
Predictive species distribution modelling (SDM) has become an essential tool in biodiversity conservation and management. The choice of grain size (resolution) of the environmental layers used in modelling is one important factor that may affect predictions. We applied 10 distinct modelling techniques to presence-only data for 50 species in five different regions to test whether (1) a 10-fold coarsening of resolution affects the predictive performance of SDMs and (2) any observed effects depend on the type of region, modelling technique or species considered. Results show that a tenfold change in grain size does not severely affect predictions from species distribution models. The overall trend is towards degradation of model performance, but improvement can also be observed. Changing grain size does not affect models equally across regions, techniques and species types. The strongest effect is on regions and species types, with tree species in the data sets (regions) with the highest locational accuracy being most affected. Changing grain size had little influence on the ranking of techniques: boosted regression trees remained best at both resolutions. The number of occurrences used for model training had an important effect, with larger sample sizes resulting in better models, which tended to be more sensitive to grain. The effect of grain change was only noticeable for models reaching sufficient performance and/or with initial data whose intrinsic error is smaller than the coarser grain size.
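A tenfold coarsening of grain can be sketched as block aggregation of a raster layer. This assumes that each coarse cell covers 10 × 10 fine cells and that continuous predictor values are averaged; the abstract does not say how the coarser layers were produced, so this is one plausible choice rather than the authors' procedure, and the function name is hypothetical.

```python
from typing import List

Grid = List[List[float]]


def coarsen(grid: Grid, factor: int = 10) -> Grid:
    """Aggregate a raster by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of the factor")
    coarse = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            block = [grid[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            out_row.append(sum(block) / len(block))
        coarse.append(out_row)
    return coarse


# Hypothetical 20 x 20 layer whose value equals the row index (a north-south gradient).
fine = [[float(r)] * 20 for r in range(20)]
print(coarsen(fine))  # [[4.5, 4.5], [14.5, 14.5]]
```

Averaging smooths local extremes, which is one way coarser grain can blur the environmental signal; categorical layers such as land cover would need a majority rule instead of a mean.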
Abstract:
BACKGROUND: We sought to improve upon previously published statistical modeling strategies for binary classification of dyslipidemia for general population screening purposes, based on the waist-to-hip circumference ratio and body mass index anthropometric measurements. METHODS: Study subjects were participants in WHO-MONICA population-based surveys conducted in two Swiss regions. Outcome variables were based on the ratio of total serum cholesterol to high-density lipoprotein cholesterol. Other potential predictor variables were gender, age, current cigarette smoking and hypertension. The models investigated were: (i) linear regression; (ii) logistic classification; (iii) regression trees; and (iv) classification trees (iii and iv are collectively known as "CART"). Binary classification performance of the region-specific models was externally validated by classifying the subjects from the other region. RESULTS: Waist-to-hip circumference ratio and body mass index remained modest predictors of dyslipidemia. Correct classification rates for all models were 60-80%, with marked gender differences. Gender-specific models provided only small gains in classification. The external validations supported the stability of the models. CONCLUSIONS: There were no striking differences either between the algebraic (i, ii) and non-algebraic (iii, iv) approaches or between the regression (i, iii) and classification (ii, iv) modeling approaches. The advantages of CART over simple additive linear and logistic models were smaller than anticipated in this particular application, which used a relatively small set of predictor variables. CART models may be more useful when considering main effects and interactions among larger sets of predictor variables.
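Regression and classification trees (CART) are grown by repeatedly choosing the split that most reduces within-node heterogeneity. The sketch below shows one such step for a regression tree with a continuous outcome, using the squared-error criterion of CART regression trees on a single predictor; a classification tree would replace the sum of squares with an impurity measure such as the Gini index. The function names and data are hypothetical (a waist-to-hip ratio against a total/HDL cholesterol ratio) and do not come from the WHO-MONICA surveys.

```python
from typing import Optional, Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the node mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: Sequence[float], y: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Return (threshold, total SSE) of the best binary split 'x <= threshold', or None."""
    pairs = sorted(zip(x, y))
    best: Optional[Tuple[float, float]] = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical predictor values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Hypothetical data: waist-to-hip ratio and total/HDL cholesterol ratio.
whr = [0.80, 0.82, 0.85, 0.95, 0.97, 1.00]
tc_hdl = [3.0, 3.2, 3.1, 5.0, 5.2, 4.9]
print(best_split(whr, tc_hdl))
```

Growing a full tree applies this search to every predictor in every node and recurses on the two children, followed by pruning; with few predictors, as in this study, there is little room for the interactions that trees are good at capturing.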
Abstract:
An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled "Advances in GLMs/GAMs modeling: from species distribution to environmental management", held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology and provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs and discuss some of the related statistics used for predictor selection, model diagnostics and evaluation. This includes a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression (an alternative to stepwise selection of predictors) and methods for identifying interactions by combining regression trees with several other approaches. We close with an overview of the papers and of how we believe they advance our understanding of how these models can be applied in ecological modeling.
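Ridge regression, mentioned above as an alternative to stepwise predictor selection, keeps all predictors but shrinks their coefficients towards zero by adding a penalty λ·Σβ² to the least-squares criterion, which on centered data gives β = (XᵀX + λI)⁻¹Xᵀy. The sketch below covers only the Gaussian, identity-link case (a penalized GLM or GAM would be fitted iteratively instead), uses a small hand-written linear solver to stay dependency-free, and its function names and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge(X: Sequence[Sequence[float]], y: Sequence[float],
          lam: float) -> Tuple[float, List[float]]:
    """Ridge fit; returns (intercept, coefficients). The intercept is not penalized."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center predictors and response so the penalty acts only on the slopes.
    xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    xtx = [[sum(xc[i][j] * xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


# Hypothetical data generated exactly from y = 1 + 2*x1 - x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 0, 2, 4]
for lam in (0.0, 1.0, 10.0):
    b0, coefs = ridge(X, y, lam)
    print(lam, round(b0, 3), [round(b, 3) for b in coefs])
```

With λ = 0 this reduces to ordinary least squares; increasing λ shrinks the coefficient vector. In practice predictors are usually standardized first so that the penalty treats them on a common scale.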
Abstract:
It is estimated that around 230 people die each year due to radon (²²²Rn) exposure in Switzerland. ²²²Rn occurs mainly in closed environments such as buildings and originates primarily from the underlying ground. It therefore depends strongly on geology and shows substantial regional variations. Correct identification of these regional variations would lead to a substantial reduction of the population's ²²²Rn exposure through appropriate construction of new buildings and mitigation of existing ones. Prediction of indoor ²²²Rn concentrations (IRC) and identification of ²²²Rn-prone areas is, however, difficult, because IRC depends on a variety of variables such as building characteristics, meteorology, geology and anthropogenic factors. The present work aims to develop predictive models and to understand IRC in Switzerland, taking into account as much information as possible in order to minimize prediction uncertainty. The predictive maps will be used as a decision-support tool for ²²²Rn risk management. The construction of these models is based on different data-driven statistical methods, in combination with geographical information systems (GIS). In a first phase, we performed univariate analyses of IRC for different variables, namely detector type, building category, foundation, year of construction, average outdoor temperature during measurement, altitude and lithology. All variables showed significant associations with IRC. Buildings constructed after 1900 showed significantly lower IRC than earlier constructions, and IRC dropped further after 1970. We also found an association of IRC with altitude. With regard to lithology, we observed the lowest IRC in sedimentary rocks (excluding carbonates) and sediments and the highest IRC in the Jura carbonates and igneous rock. The IRC data were systematically analyzed for potential bias due to spatially unbalanced sampling of measurements.
To facilitate the modeling and interpretation of the influence of geology on IRC, we developed an algorithm based on k-medoids clustering that defines geological classes that are coherent in terms of IRC. We performed a soil gas ²²²Rn concentration (SRC) measurement campaign to determine the predictive power of SRC with respect to IRC, and found that SRC is of limited use for IRC prediction. The second part of the project was dedicated to predictive mapping of IRC using models that take into account the multidimensionality of the process of ²²²Rn entry into buildings. We used kernel regression and ensemble regression trees for this purpose. We could explain up to 33% of the variance of log-transformed IRC across Switzerland, which is a good performance compared with former attempts at IRC modeling in Switzerland. As predictor variables, we considered geographical coordinates, altitude, outdoor temperature, building type, foundation, year of construction and detector type. Ensemble regression trees such as random forests make it possible to determine the role of each IRC predictor in a multidimensional setting. We found spatial information such as geology, altitude and coordinates to have a stronger influence on IRC than building-related variables such as foundation type, building type and year of construction. Based on kernel estimation, we developed an approach to determine the local probability of IRC exceeding 300 Bq/m³. We also developed a confidence index to provide an estimate of the uncertainty of the map. All methods allow tailor-made maps to be created easily for different building characteristics. Our work is an essential step towards a ²²²Rn risk assessment that accounts simultaneously for different architectural situations and for geological and geographical conditions. For communicating ²²²Rn hazard to the population, we recommend using the probability map based on kernel estimation.
The communication of ²²²Rn hazard could, for example, be implemented via a web interface where users specify the characteristics and coordinates of their home to obtain the probability of being above a given IRC, together with a corresponding confidence index. Taking into account the health effects of ²²²Rn, our results have the potential to substantially improve the estimation of the effective dose from ²²²Rn delivered to the Swiss population.
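This abstract describes a kernel-based estimate of the local probability that IRC exceeds 300 Bq/m³. One plausible form, sketched here under stated assumptions, is a Nadaraya-Watson estimator with a Gaussian kernel on planar coordinates (in km), applied to the indicator "IRC > 300". The function name, the 5 km bandwidth and the data are hypothetical; the thesis's actual kernel, bandwidth selection, and the way building characteristics enter the estimate are not given in the abstract.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def exceedance_probability(site: Point, locations: Sequence[Point], irc: Sequence[float],
                           threshold: float = 300.0, bandwidth_km: float = 5.0) -> float:
    """Gaussian-kernel (Nadaraya-Watson) estimate of P(IRC > threshold) at a site."""
    num = den = 0.0
    for (x, y), value in zip(locations, irc):
        d2 = (x - site[0]) ** 2 + (y - site[1]) ** 2
        w = math.exp(-0.5 * d2 / bandwidth_km ** 2)  # nearby measurements weigh more
        num += w * (1.0 if value > threshold else 0.0)
        den += w
    if den == 0.0:
        raise ValueError("no measurements within reach of the kernel")
    return num / den


# Hypothetical measurements: coordinates in km, IRC in Bq/m3.
locs = [(0.0, 0.0), (1.0, 0.0), (100.0, 100.0)]
values = [400.0, 100.0, 50.0]
print(round(exceedance_probability((0.5, 0.0), locs, values), 3))  # 0.5
```

Estimating the probability of the binary indicator directly, rather than predicting IRC and then thresholding, yields a map that can be read as a local exceedance frequency, which is the kind of quantity the abstract recommends for risk communication.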
Abstract:
Aberrant blood vessels enable tumor growth, provide a barrier to immune infiltration, and serve as a source of protumorigenic signals. Targeting tumor blood vessels for destruction, or tumor vascular disruption therapy, can therefore provide significant therapeutic benefit. Here, we describe the ability of chimeric antigen receptor (CAR)-bearing T cells to recognize human prostate-specific membrane antigen (hPSMA) on endothelial targets in vitro as well as in vivo. CAR T cells were generated using the anti-PSMA scFv, J591, and the intracellular signaling domains: CD3ζ, CD28, and/or CD137/4-1BB. We found that all anti-hPSMA CAR T cells recognized and eliminated PSMA(+) endothelial targets in vitro, regardless of the signaling domain. T cells bearing the third-generation anti-hPSMA CAR, P28BBζ, were able to recognize and kill primary human endothelial cells isolated from gynecologic cancers. In addition, the P28BBζ CAR T cells mediated regression of hPSMA-expressing vascular neoplasms in mice. Finally, in murine models of ovarian cancers populated by murine vessels expressing hPSMA, the P28BBζ CAR T cells were able to ablate PSMA(+) vessels, cause secondary depletion of tumor cells, and reduce tumor burden. Taken together, these results provide a strong rationale for the use of CAR T cells as agents of tumor vascular disruption, specifically those targeting PSMA. Cancer Immunol Res; 3(1); 68-84. ©2014 AACR.
Abstract:
Purpose: Recent reports have suggested that postoperative intra-abdominal infection is associated with higher rates of overall and local recurrence and with higher cancer-specific mortality. However, the mechanisms responsible for this association are unknown. We hypothesized that the greater inflammatory response in patients with postoperative intra-abdominal infection is associated with an increase in local and systemic angiogenesis. Methods: We designed a prospective cohort study with matched controls. Patients who developed postoperative intra-abdominal infection (abscess and/or anastomotic leakage) after elective curative resection of colorectal cancer (group 1; n=17) were compared with patients with an uncomplicated postoperative course (group 2; n=17). IL-6 and VEGF levels were measured by ELISA in serum and peritoneal fluid at baseline, at 48 hours, and on postoperative day 4 or at the time peritoneal infection occurred. Results: No differences were observed in age, gender, preoperative CEA, tumor stage and location, or type of procedure performed. Although serum IL-6 levels did not differ at 48 hours, this pro-inflammatory cytokine was higher in group 1 on postoperative day 4 (group 1: 21533 ± 27900 vs. group 2: 1130 ± 3563 pg/ml; p < 0.001). Serum VEGF levels were higher in group 1 on postoperative day 4 (group 1: 1212 ± 1025 vs. group 2: 408 ± 407 pg/ml; p < 0.01). Peritoneal fluid VEGF levels were also higher in group 1 both at 48 hours (group 1: 4857 ± 4384 vs. group 2: 630 ± 461 pg/ml; p < 0.001) and on postoperative day 4 (group 1: 32807 ± 98486 vs. group 2: 1002 ± 1229 pg/ml; p < 0.001). Serum IL-6 and serum VEGF levels were positively correlated on postoperative day 4 (r = 0.7; p < 0.01). Conclusions: These results suggest that not only the inflammatory response but also angiogenic pathways are stimulated in patients with intra-abdominal infection after surgery for colorectal cancer.
The implications of this finding need to be evaluated in long-term follow-up.
Abstract:
PURPOSE: To present the long-term follow-up of 10 adolescents and young adults who, as children, had documented cognitive and behavioral regression due to nonlesional focal, mainly frontal, epilepsy with continuous spike-waves during slow wave sleep (CSWS). METHODS: Past medical and electroencephalography (EEG) data were reviewed, and neuropsychological tests exploring the main cognitive functions were administered. KEY FINDINGS: After a mean follow-up of 15.6 years (range, 8-23 years), none of the 10 patients had recovered fully, but four regained borderline to normal intelligence and were almost independent. Patients with prolonged global intellectual regression had the worst outcome, whereas those with more specific and short-lived deficits recovered best. The marked behavioral disorders resolved in all but one patient. Executive functions were neither severely nor homogeneously affected. Three patients with a frontal syndrome during the active phase (AP) showed only mild residual executive and social cognition deficits. The main cognitive gains occurred shortly after the AP, but qualitative improvements continued to occur. Long-term outcome correlated best with the duration of CSWS. SIGNIFICANCE: Our findings emphasize that cognitive recovery after cessation of CSWS depends on the severity and duration of the initial regression. None of our patients showed the combination of major executive and social cognition deficits with preserved intelligence that has been reported in adults with early destructive lesions of the frontal lobes. Early recognition of epilepsy with CSWS and rapid introduction of effective therapy are crucial for the best possible outcome.
Abstract:
The role of land cover change as a significant component of global change has become increasingly recognized in recent decades. Large databases measuring land cover change, together with data that can potentially be used to explain the observed changes, are also becoming more commonly available. When developing statistical models to investigate observed changes, it is important to be aware that the chosen sampling strategy and modelling techniques can influence results. We present a comparison of three sampling strategies and two forms of grouped logistic regression models (multinomial and ordinal) in the investigation of patterns of successional change after agricultural land abandonment in Switzerland. Results indicated that both ordinal and nominal transitional change occur in the landscape and that different sampling regimes and modelling techniques, used as investigative tools, yield different results. Synthesis and applications. Our multimodel inference successfully identified a set of consistently selected indicators of land cover change that can be used to predict further change, including annual average temperature, the number of already overgrown neighbouring areas of land and distance to historically destructive avalanche sites. This allows more reliable decision making and planning with respect to landscape management. Although both model approaches gave similar results, ordinal regression yielded more parsimonious models that identified the important predictors of land cover change more efficiently. Thus, this approach is favourable where land cover change patterns can be interpreted as an ordinal process; otherwise, multinomial logistic regression is a viable alternative.
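The difference in parsimony between the two grouped logistic models shows up in how each turns linear predictors into category probabilities. A minimal sketch, assuming a proportional-odds (cumulative logit) form for the ordinal model, the most common ordinal logistic specification, with the same sign convention as R's `MASS::polr`; the abstract does not confirm which specification was used, and the successional stages and numbers below are hypothetical. With K ordered stages, the ordinal model needs one coefficient per predictor plus K - 1 thresholds, whereas the multinomial model estimates K - 1 coefficients per predictor.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probs(eta: float, thresholds: Sequence[float]) -> List[float]:
    """Proportional-odds model: P(Y <= j) = sigmoid(theta_j - eta), thresholds increasing."""
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    # Category probabilities are differences of successive cumulative probabilities.
    return [cumulative[0]] + [cumulative[j] - cumulative[j - 1]
                              for j in range(1, len(cumulative))]


def multinomial_probs(etas: Sequence[float]) -> List[float]:
    """Multinomial logit with category 0 as reference (its linear predictor fixed at 0)."""
    exps = [1.0] + [math.exp(e) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical stages after abandonment: 0 = grassland, 1 = shrubs, 2 = forest.
print(ordinal_probs(eta=0.8, thresholds=[-0.5, 1.5]))  # one linear predictor
print(multinomial_probs([0.3, -0.2]))                  # one linear predictor per non-reference stage
```

In the ordinal model a larger linear predictor shifts probability towards later successional stages, which is why it fits best when change really proceeds along an ordered sequence.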
Abstract:
A 28-month-old boy was referred for acute onset of abnormal head movements. History revealed an insidious, progressive regression in behaviour and communication over several months. Head and shoulder 'spasms' with altered consciousness and, on one occasion, ictal laughter were seen. The electroencephalogram (EEG) showed repeated bursts of brief generalized polyspikes and spike-waves during the 'spasms', followed by flattening, a special pattern that never recurred after treatment. Review of family videos showed a single identical 'minor' seizure 6 months earlier. Magnetic resonance imaging was normal. Clonazepam brought immediate cessation of seizures, normalization of the EEG and a parallel, spectacular improvement in communication, mood and language. Follow-up over the next 10 months showed a new regression unaccompanied by recognized seizures, although numerous seizures were discovered during the videotaped neuropsychological examination, when stereotyped, subtle, brief paroxysmal changes in posture and behaviour could be studied in slow motion and compared with the initial 'prototypical' ones. The EEG showed rare, predominantly left-sided fronto-temporal discharges. Clonazepam was replaced by carbamazepine, with marked improvement in behaviour, language and cognition that has been sustained up to the last follow-up visit at 51 months. Videotaped home observations documented striking qualitative and quantitative variations in autistic-type social interaction and play in relation to the epileptic activity. We conclude that this child has a special, characteristic epileptic syndrome with subtle motor and vegetative symptomatology associated with an insidious, catastrophic 'autistic-like' regression that could be overlooked. The methods used to document such fluctuating epileptic behavioural manifestations are discussed.
Abstract:
In this paper, we included a very broad representation of grass family diversity (84% of tribes and 42% of genera). Phylogenetic inference was based on three plastid DNA regions (rbcL, matK and trnL-F), using maximum parsimony and Bayesian methods. Our results resolved most of the subfamily relationships within the major clades (BEP and PACCMAD) that had previously been unclear, including: (i) the sister relationship of BEP and PACCMAD, (ii) the composition of clades and the sister relationship of Ehrhartoideae and Bambusoideae + Pooideae, (iii) the paraphyly of tribe Bambuseae, (iv) the position of Gynerium as sister to Panicoideae, and (v) the phylogenetic position of Micrairoideae. By tolerating a relatively large amount of missing data, we were able to increase taxon sampling in our analyses substantially, from 107 to 295 taxa. However, bootstrap support and, to a lesser extent, Bayesian posterior probabilities were generally lower in analyses involving missing data than in those without. We produced a fully resolved phylogenetic summary tree for the grass family at subfamily level and indicated the most likely relationships of all tribes included in our analysis.
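Maximum parsimony scores a candidate tree by the minimum number of character-state changes it requires, and a tree search then looks for the topology with the lowest total. The sketch below implements Fitch's small-parsimony algorithm for a fixed, rooted binary tree with nucleotide characters. It treats "?" as missing data compatible with any state, a common convention that shows why taxa with missing data can be added without forcing extra changes (and also why they contribute little support). It is not the software or settings used in the study; the tree encoding, function names and alignment are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a taxon name, or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]
ALL_STATES = frozenset("ACGT")


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's algorithm for one character: (candidate state set, number of changes)."""
    if isinstance(tree, str):
        s = states[tree]
        # Missing data ('?') is compatible with every state and adds no cost.
        return (set(ALL_STATES) if s == "?" else {s}), 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one state change required


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Total number of changes summed over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
               for i in range(length))


# Hypothetical four-taxon alignment; the tree with the lower score is preferred.
alignment = {"A": "AC", "B": "AC", "C": "GT", "D": "GA"}
print(parsimony_score((("A", "B"), ("C", "D")), alignment))  # 3
print(parsimony_score((("A", "C"), ("B", "D")), alignment))  # 4
```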