50 results for Multinomial logit models with random coefficients (RCL)
Abstract:
1. The ecological niche is a fundamental biological concept. Modelling species' niches is central to numerous ecological applications, including predicting species invasions, identifying reservoirs for disease, nature reserve design and forecasting the effects of anthropogenic and natural climate change on species' ranges. 2. A computational analogue of Hutchinson's ecological niche concept (the multidimensional hyperspace of species' environmental requirements) is the support of the distribution of environments in which the species persist. Recently developed machine-learning algorithms can estimate the support of such high-dimensional distributions. We show how support vector machines can be used to map ecological niches using only observations of species presence to train distribution models for 106 species of woody plants and trees in a montane environment using up to nine environmental covariates. 3. We compared the accuracy of three methods that differ in their approaches to reducing model complexity. We tested models with independent observations of both species presence and species absence. We found that the simplest procedure, which uses all available variables and no pre-processing to reduce correlation, was best overall. Ecological niche models based on support vector machines are theoretically superior to models that rely on simulating pseudo-absence data and are comparable in empirical tests. 4. Synthesis and applications. Accurate species distribution models are crucial for effective environmental planning, management and conservation, and for unravelling the role of the environment in human health and welfare. Models based on distribution estimation rather than classification overcome theoretical and practical obstacles that pervade species distribution modelling. 
In particular, ecological niche models based on machine-learning algorithms for estimating the support of a statistical distribution provide a promising new approach to identifying species' potential distributions and to projecting changes in these distributions as a result of climate change, land use and landscape alteration.
Abstract:
Genetic variants influence the risk of developing certain diseases or give rise to differences in drug response. Recent progress in cost-effective, high-throughput genome-wide techniques, such as microarrays measuring Single Nucleotide Polymorphisms (SNPs), has facilitated genotyping of large clinical and population cohorts. Combining the massive genotypic data with measurements of phenotypic traits allows for the determination of genetic differences that explain, at least in part, the phenotypic variations within a population. So far, models combining the most significant variants can only explain a small fraction of the variance, indicating the limitations of current models. In particular, researchers have only begun to address the possibility of interactions between genotypes and the environment. Elucidating the contributions of such interactions is a difficult task because of the large number of genetic as well as possible environmental factors. In this thesis, I worked on several projects within this context. My first and main project was the identification of possible SNP-environment interactions, where the phenotypes were serum lipid levels of patients from the Swiss HIV Cohort Study (SHCS) treated with antiretroviral therapy. Here the genotypes consisted of a limited set of SNPs in candidate genes relevant for lipid transport and metabolism. The environmental variables were the specific combinations of drugs given to each patient over the treatment period. My work explored bioinformatic and statistical approaches to relate patients' lipid responses to these SNPs, drugs and, importantly, their interactions. The goal of this project was to improve our understanding and to explore the possibility of predicting dyslipidemia, a well-known adverse drug reaction of antiretroviral therapy. 
Specifically, I quantified how much of the variance in lipid profiles could be explained by the host genetic variants, the administered drugs and SNP-drug interactions and assessed the predictive power of these features on lipid responses. Using cross-validation stratified by patients, we could not validate our hypothesis that models that select a subset of SNP-drug interactions in a principled way have better predictive power than the control models using "random" subsets. Nevertheless, all tested models containing SNP and/or drug terms exhibited significant predictive power (as compared to a random predictor) and explained a sizable proportion of variance in the patient-stratified cross-validation context. Importantly, the model containing stepwise selected SNP terms showed higher capacity to predict triglyceride levels than a model containing randomly selected SNPs. Dyslipidemia is a complex trait for which many factors remain to be discovered and are thus missing from the data, which possibly explains the limitations of our analysis. In particular, the interactions of drugs with SNPs selected from the set of candidate genes likely have small effect sizes which we were unable to detect in a sample of the present size (<800 patients). In the second part of my thesis, I performed genome-wide association studies within the Cohorte Lausannoise (CoLaus). I have been involved in several international projects to identify SNPs that are associated with various traits, such as serum calcium, body mass index, two-hour glucose levels, as well as metabolic syndrome and its components. These phenotypes are all related to major human health issues, such as cardiovascular disease. I applied statistical methods to detect new variants associated with these phenotypes, contributing to the identification of new genetic loci that may lead to new insights into the genetic basis of these traits. 
This kind of research will lead to a better understanding of the mechanisms underlying these pathologies, a better evaluation of disease risk, the identification of new therapeutic leads and may ultimately lead to the realization of "personalized" medicine.
Abstract:
Differential X-ray phase-contrast tomography (DPCT) refers to a class of promising methods for reconstructing the X-ray refractive index distribution of materials that present weak X-ray absorption contrast. The tomographic projection data in DPCT, from which an estimate of the refractive index distribution is reconstructed, correspond to one-dimensional (1D) derivatives of the two-dimensional (2D) Radon transform of the refractive index distribution. There is an important need for the development of iterative image reconstruction methods for DPCT that can yield useful images from few-view projection data, thereby mitigating the long data-acquisition times and large radiation doses associated with use of analytic reconstruction methods. In this work, we analyze the numerical and statistical properties of two classes of discrete imaging models that form the basis for iterative image reconstruction in DPCT. We also investigate the use of one of the models with a modern image reconstruction algorithm for performing few-view image reconstruction of a tissue specimen.
Abstract:
BACKGROUND: Workers with persistent disabilities after orthopaedic trauma may need occupational rehabilitation. Despite various risk profiles for non-return-to-work (non-RTW), there is no available predictive model. Moreover, injured workers may have various origins (immigrant workers), which may either affect their return to work or their eligibility for research purposes. The aim of this study was to develop and validate a predictive model that estimates the likelihood of non-RTW after occupational rehabilitation using predictors which do not rely on the worker's background. METHODS: Prospective cohort study (3177 participants, native (51%) and immigrant workers (49%)) with two samples: a) Development sample with patients from 2004 to 2007 with Full and Reduced Models, b) External validation of the Reduced Model with patients from 2008 to March 2010. We collected patients' data and biopsychosocial complexity with an observer-rated interview (INTERMED). Non-RTW was assessed two years after discharge from the rehabilitation. Discrimination was assessed by the area under the receiver operating characteristic curve (AUC) and calibration was evaluated with a calibration plot. The model was reduced with random forests. RESULTS: At 2 years, the non-RTW status was known for 2462 patients (77.5% of the total sample). The prevalence of non-RTW was 50%. The full model (36 items) and the reduced model (19 items) had acceptable discrimination performance (AUC 0.75, 95% CI 0.72 to 0.78 and 0.74, 95% CI 0.71 to 0.76, respectively) and good calibration. For the validation model, the discrimination performance was acceptable (AUC 0.73; 95% CI 0.70 to 0.77) and calibration was also adequate. CONCLUSIONS: Non-RTW may be predicted with a simple model constructed with variables independent of the patient's education and language fluency. This model is useful for all kinds of trauma in order to adjust for case mix and it is applicable to vulnerable populations like immigrant workers.
Abstract:
The role of competition for light among plants has long been recognized at local scales, but its potential importance for plant species' distribution at larger spatial scales has largely been ignored. Tree cover acts as a modulator of local abiotic conditions, notably by reducing light availability below the canopy and thus the performance of species that are not adapted to low-light conditions. However, this local effect may propagate to coarser spatial grains. Using 6,935 vegetation plots located across the European Alps, we fit Generalized Linear Models (GLM) for the distribution of 960 herb and shrub species to assess the effect of tree cover at both plot and landscape grain sizes (~10 m and 1 km, respectively). We ran four models with different combinations of variables (climate, soil and tree cover) for each species at both spatial grains. We used partial regressions to evaluate the independent effects of plot- and landscape-scale tree cover on plant communities. Finally, the effects on species' elevational range limits were assessed by simulating a removal experiment comparing the species' distribution under high and low tree cover. Accounting for tree cover improved model performance, with shade-tolerant species increasing their probability of presence at high tree cover whereas shade-intolerant species showed the opposite pattern. The tree cover effect occurred consistently at both plot and landscape spatial grains, although it was strongest at the former. Importantly, tree cover at the two grain sizes had partially independent effects on plot-scale plant communities, suggesting that the effects may be transmitted to coarser grains through meta-community dynamics. At high tree cover, shade-intolerant species exhibited elevational range contractions, especially at their upper limit, whereas shade-tolerant species showed elevational range expansions at both limits. 
Our findings suggest that the range shifts for herb and shrub species may be modulated by tree cover dynamics.
Abstract:
Geophysical data may provide crucial information about hydrological properties, states, and processes that are difficult to obtain by other means. Large data sets can be acquired over widely different scales in a minimally invasive manner and at comparatively low costs, but their effective use in hydrology makes it necessary to understand the fidelity of geophysical models, the assumptions made in their construction, and the links between geophysical and hydrological properties. Geophysics has been applied for groundwater prospecting for almost a century, but it is only in the last 20 years that it has been regularly used together with classical hydrological data to build predictive hydrological models. A largely unexplored avenue for future work is to use geophysical data to falsify or rank competing conceptual hydrological models. A promising cornerstone for such a model selection strategy is the Bayes factor, but it can only be calculated reliably when considering the main sources of uncertainty throughout the hydrogeophysical parameter estimation process. Most classical geophysical imaging tools tend to favor models with smoothly varying property fields that are at odds with most conceptual hydrological models of interest. It is thus necessary to account for this bias or use alternative approaches in which proposed conceptual models are honored at all steps in the model building process.
Abstract:
Maximum entropy modeling (Maxent) is a widely used algorithm for predicting species distributions across space and time. Properly assessing the uncertainty in such predictions is non-trivial and requires validation with independent datasets. Notably, model complexity (number of model parameters) remains a major concern in relation to overfitting and, hence, transferability of Maxent models. An emerging approach is to validate the cross-temporal transferability of model predictions using paleoecological data. In this study, we assess the effect of model complexity on the performance of Maxent projections across time using two European plant species (Alnus glutinosa (L.) Gaertn. and Corylus avellana L.) with an extensive late Quaternary fossil record in Spain as a study case. We fit 110 models with different levels of complexity under present time and tested model performance using AUC (area under the receiver operating characteristic curve) and AICc (corrected Akaike Information Criterion) through the standard procedure of randomly partitioning current occurrence data. We then compared these results to an independent validation by projecting the models to mid-Holocene (6000 years before present) climatic conditions in Spain to assess their ability to predict fossil pollen presence-absence and abundance. We find that calibrating Maxent models with default settings results in the generation of overly complex models. While model performance increased with model complexity when predicting current distributions, it was higher with intermediate complexity when predicting mid-Holocene distributions. Hence, models of intermediate complexity resulted in the best trade-off to predict species distributions across time. Reliable temporal model transferability is especially relevant for forecasting species distributions under future climate change. 
Consequently, species-specific model tuning should be used to find the best modeling settings to control for complexity, notably with paleoecological data to independently validate model projections. For cross-temporal projections of species distributions for which paleoecological data is not available, models of intermediate complexity should be selected.
Abstract:
PURPOSE: According to estimates, around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. METHOD: About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). RESULTS: The automated classification groups lithological units well in terms of their IRC characteristics. In particular, the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional differences of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the variable importance evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. CONCLUSION: Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. 
Future approaches should consider taking into account further variables like soil gas radon measurements as well as more detailed geological information.
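The clustering step described in this abstract (k-medoids on pair-wise Kolmogorov distances between IRC distributions) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the function names `kolmogorov_distance` and `k_medoids` are hypothetical, the data are synthetic lognormal samples rather than real radon measurements, and the alternating update shown is a simple k-medoids variant rather than whichever algorithm the study used.

```python
import numpy as np

def kolmogorov_distance(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])  # the gap is maximal at an observed value
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def k_medoids(dist, k, n_iter=100, seed=0):
    """Simple alternating k-medoids on a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assign every unit to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Synthetic stand-in for IRC samples of six hypothetical lithological units:
# three "low radon" and three "high radon" units.
rng = np.random.default_rng(42)
units = [rng.lognormal(mean=m, sigma=0.3, size=200) for m in (4, 4, 4, 6, 6, 6)]

n_units = len(units)
dist = np.zeros((n_units, n_units))
for i in range(n_units):
    for j in range(i + 1, n_units):
        dist[i, j] = dist[j, i] = kolmogorov_distance(units[i], units[j])

medoids, labels = k_medoids(dist, k=2)
print("medoid units:", medoids, "cluster labels:", labels)
```

Working on a precomputed distance matrix is what makes k-medoids a natural fit here: the Kolmogorov distance compares whole distributions, so there is no meaningful "mean" unit for a k-means-style centroid, whereas a medoid is always one of the observed units.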
Abstract:
BACKGROUND: Theory of mind (ToM), the capacity to infer the intention, beliefs and emotional states of others, is frequently impaired in behavioural variant fronto-temporal dementia patients (bv-FTDp); however, its impact on caregiver burden is unexplored. SETTING: National Institute of Neurological Disorders and Stroke, National Institutes of Health. SUBJECTS: bv-FTDp (n = 28), a subgroup of their caregivers (n = 20) and healthy controls (n = 32). METHODS: We applied a faux-pas (FP) task as a ToM measure in bv-FTDp and healthy controls and the Zarit Burden Interview as a measure of burden in patients' caregivers. Patients underwent structural MRI; we used voxel-based morphometry to examine relationships between regional atrophy and ToM impairment and caregiver burden. RESULTS: FP task performance was impaired in bv-FTDp and negatively associated with caregiver burden. Atrophy was found in areas involved in ToM. Caregiver burden increased with greater atrophy in left lateral premotor cortex, a region associated in animal models with the presence of mirror neurons, possibly involved in empathy. CONCLUSION: ToM impairment in bv-FTDp is associated with increased caregiver burden.
Abstract:
BACKGROUND: Diagnosing pediatric pneumonia is challenging in low-resource settings. The World Health Organization (WHO) has defined primary end-point radiological pneumonia for use in epidemiological and vaccine studies. However, radiography requires expertise and is often inaccessible. We hypothesized that plasma biomarkers of inflammation and endothelial activation may be useful surrogates for end-point pneumonia, and may provide insight into its biological significance. METHODS: We studied children with WHO-defined clinical pneumonia (n = 155) within a prospective cohort of 1,005 consecutive febrile children presenting to Tanzanian outpatient clinics. Based on x-ray findings, participants were categorized as primary end-point pneumonia (n = 30), other infiltrates (n = 31), or normal chest x-ray (n = 94). Plasma levels of 7 host response biomarkers at presentation were measured by ELISA. Associations between biomarker levels and radiological findings were assessed by Kruskal-Wallis test and multivariable logistic regression. Biomarker ability to predict radiological findings was evaluated using receiver operating characteristic curve analysis and Classification and Regression Tree analysis. RESULTS: Compared to children with normal x-ray, children with end-point pneumonia had significantly higher C-reactive protein, procalcitonin and Chitinase 3-like-1, while those with other infiltrates had elevated procalcitonin and von Willebrand Factor and decreased soluble Tie-2 and endoglin. Clinical variables were not predictive of radiological findings. Classification and Regression Tree analysis generated multi-marker models with improved performance over single markers for discriminating between groups. 
A model based on C-reactive protein and Chitinase 3-like-1 discriminated between end-point pneumonia and non-end-point pneumonia with 93.3% sensitivity (95% confidence interval 76.5-98.8), 80.8% specificity (72.6-87.1), positive likelihood ratio 4.9 (3.4-7.1), negative likelihood ratio 0.083 (0.022-0.32), and misclassification rate 0.20 (standard error 0.038). CONCLUSIONS: In Tanzanian children with WHO-defined clinical pneumonia, combinations of host biomarkers distinguished between end-point pneumonia, other infiltrates, and normal chest x-ray, whereas clinical variables did not. These findings generate pathophysiological hypotheses and may have potential research and clinical utility.
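The likelihood ratios reported in this abstract follow directly from the stated sensitivity and specificity, which can be checked with a short worked example. This is only an illustrative sketch: the helper name `likelihood_ratios` is hypothetical, and the inputs are the rounded point estimates quoted above, not the study's raw counts.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary diagnostic test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg

# Rounded point estimates from the abstract: 93.3% sensitivity, 80.8% specificity.
lr_pos, lr_neg = likelihood_ratios(0.933, 0.808)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.3f}")
```

With these inputs the computed values round to the positive likelihood ratio of 4.9 and the negative likelihood ratio of 0.083 given in the abstract.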
Abstract:
OBJECTIVE: We examined the influence of clinical, radiologic, and echocardiographic characteristics on antithrombotic choice in patients with cryptogenic stroke (CS) and patent foramen ovale (PFO), hypothesizing that features suggestive of paradoxical embolism might lead to greater use of anticoagulation. METHODS: The Risk of Paradoxical Embolism Study combined 12 databases to create the largest dataset of patients with CS and known PFO status. We used generalized linear mixed models with a random effect of component study to explore whether anticoagulation was preferentially selected based on the following: (1) younger age and absence of vascular risk factors, (2) "high-risk" echocardiographic features, and (3) neuroradiologic findings. RESULTS: A total of 1,132 patients with CS and PFO treated with anticoagulation or antiplatelets were included. Overall, 438 participants (39%) were treated with anticoagulation with a range (by database) of 22% to 54%. Treatment choice was not influenced by age or vascular risk factors. However, neuroradiologic findings (superficial or multiple infarcts) and high-risk echocardiographic features (large shunts, shunt at rest, and septal hypermobility) were predictors of anticoagulation use. CONCLUSION: Both antithrombotic regimens are widely used for secondary stroke prevention in patients with CS and PFO. Radiologic and echocardiographic features were strongly associated with treatment choice, whereas conventional vascular risk factors were not. Prior observational studies are likely to be biased by confounding by indication.
Abstract:
BACKGROUND: The purpose of this study was to confirm the prognostic value of pancreatic stone protein (PSP) in patients with severe infections requiring ICU management and to develop and validate a model to enhance mortality prediction by combining severity scores with biomarkers. METHODS: We prospectively enrolled patients with severe sepsis or septic shock in mixed tertiary ICUs in Switzerland (derivation cohort) and Brazil (validation cohort). Severity scores (APACHE [Acute Physiology and Chronic Health Evaluation] II or Simplified Acute Physiology Score [SAPS] II) were combined with biomarkers obtained at the time of diagnosis of sepsis, including C-reactive protein, procalcitonin (PCT), and PSP. Logistic regression models with the lowest prediction errors were selected to predict in-hospital mortality. RESULTS: Mortality rates of patients with septic shock enrolled in the derivation cohort (103 out of 158) and the validation cohort (53 out of 91) were 37% and 57%, respectively. APACHE II and PSP were significantly higher in dying patients. In the derivation cohort, the models combining either APACHE II, PCT, and PSP (area under the receiver operating characteristic curve [AUC], 0.721; 95% CI, 0.632-0.812) or SAPS II, PCT, and PSP (AUC, 0.710; 95% CI, 0.617-0.802) performed better than each individual biomarker (AUC PCT, 0.534; 95% CI, 0.433-0.636; AUC PSP, 0.665; 95% CI, 0.572-0.758) or severity score (AUC APACHE II, 0.638; 95% CI, 0.543-0.733; AUC SAPS II, 0.598; 95% CI, 0.499-0.698). These models were externally confirmed in the independent validation cohort. CONCLUSIONS: We confirmed the prognostic value of PSP in patients with severe sepsis and septic shock requiring ICU management. A model combining severity scores with PCT and PSP improves mortality prediction in these patients.
Abstract:
We propose a task for eliciting attitudes toward risk that is close to real-world risky decisions, which typically involve gains and losses. The task consists of accepting or rejecting gambles that provide a gain with probability p and a loss with probability 1−p. We employ finite mixture models to uncover heterogeneity in risk preferences and find that (i) behavior is heterogeneous, with one half of the subjects behaving as expected utility maximizers, (ii) for the others, reference-dependent models perform better than those where subjects derive utility from final outcomes, (iii) models with sign-dependent decision weights perform better than those without, and (iv) there is no evidence for loss aversion. The procedure is simple enough to be used easily in field or lab experiments where risk elicitation is not the main experiment.
Abstract:
PURPOSE: This study aims to describe emotional distress and quality of life (QoL) of patients at different phases of their lung cancer and the association with their family physician (FP) involvement. METHODS: A prospective study on patients with lung cancer was conducted in three regions of Quebec, Canada. Patients completed, at baseline, several validated questionnaires regarding their psychosocial characteristics and their perceived level of FP involvement. Emotional distress [profile of mood states (POMS)] and QoL [European Organization for Research and Treatment of Cancer Quality of Life Core 30 (EORTC QLQ-C30)] were reassessed every 3-6 months, whether patients had metastasis or not, up to 18 months. Results were regrouped according to cancer phase. Mixed models with repeated measurements were performed to identify variation in distress and QoL. RESULTS: In this cohort of 395 patients, distress was low at diagnosis (0.79 ± 0.7 on a 0-4 scale), rising to 1.36 ± 0.8 at the advanced phase (p < 0.0001). Patients' global QoL scores significantly decreased from the diagnosis to the advanced phase (from 66 to 45 on a 0-100 scale; p < 0.0001). At all phases of cancer, FP involvement was significantly associated with patients' distress (p = 0.0004) and their global perception of QoL (p = 0.0080). These associations remained statistically significant even after controlling for age, gender, and presence of metastases. CONCLUSIONS: This study provides new knowledge on patients' emotional distress and QoL with cancer evolution and, particularly, their association with FP involvement. Other studies should be conducted to further explore the FP role in cancer supportive care.
Abstract:
Previous clinical observations and data from mouse models with defects in lipid metabolism suggested that epineurial adipocytes may play a role in peripheral nervous system myelination. We have used adipocyte-specific Lpin1 knockout mice to characterize the consequences of the presence of impaired epineurial adipocytes on the myelinating peripheral nerve. Our data revealed that the capacity of Schwann cells to establish myelin, and the functional properties of peripheral nerves, were not affected by compromised epineurial adipocytes in adipocyte-specific Lpin1 knockout mice. To evaluate the possibility that Lpin1-negative adipocytes are still able to support endoneurial Schwann cells, we also characterized sciatic nerves from mice carrying epiblast-specific deletion of peroxisome proliferator-activated receptor gamma, which develop general lipoatrophy. Interestingly, even the complete loss of adipocytes in the epineurium of peroxisome proliferator-activated receptor gamma knockout mice did not lead to detectable defects in Schwann cell myelination. However, probably as a consequence of their hyperglycemia, these mice have reduced nerve conduction velocity, thus mimicking the phenotype observed under diabetic conditions. Together, our data indicate that while adipocytes, as regulators of lipid and glucose homeostasis, play a role in nerve function, their presence in the epineurium is not essential for establishment or maintenance of proper myelin.