20 results for MAXIMUM ENTROPY METHOD (MAXENT)

at Université de Lausanne, Switzerland


Relevance:

100.00%

Publisher:

Abstract:

This book combines geostatistics and global mapping systems to present an up-to-the-minute study of environmental data. Featuring numerous case studies, the reference covers model-dependent (geostatistics) and data-driven (machine learning algorithms) analysis techniques such as risk mapping, conditional stochastic simulations, descriptions of spatial uncertainty and variability, artificial neural networks (ANN) for spatial data, Bayesian maximum entropy (BME), and more.

Relevance:

100.00%

Publisher:

Abstract:

Maximum entropy modeling (Maxent) is a widely used algorithm for predicting species distributions across space and time. Properly assessing the uncertainty in such predictions is non-trivial and requires validation with independent datasets. Notably, model complexity (the number of model parameters) remains a major concern in relation to overfitting and, hence, the transferability of Maxent models. An emerging approach is to validate the cross-temporal transferability of model predictions using paleoecological data. In this study, we assess the effect of model complexity on the performance of Maxent projections across time, using two European plant species (Alnus glutinosa (L.) Gaertn. and Corylus avellana L.) with an extensive late Quaternary fossil record in Spain as a case study. We fit 110 models with different levels of complexity under present-day conditions and tested model performance using AUC (area under the receiver operating characteristic curve) and AICc (corrected Akaike Information Criterion) through the standard procedure of randomly partitioning current occurrence data. We then compared these results to an independent validation by projecting the models to mid-Holocene (6000 years before present) climatic conditions in Spain to assess their ability to predict fossil pollen presence-absence and abundance. We find that calibrating Maxent models with default settings results in the generation of overly complex models. While model performance increased with model complexity when predicting current distributions, it was highest at intermediate complexity when predicting mid-Holocene distributions. Hence, models of intermediate complexity offered the best trade-off for predicting species distributions across time. Reliable temporal model transferability is especially relevant for forecasting species distributions under future climate change. Consequently, species-specific model tuning should be used to find the best modeling settings to control for complexity, notably with paleoecological data to independently validate model projections. For cross-temporal projections of species distributions for which paleoecological data are not available, models of intermediate complexity should be selected.
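The complexity trade-off this abstract describes hinges on AICc, which penalizes extra parameters more strongly for small samples. A minimal sketch in Python (the parameter counts and log-likelihoods below are hypothetical placeholders, not the study's fitted models):

```python
def aicc(log_likelihood, k, n):
    """Corrected Akaike Information Criterion: AIC plus a small-sample
    penalty that grows quickly as k approaches n."""
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical candidate models of increasing complexity (k = number of
# parameters) fitted to n = 120 occurrence records.
n = 120
candidates = [
    {"k": 5,  "log_likelihood": -310.0},   # too rigid: poor fit
    {"k": 15, "log_likelihood": -285.0},   # intermediate complexity
    {"k": 40, "log_likelihood": -280.0},   # barely better fit, many more params
]
best = min(candidates, key=lambda m: aicc(m["log_likelihood"], m["k"], n))
print(best["k"])  # the intermediate-complexity model wins the AICc comparison
```

Under these illustrative numbers, the 40-parameter model fits slightly better but pays a much larger penalty, mirroring the abstract's finding that default-settings Maxent models tend to be overly complex.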

Relevance:

100.00%

Publisher:

Abstract:

An active learning method is proposed for the semi-automatic selection of training sets in remote sensing image classification. The method iteratively adds to the current training set the unlabeled pixels for which the prediction of an ensemble of classifiers based on bagged training sets shows maximum entropy. This way, the algorithm selects the pixels that are the most uncertain and that will improve the model if added to the training set. The user is asked to label such pixels at each iteration. Experiments using support vector machines (SVM) on an eight-class QuickBird image show the excellent performance of the method, which equals the accuracy of both a model trained with ten times more pixels and a model whose training set has been built using a state-of-the-art SVM-specific active learning method.
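The selection loop described above can be sketched in a few lines. This is a toy illustration, assuming synthetic random "pixels", a simple nearest-centroid classifier standing in for the SVMs, and a random oracle standing in for the human labeller:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_centroids(X, y):
    """Nearest-centroid 'classifier': one mean vector per class present in y."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, X):
    classes = sorted(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    return np.array(classes)[dists.argmin(axis=0)]

X_train = rng.normal(size=(40, 4))   # labelled 'pixels' (4 spectral bands)
y_train = rng.integers(0, 3, 40)     # 3 land-cover classes
X_pool = rng.normal(size=(500, 4))   # unlabelled candidate pixels

for _ in range(5):  # five active-learning iterations
    # Bagged ensemble: each member is trained on a bootstrap resample.
    votes = np.stack([
        predict(fit_centroids(X_train[idx], y_train[idx]), X_pool)
        for idx in (rng.integers(0, len(X_train), len(X_train)) for _ in range(10))
    ])
    # Class frequencies across ensemble members give a per-pixel vote entropy.
    freqs = np.stack([(votes == c).mean(axis=0) for c in range(3)])
    entropy = -(freqs * np.log(freqs + 1e-12)).sum(axis=0)
    query = np.argsort(entropy)[-10:]       # the 10 most uncertain pixels
    y_new = rng.integers(0, 3, query.size)  # random oracle stands in for the user
    X_train = np.vstack([X_train, X_pool[query]])
    y_train = np.concatenate([y_train, y_new])
    X_pool = np.delete(X_pool, query, axis=0)
```

Each iteration retrains the bagged ensemble, ranks the pool by prediction entropy, and moves the most ambiguous pixels into the training set, mirroring the iterative scheme the abstract describes.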

Relevance:

100.00%

Publisher:

Abstract:

The vast territories that were radioactively contaminated during the 1986 Chernobyl accident provide a substantial data set of radioactive monitoring data, which can be used for the verification and testing of the different spatial estimation (prediction) methods involved in risk assessment studies. Using the Chernobyl data set for such a purpose is motivated by its heterogeneous spatial structure (the data are characterized by large-scale correlations, short-scale variability, spotty features, etc.). The present work is concerned with the application of the Bayesian Maximum Entropy (BME) method to estimate the extent and the magnitude of the radioactive soil contamination by 137Cs due to the Chernobyl fallout. The powerful BME method allows rigorous incorporation of a wide variety of knowledge bases into the spatial estimation procedure, leading to informative contamination maps. Exact measurements ('hard' data) are combined with secondary information on local uncertainties (treated as 'soft' data) to generate science-based uncertainty assessments of soil contamination estimates at unsampled locations. BME describes uncertainty in terms of the posterior probability distributions generated across space, while no assumption about the underlying distribution is made and non-linear estimators are automatically incorporated. Traditional estimation variances based on the assumption of an underlying Gaussian distribution (analogous, e.g., to the kriging variance) can be derived as a special case of the BME uncertainty analysis. The BME estimates obtained using hard and soft data are compared with the BME estimates obtained using only hard data. The comparison involves both the accuracy of the estimation maps using the exact data and the assessment of the associated uncertainty using repeated measurements. Furthermore, a comparison of the spatial estimation accuracy obtained by the two methods was carried out using a validation data set of hard data. Finally, a separate uncertainty analysis was conducted to evaluate the ability of the posterior probabilities to reproduce the distribution of the raw repeated measurements available at certain populated sites. The analysis illustrates the improvement in mapping accuracy obtained by adding soft data to the existing hard data and, more generally, demonstrates that the BME method performs well both in terms of estimation accuracy and in terms of estimation error assessment, both useful features for the Chernobyl fallout study.

Relevance:

100.00%

Publisher:

Abstract:

A wide range of modelling algorithms is used by ecologists, conservation practitioners, and others to predict species ranges from point locality data. Unfortunately, the amount of data available is limited for many taxa and regions, making it essential to quantify the sensitivity of these algorithms to sample size. This is the first study to address this need by rigorously evaluating a broad suite of algorithms with independent presence-absence data from multiple species and regions. We evaluated predictions from 12 algorithms for 46 species (from six different regions of the world) at three sample sizes (100, 30, and 10 records). We used data from natural history collections to run the models and evaluated the quality of model predictions with the area under the receiver operating characteristic curve (AUC). With decreasing sample size, model accuracy decreased and variability increased across species and between models. Novel modelling methods that incorporate both interactions between predictor variables and complex response shapes (i.e. GBM, MARS-INT, BRUTO) performed better than most methods at large sample sizes but not at the smallest sample sizes. Other algorithms were much less sensitive to sample size, including an algorithm based on maximum entropy (MAXENT) that had among the best predictive power across all sample sizes. Relative to other algorithms, a distance metric algorithm (DOMAIN) and a genetic algorithm (OM-GARP) had intermediate performance at the largest sample size and among the best performance at the lowest sample size. No algorithm predicted consistently well with small sample sizes (n < 30), which should encourage highly conservative use of predictions based on few records and restrict their use to exploratory modelling.
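AUC, the evaluation statistic used throughout this comparison, has a direct interpretation as the Wilcoxon-Mann-Whitney statistic: the probability that a randomly chosen presence site scores higher than a randomly chosen absence site. A minimal sketch with hypothetical suitability scores (not the study's data):

```python
import random

def auc(pos_scores, neg_scores):
    """AUC as the Wilcoxon-Mann-Whitney statistic; ties count as half a win."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

random.seed(0)
# Hypothetical model suitability scores at independent presence/absence sites.
presence = [random.gauss(0.7, 0.15) for _ in range(200)]
absence = [random.gauss(0.4, 0.15) for _ in range(200)]
print(round(auc(presence, absence), 2))  # well above the 0.5 no-skill baseline
```

An AUC of 0.5 means the model ranks sites no better than chance, and 1.0 means every presence site outranks every absence site; the brute-force pairwise count above makes that definition explicit.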

Relevance:

100.00%

Publisher:

Abstract:

Genome-wide association studies (GWAS) are conducted with the promise to discover novel genetic variants associated with diverse traits. For most traits, associated markers individually explain just a modest fraction of the phenotypic variation, but their number can well be in the hundreds. We developed a maximum likelihood method that allows us to infer the distribution of associated variants even when many of them were missed by chance. Compared to previous approaches, the novelty of our method is that it (a) does not require having an independent (unbiased) estimate of the effect sizes; (b) makes use of the complete distribution of P-values while allowing for the false discovery rate; (c) takes into account allelic heterogeneity and the SNP pruning strategy. We applied our method to the latest GWAS meta-analysis results of the GIANT consortium. It revealed that while the explained variance of genome-wide (GW) significant SNPs is around 1% for waist-hip ratio (WHR), the observed P-values provide evidence for the existence of variants explaining 10% (CI=[8.5-11.5%]) of the phenotypic variance in total. Similarly, the total explained variance likely to exist for height is estimated to be 29% (CI=[28-30%]), three times higher than what the observed GW significant SNPs give rise to. This methodology also enables us to predict the benefit of future GWA studies that aim to reveal more associated genetic markers via increased sample size.

Relevance:

100.00%

Publisher:

Abstract:

1. Species distribution modelling is used increasingly in both applied and theoretical research to predict how species are distributed and to understand attributes of species' environmental requirements. In species distribution modelling, various statistical methods are used that combine species occurrence data with environmental spatial data layers to predict the suitability of any site for that species. While the number of data-sharing initiatives involving species' occurrences in the scientific community has increased dramatically over the past few years, various data quality and methodological concerns related to using these data for species distribution modelling have not been addressed adequately. 2. We evaluated how uncertainty in georeferences and associated locational error in occurrences influence species distribution modelling using two treatments: (1) a control treatment, where models were calibrated with original, accurate data, and (2) an error treatment, where data were first degraded spatially to simulate locational error. To incorporate error into the coordinates, we moved each coordinate with a random number drawn from the normal distribution with a mean of zero and a standard deviation of 5 km. We evaluated the influence of error on the performance of 10 commonly used distributional modelling techniques applied to 40 species in four distinct geographical regions. 3. Locational error in occurrences reduced model performance in three of these regions; relatively accurate predictions of species distributions were possible for most species, even with degraded occurrences. Two species distribution modelling techniques, boosted regression trees and maximum entropy, were the best performing models in the face of locational errors. The results obtained with boosted regression trees were only slightly degraded by errors in location, and the results obtained with the maximum entropy approach were not affected by such errors. 4. Synthesis and applications. To use the vast array of occurrence data that exists currently for research and management relating to the geographical ranges of species, modellers need to know the influence of locational error on model quality and whether some modelling techniques are particularly robust to error. We show that certain modelling techniques are particularly robust to a moderate level of locational error and that useful predictions of species distributions can be made even when occurrence data include some error.
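The degradation scheme in point 2 is easy to reproduce: displace each coordinate by a normal deviate with zero mean and a 5 km standard deviation. A minimal sketch, assuming hypothetical occurrence records in projected (metric) coordinates:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical occurrence records in a projected coordinate system (metres).
occurrences = rng.uniform(0.0, 100_000.0, size=(40, 2))

# Error treatment: add N(0, 5 km) noise to every x and y coordinate,
# simulating the locational error of imprecise georeferences.
sigma_m = 5_000.0  # 5 km standard deviation, in metres
degraded = occurrences + rng.normal(0.0, sigma_m, size=occurrences.shape)
```

A control model would then be calibrated on `occurrences` and an error-treatment model on `degraded`, with both evaluated against the same independent presence-absence data.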

Relevance:

100.00%

Publisher:

Relevance:

100.00%

Publisher:

Abstract:

Species distribution models (SDMs) are widely used to explain and predict species ranges and environmental niches. They are most commonly constructed by inferring species' occurrence-environment relationships using statistical and machine-learning methods. The variety of methods that can be used to construct SDMs (e.g. generalized linear/additive models, tree-based models, maximum entropy, etc.), and the variety of ways that such models can be implemented, permits substantial flexibility in SDM complexity. Building models with an appropriate amount of complexity for the study objectives is critical for robust inference. We characterize complexity as the shape of the inferred occurrence-environment relationships and the number of parameters used to describe them, and search for insights into whether additional complexity is informative or superfluous. By building 'underfit' models, having insufficient flexibility to describe observed occurrence-environment relationships, we risk misunderstanding the factors shaping species distributions. By building 'overfit' models, with excessive flexibility, we risk inadvertently ascribing pattern to noise or building opaque models. However, model selection can be challenging, especially when comparing models constructed under different modeling approaches. Here we argue for a more pragmatic approach: researchers should constrain the complexity of their models based on study objective, attributes of the data, and an understanding of how these interact with the underlying biological processes. We discuss guidelines for balancing underfitting with overfitting and, consequently, how complexity affects decisions made during model building. Although some generalities are possible, our discussion reflects differences in opinions that favor simpler versus more complex models. We conclude that combining insights from both simple and complex SDM building approaches best advances our knowledge of current and future species ranges.

Relevance:

100.00%

Publisher:

Abstract:

We studied the distribution of Palearctic green toads (Bufo viridis subgroup), an anuran species group with three ploidy levels, inhabiting the Central Asian Amudarya River drainage. Various approaches (one-way and multivariate analyses, variance components analyses, and maximum entropy modelling) were used to estimate the effect of altitude, precipitation, temperature and land vegetation cover on the distribution of toads. It is usually assumed that polyploid species occur in regions with harsher climatic conditions (higher latitudes, elevations, etc.), but for the green toad complex we revealed a more intricate situation. The diploid species (Bufo shaartusiensis and Bufo turanensis) inhabit the arid lowlands (from 44 to 789 m a.s.l.), while the tetraploid Bufo pewzowi was recorded in mountainous regions (340-3492 m a.s.l.) with usually lower temperatures and higher precipitation rates than in the region inhabited by the diploid species. The triploid species Bufo baturae was found in the Pamirs (Tajikistan) at the highest altitudes (2503-3859 m a.s.l.), under the harshest climatic conditions.

Relevance:

100.00%

Publisher:

Abstract:

OBJECTIVES: To monitor HIV-1 transmitted drug resistance (TDR) in a well-defined urban area with broad access to antiretroviral therapy, and to assess the potential source of infection of newly diagnosed HIV individuals. METHODS: All individuals resident in Geneva, Switzerland, with a newly diagnosed HIV infection between 2000 and 2008 were screened for HIV resistance. An infection was considered recent when the positive test followed a negative screening test within less than 1 year. Phylogenetic analyses were performed using the maximum likelihood method on pol sequences, including those of 1058 individuals with chronic infection living in Geneva. RESULTS: Of 637 individuals with newly diagnosed HIV infection, 20% had a recent infection. Mutations associated with resistance to at least one drug class were detected in 8.5% [nucleoside reverse transcriptase inhibitors (NRTIs), 6.3%; non-nucleoside reverse transcriptase inhibitors (NNRTIs), 3.5%; protease inhibitors, 1.9%]. TDR (P-trend = 0.015) and, in particular, NNRTI resistance (P = 0.002) increased from 2000 to 2008. Phylogenetic analyses revealed that 34.9% of newly diagnosed individuals, and 52.7% of those with recent infection, were linked to transmission clusters. Clusters were more frequent in individuals with TDR than in those with sensitive strains (59.3 vs. 32.6%, respectively; P < 0.0001). Moreover, 84% of newly diagnosed individuals with TDR were part of clusters composed only of newly diagnosed individuals. CONCLUSION: Reconstruction of HIV transmission networks using phylogenetic analysis shows that newly diagnosed HIV infections are a significant source of onward transmission, particularly of resistant strains, suggesting an important self-fueling mechanism for TDR.

Relevance:

100.00%

Publisher:

Abstract:

Pneumocystis jirovecii is a fungal parasite that specifically colonizes humans and turns into an opportunistic pathogen in immunodeficient individuals. The fungus is able to reproduce extracellularly in host lungs without eliciting massive cellular death. The molecular mechanisms that govern this process are poorly understood, in part because of the lack of an in vitro culture system for Pneumocystis spp. In this study, we explored the origin and evolution of the putative biotrophy of P. jirovecii through comparative genomics and reconstruction of ancestral gene repertoires. We used the maximum parsimony method and genomes of related fungi of the Taphrinomycotina subphylum. Our results suggest that the last common ancestor of Pneumocystis spp. lost 2,324 genes in relation to the acquisition of obligate biotrophy. These losses may result from neutral drift and affect the biosyntheses of amino acids and thiamine, the assimilation of inorganic nitrogen and sulfur, and the catabolism of purines. In addition, P. jirovecii shows a reduced panel of lytic proteases and has lost the RNA interference machinery, which might contribute to its genome plasticity. Together with other characteristics, that is, a sexual life cycle within the host, the absence of massive destruction of host cells, difficult culturing, and the lack of virulence factors, these gene losses constitute a unique combination of traits which are hallmarks of both obligate biotrophs and animal parasites. These findings suggest that Pneumocystis spp. should be considered the first described obligate biotrophs of animals, whose evolution has been marked by gene losses.
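The ancestral-repertoire reconstruction above rests on the maximum parsimony principle, which Fitch's small-parsimony algorithm makes concrete: explain each gene's presence/absence pattern with the fewest gain/loss events. A toy sketch on a hypothetical four-taxon tree (the taxon names and states are illustrative, not the study's data):

```python
def fitch(tree, states):
    """Fitch small-parsimony pass. `tree` is a nested tuple of leaf names;
    `states` maps each leaf to a set of states ({0} = gene absent, {1} = present).
    Returns (possible states at this node, minimum number of changes below it)."""
    if isinstance(tree, str):  # leaf node
        return states[tree], 0
    (left, nl), (right, nr) = fitch(tree[0], states), fitch(tree[1], states)
    overlap = left & right
    if overlap:                       # children agree: no extra event needed
        return overlap, nl + nr
    return left | right, nl + nr + 1  # children conflict: one gain/loss implied

# Hypothetical four-taxon tree and presence/absence of a single gene.
tree = (("P_jirovecii", "P_carinii"), ("T_deformans", "S_pombe"))
states = {"P_jirovecii": {0}, "P_carinii": {0},
          "T_deformans": {1}, "S_pombe": {1}}
root_states, changes = fitch(tree, states)
print(changes)  # a single loss event explains this pattern most parsimoniously
```

Summing such per-gene minimum-change counts over thousands of gene families is the kind of bookkeeping that underlies an estimate like the 2,324 ancestral gene losses, though the study's actual pipeline is of course far more involved.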

Relevance:

100.00%

Publisher:

Abstract:

Objective: Blood pressure is known to aggregate in families. Yet, heritability estimates are population-specific and no Swiss data have been published so far. Moreover, little is known about the heritability of the white-coat effect. We investigated the heritability of various blood pressure (BP) traits in a Swiss population-based sample. Methods: SKIPOGH (Swiss Kidney Project on Genes in Hypertension) is a family-based, multi-centre (Lausanne, Bern, Geneva) cross-sectional study that examines the role of genes in determining BP levels. Office and 24-hour ambulatory BP were measured using validated devices (A&D UM-101 and Diasys Integra). We estimated the heritability of systolic BP (SBP), diastolic BP (DBP), heart rate (HR), pulse pressure (PP), the proportional white-coat effect (i.e. [office BP - mean ambulatory daytime BP]/mean ambulatory daytime BP), and nocturnal BP dipping (the difference between mean ambulatory daytime and night-time BP) using a maximum likelihood method implemented in the SAGE software. Analyses were adjusted for age, sex, body mass index (BMI), and study centre. Analyses involving PP were additionally adjusted for DBP. Results: The 517 men and 579 women included in this analysis had a mean (±SD) age of 46.8 (17.8) and 47.8 (17.1) years and a mean BMI of 26.0 (4.2) and 24.2 (4.6) kg/m2, respectively. Heritability estimates (±SE) for office SBP, DBP, HR, and PP were 0.20±0.07, 0.20±0.07, 0.39±0.08, and 0.16±0.07 (all P<0.01). Heritability estimates for 24-hour ambulatory SBP, DBP, HR, and PP were, respectively, 0.39±0.07, 0.30±0.08, 0.19±0.09, and 0.25±0.08 (all P<0.05). The heritability of the white-coat effect was 0.29±0.07 for SBP and 0.31±0.07 for DBP (both P<0.001). The heritability of nocturnal BP dipping was 0.15±0.08 for SBP and 0.22±0.07 for DBP (both P<0.05). Conclusions: We found that the white-coat effect is significantly heritable. Our findings show that BP traits are moderately heritable in a multi-centric study in Switzerland, in line with previous population-based studies, justifying the ongoing search for genetic determinants in this field.

Relevance:

100.00%

Publisher:

Abstract:

The quality of environmental data analysis and the propagation of errors are heavily affected by the representativity of the initial sampling design [CRE 93, DEU 97, KAN 04a, LEN 06, MUL 07]. Geostatistical methods such as kriging rely on field samples, whose spatial distribution is crucial for the correct detection of the phenomena. The literature on the design of environmental monitoring networks (MN) is extensive, and several interesting books have recently been published [GRU 06, LEN 06, MUL 07] that clarify the basic principles of spatial sampling design (monitoring network optimization); an approach to monitoring network optimization based on Support Vector Machines has also been proposed. Nonetheless, modelers often receive real data coming from environmental monitoring networks that suffer from problems of non-homogeneity (clustering). Clustering can be related to preferential sampling or to the impossibility of reaching certain regions.
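A classical mitigation for such clustering, not detailed in this excerpt, is cell declustering, a standard geostatistical technique: overlay a grid and down-weight samples that share a cell. A minimal sketch with a hypothetical clustered monitoring network:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical monitoring network: a dense clustered patch plus sparse background.
cluster = rng.normal([20.0, 20.0], 1.0, size=(50, 2))
background = rng.uniform(0.0, 100.0, size=(50, 2))
xy = np.vstack([cluster, background])

# Cell declustering: weight each sample inversely to the number of samples
# sharing its grid cell, so preferentially sampled regions count less.
cell_size = 10.0
cells = np.floor(xy / cell_size).astype(int)
_, inverse, counts = np.unique(cells, axis=0, return_inverse=True,
                               return_counts=True)
weights = 1.0 / counts[inverse.ravel()]
weights = weights / weights.sum()  # normalize so the weights sum to 1
```

With these weights, a declustered mean or histogram is simply a weighted statistic: the 50 clustered points collectively carry far less weight than the 50 scattered background points.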