119 resultados para Data Driven Modeling
Resumo:
The velocity of a liquid slug falling in a capillary tube is lower than predicted for Poiseuille flow due to presence of menisci, whose shapes are determined by the complex interplay of capillary, viscous, and gravitational forces. Due to the presence of menisci, a capillary pressure proportional to surface curvature acts on the slug and streamlines are bent close to the interface, resulting in enhanced viscous dissipation at the wedges. To determine the origin of drag-force increase relative to Poiseuille flow, we compute the force resultant acting on the slug by integrating Navier-Stokes equations over the liquid volume. Invoking relationships from differential geometry we demonstrate that the additional drag is due to viscous forces only and that no capillary drag of hydrodynamic origin exists (i.e., due to hydrodynamic deformation of the interface). Requiring that the force resultant is zero, we derive scaling laws for the steady velocity in the limit of small capillary numbers by estimating the leading order viscous dissipation in the different regions of the slug (i.e., the unperturbed Poiseuille-like bulk, the static menisci close to the tube axis and the dynamic regions close to the contact lines). Considering both partial and complete wetting, we find that the relationship between dimensionless velocity and weight is, in general, nonlinear. Whereas the relationship obtained for complete-wetting conditions is found in agreement with the experimental data of Bico and Quere [J. Bico and D. Quere, J. Colloid Interface Sci. 243, 262 (2001)], the scaling law under partial-wetting conditions is validated by numerical simulations performed with the Volume of Fluid method. The simulated steady velocities agree with the behavior predicted by the theoretical scaling laws in presence and in absence of static contact angle hysteresis. The numerical simulations suggest that wedge-flow dissipation alone cannot account for the entire additional drag and that the non-Poiseuille dissipation in the static menisci (not considered in previous studies) has to be considered for large contact angles.
Resumo:
For patients with chronic lung diseases, such as chronic obstructive pulmonary disease (COPD), exacerbations are life-threatening events causing acute respiratory distress that can even lead to hospitalization and death. Although a great deal of effort has been put into research of exacerbations and potential treatment options, the exact underlying mechanisms are yet to be deciphered and no therapy that effectively targets the excessive inflammation is available. In this study, we report that interleukin-1β (IL-1β) and interleukin-17A (IL-17A) are key mediators of neutrophilic inflammation in influenza-induced exacerbations of chronic lung inflammation. Using a mouse model of disease, our data shows a role for IL-1β in mediating lung dysfunction, and in driving neutrophilic inflammation during the whole phase of viral infection. We further report a role for IL-17A as a mediator of IL-1β induced neutrophilia at early time points during influenza-induced exacerbations. Blocking of IL-17A or IL-1 resulted in a significant abrogation of neutrophil recruitment to the airways in the initial phase of infection or at the peak of viral replication, respectively. Therefore, IL-17A and IL-1β are potential targets for therapeutic treatment of viral exacerbations of chronic lung inflammation.
Resumo:
Résumé Cette thèse est consacrée à l'analyse, la modélisation et la visualisation de données environnementales à référence spatiale à l'aide d'algorithmes d'apprentissage automatique (Machine Learning). L'apprentissage automatique peut être considéré au sens large comme une sous-catégorie de l'intelligence artificielle qui concerne particulièrement le développement de techniques et d'algorithmes permettant à une machine d'apprendre à partir de données. Dans cette thèse, les algorithmes d'apprentissage automatique sont adaptés pour être appliqués à des données environnementales et à la prédiction spatiale. Pourquoi l'apprentissage automatique ? Parce que la majorité des algorithmes d'apprentissage automatiques sont universels, adaptatifs, non-linéaires, robustes et efficaces pour la modélisation. Ils peuvent résoudre des problèmes de classification, de régression et de modélisation de densité de probabilités dans des espaces à haute dimension, composés de variables informatives spatialisées (« géo-features ») en plus des coordonnées géographiques. De plus, ils sont idéaux pour être implémentés en tant qu'outils d'aide à la décision pour des questions environnementales allant de la reconnaissance de pattern à la modélisation et la prédiction en passant par la cartographie automatique. Leur efficacité est comparable au modèles géostatistiques dans l'espace des coordonnées géographiques, mais ils sont indispensables pour des données à hautes dimensions incluant des géo-features. Les algorithmes d'apprentissage automatique les plus importants et les plus populaires sont présentés théoriquement et implémentés sous forme de logiciels pour les sciences environnementales. Les principaux algorithmes décrits sont le Perceptron multicouches (MultiLayer Perceptron, MLP) - l'algorithme le plus connu dans l'intelligence artificielle, le réseau de neurones de régression généralisée (General Regression Neural Networks, GRNN), le réseau de neurones probabiliste (Probabilistic Neural Networks, PNN), les cartes auto-organisées (SelfOrganized Maps, SOM), les modèles à mixture Gaussiennes (Gaussian Mixture Models, GMM), les réseaux à fonctions de base radiales (Radial Basis Functions Networks, RBF) et les réseaux à mixture de densité (Mixture Density Networks, MDN). Cette gamme d'algorithmes permet de couvrir des tâches variées telle que la classification, la régression ou l'estimation de densité de probabilité. L'analyse exploratoire des données (Exploratory Data Analysis, EDA) est le premier pas de toute analyse de données. Dans cette thèse les concepts d'analyse exploratoire de données spatiales (Exploratory Spatial Data Analysis, ESDA) sont traités selon l'approche traditionnelle de la géostatistique avec la variographie expérimentale et selon les principes de l'apprentissage automatique. La variographie expérimentale, qui étudie les relations entre pairs de points, est un outil de base pour l'analyse géostatistique de corrélations spatiales anisotropiques qui permet de détecter la présence de patterns spatiaux descriptible par une statistique. L'approche de l'apprentissage automatique pour l'ESDA est présentée à travers l'application de la méthode des k plus proches voisins qui est très simple et possède d'excellentes qualités d'interprétation et de visualisation. Une part importante de la thèse traite de sujets d'actualité comme la cartographie automatique de données spatiales. Le réseau de neurones de régression généralisée est proposé pour résoudre cette tâche efficacement. Les performances du GRNN sont démontrées par des données de Comparaison d'Interpolation Spatiale (SIC) de 2004 pour lesquelles le GRNN bat significativement toutes les autres méthodes, particulièrement lors de situations d'urgence. La thèse est composée de quatre chapitres : théorie, applications, outils logiciels et des exemples guidés. Une partie importante du travail consiste en une collection de logiciels : Machine Learning Office. Cette collection de logiciels a été développée durant les 15 dernières années et a été utilisée pour l'enseignement de nombreux cours, dont des workshops internationaux en Chine, France, Italie, Irlande et Suisse ainsi que dans des projets de recherche fondamentaux et appliqués. Les cas d'études considérés couvrent un vaste spectre de problèmes géoenvironnementaux réels à basse et haute dimensionnalité, tels que la pollution de l'air, du sol et de l'eau par des produits radioactifs et des métaux lourds, la classification de types de sols et d'unités hydrogéologiques, la cartographie des incertitudes pour l'aide à la décision et l'estimation de risques naturels (glissements de terrain, avalanches). Des outils complémentaires pour l'analyse exploratoire des données et la visualisation ont également été développés en prenant soin de créer une interface conviviale et facile à l'utilisation. Machine Learning for geospatial data: algorithms, software tools and case studies Abstract The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense machine learning can be considered as a subfield of artificial intelligence. It mainly concerns with the development of techniques and algorithms that allow computers to learn from data. In this thesis machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning? In few words most of machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions for the classification, regression, and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well-suited to be implemented as predictive engines in decision support systems, for the purposes of environmental data mining including pattern recognition, modeling and predictions as well as automatic data mapping. They have competitive efficiency to the geostatistical models in low dimensional geographical spaces but are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models interesting for geo- and environmental sciences are presented in details: from theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis functions networks, mixture density networks. This set of models covers machine learning tasks such as classification, regression, and density estimation. Exploratory data analysis (EDA) is initial and very important part of data analysis. In this thesis the concepts of exploratory spatial data analysis (ESDA) is considered using both traditional geostatistical approach such as_experimental variography and machine learning. Experimental variography is a basic tool for geostatistical analysis of anisotropic spatial correlations which helps to understand the presence of spatial patterns, at least described by two-point statistics. A machine learning approach for ESDA is presented by applying the k-nearest neighbors (k-NN) method which is simple and has very good interpretation and visualization properties. Important part of the thesis deals with a hot topic of nowadays, namely, an automatic mapping of geospatial data. General regression neural networks (GRNN) is proposed as efficient model to solve this task. Performance of the GRNN model is demonstrated on Spatial Interpolation Comparison (SIC) 2004 data where GRNN model significantly outperformed all other approaches, especially in case of emergency conditions. The thesis consists of four chapters and has the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools - Machine Learning Office. Machine Learning Office tools were developed during last 15 years and was used both for many teaching courses, including international workshops in China, France, Italy, Ireland, Switzerland and for realizing fundamental and applied research projects. Case studies considered cover wide spectrum of the real-life low and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, soil types and hydro-geological units classification, decision-oriented mapping with uncertainties, natural hazards (landslides, avalanches) assessments and susceptibility mapping. Complementary tools useful for the exploratory data analysis and visualisation were developed as well. The software is user friendly and easy to use.
Resumo:
Connectivity among populations plays a crucial role in maintaining genetic variation at a local scale, especially in small populations affected strongly by genetic drift. The negative consequences of population disconnection on allelic richness and gene diversity (heterozygosity) are well recognized and empirically established. It is not well recognized, however, that a sudden drop in local effective population size induced by such disconnection produces a temporary disequilibrium in allelic frequency distributions that is akin to the genetic signature of a demographic bottleneck. To document this effect, we used individual-based simulations and empirical data on allelic richness and gene diversity in six pairs of isolated versus well-connected (core) populations of European tree frogs. In our simulations, population disconnection depressed allelic richness more than heterozygosity and thus resulted in a temporary excess in gene diversity relative to mutation drift equilibrium (i.e., signature of a genetic bottleneck). We observed a similar excess in gene diversity in isolated populations of tree frogs. Our results show that population disconnection can create a genetic bottleneck in the absence of demographic collapse.
Resumo:
A human in vivo toxicokinetic model was built to allow a better understanding of the toxicokinetics of folpet fungicide and its key ring biomarkers of exposure: phthalimide (PI), phthalamic acid (PAA) and phthalic acid (PA). Both PI and the sum of ring metabolites, expressed as PA equivalents (PAeq), may be used as biomarkers of exposure. The conceptual representation of the model was based on the analysis of the time course of these biomarkers in volunteers orally and dermally exposed to folpet. In the model, compartments were also used to represent the body burden of folpet and experimentally relevant PI, PAA and PA ring metabolites in blood and in key tissues as well as in excreta, hence urinary and feces. The time evolution of these biomarkers in each compartment of the model was then mathematically described by a system of coupled differential equations. The mathematical parameters of the model were then determined from best fits to the time courses of PI and PAeq in blood and urine of five volunteers administered orally 1 mg kg(-1) and dermally 10 mg kg(-1) of folpet. In the case of oral administration, the mean elimination half-life of PI from blood (through feces, urine or metabolism) was found to be 39.9 h as compared with 28.0 h for PAeq. In the case of a dermal application, mean elimination half-life of PI and PAeq was estimated to be 34.3 and 29.3 h, respectively. The average final fractions of administered dose recovered in urine as PI over the 0-96 h period were 0.030 and 0.002%, for oral and dermal exposure, respectively. Corresponding values for PAeq were 24.5 and 1.83%, respectively. Finally, the average clearance rate of PI from blood calculated from the oral and dermal data was 0.09 ± 0.03 and 0.13 ± 0.05 ml h(-1) while the volume of distribution was 4.30 ± 1.12 and 6.05 ± 2.22 l, respectively. It was not possible to obtain the corresponding values from PAeq data owing to the lack of blood time course data.
Resumo:
Data characteristics and species traits are expected to influence the accuracy with which species' distributions can be modeled and predicted. We compare 10 modeling techniques in terms of predictive power and sensitivity to location error, change in map resolution, and sample size, and assess whether some species traits can explain variation in model performance. We focused on 30 native tree species in Switzerland and used presence-only data to model current distribution, which we evaluated against independent presence-absence data. While there are important differences between the predictive performance of modeling methods, the variance in model performance is greater among species than among techniques. Within the range of data perturbations in this study, some extrinsic parameters of data affect model performance more than others: location error and sample size reduced performance of many techniques, whereas grain had little effect on most techniques. No technique can rescue species that are difficult to predict. The predictive power of species-distribution models can partly be predicted from a series of species characteristics and traits based on growth rate, elevational distribution range, and maximum elevation. Slow-growing species or species with narrow and specialized niches tend to be better modeled. The Swiss presence-only tree data produce models that are reliable enough to be useful in planning and management applications.
Resumo:
OBJECTIVE: To evaluate the public health impact of statin prescribing strategies based on the Justification for the Use of Statins in Primary Prevention: An Intervention Trial Evaluating Rosuvastatin Study (JUPITER). METHODS: We studied 2268 adults aged 35-75 without cardiovascular disease in a population-based study in Switzerland in 2003-2006. We assessed the eligibility for statins according to the Adult Treatment Panel III (ATPIII) guidelines, and by adding "strict" (hs-CRP≥2.0mg/L and LDL-cholesterol <3.4mmol/L), and "extended" (hs-CRP≥2.0mg/L alone) JUPITER-like criteria. We estimated the proportion of CHD deaths potentially prevented over 10years in the Swiss population. RESULTS: Fifteen % were already taking statins, 42% were eligible by ATPIII guidelines, 53% by adding "strict", and 62% by adding "extended" criteria, with a total of 19% newly eligible. The number needed to treat with statins to avoid one CHD death over 10years was 38 for ATPIII, 84 for "strict" and 92 for "extended" JUPITER-like criteria. ATPIII would prevent 17% of CHD deaths, compared with 20% for ATPIII+"strict" and 23% for ATPIII + "extended" criteria (+6%). CONCLUSION: Implementing JUPITER-like strategies would make statin prescribing for primary prevention more common and less efficient than it is with current guidelines.
Resumo:
Malgré son importance dans notre vie de tous les jours, certaines propriétés de l?eau restent inexpliquées. L'étude des interactions entre l'eau et les particules organiques occupe des groupes de recherche dans le monde entier et est loin d'être finie. Dans mon travail j'ai essayé de comprendre, au niveau moléculaire, ces interactions importantes pour la vie. J'ai utilisé pour cela un modèle simple de l'eau pour décrire des solutions aqueuses de différentes particules. Récemment, l?eau liquide a été décrite comme une structure formée d?un réseau aléatoire de liaisons hydrogènes. En introduisant une particule hydrophobe dans cette structure à basse température, certaines liaisons hydrogènes sont détruites ce qui est énergétiquement défavorable. Les molécules d?eau s?arrangent alors autour de cette particule en formant une cage qui permet de récupérer des liaisons hydrogènes (entre molécules d?eau) encore plus fortes : les particules sont alors solubles dans l?eau. A des températures plus élevées, l?agitation thermique des molécules devient importante et brise les liaisons hydrogènes. Maintenant, la dissolution des particules devient énergétiquement défavorable, et les particules se séparent de l?eau en formant des agrégats qui minimisent leur surface exposée à l?eau. Pourtant, à très haute température, les effets entropiques deviennent tellement forts que les particules se mélangent de nouveau avec les molécules d?eau. En utilisant un modèle basé sur ces changements de structure formée par des liaisons hydrogènes j?ai pu reproduire les phénomènes principaux liés à l?hydrophobicité. J?ai trouvé une région de coexistence de deux phases entre les températures critiques inférieure et supérieure de solubilité, dans laquelle les particules hydrophobes s?agrègent. En dehors de cette région, les particules sont dissoutes dans l?eau. J?ai démontré que l?interaction hydrophobe est décrite par un modèle qui prend uniquement en compte les changements de structure de l?eau liquide en présence d?une particule hydrophobe, plutôt que les interactions directes entre les particules. Encouragée par ces résultats prometteurs, j?ai étudié des solutions aqueuses de particules hydrophobes en présence de co-solvants cosmotropiques et chaotropiques. Ce sont des substances qui stabilisent ou déstabilisent les agrégats de particules hydrophobes. La présence de ces substances peut être incluse dans le modèle en décrivant leur effet sur la structure de l?eau. J?ai pu reproduire la concentration élevée de co-solvants chaotropiques dans le voisinage immédiat de la particule, et l?effet inverse dans le cas de co-solvants cosmotropiques. Ce changement de concentration du co-solvant à proximité de particules hydrophobes est la cause principale de son effet sur la solubilité des particules hydrophobes. J?ai démontré que le modèle adapté prédit correctement les effets implicites des co-solvants sur les interactions de plusieurs corps entre les particules hydrophobes. En outre, j?ai étendu le modèle à la description de particules amphiphiles comme des lipides. J?ai trouvé la formation de différents types de micelles en fonction de la distribution des regions hydrophobes à la surface des particules. L?hydrophobicité reste également un sujet controversé en science des protéines. J?ai défini une nouvelle échelle d?hydrophobicité pour les acides aminés qui forment des protéines, basée sur leurs surfaces exposées à l?eau dans des protéines natives. Cette échelle permet une comparaison meilleure entre les expériences et les résultats théoriques. Ainsi, le modèle développé dans mon travail contribue à mieux comprendre les solutions aqueuses de particules hydrophobes. Je pense que les résultats analytiques et numériques obtenus éclaircissent en partie les processus physiques qui sont à la base de l?interaction hydrophobe.<br/><br/>Despite the importance of water in our daily lives, some of its properties remain unexplained. Indeed, the interactions of water with organic particles are investigated in research groups all over the world, but controversy still surrounds many aspects of their description. In my work I have tried to understand these interactions on a molecular level using both analytical and numerical methods. Recent investigations describe liquid water as random network formed by hydrogen bonds. The insertion of a hydrophobic particle at low temperature breaks some of the hydrogen bonds, which is energetically unfavorable. The water molecules, however, rearrange in a cage-like structure around the solute particle. Even stronger hydrogen bonds are formed between water molecules, and thus the solute particles are soluble. At higher temperatures, this strict ordering is disrupted by thermal movements, and the solution of particles becomes unfavorable. They minimize their exposed surface to water by aggregating. At even higher temperatures, entropy effects become dominant and water and solute particles mix again. Using a model based on these changes in water structure I have reproduced the essential phenomena connected to hydrophobicity. These include an upper and a lower critical solution temperature, which define temperature and density ranges in which aggregation occurs. Outside of this region the solute particles are soluble in water. Because I was able to demonstrate that the simple mixture model contains implicitly many-body interactions between the solute molecules, I feel that the study contributes to an important advance in the qualitative understanding of the hydrophobic effect. I have also studied the aggregation of hydrophobic particles in aqueous solutions in the presence of cosolvents. Here I have demonstrated that the important features of the destabilizing effect of chaotropic cosolvents on hydrophobic aggregates may be described within the same two-state model, with adaptations to focus on the ability of such substances to alter the structure of water. The relevant phenomena include a significant enhancement of the solubility of non-polar solute particles and preferential binding of chaotropic substances to solute molecules. In a similar fashion, I have analyzed the stabilizing effect of kosmotropic cosolvents in these solutions. Including the ability of kosmotropic substances to enhance the structure of liquid water, leads to reduced solubility, larger aggregation regime and the preferential exclusion of the cosolvent from the hydration shell of hydrophobic solute particles. I have further adapted the MLG model to include the solvation of amphiphilic solute particles in water, by allowing different distributions of hydrophobic regions at the molecular surface, I have found aggregation of the amphiphiles, and formation of various types of micelle as a function of the hydrophobicity pattern. I have demonstrated that certain features of micelle formation may be reproduced by the adapted model to describe alterations of water structure near different surface regions of the dissolved amphiphiles. Hydrophobicity remains a controversial quantity also in protein science. Based on the surface exposure of the 20 amino-acids in native proteins I have defined the a new hydrophobicity scale, which may lead to an improvement in the comparison of experimental data with the results from theoretical HP models. Overall, I have shown that the primary features of the hydrophobic interaction in aqueous solutions may be captured within a model which focuses on alterations in water structure around non-polar solute particles. The results obtained within this model may illuminate the processes underlying the hydrophobic interaction.<br/><br/>La vie sur notre planète a commencé dans l'eau et ne pourrait pas exister en son absence : les cellules des animaux et des plantes contiennent jusqu'à 95% d'eau. Malgré son importance dans notre vie de tous les jours, certaines propriétés de l?eau restent inexpliquées. En particulier, l'étude des interactions entre l'eau et les particules organiques occupe des groupes de recherche dans le monde entier et est loin d'être finie. Dans mon travail j'ai essayé de comprendre, au niveau moléculaire, ces interactions importantes pour la vie. J'ai utilisé pour cela un modèle simple de l'eau pour décrire des solutions aqueuses de différentes particules. Bien que l?eau soit généralement un bon solvant, un grand groupe de molécules, appelées molécules hydrophobes (du grecque "hydro"="eau" et "phobia"="peur"), n'est pas facilement soluble dans l'eau. Ces particules hydrophobes essayent d'éviter le contact avec l'eau, et forment donc un agrégat pour minimiser leur surface exposée à l'eau. Cette force entre les particules est appelée interaction hydrophobe, et les mécanismes physiques qui conduisent à ces interactions ne sont pas bien compris à l'heure actuelle. Dans mon étude j'ai décrit l'effet des particules hydrophobes sur l'eau liquide. L'objectif était d'éclaircir le mécanisme de l'interaction hydrophobe qui est fondamentale pour la formation des membranes et le fonctionnement des processus biologiques dans notre corps. Récemment, l'eau liquide a été décrite comme un réseau aléatoire formé par des liaisons hydrogènes. En introduisant une particule hydrophobe dans cette structure, certaines liaisons hydrogènes sont détruites tandis que les molécules d'eau s'arrangent autour de cette particule en formant une cage qui permet de récupérer des liaisons hydrogènes (entre molécules d?eau) encore plus fortes : les particules sont alors solubles dans l'eau. A des températures plus élevées, l?agitation thermique des molécules devient importante et brise la structure de cage autour des particules hydrophobes. Maintenant, la dissolution des particules devient défavorable, et les particules se séparent de l'eau en formant deux phases. A très haute température, les mouvements thermiques dans le système deviennent tellement forts que les particules se mélangent de nouveau avec les molécules d'eau. A l'aide d'un modèle qui décrit le système en termes de restructuration dans l'eau liquide, j'ai réussi à reproduire les phénomènes physiques liés à l?hydrophobicité. J'ai démontré que les interactions hydrophobes entre plusieurs particules peuvent être exprimées dans un modèle qui prend uniquement en compte les liaisons hydrogènes entre les molécules d'eau. Encouragée par ces résultats prometteurs, j'ai inclus dans mon modèle des substances fréquemment utilisées pour stabiliser ou déstabiliser des solutions aqueuses de particules hydrophobes. J'ai réussi à reproduire les effets dûs à la présence de ces substances. De plus, j'ai pu décrire la formation de micelles par des particules amphiphiles comme des lipides dont la surface est partiellement hydrophobe et partiellement hydrophile ("hydro-phile"="aime l'eau"), ainsi que le repliement des protéines dû à l'hydrophobicité, qui garantit le fonctionnement correct des processus biologiques de notre corps. Dans mes études futures je poursuivrai l'étude des solutions aqueuses de différentes particules en utilisant les techniques acquises pendant mon travail de thèse, et en essayant de comprendre les propriétés physiques du liquide le plus important pour notre vie : l'eau.
Resumo:
Approximate models (proxies) can be employed to reduce the computational costs of estimating uncertainty. The price to pay is that the approximations introduced by the proxy model can lead to a biased estimation. To avoid this problem and ensure a reliable uncertainty quantification, we propose to combine functional data analysis and machine learning to build error models that allow us to obtain an accurate prediction of the exact response without solving the exact model for all realizations. We build the relationship between proxy and exact model on a learning set of geostatistical realizations for which both exact and approximate solvers are run. Functional principal components analysis (FPCA) is used to investigate the variability in the two sets of curves and reduce the dimensionality of the problem while maximizing the retained information. Once obtained, the error model can be used to predict the exact response of any realization on the basis of the sole proxy response. This methodology is purpose-oriented as the error model is constructed directly for the quantity of interest, rather than for the state of the system. Also, the dimensionality reduction performed by FPCA allows a diagnostic of the quality of the error model to assess the informativeness of the learning set and the fidelity of the proxy to the exact model. The possibility of obtaining a prediction of the exact response for any newly generated realization suggests that the methodology can be effectively used beyond the context of uncertainty quantification, in particular for Bayesian inference and optimization.
Resumo:
OBJECTIVES: Blood pressures in persons of African descent exceed those of other racial/ethnic groups in the United States. Whether this trait is attributable to the genetic factors in African-origin populations, or a result of inadequately measured environmental exposures, such as racial discrimination, is not known. To study this question, we conducted a multisite comparative study of communities in the African diaspora, drawn from metropolitan Chicago, Kingston, Jamaica, rural Ghana, Cape Town, South Africa, and the Seychelles. METHODS: At each site, 500 participants between the age of 25 and 49 years, with approximately equal sex balance, were enrolled for a longitudinal study of energy expenditure and weight gain. In this study, we describe the patterns of blood pressure and hypertension observed at baseline among the sites. RESULTS: Mean SBP and DBP were very similar in the United States and South Africa in both men and women, although among women, the prevalence of hypertension was higher in the United States (24 vs. 17%, respectively). After adjustment for multiple covariates, relative to participants in the United States, SBP was significantly higher among the South Africans by 9.7 mmHg (P < 0.05) and significantly lower for each of the other sites: for example, Jamaica: -7.9 mmHg (P = 0.06), Ghana: -12.8 mmHg (P < 0.01) and Seychelles: -11.1 mmHg (P = 0.01). CONCLUSION: These data are consistent with prior findings of a blood pressure gradient in societies of the African diaspora and confirm that African-origin populations with lower social status in multiracial societies, such as the United States and South Africa, experience more hypertension than anticipated based on anthropometric and measurable socioeconomic risk factors.
Resumo:
Maximum entropy modeling (Maxent) is a widely used algorithm for predicting species distributions across space and time. Properly assessing the uncertainty in such predictions is non-trivial and requires validation with independent datasets. Notably, model complexity (number of model parameters) remains a major concern in relation to overfitting and, hence, transferability of Maxent models. An emerging approach is to validate the cross-temporal transferability of model predictions using paleoecological data. In this study, we assess the effect of model complexity on the performance of Maxent projections across time using two European plant species (Alnus giutinosa (L.) Gaertn. and Corylus avellana L) with an extensive late Quaternary fossil record in Spain as a study case. We fit 110 models with different levels of complexity under present time and tested model performance using AUC (area under the receiver operating characteristic curve) and AlCc (corrected Akaike Information Criterion) through the standard procedure of randomly partitioning current occurrence data. We then compared these results to an independent validation by projecting the models to mid-Holocene (6000 years before present) climatic conditions in Spain to assess their ability to predict fossil pollen presence-absence and abundance. We find that calibrating Maxent models with default settings result in the generation of overly complex models. While model performance increased with model complexity when predicting current distributions, it was higher with intermediate complexity when predicting mid-Holocene distributions. Hence, models of intermediate complexity resulted in the best trade-off to predict species distributions across time. Reliable temporal model transferability is especially relevant for forecasting species distributions under future climate change. Consequently, species-specific model tuning should be used to find the best modeling settings to control for complexity, notably with paleoecological data to independently validate model projections. For cross-temporal projections of species distributions for which paleoecological data is not available, models of intermediate complexity should be selected.
Resumo:
Nowadays, Species Distribution Models (SDMs) are a widely used tool. Using different statistical approaches these models reconstruct the realized niche of a species using presence data and a set of variables, often topoclimatic. There utilization range is quite large from understanding single species requirements, to the creation of nature reserve based on species hotspots, or modeling of climate change impact, etc... Most of the time these models are using variables at a resolution of 50km x 50km or 1 km x 1 km. However in some cases these models are used with resolutions below the kilometer scale and thus called high resolution models (100 m x 100 m or 25 m x 25 m). Quite recently a new kind of data has emerged enabling precision up to lm x lm and thus allowing very high resolution modeling. However these new variables are very costly and need an important amount of time to be processed. This is especially the case when these variables are used in complex calculation like models projections over large areas. Moreover the importance of very high resolution data in SDMs has not been assessed yet and is not well understood. Some basic knowledge on what drive species presence-absences is still missing. Indeed, it is not clear whether in mountain areas like the Alps coarse topoclimatic gradients are driving species distributions or if fine scale temperature or topography are more important or if their importance can be neglected when balance to competition or stochasticity. In this thesis I investigated the importance of very high resolution data (2-5m) in species distribution models using either very high resolution topographic, climatic or edaphic variables over a 2000m elevation gradient in the Western Swiss Alps. I also investigated more local responses of these variables for a subset of species living in this area at two precise elvation belts. During this thesis I showed that high resolution data necessitates very good datasets (species and variables for the models) to produce satisfactory results. Indeed, in mountain areas, temperature is the most important factor driving species distribution and needs to be modeled at very fine resolution instead of being interpolated over large surface to produce satisfactory results. Despite the instinctive idea that topographic should be very important at high resolution, results are mitigated. However looking at the importance of variables over a large gradient buffers the importance of the variables. Indeed topographic factors have been shown to be highly important at the subalpine level but their importance decrease at lower elevations. Wether at the mountane level edaphic and land use factors are more important high resolution topographic data is more imporatant at the subalpine level. Finally the biggest improvement in the models happens when edaphic variables are added. Indeed, adding soil variables is of high importance and variables like pH are overpassing the usual topographic variables in SDMs in term of importance in the models. To conclude high resolution is very important in modeling but necessitate very good datasets. Only increasing the resolution of the usual topoclimatic predictors is not sufficient and the use of edaphic predictors has been highlighted as fundamental to produce significantly better models. This is of primary importance, especially if these models are used to reconstruct communities or as basis for biodiversity assessments. -- Ces dernières années, l'utilisation des modèles de distribution d'espèces (SDMs) a continuellement augmenté. Ces modèles utilisent différents outils statistiques afin de reconstruire la niche réalisée d'une espèce à l'aide de variables, notamment climatiques ou topographiques, et de données de présence récoltées sur le terrain. Leur utilisation couvre de nombreux domaines allant de l'étude de l'écologie d'une espèce à la reconstruction de communautés ou à l'impact du réchauffement climatique. La plupart du temps, ces modèles utilisent des occur-rences issues des bases de données mondiales à une résolution plutôt large (1 km ou même 50 km). Certaines bases de données permettent cependant de travailler à haute résolution, par conséquent de descendre en dessous de l'échelle du kilomètre et de travailler avec des résolutions de 100 m x 100 m ou de 25 m x 25 m. Récemment, une nouvelle génération de données à très haute résolution est apparue et permet de travailler à l'échelle du mètre. Les variables qui peuvent être générées sur la base de ces nouvelles données sont cependant très coûteuses et nécessitent un temps conséquent quant à leur traitement. En effet, tout calcul statistique complexe, comme des projections de distribution d'espèces sur de larges surfaces, demande des calculateurs puissants et beaucoup de temps. De plus, les facteurs régissant la distribution des espèces à fine échelle sont encore mal connus et l'importance de variables à haute résolution comme la microtopographie ou la température dans les modèles n'est pas certaine. D'autres facteurs comme la compétition ou la stochasticité naturelle pourraient avoir une influence toute aussi forte. C'est dans ce contexte que se situe mon travail de thèse. J'ai cherché à comprendre l'importance de la haute résolution dans les modèles de distribution d'espèces, que ce soit pour la température, la microtopographie ou les variables édaphiques le long d'un important gradient d'altitude dans les Préalpes vaudoises. J'ai également cherché à comprendre l'impact local de certaines variables potentiellement négligées en raison d'effets confondants le long du gradient altitudinal. Durant cette thèse, j'ai pu monter que les variables à haute résolution, qu'elles soient liées à la température ou à la microtopographie, ne permettent qu'une amélioration substantielle des modèles. Afin de distinguer une amélioration conséquente, il est nécessaire de travailler avec des jeux de données plus importants, tant au niveau des espèces que des variables utilisées. Par exemple, les couches climatiques habituellement interpolées doivent être remplacées par des couches de température modélisées à haute résolution sur la base de données de terrain. Le fait de travailler le long d'un gradient de température de 2000m rend naturellement la température très importante au niveau des modèles. L'importance de la microtopographie est négligeable par rapport à la topographie à une résolution de 25m. Cependant, lorsque l'on regarde à une échelle plus locale, la haute résolution est une variable extrêmement importante dans le milieu subalpin. À l'étage montagnard par contre, les variables liées aux sols et à l'utilisation du sol sont très importantes. Finalement, les modèles de distribution d'espèces ont été particulièrement améliorés par l'addition de variables édaphiques, principalement le pH, dont l'importance supplante ou égale les variables topographique lors de leur ajout aux modèles de distribution d'espèces habituels.
Resumo:
BACKGROUND: Most available pharmacotherapies for alcohol-dependent patients target abstinence; however, reduced alcohol consumption may be a more realistic goal. Using randomized clinical trial (RCT) data, a previous microsimulation model evaluated the clinical relevance of reduced consumption in terms of avoided alcohol-attributable events. Using real-life observational data, the current analysis aimed to adapt the model and confirm previous findings about the clinical relevance of reduced alcohol consumption. METHODS: Based on the prospective observational CONTROL study, evaluating daily alcohol consumption among alcohol-dependent patients, the model predicted the probability of drinking any alcohol during a given day. Predicted daily alcohol consumption was simulated in a hypothetical sample of 200,000 patients observed over a year. Individual total alcohol consumption (TAC) and number of heavy drinking days (HDD) were derived. Using published risk equations, probabilities of alcohol-attributable adverse health events (e.g., hospitalizations or death) corresponding to simulated consumptions were computed, and aggregated for categories of patients defined by HDDs and TAC (expressed per 100,000 patient-years). Sensitivity analyses tested model robustness. RESULTS: Shifting from >220 HDDs per year to 120-140 HDDs and shifting from 36,000-39,000 g TAC per year (120-130 g/day) to 15,000-18,000 g TAC per year (50-60 g/day) impacted substantially on the incidence of events (14,588 and 6148 events avoided per 100,000 patient-years, respectively). Results were robust to sensitivity analyses. CONCLUSIONS: This study corroborates the previous microsimulation modeling approach and, using real-life data, confirms RCT-based findings that reduced alcohol consumption is a relevant objective for consideration in alcohol dependence management to improve public health.
Resumo:
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand.Here we present a community-driven curation effort, supported by ELIXIR-the European infrastructure for biological information-that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners.As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.