959 results for Bayesian inference on precipitation
Abstract:
When actuaries face the problem of pricing an insurance contract that contains different types of coverage, such as a motor insurance or homeowner's insurance policy, they usually assume that the types of claim are independent. However, this assumption may not be realistic: several studies have shown that there is a positive correlation between types of claim. Here we introduce different regression models in order to relax the independence assumption, including zero-inflated models to account for excess zeros and overdispersion. These multivariate Poisson models have been largely ignored to date, mainly because of their computational difficulties. Bayesian inference based on MCMC helps to solve this problem (and also lets us derive, for several quantities of interest, posterior summaries to account for uncertainty). Finally, these models are applied to an automobile insurance claims database with three different types of claims. We analyse the consequences for pure and loaded premiums when the independence assumption is relaxed by using different multivariate Poisson regression models and their zero-inflated versions.
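The zero-inflated Poisson models mentioned above can be sketched in a few lines. The snippet below is an illustrative toy, not the paper's actual multivariate model: it shows the zero-inflated pmf and a minimal random-walk Metropolis sampler for the Poisson rate, with the zero-inflation probability held fixed.

```python
import math
import random

def zip_pmf(k, lam, psi):
    """P(K = k) under a zero-inflated Poisson: with probability psi the
    count is a structural zero, otherwise it is Poisson(lam)."""
    pois = math.exp(-lam) * lam ** k / math.factorial(k)
    return psi + (1.0 - psi) * pois if k == 0 else (1.0 - psi) * pois

def metropolis_lambda(counts, psi, n_iter=2000, seed=0):
    """Random-walk Metropolis for lam under a flat prior on lam > 0,
    a toy stand-in for the full MCMC over regression coefficients."""
    rng = random.Random(seed)

    def loglik(l):
        return sum(math.log(zip_pmf(k, l, psi)) for k in counts)

    lam = 1.0
    ll = loglik(lam)
    draws = []
    for _ in range(n_iter):
        prop = lam + rng.gauss(0.0, 0.2)
        if prop > 0:                                   # stay in the support
            ll_prop = loglik(prop)
            if math.log(rng.random()) < ll_prop - ll:  # Metropolis accept
                lam, ll = prop, ll_prop
        draws.append(lam)
    return draws
```

The excess-zero effect is visible directly in the pmf: at k = 0 the zero-inflated probability always exceeds the plain Poisson probability by the structural-zero mass.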
Abstract:
In this paper we included a very broad representation of grass family diversity (84% of tribes and 42% of genera). Phylogenetic inference was based on three plastid DNA regions (rbcL, matK and trnL-F), using maximum parsimony and Bayesian methods. Our results resolved most of the subfamily relationships within the major clades (BEP and PACCMAD) that had previously been unclear, among others: (i) the BEP and PACCMAD sister relationship, (ii) the composition of clades and the sister relationship of Ehrhartoideae and Bambusoideae + Pooideae, (iii) the paraphyly of tribe Bambuseae, (iv) the position of Gynerium as sister to Panicoideae, and (v) the phylogenetic position of Micrairoideae. By allowing a relatively large amount of missing data, we were able to increase taxon sampling in our analyses substantially, from 107 to 295 taxa. However, bootstrap support and, to a lesser extent, Bayesian posterior probabilities were generally lower in analyses involving missing data than in those without. We produced a fully resolved phylogenetic summary tree for the grass family at subfamily level and indicated the most likely relationships of all tribes included in our analysis.
Abstract:
The article discusses the behavioral aspects that affect the entrepreneurs' decision making under the Knightian uncertainty approach. Since the profit arising from entrepreneurial activity represents the reward of an immeasurable and subjective risk, it has been hypothesized that innovative entrepreneurs have excessive optimism and confidence, which leads them to invest in high-risk activities. A behavioral model of decision making under uncertainty is used to test the hypothesis of overconfidence. This model is based on Bayesian inference, which allows us to model the assumption that these entrepreneurs are overconfident. We conclude that, under the hypothesis of overconfidence, these entrepreneurs decide to invest, despite the fact that the expected utility model indicates the contrary. This theoretical finding could explain why there are a large number of business failures in the first years of activity.
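The overconfidence mechanism described above can be illustrated numerically. The sketch below is an assumption-laden toy, not the paper's actual model: it encodes overconfidence as an inflated perceived accuracy of the entrepreneur's private signal and applies a risk-neutral expected-payoff rule.

```python
def posterior_success(prior, signal_accuracy, overconfidence=1.0):
    """Bayes update of P(success) after a favourable private signal.
    overconfidence > 1 inflates the perceived signal accuracy, which is
    how this toy encodes the overconfident entrepreneur."""
    acc = min(0.999, signal_accuracy * overconfidence)
    return prior * acc / (prior * acc + (1.0 - prior) * (1.0 - acc))

def invests(posterior, gain, loss):
    """Risk-neutral decision rule: invest iff expected payoff is positive."""
    return posterior * gain - (1.0 - posterior) * loss > 0
```

With a prior success probability of 0.2, a 60%-accurate favourable signal and payoffs of +100 / -50 (all hypothetical numbers), a well-calibrated Bayesian declines the investment, while an agent who overweights the same signal (overconfidence = 1.5) invests, mirroring the paper's conclusion.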
Abstract:
Doping with natural steroids can be detected by evaluating the urinary concentrations and ratios of several endogenous steroids. Since these biomarkers of steroid doping are known to present large inter-individual variations, monitoring individual steroid profiles over time allows switching from population-based towards subject-based reference ranges for improved detection. In an Athlete Biological Passport (ABP), biomarker data are collated throughout the athlete's sporting career and individual thresholds are defined adaptively. To date, this approach has been validated on a limited number of markers of steroid doping, such as the testosterone (T) over epitestosterone (E) ratio used to detect T misuse in athletes. Additional markers are required for other endogenous steroids such as dihydrotestosterone (DHT) and dehydroepiandrosterone (DHEA). By combining comprehensive steroid profiles composed of 24 steroid concentrations with Bayesian inference techniques for longitudinal profiling, a selection was made for the detection of DHT and DHEA misuse. The biomarkers found were rated according to relative response, parameter stability, discriminative power, and maximal detection time. This analysis revealed DHT/E, DHT/5β-androstane-3α,17β-diol and 5α-androstane-3α,17β-diol/5β-androstane-3α,17β-diol as the best biomarkers for DHT administration, and DHEA/E, 16α-hydroxydehydroepiandrosterone/E, 7β-hydroxydehydroepiandrosterone/E and 5β-androstane-3α,17β-diol/5α-androstane-3α,17β-diol for DHEA. The selected biomarkers were found suitable for individual referencing, and a drastic overall increase in sensitivity was obtained. The use of multiple markers, as formalized in an Athlete Steroidal Passport (ASP), can provide firm evidence of doping with endogenous steroids.
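The switch from population-based to subject-based reference ranges can be sketched with a conjugate normal-normal update. The toy below is illustrative only (the ABP's actual adaptive model is more elaborate): it shrinks a population prior towards the athlete's own measurements and derives an individual upper threshold.

```python
import math

def update_individual(prior_mean, prior_var, observations, obs_var):
    """Sequential conjugate normal-normal update of an athlete's
    individual mean for one steroid biomarker."""
    m, v = prior_mean, prior_var
    for x in observations:
        w = v / (v + obs_var)          # weight given to the new test
        m = m + w * (x - m)
        v = v * obs_var / (v + obs_var)
    return m, v

def upper_threshold(mean, var, obs_var, z=2.58):
    """Approximate 99% predictive upper limit for the next measurement."""
    return mean + z * math.sqrt(var + obs_var)
```

For example, an athlete whose T/E ratio sits stably around 1 obtains a personal threshold far below the population-based one, which is what drives the gain in sensitivity (all numbers here are hypothetical).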
Abstract:
Forensic science casework involves making a series of choices. The difficulty in making these choices lies in the inevitable presence of uncertainty, the unique context of circumstances surrounding each decision and, in some cases, the complexity due to numerous, interrelated random variables.
Given that these decisions can lead to serious consequences in the administration of justice, forensic decision making should be supported by a robust framework that makes inferences under uncertainty and decisions based on these inferences. The objective of this thesis is to respond to this need by presenting a framework for making rational choices in decision problems encountered by scientists in forensic science laboratories. Bayesian inference and decision theory meet the requirements for such a framework. To attain its objective, this thesis consists of three propositions, advocating the use of (1) decision theory, (2) Bayesian networks, and (3) influence diagrams for handling forensic inference and decision problems. The results present a uniform and coherent framework for making inferences and decisions in forensic science using the above theoretical concepts. They describe how to organize each type of problem by breaking it down into its different elements, and how to find the most rational course of action by distinguishing between one-stage and two-stage decision problems and applying the principle of expected utility maximization. To illustrate the framework's application to the problems encountered by scientists in forensic science laboratories, theoretical case studies apply decision theory, Bayesian networks and influence diagrams to a selection of different types of inference and decision problems dealing with different categories of trace evidence. Two studies of the two-trace problem illustrate how the construction of Bayesian networks can handle complex inference problems, and thus overcome the hurdle of complexity that can be present in decision problems.
Three further studies explain the application of decision theory and influence diagrams: one on what to conclude when a database search provides exactly one hit, one on what genotype to search for in a database based on the observations made on DNA typing results, and one on whether to submit a fingermark to the process of comparing it with prints of its potential sources. The results of the theoretical case studies support the thesis's three propositions. Hence, this thesis presents a uniform framework for organizing and finding the most rational course of action in decision problems encountered by scientists in forensic science laboratories. The proposed framework is an interactive and exploratory tool for better understanding a decision problem so that this understanding may lead to better informed choices.
Abstract:
Mathematical methods combined with measurements of single-cell dynamics provide a means to reconstruct intracellular processes that are only partly or indirectly accessible experimentally. To obtain reliable reconstructions, the pooling of measurements from several cells of a clonal population is mandatory. However, cell-to-cell variability originating from diverse sources poses computational challenges for such process reconstruction. We introduce a scalable Bayesian inference framework that properly accounts for population heterogeneity. The method allows inference of inaccessible molecular states and kinetic parameters; computation of Bayes factors for model selection; and dissection of intrinsic, extrinsic and technical noise. We show how additional single-cell readouts such as morphological features can be included in the analysis. We use the method to reconstruct the expression dynamics of a gene under an inducible promoter in yeast from time-lapse microscopy data.
Abstract:
Prior probabilities represent a core element of the Bayesian probabilistic approach to relatedness testing. This letter comments on the commentary 'Use of prior odds for missing persons identifications' by Budowle et al. (2011), published recently in this journal. Contrary to Budowle et al. (2011), we argue that the concept of prior probabilities (i) is not endowed with the notion of objectivity, (ii) is not a matter for computation and (iii) does not require new guidelines edited by the forensic DNA community, as long as probability is properly considered as an expression of personal belief. Please see related article: http://www.investigativegenetics.com/content/3/1/3
Abstract:
Strepsirhines comprise 10 living or recently extinct families, ≥50% of extant primate families. Their phylogenetic relationships have been intensively studied, but common topologies have only recently emerged; e.g. all recent reconstructions link the Lepilemuridae and Cheirogaleidae. The position of the indriids, however, remains uncertain, and molecular studies have placed them as the sister to every clade except Daubentonia, the preferred sister group of morphologists. The node subtending Afro-Asian lorisids has been similarly elusive. We probed these phylogenetic inconsistencies using a test data set including 20 strepsirhine taxa and 2 outgroups represented by 3,543 mtDNA base pairs, and 43 selected morphological characters, subjecting the data to maximum parsimony, maximum likelihood and Bayesian inference analyses, and reconstructing topology and node ages jointly from the molecular data using relaxed molecular clock analyses. Our permutations yielded compatible but not identical evolutionary histories, and currently popular techniques seem unable to deal adequately with morphological data. We investigated the influence of morphological characters on tree topologies, and examined the effect of taxon sampling in two experiments: (1) we removed the molecular data only for 5 endangered Malagasy taxa to simulate 'extinction leaving a fossil record'; (2) we removed both the sequence and morphological data for these taxa. Topologies were affected more by the inclusion of morphological data only, indicating that palaeontological studies that involve inserting a partial morphological data set into a combined data matrix of extant species should be interpreted with caution. The gap of approximately 10 million years between the daubentoniid divergence and those of the other Malagasy families deserves more study. 
The apparently contemporaneous divergence of African and non-daubentoniid Malagasy families 40-30 million years ago may be related to regional plume-induced uplift followed by a global period of cooling and drying. © 2013 S. Karger AG, Basel.
Abstract:
BACKGROUND: The majority of Haemosporida species infect birds or reptiles, but many important genera, including Plasmodium, infect mammals. Dipteran vectors shared by avian, reptilian and mammalian Haemosporida, suggest multiple invasions of Mammalia during haemosporidian evolution; yet, phylogenetic analyses have detected only a single invasion event. Until now, several important mammal-infecting genera have been absent in these analyses. This study focuses on the evolutionary origin of Polychromophilus, a unique malaria genus that only infects bats (Microchiroptera) and is transmitted by bat flies (Nycteribiidae). METHODS: Two species of Polychromophilus were obtained from wild bats caught in Switzerland. These were molecularly characterized using four genes (asl, clpc, coI, cytb) from the three different genomes (nucleus, apicoplast, mitochondrion). These data were then combined with data of 60 taxa of Haemosporida available in GenBank. Bayesian inference, maximum likelihood and a range of rooting methods were used to test specific hypotheses concerning the phylogenetic relationships between Polychromophilus and the other haemosporidian genera. RESULTS: The Polychromophilus melanipherus and Polychromophilus murinus samples show genetically distinct patterns and group according to species. The Bayesian tree topology suggests that the monophyletic clade of Polychromophilus falls within the avian/saurian clade of Plasmodium and directed hypothesis testing confirms the Plasmodium origin. CONCLUSION: Polychromophilus' ancestor was most likely a bird- or reptile-infecting Plasmodium before it switched to bats. The invasion of mammals as hosts has, therefore, not been a unique event in the evolutionary history of Haemosporida, despite the suspected costs of adapting to a new host. This was, moreover, accompanied by a switch in dipteran host.
Abstract:
BACKGROUND AND AIMS: Although it is well known that fire acts as a selective pressure shaping plant phenotypes, there are no quantitative estimates of the heritability of any trait related to plant persistence under recurrent fires, such as serotiny. In this study, the heritability of serotiny in Pinus halepensis is calculated, and an evaluation is made as to whether fire has left a selection signature on the level of serotiny among populations by comparing the genetic divergence of serotiny with the expected divergence of neutral molecular markers (QST-FST comparison). METHODS: A common garden of P. halepensis was used, located in inland Spain and composed of 145 open-pollinated families from 29 provenances covering the entire natural range of P. halepensis in the Iberian Peninsula and Balearic Islands. Narrow-sense heritability (h(2)) and quantitative genetic differentiation among populations for serotiny (QST) were estimated by means of an 'animal model' fitted by Bayesian inference. In order to determine whether genetic differentiation for serotiny is the result of differential natural selection, QST estimates for serotiny were compared with FST estimates obtained from allozyme data. Finally, a test was made of whether levels of serotiny in the different provenances were related to different fire regimes, using summer rainfall as a proxy for fire regime in each provenance. KEY RESULTS: Serotiny showed a significant narrow-sense heritability (h(2)) of 0·20 (credible interval 0·09-0·40). Quantitative genetic differentiation among provenances for serotiny (QST = 0·44) was significantly higher than expected under a neutral process (FST = 0·12), suggesting adaptive differentiation. A significant negative relationship was found between the serotiny level of trees in the common garden and summer rainfall of their provenance sites. CONCLUSIONS: Serotiny is a heritable trait in P. 
halepensis, and selection acts on it, giving rise to contrasting serotiny levels among populations depending on the fire regime, and supporting the role of fire in generating genetic divergence for adaptive traits.
Abstract:
Contrast enhancement is an image processing technique whose objective is to preprocess the image so that relevant information can be either seen or further processed more reliably. These techniques are typically applied when the image itself, or the device used for image reproduction, provides poor visibility and distinguishability of different regions of interest in the image. In most studies the emphasis is on the visualization of image data, but this human-observer-biased goal often results in images which are not optimal for automated processing. The main contribution of this study is to express contrast enhancement as a mapping from N-channel image data to a 1-channel gray-level image, and to devise a projection method that results in an image with minimal error with respect to the correct contrast image. This projection, the minimum-error contrast image, possesses the optimal contrast between the regions of interest in the image. The method is based on estimating the probability density distributions of the region values, and it employs Bayesian inference to establish the minimum-error projection.
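The minimum-error projection rests on classical Bayes decision theory. The sketch below is a one-dimensional toy with Gaussian class densities (the study estimates the densities rather than assuming them): it maps a pixel value to the posterior probability of one region, the quantity whose thresholding at 0.5 minimizes classification error.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian density used as a class-conditional model of region values."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_region1(x, mu0, s0, p0, mu1, s1, p1):
    """Posterior probability that pixel value x belongs to region 1.
    Mapping every pixel through this yields a 1-channel gray-level image;
    thresholding it at 0.5 is the minimum-error decision."""
    l0 = p0 * gaussian_pdf(x, mu0, s0)
    l1 = p1 * gaussian_pdf(x, mu1, s1)
    return l1 / (l0 + l1)
```

With equal priors and equal spreads, the posterior crosses 0.5 exactly midway between the two region means, which is the familiar minimum-error decision boundary.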
Abstract:
Approximate models (proxies) can be employed to reduce the computational costs of estimating uncertainty. The price to pay is that the approximations introduced by the proxy model can lead to a biased estimation. To avoid this problem and ensure a reliable uncertainty quantification, we propose to combine functional data analysis and machine learning to build error models that allow us to obtain an accurate prediction of the exact response without solving the exact model for all realizations. We build the relationship between proxy and exact model on a learning set of geostatistical realizations for which both exact and approximate solvers are run. Functional principal components analysis (FPCA) is used to investigate the variability in the two sets of curves and reduce the dimensionality of the problem while maximizing the retained information. Once obtained, the error model can be used to predict the exact response of any realization on the basis of the sole proxy response. This methodology is purpose-oriented as the error model is constructed directly for the quantity of interest, rather than for the state of the system. Also, the dimensionality reduction performed by FPCA allows a diagnostic of the quality of the error model to assess the informativeness of the learning set and the fidelity of the proxy to the exact model. The possibility of obtaining a prediction of the exact response for any newly generated realization suggests that the methodology can be effectively used beyond the context of uncertainty quantification, in particular for Bayesian inference and optimization.
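The FPCA-based error model can be sketched in a few lines of linear algebra. The code below is a simplified stand-in (plain SVD scores and a least-squares map, rather than the full functional machinery): it learns the proxy-to-exact relationship on a learning set of curves and then predicts the exact response from a new proxy response alone.

```python
import numpy as np

def fit_error_model(proxy, exact, n_comp=3):
    """Learn a map from proxy curves to exact curves (rows = realizations).
    Both curve families are reduced to their leading principal-component
    scores, and a linear model is fitted between the two score sets."""
    mp, me = proxy.mean(axis=0), exact.mean(axis=0)
    Up = np.linalg.svd(proxy - mp, full_matrices=False)[2][:n_comp]
    Ue = np.linalg.svd(exact - me, full_matrices=False)[2][:n_comp]
    scores_p = (proxy - mp) @ Up.T
    scores_e = (exact - me) @ Ue.T
    B = np.linalg.lstsq(scores_p, scores_e, rcond=None)[0]

    def predict(new_proxy):
        """Predict the exact response from the proxy response alone."""
        return ((new_proxy - mp) @ Up.T) @ B @ Ue + me
    return predict
```

Working in the low-dimensional score space is also what makes the diagnostic mentioned above possible: the spread of the score residuals directly measures how informative the learning set is.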
Abstract:
Our consumption of groundwater, in particular as drinking water and for irrigation, has considerably increased over the years, and groundwater is becoming an increasingly scarce and endangered resource. Nowadays we are facing many problems, ranging from water prospection to sustainable management and remediation of polluted aquifers. Independently of the hydrogeological problem, the main challenge remains dealing with the incomplete knowledge of the underground properties. Stochastic approaches have been developed to represent this uncertainty by considering multiple geological scenarios and generating a large number of geostatistical realizations. The main limitation of this approach is the computational cost associated with performing complex flow simulations in each realization. In the first part of the thesis, we explore this issue in the context of uncertainty propagation, where an ensemble of geostatistical realizations is identified as representative of the subsurface uncertainty. To propagate this lack of knowledge to the quantity of interest (e.g., the concentration of pollutant in extracted water), it is necessary to evaluate the flow response of each realization.
Due to computational constraints, state-of-the-art methods make use of approximate flow simulations to identify a subset of realizations that represents the variability of the ensemble. The complex and computationally heavy flow model is then run for this subset, based on which inference is made. Our objective is to increase the performance of this approach by using all of the available information and not solely the subset of exact responses. Two error models are proposed to correct the approximate responses following a machine learning approach. For the subset identified by a classical approach (here the distance kernel method), both the approximate and the exact responses are known. This information is used to construct an error model and correct the ensemble of approximate responses to predict the "expected" responses of the exact model. The proposed methodology makes use of all the available information without perceptible additional computational costs and leads to an increase in accuracy and robustness of the uncertainty propagation. The strategy explored in the first chapter consists in learning from a subset of realizations the relationship between proxy and exact curves. In the second part of this thesis, the strategy is formalized in a rigorous mathematical framework by defining a regression model between functions. As this problem is ill-posed, it is necessary to reduce its dimensionality. The novelty of the work comes from the use of functional principal component analysis (FPCA), which not only performs the dimensionality reduction while maximizing the retained information, but also allows a diagnostic of the quality of the error model in the functional space. The proposed methodology is applied to a pollution problem involving a non-aqueous phase liquid. The error model allows a strong reduction of the computational cost while providing a good estimate of the uncertainty.
The individual correction of the proxy response by the error model leads to an excellent prediction of the exact response, opening the door to many applications. The concept of a functional error model is useful not only in the context of uncertainty propagation, but also, and maybe even more so, to perform Bayesian inference. Markov chain Monte Carlo (MCMC) algorithms are the most common choice to ensure that the generated realizations are sampled in accordance with the observations. However, this approach suffers from a low acceptance rate in high-dimensional problems, resulting in a large number of wasted flow simulations. This led to the introduction of two-stage MCMC, where the computational cost is decreased by avoiding unnecessary simulations of the exact flow model thanks to a preliminary evaluation of the proposal. In the third part of the thesis, a proxy is coupled to an error model to provide an approximate response for the two-stage MCMC set-up. We demonstrate an increase in acceptance rate by a factor of three with respect to one-stage MCMC results. An open question remains: how to choose the size of the learning set and identify the realizations that optimize the construction of the error model. This requires devising an iterative strategy to construct the error model, such that, as new flow simulations are performed, the error model is iteratively improved by incorporating the new information. This is discussed in the fourth part of the thesis, in which we apply this methodology to a problem of saline intrusion in a coastal aquifer.
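The two-stage MCMC idea is easy to state in code. The sketch below is a generic delayed-acceptance Metropolis sampler, assumed rather than taken from the thesis: each proposal is first screened with a cheap posterior (standing in for the proxy plus error model), and the exact posterior is evaluated only for proposals that survive the first stage.

```python
import math
import random

def two_stage_mcmc(log_post_cheap, log_post_exact, step, x0, n_iter, seed=0):
    """Delayed-acceptance Metropolis with a symmetric random-walk proposal.
    Returns the chain and the number of exact-posterior evaluations."""
    rng = random.Random(seed)
    x = x0
    lc, le = log_post_cheap(x), log_post_exact(x)
    chain, exact_calls = [x], 0
    for _ in range(n_iter):
        y = x + rng.gauss(0.0, step)
        lcy = log_post_cheap(y)
        # stage 1: accept/reject against the cheap posterior only
        if math.log(rng.random()) < lcy - lc:
            exact_calls += 1
            ley = log_post_exact(y)
            # stage 2: correct so that the chain targets the exact posterior
            if math.log(rng.random()) < (ley - le) - (lcy - lc):
                x, lc, le = y, lcy, ley
        chain.append(x)
    return chain, exact_calls
```

Because most poor proposals are already rejected in stage 1, the expensive model is run far less than once per iteration, which is where the gain over one-stage MCMC comes from.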
Abstract:
This thesis focused on statistical analysis methods and proposes the use of Bayesian inference to extract the information contained in experimental data by estimating the parameters of an Ebola model. The model is a system of differential equations expressing the behaviour and dynamics of Ebola. Two data sets (onset and death data) were both used to estimate the parameters, which had not been done in previous work (Chowell, 2004). To be able to use both data sets, a new version of the model was built. Model parameters were estimated and then used to calculate the basic reproduction number and to study the disease-free equilibrium. The parameter estimates were useful for determining how well the model fits the data and how good the estimates were, in terms of the information they provided about the possible relationship between variables. The solution showed that the Ebola model fits the observed onset data at 98.95% and the observed death data at 93.6%. Since Bayesian inference cannot be performed analytically, the Markov chain Monte Carlo approach was used to generate samples from the posterior distribution over the parameters. These samples were used to check the accuracy of the model and other characteristics of the target posteriors.
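The overall workflow (an ODE epidemic model plus MCMC over its parameters) can be sketched end to end. The snippet below uses a toy Euler-integrated SIR model and a random-walk Metropolis sampler as an illustrative stand-in; it is not the thesis's Ebola model, and all parameter values are hypothetical.

```python
import math
import random

def sir_incidence(beta, gamma, s0=0.99, i0=0.01, steps=60, dt=1.0):
    """Daily new cases from an Euler-integrated SIR model (a toy stand-in
    for the system of differential equations describing Ebola)."""
    s, i, out = s0, i0, []
    for _ in range(steps):
        new = beta * s * i * dt
        s -= new
        i += new - gamma * i * dt
        out.append(new)
    return out

def metropolis(data, sigma=0.002, n_iter=4000, start=(0.5, 0.5), seed=1):
    """Random-walk Metropolis over (beta, gamma) with flat priors on (0, 2)
    and a Gaussian observation model with standard deviation sigma."""
    rng = random.Random(seed)

    def log_post(b, g):
        if not (0.0 < b < 2.0 and 0.0 < g < 2.0):
            return -math.inf           # outside the prior support
        model = sir_incidence(b, g)
        return -sum((m - d) ** 2 for m, d in zip(model, data)) / (2 * sigma ** 2)

    (b, g), draws = start, []
    lp = log_post(b, g)
    for _ in range(n_iter):
        bp, gp = b + rng.gauss(0.0, 0.02), g + rng.gauss(0.0, 0.02)
        lpp = log_post(bp, gp)
        if math.log(rng.random()) < lpp - lp:
            b, g, lp = bp, gp, lpp
        draws.append((b, g))
    return draws
```

In this toy the basic reproduction number is simply R0 = beta/gamma, so posterior draws of (beta, gamma) translate directly into a posterior for R0, mirroring the thesis's use of the estimated parameters.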