981 resultados para Chain Monte-carlo
Resumo:
INTRODUCTION: Malaria is a serious problem in the Brazilian Amazon region, and the detection of possible risk factors could be of great interest for public health authorities. The objective of this article was to investigate the association between environmental variables and the yearly registers of malaria in the Amazon region using Bayesian spatiotemporal methods. METHODS: We used Poisson spatiotemporal regression models to analyze the Brazilian Amazon forest malaria count for the period from 1999 to 2008. In this study, we included some covariates that could be important in the yearly prediction of malaria, such as deforestation rate. We obtained the inferences using a Bayesian approach and Markov Chain Monte Carlo (MCMC) methods to simulate samples for the joint posterior distribution of interest. The discrimination of different models was also discussed. RESULTS: The model proposed here suggests that deforestation rate, the number of inhabitants per km², and the human development index (HDI) are important in the prediction of malaria cases. CONCLUSIONS: It is possible to conclude that human development, population growth, deforestation, and their associated ecological alterations are conducive to increasing malaria risk. We conclude that the use of Poisson regression models that capture the spatial and temporal effects under the Bayesian paradigm is a good strategy for modeling malaria counts.
Resumo:
There are both theoretical and empirical reasons for believing that the parameters of macroeconomic models may vary over time. However, work with time-varying parameter models has largely involved Vector autoregressions (VARs), ignoring cointegration. This is despite the fact that cointegration plays an important role in informing macroeconomists on a range of issues. In this paper we develop time varying parameter models which permit cointegration. Time-varying parameter VARs (TVP-VARs) typically use state space representations to model the evolution of parameters. In this paper, we show that it is not sensible to use straightforward extensions of TVP-VARs when allowing for cointegration. Instead we develop a specification which allows for the cointegrating space to evolve over time in a manner comparable to the random walk variation used with TVP-VARs. The properties of our approach are investigated before developing a method of posterior simulation. We use our methods in an empirical investigation involving a permanent/transitory variance decomposition for inflation.
Resumo:
This paper contributes to the on-going empirical debate regarding the role of the RBC model and in particular of technology shocks in explaining aggregate fluctuations. To this end we estimate the model’s posterior density using Markov-Chain Monte-Carlo (MCMC) methods. Within this framework we extend Ireland’s (2001, 2004) hybrid estimation approach to allow for a vector autoregressive moving average (VARMA) process to describe the movements and co-movements of the model’s errors not explained by the basic RBC model. The results of marginal likelihood ratio tests reveal that the more general model of the errors significantly improves the model’s fit relative to the VAR and AR alternatives. Moreover, despite setting the RBC model a more difficult task under the VARMA specification, our analysis, based on forecast error and spectral decompositions, suggests that the RBC model is still capable of explaining a significant fraction of the observed variation in macroeconomic aggregates in the post-war U.S. economy.
Resumo:
This paper develops methods for Stochastic Search Variable Selection (currently popular with regression and Vector Autoregressive models) for Vector Error Correction models where there are many possible restrictions on the cointegration space. We show how this allows the researcher to begin with a single unrestricted model and either do model selection or model averaging in an automatic and computationally efficient manner. We apply our methods to a large UK macroeconomic model.
Resumo:
This paper considers the instrumental variable regression model when there is uncertainty about the set of instruments, exogeneity restrictions, the validity of identifying restrictions and the set of exogenous regressors. This uncertainty can result in a huge number of models. To avoid statistical problems associated with standard model selection procedures, we develop a reversible jump Markov chain Monte Carlo algorithm that allows us to do Bayesian model averaging. The algorithm is very exible and can be easily adapted to analyze any of the di¤erent priors that have been proposed in the Bayesian instrumental variables literature. We show how to calculate the probability of any relevant restriction (e.g. the posterior probability that over-identifying restrictions hold) and discuss diagnostic checking using the posterior distribution of discrepancy vectors. We illustrate our methods in a returns-to-schooling application.
Resumo:
Vector Autoregressive Moving Average (VARMA) models have many theoretical properties which should make them popular among empirical macroeconomists. However, they are rarely used in practice due to over-parameterization concerns, difficulties in ensuring identification and computational challenges. With the growing interest in multivariate time series models of high dimension, these problems with VARMAs become even more acute, accounting for the dominance of VARs in this field. In this paper, we develop a Bayesian approach for inference in VARMAs which surmounts these problems. It jointly ensures identification and parsimony in the context of an efficient Markov chain Monte Carlo (MCMC) algorithm. We use this approach in a macroeconomic application involving up to twelve dependent variables. We find our algorithm to work successfully and provide insights beyond those provided by VARs.
Resumo:
Time-lapse crosshole ground-penetrating radar (GPR) data, collected while infiltration occurs, can provide valuable information regarding the hydraulic properties of the unsaturated zone. In particular, the stochastic inversion of such data provides estimates of parameter uncertainties, which are necessary for hydrological prediction and decision making. Here, we investigate the effect of different infiltration conditions on the stochastic inversion of time-lapse, zero-offset-profile, GPR data. Inversions are performed using a Bayesian Markov-chain-Monte-Carlo methodology. Our results clearly indicate that considering data collected during a forced infiltration test helps to better refine soil hydraulic properties compared to data collected under natural infiltration conditions
Resumo:
Des progrès significatifs ont été réalisés dans le domaine de l'intégration quantitative des données géophysique et hydrologique l'échelle locale. Cependant, l'extension à de plus grandes échelles des approches correspondantes constitue encore un défi majeur. Il est néanmoins extrêmement important de relever ce défi pour développer des modèles fiables de flux des eaux souterraines et de transport de contaminant. Pour résoudre ce problème, j'ai développé une technique d'intégration des données hydrogéophysiques basée sur une procédure bayésienne de simulation séquentielle en deux étapes. Cette procédure vise des problèmes à plus grande échelle. L'objectif est de simuler la distribution d'un paramètre hydraulique cible à partir, d'une part, de mesures d'un paramètre géophysique pertinent qui couvrent l'espace de manière exhaustive, mais avec une faible résolution (spatiale) et, d'autre part, de mesures locales de très haute résolution des mêmes paramètres géophysique et hydraulique. Pour cela, mon algorithme lie dans un premier temps les données géophysiques de faible et de haute résolution à travers une procédure de réduction déchelle. Les données géophysiques régionales réduites sont ensuite reliées au champ du paramètre hydraulique à haute résolution. J'illustre d'abord l'application de cette nouvelle approche dintégration des données à une base de données synthétiques réaliste. Celle-ci est constituée de mesures de conductivité hydraulique et électrique de haute résolution réalisées dans les mêmes forages ainsi que destimations des conductivités électriques obtenues à partir de mesures de tomographic de résistivité électrique (ERT) sur l'ensemble de l'espace. Ces dernières mesures ont une faible résolution spatiale. La viabilité globale de cette méthode est testée en effectuant les simulations de flux et de transport au travers du modèle original du champ de conductivité hydraulique ainsi que du modèle simulé. Les simulations sont alors comparées. Les résultats obtenus indiquent que la procédure dintégration des données proposée permet d'obtenir des estimations de la conductivité en adéquation avec la structure à grande échelle ainsi que des predictions fiables des caractéristiques de transports sur des distances de moyenne à grande échelle. Les résultats correspondant au scénario de terrain indiquent que l'approche d'intégration des données nouvellement mise au point est capable d'appréhender correctement les hétérogénéitées à petite échelle aussi bien que les tendances à gande échelle du champ hydraulique prévalent. Les résultats montrent également une flexibilté remarquable et une robustesse de cette nouvelle approche dintégration des données. De ce fait, elle est susceptible d'être appliquée à un large éventail de données géophysiques et hydrologiques, à toutes les gammes déchelles. Dans la deuxième partie de ma thèse, j'évalue en détail la viabilité du réechantillonnage geostatique séquentiel comme mécanisme de proposition pour les méthodes Markov Chain Monte Carlo (MCMC) appliquées à des probmes inverses géophysiques et hydrologiques de grande dimension . L'objectif est de permettre une quantification plus précise et plus réaliste des incertitudes associées aux modèles obtenus. En considérant une série dexemples de tomographic radar puits à puits, j'étudie deux classes de stratégies de rééchantillonnage spatial en considérant leur habilité à générer efficacement et précisément des réalisations de la distribution postérieure bayésienne. Les résultats obtenus montrent que, malgré sa popularité, le réechantillonnage séquentiel est plutôt inefficace à générer des échantillons postérieurs indépendants pour des études de cas synthétiques réalistes, notamment pour le cas assez communs et importants où il existe de fortes corrélations spatiales entre le modèle et les paramètres. Pour résoudre ce problème, j'ai développé un nouvelle approche de perturbation basée sur une déformation progressive. Cette approche est flexible en ce qui concerne le nombre de paramètres du modèle et lintensité de la perturbation. Par rapport au rééchantillonage séquentiel, cette nouvelle approche s'avère être très efficace pour diminuer le nombre requis d'itérations pour générer des échantillons indépendants à partir de la distribution postérieure bayésienne. - Significant progress has been made with regard to the quantitative integration of geophysical and hydrological data at the local scale. However, extending corresponding approaches beyond the local scale still represents a major challenge, yet is critically important for the development of reliable groundwater flow and contaminant transport models. To address this issue, I have developed a hydrogeophysical data integration technique based on a two-step Bayesian sequential simulation procedure that is specifically targeted towards larger-scale problems. The objective is to simulate the distribution of a target hydraulic parameter based on spatially exhaustive, but poorly resolved, measurements of a pertinent geophysical parameter and locally highly resolved, but spatially sparse, measurements of the considered geophysical and hydraulic parameters. To this end, my algorithm links the low- and high-resolution geophysical data via a downscaling procedure before relating the downscaled regional-scale geophysical data to the high-resolution hydraulic parameter field. I first illustrate the application of this novel data integration approach to a realistic synthetic database consisting of collocated high-resolution borehole measurements of the hydraulic and electrical conductivities and spatially exhaustive, low-resolution electrical conductivity estimates obtained from electrical resistivity tomography (ERT). The overall viability of this method is tested and verified by performing and comparing flow and transport simulations through the original and simulated hydraulic conductivity fields. The corresponding results indicate that the proposed data integration procedure does indeed allow for obtaining faithful estimates of the larger-scale hydraulic conductivity structure and reliable predictions of the transport characteristics over medium- to regional-scale distances. The approach is then applied to a corresponding field scenario consisting of collocated high- resolution measurements of the electrical conductivity, as measured using a cone penetrometer testing (CPT) system, and the hydraulic conductivity, as estimated from electromagnetic flowmeter and slug test measurements, in combination with spatially exhaustive low-resolution electrical conductivity estimates obtained from surface-based electrical resistivity tomography (ERT). The corresponding results indicate that the newly developed data integration approach is indeed capable of adequately capturing both the small-scale heterogeneity as well as the larger-scale trend of the prevailing hydraulic conductivity field. The results also indicate that this novel data integration approach is remarkably flexible and robust and hence can be expected to be applicable to a wide range of geophysical and hydrological data at all scale ranges. In the second part of my thesis, I evaluate in detail the viability of sequential geostatistical resampling as a proposal mechanism for Markov Chain Monte Carlo (MCMC) methods applied to high-dimensional geophysical and hydrological inverse problems in order to allow for a more accurate and realistic quantification of the uncertainty associated with the thus inferred models. Focusing on a series of pertinent crosshole georadar tomographic examples, I investigated two classes of geostatistical resampling strategies with regard to their ability to efficiently and accurately generate independent realizations from the Bayesian posterior distribution. The corresponding results indicate that, despite its popularity, sequential resampling is rather inefficient at drawing independent posterior samples for realistic synthetic case studies, notably for the practically common and important scenario of pronounced spatial correlation between model parameters. To address this issue, I have developed a new gradual-deformation-based perturbation approach, which is flexible with regard to the number of model parameters as well as the perturbation strength. Compared to sequential resampling, this newly proposed approach was proven to be highly effective in decreasing the number of iterations required for drawing independent samples from the Bayesian posterior distribution.
Resumo:
Over the past decade, significant interest has been expressed in relating the spatial statistics of surface-based reflection ground-penetrating radar (GPR) data to those of the imaged subsurface volume. A primary motivation for this work is that changes in the radar wave velocity, which largely control the character of the observed data, are expected to be related to corresponding changes in subsurface water content. Although previous work has indeed indicated that the spatial statistics of GPR images are linked to those of the water content distribution of the probed region, a viable method for quantitatively analyzing the GPR data and solving the corresponding inverse problem has not yet been presented. Here we address this issue by first deriving a relationship between the 2-D autocorrelation of a water content distribution and that of the corresponding GPR reflection image. We then show how a Bayesian inversion strategy based on Markov chain Monte Carlo sampling can be used to estimate the posterior distribution of subsurface correlation model parameters that are consistent with the GPR data. Our results indicate that if the underlying assumptions are valid and we possess adequate prior knowledge regarding the water content distribution, in particular its vertical variability, this methodology allows not only for the reliable recovery of lateral correlation model parameters but also for estimates of parameter uncertainties. In the case where prior knowledge regarding the vertical variability of water content is not available, the results show that the methodology still reliably recovers the aspect ratio of the heterogeneity.
Resumo:
Time-lapse geophysical data acquired during transient hydrological experiments are being increasingly employed to estimate subsurface hydraulic properties at the field scale. In particular, crosshole ground-penetrating radar (GPR) data, collected while water infiltrates into the subsurface either by natural or artificial means, have been demonstrated in a number of studies to contain valuable information concerning the hydraulic properties of the unsaturated zone. Previous work in this domain has considered a variety of infiltration conditions and different amounts of time-lapse GPR data in the estimation procedure. However, the particular benefits and drawbacks of these different strategies as well as the impact of a variety of key and common assumptions remain unclear. Using a Bayesian Markov-chain-Monte-Carlo stochastic inversion methodology, we examine in this paper the information content of time-lapse zero-offset-profile (ZOP) GPR traveltime data, collected under three different infiltration conditions, for the estimation of van Genuchten-Mualem (VGM) parameters in a layered subsurface medium. Specifically, we systematically analyze synthetic and field GPR data acquired under natural loading and two rates of forced infiltration, and we consider the value of incorporating different amounts of time-lapse measurements into the estimation procedure. Our results confirm that, for all infiltration scenarios considered, the ZOP GPR traveltime data contain important information about subsurface hydraulic properties as a function of depth, with forced infiltration offering the greatest potential for VGM parameter refinement because of the higher stressing of the hydrological system. Considering greater amounts of time-lapse data in the inversion procedure is also found to help refine VGM parameter estimates. Quite importantly, however, inconsistencies observed in the field results point to the strong possibility that posterior uncertainties are being influenced by model structural errors, which in turn underlines the fundamental importance of a systematic analysis of such errors in future related studies.
Resumo:
This paper proposes a method to conduct inference in panel VAR models with cross unit interdependencies and time variations in the coefficients. The approach can be used to obtain multi-unit forecasts and leading indicators and to conduct policy analysis in a multiunit setups. The framework of analysis is Bayesian and MCMC methods are used to estimate the posterior distribution of the features of interest. The model is reparametrized to resemble an observable index model and specification searches are discussed. As an example, we construct leading indicators for inflation and GDP growth in the Euro area using G-7 information.
Resumo:
We analyze crash data collected by the Iowa Department of Transportation using Bayesian methods. The data set includes monthly crash numbers, estimated monthly traffic volumes, site length and other information collected at 30 paired sites in Iowa over more than 20 years during which an intervention experiment was set up. The intervention consisted in transforming 15 undivided road segments from four-lane to three lanes, while an additional 15 segments, thought to be comparable in terms of traffic safety-related characteristics were not converted. The main objective of this work is to find out whether the intervention reduces the number of crashes and the crash rates at the treated sites. We fitted a hierarchical Poisson regression model with a change-point to the number of monthly crashes per mile at each of the sites. Explanatory variables in the model included estimated monthly traffic volume, time, an indicator for intervention reflecting whether the site was a “treatment” or a “control” site, and various interactions. We accounted for seasonal effects in the number of crashes at a site by including smooth trigonometric functions with three different periods to reflect the four seasons of the year. A change-point at the month and year in which the intervention was completed for treated sites was also included. The number of crashes at a site can be thought to follow a Poisson distribution. To estimate the association between crashes and the explanatory variables, we used a log link function and added a random effect to account for overdispersion and for autocorrelation among observations obtained at the same site. We used proper but non-informative priors for all parameters in the model, and carried out all calculations using Markov chain Monte Carlo methods implemented in WinBUGS. We evaluated the effect of the four to three-lane conversion by comparing the expected number of crashes per year per mile during the years preceding the conversion and following the conversion for treatment and control sites. We estimated this difference using the observed traffic volumes at each site and also on a per 100,000,000 vehicles. We also conducted a prospective analysis to forecast the expected number of crashes per mile at each site in the study one year, three years and five years following the four to three-lane conversion. Posterior predictive distributions of the number of crashes, the crash rate and the percent reduction in crashes per mile were obtained for each site for the months of January and June one, three and five years after completion of the intervention. The model appears to fit the data well. We found that in most sites, the intervention was effective and reduced the number of crashes. Overall, and for the observed traffic volumes, the reduction in the expected number of crashes per year and mile at converted sites was 32.3% (31.4% to 33.5% with 95% probability) while at the control sites, the reduction was estimated to be 7.1% (5.7% to 8.2% with 95% probability). When the reduction in the expected number of crashes per year, mile and 100,000,000 AADT was computed, the estimates were 44.3% (43.9% to 44.6%) and 25.5% (24.6% to 26.0%) for converted and control sites, respectively. In both cases, the difference in the percent reduction in the expected number of crashes during the years following the conversion was significantly larger at converted sites than at control sites, even though the number of crashes appears to decline over time at all sites. Results indicate that the reduction in the expected number of sites per mile has a steeper negative slope at converted than at control sites. Consistent with this, the forecasted reduction in the number of crashes per year and mile during the years after completion of the conversion at converted sites is more pronounced than at control sites. Seasonal effects on the number of crashes have been well-documented. In this dataset, we found that, as expected, the expected number of monthly crashes per mile tends to be higher during winter months than during the rest of the year. Perhaps more interestingly, we found that there is an interaction between the four to three-lane conversion and season; the reduction in the number of crashes appears to be more pronounced during months, when the weather is nice than during other times of the year, even though a reduction was estimated for the entire year. Thus, it appears that the four to three-lane conversion, while effective year-round, is particularly effective in reducing the expected number of crashes in nice weather.
Resumo:
This paper describes a methodology to estimate the coefficients, to test specification hypothesesand to conduct policy exercises in multi-country VAR models with cross unit interdependencies, unit specific dynamics and time variations in the coefficients. The framework of analysis is Bayesian: a prior flexibly reduces the dimensionality of the model and puts structure on the time variations; MCMC methods are used to obtain posterior distributions; and marginal likelihoods to check the fit of various specifications. Impulse responses and conditional forecasts are obtained with the output of MCMC routine. The transmission of certain shocks across countries is analyzed.
Resumo:
L'utilisation efficace des systèmes géothermaux, la séquestration du CO2 pour limiter le changement climatique et la prévention de l'intrusion d'eau salée dans les aquifères costaux ne sont que quelques exemples qui démontrent notre besoin en technologies nouvelles pour suivre l'évolution des processus souterrains à partir de la surface. Un défi majeur est d'assurer la caractérisation et l'optimisation des performances de ces technologies à différentes échelles spatiales et temporelles. Les méthodes électromagnétiques (EM) d'ondes planes sont sensibles à la conductivité électrique du sous-sol et, par conséquent, à la conductivité électrique des fluides saturant la roche, à la présence de fractures connectées, à la température et aux matériaux géologiques. Ces méthodes sont régies par des équations valides sur de larges gammes de fréquences, permettant détudier de manières analogues des processus allant de quelques mètres sous la surface jusqu'à plusieurs kilomètres de profondeur. Néanmoins, ces méthodes sont soumises à une perte de résolution avec la profondeur à cause des propriétés diffusives du champ électromagnétique. Pour cette raison, l'estimation des modèles du sous-sol par ces méthodes doit prendre en compte des informations a priori afin de contraindre les modèles autant que possible et de permettre la quantification des incertitudes de ces modèles de façon appropriée. Dans la présente thèse, je développe des approches permettant la caractérisation statique et dynamique du sous-sol à l'aide d'ondes EM planes. Dans une première partie, je présente une approche déterministe permettant de réaliser des inversions répétées dans le temps (time-lapse) de données d'ondes EM planes en deux dimensions. Cette stratégie est basée sur l'incorporation dans l'algorithme d'informations a priori en fonction des changements du modèle de conductivité électrique attendus. Ceci est réalisé en intégrant une régularisation stochastique et des contraintes flexibles par rapport à la gamme des changements attendus en utilisant les multiplicateurs de Lagrange. J'utilise des normes différentes de la norme l2 pour contraindre la structure du modèle et obtenir des transitions abruptes entre les régions du model qui subissent des changements dans le temps et celles qui n'en subissent pas. Aussi, j'incorpore une stratégie afin d'éliminer les erreurs systématiques de données time-lapse. Ce travail a mis en évidence l'amélioration de la caractérisation des changements temporels par rapport aux approches classiques qui réalisent des inversions indépendantes à chaque pas de temps et comparent les modèles. Dans la seconde partie de cette thèse, j'adopte un formalisme bayésien et je teste la possibilité de quantifier les incertitudes sur les paramètres du modèle dans l'inversion d'ondes EM planes. Pour ce faire, je présente une stratégie d'inversion probabiliste basée sur des pixels à deux dimensions pour des inversions de données d'ondes EM planes et de tomographies de résistivité électrique (ERT) séparées et jointes. Je compare les incertitudes des paramètres du modèle en considérant différents types d'information a priori sur la structure du modèle et différentes fonctions de vraisemblance pour décrire les erreurs sur les données. Les résultats indiquent que la régularisation du modèle est nécessaire lorsqu'on a à faire à un large nombre de paramètres car cela permet d'accélérer la convergence des chaînes et d'obtenir des modèles plus réalistes. Cependent, ces contraintes mènent à des incertitudes d'estimations plus faibles, ce qui implique des distributions a posteriori qui ne contiennent pas le vrai modèledans les régions ou` la méthode présente une sensibilité limitée. Cette situation peut être améliorée en combinant des méthodes d'ondes EM planes avec d'autres méthodes complémentaires telles que l'ERT. De plus, je montre que le poids de régularisation des paramètres et l'écart-type des erreurs sur les données peuvent être retrouvés par une inversion probabiliste. Finalement, j'évalue la possibilité de caractériser une distribution tridimensionnelle d'un panache de traceur salin injecté dans le sous-sol en réalisant une inversion probabiliste time-lapse tridimensionnelle d'ondes EM planes. Etant donné que les inversions probabilistes sont très coûteuses en temps de calcul lorsque l'espace des paramètres présente une grande dimension, je propose une stratégie de réduction du modèle ou` les coefficients de décomposition des moments de Legendre du panache de traceur injecté ainsi que sa position sont estimés. Pour ce faire, un modèle de résistivité de base est nécessaire. Il peut être obtenu avant l'expérience time-lapse. Un test synthétique montre que la méthodologie marche bien quand le modèle de résistivité de base est caractérisé correctement. Cette méthodologie est aussi appliquée à un test de trac¸age par injection d'une solution saline et d'acides réalisé dans un système géothermal en Australie, puis comparée à une inversion time-lapse tridimensionnelle réalisée selon une approche déterministe. L'inversion probabiliste permet de mieux contraindre le panache du traceur salin gr^ace à la grande quantité d'informations a priori incluse dans l'algorithme. Néanmoins, les changements de conductivités nécessaires pour expliquer les changements observés dans les données sont plus grands que ce qu'expliquent notre connaissance actuelle des phénomenès physiques. Ce problème peut être lié à la qualité limitée du modèle de résistivité de base utilisé, indiquant ainsi que des efforts plus grands devront être fournis dans le futur pour obtenir des modèles de base de bonne qualité avant de réaliser des expériences dynamiques. Les études décrites dans cette thèse montrent que les méthodes d'ondes EM planes sont très utiles pour caractériser et suivre les variations temporelles du sous-sol sur de larges échelles. Les présentes approches améliorent l'évaluation des modèles obtenus, autant en termes d'incorporation d'informations a priori, qu'en termes de quantification d'incertitudes a posteriori. De plus, les stratégies développées peuvent être appliquées à d'autres méthodes géophysiques, et offrent une grande flexibilité pour l'incorporation d'informations additionnelles lorsqu'elles sont disponibles. -- The efficient use of geothermal systems, the sequestration of CO2 to mitigate climate change, and the prevention of seawater intrusion in coastal aquifers are only some examples that demonstrate the need for novel technologies to monitor subsurface processes from the surface. A main challenge is to assure optimal performance of such technologies at different temporal and spatial scales. Plane-wave electromagnetic (EM) methods are sensitive to subsurface electrical conductivity and consequently to fluid conductivity, fracture connectivity, temperature, and rock mineralogy. These methods have governing equations that are the same over a large range of frequencies, thus allowing to study in an analogous manner processes on scales ranging from few meters close to the surface down to several hundreds of kilometers depth. Unfortunately, they suffer from a significant resolution loss with depth due to the diffusive nature of the electromagnetic fields. Therefore, estimations of subsurface models that use these methods should incorporate a priori information to better constrain the models, and provide appropriate measures of model uncertainty. During my thesis, I have developed approaches to improve the static and dynamic characterization of the subsurface with plane-wave EM methods. In the first part of this thesis, I present a two-dimensional deterministic approach to perform time-lapse inversion of plane-wave EM data. The strategy is based on the incorporation of prior information into the inversion algorithm regarding the expected temporal changes in electrical conductivity. This is done by incorporating a flexible stochastic regularization and constraints regarding the expected ranges of the changes by using Lagrange multipliers. I use non-l2 norms to penalize the model update in order to obtain sharp transitions between regions that experience temporal changes and regions that do not. I also incorporate a time-lapse differencing strategy to remove systematic errors in the time-lapse inversion. This work presents improvements in the characterization of temporal changes with respect to the classical approach of performing separate inversions and computing differences between the models. In the second part of this thesis, I adopt a Bayesian framework and use Markov chain Monte Carlo (MCMC) simulations to quantify model parameter uncertainty in plane-wave EM inversion. For this purpose, I present a two-dimensional pixel-based probabilistic inversion strategy for separate and joint inversions of plane-wave EM and electrical resistivity tomography (ERT) data. I compare the uncertainties of the model parameters when considering different types of prior information on the model structure and different likelihood functions to describe the data errors. The results indicate that model regularization is necessary when dealing with a large number of model parameters because it helps to accelerate the convergence of the chains and leads to more realistic models. These constraints also lead to smaller uncertainty estimates, which imply posterior distributions that do not include the true underlying model in regions where the method has limited sensitivity. This situation can be improved by combining planewave EM methods with complimentary geophysical methods such as ERT. In addition, I show that an appropriate regularization weight and the standard deviation of the data errors can be retrieved by the MCMC inversion. Finally, I evaluate the possibility of characterizing the three-dimensional distribution of an injected water plume by performing three-dimensional time-lapse MCMC inversion of planewave EM data. Since MCMC inversion involves a significant computational burden in high parameter dimensions, I propose a model reduction strategy where the coefficients of a Legendre moment decomposition of the injected water plume and its location are estimated. For this purpose, a base resistivity model is needed which is obtained prior to the time-lapse experiment. A synthetic test shows that the methodology works well when the base resistivity model is correctly characterized. The methodology is also applied to an injection experiment performed in a geothermal system in Australia, and compared to a three-dimensional time-lapse inversion performed within a deterministic framework. The MCMC inversion better constrains the water plumes due to the larger amount of prior information that is included in the algorithm. The conductivity changes needed to explain the time-lapse data are much larger than what is physically possible based on present day understandings. This issue may be related to the base resistivity model used, therefore indicating that more efforts should be given to obtain high-quality base models prior to dynamic experiments. The studies described herein give clear evidence that plane-wave EM methods are useful to characterize and monitor the subsurface at a wide range of scales. The presented approaches contribute to an improved appraisal of the obtained models, both in terms of the incorporation of prior information in the algorithms and the posterior uncertainty quantification. In addition, the developed strategies can be applied to other geophysical methods, and offer great flexibility to incorporate additional information when available.
Resumo:
We present a Bayesian approach for estimating the relative frequencies of multi-single nucleotide polymorphism (SNP) haplotypes in populations of the malaria parasite Plasmodium falciparum by using microarray SNP data from human blood samples. Each sample comes from a malaria patient and contains one or several parasite clones that may genetically differ. Samples containing multiple parasite clones with different genetic markers pose a special challenge. The situation is comparable with a polyploid organism. The data from each blood sample indicates whether the parasites in the blood carry a mutant or a wildtype allele at various selected genomic positions. If both mutant and wildtype alleles are detected at a given position in a multiply infected sample, the data indicates the presence of both alleles, but the ratio is unknown. Thus, the data only partially reveals which specific combinations of genetic markers (i.e. haplotypes across the examined SNPs) occur in distinct parasite clones. In addition, SNP data may contain errors at non-negligible rates. We use a multinomial mixture model with partially missing observations to represent this data and a Markov chain Monte Carlo method to estimate the haplotype frequencies in a population. Our approach addresses both challenges, multiple infections and data errors.