9 results for SAMPLERS
in Queensland University of Technology - ePrints Archive
Abstract:
This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from pursuit of these data analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of zeroes recorded. These may represent zero response given some threshold (presence) or that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses, whilst taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts and the dingo, cypress and toad case studies described in the motivation chapter are examples of this. Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled by using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic MLE estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov random field (MRF) for the underlying spatial process, and suitable priors for parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers are suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy.
This requires evaluation of a normalization constant, a notoriously difficult problem. Difficulty with estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (NCs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea present though not fully developed in the literature, and propose the Integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter autologistic model. This work introduces background computations required for the full implementation of the four-tier model in Chapter 7. Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extension allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer. A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time. Note: The author of this thesis has agreed to make it open access but invites people downloading the thesis to send her an email via the 'Contact Author' function.
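The path sampling idea behind log NC ratios can be illustrated with a minimal sketch. It assumes a one-parameter binary MRF with density proportional to exp(beta * T(x)), where T(x) counts agreeing neighbour pairs; this is a simplification for illustration, not the three-parameter autologistic model or the thesis's IMCS implementation. For such a model the derivative of log Z(beta) equals the mean of T(x) under beta, so integrating that mean over beta gives the log NC ratio. The lattice here is small enough to enumerate, so the mean is computed exactly; in practice it would be estimated by MCMC. All function names are hypothetical.

```python
import itertools
import math


def neighbour_pairs(rows, cols):
    """Horizontal and vertical nearest-neighbour pairs on a rows x cols lattice."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                pairs.append((i, i + 1))
            if r + 1 < rows:
                pairs.append((i, i + cols))
    return pairs


def canonical_statistic(x, pairs):
    """T(x): number of neighbouring pairs that share the same value."""
    return sum(1 for i, j in pairs if x[i] == x[j])


def exact_summaries(beta, rows, cols):
    """Return (log Z(beta), E_beta[T]) by enumerating every lattice state."""
    pairs = neighbour_pairs(rows, cols)
    z = 0.0
    weighted_t = 0.0
    for x in itertools.product((0, 1), repeat=rows * cols):
        t = canonical_statistic(x, pairs)
        w = math.exp(beta * t)
        z += w
        weighted_t += t * w
    return math.log(z), weighted_t / z


def path_sampling_log_ratio(beta0, beta1, rows, cols, n_grid=21):
    """Trapezoidal integral of E_beta[T] from beta0 to beta1."""
    step = (beta1 - beta0) / (n_grid - 1)
    means = [exact_summaries(beta0 + k * step, rows, cols)[1] for k in range(n_grid)]
    return step * (sum(means) - 0.5 * (means[0] + means[-1]))


rows, cols = 3, 3
estimate = path_sampling_log_ratio(0.0, 0.8, rows, cols)
exact = exact_summaries(0.8, rows, cols)[0] - exact_summaries(0.0, rows, cols)[0]
print(f"path sampling: {estimate:.4f}  exact: {exact:.4f}")
```

Comparing the integral with the enumerated value checks the identity on a case where the truth is known; on realistic lattices only the integral route is feasible.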
Abstract:
As part of a larger indoor environmental study, residential indoor and outdoor levels of nitrogen dioxide (NO2) were measured for 14 houses in a suburb of Brisbane, Queensland, Australia. Passive samplers were used for 48-h sampling periods during the winter of 1999. The average indoor and outdoor NO2 levels were 13.8 ± 6.3 and 16.7 ± 4.2 ppb, respectively. The indoor/outdoor NO2 concentration ratio ranged from 0.4 to 2.3, with a median value of 0.82. The results of statistical analyses indicated that there was no significant correlation between indoor and outdoor NO2 concentrations, or between indoor and fixed site NO2 monitoring station concentrations. However, there was a significant correlation between outdoor and fixed site NO2 monitoring station concentrations. There was also a significant correlation between indoor NO2 concentration and indoor submicrometre (0.007–0.808 μm) aerosol particle number concentrations. The results of this study indicated that indoor NO2 levels are significantly affected by indoor NO2 sources, such as a gas stove and cigarette smoking. This implies that the outdoor or fixed site monitoring concentration alone is a poor predictor of indoor NO2 concentration.
Abstract:
Markov chain Monte Carlo (MCMC) estimation provides a solution to the complex integration problems that are faced in the Bayesian analysis of statistical problems. The implementation of MCMC algorithms is, however, code intensive and time consuming. We have developed a Python package, called PyMCMC, that aids in the construction of MCMC samplers, helps to substantially reduce the likelihood of coding error, and minimises repetitive code. PyMCMC contains classes for Gibbs, Metropolis-Hastings, independent Metropolis-Hastings, random walk Metropolis-Hastings, orientational bias Monte Carlo and slice samplers, as well as specific modules for common models, such as a module for Bayesian regression analysis. PyMCMC is straightforward to optimise, taking advantage of the Python libraries NumPy and SciPy, as well as being readily extensible with C or Fortran.
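To show the kind of sampler such a package wraps, here is a minimal random walk Metropolis-Hastings sketch in plain Python. It deliberately does not use the PyMCMC API, which is not described in the abstract; the function names, the standard normal target and the step size are all assumptions chosen for illustration.

```python
import math
import random


def random_walk_metropolis(log_target, start, n_iter, step_size, rng):
    """Sample a 1-D density known up to a constant, using Gaussian proposals."""
    x = start
    log_p = log_target(x)
    samples = []
    accepted = 0
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_target(proposal)
        # The proposal is symmetric, so the acceptance probability is
        # min(1, p_new / p_old); 1 - random() lies in (0, 1], keeping log finite.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_iter


def log_standard_normal(x):
    """Log density of N(0, 1) up to an additive constant."""
    return -0.5 * x * x


rng = random.Random(0)
samples, acceptance_rate = random_walk_metropolis(
    log_standard_normal, start=0.0, n_iter=20000, step_size=2.4, rng=rng
)
kept = samples[2000:]  # discard an initial burn-in period
mean = sum(kept) / len(kept)
variance = sum((s - mean) ** 2 for s in kept) / len(kept)
print(f"acceptance rate {acceptance_rate:.2f}, mean {mean:.3f}, variance {variance:.3f}")
```

Working on the log scale avoids numerical underflow, which is one of the repetitive details a sampler library can handle once rather than in every model.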
Abstract:
Passive air samplers (PAS) consisting of polyurethane foam (PUF) disks were deployed at 6 outdoor air monitoring stations in different land use categories (commercial, industrial, residential and semi-rural) to assess the spatial distribution of polybrominated diphenyl ethers (PBDEs) in the Brisbane airshed. Air monitoring sites covered an area of 1143 km² and PAS were allowed to accumulate PBDEs in the city's airshed over three consecutive seasons commencing in the winter of 2008. Average levels of the sum of five (∑5) PBDEs (BDEs 28, 47, 99, 100 and 209) were highest at the commercial and industrial sites (12.7 ± 5.2 ng PUF−1), which were relatively close to the city centre, and were a factor of 8 higher than at residential and semi-rural sites located in outer Brisbane. To estimate the magnitude of the urban ‘plume’, an empirical exponential decay model was used to fit PAS data vs. distance from the CBD, with the best correlation observed when the particulate-bound BDE-209 was not included (∑5-209) (r² = 0.99), rather than ∑5 (r² = 0.84). At 95% confidence intervals the model predicts that, regardless of site characterization, ∑5-209 concentrations in a PAS sample taken between 4–10 km from the city centre would be half that from a sample taken from the city centre and would reach a baseline or plateau (0.6 to 1.3 ng PUF−1) approximately 30 km from the CBD. The observed exponential decay in ∑5-209 levels over distance corresponded with Brisbane's decreasing population density (persons/km²) from the city centre. The residual error associated with the model increased significantly when including BDE-209 levels, primarily due to the highest level (11.4 ± 1.8 ng PUF−1) being consistently detected at the industrial site, indicating a potential primary source at this site. Active air samples collected alongside the PAS at the industrial air monitoring site (B) indicated that BDE-209 dominated the congener composition and was entirely associated with the particulate phase.
This study demonstrates that PAS are effective tools for monitoring citywide regional differences; however, interpretation of spatial trends for POPs which are predominantly associated with the particulate phase, such as BDE-209, may be restricted to identifying ‘hotspots’ rather than broad spatial trends.
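How an exponential decay fit yields a halving distance can be sketched as follows. The abstract does not give the model's functional form, so this assumes the simplest version, C(d) = C0 * exp(-k * d), fitted by least squares on the log scale, and it ignores the baseline plateau. The distances and concentrations are hypothetical values generated inside the code, not the study's measurements, and the function names are illustrative.

```python
import math


def fit_exponential_decay(distances_km, concentrations):
    """Least-squares fit of ln C = ln C0 - k * d; returns (C0, k)."""
    n = len(distances_km)
    logs = [math.log(c) for c in concentrations]
    mean_d = sum(distances_km) / n
    mean_l = sum(logs) / n
    sxx = sum((d - mean_d) ** 2 for d in distances_km)
    sxy = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances_km, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_d
    return math.exp(intercept), -slope


def halving_distance(k):
    """Distance over which the fitted concentration falls by half."""
    return math.log(2.0) / k


# Hypothetical, noise-free illustration data (NOT the study's measurements)
true_c0, true_k = 10.0, 0.1
distances = [0.0, 5.0, 10.0, 15.0, 20.0, 30.0]
concentrations = [true_c0 * math.exp(-true_k * d) for d in distances]

c0, k = fit_exponential_decay(distances, concentrations)
print(f"C0 = {c0:.2f}, k = {k:.3f} per km, halving distance = {halving_distance(k):.2f} km")
```

With real, noisy data the fitted k carries uncertainty, which is what turns a single halving distance into a range such as the 4–10 km reported above.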
Abstract:
A nation-wide passive air sampling campaign recorded concentrations of persistent organic pollutants in Australia's atmosphere in 2012. XAD-based passive air samplers were deployed for one year at 15 sampling sites located in remote/background, agricultural, semi-urban and urban areas across the continent. Concentrations of 47 polychlorinated biphenyls ranged from 0.73 to 72 pg m−3 (median of 8.9 pg m−3) and were consistently higher at urban sites. The toxic equivalent concentration for the sum of 12 dioxin-like PCBs was low, ranging from below detection limits to 0.24 fg m−3 (median of 0.0086 fg m−3). Overall, the levels of polychlorinated biphenyls in Australia were among the lowest reported globally to date. Among the organochlorine pesticides, hexachlorobenzene had the highest (median of 41 pg m−3) and most uniform concentration (with a ratio between highest and lowest value of ~5). Bushfires may be responsible for atmospheric hexachlorobenzene levels in Australia that exceeded Southern Hemispheric baseline levels by a factor of ~4. Organochlorine pesticide concentrations generally increased from remote/background and agricultural sites to urban sites, except for high concentrations of α-endosulfan and DDTs at specific agricultural sites. Concentrations of heptachlor (0.47-210 pg m−3), dieldrin (ND-160 pg m−3) and trans- and cis-chlordanes (0.83-180 pg m−3, sum of) in Australian air were among the highest reported globally to date, whereas those of DDT and its metabolites (ND-160 pg m−3, sum of), α-, β-, γ- and δ-hexachlorocyclohexane (ND-6.7 pg m−3, sum of) and α-endosulfan (ND-27 pg m−3) were among the lowest.
Abstract:
Phosphorus has a number of indispensable biochemical roles, but its natural deposition and the low solubility of phosphates, as well as their rapid transformation to insoluble forms, commonly make the element the growth-limiting nutrient, particularly in aquatic ecosystems. Famously, phosphorus that reaches water bodies is commonly the main cause of eutrophication. This undesirable process can severely affect aquatic biota worldwide. More management practices are being proposed, but long-term monitoring of phosphorus levels is necessary to ensure that eutrophication does not occur. Passive sampling techniques, which have been developed over the last decades, could provide several advantages over conventional sampling methods, including simpler sampling devices, more cost-effective sampling campaigns, and the provision of flow-proportional loads as well as representative average concentrations of phosphorus in the environment. Although some types of passive samplers are commercially available, their uses are still scarcely reported in the literature. In Japan, there is limited application of passive sampling techniques to monitor phosphorus, even in the field of agricultural environment. This paper aims to introduce the relatively new P-sampling techniques and their potential for use in environmental monitoring studies.
Abstract:
As there are a myriad of organic micropollutants that can affect the well-being of humans and other organisms in the environment, the need for an effective monitoring tool is evident. Passive sampling techniques, which have been developed over the last decades, could provide several advantages over conventional sampling methods, including simpler sampling devices, more cost-effective sampling campaigns, and the provision of time-integrated loads as well as representative average concentrations of pollutants in the environment. Those techniques have been applied to monitor many pollutants caused by agricultural activities, e.g. residues of pesticides and veterinary drugs. Several types of passive samplers are commercially available and their uses are widely accepted. However, not many applications of those techniques have been found in Japan, especially in the field of agricultural environment. This paper aims to introduce the field of passive sampling and then to describe some applications of passive sampling techniques in environmental monitoring studies related to the agriculture industry.
Abstract:
This paper proposes solutions to three issues pertaining to the estimation of finite mixture models with an unknown number of components: the non-identifiability induced by overfitting the number of components, the mixing limitations of standard Markov chain Monte Carlo (MCMC) sampling techniques, and the related label switching problem. An overfitting approach is used to estimate the number of components in a finite mixture model via a Zmix algorithm. Zmix provides a bridge between multidimensional samplers and test-based estimation methods, whereby priors are chosen to encourage extra groups to have weights approaching zero. MCMC sampling is made possible by the implementation of prior parallel tempering, an extension of parallel tempering. Zmix can accurately estimate the number of components, posterior parameter estimates and allocation probabilities given a sufficiently large sample size. The results will reflect uncertainty in the final model and will report the range of possible candidate models and their respective estimated probabilities from a single run. Label switching is resolved with a computationally lightweight method, Zswitch, developed for overfitted mixtures by exploiting the intuitiveness of allocation-based relabelling algorithms and the precision of label-invariant loss functions. Four simulation studies are included to illustrate Zmix and Zswitch, as well as three case studies from the literature. All methods are available as part of the R package Zmix, which can currently be applied to univariate Gaussian mixture models.
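The mixing problem that tempering addresses can be illustrated with a minimal sketch of ordinary parallel tempering. This is the standard technique, not the prior parallel tempering extension used by Zmix, and it does not use the Zmix R package. The bimodal target, the temperature ladder, the step size and all names are assumptions chosen for illustration.

```python
import math
import random


def log_target(x):
    """Bimodal density: equal mixture of N(-4, 1) and N(4, 1), up to a constant."""
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def parallel_tempering(n_iter, temperatures, step_size, rng):
    """Run one chain per temperature and return the samples of the first chain."""
    states = [0.0 for _ in temperatures]
    cold_samples = []
    for _ in range(n_iter):
        # Random walk Metropolis update of each chain on its tempered target p(x)^(1/T)
        for i, temp in enumerate(temperatures):
            proposal = states[i] + rng.gauss(0.0, step_size)
            log_ratio = (log_target(proposal) - log_target(states[i])) / temp
            if math.log(1.0 - rng.random()) < log_ratio:
                states[i] = proposal
        # Propose exchanging the states of one randomly chosen adjacent pair
        i = rng.randrange(len(temperatures) - 1)
        inv_diff = 1.0 / temperatures[i] - 1.0 / temperatures[i + 1]
        log_swap = inv_diff * (log_target(states[i + 1]) - log_target(states[i]))
        if math.log(1.0 - rng.random()) < log_swap:
            states[i], states[i + 1] = states[i + 1], states[i]
        cold_samples.append(states[0])
    return cold_samples


rng = random.Random(1)
cold = parallel_tempering(n_iter=20000, temperatures=[1.0, 3.0, 9.0, 27.0], step_size=1.0, rng=rng)
fraction_right = sum(1 for x in cold if x > 0.0) / len(cold)
print(f"fraction of cold-chain samples in the right-hand mode: {fraction_right:.2f}")
```

The hot chains cross between modes easily and pass those states down the ladder through swaps, so the chain at temperature 1 visits both modes far more often than a single random walk sampler would.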