265 resultados para Jeffreys priors
Resumo:
We present preliminary results about the detection of high redshift (U)LIRGs in the Bullet cluster field by the PACS and SPIRE instruments within the Herschel Lensing Survey (HLS) Program. We describe in detail a photometric procedure designed to recover robust fluxes and deblend faint Herschel sources near the confusion noise. The method is based on the use of the positions of Spitzer/MIPS 24 μm sources as priors. Our catalogs are able to reliably (5σ) recover galaxies with fluxes above 6 and 10 mJy in the PACS 100 and 160 μm channels, respectively, and 12 to 18 mJy in the SPIRE bands. We also obtain spectral energy distributions covering the optical through the far-infrared/millimeter spectral ranges of all the Herschel detected sources, and analyze them to obtain independent estimations of the photometric redshift based on either stellar population or dust emission models. We exemplify the potential of the combined use of Spitzer position priors plus independent optical and IR photometric redshifts to robustly assign optical/NIR counterparts to the sources detected by Herschel and other (sub-)mm instruments.
Resumo:
Mixtures of Zellner's g-priors have been studied extensively in linear models and have been shown to have numerous desirable properties for Bayesian variable selection and model averaging. Several extensions of g-priors to Generalized Linear Models (GLMs) have been proposed in the literature; however, the choice of prior distribution of g and resulting properties for inference have received considerably less attention. In this paper, we extend mixtures of g-priors to GLMs by assigning the truncated Compound Confluent Hypergeometric (tCCH) distribution to 1/(1+g) and illustrate how this prior distribution encompasses several special cases of mixtures of g-priors in the literature, such as the Hyper-g, truncated Gamma, Beta-prime, and the Robust prior. Under an integrated Laplace approximation to the likelihood, the posterior distribution of 1/(1+g) is in turn a tCCH distribution, and approximate marginal likelihoods are thus available analytically. We discuss the local geometric properties of the g-prior in GLMs and show that specific choices of the hyper-parameters satisfy the various desiderata for model selection proposed by Bayarri et al, such as asymptotic model selection consistency, information consistency, intrinsic consistency, and measurement invariance. We also illustrate inference using these priors and contrast them to others in the literature via simulation and real examples.
Resumo:
The second decision of the House of Lords to consider the nature of copyright law. As was the case in Donaldson v. Becket (1774) (uk_1774) the law lords were in disagreement with the majority of common law judges invited to speak to the issue for the consideration of the House. In the course of their opinions, two of the law lords (Lord Brougham and Lord St Leonards) explicitly reject the concept of copyright at common law. Rather than a natural authorial property right, they present copyright as a purely statutory phenomenon specifically grounded in public interest concerns. Ultimately, the Lords decided that a foreign national, resident abroad, but first publishing in Britain, enjoys no protection in his work under British copyright law.
Resumo:
Résumé : L'imagerie par résonance magnétique pondérée en diffusion est une modalité unique sensible aux mouvements microscopiques des molécules d'eau dans les tissus biologiques. Il est possible d'utiliser les caractéristiques de ce mouvement pour inférer la structure macroscopique des faisceaux de la matière blanche du cerveau. La technique, appelée tractographie, est devenue l'outil de choix pour étudier cette structure de façon non invasive. Par exemple, la tractographie est utilisée en planification neurochirurgicale et pour le suivi du développement de maladies neurodégénératives. Dans cette thèse, nous exposons certains des biais introduits lors de reconstructions par tractographie, et des méthodes sont proposées pour les réduire. D'abord, nous utilisons des connaissances anatomiques a priori pour orienter la reconstruction. Ainsi, nous montrons que l'information anatomique sur la nature des tissus permet d'estimer des faisceaux anatomiquement plausibles et de réduire les biais dans l'estimation de structures complexes de la matière blanche. Ensuite, nous utilisons des connaissances microstructurelles a priori dans la reconstruction, afin de permettre à la tractographie de suivre le mouvement des molécules d'eau non seulement le long des faisceaux, mais aussi dans des milieux microstructurels spécifiques. La tractographie peut ainsi distinguer différents faisceaux, réduire les erreurs de reconstruction et permettre l'étude de la microstructure le long de la matière blanche. Somme toute, nous montrons que l'utilisation de connaissances anatomiques et microstructurelles a priori, en tractographie, augmente l'exactitude des reconstructions de la matière blanche du cerveau.
Resumo:
The ontology engineering research community has focused for many years on supporting the creation, development and evolution of ontologies. Ontology forecasting, which aims at predicting semantic changes in an ontology, represents instead a new challenge. In this paper, we want to give a contribution to this novel endeavour by focusing on the task of forecasting semantic concepts in the research domain. Indeed, ontologies representing scientific disciplines contain only research topics that are already popular enough to be selected by human experts or automatic algorithms. They are thus unfit to support tasks which require the ability of describing and exploring the forefront of research, such as trend detection and horizon scanning. We address this issue by introducing the Semantic Innovation Forecast (SIF) model, which predicts new concepts of an ontology at time t + 1, using only data available at time t. Our approach relies on lexical innovation and adoption information extracted from historical data. We evaluated the SIF model on a very large dataset consisting of over one million scientific papers belonging to the Computer Science domain: the outcomes show that the proposed approach offers a competitive boost in mean average precision-at-ten compared to the baselines when forecasting over 5 years.
Resumo:
Background The problem of silent multiple comparisons is one of the most difficult statistical problems faced by scientists. It is a particular problem for investigating a one-off cancer cluster reported to a health department because any one of hundreds, or possibly thousands, of neighbourhoods, schools, or workplaces could have reported a cluster, which could have been for any one of several types of cancer or any one of several time periods. Methods This paper contrasts the frequentist approach with a Bayesian approach for dealing with silent multiple comparisons in the context of a one-off cluster reported to a health department. Two published cluster investigations were re-analysed using the Dunn-Sidak method to adjust frequentist p-values and confidence intervals for silent multiple comparisons. Bayesian methods were based on the Gamma distribution. Results Bayesian analysis with non-informative priors produced results similar to the frequentist analysis, and suggested that both clusters represented a statistical excess. In the frequentist framework, the statistical significance of both clusters was extremely sensitive to the number of silent multiple comparisons, which can only ever be a subjective "guesstimate". The Bayesian approach is also subjective: whether there is an apparent statistical excess depends on the specified prior. Conclusion In cluster investigations, the frequentist approach is just as subjective as the Bayesian approach, but the Bayesian approach is less ambitious in that it treats the analysis as a synthesis of data and personal judgements (possibly poor ones), rather than objective reality. Bayesian analysis is (arguably) a useful tool to support complicated decision-making, because it makes the uncertainty associated with silent multiple comparisons explicit.
Resumo:
1. Species' distribution modelling relies on adequate data sets to build reliable statistical models with high predictive ability. However, the money spent collecting empirical data might be better spent on management. A less expensive source of species' distribution information is expert opinion. This study evaluates expert knowledge and its source. In particular, we determine whether models built on expert knowledge apply over multiple regions or only within the region where the knowledge was derived. 2. The case study focuses on the distribution of the brush-tailed rock-wallaby Petrogale penicillata in eastern Australia. We brought together from two biogeographically different regions substantial and well-designed field data and knowledge from nine experts. We used a novel elicitation tool within a geographical information system to systematically collect expert opinions. The tool utilized an indirect approach to elicitation, asking experts simpler questions about observable rather than abstract quantities, with measures in place to identify uncertainty and offer feedback. Bayesian analysis was used to combine field data and expert knowledge in each region to determine: (i) how expert opinion affected models based on field data and (ii) how similar expert-informed models were within regions and across regions. 3. The elicitation tool effectively captured the experts' opinions and their uncertainties. Experts were comfortable with the map-based elicitation approach used, especially with graphical feedback. Experts tended to predict lower values of species occurrence compared with field data. 4. Across experts, consensus on effect sizes occurred for several habitat variables. Expert opinion generally influenced predictions from field data. However, south-east Queensland and north-east New South Wales experts had different opinions on the influence of elevation and geology, with these differences attributable to geological differences between these regions. 5. Synthesis and applications. When formulated as priors in Bayesian analysis, expert opinion is useful for modifying or strengthening patterns exhibited by empirical data sets that are limited in size or scope. Nevertheless, the ability of an expert to extrapolate beyond their region of knowledge may be poor. Hence there is significant merit in obtaining information from local experts when compiling species' distribution models across several regions.
Resumo:
Numerous expert elicitation methods have been suggested for generalised linear models (GLMs). This paper compares three relatively new approaches to eliciting expert knowledge in a form suitable for Bayesian logistic regression. These methods were trialled on two experts in order to model the habitat suitability of the threatened Australian brush-tailed rock-wallaby (Petrogale penicillata). The first elicitation approach is a geographically assisted indirect predictive method with a geographic information system (GIS) interface. The second approach is a predictive indirect method which uses an interactive graphical tool. The third method uses a questionnaire to elicit expert knowledge directly about the impact of a habitat variable on the response. Two variables (slope and aspect) are used to examine prior and posterior distributions of the three methods. The results indicate that there are some similarities and dissimilarities between the expert informed priors of the two experts formulated from the different approaches. The choice of elicitation method depends on the statistical knowledge of the expert, their mapping skills, time constraints, accessibility to experts and funding available. This trial reveals that expert knowledge can be important when modelling rare event data, such as threatened species, because experts can provide additional information that may not be represented in the dataset. However care must be taken with the way in which this information is elicited and formulated.
Resumo:
The main objective of this PhD was to further develop Bayesian spatio-temporal models (specifically the Conditional Autoregressive (CAR) class of models), for the analysis of sparse disease outcomes such as birth defects. The motivation for the thesis arose from problems encountered when analyzing a large birth defect registry in New South Wales. The specific components and related research objectives of the thesis were developed from gaps in the literature on current formulations of the CAR model, and health service planning requirements. Data from a large probabilistically-linked database from 1990 to 2004, consisting of fields from two separate registries: the Birth Defect Registry (BDR) and Midwives Data Collection (MDC) were used in the analyses in this thesis. The main objective was split into smaller goals. The first goal was to determine how the specification of the neighbourhood weight matrix will affect the smoothing properties of the CAR model, and this is the focus of chapter 6. Secondly, I hoped to evaluate the usefulness of incorporating a zero-inflated Poisson (ZIP) component as well as a shared-component model in terms of modeling a sparse outcome, and this is carried out in chapter 7. The third goal was to identify optimal sampling and sample size schemes designed to select individual level data for a hybrid ecological spatial model, and this is done in chapter 8. Finally, I wanted to put together the earlier improvements to the CAR model, and along with demographic projections, provide forecasts for birth defects at the SLA level. Chapter 9 describes how this is done. For the first objective, I examined a series of neighbourhood weight matrices, and showed how smoothing the relative risk estimates according to similarity by an important covariate (i.e. maternal age) helped improve the model’s ability to recover the underlying risk, as compared to the traditional adjacency (specifically the Queen) method of applying weights. Next, to address the sparseness and excess zeros commonly encountered in the analysis of rare outcomes such as birth defects, I compared a few models, including an extension of the usual Poisson model to encompass excess zeros in the data. This was achieved via a mixture model, which also encompassed the shared component model to improve on the estimation of sparse counts through borrowing strength across a shared component (e.g. latent risk factor/s) with the referent outcome (caesarean section was used in this example). Using the Deviance Information Criteria (DIC), I showed how the proposed model performed better than the usual models, but only when both outcomes shared a strong spatial correlation. The next objective involved identifying the optimal sampling and sample size strategy for incorporating individual-level data with areal covariates in a hybrid study design. I performed extensive simulation studies, evaluating thirteen different sampling schemes along with variations in sample size. This was done in the context of an ecological regression model that incorporated spatial correlation in the outcomes, as well as accommodating both individual and areal measures of covariates. Using the Average Mean Squared Error (AMSE), I showed how a simple random sample of 20% of the SLAs, followed by selecting all cases in the SLAs chosen, along with an equal number of controls, provided the lowest AMSE. The final objective involved combining the improved spatio-temporal CAR model with population (i.e. women) forecasts, to provide 30-year annual estimates of birth defects at the Statistical Local Area (SLA) level in New South Wales, Australia. The projections were illustrated using sixteen different SLAs, representing the various areal measures of socio-economic status and remoteness. A sensitivity analysis of the assumptions used in the projection was also undertaken. By the end of the thesis, I will show how challenges in the spatial analysis of rare diseases such as birth defects can be addressed, by specifically formulating the neighbourhood weight matrix to smooth according to a key covariate (i.e. maternal age), incorporating a ZIP component to model excess zeros in outcomes and borrowing strength from a referent outcome (i.e. caesarean counts). An efficient strategy to sample individual-level data and sample size considerations for rare disease will also be presented. Finally, projections in birth defect categories at the SLA level will be made.
Resumo:
This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from pursuit of these data analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of zeroes recorded. These may represent zero response given some threshold (presence) or that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses, whilst taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts and the dingo, cypress and toad case studies described in the motivation chapter are examples of this. Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled by using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic MLE estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov Random field (MRF) for the underlying spatial process, and suitable priors for parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers is suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy. This requires evaluation of a normalization constant, a notoriously difficult problem. Difficulty with estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (N Cs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea present though not fully developed in the literature, and propose the Integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter Autologistic model. This work introduces background computations required for the full implementation of the four-tier model in Chapter 7. Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extensions allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer. A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time. Note: The author of this thesis has agreed to make it open access but invites people downloading the thesis to send her an email via the 'Contact Author' function.
Resumo:
Expert knowledge is valuable in many modelling endeavours, particularly where data is not extensive or sufficiently robust. In Bayesian statistics, expert opinion may be formulated as informative priors, to provide an honest reflection of the current state of knowledge, before updating this with new information. Technology is increasingly being exploited to help support the process of eliciting such information. This paper reviews the benefits that have been gained from utilizing technology in this way. These benefits can be structured within a six-step elicitation design framework proposed recently (Low Choy et al., 2009). We assume that the purpose of elicitation is to formulate a Bayesian statistical prior, either to provide a standalone expert-defined model, or for updating new data within a Bayesian analysis. We also assume that the model has been pre-specified before selecting the software. In this case, technology has the most to offer to: targeting what experts know (E2), eliciting and encoding expert opinions (E4), whilst enhancing accuracy (E5), and providing an effective and efficient protocol (E6). Benefits include: -providing an environment with familiar nuances (to make the expert comfortable) where experts can explore their knowledge from various perspectives (E2); -automating tedious or repetitive tasks, thereby minimizing calculation errors, as well as encouraging interaction between elicitors and experts (E5); -cognitive gains by educating users, enabling instant feedback (E2, E4-E5), and providing alternative methods of communicating assessments and feedback information, since experts think and learn differently; and -ensuring a repeatable and transparent protocol is used (E6).
Resumo:
Statistical modeling of traffic crashes has been of interest to researchers for decades. Over the most recent decade many crash models have accounted for extra-variation in crash counts—variation over and above that accounted for by the Poisson density. The extra-variation – or dispersion – is theorized to capture unaccounted for variation in crashes across sites. The majority of studies have assumed fixed dispersion parameters in over-dispersed crash models—tantamount to assuming that unaccounted for variation is proportional to the expected crash count. Miaou and Lord [Miaou, S.P., Lord, D., 2003. Modeling traffic crash-flow relationships for intersections: dispersion parameter, functional form, and Bayes versus empirical Bayes methods. Transport. Res. Rec. 1840, 31–40] challenged the fixed dispersion parameter assumption, and examined various dispersion parameter relationships when modeling urban signalized intersection accidents in Toronto. They suggested that further work is needed to determine the appropriateness of the findings for rural as well as other intersection types, to corroborate their findings, and to explore alternative dispersion functions. This study builds upon the work of Miaou and Lord, with exploration of additional dispersion functions, the use of an independent data set, and presents an opportunity to corroborate their findings. Data from Georgia are used in this study. A Bayesian modeling approach with non-informative priors is adopted, using sampling-based estimation via Markov Chain Monte Carlo (MCMC) and the Gibbs sampler. A total of eight model specifications were developed; four of them employed traffic flows as explanatory factors in mean structure while the remainder of them included geometric factors in addition to major and minor road traffic flows. The models were compared and contrasted using the significance of coefficients, standard deviance, chi-square goodness-of-fit, and deviance information criteria (DIC) statistics. The findings indicate that the modeling of the dispersion parameter, which essentially explains the extra-variance structure, depends greatly on how the mean structure is modeled. In the presence of a well-defined mean function, the extra-variance structure generally becomes insignificant, i.e. the variance structure is a simple function of the mean. It appears that extra-variation is a function of covariates when the mean structure (expected crash count) is poorly specified and suffers from omitted variables. In contrast, when sufficient explanatory variables are used to model the mean (expected crash count), extra-Poisson variation is not significantly related to these variables. If these results are generalizable, they suggest that model specification may be improved by testing extra-variation functions for significance. They also suggest that known influences of expected crash counts are likely to be different than factors that might help to explain unaccounted for variation in crashes across sites
Resumo:
This paper describes the formalization and application of a methodology to evaluate the safety benefit of countermeasures in the face of uncertainty. To illustrate the methodology, 18 countermeasures for improving safety of at grade railroad crossings (AGRXs) in the Republic of Korea are considered. Akin to “stated preference” methods in travel survey research, the methodology applies random selection and laws of large numbers to derive accident modification factor (AMF) densities from expert opinions. In a full Bayesian analysis framework, the collective opinions in the form of AMF densities (data likelihood) are combined with prior knowledge (AMF density priors) for the 18 countermeasures to obtain ‘best’ estimates of AMFs (AMF posterior credible intervals). The countermeasures are then compared and recommended based on the largest safety returns with minimum risk (uncertainty). To the author's knowledge the complete methodology is new and has not previously been applied or reported in the literature. The results demonstrate that the methodology is able to discern anticipated safety benefit differences across candidate countermeasures. For the 18 at grade railroad crossings considered in this analysis, it was found that the top three performing countermeasures for reducing crashes are in-vehicle warning systems, obstacle detection systems, and constant warning time systems.