968 results for Randomization-based Inference
Abstract:
This paper presents a video surveillance framework that robustly and efficiently detects abandoned objects in surveillance scenes. The framework is based on a novel threat assessment algorithm which combines the concept of ownership with automatic understanding of social relations in order to infer abandonment of objects. Implementation is achieved through development of a logic-based inference engine based on Prolog. Threat detection performance is evaluated by testing against a range of datasets describing realistic situations, and the results demonstrate a reduction in the number of false alarms generated. The proposed system represents the approach employed in the EU SUBITO project (Surveillance of Unattended Baggage and the Identification and Tracking of the Owner).
Abstract:
Many modern statistical applications involve inference for complex stochastic models, where it is easy to simulate from the models, but impossible to calculate likelihoods. Approximate Bayesian computation (ABC) is a method of inference for such models. It replaces calculation of the likelihood by a step which involves simulating artificial data for different parameter values, and comparing summary statistics of the simulated data with summary statistics of the observed data. Here we show how to construct appropriate summary statistics for ABC in a semi-automatic manner. We aim for summary statistics which will enable inference about certain parameters of interest to be as accurate as possible. Theoretical results show that optimal summary statistics are the posterior means of the parameters. Although these cannot be calculated analytically, we use an extra stage of simulation to estimate how the posterior means vary as a function of the data; and we then use these estimates of our summary statistics within ABC. Empirical results show that our approach is a robust method for choosing summary statistics that can result in substantially more accurate ABC analyses than the ad hoc choices of summary statistics that have been proposed in the literature. We also demonstrate advantages over two alternative methods of simulation-based inference.
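The ABC step described in this abstract (simulate artificial data for candidate parameter values, then compare summary statistics with those of the observed data) can be illustrated with a minimal rejection sampler. This is only a sketch under stated assumptions: it uses plain rejection ABC on a toy Normal-mean model with a hand-picked summary statistic (the sample mean), not the semi-automatic construction of summary statistics proposed in the paper. The function name `abc_rejection`, the uniform prior, and the tolerance are all hypothetical choices made for illustration.

```python
import random
import statistics

def abc_rejection(observed, prior_sampler, simulate, summary, tolerance, n_draws, rng):
    """Plain rejection ABC: keep parameter draws whose simulated summary
    statistic lies within `tolerance` of the observed summary statistic."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)                  # draw a candidate from the prior
        fake = simulate(theta, len(observed), rng)  # simulate artificial data
        if abs(summary(fake) - s_obs) <= tolerance:
            accepted.append(theta)                  # no likelihood was evaluated
    return accepted

# Toy setting (assumed for illustration): Normal data with unknown mean, unit variance.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

posterior_draws = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda theta, n, r: [r.gauss(theta, 1.0) for _ in range(n)],
    summary=statistics.fmean,
    tolerance=0.1,
    n_draws=5000,
    rng=rng,
)
abc_estimate = statistics.fmean(posterior_draws)
```

The accepted draws approximate the posterior distribution of the mean; shrinking `tolerance` improves the approximation at the cost of a lower acceptance rate.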
Abstract:
We investigated how adult readers evaluate and revise their situation model during reading by monitoring their eye movements as they read narrative texts and subsequent critical sentences. In each narrative text, a short introduction primed a knowledge-based inference, followed by a target concept that was either expected (e.g., “oven”) or unexpected (e.g., “grill”) in relation to the inferred concept. Eye movements showed that readers detected a mismatch between the new unexpected information and their prior interpretation, confirming their ability to evaluate inferential information. Just below the narrative text, a critical sentence included a target word that was either congruent (e.g., “roasted”) or incongruent (e.g., “barbecued”) with the expected but not the unexpected concept. Readers spent less time reading the congruent than the incongruent target word, reflecting the facilitation of prior information. In addition, when the unexpected (but not expected) concept had been presented, participants with lower verbal (but not visuospatial) working memory span exhibited longer reading times and made more regressions (from the critical sentence to previous information) on encountering congruent information, indicating difficulty in inhibiting their initial incorrect interpretation and revising their situation model.
Abstract:
Mixed models may be defined with or without reference to sampling, and can be used to predict realized random effects, as when estimating the latent values of study subjects measured with response error. When the model is specified without reference to sampling, a simple mixed model includes two random variables, one stemming from an exchangeable distribution of latent values of study subjects and the other from the study subjects' response error distributions. Positive probabilities are assigned to both potentially realizable responses and artificial responses that are not potentially realizable, resulting in artificial latent values. In contrast, finite population mixed models represent the two-stage process of sampling subjects and measuring their responses, where positive probabilities are only assigned to potentially realizable responses. A comparison of the estimators over the same potentially realizable responses indicates that the optimal linear mixed model estimator (the usual best linear unbiased predictor, BLUP) is often (but not always) more accurate than the comparable finite population mixed model estimator (the FPMM BLUP). We examine a simple example and provide the basis for a broader discussion of the role of conditioning, sampling, and model assumptions in developing inference.
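For readers unfamiliar with predicting a latent value under the usual mixed model, the following sketch shows the standard shrinkage form of the predictor. It rests on simplifying assumptions that are not taken from the abstract: the variance components and the grand mean are treated as known, each subject has the same number of replicate measurements, and the function name `blup_latent_value` is hypothetical. It illustrates the usual mixed model predictor only, not the finite population mixed model (FPMM) predictor.

```python
def blup_latent_value(subject_mean, grand_mean, var_subject, var_error, n_reps):
    """Shrinkage predictor of a subject's latent value under a simple
    mixed model with known variance components (an assumption).

    var_subject: variance of latent values between subjects
    var_error:   response (measurement) error variance
    n_reps:      number of replicate measurements averaged in subject_mean
    """
    # Weight given to the subject's own data; approaches 1 as error shrinks.
    shrinkage = var_subject / (var_subject + var_error / n_reps)
    return grand_mean + shrinkage * (subject_mean - grand_mean)

# One noisy measurement: the prediction is pulled halfway toward the grand mean.
one_rep = blup_latent_value(12.0, 10.0, var_subject=4.0, var_error=4.0, n_reps=1)
# Four replicates: the subject's own mean is trusted more.
four_reps = blup_latent_value(12.0, 10.0, var_subject=4.0, var_error=4.0, n_reps=4)
```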
Abstract:
Prediction of random effects is an important problem with expanding applications. In the simplest context, the problem corresponds to prediction of the latent value (the mean) of a realized cluster selected via two-stage sampling. Recently, Stanek and Singer [Predicting random effects from finite population clustered samples with response error. J. Amer. Statist. Assoc. 99, 119-130] developed best linear unbiased predictors (BLUP) under a finite population mixed model that outperform BLUPs from mixed models and superpopulation models. Their setup, however, does not allow for unequally sized clusters. To overcome this drawback, we consider an expanded finite population mixed model based on a larger set of random variables that span a higher dimensional space than those typically applied to such problems. We show that BLUPs for linear combinations of the realized cluster means derived under such a model have considerably smaller mean squared error (MSE) than those obtained from mixed models, superpopulation models, and finite population mixed models. We motivate our general approach by an example developed for two-stage cluster sampling and show that it faithfully captures the stochastic aspects of sampling in the problem. We also consider simulation studies to illustrate the increased accuracy of the BLUP obtained under the expanded finite population mixed model. (C) 2007 Elsevier B.V. All rights reserved.
Abstract:
We consider methods for estimating causal effects of treatment in the situation where the individuals in the treatment and the control group are self selected, i.e., the selection mechanism is not randomized. In this case, simple comparison of treated and control outcomes will not generally yield valid estimates of causal effects. The propensity score method is frequently used for the evaluation of treatment effect. However, this method is based on some strong assumptions, which are not directly testable. In this paper, we present an alternative modeling approach to draw causal inference by using a shared random-effect model, and a computational algorithm to draw likelihood-based inference with such a model. With small numerical studies and a real data analysis, we show that our approach not only gives more efficient estimates but is also less sensitive to the model misspecifications we consider than the existing methods.
Abstract:
For first-order Horn clauses without equality, resolution is complete with an arbitrary selection of a single literal in each clause [dN 96]. Here we extend this result to the case of clauses with equality for superposition-based inference systems. Our result is a generalization of the result given in [BG 01]. We answer their question about the completeness of a superposition-based system for general clauses with an arbitrary selection strategy, provided there exists a refutation without applications of the factoring inference rule.
Abstract:
1. The crabeater seal Lobodon carcinophaga is considered to be a key species in the krill-based food web of the Southern Ocean. Reliable estimates of the abundance of this species are necessary to allow the development of multispecies, predator–prey models as a basis for management of the krill fishery in the Southern Ocean. 2. A survey of crabeater seal abundance was undertaken in 1 500 000 km² of pack-ice off east Antarctica between longitudes 64–150° E during the austral summer of 1999/2000. Sighting surveys, using double observer line transect methods, were conducted from an icebreaker and two helicopters to estimate the density of seals hauled out on the ice in survey strips. Satellite-linked dive recorders were deployed on a sample of seals to estimate the probability of seals being hauled out on the ice at the times of day when sighting surveys were conducted. Model-based inference, involving fitting a density surface, was used to infer densities in the entire survey region from estimates in the surveyed areas. 3. Crabeater seal abundance was estimated to be between 0.7 and 1.4 million animals (with 95% confidence), with the most likely estimate slightly less than 1 million. 4. Synthesis and applications. The estimation of crabeater seal abundance in Convention for the Conservation of Antarctic Marine Living Resources (CCAMLR) management areas off east Antarctica, where krill biomass has also been estimated recently, provides the data necessary to begin extending from single-species to multispecies management of the krill fishery. Incorporation of all major sources of uncertainty allows a precautionary interpretation of crabeater abundance and demand for krill in keeping with CCAMLR’s precautionary approach to management.
While this study focuses on the crabeater seal and management of living resources in the Southern Ocean, it has also led to technical and theoretical developments in survey methodology that have widespread potential application in ecological and resource management studies, and will contribute to a more fundamental understanding of the structure and function of the Southern Ocean ecosystem.
Abstract:
Fossils of chironomid larvae (non-biting midges) preserved in lake sediments are well-established palaeotemperature indicators which, with the aid of numerical chironomid-based inference models (transfer functions), can provide quantitative estimates of past temperature change. This approach to temperature reconstruction relies on the strong relationship between air and lake surface water temperature and the distribution of individual chironomid taxa (species, species groups, genera) that has been observed in different climate regions (arctic, subarctic, temperate and tropical) in both the Northern and Southern hemisphere. A major complicating factor for the use of chironomids for palaeoclimate reconstruction which increases the uncertainty associated with chironomid-based temperature estimates is that the exact nature of the mechanism responsible for the strong relationship between temperature and chironomid assemblages in lakes remains uncertain. While a number of authors have provided state of the art overviews of fossil chironomid palaeoecology and the use of chironomids for temperature reconstruction, few have focused on examining the ecological basis for this approach. Here, we review the nature of the relationship between chironomids and temperature based on the available ecological evidence. After discussing many of the surveys describing the distribution of chironomid taxa in lake surface sediments in relation to temperature, we also examine evidence from laboratory and field studies exploring the effects of temperature on chironomid physiology, life cycles and behaviour. 
We show that, even though a direct influence of water temperature on chironomid development, growth and survival is well described, chironomid palaeoclimatology is presently faced with the paradoxical situation that the relationship between chironomid distribution and temperature seems strongest in relatively deep, thermally stratified lakes in temperate and subarctic regions in which the benthic chironomid fauna lives largely decoupled from the direct influence of air and surface water temperature. This finding suggests that indirect effects of temperature on physical and chemical characteristics of lakes play an important role in determining the distribution of lake-living chironomid larvae. However, we also demonstrate that no single indirect mechanism has been identified that can explain the strong relationship between chironomid distribution and temperature in all regions and datasets presently available. This observation contrasts with the previously published hypothesis that climatic effects on lake nutrient status and productivity may be largely responsible for the apparent correlation between chironomid assemblage distribution and temperature. We conclude our review by summarizing the implications of our findings for chironomid-based palaeoclimatology and by pointing towards further avenues of research necessary to improve our mechanistic understanding of the chironomid-temperature relationship.
Abstract:
Professor Sir David R. Cox (DRC) is widely acknowledged as among the most important scientists of the second half of the twentieth century. He inherited the mantle of statistical science from Pearson and Fisher, advanced their ideas, and translated statistical theory into practice so as to forever change the application of statistics in many fields, but especially biology and medicine. The logistic and proportional hazards models he substantially developed are arguably among the most influential biostatistical methods in current practice. This paper looks forward over the period from DRC's 80th to 90th birthdays, to speculate about the future of biostatistics, drawing lessons from DRC's contributions along the way. We consider "Cox's model" of biostatistics, an approach to statistical science that: formulates scientific questions or quantities in terms of parameters gamma in probability models f(y; gamma) that represent, in a parsimonious fashion, the underlying scientific mechanisms (Cox, 1997); partitions the parameters gamma = (theta, eta) into a subset of interest theta and other "nuisance parameters" eta necessary to complete the probability distribution (Cox and Hinkley, 1974); develops methods of inference about the scientific quantities that depend as little as possible upon the nuisance parameters (Barndorff-Nielsen and Cox, 1989); and thinks critically about the appropriate conditional distribution on which to base inferences. We briefly review exciting biomedical and public health challenges that are capable of driving statistical developments in the next decade. We discuss the statistical models and model-based inferences central to the CM approach, contrasting them with computationally-intensive strategies for prediction and inference advocated by Breiman and others (e.g. Breiman, 2001) and with more traditional design-based methods of inference (Fisher, 1935).
We discuss the hierarchical (multi-level) model as an example of the future challenges and opportunities for model-based inference. We then consider the role of conditional inference, a second key element of the CM. Recent examples from genetics are used to illustrate these ideas. Finally, the paper examines causal inference and statistical computing, two other topics we believe will be central to biostatistics research and practice in the coming decade. Throughout the paper, we attempt to indicate how DRC's work and the "Cox Model" have set a standard of excellence to which all can aspire in the future.
Abstract:
Genomic alterations have been linked to the development and progression of cancer. The technique of Comparative Genomic Hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of copies in DNA. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about the genomic alterations from array-CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for algorithms that can identify gains and losses in the number of copies based on statistical considerations, rather than merely detect trends in the data. We adopt a Bayesian approach, relying on the hidden Markov model to account for the inherent dependence in the intensity ratios. Posterior inferences are made about gains and losses in copy number. Localized amplifications (associated with oncogene mutations) and deletions (associated with mutations of tumor suppressors) are identified using posterior probabilities. Global trends such as extended regions of altered copy number are detected. Since the posterior distribution is analytically intractable, we implement a Metropolis-within-Gibbs algorithm for efficient simulation-based inference. Publicly available data on pancreatic adenocarcinoma, glioblastoma multiforme and breast cancer are analyzed, and comparisons are made with some widely-used algorithms to illustrate the reliability and success of the technique.
Abstract:
Dynamic changes in ERP topographies can be conveniently analyzed by means of microstates, the so-called "atoms of thoughts", that represent brief periods of quasi-stable synchronized network activation. Comparing temporal microstate features such as on- and offset or duration between groups and conditions therefore allows a precise assessment of the timing of cognitive processes. So far, this has been achieved by assigning the individual time-varying ERP maps to spatially defined microstate templates obtained from clustering the grand mean data into predetermined numbers of topographies (microstate prototypes). Features obtained from these individual assignments were then statistically compared. This has the problem that the individual noise dilutes the match between individual topographies and templates leading to lower statistical power. We therefore propose a randomization-based procedure that works without assigning grand-mean microstate prototypes to individual data. In addition, we propose a new criterion to select the optimal number of microstate prototypes based on cross-validation across subjects. After a formal introduction, the method is applied to a sample data set of an N400 experiment and to simulated data with varying signal-to-noise ratios, and the results are compared to existing methods. In a first comparison with previously employed statistical procedures, the new method showed an increased robustness to noise, and a higher sensitivity for more subtle effects of microstate timing. We conclude that the proposed method is well-suited for the assessment of timing differences in cognitive processes. The increased statistical power allows identifying more subtle effects, which is particularly important in small and scarce patient populations.
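The general idea behind randomization-based inference can be sketched with a generic two-group permutation test. This is not the authors' procedure, which operates on ERP topographies and microstate assignments; it is a simplified stand-in that compares a single scalar feature between two groups. The function name `permutation_test` and the latency values are invented for illustration.

```python
import random
import statistics

def permutation_test(group_a, group_b, n_permutations, rng):
    """Two-sided permutation test for a difference in group means.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose absolute mean difference is at least as large as the
    observed one (with the usual +1 correction)."""
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of observations to groups
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical microstate onset latencies (ms) in two groups.
controls = [310, 325, 298, 340, 315, 330, 305, 320]
patients = [355, 370, 362, 348, 380, 365, 358, 372]
p_value = permutation_test(controls, patients, 2000, random.Random(1))
```

Because the null distribution is built from the data themselves, no distributional assumption about the feature is needed beyond exchangeability of observations under the null hypothesis.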
Abstract:
Linear- and unimodal-based inference models for mean summer temperatures (partial least squares, weighted averaging, and weighted averaging partial least squares models) were applied to a high-resolution pollen and cladoceran stratigraphy from Gerzensee, Switzerland. The time-window of investigation included the Allerød, the Younger Dryas, and the Preboreal. Characteristic major and minor oscillations in the oxygen-isotope stratigraphy, such as the Gerzensee oscillation, the onset and end of the Younger Dryas stadial, and the Preboreal oscillation, were identified by isotope analysis of bulk-sediment carbonates of the same core and were used as independent indicators for hemispheric or global scale climatic change. In general, the pollen-inferred mean summer temperature reconstruction using all three inference models follows the oxygen-isotope curve more closely than the cladoceran curve. The cladoceran-inferred reconstruction suggests generally warmer summers than the pollen-based reconstructions, which may be an effect of terrestrial vegetation not being in equilibrium with climate due to migrational lags during the Late Glacial and early Holocene. Allerød summer temperatures range between 11 and 12°C based on pollen, whereas the cladoceran-inferred temperatures lie between 11 and 13°C. Pollen and cladocera-inferred reconstructions both suggest a drop to 9–10°C at the beginning of the Younger Dryas. Although the Allerød–Younger Dryas transition lasted 150–160 years in the oxygen-isotope stratigraphy, the pollen-inferred cooling took 180–190 years and the cladoceran-inferred cooling lasted 250–260 years. The pollen-inferred summer temperature rise to 11.5–12°C at the transition from the Younger Dryas to the Preboreal preceded the oxygen-isotope signal by several decades, whereas the cladoceran-inferred warming lagged. 
Major discrepancies between the pollen- and cladoceran-inference models are observed for the Preboreal, where the cladoceran-inference model suggests mean summer temperatures of up to 14–15°C. Both pollen- and cladoceran-inferred reconstructions suggest a cooling that may be related to the Gerzensee oscillation, but there is no evidence for a cooling synchronous with the Preboreal oscillation as recorded in the oxygen-isotope record. For the Gerzensee oscillation the inferred cooling was ca. 1 and 0.5°C based on pollen and cladocera, respectively, which lies well within the inherent prediction errors of the inference models.
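Of the inference models named in this abstract, weighted averaging is the simplest to illustrate. The sketch below assumes the basic two-step form: each taxon's temperature optimum is the abundance-weighted mean temperature of the training sites, and a fossil sample's inferred temperature is the abundance-weighted mean of those optima. It omits the deshrinking regression normally applied afterwards, and the training data, function names, and abundances are invented for illustration rather than taken from the Gerzensee study.

```python
def wa_optima(abundances, temperatures):
    """Estimate each taxon's temperature optimum from training sites.

    abundances[i][k] is the abundance of taxon k at training site i.
    Assumes every taxon occurs at one or more sites (non-zero total)."""
    n_taxa = len(abundances[0])
    optima = []
    for k in range(n_taxa):
        total = sum(site[k] for site in abundances)
        weighted = sum(site[k] * t for site, t in zip(abundances, temperatures))
        optima.append(weighted / total)
    return optima

def wa_predict(sample, optima):
    """Infer temperature for one sample as the abundance-weighted mean
    of the taxon optima (no deshrinking step is applied here)."""
    return sum(a * u for a, u in zip(sample, optima)) / sum(sample)

# Hypothetical training set: four sites, three taxa, summer temperature in °C.
training_abundances = [
    [0.8, 0.2, 0.0],
    [0.5, 0.4, 0.1],
    [0.1, 0.5, 0.4],
    [0.0, 0.2, 0.8],
]
training_temperatures = [8.0, 10.0, 12.0, 14.0]

optima = wa_optima(training_abundances, training_temperatures)
inferred = wa_predict([0.3, 0.4, 0.3], optima)
```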
Abstract:
Very large spatially-referenced datasets, for example, those derived from satellite-based sensors which sample across the globe or large monitoring networks of individual sensors, are becoming increasingly common and more widely available for use in environmental decision making. In large or dense sensor networks, huge quantities of data can be collected over small time periods. In many applications the generation of maps, or predictions at specific locations, from the data in (near) real-time is crucial. Geostatistical operations such as interpolation are vital in this map-generation process, and in emergency situations the resulting predictions need to be available almost instantly, so that decision makers can make informed decisions and define risk and evacuation zones. It is also helpful when analysing data in less time critical applications, for example when interacting directly with the data for exploratory analysis, that the algorithms are responsive within a reasonable time frame. Performing geostatistical analysis on such large spatial datasets can present a number of problems, particularly in the case where maximum likelihood methods are used. Although the storage requirements only scale linearly with the number of observations in the dataset, the computational complexity, in terms of memory and speed, scales quadratically and cubically, respectively. Most modern commodity hardware has at least two processor cores. Other mechanisms for allowing parallel computation, such as Grid based systems, are also becoming increasingly available. However, currently there seems to be little interest in exploiting this extra processing power within the context of geostatistics. In this paper we review the existing parallel approaches for geostatistics.
By recognising that different natural parallelisms exist and can be exploited depending on whether the dataset is sparsely or densely sampled with respect to the range of variation, we introduce two contrasting novel implementations of parallel algorithms based on approximating the data likelihood, extending the methods of Vecchia [1988] and Tresp [2000]. Using parallel maximum likelihood variogram estimation and parallel prediction algorithms we show that computational time can be significantly reduced. We demonstrate this with both sparsely sampled data and densely sampled data on a variety of architectures ranging from the common dual core processor, found in many modern desktop computers, to large multi-node supercomputers. To highlight the strengths and weaknesses of the different methods we employ synthetic data sets and go on to show how the methods allow maximum likelihood based inference on the exhaustive Walker Lake data set.
Abstract:
Heterogeneous datasets arise naturally in most applications due to the use of a variety of sensors and measuring platforms. Such datasets can be heterogeneous in terms of the error characteristics and sensor models. Treating such data is most naturally accomplished using a Bayesian or model-based geostatistical approach; however, such methods generally scale rather badly with the size of dataset, and require computationally expensive Monte Carlo based inference. Recently within the machine learning and spatial statistics communities many papers have explored the potential of reduced rank representations of the covariance matrix, often referred to as projected or fixed rank approaches. In such methods the covariance function of the posterior process is represented by a reduced rank approximation which is chosen such that there is minimal information loss. In this paper a sequential Bayesian framework for inference in such projected processes is presented. The observations are considered one at a time which avoids the need for high dimensional integrals typically required in a Bayesian approach. A C++ library, gptk, which is part of the INTAMAP web service, is introduced which implements projected, sequential estimation and adds several novel features. In particular the library includes the ability to use a generic observation operator, or sensor model, to permit data fusion. It is also possible to cope with a range of observation error characteristics, including non-Gaussian observation errors. Inference for the covariance parameters is explored, including the impact of the projected process approximation on likelihood profiles. We illustrate the projected sequential method in application to synthetic and real datasets. Limitations and extensions are discussed. © 2010 Elsevier Ltd.
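The benefit of considering observations one at a time can be seen in a deliberately small example. The sketch below performs sequential Bayesian updating of a single unknown scalar with Gaussian prior and Gaussian observation noise, and checks that it agrees with the batch posterior. It is a toy illustration only: it does not use or mirror the gptk library's API, it involves no projected (reduced rank) process, and the function name and numbers are hypothetical.

```python
def sequential_gaussian_update(prior_mean, prior_var, observations, noise_var):
    """Update a Gaussian belief about one scalar, one observation at a time.

    Each step is a conjugate update, so no high-dimensional integral or
    matrix inversion over the full dataset is ever needed."""
    mean, var = prior_mean, prior_var
    for y in observations:
        gain = var / (var + noise_var)   # how much to trust the new observation
        mean = mean + gain * (y - mean)  # move the mean toward the observation
        var = (1.0 - gain) * var         # uncertainty shrinks after each step
    return mean, var

observations = [1.2, 0.8, 1.1, 0.9]
seq_mean, seq_var = sequential_gaussian_update(0.0, 10.0, observations, noise_var=0.5)

# Batch posterior for comparison (standard conjugate Gaussian result).
batch_precision = 1.0 / 10.0 + len(observations) / 0.5
batch_var = 1.0 / batch_precision
batch_mean = batch_var * (0.0 / 10.0 + sum(observations) / 0.5)
```

In this conjugate setting the sequential and batch answers coincide; with non-Gaussian observation errors, as mentioned in the abstract, each sequential step would instead require an approximation.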