878 results for Presence-absence Data
Abstract:
This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from pursuit of these data analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of recorded zeroes. These may represent zero response given some threshold (presence) or that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses, whilst taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts, and the dingo, cypress and toad case studies described in the motivation chapter are examples of it. Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled by using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic maximum likelihood estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov random field (MRF) for the underlying spatial process, and suitable priors for parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers are suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy.
This requires evaluation of a normalization constant, a notoriously difficult problem. Difficulty with estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (NCs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea present though not fully developed in the literature, and propose the integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter autologistic model. This work introduces background computations required for the full implementation of the four-tier model in Chapter 7. Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extension allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer. A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time. Note: The author of this thesis has agreed to make it open access but invites people downloading the thesis to send her an email via the 'Contact Author' function.
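As an illustration of the path sampling idea mentioned above, the sketch below checks the identity log Z(β1) − log Z(β0) = ∫ E_β[s(X)] dβ for a one-parameter binary MRF on a tiny lattice, where everything can be enumerated exactly. It is not the thesis's IMCS implementation: the single interaction parameter `beta`, the canonical statistic (count of neighbouring pairs that are both 1), the 3×3 lattice, and all function names are assumptions made for this example. In a realistic setting the expectations would have to be estimated by MCMC rather than by enumeration.

```python
import itertools

import numpy as np


def neighbour_pairs(n):
    """Index pairs of horizontally and vertically adjacent cells on an n x n lattice."""
    pairs = []
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:
                pairs.append((i, i + 1))
            if r + 1 < n:
                pairs.append((i, i + n))
    return pairs


def canonical_stats(n):
    """s(x) = number of adjacent pairs with both cells equal to 1, for every binary state x."""
    states = np.array(list(itertools.product([0, 1], repeat=n * n)))
    pairs = np.array(neighbour_pairs(n))
    return (states[:, pairs[:, 0]] * states[:, pairs[:, 1]]).sum(axis=1)


def log_z(beta, s):
    """Exact log normalization constant of p(x) proportional to exp(beta * s(x))."""
    return float(np.log(np.exp(beta * s).sum()))


def mean_stat(beta, s):
    """Exact expectation of the canonical statistic under parameter beta."""
    w = np.exp(beta * s)
    return float((w * s).sum() / w.sum())


def path_sampling_log_ratio(beta0, beta1, s, grid=401):
    """Trapezoidal integral of E_beta[s(X)] over [beta0, beta1]."""
    betas = np.linspace(beta0, beta1, grid)
    means = np.array([mean_stat(b, s) for b in betas])
    return float(np.sum((means[1:] + means[:-1]) / 2.0 * np.diff(betas)))


s = canonical_stats(3)  # 2**9 = 512 states, feasible only for tiny lattices
exact = log_z(0.8, s) - log_z(0.0, s)
estimate = path_sampling_log_ratio(0.0, 0.8, s)
print(exact, estimate)
```

The two printed values should agree closely, because the derivative of log Z with respect to the parameter of an exponential-family model is the mean of its canonical statistic.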
Abstract:
1. Little consensus has been reached as to general features of spatial variation in beta diversity, a fundamental component of species diversity. This could reflect a genuine lack of simple gradients in beta diversity, or a lack of agreement as to just what constitutes beta diversity. Unfortunately, a large number of approaches have been applied to the investigation of variation in beta diversity, which potentially makes comparisons of the findings difficult.
2. We review 24 measures of beta diversity for presence/absence data (the most frequent form of data to which such measures are applied) that have been employed in the literature, express many of them for the first time in common terms, and compare some of their basic properties.
3. Four groups of measures are distinguished, with a fundamental distinction arising between 'broad sense' measures incorporating differences in composition attributable to species richness gradients, and 'narrow sense' measures that focus on compositional differences independent of such gradients. On a number of occasions on which the former have been employed in the literature the latter may have been more appropriate, and there are many situations in which consideration of both kinds of measures would be valuable.
4. We particularly recommend (i) considering beta diversity measures in terms of matching/mismatching components (usually denoted a, b and c) and thereby identifying the contribution of different sources of variation in species composition, and (ii) the use of ternary plots to express the relationship between the values of these measures and of the components, and as a way of understanding patterns in beta diversity.
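The matching/mismatching components recommended in point 4 can be sketched in a few lines. This is a minimal illustration rather than the review's own code: the function names, the example species lists, and the assignment of b to the second site and c to the first are assumptions. The three indices shown (Jaccard and Sørensen dissimilarity, and the Simpson-based measure min(b, c) / (min(b, c) + a)) are standard formulas expressed in terms of a, b and c.

```python
def matching_components(site1, site2):
    """Return (a, b, c) for two species lists recorded as presence/absence.

    a = species shared by both sites
    b = species present only in the second site (assumed convention)
    c = species present only in the first site (assumed convention)
    """
    s1, s2 = set(site1), set(site2)
    return len(s1 & s2), len(s2 - s1), len(s1 - s2)


def jaccard_dissimilarity(a, b, c):
    """Share of the pooled species list that is not common to both sites."""
    total = a + b + c
    return (b + c) / total if total else 0.0


def sorensen_dissimilarity(a, b, c):
    """Like Jaccard, but shared species are counted twice in the denominator."""
    total = 2 * a + b + c
    return (b + c) / total if total else 0.0


def simpson_dissimilarity(a, b, c):
    """Uses the smaller mismatch, so richness differences are discounted."""
    smaller = min(b, c)
    total = a + smaller
    return smaller / total if total else 0.0


# Hypothetical species lists for two sites
site1 = ["A", "B", "C", "D"]
site2 = ["C", "D", "E"]
a, b, c = matching_components(site1, site2)
print(a, b, c)
print(jaccard_dissimilarity(a, b, c))
print(sorensen_dissimilarity(a, b, c))
print(simpson_dissimilarity(a, b, c))
```

Comparing the Sørensen and Simpson-based values for the same pair of sites gives a rough sense of how much of the dissimilarity is attributable to a difference in richness rather than to species replacement.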
Abstract:
Effective detection of population trend is crucial for managing threatened species. Little theory exists, however, to assist managers in choosing the most cost-effective monitoring techniques for diagnosing trend. We present a framework for determining the optimal monitoring strategy by simulating a manager collecting data on a declining species, the Chestnut-rumped Hylacola (Hylacola pyrrhopygia parkeri), to determine whether the species should be listed under the IUCN (World Conservation Union) Red List. We compared the efficiencies of two strategies for detecting trend, abundance surveys and presence-absence surveys, under financial constraints. One might expect the abundance surveys to be superior under all circumstances because more information is collected at each site. Nevertheless, the presence-absence data can be collected at more sites because the surveyor is not obliged to spend a fixed amount of time at each site. The optimal strategy for monitoring was very dependent on the budget available. Under some circumstances, presence-absence surveys outperformed abundance surveys for diagnosing the IUCN Red List categories cost-effectively. Abundance surveys were best if the species was expected to be recorded more than 16 times/year; otherwise, presence-absence surveys were best. The relationship between the strategies we investigated is likely to be relevant for many comparisons of presence-absence or abundance data. Managers of any cryptic or low-density species who hope to maximize their success in estimating trend should find an application for our results.
Abstract:
As age-diagnostic fossils are rare in the Middle to Upper Jurassic sedimentary succession of Gebel Maghara, North Sinai, Egypt, and in order to ensure maximal stratigraphic resolution, chronostratigraphic boundaries were determined based on quantitative biostratigraphy. A data matrix comprising 231 macrofaunal taxa in 93 samples from four sections has been processed with the Unitary Association (UA) Method. This led to construction of a sequence of 29 UAs (maximal sets of actually or virtually coexisting taxa), which have been grouped into 14 laterally reproducible association zones. The UA method allowed an in-depth analysis of the stratigraphically conflicting taxa, enabled the biostratigraphic subdivision of the studied interval, and also provided stratigraphic correlation among the measured sections and with the Tethyan ammonite zones.
Abstract:
This paper describes techniques to estimate the worst case execution time of executable code on architectures with data caches. The underlying mechanism is Abstract Interpretation, which is used for the dual purposes of tracking address computations and cache behavior. A simultaneous numeric and pointer analysis using an abstraction for discrete sets of values computes safe approximations of access addresses which are then used to predict cache behavior using Must Analysis. A heuristic is also proposed which generates likely worst case estimates. It can be used in soft real time systems and also for reasoning about the tightness of the safe estimate. The analysis methods can handle programs with non-affine access patterns, for which conventional Presburger Arithmetic formulations or Cache Miss Equations do not apply. The precision of the estimates is user-controlled and can be traded off against analysis time. Executables are analyzed directly, which, apart from enhancing precision, renders the method language independent.
Abstract:
We consider the impact of data revisions on the forecast performance of a SETAR regime-switching model of U.S. output growth. Data uncertainty in real-time forecasting affects a model's forecast performance both through its effect on the model parameter estimates and through the forecast being conditioned on data measured with error. We find that benchmark revisions do affect the performance of the non-linear model of the growth rate, and that the performance relative to a linear comparator deteriorates in real time compared to a pseudo out-of-sample forecasting exercise.
Abstract:
We examine how the accuracy of real-time forecasts from models that include autoregressive terms can be improved by estimating the models on ‘lightly revised’ data instead of using data from the latest-available vintage. The benefits of estimating autoregressive models on lightly revised data are related to the nature of the data revision process and the underlying process for the true values. Empirically, we find improvements in root mean square forecasting error of 2–4% when forecasting output growth and inflation with univariate models, and of 8% with multivariate models. We show that multiple-vintage models, which explicitly model data revisions, require large estimation samples to deliver competitive forecasts. Copyright © 2012 John Wiley & Sons, Ltd.
Abstract:
Plant biosecurity requires statistical tools to interpret field surveillance data in order to manage pest incursions that threaten crop production and trade. Ultimately, management decisions need to be based on the probability that an area is infested or free of a pest. Current informal approaches to delimiting pest extent rely upon expert ecological interpretation of presence/absence data over space and time. Hierarchical Bayesian models provide a cohesive statistical framework that can formally integrate the available information on both pest ecology and data. The overarching method involves constructing an observation model for the surveillance data, conditional on the hidden extent of the pest and uncertain detection sensitivity. The extent of the pest is then modelled as a dynamic invasion process that includes uncertainty in ecological parameters. Modelling approaches to assimilate this information are explored through case studies on spiralling whitefly, Aleurodicus dispersus, and red banded mango caterpillar, Deanolis sublimbalis. Markov chain Monte Carlo simulation is used to estimate the probable extent of pests, given the observation and process model conditioned by surveillance data. Statistical methods, based on time-to-event models, are developed to apply hierarchical Bayesian models to early detection programs and to demonstrate area freedom from pests. The value of early detection surveillance programs is demonstrated through an application to interpret surveillance data for exotic plant pests with uncertain spread rates. The model suggests that typical early detection programs provide a moderate reduction in the probability of an area being infested but a dramatic reduction in the expected area of incursions at a given time. Estimates of spiralling whitefly extent are examined at local, district and state-wide scales.
The local model estimates the rate of natural spread and the influence of host architecture, host suitability and inspector efficiency. These parameter estimates can support the development of robust surveillance programs. Hierarchical Bayesian models for the human-mediated spread of spiralling whitefly are developed for the colonisation of discrete cells connected by a modified gravity model. By estimating dispersal parameters, the model can be used to predict the extent of the pest over time. An extended model predicts the climate restricted distribution of the pest in Queensland. These novel human-mediated movement models are well suited to demonstrating area freedom at coarse spatio-temporal scales. At finer scales, and in the presence of ecological complexity, exploratory models are developed to investigate the capacity for surveillance information to estimate the extent of red banded mango caterpillar. It is apparent that excessive uncertainty about observation and ecological parameters can impose limits on inference at the scales required for effective management of response programs. The thesis contributes novel statistical approaches to estimating the extent of pests and develops applications to assist decision-making across a range of plant biosecurity surveillance activities. Hierarchical Bayesian modelling is demonstrated as both a useful analytical tool for estimating pest extent and a natural investigative paradigm for developing and focussing biosecurity programs.
Abstract:
Early detection surveillance programs aim to find invasions of exotic plant pests and diseases before they are too widespread to eradicate. However, the value of these programs can be difficult to justify when no positive detections are made. To demonstrate the value of pest absence information provided by these programs, we use a hierarchical Bayesian framework to model estimates of incursion extent with and without surveillance. A model for the latent invasion process provides the baseline against which surveillance data are assessed. Ecological knowledge and pest management criteria are introduced into the model using informative priors for invasion parameters. Observation models assimilate information from spatio-temporal presence/absence data to accommodate imperfect detection and generate posterior estimates of pest extent. When applied to an early detection program operating in Queensland, Australia, the framework demonstrates that this typical surveillance regime provides a modest reduction in the estimate that a surveyed district is infested. More importantly, the model suggests that early detection surveillance programs can provide a dramatic reduction in the putative area of incursion and therefore offer a substantial benefit to incursion management. By mapping spatial estimates of the point probability of infestation, the model identifies where future surveillance resources can be most effectively deployed.
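The way absence records lower the estimated probability of infestation under imperfect detection can be shown with a single Bayes update. This is a deliberately simplified sketch, not the hierarchical model described in the abstract: it assumes one site, independent surveys, a known per-survey detection sensitivity, and no false positives. The function name and the example values are hypothetical.

```python
def posterior_infested(prior, sensitivity, n_negative_surveys):
    """Probability a site is infested after n surveys that all found nothing.

    Assumptions (illustrative only): surveys are independent, each detects an
    existing infestation with probability `sensitivity`, and a pest-free site
    never produces a detection.
    """
    if not 0.0 <= prior <= 1.0 or not 0.0 <= sensitivity <= 1.0:
        raise ValueError("prior and sensitivity must lie in [0, 1]")
    if n_negative_surveys < 0:
        raise ValueError("n_negative_surveys must be non-negative")
    # Probability that every survey misses a pest that is actually present
    miss_all = (1.0 - sensitivity) ** n_negative_surveys
    numerator = prior * miss_all
    denominator = numerator + (1.0 - prior)
    return numerator / denominator if denominator > 0.0 else 0.0


# Hypothetical values: 10% prior, 30% sensitivity per survey
for n in (0, 1, 5, 10):
    print(n, posterior_infested(0.10, 0.30, n))
```

Each additional negative survey reduces the posterior, with the size of the reduction governed by the sensitivity. A full treatment would place priors on the sensitivity and on the invasion process rather than fixing them.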
Assessment of insect occurrence in boreal forests based on satellite imagery and field measurements.
Abstract:
The presence/absence data of twenty-seven forest insect taxa (e.g. Retinia resinella, Formica spp., Pissodes spp., several scolytids) and recorded environmental variation were used to investigate the applicability of modelling insect occurrence based on satellite imagery. The sampling was based on 1800 sample plots (25 m by 25 m) placed along the sides of 30 equilateral triangles (side 1 km) in a fragmented forest area (approximately 100 km²) in Evo, S Finland. The triangles were overlaid on land use maps interpreted from satellite images (Landsat TM 30 m multispectral scanner imagery 1991) and digitized geological maps. Insect occurrence was explained using either environmental variables measured in the field or those interpreted from the land use and geological maps. The fit of logistic regression models varied between species, possibly because some species may be associated with the characteristics of single trees while others are associated with stand characteristics. The occurrence of at least certain insect species, especially those associated with Scots pine, could be assessed relatively accurately and indirectly on the basis of satellite imagery and geological maps. Models based on both remotely sensed and geological data better predicted the distribution of forest insects except in the case of Xylechinus pilosus, Dryocoetes sp. and Trypodendron lineatum, where the differences were relatively small in favour of the models based on field measurements. The number of species was related to habitat compartment size and distance from the habitat edge calculated from the land use maps, but logistic regressions suggested that other environmental variables in general masked the effect of these variables on species occurrence at the present scale.
Abstract:
In the absence of information on species in decline with contracting ranges, management should emphasize remaining populations and protection of their habitats. Threatened by anthropogenic pressure including habitat degradation and loss, sloth bears (Melursus ursinus) in India have become limited in range, habitat, and population size. We identified ecological and anthropogenic determinants of occurrence within an occupancy framework to evaluate habitat suitability of non-protected regions (with sloth bears) in northeastern Karnataka, India. We used systematic sampling to obtain presence-absence data and examine a priori hypotheses about the determinants of occupancy. These covariates were broadly classified as habitat or anthropogenic factors. Mean number of termite mounds and trees positively influenced sloth bear occupancy, whereas grazing pressure, indexed by the mean number of livestock dung, affected it negatively. Mean percentage of shrub coverage had no effect on bear occupancy. The best fitting model further predicted habitats in Bukkasagara, Agoli, and Benakal reserved forests to have 38%, 75%, and 88%, respectively, of their sampled grid cells with high occupancies (>0.70) despite having little or no legal protection. We recommend a conservation strategy that includes protection of vegetation stand-structure, maintenance of soil moisture, and enrichment of habitat for the long-term welfare of this species.
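The abstract does not give the form of its occupancy model, so the sketch below shows only the basic single-season occupancy likelihood with constant occupancy probability `psi` and detection probability `p`, as one common way such presence-absence histories are analysed. The function names, the example detection histories, and the absence of covariates are all assumptions made for illustration.

```python
import math


def site_log_likelihood(history, psi, p):
    """Log-likelihood of one site's detection history (list of 0/1 per visit).

    psi = probability the site is occupied, p = per-visit detection probability,
    both assumed constant and strictly between 0 and 1.
    """
    detections = sum(history)
    visits = len(history)
    # Probability of this history if the site is occupied
    occupied = psi * p ** detections * (1.0 - p) ** (visits - detections)
    if detections == 0:
        # An all-zero history can also arise from a site that is unoccupied
        return math.log(occupied + (1.0 - psi))
    return math.log(occupied)


def total_log_likelihood(histories, psi, p):
    """Sum of site log-likelihoods, treating sites as independent."""
    return sum(site_log_likelihood(h, psi, p) for h in histories)


# Hypothetical detection histories for four grid cells, three visits each
histories = [[1, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(total_log_likelihood(histories, psi=0.6, p=0.4))
```

In an applied analysis `psi` would typically be linked to site covariates such as those named in the abstract through a logit link, and the parameters estimated by maximising this likelihood.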