954 results for Presence-only data
Abstract:
The present data set was used as a training set for a Habitat Suitability Model. It contains occurrence (presence-only) records of living Lophelia pertusa reefs on the Irish continental margin, which were assembled from databases, cruise reports and publications. A total of 4423 records were inspected and quality assessed to ensure that they (1) represented confirmed living L. pertusa reefs (thereby excluding 2900 records of dead and isolated coral colonies); (2) were derived from sampling equipment that allows for accurate (<200 m) geo-referencing (thereby excluding 620 records derived mainly from trawling and dredging activities); and (3) were not duplicated. A total of 245 occurrences were retained for the analysis. Coral observations are highly clustered in regions targeted by research expeditions, which might lead to falsely inflated model evaluation measures (Veloz, 2009). Therefore, we coarsened the distribution data by deleting all but one record within grid cells of 0.02° resolution (Davies & Guinotte 2011). The remaining 53 points were subject to a spatial cross-validation process: a random presence point was chosen, grouped with its 12 closest neighbour presence points based on Euclidean distance and withheld from model training. This process was repeated for all records, resulting in 53 replicates of spatially non-overlapping sets of test (n=13) and training (n=40) data. The final 53 occurrence records were used for model training.
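The thinning and spatial cross-validation steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `thin_to_grid` and `spatial_folds` are hypothetical, the record kept per cell is simply the first one encountered (the abstract does not say which record was retained), and distances are plain Euclidean distances in degrees.

```python
import math


def thin_to_grid(points, cell=0.02):
    """Keep one (lon, lat) record per grid cell of the given size in degrees.

    Assumption: the first record seen in each cell is the one retained.
    """
    kept = {}
    for lon, lat in points:
        key = (math.floor(lon / cell), math.floor(lat / cell))
        kept.setdefault(key, (lon, lat))
    return list(kept.values())


def spatial_folds(points, n_neighbours=12):
    """Build one train/test split per record.

    Each focal record is withheld together with its n_neighbours nearest
    records (Euclidean distance); all other records form the training set.
    Returns a list of (train_indices, test_indices) tuples.
    """
    folds = []
    for i, focal in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(focal, points[j]))
        test_idx = [i] + others[:n_neighbours]
        withheld = set(test_idx)
        train_idx = [j for j in range(len(points)) if j not in withheld]
        folds.append((train_idx, test_idx))
    return folds
```

With 53 thinned records and 12 neighbours, every fold holds 13 test and 40 training records, matching the counts given in the abstract.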
Abstract:
This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from pursuit of these data analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of zeroes recorded. These may represent zero response given some threshold (presence) or that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses, whilst taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts and the dingo, cypress and toad case studies described in the motivation chapter are examples of this. Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled by using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic MLE estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov random field (MRF) for the underlying spatial process, and suitable priors for parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers are suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy.
This requires evaluation of a normalization constant, a notoriously difficult problem. Difficulty with estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (NCs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea present though not fully developed in the literature, and propose the Integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter autologistic model. This work introduces background computations required for the full implementation of the four-tier model in Chapter 7. Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extension allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer. A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time. Note: The author of this thesis has agreed to make it open access but invites people downloading the thesis to send her an email via the 'Contact Author' function.
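The path sampling identity behind integrating a mean canonical statistic can be illustrated on a lattice small enough to enumerate. The sketch below is not the thesis's IMCS implementation. It assumes a simplified two-parameter autologistic model, p(x) proportional to exp(alpha * sum(x_i) + beta * S(x)), where S(x) is the sum of products over neighbouring sites, and it computes the expectation of S exactly by enumeration where a real application would estimate it by MCMC. All function names are hypothetical.

```python
import itertools
import math


def state_stats(rows=3, cols=3):
    """Enumerate all binary lattices; return (number of ones, S(x)) per state."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                pairs.append((i, i + 1))      # horizontal neighbour
            if r + 1 < rows:
                pairs.append((i, i + cols))   # vertical neighbour
    stats = []
    for x in itertools.product((0, 1), repeat=rows * cols):
        stats.append((sum(x), sum(x[i] * x[j] for i, j in pairs)))
    return stats


def log_z_and_mean_stat(alpha, beta, stats):
    """Exact log normalization constant and E[S] under (alpha, beta)."""
    log_terms = [alpha * n + beta * s for n, s in stats]
    top = max(log_terms)
    weights = [math.exp(t - top) for t in log_terms]
    z = sum(weights)
    mean_s = sum(w * s for w, (_, s) in zip(weights, stats)) / z
    return top + math.log(z), mean_s


def path_sampling_log_ratio(alpha, beta0, beta1, stats, steps=200):
    """Trapezoidal integral of E_beta[S] over [beta0, beta1].

    Uses d/d(beta) log Z(beta) = E_beta[S], so the integral equals
    log Z(beta1) - log Z(beta0).
    """
    h = (beta1 - beta0) / steps
    means = [log_z_and_mean_stat(alpha, beta0 + k * h, stats)[1]
             for k in range(steps + 1)]
    return h * (sum(means) - 0.5 * (means[0] + means[-1]))
```

On a 3 x 3 lattice the integrated estimate can be compared directly with the exact difference of log normalization constants, which is the check that becomes impossible at realistic lattice sizes and motivates Monte Carlo estimation of the integrand.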
Abstract:
Long-term systematic population monitoring data sets are rare but are essential in identifying changes in species abundance. In contrast, community groups and natural history organizations have collected many species lists. These represent a large, untapped source of information on changes in abundance but are generally considered of little value. The major problem with using species lists to detect population changes is that the amount of effort used to obtain the list is often uncontrolled and usually unknown. It has been suggested that using the number of species on the list, the "list length," can be a measure of effort. This paper significantly extends the utility of Franklin's approach using Bayesian logistic regression. We demonstrate the value of List Length Analysis to model changes in species prevalence (i.e., the proportion of lists on which the species occurs) using bird lists collected by a local bird club over 40 years around Brisbane, southeast Queensland, Australia. We estimate the magnitude and certainty of change for 269 bird species and calculate the probabilities that there have been declines and increases of given magnitudes. List Length Analysis confirmed suspected species declines and increases. This method is an important complement to systematically designed intensive monitoring schemes and provides a means of utilizing data that may otherwise be deemed useless. The results of List Length Analysis can be used for targeting species of conservation concern for listing purposes or for more intensive monitoring. While Bayesian methods are not essential for List Length Analysis, they can offer more flexibility in interrogating the data and are able to provide a range of parameters that are easy to interpret and can facilitate conservation listing and prioritization. © 2010 by the Ecological Society of America.
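A rough sketch of how list length can enter a Bayesian logistic regression for one species is given below. The model form is an assumption for illustration, not the specification used in the paper: the log-odds of a species appearing on a list is taken to be linear in centred log list length and in scaled year, with vague normal priors, and the posterior is sampled with a basic random-walk Metropolis algorithm. The names `prepare`, `log_posterior` and `metropolis` are hypothetical.

```python
import math
import random


def prepare(records):
    """Convert (present, list_length, year) records to centred covariates."""
    logs = [math.log(length) for _, length, _ in records]
    years = [year for _, _, year in records]
    mean_log = sum(logs) / len(logs)
    mid = (max(years) + min(years)) / 2
    half = (max(years) - min(years)) / 2 or 1.0
    return [(present, lg - mean_log, (year - mid) / half)
            for (present, _, year), lg in zip(records, logs)]


def log_posterior(params, data, prior_sd=10.0):
    """Log posterior up to a constant; normal(0, prior_sd) priors are assumed."""
    b0, b_len, b_year = params
    lp = -sum(p * p for p in params) / (2.0 * prior_sd ** 2)
    for present, log_len, year in data:
        eta = b0 + b_len * log_len + b_year * year
        lp += present * eta - math.log1p(math.exp(eta))
    return lp


def metropolis(data, n_iter=4000, step=0.1, seed=1):
    """Random-walk Metropolis; returns samples after discarding the first quarter."""
    rng = random.Random(seed)
    current = [0.0, 0.0, 0.0]
    current_lp = log_posterior(current, data)
    samples = []
    for _ in range(n_iter):
        proposal = [p + rng.gauss(0.0, step) for p in current]
        proposal_lp = log_posterior(proposal, data)
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples[n_iter // 4:]
```

The probability of a decline can then be read off as the share of posterior samples with a negative year coefficient, for example `sum(s[2] < 0 for s in samples) / len(samples)`, which corresponds to the kind of probability statement the abstract describes.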
Abstract:
This paper describes techniques to estimate the worst case execution time of executable code on architectures with data caches. The underlying mechanism is Abstract Interpretation, which is used for the dual purposes of tracking address computations and cache behavior. A simultaneous numeric and pointer analysis using an abstraction for discrete sets of values computes safe approximations of access addresses which are then used to predict cache behavior using Must Analysis. A heuristic is also proposed which generates likely worst case estimates. It can be used in soft real time systems and also for reasoning about the tightness of the safe estimate. The analysis methods can handle programs with non-affine access patterns, for which conventional Presburger Arithmetic formulations or Cache Miss Equations do not apply. The precision of the estimates is user-controlled and can be traded off against analysis time. Executables are analyzed directly, which, apart from enhancing precision, renders the method language independent.
Abstract:
1. Little consensus has been reached as to general features of spatial variation in beta diversity, a fundamental component of species diversity. This could reflect a genuine lack of simple gradients in beta diversity, or a lack of agreement as to just what constitutes beta diversity. Unfortunately, a large number of approaches have been applied to the investigation of variation in beta diversity, which potentially makes comparisons of the findings difficult.
2. We review 24 measures of beta diversity for presence/absence data (the most frequent form of data to which such measures are applied) that have been employed in the literature, express many of them for the first time in common terms, and compare some of their basic properties.
3. Four groups of measures are distinguished, with a fundamental distinction arising between 'broad sense' measures incorporating differences in composition attributable to species richness gradients, and 'narrow sense' measures that focus on compositional differences independent of such gradients. On a number of occasions on which the former have been employed in the literature the latter may have been more appropriate, and there are many situations in which consideration of both kinds of measures would be valuable.
4. We particularly recommend (i) considering beta diversity measures in terms of matching/mismatching components (usually denoted a, b and c) and thereby identifying the contribution of different sources of variation in species composition, and (ii) the use of ternary plots to express the relationship between the values of these measures and of the components, and as a way of understanding patterns in beta diversity.
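The matching/mismatching components recommended in point 4 can be computed directly from two species lists. The sketch below is illustrative only: it assumes the common convention that a is the number of shared species, b the number found only in the second site and c the number found only in the first, and it shows three widely used measures expressed in those terms (Jaccard and Sørensen dissimilarity as 'broad sense' examples, Simpson-type dissimilarity as a 'narrow sense' example). Function names are hypothetical.

```python
def components(site1, site2):
    """Return (a, b, c): shared species, only in site2, only in site1."""
    s1, s2 = set(site1), set(site2)
    return len(s1 & s2), len(s2 - s1), len(s1 - s2)


def jaccard_dissimilarity(a, b, c):
    return (b + c) / (a + b + c)


def sorensen_dissimilarity(a, b, c):
    return (b + c) / (2 * a + b + c)


def simpson_dissimilarity(a, b, c):
    """Insensitive to richness differences: uses only the smaller of b and c."""
    return min(b, c) / (min(b, c) + a)


def ternary_coordinates(a, b, c):
    """Proportions (a', b', c') that place a site pair on a ternary plot."""
    total = a + b + c
    return a / total, b / total, c / total
```

For example, `components({"A", "B", "C", "D"}, {"C", "D", "E"})` gives a = 2, b = 1 and c = 2; all the functions above assume the two sites contain at least one species between them.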
Abstract:
We consider the impact of data revisions on the forecast performance of a SETAR regime-switching model of U.S. output growth. The impact of data uncertainty in real-time forecasting will affect a model's forecast performance via the effect on the model parameter estimates as well as via the forecast being conditioned on data measured with error. We find that benchmark revisions do affect the performance of the non-linear model of the growth rate, and that the performance relative to a linear comparator deteriorates in real-time compared to a pseudo out-of-sample forecasting exercise.
Abstract:
We examine how the accuracy of real-time forecasts from models that include autoregressive terms can be improved by estimating the models on ‘lightly revised’ data instead of using data from the latest-available vintage. The benefits of estimating autoregressive models on lightly revised data are related to the nature of the data revision process and the underlying process for the true values. Empirically, we find improvements in root mean square forecasting error of 2–4% when forecasting output growth and inflation with univariate models, and of 8% with multivariate models. We show that multiple-vintage models, which explicitly model data revisions, require large estimation samples to deliver competitive forecasts. Copyright © 2012 John Wiley & Sons, Ltd.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Every structure vibrates at characteristic frequencies defined by its modal parameters (natural frequencies, damping ratios and mode shapes). These modal parameters can be estimated from vibration measurements at key points of the structure. Civil engineering structures are difficult to excite in a controlled manner, so techniques that estimate the modal parameters from the recorded response alone (output-only techniques) are of vital importance for this type of structure. This approach is known as Operational Modal Analysis (OMA). OMA does not require artificial excitation of the structure; it relies solely on the structure's in-service behavior. The motivation for OMA testing arises in Civil Engineering because artificially exciting large structures successfully is not only difficult and expensive but may even damage the structure. Its importance lies in the fact that the global behavior of a structure is directly related to its modal parameters, and any variation in stiffness, mass or support conditions, even a local one, is reflected in the modal parameters. This identification can therefore be integrated into a Structural Health Monitoring system. The main difficulty in using modal parameters estimated by OMA is the uncertainty associated with the estimation process. The values of the modal parameters carry uncertainties associated with the computation process (internal) and with the influence of environmental factors (external), such as temperature. This Master's Thesis analyzes these two sources of uncertainty. First, for a laboratory structure, the uncertainties associated with the OMA program used are studied and quantified. Second, for an in-service structure (a stress-ribbon footbridge), both the effect of the OMA program and the influence of environmental factors on the estimation of the modal parameters are studied. More specifically, a method is proposed to track the natural frequencies of the same mode. This method includes a multiple linear regression model that removes the influence of these external agents.
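The idea of using multiple linear regression to remove environmental influence from a tracked natural frequency can be sketched as follows. This is an illustrative least-squares fit, not the thesis's method: the abstract names temperature as one external factor, while any further regressor (humidity in the usage note) is hypothetical, as are the function names.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def remove_environmental_effect(freqs, factors):
    """Fit freq ~ environmental factors and return (coefficients, corrected freqs).

    factors holds one row per observation with k environmental readings.
    Predictors are centred, so corrected frequencies stay on the original scale.
    """
    n, k = len(freqs), len(factors[0])
    means = [sum(row[j] for row in factors) / n for j in range(k)]
    centred = [[row[j] - means[j] for j in range(k)] for row in factors]
    f_mean = sum(freqs) / n
    xtx = [[sum(r[i] * r[j] for r in centred) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * (f - f_mean) for r, f in zip(centred, freqs)) for i in range(k)]
    coefs = solve(xtx, xty)
    corrected = [f - sum(c * x for c, x in zip(coefs, r))
                 for f, r in zip(freqs, centred)]
    return coefs, corrected
```

A call such as `remove_environmental_effect(freqs, [[temp, humidity] for temp, humidity in readings])` returns one slope per factor plus a frequency series with the fitted environmental trend subtracted, so that remaining shifts are easier to attribute to structural change or estimation uncertainty.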
Abstract:
Doctoral thesis in Marine Sciences, specializing in Marine Ecology.
Abstract:
Documenting changes in distribution is necessary for understanding species' response to environmental changes, but data on species distributions are heterogeneous in accuracy and resolution. Combining different data sources and methodological approaches can fill gaps in knowledge about the dynamic processes driving changes in species-rich, but data-poor regions. We combined recent bird survey data from the Neotropical Biodiversity Mapping Initiative (NeoMaps) with historical distribution records to estimate potential changes in the distribution of eight species of Amazon parrots in Venezuela. Using environmental covariates and presence-only data from museum collections and the literature, we first used maximum likelihood to fit a species distribution model (SDM) estimating a historical maximum probability of occurrence for each species. We then used recent NeoMaps survey data to build single-season occupancy models (OM) with the same environmental covariates, as well as with time- and effort-dependent detectability, resulting in estimates of the current probability of occurrence. We finally calculated the disagreement between predictions as a matrix of probability of change in the state of occurrence. Our results suggested negative changes for the only restricted, threatened species, Amazona barbadensis, a finding that has been independently confirmed by field studies. Two of the three remaining widespread species that were detected, Amazona amazonica and Amazona ochrocephala, also had a high probability of negative changes in northern Venezuela, but results were not conclusive for Amazona farinosa. The four remaining species were undetected in recent field surveys; three of these were most probably absent from the survey locations (Amazona autumnalis, Amazona mercenaria and Amazona festiva), while a fourth (Amazona dufresniana) requires more intensive targeted sampling to estimate its current status.
Our approach is unique in taking full advantage of available, but limited data, and in detecting a high probability of change even for rare and patchily-distributed species. However, it is presently limited to species meeting the strong assumptions required for maximum-likelihood estimation with presence-only data, including very high detectability and representative sampling of its historical distribution.
Abstract:
Fragilariopsis kerguelensis, a dominant diatom species throughout the Antarctic Circumpolar Current, is considered to be one of the main drivers of the biological silicate pump. Here, we study the distribution of this important species and expected consequences of climate change upon it, using correlative species distribution modeling and publicly available presence-only data. As experience with SDM is scarce for marine phytoplankton, this also serves as a pilot study for this organism group. We used the maximum entropy method to calculate distribution models for the diatom F. kerguelensis based on yearly and monthly environmental data (sea surface temperature, salinity, nitrate and silicate concentrations). Observation data were harvested from GBIF and the Global Diatom Database, and for further analyses also from the Hustedt Diatom Collection (BRM). The models were projected on current yearly and seasonal environmental data to study current distribution and its seasonality. Furthermore, we projected the seasonal model on future environmental data obtained from climate models for the year 2100. Projected on current yearly averaged environmental data, all models showed similar distribution patterns for F. kerguelensis. The monthly model showed seasonality, for example, a shift of the southern distribution boundary toward the north in the winter. Projections on future scenarios resulted in a moderately to negligibly shrinking distribution area and a change in seasonality. We found a substantial bias in the publicly available observation datasets, which could be reduced by additional observation records we obtained from the Hustedt Diatom Collection. Present-day distribution patterns inferred from the models coincided well with background knowledge and previous reports about F. kerguelensis distribution, showing that maximum entropy-based distribution models are suitable to map distribution patterns for oceanic planktonic organisms.
Our scenario projections indicate moderate effects of climate change upon the biogeography of F. kerguelensis.
Abstract:
Effective detection of population trend is crucial for managing threatened species. Little theory exists, however, to assist managers in choosing the most cost-effective monitoring techniques for diagnosing trend. We present a framework for determining the optimal monitoring strategy by simulating a manager collecting data on a declining species, the Chestnut-rumped Hylacola (Hylacola pyrrhopygia parkeri), to determine whether the species should be listed under the IUCN (World Conservation Union) Red List. We compared the efficiencies of two strategies for detecting trend, abundance surveys and presence-absence surveys, under financial constraints. One might expect the abundance surveys to be superior under all circumstances because more information is collected at each site. Nevertheless, the presence-absence data can be collected at more sites because the surveyor is not obliged to spend a fixed amount of time at each site. The optimal strategy for monitoring was very dependent on the budget available. Under some circumstances, presence-absence surveys outperformed abundance surveys for diagnosing the IUCN Red List categories cost-effectively. Abundance surveys were best if the species was expected to be recorded more than 16 times/year; otherwise, presence-absence surveys were best. The relationship between the strategies we investigated is likely to be relevant for many comparisons of presence-absence or abundance data. Managers of any cryptic or low-density species who hope to maximize their success of estimating trend should find an application for our results.
Abstract:
We examined the taxonomic resolution of zooplankton data required to identify ocean basin scale biogeographic zonation in the Southern Ocean. A 2,154 km transect was completed south of Australia. Sea surface temperature (SST) measured at 1 min intervals showed that seven physical zones were sampled. Zooplankton were collected at a spatial resolution of approximately 9.2 km with a continuous plankton recorder, identified to the highest possible taxonomic resolution and enumerated. Zooplankton assemblage similarity between samples was calculated using the Bray-Curtis index for the taxonomic levels of species, genus, family, order and class after first log10(x + 1) (LA) and then presence/absence (PA) transformation of abundance data. Although within and between zone sample similarity increased with decreasing taxonomic resolution, for both data transformations, cluster analysis demonstrated that the biogeographic separation of zones remained at all taxonomic levels when using LA data. ANOSIM confirmed this, detecting significant differences in zooplankton assemblage structure between all seven a priori determined physical zones for all taxonomic levels when using the LA data. In the case of the PA data for the complete data set, and both LA and PA data for a crustacean only data set, no significant differences were detected between zooplankton assemblages in the Polar frontal zone (PFZ) and inter-PFZ at any taxonomic level. Loss of information at resolutions below the species level, particularly in the PA data, prevented the separation of some zones. However, the majority of physical zones were biogeographically distinct from species level to class using both LA and PA transformations. Significant relationships between SST and zooplankton community structure, summarised as NMDS scores, at all taxonomic levels, for both LA and PA transformations, and complete and crustacean only data sets, highlighted the biogeographic relevance of low resolution taxonomic data.
The retention of biogeographic information in low taxonomic resolution data shows that data sets collected with different taxonomic resolutions may be meaningfully merged for the post hoc generation of Southern Ocean time series.
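The similarity calculation described in this abstract (Bray-Curtis similarity after log10(x + 1) or presence/absence transformation, at successively coarser taxonomic levels) can be sketched as below. The taxon labels and the species-to-genus mapping in the usage note are made up for illustration, the function names are hypothetical, and treating two empty samples as identical is an assumed convention.

```python
import math


def aggregate(abundances, taxon_of):
    """Sum species abundances into a coarser taxonomic level."""
    totals = {}
    for species, count in abundances.items():
        taxon = taxon_of[species]
        totals[taxon] = totals.get(taxon, 0) + count
    return totals


def log_transform(abundances):
    """log10(x + 1) transformation (LA in the abstract)."""
    return {k: math.log10(v + 1) for k, v in abundances.items()}


def pa_transform(abundances):
    """Presence/absence transformation (PA in the abstract)."""
    return {k: 1 if v > 0 else 0 for k, v in abundances.items()}


def bray_curtis_similarity(x, y):
    """1 - sum|x_i - y_i| / sum(x_i + y_i) over the union of taxa."""
    taxa = set(x) | set(y)
    diff = sum(abs(x.get(t, 0) - y.get(t, 0)) for t in taxa)
    total = sum(x.get(t, 0) + y.get(t, 0) for t in taxa)
    if total == 0:
        return 1.0  # assumed convention: two empty samples are identical
    return 1.0 - diff / total
```

As a small illustration, take samples `{"sp_a": 10, "sp_b": 0, "sp_c": 5}` and `{"sp_a": 0, "sp_b": 4, "sp_c": 5}` with `sp_a` and `sp_b` assigned to the same genus: the presence/absence similarity rises once the species are aggregated to genus, which is the pattern of increasing similarity with decreasing taxonomic resolution that the abstract reports.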