886 results for Sampling bias
Abstract:
Monthly zonal mean climatologies of atmospheric measurements from satellite instruments can have biases due to the nonuniform sampling of the atmosphere by the instruments. We characterize potential sampling biases in stratospheric trace gas climatologies of the Stratospheric Processes and Their Role in Climate (SPARC) Data Initiative using chemical fields from a chemistry climate model simulation and sampling patterns from 16 satellite-borne instruments. The exercise is performed for the long-lived stratospheric trace gases O3 and H2O. Monthly sampling biases for O3 exceed 10% for many instruments in the high-latitude stratosphere and in the upper troposphere/lower stratosphere, while annual mean sampling biases reach values of up to 20% in the same regions for some instruments. Sampling biases for H2O are generally smaller than for O3, although still notable in the upper troposphere/lower stratosphere and Southern Hemisphere high latitudes. The most important mechanism leading to monthly sampling bias is nonuniform temporal sampling, i.e., the fact that for many instruments, monthly means are produced from measurements which span less than the full month in question. Similarly, annual mean sampling biases are well explained by nonuniformity in the month-to-month sampling by different instruments. Nonuniform sampling in latitude and longitude is also shown to lead to nonnegligible sampling biases, which are most relevant for climatologies that are otherwise free of biases due to nonuniform temporal sampling.
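To make the central quantity concrete (the relative difference between a monthly mean built only from the days an instrument actually samples and the mean of the fully sampled field), here is a minimal Python sketch. The synthetic daily series, the 30-day month, and the percent-bias definition are assumptions for illustration, not the SPARC Data Initiative procedure.

```python
def sampling_bias_percent(daily_values, sampled_days):
    """Relative bias (%) of a monthly mean computed only on `sampled_days`.

    daily_values: the 'true' field for every day of the month (e.g. taken
                  from a model simulation).
    sampled_days: indices of the days the instrument actually observed.
    """
    true_mean = sum(daily_values) / len(daily_values)
    sampled = [daily_values[d] for d in sampled_days]
    sampled_mean = sum(sampled) / len(sampled)
    return 100.0 * (sampled_mean - true_mean) / true_mean


# Hypothetical series: a steady linear increase over a 30-day month.
daily = [5.0 + 0.1 * d for d in range(30)]

# Full temporal coverage reproduces the true mean, so the bias is zero.
print(sampling_bias_percent(daily, range(30)))            # 0.0
# Sampling only the first 10 days misses the later, higher values.
print(round(sampling_bias_percent(daily, range(10)), 2))  # -15.5
```

Any trend within the month is enough for partial temporal coverage alone to produce a bias, which is the mechanism the abstract identifies as most important for monthly means.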
Abstract:
In this paper we determine the extent to which host-mediated mutations and a known sampling bias affect evolutionary studies of human influenza A. Previous phylogenetic reconstruction of influenza A (H3N2) evolution using the hemagglutinin gene revealed an excess of nonsilent substitutions assigned to the terminal branches of the tree. We investigate two hypotheses to explain this observation. The first hypothesis is that the excess reflects mutations that were either not present or were at low frequency in the viral sample isolated from its human host, and that these mutations increased in frequency during passage of the virus in embryonated eggs. A set of 22 codons known to undergo such “host-mediated” mutations showed a significant excess of mutations assigned to branches attaching sequences from egg-cultured (as opposed to cell-cultured) isolates to the tree. Our second hypothesis is that the remaining excess results from sampling bias. Influenza surveillance is purposefully biased toward sequencing antigenically dissimilar strains in an effort to identify new variants that may signal the need to update the vaccine. This bias produces an excess of mutations assigned to terminal branches simply because an isolate with no close relatives is by definition attached to the tree by a relatively long branch. Simulations show that the magnitude of excess mutations we observed in the hemagglutinin tree is consistent with expectations based on our sampling protocol. Sampling bias does not affect inferences about evolution drawn from phylogenetic analyses. However, if possible, the excess caused by host-mediated mutations should be removed from studies of the evolution of influenza viruses as they replicate in their human hosts.
Abstract:
Several deterministic and probabilistic methods are used to evaluate the probability of seismically induced liquefaction of a soil. Probabilistic models usually carry uncertainty both in the model itself and in the parameters used to develop it. These model uncertainties vary from one statistical model to another. Most of them are epistemic and can be addressed through appropriate knowledge of the statistical model. One such epistemic uncertainty, when liquefaction potential is evaluated with a probabilistic model such as logistic regression, is sampling bias: the difference between the class distribution in the sample used to develop the statistical model and the true population distribution of liquefaction and non-liquefaction instances. Recent studies have shown that sampling bias can significantly affect the probability predicted by a statistical model. To address this epistemic uncertainty, a new approach was developed for evaluating the probability of seismically induced soil liquefaction, in which a logistic regression model was used in combination with the Hosmer-Lemeshow statistic. This approach was used to estimate the population (true) distribution of liquefaction to non-liquefaction instances in the most up-to-date case histories based on the standard penetration test (SPT) and the cone penetration test (CPT). Other model uncertainties, such as the distribution of the explanatory variables and their significance, were also addressed using the Kolmogorov-Smirnov (KS) test and the Wald statistic, respectively. Moreover, based on the estimated population distribution, logistic regression equations were proposed to calculate the probability of liquefaction for both SPT- and CPT-based case histories. Finally, the proposed probability curves were compared with existing probability curves based on SPT and CPT case histories.
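The effect of a mismatch between the liquefaction fraction in the case-history sample and in the population can be illustrated with the standard "prior correction" for logistic regression fitted to a choice-based sample: the slope is left unchanged and the intercept is shifted by the log of the ratio of the sample odds to the population odds. This is a swapped-in textbook technique, not the Hosmer-Lemeshow-based procedure of the paper, and the coefficients and class fractions below are hypothetical.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def prior_corrected_intercept(b0, sample_frac, population_frac):
    """Adjust a fitted intercept for class-distribution sampling bias.

    b0: intercept fitted on the (biased) sample.
    sample_frac: fraction of liquefaction cases in the sample.
    population_frac: assumed fraction of liquefaction cases in the population.
    """
    sample_odds = sample_frac / (1.0 - sample_frac)
    population_odds = population_frac / (1.0 - population_frac)
    return b0 - math.log(sample_odds / population_odds)


# Hypothetical model fitted on a 50/50 sample: logit(p) = b0 + b1 * x.
b0, b1 = 0.0, 0.5
x = 0.0  # hypothetical standardized explanatory variable at its reference value

p_sample = logistic(b0 + b1 * x)
b0_corr = prior_corrected_intercept(b0, sample_frac=0.5, population_frac=0.2)
p_population = logistic(b0_corr + b1 * x)
print(p_sample, round(p_population, 3))  # 0.5 0.2
```

When the sample over-represents liquefaction cases (sample_frac > population_frac), every uncorrected prediction is shifted upward by the same amount on the logit scale, which is why an estimate of the population ratio matters.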
Abstract:
We consider estimating the total load from frequent flow data but less frequent concentration data. There are numerous load estimation methods available, some of which are captured in various online tools. However, most estimators are subject to large statistical biases, and their associated uncertainties are often not reported. This makes interpretation difficult and makes it impossible to assess trends or to determine optimal sampling regimes. In this paper, we first propose two indices for measuring the extent of sampling bias, and then provide steps for obtaining reliable load estimates that minimize the biases and make use of informative predictive variables. The key step in this approach is the development of an appropriate predictive model for concentration. This is achieved using a generalized rating-curve approach with additional predictors that capture unique features in the flow data, such as the concept of the first flush, the location of the event on the hydrograph (e.g. rise or fall) and the discounted flow. The latter may be thought of as a measure of constituent exhaustion occurring during flood events. Incorporating this additional information can significantly improve the predictability of concentration, and ultimately the precision with which the pollutant load is estimated. We also provide a measure of the standard error of the load estimate which incorporates model, spatial and/or temporal errors. This method also has the capacity to incorporate measurement error incurred through the sampling of flow. We illustrate this approach for two rivers delivering to the Great Barrier Reef, Queensland, Australia. One is a data set from the Burdekin River and consists of total suspended sediment (TSS), nitrogen oxide (NOx) and gauged flow data for 1997. The other dataset is from the Tully River, for the period of July 2000 to June 2008.
For the Burdekin NOx data, the new estimates are very similar to the ratio estimates even when there is no relationship between concentration and flow. For the Tully dataset, however, incorporating the additional predictive variables, namely the discounted flow and the flow phase (rising or receding), substantially improved the model fit, and thus the certainty with which the load is estimated.
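The two hydrograph predictors mentioned above can be sketched as simple transforms of a regularly spaced flow series. The exponential-discounting form of "discounted flow", the discount factor `delta`, and the rise/fall rule are illustrative assumptions; the abstract does not give the exact definitions used in the paper.

```python
def discounted_flow(flows, delta):
    """Assumed form: df[t] = q[t] + delta * df[t-1], with 0 <= delta < 1.

    Current flow counts fully and earlier flow is down-weighted geometrically,
    a rough proxy for how much of a constituent may already have been exhausted.
    """
    out, acc = [], 0.0
    for q in flows:
        acc = q + delta * acc
        out.append(acc)
    return out


def flow_phase(flows):
    """Label each step 'rise' or 'fall' relative to the previous step.

    The first step has no predecessor and is labelled None; unchanged flow
    is treated as 'fall' (recession), an arbitrary convention.
    """
    phases = [None]
    for prev, cur in zip(flows, flows[1:]):
        phases.append("rise" if cur > prev else "fall")
    return phases


flows = [10.0, 40.0, 80.0, 60.0, 30.0]   # hypothetical flood hydrograph (m^3/s)
print(discounted_flow(flows, delta=0.5))  # [10.0, 45.0, 102.5, 111.25, 85.625]
print(flow_phase(flows))                  # [None, 'rise', 'rise', 'fall', 'fall']
```

Both series line up with the flow record, so they can be evaluated at the concentration sampling times and added as extra regressors in a rating-curve model.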
Abstract:
Considerable attention has been focused on the properties of graphs derived from Internet measurements. Router-level topologies collected via traceroute studies have led some authors to conclude that the router graph of the Internet is a scale-free graph, or more generally a power-law random graph. In such a graph, the node degree distribution has a power-law tail. In this paper we argue that the evidence to date for this conclusion is at best insufficient. We show that graphs appearing to have power-law degree distributions can arise surprisingly easily when sampling graphs whose true degree distribution is not at all like a power law. For example, given a classical sparse Erdős-Rényi random graph, the subgraph formed by a collection of shortest paths from a small set of random sources to a larger set of random destinations can easily appear to show a degree distribution remarkably like a power law. We explore why this effect arises, and show that in such a setting, edges are sampled in a highly biased manner. This insight allows us to distinguish measurements taken from Erdős-Rényi graphs from those taken from power-law random graphs. When we apply this distinction to a number of well-known datasets, we find that the evidence for sampling bias in these datasets is strong.
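The sampling effect described above can be reproduced in miniature: build a sparse Erdős-Rényi graph, collect the union of shortest paths from a few sources to many destinations (a crude stand-in for traceroute), and compare degree histograms. This pure-Python sketch is illustrative only; the graph size, edge probability, source and destination counts, and the tie-breaking rule for equal-length paths are all assumptions, and it does not implement the paper's test for distinguishing the two graph families.

```python
import random
from collections import Counter, deque


def erdos_renyi(n, p, rng):
    """Sparse G(n, p) random graph as an adjacency dict of sets."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def shortest_path(adj, src, dst):
    """One BFS shortest path from src to dst, or None if dst is unreachable.

    Neighbours are visited in sorted order, so ties between equal-length
    paths are broken deterministically (an arbitrary choice).
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if dst not in parent:
        return None
    path = [dst]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def traceroute_sample(adj, sources, destinations):
    """Union of the edges on one shortest path for every (source, destination) pair."""
    edges = set()
    for s in sources:
        for d in destinations:
            path = shortest_path(adj, s, d)
            if path:
                for a, b in zip(path, path[1:]):
                    edges.add((min(a, b), max(a, b)))
    return edges


def degree_histogram(edges):
    """Map degree -> number of nodes with that degree in the sampled subgraph."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return Counter(deg.values())


rng = random.Random(42)
graph = erdos_renyi(500, 0.02, rng)
sources = rng.sample(list(graph), 3)
destinations = rng.sample(list(graph), 100)
sampled = traceroute_sample(graph, sources, destinations)

true_hist = Counter(len(nbrs) for nbrs in graph.values())
print("true:   ", sorted(true_hist.items()))
print("sampled:", sorted(degree_histogram(sampled).items()))
```

The sampled histogram is typically skewed toward degree 1 and 2 with a few high-degree nodes near the sources, unlike the narrow spread of the full graph; how closely it resembles a power law depends on the assumed parameters.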
Abstract:
Roadside surveys such as the Breeding Bird Survey (BBS) are widely used to assess the relative abundance of bird populations. The accuracy of roadside surveys depends on the extent to which surveys from roads represent the entire region under study. We quantified roadside land cover sampling bias in Tennessee, USA, by comparing land cover proportions near roads to proportions of the surrounding region. Roadside surveys gave a biased estimate of patterns across the region because some land cover types were over- or underrepresented near roads. These biases changed over time, introducing varying levels of distortion into the data. We constructed simulated population trends for five bird species of management interest based on these measured roadside sampling biases and on field data on bird abundance. These simulations indicated that roadside surveys may give overly negative assessments of the population trends of early successional birds and of synanthropic birds, but not of late-successional birds. Because roadside surveys are the primary source of avian population trend information in North America, we conclude that these surveys should be corrected for roadside land cover sampling bias. In addition, current recommendations about the need to create more early successional habitat for birds may need reassessment in the light of the undersampling of this habitat by roads.
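The comparison of land-cover proportions near roads with those across the region reduces to a per-class representation ratio. One simple way to see how such a ratio affects an abundance index, offered here as an assumption, is to weight per-class mean counts by roadside versus regional proportions. All class names, proportions, and counts below are hypothetical, and this post-stratification step is an illustration rather than the correction the authors propose.

```python
def representation_ratios(roadside, regional):
    """Roadside share / regional share per land-cover class.

    > 1 means the class is over-represented near roads, < 1 under-represented.
    """
    return {cls: roadside[cls] / regional[cls] for cls in regional if regional[cls] > 0}


def weighted_index(mean_count_by_class, proportions):
    """Abundance index as a proportion-weighted average of per-class mean counts."""
    return sum(proportions[cls] * mean_count_by_class[cls] for cls in proportions)


# Hypothetical land-cover shares (each set sums to 1) and mean bird counts per stop.
roadside = {"forest": 0.5, "early_successional": 0.1, "developed": 0.4}
regional = {"forest": 0.6, "early_successional": 0.3, "developed": 0.1}
counts = {"forest": 2.0, "early_successional": 6.0, "developed": 1.0}

print(representation_ratios(roadside, regional))
print(weighted_index(counts, roadside), weighted_index(counts, regional))
```

In this made-up case the roadside-weighted index (about 2.0) falls below the regional one (about 3.1) because early-successional cover, where the hypothetical species is most abundant, is under-represented near roads.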
Abstract:
Most empirical disciplines promote the reuse and sharing of datasets, as this makes replication more feasible. While this is increasingly the case in Empirical Software Engineering (ESE), some of the most popular bug-fix datasets are now known to be biased. This raises two significant concerns: first, that sampling bias may lead to underperforming prediction models, and second, that the external validity of studies based on biased datasets may be suspect. This issue has caused considerable consternation in the ESE literature in recent years. However, one confounding factor of these datasets has not been examined carefully: size. Biased datasets sample only some of the data that could be sampled, and do so in a biased fashion; but biased samples could be smaller or larger. Smaller datasets generally provide a less reliable basis for estimating models, and thus could lead to inferior model performance. In this setting, we ask: what affects performance more, bias or size? We conduct a detailed, large-scale meta-analysis using simulated datasets sampled with bias from a high-quality dataset that is relatively free of bias. Our results suggest that size always matters just as much as bias direction, and in fact much more than bias direction when considering information-retrieval measures such as AUC and F-score. This indicates that, at least for prediction models, even when dealing with sampling bias, simply finding larger samples can sometimes be sufficient. Our analysis also exposes the complexity of the bias issue and raises further issues to be explored in future work.
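One way to set up the kind of experiment described here, varying bias direction and sample size independently when subsampling a cleaner dataset, is weighted sampling without replacement. The record layout, the `buggy` field, and the weight function are hypothetical; the abstract does not specify the paper's actual sampling scheme.

```python
import random


def biased_sample(records, size, weight, rng):
    """Draw up to `size` records without replacement.

    Each draw picks a remaining record with probability proportional to
    weight(record), so the weight function sets the bias direction and
    `size` sets the sample size independently.
    """
    pool = list(records)
    chosen = []
    for _ in range(min(size, len(pool))):
        weights = [weight(r) for r in pool]
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx))
    return chosen


def favour_bug_fixes(record):
    """Hypothetical bias: bug-fix records are three times as likely to be drawn."""
    return 3.0 if record["buggy"] else 1.0


# Hypothetical population in which 20% of records are bug fixes.
population = [{"id": i, "buggy": i % 5 == 0} for i in range(1000)]

rng = random.Random(7)
for n in (50, 500):
    sample = biased_sample(population, n, favour_bug_fixes, rng)
    share = sum(r["buggy"] for r in sample) / len(sample)
    print(n, round(share, 2))
```

The printed bug-fix shares should typically exceed the 20% population rate. Because sampling is without replacement, the bias necessarily weakens as the sample size approaches the population size, which is one reason size and bias are entangled in such experiments.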
Abstract:
The life history strategies of massive Porites corals make them a valuable resource not only as key providers of reef structure, but also as recorders of past environmental change. Yet recent documented evidence of an unprecedented increase in the frequency of mortality in Porites warrants investigation into the history of mortality and associated drivers. To achieve this, both an accurate chronology and an understanding of the life history strategies of Porites are necessary. Sixty-two individual Uranium–Thorium (U–Th) dates from 50 dead massive Porites colonies from the central inshore region of the Great Barrier Reef (GBR) revealed that mortality occurred predominantly in two main periods, from 1989.2 ± 4.1 to 2001.4 ± 4.1 and from 2006.4 ± 1.8 to 2008.4 ± 2.2 A.D., with a small number of colonies dating earlier. Overall, the peak ages of mortality are significantly correlated with maximum sea-surface temperature anomalies. Despite potential sampling bias, the frequency of mortality increased dramatically after 1980. These observations are similar to the results reported for the southern South China Sea. High-resolution measurements of Sr/Ca and Mg/Ca obtained from a well-preserved sample that died in 1994.6 ± 2.3 revealed that death occurred at the peak of sea surface temperatures (SST) during the austral summer. In contrast, Sr/Ca and Mg/Ca analyses of two colonies dated to 2006.9 ± 3.0 and 2008.3 ± 2.0 suggest that both died after the austral winter. An increase in Sr/Ca ratios and the presence of low Mg-calcite cements (as determined by SEM and elemental ratio analysis) in one of the colonies was attributed to stressful conditions that may have persisted for some time prior to mortality.
For both colonies, however, the timing of mortality coincides with the 4th and 6th largest flood events reported for the Burdekin River in the past 60 years, implying that factors associated with terrestrial runoff may have been responsible for mortality. Our results show that a combination of U–Th and elemental ratio geochemistry can potentially be used to precisely and accurately determine the timing and season of mortality in modern massive Porites corals. For reefs where long-term monitoring data are absent, the ability to reconstruct historical events in coral communities may prove useful to reef managers by providing some baseline knowledge on disturbance history and associated drivers.
Abstract:
There are numerous load estimation methods available, some of which are captured in various online tools. However, most estimators are subject to large statistical biases, and their associated uncertainties are often not reported. This makes interpretation difficult and makes it impossible to assess trends or to determine optimal sampling regimes. In this paper, we first propose two indices for measuring the extent of sampling bias, and then provide steps for obtaining reliable load estimates by minimizing the biases and making use of possible predictive variables. The load estimation procedure can be summarized in the following four steps:
- (i) output the flow rates at regular time intervals (e.g. 10 minutes) using a time series model that captures all the peak flows;
- (ii) output the predicted flow rates as in (i) at the concentration sampling times, if the corresponding flow rates were not collected;
- (iii) establish a predictive model for the concentration data that incorporates all possible predictor variables, and output the predicted concentrations at the regular time intervals as in (i); and
- (iv) obtain the sum of all the products of the predicted flow and the predicted concentration over the regular time intervals to represent an estimate of the load.
The key step in this approach is the development of an appropriate predictive model for concentration. This is achieved using a generalized regression (rating-curve) approach with additional predictors that capture unique features in the flow data, namely the concept of the first flush, the location of the event on the hydrograph (e.g. rise or fall) and cumulative discounted flow. The latter may be thought of as a measure of constituent exhaustion occurring during flood events. The model can also accommodate autocorrelation in the model errors, which results from intensive sampling during floods.
Incorporating this additional information can significantly improve the predictability of concentration, and ultimately the precision with which the pollutant load is estimated. We also provide a measure of the standard error of the load estimate which incorporates model, spatial and/or temporal errors. This method also has the capacity to incorporate measurement error incurred through the sampling of flow. We illustrate this approach using the concentrations of total suspended sediment (TSS) and nitrogen oxide (NOx) and gauged flow data from the Burdekin River, a catchment delivering to the Great Barrier Reef. The sampling biases for NOx concentrations range from 2- to 10-fold, indicating severe bias. As expected, the traditional average and extrapolation methods produce much higher estimates than those obtained when sampling bias is taken into account.
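The four-step load-estimation procedure in this abstract can be sketched end to end with deliberately simple stand-ins: linear interpolation in place of the time-series flow model for steps (i) and (ii), and an ordinary least-squares rating curve ln c = a + b ln q in place of the generalized regression of step (iii). The stand-ins omit the extra predictors, the autocorrelated errors, and any retransformation-bias correction; function names, units, and data are assumptions for illustration.

```python
import math
from bisect import bisect_right


def interpolate_flow(times, flows, t):
    """Steps (i)/(ii) stand-in: piecewise-linear flow at time t (times sorted)."""
    if t <= times[0]:
        return flows[0]
    if t >= times[-1]:
        return flows[-1]
    i = bisect_right(times, t)
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)
    return flows[i - 1] + w * (flows[i] - flows[i - 1])


def fit_rating_curve(flow_at_samples, concentrations):
    """Step (iii) stand-in: OLS fit of ln(c) = a + b * ln(q)."""
    x = [math.log(q) for q in flow_at_samples]
    y = [math.log(c) for c in concentrations]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def estimate_load(grid_flows, a, b, dt):
    """Step (iv): sum of predicted flow x predicted concentration x interval length."""
    return sum(q * math.exp(a + b * math.log(q)) * dt for q in grid_flows)


# Hypothetical gauged flow (m^3/s) every 30 min and three concentration samples (mg/L = g/m^3).
gauge_t = [0, 1800, 3600, 5400, 7200]
gauge_q = [5.0, 20.0, 50.0, 30.0, 10.0]
sample_t = [900, 3600, 6300]
sample_c = [40.0, 120.0, 60.0]

dt = 600  # regular 10-minute grid, in seconds
grid_q = [interpolate_flow(gauge_t, gauge_q, t) for t in range(0, 7200, dt)]
q_at_samples = [interpolate_flow(gauge_t, gauge_q, t) for t in sample_t]
a, b = fit_rating_curve(q_at_samples, sample_c)
print(round(estimate_load(grid_q, a, b, dt) / 1e6, 2), "tonnes")
```

With flow in m^3/s, concentration in g/m^3, and intervals in seconds, the sum is in grams; dividing by 10^6 gives tonnes.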
Abstract:
Top predators have been reported to play an important role in structuring food webs and maintaining ecological processes for the benefit of biodiversity at lower trophic levels. This is thought to be achieved through their suppressive effects on sympatric mesopredators and prey. Great scientific and public interest surrounds the potential use of top predators as biodiversity conservation tools, and it can often be difficult to separate what we think we know from what we really know about their ecological utility. Not all the claims made about the ecological roles of top predators can be substantiated by current evidence. We review the methodology underpinning empirical data on the ecological roles of Australian dingoes (Canis lupus dingo and hybrids) to provide a comprehensive and objective benchmark for knowledge of the ecological roles of Australia's largest terrestrial predator. Given the wide variety of methodological flaws, sampling bias, and experimental design constraints inherent to 38 of the 40 field studies we assessed, we demonstrate that the current evidence for dingoes' role as biodiversity regulators is unreliable and inconclusive. We also discuss the widespread (both taxonomically and geographically) and direct negative effects of dingoes on native fauna, and the few robust studies investigating their positive roles. In light of the highly variable and context-specific impacts of dingoes on faunal biodiversity and the inconclusive state of the literature, we strongly caution against the positive management of dingoes in the absence of a supporting evidence base for such action.
Abstract:
Although only recently described, Colletotrichum boninense is well established in the literature as an anthracnose pathogen or endophyte of a diverse range of host plants worldwide. It is especially prominent on members of Amaryllidaceae, Orchidaceae, Proteaceae and Solanaceae. Reports from the literature and preliminary studies using ITS sequence data indicated that C. boninense represents a species complex. A multilocus molecular phylogenetic analysis (ITS, ACT, TUB2, CHS-1, GAPDH, HIS3, CAL) of 86 strains previously identified as C. boninense and other related strains revealed 18 clades. These clades are recognised here as separate species, including C. boninense s. str., C. hippeastri, C. karstii and 12 previously undescribed species, C. annellatum, C. beeveri, C. brassicicola, C. brasiliense, C. colombiense, C. constrictum, C. cymbidiicola, C. dacrycarpi, C. novae-zelandiae, C. oncidii, C. parsonsiae and C. torulosum. Seven of the new species are only known from New Zealand, perhaps reflecting a sampling bias. The new combination C. phyllanthi was made, and C. dracaenae Petch was epitypified and the name replaced with C. petchii. Typical for species of the C. boninense species complex are the conidiogenous cells with rather prominent periclinal thickening that also sometimes extend to form a new conidiogenous locus or annellations, as well as conidia that have a prominent basal scar. Many species in the C. boninense complex form teleomorphs in culture. TAXONOMIC NOVELTIES: New combination - Colletotrichum phyllanthi (H. Surendranath Pai) Damm, P.F. Cannon & Crous. Name replacement - C. petchii Damm, P.F. Cannon & Crous. New species - C. annellatum Damm, P.F. Cannon & Crous, C. beeveri Damm, P.F. Cannon, Crous, P.R. Johnst. & B. Weir, C. brassicicola Damm, P.F. Cannon & Crous, C. brasiliense Damm, P.F. Cannon, Crous & Massola, C. colombiense Damm, P.F. Cannon, Crous, C. constrictum Damm, P.F. Cannon, Crous, P.R. Johnst. & B. Weir, C. cymbidiicola Damm, P.F. Cannon, Crous, P.R. Johnst. & B. Weir, C. dacrycarpi Damm, P.F. Cannon, Crous, P.R. Johnst. & B. Weir, C. novae-zelandiae Damm, P.F. Cannon, Crous, P.R. Johnst. & B. Weir, C. oncidii Damm, P.F. Cannon & Crous, C. parsonsiae Damm, P.F. Cannon, Crous, P.R. Johnst. & B. Weir, C. torulosum Damm, P.F. Cannon, Crous, P.R. Johnst. & B. Weir. Typifications: Epitypifications - C. dracaenae Petch.
Abstract:
Previous work has determined the age distribution from a sample of spotted dolphins (Stenella attenuata) killed in the eastern Pacific tuna purse-seine fishery. In this paper we examine the usefulness of this age distribution for estimating natural mortality rates. The observed age distribution has a deficiency of individuals aged 5-15 years and cannot represent a stable age distribution. Sampling bias and errors in age interpretation are examined as possible causes of the "dip" in the observed age structure. Natural mortality rates are estimated for the 15+ age classes based on the assumption that these are sampled representatively. The resulting annual survival rate
Abstract:
Errors in growth estimates can drastically affect the spawner-per-recruit threshold used to recommend quotas for commercial fish catches. Growth parameters for sablefish (Anoplopoma fimbria) in Alaska have not been updated for stock assessment purposes for more than 20 years, although aging of sablefish has continued. In this study, length-stratified data (1981–93 data from the annual longline survey conducted cooperatively by the Fisheries Agency of Japan and the Alaska Fisheries Science Center of the National Marine Fisheries Service) were updated and corrected for a sampling bias discovered in them. In addition, more recent, randomly collected samples (1996–2004 data from the annual longline survey conducted by the Alaska Fisheries Science Center) were analyzed and new length-at-age and weight-at-age parameters were estimated. Results of this analysis of length-at-age data from 1981 to 2004 were similar to those of an analysis with updated longline survey data through 2010; therefore, we used our initial results from the analysis of data through 2004. We found that, because of a stratified sampling scheme, growth estimates of sablefish were overestimated with the older data (1981–93), and growth parameters used in the Alaskan sablefish assessment model were therefore too large. In addition, a comparison of the bias-corrected 1981–93 data and the 1996–2004 data showed that, in more recent years, sablefish grew larger and growth differed among regions. The updated growth information improves the fit of the data to the sablefish stock assessment model and yields biologically reasonable results. These findings indicate that when the updated growth data (1996–2004) are used in the existing sablefish assessment model, estimates of fishing mortality increase slightly and estimates of female spawning biomass decrease slightly.
This study provides evidence of the importance of periodically revisiting biological parameter estimates, especially as data accumulate, because estimates that include more recent data will often be more biologically realistic. It also illustrates the importance of correcting sampling biases that may contribute to erroneous parameter estimates.
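How a length-stratified age sample can inflate length-at-age can be illustrated with a small age-length-key calculation. When a fixed number of fish is aged per length bin, each aged fish should be weighted by how many fish in the full length sample it represents, rather than counted equally. The bins, counts, and ages below are hypothetical, and this is a generic illustration of weighting a length-stratified age sample, not the authors' exact correction.

```python
from collections import defaultdict


def mean_length_at_age(aged_by_bin, length_freq, weighted=True):
    """Mean length at age from a length-stratified age sample.

    aged_by_bin: {length_bin_cm: [ages of the fish aged in that bin]}
    length_freq: {length_bin_cm: number of fish measured in that bin}
    If weighted, each aged fish stands for length_freq[bin] / n_aged fish;
    otherwise every aged fish counts once (the naive estimate).
    """
    total = defaultdict(float)
    weight_sum = defaultdict(float)
    for length, ages in aged_by_bin.items():
        w = length_freq[length] / len(ages) if weighted else 1.0
        for age in ages:
            total[age] += w * length
            weight_sum[age] += w
    return {age: total[age] / weight_sum[age] for age in total}


# Hypothetical data: small fish dominate the catch, but five fish are aged per bin.
aged = {40: [3, 3, 3, 4, 4], 60: [4, 4, 4, 4, 5]}
measured = {40: 900, 60: 100}

naive = mean_length_at_age(aged, measured, weighted=False)
corrected = mean_length_at_age(aged, measured)
print(round(naive[4], 1), round(corrected[4], 1))  # 53.3 43.6
```

Because large fish are over-represented among the aged fish relative to their share of the catch, the unweighted mean length at age 4 is pulled upward; this is consistent in direction with the overestimated growth the abstract reports for the stratified 1981–93 data.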
Abstract:
Fishery-independent estimates of spawning biomass (BSP) of the Pacific sardine (Sardinops sagax) on the south and lower west coasts of Western Australia (WA) were obtained periodically between 1991 and 1999 by using the daily egg production method (DEPM). Ichthyoplankton data collected during these surveys, specifically the presence or absence of S. sagax eggs, were used to investigate trends in the spawning area of S. sagax within each of four regions. The expectation was that trends in BSP and spawning area would be positively related. With the DEPM model, estimates of BSP will change proportionally with spawning area if all other variables remain constant. The proportion of positive stations (PPS), i.e., the proportion of stations with nonzero egg counts, is an objective estimator of spawning area. PPS was high for all south coast regions during the early 1990s (a period when the estimated BSP was also high) and then decreased from the mid-1990s to 1999. The particularly low estimates in 1999 followed a severe epidemic mass mortality of S. sagax throughout their range across southern Australia. Deviations from the expected relationship between BSP and PPS were used to identify uncertainty around estimates of BSP. Because estimation of spawning area is subject to less sampling bias than estimation of BSP, the deviation in the relation between the two provides an objective basis for adjusting some estimates of the latter. Such an approach is particularly useful for fisheries management purposes when sampling problems are suspected to be present. The analysis of PPS undertaken from the same set of samples from which the DEPM estimate is derived will help provide information for stock assessments and for the management of purse-seine fisheries.
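The proportional link between spawning area and biomass follows from the standard DEPM biomass equation, B = P0·A·W / (R·F·S). Here P0 is daily egg production per unit area, A is spawning area, W is mean female weight, R is the female fraction by weight, F is batch fecundity, and S is the daily spawning fraction. The sketch below uses that textbook form together with PPS. All parameter values are hypothetical, and estimating spawning area as PPS times the surveyed area is an assumption for illustration.

```python
def depm_biomass(p0, area, mean_female_weight, female_fraction,
                 batch_fecundity, spawning_fraction):
    """Spawning biomass from the daily egg production method.

    p0: eggs produced per m^2 per day; area: spawning area (m^2);
    mean_female_weight: g per mature female; female_fraction: proportion of
    spawning biomass that is female; batch_fecundity: eggs per female per batch;
    spawning_fraction: fraction of females spawning per day. Returns grams.
    """
    return (p0 * area * mean_female_weight
            / (female_fraction * batch_fecundity * spawning_fraction))


def proportion_positive(egg_counts):
    """PPS: share of plankton stations with a nonzero egg count."""
    return sum(1 for c in egg_counts if c > 0) / len(egg_counts)


# Hypothetical survey: 8 stations over a 2,000 km^2 survey region.
egg_counts = [0, 12, 3, 0, 0, 25, 7, 0]
survey_area_m2 = 2_000 * 1e6
pps = proportion_positive(egg_counts)
spawning_area = pps * survey_area_m2  # assumption: spawning area scales with PPS

biomass_g = depm_biomass(p0=50.0, area=spawning_area, mean_female_weight=40.0,
                         female_fraction=0.5, batch_fecundity=10_000.0,
                         spawning_fraction=0.1)
print(pps, round(biomass_g / 1e6), "tonnes")
```

Holding every other parameter fixed, halving PPS halves the implied spawning area and hence the biomass estimate, which is the proportional relationship the abstract uses as a consistency check on BSP.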
Abstract:
The problem of bias in female petrale sole age- and length-at-maturity relationships caused by sampling from spawning aggregations was investigated. Samples were collected prior to aggregation, and histological methods were used to determine maturity status. Mature and immature fish were classified by inspecting oocytes for the presence of yolk in September, when substantial divergence between yolked and unyolked oocyte diameters had been observed. Comparison of macroscopic and microscopic assessment of maturity showed that maturity status cannot be determined accurately by macroscopic inspection during the summer. Female petrale sole from the central Oregon coast were 50% mature at 33 cm and 5 years of age. Comparison of data from our study with data used in recent petrale sole stock assessments showed that both sampling bias and the use of samples from seasons when maturity status cannot be accurately determined have likely caused errors in fitted maturity relationships.
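Length at 50% maturity is conventionally read off a logistic maturity ogive, P(mature | L) = 1 / (1 + exp(-(a + bL))), which gives L50 = -a/b. Below is a minimal Newton-Raphson fit to grouped counts. The counts are invented and arranged symmetrically around 33 cm so that the fitted L50 should come out at 33 cm; they are not the study's data. Any such fit inherits whatever bias exists in how the mature counts were sampled or classified.

```python
import math


def fit_maturity_ogive(lengths, n_mature, n_total, iters=50):
    """Fit P(mature | L) = 1 / (1 + exp(-(a + b * L))) by Newton-Raphson.

    Grouped binomial data: at each length, n_mature of n_total fish are mature.
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for length, k, n in zip(lengths, n_mature, n_total):
            p = 1.0 / (1.0 + math.exp(-(a + b * length)))
            resid = k - n * p          # observed minus expected mature fish
            w = n * p * (1.0 - p)      # binomial variance (information weight)
            g_a += resid
            g_b += resid * length
            h_aa += w
            h_ab += w * length
            h_bb += w * length * length
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve (information matrix) * step = score.
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) < 1e-12 and abs(db) < 1e-12:
            break
    return a, b


def length_at_50(a, b):
    """Length at which the fitted ogive crosses 50% maturity."""
    return -a / b


# Invented counts, symmetric about 33 cm (20 fish per length class).
lengths = [25, 29, 33, 37, 41]
mature = [2, 6, 10, 14, 18]
total = [20, 20, 20, 20, 20]

a, b = fit_maturity_ogive(lengths, mature, total)
print(round(length_at_50(a, b), 2))  # 33.0
```

The same fit applied to fish sampled from spawning aggregations, or classified macroscopically in the wrong season, would shift a and b and hence L50, which is the error mechanism the abstract describes.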