892 results for Sampling Bias
Abstract:
Monthly zonal mean climatologies of atmospheric measurements from satellite instruments can have biases due to the nonuniform sampling of the atmosphere by the instruments. We characterize potential sampling biases in stratospheric trace gas climatologies of the Stratospheric Processes and Their Role in Climate (SPARC) Data Initiative using chemical fields from a chemistry climate model simulation and sampling patterns from 16 satellite-borne instruments. The exercise is performed for the long-lived stratospheric trace gases O3 and H2O. Monthly sampling biases for O3 exceed 10% for many instruments in the high-latitude stratosphere and in the upper troposphere/lower stratosphere, while annual mean sampling biases reach values of up to 20% in the same regions for some instruments. Sampling biases for H2O are generally smaller than for O3, although still notable in the upper troposphere/lower stratosphere and Southern Hemisphere high latitudes. The most important mechanism leading to monthly sampling bias is nonuniform temporal sampling, i.e., the fact that for many instruments, monthly means are produced from measurements which span less than the full month in question. Similarly, annual mean sampling biases are well explained by nonuniformity in the month-to-month sampling by different instruments. Nonuniform sampling in latitude and longitude is also shown to lead to nonnegligible sampling biases, which are most relevant for climatologies which are otherwise free of biases due to nonuniform temporal sampling.
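The core diagnostic described above can be illustrated with a minimal sketch (not the SPARC Data Initiative code): subsample a "true" model field at an instrument's measurement times only, then compare the subsampled monthly mean with the full-field mean. All values and array names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hourly O3 values in one latitude/pressure bin for one month (ppmv),
# with a slow drift over the month plus noise.
hours = np.arange(30 * 24)
true_o3 = 5.0 + 0.3 * np.sin(2 * np.pi * hours / hours.size) + rng.normal(0, 0.05, hours.size)

# Nonuniform temporal sampling: the instrument only measures during the first 18 days.
sampled_o3 = true_o3[hours < 18 * 24]

true_mean = true_o3.mean()
sampled_mean = sampled_o3.mean()
bias_percent = 100.0 * (sampled_mean - true_mean) / true_mean
print(f"monthly sampling bias: {bias_percent:+.2f}%")
```

The same comparison, repeated per latitude band and month with real instrument sampling patterns, would yield monthly and annual sampling-bias estimates of the kind discussed in the abstract.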
Abstract:
In this paper we determine the extent to which host-mediated mutations and a known sampling bias affect evolutionary studies of human influenza A. Previous phylogenetic reconstruction of influenza A (H3N2) evolution using the hemagglutinin gene revealed an excess of nonsilent substitutions assigned to the terminal branches of the tree. We investigate two hypotheses to explain this observation. The first hypothesis is that the excess reflects mutations that were either not present or were at low frequency in the viral sample isolated from its human host, and that these mutations increased in frequency during passage of the virus in embryonated eggs. A set of 22 codons known to undergo such “host-mediated” mutations showed a significant excess of mutations assigned to branches attaching sequences from egg-cultured (as opposed to cell-cultured) isolates to the tree. Our second hypothesis is that the remaining excess results from sampling bias. Influenza surveillance is purposefully biased toward sequencing antigenically dissimilar strains in an effort to identify new variants that may signal the need to update the vaccine. This bias produces an excess of mutations assigned to terminal branches simply because an isolate with no close relatives is by definition attached to the tree by a relatively long branch. Simulations show that the magnitude of excess mutations we observed in the hemagglutinin tree is consistent with expectations based on our sampling protocol. Sampling bias does not affect inferences about evolution drawn from phylogenetic analyses. However, if possible, the excess caused by host-mediated mutations should be removed from studies of the evolution of influenza viruses as they replicate in their human hosts.
Abstract:
Several deterministic and probabilistic methods are used to evaluate the probability of seismically induced liquefaction of a soil. Probabilistic models usually carry uncertainty both in the model itself and in the parameters used to develop it, and these model uncertainties vary from one statistical model to another. Most of them are epistemic and can be addressed through appropriate knowledge of the statistical model. One such epistemic uncertainty in evaluating liquefaction potential with a probabilistic model such as logistic regression is sampling bias: the difference between the class distribution in the sample used to develop the statistical model and the true population distribution of liquefaction and non-liquefaction instances. Recent studies have shown that sampling bias can significantly affect the probability predicted by a statistical model. To address this epistemic uncertainty, a new approach was developed for evaluating the probability of seismically induced soil liquefaction, in which a logistic regression model was used in combination with the Hosmer-Lemeshow statistic. This approach was used to estimate the population (true) ratio of liquefaction to non-liquefaction instances in the most up-to-date standard penetration test (SPT) and cone penetration test (CPT) case histories. Other model uncertainties, such as the distribution and significance of the explanatory variables, were also addressed using the Kolmogorov-Smirnov (KS) test and the Wald statistic, respectively. Based on the estimated population distribution, logistic regression equations were proposed to calculate the probability of liquefaction for both SPT- and CPT-based case histories, and the proposed probability curves were compared with existing probability curves based on SPT and CPT case histories.
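As a rough, generic illustration of the class-distribution problem described above (this is the standard "prior correction" for logistic regression, not necessarily the authors' exact procedure), the fitted intercept can be shifted so that predicted probabilities reflect an assumed population ratio of liquefaction to non-liquefaction cases. All coefficients and inputs below are hypothetical.

```python
import numpy as np

def corrected_probability(beta0, betas, x, y_bar, tau):
    """Probability of liquefaction with the intercept adjusted for sampling bias.

    beta0, betas : coefficients fitted on the (possibly biased) sample
    x            : explanatory variables for one case (e.g., SPT blow count, CSR)
    y_bar        : fraction of liquefaction cases in the sample
    tau          : assumed fraction of liquefaction cases in the population
    """
    offset = np.log(((1.0 - tau) / tau) * (y_bar / (1.0 - y_bar)))
    z = beta0 - offset + np.dot(betas, x)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and case, purely for illustration.
p = corrected_probability(beta0=2.0, betas=np.array([-0.15, 3.0]),
                          x=np.array([12.0, 0.25]), y_bar=0.5, tau=0.2)
print(f"corrected probability of liquefaction: {p:.2f}")
```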
Abstract:
Roadside surveys such as the Breeding Bird Survey (BBS) are widely used to assess the relative abundance of bird populations. The accuracy of roadside surveys depends on the extent to which surveys from roads represent the entire region under study. We quantified roadside land cover sampling bias in Tennessee, USA, by comparing land cover proportions near roads to proportions of the surrounding region. Roadside surveys gave a biased estimate of patterns across the region because some land cover types were over- or underrepresented near roads. These biases changed over time, introducing varying levels of distortion into the data. We constructed simulated population trends for five bird species of management interest based on these measured roadside sampling biases and on field data on bird abundance. These simulations indicated that roadside surveys may give overly negative assessments of the population trends of early successional birds and of synanthropic birds, but not of late-successional birds. Because roadside surveys are the primary source of avian population trend information in North America, we conclude that these surveys should be corrected for roadside land cover sampling bias. In addition, current recommendations about the need to create more early successional habitat for birds may need reassessment in the light of the undersampling of this habitat by roads.
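A minimal sketch of the comparison described above, with invented numbers: express each land cover type's representation near roads as the ratio of its proportion within a roadside buffer to its proportion across the whole region (1.0 = unbiased; below 1 = underrepresented near roads).

```python
# Hypothetical land cover proportions; a real analysis would use GIS-derived values.
cover_region   = {"early_successional": 0.10, "late_successional": 0.35, "developed": 0.08}
cover_roadside = {"early_successional": 0.06, "late_successional": 0.33, "developed": 0.15}

for cover, regional_share in cover_region.items():
    ratio = cover_roadside[cover] / regional_share
    print(f"{cover:>20s}: roadside representation ratio = {ratio:.2f}")
```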
Abstract:
Most empirical disciplines promote the reuse and sharing of datasets, as it leads to greater possibility of replication. While this is increasingly the case in Empirical Software Engineering, some of the most popular bug-fix datasets are now known to be biased. This raises two significant concerns: first, that sample bias may lead to underperforming prediction models, and second, that the external validity of studies based on biased datasets may be suspect. The issue has raised considerable consternation in the ESE literature in recent years. However, there is a confounding factor of these datasets that has not been examined carefully: size. Biased datasets sample only some of the data that could be sampled, and do so in a biased fashion; but biased samples can be smaller or larger. Smaller datasets in general provide less reliable bases for estimating models, and thus could lead to inferior model performance. In this setting, we ask: which affects performance more, bias or size? We conduct a detailed, large-scale meta-analysis using simulated datasets sampled with bias from a high-quality dataset that is relatively free of bias. Our results suggest that size always matters just as much as bias direction, and in fact much more than bias direction when considering information-retrieval measures such as AUC and F-score. This indicates that, at least for prediction models, even when dealing with sampling bias, simply finding larger samples can sometimes be sufficient. Our analysis also exposes the complexity of the bias issue and raises further issues to be explored in the future.
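The kind of simulation described above can be sketched as follows (a toy setup, not the authors' datasets or models): draw deliberately biased samples of different sizes from a larger "ground truth" dataset, fit a simple prediction model on each, and compare AUC on a common test set. The dataset, bias mechanism and model are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a large, relatively bias-free defect dataset.
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.8], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)

def biased_sample(X, y, size, keep_prob_positive=0.3):
    """Under-sample the positive (buggy) class, then subsample to the requested size."""
    keep = (y == 0) | (rng.random(y.size) < keep_prob_positive)
    idx = np.flatnonzero(keep)
    idx = rng.choice(idx, size=min(size, idx.size), replace=False)
    return X[idx], y[idx]

for size in (200, 1000, 3000):
    Xs, ys = biased_sample(X_train, y_train, size)
    model = LogisticRegression(max_iter=1000).fit(Xs, ys)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"biased sample of size {size:4d}: AUC = {auc:.3f}")
```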
Abstract:
Background: Although combination antiretroviral therapy (cART) dramatically reduces rates of AIDS and death, a minority of patients experience clinical disease progression during treatment. Objective: To investigate whether detection of CXCR4 (X4)-specific strains or quantification of X4-specific HIV-1 load predict clinical outcome. Methods: From the Swiss HIV Cohort Study, 96 participants who initiated cART yet subsequently progressed to AIDS or death were compared with 84 contemporaneous, treated nonprogressors. A sensitive heteroduplex tracking assay was developed to quantify plasma X4 and CCR5 variants and resolve HIV-1 load into coreceptor-specific components. Measurements were analyzed as cofactors of progression in multivariable Cox models adjusted for concurrent CD4 cell count and total viral load, applying inverse probability weights to adjust for sampling bias. Results: Patients with X4 variants at baseline displayed reduced CD4 cell responses compared with those without X4 strains (40 versus 82 cells/μl; P = 0.012). The adjusted multivariable hazard ratio (HR) for clinical progression was 4.8 [95% confidence interval (CI) 2.3-10.0] for those demonstrating X4 strains at baseline. The X4-specific HIV-1 load was a similarly independent predictor, with HR values of 3.7 (95% CI 1.2-11.3) and 5.9 (95% CI 2.2-15.0) for baseline loads of 2.2-4.3 and >4.3 log10 copies/ml, respectively, compared with <2.2 log10 copies/ml. Conclusions: HIV-1 coreceptor usage and X4-specific viral loads strongly predicted disease progression during cART, independent of and in addition to CD4 cell count or total viral load. Detection and quantification of X4 strains promise to be clinically useful biomarkers to guide patient management and study HIV-1 pathogenesis.
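A minimal sketch of the weighting idea mentioned in the Methods (illustrative only; the pool size is invented): each subject receives an inverse probability weight equal to the inverse of the sampling fraction for their outcome group, so that the analysed subset stands in for the full cohort when fitting the Cox model.

```python
import pandas as pd

# Hypothetical counts: all progressors were analysed, while the analysed
# nonprogressors were drawn from a much larger pool (2,000 is an assumed number).
sampled  = {"progressor": 96, "nonprogressor": 84}
eligible = {"progressor": 96, "nonprogressor": 2000}

# Inverse probability weight = eligible / sampled for each group.
weights = {group: eligible[group] / sampled[group] for group in sampled}

subjects = pd.DataFrame({"group": ["progressor", "nonprogressor", "nonprogressor"]})
subjects["ipw"] = subjects["group"].map(weights)
print(subjects)
# These weights would then be supplied as case weights when fitting the Cox model.
```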
Abstract:
Skeletal muscle mitochondrial (Mito) and lipid droplet (Lipid) content are often measured in human translational studies. Stereological point counting allows computing Mito and Lipid volume density (Vd) from micrographs taken with transmission electron microscopes. Former studies are not specific as to the size of the individual squares that make up the grids, making reproducibility difficult, particularly when different magnifications are used. Our objective was to determine which grid size would best predict fractional volume efficiently without sacrificing reliability, and to test a novel method to reduce sampling bias. Methods: Ten subjects underwent vastus lateralis biopsies. Samples were fixed, embedded, and cut longitudinally in ultrathin sections of 60 nm. Twenty micrographs from the intramyofibrillar region were taken per subject at ×33,000 magnification. Different grid sizes were superimposed on each micrograph: 1,000 × 1,000 nm, 500 × 500 nm, and 250 × 250 nm. Results: Mean Mito and Lipid Vd were not statistically different across grids. Variability was greater when going from the 1,000 × 1,000 nm to the 500 × 500 nm grid than from the 500 × 500 nm to the 250 × 250 nm grid. Discussion: This study is the first to attempt to standardize grid size while keeping with conventional stereology principles, in hopes of producing replicable assessments that can be obtained universally across different studies looking at human skeletal muscle mitochondrial and lipid droplet content.
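The point-counting estimate itself is simple; a minimal sketch (with invented counts) is shown below: the volume density (Vd) of a structure is the fraction of grid intersection points that fall on it.

```python
def volume_density(points_on_structure, total_grid_points):
    """Fractional volume (Vd) estimated by stereological point counting."""
    return points_on_structure / total_grid_points

# Hypothetical counts from one micrograph overlaid with a 500 x 500 nm grid.
mito_vd  = volume_density(points_on_structure=38, total_grid_points=600)
lipid_vd = volume_density(points_on_structure=4,  total_grid_points=600)
print(f"Mito Vd: {mito_vd:.3f}, Lipid Vd: {lipid_vd:.3f}")
```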
Abstract:
Objective: Small nodal tumor infiltrates are identified by applying multilevel sectioning and immunohistochemistry (IHC) in addition to H&E (hematoxylin and eosin) staining of resected lymph nodes. However, multilevel sectioning and IHC are very time-consuming and costly. The current standard analysis of lymph nodes in colon cancer patients is based on one H&E-stained slide per lymph node. A new molecular diagnostic system called One-Step Nucleic Acid Amplification (OSNA) was designed for more accurate detection of lymph node metastases. The objective of the present investigation was to compare the performance of OSNA with current standard histology (H&E). We hypothesize that OSNA provides better staging than the routine use of one H&E slide per lymph node. Methods: From 22 colon cancer patients, 307 frozen lymph nodes were used to compare OSNA with H&E. The lymph nodes were cut into halves. One half of each lymph node was analyzed by OSNA, a semi-automated assay that amplifies reverse-transcribed cytokeratin 19 (CK19) mRNA directly from the homogenate. The remaining tissue was dedicated to histology, with 5 levels of H&E and IHC (CK19) staining. Results: On routine evaluation of one H&E slide, 7 patients were node positive (macro-metastases). All of these patients were recognized as positive by OSNA (sensitivity 100%). Of the remaining 15 patients, 2 had lymph node micro-metastases and 9 had isolated tumor cells. For the patients with micro-metastases, both H&E and OSNA were positive in 1 of the 2 patients. For the patients with isolated tumor cells, H&E was positive in 1/9 cases whereas OSNA was positive in 3/9 patients (IHC as a reference). There was only one case described as IHC negative/OSNA positive. On the basis of single lymph nodes, the sensitivity of OSNA against the 5 levels of H&E and IHC was 94.5%. Conclusion: OSNA is a novel molecular tool for the detection of lymph node metastases in colon cancer patients which provides better staging compared with the current standard evaluation of one H&E-stained slide. Since OSNA allows analysis of the whole lymph node, sampling bias and undetected tumor deposits due to uninvestigated material will be overcome. OSNA improves staging in colon cancer patients and may replace the current standard of H&E staining in the future.
Abstract:
Introduction: Approximately one fifth of stage I and II colon cancer patients will suffer from recurrent disease. This is partly due to the presence of small nodal tumour infiltrates, which are undetected by standard histopathology using Haematoxylin & Eosin (H&E) staining on one slice; such patients thus may not receive beneficial adjuvant therapy. A new semi-automated diagnostic system, called one-step nucleic acid amplification (OSNA), was recently designed for the detection of cytokeratin 19 (CK19) mRNA as a surrogate for lymph node metastases. The objective of the present investigation was to compare the performance of OSNA with both standard H&E and intensive histopathologic analyses in the detection of colon cancer lymph node micro- and macro-metastases. Methods: In this prospective study, 313 lymph nodes from 22 consecutive stage I-III colon cancer patients were assessed. Half of each lymph node was analysed initially on one H&E slice followed by an intensive histologic work-up (5 levels of H&E and immunohistochemistry staining for each slice); the other half was analysed using OSNA. Results: All OSNA results were available in less than 40 minutes. Fifty-one lymph nodes were positive and 246 lymph nodes negative with both OSNA and standard H&E. OSNA was more sensitive than H&E in detecting small nodal tumour infiltrates (11 OSNA positive/H&E negative). Compared with intensive histopathologic analyses, OSNA had a sensitivity of 94.5% and a specificity of 97.6% for detecting lymph node micro- and macro-metastases, with a concordance rate of 97.1%. Upstaging due to OSNA was found in 2/13 (15.3%) initially node-negative colon cancer patients. Conclusion: OSNA appears to be a powerful and promising molecular tool for the detection of lymph node macro- and micro-metastases in colon cancer patients. OSNA performs similarly to intensive histopathologic investigation in the detection of micro- and macro-metastases and appears to be superior to standard histology with H&E. Since OSNA allows analysis of the whole lymph node, the problem of sampling bias and undetected tumour deposits due to uninvestigated material will be overcome, and OSNA may thus improve staging in colon cancer patients. It is hoped that this improved staging will lead to better patient selection for adjuvant therapy and consequently improved local and distant control as well as better overall survival.
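For reference, the per-node performance figures quoted above are standard 2x2-table quantities; the sketch below shows how they could be computed, using invented counts rather than the study data.

```python
def diagnostic_performance(tp, fp, fn, tn):
    """Sensitivity, specificity and concordance of a test against a reference standard."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    concordance = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, concordance

# Hypothetical node-level counts of OSNA results against intensive histopathology.
sens, spec, conc = diagnostic_performance(tp=50, fp=5, fn=3, tn=250)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}, concordance {conc:.1%}")
```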
Abstract:
Complex networks obtained from real-world systems are often characterized by incompleteness and noise, consequences of imperfect sampling as well as artifacts in the acquisition process. Because the characterization, analysis and modeling of complex systems underlain by complex networks are critically affected by the quality and completeness of the respective initial structures, it becomes imperative to devise methodologies for identifying and quantifying the effects of sampling on the network structure. One way to evaluate these effects is through an analysis of the sensitivity of complex network measurements to perturbations in the topology of the network. In this paper, measurement sensitivity is quantified in terms of the relative entropy of the respective distributions. Three particularly important kinds of progressive perturbations to the network are considered, namely edge suppression, addition and rewiring. The measurements allowing the best balance of stability (smaller sensitivity to perturbations) and discriminability (separation between different network topologies) are identified with respect to each type of perturbation. The analysis includes eight different measurements applied to six complex network models and three real-world networks. This approach allows one to choose appropriate measurements in order to obtain accurate results for networks where sampling bias cannot be avoided, a very frequent situation in research on complex networks.
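A small sketch of the general idea (not the paper's implementation): compare the distribution of one measurement, here the degree distribution, before and after progressive edge suppression, and quantify the change with the relative entropy. The model network and perturbation levels are arbitrary choices.

```python
import numpy as np
import networkx as nx
from scipy.stats import entropy  # entropy(p, q) gives the relative entropy (KL divergence)

rng = np.random.default_rng(0)

def degree_distribution(G, max_degree):
    """Smoothed degree distribution on a fixed support, so distributions are comparable."""
    counts = np.bincount([d for _, d in G.degree()], minlength=max_degree + 1)
    counts = counts.astype(float) + 1e-9
    return counts / counts.sum()

G = nx.barabasi_albert_graph(1000, 3, seed=0)
max_deg = max(d for _, d in G.degree())
p_original = degree_distribution(G, max_deg)

for fraction in (0.05, 0.10, 0.20):
    H = G.copy()
    edges = list(H.edges())
    drop = rng.choice(len(edges), size=int(fraction * len(edges)), replace=False)
    H.remove_edges_from([edges[i] for i in drop])
    p_perturbed = degree_distribution(H, max_deg)
    print(f"{fraction:.0%} of edges suppressed: "
          f"relative entropy = {entropy(p_perturbed, p_original):.4f}")
```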
Abstract:
Mining operations around the world make extensive use of blasthole sampling for short-term planning, which has two undisputed advantages: (1) blastholes are closely spaced providing relatively high sampling density per ton, and (2) there is no additional cost since the blastholes must be drilled anyway. However, blasthole sampling usually presents poor sampling precision, and the inconstant sampling bias caused by particle size and density segregation is an even more serious problem, generally precluding representativeness. One of the main causes of this bias is a highly varying loss of fines, which can lead to both under- and over-estimation of grade depending on the ore type and the gangue. This study validates a new, modified sectorial sampler, designed to reduce the loss of fines and thereby increase sampling accuracy for narrow-diameter blasthole sampling. First results show a significantly improved estimation of gold grade as well as the minimization of the loss of fines.
Abstract:
The Atlantic Forest is an excellent case study for the elevational diversity of birds, and some inventories along elevational gradients have been carried out in Brazil. Since none of these studies explain the patterns of species richness with elevation, we herein review all Brazilian studies on bird elevational diversity and test a geometric constraint null model that predicts a unimodal species-altitude curve, the Mid-domain Effect (MDE). We searched for bird inventories in the literature and also analysed our own survey data, collected using limited-radius point counts along an 800 m elevational gradient in the state of São Paulo, Brazil. We found 10 investigations of the elevational diversity of Atlantic Forest birds and identified five different elevational patterns: monotonically decreasing diversity; diversity constant at low elevations; diversity constant at low elevations but increasing towards the middle; and two patterns previously undescribed for Atlantic Forest birds, trough-shaped and increasing diversity. The average MDE fit was low (r² = 0.31) and none of the MDE predictions were robust across all gradients. The studies with good MDE model fits had obvious sampling bias. Although it has been proposed that the MDE may be positively associated with the elevational diversity of birds, it does not fit the elevational diversity of Brazilian Atlantic Forest birds.
Abstract:
Background: Serologic testing algorithms for recent HIV seroconversion (STARHS) provide important information for HIV surveillance. We have previously demonstrated that a patient's antibody reaction pattern in a confirmatory line immunoassay (INNO-LIA™ HIV I/II Score) provides information on the duration of infection, which is unaffected by clinical, immunological and viral variables. In this report we set out to determine the diagnostic performance of Inno-Lia algorithms for identifying incident infections in patients with known duration of infection, and evaluated the algorithms in annual cohorts of HIV notifications. Methods: Diagnostic sensitivity was determined in 527 treatment-naive patients infected for up to 12 months. Specificity was determined in 740 patients infected for longer than 12 months. Plasma was tested by Inno-Lia and classified as either incident (≤12 months) or older infection by 26 different algorithms. Incident infection rates (IIR) were calculated based on the diagnostic sensitivity and specificity of each algorithm and the rule that the total of incident results is the sum of true-incident and false-incident results, which can be calculated by means of the pre-determined sensitivity and specificity. Results: The 10 best algorithms had a mean raw sensitivity of 59.4% and a mean specificity of 95.1%. Adjustment for overrepresentation of patients in the first quarter year of infection further reduced the sensitivity; in the preferred model, the mean adjusted sensitivity was 37.4%. Application of the 10 best algorithms to four annual cohorts of HIV-1 notifications totalling 2,595 patients yielded a mean IIR of 0.35 in 2005/6 (baseline) and of 0.45, 0.42 and 0.35 in 2008, 2009 and 2010, respectively. The increase between baseline and 2008 and the ensuing decreases were highly significant. Other adjustment models yielded different absolute IIR, although the relative changes between the cohorts were identical for all models. Conclusions: The method can be used for comparing IIR in annual cohorts of HIV notifications. The use of several different algorithms in combination, each with its own sensitivity and specificity to detect incident infection, is advisable as this reduces the impact of individual imperfections stemming primarily from relatively low sensitivities and sampling bias.
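The back-calculation rule mentioned in the Methods can be written down in a few lines; the sketch below (a simplified, single-algorithm version with an invented raw fraction) solves "observed incident = sensitivity × true incident + (1 − specificity) × true older" for the true incident fraction, using the mean sensitivity and specificity quoted above.

```python
def adjusted_incident_rate(raw_incident_fraction, sensitivity, specificity):
    """Solve observed = sens*true + (1 - spec)*(1 - true) for the true incident fraction."""
    return (raw_incident_fraction - (1.0 - specificity)) / (sensitivity + specificity - 1.0)

# Hypothetical: 24% of notifications classified as incident by an algorithm with
# 59.4% sensitivity and 95.1% specificity (the mean values quoted in the abstract).
iir = adjusted_incident_rate(0.24, sensitivity=0.594, specificity=0.951)
print(f"adjusted incident infection rate: {iir:.2f}")
```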