123 results for Sampling bias
in CentAUR: Central Archive University of Reading - UK
Abstract:
Monthly zonal mean climatologies of atmospheric measurements from satellite instruments can have biases due to the nonuniform sampling of the atmosphere by the instruments. We characterize potential sampling biases in stratospheric trace gas climatologies of the Stratospheric Processes and Their Role in Climate (SPARC) Data Initiative using chemical fields from a chemistry climate model simulation and sampling patterns from 16 satellite-borne instruments. The exercise is performed for the long-lived stratospheric trace gases O3 and H2O. Monthly sampling biases for O3 exceed 10% for many instruments in the high-latitude stratosphere and in the upper troposphere/lower stratosphere, while annual mean sampling biases reach values of up to 20% in the same regions for some instruments. Sampling biases for H2O are generally smaller than for O3, although still notable in the upper troposphere/lower stratosphere and Southern Hemisphere high latitudes. The most important mechanism leading to monthly sampling bias is nonuniform temporal sampling, i.e., the fact that for many instruments, monthly means are produced from measurements which span less than the full month in question. Similarly, annual mean sampling biases are well explained by nonuniformity in the month-to-month sampling by different instruments. Nonuniform sampling in latitude and longitude are shown to also lead to nonnegligible sampling biases, which are most relevant for climatologies which are otherwise free of biases due to nonuniform temporal sampling.
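The dominant mechanism above, nonuniform temporal sampling, can be illustrated with a minimal sketch in Python. All numbers are hypothetical: a trace-gas field that drifts within the month, observed by an instrument whose "monthly mean" covers only part of that month, yields a biased climatological mean.

```python
# Sketch of a monthly sampling bias from incomplete temporal coverage.
# The field values and sampling pattern below are illustrative, not from SPARC.

def monthly_sampling_bias(daily_values, sampled_days):
    """Relative bias (%) of a mean over the sampled days vs. the full-month mean."""
    full_mean = sum(daily_values) / len(daily_values)
    sampled_mean = sum(daily_values[d] for d in sampled_days) / len(sampled_days)
    return 100.0 * (sampled_mean - full_mean) / full_mean

# A hypothetical zonal-mean abundance rising linearly across a 30-day month.
field = [100.0 + 0.5 * day for day in range(30)]

# An instrument that happens to observe only the first 10 days of the month:
bias = monthly_sampling_bias(field, range(10))   # about -4.7%
```

An instrument sampling the full month recovers the true mean exactly; the shorter and more lopsided the coverage, and the stronger the within-month drift, the larger the bias.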
Abstract:
In the Radiative Atmospheric Divergence Using ARM Mobile Facility GERB and AMMA Stations (RADAGAST) project we calculate the divergence of radiative flux across the atmosphere by comparing fluxes measured at each end of an atmospheric column above Niamey, in the African Sahel region. The combination of broadband flux measurements from geostationary orbit and the deployment for over 12 months of a comprehensive suite of active and passive instrumentation at the surface eliminates a number of sampling issues that could otherwise affect divergence calculations of this sort. However, one sampling issue that challenges the project is the fact that the surface flux data are essentially measurements made at a point, while the top-of-atmosphere values are taken over a solid angle that corresponds to an area at the surface of some 2500 km². Variability of cloud cover and aerosol loading in the atmosphere means that the downwelling fluxes, even when averaged over a day, will not exactly match the area-averaged value over that larger area, although we might expect them to be an unbiased estimate thereof. The heterogeneity of the surface, for example, fixed variations in albedo, further means that there is likely a systematic difference in the corresponding upwelling fluxes. In this paper we characterize and quantify this spatial sampling problem. We bound the root-mean-square error in the downwelling fluxes by exploiting a second set of surface flux measurements from a site that was run in parallel with the main deployment. The differences between the two sets of fluxes lead us to an upper bound on the sampling uncertainty, and their correlation leads to another bound, which is probably optimistic as it requires certain other conditions to be met.
For the upwelling fluxes we use data products from a number of satellite instruments to characterize the relevant heterogeneities and so estimate the systematic effects that arise from the flux measurements having to be taken at a single point. The sampling uncertainties vary with the season, being higher during the monsoon period. We find that the sampling errors for the daily average flux are small for the shortwave irradiance, generally less than 5 W m−2 under relatively clear skies, but these increase to about 10 W m−2 during the monsoon. For the upwelling fluxes, again taking daily averages, systematic errors are of order 10 W m−2 as a result of albedo variability. The uncertainty in the longwave component of the surface radiation budget is smaller than that in the shortwave component under all conditions, but a bias of 4 W m−2 is calculated to exist in the surface-leaving longwave flux.
Abstract:
Cloud cover is conventionally estimated from satellite images as the observed fraction of cloudy pixels. Active instruments such as radar and lidar observe in narrow transects that sample only a small percentage of the area over which the cloud fraction is estimated. As a consequence, the fraction estimate has an associated sampling uncertainty, which usually remains unspecified. This paper extends a Bayesian method of cloud fraction estimation, which also provides an analytical estimate of the sampling error. This method is applied to test the sensitivity of this error to sampling characteristics, such as the number of observed transects and the variability of the underlying cloud field. The dependence of the uncertainty on these characteristics is investigated using synthetic data simulated to have properties closely resembling observations from NASA's spaceborne LITE lidar mission. Results suggest that the variance of the cloud fraction estimate is greatest for medium cloud cover and least when conditions are mostly cloudy or clear. However, there is a bias in the estimation, which is greatest around 25% and 75% cloud cover. The sampling uncertainty is also affected by the mean lengths of clouds and of clear intervals; shorter lengths decrease uncertainty, primarily because there are more cloud observations in a transect of a given length. Uncertainty also falls with an increasing number of transects. Therefore a sampling strategy aimed at minimizing the uncertainty in transect-derived cloud fraction will have to take into account both the cloud and clear-sky length distributions as well as the cloud fraction of the observed field. These conclusions have implications for the design of future satellite missions. This paper describes the first integrated methodology for the analytical assessment of sampling uncertainty in cloud fraction observations from forthcoming spaceborne radar and lidar missions such as NASA's CALIPSO and CloudSat.
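The transect-sampling problem described above can be illustrated with a crude Monte Carlo sketch (not the paper's analytical Bayesian method). The cloud mask below is a hypothetical field with no spatial correlation, so it understates the role of cloud and clear-interval length distributions, but it shows how narrow transects produce a spread of cloud-fraction estimates around the truth.

```python
import random

def transect_cloud_fraction(mask, rows):
    """Cloud fraction estimated from a few transects (rows) of a 2-D cloud mask."""
    observed = [cell for r in rows for cell in mask[r]]
    return sum(observed) / len(observed)

random.seed(0)
# Hypothetical 100 x 100 boolean cloud mask with ~50% true cover.
mask = [[random.random() < 0.5 for _ in range(100)] for _ in range(100)]
true_cf = sum(map(sum, mask)) / 100 ** 2

# Spread of the estimate when only 3 of 100 rows are observed, over many draws:
estimates = [transect_cloud_fraction(mask, random.sample(range(100), 3))
             for _ in range(500)]
sampling_spread = max(estimates) - min(estimates)
```

Increasing the number of sampled rows shrinks the spread, mirroring the abstract's finding that uncertainty falls with the number of transects.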
Abstract:
We have developed an ensemble Kalman Filter (EnKF) to estimate 8-day regional surface fluxes of CO2 from space-borne CO2 dry-air mole fraction observations (XCO2) and evaluate the approach using a series of synthetic experiments, in preparation for data from the NASA Orbiting Carbon Observatory (OCO). The 32-day duty cycle of OCO alternates every 16 days between nadir and glint measurements of backscattered solar radiation at short-wave infrared wavelengths. The EnKF uses an ensemble of states to represent the error covariances to estimate 8-day CO2 surface fluxes over 144 geographical regions. We use a 12×8-day lag window, recognising that XCO2 measurements include surface flux information from prior time windows. The observation operator that relates surface CO2 fluxes to atmospheric distributions of XCO2 includes: a) the GEOS-Chem transport model that relates surface fluxes to global 3-D distributions of CO2 concentrations, which are sampled at the time and location of OCO measurements that are cloud-free and have aerosol optical depths <0.3; and b) scene-dependent averaging kernels that relate the CO2 profiles to XCO2, accounting for differences between nadir and glint measurements, and the associated scene-dependent observation errors. We show that OCO XCO2 measurements significantly reduce the uncertainties of surface CO2 flux estimates. Glint measurements are generally better at constraining ocean CO2 flux estimates. Nadir XCO2 measurements over the terrestrial tropics are sparse throughout the year because of either clouds or smoke. Glint measurements provide the most effective constraint for estimating tropical terrestrial CO2 fluxes by accurately sampling fresh continental outflow over neighbouring oceans. 
We also present results from sensitivity experiments that investigate how flux estimates change with 1) biased and unbiased errors, 2) alternative duty cycles, 3) measurement density and correlations, 4) the spatial resolution of the estimated fluxes, and 5) reducing the length of the lag window and the size of the ensemble. While this manuscript was under revision, the OCO instrument failed to reach orbit after its launch on 24 February 2009. The EnKF formulation presented here is also applicable to GOSAT measurements of CO2 and CH4.
Abstract:
The variogram is essential for local estimation and mapping of any variable by kriging. The variogram itself must usually be estimated from sample data. The sampling density is a compromise between precision and cost, but it must be sufficiently dense to encompass the principal spatial sources of variance. A nested, multi-stage sampling scheme with separating distances increasing in geometric progression from stage to stage will do that. The data may then be analyzed by a hierarchical analysis of variance to estimate the components of variance for every stage, and hence lag. By accumulating the components, starting from the shortest lag, one obtains a rough variogram for modest effort. For balanced designs the analysis of variance is optimal; for unbalanced ones, however, these estimators are not necessarily the best, and analysis by residual maximum likelihood (REML) will usually be preferable. The paper summarizes the underlying theory and illustrates its application with data from three surveys: one in which the design had four stages and was balanced, and two implemented with unbalanced designs to economize when there were more stages. A Fortran program is available for the analysis of variance, and code for the REML analysis is listed in the paper. (c) 2005 Elsevier Ltd. All rights reserved.
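The accumulation step described above is simple enough to sketch directly. The variance components below are hypothetical values standing in for the output of a hierarchical analysis of variance, with one component per sampling stage and lags in geometric progression.

```python
# Hypothetical components of variance from a nested hierarchical ANOVA,
# keyed by lag (m); stages ordered from the shortest separating distance up.
components = {10: 0.8, 30: 1.3, 90: 2.1, 270: 0.6}

# Accumulating the components from the shortest lag upward gives the rough
# variogram: semivariance at each lag is the running sum of components.
rough_variogram = {}
running_total = 0.0
for lag in sorted(components):
    running_total += components[lag]
    rough_variogram[lag] = running_total
```

The result is a monotonically non-decreasing set of semivariances at a handful of lags, enough to judge the scale of variation before committing to a dense survey.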
Abstract:
The Representative Soil Sampling Scheme (RSSS) has monitored the soil of agricultural land in England and Wales since 1969. Here we describe the first spatial analysis of the data from these surveys using geostatistics. Four years of data (1971, 1981, 1991 and 2001) were chosen to examine the nutrient (available K, Mg and P) and pH status of the soil. At each farm, four fields were sampled; however, for the earlier years, coordinates were available for the farm only and not for each field. The averaged data for each farm were used for spatial analysis and the variograms showed spatial structure even with the smaller sample size. These variograms provide a reasonable summary of the larger scale of variation identified from the data of the more intensively sampled National Soil Inventory. Maps of kriged predictions of K generally show larger values in the central and southeastern areas (above 200 mg L−1) and an increase in values in the west over time, whereas Mg is fairly stable over time. The kriged predictions of P show a decline over time, particularly in the east, and those of pH show an increase in the east over time. Disjunctive kriging was used to examine temporal changes in available P using probabilities less than given thresholds of this element. The RSSS was not designed for spatial analysis, but the results show that the data from these surveys are suitable for this purpose. The results of the spatial analysis, together with those of the statistical analyses, provide a comprehensive view of the RSSS database as a basis for monitoring the soil. These data should be taken into account when future national soil monitoring schemes are designed.
Abstract:
In the continuing debate over the impact of genetically modified (GM) crops on farmers of developing countries, it is important to accurately measure magnitudes such as farm-level yield gains from GM crop adoption. Yet most farm-level studies in the literature do not control for farmer self-selection, a potentially important source of bias in such estimates. We use farm-level panel data from Indian cotton farmers to investigate the yield effect of GM insect-resistant cotton. We explicitly take into account the fact that the choice of crop variety is an endogenous variable which might lead to bias from self-selection. A production function is estimated using a fixed-effects model to control for selection bias. Our results show that efficient farmers adopt Bacillus thuringiensis (Bt) cotton at a higher rate than their less efficient peers. This suggests that cross-sectional estimates of the yield effect of Bt cotton, which do not control for self-selection effects, are likely to be biased upwards. However, after controlling for selection bias, we still find that there is a significant positive yield effect from adoption of Bt cotton that more than offsets the additional cost of Bt seed.
Abstract:
Models developed to identify the rates and origins of nutrient export from land to stream require an accurate assessment of the nutrient load present in the water body in order to calibrate model parameters and structure. These data are rarely available at a representative scale and in an appropriate chemical form except in research catchments. Observational errors associated with nutrient load estimates based on these data lead to a high degree of uncertainty in modelling and nutrient budgeting studies. Here, daily paired instantaneous P and flow data for 17 UK research catchments covering a total of 39 water years (WY) have been used to explore the nature and extent of the observational error associated with nutrient flux estimates based on partial fractions and infrequent sampling. The daily records were artificially decimated to create 7 stratified sampling records, 7 weekly records, and 30 monthly records from each WY and catchment. These were used to evaluate the impact of sampling frequency on load estimate uncertainty. The analysis underlines the high uncertainty of load estimates based on monthly data and individual P fractions rather than total P. Catchments with a high baseflow index and/or low population density were found to return a lower RMSE on load estimates when sampled infrequently than those with a low baseflow index and high population density. Catchment size was not shown to be important, though a limitation of this study is that daily records may fail to capture the full range of P export behaviour in smaller catchments with flashy hydrographs, leading to an underestimate of uncertainty in load estimates for such catchments. Further analysis of sub-daily records is needed to investigate this fully.
Here, recommendations are given on load estimation methodologies for different catchment types sampled at different frequencies, and the ways in which this analysis can be used to identify observational error and uncertainty for model calibration and nutrient budgeting studies. (c) 2006 Elsevier B.V. All rights reserved.
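The decimation experiment described above can be sketched in a few lines. Everything below is hypothetical: a synthetic flashy catchment in which P concentration rises with flow, a common sparse-record load estimator (mean sampled concentration times mean sampled flow, scaled to the year), and two artificial sampling frequencies.

```python
import random

def annual_load(conc, flow):
    """'True' annual load from paired daily concentration and flow records."""
    return sum(c * q for c, q in zip(conc, flow))

def load_from_subsample(conc, flow, days, n_days=365):
    """A common sparse-record estimator: mean sampled concentration times
    mean sampled flow, scaled to the full year."""
    cbar = sum(conc[d] for d in days) / len(days)
    qbar = sum(flow[d] for d in days) / len(days)
    return n_days * cbar * qbar

random.seed(2)
# Hypothetical flashy catchment where concentration rises with flow; this is
# exactly the situation in which infrequent sampling misses high-load events.
flow = [abs(random.gauss(5.0, 3.0)) for _ in range(365)]
conc = [0.05 * q + abs(random.gauss(0.2, 0.05)) for q in flow]

true_load = annual_load(conc, flow)
monthly_estimate = load_from_subsample(conc, flow, range(0, 365, 30))
weekly_estimate = load_from_subsample(conc, flow, range(0, 365, 7))
```

Repeating the subsampling with different start days and computing the RMSE of the estimates against the true load reproduces, in miniature, the frequency comparison in the abstract.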
Abstract:
To provide reliable estimates for mapping soil properties for precision agriculture requires intensive sampling and costly laboratory analyses. If the spatial structure of ancillary data, such as yield, digital information from aerial photographs, and soil electrical conductivity (EC) measurements, relates to that of soil properties, they could be used to guide the sampling intensity for soil surveys. Variograms of permanent soil properties at two study sites on different parent materials were compared with each other and with those for ancillary data. The ranges of spatial dependence identified by the variograms of both sets of properties are of similar orders of magnitude for each study site. Maps of the ancillary data appear to show similar patterns of variation, and these seem to relate to those of the permanent properties of the soil. Correlation analysis has confirmed these relations. Maps of kriged estimates from sub-sampled data and the original variograms showed that the main patterns of variation were preserved when a sampling interval of less than half the average variogram range of ancillary data was used. Digital data from aerial photographs for different years and EC appear to show a more consistent relation with the soil properties than does yield. Aerial photographs, in particular those of bare soil, seem to be the most useful ancillary data, and they are often cheaper to obtain than yield and EC data.
Abstract:
It has been generally accepted that the method of moments (MoM) variogram, which has been widely applied in soil science, requires about 100 sites at an appropriate interval apart to describe the variation adequately. This sample size is often larger than can be afforded for soil surveys of agricultural fields or contaminated sites. Furthermore, it might be a much larger sample size than is needed where the scale of variation is large. A possible alternative in such situations is the residual maximum likelihood (REML) variogram, because fewer data appear to be required. The REML method is parametric and is considered reliable where there is trend in the data because it is based on generalized increments that filter trend out, and only the covariance parameters are estimated. Previous research has suggested that fewer data are needed to compute a reliable variogram using a maximum likelihood approach such as REML; however, the results can vary according to the nature of the spatial variation. There remain issues to examine: how many fewer data can be used, how should the sampling sites be distributed over the site of interest, and how do different degrees of spatial variation affect the data requirements? The soil of four field sites of different size, physiography, parent material and soil type was sampled intensively, and MoM and REML variograms were calculated for clay content. The data were then sub-sampled to give different sample sizes and distributions of sites, and the variograms were computed again. The model parameters for the sets of variograms for each site were used for cross-validation. Predictions based on REML variograms were generally more accurate than those from MoM variograms with fewer than 100 sampling sites.
A sample size of around 50 sites at an appropriate distance apart, possibly determined from variograms of ancillary data, appears adequate to compute REML variograms for kriging soil properties for precision agriculture and contaminated sites. (C) 2007 Elsevier B.V. All rights reserved.
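For reference, the method-of-moments (Matheron) estimator discussed above is straightforward to compute on a regularly spaced transect. The clay-content values below are hypothetical, chosen only to illustrate the calculation.

```python
def mom_variogram(values, max_lag):
    """Method-of-moments (Matheron) semivariances for a regularly spaced
    1-D transect: gamma(h) = mean squared difference at lag h, halved."""
    gamma = {}
    for h in range(1, max_lag + 1):
        diffs = [(values[i] - values[i + h]) ** 2
                 for i in range(len(values) - h)]
        gamma[h] = sum(diffs) / (2 * len(diffs))
    return gamma

# Hypothetical clay-content transect (%), one value per sampling interval:
clay = [22, 24, 23, 27, 30, 29, 31, 35, 34, 33, 30, 28]
gamma = mom_variogram(clay, 4)   # semivariances at lags 1..4
```

With so few sites the estimates are noisy, which is exactly why the abstract finds that around 100 sites are needed for a reliable MoM variogram while REML can manage with fewer.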
Abstract:
Long-term monitoring of forest soils as part of a pan-European network to detect environmental change depends on an accurate determination of the mean of the soil properties at each monitoring event. Forest soil is known to be very variable spatially, however. A study was undertaken to explore and quantify this variability at three forest monitoring plots in Britain. Detailed soil sampling was carried out, and the data from the chemical analyses were analysed by classical statistics and geostatistics. An analysis of variance showed that there were no consistent effects from the sample sites in relation to the position of the trees. The variogram analysis showed that there was spatial dependence at each site for several variables and some varied in an apparently periodic way. An optimal sampling analysis based on the multivariate variogram for each site suggested that a bulked sample from 36 cores would reduce error to an acceptable level. Future sampling should be designed so that it neither targets nor avoids trees and disturbed ground. This can be achieved best by using a stratified random sampling design.
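The paper's optimal sampling analysis is based on the multivariate variogram; a much cruder back-of-envelope version, assuming independent cores, is shown below with hypothetical numbers. Bulking n cores reduces the standard error of the plot mean by a factor of sqrt(n), so a target precision implies a minimum number of cores.

```python
import math

def cores_to_bulk(sigma, target_se):
    """Number of cores to combine so the standard error of the bulked mean
    meets a target, assuming independent cores -- a simplification of the
    variogram-based optimal-sampling analysis used in the study."""
    return math.ceil((sigma / target_se) ** 2)

# Hypothetical within-plot standard deviation and target standard error:
n_cores = cores_to_bulk(sigma=1.2, target_se=0.2)   # -> 36
```

Spatial dependence between nearby cores would inflate this number, which is why the study's variogram-based analysis is the appropriate tool for a real monitoring plot.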
Abstract:
As part of the European Commission (EC)'s revision of the Sewage Sludge Directive and the development of a Biowaste Directive, there was recognition of the difficulty of comparing data from Member States (MSs) because of differences in sampling and analytical procedures. The 'HORIZONTAL' initiative, funded by the EC and MSs, seeks to address these differences in approach and to produce standardised procedures in the form of CEN standards. This article is a preliminary investigation into aspects of the sampling of biosolids, composts and soils to which there is a history of biosolid application. The article provides information on the measurement uncertainty associated with sampling from heaps, large bags and pipes and soils in the landscape under a limited set of conditions, using sampling approaches in space and time and sample numbers based on procedures widely used in the relevant industries and when sampling similar materials. These preliminary results suggest that considerably more information is required before the appropriate sample design, optimum number of samples, number of samples comprising a composite, and temporal and spatial frequency of sampling might be recommended to achieve consistent results of a high level of precision and confidence. (C) 2004 Elsevier Ltd. All rights reserved.