954 results for ROBUST ESTIMATION
Abstract:
Models developed to identify the rates and origins of nutrient export from land to stream require an accurate assessment of the nutrient load present in the water body in order to calibrate model parameters and structure. These data are rarely available at a representative scale and in an appropriate chemical form except in research catchments. Observational errors associated with nutrient load estimates based on these data lead to a high degree of uncertainty in modelling and nutrient budgeting studies. Here, daily paired instantaneous P and flow data for 17 UK research catchments covering a total of 39 water years (WY) have been used to explore the nature and extent of the observational error associated with nutrient flux estimates based on partial fractions and infrequent sampling. The daily records were artificially decimated to create 7 stratified sampling records, 7 weekly records, and 30 monthly records from each WY and catchment. These were used to evaluate the impact of sampling frequency on load estimate uncertainty. The analysis underlines the high uncertainty of load estimates based on monthly data and individual P fractions rather than total P. Catchments with a high baseflow index and/or low population density were found to return a lower RMSE on load estimates when sampled infrequently than those with a low baseflow index and high population density. Catchment size was not shown to be important, though a limitation of this study is that daily records may fail to capture the full range of P export behaviour in smaller catchments with flashy hydrographs, leading to an underestimate of uncertainty in load estimates for such catchments. Further analysis of sub-daily records is needed to investigate this fully.
Here, recommendations are given on load estimation methodologies for different catchment types sampled at different frequencies, and the ways in which this analysis can be used to identify observational error and uncertainty for model calibration and nutrient budgeting studies. (c) 2006 Elsevier B.V. All rights reserved.
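The decimation experiment described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the synthetic flow and concentration values, the simple "mean flux times record length" load estimator, and the 28-day spacing used as a stand-in for monthly sampling. None of it is taken from the study itself.

```python
import math
import random

def true_load(conc, flow):
    """Reference load from the complete daily record: sum of concentration x flow."""
    return sum(c * q for c, q in zip(conc, flow))

def subsampled_load(conc, flow, step, offset=0):
    """Load estimated from every `step`-th day, scaled up to the full record length."""
    idx = range(offset, len(conc), step)
    mean_flux = sum(conc[i] * flow[i] for i in idx) / len(idx)
    return mean_flux * len(conc)

def rmse_percent(conc, flow, step):
    """RMSE of the decimated estimates over all start offsets, as % of the reference load."""
    ref = true_load(conc, flow)
    errs = [(subsampled_load(conc, flow, step, k) - ref) / ref for k in range(step)]
    return 100.0 * math.sqrt(sum(e * e for e in errs) / len(errs))

# Synthetic daily record (hypothetical values, arbitrary units).
random.seed(0)
flow = [random.lognormvariate(0.0, 1.0) for _ in range(364)]
conc = [0.05 + 0.02 * q for q in flow]

weekly = rmse_percent(conc, flow, 7)    # 7 possible weekly records
monthly = rmse_percent(conc, flow, 28)  # 28-day spacing as a proxy for monthly records
print(f"weekly RMSE: {weekly:.1f}%  monthly RMSE: {monthly:.1f}%")
```

Each start offset yields one decimated record, so the RMSE summarises how far an infrequent sampling programme could plausibly land from the load given by the full daily record.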
Abstract:
Asymmetry in a distribution can arise from a long tail of values in the underlying process or from outliers that belong to another population that contaminate the primary process. The first paper of this series examined the effects of the former on the variogram and this paper examines the effects of asymmetry arising from outliers. Simulated annealing was used to create normally distributed random fields of different sizes that are realizations of known processes described by variograms with different nugget:sill ratios. These primary data sets were then contaminated with randomly located and spatially aggregated outliers from a secondary process to produce different degrees of asymmetry. Experimental variograms were computed from these data by Matheron's estimator and by three robust estimators. The effects of standard data transformations on the coefficient of skewness and on the variogram were also investigated. Cross-validation was used to assess the performance of models fitted to experimental variograms computed from a range of data contaminated by outliers for kriging. The results showed that where skewness was caused by outliers the variograms retained their general shape, but showed an increase in the nugget and sill variances and nugget:sill ratios. This effect was only slightly greater for the smallest data set than for the two larger data sets, and there was little difference between the results for the latter. Overall, the effect of size of data set was small for all analyses. The nugget:sill ratio showed a consistent decrease after transformation to both square roots and logarithms; the decrease was generally larger for the latter, however. Aggregated outliers had different effects on the variogram shape from those that were randomly located, and this also depended on whether they were aggregated near to the edge or the centre of the field.
The results of cross-validation showed that the robust estimators and the removal of outliers were the most effective ways of dealing with outliers for variogram estimation and kriging. (C) 2007 Elsevier Ltd. All rights reserved.
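As a rough illustration of why a robust estimator helps, the sketch below compares Matheron's method-of-moments estimator with the Cressie-Hawkins estimator on a one-dimensional transect. The abstract does not name its three robust estimators, so Cressie-Hawkins is chosen here as an assumption, and the data (independent Gaussian values with ten outliers added) are synthetic.

```python
import random

def lag_differences(z, lag):
    """Differences between values separated by `lag` steps along a regular transect."""
    return [z[i + lag] - z[i] for i in range(len(z) - lag)]

def matheron(z, lag):
    """Matheron's estimator: half the mean squared difference at the given lag."""
    d = lag_differences(z, lag)
    return sum(x * x for x in d) / (2.0 * len(d))

def cressie_hawkins(z, lag):
    """Cressie-Hawkins estimator, based on the fourth power of the mean root absolute difference."""
    d = lag_differences(z, lag)
    n = len(d)
    mean_root = sum(abs(x) ** 0.5 for x in d) / n
    return mean_root ** 4 / (2.0 * (0.457 + 0.494 / n + 0.045 / n ** 2))

# Synthetic primary process plus randomly located outliers (hypothetical values).
random.seed(1)
clean = [random.gauss(0.0, 1.0) for _ in range(500)]
contaminated = list(clean)
for i in random.sample(range(500), 10):
    contaminated[i] += 10.0

m_clean, m_cont = matheron(clean, 1), matheron(contaminated, 1)
ch_clean, ch_cont = cressie_hawkins(clean, 1), cressie_hawkins(contaminated, 1)
print(f"Matheron:        {m_clean:.2f} -> {m_cont:.2f}")
print(f"Cressie-Hawkins: {ch_clean:.2f} -> {ch_cont:.2f}")
```

Squared differences let a few outliers dominate Matheron's estimate, whereas the square-root transform in the Cressie-Hawkins form damps their influence, so the semivariance at this lag is inflated less.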
Abstract:
A method is presented which allows thermal inertia (the soil heat capacity times the square root of the soil thermal diffusivity, $C_h\sqrt{D_h}$) to be estimated remotely from micrometeorological observations. The method uses the drop in surface temperature, $T_s$, between sunset and sunrise, and the average night-time net radiation during that period, for clear, still nights. A Fourier series analysis was applied to analyse the time series of $T_s$. The Fourier series constants, together with the remote estimate of thermal inertia, were used in an analytical expression to calculate diurnal estimates of the soil heat flux, $G$. These remote estimates of $C_h\sqrt{D_h}$ and $G$ compared well with values derived from in situ sensors. The remote and in situ estimates of $C_h\sqrt{D_h}$ both correlated well with topsoil moisture content. This method potentially allows area-average estimates of thermal inertia and soil heat flux to be derived from remote sensing, e.g. METEOSAT Second Generation, where the area is determined by the sensor's height and viewing angle. (C) 2003 Elsevier B.V. All rights reserved.
Abstract:
Climate change science is increasingly concerned with methods for managing and integrating sources of uncertainty from emission storylines, climate model projections, and ecosystem model parameterizations. In tropical ecosystems, regional climate projections and modeled ecosystem responses vary greatly, leading to a significant source of uncertainty in global biogeochemical accounting and possible future climate feedbacks. Here, we combine an ensemble of IPCC-AR4 climate change projections for the Amazon Basin (eight general circulation models) with alternative ecosystem parameter sets for the dynamic global vegetation model, LPJmL. We evaluate LPJmL simulations of carbon stocks and fluxes against flux tower and aboveground biomass datasets for individual sites and the entire basin. Variability in LPJmL model sensitivity to future climate change is primarily related to light and water limitations through biochemical and water-balance-related parameters. Temperature-dependent parameters related to plant respiration and photosynthesis appear to be less important than vegetation dynamics (and their parameters) for determining the magnitude of ecosystem response to climate change. Variance partitioning approaches reveal that relationships between uncertainty from ecosystem dynamics and climate projections are dependent on geographic location and the targeted ecosystem process. Parameter uncertainty from the LPJmL model does not affect the trajectory of ecosystem response for a given climate change scenario and the primary source of uncertainty for Amazon 'dieback' results from the uncertainty among climate projections. Our approach for describing uncertainty is applicable for informing and prioritizing policy options related to mitigation and adaptation where long-term investments are required.
Abstract:
Real-time rainfall monitoring in Africa is of great practical importance for operational applications in hydrology and agriculture. Satellite data have been used in this context for many years because of the lack of surface observations. This paper describes an improved artificial neural network algorithm for operational applications. The algorithm combines numerical weather model information with the satellite data. Using this algorithm, daily rainfall estimates were derived for 4 yr of the Ethiopian and Zambian main rainy seasons and were compared with two other algorithms: a multiple linear regression making use of the same information as that of the neural network, and a satellite-only method. All algorithms were validated against rain gauge data. Overall, the neural network performs best, but the extent to which it does so depends on the calibration/validation protocol. The advantages of the neural network are most evident when calibration data are numerous and close in space and time to the validation data. This result emphasizes the importance of a real-time calibration system.
Abstract:
This paper presents a first attempt to estimate mixing parameters from sea level observations using a particle method based on importance sampling. The method is applied to an ensemble of 128 members of model simulations with a global ocean general circulation model of high complexity. Idealized twin experiments demonstrate that the method is able to accurately reconstruct mixing parameters from an observed mean sea level field when mixing is assumed to be spatially homogeneous. An experiment with inhomogeneous eddy coefficients fails because of the limited ensemble size. This is overcome by the introduction of local weighting, which is able to capture spatial variations in mixing qualitatively. As the sensitivity of sea level to variations in mixing is higher for low values of mixing coefficients, the method works relatively well in regions of low eddy activity.
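The importance-sampling idea behind such a particle method can be sketched with a toy twin experiment. The forward model `toy_sea_level`, the Gaussian observation error, and all numerical values below are hypothetical stand-ins; only the ensemble size of 128 comes from the abstract, and the real study uses a full ocean general circulation model.

```python
import math

def toy_sea_level(k, positions=(0.5, 1.0, 1.5, 2.0)):
    """Hypothetical forward model: a 'sea level' field that depends on one mixing parameter k."""
    return [math.exp(-k * x) for x in positions]

def importance_weights(members, observed, obs_sd):
    """Normalised Gaussian-likelihood weight of each ensemble member given the observations."""
    log_w = [
        -0.5 * sum(((m - o) / obs_sd) ** 2 for m, o in zip(member, observed))
        for member in members
    ]
    top = max(log_w)  # subtract the maximum before exponentiating, for numerical stability
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]

# Twin experiment: the "observations" are generated by the model itself with a known parameter.
true_k = 0.5
observed = toy_sea_level(true_k)

# Ensemble of 128 members, each run with a different candidate mixing parameter.
candidates = [0.05 + 0.01 * i for i in range(128)]
members = [toy_sea_level(k) for k in candidates]

weights = importance_weights(members, observed, obs_sd=0.01)
estimate = sum(k * w for k, w in zip(candidates, weights))
print(f"true k = {true_k}, weighted estimate = {estimate:.3f}")
```

With a single homogeneous parameter, a modest ensemble covers the parameter space well. If every location had its own coefficient, far more members would be needed to keep the weights from collapsing onto a few particles, which is one way to read the reported failure of the inhomogeneous experiment and the motivation for local weighting.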
Abstract:
In this paper we consider the estimation of population size from one-source capture–recapture data, that is, a list in which individuals can potentially be found repeatedly and where the question is how many individuals are missed by the list. As a typical example, we provide data from a drug user study in Bangkok from 2001 where the list consists of drug users who repeatedly contact treatment institutions. Drug users with 1, 2, 3, ... contacts occur, but drug users with zero contacts are not present, requiring the size of this group to be estimated. Statistically, these data can be considered as stemming from a zero-truncated count distribution. We revisit an estimator for the population size suggested by Zelterman that is known to be robust under potential unobserved heterogeneity. We demonstrate that the Zelterman estimator can be viewed as a maximum likelihood estimator for a locally truncated Poisson likelihood which is equivalent to a binomial likelihood. This result allows the extension of the Zelterman estimator by means of logistic regression to include observed heterogeneity in the form of covariates. We also review an estimator proposed by Chao and explain why we are not able to obtain similar results for this estimator. The Zelterman estimator is applied in two case studies, the first a drug user study from Bangkok, the second an illegal immigrant study in the Netherlands. Our results suggest the new estimator should be used, in particular, if substantial unobserved heterogeneity is present.
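For readers unfamiliar with these estimators, a minimal sketch of their standard covariate-free forms follows. Here `n` is the number of individuals observed at least once, and `f1` and `f2` are the numbers seen exactly once and exactly twice. The counts are made up for illustration and are not the Bangkok or Netherlands data.

```python
import math

def zelterman(n, f1, f2):
    """Zelterman estimator: Poisson rate from the ratio of doubletons to singletons."""
    lam = 2.0 * f2 / f1
    return n / (1.0 - math.exp(-lam))

def zelterman_via_binomial(n, f1, f2):
    """Same estimate via the binomial view: p = P(count = 2 | count is 1 or 2) = lam / (2 + lam)."""
    p = f2 / (f1 + f2)         # binomial maximum likelihood estimate of p
    lam = 2.0 * p / (1.0 - p)  # invert p = lam / (2 + lam)
    return n / (1.0 - math.exp(-lam))

def chao(n, f1, f2):
    """Chao's lower-bound estimator of population size."""
    return n + f1 * f1 / (2.0 * f2)

# Hypothetical frequency counts.
n, f1, f2 = 300, 200, 50
print(f"Zelterman: {zelterman(n, f1, f2):.1f}")
print(f"Binomial:  {zelterman_via_binomial(n, f1, f2):.1f}")
print(f"Chao:      {chao(n, f1, f2):.1f}")
```

Because the binomial probability `p` depends only on individuals with one or two contacts, it can be modelled with logistic regression on covariates, which is the kind of extension the abstract describes.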
Abstract:
We propose a novel method for scoring the accuracy of protein binding site predictions – the Binding-site Distance Test (BDT) score. Recently, the Matthews Correlation Coefficient (MCC) has been used to evaluate binding site predictions, both by developers of new methods and by the assessors for the community-wide prediction experiment – CASP8. Whilst being a rigorous scoring method, the MCC does not take into account the actual 3D location of the predicted residues from the observed binding site. Thus, an incorrectly predicted site that is nevertheless close to the observed binding site will obtain an identical score to the same number of nonbinding residues predicted at random. The MCC is somewhat affected by the subjectivity of determining observed binding residues and the ambiguity of choosing distance cutoffs. By contrast the BDT method produces continuous scores ranging between 0 and 1, relating to the distance between the predicted and observed residues. Residues predicted close to the binding site will score higher than those more distant, providing a better reflection of the true accuracy of predictions. The CASP8 function predictions were evaluated using both the MCC and BDT methods and the scores were compared. The BDT was found to strongly correlate with the MCC scores whilst also being less susceptible to the subjectivity of defining binding residues. We therefore suggest that this new simple score is a potentially more robust method for future evaluations of protein-ligand binding site predictions.
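The limitation of the MCC noted in this abstract can be seen in a short sketch. The residue numbers and protein length are hypothetical, and the BDT score is not reproduced because the abstract does not give its formula.

```python
import math

def mcc(predicted, observed, n_residues):
    """Matthews Correlation Coefficient for a binding-residue prediction on one protein."""
    predicted, observed = set(predicted), set(observed)
    tp = len(predicted & observed)
    fp = len(predicted - observed)
    fn = len(observed - predicted)
    tn = n_residues - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical 100-residue protein with four observed binding residues.
observed = {10, 11, 12, 13}
near_miss = {14, 15, 16, 17}  # imagined to lie next to the true site
far_miss = {80, 81, 82, 83}   # imagined to lie on the far side of the protein

print(mcc(near_miss, observed, 100), mcc(far_miss, observed, 100))
print(mcc(observed, observed, 100))
```

The MCC sees only set membership, so both wrong predictions receive the same score regardless of where the residues sit in the structure. A distance-aware score is meant to separate exactly these two cases.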