Digital Library

19 results for outliers

in CentAUR: Central Archive University of Reading - UK

Determining the effect of asymmetric data on the variogram. II. Outliers

Relevance:

20.00%

Abstract:

Asymmetry in a distribution can arise from a long tail of values in the underlying process or from outliers that belong to another population that contaminate the primary process. The first paper of this series examined the effects of the former on the variogram and this paper examines the effects of asymmetry arising from outliers. Simulated annealing was used to create normally distributed random fields of different size that are realizations of known processes described by variograms with different nugget:sill ratios. These primary data sets were then contaminated with randomly located and spatially aggregated outliers from a secondary process to produce different degrees of asymmetry. Experimental variograms were computed from these data by Matheron's estimator and by three robust estimators. The effects of standard data transformations on the coefficient of skewness and on the variogram were also investigated. Cross-validation was used to assess the performance of models fitted to experimental variograms computed from a range of data contaminated by outliers for kriging. The results showed that where skewness was caused by outliers the variograms retained their general shape, but showed an increase in the nugget and sill variances and nugget:sill ratios. This effect was only slightly more for the smallest data set than for the two larger data sets and there was little difference between the results for the latter. Overall, the effect of size of data set was small for all analyses. The nugget:sill ratio showed a consistent decrease after transformation to both square roots and logarithms; the decrease was generally larger for the latter, however. Aggregated outliers had different effects on the variogram shape from those that were randomly located, and this also depended on whether they were aggregated near to the edge or the centre of the field. The results of cross-validation showed that the robust estimators and the removal of outliers were the most effective ways of dealing with outliers for variogram estimation and kriging. (C) 2007 Elsevier Ltd. All rights reserved.
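
As a rough illustration of why a robust estimator helps here, the sketch below compares Matheron's estimator with the Cressie-Hawkins estimator on a contaminated series. Several things are assumptions rather than details from the abstract: the abstract does not name its three robust estimators, so Cressie-Hawkins is only one plausible choice; the data are a 1-D transect of uncorrelated values instead of a simulated 2-D field; and the function names and the outlier size are illustrative.

```python
import random

def matheron(z, lag):
    """Matheron's estimator: half the mean squared difference at a given lag."""
    d = [(z[i] - z[i + lag]) ** 2 for i in range(len(z) - lag)]
    return sum(d) / (2 * len(d))

def cressie_hawkins(z, lag):
    """Cressie-Hawkins estimator: based on square roots of absolute differences."""
    d = [abs(z[i] - z[i + lag]) ** 0.5 for i in range(len(z) - lag)]
    n = len(d)
    mean_root = sum(d) / n
    # Denominator is the usual bias correction for Gaussian data.
    return 0.5 * mean_root ** 4 / (0.457 + 0.494 / n + 0.045 / n ** 2)

rng = random.Random(1)
clean = [rng.gauss(0.0, 1.0) for _ in range(500)]   # primary process
contaminated = list(clean)
for i in rng.sample(range(500), 10):                # 2% randomly located outliers
    contaminated[i] += 10.0

for name, z in (("clean", clean), ("contaminated", contaminated)):
    print(name, round(matheron(z, 1), 2), round(cressie_hawkins(z, 1), 2))
```

On the clean series both estimates should be close to the unit variance; after contamination the Matheron value is inflated much more than the robust one, in line with the increase in nugget and sill variances described above.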

A comparison of best-fit lines for data with outliers

Relevance:

20.00%

Line-fitting with outliers

Relevance:

20.00%

Using best-fit lines for science data with outliers

Relevance:

20.00%

Predicting outliers in ensemble forecasts

Relevance:

20.00%

Abstract:

An ensemble forecast is a collection of runs of a numerical dynamical model, initialized with perturbed initial conditions. In modern weather prediction for example, ensembles are used to retrieve probabilistic information about future weather conditions. In this contribution, we are concerned with ensemble forecasts of a scalar quantity (say, the temperature at a specific location). We consider the event that the verification is smaller than the smallest, or larger than the largest ensemble member. We call these events outliers. If a K-member ensemble accurately reflected the variability of the verification, outliers should occur with a base rate of 2/(K + 1). In operational forecast ensembles though, this frequency is often found to be higher. We study the predictability of outliers and find that, exploiting information available from the ensemble, forecast probabilities for outlier events can be calculated which are more skilful than the unconditional base rate. We prove this analytically for statistically consistent forecast ensembles. Further, the analytical results are compared to the predictability of outliers in an operational forecast ensemble by means of model output statistics. We find the analytical and empirical results to agree both qualitatively and quantitatively.
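
The base rate of 2/(K + 1) quoted above can be checked with a small Monte Carlo sketch. It assumes an idealized, statistically consistent ensemble in which the K members and the verification are independent draws from the same distribution (a standard normal here, chosen only for convenience); the function name and trial count are illustrative.

```python
import random

def outlier_rate(k, trials=20000, seed=0):
    """Fraction of trials in which the verification lies outside the ensemble range."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ensemble = [rng.gauss(0.0, 1.0) for _ in range(k)]
        verification = rng.gauss(0.0, 1.0)   # same distribution as the members
        if verification < min(ensemble) or verification > max(ensemble):
            hits += 1
    return hits / trials

k = 9
print("simulated:", outlier_rate(k), "theoretical:", 2 / (k + 1))
```

The argument behind the formula is that, under exchangeability, the verification is equally likely to occupy any of the K + 1 rank positions, and two of those positions (below the minimum, above the maximum) count as outliers.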

Towards anomaly detection for increased security in multibiometric systems: spoofing-resistant 1-median fusion eliminating outliers

Relevance:

20.00%

Abstract:

Multibiometrics aims at improving biometric security in presence of spoofing attempts, but exposes a larger availability of points of attack. Standard fusion rules have been shown to be highly sensitive to spoofing attempts – even in case of a single fake instance only. This paper presents a novel spoofing-resistant fusion scheme proposing the detection and elimination of anomalous fusion input in an ensemble of evidence with liveness information. This approach aims at making multibiometric systems more resistant to presentation attacks by modeling the typical behaviour of human surveillance operators detecting anomalies as employed in many decision support systems. It is shown to improve security, while retaining the high accuracy level of standard fusion approaches on the latest Fingerprint Liveness Detection Competition (LivDet) 2013 dataset.

Determining the effect of asymmetric data on the variogram. I. Underlying asymmetry

Relevance:

10.00%

Abstract:

Matheron's usual variogram estimator can result in unreliable variograms when data are strongly asymmetric or skewed. Asymmetry in a distribution can arise from a long tail of values in the underlying process or from outliers that belong to another population that contaminate the primary process. This paper examines the effects of underlying asymmetry on the variogram and on the accuracy of prediction, and the second one examines the effects arising from outliers. Standard geostatistical texts suggest ways of dealing with underlying asymmetry; however, this is based on informed intuition rather than detailed investigation. To determine whether the methods generally used to deal with underlying asymmetry are appropriate, the effects of different coefficients of skewness on the shape of the experimental variogram and on the model parameters were investigated. Simulated annealing was used to create normally distributed random fields of different size from variograms with different nugget:sill ratios. These data were then modified to give different degrees of asymmetry and the experimental variogram was computed in each case. The effects of standard data transformations on the form of the variogram were also investigated. Cross-validation was used to assess quantitatively the performance of the different variogram models for kriging. The results showed that the shape of the variogram was affected by the degree of asymmetry, and that the effect increased as the size of data set decreased. Transformations of the data were more effective in reducing the skewness coefficient in the larger sets of data. Cross-validation confirmed that variogram models from transformed data were more suitable for kriging than were those from the raw asymmetric data. The results of this study have implications for the 'standard best practice' in dealing with asymmetry in data for geostatistical analyses. (C) 2007 Elsevier Ltd. All rights reserved.

Dating the volcanic eruption at Thera

Relevance:

10.00%

Abstract:

The eruption of the volcano at Thera (Santorini) in the Aegean Sea undoubtedly had a profound influence on the civilizations of the surrounding region. The date of the eruption has been a subject of much controversy because it must be linked into the established and intricate archaeological phasings of both the prehistoric Aegean and the wider east Mediterranean. Radiocarbon dating of material from the volcanic destruction layer itself can provide some evidence for the date of the eruption, but because of the shape of the calibration curve for the relevant period, the value of such dates relies on there being no biases in the data sets. However, by dating the material from phases earlier and later than the eruption, some of the problems of the calibration data set can be circumvented and the chronology for the region can be resolved with more certainty. In this paper, we draw together the evidence we have accumulated so far, including new data on the destruction layer itself and for the preceding cultural horizon at Thera, and from associated layers at Miletos in western Turkey. Using Bayesian models to synthesize the data and to identify outliers, we conclude from the most reliable C-14 evidence (and using the INTCAL98 calibration data set) that the eruption of Thera occurred between 1663 and 1599 BC.

Identifying adaptive genetic divergence among populations from genome scans

Relevance:

10.00%

Abstract:

The identification of signatures of natural selection in genomic surveys has become an area of intense research, stimulated by the increasing ease with which genetic markers can be typed. Loci identified as subject to selection may be functionally important, and hence (weak) candidates for involvement in disease causation. They can also be useful in determining the adaptive differentiation of populations, and exploring hypotheses about speciation. Adaptive differentiation has traditionally been identified from differences in allele frequencies among different populations, summarised by an estimate of F-ST. Low outliers relative to an appropriate neutral population-genetics model indicate loci subject to balancing selection, whereas high outliers suggest adaptive (directional) selection. However, the problem of identifying statistically significant departures from neutrality is complicated by confounding effects on the distribution of F-ST estimates, and current methods have not yet been tested in large-scale simulation experiments. Here, we simulate data from a structured population at many unlinked, diallelic loci that are predominantly neutral but with some loci subject to adaptive or balancing selection. We develop a hierarchical-Bayesian method, implemented via Markov chain Monte Carlo (MCMC), and assess its performance in distinguishing the loci simulated under selection from the neutral loci. We also compare this performance with that of a frequentist method, based on moment-based estimates of F-ST. We find that both methods can identify loci subject to adaptive selection when the selection coefficient is at least five times the migration rate. Neither method could reliably distinguish loci under balancing selection in our simulations, even when the selection coefficient is twenty times the migration rate.

Bryophyte diversity and community structure on thatched roofs of the Holnicote Estate, Somerset, U.K.

Relevance:

10.00%

Abstract:

We investigated patterns of bryophyte species richness and community structure, and their relation to roof variables, on thatched roofs of the Holnicote Estate, South Somerset. Thirty-two bryophyte species were recorded from 28 sampled roofs, including the globally rare and endangered thatch moss, Leptodontium gemmascens. Multiple regression analyses revealed that thatch age has a highly significant positive effect on the number of species present, accounting for nearly half the observed variation in species richness after removal of outliers. Aspect has a slight and marginally significant effect on species diversity (accounting for an additional 6% of variation), with north-facing samples having slightly more species. Age also has a significant impact on total bryophyte cover after removal of outlying observations. TWINSPAN analysis of bryophyte cover data suggests the existence of at least five discrete communities. Simple Discriminant Analyses indicate that these communities occupy different ecological subspaces as defined by the measured roof variables, with pitch, aspect and thatch age emerging as especially significant attributes. Contingency Analysis indicates that some communities are disfavoured by water reed as compared to wheat straw. The findings are significant for understanding the structure of bryophyte communities, for evaluating the effect of bryophyte cover on thatch performance, and for conservation of thatch communities, especially those harbouring rare species.

M-estimator and D-optimality model construction using orthogonal forward regression

Relevance:

10.00%

Abstract:

This correspondence introduces a new orthogonal forward regression (OFR) model identification algorithm that uses D-optimality for model structure selection and is based on M-estimators of the parameter estimates. The M-estimator is a classical robust parameter estimation technique for tackling bad data conditions such as outliers. Computationally, the M-estimator can be derived using an iterative reweighted least squares (IRLS) algorithm. D-optimality is a model structure robustness criterion in experimental design that tackles ill-conditioning in the model structure. The orthogonal forward regression (OFR), often based on the modified Gram-Schmidt procedure, is an efficient method incorporating structure selection and parameter estimation simultaneously. The basic idea of the proposed approach is to incorporate an IRLS inner loop into the modified Gram-Schmidt procedure. In this manner, the OFR algorithm for parsimonious model structure determination is extended to bad data conditions with improved performance via the derivation of parameter M-estimators with inherent robustness to outliers. Numerical examples are included to demonstrate the effectiveness of the proposed algorithm.
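
The IRLS idea mentioned above can be sketched for the simplest case, a straight-line fit. This is not the paper's algorithm: it omits the Gram-Schmidt orthogonalization and D-optimality selection entirely, and the Huber weight function, the tuning constant 1.345, the MAD-based scale and all function names are assumptions chosen for illustration.

```python
import random
import statistics

def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = slope * x + intercept."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def huber_irls(x, y, c=1.345, iterations=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    w = [1.0] * len(x)                       # first pass is ordinary least squares
    for _ in range(iterations):
        slope, intercept = weighted_line_fit(x, y, w)
        r = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
        # Robust scale from the median absolute residual (guarded against zero).
        scale = statistics.median(abs(ri) for ri in r) / 0.6745 or 1e-12
        # Huber weights: full weight for small residuals, down-weight large ones.
        w = [1.0 if abs(ri) <= c * scale else c * scale / abs(ri) for ri in r]
    return weighted_line_fit(x, y, w)

rng = random.Random(0)
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 + rng.gauss(0.0, 0.1) for xi in x]
y[18] += 40.0                                # one gross outlier

ols_slope, _ = weighted_line_fit(x, y, [1.0] * len(x))
robust_slope, _ = huber_irls(x, y)
print("OLS slope:", round(ols_slope, 3), "IRLS slope:", round(robust_slope, 3))
```

With the true slope set to 2, the ordinary least-squares slope is pulled noticeably by the single outlier, while the reweighted fit stays close to the underlying line.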

Combined bias and outlier identification in dynamic data reconciliation

Relevance:

10.00%

Abstract:

Measured process data normally contain inaccuracies because the measurements are obtained using imperfect instruments. As well as random errors one can expect systematic bias caused by miscalibrated instruments or outliers caused by process peaks such as sudden power fluctuations. Data reconciliation is the adjustment of a set of process data based on a model of the process so that the derived estimates conform to natural laws. In this paper, techniques for the detection and identification of both systematic bias and outliers in dynamic process data are presented. A novel technique for the detection and identification of systematic bias is formulated and presented. The problem of detection, identification and elimination of outliers is also treated using a modified version of a previously available clustering technique. These techniques are also combined to provide a global dynamic data reconciliation (DDR) strategy. The algorithms presented are tested in isolation and in combination using dynamic simulations of two continuous stirred tank reactors (CSTR).
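
The definition of data reconciliation given above can be illustrated with a much simpler case than the paper treats. The sketch below assumes a steady-state problem with a single linear balance constraint (one stream splitting into two) and known measurement variances; it does not cover dynamic reconciliation, bias detection or the clustering-based outlier treatment, and the flow values and function name are hypothetical.

```python
def reconcile(measured, variances, a):
    """Weighted least-squares adjustment so that sum(a[i] * x[i]) == 0.

    Minimizes sum((x[i] - measured[i])**2 / variances[i]) subject to one
    linear constraint; less precise measurements receive larger corrections.
    """
    residual = sum(ai * yi for ai, yi in zip(a, measured))
    denom = sum(ai * ai * vi for ai, vi in zip(a, variances))
    return [yi - vi * ai * residual / denom
            for yi, vi, ai in zip(measured, variances, a)]

# Hypothetical flow measurements: feed = product_1 + product_2 should hold.
measured = [100.0, 61.0, 42.0]
variances = [4.0, 1.0, 1.0]      # the feed meter is assumed to be the noisiest
balance = [1.0, -1.0, -1.0]      # feed - product_1 - product_2 = 0

adjusted = reconcile(measured, variances, balance)
print(adjusted)                  # [102.0, 60.5, 41.5]
```

The raw measurements violate the balance by 3 units; the adjusted values satisfy it exactly, with most of the correction assigned to the measurement with the largest variance.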

Enhancing model predictive control using dynamic data reconciliation

Relevance:

10.00%

Abstract:

The use of data reconciliation techniques can considerably reduce the inaccuracy of process data due to measurement errors. This in turn results in improved control system performance and process knowledge. Dynamic data reconciliation techniques are applied to a model-based predictive control scheme. It is shown through simulations on a chemical reactor system that the overall performance of the model-based predictive controller is enhanced considerably when data reconciliation is applied. The dynamic data reconciliation techniques used include a combined strategy for the simultaneous identification of outliers and systematic bias.

Obesity and diabetes, the built environment, and the ‘local’ food economy in the United States, 2007

Relevance:

10.00%

Abstract:

Obesity and diabetes are increasingly attributed to environmental factors; however, little attention has been paid to the influence of the ‘local’ food economy. This paper examines the association of measures relating to the built environment and ‘local’ agriculture with U.S. county-level prevalence of obesity and diabetes. Key indicators of the ‘local’ food economy include the density of farmers’ markets and the presence of farms with direct sales. This paper employs a robust regression estimator to account for non-normality of the data and to accommodate outliers. Overall, the built environment is associated with the prevalence of obesity and diabetes, and a strong ‘local’ food economy may play an important role in prevention. Results imply considerable scope for community-level interventions.

Validating the reported random errors of ACE‐FTS measurements

Relevance:

10.00%

Abstract:

In order to validate the reported precision of space‐based atmospheric composition measurements, validation studies often focus on measurements in the tropical stratosphere, where natural variability is weak. The scatter in tropical measurements can then be used as an upper limit on single‐profile measurement precision. Here we introduce a method of quantifying the scatter of tropical measurements which aims to minimize the effects of short‐term atmospheric variability while maintaining large enough sample sizes that the results can be taken as representative of the full data set. We apply this technique to measurements of O3, HNO3, CO, H2O, NO, NO2, N2O, CH4, CCl2F2, and CCl3F produced by the Atmospheric Chemistry Experiment–Fourier Transform Spectrometer (ACE‐FTS). Tropical scatter in the ACE‐FTS retrievals is found to be consistent with the reported random errors (RREs) for H2O and CO at altitudes above 20 km, validating the RREs for these measurements. Tropical scatter in measurements of NO, NO2, CCl2F2, and CCl3F is roughly consistent with the RREs as long as the effect of outliers in the data set is reduced through the use of robust statistics. The scatter in measurements of O3, HNO3, CH4, and N2O in the stratosphere, while larger than the RREs, is shown to be consistent with the variability simulated in the Canadian Middle Atmosphere Model. This result implies that, for these species, stratospheric measurement scatter is dominated by natural variability, not random error, which provides added confidence in the scientific value of single‐profile measurements.
