368 results for KRIGING
Abstract:
Recently, a lot of effort has been spent on the efficient computation of kriging predictors when observations are assimilated sequentially. In particular, kriging update formulae enabling significant computational savings were derived. Taking advantage of the previous kriging mean and variance computations helps avoid a costly matrix inversion when adding one observation to the n already available ones. In addition to traditional update formulae taking into account a single new observation, Emery (2009) proposed formulae for the batch-sequential case, i.e. when r > 1 new observations are simultaneously assimilated. However, the kriging variance and covariance formulae given in Emery (2009) for the batch-sequential case are not correct. In this paper, we fix this issue and establish correct expressions for updated kriging variances and covariances when assimilating observations in parallel. An application in sequential conditional simulation finally shows that coupling update and residual substitution approaches may enable significant speed-ups.
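A minimal numerical sketch of the single-observation update underlying these formulae (my own illustrative Python, not the paper's code; it assumes a zero-mean Gaussian process with a squared-exponential kernel and noise-free data): the kriging mean and variance after adding one observation are obtained from the current posterior without inverting the enlarged covariance matrix, and the result is checked against a full refit.

```python
import numpy as np

def k(a, b, ls=0.3):
    """Squared-exponential covariance kernel (illustrative choice)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def sk_posterior(X, y, xs):
    """Simple-kriging (zero prior mean) posterior mean and covariance at xs."""
    K = k(X, X)
    kx = k(xs, X)
    mean = kx @ np.linalg.solve(K, y)
    cov = k(xs, xs) - kx @ np.linalg.solve(K, kx.T)
    return mean, cov

X = np.array([0.0, 0.2, 0.4, 0.7, 0.9])   # n = 5 existing design points
y = np.sin(2 * np.pi * X)                  # noise-free observations
xnew, ynew = 0.55, np.sin(2 * np.pi * 0.55)
xs = np.linspace(0, 1, 11)                 # prediction grid

# Posterior quantities based on the n available observations only
m, C = sk_posterior(X, y, np.append(xs, xnew))
m_xs, m_new = m[:-1], m[-1]
c_xs_new = C[:-1, -1]      # posterior covariance between xs and the new point
s2_new = C[-1, -1]         # kriging variance at the new point

# One-point update formulae: no (n+1) x (n+1) matrix inversion required
m_upd = m_xs + c_xs_new / s2_new * (ynew - m_new)
s2_upd = np.diag(C)[:-1] - c_xs_new ** 2 / s2_new

# Brute-force check: refit with all n + 1 observations
m_full, C_full = sk_posterior(np.append(X, xnew), np.append(y, ynew), xs)
assert np.allclose(m_upd, m_full) and np.allclose(s2_upd, np.diag(C_full))
```

The batch-sequential formulae corrected in the paper generalize the same idea to several new observations assimilated at once.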
Abstract:
Let us consider a large set of candidate parameter fields, such as hydraulic conductivity maps, on which we can run an accurate forward flow and transport simulation. We address the issue of rapidly identifying a subset of candidates whose responses best match a reference response curve. In order to keep the number of calls to the accurate flow simulator computationally tractable, a recent distance-based approach relying on fast proxy simulations is revisited, and turned into a non-stationary kriging method where the covariance kernel is obtained by combining a classical kernel with the proxy. Once the accurate simulator has been run for an initial subset of parameter fields and a kriging metamodel has been inferred, the predictive distributions of misfits for the remaining parameter fields can be used as a guide to select candidate parameter fields in a sequential way. The proposed algorithm, Proxy-based Kriging for Sequential Inversion (ProKSI), relies on a variant of the Expected Improvement, a popular criterion for kriging-based global optimization. A statistical benchmark of ProKSI's performance illustrates the efficiency and robustness of the approach when using different kinds of proxies.
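The abstract describes ProKSI's criterion only as a variant of the Expected Improvement; the classical EI it builds on can be sketched in a few lines (illustrative Python, not the ProKSI implementation; the kriging means `m`, standard deviations `s` and current best misfit `f_min` are assumed to come from a fitted metamodel of the misfit).

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(m, s, f_min):
    """Classical EI for minimization, given kriging mean m and std s at candidates."""
    m, s = np.asarray(m, float), np.asarray(s, float)
    ei = np.zeros_like(m)
    pos = s > 0
    z = (f_min - m[pos]) / s[pos]
    ei[pos] = (f_min - m[pos]) * norm.cdf(z) + s[pos] * norm.pdf(z)
    return ei

# Illustrative use: pick the candidate parameter field with the largest EI
# on its predicted misfit and run the accurate simulator there next.
m = np.array([1.2, 0.8, 1.5, 0.9])    # kriging means of the misfit
s = np.array([0.3, 0.05, 0.6, 0.4])   # kriging standard deviations
print(np.argmax(expected_improvement(m, s, f_min=1.0)))
```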
Abstract:
The responses of many real-world problems can only be evaluated subject to noise. In order to make efficient optimization of these problems possible, intelligent optimization strategies that successfully cope with noisy evaluations are required. In this article, a comprehensive review of existing kriging-based methods for the optimization of noisy functions is provided. In summary, ten methods for choosing the sequential samples are described using a unified formalism. They are compared on analytical benchmark problems for which the usual assumption of homoscedastic Gaussian noise made in the underlying models is met. Different problem configurations (noise level, maximum number of observations, initial number of observations) and setups (covariance functions, budget, initial sample size) are considered. It is found that the choices of the initial sample size and the covariance function are not critical. The choice of the method, however, can result in significant differences in performance. In particular, the three most intuitive criteria are found to be poor alternatives. Although no criterion is found to be consistently more efficient than the others, two specialized methods appear more robust on average.
Abstract:
Several strategies relying on kriging have recently been proposed for adaptively estimating contour lines and excursion sets of functions under a severely limited evaluation budget. The recently released R package KrigInv 3 is presented and offers a sound implementation of various sampling criteria for those kinds of inverse problems. KrigInv is based on the DiceKriging package, and thus benefits from a number of options concerning the underlying kriging models. Six implemented sampling criteria are detailed in a tutorial and illustrated with graphical examples. Different functionalities of KrigInv are gradually explained. Additionally, two recently proposed criteria for batch-sequential inversion are presented, enabling advanced users to distribute function evaluations in parallel on clusters or clouds of machines. Finally, auxiliary problems are discussed. These include the fine-tuning of the numerical integration and optimization procedures used within the computation and the optimization of the considered criteria.
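KrigInv itself is an R package; as a language-neutral illustration of the kind of pointwise sampling criterion such tools implement, the sketch below (Python, my own simplification, not KrigInv's code) scores candidate points by the probability that the kriging model misclassifies them with respect to a threshold `T`, and proposes the most ambiguous point for the next evaluation.

```python
import numpy as np
from scipy.stats import norm

def misclassification_prob(m, s, T):
    """Pointwise probability that the sign of f(x) - T is misclassified by the
    kriging mean, given kriging mean m and positive standard deviation s."""
    p_above = norm.cdf((np.asarray(m) - T) / np.asarray(s))   # P(f(x) > T | data)
    return np.minimum(p_above, 1.0 - p_above)

# Evaluate the criterion on a candidate grid and pick the next evaluation point
m = np.array([0.2, 0.9, 1.1, 2.0])   # kriging means on a candidate grid
s = np.array([0.5, 0.4, 0.3, 0.2])   # kriging standard deviations
x_next = np.argmax(misclassification_prob(m, s, T=1.0))
```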
Abstract:
Kriging-based optimization relying on noisy evaluations of complex systems has recently motivated contributions from various research communities. Five strategies have been implemented in the DiceOptim package. The corresponding functions constitute a user-friendly tool for solving expensive noisy optimization problems in a sequential framework, while offering some flexibility for advanced users. Moreover, the implementation is done in a unified environment, making this package a useful device for studying the relative performance of existing approaches depending on the experimental setup. An overview of the package structure and interface is provided, as well as a description of the strategies and some insight into the implementation challenges and the proposed solutions. The strategies are compared to some existing optimization packages on analytical test functions and show promising performance.
Abstract:
Stepwise uncertainty reduction (SUR) strategies aim at constructing a sequence of points for evaluating a function f in such a way that the residual uncertainty about a quantity of interest progressively decreases to zero. Using such strategies in the framework of Gaussian process modeling has been shown to be efficient for estimating the volume of excursion of f above a fixed threshold. However, SUR strategies remain cumbersome to use in practice because of their high computational complexity, and the fact that they deliver a single point at each iteration. In this article we introduce several multipoint sampling criteria, allowing the selection of batches of points at which f can be evaluated in parallel. Such criteria are of particular interest when f is costly to evaluate and several CPUs are simultaneously available. We also manage to drastically reduce the computational cost of these strategies through the use of closed form formulas. We illustrate their performances in various numerical experiments, including a nuclear safety test case. Basic notions about kriging, auxiliary problems, complexity calculations, R code, and data are available online as supplementary materials.
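As a minimal sketch of the quantity these SUR strategies target (illustrative Python under my own assumptions, not the article's supplementary code): given the kriging mean and standard deviation on a grid, one can compute the pointwise probability of exceeding the threshold `T`, the implied estimate of the excursion volume, and an integrated measure of the residual classification uncertainty that the sampling criteria aim to drive to zero.

```python
import numpy as np
from scipy.stats import norm

def excursion_summary(m, s, T, volume=1.0):
    """Excursion probability, estimated excursion volume, and residual uncertainty
    from kriging means m and standard deviations s on a grid over a domain of the
    given volume."""
    p = norm.cdf((np.asarray(m) - T) / np.asarray(s))   # P(f(x) > T | data)
    alpha_hat = volume * p.mean()                        # estimated excursion volume
    residual = volume * (p * (1.0 - p)).mean()           # integrated classification uncertainty
    return p, alpha_hat, residual

# Illustrative kriging output on a grid over a unit-volume domain
m = np.array([0.3, 0.8, 1.2, 1.6, 0.9])
s = np.array([0.4, 0.3, 0.2, 0.3, 0.5])
_, alpha_hat, residual = excursion_summary(m, s, T=1.0)
```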
Abstract:
Kriging is a widely employed method for interpolating and estimating elevations from digital elevation data. Its place of prominence is due to its elegant theoretical foundation and its convenient practical implementation. From an interpolation point of view, kriging is equivalent to a thin-plate spline and is one species among the many in the genus of weighted inverse distance methods, albeit with attractive properties. However, from a statistical point of view, kriging is a best linear unbiased estimator and, consequently, has a place of distinction among all spatial estimators because any other linear estimator that performs as well as kriging (in the least squares sense) must be equivalent to kriging, assuming that the parameters of the semivariogram are known. Therefore, kriging is often held to be the gold standard of digital terrain model elevation estimation. However, I prove that, when used with local support, kriging creates discontinuous digital terrain models, which is to say, surfaces with “rips” and “tears” throughout them. This result is general; it is true for ordinary kriging, kriging with a trend, and other forms. A U.S. Geological Survey (USGS) digital elevation model was analyzed to characterize the distribution of the discontinuities. I show that the magnitude of the discontinuity does not depend on surface gradient but is strongly dependent on the size of the kriging neighborhood.
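The discontinuity mechanism is easy to reproduce in a toy setting (my own 1-D sketch: ordinary kriging with a Gaussian covariance and a two-nearest-neighbour moving neighbourhood; it is not the paper's USGS analysis). Predictions made just either side of the location where the neighbourhood set switches disagree, which is exactly the kind of "rip" described above.

```python
import numpy as np

def cov(a, b, ls=1.0):
    """Gaussian (squared-exponential) covariance, an illustrative choice."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def local_ok(x0, X, y, n_neighbors=2):
    """Ordinary kriging at x0 using only the n nearest observations (local support)."""
    idx = np.argsort(np.abs(X - x0))[:n_neighbors]
    Xl, yl = X[idx], y[idx]
    n = len(Xl)
    A = np.ones((n + 1, n + 1)); A[:n, :n] = cov(Xl, Xl); A[n, n] = 0.0
    b = np.ones(n + 1); b[:n] = cov(Xl, np.array([x0]))[:, 0]
    w = np.linalg.solve(A, b)[:n]          # kriging weights (sum to one)
    return float(yl @ w)

X = np.array([0.0, 1.0, 2.0, 5.0])          # irregularly spaced elevation samples
y = np.array([10.0, 12.0, 11.0, 7.0])

# The 2-nearest neighbourhood switches from {1, 2} to {2, 5} when crossing x = 3
left, right = local_ok(2.999, X, y), local_ok(3.001, X, y)
print(left, right, abs(left - right))       # the two predictions differ: a jump in the surface
```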
Abstract:
The economic design of a distillation column or distillation sequence is a challenging problem that has been addressed by superstructure approaches. However, these methods have not been widely used because they lead to mixed-integer nonlinear programs that are hard to solve and require complex initialization procedures. In this article, we propose to address this challenging problem by replacing the distillation columns with Kriging-based surrogate models generated from state-of-the-art distillation models. We study different columns with increasing difficulty and show that it is possible to obtain accurate Kriging-based surrogate models. The optimization strategy ensures that convergence to a local optimum is guaranteed for numerical-noise-free models. For distillation columns (slightly noisy systems), Karush–Kuhn–Tucker optimality conditions cannot be tested directly on the actual model, but we can still guarantee a local minimum in a trust region of the surrogate model that contains the actual local minimum.
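A minimal sketch of the surrogate-based step described above (illustrative Python with scikit-learn and SciPy; the stand-in cost function, variable meanings and trust-region radius are my own assumptions, not the authors' models): fit a kriging surrogate to a handful of expensive simulations, then minimize the surrogate mean inside a trust region around the incumbent before verifying the candidate on the rigorous model.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_column_cost(x):
    """Stand-in for a rigorous distillation simulation (illustrative analytic function)."""
    return (x[0] - 0.3) ** 2 + 0.5 * np.sin(5 * x[1]) + x[1] ** 2

# Sample the 'simulator' at a small design and fit a kriging surrogate
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(15, 2))            # e.g. scaled reflux ratio, feed location
y = np.array([expensive_column_cost(x) for x in X])
surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), normalize_y=True).fit(X, y)

# Minimize the surrogate mean inside a trust region around the incumbent
x_inc = X[np.argmin(y)]
radius = 0.2
bounds = [(max(0.0, xi - radius), min(1.0, xi + radius)) for xi in x_inc]
res = minimize(lambda x: surrogate.predict(x.reshape(1, -1))[0], x_inc, bounds=bounds)
# The candidate res.x would then be verified on the rigorous model before accepting the step
```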
Abstract:
The rigorous economic design of a distillation column is a difficult problem that has been addressed by superstructure approaches. However, these methods require complex initialization procedures and are hard to solve, and for this reason they have not been extensively used. In this work, we present a methodology for the rigorous optimization of chemical processes implemented in a commercial simulator using surrogate models based on kriging interpolation. Several examples were studied, but in this paper we perform the optimization of a superstructure for a non-sharp separation to show the efficiency and effectiveness of the method. It is noteworthy that it is possible to obtain sufficiently accurate surrogate models with up to seven degrees of freedom.
Abstract:
Large monitoring networks are becoming increasingly common and can generate large datasets, from thousands to millions of observations in size, often with high temporal resolution. Processing large datasets using traditional geostatistical methods is prohibitively slow, and in real-world applications different types of sensor can be found across a monitoring network. Heterogeneities in the error characteristics of different sensors, both in terms of distribution and magnitude, present problems for generating coherent maps. An assumption in traditional geostatistics is that observations are made directly of the underlying process being studied and that the observations are contaminated with Gaussian errors. Under this assumption, sub-optimal predictions will be obtained if the error characteristics of the sensor are effectively non-Gaussian. One method, model-based geostatistics, assumes that a Gaussian process prior is imposed over the (latent) process being studied and that the sensor model forms part of the likelihood term. One problem with this type of approach is that the corresponding posterior distribution will be non-Gaussian and computationally demanding, as Monte Carlo methods have to be used. An extension of a sequential, approximate Bayesian inference method enables observations with arbitrary likelihoods to be treated in a projected process kriging framework, which is less computationally intensive. The approach is illustrated using a simulated dataset with a range of sensor models and error characteristics.
Abstract:
In this thesis, the issue of incorporating uncertainty for environmental modelling informed by imagery is explored by considering uncertainty in deterministic modelling, measurement uncertainty and uncertainty in image composition. Incorporating uncertainty in deterministic modelling is extended for use with imagery using the Bayesian melding approach. In the application presented, slope steepness is shown to be the main contributor to total uncertainty in the Revised Universal Soil Loss Equation. A spatial sampling procedure is also proposed to assist in implementing Bayesian melding given the increased data size with models informed by imagery. Measurement error models are another approach to incorporating uncertainty when data is informed by imagery. These models for measurement uncertainty, considered in a Bayesian conditional independence framework, are applied to ecological data generated from imagery. The models are shown to be appropriate and useful in certain situations. Measurement uncertainty is also considered in the context of change detection when two images are not co-registered. An approach for detecting change in two successive images is proposed that is not affected by registration. The procedure uses the Kolmogorov-Smirnov test on homogeneous segments of an image to detect change, with the homogeneous segments determined using a Bayesian mixture model of pixel values. Using the mixture model to segment an image also allows for uncertainty in the composition of an image. This thesis concludes by comparing several different Bayesian image segmentation approaches that allow for uncertainty regarding the allocation of pixels to different ground components. Each segmentation approach is applied to a data set of chlorophyll values and shown to have different benefits and drawbacks depending on the aims of the analysis.
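A minimal sketch of the registration-free change-detection step described above (illustrative Python; the segment data are simulated and the significance level is my own choice): pixel-value distributions of corresponding homogeneous segments from the two dates are compared with a two-sample Kolmogorov-Smirnov test, so pixel-level co-registration is not required.

```python
import numpy as np
from scipy.stats import ks_2samp

def segment_change(pixels_t1, pixels_t2, alpha=0.05):
    """Flag change in a homogeneous segment by comparing its pixel-value
    distributions at two dates with a two-sample Kolmogorov-Smirnov test."""
    res = ks_2samp(pixels_t1, pixels_t2)
    return res.pvalue < alpha, res.pvalue

# Illustrative segments: one unchanged, one with a shifted value distribution
rng = np.random.default_rng(2)
unchanged = segment_change(rng.normal(0.4, 0.05, 500), rng.normal(0.4, 0.05, 500))
changed = segment_change(rng.normal(0.4, 0.05, 500), rng.normal(0.6, 0.05, 500))
```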
Abstract:
This overview focuses on the application of chemometrics techniques for the investigation of soils contaminated by polycyclic aromatic hydrocarbons (PAHs) and metals, because these two important and very diverse groups of pollutants are ubiquitous in soils. The salient features of various studies carried out in the micro- and recreational environments of humans are highlighted in the context of the various multivariate statistical techniques available across discipline boundaries that have been effectively used in soil studies. Particular attention is paid to techniques employed in the geosciences that may be effectively utilized for environmental soil studies; classical multivariate approaches that may be used in isolation or as complementary methods to these are also discussed. Chemometrics techniques widely applied in atmospheric studies for identifying sources of pollutants or for determining the importance of contaminant source contributions to a particular site have seen little use in soil studies, but may be effectively employed in such investigations. Suitable programs are also available for suggesting mitigating measures in cases of soil contamination, and these are also considered. Specific techniques reviewed include pattern recognition techniques such as Principal Components Analysis (PCA), Fuzzy Clustering (FC) and Cluster Analysis (CA); geostatistical tools include variograms, Geographical Information Systems (GIS), contour mapping and kriging; source identification and contribution estimation methods reviewed include Positive Matrix Factorisation (PMF) and Principal Component Analysis on Absolute Principal Component Scores (PCA/APCS). Mitigating measures to limit or eliminate pollutant sources may be suggested through the use of ranking analysis and multi-criteria decision-making (MCDM) methods. These methods are mainly represented in this review by studies employing the Preference Ranking Organisation Method for Enrichment Evaluation (PROMETHEE) and its associated graphic output, Geometrical Analysis for Interactive Aid (GAIA).
Abstract:
This paper presents a method of spatial sampling based on stratification by local Moran's I_i calculated using auxiliary information. The sampling technique is compared to other design-based approaches including simple random sampling, systematic sampling on a regular grid, conditional Latin Hypercube sampling and stratified sampling based on auxiliary information, and is illustrated using two different spatial data sets. Each of the samples for the two data sets is interpolated using regression kriging to form a geostatistical map for their respective areas. The proposed technique is shown to be competitive in reproducing specific areas of interest with high accuracy.
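A minimal sketch of the stratification variable (illustrative Python; the transect data and the two-nearest-neighbour, row-standardized weights are my own choices, not the paper's): local Moran's I_i for an auxiliary variable x is I_i = (z_i / m2) * sum_j W_ij z_j, with z the centred values and m2 their mean square; strata can then be formed by binning I_i.

```python
import numpy as np

def local_morans_i(x, W):
    """Local Moran's I_i for values x and a row-standardized spatial weight
    matrix W with zero diagonal: I_i = (z_i / m2) * sum_j W_ij z_j."""
    z = x - x.mean()
    m2 = (z ** 2).mean()
    return (z / m2) * (W @ z)

# Illustrative 1-D transect with 2-nearest-neighbour, row-standardized weights
x = np.array([2.0, 2.1, 2.3, 5.0, 5.2, 1.0])
n = len(x)
W = np.zeros((n, n))
for i in range(n):
    d = np.abs(np.arange(n) - i).astype(float)
    d[i] = np.inf
    nbrs = np.argsort(d)[:2]
    W[i, nbrs] = 1.0 / len(nbrs)

I = local_morans_i(x, W)   # high positive values mark clusters of similar auxiliary values
# Sampling strata could then be formed by binning I, e.g. with quantile-based classes
```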
Abstract:
The health impacts of exposure to ambient temperature have been drawing increasing attention from the environmental health research community, government, society, industries, and the public. Case-crossover and time series models are most commonly used to examine the effects of ambient temperature on mortality. However, some key methodological issues remain to be addressed. For example, few studies have used spatiotemporal models to assess the effects of spatial temperatures on mortality. Few studies have used a case-crossover design to examine the delayed (distributed lag) and non-linear relationship between temperature and mortality. Also, little evidence is available on the effects of temperature changes on mortality, and on differences in heat-related mortality over time. This thesis aimed to address the following research questions: 1. How to combine the case-crossover design and distributed lag non-linear models? 2. Is there any significant difference in effect estimates between time series and spatiotemporal models? 3. How to assess the effects of temperature changes between neighbouring days on mortality? 4. Is there any change in temperature effects on mortality over time? To combine the case-crossover design and the distributed lag non-linear model, datasets including deaths, weather conditions (minimum temperature, mean temperature, maximum temperature, and relative humidity), and air pollution were acquired from Tianjin, China, for the years 2005 to 2007. I demonstrated how to combine the case-crossover design with a distributed lag non-linear model. This allows the case-crossover design to estimate the non-linear and delayed effects of temperature whilst controlling for seasonality. There was a consistent U-shaped relationship between temperature and mortality. Cold effects were delayed by 3 days, and persisted for 10 days. Hot effects were acute and lasted for three days, and were followed by mortality displacement for non-accidental, cardiopulmonary, and cardiovascular deaths. Mean temperature was a better predictor of mortality (based on model fit) than maximum or minimum temperature. It is still unclear whether spatiotemporal models using spatial temperature exposure produce better estimates of mortality risk compared with time series models that use a single site's temperature or averaged temperature from a network of sites. Daily mortality data were obtained from 163 locations across Brisbane city, Australia from 2000 to 2004. Ordinary kriging was used to interpolate spatial temperatures across the city based on 19 monitoring sites. A spatiotemporal model was used to examine the impact of spatial temperature on mortality. A time series model was used to assess the effects of a single site's temperature, and of averaged temperature from 3 monitoring sites, on mortality. Squared Pearson scaled residuals were used to check the model fit. The results of this study show that even though spatiotemporal models gave a better model fit than time series models, spatiotemporal and time series models gave similar effect estimates. Time series analyses using temperature recorded from a single monitoring site, or the average temperature of multiple sites, were equally good at estimating the association between temperature and mortality compared with a spatiotemporal model. A time series Poisson regression model was used to estimate the association between temperature change and mortality in summer in Brisbane, Australia during 1996–2004 and Los Angeles, United States during 1987–2000.
Temperature change was calculated as the current day's mean temperature minus the previous day's mean. In Brisbane, a drop of more than 3 °C in temperature between days was associated with relative risks (RRs) of 1.16 (95% confidence interval (CI): 1.02, 1.31) for non-external mortality (NEM), 1.19 (95% CI: 1.00, 1.41) for NEM in females, and 1.44 (95% CI: 1.10, 1.89) for NEM among those aged 65–74 years. An increase of more than 3 °C was associated with RRs of 1.35 (95% CI: 1.03, 1.77) for cardiovascular mortality and 1.67 (95% CI: 1.15, 2.43) for people aged < 65 years. In Los Angeles, only a drop of more than 3 °C was significantly associated with RRs of 1.13 (95% CI: 1.05, 1.22) for total NEM, 1.25 (95% CI: 1.13, 1.39) for cardiovascular mortality, and 1.25 (95% CI: 1.14, 1.39) for people aged ≥ 75 years. In both cities, there were joint effects of temperature change and mean temperature on NEM. A change in temperature of more than 3 °C, whether positive or negative, has an adverse impact on mortality even after controlling for mean temperature. I examined the variation in the effects of high temperatures on elderly mortality (age ≥ 75 years) by year, city and region for 83 large US cities between 1987 and 2000. High temperature days were defined as two or more consecutive days with temperatures above the 90th percentile for each city during each warm season (May 1 to September 30). The mortality risk for high temperatures was decomposed into: a "main effect" due to high temperatures using a distributed lag non-linear function, and an "added effect" due to consecutive high temperature days. I pooled yearly effects across regions and overall effects at both regional and national levels. The effects of high temperature (both main and added effects) on elderly mortality varied greatly by year, city and region. The years with higher heat-related mortality were often followed by those with relatively lower mortality. Understanding this variability in the effects of high temperatures is important for the development of heat-warning systems. In conclusion, this thesis makes contributions in several respects. The case-crossover design was combined with a distributed lag non-linear model to assess the effects of temperature on mortality in Tianjin. This allows the case-crossover design to flexibly estimate the non-linear and delayed effects of temperature. Both extreme cold and high temperatures increased the risk of mortality in Tianjin. A time series model using a single site's temperature, or the averaged temperature from several sites, can be used to examine the effects of temperature on mortality. Temperature change (whether a large drop or a large increase) increases the risk of mortality. The high temperature effect on mortality is highly variable from year to year.
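A minimal sketch of the temperature-change exposure and the Poisson time-series regression described above, on simulated data (illustrative Python; the real analyses used registry mortality data and additionally controlled for season, humidity and air pollution):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Illustrative daily series standing in for the Brisbane / Los Angeles registry data
rng = np.random.default_rng(3)
n = 365
tmean = 24 + 4 * np.sin(np.arange(n) / 58.0) + rng.normal(0, 2, n)
deaths = rng.poisson(30, n)

df = pd.DataFrame({"deaths": deaths, "tmean": tmean})
df["tchange"] = df["tmean"].diff()                    # current day's mean minus previous day's
df["drop3"] = (df["tchange"] < -3).astype(int)        # drop of more than 3 °C between days
df["rise3"] = (df["tchange"] > 3).astype(int)         # increase of more than 3 °C
df = df.dropna()

# Poisson regression of daily deaths on temperature change, adjusting for mean temperature
X = sm.add_constant(df[["drop3", "rise3", "tmean"]])
fit = sm.GLM(df["deaths"], X, family=sm.families.Poisson()).fit()
rr = np.exp(fit.params[["drop3", "rise3"]])           # relative risks for the change indicators
```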
Abstract:
Most studies examining the temperature–mortality association in a city used temperatures from one site or the average from a network of sites. This may cause measurement error as temperature varies across a city due to effects such as urban heat islands. We examined whether spatiotemporal models using spatially resolved temperatures produced different associations between temperature and mortality compared with time series models that used non-spatial temperatures. We obtained daily mortality data in 163 areas across Brisbane city, Australia from 2000 to 2004. We used ordinary kriging to interpolate spatial temperature variation across the city based on 19 monitoring sites. We used a spatiotemporal model to examine the impact of spatially resolved temperatures on mortality. Also, we used a time series model to examine non-spatial temperatures using a single site and the average temperature from three sites. We used squared Pearson scaled residuals to compare model fit. We found that kriged temperatures were consistent with observed temperatures. Spatiotemporal models using kriged temperature data yielded slightly better model fit than time series models using a single site or the average of three sites' data. Despite this better fit, spatiotemporal and time series models produced similar associations between temperature and mortality. In conclusion, time series models using non-spatial temperatures were equally good at estimating the city-wide association between temperature and mortality as spatiotemporal models.
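A minimal sketch of the kriging step (illustrative Python; the Gaussian covariance and its length scale stand in for the study's fitted variogram, and the coordinates are made up): ordinary kriging weights are obtained by solving the usual augmented system, giving interpolated temperatures at arbitrary target locations from the monitoring-site observations.

```python
import numpy as np

def ok_interpolate(sites, temps, targets, ls=5.0):
    """Ordinary kriging of temperatures observed at monitoring sites onto target
    locations, using a Gaussian covariance with length scale ls (an illustrative
    choice, not a fitted variogram). sites: (n, 2), temps: (n,), targets: (m, 2)."""
    def cov(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / ls ** 2)

    n = len(sites)
    A = np.ones((n + 1, n + 1)); A[:n, :n] = cov(sites, sites); A[n, n] = 0.0
    B = np.ones((n + 1, len(targets))); B[:n, :] = cov(sites, targets)
    W = np.linalg.solve(A, B)[:n]          # kriging weights, one column per target
    return temps @ W

# Illustrative: a handful of monitoring sites and a coarse grid over the city
sites = np.array([[0.0, 0.0], [10.0, 2.0], [3.0, 8.0], [7.0, 7.0], [2.0, 4.0]])
temps = np.array([24.1, 25.3, 23.0, 23.8, 23.5])
gx, gy = np.meshgrid(np.linspace(0, 10, 11), np.linspace(0, 10, 11))
grid = np.column_stack([gx.ravel(), gy.ravel()])
kriged = ok_interpolate(sites, temps, grid)   # area-level exposures for the spatiotemporal model
```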