915 results for Data replication processes
Abstract:
The modelling of nonlinear stochastic dynamical processes from data involves solving the problems of data gathering, preprocessing, model architecture selection, learning or adaptation, parametric evaluation and model validation. For a given model architecture such as associative memory networks, a common problem in nonlinear modelling is "the curse of dimensionality". A series of complementary data-based constructive identification schemes, mainly based on but not limited to operating-point-dependent fuzzy models, are introduced in this paper with the aim of overcoming the curse of dimensionality. These include (i) a mixture of experts algorithm based on a forward constrained regression algorithm; (ii) an inherently parsimonious Delaunay input space partition based piecewise local linear modelling concept; (iii) a neurofuzzy model constructive approach based on forward orthogonal least squares and optimal experimental design; and finally (iv) a neurofuzzy model construction algorithm based on Bézier-Bernstein polynomial basis functions and additive decomposition. Illustrative examples demonstrate their applicability, showing that the final major hurdle in data-based modelling has almost been removed.
Abstract:
In this paper, we investigate the role of judgement in the formation of forecasts in commercial property markets. The investigation is based on interview surveys with the majority of UK forecast producers, who use a range of inputs and data sets to form models to predict an array of variables for a range of locations. The findings suggest that forecasts need to be acceptable to their users (and purchasers) and consequently forecasters generally have incentives to avoid presenting contentious or conspicuous forecasts. Where extreme forecasts are generated by a model, forecasters often engage in ‘self-censorship’ or are ‘censored’ following in-house consultation. It is concluded that the forecasting process is significantly more complex than merely carrying out econometric modelling, that forecasts are mediated and contested within organisations, and that impacts can vary considerably across different organisational contexts.
Abstract:
In this paper we investigate the role of judgement in the formation of forecasts in commercial real estate markets. Based on interview surveys with the majority of forecast producers, we find that real estate forecasters use a range of inputs and data sets to form models to predict an array of variables for a range of locations. The findings suggest that forecasts need to be acceptable to their users (and purchasers) and consequently forecasters generally have incentives to avoid presenting contentious or conspicuous forecasts. Where extreme forecasts are generated by a model, forecasters often engage in ‘self-censorship’ or are ‘censored’ following in-house consultation. It is concluded that the forecasting process is more complex than merely carrying out econometric modelling and that the impact of the influences within this process varies considerably across different organizational contexts.
Abstract:
This paper derives exact discrete time representations for data generated by a continuous time autoregressive moving average (ARMA) system with mixed stock and flow data. The representations for systems comprised entirely of stocks or of flows are also given. In each case the discrete time representations are shown to be of ARMA form, the orders depending on those of the continuous time system. Three examples and applications are also provided, two of which concern the stationary ARMA(2, 1) model with stock variables (with applications to sunspot data and a short-term interest rate) and one concerning the nonstationary ARMA(2, 1) model with a flow variable (with an application to U.S. nondurable consumers’ expenditure). In all three examples the presence of an MA(1) component in the continuous time system has a dramatic impact on eradicating unaccounted-for serial correlation that is present in the discrete time version of the ARMA(2, 0) specification, even though the form of the discrete time model is ARMA(2, 1) for both models.
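As a minimal illustration of what an exact discrete time representation looks like, consider the simplest first-order case rather than the ARMA(2, 1) systems treated in the paper. The symbols a, σ, W and η below are assumptions introduced for this sketch and do not come from the abstract.

```latex
% Continuous time AR(1), observed as a stock at t = 0, 1, 2, ...
\mathrm{d}x(t) = a\,x(t)\,\mathrm{d}t + \sigma\,\mathrm{d}W(t), \qquad a < 0
% Exact discrete time representation: an AR(1) with rescaled noise variance
x_t = e^{a}\,x_{t-1} + \eta_t, \qquad
\operatorname{Var}(\eta_t) = \sigma^{2}\int_{0}^{1} e^{2as}\,\mathrm{d}s
                           = \sigma^{2}\,\frac{e^{2a}-1}{2a}
```

If the same process is instead observed as a flow (its integral over each unit interval), the discrete time model gains an MA(1) term and becomes ARMA(1, 1), which is the first-order analogue of the order dependence described above.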
Abstract:
The effect of episodic drought on dissolved organic carbon (DOC) dynamics in peatlands has been the subject of considerable debate, as decomposition and DOC production are thought to increase under aerobic conditions, yet decreased DOC concentrations have been observed during drought periods. Decreased DOC solubility due to drought-induced acidification driven by sulphur (S) redox reactions has been proposed as a causal mechanism; however, the evidence is based on a limited number of studies carried out at a few sites. To test this hypothesis on a range of different peats, we carried out controlled drought simulation experiments on peat cores collected from six sites across Great Britain. Our data show a concurrent increase in sulphate (SO4) and a decrease in DOC across all sites during simulated water table draw-down, although the magnitude of the relationship between SO4 and DOC differed between sites. Instead, we found a consistent relationship across all sites between DOC decrease and acidification measured by the pore water acid neutralising capacity (ANC). ANC provided a more consistent measure of drought-induced acidification than SO4 alone because it accounts for differences in base cation and acid anion concentrations between sites. Rewetting resulted in rapid DOC increases without a concurrent increase in soil respiration, suggesting DOC changes were primarily controlled by soil acidity, not soil biota. These results highlight the need for an integrated analysis of hydrologically driven chemical and biological processes in peatlands to improve our understanding and ability to predict the interaction between atmospheric pollution and changing climatic conditions from plot to regional and global scales.
Abstract:
The ASTER Global Digital Elevation Model (GDEM) has made elevation data at 30 m spatial resolution freely available, enabling reinvestigation of morphometric relationships derived from limited field data using much larger sample sizes. These data are used to analyse a range of morphometric relationships derived for dunes (between dune height, spacing, and equivalent sand thickness) in the Namib Sand Sea, which was chosen because there are a number of extant studies that could be used for comparison with the results. The relative accuracy of GDEM for capturing dune height and shape was tested against multiple individual ASTER DEM scenes and against field surveys, highlighting the smoothing of the dune crest and resultant underestimation of dune height, and the omission of the smallest dunes, because of the 30 m sampling of ASTER DEM products. It is demonstrated that morphometric relationships derived from GDEM data are broadly comparable with relationships derived by previous methods, across a range of different dune types. The data confirm patterns of dune height, spacing and equivalent sand thickness mapped previously in the Namib Sand Sea, but add new detail to these patterns.
Abstract:
Accurate replication of the processes associated with the energetics of the tropical ocean is necessary if coupled GCMs are to simulate the physics of ENSO correctly, including the transfer of energy from the winds to the ocean thermocline and energy dissipation during the ENSO cycle. Here, we analyze ocean energetics in coupled GCMs in terms of two integral parameters describing net energy loss in the system using the approach recently proposed by Brown and Fedorov (J Clim 23:1563–1580, 2010a) and Fedorov (J Clim 20:1108–1117, 2007). These parameters are (1) the efficiency c of the conversion of wind power into the buoyancy power that controls the rate of change of the available potential energy (APE) in the ocean and (2) the e-folding rate a that characterizes the damping of APE by turbulent diffusion and other processes. Estimating these two parameters for coupled models reveals potential deficiencies (and large differences) in how state-of-the-art coupled GCMs reproduce the ocean energetics as compared to ocean-only models and data assimilating models. The majority of the coupled models we analyzed show a lower efficiency (values of c in the range of 10–50% versus 50–60% for ocean-only simulations or reanalysis) and a relatively strong energy damping (values of a⁻¹ in the range 0.4–1 years versus 0.9–1.2 years). These differences in the model energetics appear to reflect differences in the simulated thermal structure of the tropical ocean, the structure of ocean equatorial currents, and deficiencies in the way coupled models simulate ENSO.
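The two parameters can be illustrated with a small sketch. It assumes, as a simplification not stated in the abstract, that the APE budget takes the linear form dE/dt = c·W − a·E, where W is wind power and E is APE. The function names, the synthetic wind series and the parameter values are all hypothetical; this is not the authors' estimation procedure.

```python
import math

def simulate_ape(wind_power, c, a, dt, e0=0.0):
    """Forward-Euler integration of the assumed budget dE/dt = c*W - a*E."""
    energy = [e0]
    for w in wind_power[:-1]:
        energy.append(energy[-1] + dt * (c * w - a * energy[-1]))
    return energy

def fit_efficiency_and_damping(wind_power, energy, dt):
    """Least-squares estimate of (c, a) from dE/dt ~ c*W - a*E."""
    # Forward differences approximate the APE tendency at steps 0..n-2.
    dedt = [(energy[i + 1] - energy[i]) / dt for i in range(len(energy) - 1)]
    w = wind_power[:-1]
    e = energy[:-1]
    # Sums for the 2x2 normal equations with regressors W and E.
    sww = sum(x * x for x in w)
    see = sum(x * x for x in e)
    swe = sum(x * y for x, y in zip(w, e))
    swd = sum(x * y for x, y in zip(w, dedt))
    sed = sum(x * y for x, y in zip(e, dedt))
    det = sww * see - swe * swe
    c_hat = (swd * see - sed * swe) / det
    a_hat = (swd * swe - sed * sww) / det
    return c_hat, a_hat

# Synthetic monthly wind power with a 4-year cycle (arbitrary units).
dt = 1.0 / 12.0  # years
wind = [1.0 + 0.5 * math.sin(2.0 * math.pi * n * dt / 4.0) for n in range(240)]
ape = simulate_ape(wind, c=0.5, a=1.0, dt=dt)
c_hat, a_hat = fit_efficiency_and_damping(wind, ape, dt)
```

Because the synthetic series is generated with the same discretisation that the fit uses, the fit recovers the chosen c and a almost exactly; with real model output the residual would indicate how well the linear budget holds.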
Abstract:
Ice cloud representation in general circulation models remains a challenging task, due to the lack of accurate observations and the complexity of microphysical processes. In this article, we evaluate the ice water content (IWC) and ice cloud fraction statistical distributions from the numerical weather prediction models of the European Centre for Medium-Range Weather Forecasts (ECMWF) and the UK Met Office, exploiting the synergy between the CloudSat radar and CALIPSO lidar. Using the last three weeks of July 2006, we analyse the global ice cloud occurrence as a function of temperature and latitude and show that the models capture the main geographical and temperature-dependent distributions. However, they overestimate the ice cloud occurrence in the Tropics in the temperature range from −60 °C to −20 °C and in the Antarctic for temperatures higher than −20 °C, and underestimate ice cloud occurrence at very low temperatures. A global statistical comparison of the occurrence of grid-box mean IWC at different temperatures shows that both the mean and range of IWC increase with increasing temperature. Globally, the models capture most of the IWC variability in the temperature range between −60 °C and −5 °C, and also reproduce the observed latitudinal dependencies in the IWC distribution due to different meteorological regimes. Two versions of the ECMWF model are assessed. The recent operational version with a diagnostic representation of precipitating snow and mixed-phase ice cloud fails to represent the IWC distribution in the −20 °C to 0 °C range, but a new version with prognostic variables for liquid water, ice and snow is much closer to the observed distribution. The comparison of models and observations provides a much-needed analysis of the vertical distribution of IWC across the globe, highlighting the ability of the models to reproduce much of the observed variability as well as the deficiencies where further improvements are required.
Abstract:
We present an approach for dealing with coarse-resolution Earth observations (EO) in terrestrial ecosystem data assimilation schemes. The use of coarse-scale observations in ecological data assimilation schemes is complicated by spatial heterogeneity and nonlinear processes in natural ecosystems. If these complications are not appropriately dealt with, then the data assimilation will produce biased results. The “disaggregation” approach that we describe in this paper combines frequent coarse-resolution observations with temporally sparse fine-resolution measurements. We demonstrate the approach using a demonstration data set based on measurements of an Arctic ecosystem. In this example, normalized difference vegetation index observations are assimilated into a “zero-order” model of leaf area index and carbon uptake. The disaggregation approach conserves key ecosystem characteristics regardless of the observation resolution and estimates the carbon uptake to within 1% of the demonstration data set “truth.” Assimilating the same data in the normal manner, but without the disaggregation approach, results in carbon uptake being underestimated by 58% at an observation resolution of 250 m. The disaggregation method allows the combination of multiresolution EO and improves in spatial resolution if observations are located on a grid that shifts from one observation time to the next. Additionally, the approach is not tied to a particular data assimilation scheme, model, or EO product and can cope with complex observation distributions, as it makes no implicit assumptions of normality.
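The bias that arises when a nonlinear relationship is applied to a coarse-resolution average can be shown with a toy calculation. The exponential NDVI-to-LAI relation, its coefficients and the pixel values below are hypothetical choices for illustration only; they are not the model or data used in the paper.

```python
import math

def lai_from_ndvi(ndvi, scale=0.1, rate=4.0):
    """Hypothetical convex NDVI-to-LAI relation (illustrative coefficients)."""
    return scale * math.exp(rate * ndvi)

# Four fine-resolution pixels that together make up one coarse pixel.
fine_ndvi = [0.2, 0.3, 0.4, 0.8]

# "Truth": apply the nonlinear relation per fine pixel, then average.
fine_scale_lai = sum(lai_from_ndvi(v) for v in fine_ndvi) / len(fine_ndvi)

# Coarse observation: average NDVI first, then apply the relation.
coarse_ndvi = sum(fine_ndvi) / len(fine_ndvi)
coarse_scale_lai = lai_from_ndvi(coarse_ndvi)

# Relative bias of the coarse estimate (negative means underestimation).
relative_bias = (coarse_scale_lai - fine_scale_lai) / fine_scale_lai
```

For a convex relation the coarse estimate is lower than the fine-scale mean (Jensen's inequality), so heterogeneous pixels are underestimated; a concave relation would bias the result in the opposite direction.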
Abstract:
Current methods for estimating vegetation parameters are generally sub-optimal in the way they exploit information and do not generally consider uncertainties. We look forward to a future where operational data assimilation schemes improve estimates by tracking land surface processes and exploiting multiple types of observations. Data assimilation schemes seek to combine observations and models in a statistically optimal way taking into account uncertainty in both, but have not yet been much exploited in this area. The EO-LDAS scheme and prototype, developed under ESA funding, is designed to exploit the anticipated wealth of data that will be available under GMES missions, such as the Sentinel family of satellites, to provide improved mapping of land surface biophysical parameters. This paper describes the EO-LDAS implementation, and explores some of its core functionality. EO-LDAS is a weak constraint variational data assimilation system. The prototype provides a mechanism for constraint based on a prior estimate of the state vector, a linear dynamic model, and Earth Observation data (top-of-canopy reflectance here). The observation operator is a non-linear optical radiative transfer model for a vegetation canopy with a soil lower boundary, operating over the range 400 to 2500 nm. Adjoint codes for all model and operator components are provided in the prototype by automatic differentiation of the computer codes. In this paper, EO-LDAS is applied to the problem of daily estimation of six of the parameters controlling the radiative transfer operator over the course of a year (> 2000 state vector elements). Zero and first order process model constraints are implemented and explored as the dynamic model. The assimilation estimates all state vector elements simultaneously.
This is performed in the context of a typical Sentinel-2 MSI operating scenario, using synthetic MSI observations simulated with the observation operator, with uncertainties typical of those achieved by optical sensors assumed for the data. The experiments consider a baseline state vector estimation case where dynamic constraints are applied, and assess the impact of dynamic constraints on the a posteriori uncertainties. The results demonstrate that reductions in uncertainty by a factor of up to two might be obtained by applying the sorts of dynamic constraints used here. The hyperparameters (dynamic model uncertainty) required to control the assimilation are estimated by a cross-validation exercise. The result of the assimilation is seen to be robust to missing observations with quite large data gaps.
Abstract:
In the last decade, a vast number of land surface schemes have been designed for use in global climate models, atmospheric weather prediction, mesoscale numerical models, ecological models, and models of global changes. Since land surface schemes are designed for different purposes, they have various levels of complexity in the treatment of bare soil processes, vegetation, and soil water movement. This paper is a contribution to a small group of papers dealing with intercomparison of differently designed and oriented land surface schemes. For that purpose we have chosen three schemes for classification: i) global climate models, BATS (Dickinson et al., 1986; Dickinson et al., 1992); ii) mesoscale and ecological models, LEAF (Lee, 1992) and iii) mesoscale models, LAPS (Mihailović, 1996; Mihailović and Kallos, 1997; Mihailović et al., 1999) according to the Shao et al. (1995) classification. These schemes were compared using surface fluxes and leaf temperature outputs obtained by time integrations of data sets derived from the micrometeorological measurements above a maize field at an experimental site in De Sinderhoeve (The Netherlands) for 18 August, 8 September, and 4 October 1988. Finally, comparison of the schemes was supported by applying a simple statistical analysis to the surface flux outputs.
Abstract:
This dissertation deals with aspects of sequential data assimilation (in particular ensemble Kalman filtering) and numerical weather forecasting. In the first part, the recently formulated Ensemble Kalman-Bucy filter (EnKBF) is revisited. It is shown that the previously used numerical integration scheme fails when the magnitude of the background error covariance grows beyond that of the observational error covariance in the forecast window. Therefore, we present a suitable integration scheme that handles the stiffening of the differential equations involved and does not incur further computational expense. Moreover, a transform-based alternative to the EnKBF is developed: under this scheme, the operations are performed in the ensemble space instead of in the state space. Advantages of this formulation are explained. For the first time, the EnKBF is implemented in an atmospheric model. The second part of this work deals with ensemble clustering, a phenomenon that arises when performing data assimilation using deterministic ensemble square root filters (EnSRFs) in highly nonlinear forecast models. Namely, an M-member ensemble detaches into an outlier and a cluster of M-1 members. Previous works may suggest that this issue represents a failure of EnSRFs; this work dispels that notion. It is shown that ensemble clustering can also be reversed by nonlinear processes, in particular the alternation between nonlinear expansion and compression of the ensemble for different regions of the attractor. Some EnSRFs that use random rotations have been developed to overcome this issue; these formulations are analyzed and their advantages and disadvantages with respect to common EnSRFs are discussed. The third and last part contains the implementation of the Robert-Asselin-Williams (RAW) filter in an atmospheric model.
The RAW filter is an improvement to the widely popular Robert-Asselin filter that successfully suppresses spurious computational waves while avoiding any distortion in the mean value of the function. Using statistical significance tests both at the local and field level, it is shown that the climatology of the SPEEDY model is not modified by the changed time stepping scheme; hence, no retuning of the parameterizations is required. It is found that the accuracy of the medium-term forecasts is increased by using the RAW filter.
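The difference between the two filters can be sketched on a simple oscillator. This is a minimal illustration of leapfrog time stepping with the RAW filter in its commonly published form, not the SPEEDY implementation described in the dissertation. The function name, the test equation and the parameter values (nu = 0.2, alpha = 0.53) are assumptions chosen for illustration; setting alpha = 1 gives the classic Robert-Asselin filter.

```python
def leapfrog_raw(tendency, x0, dt, n_steps, nu=0.2, alpha=0.53):
    """Leapfrog integration with the RAW filter (alpha=1 is the RA filter)."""
    x_prev = x0                        # fully filtered value at step n-1
    x_curr = x0 + dt * tendency(x0)    # start-up: one forward Euler step
    for _ in range(n_steps - 1):
        # Leapfrog step from n-1 to n+1 using the tendency at step n.
        x_next = x_prev + 2.0 * dt * tendency(x_curr)
        # Filter displacement built from the three time levels.
        d = 0.5 * nu * (x_prev - 2.0 * x_curr + x_next)
        x_prev = x_curr + alpha * d            # filtered value at step n
        x_curr = x_next - (1.0 - alpha) * d    # RAW correction at step n+1
    return x_curr

def oscillator(x, omega=1.0):
    """Tendency of dx/dt = i*omega*x; the exact solution keeps |x| = 1."""
    return 1j * omega * x

# Same integration with the classic RA filter and with the RAW filter.
amp_ra = abs(leapfrog_raw(oscillator, 1.0 + 0j, dt=0.1, n_steps=1000, alpha=1.0))
amp_raw = abs(leapfrog_raw(oscillator, 1.0 + 0j, dt=0.1, n_steps=1000, alpha=0.53))
```

With alpha = 1 the filter damps the physical oscillation noticeably over the run, whereas with alpha close to 0.5 the amounts added at step n and removed at step n+1 nearly cancel, so the amplitude stays much closer to its exact value of 1.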
Abstract:
Data assimilation algorithms are a crucial part of operational systems in numerical weather prediction, hydrology and climate science, but are also important for dynamical reconstruction in medical applications and quality control for manufacturing processes. Usually, a variety of diverse measurement data are employed to determine the state of the atmosphere or of a wider system including land and oceans. Modern data assimilation systems use more and more remote sensing data, in particular radiances measured by satellites, radar data and integrated water vapor measurements via GPS/GNSS signals. The inversion of some of these measurements is ill-posed in the classical sense, i.e. the inverse of the operator H which maps the state onto the data is unbounded. In this case, the use of such data can lead to significant instabilities of data assimilation algorithms. The goal of this work is to provide a rigorous mathematical analysis of the instability of well-known data assimilation methods. Here, we will restrict our attention to particular linear systems, in which the instability can be explicitly analyzed. We investigate the three-dimensional variational assimilation and four-dimensional variational assimilation. A theory for the instability is developed using the classical theory of ill-posed problems in a Banach space framework. Further, we demonstrate by numerical examples that instabilities can and will occur, including an example from dynamic magnetic tomography.
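For orientation, the three-dimensional variational problem can be written in its standard finite-dimensional textbook form with a linear observation operator H. The symbols x_b (background state), B (background error covariance), R (observation error covariance) and y (data) are assumptions introduced here; the Banach space setting analysed in the work is more general than this sketch.

```latex
% 3D-Var cost functional: misfit to the background plus misfit to the data
J(x) = \tfrac{1}{2}\,(x - x_b)^{\mathsf T} B^{-1} (x - x_b)
     + \tfrac{1}{2}\,(y - Hx)^{\mathsf T} R^{-1} (y - Hx)
% Its minimiser (the analysis) for linear H
x_a = x_b + B H^{\mathsf T}\bigl(H B H^{\mathsf T} + R\bigr)^{-1}(y - H x_b)
```

In this form the background and observation error terms act as regularisation of the inversion of H, which gives a rough sense of where ill-posedness of H enters the analysis step.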
Abstract:
OBJECTIVES: The prediction of protein structure and the precise understanding of protein folding and unfolding processes remain among the greatest challenges in structural biology and bioinformatics. Computer simulations based on molecular dynamics (MD) are at the forefront of the effort to gain a deeper understanding of these complex processes. Currently, these MD simulations are usually on the order of tens of nanoseconds, generate a large amount of conformational data and are computationally expensive. More and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because of the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. METHODS: To adequately organize, manage, and analyze the data generated by unfolding simulation studies, we designed a data warehouse system that is embedded in a grid environment to facilitate the seamless sharing of available computer resources and thus enable many groups to share complex molecular dynamics simulations on a more regular basis. RESULTS: To gain insight into the conformational fluctuations and stability of the monomeric forms of the amyloidogenic protein transthyretin (TTR), molecular dynamics unfolding simulations of the monomer of human TTR have been conducted. Trajectory data and meta-data of the wild-type (WT) protein and the highly amyloidogenic variant L55P-TTR represent the test case for the data warehouse. CONCLUSIONS: Web and grid services, especially pre-defined data mining services that can run on or 'near' the data repository of the data warehouse, are likely to play a pivotal role in the analysis of molecular dynamics unfolding data.
Abstract:
Advances in hardware and software in the past decade make it possible to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence of these advances in order to cope with the real-time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data from continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, some applications require the data to be analysed in real time as soon as it is captured, for example when the data stream is infinite, fast changing, or simply too large to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule-based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new and removing old rules. It differs from the more popular decision tree based classifiers in that it tends to leave data instances unclassified rather than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting, which will also address the problem of changing feature domain values.
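The idea of abstaining instead of forcing a label, combined with dropping rules that stop fitting the stream, can be sketched with a toy rule set. This is not the eRules algorithm: the class names, the single-interval rule form and the thresholds are hypothetical, and the induction of new rules is left out entirely.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """Hypothetical rule: IF low <= instance[feature] < high THEN label."""
    feature: str
    low: float
    high: float
    label: str
    hits: int = 0
    misses: int = 0

    def covers(self, instance):
        return self.low <= instance[self.feature] < self.high

    def accuracy(self):
        total = self.hits + self.misses
        return self.hits / total if total else 1.0

class AbstainingRuleSet:
    """Rule set that returns None when no rule covers an instance."""

    def __init__(self, rules, min_accuracy=0.6, min_trials=5):
        self.rules = list(rules)
        self.min_accuracy = min_accuracy
        self.min_trials = min_trials

    def classify(self, instance):
        for rule in self.rules:
            if rule.covers(instance):
                return rule.label
        return None  # abstain rather than guess

    def update(self, instance, true_label):
        """Score covering rules on a labelled instance, then prune weak rules."""
        for rule in self.rules:
            if rule.covers(instance):
                if rule.label == true_label:
                    rule.hits += 1
                else:
                    rule.misses += 1
        self.rules = [
            r for r in self.rules
            if r.hits + r.misses < self.min_trials
            or r.accuracy() >= self.min_accuracy
        ]

rules = AbstainingRuleSet([Rule("temp", 0.0, 50.0, "normal"),
                           Rule("temp", 80.0, 200.0, "alarm")])
uncovered = rules.classify({"temp": 65.0})     # no rule covers 65.0
before_drift = rules.classify({"temp": 20.0})  # covered by the first rule

# Simulated concept drift: low temperatures now carry the label "alarm".
for _ in range(5):
    rules.update({"temp": 20.0}, "alarm")
after_drift = rules.classify({"temp": 20.0})   # the stale rule has been pruned
```

After the drift the stale rule is removed, so the instance is left unclassified until a replacement rule is learned, which is the trade-off the abstract describes.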