852 results for "Initial data problem"
Abstract:
The modelling of nonlinear stochastic dynamical processes from data involves solving the problems of data gathering, preprocessing, model architecture selection, learning or adaptation, parametric evaluation and model validation. For a given model architecture such as associative memory networks, a common difficulty in nonlinear modelling is the "curse of dimensionality". A series of complementary data-based constructive identification schemes, mainly based on but not limited to operating-point-dependent fuzzy models, are introduced in this paper with the aim of overcoming the curse of dimensionality. These include (i) a mixture-of-experts algorithm based on forward constrained regression; (ii) an inherently parsimonious piecewise local linear modelling concept based on a Delaunay partition of the input space; (iii) a constructive neurofuzzy modelling approach based on forward orthogonal least squares and optimal experimental design; and finally (iv) a neurofuzzy model construction algorithm based on Bézier-Bernstein polynomial basis functions and additive decomposition. Illustrative examples demonstrate their applicability, showing that the final major hurdle in data-based modelling has almost been removed.
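The constructive schemes above all hinge on selecting a small set of informative basis functions or regressors. Purely as an illustration (a minimal sketch in generic form with synthetic data, not the paper's algorithm), forward orthogonal least squares picks candidate regressors one at a time by their error reduction ratio:

    import numpy as np

    def forward_ols(P, y, n_terms):
        """Minimal forward orthogonal least squares selection (illustrative sketch).
        P : (N, M) candidate regressor matrix, y : (N,) target.
        Greedily picks n_terms columns by error reduction ratio (ERR)."""
        N, M = P.shape
        selected, Q = [], []              # chosen column indices and orthogonalised regressors
        yy = float(y @ y)
        for _ in range(n_terms):
            best_err, best_j, best_q = -1.0, None, None
            for j in range(M):
                if j in selected:
                    continue
                q = P[:, j].astype(float).copy()
                for qk in Q:              # orthogonalise against already-selected terms
                    q -= (qk @ q) / (qk @ qk) * qk
                qq = q @ q
                if qq < 1e-12:            # numerically dependent candidate, skip
                    continue
                g = (q @ y) / qq
                err = g * g * qq / yy     # error reduction ratio of this candidate
                if err > best_err:
                    best_err, best_j, best_q = err, j, q
            selected.append(best_j)
            Q.append(best_q)
        return selected

    # toy usage: 3 useful columns hidden among 20 candidates
    rng = np.random.default_rng(0)
    P = rng.normal(size=(200, 20))
    y = 2.0 * P[:, 3] - 1.5 * P[:, 7] + 0.5 * P[:, 12] + 0.1 * rng.normal(size=200)
    print(forward_ols(P, y, 3))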
Abstract:
We are developing computational tools supporting the detailed analysis of the dependence of neural electrophysiological response on dendritic morphology. We approach this problem by combining simulations of faithful models of neurons (experimental real-life morphological data with known models of channel kinetics) with algorithmic extraction of morphological and physiological parameters and statistical analysis. In this paper, we present a novel method for the automatic recognition of spike trains in voltage traces, which eliminates the need for human intervention. This enables classification of waveforms with consistent criteria across all the analyzed traces and thus reduces noise in the data. The method allows automatic extraction of the physiological parameters needed for further statistical analysis. To illustrate its usefulness for analyzing voltage traces, we characterized the influence of the somatic current injection level on several electrophysiological parameters in a set of modeled neurons. This application suggests that such algorithmic processing of physiological data extracts parameters in a form suitable for further investigation of the structure-activity relationship in single neurons.
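The abstract does not spell out the recognition algorithm itself; purely to illustrate automatic spike detection in a voltage trace, here is a minimal threshold-crossing sketch (the threshold and refractory period are assumed values, not the authors' settings):

    import numpy as np

    def detect_spikes(v, dt, threshold=-20.0, refractory=2.0):
        """Return spike times (ms) as upward crossings of a voltage threshold (mV),
        ignoring crossings closer together than a refractory period (ms).
        Illustrative only; parameter values are assumptions."""
        above = v >= threshold
        crossings = np.flatnonzero(~above[:-1] & above[1:]) + 1   # upward crossings
        times, last = [], -np.inf
        for i in crossings:
            t = i * dt
            if t - last >= refractory:
                times.append(t)
                last = t
        return np.array(times)

    # toy usage on a synthetic trace with square pulses standing in for spikes
    dt = 0.1                                   # ms per sample
    t = np.arange(0, 100, dt)
    v = -65 + 80 * ((t % 30) < 1.0)
    print(detect_spikes(v, dt))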
Abstract:
Measured process data normally contain inaccuracies because the measurements are obtained using imperfect instruments. As well as random errors, one can expect systematic bias caused by miscalibrated instruments, or outliers caused by process peaks such as sudden power fluctuations. Data reconciliation is the adjustment of a set of process data, based on a model of the process, so that the derived estimates conform to natural laws. In this paper, techniques for the detection and identification of both systematic bias and outliers in dynamic process data are presented. A novel technique for the detection and identification of systematic bias is formulated and presented. The problem of detection, identification and elimination of outliers is also treated using a modified version of a previously available clustering technique. These techniques are then combined to provide a global dynamic data reconciliation (DDR) strategy. The algorithms presented are tested in isolation and in combination using dynamic simulations of two continuous stirred tank reactors (CSTRs).
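For orientation, the classical linear, steady-state form of data reconciliation is a weighted least-squares adjustment subject to balance constraints A x = 0; the sketch below illustrates that step only (the dynamic, bias- and outlier-aware strategy of the paper goes well beyond it):

    import numpy as np

    def reconcile(y, A, Sigma):
        """Adjust measurements y so that the estimates satisfy A @ x = 0,
        minimising (y - x)^T Sigma^{-1} (y - x). Classical linear reconciliation;
        a simplified stand-in for the dynamic strategy described in the paper."""
        S = A @ Sigma @ A.T
        correction = Sigma @ A.T @ np.linalg.solve(S, A @ y)
        return y - correction

    # toy usage: a single mass balance f1 = f2 + f3  ->  [1, -1, -1] @ x = 0
    A = np.array([[1.0, -1.0, -1.0]])
    Sigma = np.diag([1.0, 0.5, 0.5])           # measurement error variances
    y = np.array([10.4, 6.1, 3.9])             # raw flow measurements (inconsistent)
    x_hat = reconcile(y, A, Sigma)
    print(x_hat, A @ x_hat)                    # reconciled flows satisfy the balance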
Abstract:
The problem of reconstructing the (otherwise unknown) source and sink field of a tracer in a fluid is studied by developing and testing a simple tracer transport model of a single-level global atmosphere and a dynamic data assimilation system. The source/sink field (taken to be constant over a 10-day assimilation window) and the initial tracer field are analysed together by assimilating imperfect tracer observations over the window. Experiments show that useful information about the source/sink field may be determined from relatively few observations when the initial tracer field is known very accurately a priori, even when the a priori source/sink information is biased (the a priori source/sink is set to zero). In this case, each observation provides information about the source/sink field at positions upstream, and the assimilation of many observations together can reasonably determine the location and strength of a test source.
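Written in standard strong-constraint four-dimensional variational notation (a generic formulation, not necessarily the exact one used in the paper), analysing the initial tracer field x_0 and the constant source/sink field s together means minimising over the joint control vector:

    J(x_0, s) = \tfrac{1}{2}(x_0 - x_0^b)^{\mathrm{T}} B_x^{-1} (x_0 - x_0^b)
              + \tfrac{1}{2}(s - s^b)^{\mathrm{T}} B_s^{-1} (s - s^b)
              + \tfrac{1}{2}\sum_{i=0}^{N} (H_i x_i - y_i)^{\mathrm{T}} R_i^{-1} (H_i x_i - y_i),
    \qquad \text{subject to } x_{i+1} = M_i(x_i, s),

so each tracer observation y_i constrains both the initial field and the sources upstream of it.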
Abstract:
New ways of combining observations with numerical models are discussed in which the size of the state space can be very large and the model can be highly nonlinear. The observations of the system can also be related to the model variables in highly nonlinear ways, making this data-assimilation (or inverse) problem highly nonlinear. First we discuss the connection between data assimilation and inverse problems, including regularization. We explore the choice of proposal density in a Particle Filter and show how the 'curse of dimensionality' might be beaten. In the standard Particle Filter, ensembles of model runs are propagated forward in time until observations are encountered, rendering it a pure Monte Carlo method. In large-dimensional systems this is very inefficient, and very large numbers of model runs are needed to solve the data-assimilation problem realistically. In our approach we steer all model runs towards the observations, resulting in a much more efficient method. By further 'ensuring almost equal weight' we avoid performing model runs that are useless in the end. Results are shown for the 40- and 1000-dimensional Lorenz 1995 model.
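The role of the proposal density can be summarised by the standard importance-weight relation (generic particle filter notation, not the specific scheme developed in the paper); steering model runs towards the observations changes q, and the weights correct for that choice:

    w_i^{n} \;\propto\; w_i^{n-1}\,
    \frac{p(y^{n} \mid x_i^{n})\, p(x_i^{n} \mid x_i^{n-1})}
         {q(x_i^{n} \mid x_i^{n-1}, y^{n})}.

Keeping these weights nearly equal across particles is what prevents the ensemble from collapsing onto a handful of members in high dimensions.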
Conditioning of incremental variational data assimilation, with application to the Met Office system
Abstract:
Implementations of incremental variational data assimilation require the iterative minimization of a series of linear least-squares cost functions. The accuracy and speed with which these linear minimization problems can be solved is determined by the condition number of the Hessian of the problem. In this study, we examine how different components of the assimilation system influence this condition number. Theoretical bounds on the condition number for a single-parameter system are presented and used to predict how the condition number is affected by the distribution and accuracy of the observations and by the specified length-scales in the background error covariance matrix. The theoretical results are verified in the Met Office variational data assimilation system, using both pseudo-observations and real data.
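In the usual incremental formulation (standard notation; the Met Office preconditioning details are not reproduced here), each inner-loop cost function is quadratic and its Hessian and condition number are

    S = B^{-1} + H^{\mathrm{T}} R^{-1} H,
    \qquad
    \kappa(S) = \frac{\lambda_{\max}(S)}{\lambda_{\min}(S)},

where B is the background error covariance, R the observation error covariance and H the linearised observation operator; the larger κ(S), the slower and less accurate the iterative minimisation.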
Abstract:
This study was undertaken to explore gel permeation chromatography (GPC) for estimating molecular weights of proanthocyanidin fractions isolated from sainfoin (Onobrychis viciifolia). The results were compared with data obtained by thiolytic degradation of the same fractions. Polystyrene, polyethylene glycol and polymethyl methacrylate standards were not suitable for estimating the molecular weights of underivatized proanthocyanidins. Therefore, a novel HPLC-GPC method was developed based on two serially connected PolarGel-L columns using DMF containing 5% water, 1% acetic acid and 0.15 M LiBr at 0.7 ml/min and 50 °C. This yielded a single calibration curve for galloyl glucoses (trigalloyl glucose, pentagalloyl glucose), ellagitannins (pedunculagin, vescalagin, punicalagin, oenothein B, gemin A), proanthocyanidins (procyanidin B2, cinnamtannin B1), and several other polyphenols (catechin, epicatechin gallate, epigallocatechin gallate, amentoflavone). These GPC-predicted molecular weights represented a considerable advance over previously reported HPLC-GPC methods for underivatized proanthocyanidins. © 2011 Elsevier B.V. All rights reserved.
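The essence of such a calibration is a single fitted relationship between elution time and the logarithm of molecular weight. A minimal sketch with made-up retention data (illustrative values only, not the published sainfoin measurements):

    import numpy as np

    # Hypothetical calibration points: (retention time in min, known molecular weight in Da).
    standards = np.array([
        [14.2, 1870.0],   # roughly ellagitannin-sized standard
        [15.1, 1085.0],
        [16.0,  940.0],
        [17.3,  578.0],
        [18.6,  290.0],   # roughly catechin-sized standard
    ])
    t, mw = standards[:, 0], standards[:, 1]

    # GPC calibration is conventionally linear in log10(MW) versus retention time.
    slope, intercept = np.polyfit(t, np.log10(mw), 1)

    def predict_mw(retention_time):
        """Predict molecular weight (Da) of an unknown from its retention time."""
        return 10 ** (slope * retention_time + intercept)

    print(round(predict_mw(16.5)))   # an unknown proanthocyanidin fraction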
Abstract:
The organization of non-crystalline polymeric materials at a local level, namely on a spatial scale between a few and 100 Å, is still unclear in many respects. The determination of the local structure in terms of the configuration and conformation of the polymer chain, and of the packing characteristics of the chain in the bulk material, represents a challenging problem. Data from wide-angle diffraction experiments are very difficult to interpret due to the very large amount of information that they carry, that is, the large number of correlations present in the diffraction patterns. We describe new approaches that permit a detailed analysis of the complex neutron diffraction patterns characterizing polymer melts and glasses. The coupling of different computer modelling strategies with neutron scattering data over a wide Q range allows the extraction of detailed quantitative information on the structural arrangements of the materials of interest. Drawing on modelling routes as diverse as force field calculations, single-chain modelling and reverse Monte Carlo, we show the successes and pitfalls of each approach in describing model systems, illustrating the need to attack the data analysis problem simultaneously from several fronts.
Abstract:
Recently, major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single-core CPUs, the trend clearly goes towards multi-core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se, but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures, provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high-performance computing systems, research on the corresponding algorithms must be at the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism. The first contribution addresses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this, a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach to the frequent subgraph mining problem for distributed memory systems; this approach is based on a hierarchical communication topology to solve issues related to multi-domain computational environments. The fourth paper describes the combined use and customization of software packages to facilitate top-down parallelism in the tuning of Support Vector Machines (SVMs), and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution focuses on very efficient feature selection: it describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that provided detailed reviews for each paper. We would also like to thank Matthew Otey, who helped with publicity for the workshop.
Abstract:
In numerical weather prediction (NWP), data assimilation (DA) methods are used to combine available observations with numerical model estimates. This is done by minimising measures of error on both observations and model estimates, with more weight given to data that can be more trusted. For any DA method, an estimate of the initial forecast error covariance matrix is required. For convective-scale data assimilation, however, the properties of the error covariances are not well understood. An effective way to investigate covariance properties in the presence of convection is to use an ensemble-based method, for which an estimate of the error covariance is readily available at each time step. In this work, we investigate the performance of the ensemble square root filter (EnSRF) in the presence of cloud growth, applied to an idealised 1D convective-column model of the atmosphere. We show that the EnSRF performs well in capturing cloud growth, but the ensemble does not cope well with discontinuities introduced into the system by parameterised rain. The state estimates lose accuracy and, more importantly, the ensemble is unable to capture the spread (variance) of the estimates correctly. We also find, counter-intuitively, that by reducing the spatial frequency of the observations and/or the accuracy of the observations, the ensemble is able to capture the states and their variability successfully across all regimes.
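For reference, here is a minimal sketch of a serial ensemble square root filter analysis step for independent scalar observations (textbook form, without localisation or inflation, and not the authors' convective-column configuration):

    import numpy as np

    def ensrf_update(X, H, y, r):
        """Serial EnSRF analysis step (illustrative sketch).
        X : (n, m) ensemble of n-dimensional states with m members
        H : (p, n) linear observation operator, y : (p,) observations
        r : (p,) observation-error variances (observations treated as independent)."""
        X = X.copy()
        m = X.shape[1]
        for k in range(len(y)):                       # assimilate observations one at a time
            h = H[k]
            xm = X.mean(axis=1)
            Xp = X - xm[:, None]                      # ensemble perturbations
            hXp = h @ Xp                              # observed perturbations, shape (m,)
            s = hXp @ hXp / (m - 1) + r[k]            # innovation variance H P H^T + R
            K = (Xp @ hXp) / (m - 1) / s              # Kalman gain, shape (n,)
            alpha = 1.0 / (1.0 + np.sqrt(r[k] / s))   # square-root factor for perturbations
            xm = xm + K * (y[k] - h @ xm)             # update ensemble mean
            Xp = Xp - alpha * np.outer(K, hXp)        # update perturbations deterministically
            X = xm[:, None] + Xp
        return X

    # toy usage: 3 state variables, 20 members, observe the first two variables
    rng = np.random.default_rng(1)
    X = rng.normal(size=(3, 20))
    H = np.eye(2, 3)
    print(ensrf_update(X, H, y=np.array([0.5, -0.2]), r=np.array([0.1, 0.1])).mean(axis=1))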
Abstract:
Data assimilation aims to incorporate measured observations into a dynamical system model in order to produce accurate estimates of all the current (and future) state variables of the system. The optimal estimates minimize a variational principle and can be found using adjoint methods. The model equations are treated as strong constraints on the problem. In reality, the model does not represent the system behaviour exactly and errors arise due to lack of resolution and inaccuracies in physical parameters, boundary conditions and forcing terms. A technique for estimating systematic and time-correlated errors as part of the variational assimilation procedure is described here. The modified method determines a correction term that compensates for model error and leads to improved predictions of the system states. The technique is illustrated in two test cases. Applications to the 1-D nonlinear shallow water equations demonstrate the effectiveness of the new procedure.
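Schematically, in generic weak-constraint notation (not the exact formulation of the paper), admitting model error adds a correction term to the dynamics and a corresponding penalty to the cost function:

    x_{i+1} = M_i(x_i) + \eta_i,
    \qquad
    J(x_0, \eta) = \tfrac{1}{2}(x_0 - x_0^b)^{\mathrm{T}} B^{-1} (x_0 - x_0^b)
                 + \tfrac{1}{2}\sum_i (H_i x_i - y_i)^{\mathrm{T}} R_i^{-1} (H_i x_i - y_i)
                 + \tfrac{1}{2}\sum_i \eta_i^{\mathrm{T}} Q_i^{-1} \eta_i,

where the estimated η_i play the role of the systematic, time-correlated model-error correction.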
Abstract:
Current methods for estimating vegetation parameters are generally sub-optimal in the way they exploit information and do not generally consider uncertainties. We look forward to a future where operational data assimilation schemes improve estimates by tracking land surface processes and exploiting multiple types of observations. Data assimilation schemes seek to combine observations and models in a statistically optimal way, taking into account uncertainty in both, but have not yet been much exploited in this area. The EO-LDAS scheme and prototype, developed under ESA funding, is designed to exploit the anticipated wealth of data that will be available under GMES missions, such as the Sentinel family of satellites, to provide improved mapping of land surface biophysical parameters. This paper describes the EO-LDAS implementation and explores some of its core functionality. EO-LDAS is a weak-constraint variational data assimilation system. The prototype provides a mechanism for constraint based on a prior estimate of the state vector, a linear dynamic model, and Earth Observation data (top-of-canopy reflectance here). The observation operator is a non-linear optical radiative transfer model for a vegetation canopy with a soil lower boundary, operating over the range 400 to 2500 nm. Adjoint codes for all model and operator components are provided in the prototype by automatic differentiation of the computer codes. In this paper, EO-LDAS is applied to the problem of daily estimation of six of the parameters controlling the radiative transfer operator over the course of a year (> 2000 state vector elements). Zero- and first-order process model constraints are implemented and explored as the dynamic model. The assimilation estimates all state vector elements simultaneously. This is performed in the context of a typical Sentinel-2 MSI operating scenario, using synthetic MSI observations simulated with the observation operator, with uncertainties typical of those achieved by optical sensors assumed for the data. The experiments consider a baseline state vector estimation case where dynamic constraints are applied, and assess the impact of dynamic constraints on the a posteriori uncertainties. The results demonstrate that reductions in uncertainty by a factor of up to two might be obtained by applying the sorts of dynamic constraints used here. The hyperparameter (dynamic model uncertainty) required to control the assimilation is estimated by a cross-validation exercise. The result of the assimilation is seen to be robust to missing observations with quite large data gaps.
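As a guide to what such a process-model constraint looks like (a generic first-difference regularisation term; the prototype's exact weighting is not reproduced here), the dynamic-model part of the cost function penalises changes between successive daily states:

    J_{\text{model}}(\mathbf{x}) = \tfrac{1}{2} \sum_{t}
    (\mathbf{x}_{t+1} - \mathbf{x}_{t})^{\mathrm{T}} C_m^{-1} (\mathbf{x}_{t+1} - \mathbf{x}_{t}),

and the full problem minimises J = J_prior + J_obs + J_model over all state-vector elements at once, with the model-uncertainty hyperparameter controlling how strongly this smoothness is enforced.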
Abstract:
This paper presents practical approaches to the problem of sample size re-estimation in the case of clinical trials with survival data when proportional hazards can be assumed. When data on a full range of survival experiences across the recruited patients are readily available at the time of the review, it is shown that, as expected, performing a blinded re-estimation procedure is straightforward and can help to maintain the trial's pre-specified error rates. Two alternative methods for dealing with the situation where limited survival experiences are available at the time of the sample size review are then presented and compared. In this instance, extrapolation is required in order to undertake the sample size re-estimation. Worked examples, together with results from a simulation study, are described. It is concluded that, as in the standard case, use of either extrapolation approach successfully protects the trial error rates. Copyright © 2012 John Wiley & Sons, Ltd.
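The event-driven logic behind this kind of re-estimation rests on the standard proportional-hazards link between power and the number of events. A minimal sketch using Schoenfeld's formula (a textbook result, shown for orientation rather than as the paper's blinded procedure):

    from math import log, ceil
    from statistics import NormalDist

    def required_events(hazard_ratio, alpha=0.05, power=0.9, allocation=0.5):
        """Number of events for a two-sided log-rank test under proportional
        hazards (Schoenfeld's formula). Illustrative sketch only."""
        z_a = NormalDist().inv_cdf(1 - alpha / 2)
        z_b = NormalDist().inv_cdf(power)
        theta = log(hazard_ratio)
        return ceil((z_a + z_b) ** 2 / (allocation * (1 - allocation) * theta ** 2))

    # e.g. detecting a hazard ratio of 0.75 with 90% power at the two-sided 5% level
    print(required_events(0.75))   # about 508 events under 1:1 allocation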
Abstract:
When performing data fusion, one often measures where targets were and then wishes to deduce where targets currently are. There has been recent research on the processing of such out-of-sequence data. This research has culminated in the development of a number of algorithms for solving the associated tracking problem. This paper reviews these different approaches in a common Bayesian framework and proposes an architecture that orthogonalises the data association and out-of-sequence problems such that any combination of solutions to these two problems can be used together. The emphasis is not on advocating one approach over another on the basis of computational expense, but rather on understanding the relationships among the algorithms so that any approximations made are explicit. Results for a multi-sensor scenario involving out-of-sequence data association are used to illustrate the utility of this approach in a specific context.
Abstract:
In data fusion systems, one often encounters measurements of past target locations and then wishes to deduce where the targets are currently located. Recent research on the processing of such out-of-sequence data has culminated in the development of a number of algorithms for solving the associated tracking problem. This paper reviews these different approaches in a common Bayesian framework and proposes an architecture that orthogonalises the data association and out-of-sequence problems such that any combination of solutions to these two problems can be used together. The emphasis is not on advocating one approach over another on the basis of computational expense, but rather on understanding the relationships between the algorithms so that any approximations made are explicit.