928 results for Data Driven Modeling


Relevance:

100.00% 100.00%

Publisher:

Abstract:

This work focuses on saltwater intrusion in coastal aquifers, and in particular on conceptual schemes for evaluating the associated risk. Saltwater intrusion depends on natural and anthropogenic factors, both of which behave in a strongly random manner and should be considered for optimal management of the territory and its water resources. Given the uncertainty in the problem parameters, the risk associated with salinization needs to be cast in a probabilistic framework. On the basis of a widely adopted sharp interface formulation, key hydrogeological parameters are modeled as random variables, and global sensitivity analysis is used to determine their influence on the position of the saltwater interface. The analyses rely on an efficient model reduction technique, based on Polynomial Chaos Expansion, that describes the model accurately without a heavy computational burden. When the assumptions of classical analytical models are not met, which happens often in real case studies such as the area analyzed in the present work, one can adopt data-driven techniques based on analysis of the data characterizing the system under study. A model can then be defined on the basis of connections between the system state variables, with only a limited number of assumptions about the "physical" behaviour of the system.
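
As a rough illustration of casting the interface position in a probabilistic framework, the sketch below propagates parameter uncertainty through the Ghyben-Herzberg relation, a classical sharp-interface approximation. The abstract does not say which sharp interface formulation or which distributions were used, so the relation, the Gaussian head distribution, the density values, and all function names here are assumptions chosen for illustration. Plain Monte Carlo sampling stands in for the Polynomial Chaos Expansion.

```python
import random


def interface_depth(head_m, rho_fresh=1000.0, rho_salt=1025.0):
    """Ghyben-Herzberg depth (m below sea level) of the fresh/salt interface."""
    return rho_fresh / (rho_salt - rho_fresh) * head_m


def exceedance_probability(mean_head, sd_head, critical_depth, n=10000, seed=0):
    """Estimate P(interface is shallower than critical_depth) by Monte Carlo."""
    rng = random.Random(seed)
    shallow = 0
    for _ in range(n):
        # Assumed: freshwater head above sea level is Gaussian, clipped at zero.
        head = max(0.0, rng.gauss(mean_head, sd_head))
        if interface_depth(head) < critical_depth:
            shallow += 1
    return shallow / n


# Hypothetical numbers: 1 m mean head, 0.3 m spread, 30 m critical depth.
risk = exceedance_probability(mean_head=1.0, sd_head=0.3, critical_depth=30.0)
```

With these default densities the interface lies 40 m below sea level for each metre of freshwater head, so uncertainty in the head maps directly onto uncertainty in the interface depth.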

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This is the first part of a study investigating a model-based transient calibration process for diesel engines. The motivation is to populate hundreds of calibratable parameters in a methodical and optimal manner by using model-based optimization in conjunction with the manual process so that, relative to the manual process used by itself, a significant improvement in transient emissions and fuel consumption and a sizable reduction in calibration time and test cell requirements are achieved. Empirical transient modelling and optimization are addressed in the second part of this work, while the data required for model training and generalization are the focus of the current work. Transient and steady-state data from a turbocharged multicylinder diesel engine have been examined from a model training perspective. A single-cylinder engine with external air-handling has been used to expand the steady-state data to encompass transient parameter space. Based on comparative model performance and differences in the non-parametric space, primarily driven by a large difference between exhaust and intake manifold pressures (engine ΔP) during transients, it has been recommended that transient emission models should be trained with transient training data. It has been shown that electronic control module (ECM) estimates of transient charge flow and the exhaust gas recirculation (EGR) fraction cannot be accurate at the high engine ΔP frequently encountered during transient operation, and that such estimates do not account for cylinder-to-cylinder variation. The effects of high engine ΔP must therefore be incorporated empirically by using transient data generated from a spectrum of transient calibrations. Specific recommendations on how to choose such calibrations, how much data to acquire, and how to specify transient segments for data acquisition have been made. Methods to process transient data to account for transport delays and sensor lags have been developed.
The processed data have then been visualized using statistical means to understand transient emission formation. Two modes of transient opacity formation have been observed and described. The first mode is driven by high engine ΔP and low fresh air flowrates, while the second mode is driven by high engine ΔP and high EGR flowrates. The EGR fraction is inaccurately estimated in both modes, while EGR distribution has been shown to be present but unaccounted for by the ECM. The two modes and associated phenomena are essential to understanding why transient emission models are calibration dependent and, furthermore, how to choose training data that will result in good model generalization.
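
One common way to compensate for a pure transport delay, sketched here as an assumption rather than as the authors' method, is to pick the sample shift that maximizes the cross-correlation between a reference signal and the delayed sensor signal, then trim both to the overlapping window. The function names and the toy signals are hypothetical, and sensor lag (a dynamic response rather than a pure shift) is not handled.

```python
def estimate_delay(reference, delayed, max_lag):
    """Return the lag (in samples) at which `delayed` best matches `reference`."""
    def score(lag):
        pairs = list(zip(reference, delayed[lag:]))
        # Mean product keeps scores comparable as the overlap shrinks.
        return sum(r * d for r, d in pairs) / len(pairs)

    return max(range(max_lag + 1), key=score)


def align(reference, delayed, lag):
    """Shift `delayed` back by `lag` samples and trim both to the overlap."""
    n = len(delayed) - lag
    return reference[:n], delayed[lag:]


# Toy example: the sensor sees the same pulse three samples late.
reference = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
delayed = [0, 0, 0] + reference[:-3]

lag = estimate_delay(reference, delayed, max_lag=5)
aligned_reference, aligned_sensor = align(reference, delayed, lag)
```

For real measurements, mean-centering both signals before scoring would usually be advisable, since a constant offset can otherwise dominate the products.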

Relevance:

100.00% 100.00%

Publisher:

Abstract:

An important problem in unsupervised data clustering is how to determine the number of clusters. Here we investigate how this can be achieved in an automated way by using interrelation matrices of multivariate time series. Two nonparametric and purely data-driven algorithms are presented and compared. The first exploits the eigenvalue spectra of surrogate data, while the second employs the eigenvector components of the interrelation matrix. Compared to the first algorithm, the second approach is computationally faster and not limited to linear interrelation measures.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Dynamic changes in ERP topographies can be conveniently analyzed by means of microstates, the so-called "atoms of thoughts", that represent brief periods of quasi-stable synchronized network activation. Comparing temporal microstate features such as onset, offset or duration between groups and conditions therefore allows a precise assessment of the timing of cognitive processes. So far, this has been achieved by assigning the individual time-varying ERP maps to spatially defined microstate templates obtained from clustering the grand mean data into predetermined numbers of topographies (microstate prototypes). Features obtained from these individual assignments were then statistically compared. The problem with this approach is that individual noise dilutes the match between individual topographies and templates, leading to lower statistical power. We therefore propose a randomization-based procedure that works without assigning grand-mean microstate prototypes to individual data. In addition, we propose a new criterion to select the optimal number of microstate prototypes based on cross-validation across subjects. After a formal introduction, the method is applied to a sample data set of an N400 experiment and to simulated data with varying signal-to-noise ratios, and the results are compared to existing methods. In a first comparison with previously employed statistical procedures, the new method showed an increased robustness to noise, and a higher sensitivity for more subtle effects of microstate timing. We conclude that the proposed method is well-suited for the assessment of timing differences in cognitive processes. The increased statistical power allows identifying more subtle effects, which is particularly important in small and scarce patient populations.
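
The general idea of a randomization test whose statistic is computed on group averages, rather than on noisy individual data, can be sketched as follows. This is a simplified stand-in and not the procedure of the paper: a one-dimensional waveform per subject replaces the topographic maps, a threshold crossing replaces the microstate onset, and all names and parameter values are hypothetical.

```python
import random


def group_mean(waves):
    """Average equally long per-subject waveforms sample by sample."""
    n = len(waves)
    return [sum(column) / n for column in zip(*waves)]


def onset(wave, threshold):
    """Index of the first sample at or above threshold (len(wave) if none)."""
    for t, value in enumerate(wave):
        if value >= threshold:
            return t
    return len(wave)


def onset_difference(group_a, group_b, threshold):
    """Onset of group A's mean waveform minus onset of group B's."""
    return onset(group_mean(group_a), threshold) - onset(group_mean(group_b), threshold)


def randomization_p_value(group_a, group_b, threshold=0.5, n_permutations=2000, seed=0):
    """Two-sided p-value obtained by shuffling subjects between the groups."""
    rng = random.Random(seed)
    observed = abs(onset_difference(group_a, group_b, threshold))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        permuted = abs(onset_difference(pooled[:n_a], pooled[n_a:], threshold))
        if permuted >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (hits + 1) / (n_permutations + 1)
```

Because the feature is extracted only from group means, no subject-level template assignment is needed at any point in the loop.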

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The hadronic light-by-light contribution to the anomalous magnetic moment of the muon was recently analyzed in the framework of dispersion theory, providing a systematic formalism where all input quantities are expressed in terms of on-shell form factors and scattering amplitudes that are in principle accessible in experiment. We briefly review the main ideas behind this framework and discuss the various experimental ingredients needed for the evaluation of one- and two-pion intermediate states. In particular, we identify processes that in the absence of data for doubly-virtual pion–photon interactions can help constrain parameters in the dispersive reconstruction of the relevant input quantities, the pion transition form factor and the helicity partial waves for γ*γ*→ππ.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The era of big data opens up new opportunities in personalised medicine, preventive care, chronic disease management and in telemonitoring and managing of patients with implanted devices. The rich data accumulating within online services and internet companies provide a microscope to study human behaviour at scale, and to ask completely new questions about the interplay between behavioural patterns and health. In this paper, we shed light on a particular aspect of data-driven healthcare: autonomous decision-making. We first look at three examples where we can expect data-driven decisions to be taken autonomously by technology, with no or limited human intervention. We then discuss some of the technical and practical challenges that can be expected, and sketch the research agenda to address them.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper we propose a new fully-automatic method for localizing and segmenting 3D intervertebral discs from MR images, where the two problems are solved in a unified data-driven regression and classification framework. We estimate the output (image displacements for localization, or foreground/background labels for segmentation) of image points by exploiting both training data and geometric constraints simultaneously. The problem is formulated in a unified objective function which is then solved globally and efficiently. We validate our method on MR images of 25 patients. Taking manually labeled data as the ground truth, our method achieves a mean localization error of 1.3 mm, a mean Dice metric of 87%, and a mean surface distance of 1.3 mm. Our method can be applied to other localization and segmentation tasks.
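
For readers unfamiliar with the Dice metric quoted above, the following minimal sketch computes it for two segmentations stored as sets of foreground voxel coordinates. The set representation and the function name are assumptions made for illustration. The abstract does not describe how the authors store or evaluate their label maps.

```python
def dice(predicted, truth):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two sets of foreground voxels."""
    predicted, truth = set(predicted), set(truth)
    if not predicted and not truth:
        return 1.0  # Convention chosen here: two empty masks agree perfectly.
    return 2 * len(predicted & truth) / (len(predicted) + len(truth))


# Toy example: three voxels each, two of them shared.
predicted = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
truth = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}
score = dice(predicted, truth)
```

A score of 1 means the two masks coincide and 0 means they share no voxel, so a reported mean of 87% indicates substantial but not perfect overlap with the manual labels.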

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper, we propose a new method for fully-automatic landmark detection and shape segmentation in X-ray images. To detect landmarks, we estimate the displacements from some randomly sampled image patches to the (unknown) landmark positions, and then we integrate these predictions via a voting scheme. Our key contribution is a new algorithm for estimating these displacements. Unlike other methods, where each image patch independently predicts its displacement, we jointly estimate the displacements from all patches together in a data-driven way, by considering not only the training data but also geometric constraints on the test image. The displacement estimation is formulated as a convex optimization problem that can be solved efficiently. Finally, we use the sparse shape composition model as a priori information to regularize the landmark positions and thus generate the segmented shape contour. We validate our method on X-ray image datasets of three different anatomical structures: complete femur, proximal femur and pelvis. Experiments show that our method is accurate and robust in landmark detection and, combined with the shape model, gives better or comparable performance in shape segmentation compared to state-of-the-art methods. Finally, a preliminary study using CT data shows the extensibility of our method to 3D data.
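
The voting step can be illustrated with a small sketch: each patch casts a vote at its own centre plus its predicted displacement, and the votes are then combined into one landmark estimate. The abstract does not specify how votes are aggregated, so the coordinate-wise median used here, the function name, and the toy numbers are assumptions. The joint, convex estimation of the displacements themselves is not shown.

```python
from statistics import median


def vote_landmark(patch_centers, displacements):
    """Combine per-patch votes (centre + displacement) into one 2D estimate."""
    votes = [
        (x + dx, y + dy)
        for (x, y), (dx, dy) in zip(patch_centers, displacements)
    ]
    # Assumed aggregation: the coordinate-wise median damps outlier votes.
    return (median(v[0] for v in votes), median(v[1] for v in votes))


# Toy example: three patches agree on (15, 20); the fourth is an outlier.
patch_centers = [(10, 10), (20, 5), (0, 30), (50, 50)]
displacements = [(5, 10), (-5, 15), (15, -10), (0, 0)]
landmark = vote_landmark(patch_centers, displacements)
```

A median was chosen for the sketch because a single badly predicted patch then has little effect on the result. Accumulating votes in an image-sized grid and taking the peak would be another reasonable choice.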

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper addresses the problem of fully-automatic localization and segmentation of 3D intervertebral discs (IVDs) from MR images. Our method contains two steps: we first localize the center of each IVD, and then segment the IVDs by classifying image pixels around each disc center as foreground (disc) or background. The disc localization is done by estimating the image displacements from a set of randomly sampled 3D image patches to the disc center. The image displacements are estimated by jointly optimizing the training and test displacement values in a data-driven way, where we take into consideration both the training data and the geometric constraint on the test image. After the disc centers are localized, we segment the discs by classifying image pixels around disc centers as background or foreground. The classification is done with a data-driven approach similar to the one used for localization, but in the segmentation case we aim to estimate the foreground/background probability of each pixel instead of the image displacements. In addition, an extra neighborhood smoothness constraint is introduced to enforce the local smoothness of the label field. Our method is validated on 3D T2-weighted turbo spin echo MR images of 35 patients from two different studies. Experiments show that, compared to the state of the art, our method achieves better or comparable results. Specifically, we achieve for localization a mean error of 1.6-2.0 mm, and for segmentation a mean Dice metric of 85%-88% and a mean surface distance of 1.3-1.4 mm.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The overarching goal of the Pathway Semantics Algorithm (PSA) is to improve the in silico identification of clinically useful hypotheses about molecular patterns in disease progression. By framing biomedical questions within a variety of matrix representations, PSA has the flexibility to analyze combined quantitative and qualitative data over a wide range of stratifications. The resulting hypothetical answers can then move to in vitro and in vivo verification, research assay optimization, clinical validation, and commercialization. Herein PSA is shown to generate novel hypotheses about the significant biological pathways in two disease domains: shock / trauma and hemophilia A, and validated experimentally in the latter. The PSA matrix algebra approach identified differential molecular patterns in biological networks over time and outcome that would not be easily found through direct assays, literature or database searches. In this dissertation, Chapter 1 provides a broad overview of the background and motivation for the study, followed by Chapter 2 with a literature review of relevant computational methods. Chapters 3 and 4 describe PSA for node and edge analysis respectively, and apply the method to disease progression in shock / trauma. Chapter 5 demonstrates the application of PSA to hemophilia A and the validation with experimental results. The work is summarized in Chapter 6, followed by extensive references and an Appendix with additional material.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Owing to advances in information technology in general, and in databases in particular, data storage devices are becoming cheaper and data processing speed is increasing. As a result, organizations tend to store large volumes of data that hold great potential information. Decision Support Systems (DSS) try to use the stored data to obtain valuable information for organizations. In this paper, we use both data models and use cases to represent the functionality of data processing in DSS, following Software Engineering processes. We propose a methodology for developing DSS in the Analysis phase, with respect to data processing modeling. As a starting point, we have used a data model adapted to the semantics of multidimensional databases or data warehouses (DW). We have also taken an algorithm that provides all the possible ways to automatically cross-check multidimensional model data. Using these, we propose diagrams and descriptions of use cases, which can be considered patterns representing the DSS functionality with regard to processing the data of the DW on which the DSS are based. We highlight the reusability and automation benefits that can be achieved, and we think this study can serve as a guide in the development of DSS.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The environmental, cultural and socio-economic causes and consequences of farmland abandonment are issues of increasing concern for researchers and policy makers. In previous studies, we proposed a new methodology for selecting the driving factors in farmland abandonment processes. Using Data Mining and GIS, it is possible to select those variables which are most significantly related to abandonment. The aim of this study is to investigate the application of the above-mentioned methodology for finding relationships between relief and farmland abandonment in a Mediterranean region (SE Spain). We have taken into account up to 28 different variables in a single analysis, some of them commonly considered in land use change studies (slope, altitude, TWI, etc.), while other, novel variables have also been evaluated (sky view factor, terrain view factor, etc.). The variable selection process provides results in line with the previous knowledge of the study area, describing some processes that are region specific (e.g. abandonment versus intensification of the agricultural activities). The European INSPIRE Directive (2007/2/EC) establishes that digital elevation models for land surfaces should be available in all member countries; this means that the research described in this work can be extrapolated to any European country to determine whether these variables (slope, altitude, etc.) are important in the process of abandonment.