13 resultados para data-driven Stochastic Subspace Identification (SSI-data)
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
This work is focused on the study of saltwater intrusion in coastal aquifers, and in particular on the realization of conceptual schemes to evaluate the risk associated with it. Saltwater intrusion depends on different natural and anthropic factors, both presenting a strong aleatory behaviour, that should be considered for an optimal management of the territory and water resources. Given the uncertainty of problem parameters, the risk associated with salinization needs to be cast in a probabilistic framework. On the basis of a widely adopted sharp interface formulation, key hydrogeological problem parameters are modeled as random variables, and global sensitivity analysis is used to determine their influence on the position of saltwater interface. The analyses presented in this work rely on an efficient model reduction technique, based on Polynomial Chaos Expansion, able to combine the best description of the model without great computational burden. When the assumptions of classical analytical models are not respected, and this occurs several times in the applications to real cases of study, as in the area analyzed in the present work, one can adopt data-driven techniques, based on the analysis of the data characterizing the system under study. It follows that a model can be defined on the basis of connections between the system state variables, with only a limited number of assumptions about the "physical" behaviour of the system.
Resumo:
Intelligent systems are currently inherent to the society, supporting a synergistic human-machine collaboration. Beyond economical and climate factors, energy consumption is strongly affected by the performance of computing systems. The quality of software functioning may invalidate any improvement attempt. In addition, data-driven machine learning algorithms are the basis for human-centered applications, being their interpretability one of the most important features of computational systems. Software maintenance is a critical discipline to support automatic and life-long system operation. As most software registers its inner events by means of logs, log analysis is an approach to keep system operation. Logs are characterized as Big data assembled in large-flow streams, being unstructured, heterogeneous, imprecise, and uncertain. This thesis addresses fuzzy and neuro-granular methods to provide maintenance solutions applied to anomaly detection (AD) and log parsing (LP), dealing with data uncertainty, identifying ideal time periods for detailed software analyses. LP provides deeper semantics interpretation of the anomalous occurrences. The solutions evolve over time and are general-purpose, being highly applicable, scalable, and maintainable. Granular classification models, namely, Fuzzy set-Based evolving Model (FBeM), evolving Granular Neural Network (eGNN), and evolving Gaussian Fuzzy Classifier (eGFC), are compared considering the AD problem. The evolving Log Parsing (eLP) method is proposed to approach the automatic parsing applied to system logs. All the methods perform recursive mechanisms to create, update, merge, and delete information granules according with the data behavior. For the first time in the evolving intelligent systems literature, the proposed method, eLP, is able to process streams of words and sentences. Essentially, regarding to AD accuracy, FBeM achieved (85.64+-3.69)%; eGNN reached (96.17+-0.78)%; eGFC obtained (92.48+-1.21)%; and eLP reached (96.05+-1.04)%. Besides being competitive, eLP particularly generates a log grammar, and presents a higher level of model interpretability.
Resumo:
The discovery of new materials and their functions has always been a fundamental component of technological progress. Nowadays, the quest for new materials is stronger than ever: sustainability, medicine, robotics and electronics are all key assets which depend on the ability to create specifically tailored materials. However, designing materials with desired properties is a difficult task, and the complexity of the discipline makes it difficult to identify general criteria. While scientists developed a set of best practices (often based on experience and expertise), this is still a trial-and-error process. This becomes even more complex when dealing with advanced functional materials. Their properties depend on structural and morphological features, which in turn depend on fabrication procedures and environment, and subtle alterations leads to dramatically different results. Because of this, materials modeling and design is one of the most prolific research fields. Many techniques and instruments are continuously developed to enable new possibilities, both in the experimental and computational realms. Scientists strive to enforce cutting-edge technologies in order to make progress. However, the field is strongly affected by unorganized file management, proliferation of custom data formats and storage procedures, both in experimental and computational research. Results are difficult to find, interpret and re-use, and a huge amount of time is spent interpreting and re-organizing data. This also strongly limit the application of data-driven and machine learning techniques. This work introduces possible solutions to the problems described above. Specifically, it talks about developing features for specific classes of advanced materials and use them to train machine learning models and accelerate computational predictions for molecular compounds; developing method for organizing non homogeneous materials data; automate the process of using devices simulations to train machine learning models; dealing with scattered experimental data and use them to discover new patterns.
Resumo:
Imaging technologies are widely used in application fields such as natural sciences, engineering, medicine, and life sciences. A broad class of imaging problems reduces to solve ill-posed inverse problems (IPs). Traditional strategies to solve these ill-posed IPs rely on variational regularization methods, which are based on minimization of suitable energies, and make use of knowledge about the image formation model (forward operator) and prior knowledge on the solution, but lack in incorporating knowledge directly from data. On the other hand, the more recent learned approaches can easily learn the intricate statistics of images depending on a large set of data, but do not have a systematic method for incorporating prior knowledge about the image formation model. The main purpose of this thesis is to discuss data-driven image reconstruction methods which combine the benefits of these two different reconstruction strategies for the solution of highly nonlinear ill-posed inverse problems. Mathematical formulation and numerical approaches for image IPs, including linear as well as strongly nonlinear problems are described. More specifically we address the Electrical impedance Tomography (EIT) reconstruction problem by unrolling the regularized Gauss-Newton method and integrating the regularization learned by a data-adaptive neural network. Furthermore we investigate the solution of non-linear ill-posed IPs introducing a deep-PnP framework that integrates the graph convolutional denoiser into the proximal Gauss-Newton method with a practical application to the EIT, a recently introduced promising imaging technique. Efficient algorithms are then applied to the solution of the limited electrods problem in EIT, combining compressive sensing techniques and deep learning strategies. Finally, a transformer-based neural network architecture is adapted to restore the noisy solution of the Computed Tomography problem recovered using the filtered back-projection method.
Resumo:
The present Dissertation shows how recent statistical analysis tools and open datasets can be exploited to improve modelling accuracy in two distinct yet interconnected domains of flood hazard (FH) assessment. In the first Part, unsupervised artificial neural networks are employed as regional models for sub-daily rainfall extremes. The models aim to learn a robust relation to estimate locally the parameters of Gumbel distributions of extreme rainfall depths for any sub-daily duration (1-24h). The predictions depend on twenty morphoclimatic descriptors. A large study area in north-central Italy is adopted, where 2238 annual maximum series are available. Validation is performed over an independent set of 100 gauges. Our results show that multivariate ANNs may remarkably improve the estimation of percentiles relative to the benchmark approach from the literature, where Gumbel parameters depend on mean annual precipitation. Finally, we show that the very nature of the proposed ANN models makes them suitable for interpolating predicted sub-daily rainfall quantiles across space and time-aggregation intervals. In the second Part, decision trees are used to combine a selected blend of input geomorphic descriptors for predicting FH. Relative to existing DEM-based approaches, this method is innovative, as it relies on the combination of three characteristics: (1) simple multivariate models, (2) a set of exclusively DEM-based descriptors as input, and (3) an existing FH map as reference information. First, the methods are applied to northern Italy, represented with the MERIT DEM (∼90m resolution), and second, to the whole of Italy, represented with the EU-DEM (25m resolution). The results show that multivariate approaches may (a) significantly enhance flood-prone areas delineation relative to a selected univariate one, (b) provide accurate predictions of expected inundation depths, (c) produce encouraging results in extrapolation, (d) complete the information of imperfect reference maps, and (e) conveniently convert binary maps into continuous representation of FH.
Resumo:
The inherent stochastic character of most of the physical quantities involved in engineering models has led to an always increasing interest for probabilistic analysis. Many approaches to stochastic analysis have been proposed. However, it is widely acknowledged that the only universal method available to solve accurately any kind of stochastic mechanics problem is Monte Carlo Simulation. One of the key parts in the implementation of this technique is the accurate and efficient generation of samples of the random processes and fields involved in the problem at hand. In the present thesis an original method for the simulation of homogeneous, multi-dimensional, multi-variate, non-Gaussian random fields is proposed. The algorithm has proved to be very accurate in matching both the target spectrum and the marginal probability. The computational efficiency and robustness are very good too, even when dealing with strongly non-Gaussian distributions. What is more, the resulting samples posses all the relevant, welldefined and desired properties of “translation fields”, including crossing rates and distributions of extremes. The topic of the second part of the thesis lies in the field of non-destructive parametric structural identification. Its objective is to evaluate the mechanical characteristics of constituent bars in existing truss structures, using static loads and strain measurements. In the cases of missing data and of damages that interest only a small portion of the bar, Genetic Algorithm have proved to be an effective tool to solve the problem.
Resumo:
The Assimilation in the Unstable Subspace (AUS) was introduced by Trevisan and Uboldi in 2004, and developed by Trevisan, Uboldi and Carrassi, to minimize the analysis and forecast errors by exploiting the flow-dependent instabilities of the forecast-analysis cycle system, which may be thought of as a system forced by observations. In the AUS scheme the assimilation is obtained by confining the analysis increment in the unstable subspace of the forecast-analysis cycle system so that it will have the same structure of the dominant instabilities of the system. The unstable subspace is estimated by Breeding on the Data Assimilation System (BDAS). AUS- BDAS has already been tested in realistic models and observational configurations, including a Quasi-Geostrophicmodel and a high dimensional, primitive equation ocean model; the experiments include both fixed and“adaptive”observations. In these contexts, the AUS-BDAS approach greatly reduces the analysis error, with reasonable computational costs for data assimilation with respect, for example, to a prohibitive full Extended Kalman Filter. This is a follow-up study in which we revisit the AUS-BDAS approach in the more basic, highly nonlinear Lorenz 1963 convective model. We run observation system simulation experiments in a perfect model setting, and with two types of model error as well: random and systematic. In the different configurations examined, and in a perfect model setting, AUS once again shows better efficiency than other advanced data assimilation schemes. In the present study, we develop an iterative scheme that leads to a significant improvement of the overall assimilation performance with respect also to standard AUS. In particular, it boosts the efficiency of regime’s changes tracking, with a low computational cost. Other data assimilation schemes need estimates of ad hoc parameters, which have to be tuned for the specific model at hand. In Numerical Weather Prediction models, tuning of parameters — and in particular an estimate of the model error covariance matrix — may turn out to be quite difficult. Our proposed approach, instead, may be easier to implement in operational models.
Resumo:
The Ecosystem Approach to Fisheries represents the most recent research line in the international context, showing interest both towards the whole community and toward the identification and protection of all the “critical habitats” in which marine resources complete their life cycles. Using data coming from trawl surveys performed in the Northern and Central Adriatic from 1996 to 2010, this study provides the first attempt to appraise the status of the whole demersal community. It took into account not only fishery target species but also by-catch and discharge species by the use of a suite of biological indicators both at population and multi-specific level, allowing to have a global picture of the status of the demersal system. This study underlined the decline of extremely important species for the Adriatic fishery in recent years; adverse impact on catches is expected for these species in the coming years, since also minimum values of recruits recently were recorded. Both the excessive exploitation and environmental factors affected availability of resources. Moreover both distribution and nursery areas of the most important resources were pinpointed by means of geostatistical methods. The geospatial analysis also confirmed the presence of relevant recruitment areas in the North and Central Adriatic for several commercial species, as reported in the literature. The morphological and oceanographic features, the relevant rivers inflow together with the mosaic pattern of biocenoses with different food availability affected the location of the observed relevant nursery areas.
Resumo:
An extensive sample (2%) of private vehicles in Italy are equipped with a GPS device that periodically measures their position and dynamical state for insurance purposes. Having access to this type of data allows to develop theoretical and practical applications of great interest: the real-time reconstruction of traffic state in a certain region, the development of accurate models of vehicle dynamics, the study of the cognitive dynamics of drivers. In order for these applications to be possible, we first need to develop the ability to reconstruct the paths taken by vehicles on the road network from the raw GPS data. In fact, these data are affected by positioning errors and they are often very distanced from each other (~2 Km). For these reasons, the task of path identification is not straightforward. This thesis describes the approach we followed to reliably identify vehicle paths from this kind of low-sampling data. The problem of matching data with roads is solved with a bayesian approach of maximum likelihood. While the identification of the path taken between two consecutive GPS measures is performed with a specifically developed optimal routing algorithm, based on A* algorithm. The procedure was applied on an off-line urban data sample and proved to be robust and accurate. Future developments will extend the procedure to real-time execution and nation-wide coverage.
Resumo:
Advances in biomedical signal acquisition systems for motion analysis have led to lowcost and ubiquitous wearable sensors which can be used to record movement data in different settings. This implies the potential availability of large amounts of quantitative data. It is then crucial to identify and to extract the information of clinical relevance from the large amount of available data. This quantitative and objective information can be an important aid for clinical decision making. Data mining is the process of discovering such information in databases through data processing, selection of informative data, and identification of relevant patterns. The databases considered in this thesis store motion data from wearable sensors (specifically accelerometers) and clinical information (clinical data, scores, tests). The main goal of this thesis is to develop data mining tools which can provide quantitative information to the clinician in the field of movement disorders. This thesis will focus on motor impairment in Parkinson's disease (PD). Different databases related to Parkinson subjects in different stages of the disease were considered for this thesis. Each database is characterized by the data recorded during a specific motor task performed by different groups of subjects. The data mining techniques that were used in this thesis are feature selection (a technique which was used to find relevant information and to discard useless or redundant data), classification, clustering, and regression. The aims were to identify high risk subjects for PD, characterize the differences between early PD subjects and healthy ones, characterize PD subtypes and automatically assess the severity of symptoms in the home setting.
Resumo:
Environmental computer models are deterministic models devoted to predict several environmental phenomena such as air pollution or meteorological events. Numerical model output is given in terms of averages over grid cells, usually at high spatial and temporal resolution. However, these outputs are often biased with unknown calibration and not equipped with any information about the associated uncertainty. Conversely, data collected at monitoring stations is more accurate since they essentially provide the true levels. Due the leading role played by numerical models, it now important to compare model output with observations. Statistical methods developed to combine numerical model output and station data are usually referred to as data fusion. In this work, we first combine ozone monitoring data with ozone predictions from the Eta-CMAQ air quality model in order to forecast real-time current 8-hour average ozone level defined as the average of the previous four hours, current hour, and predictions for the next three hours. We propose a Bayesian downscaler model based on first differences with a flexible coefficient structure and an efficient computational strategy to fit model parameters. Model validation for the eastern United States shows consequential improvement of our fully inferential approach compared with the current real-time forecasting system. Furthermore, we consider the introduction of temperature data from a weather forecast model into the downscaler, showing improved real-time ozone predictions. Finally, we introduce a hierarchical model to obtain spatially varying uncertainty associated with numerical model output. We show how we can learn about such uncertainty through suitable stochastic data fusion modeling using some external validation data. We illustrate our Bayesian model by providing the uncertainty map associated with a temperature output over the northeastern United States.
Resumo:
This thesis is a collection of works focused on the topic of Earthquake Early Warning, with a special attention to large magnitude events. The topic is addressed from different points of view and the structure of the thesis reflects the variety of the aspects which have been analyzed. The first part is dedicated to the giant, 2011 Tohoku-Oki earthquake. The main features of the rupture process are first discussed. The earthquake is then used as a case study to test the feasibility Early Warning methodologies for very large events. Limitations of the standard approaches for large events arise in this chapter. The difficulties are related to the real-time magnitude estimate from the first few seconds of recorded signal. An evolutionary strategy for the real-time magnitude estimate is proposed and applied to the single Tohoku-Oki earthquake. In the second part of the thesis a larger number of earthquakes is analyzed, including small, moderate and large events. Starting from the measurement of two Early Warning parameters, the behavior of small and large earthquakes in the initial portion of recorded signals is investigated. The aim is to understand whether small and large earthquakes can be distinguished from the initial stage of their rupture process. A physical model and a plausible interpretation to justify the observations are proposed. The third part of the thesis is focused on practical, real-time approaches for the rapid identification of the potentially damaged zone during a seismic event. Two different approaches for the rapid prediction of the damage area are proposed and tested. The first one is a threshold-based method which uses traditional seismic data. Then an innovative approach using continuous, GPS data is explored. Both strategies improve the prediction of large scale effects of strong earthquakes.
Resumo:
Several countries have acquired, over the past decades, large amounts of area covering Airborne Electromagnetic data. Contribution of airborne geophysics has dramatically increased for both groundwater resource mapping and management proving how those systems are appropriate for large-scale and efficient groundwater surveying. We start with processing and inversion of two AEM dataset from two different systems collected over the Spiritwood Valley Aquifer area, Manitoba, Canada respectively, the AeroTEM III (commissioned by the Geological Survey of Canada in 2010) and the “Full waveform VTEM” dataset, collected and tested over the same survey area, during the fall 2011. We demonstrate that in the presence of multiple datasets, either AEM and ground data, due processing, inversion, post-processing, data integration and data calibration is the proper approach capable of providing reliable and consistent resistivity models. Our approach can be of interest to many end users, ranging from Geological Surveys, Universities to Private Companies, which are often proprietary of large geophysical databases to be interpreted for geological and\or hydrogeological purposes. In this study we deeply investigate the role of integration of several complimentary types of geophysical data collected over the same survey area. We show that data integration can improve inversions, reduce ambiguity and deliver high resolution results. We further attempt to use the final, most reliable output resistivity models as a solid basis for building a knowledge-driven 3D geological voxel-based model. A voxel approach allows a quantitative understanding of the hydrogeological setting of the area, and it can be further used to estimate the aquifers volumes (i.e. potential amount of groundwater resources) as well as hydrogeological flow model prediction. In addition, we investigated the impact of an AEM dataset towards hydrogeological mapping and 3D hydrogeological modeling, comparing it to having only a ground based TEM dataset and\or to having only boreholes data.