32 resultados para Data Driven Modeling
Resumo:
A critical point in the analysis of ground displacements time series is the development of data driven methods that allow the different sources that generate the observed displacements to be discerned and characterised. A widely used multivariate statistical technique is the Principal Component Analysis (PCA), which allows reducing the dimensionality of the data space maintaining most of the variance of the dataset explained. Anyway, PCA does not perform well in finding the solution to the so-called Blind Source Separation (BSS) problem, i.e. in recovering and separating the original sources that generated the observed data. This is mainly due to the assumptions on which PCA relies: it looks for a new Euclidean space where the projected data are uncorrelated. The Independent Component Analysis (ICA) is a popular technique adopted to approach this problem. However, the independence condition is not easy to impose, and it is often necessary to introduce some approximations. To work around this problem, I use a variational bayesian ICA (vbICA) method, which models the probability density function (pdf) of each source signal using a mix of Gaussian distributions. This technique allows for more flexibility in the description of the pdf of the sources, giving a more reliable estimate of them. Here I present the application of the vbICA technique to GPS position time series. First, I use vbICA on synthetic data that simulate a seismic cycle (interseismic + coseismic + postseismic + seasonal + noise) and a volcanic source, and I study the ability of the algorithm to recover the original (known) sources of deformation. Secondly, I apply vbICA to different tectonically active scenarios, such as the 2009 L'Aquila (central Italy) earthquake, the 2012 Emilia (northern Italy) seismic sequence, and the 2006 Guerrero (Mexico) Slow Slip Event (SSE).
Resumo:
This thesis studies how commercial practice is developing with artificial intelligence (AI) technologies and discusses some normative concepts in EU consumer law. The author analyses the phenomenon of 'algorithmic business', which defines the increasing use of data-driven AI in marketing organisations for the optimisation of a range of consumer-related tasks. The phenomenon is orienting business-consumer relations towards some general trends that influence power and behaviors of consumers. These developments are not taking place in a legal vacuum, but against the background of a normative system aimed at maintaining fairness and balance in market transactions. The author assesses current developments in commercial practices in the context of EU consumer law, which is specifically aimed at regulating commercial practices. The analysis is critical by design and without neglecting concrete practices tries to look at the big picture. The thesis consists of nine chapters divided in three thematic parts. The first part discusses the deployment of AI in marketing organisations, a brief history, the technical foundations, and their modes of integration in business organisations. In the second part, a selected number of socio-technical developments in commercial practice are analysed. The following are addressed: the monitoring and analysis of consumers’ behaviour based on data; the personalisation of commercial offers and customer experience; the use of information on consumers’ psychology and emotions, the mediation through marketing conversational applications. The third part assesses these developments in the context of EU consumer law and of the broader policy debate concerning consumer protection in the algorithmic society. In particular, two normative concepts underlying the EU fairness standard are analysed: manipulation, as a substantive regulatory standard that limits commercial behaviours in order to protect consumers’ informed and free choices and vulnerability, as a concept of social policy that portrays people who are more exposed to marketing practices.
Resumo:
This dissertation explores the link between hate crimes that occurred in the United Kingdom in June 2017, June 2018 and June 2019 through the posts of a robust sample of Conservative and radical right users on Twitter. In order to avoid the traditional challenges of this kind of research, I adopted a four staged research protocol that enabled me to merge content produced by a group of randomly selected users to observe the phenomenon from different angles. I collected tweets from thirty Conservative/right wing accounts for each month of June over the three years with the help of programming languages such as Python and CygWin tools. I then examined the language of my data focussing on humorous content in order to reveal whether, and if so how, radical users online often use humour as a tool to spread their views in conditions of heightened disgust and wide-spread political instability. A reflection on humour as a moral occurrence, expanding on the works of Christie Davies as well as applying recent findings on the behavioural immune system on online data, offers new insights on the overlooked humorous nature of radical political discourse. An unorthodox take on the moral foundations pioneered by Jonathan Haidt enriched my understanding of the analysed material through the addition of a moral-based layer of enquiry to my more traditional content-based one. This convergence of theoretical, data driven and real life events constitutes a viable “collection of strategies” for academia, data scientists; NGO’s fighting hate crimes and the wider public alike. Bringing together the ideas of Davies, Haidt and others to my data, helps us to perceive humorous online content in terms of complex radical narratives that are all too often compressed into a single tweet.
Resumo:
This Thesis is composed of a collection of works written in the period 2019-2022, whose aim is to find methodologies of Artificial Intelligence (AI) and Machine Learning to detect and classify patterns and rules in argumentative and legal texts. We define our approach “hybrid”, since we aimed at designing hybrid combinations of symbolic and sub-symbolic AI, involving both “top-down” structured knowledge and “bottom-up” data-driven knowledge. A first group of works is dedicated to the classification of argumentative patterns. Following the Waltonian model of argument and the related theory of Argumentation Schemes, these works focused on the detection of argumentative support and opposition, showing that argumentative evidences can be classified at fine-grained levels without resorting to highly engineered features. To show this, our methods involved not only traditional approaches such as TFIDF, but also some novel methods based on Tree Kernel algorithms. After the encouraging results of this first phase, we explored the use of a some emerging methodologies promoted by actors like Google, which have deeply changed NLP since 2018-19 — i.e., Transfer Learning and language models. These new methodologies markedly improved our previous results, providing us with best-performing NLP tools. Using Transfer Learning, we also performed a Sequence Labelling task to recognize the exact span of argumentative components (i.e., claims and premises), thus connecting portions of natural language to portions of arguments (i.e., to the logical-inferential dimension). The last part of our work was finally dedicated to the employment of Transfer Learning methods for the detection of rules and deontic modalities. In this case, we explored a hybrid approach which combines structured knowledge coming from two LegalXML formats (i.e., Akoma Ntoso and LegalRuleML) with sub-symbolic knowledge coming from pre-trained (and then fine-tuned) neural architectures.
Resumo:
Inverse problems are at the core of many challenging applications. Variational and learning models provide estimated solutions of inverse problems as the outcome of specific reconstruction maps. In the variational approach, the result of the reconstruction map is the solution of a regularized minimization problem encoding information on the acquisition process and prior knowledge on the solution. In the learning approach, the reconstruction map is a parametric function whose parameters are identified by solving a minimization problem depending on a large set of data. In this thesis, we go beyond this apparent dichotomy between variational and learning models and we show they can be harmoniously merged in unified hybrid frameworks preserving their main advantages. We develop several highly efficient methods based on both these model-driven and data-driven strategies, for which we provide a detailed convergence analysis. The arising algorithms are applied to solve inverse problems involving images and time series. For each task, we show the proposed schemes improve the performances of many other existing methods in terms of both computational burden and quality of the solution. In the first part, we focus on gradient-based regularized variational models which are shown to be effective for segmentation purposes and thermal and medical image enhancement. We consider gradient sparsity-promoting regularized models for which we develop different strategies to estimate the regularization strength. Furthermore, we introduce a novel gradient-based Plug-and-Play convergent scheme considering a deep learning based denoiser trained on the gradient domain. In the second part, we address the tasks of natural image deblurring, image and video super resolution microscopy and positioning time series prediction, through deep learning based methods. We boost the performances of supervised, such as trained convolutional and recurrent networks, and unsupervised deep learning strategies, such as Deep Image Prior, by penalizing the losses with handcrafted regularization terms.
Resumo:
In the last decades, Artificial Intelligence has witnessed multiple breakthroughs in deep learning. In particular, purely data-driven approaches have opened to a wide variety of successful applications due to the large availability of data. Nonetheless, the integration of prior knowledge is still required to compensate for specific issues like lack of generalization from limited data, fairness, robustness, and biases. In this thesis, we analyze the methodology of integrating knowledge into deep learning models in the field of Natural Language Processing (NLP). We start by remarking on the importance of knowledge integration. We highlight the possible shortcomings of these approaches and investigate the implications of integrating unstructured textual knowledge. We introduce Unstructured Knowledge Integration (UKI) as the process of integrating unstructured knowledge into machine learning models. We discuss UKI in the field of NLP, where knowledge is represented in a natural language format. We identify UKI as a complex process comprised of multiple sub-processes, different knowledge types, and knowledge integration properties to guarantee. We remark on the challenges of integrating unstructured textual knowledge and bridge connections with well-known research areas in NLP. We provide a unified vision of structured knowledge extraction (KE) and UKI by identifying KE as a sub-process of UKI. We investigate some challenging scenarios where structured knowledge is not a feasible prior assumption and formulate each task from the point of view of UKI. We adopt simple yet effective neural architectures and discuss the challenges of such an approach. Finally, we identify KE as a form of symbolic representation. From this perspective, we remark on the need of defining sophisticated UKI processes to verify the validity of knowledge integration. To this end, we foresee frameworks capable of combining symbolic and sub-symbolic representations for learning as a solution.
Resumo:
The research project is focused on the investigation of the polymorphism of crystalline molecular material for organic semiconductor applications under non-ambient conditions, and the solid-state characterization and crystal structure determination of the different polymorphic forms. In particular, this research project has tackled the investigation and characterization of the polymorphism of perylene diimides (PDIs) derivatives at high temperatures and pressures, in particular N,N’-dialkyl-3,4,9,10-perylendiimide (PDI-Cn, with n = 5, 6, 7, 8). These molecules are characterized by excellent chemical, thermal, and photostability, high electron affinity, strong absorption in the visible region, low LUMO energies, good air stability, and good charge transport properties, which can be tuned via functionalization; these features make them promising n-type organic semiconductor materials for several applications such as OFETs, OPV cells, laser dye, sensors, bioimaging, etc. The thermal characterization of PDI-Cn was carried out by a combination of differential scanning calorimetry, variable temperature X-ray diffraction, hot-stage microscopy, and in the case of PDI-C5 also variable temperature Raman spectroscopy. Whereas crystal structure determination was carried out by both Single Crystal and Powder X-ray diffraction. Moreover, high-pressure polymorphism via pressure-dependent UV-Vis absorption spectroscopy and high-pressure Single Crystal X-ray diffraction was carried out in this project. A data-driven approach based on a combination of self-organizing maps (SOM) and principal component analysis (PCA) is also reported was used to classify different π-stacking arrangements of PDI derivatives into families of similar crystal packing. Besides the main project, in the framework of structure-property analysis under non-ambient conditions, the structural investigation of the water loss in Pt- and Pd- based vapochromic potassium/lithium salts upon temperature, and the investigation of structure-mechanical property relationships in polymorphs of a thienopyrrolyldione endcapped oligothiophene (C4-NT3N) are reported.
Resumo:
Biobanks are key infrastructures in data-driven biomedical research. The counterpoint of this optimistic vision is the reality of biobank governance, which must address various ethical, legal and social issues, especially in terms of open consent, privacy and secondary uses which, if not sufficiently resolved, may undermine participants’ and society’s trust in biobanking. The effect of the digital paradigm on biomedical research has only accentuated these issues by adding new pressure for the data protection of biobank participants against the risks of covert discrimination, abuse of power against individuals and groups, and critical commercial uses. Moreover, the traditional research-ethics framework has been unable to keep pace with the transformative developments of the digital era, and has proven inadequate in protecting biobank participants and providing guidance for ethical practices. To this must be added the challenge of an increased tendency towards exploitation and the commercialisation of personal data in the field of biomedical research, which may undermine the altruistic and solidaristic values associated with biobank participation and risk losing alignment with societal interests in biobanking. My research critically analyses, from a bioethical perspective, the challenges and the goals of biobank governance in data-driven biomedical research in order to understand the conditions for the implementation of a governance model that can foster biomedical research and innovation, while ensuring adequate protection for biobank participants and an alignment of biobank procedures and policies with society’s interests and expectations. The main outcome is a conceptualisation of a socially-oriented and participatory model of biobanks by proposing a new ethical framework that relies on the principles of transparency, data protection and participation to tackle the key challenges of biobanks in the digital age and that is well-suited to foster these goals.
Resumo:
Long-term monitoring of acoustical environments is gaining popularity thanks to the relevant amount of scientific and engineering insights that it provides. The increasing interest is due to the constant growth of storage capacity and computational power to process large amounts of data. In this perspective, machine learning (ML) provides a broad family of data-driven statistical techniques to deal with large databases. Nowadays, the conventional praxis of sound level meter measurements limits the global description of a sound scene to an energetic point of view. The equivalent continuous level Leq represents the main metric to define an acoustic environment, indeed. Finer analyses involve the use of statistical levels. However, acoustic percentiles are based on temporal assumptions, which are not always reliable. A statistical approach, based on the study of the occurrences of sound pressure levels, would bring a different perspective to the analysis of long-term monitoring. Depicting a sound scene through the most probable sound pressure level, rather than portions of energy, brought more specific information about the activity carried out during the measurements. The statistical mode of the occurrences can capture typical behaviors of specific kinds of sound sources. The present work aims to propose an ML-based method to identify, separate and measure coexisting sound sources in real-world scenarios. It is based on long-term monitoring and is addressed to acousticians focused on the analysis of environmental noise in manifold contexts. The presented method is based on clustering analysis. Two algorithms, Gaussian Mixture Model and K-means clustering, represent the main core of a process to investigate different active spaces monitored through sound level meters. The procedure has been applied in two different contexts: university lecture halls and offices. The proposed method shows robust and reliable results in describing the acoustic scenario and it could represent an important analytical tool for acousticians.
Resumo:
In this thesis we focus on the analysis and interpretation of time dependent deformations recorded through different geodetic methods. Firstly, we apply a variational Bayesian Independent Component Analysis (vbICA) technique to GPS daily displacement solutions, to separate the postseismic deformation that followed the mainshocks of the 2016-2017 Central Italy seismic sequence from the other, hydrological, deformation sources. By interpreting the signal associated with the postseismic relaxation, we model an afterslip distribution on the faults involved by the mainshocks consistent with the co-seismic models available in literature. We find evidences of aseismic slip on the Paganica fault, responsible for the Mw 6.1 2009 L’Aquila earthquake, highlighting the importance of aseismic slip and static stress transfer to properly model the recurrence of earthquakes on nearby fault segments. We infer a possible viscoelastic relaxation of the lower crust as a contributing mechanism to the postseismic displacements. We highlight the importance of a proper separation of the hydrological signals for an accurate assessment of the tectonic processes, especially in cases of mm-scale deformations. Contextually, we provide a physical explanation to the ICs associated with the observed hydrological processes. In the second part of the thesis, we focus on strain data from Gladwin Tensor Strainmeters, working on the instruments deployed in Taiwan. We develop a novel approach, completely data driven, to calibrate these strainmeters. We carry out a joint analysis of geodetic (strainmeters, GPS and GRACE products) and hydrological (rain gauges and piezometers) data sets, to characterize the hydrological signals in Southern Taiwan. Lastly, we apply the calibration approach here proposed to the strainmeters recently installed in Central Italy. We provide, as an example, the detection of a storm that hit the Umbria-Marche regions (Italy), demonstrating the potential of strainmeters in following the dynamics of deformation processes with limited spatio-temporal signature
Resumo:
The integration of quantitative data from movement analysis technologies is reshaping the analysis of athletes’ performances and injury mitigation, e.g., anterior cruciate ligament (ACL) rupture. Most of the movement assessments are performed in laboratory environments. Recent progress provides the chance to shift the paradigm to a more ecological approach with sport-specific elements and a closer examination of “real” movement patterns associated with performance and (ACL) injury risk. The present PhD thesis aimed at investigating the on-field motion patterns related to performance and injury prevention in young football players. The objectives of the thesis were: (I) in-lab measures of high-dynamics movements were used to validate wearable inertial sensors technology; (II) in-laboratory and on-field agility movement tasks were compared to inspect the effect of football-specific environment; (III) on-field analysis was conducted to challenge wearable sensors technology in the assessment of dangerous movement patterns towards the ACL rupture; (IV) an overview of technologies that could shape present and future assessment of ACL injury risk in daily practice was presented. The validity of wearables in the assessment of high-dynamics movements was confirmed. Relevant differences emerged between the movements performed in a laboratory setting and on the football pitch, supporting the inclusion of an ecological dynamics approach in preventive protocols. The on-field analysis of football-specific movement tasks demonstrated good reliability of wearable sensors and the presence of residual dangerous patterns in the injured players. A tool to inspect at-risk movement patterns on the field through objective measurements was presented. It discussed how potential alternatives to wearable inertial sensors embrace artificial intelligence and closer collaboration between clinical and technical expertise. The present thesis was meant to contribute to setting the basis for data-driven prevention protocols. A deeper comprehension of injury-related principles and counteractions will contribute to preserving athletes’ careers and health over time.
Resumo:
The integration of distributed and ubiquitous intelligence has emerged over the last years as the mainspring of transformative advancements in mobile radio networks. As we approach the era of “mobile for intelligence”, next-generation wireless networks are poised to undergo significant and profound changes. Notably, the overarching challenge that lies ahead is the development and implementation of integrated communication and learning mechanisms that will enable the realization of autonomous mobile radio networks. The ultimate pursuit of eliminating human-in-the-loop constitutes an ambitious challenge, necessitating a meticulous delineation of the fundamental characteristics that artificial intelligence (AI) should possess to effectively achieve this objective. This challenge represents a paradigm shift in the design, deployment, and operation of wireless networks, where conventional, static configurations give way to dynamic, adaptive, and AI-native systems capable of self-optimization, self-sustainment, and learning. This thesis aims to provide a comprehensive exploration of the fundamental principles and practical approaches required to create autonomous mobile radio networks that seamlessly integrate communication and learning components. The first chapter of this thesis introduces the notion of Predictive Quality of Service (PQoS) and adaptive optimization and expands upon the challenge to achieve adaptable, reliable, and robust network performance in dynamic and ever-changing environments. The subsequent chapter delves into the revolutionary role of generative AI in shaping next-generation autonomous networks. This chapter emphasizes achieving trustworthy uncertainty-aware generation processes with the use of approximate Bayesian methods and aims to show how generative AI can improve generalization while reducing data communication costs. Finally, the thesis embarks on the topic of distributed learning over wireless networks. Distributed learning and its declinations, including multi-agent reinforcement learning systems and federated learning, have the potential to meet the scalability demands of modern data-driven applications, enabling efficient and collaborative model training across dynamic scenarios while ensuring data privacy and reducing communication overhead.
Resumo:
Hydrothermal fluids are a fundamental resource for understanding and monitoring volcanic and non-volcanic systems. This thesis is focused on the study of hydrothermal system through numerical modeling with the geothermal simulator TOUGH2. Several simulations are presented, and geophysical and geochemical observables, arising from fluids circulation, are analyzed in detail throughout the thesis. In a volcanic setting, fluids feeding fumaroles and hot spring may play a key role in the hazard evaluation. The evolution of the fluids circulation is caused by a strong interaction between magmatic and hydrothermal systems. A simultaneous analysis of different geophysical and geochemical observables is a sound approach for interpreting monitored data and to infer a consistent conceptual model. Analyzed observables are ground displacement, gravity changes, electrical conductivity, amount, composition and temperature of the emitted gases at surface, and extent of degassing area. Results highlight the different temporal response of the considered observables, as well as the different radial pattern of variation. However, magnitude, temporal response and radial pattern of these signals depend not only on the evolution of fluid circulation, but a main role is played by the considered rock properties. Numerical simulations highlight differences that arise from the assumption of different permeabilities, for both homogeneous and heterogeneous systems. Rock properties affect hydrothermal fluid circulation, controlling both the range of variation and the temporal evolution of the observable signals. Low temperature fumaroles and low discharge rate may be affected by atmospheric conditions. Detailed parametric simulations were performed, aimed to understand the effects of system properties, such as permeability and gas reservoir overpressure, on diffuse degassing when air temperature and barometric pressure changes are applied to the ground surface. Hydrothermal circulation, however, is not only a characteristic of volcanic system. Hot fluids may be involved in several mankind problems, such as studies on geothermal engineering, nuclear waste propagation in porous medium, and Geological Carbon Sequestration (GCS). The current concept for large-scale GCS is the direct injection of supercritical carbon dioxide into deep geological formations which typically contain brine. Upward displacement of such brine from deep reservoirs driven by pressure increases resulting from carbon dioxide injection may occur through abandoned wells, permeable faults or permeable channels. Brine intrusion into aquifers may degrade groundwater resources. Numerical results show that pressure rise drives dense water up to the conduits, and does not necessarily result in continuous flow. Rather, overpressure leads to new hydrostatic equilibrium if fluids are initially density stratified. If warm and salty fluid does not cool passing through the conduit, an oscillatory solution is then possible. Parameter studies delineate steady-state (static) and oscillatory solutions.
Resumo:
We use data from about 700 GPS stations in the EuroMediterranen region to investigate the present-day behavior of the the Calabrian subduction zone within the Mediterranean-scale plates kinematics and to perform local scale studies about the strain accumulation on active structures. We focus attenction on the Messina Straits and Crati Valley faults where GPS data show extentional velocity gradients of ∼3 mm/yr and ∼2 mm/yr, respectively. We use dislocation model and a non-linear constrained optimization algorithm to invert for fault geometric parameters and slip-rates and evaluate the associated uncertainties adopting a bootstrap approach. Our analysis suggest the presence of two partially locked normal faults. To investigate the impact of elastic strain contributes from other nearby active faults onto the observed velocity gradient we use a block modeling approach. Our models show that the inferred slip-rates on the two analyzed structures are strongly impacted by the assumed locking width of the Calabrian subduction thrust. In order to frame the observed local deformation features within the present- day central Mediterranean kinematics we realyze a statistical analysis testing the indipendent motion (w.r.t. the African and Eurasias plates) of the Adriatic, Cal- abrian and Sicilian blocks. Our preferred model confirms a microplate like behaviour for all the investigated blocks. Within these kinematic boundary conditions we fur- ther investigate the Calabrian Slab interface geometry using a combined approach of block modeling and χ2ν statistic. Almost no information is obtained using only the horizontal GPS velocities that prove to be a not sufficient dataset for a multi-parametric inversion approach. Trying to stronger constrain the slab geometry we estimate the predicted vertical velocities performing suites of forward models of elastic dislocations varying the fault locking depth. Comparison with the observed field suggest a maximum resolved locking depth of 25 km.
Resumo:
Several countries have acquired, over the past decades, large amounts of area covering Airborne Electromagnetic data. Contribution of airborne geophysics has dramatically increased for both groundwater resource mapping and management proving how those systems are appropriate for large-scale and efficient groundwater surveying. We start with processing and inversion of two AEM dataset from two different systems collected over the Spiritwood Valley Aquifer area, Manitoba, Canada respectively, the AeroTEM III (commissioned by the Geological Survey of Canada in 2010) and the “Full waveform VTEM” dataset, collected and tested over the same survey area, during the fall 2011. We demonstrate that in the presence of multiple datasets, either AEM and ground data, due processing, inversion, post-processing, data integration and data calibration is the proper approach capable of providing reliable and consistent resistivity models. Our approach can be of interest to many end users, ranging from Geological Surveys, Universities to Private Companies, which are often proprietary of large geophysical databases to be interpreted for geological and\or hydrogeological purposes. In this study we deeply investigate the role of integration of several complimentary types of geophysical data collected over the same survey area. We show that data integration can improve inversions, reduce ambiguity and deliver high resolution results. We further attempt to use the final, most reliable output resistivity models as a solid basis for building a knowledge-driven 3D geological voxel-based model. A voxel approach allows a quantitative understanding of the hydrogeological setting of the area, and it can be further used to estimate the aquifers volumes (i.e. potential amount of groundwater resources) as well as hydrogeological flow model prediction. In addition, we investigated the impact of an AEM dataset towards hydrogeological mapping and 3D hydrogeological modeling, comparing it to having only a ground based TEM dataset and\or to having only boreholes data.