21 resultados para Data modeling
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
In recent decades, two prominent trends have influenced the data modeling field, namely network analysis and machine learning. This thesis explores the practical applications of these techniques within the domain of drug research, unveiling their multifaceted potential for advancing our comprehension of complex biological systems. The research undertaken during this PhD program is situated at the intersection of network theory, computational methods, and drug research. Across six projects presented herein, there is a gradual increase in model complexity. These projects traverse a diverse range of topics, with a specific emphasis on drug repurposing and safety in the context of neurological diseases. The aim of these projects is to leverage existing biomedical knowledge to develop innovative approaches that bolster drug research. The investigations have produced practical solutions, not only providing insights into the intricacies of biological systems, but also allowing the creation of valuable tools for their analysis. In short, the achievements are: • A novel computational algorithm to identify adverse events specific to fixed-dose drug combinations. • A web application that tracks the clinical drug research response to SARS-CoV-2. • A Python package for differential gene expression analysis and the identification of key regulatory "switch genes". • The identification of pivotal events causing drug-induced impulse control disorders linked to specific medications. • An automated pipeline for discovering potential drug repurposing opportunities. • The creation of a comprehensive knowledge graph and development of a graph machine learning model for predictions. Collectively, these projects illustrate diverse applications of data science and network-based methodologies, highlighting the profound impact they can have in supporting drug research activities.
Resumo:
We use data from about 700 GPS stations in the EuroMediterranen region to investigate the present-day behavior of the the Calabrian subduction zone within the Mediterranean-scale plates kinematics and to perform local scale studies about the strain accumulation on active structures. We focus attenction on the Messina Straits and Crati Valley faults where GPS data show extentional velocity gradients of ∼3 mm/yr and ∼2 mm/yr, respectively. We use dislocation model and a non-linear constrained optimization algorithm to invert for fault geometric parameters and slip-rates and evaluate the associated uncertainties adopting a bootstrap approach. Our analysis suggest the presence of two partially locked normal faults. To investigate the impact of elastic strain contributes from other nearby active faults onto the observed velocity gradient we use a block modeling approach. Our models show that the inferred slip-rates on the two analyzed structures are strongly impacted by the assumed locking width of the Calabrian subduction thrust. In order to frame the observed local deformation features within the present- day central Mediterranean kinematics we realyze a statistical analysis testing the indipendent motion (w.r.t. the African and Eurasias plates) of the Adriatic, Cal- abrian and Sicilian blocks. Our preferred model confirms a microplate like behaviour for all the investigated blocks. Within these kinematic boundary conditions we fur- ther investigate the Calabrian Slab interface geometry using a combined approach of block modeling and χ2ν statistic. Almost no information is obtained using only the horizontal GPS velocities that prove to be a not sufficient dataset for a multi-parametric inversion approach. Trying to stronger constrain the slab geometry we estimate the predicted vertical velocities performing suites of forward models of elastic dislocations varying the fault locking depth. Comparison with the observed field suggest a maximum resolved locking depth of 25 km.
Resumo:
The fast development of Information Communication Technologies (ICT) offers new opportunities to realize future smart cities. To understand, manage and forecast the city's behavior, it is necessary the analysis of different kinds of data from the most varied dataset acquisition systems. The aim of this research activity in the framework of Data Science and Complex Systems Physics is to provide stakeholders with new knowledge tools to improve the sustainability of mobility demand in future cities. Under this perspective, the governance of mobility demand generated by large tourist flows is becoming a vital issue for the quality of life in Italian cities' historical centers, which will worsen in the next future due to the continuous globalization process. Another critical theme is sustainable mobility, which aims to reduce private transportation means in the cities and improve multimodal mobility. We analyze the statistical properties of urban mobility of Venice, Rimini, and Bologna by using different datasets provided by companies and local authorities. We develop algorithms and tools for cartography extraction, trips reconstruction, multimodality classification, and mobility simulation. We show the existence of characteristic mobility paths and statistical properties depending on transport means and user's kinds. Finally, we use our results to model and simulate the overall behavior of the cars moving in the Emilia Romagna Region and the pedestrians moving in Venice with software able to replicate in silico the demand for mobility and its dynamic.
Resumo:
The coastal ocean is a complex environment with extremely dynamic processes that require a high-resolution and cross-scale modeling approach in which all hydrodynamic fields and scales are considered integral parts of the overall system. In the last decade, unstructured-grid models have been used to advance in seamless modeling between scales. On the other hand, the data assimilation methodologies to improve the unstructured-grid models in the coastal seas have been developed only recently and need significant advancements. Here, we link the unstructured-grid ocean modeling to the variational data assimilation methods. In particular, we show results from the modeling system SANIFS based on SHYFEM fully-baroclinic unstructured-grid model interfaced with OceanVar, a state-of-art variational data assimilation scheme adopted for several systems based on a structured grid. OceanVar implements a 3DVar DA scheme. The combination of three linear operators models the background error covariance matrix. The vertical part is represented using multivariate EOFs for temperature, salinity, and sea level anomaly. The horizontal part is assumed to be Gaussian isotropic and is modeled using a first-order recursive filter algorithm designed for structured and regular grids. Here we introduced a novel recursive filter algorithm for unstructured grids. A local hydrostatic adjustment scheme models the rapidly evolving part of the background error covariance. We designed two data assimilation experiments using SANIFS implementation interfaced with OceanVar over the period 2017-2018, one with only temperature and salinity assimilation by Argo profiles and the second also including sea level anomaly. The results showed a successful implementation of the approach and the added value of the assimilation for the active tracer fields. While looking at the broad basin, no significant improvements are highlighted for the sea level, requiring future investigations. Furthermore, a Machine Learning methodology based on an LSTM network has been used to predict the model SST increments.
Resumo:
The present Dissertation shows how recent statistical analysis tools and open datasets can be exploited to improve modelling accuracy in two distinct yet interconnected domains of flood hazard (FH) assessment. In the first Part, unsupervised artificial neural networks are employed as regional models for sub-daily rainfall extremes. The models aim to learn a robust relation to estimate locally the parameters of Gumbel distributions of extreme rainfall depths for any sub-daily duration (1-24h). The predictions depend on twenty morphoclimatic descriptors. A large study area in north-central Italy is adopted, where 2238 annual maximum series are available. Validation is performed over an independent set of 100 gauges. Our results show that multivariate ANNs may remarkably improve the estimation of percentiles relative to the benchmark approach from the literature, where Gumbel parameters depend on mean annual precipitation. Finally, we show that the very nature of the proposed ANN models makes them suitable for interpolating predicted sub-daily rainfall quantiles across space and time-aggregation intervals. In the second Part, decision trees are used to combine a selected blend of input geomorphic descriptors for predicting FH. Relative to existing DEM-based approaches, this method is innovative, as it relies on the combination of three characteristics: (1) simple multivariate models, (2) a set of exclusively DEM-based descriptors as input, and (3) an existing FH map as reference information. First, the methods are applied to northern Italy, represented with the MERIT DEM (∼90m resolution), and second, to the whole of Italy, represented with the EU-DEM (25m resolution). The results show that multivariate approaches may (a) significantly enhance flood-prone areas delineation relative to a selected univariate one, (b) provide accurate predictions of expected inundation depths, (c) produce encouraging results in extrapolation, (d) complete the information of imperfect reference maps, and (e) conveniently convert binary maps into continuous representation of FH.
Resumo:
In this Thesis, we investigate the cosmological co-evolution of supermassive black holes (BHs), Active Galactic Nuclei (AGN) and their hosting dark matter (DM) halos and galaxies, within the standard CDM scenario. We analyze both analytic, semi-analytic and hybrid techniques and use the most recent observational data available to constrain the assumptions underlying our models. First, we focus on very simple analytic models where the assembly of BHs is directly related to the merger history of DM haloes. For this purpose, we implement the two original analytic models of Wyithe & Loeb 2002 and Wyithe & Loeb 2003, compare their predictions to the AGN luminosity function and clustering data, and discuss possible modifications to the models that improve the match to the observation. Then we study more sophisticated semi-analytic models in which however the baryonic physics is neglected as well. Finally we improve the hybrid simulation of De Lucia & Blaizot 2007, adding new semi-analytical prescriptions to describe the BH mass accretion rate during each merger event and its conversion into radiation, and compare the derived BH scaling relations, fundamental plane and mass function, and the AGN luminosity function with observations. All our results support the following scenario: • The cosmological co-evolution of BHs, AGN and galaxies can be well described within the CDM model. • At redshifts z & 1, the evolution history of DM halo fully determines the overall properties of the BH and AGN populations. The AGN emission is triggered mainly by DM halo major mergers and, on average, AGN shine at their Eddington luminosity. • At redshifts z . 1, BH growth decouples from halo growth. Galaxy major mergers cannot constitute the only trigger to accretion episodes in this phase. • When a static hot halo has formed around a galaxy, a fraction of the hot gas continuously accretes onto the central BH, causing a low-energy “radio” activity at the galactic centre, which prevents significant gas cooling and thus limiting the mass of the central galaxies and quenching the star formation at late time. • The cold gas fraction accreted by BHs at high redshifts seems to be larger than at low redshifts.
Resumo:
The main aim of this Ph.D. dissertation is the study of clustering dependent data by means of copula functions with particular emphasis on microarray data. Copula functions are a popular multivariate modeling tool in each field where the multivariate dependence is of great interest and their use in clustering has not been still investigated. The first part of this work contains the review of the literature of clustering methods, copula functions and microarray experiments. The attention focuses on the K–means (Hartigan, 1975; Hartigan and Wong, 1979), the hierarchical (Everitt, 1974) and the model–based (Fraley and Raftery, 1998, 1999, 2000, 2007) clustering techniques because their performance is compared. Then, the probabilistic interpretation of the Sklar’s theorem (Sklar’s, 1959), the estimation methods for copulas like the Inference for Margins (Joe and Xu, 1996) and the Archimedean and Elliptical copula families are presented. In the end, applications of clustering methods and copulas to the genetic and microarray experiments are highlighted. The second part contains the original contribution proposed. A simulation study is performed in order to evaluate the performance of the K–means and the hierarchical bottom–up clustering methods in identifying clusters according to the dependence structure of the data generating process. Different simulations are performed by varying different conditions (e.g., the kind of margins (distinct, overlapping and nested) and the value of the dependence parameter ) and the results are evaluated by means of different measures of performance. In light of the simulation results and of the limits of the two investigated clustering methods, a new clustering algorithm based on copula functions (‘CoClust’ in brief) is proposed. The basic idea, the iterative procedure of the CoClust and the description of the written R functions with their output are given. The CoClust algorithm is tested on simulated data (by varying the number of clusters, the copula models, the dependence parameter value and the degree of overlap of margins) and is compared with the performance of model–based clustering by using different measures of performance, like the percentage of well–identified number of clusters and the not rejection percentage of H0 on . It is shown that the CoClust algorithm allows to overcome all observed limits of the other investigated clustering techniques and is able to identify clusters according to the dependence structure of the data independently of the degree of overlap of margins and the strength of the dependence. The CoClust uses a criterion based on the maximized log–likelihood function of the copula and can virtually account for any possible dependence relationship between observations. Many peculiar characteristics are shown for the CoClust, e.g. its capability of identifying the true number of clusters and the fact that it does not require a starting classification. Finally, the CoClust algorithm is applied to the real microarray data of Hedenfalk et al. (2001) both to the gene expressions observed in three different cancer samples and to the columns (tumor samples) of the whole data matrix.
Resumo:
Hydrothermal fluids are a fundamental resource for understanding and monitoring volcanic and non-volcanic systems. This thesis is focused on the study of hydrothermal system through numerical modeling with the geothermal simulator TOUGH2. Several simulations are presented, and geophysical and geochemical observables, arising from fluids circulation, are analyzed in detail throughout the thesis. In a volcanic setting, fluids feeding fumaroles and hot spring may play a key role in the hazard evaluation. The evolution of the fluids circulation is caused by a strong interaction between magmatic and hydrothermal systems. A simultaneous analysis of different geophysical and geochemical observables is a sound approach for interpreting monitored data and to infer a consistent conceptual model. Analyzed observables are ground displacement, gravity changes, electrical conductivity, amount, composition and temperature of the emitted gases at surface, and extent of degassing area. Results highlight the different temporal response of the considered observables, as well as the different radial pattern of variation. However, magnitude, temporal response and radial pattern of these signals depend not only on the evolution of fluid circulation, but a main role is played by the considered rock properties. Numerical simulations highlight differences that arise from the assumption of different permeabilities, for both homogeneous and heterogeneous systems. Rock properties affect hydrothermal fluid circulation, controlling both the range of variation and the temporal evolution of the observable signals. Low temperature fumaroles and low discharge rate may be affected by atmospheric conditions. Detailed parametric simulations were performed, aimed to understand the effects of system properties, such as permeability and gas reservoir overpressure, on diffuse degassing when air temperature and barometric pressure changes are applied to the ground surface. Hydrothermal circulation, however, is not only a characteristic of volcanic system. Hot fluids may be involved in several mankind problems, such as studies on geothermal engineering, nuclear waste propagation in porous medium, and Geological Carbon Sequestration (GCS). The current concept for large-scale GCS is the direct injection of supercritical carbon dioxide into deep geological formations which typically contain brine. Upward displacement of such brine from deep reservoirs driven by pressure increases resulting from carbon dioxide injection may occur through abandoned wells, permeable faults or permeable channels. Brine intrusion into aquifers may degrade groundwater resources. Numerical results show that pressure rise drives dense water up to the conduits, and does not necessarily result in continuous flow. Rather, overpressure leads to new hydrostatic equilibrium if fluids are initially density stratified. If warm and salty fluid does not cool passing through the conduit, an oscillatory solution is then possible. Parameter studies delineate steady-state (static) and oscillatory solutions.
Resumo:
The term "Brain Imaging" identi�es a set of techniques to analyze the structure and/or functional behavior of the brain in normal and/or pathological situations. These techniques are largely used in the study of brain activity. In addition to clinical usage, analysis of brain activity is gaining popularity in others recent �fields, i.e. Brain Computer Interfaces (BCI) and the study of cognitive processes. In this context, usage of classical solutions (e.g. f MRI, PET-CT) could be unfeasible, due to their low temporal resolution, high cost and limited portability. For these reasons alternative low cost techniques are object of research, typically based on simple recording hardware and on intensive data elaboration process. Typical examples are ElectroEncephaloGraphy (EEG) and Electrical Impedance Tomography (EIT), where electric potential at the patient's scalp is recorded by high impedance electrodes. In EEG potentials are directly generated from neuronal activity, while in EIT by the injection of small currents at the scalp. To retrieve meaningful insights on brain activity from measurements, EIT and EEG relies on detailed knowledge of the underlying electrical properties of the body. This is obtained from numerical models of the electric �field distribution therein. The inhomogeneous and anisotropic electric properties of human tissues make accurate modeling and simulation very challenging, leading to a tradeo�ff between physical accuracy and technical feasibility, which currently severely limits the capabilities of these techniques. Moreover elaboration of data recorded requires usage of regularization techniques computationally intensive, which influences the application with heavy temporal constraints (such as BCI). This work focuses on the parallel implementation of a work-flow for EEG and EIT data processing. The resulting software is accelerated using multi-core GPUs, in order to provide solution in reasonable times and address requirements of real-time BCI systems, without over-simplifying the complexity and accuracy of the head models.
Resumo:
I applied the SBAS-DInSAR method to the Mattinata Fault (MF) (Southern Italy) and to the Doruneh Fault System (DFS) (Central Iran). In the first case, I observed limited internal deformation and determined the right lateral kinematic pattern with a compressional pattern in the northern sector of the fault. Using the Okada model I inverted the observed velocities defining a right lateral strike slip solution for the MF. Even if it fits the data within the uncertainties, the modeled slip rate of 13-15 mm yr-1 seems too high with respect to the geological record. Concerning the Western termination of DFS, SAR data confirms the main left lateral transcurrent kinematics of this fault segment, but reveal a compressional component. My analytical model fits successfully the observed data and quantifies the slip in ~4 mm yr-1 and ~2.5 mm yr-1 of pure horizontal and vertical displacement respectively. The horizontal velocity is compatible with geological record. I applied classic SAR interferometry to the October–December 2008 Balochistan (Central Pakistan) seismic swarm; I discerned the different contributions of the three Mw > 5.7 earthquakes determining fault positions, lengths, widths, depths and slip distributions, constraining the other source parameters using different Global CMT solutions. A well constrained solution has been obtained for the 09/12/2008 aftershock, whereas I tested two possible fault solutions for the 28-29/10/08 mainshocks. It is not possible to favor one of the solutions without independent constraints derived from geological data. Finally I approached the study of the earthquake-cycle in transcurrent tectonic domains using analog modeling, with alimentary gelatins like crust analog material. I successfully joined the study of finite deformation with the earthquake cycle study and sudden dislocation. A lot of seismic cycles were reproduced in which a characteristic earthquake is recognizable in terms of displacement, coseismic velocity and recurrence time.
Resumo:
In this thesis, we extend some ideas of statistical physics to describe the properties of human mobility. By using a database containing GPS measures of individual paths (position, velocity and covered space at a spatial scale of 2 Km or a time scale of 30 sec), which includes the 2% of the private vehicles in Italy, we succeed in determining some statistical empirical laws pointing out "universal" characteristics of human mobility. Developing simple stochastic models suggesting possible explanations of the empirical observations, we are able to indicate what are the key quantities and cognitive features that are ruling individuals' mobility. To understand the features of individual dynamics, we have studied different aspects of urban mobility from a physical point of view. We discuss the implications of the Benford's law emerging from the distribution of times elapsed between successive trips. We observe how the daily travel-time budget is related with many aspects of the urban environment, and describe how the daily mobility budget is then spent. We link the scaling properties of individual mobility networks to the inhomogeneous average durations of the activities that are performed, and those of the networks describing people's common use of space with the fractional dimension of the urban territory. We study entropy measures of individual mobility patterns, showing that they carry almost the same information of the related mobility networks, but are also influenced by a hierarchy among the activities performed. We discover that Wardrop's principles are violated as drivers have only incomplete information on traffic state and therefore rely on knowledge on the average travel-times. We propose an assimilation model to solve the intrinsic scattering of GPS data on the street network, permitting the real-time reconstruction of traffic state at a urban scale.
Resumo:
Population growth in urban areas is a world-wide phenomenon. According to a recent United Nations report, over half of the world now lives in cities. Numerous health and environmental issues arise from this unprecedented urbanization. Recent studies have demonstrated the effectiveness of urban green spaces and the role they play in improving both the aesthetics and the quality of life of its residents. In particular, urban green spaces provide ecosystem services such as: urban air quality improvement by removing pollutants that can cause serious health problems, carbon storage, carbon sequestration and climate regulation through shading and evapotranspiration. Furthermore, epidemiological studies with controlled age, sex, marital and socio-economic status, have provided evidence of a positive relationship between green space and the life expectancy of senior citizens. However, there is little information on the role of public green spaces in mid-sized cities in northern Italy. To address this need, a study was conducted to assess the ecosystem services of urban green spaces in the city of Bolzano, South Tyrol, Italy. In particular, we quantified the cooling effect of urban trees and the hourly amount of pollution removed by the urban forest. The information was gathered using field data collected through local hourly air pollution readings, tree inventory and simulation models. During the study we quantified pollution removal for ozone, nitrogen dioxide, carbon monoxide and particulate matter (<10 microns). We estimated the above ground carbon stored and annually sequestered by the urban forest. Results have been compared to transportation CO2 emissions to determine the CO2 offset potential of urban streetscapes. Furthermore, we assessed commonly used methods for estimating carbon stored and sequestered by urban trees in the city of Bolzano. We also quantified ecosystem disservices such as hourly urban forest volatile organic compound emissions.
Resumo:
Photovoltaic (PV) solar panels generally produce electricity in the 6% to 16% efficiency range, the rest being dissipated in thermal losses. To recover this amount, hybrid photovoltaic thermal systems (PVT) have been devised. These are devices that simultaneously convert solar energy into electricity and heat. It is thus interesting to study the PVT system globally from different point of views in order to evaluate advantages and disadvantages of this technology and its possible uses. In particular in Chapter II, the development of the PVT absorber numerical optimization by a genetic algorithm has been carried out analyzing different internal channel profiles in order to find a right compromise between performance and technical and economical feasibility. Therefore in Chapter III ,thanks to a mobile structure built into the university lab, it has been compared experimentally electrical and thermal output power from PVT panels with separated photovoltaic and solar thermal productions. Collecting a lot of experimental data based on different seasonal conditions (ambient temperature,irradiation, wind...),the aim of this mobile structure has been to evaluate average both thermal and electrical increasing and decreasing efficiency values obtained respect to separate productions through the year. In Chapter IV , new PVT and solar thermal equation based models in steady state conditions have been developed by software Dymola that uses Modelica language. This permits ,in a simplified way respect to previous system modelling softwares, to model and evaluate different concepts about PVT panel regarding its structure before prototyping and measuring it. Chapter V concerns instead the definition of PVT boundary conditions into a HVAC system . This was made trough year simulations by software Polysun in order to finally assess the best solar assisted integrated structure thanks to F_save(solar saving energy)factor. Finally, Chapter VI presents the conclusion and the perspectives of this PhD work.
Resumo:
The advances that have been characterizing spatial econometrics in recent years are mostly theoretical and have not found an extensive empirical application yet. In this work we aim at supplying a review of the main tools of spatial econometrics and to show an empirical application for one of the most recently introduced estimators. Despite the numerous alternatives that the econometric theory provides for the treatment of spatial (and spatiotemporal) data, empirical analyses are still limited by the lack of availability of the correspondent routines in statistical and econometric software. Spatiotemporal modeling represents one of the most recent developments in spatial econometric theory and the finite sample properties of the estimators that have been proposed are currently being tested in the literature. We provide a comparison between some estimators (a quasi-maximum likelihood, QML, estimator and some GMM-type estimators) for a fixed effects dynamic panel data model under certain conditions, by means of a Monte Carlo simulation analysis. We focus on different settings, which are characterized either by fully stable or quasi-unit root series. We also investigate the extent of the bias that is caused by a non-spatial estimation of a model when the data are characterized by different degrees of spatial dependence. Finally, we provide an empirical application of a QML estimator for a time-space dynamic model which includes a temporal, a spatial and a spatiotemporal lag of the dependent variable. This is done by choosing a relevant and prolific field of analysis, in which spatial econometrics has only found limited space so far, in order to explore the value-added of considering the spatial dimension of the data. In particular, we study the determinants of cropland value in Midwestern U.S.A. in the years 1971-2009, by taking the present value model (PVM) as the theoretical framework of analysis.