23 resultados para High-dimensional data visualization

em AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The main purpose of this thesis is to go beyond two usual assumptions that accompany theoretical analysis in spin-glasses and inference: the i.i.d. (independently and identically distributed) hypothesis on the noise elements and the finite rank regime. The first one appears since the early birth of spin-glasses. The second one instead concerns the inference viewpoint. Disordered systems and Bayesian inference have a well-established relation, evidenced by their continuous cross-fertilization. The thesis makes use of techniques coming both from the rigorous mathematical machinery of spin-glasses, such as the interpolation scheme, and from Statistical Physics, such as the replica method. The first chapter contains an introduction to the Sherrington-Kirkpatrick and spiked Wigner models. The first is a mean field spin-glass where the couplings are i.i.d. Gaussian random variables. The second instead amounts to establish the information theoretical limits in the reconstruction of a fixed low rank matrix, the “spike”, blurred by additive Gaussian noise. In chapters 2 and 3 the i.i.d. hypothesis on the noise is broken by assuming a noise with inhomogeneous variance profile. In spin-glasses this leads to multi-species models. The inferential counterpart is called spatial coupling. All the previous models are usually studied in the Bayes-optimal setting, where everything is known about the generating process of the data. In chapter 4 instead we study the spiked Wigner model where the prior on the signal to reconstruct is ignored. In chapter 5 we analyze the statistical limits of a spiked Wigner model where the noise is no longer Gaussian, but drawn from a random matrix ensemble, which makes its elements dependent. The thesis ends with chapter 6, where the challenging problem of high-rank probabilistic matrix factorization is tackled. Here we introduce a new procedure called "decimation" and we show that it is theoretically to perform matrix factorization through it.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Following the approval of the 2030 Agenda for Sustainable Development in 2015, sustainability became a hotly debated topic. In order to build a better and more sustainable future by 2030, this agenda addressed several global issues, including inequality, climate change, peace, and justice, in the form of 17 Sustainable Development Goals (SDGs), that should be understood and pursued by nations, corporations, institutions, and individuals. In this thesis, we researched how to exploit and integrate Human-Computer Interaction (HCI) and Data Visualization to promote knowledge and awareness about SDG 8, which wants to encourage lasting, inclusive, and sustainable economic growth, full and productive employment, and decent work for all. In particular, we focused on three targets: green economy, sustainable tourism, employment, decent work for all, and social protection. The primary goal of this research is to determine whether HCI approaches may be used to create and validate interactive data visualization that can serve as helpful decision-making aids for specific groups and raise their knowledge of public-interest issues. To accomplish this goal, we analyzed four case studies. In the first two, we wanted to promote knowledge and awareness about green economy issues: we investigated the Human-Building Interaction inside a Smart Campus and the dematerialization process inside a University. In the third, we focused on smart tourism, investigating the relationship between locals and tourists to create meaningful connections and promote more sustainable tourism. In the fourth, we explored the industry context to highlight sustainability policies inside well-known companies. This research focuses on the hypothesis that interactive data visualization tools can make communities aware of sustainability aspects related to SDG8 and its targets. The research questions addressed are two: "how to promote awareness about SDG8 and its targets through interactive data visualizations?" and "to what extent are these interactive data visualizations effective?".

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In these last years a great effort has been put in the development of new techniques for automatic object classification, also due to the consequences in many applications such as medical imaging or driverless cars. To this end, several mathematical models have been developed from logistic regression to neural networks. A crucial aspect of these so called classification algorithms is the use of algebraic tools to represent and approximate the input data. In this thesis, we examine two different models for image classification based on a particular tensor decomposition named Tensor-Train (TT) decomposition. The use of tensor approaches preserves the multidimensional structure of the data and the neighboring relations among pixels. Furthermore the Tensor-Train, differently from other tensor decompositions, does not suffer from the curse of dimensionality making it an extremely powerful strategy when dealing with high-dimensional data. It also allows data compression when combined with truncation strategies that reduce memory requirements without spoiling classification performance. The first model we propose is based on a direct decomposition of the database by means of the TT decomposition to find basis vectors used to classify a new object. The second model is a tensor dictionary learning model, based on the TT decomposition where the terms of the decomposition are estimated using a proximal alternating linearized minimization algorithm with a spectral stepsize.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The thesis deals with the problem of Model Selection (MS) motivated by information and prediction theory, focusing on parametric time series (TS) models. The main contribution of the thesis is the extension to the multivariate case of the Misspecification-Resistant Information Criterion (MRIC), a criterion introduced recently that solves Akaike’s original research problem posed 50 years ago, which led to the definition of the AIC. The importance of MS is witnessed by the huge amount of literature devoted to it and published in scientific journals of many different disciplines. Despite such a widespread treatment, the contributions that adopt a mathematically rigorous approach are not so numerous and one of the aims of this project is to review and assess them. Chapter 2 discusses methodological aspects of MS from information theory. Information criteria (IC) for the i.i.d. setting are surveyed along with their asymptotic properties; and the cases of small samples, misspecification, further estimators. Chapter 3 surveys criteria for TS. IC and prediction criteria are considered for: univariate models (AR, ARMA) in the time and frequency domain, parametric multivariate (VARMA, VAR); nonparametric nonlinear (NAR); and high-dimensional models. The MRIC answers Akaike’s original question on efficient criteria, for possibly-misspecified (PM) univariate TS models in multi-step prediction with high-dimensional data and nonlinear models. Chapter 4 extends the MRIC to PM multivariate TS models for multi-step prediction introducing the Vectorial MRIC (VMRIC). We show that the VMRIC is asymptotically efficient by proving the decomposition of the MSPE matrix and the consistency of its Method-of-Moments Estimator (MoME), for Least Squares multi-step prediction with univariate regressor. Chapter 5 extends the VMRIC to the general multiple regressor case, by showing that the MSPE matrix decomposition holds, obtaining consistency for its MoME, and proving its efficiency. The chapter concludes with a digression on the conditions for PM VARX models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The study of random probability measures is a lively research topic that has attracted interest from different fields in recent years. In this thesis, we consider random probability measures in the context of Bayesian nonparametrics, where the law of a random probability measure is used as prior distribution, and in the context of distributional data analysis, where the goal is to perform inference given avsample from the law of a random probability measure. The contributions contained in this thesis can be subdivided according to three different topics: (i) the use of almost surely discrete repulsive random measures (i.e., whose support points are well separated) for Bayesian model-based clustering, (ii) the proposal of new laws for collections of random probability measures for Bayesian density estimation of partially exchangeable data subdivided into different groups, and (iii) the study of principal component analysis and regression models for probability distributions seen as elements of the 2-Wasserstein space. Specifically, for point (i) above we propose an efficient Markov chain Monte Carlo algorithm for posterior inference, which sidesteps the need of split-merge reversible jump moves typically associated with poor performance, we propose a model for clustering high-dimensional data by introducing a novel class of anisotropic determinantal point processes, and study the distributional properties of the repulsive measures, shedding light on important theoretical results which enable more principled prior elicitation and more efficient posterior simulation algorithms. For point (ii) above, we consider several models suitable for clustering homogeneous populations, inducing spatial dependence across groups of data, extracting the characteristic traits common to all the data-groups, and propose a novel vector autoregressive model to study of growth curves of Singaporean kids. Finally, for point (iii), we propose a novel class of projected statistical methods for distributional data analysis for measures on the real line and on the unit-circle.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Assimilation in the Unstable Subspace (AUS) was introduced by Trevisan and Uboldi in 2004, and developed by Trevisan, Uboldi and Carrassi, to minimize the analysis and forecast errors by exploiting the flow-dependent instabilities of the forecast-analysis cycle system, which may be thought of as a system forced by observations. In the AUS scheme the assimilation is obtained by confining the analysis increment in the unstable subspace of the forecast-analysis cycle system so that it will have the same structure of the dominant instabilities of the system. The unstable subspace is estimated by Breeding on the Data Assimilation System (BDAS). AUS- BDAS has already been tested in realistic models and observational configurations, including a Quasi-Geostrophicmodel and a high dimensional, primitive equation ocean model; the experiments include both fixed and“adaptive”observations. In these contexts, the AUS-BDAS approach greatly reduces the analysis error, with reasonable computational costs for data assimilation with respect, for example, to a prohibitive full Extended Kalman Filter. This is a follow-up study in which we revisit the AUS-BDAS approach in the more basic, highly nonlinear Lorenz 1963 convective model. We run observation system simulation experiments in a perfect model setting, and with two types of model error as well: random and systematic. In the different configurations examined, and in a perfect model setting, AUS once again shows better efficiency than other advanced data assimilation schemes. In the present study, we develop an iterative scheme that leads to a significant improvement of the overall assimilation performance with respect also to standard AUS. In particular, it boosts the efficiency of regime’s changes tracking, with a low computational cost. Other data assimilation schemes need estimates of ad hoc parameters, which have to be tuned for the specific model at hand. In Numerical Weather Prediction models, tuning of parameters — and in particular an estimate of the model error covariance matrix — may turn out to be quite difficult. Our proposed approach, instead, may be easier to implement in operational models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nuclear Magnetic Resonance (NMR) is a branch of spectroscopy that is based on the fact that many atomic nuclei may be oriented by a strong magnetic field and will absorb radiofrequency radiation at characteristic frequencies. The parameters that can be measured on the resulting spectral lines (line positions, intensities, line widths, multiplicities and transients in time-dependent experi-ments) can be interpreted in terms of molecular structure, conformation, molecular motion and other rate processes. In this way, high resolution (HR) NMR allows performing qualitative and quantitative analysis of samples in solution, in order to determine the structure of molecules in solution and not only. In the past, high-field NMR spectroscopy has mainly concerned with the elucidation of chemical structure in solution, but today is emerging as a powerful exploratory tool for probing biochemical and physical processes. It represents a versatile tool for the analysis of foods. In literature many NMR studies have been reported on different type of food such as wine, olive oil, coffee, fruit juices, milk, meat, egg, starch granules, flour, etc using different NMR techniques. Traditionally, univariate analytical methods have been used to ex-plore spectroscopic data. This method is useful to measure or to se-lect a single descriptive variable from the whole spectrum and , at the end, only this variable is analyzed. This univariate methods ap-proach, applied to HR-NMR data, lead to different problems due especially to the complexity of an NMR spectrum. In fact, the lat-ter is composed of different signals belonging to different mole-cules, but it is also true that the same molecules can be represented by different signals, generally strongly correlated. The univariate methods, in this case, takes in account only one or a few variables, causing a loss of information. Thus, when dealing with complex samples like foodstuff, univariate analysis of spectra data results not enough powerful. Spectra need to be considered in their wholeness and, for analysing them, it must be taken in consideration the whole data matrix: chemometric methods are designed to treat such multivariate data. Multivariate data analysis is used for a number of distinct, differ-ent purposes and the aims can be divided into three main groups: • data description (explorative data structure modelling of any ge-neric n-dimensional data matrix, PCA for example); • regression and prediction (PLS); • classification and prediction of class belongings for new samples (LDA and PLS-DA and ECVA). The aim of this PhD thesis was to verify the possibility of identify-ing and classifying plants or foodstuffs, in different classes, based on the concerted variation in metabolite levels, detected by NMR spectra and using the multivariate data analysis as a tool to inter-pret NMR information. It is important to underline that the results obtained are useful to point out the metabolic consequences of a specific modification on foodstuffs, avoiding the use of a targeted analysis for the different metabolites. The data analysis is performed by applying chemomet-ric multivariate techniques to the NMR dataset of spectra acquired. The research work presented in this thesis is the result of a three years PhD study. This thesis reports the main results obtained from these two main activities: A1) Evaluation of a data pre-processing system in order to mini-mize unwanted sources of variations, due to different instrumental set up, manual spectra processing and to sample preparations arte-facts; A2) Application of multivariate chemiometric models in data analy-sis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Natural hazard related to the volcanic activity represents a potential risk factor, particularly in the vicinity of human settlements. Besides to the risk related to the explosive and effusive activity, the instability of volcanic edifices may develop into large landslides often catastrophically destructive, as shown by the collapse of the northern flank of Mount St. Helens in 1980. A combined approach was applied to analyse slope failures that occurred at Stromboli volcano. SdF slope stability was evaluated by using high-resolution multi-temporal DTMMs and performing limit equilibrium stability analyses. High-resolution topographical data collected with remote sensing techniques and three-dimensional slope stability analysis play a key role in understanding instability mechanism and the related risks. Analyses carried out on the 2002–2003 and 2007 Stromboli eruptions, starting from high-resolution data acquired through airborne remote sensing surveys, permitted the estimation of the lava volumes emplaced on the SdF slope and contributed to the investigation of the link between magma emission and slope instabilities. Limit Equilibrium analyses were performed on the 2001 and 2007 3D models, in order to simulate the slope behavior before 2002-2003 landslide event and after the 2007 eruption. Stability analyses were conducted to understand the mechanisms that controlled the slope deformations which occurred shortly after the 2007 eruption onset, involving the upper part of slope. Limit equilibrium analyses applied to both cases yielded results which are congruent with observations and monitoring data. The results presented in this work undoubtedly indicate that hazard assessment for the island of Stromboli should take into account the fact that a new magma intrusion could lead to further destabilisation of the slope, which may be more significant than the one recently observed because it will affect an already disarranged deposit and fractured and loosened crater area. The two-pronged approach based on the analysis of 3D multi-temporal mapping datasets and on the application of LE methods contributed to better understanding volcano flank behaviour and to be prepared to undertake actions aimed at risk mitigation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this thesis, the viability of the Dynamic Mode Decomposition (DMD) as a technique to analyze and model complex dynamic real-world systems is presented. This method derives, directly from data, computationally efficient reduced-order models (ROMs) which can replace too onerous or unavailable high-fidelity physics-based models. Optimizations and extensions to the standard implementation of the methodology are proposed, investigating diverse case studies related to the decoding of complex flow phenomena. The flexibility of this data-driven technique allows its application to high-fidelity fluid dynamics simulations, as well as time series of real systems observations. The resulting ROMs are tested against two tasks: (i) reduction of the storage requirements of high-fidelity simulations or observations; (ii) interpolation and extrapolation of missing data. The capabilities of DMD can also be exploited to alleviate the cost of onerous studies that require many simulations, such as uncertainty quantification analysis, especially when dealing with complex high-dimensional systems. In this context, a novel approach to address parameter variability issues when modeling systems with space and time-variant response is proposed. Specifically, DMD is merged with another model-reduction technique, namely the Polynomial Chaos Expansion, for uncertainty quantification purposes. Useful guidelines for DMD deployment result from the study, together with the demonstration of its potential to ease diagnosis and scenario analysis when complex flow processes are involved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The research is part of a survey for the detection of the hydraulic and geotechnical conditions of river embankments funded by the Reno River Basin Regional Technical Service of the Region Emilia-Romagna. The hydraulic safety of the Reno River, one of the main rivers in North-Eastern Italy, is indeed of primary importance to the Emilia-Romagna regional administration. The large longitudinal extent of the banks (several hundreds of kilometres) has placed great interest in non-destructive geophysical methods, which, compared to other methods such as drilling, allow for the faster and often less expensive acquisition of high-resolution data. The present work aims to experience the Ground Penetrating Radar (GPR) for the detection of local non-homogeneities (mainly stratigraphic contacts, cavities and conduits) inside the Reno River and its tributaries embankments, taking into account supplementary data collected with traditional destructive tests (boreholes, cone penetration tests etc.). A comparison with non-destructive methodologies likewise electric resistivity tomography (ERT), Multi-channels Analysis of Surface Waves (MASW), FDEM induction, was also carried out in order to verify the usability of GPR and to provide integration of various geophysical methods in the process of regular maintenance and check of the embankments condition. The first part of this thesis is dedicated to the explanation of the state of art concerning the geographic, geomorphologic and geotechnical characteristics of Reno River and its tributaries embankments, as well as the description of some geophysical applications provided on embankments belonging to European and North-American Rivers, which were used as bibliographic basis for this thesis realisation. The second part is an overview of the geophysical methods that were employed for this research, (with a particular attention to the GPR), reporting also their theoretical basis and a deepening of some techniques of the geophysical data analysis and representation, when applied to river embankments. The successive chapters, following the main scope of this research that is to highlight advantages and drawbacks in the use of Ground Penetrating Radar applied to Reno River and its tributaries embankments, show the results obtained analyzing different cases that could yield the formation of weakness zones, which successively lead to the embankment failure. As advantages, a considerable velocity of acquisition and a spatial resolution of the obtained data, incomparable with respect to other methodologies, were recorded. With regard to the drawbacks, some factors, related to the attenuation losses of wave propagation, due to different content in clay, silt, and sand, as well as surface effects have significantly limited the correlation between GPR profiles and geotechnical information and therefore compromised the embankment safety assessment. Recapitulating, the Ground Penetrating Radar could represent a suitable tool for checking up river dike conditions, but its use has significantly limited by geometric and geotechnical characteristics of the Reno River and its tributaries levees. As a matter of facts, only the shallower part of the embankment was investigate, achieving also information just related to changes in electrical properties, without any numerical measurement. Furthermore, GPR application is ineffective for a preliminary assessment of embankment safety conditions, while for detailed campaigns at shallow depth, which aims to achieve immediate results with optimal precision, its usage is totally recommended. The cases where multidisciplinary approach was tested, reveal an optimal interconnection of the various geophysical methodologies employed, producing qualitative results concerning the preliminary phase (FDEM), assuring quantitative and high confidential description of the subsoil (ERT) and finally, providing fast and highly detailed analysis (GPR). Trying to furnish some recommendations for future researches, the simultaneous exploitation of many geophysical devices to assess safety conditions of river embankments is absolutely suggested, especially to face reliable flood event, when the entire extension of the embankments themselves must be investigated.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis is devoted to the study of the properties of high-redsfhit galaxies in the epoch 1 < z < 3, when a substantial fraction of galaxy mass was assembled, and when the evolution of the star-formation rate density peaked. Following a multi-perspective approach and using the most recent and high-quality data available (spectra, photometry and imaging), the morphologies and the star-formation properties of high-redsfhit galaxies were investigated. Through an accurate morphological analyses, the built up of the Hubble sequence was placed around z ~ 2.5. High-redshift galaxies appear, in general, much more irregular and asymmetric than local ones. Moreover, the occurrence of morphological k-­correction is less pronounced than in the local Universe. Different star-formation rate indicators were also studied. The comparison of ultra-violet and optical based estimates, with the values derived from infra-red luminosity showed that the traditional way of addressing the dust obscuration is problematic, at high-redshifts, and new models of dust geometry and composition are required. Finally, by means of stacking techniques applied to rest-frame ultra-violet spectra of star-forming galaxies at z~2, the warm phase of galactic-scale outflows was studied. Evidence was found of escaping gas at velocities of ~ 100 km/s. Studying the correlation of inter-­stellar absorption lines equivalent widths with galaxy physical properties, the intensity of the outflow-related spectral features was proven to depend strongly on a combination of the velocity dispersion of the gas and its geometry.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Different types of proteins exist with diverse functions that are essential for living organisms. An important class of proteins is represented by transmembrane proteins which are specifically designed to be inserted into biological membranes and devised to perform very important functions in the cell such as cell communication and active transport across the membrane. Transmembrane β-barrels (TMBBs) are a sub-class of membrane proteins largely under-represented in structure databases because of the extreme difficulty in experimental structure determination. For this reason, computational tools that are able to predict the structure of TMBBs are needed. In this thesis, two computational problems related to TMBBs were addressed: the detection of TMBBs in large datasets of proteins and the prediction of the topology of TMBB proteins. Firstly, a method for TMBB detection was presented based on a novel neural network framework for variable-length sequence classification. The proposed approach was validated on a non-redundant dataset of proteins. Furthermore, we carried-out genome-wide detection using the entire Escherichia coli proteome. In both experiments, the method significantly outperformed other existing state-of-the-art approaches, reaching very high PPV (92%) and MCC (0.82). Secondly, a method was also introduced for TMBB topology prediction. The proposed approach is based on grammatical modelling and probabilistic discriminative models for sequence data labeling. The method was evaluated using a newly generated dataset of 38 TMBB proteins obtained from high-resolution data in the PDB. Results have shown that the model is able to correctly predict topologies of 25 out of 38 protein chains in the dataset. When tested on previously released datasets, the performances of the proposed approach were measured as comparable or superior to the current state-of-the-art of TMBB topology prediction.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis consists of three self-contained papers. In the first paper I analyze the labor supply behavior of Bologna Pizza Delivery Vendors. Recent influential papers analyze labor supply behavior of taxi drivers (Camerer et al., 1997; and Crawford and Meng, 2011) and suggest that reference-dependence preferences have an important influence on drivers’ labor-supply decisions. Unlike previous papers, I am able to identify an exogenous and transitory change in labor demand. Using high frequency data on orders and rainfall as an exogenous demand shifter, I invariably find that reference-dependent preferences play no role in their labor’ supply decisions and the behavior of pizza vendors is perfectly consistent with the predictions of the standard model of labor’ supply. In the second paper, I investigate how the voting behavior of Members of Parliament is influenced by the Members seating nearby. By exploiting the random seating arrangements in the Icelandic Parliament, I show that being seated next to Members of a different party increases the probability of not being aligned with one’s own party. Using the exact spatial orientation of the peers, I provide evidence that supports the hypothesis that interaction is the main channel that explain these results. In the third paper, I provide an estimate of the trade flows that there would have been between the UK and Europe if the UK had joined the Euro. As an alternative approach to the standard log-linear gravity equation I employ the synthetic control method. I show that the aggregate trade flows between Britain and Europe would have been 13% higher if the UK had adopted the Euro.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The first paper sheds light on the informational content of high frequency data and daily data. I assess the economic value of the two family models comparing their performance in forecasting asset volatility through the Value at Risk metric. In running the comparison this paper introduces two key assumptions: jumps in prices and leverage effect in volatility dynamics. Findings suggest that high frequency data models do not exhibit a superior performance over daily data models. In the second paper, building on Majewski et al. (2015), I propose an affine-discrete time model, labeled VARG-J, which is characterized by a multifactor volatility specification. In the VARG-J model volatility experiences periods of extreme movements through a jump factor modeled as an Autoregressive Gamma Zero process. The estimation under historical measure is done by quasi-maximum likelihood and the Extended Kalman Filter. This strategy allows to filter out both volatility factors introducing a measurement equation that relates the Realized Volatility to latent volatility. The risk premia parameters are calibrated using call options written on S&P500 Index. The results clearly illustrate the important contribution of the jump factor in the pricing performance of options and the economic significance of the volatility jump risk premia. In the third paper, I analyze whether there is empirical evidence of contagion at the bank level, measuring the direction and the size of contagion transmission between European markets. In order to understand and quantify the contagion transmission on banking market, I estimate the econometric model by Aït-Sahalia et al. (2015) in which contagion is defined as the within and between countries transmission of shocks and asset returns are directly modeled as a Hawkes jump diffusion process. The empirical analysis indicates that there is a clear evidence of contagion from Greece to European countries as well as self-contagion in all countries.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Prokaryotic organisms are one of the most successful forms of life, they are present in all known ecosystems. The deluge diversity of bacteria reflects their ability to colonise every environment. Also, human beings host trillions of microorganisms in their body districts, including skin, mucosae, and gut. This symbiosis is active for all other terrestrial and marine animals, as well as plants. With the term holobiont we refer, with a single word, to the systems including both the host and its symbiotic microbial species. The coevolution of bacteria within their ecological niches reflects the adaptation of both host and guest species, and it is shaped by complex interactions that are pivotal for determining the host state. Nowadays, thanks to the current sequencing technologies, Next Generation Sequencing, we have unprecedented tools for investigating the bacterial life by studying the prokaryotic genome sequences. NGS revolution has been sustained by the advancements in computational performance, in terms of speed, storage capacity, algorithm development and hardware costs decreasing following the Moore’s Law. Bioinformaticians and computational biologists design and implement ad hoc tools able to analyse high-throughput data and extract valuable biological information. Metagenomics requires the integration of life and computational sciences and it is uncovering the deluge diversity of the bacterial world. The present thesis work focuses mainly on the analysis of prokaryotic genomes under different aspects. Being supervised by two groups at the University of Bologna, the Biocomputing group and the group of Microbial Ecology of Health, I investigated three different topics: i) antimicrobial resistance, particularly with respect to missense point mutations involved in the resistant phenotype, ii) bacterial mechanisms involved in xenobiotic degradation via the computational analysis of metagenomic samples, and iii) the variation of the human gut microbiota through ageing, in elderly and longevous individuals.