11 results for Mini-scale method
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
Bioinformatics has, in the last few decades, played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the major problem becomes learning as much as possible about its coding regions. Protein sequence annotation is challenging and, owing to the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution is given by cross-genome comparisons. The present thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences, characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) inside clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure, when a template is available. This is made possible by cluster-specific HMM profiles that can be used to compute reliable template-to-target alignments even for distantly related proteins (sequence identity < 30%). Other BAR+-based applications were developed during my doctorate, including the prediction of magnesium-binding sites in human proteins, the classification of the ABC transporter superfamily, and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment BAR+ placed among the ten most accurate methods. BAR+ is freely available as a web server for functional and structural protein sequence annotation at http://bar.biocomp.unibo.it/bar2.0.
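To illustrate the transfer-by-homology idea on which this kind of resource rests, here is a minimal sketch of single-linkage clustering over an all-against-all alignment, followed by annotation transfer within each cluster. The thresholds and data are illustrative assumptions, not BAR+'s actual stringent metric or statistical validation.

    from collections import defaultdict

    # Hypothetical pairwise alignment results: (seq_a, seq_b, % identity, % coverage).
    alignments = [
        ("P1", "P2", 55.0, 95.0),
        ("P2", "P3", 42.0, 92.0),
        ("P3", "P4", 18.0, 50.0),  # too distant: contributes no edge
    ]
    IDENTITY_MIN, COVERAGE_MIN = 40.0, 90.0  # illustrative stringency only

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b, identity, coverage in alignments:
        find(a), find(b)  # register both sequences
        if identity >= IDENTITY_MIN and coverage >= COVERAGE_MIN:
            union(a, b)

    # Clusters are the connected components of the alignment graph.
    clusters = defaultdict(set)
    for seq in parent:
        clusters[find(seq)].add(seq)

    # Transfer annotations (e.g. GO terms) to every member of a cluster.
    annotations = {"P1": {"GO:0003677"}}  # toy starting annotation
    for members in clusters.values():
        pooled = set().union(*(annotations.get(s, set()) for s in members))
        for s in members:
            annotations.setdefault(s, set()).update(pooled)

    print(dict(clusters))
    print(annotations)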
Abstract:
The continuous increase of genome sequencing projects has produced a huge amount of data in the last 10 years: currently more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publicly available. However, sequencing alone determines only raw nucleotide sequences; it is merely the first step of the genome annotation process, which assigns biological information to each sequence. Annotation is carried out at every level of the biological information processing mechanism, from DNA to protein, and cannot be accomplished by in vitro analysis procedures alone, which are extremely expensive and time-consuming when applied at such a large scale. Thus, in silico methods are needed to accomplish the task. The aim of this work was the implementation of predictive computational methods to allow fast, reliable, and automated annotation of genomes and proteins starting from amino acid sequences. The first part of the work focused on the implementation of a new machine-learning-based method for the prediction of the subcellular localization of soluble eukaryotic proteins. The method, called BaCelLo, was developed in 2006. Its main distinguishing feature is its independence from the biases present in the training dataset, which cause the over-prediction of the most represented examples in all the other predictors developed so far. This result was achieved through a modification, made by myself, of the standard Support Vector Machine (SVM) algorithm: the so-called Balanced SVM. BaCelLo predicts the most important subcellular localizations in eukaryotic cells, and three kingdom-specific predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo was found to outperform all the state-of-the-art methods available for this prediction task. BaCelLo was subsequently used to annotate 5 eukaryotic genomes in full, by integrating it into a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was developed by integrating, for each amino acid sequence extracted from the genomes, the predicted subcellular localization merged with experimental and similarity-based annotations. In the second part of the work a new machine-learning-based method was implemented for the prediction of GPI-anchored proteins. The method efficiently predicts, from the raw amino acid sequence, both the presence of the GPI anchor (by means of an SVM) and the position in the sequence of the post-translational modification event, the so-called ω-site (by means of a Hidden Markov Model, HMM). The method, called GPIPE, greatly improved prediction performance for GPI-anchored proteins over all previously developed methods. GPIPE predicted up to 88% of the experimentally annotated GPI-anchored proteins while maintaining a false positive rate as low as 0.1%. GPIPE was used to annotate 81 eukaryotic genomes in full, and more than 15,000 putative GPI-anchored proteins were predicted, 561 of which are found in H. sapiens. On average, 1% of a proteome is predicted as GPI-anchored. A statistical analysis of the composition of the regions surrounding the ω-site allowed the definition of specific amino acid abundances in the different regions considered. Furthermore, the hypothesis, proposed in the literature, that compositional biases are present among the four major eukaryotic kingdoms was tested and rejected. All the predictors and databases developed are freely available at: BaCelLo http://gpcr.biocomp.unibo.it/bacello eSLDB http://gpcr.biocomp.unibo.it/esldb GPIPE http://gpcr.biocomp.unibo.it/gpipe
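The Balanced SVM of BaCelLo is a custom modification of SVM training; a standard proxy for the same idea, equalizing the influence of under- and over-represented classes, is per-class error weighting. Below is a minimal sketch using scikit-learn's class_weight option on synthetic data; it is not the thesis' actual algorithm or dataset.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    # Imbalanced toy problem: 900 examples of class 0 vs 100 of class 1.
    X = np.vstack([rng.normal(0.0, 1.0, (900, 5)), rng.normal(1.0, 1.0, (100, 5))])
    y = np.array([0] * 900 + [1] * 100)

    plain = SVC(kernel="rbf").fit(X, y)
    # class_weight="balanced" rescales each class's penalty inversely to its
    # frequency, so the minority class is no longer systematically overridden.
    balanced = SVC(kernel="rbf", class_weight="balanced").fit(X, y)

    probe = rng.normal(0.5, 1.0, (10, 5))  # points between the two classes
    print(plain.predict(probe))
    print(balanced.predict(probe))

The plain model tends to over-predict the majority class on borderline points, which is exactly the bias the Balanced SVM was designed to remove.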
Abstract:
In this PhD thesis the crashworthiness topic is studied with a view to developing a small-scale experimental test able to characterize a material in terms of energy absorption. The material properties obtained are then used to validate a numerical model of the experimental test itself. Consequently, the numerical model, calibrated on the specific material, can be extended to more complex structures and used to simulate their energy absorption behavior. The experimental activity started at the University of Washington in Seattle, WA (USA) and continued at the Second Faculty of Engineering, University of Bologna, Forlì (Italy), where the numerical model for the simulation of the experimental test was implemented and optimized.
Abstract:
This thesis presents the implementation of the quadratic maximum likelihood (QML) method, well suited to estimating the angular power spectrum of the cross-correlation between cosmic microwave background (CMB) and large scale structure (LSS) maps, as well as their individual auto-spectra. Such a tool is an optimal method (unbiased and with minimum variance) in pixel space and goes beyond all previous harmonic-space analyses in the literature. We describe the implementation of the QML method in the {\it BolISW} code and demonstrate its accuracy on simulated maps through a Monte Carlo analysis. We apply this optimal estimator to WMAP 7-year and NRAO VLA Sky Survey (NVSS) data and explore the robustness of the angular power spectrum estimates obtained by the QML method. Taking into account the shot noise and one of the systematics (the declination correction) in NVSS, we can safely use most of the information contained in this survey. By contrast, we neglect the noise in temperature, since WMAP is already cosmic variance dominated on large scales. Because of a discrepancy between the estimated galaxy auto-spectrum and the theoretical model, we use two different galaxy distributions: the first with a constant bias $b$ and the second with a redshift-dependent bias $b(z)$. Finally, we use the angular power spectrum estimates obtained by the QML method to derive constraints on the dark energy critical density in a flat $\Lambda$CDM model through different likelihood prescriptions. Using just the cross-correlation between the WMAP7 and NVSS maps at 1.8° resolution, we show that $\Omega_\Lambda$ is about 70\% of the total energy density, disfavouring an Einstein-de Sitter Universe at more than 2$\sigma$ CL (confidence level).
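For reference, the pixel-space QML estimator described above is conventionally written in the form introduced by Tegmark (1997); the exact conventions of the {\it BolISW} implementation may differ. With $\mathbf{x}$ the data vector stacking the CMB and galaxy maps and $\mathbf{C} = \mathbf{S}(C_\ell) + \mathbf{N}$ its total covariance,

$$\hat{C}_\ell = \sum_{\ell'} (F^{-1})_{\ell\ell'} \left[ \mathbf{x}^{T} \mathbf{E}^{\ell'} \mathbf{x} - \mathrm{tr}\!\left( \mathbf{N}\, \mathbf{E}^{\ell'} \right) \right], \qquad \mathbf{E}^{\ell} = \frac{1}{2}\, \mathbf{C}^{-1} \frac{\partial \mathbf{C}}{\partial C_\ell}\, \mathbf{C}^{-1},$$

$$F_{\ell\ell'} = \frac{1}{2}\, \mathrm{tr}\!\left[ \mathbf{C}^{-1} \frac{\partial \mathbf{C}}{\partial C_\ell}\, \mathbf{C}^{-1} \frac{\partial \mathbf{C}}{\partial C_{\ell'}} \right],$$

where $F$ is the Fisher matrix. The estimator is unbiased by construction and saturates the Cramér-Rao bound, which is the "minimum variance" property quoted above.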
Abstract:
Concerns over global change and its effect on coral reef survivorship have highlighted the need for long-term datasets and proxy records to interpret environmental trends and inform policymakers. Citizen science programs have proved to be a valid method for collecting data, reducing financial and time costs for institutions. This study is based on the elaboration of data collected by recreational divers, and its main purposes are to evaluate changes in the state of coral reef biodiversity in the Red Sea over a long-term period and to validate the volunteer-based monitoring method. Volunteer recreational divers completed a questionnaire after each dive, recording the presence of 72 animal taxa and negative reef conditions. Comparisons were made between records from volunteers and independent records from a marine biologist who performed the same dive at the same time. A total of 500 volunteers were tested in 78 validation trials. The values of accuracy, reliability and similarity are comparable to those obtained by volunteer divers on fixed transects in other projects, or in community-based terrestrial monitoring. In total, 9,301 recreational divers participated in the monitoring program, completing 23,059 survey questionnaires over a 5-year period. The volunteer-sightings-based index showed significant differences between the geographical areas. The area of Hurghada is distinguished by a medium-low biodiversity index, heavily damaged by uncontrolled anthropogenic exploitation. Coral reefs along the Ras Mohammed National Park at Sharm el Sheikh, conversely, showed a high biodiversity index. The detected pattern appears to be correlated with the conservation measures adopted. In our experience and that of other research institutes, citizen science can complement conventional methods and significantly reduce costs and time. By involving recreational divers we were able to build a large dataset covering a wide geographic area. The main limitation remains the difficulty of obtaining a homogeneous spatial sampling distribution.
Abstract:
The thesis analyzes the relationships between agricultural development processes and the use of natural resources, particularly energy resources, at the international (developing and developed countries), national (Italy), regional (Emilia Romagna) and farm levels, with the aim of assessing the eco-efficiency of agricultural development processes, its evolution over time, and its main dynamics, also in relation to the problems of dependence on fossil resources, food security, and the substitution between agricultural land dedicated to human food and to animal feed. For the two macroeconomic case studies the methodology called “SUMMA”, SUstainability Multi-method, multi-scale Assessment (Ulgiati et al., 2006), was adopted; it integrates a series of impact categories from life cycle assessment (LCA), cost-benefit evaluations, and the global analysis perspective of emergy accounting. The large-scale analysis was further enriched by a local-scale case study of a farm producing milk and renewable electricity (photovoltaic and biogas). This study, conducted by means of LCA and contingent valuation, assessed the environmental, economic and social effects of scenarios for reducing dependence on fossil sources. The macroeconomic case studies show that, despite policies supporting efficiency gains and “green” forms of production, agriculture at the global level continues to evolve with increasing dependence on fossil energy sources. The first effects of EU agricultural policies aimed at greater sustainability nevertheless seem to be emerging in the European countries. Overall, the energy footprint remains high, since the continuing mechanization of agricultural processes must necessarily draw on energy sources that substitute for human labour. Agricultural land is decreasing in the European countries analyzed and in Italy, increasing the risk of food insecurity, since the national population is instead growing.
Abstract:
The energy released during a seismic crisis in volcanic areas is strictly related to the physical processes within the volcanic structure. In particular, Long Period (LP) seismicity, which seems to be related to the oscillation of a fluid-filled crack (Chouet, 1996; Chouet, 2003; McNutt, 2005), can precede or accompany an eruption. The present doctoral thesis focuses on the study of the LP seismicity recorded at the Campi Flegrei volcano (Campania, Italy) during the October 2006 crisis. Campi Flegrei is an active caldera; the combination of an active magmatic system and a densely populated area makes Campi Flegrei a critical volcano. The source dynamics of LP seismicity are thought to be very different from those of the other kinds of seismicity (tectonic or volcano-tectonic): LP events are characterized by a time-sustained source and a low frequency content. These features imply that the duration magnitude, which is commonly used for VT events and sometimes for LPs as well, is unsuited to LP magnitude evaluation. The main goal of this doctoral work was to develop a method for determining the magnitude of LP seismicity. It is based on comparing the energies of VT and LP events and linking energy to the VT moment magnitude, so that the magnitude of an LP event is the moment magnitude of a VT event with the same energy as the LP. We applied this method to the LP dataset recorded at the Campi Flegrei caldera in 2006, to an LP dataset recorded at Colima volcano in 2005-2006, and to an event recorded at Etna volcano. By testing the method on many waveforms recorded at different volcanoes we verified its ease of application and, consequently, its usefulness in the routine and quasi-real-time work of a volcanological observatory.
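One common way to make such an energy-equivalent magnitude explicit (a sketch; the calibration actually adopted in the thesis may differ) is to invert the standard Gutenberg-Richter energy-magnitude relation, with the radiated energy $E$ in joules:

$$\log_{10} E = 1.5\,M + 4.8 \quad\Longrightarrow\quad M_{LP} = \frac{\log_{10} E_{LP} - 4.8}{1.5}.$$

For example, an LP event radiating $E_{LP} = 10^{7}$ J would be assigned $M_{LP} = (7 - 4.8)/1.5 \approx 1.5$, the moment magnitude of a VT event releasing the same energy.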
Abstract:
The Vrancea region, at the south-eastern bend of the Carpathian Mountains in Romania, represents one of the most puzzling seismically active zones of Europe. Besides some shallow seismicity spread across the whole Romanian territory, Vrancea is the site of intense seismicity, with a cluster of intermediate-depth foci located in a narrow, nearly vertical volume. Although large-scale mantle seismic tomographic studies have revealed the presence of a narrow, almost vertical, high-velocity body in the upper mantle, the nature and geodynamics of this deep intra-continental seismicity are still debated. High-resolution seismic tomography could help to reveal more details of the subcrustal structure of Vrancea. Recent developments in computational seismology, together with the availability of parallel computing, now make it possible to retrieve more information from seismic waveforms and to reach such high-resolution models. This study aimed to evaluate the application of full waveform inversion tomography at the regional scale to the Vrancea lithosphere, using data from the six-month temporary local network CALIXTO deployed in 1999. Starting from a detailed 3D Vp, Vs and density model, built by classical travel-time tomography together with gravity data, I evaluated the improvements obtained with the full waveform inversion approach. The latter proved to be highly problem-dependent and computationally expensive. The model retrieved after the first two iterations does not show large variations with respect to the initial model, but remains in agreement with previous tomographic models. It presents a well-defined high-velocity anomaly with a downgoing slab shape, composed of an N-S horizontal anomaly at depths between 40 and 70 km linked to a nearly vertical NE-SW anomaly from 70 to 180 km.
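Schematically, full waveform inversion iteratively updates the model vector $\mathbf{m}$ (here Vp, Vs and density) to minimize a waveform misfit; a generic least-squares sketch (the misfit functional actually used may differ, e.g. time-frequency phase misfits are common in regional FWI) reads

$$\chi(\mathbf{m}) = \frac{1}{2} \sum_{r} \int \left| \mathbf{u}(\mathbf{m}; \mathbf{x}_r, t) - \mathbf{d}(\mathbf{x}_r, t) \right|^{2} dt, \qquad \mathbf{m}_{k+1} = \mathbf{m}_k - \alpha_k\, \nabla_{\mathbf{m}} \chi(\mathbf{m}_k),$$

where $\mathbf{u}$ are synthetic and $\mathbf{d}$ observed seismograms at receivers $\mathbf{x}_r$. Each gradient is obtained with the adjoint-state method at the cost of one forward and one adjoint wave simulation per source, which is why only a few iterations are typically affordable and why the approach is computationally expensive, as noted above.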
Abstract:
Several decision and control tasks in cyber-physical networks can be formulated as large-scale optimization problems with coupling constraints. In these "constraint-coupled" problems, each agent is associated with a local decision variable subject to individual constraints. This thesis explores the use of primal decomposition techniques to develop tailored distributed algorithms for this challenging set-up over graphs. We first develop a distributed scheme for convex problems over random time-varying graphs with non-uniform edge probabilities. The approach is then extended to unknown cost functions estimated online. Subsequently, we consider Mixed-Integer Linear Programs (MILPs), which are of great interest in smart grid control and cooperative robotics. We propose a distributed methodological framework to compute a feasible solution to the original MILP, with guaranteed suboptimality bounds, and extend it to general nonconvex problems. Monte Carlo simulations highlight that the approach represents a substantial advance over the state of the art, making it a valuable solution for new toolboxes addressing large-scale MILPs. We then propose a distributed Benders decomposition algorithm for asynchronous and unreliable networks. This framework was then used as a starting point to develop distributed methodologies for a microgrid optimal control scenario. We develop an ad-hoc distributed strategy for a stochastic set-up with renewable energy sources, and show a case study with samples generated using Generative Adversarial Networks (GANs). We then introduce a software toolbox named ChoiRbot, based on the novel Robot Operating System 2, and show how it facilitates simulations and experiments in distributed multi-robot scenarios. Finally, we consider a Pickup-and-Delivery Vehicle Routing Problem, for which we design a distributed method inspired by the approach for general MILPs, and show its efficacy through simulations and experiments in ChoiRbot with ground and aerial robots.
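As context for the primal decomposition idea, here is a minimal, self-contained sketch on a toy resource-allocation problem; it is not one of the thesis' algorithms, and the problem data are invented. A master process maintains local allocations y_i summing to a shared budget, each agent solves its own subproblem for fixed y_i, and the multipliers of the local constraints drive a projected subgradient update of the allocations.

    import numpy as np

    # Toy constraint-coupled problem:
    #   minimize   sum_i (x_i - t_i)^2
    #   subject to sum_i x_i <= b
    # Primal decomposition: give agent i an allocation y_i (sum_i y_i = b)
    # and the local constraint x_i <= y_i.
    t = np.array([3.0, 1.0, 2.0])    # hypothetical local targets
    b = 4.0                          # shared budget
    y = np.full(t.size, b / t.size)  # initial feasible allocation
    alpha = 0.1                      # master step size

    for _ in range(200):
        # Agent i solves min (x - t_i)^2 s.t. x <= y_i (closed form here)
        # and reports the KKT multiplier of its local constraint.
        x = np.minimum(t, y)
        lam = np.maximum(0.0, 2.0 * (t - y))
        # Master step: a subgradient of the i-th subproblem value w.r.t. y_i
        # is -lam_i; step, then project back onto the hyperplane sum_i y_i = b.
        y = y + alpha * lam
        y = y - (y.sum() - b) / y.size

    x = np.minimum(t, y)
    print("allocations:", y)                # ~ [7/3, 1/3, 4/3]
    print("solution:", x, "sum:", x.sum())  # sums to the budget b

Note that the coupling constraint is enforced at every master iteration, so the scheme maintains feasibility throughout; this is the property that makes primal decomposition attractive for the constraint-coupled set-ups discussed above.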
Abstract:
Landslides are common features of the landscape of the north-central Apennine mountain range and cause frequent damage to human facilities and infrastructure. Most of these landslides move periodically at moderate velocities, and only after particular rainfall events do some accelerate abruptly. Synthetic aperture radar interferometry (InSAR) provides a particularly convenient method for studying deforming slopes. We use standard two-pass interferometry, taking advantage of the short revisit time of the Sentinel-1 satellites. In this thesis we present the results of the InSAR analysis developed over several study areas in the central and northern Italian Apennines. The aims of the work, described in the articles this thesis comprises, concern: i) the potential of the standard two-pass interferometric technique for the recognition of active landslides; ii) the exploration of the potential of the displacement time series resulting from a two-pass, multiple-time-scale InSAR analysis; iii) the evaluation of the possibility of making comparisons with climate forcing for cognitive and risk-assessment purposes. Our analysis successfully identified more than 400 InSAR deformation signals (IDSs) in the different study areas corresponding to active slope movements. The comparison between IDSs and thematic maps allowed us to identify the main characteristics of the slopes most prone to landslides. The analysis of displacement time series derived from monthly interferometric stacks or single 6-day interferograms allowed the establishment of landslide activity thresholds. This information, combined with the displacement time series, allowed the relationship between ground deformation and climate forcing to be successfully investigated. The InSAR data also made it possible to validate geographical warning systems and to compare the activity state of landslides with triggering probability thresholds.
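For reference, in two-pass interferometry the deformation contribution to the interferometric phase maps directly to line-of-sight (LOS) displacement; this is the standard relation (sign conventions vary between processors):

$$\Delta\phi_{\mathrm{defo}} = -\frac{4\pi}{\lambda}\, d_{\mathrm{LOS}} \quad\Longleftrightarrow\quad d_{\mathrm{LOS}} = -\frac{\lambda}{4\pi}\, \Delta\phi_{\mathrm{defo}},$$

so for Sentinel-1 (C-band, $\lambda \approx 5.55$ cm) one full fringe of $2\pi$ corresponds to $\lambda/2 \approx 2.8$ cm of LOS displacement, which sets the sensitivity of the 6-day interferograms mentioned above.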