4 resultados para Need for Uniqueness Scale
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
The continuous increase of genome sequencing projects produced a huge amount of data in the last 10 years: currently more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publically available. However the sole sequencing process of a genome is able to determine just raw nucleotide sequences. This is only the first step of the genome annotation process that will deal with the issue of assigning biological information to each sequence. The annotation process is done at each different level of the biological information processing mechanism, from DNA to protein, and cannot be accomplished only by in vitro analysis procedures resulting extremely expensive and time consuming when applied at a this large scale level. Thus, in silico methods need to be used to accomplish the task. The aim of this work was the implementation of predictive computational methods to allow a fast, reliable, and automated annotation of genomes and proteins starting from aminoacidic sequences. The first part of the work was focused on the implementation of a new machine learning based method for the prediction of the subcellular localization of soluble eukaryotic proteins. The method is called BaCelLo, and was developed in 2006. The main peculiarity of the method is to be independent from biases present in the training dataset, which causes the over‐prediction of the most represented examples in all the other available predictors developed so far. This important result was achieved by a modification, made by myself, to the standard Support Vector Machine (SVM) algorithm with the creation of the so called Balanced SVM. BaCelLo is able to predict the most important subcellular localizations in eukaryotic cells and three, kingdom‐specific, predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo reported to outperform all the currently available state‐of‐the‐art methods for this prediction task. BaCelLo was subsequently used to completely annotate 5 eukaryotic genomes, by integrating it in a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was developed by integrating, for each aminoacidic sequence extracted from the genome, the predicted subcellular localization merged with experimental and similarity‐based annotations. In the second part of the work a new, machine learning based, method was implemented for the prediction of GPI‐anchored proteins. Basically the method is able to efficiently predict from the raw aminoacidic sequence both the presence of the GPI‐anchor (by means of an SVM), and the position in the sequence of the post‐translational modification event, the so called ω‐site (by means of an Hidden Markov Model (HMM)). The method is called GPIPE and reported to greatly enhance the prediction performances of GPI‐anchored proteins over all the previously developed methods. GPIPE was able to predict up to 88% of the experimentally annotated GPI‐anchored proteins by maintaining a rate of false positive prediction as low as 0.1%. GPIPE was used to completely annotate 81 eukaryotic genomes, and more than 15000 putative GPI‐anchored proteins were predicted, 561 of which are found in H. sapiens. In average 1% of a proteome is predicted as GPI‐anchored. A statistical analysis was performed onto the composition of the regions surrounding the ω‐site that allowed the definition of specific aminoacidic abundances in the different considered regions. Furthermore the hypothesis that compositional biases are present among the four major eukaryotic kingdoms, proposed in literature, was tested and rejected. All the developed predictors and databases are freely available at: BaCelLo http://gpcr.biocomp.unibo.it/bacello eSLDB http://gpcr.biocomp.unibo.it/esldb GPIPE http://gpcr.biocomp.unibo.it/gpipe
Resumo:
This work presents hybrid Constraint Programming (CP) and metaheuristic methods for the solution of Large Scale Optimization Problems; it aims at integrating concepts and mechanisms from the metaheuristic methods to a CP-based tree search environment in order to exploit the advantages of both approaches. The modeling and solution of large scale combinatorial optimization problem is a topic which has arisen the interest of many researcherers in the Operations Research field; combinatorial optimization problems are widely spread in everyday life and the need of solving difficult problems is more and more urgent. Metaheuristic techniques have been developed in the last decades to effectively handle the approximate solution of combinatorial optimization problems; we will examine metaheuristics in detail, focusing on the common aspects of different techniques. Each metaheuristic approach possesses its own peculiarities in designing and guiding the solution process; our work aims at recognizing components which can be extracted from metaheuristic methods and re-used in different contexts. In particular we focus on the possibility of porting metaheuristic elements to constraint programming based environments, as constraint programming is able to deal with feasibility issues of optimization problems in a very effective manner. Moreover, CP offers a general paradigm which allows to easily model any type of problem and solve it with a problem-independent framework, differently from local search and metaheuristic methods which are highly problem specific. In this work we describe the implementation of the Local Branching framework, originally developed for Mixed Integer Programming, in a CP-based environment. Constraint programming specific features are used to ease the search process, still mantaining an absolute generality of the approach. We also propose a search strategy called Sliced Neighborhood Search, SNS, that iteratively explores slices of large neighborhoods of an incumbent solution by performing CP-based tree search and encloses concepts from metaheuristic techniques. SNS can be used as a stand alone search strategy, but it can alternatively be embedded in existing strategies as intensification and diversification mechanism. In particular we show its integration within the CP-based local branching. We provide an extensive experimental evaluation of the proposed approaches on instances of the Asymmetric Traveling Salesman Problem and of the Asymmetric Traveling Salesman Problem with Time Windows. The proposed approaches achieve good results on practical size problem, thus demonstrating the benefit of integrating metaheuristic concepts in CP-based frameworks.
Resumo:
Concerns over global change and its effect on coral reef survivorship have highlighted the need for long-term datasets and proxy records, to interpret environmental trends and inform policymakers. Citizen science programs have showed to be a valid method for collecting data, reducing financial and time costs for institutions. This study is based on the elaboration of data collected by recreational divers and its main purpose is to evaluate changes in the state of coral reef biodiversity in the Red Sea over a long term period and validate the volunteer-based monitoring method. Volunteers recreational divers completed a questionnaire after each dive, recording the presence of 72 animal taxa and negative reef conditions. Comparisons were made between records from volunteers and independent records from a marine biologist who performed the same dive at the same time. A total of 500 volunteers were tested in 78 validation trials. Relative values of accuracy, reliability and similarity seem to be comparable to those performed by volunteer divers on precise transects in other projects, or in community-based terrestrial monitoring. 9301 recreational divers participated in the monitoring program, completing 23,059 survey questionnaires in a 5-year period. The volunteer-sightings-based index showed significant differences between the geographical areas. The area of Hurghada is distinguished by a medium-low biodiversity index, heavily damaged by a not controlled anthropic exploitation. Coral reefs along the Ras Mohammed National Park at Sharm el Sheikh, conversely showed high biodiversity index. The detected pattern seems to be correlated with the conservation measures adopted. In our experience and that of other research institutes, citizen science can integrate conventional methods and significantly reduce costs and time. Involving recreational divers we were able to build a large data set, covering a wide geographic area. The main limitation remains the difficulty of obtaining an homogeneous spatial sampling distribution.
Resumo:
This thesis proposes an integrated holistic approach to the study of neuromuscular fatigue in order to encompass all the causes and all the consequences underlying the phenomenon. Starting from the metabolic processes occurring at the cellular level, the reader is guided toward the physiological changes at the motorneuron and motor unit level and from this to the more general biomechanical alterations. In Chapter 1 a list of the various definitions for fatigue spanning several contexts has been reported. In Chapter 2, the electrophysiological changes in terms of motor unit behavior and descending neural drive to the muscle have been studied extensively as well as the biomechanical adaptations induced. In Chapter 3 a study based on the observation of temporal features extracted from sEMG signals has been reported leading to the need of a more robust and reliable indicator during fatiguing tasks. Therefore, in Chapter 4, a novel bi-dimensional parameter is proposed. The study on sEMG-based indicators opened a scenario also on neurophysiological mechanisms underlying fatigue. For this purpose, in Chapter 5, a protocol designed for the analysis of motor unit-related parameters during prolonged fatiguing contractions is presented. In particular, two methodologies have been applied to multichannel sEMG recordings of isometric contractions of the Tibialis Anterior muscle: the state-of-the-art technique for sEMG decomposition and a coherence analysis on MU spike trains. The importance of a multi-scale approach has been finally highlighted in the context of the evaluation of cycling performance, where fatigue is one of the limiting factors. In particular, the last chapter of this thesis can be considered as a paradigm: physiological, metabolic, environmental, psychological and biomechanical factors influence the performance of a cyclist and only when all of these are kept together in a novel integrative way it is possible to derive a clear model and make correct assessments.