9 resultados para Python molurus
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
Bioinformatics is a recent and emerging discipline which aims at studying biological problems through computational approaches. Most branches of bioinformatics such as Genomics, Proteomics and Molecular Dynamics are particularly computationally intensive, requiring huge amount of computational resources for running algorithms of everincreasing complexity over data of everincreasing size. In the search for computational power, the EGEE Grid platform, world's largest community of interconnected clusters load balanced as a whole, seems particularly promising and is considered the new hope for satisfying the everincreasing computational requirements of bioinformatics, as well as physics and other computational sciences. The EGEE platform, however, is rather new and not yet free of problems. In addition, specific requirements of bioinformatics need to be addressed in order to use this new platform effectively for bioinformatics tasks. In my three years' Ph.D. work I addressed numerous aspects of this Grid platform, with particular attention to those needed by the bioinformatics domain. I hence created three major frameworks, Vnas, GridDBManager and SETest, plus an additional smaller standalone solution, to enhance the support for bioinformatics applications in the Grid environment and to reduce the effort needed to create new applications, additionally addressing numerous existing Grid issues and performing a series of optimizations. The Vnas framework is an advanced system for the submission and monitoring of Grid jobs that provides an abstraction with reliability over the Grid platform. In addition, Vnas greatly simplifies the development of new Grid applications by providing a callback system to simplify the creation of arbitrarily complex multistage computational pipelines and provides an abstracted virtual sandbox which bypasses Grid limitations. Vnas also reduces the usage of Grid bandwidth and storage resources by transparently detecting equality of virtual sandbox files based on content, across different submissions, even when performed by different users. BGBlast, evolution of the earlier project GridBlast, now provides a Grid Database Manager (GridDBManager) component for managing and automatically updating biological flatfile databases in the Grid environment. GridDBManager sports very novel features such as an adaptive replication algorithm that constantly optimizes the number of replicas of the managed databases in the Grid environment, balancing between response times (performances) and storage costs according to a programmed cost formula. GridDBManager also provides a very optimized automated management for older versions of the databases based on reverse delta files, which reduces the storage costs required to keep such older versions available in the Grid environment by two orders of magnitude. The SETest framework provides a way to the user to test and regressiontest Python applications completely scattered with side effects (this is a common case with Grid computational pipelines), which could not easily be tested using the more standard methods of unit testing or test cases. The technique is based on a new concept of datasets containing invocations and results of filtered calls. The framework hence significantly accelerates the development of new applications and computational pipelines for the Grid environment, and the efforts required for maintenance. An analysis of the impact of these solutions will be provided in this thesis. This Ph.D. work originated various publications in journals and conference proceedings as reported in the Appendix. Also, I orally presented my work at numerous international conferences related to Grid and bioinformatics.
Resumo:
The main scope of my PhD is the reconstruction of the large-scale bivalve phylogeny on the basis of four mitochondrial genes, with samples taken from all major groups of the class. To my knowledge, it is the first attempt of such a breadth in Bivalvia. I decided to focus on both ribosomal and protein coding DNA sequences (two ribosomal encoding genes -12s and 16s -, and two protein coding ones - cytochrome c oxidase I and cytochrome b), since either bibliography and my preliminary results confirmed the importance of combined gene signals in improving evolutionary pathways of the group. Moreover, I wanted to propose a methodological pipeline that proved to be useful to obtain robust results in bivalves phylogeny. Actually, best-performing taxon sampling and alignment strategies were tested, and several data partitioning and molecular evolution models were analyzed, thus demonstrating the importance of molding and implementing non-trivial evolutionary models. In the line of a more rigorous approach to data analysis, I also proposed a new method to assess taxon sampling, by developing Clarke and Warwick statistics: taxon sampling is a major concern in phylogenetic studies, and incomplete, biased, or improper taxon assemblies can lead to misleading results in reconstructing evolutionary trees. Theoretical methods are already available to optimize taxon choice in phylogenetic analyses, but most involve some knowledge about genetic relationships of the group of interest, or even a well-established phylogeny itself; these data are not always available in general phylogenetic applications. The method I proposed measures the "phylogenetic representativeness" of a given sample or set of samples and it is based entirely on the pre-existing available taxonomy of the ingroup, which is commonly known to investigators. Moreover, it also accounts for instability and discordance in taxonomies. A Python-based script suite, called PhyRe, has been developed to implement all analyses.
Resumo:
A first phase of the research activity has been related to the study of the state of art of the infrastructures for cycling, bicycle use and methods for evaluation. In this part, the candidate has studied the "bicycle system" in countries with high bicycle use and in particular in the Netherlands. Has been carried out an evaluation of the questionnaires of the survey conducted within the European project BICY on mobility in general in 13 cities of the participating countries. The questionnaire was designed, tested and implemented, and was later validated by a test in Bologna. The results were corrected with information on demographic situation and compared with official data. The cycling infrastructure analysis was conducted on the basis of information from the OpenStreetMap database. The activity consisted in programming algorithms in Python that allow to extract data from the database infrastructure for a region, to sort and filter cycling infrastructure calculating some attributes, such as the length of the arcs paths. The results obtained were compared with official data where available. The structure of the thesis is as follows: 1. Introduction: description of the state of cycling in several advanced countries, description of methods of analysis and their importance to implement appropriate policies for cycling. Supply and demand of bicycle infrastructures. 2. Survey on mobility: it gives details of the investigation developed and the method of evaluation. The results obtained are presented and compared with official data. 3. Analysis cycling infrastructure based on information from the database of OpenStreetMap: describes the methods and algorithms developed during the PhD. The results obtained by the algorithms are compared with official data. 4. Discussion: The above results are discussed and compared. In particular the cycle demand is compared with the length of cycle networks within a city. 5. Conclusions
Resumo:
This dissertation explores the link between hate crimes that occurred in the United Kingdom in June 2017, June 2018 and June 2019 through the posts of a robust sample of Conservative and radical right users on Twitter. In order to avoid the traditional challenges of this kind of research, I adopted a four staged research protocol that enabled me to merge content produced by a group of randomly selected users to observe the phenomenon from different angles. I collected tweets from thirty Conservative/right wing accounts for each month of June over the three years with the help of programming languages such as Python and CygWin tools. I then examined the language of my data focussing on humorous content in order to reveal whether, and if so how, radical users online often use humour as a tool to spread their views in conditions of heightened disgust and wide-spread political instability. A reflection on humour as a moral occurrence, expanding on the works of Christie Davies as well as applying recent findings on the behavioural immune system on online data, offers new insights on the overlooked humorous nature of radical political discourse. An unorthodox take on the moral foundations pioneered by Jonathan Haidt enriched my understanding of the analysed material through the addition of a moral-based layer of enquiry to my more traditional content-based one. This convergence of theoretical, data driven and real life events constitutes a viable “collection of strategies” for academia, data scientists; NGO’s fighting hate crimes and the wider public alike. Bringing together the ideas of Davies, Haidt and others to my data, helps us to perceive humorous online content in terms of complex radical narratives that are all too often compressed into a single tweet.
Resumo:
This thesis deals with optimization techniques and modeling of vehicular networks. Thanks to the models realized with the integer linear programming (ILP) and the heuristic ones, it was possible to study the performances in 5G networks for the vehicular. Thanks to Software-defined networking (SDN) and Network functions virtualization (NFV) paradigms it was possible to study the performances of different classes of service, such as the Ultra Reliable Low Latency Communications (URLLC) class and enhanced Mobile BroadBand (eMBB) class, and how the functional split can have positive effects on network resource management. Two different protection techniques have been studied: Shared Path Protection (SPP) and Dedicated Path Protection (DPP). Thanks to these different protections, it is possible to achieve different network reliability requirements, according to the needs of the end user. Finally, thanks to a simulator developed in Python, it was possible to study the dynamic allocation of resources in a 5G metro network. Through different provisioning algorithms and different dynamic resource management techniques, useful results have been obtained for understanding the needs in the vehicular networks that will exploit 5G. Finally, two models are shown for reconfiguring backup resources when using shared resource protection.
Resumo:
Deep learning methods are extremely promising machine learning tools to analyze neuroimaging data. However, their potential use in clinical settings is limited because of the existing challenges of applying these methods to neuroimaging data. In this study, first a data leakage type caused by slice-level data split that is introduced during training and validation of a 2D CNN is surveyed and a quantitative assessment of the model’s performance overestimation is presented. Second, an interpretable, leakage-fee deep learning software written in a python language with a wide range of options has been developed to conduct both classification and regression analysis. The software was applied to the study of mild cognitive impairment (MCI) in patients with small vessel disease (SVD) using multi-parametric MRI data where the cognitive performance of 58 patients measured by five neuropsychological tests is predicted using a multi-input CNN model taking brain image and demographic data. Each of the cognitive test scores was predicted using different MRI-derived features. As MCI due to SVD has been hypothesized to be the effect of white matter damage, DTI-derived features MD and FA produced the best prediction outcome of the TMT-A score which is consistent with the existing literature. In a second study, an interpretable deep learning system aimed at 1) classifying Alzheimer disease and healthy subjects 2) examining the neural correlates of the disease that causes a cognitive decline in AD patients using CNN visualization tools and 3) highlighting the potential of interpretability techniques to capture a biased deep learning model is developed. Structural magnetic resonance imaging (MRI) data of 200 subjects was used by the proposed CNN model which was trained using a transfer learning-based approach producing a balanced accuracy of 71.6%. Brain regions in the frontal and parietal lobe showing the cerebral cortex atrophy were highlighted by the visualization tools.
Resumo:
Nuclear cross sections are the pillars onto which the transport simulation of particles and radiations is built on. Since the nuclear data libraries production chain is extremely complex and made of different steps, it is mandatory to foresee stringent verification and validation procedures to be applied to it. The work here presented has been focused on the development of a new python based software called JADE, whose objective is to give a significant help in increasing the level of automation and standardization of these procedures in order to reduce the time passing between new libraries releases and, at the same time, increasing their quality. After an introduction to nuclear fusion (which is the field where the majority of the V\&V action was concentrated for the time being) and to the simulation of particles and radiations transport, the motivations leading to JADE development are discussed. Subsequently, the code general architecture and the implemented benchmarks (both experimental and computational) are described. After that, the results coming from the major application of JADE during the research years are presented. At last, after a final discussion on the objective reached by JADE, the possible brief, mid and long time developments for the project are discussed.
Resumo:
The cation chloride cotransporters (CCCs) represent a vital family of ion transporters, with several members implicated in significant neurological disorders. Specifically, conditions such as cerebrospinal fluid accumulation, epilepsy, Down’s syndrome, Asperger’s syndrome, and certain cancers have been attributed to various CCCs. This thesis delves into these pharmacological targets using advanced computational methodologies. I primarily employed GPU-accelerated all-atom molecular dynamics simulations, deep learning-based collective variables, enhanced sampling methods, and custom Python scripts for comprehensive simulation analyses. Our research predominantly centered on KCC1 and NKCC1 transporters. For KCC1, I examined its equilibrium dynamics in the presence/absence of an inhibitor and assessed the functional implications of different ion loading states. In contrast, our work on NKCC1 revealed its unique alternating access mechanism, termed the rocking-bundle mechanism. I identified a previously unobserved occluded state and demonstrated the transporter's potential for water permeability under specific conditions. Furthermore, I confirmed the actual water flow through its permeable states. In essence, this thesis leverages cutting-edge computational techniques to deepen our understanding of the CCCs, a family of ion transporters with profound clinical significance.
Resumo:
In recent decades, two prominent trends have influenced the data modeling field, namely network analysis and machine learning. This thesis explores the practical applications of these techniques within the domain of drug research, unveiling their multifaceted potential for advancing our comprehension of complex biological systems. The research undertaken during this PhD program is situated at the intersection of network theory, computational methods, and drug research. Across six projects presented herein, there is a gradual increase in model complexity. These projects traverse a diverse range of topics, with a specific emphasis on drug repurposing and safety in the context of neurological diseases. The aim of these projects is to leverage existing biomedical knowledge to develop innovative approaches that bolster drug research. The investigations have produced practical solutions, not only providing insights into the intricacies of biological systems, but also allowing the creation of valuable tools for their analysis. In short, the achievements are: • A novel computational algorithm to identify adverse events specific to fixed-dose drug combinations. • A web application that tracks the clinical drug research response to SARS-CoV-2. • A Python package for differential gene expression analysis and the identification of key regulatory "switch genes". • The identification of pivotal events causing drug-induced impulse control disorders linked to specific medications. • An automated pipeline for discovering potential drug repurposing opportunities. • The creation of a comprehensive knowledge graph and development of a graph machine learning model for predictions. Collectively, these projects illustrate diverse applications of data science and network-based methodologies, highlighting the profound impact they can have in supporting drug research activities.