7 results for Genome Sequences

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevance:

70.00%

Publisher:

Abstract:

Motivation: A current issue of great interest, from both a theoretical and an applied perspective, is the analysis of biological sequences to uncover the information they encode. The development of new genome sequencing technologies in recent years has opened new fundamental problems, since huge amounts of biological data still await interpretation. Indeed, sequencing is only the first step of the genome annotation process, which consists in assigning biological information to each sequence. Given the large amount of available data, in silico methods have become useful and necessary for extracting relevant information from sequences. The availability of data from genome projects has given rise to new strategies for tackling the basic problems of computational biology, such as determining the three-dimensional structures of proteins, their biological functions, and their reciprocal interactions. Results: The aim of this work was to implement predictive methods that extract information on the properties of genomes and proteins from nucleotide and amino acid sequences, taking advantage of the information provided by comparing genome sequences from different species. The first part of the work describes a comprehensive large-scale genome comparison of 599 organisms. A total of 2.6 million sequences from 551 prokaryotic and 48 eukaryotic genomes were aligned and clustered on the basis of their sequence identity. This procedure led to the identification of classes of proteins that are peculiar to the different groups of organisms. Moreover, the adopted similarity threshold produced clusters that are structurally homogeneous and can be used for the structural annotation of uncharacterized sequences.
The second part of the work focuses on the characterization of thermostable proteins and on the development of tools that predict the thermostability of a protein from its sequence. The codon composition of a non-redundant database comprising 116 prokaryotic genomes was analyzed by means of Principal Component Analysis, showing that a cross-genomic approach can extract common determinants of thermostability at the genome level, with an overall accuracy of 95% in discriminating thermophilic coding sequences. This result outperforms those obtained in previous studies. Moreover, we investigated the effect of multiple mutations on protein thermostability. This issue is of great importance in protein engineering, since thermostable proteins are generally more suitable than their mesostable counterparts for technological applications. A Support Vector Machine based method was trained to predict whether a set of mutations can enhance the thermostability of a given protein sequence. The developed predictor achieves 88% accuracy.
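As an illustration of the kind of input a codon-composition analysis works with, the following minimal sketch turns a coding sequence into a 64-dimensional vector of relative codon frequencies, which could then be passed to a Principal Component Analysis. The function name `codon_frequencies` and the handling of ambiguous bases are assumptions made for this example; the abstract does not describe the thesis's actual preprocessing.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every sequence maps to a vector of equal length.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(cds):
    """Return the relative frequency of each of the 64 codons in a coding sequence.

    Hypothetical helper: codons with ambiguous bases (e.g. N) are skipped,
    which is an assumption of this sketch.
    """
    seq = cds.upper()
    # Read the sequence in frame, dropping any incomplete trailing codon.
    usable = len(seq) - len(seq) % 3
    triplets = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(t for t in triplets if t in CODON_SET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]
```

Stacking one such vector per coding sequence gives a matrix whose rows can be projected onto principal components to look for separation between thermophilic and mesophilic sequences.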

Relevance:

60.00%

Publisher:

Abstract:

The primary cutaneous lymphomas recognized in the WHO/EORTC classification present as "distinct clinical entities" on clinical, morphological, immunophenotypic, and molecular grounds. The CD4+ T-helper lymphocyte phenotype characterizes CTCLs, but some entities with an aggressive prognosis show a CD8+ cytotoxic immunophenotype. Numerous cytogenetic (CGH) and gene-expression profiling (GEP) studies have been conducted on CTCLs in recent years, and numerous chromosomal aberrations related to cell-cycle control mechanisms have been found. The aim of our study is to evaluate the genomic alterations involved in the tumorigenesis of some aggressive CTCLs: extranodal NK/T-cell lymphoma, nasal type; primary cutaneous aggressive epidermotropic lymphoma (AECTCL); and the group of CD8+ pleomorphic PTCL/NOS. Biopsy material from the patients was analyzed by array-CGH to identify chromosomal abnormalities; in some AECTCL cases GEP was applied, which reveals the gene expression profile of the neoplastic cells. The data obtained were evaluated statistically, highlighting the significant common chromosomal alterations of each entity. In CGH, some aberrations common to the entities studied were found: deletion of 9p21.3 and amplification of 17q, 19p13, 19q13.11-q13.32, 12q13, and 16p13.3, which result in deletion of the CDKN2A and CDKN2B genes and activation of the JAK/STAT signaling pathway. Other alterations define amplification of c-MYC (8q24) and CCND1/CDK4-6 (11q13). In particular, numerous common genomic abnormalities were found in cases of AECTCL and pleomorphic PTCL/NOS. The application of GEP in 5 AECTCL cases confirmed the altered expression of the CDKN2A, JAK3, and STAT6 genes, which could play a direct role in lymphomagenesis.
Studying a larger number of cases with GEP and introducing new molecular investigations, such as the analysis of miRNAs and of whole-exome and whole-genome sequences, will make it possible to identify molecular alterations correlated with prognosis and to define new therapeutic targets.

Relevance:

60.00%

Publisher:

Abstract:

Prokaryotic organisms are one of the most successful forms of life: they are present in all known ecosystems. The vast diversity of bacteria reflects their ability to colonise every environment. Human beings also host trillions of microorganisms in their body districts, including skin, mucosae, and gut. The same kind of symbiosis occurs in all other terrestrial and marine animals, as well as in plants. The term holobiont refers, in a single word, to the system that includes both the host and its symbiotic microbial species. The coevolution of bacteria within their ecological niches reflects the adaptation of both host and guest species, and it is shaped by complex interactions that are pivotal in determining the host state. Nowadays, thanks to current sequencing technologies (Next Generation Sequencing, NGS), we have unprecedented tools for investigating bacterial life by studying prokaryotic genome sequences. The NGS revolution has been sustained by advances in computing, in terms of speed, storage capacity, and algorithm development, and by hardware costs decreasing in line with Moore's Law. Bioinformaticians and computational biologists design and implement ad hoc tools able to analyse high-throughput data and extract valuable biological information. Metagenomics requires the integration of life and computational sciences, and it is uncovering the vast diversity of the bacterial world. The present thesis focuses mainly on the analysis of prokaryotic genomes under different aspects.
Supervised by two groups at the University of Bologna, the Biocomputing group and the group of Microbial Ecology of Health, I investigated three different topics: i) antimicrobial resistance, particularly with respect to missense point mutations involved in the resistant phenotype; ii) bacterial mechanisms involved in xenobiotic degradation, via the computational analysis of metagenomic samples; and iii) the variation of the human gut microbiota through ageing, in elderly and long-lived individuals.

Relevance:

30.00%

Publisher:

Abstract:

The continuous increase in genome sequencing projects has produced a huge amount of data in the last 10 years: currently more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publicly available. However, sequencing a genome yields only raw nucleotide sequences. This is just the first step of the genome annotation process, which deals with assigning biological information to each sequence. Annotation is carried out at every level of the biological information processing mechanism, from DNA to protein, and cannot be accomplished by in vitro analysis alone, which is extremely expensive and time consuming when applied at such a large scale. Thus, in silico methods are needed to accomplish the task. The aim of this work was the implementation of predictive computational methods that allow fast, reliable, and automated annotation of genomes and proteins starting from amino acid sequences. The first part of the work focused on the implementation of a new machine learning based method for predicting the subcellular localization of soluble eukaryotic proteins. The method is called BaCelLo and was developed in 2006. Its main peculiarity is its independence from biases in the training dataset, which cause over-prediction of the most represented examples in all the other predictors developed so far. This important result was achieved through a modification that I made to the standard Support Vector Machine (SVM) algorithm, creating the so-called Balanced SVM. BaCelLo is able to predict the most important subcellular localizations in eukaryotic cells, and three kingdom-specific predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo was reported to outperform all the state-of-the-art methods then available for this prediction task.
BaCelLo was subsequently used to completely annotate 5 eukaryotic genomes by integrating it into a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was developed by integrating, for each amino acid sequence extracted from the genome, the predicted subcellular localization with experimental and similarity-based annotations. In the second part of the work, a new machine learning based method was implemented for the prediction of GPI-anchored proteins. The method predicts from the raw amino acid sequence both the presence of the GPI anchor (by means of an SVM) and the position in the sequence of the post-translational modification event, the so-called ω-site (by means of a Hidden Markov Model, HMM). The method is called GPIPE and was reported to greatly improve the prediction of GPI-anchored proteins over all previously developed methods. GPIPE was able to predict up to 88% of the experimentally annotated GPI-anchored proteins while keeping the false positive rate as low as 0.1%. GPIPE was used to completely annotate 81 eukaryotic genomes, and more than 15000 putative GPI-anchored proteins were predicted, 561 of which are found in H. sapiens. On average, 1% of a proteome is predicted to be GPI-anchored. A statistical analysis of the composition of the regions surrounding the ω-site allowed the definition of specific amino acid abundances in the different regions considered. Furthermore, the hypothesis proposed in the literature that compositional biases are present among the four major eukaryotic kingdoms was tested and rejected. All the developed predictors and databases are freely available at: BaCelLo http://gpcr.biocomp.unibo.it/bacello eSLDB http://gpcr.biocomp.unibo.it/esldb GPIPE http://gpcr.biocomp.unibo.it/gpipe
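The two performance figures quoted for GPIPE correspond to standard confusion-matrix rates: the fraction of annotated GPI-anchored proteins that are recovered (sensitivity) and the fraction of non-anchored proteins wrongly flagged (false positive rate). The sketch below shows how such rates are computed. The counts in the usage lines are hypothetical, chosen only to reproduce the 88% and 0.1% values; they are not data from the thesis.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of actual positives that the predictor recovers."""
    return true_positives / (true_positives + false_negatives)


def false_positive_rate(false_positives, true_negatives):
    """Fraction of actual negatives that the predictor wrongly flags."""
    return false_positives / (false_positives + true_negatives)


# Hypothetical counts for illustration only.
print(sensitivity(88, 12))          # 88 of 100 annotated positives recovered
print(false_positive_rate(1, 999))  # 1 of 1000 negatives wrongly flagged
```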

Relevance:

30.00%

Publisher:

Abstract:

This PhD thesis is the result of my research activity over the last three years. My main research interest centered on the evolution of the mitochondrial genome (mtDNA) and on its usefulness as a phylogeographic and phylogenetic marker at different taxonomic levels in different taxa of Metazoa. From a methodological standpoint, my main effort was dedicated to the sequencing of complete mitochondrial genomes, and the approach to whole-genome sequencing was based on Long-PCR and shotgun sequencing. Moreover, this research project is part of a larger project to sequence mtDNAs in many different metazoan taxa, and I mostly dedicated myself to sequencing and analyzing mtDNAs in selected taxa of bivalves and hexapods (Insecta). Sequences of bivalve mtDNAs are particularly limited, and my study helped to extend the sampling. I also used the bivalve Musculista senhousia as a model taxon to investigate the molecular mechanisms and the evolutionary significance of the aberrant mode of mitochondrial inheritance found in bivalves (Doubly Uniparental Inheritance, see below). In insects, I focused my attention on the genus Bacillus (Insecta Phasmida). A detailed phylogenetic analysis was performed in order to assess phylogenetic relationships within the genus and to investigate the placement of Phasmida in the phylogenetic tree of Insecta. The main goal of this part of my study was to add to the taxonomic coverage of sequenced mtDNAs in basal insects, which had been only partially analyzed.

Relevance:

30.00%

Publisher:

Abstract:

The objective of this work is to characterize chromosome 1 of A. thaliana, a small flowering plant used as a model organism in biology and genetics, on the basis of a recent mathematical model of the genetic code. I analyze and compare different portions of the genome: genes, exons, coding sequences (CDS), introns, long introns, intergenes, untranslated regions (UTR), and regulatory sequences. To accomplish the task, I transformed nucleotide sequences into binary sequences based on the definition of three different dichotomic classes. The descriptive analysis of the binary strings indicates the presence of regularities in each portion of the genome considered. In particular, there are remarkable differences between coding sequences (CDS and exons) and non-coding sequences, suggesting that the frame is important only for coding sequences and that dichotomic classes can be useful for recognizing them. I then assessed the existence of short-range dependence between binary sequences computed on the basis of the different dichotomic classes. I used three different measures of dependence: the well-known chi-squared test and two indices derived from the concept of entropy, namely Mutual Information (MI) and Sρ, a normalized version of the "Bhattacharyya-Hellinger-Matusita distance". The results show a significant short-range dependence structure only for coding sequences, whose existence is a clue to an underlying error detection and correction mechanism. Further studies are needed to assess how the information carried by dichotomic classes could discriminate between coding and non-coding sequences and, therefore, help unveil the role of the mathematical structure in error detection and correction mechanisms. Still, I have shown the potential of the approach for understanding the management of genetic information.
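Of the dependence measures named above, Mutual Information is the simplest to illustrate. The following minimal sketch estimates MI (in bits) between two equal-length binary strings from their empirical joint distribution. The dichotomic class definitions are not reproduced in the abstract, so the sketch takes already binarized strings as input; the function name `mutual_information` is an assumption of this example, not the thesis's implementation.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information, in bits, between two equal-length symbol strings."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))  # counts of co-occurring symbol pairs
    px = Counter(x)             # marginal counts for x
    py = Counter(y)             # marginal counts for y
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Each observed pair contributes p(a,b) * log2( p(a,b) / (p(a) p(b)) ).
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi
```

Two identical, balanced binary strings share one full bit of information, while two strings whose symbols co-occur exactly as often as chance predicts give an MI of zero.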

Relevance:

30.00%

Publisher:

Abstract:

The artisanal food chain is enriched by a wide diversity of local food productions with delightful organoleptic characteristics and valuable nutritional properties. Despite their increasing worldwide popularity and appeal, artisanal facilities face several food safety challenges because their processing conditions are less standardized. In this scenario, recent advances in molecular typing and genomic surveillance (e.g., Whole Genome Sequencing [WGS]) represent an unprecedented solution, capable of inferring sources of contamination and contributing to food safety along the artisanal food continuum. The overall objective of this PhD thesis was to explore potential microbial hazards in different artisanal food productions of animal origin (dairy and meat-derived) typical of the food culture and heritage of Mediterranean countries. Three studies were carried out, specifically focusing on: 1) comparing the seasonal variability of microbiological quality and the potential occurrence of microbial hazards in two batches of Italian artisanal fermented dairy and meat productions; 2) investigating the genetic relationships, virulome, and resistome of foodborne pathogens isolated from dairy and meat-derived productions located in Italy, Spain, Portugal, and Morocco; 3) investigating the population structure, virulome, resistome, and mobilome of Klebsiella spp. isolates collected in study 1, including an extended range of public sequences.