12 results for Complete Genome Sequence
in AMS Tesi di Dottorato - Alm@DL - Universit
Abstract:
In recent decades, the growth of industrial activity and of global food demand, together with the intensified exploitation of natural resources and the pollution directly linked to it, has raised growing public interest in initiatives to regulate food production and to establish modern consumer-protection legislation. This work addresses several important themes related to the marine environment, collecting and presenting data from studies on marine species of commercial interest (Chamelea gallina, Mytilus edulis, Ostrea edulis, Crassostrea gigas, Salmo salar, Gadus morhua). These studies evaluated, from a biochemical and biomolecular point of view, the effects of variations in important physical and chemical parameters (temperature; xenobiotics such as drugs, hydrocarbons and pesticides) on immune-defence cells (haemocytes), on key enzymatic systems involved in xenobiotic biotransformation (the cytochrome P450 complex), and on the related antioxidant defences (superoxide dismutase, catalase, heat shock protein). Oxygen is essential to the biological response of a living organism. Its consumption in normal cellular respiration and in the biotransformation of foreign substances leads to the formation of reactive oxygen species (ROS), which are potentially toxic and can damage biological macromolecules, worsening existing pathologies. These processes can qualitatively alter derived products and also cause a general state of distress that, in the most serious cases, can kill the organism, with significant economic repercussions for the yield of farming, fishing and aquaculture.
This study also applied alternative methodologies currently used in medicine (cytofluorimetry) and in proteomics (two-dimensional electrophoresis, mass spectrometry), with the aim of identifying new biomarkers to complement traditional methods for controlling the quality of food of animal origin. The results highlight several relevant aspects of each experiment: 1. The cytofluorimetric techniques applied to O. edulis and C. gigas could lead to important developments in the search for alternative methods that quickly and precisely identify the origin of a given sample, helping to counter possible food fraud, for example the substitution of a different but morphologically similar species that may also differ in quality. Before the method can be applied in inspection, it must be confirmed by further laboratory tests, including in vivo experiments that evaluate the effect on the whole organism of the factors so far evaluated only on haemocytes in vitro. These elements nevertheless suggest that cytofluorimetric methods could be adapted to the study of animals of food interest before they enter industrial processing, providing useful information on possible sources of contamination that can stimulate the immune defence and alter normal cellular parameters. 2. The C. gallina immune system showed an interesting dose- and time-dependent response to benzo[a]pyrene (B[a]P) exposure, with a significant decrease in the expression and activity of one of the most important antioxidant-defence enzymes in haemocytes and haemolymph.
These data are confirmed by several measurements of physiological parameters which, together with the decrease in 7-ethoxyresorufin-O-deethylase (EROD, linked to xenobiotic biotransformation) activity during exposure, underline the major effects of B[a]P. The detection of basal EROD levels supports the possible presence of the CYP1A subfamily in invertebrates, which is still controversial; CYP1A had never been identified in C. gallina nor isolated in immune cells, as this study instead achieved with the identification of a CYP1A-immunopositive protein (CYP1A-IPP). This protein could prove to be a good biomarker on which to base a simple, quick method giving clear information on the presence of specific pollutants, even at low concentrations, in the environments where these organisms are usually fished before being marketed. 3. This experiment evaluated the effect of the antibiotic chloramphenicol (CA) on an important species of commercial interest, Chamelea gallina. Chloramphenicol is still used in some developing countries, including in veterinary medicine. Controls for its presence in food products of animal origin can prove ineffective when its concentration falls below the sensitivity limit of the instruments usually employed for this type of analysis. The negative effects of CA on CYP1A-IPP highlighted in this work seem to be due to attack by free radicals resulting from the action of the antibiotic, which significantly alters the biotransformation mechanisms. Particularly interesting is the close relationship in C. gallina between SOD/CAT and the CYP450 system, which is actively involved in detoxification, especially given the few similar studies available on molluscs, a group comprising numerous species that enter the food chain and require constant controls to detect possible contamination rapidly and effectively.
4. The investigations on fish (Gadus morhua and Salmo salar) and on a bivalve mollusc (Mytilus edulis) made it possible to evaluate, through 2DE methodologies, different aspects of identifying a biomarker for assessing the health of organisms of food interest and, consequently, the quality of the final product. In the seafood field these techniques are currently used with moderate success only for vertebrates (fish), while their application to invertebrates (molluscs) faces many difficulties. The results of this work highlighted several problems in correctly identifying proteins isolated from animals for which no complete genome sequence currently exists. Identities must then be assigned by comparison with similar proteins from other animal groups, with the risk of obtaining inaccurate data, above all data that disagree with those obtained on the same animals by other authors. Nevertheless, the data obtained in this work after MALDI-ToF analysis remain objective, and the collected spectra could be reanalyzed once the genomic databases for the studied species are updated. 4-A. The investigation of HSP70 isoforms directly induced by different stress phenomena, such as the presence of B[a]P, used two-dimensional electrophoresis in C. gallina, which resolved numerous proteins on 2DE gels and yielded several spots currently being analyzed by MALDI-ToF-MS.
This preliminary work has therefore made it possible to acquire and refine important methodologies for studying cellular parameters and for proteomics, which have shown great potential not only in medical and veterinary applications but also in food inspection, with connections to toxicology and environmental pollution. The study thus contributes to the search for new, rapid methodologies that can strengthen inspection strategies, complementing existing ones while improving the general background of information on the health status of the animal considered, with the possibility, still hypothetical, of replacing traditional techniques in the food field in particular cases.
Abstract:
Grape berry is considered a non-climacteric fruit, but there is some evidence that ethylene plays a role in controlling berry ripening. This PhD thesis aimed to give insight into the role of ethylene and ethylene-related genes in regulating grape berry ripening. A small increase in ethylene concentration one week before véraison was measured in Vitis vinifera L. ‘Pinot Noir’ grapes, confirming previous findings in ‘Cabernet Sauvignon’. In addition, ethylene-related genes were identified in the grapevine genome sequence. As in other species, ethylene biosynthesis and receptor genes are present in grapevine as multigene families, and their expression appeared tissue- or development-specific. All the other elements of the ethylene signal transduction cascade were also identified in the grape genome, among them the ethylene response factors (ERFs), which modulate the transcription of many effector genes in response to ethylene. Seven grapevine ERFs were characterized in this study, showing tissue- and berry-development-specific expression profiles. Two sequences, VvERF045 and VvERF063, seemed likely to be involved in ripening control, given their expression profiles and sequence annotation. VvERF045 was induced before véraison and was specific to the ripe berry; by sequence similarity it is likely a transcriptional activator. VvERF063 showed high sequence similarity to transcriptional repressors, and its expression, very high in green berries, was lowest at véraison and during ripening. To characterize VvERF045 and VvERF063 functionally, a stable transformation strategy was chosen: both sequences were cloned into over-expression and silencing vectors and transferred into grape by Agrobacterium-mediated or biolistic gene transfer. In vitro, transgenic plants over-expressing VvERF045 displayed an epinastic phenotype whose extent correlated with the transgene expression level.
Four pathogen stress-response genes were significantly induced in the transgenic plants, suggesting a putative function of VvERF045 in biotic stress defense during berry ripening. Further molecular analysis of the transgenic plants will help identify the actual VvERF045 target genes and, together with phenotypic characterization of the adult transgenic plants, will allow the role of VvERF045 in berry ripening to be defined in detail.
Abstract:
Clostridium difficile is an obligately anaerobic, Gram-positive, endospore-forming bacterium. Although an opportunistic pathogen, it is one of the most important causes of healthcare-associated infections. While the toxins TcdA and TcdB are the main virulence factors of C. difficile, the factors and processes involved in gut colonization during infection remain unclear. The biofilm-forming ability of bacterial pathogens has been associated with increased antibiotic resistance and chronic recurrent infections, yet little is known about biofilm formation by anaerobic gut species. Biofilm formation could play a role in the virulence and persistence of C. difficile, as seen for other intestinal pathogens. We demonstrate that the C. difficile clinical strain 630 and the outbreak strain R20291 form structured biofilms in vitro. The biofilm matrix consists of proteins, DNA and polysaccharide, and strain R20291 accumulates substantially more biofilm. Using isogenic mutants, we show that the virulence-associated proteins Cwp84, flagella, the putative quorum-sensing regulator LuxS, and Spo0A are required for maximal biofilm formation by C. difficile. Moreover, we demonstrate that bacteria in C. difficile biofilms are more resistant to high concentrations of vancomycin, a drug commonly used to treat C. difficile infection (CDI), and that inhibitory and sub-inhibitory concentrations of the same antibiotic induce biofilm formation. Surprisingly, clinical C. difficile strains from the same outbreak but of different origin differ in biofilm formation. Genome sequence analysis of these strains revealed a single nucleotide polymorphism (SNP) in the anti-σ factor RsbW, which regulates the stress-induced alternative sigma factor B (σB). We further demonstrate that RsbW, a negative regulator of σB, has a role in biofilm formation and sporulation of C. difficile. Our data suggest that biofilm formation by C. difficile is a complex multifactorial process and may be a crucial mechanism for clostridial persistence in the host.
Abstract:
This PhD thesis is the result of my research over the last three years. My main research interest was the evolution of the mitochondrial genome (mtDNA) and its usefulness as a phylogeographic and phylogenetic marker at different taxonomic levels in various metazoan taxa. Methodologically, my main effort was devoted to sequencing complete mitochondrial genomes, using Long-PCR and shotgun sequencing. This research is part of a larger project sequencing mtDNAs across many metazoan taxa, within which I mostly sequenced and analyzed mtDNAs of selected bivalve and hexapod (Insecta) taxa. Bivalve mtDNA sequences are particularly scarce, and my study helped extend the sampling. I also used the bivalve Musculista senhousia as a model taxon to investigate the molecular mechanisms and evolutionary significance of its aberrant mode of mitochondrial inheritance (Doubly Uniparental Inheritance, see below). In insects, I focused on the genus Bacillus (Insecta Phasmida). A detailed phylogenetic analysis was performed to assess relationships within the genus and to investigate the placement of Phasmida in the phylogenetic tree of Insecta. The main goal of this part of the study was to extend the taxonomic coverage of sequenced mtDNAs in basal insects, which had been only partially analyzed.
Abstract:
Over the last few decades, bioinformatics has played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the key problem is to learn as much as possible about its coding regions. Protein sequence annotation is challenging and, given the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Functional Annotation (CAFA), the most accurate methods are those based on transfer by homology, and the most incisive contribution comes from cross-genome comparisons. This thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences, clustered with a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) within clusters after statistical validation, even for multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure when a template is available, using cluster-specific HMM profiles to compute reliable template-to-target alignments even for distantly related proteins (sequence identity < 30%). Other BAR+-based applications developed during my doctorate include the prediction of magnesium-binding sites in human proteins, the classification of the ABC transporter superfamily, and the functional prediction (GO terms) of the CAFA targets. Notably, in the CAFA assessment BAR+ placed among the ten most accurate methods. BAR+ is freely available as a web server for functional and structural protein sequence annotation at http://bar.biocomp.unibo.it/bar2.0.
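The abstract does not give the statistic BAR+ uses to validate a term, so the sketch below only illustrates the general idea of transferring annotations inside a cluster: a term is propagated to unannotated members when a sufficient fraction of the annotated members share it. The function name, the `min_fraction` and `min_annotated` thresholds, and the example term identifiers are hypothetical, and the frequency cut-off is a stand-in for BAR+'s statistical validation, not a description of it.

```python
from collections import Counter

def transfer_terms(cluster, min_fraction=0.5, min_annotated=2):
    """Propose terms for the unannotated members of one cluster (hypothetical rule).

    cluster maps a sequence id to its set of terms (empty if unannotated).
    A term is transferred if at least `min_fraction` of the annotated members
    carry it and the cluster has at least `min_annotated` annotated members;
    both thresholds stand in for a real statistical test.
    """
    annotated = [terms for terms in cluster.values() if terms]
    if len(annotated) < min_annotated:
        return {}
    counts = Counter(term for terms in annotated for term in terms)
    validated = {t for t, c in counts.items() if c / len(annotated) >= min_fraction}
    return {sid: set(validated) for sid, terms in cluster.items() if not terms}


cluster = {
    "P1": {"GO:0005524", "GO:0016887"},
    "P2": {"GO:0005524"},
    "P3": {"GO:0005524", "GO:0016887"},
    "Q1": set(),  # unannotated member
}
print(transfer_terms(cluster))                    # Q1 receives both terms
print(transfer_terms(cluster, min_fraction=0.9))  # Q1 receives only GO:0005524
```

Raising `min_fraction` trades coverage for precision: fewer terms are transferred, but each one is shared more widely within the cluster.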
Abstract:
The continuous increase in genome sequencing projects has produced a huge amount of data over the last 10 years: more than 600 prokaryotic and 80 eukaryotic genomes are currently fully sequenced and publicly available. However, sequencing alone only determines raw nucleotide sequences. It is just the first step of genome annotation, which assigns biological information to each sequence. Annotation is carried out at every level of biological information processing, from DNA to protein, and cannot rely on in vitro procedures alone, which become extremely expensive and time-consuming at this scale. In silico methods are therefore needed. The aim of this work was to implement predictive computational methods for fast, reliable, automated annotation of genomes and proteins starting from amino acid sequences. The first part of the work focused on a new machine-learning method for predicting the subcellular localization of soluble eukaryotic proteins. The method, called BaCelLo, was developed in 2006. Its main peculiarity is its independence from biases in the training dataset, which cause all the other predictors developed so far to over-predict the most represented classes. This was achieved by a modification, which I made, to the standard Support Vector Machine (SVM) algorithm, yielding the so-called Balanced SVM. BaCelLo predicts the most important subcellular localizations in eukaryotic cells, and three kingdom-specific predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo outperformed all the state-of-the-art methods then available for this task.
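The Balanced SVM modification itself is not described in the abstract. As a generic illustration of how training-set imbalance is commonly countered, the sketch below computes per-class weights equal to n_samples / (n_classes × class_count), the same formula scikit-learn uses for `class_weight='balanced'`; in a soft-margin SVM such weights scale the misclassification penalty C for each class. This is a stand-in, not BaCelLo's algorithm, and the localization labels are illustrative.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Per-class weight n_samples / (n_classes * count_in_class).

    Each class then contributes the same total weight (n_samples / n_classes),
    so frequent classes cannot dominate a weighted loss.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Illustrative, imbalanced localization labels (not real data).
labels = ["nucleus"] * 6 + ["cytoplasm"] * 3 + ["mitochondrion"]
weights = balanced_class_weights(labels)
for cls, w in weights.items():
    print(cls, round(w, 3), "total:", round(w * labels.count(cls), 3))
```

Every class ends up with the same total weight, which is the property that prevents the most represented class from being over-predicted.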
BaCelLo was subsequently used to annotate 5 eukaryotic genomes completely, by integrating it into a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, eSLDB, was developed that integrates, for each amino acid sequence extracted from a genome, the predicted subcellular localization with experimental and similarity-based annotations. In the second part of the work, a new machine-learning method was implemented for predicting GPI-anchored proteins. From the raw amino acid sequence, the method efficiently predicts both the presence of the GPI anchor (with an SVM) and the position of the post-translational modification, the so-called ω-site (with a Hidden Markov Model, HMM). The method, called GPIPE, greatly improved prediction performance for GPI-anchored proteins over all previously developed methods: it predicted up to 88% of the experimentally annotated GPI-anchored proteins while keeping the false positive rate as low as 0.1%. GPIPE was used to annotate 81 eukaryotic genomes completely, predicting more than 15000 putative GPI-anchored proteins, 561 of which are in H. sapiens. On average, 1% of a proteome is predicted to be GPI-anchored. A statistical analysis of the composition of the regions surrounding the ω-site defined specific amino acid abundances in each of the regions considered. Furthermore, the hypothesis proposed in the literature that compositional biases exist among the four major eukaryotic kingdoms was tested and rejected. All the developed predictors and databases are freely available at:
BaCelLo http://gpcr.biocomp.unibo.it/bacello
eSLDB http://gpcr.biocomp.unibo.it/esldb
GPIPE http://gpcr.biocomp.unibo.it/gpipe
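To make the ω-site compositional analysis concrete, the following sketch splits a sequence into an upstream window, the ω residue and a downstream window, and reports amino acid frequencies in each. The window lengths, the 0-based `omega` index and the toy sequence are assumptions; the region boundaries used in the thesis are not given in the abstract.

```python
from collections import Counter

def region_composition(seq, omega, upstream=10, downstream=10):
    """Amino acid frequencies upstream of, at, and downstream of the omega-site.

    omega is a 0-based index; the window lengths are hypothetical defaults.
    """
    regions = {
        "upstream": seq[max(0, omega - upstream):omega],
        "omega": seq[omega:omega + 1],
        "downstream": seq[omega + 1:omega + 1 + downstream],
    }
    composition = {}
    for name, region in regions.items():
        counts = Counter(region)
        # An empty region yields an empty dict, so no division by zero occurs.
        composition[name] = {aa: c / len(region) for aa, c in counts.items()}
    return composition


# Toy sequence, omega-site assumed at index 5 (an S).
print(region_composition("MKLLSSGAST", omega=5, upstream=3, downstream=2))
```

Aggregating these per-protein frequencies over many predicted ω-sites would give region-specific abundances of the kind the abstract describes.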
Abstract:
Motivation: A current issue of great interest, from both a theoretical and an applied perspective, is the analysis of biological sequences to disclose the information they encode. The development of new genome sequencing technologies in recent years has opened new fundamental problems, since huge amounts of biological data still await interpretation. Sequencing is only the first step of genome annotation, which consists in assigning biological information to each sequence. Given the large amount of available data, in silico methods have become useful and necessary for extracting relevant information from sequences. The availability of data from genome projects has given rise to new strategies for tackling basic problems of computational biology, such as determining the three-dimensional structures of proteins, their biological functions and their mutual interactions.
Results: The aim of this work was to implement predictive methods that extract information on the properties of genomes and proteins from nucleotide and amino acid sequences, exploiting the comparison of genome sequences from different species. The first part of the work describes a comprehensive large-scale genome comparison of 599 organisms: 2.6 million sequences from 551 prokaryotic and 48 eukaryotic genomes were aligned and clustered on the basis of their sequence identity. This procedure identified classes of proteins peculiar to the different groups of organisms. Moreover, the adopted similarity threshold produced structurally homogeneous clusters that can be used for the structural annotation of uncharacterized sequences.
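The abstract states only that sequences were clustered on the basis of sequence identity. One simple reading, assumed here for illustration, is single-linkage clustering: any two sequences whose identity meets a threshold end up in the same cluster. The sketch implements this with a union-find structure over the hits of an all-against-all alignment; the threshold and the identities are hypothetical, and real pipelines usually add further criteria such as alignment coverage.

```python
def cluster_by_identity(ids, hits, threshold):
    """Single-linkage clusters from (id_a, id_b, identity) alignment hits."""
    parent = {i: i for i in ids}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity in hits:
        if identity >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


ids = ["A", "B", "C", "D", "E"]
hits = [("A", "B", 0.80), ("B", "C", 0.60), ("D", "E", 0.30)]  # hypothetical identities
print(cluster_by_identity(ids, hits, 0.5))  # [['A', 'B', 'C'], ['D'], ['E']]
```

Note that A and C join the same cluster through B even without a direct hit; this transitivity is what makes the threshold choice critical for cluster homogeneity.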
The second part of the work focuses on characterizing thermostable proteins and on developing tools that predict a protein's thermostability from its sequence. Using Principal Component Analysis, the codon composition of a non-redundant database of 116 prokaryotic genomes was analyzed, showing that a cross-genomic approach can extract common determinants of thermostability at the genome level, with an overall accuracy of 95% in discriminating thermophilic coding sequences. This result outperforms those of previous studies. We also investigated the effect of multiple mutations on protein thermostability. This issue is of great importance in protein engineering, since thermostable proteins are generally more suitable than their mesostable counterparts for technological applications. A Support Vector Machine-based method was trained to predict whether a set of mutations enhances the thermostability of a given protein sequence; the resulting predictor achieves 88% accuracy.
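The PCA in this work operates on codon composition. A minimal sketch of the input side, assuming in-frame coding sequences and plain relative codon frequencies (the exact normalization used in the thesis is not stated), turns each CDS into a 64-dimensional vector; stacking one vector per sequence gives the matrix that PCA would then decompose. The PCA step itself is omitted.

```python
from itertools import product

# The 64 codons in a fixed order (TTT, TTC, TTA, TTG, TCT, ...).
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]

def codon_composition(cds):
    """Relative frequencies of the 64 codons in an in-frame coding sequence.

    A trailing incomplete codon is ignored, and codons containing symbols
    other than A, C, G, T (e.g. N) are skipped.
    """
    cds = cds.upper()
    counts = dict.fromkeys(CODONS, 0)
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]


# Hypothetical toy CDS: ATG AAA AAG TAA
vector = codon_composition("ATGAAAAAGTAA")
print(len(vector), vector[CODONS.index("AAA")])  # 64 0.25
```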
Abstract:
The objective of this work is to characterize chromosome 1 of A. thaliana, a small flowering plant used as a model organism in biology and genetics, on the basis of a recent mathematical model of the genetic code. I analyze and compare different portions of the genome: genes, exons, coding sequences (CDS), introns, long introns, intergenic regions, untranslated regions (UTR) and regulatory sequences. To do so, I transformed nucleotide sequences into binary sequences according to the definitions of three different dichotomic classes. Descriptive analysis of the binary strings indicates regularities in each portion of the genome considered. In particular, there are remarkable differences between coding sequences (CDS and exons) and non-coding sequences, suggesting that the reading frame matters only for coding sequences and that dichotomic classes can help recognize them. I then assessed the existence of short-range dependence in the binary sequences computed from the different dichotomic classes, using three measures of dependence: the well-known chi-squared test and two entropy-based indices, Mutual Information (MI) and Sρ, a normalized version of the “Bhattacharya Hellinger Matusita distance”. The results show a significant short-range dependence structure only in coding sequences, which hints at an underlying error detection and correction mechanism. Further studies are certainly needed to assess how the information carried by dichotomic classes could discriminate between coding and non-coding sequences and thus help unveil the role of the mathematical structure in error detection and correction mechanisms. Still, I have shown the potential of the presented approach for understanding how genetic information is managed.
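The three dichotomic classes of the underlying model are not defined in this abstract, so the sketch uses the purine/pyrimidine split (A, G → 1; C, T → 0) purely as an illustrative binary mapping. It then computes a plug-in estimate of the mutual information between symbols separated by a given lag, one of the dependence measures named above; the chi-squared test and Sρ are not shown.

```python
from collections import Counter
from math import log2

PURINES = set("AG")  # illustrative dichotomy, not necessarily one of the model's classes

def to_binary(seq):
    """Map A/G to 1 and C/T to 0; other symbols are dropped (a simplification)."""
    return [1 if base in PURINES else 0 for base in seq.upper() if base in "ACGT"]

def mutual_information(bits, lag=1):
    """Plug-in estimate, in bits, of I(X_t; X_{t+lag}) for a binary sequence."""
    pairs = list(zip(bits, bits[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * log2(p_xy / ((left[x] / n) * (right[y] / n)))
    return mi


print(mutual_information(to_binary("AC" * 50)))  # close to 1 bit: each symbol fixes the next
print(mutual_information(to_binary("A" * 20)))   # 0.0: a constant sequence carries no information
```

Computing this value for lags 1, 2, 3 and comparing coding with non-coding regions is one way to probe the frame-dependent short-range structure the abstract reports.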
Abstract:
The recent advent of next-generation sequencing technologies has revolutionized genome analysis. These technologies provide deeper information at lower cost and in less time, and produce data that are discrete counts. One of the most important applications of these data is differential analysis, that is, investigating whether a gene exhibits different expression levels under two (or more) biological conditions (such as disease states or treatments received). Statistically, the final aim is hypothesis testing, and the Negative Binomial distribution is considered the most adequate model for these data, especially because it allows for overdispersion. However, estimating the dispersion parameter is a very delicate issue, because little information is usually available for it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates, and in this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled type I errors. We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Three consistent statistical tests for differential expression analysis are then developed. We show that the proposed method improves sensitivity in detecting differentially expressed genes compared with common procedures, since it comes closest to the nominal type I error rate while keeping high power. Finally, the method is illustrated on prostate cancer RNA-seq data.
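For reference, a common Negative Binomial parameterization writes Var(Y) = μ + φμ², where φ ≥ 0 is the dispersion and φ = 0 recovers the Poisson model. The sketch below computes the simple method-of-moments plug-in estimate φ̂ = (s² − x̄)/x̄² for a single gene; with only a few replicates it is very noisy, which illustrates why plug-in estimates are delicate. It is not the mixture model proposed in the thesis.

```python
from statistics import mean, variance

def nb_dispersion_mom(counts):
    """Method-of-moments plug-in estimate of phi, assuming Var(Y) = mu + phi * mu**2.

    Needs at least two replicate counts. Returns 0.0 when the sample variance
    does not exceed the mean (no overdispersion detectable beyond Poisson).
    """
    m = mean(counts)
    if m == 0:
        return 0.0
    s2 = variance(counts)  # sample variance with n - 1 denominator
    return max(0.0, (s2 - m) / m ** 2)


print(nb_dispersion_mom([10, 20, 30]))  # (100 - 20) / 20**2 = 0.2
print(nb_dispersion_mom([5, 5, 5]))     # 0.0
```

Plugging such an estimate into a test as if it were the true φ ignores its sampling error, which is the estimation/testing mismatch the abstract identifies as a source of uncontrolled type I errors.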
Abstract:
In this thesis we show that the DNA sequence is constantly shaped by interactions with its environment at multiple levels, bearing footprints of DNA methylation, of its 3D organization and, in bacteria, of the interaction with host organisms. In the first chapter, we show that analyzing the distribution of distances between consecutive dinucleotides of the same type along the sequence reveals epigenetic and structural footprints. In particular, the CG distance distribution distinguishes among organisms of different biological complexity, depending on how many CG sites are involved in DNA methylation. Moreover, CG and TA can be described by the same fitting function, suggesting a relationship between the two. We also interpret the observed trend by simulating a positioning process with and without memory. Finally, we focus on the TA distance distribution, characterizing deviations from the trend predicted by the best-fitting function and identifying specific patterns that might be related to peculiar mechanical properties of DNA as well as to epigenetic and structural processes. In the second chapter, we show how the 3D structure of DNA can be mapped onto its sequence. In particular, we devised a network-based algorithm that produces a genome assembly from its 3D configuration, using Hi-C contact maps as input. Specifically, we show how the different chromosomes can be identified and their sequences reconstructed by exploiting the spectral properties of the network's Laplacian operator. In the third chapter, we present a novel network-based method for source clustering and source attribution that identifies host-bacteria interactions from Single-Nucleotide Polymorphisms detected along bacterial genome sequences.
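The distance statistic of the first chapter can be sketched as follows, assuming that the distance between consecutive occurrences is measured between their start positions (the exact convention is not given in the abstract). The sketch collects these distances for a chosen dinucleotide and tabulates their distribution; fitting a function to the histogram is left out.

```python
from collections import Counter

def dinucleotide_distances(seq, dinuc="CG"):
    """Distances between start positions of consecutive occurrences of dinuc."""
    seq = seq.upper()
    starts = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]
    return [b - a for a, b in zip(starts, starts[1:])]

def distance_distribution(seq, dinuc="CG"):
    """Histogram {distance: count}; this is the curve a fitting function would model."""
    return dict(sorted(Counter(dinucleotide_distances(seq, dinuc)).items()))


toy = "ACGTTCGACGCG"  # hypothetical sequence with CG starting at 1, 5, 8, 10
print(dinucleotide_distances(toy))  # [4, 3, 2]
print(distance_distribution(toy))   # {2: 1, 3: 1, 4: 1}
```

Overlapping occurrences would be counted for self-overlapping dinucleotides such as AA, but CG and TA cannot overlap themselves, so the question does not arise for the two cases discussed in the abstract.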
Abstract:
Parvovirus B19 (B19V) is a ssDNA virus with a 5596-nt genome encapsidated within an icosahedral capsid 22 nm in diameter. Viral proteins are divided into non-structural and structural: the main non-structural protein is NS1, while the two structural proteins VP1 and VP2 assemble to form the capsid shell. B19V tropism is mainly restricted to erythroid progenitor cells (EPCs); however, the virus can be detected in several body districts, possibly persisting in tissues lifelong. The virus can induce anemia and erythroid aplasia. Therapy is only symptomatic, so the search for antivirals is very active, with screenings showing in vitro activity of compounds such as hydroxyurea, cidofovir and brincidofovir. In the first project, a functional B19V minigenome expressing only the NS1 protein was developed. This minigenome replicated and expressed NS1 at levels comparable to unmodified clones. Furthermore, it was shown to complement the function of NS1-deficient genomes, providing a proof of concept that the B19V genome can be edited and, at the same time, a useful tool for studying NS1, including as an antiviral target. In the second project I addressed the interplay between B19V and the cellular restriction factor APOBEC3B (A3B), a cytidine deaminase acting on ssDNA whose footprint on the B19V genome had been shown by a bioinformatic sequence analysis performed by the host lab. To understand whether A3B still exerts activity and a potential antiviral effect on B19V, UT7/EpoS1 cells were transduced with lentiviral vectors to silence A3B expression and then used as a model to study viral behavior. No significant role of A3B on B19V was demonstrated, consistent with the hypothesis that the virus has adapted to this restriction factor; however, the ability of the virus to alter A3B expression deserves further investigation.
Abstract:
Bivalvia is an ancient taxon of around 25,000 living species that have adapted to a wide range of environmental conditions and show great diversity in body size, shell shape and anatomy. Bivalves are characterized by highly variable genome sizes and extremely high levels of heterozygosity, which hamper complete and accurate genome assembly and further genomic studies. Moreover, some bivalve species show a stable evolutionary exception to the strictly maternal inheritance of mitochondria, namely doubly uniparental inheritance (DUI), making them a precious model for studying mitochondrial biology. During my PhD I focused on a DUI species, the Manila clam Ruditapes philippinarum, and my work had two parts. First, taking advantage of a newly assembled draft genome and a large RNA-seq dataset from different tissues of both sexes, I investigated 1) the role of gene expression and alternative splicing in tissue differentiation; 2) the relationships among tissue specificity, regulatory network connectivity and sequence evolution; 3) sex-contrasting genetic markers potentially associated with sexual differentiation. This part is detailed in Chapter 2. Second, using the same RNA-seq data, I investigated how nuclear oxidative phosphorylation (OXPHOS) genes coordinate with the two divergent mitochondrial genomes of DUI species (mito-nuclear coordination and coevolution). To address this question, Chapter 3 compares transcription, polymorphism and synonymous codon usage in the mitochondrial and nuclear OXPHOS genes of R. philippinarum. To my knowledge, this thesis is the first study to explore the role of alternative splicing in tissue differentiation, and the first to analyze both transcriptional regulation and sequence evolution to investigate the coordination of OXPHOS genes in bivalves.
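Synonymous codon usage is often summarized with relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all codons for that amino acid were used equally. The sketch below assumes the standard genetic code (NCBI table 1); bivalve mitochondrial genes use the invertebrate mitochondrial code (NCBI table 5), so a different `code` table would be needed for them. The codon-usage metrics actually used in the thesis are not specified in the abstract.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}

def rscu(cds, code=STANDARD_CODE):
    """Relative synonymous codon usage for one in-frame coding sequence.

    RSCU = observed codon count / (amino acid count / number of synonymous codons).
    Amino acids absent from the sequence are omitted from the result.
    """
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    synonyms = defaultdict(list)
    for codon, aa in code.items():
        synonyms[aa].append(codon)
    values = {}
    for aa, codons in synonyms.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


usage = rscu("CTGCTGCTGCTT")  # toy sequence: three CTG and one CTT (both leucine)
print(usage["CTG"], usage["CTT"], usage["TTA"])  # 4.5 1.5 0.0
```

An RSCU of 1 means no bias; values above 1 mark preferred codons, which is the kind of signal one would compare between mitochondrial and nuclear OXPHOS genes.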