8 results for bioinformatic
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
Here I focus on three main topics that best cover the projects I worked on during my three-year PhD, which I spent in different research laboratories addressing both computational and practical problems in modern molecular genomics. The first topic is the use of a livestock species (the pig) as a model of obesity, a complex human dysfunction. My efforts here concern the detection and annotation of Single Nucleotide Polymorphisms (SNPs). I developed a pipeline for mining human and porcine sequences. Starting from a set of human genes related to obesity, the platform returns a list of annotated porcine SNPs extracted from a new set of potential obesity genes. 565 of these SNPs were analyzed on an Illumina chip to test their involvement in obesity in a population of more than 500 pigs. Results will be discussed. The computational analyses and the experiments were done in collaboration with the Biocomputing group and Dr. Luca Fontanesi, respectively, under the direction of Prof. Rita Casadio at the University of Bologna, Italy. The second topic concerns the development of a methodology, based on Factor Analysis, to simultaneously mine information from different levels of biological organization. With specific test cases we develop models of the complexity of the mRNA-miRNA molecular interaction in brain tumors, measured indirectly by microarray and quantitative PCR. This work was done under the supervision of Prof. Christine Nardini at the “CAS-MPG Partner Institute for Computational Biology” in Shanghai, China (co-founded by the Max Planck Society and the Chinese Academy of Sciences). The third topic concerns the development of a new method to replace the variety of PCR technologies routinely adopted to characterize the unknown flanking DNA regions of a viral integration locus in the human genome after clinical gene therapy.
This new method is entirely based on next-generation sequencing; it reduces the time required to detect insertion sites and decreases the complexity of the procedure. This work was done in collaboration with the group of Dr. Manfred Schmidt at the Nationales Centrum für Tumorerkrankungen (Heidelberg, Germany), supervised by Dr. Annette Deichmann and Dr. Ali Nowrouzi. Finally, as an Appendix I describe an R package for gene network reconstruction that I helped to develop for scientific use (http://www.bioconductor.org/help/bioc-views/release/bioc/html/BUS.html).
Abstract:
The aging process is characterized by a progressive decline in fitness at all levels of physiological organization, from single molecules up to the whole organism. Studies have confirmed inflammaging, a chronic low-level inflammation, as a deeply intertwined partner of the aging process, which may provide the “common soil” upon which age-related diseases develop and flourish. Thus, although inflammation per se is a physiological process, it can rapidly become detrimental if it goes out of control, causing an excessive local and systemic inflammatory response, a striking risk factor for the elderly population. Developing interventions to counteract the establishment of this state is thus a top priority. Diet, among other factors, is a good candidate for regulating inflammation. Building on this consideration, the EU project NU-AGE is now trying to assess whether a Mediterranean diet, fortified for the needs of the elderly population, may help in modulating inflammaging. To do so, NU-AGE enrolled a total of 1250 subjects, half of whom followed a one-year diet, and characterized them by means of the most advanced omics and non-omics analyses. The aim of this thesis was the development of a solid data management pipeline able to cope efficiently with the results of these assays, which are now flowing into a centralized database, ready to be used to test the most disparate scientific hypotheses. At the same time, the work described here encompasses the data analysis of the GEHA project, which focused on identifying the genetic determinants of longevity, with particular emphasis on developing and applying a method for detecting epistatic interactions in human mtDNA. Finally, in an effort to promote the adoption of NGS technologies in everyday pipelines, we developed an NGS variant calling pipeline devoted to solving the sequencing-related issues of mtDNA.
Abstract:
Cardiac morphogenesis is a complex process governed by evolutionarily conserved transcription factors and signaling molecules. The Drosophila cardiac tube is linear and made of 52 pairs of cardiomyocytes (CMs), which express specific transcription factor genes whose human homologues (NKX2-5, GATA4 and TBX5) are implicated in Congenital Heart Diseases (CHDs). It is composed of a rostral portion named aorta and a caudal one called heart, distinguished by morphological and functional differences controlled by Hox genes, key regulators of axial patterning. Overexpression and inactivation of the Hox gene abdominal-A (abd-A), which is expressed exclusively in the heart, revealed that abd-A controls heart identity. The aim of our work is to isolate the heart-specific cis-regulatory sequences of abd-A direct target genes, the realizator genes granting heart identity. In each segment of the heart, four pairs of cardiomyocytes express tinman (tin), homologous to NKX2-5, and acquire strong contractile and automatic rhythmic activities. By tyramide-amplified FISH, we found that seven genes, encoding ion channels, pumps or transporters, are specifically expressed in the Tin-CMs of the heart. We initially used tools available online to identify their heart-specific cis-regulatory modules by looking for Conserved Non-coding Sequences containing clusters of binding sites for various cardiac transcription factors, including Hox proteins. Based on these data we generated several reporter gene constructs and transgenic embryos, but none of them showed reporter gene expression in the heart. In order to identify additional abd-A target genes, we performed microarray experiments comparing the transcriptomes of aorta versus heart and identified 144 genes overexpressed in the heart.
In order to find the heart-specific cis-regulatory regions of these target genes, we developed a new bioinformatic approach in which prediction is based on pattern matching and order statistics. We first retrieved Conserved Non-coding Sequences from the alignment between the D. melanogaster and D. pseudoobscura genomes. We scored combinations of conserved occurrences of ABD-A, ABD-B, TIN, PNR, dMEF2, MADS box, T-box and E-box sites, and we ranked these results using two independent strategies: on the one hand we ranked the putative cis-regulatory sequences according to the best-scoring ABD-A binding sites; on the other hand we ranked them according to the conservation of binding sites. We then integrated the two independently obtained lists and ranked them again to produce a final rank. We generated nGFP reporter construct flies for in vivo validation and identified three 1 kb long heart-specific enhancers. By in vivo and in vitro experiments we are determining whether they are direct abd-A targets, which would demonstrate the role of a Hox gene in the realization of heart identity. The identified abd-A direct target genes may also be targets of the NKX2-5, GATA4 and/or TBX5 homologues tin, pannier and Doc genes, respectively. The identification of sequences coregulated by a Hox protein and the homologues of transcription factors causing CHDs will provide a means to test whether these factors function as Hox cofactors granting cardiac specificity to Hox proteins, increasing our knowledge of the molecular mechanisms underlying CHDs. Finally, it may be investigated whether these Hox targets are involved in CHDs.
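As an illustration of the rank-integration step described in the abstract above, the following minimal Python sketch combines two independent rankings of candidate regions into a final order. The abstract does not say how the two lists were integrated, so the mean-rank rule, the function names and the toy scores are all assumptions made for illustration and should not be read as the author's method.

```python
from typing import Dict, List


def ranks_from_scores(scores: Dict[str, float]) -> Dict[str, int]:
    """Return 1-based ranks, highest score first; ties are broken by name."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}


def integrate_rankings(site_scores: Dict[str, float],
                       conservation_scores: Dict[str, float]) -> List[str]:
    """Combine two independent rankings by mean rank (an assumed rule)."""
    rank_a = ranks_from_scores(site_scores)
    rank_b = ranks_from_scores(conservation_scores)
    shared = set(rank_a) & set(rank_b)  # keep regions scored by both strategies
    return sorted(shared,
                  key=lambda name: ((rank_a[name] + rank_b[name]) / 2, name))


# Hypothetical candidate regions and scores, for illustration only.
site = {"cns_1": 0.9, "cns_2": 0.4, "cns_3": 0.7}
conservation = {"cns_1": 0.2, "cns_2": 0.8, "cns_3": 0.9}
final_order = integrate_rankings(site, conservation)
print(final_order)
```

A mean-rank rule is only one possible choice; a rank product or a method based on order statistics could be substituted inside `integrate_rankings` without changing the rest of the sketch.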
Abstract:
The study of protein expression profiles for biomarker discovery in serum and in mammalian cell populations requires the continuous improvement and combination of protein/peptide separation techniques, mass spectrometry, and statistical and bioinformatic approaches. In this thesis work two different mass spectrometry-based protein profiling strategies were developed and applied to liver diseases and inflammatory bowel diseases (IBDs) for the discovery of new biomarkers. The first, based on bulk solid-phase extraction combined with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) and chemometric analysis of serum samples, was applied to the study of serum protein expression profiles both in IBDs (Crohn’s disease and ulcerative colitis) and in liver diseases (cirrhosis, hepatocellular carcinoma, viral hepatitis). The approach allowed the enrichment of serum proteins/peptides thanks to the large interaction surface between analytes and solid phase, and gave high recovery because the elution step is performed directly on the MALDI target plate. Furthermore, the use of a chemometric algorithm to select the variables with the highest discriminant power made it possible to evaluate patterns of 20-30 proteins involved in the differentiation and classification of serum samples from healthy donors and diseased patients. These protein profiles discriminate among the pathologies with very good classification and prediction abilities. In particular, in the study of inflammatory bowel diseases, the analysis using C18 of 129 serum samples from healthy donors, patients with Crohn’s disease or ulcerative colitis, and inflammatory controls gave a classification ability of 90.7% and a prediction ability of 72.9%. In the study of liver diseases (hepatocellular carcinoma, viral hepatitis and cirrhosis), a prediction ability of 80.6% was achieved using IDA-Cu(II) as the extraction procedure.
The identification of the selected proteins by MALDI-TOF/TOF MS analysis, or by their selective enrichment followed by enzymatic digestion and MS/MS analysis, may give useful information for identifying new biomarkers involved in the diseases. The second mass spectrometry-based protein profiling strategy was based on a label-free liquid chromatography electrospray ionization quadrupole time-of-flight (LC ESI-QTOF MS) differential analysis approach, combined with targeted MS/MS analysis of only the identified differences. The strategy was used for biomarker discovery in IBDs, and in particular in Crohn’s disease. The enriched serum peptidome and the subcellular fractions of intestinal epithelial cells (IECs) from healthy donors and Crohn’s disease patients were analysed. Combining the enrichment step for low molecular weight serum proteins with the LC-MS approach made it possible to evaluate a pattern of peptides derived from specific exoprotease activity in the coagulation and complement activation pathways. Among these peptides, particularly interesting was the discovery of clusters of peptides from fibrinopeptide A, apolipoprotein E and A4, and complement C3 and C4. Further studies are needed to evaluate the specificity of these clusters and validate the results, in order to develop a rapid serum diagnostic test. The label-free LC ESI-QTOF MS differential analysis of the subcellular fractions of IECs from Crohn’s disease patients and healthy donors identified many proteins that could be involved in the inflammation process, among them heat shock protein 70, tryptase alpha-1 precursor, and proteins whose upregulation can be explained by the increased activity of IECs in Crohn’s disease. Follow-up studies for the validation of the results and the in-depth investigation of the inflammation pathways involved in the disease will be performed.
Both mass spectrometry-based protein profiling strategies proved to be useful tools for the discovery of disease biomarkers, which need to be validated in further studies.
Abstract:
The aim of the present study is to understand the properties of a new group of redox proteins that have in common a DOMON-type domain with the characteristics of a cytochrome b. The superfamily of proteins containing a DOMON of this type includes a few protein families. To better characterize this new protein family, the present work addresses both a CyDOM protein (a cytochrome b561) and a protein composed solely of a DOMON (AIR12), both of plant origin. Apoplastic ascorbate can be regenerated from monodehydroascorbate by a trans-plasma membrane redox system which uses cytosolic ascorbate as a reductant and comprises a high-potential cytochrome b. We identified the major plasma membrane (PM) ascorbate-reducible b-type cytochrome of bean (Phaseolus vulgaris) and soybean (Glycine max) hypocotyls as orthologs of the Arabidopsis auxin-responsive gene air12. The protein, which is glycosylated and glycosylphosphatidylinositol-anchored to the external side of the PM in vivo, was expressed in Pichia pastoris in a recombinant form lacking the glycosylphosphatidylinositol-modification signal, and purified from the culture medium. Recombinant AIR12 is a soluble protein predicted to fold into a β-sandwich domain and belonging to the DOMON superfamily. It is shown to be a b-type cytochrome with a symmetrical α-band at 561 nm, and to be fully reduced by ascorbate and fully oxidized by monodehydroascorbate. Redox potentiometry suggests that AIR12 binds two high-potential hemes (Em,7 +135 and +236 mV). Phylogenetic analyses reveal that the auxin-responsive genes AIR12 constitute a new family of plasma membrane b-type cytochromes specific to flowering plants. Although AIR12 is one of the few redox proteins of the PM characterized to date, a role for AIR12 in trans-PM electron transfer would imply interaction with other partners, which are still to be identified.
Another part of the present project was aimed at understanding a soybean protein composed of a DOMON fused with a well-defined b561 cytochrome domain (CyDOM). Various bioinformatic approaches show this protein to be composed of an N-terminal DOMON followed by a b561 domain. The latter contains five transmembrane helices featuring highly conserved histidines, which might bind haem groups. The CyDOM was cloned and expressed in the yeast Pichia pastoris, and spectroscopic analyses were carried out on solubilized yeast membranes. CyDOM clearly shows the properties of a b-type cytochrome. The results show that CyDOM is able to drive an electron flux across the plasma membrane. Voltage clamp experiments demonstrate that Xenopus laevis oocytes transformed with soybean CyDOM exhibit negative electrical currents in the presence of an external electron acceptor. Analogous investigations were carried out with SDR2, a CyDOM of Drosophila melanogaster, which shows an electron transport capacity even higher than that of plant CyDOM. These data reinforce those obtained with plant CyDOM on the one hand, and on the other allow the properties assigned to CyDOM to be attributed to SDR2-like proteins. Regenerated tobacco roots, transiently transformed with the chimeric construct GFP:CyDOM (by A. rhizogenes infection), reveal a plasma membrane localization of CyDOM both in epidermal cells of the elongation zone of roots and in root hairs. In conclusion, although the data presented here await expansion and, in part, clarification, it is safe to say that they open a new perspective on the role of this group of proteins. The biological relevance of the functional and physiological implications of DOMON redox domains seems noteworthy, and it can only increase with future advances in research.
Beyond the finding itself, interesting as it is, of DOMON domains as extracellular cytochromes, the present study shows that cytochrome proteins containing DOMON domains of the “CyDOM” type can transfer electrons through membranes and may represent the most important redox component of the plasma membrane discovered so far.
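The two midpoint potentials reported for AIR12 in the abstract above (Em,7 +135 and +236 mV) can be turned into an expected redox titration curve with the Nernst equation. The sketch below is only illustrative: it assumes two independent one-electron hemes that contribute equally to the signal, at 25 °C, none of which is stated in the abstract, and the function names are hypothetical.

```python
import math

# Physical constants (SI units); the temperature is an assumption (25 °C).
GAS_CONSTANT = 8.314462618      # J mol^-1 K^-1
FARADAY = 96485.33212           # C mol^-1
TEMPERATURE = 298.15            # K
RT_OVER_F_MV = 1000.0 * GAS_CONSTANT * TEMPERATURE / FARADAY  # about 25.7 mV

# Midpoint potentials quoted in the abstract, in mV.
HEME_MIDPOINTS_MV = (135.0, 236.0)


def heme_reduced_fraction(potential_mv: float, midpoint_mv: float) -> float:
    """Nernst equation for a one-electron centre: fraction in the reduced state."""
    return 1.0 / (1.0 + math.exp((potential_mv - midpoint_mv) / RT_OVER_F_MV))


def total_reduced_fraction(potential_mv: float) -> float:
    """Average over both hemes, assuming equal contributions to the signal."""
    fractions = [heme_reduced_fraction(potential_mv, em) for em in HEME_MIDPOINTS_MV]
    return sum(fractions) / len(fractions)


# Tabulate the expected curve over an arbitrary potential range.
for potential in range(0, 401, 50):
    print(f"{potential:4d} mV  reduced fraction = {total_reduced_fraction(potential):.3f}")
```

Fitting measured titration data to a sum of such terms is the usual way to estimate midpoint potentials; the fitting step itself is not shown here.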
Abstract:
Bioinformatic analysis of Group A Streptococcus (GAS) genomes, aimed at the identification of new vaccine antigens, revealed the presence of a gene coding for a putative surface-associated protein, named GAS40, that induces protective antibodies in an animal model of sepsis. The aim of our study was to unravel the involvement of GAS40 in cell division processes and to identify its putative interactor. First, bioinformatic analysis showed that gas40 shares homology with ezrA, a gene coding for a negative regulator of Z-ring formation during cell division. Both scanning and transmission electron microscopy indicated morphological differences between the wild-type and the GAS40 knock-out mutant strains, with the latter showing an impaired capacity to divide, resulting in the formation of very long chains. Moreover, when the localization of the antigen on the bacterial surface was analyzed, we found that in bacteria grown to exponential phase GAS40 specifically localized at the septum, indicating a possible role in cell division. Furthermore, by ELISA and co-sedimentation assays, we found that GAS40 is able to interact with FtsZ, a protein involved in Z-ring formation during cell division. These data, together with the co-localization of GAS40 and FtsZ at the bacterial septum demonstrated by confocal microscopy, strongly support the hypothesis of a key role for GAS40 in bacterial cell division.
Abstract:
Giant Cell Tumor of bone (GCT) is a rare neoplasm that accounts for 5% of bone tumors; although it is considered a tumor with a benign course, it can show local aggressiveness, giving rise to local recurrences in 10-25% of cases, and in 2-4% of cases it metastasizes to the lungs. In this study, miRNA expression was evaluated by miRNA microarray in 10 patients with GCT, 5 with metastases and 5 disease-free; differentially expressed miRNAs were found between the 2 groups of patients, and subsequent validation by Real Time PCR confirmed a significant difference for miR-136 (p=0.04). By bioinformatic analysis with the TargetScan software we identified RANK and NF1B as targets of miR-136 and studied their expression by Real Time PCR in a larger series of patients with metastatic and non-metastatic GCT, finding higher expression of NF1B in the group of metastatic patients, whereas RANK did not show a significant difference. Western Blot analysis revealed higher expression of both proteins in metastatic patients than in non-metastatic ones. Subsequently, an immunohistochemistry study was carried out on a TMA of 163 samples from GCT patients with different clinical courses, which showed significantly higher expression of both targets in patients with metastases than in non-metastatic patients; population analyses by Kaplan-Meier confirmed the correlation between over-expression of RANK and NF1B and relapse with metastasis (p=0.001 and p<0.0005, respectively). The immunohistochemistry study was extended to the proteins most involved in osteolysis, which turn out to have prognostic significance; however, by ROC analysis, the co-over-expression of RANK, RANKL and NF1B is the best model for predicting the onset of metastasis (AUC=0.782, p<0.0005).
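To make the ROC result quoted in the abstract above more concrete, the following minimal Python sketch shows how an area under the ROC curve (AUC) can be computed from a per-patient score. It uses the rank-based definition of AUC, the probability that a randomly chosen metastatic case scores higher than a randomly chosen non-metastatic one. The scores are invented toy values and the function name is hypothetical; the sketch does not reproduce the thesis data or its reported AUC of 0.782.

```python
from typing import Sequence


def roc_auc(scores_positive: Sequence[float],
            scores_negative: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as half a
    correctly ranked pair.
    """
    if not scores_positive or not scores_negative:
        raise ValueError("both groups must contain at least one score")
    correctly_ranked = 0.0
    for positive in scores_positive:
        for negative in scores_negative:
            if positive > negative:
                correctly_ranked += 1.0
            elif positive == negative:
                correctly_ranked += 0.5
    return correctly_ranked / (len(scores_positive) * len(scores_negative))


# Toy expression scores, for illustration only (not data from the thesis).
metastatic = [0.9, 0.6, 0.4]
non_metastatic = [0.5, 0.3, 0.2]
auc = roc_auc(metastatic, non_metastatic)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to a score with no discriminating power and 1.0 to perfect separation of the two groups.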
Abstract:
Technological progress in the field of molecular biology confronts the scientific community with the need to interpret the enormous quantity of biological sequences, whether proteins or nucleic acids, that are progressively filling the databases. In this context bioinformatics plays a role of primary importance. A new level of investigative possibilities has been introduced by Next Generation Sequencing (NGS) technologies, by means of which entire genomes or transcriptomes can be obtained in a short time and at low cost. Among the most relevant applications of NGS are undoubtedly those in oncology, which involve the genomic characterization of tumor tissues and the development of new diagnostic and therapeutic approaches for the treatment of cancer. With NGS analysis it is possible to identify the complete set of variations present in the tumor genome, such as single nucleotide variants, chromosomal rearrangements, insertions and deletions. It should be stressed, however, that the variations found in genes must ultimately be considered from the point of view of their effects at the protein level, since proteins are most directly responsible for the altered phenotypes observed in the tumor cell. Bioinformatic expertise is therefore needed both in the analysis of the data produced by NGS and in the subsequent phases, where it is necessary to annotate the genes contained in the sequenced genome and the related protein structures expressed from it or, as in the case of mutational studies, to evaluate the effect of the genomic variation.
The work presented here belongs to this context: on the one hand, the development of computational methodologies for the annotation of protein sequences and, on the other, the set-up of a pipeline for the analysis of data produced with NGS technologies in oncological applications, whose final aim is the identification and characterization of tumor genetic mutations at the protein level.