10 results for CpGV resistance baculovirus whole genome sequencing

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Abstract:

This PhD thesis is the result of my research activity over the last three years. My main research interest was the evolution of the mitochondrial genome (mtDNA) and its usefulness as a phylogeographic and phylogenetic marker at different taxonomic levels across Metazoa. Methodologically, my main effort was devoted to sequencing complete mitochondrial genomes, using an approach based on Long-PCR and shotgun sequencing. This research project is part of a larger effort to sequence mtDNAs from many different metazoan taxa, and I focused mainly on sequencing and analyzing mtDNAs from selected taxa of bivalves and hexapods (Insecta). Sequenced bivalve mtDNAs are particularly scarce, and my study helped to extend the sampling. I also used the bivalve Musculista senhousia as a model taxon to investigate the molecular mechanisms and evolutionary significance of the aberrant mode of mitochondrial inheritance found in bivalves (Doubly Uniparental Inheritance, see below). In insects, I focused on the genus Bacillus (Insecta, Phasmida). A detailed phylogenetic analysis was performed to assess phylogenetic relationships within the genus and to investigate the placement of Phasmida in the phylogenetic tree of Insecta. The main goal of this part of the study was to extend the taxonomic coverage of sequenced mtDNAs in basal insects, which had been only partially analyzed.

Abstract:

Pediatric acute myeloid leukemia (AML) is a molecularly heterogeneous disease arising from genetic alterations in pathways that regulate self-renewal and myeloid differentiation. While most patients carry recurrent chromosomal translocations, almost 20% of childhood AML cases show no recognizable cytogenetic alteration and are defined as cytogenetically normal (CN)-AML. CN-AML patients have always shown great variability in response to therapy and overall outcome, pointing to unknown genetic changes that are not detectable by conventional analyses but are relevant to the pathogenesis and outcome of AML. The development of novel genome-wide techniques such as next-generation sequencing has tremendously improved our ability to interrogate the cancer genome. Against this background, the aim of this study was to investigate, through whole-transcriptome sequencing (RNA-seq), the mutational landscape of pediatric CN-AML patients negative for all the currently known somatic mutations reported in AML. RNA-seq of diagnostic leukemic blasts from 19 pediatric CN-AML cases revealed a considerable incidence of cryptic chromosomal rearrangements, with 21 putative fusion genes identified. Several of these fusion genes are recurrent and may have prognostic and/or therapeutic relevance. A paradigm is the CBFA2T3-GLIS2 fusion, which has been shown to be a common alteration in pediatric CN-AML and predicts poor outcome. Important findings were also obtained in identifying novel therapeutic targets. On one hand, the identification of the NUP98-JARID1A fusion suggests the use of disulfiram; on the other, we describe alterations that activate tyrosine kinases and provide functional data supporting the use of tyrosine kinase inhibitors to specifically inhibit leukemia cells.
This study provides new insights into the genetic alterations underlying pediatric AML, defines novel prognostic markers and putative therapeutic targets, and may prospectively help ensure correct risk stratification and risk-adapted therapy for the “all-neg” AML subgroup as well.

Abstract:

The continuous growth of genome sequencing projects has produced a huge amount of data over the last 10 years: currently, more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publicly available. However, sequencing a genome by itself only determines its raw nucleotide sequence. This is just the first step of genome annotation, the process of assigning biological information to each sequence. Annotation is carried out at every level of biological information processing, from DNA to protein, and cannot rely on in vitro analyses alone, which become extremely expensive and time-consuming at this scale. In silico methods are therefore needed. The aim of this work was to implement predictive computational methods for fast, reliable, and automated annotation of genomes and proteins starting from amino acid sequences. The first part of the work focused on a new machine-learning-based method for predicting the subcellular localization of soluble eukaryotic proteins. The method, called BaCelLo, was developed in 2006. Its main distinctive feature is its independence from biases in the training dataset, which cause all the other predictors developed so far to over-predict the most represented classes. This was achieved through my own modification of the standard Support Vector Machine (SVM) algorithm, the so-called Balanced SVM. BaCelLo predicts the most important subcellular localizations in eukaryotic cells, and three kingdom-specific predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo was reported to outperform all the state-of-the-art methods then available for this task.
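The abstract describes the Balanced SVM only at a high level. As a rough illustration of the problem it targets (over-prediction of the most represented classes), the sketch below computes per-class weights inversely proportional to class frequency. This is a swapped-in, generic technique (the "balanced" heuristic), not the author's modification of the SVM algorithm, and the function name and class labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and common classes weights below 1,
    so every class contributes the same total weight to a weighted loss.
    This is the generic "balanced" heuristic, not BaCelLo's Balanced SVM.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


if __name__ == "__main__":
    # Hypothetical, deliberately imbalanced localization labels.
    labels = ["nucleus"] * 60 + ["cytoplasm"] * 30 + ["mitochondrion"] * 10
    for cls, weight in sorted(balanced_class_weights(labels).items()):
        print(f"{cls}: {weight:.3f}")
```

With these labels, the 10 mitochondrial examples get weight 3.333 and the 60 nuclear ones 0.556. In scikit-learn, the same weighting is available through `class_weight="balanced"` on `sklearn.svm.SVC`.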
BaCelLo was subsequently used to annotate 5 eukaryotic genomes completely, by integrating it into a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was built by combining, for each amino acid sequence extracted from a genome, the predicted subcellular localization with experimental and similarity-based annotations. In the second part of the work, a new machine-learning-based method was implemented for predicting GPI-anchored proteins. From the raw amino acid sequence, the method predicts both the presence of a GPI anchor (with an SVM) and the position of the post-translational modification event, the so-called ω-site (with a Hidden Markov Model, HMM). The method, called GPIPE, was reported to greatly improve the prediction of GPI-anchored proteins over all previously developed methods. GPIPE predicted up to 88% of the experimentally annotated GPI-anchored proteins while keeping the false-positive rate as low as 0.1%. GPIPE was used to annotate 81 eukaryotic genomes completely, predicting more than 15,000 putative GPI-anchored proteins, 561 of which are found in H. sapiens. On average, 1% of a proteome is predicted to be GPI-anchored. A statistical analysis of the composition of the regions surrounding the ω-site allowed specific amino acid abundances to be defined for the different regions considered. Furthermore, the hypothesis, proposed in the literature, that compositional biases exist among the four major eukaryotic kingdoms was tested and rejected. All the developed predictors and databases are freely available at: BaCelLo http://gpcr.biocomp.unibo.it/bacello eSLDB http://gpcr.biocomp.unibo.it/esldb GPIPE http://gpcr.biocomp.unibo.it/gpipe
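The two figures quoted for GPIPE are a sensitivity (88%) and a false-positive rate (0.1%). The sketch below shows how they are computed and, under an assumed true prevalence of 1% (a hypothetical value, not a figure from the study), what precision they would imply when scanning a whole proteome. Function names and counts are illustrative only.

```python
def sensitivity(tp, fn):
    """Fraction of true positives recovered: TP / (TP + FN)."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Fraction of negatives wrongly predicted positive: FP / (FP + TN)."""
    return fp / (fp + tn)


def expected_precision(sens, fpr, prevalence):
    """Expected precision (positive predictive value) when a predictor is
    applied to a proteome in which a fraction `prevalence` of the proteins
    are true positives."""
    tp_rate = sens * prevalence
    fp_rate = fpr * (1.0 - prevalence)
    return tp_rate / (tp_rate + fp_rate)


if __name__ == "__main__":
    # Hypothetical confusion counts chosen to match the reported rates.
    print(sensitivity(tp=88, fn=12))
    print(false_positive_rate(fp=1, tn=999))
    # Assumed prevalence of 1%, for illustration only.
    print(round(expected_precision(0.88, 0.001, 0.01), 3))
```

Under these assumptions, roughly nine out of ten positive predictions would be correct. This illustrates why a very low false-positive rate matters when the positive class is rare.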

Abstract:

Motivation: A current issue of great interest, from both a theoretical and an applied perspective, is the analysis of biological sequences to uncover the information they encode. The development of new genome sequencing technologies in recent years has opened fundamental new problems, since huge amounts of biological data still await interpretation. Sequencing is only the first step of genome annotation, which consists in assigning biological information to each sequence. Given the large amount of available data, in silico methods have become useful and necessary for extracting relevant information from sequences. The availability of data from genome projects gave rise to new strategies for tackling basic problems of computational biology, such as determining the three-dimensional structures of proteins, their biological functions, and their interactions. Results: The aim of this work was to implement predictive methods that extract information on the properties of genomes and proteins from nucleotide and amino acid sequences, taking advantage of the information provided by comparing genome sequences from different species. The first part of the work describes a comprehensive, large-scale genome comparison of 599 organisms: 2.6 million sequences from 551 prokaryotic and 48 eukaryotic genomes were aligned and clustered on the basis of their sequence identity. This procedure identified classes of proteins that are peculiar to the different groups of organisms. Moreover, the adopted similarity threshold produced clusters that are structurally homogeneous and can be used for the structural annotation of uncharacterized sequences.
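The abstract does not state which clustering rule was applied to the pairwise identities. A minimal sketch, assuming single-linkage clustering (any pair of sequences at or above the identity threshold ends up in the same cluster), implemented with a union-find structure over precomputed identities. The function name, sequence identifiers, identity values, and threshold are all hypothetical.

```python
def cluster_by_identity(ids, pair_identity, threshold):
    """Single-linkage clustering: link sequences whose pairwise identity
    is at least `threshold`, then return the connected components."""
    parent = {i: i for i in ids}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for (a, b), identity in pair_identity.items():
        if identity >= threshold:
            union(a, b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    ids = ["p1", "p2", "p3", "p4", "p5"]
    # Hypothetical identities from an all-against-all alignment.
    pair_identity = {
        ("p1", "p2"): 0.62,
        ("p2", "p3"): 0.45,
        ("p3", "p4"): 0.81,
        ("p1", "p5"): 0.12,
    }
    print(cluster_by_identity(ids, pair_identity, threshold=0.40))
```

Single linkage can chain distant sequences together through intermediates (here p1 and p4 share a cluster without a direct link). A stricter rule, such as requiring coverage or identity to a cluster representative, is a common alternative.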
The second part of the work focuses on characterizing thermostable proteins and on developing tools to predict the thermostability of a protein from its sequence. Using Principal Component Analysis, the codon composition of a non-redundant database of 116 prokaryotic genomes was analyzed, showing that a cross-genomic approach can extract common determinants of thermostability at the genome level, with an overall accuracy of 95% in discriminating thermophilic coding sequences. This result outperforms those of previous studies. We also investigated the effect of multiple mutations on protein thermostability. This issue is of great importance in protein engineering, since thermostable proteins are generally more suitable than their mesostable counterparts for technological applications. A Support Vector Machine-based method was trained to predict whether a set of mutations can enhance the thermostability of a given protein sequence. The resulting predictor achieves 88% accuracy.
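The codon-composition features analyzed by PCA can be sketched as a 64-dimensional frequency vector per coding sequence. This is a minimal illustration, assuming in-frame CDSs and discarding trailing partial codons and codons containing ambiguous bases. The study's exact normalization is not stated, and the names are hypothetical.

```python
from itertools import product

# All 64 codons in a fixed order, so vectors from different genes line up.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds):
    """Return the relative frequency of each of the 64 codons in an
    in-frame coding sequence, in the fixed order of CODONS."""
    seq = cds.upper()
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    # Step through complete codons only; skip any containing N or other symbols.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon in counts:
            counts[codon] += 1
            total += 1
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


if __name__ == "__main__":
    freqs = codon_frequencies("ATGAAAAAGTAA")
    print({c: f for c, f in zip(CODONS, freqs) if f > 0})
```

Stacking one such vector per gene (or per genome) gives the matrix on which a PCA can be run, for example with `numpy.linalg.svd` on the mean-centred matrix.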

Abstract:

DNA topology is an important modifier of DNA functions. Torsional stress is generated when right-handed DNA is either over- or underwound, producing structural deformations that drive, or are driven by, processes such as replication, transcription, recombination, and repair. DNA topoisomerases are molecular machines that regulate the topological state of DNA in the cell. They accomplish this either by passing one DNA strand through a break in the opposing strand, or by passing a region of duplex DNA, from the same or a different molecule, through a double-stranded cut. Because of their ability to cut one or both DNA strands, they are also the targets of some of the most successful anticancer drugs used in standard combination therapies for human cancers. One effective anticancer drug is camptothecin (CPT), which specifically targets DNA topoisomerase 1 (TOP 1). This thesis focused on the role of human TOP 1 during transcription and on the transcriptional consequences of TOP 1 inhibition by CPT in human cell lines. Previous findings showed that TOP 1 inhibition by CPT perturbs RNA polymerase II (RNAP II) density at promoters and along transcribed genes, suggesting an involvement of TOP 1 in RNAP II promoter-proximal pausing. Within the transcription cycle, promoter pausing is a fundamental step whose importance as a means of coupling elongation to RNA maturation is well established. By measuring chromatin-bound nascent RNA transcripts, we showed that TOP 1 inhibition by CPT enhances, in a dose-dependent manner, RNAP II escape from the promoter-proximal pausing site of the human Hypoxia Inducible Factor 1 (HIF-1) and c-MYC genes. This effect depends on Cdk7/Cdk9 activity, since it is reversed by the kinase inhibitor DRB.
Since CPT affects RNAP II by promoting hyperphosphorylation of its Rpb1 subunit, these findings suggest that TOP 1 inhibition by CPT may increase the activity of Cdks, which in turn phosphorylate the Rpb1 subunit of RNAP II and enhance its escape from pausing. Interestingly, the transcriptional consequences of CPT-induced topological stress are wider than expected. CPT increased co-transcriptional splicing of exons 1 and 2 and markedly affected alternative splicing at exon 11. Surprisingly, despite its well-established transcription-inhibitory activity, CPT can trigger the production of a novel long RNA (5’aHIF-1) antisense to the human HIF-1 mRNA, and of a known antisense RNA at the 3’ end of the gene, while decreasing mRNA levels. These effects require TOP 1 and are independent of CPT-induced DNA damage. Thus, when the supercoiling imbalance promoted by CPT occurs at a promoter, it may deregulate RNAP II pausing, increase chromatin accessibility, and activate or derepress antisense transcripts in a Cdk-dependent manner. A changed balance between antisense transcripts and mRNAs may regulate HIF-1 activity and contribute to the control of tumor progression. After focusing our TOP 1 investigations at the single-gene level, we extended the study to the whole genome by developing the “Topo-Seq” approach, which generates a genome-wide map of TOP 1 activity sites in human cells. Preliminary data revealed that TOP 1 preferentially localizes to intragenic regions, in particular at the 5’ and 3’ ends of genes. Surprisingly, upon TOP 1 downregulation, which reduces protein expression by 80%, TOP 1 molecules are mostly localized around the 3’ ends of genes, suggesting that its activity is essential at these regions and can be compensated for at 5’ ends.
The developed procedure is a pioneering tool for detecting TOP 1 cleavage sites across the genome and may open the way to further investigations of the enzyme's roles in different nuclear processes.
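A common way to summarize where mapped sites fall along genes (for example, enrichment at 5’ versus 3’ ends) is a metagene profile. Each site inside a gene is converted to a relative position, 0 at the 5’ end and close to 1 at the 3’ end (respecting strand), and then counted in bins. The sketch below assumes single-base site coordinates and simple (chrom, start, end, strand) gene records with hypothetical values. It is not the Topo-Seq pipeline itself, whose details the abstract does not give.

```python
def metagene_profile(genes, sites, n_bins=10):
    """Count sites per relative-position bin along gene bodies.

    genes: list of (chrom, start, end, strand), 0-based half-open [start, end).
    sites: list of (chrom, position) single-base site coordinates.
    Returns n_bins counts ordered from the 5' end to the 3' end.
    A site inside overlapping genes is counted once per gene.
    """
    profile = [0] * n_bins
    for chrom, pos in sites:
        for g_chrom, start, end, strand in genes:
            if g_chrom != chrom or not (start <= pos < end):
                continue
            length = end - start
            if strand == "-":
                # On the minus strand the 5' end is at end - 1.
                rel = (end - 1 - pos) / length
            else:
                rel = (pos - start) / length
            profile[min(int(rel * n_bins), n_bins - 1)] += 1
    return profile


if __name__ == "__main__":
    # Hypothetical genes and site positions.
    genes = [("chr1", 0, 1000, "+"), ("chr1", 2000, 3000, "-")]
    sites = [("chr1", 5), ("chr1", 995), ("chr1", 500),
             ("chr1", 2990), ("chr1", 2001), ("chr1", 1500)]
    print(metagene_profile(genes, sites))
```

Comparing such profiles between conditions (for example, before and after TOP 1 downregulation) would show shifts between 5’ and 3’ bins. In practice, profiles are usually normalized by the number of genes or by an input control.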

Abstract:

In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has led to enormous progress in the life sciences. Among the most important innovations is microarray technology. It quantifies the expression of thousands of genes simultaneously by measuring the hybridization of a sample from a tissue of interest to probes on a small glass or plastic slide. These data are characterized by a fair amount of random noise, a predictor dimension in the thousands, and a sample size in the dozens. One of the most exciting areas in which microarray technology has been applied is the challenge of deciphering complex diseases such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. These samples are hybridized to microarrays in an effort to find a small number of genes that are strongly correlated with the group of individuals. Even though data analysis methods are now well developed and close to reaching a standard organization (through the efforts of international projects such as the Microarray Gene Expression Data (MGED) Society [1]), it is not uncommon to come across a clinician's question for which no compelling statistical method provides an answer. The contribution of this dissertation to deciphering disease is the development of new approaches to open problems posed by clinicians in specific experimental designs. Chapter 1, after a necessary biological introduction, reviews microarray technologies and all the important steps of an experiment, from the production of the array through quality control to the preprocessing steps used in the data analyses in the rest of the dissertation.
Chapter 2 provides a critical review of standard analysis methods, stressing their main problems. Chapter 3 introduces a method to address unbalanced designs in microarray experiments. In microarray experiments, experimental design is a crucial starting point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples should be collected for the two classes. In some cases, however, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue with a modified version of SAM [2]. MultiSAM consists of reiterated SAM analyses comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC). A list of differentially expressed genes is generated by each SAM application. After 1,000 reiterations, each probe is given a "score" ranging from 0 to 1,000 based on how often it recurs as differentially expressed in the 1,000 lists. The performance of MultiSAM was compared with that of SAM and LIMMA [3] on two data sets simulated from beta and exponential distributions. The results of all three algorithms on low-noise data sets seem acceptable. However, on a real, unbalanced two-channel data set on Chronic Lymphocytic Leukemia, LIMMA finds no significant probe and SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score > 300 and separates the data into two clusters by hierarchical clustering. We also report extra-assay validation in terms of differentially expressed genes. Although standard algorithms perform well on low-noise simulated data sets, MultiSAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. Chapter 4 describes a method to evaluate similarities in a three-class problem by means of the Relevance Vector Machine [4].
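The MultiSAM resampling loop described above can be sketched as follows. This is a minimal illustration, not the author's implementation. SAM's permutation-based significance assessment is replaced by a fixed cut-off on a SAM-style relative difference d = (mean_LPC - mean_MPC) / (s + s0), where s is the pooled standard error. The values s0 = 0.1 and d_cutoff = 2.0, the function names, and the data are assumptions.

```python
import random
import statistics


def sam_like_d(a, b, s0=0.1):
    """SAM-style relative difference d = (mean(a) - mean(b)) / (s + s0),
    where s is the pooled standard error and s0 is an assumed fixed
    fudge factor (real SAM estimates s0 from the data)."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.fmean(a), statistics.fmean(b)
    ss = sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)
    s = ((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / (s + s0)


def multisam_scores(lpc, mpc, n_iter=1000, d_cutoff=2.0, seed=0):
    """Score each probe by how many of `n_iter` balanced comparisons
    call it differentially expressed (score ranges from 0 to n_iter).

    lpc, mpc: dict probe -> list of expression values, one per sample,
              with the same sample order for every probe.
    """
    rng = random.Random(seed)
    n_lpc = len(next(iter(lpc.values())))
    n_mpc = len(next(iter(mpc.values())))
    scores = dict.fromkeys(lpc, 0)
    for _ in range(n_iter):
        # Random subset of MPC samples with the same size as the LPC.
        idx = rng.sample(range(n_mpc), n_lpc)
        for probe in lpc:
            sub = [mpc[probe][i] for i in idx]
            if abs(sam_like_d(lpc[probe], sub)) >= d_cutoff:
                scores[probe] += 1
    return scores


if __name__ == "__main__":
    # Hypothetical data: 3 LPC samples, 10 MPC samples, two probes.
    mpc_vals = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 1.05]
    lpc = {"up": [5.0, 5.2, 4.9], "flat": [1.0, 1.1, 0.9]}
    mpc = {"up": mpc_vals, "flat": mpc_vals}
    print(multisam_scores(lpc, mpc))
```

Probes could then be ranked by score and filtered, for example keeping those with score > 300 as in the study. Drawing the same sample subset for all probes in each iteration preserves the correlation structure between probes.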
In a prognostic and diagnostic clinical framework, differences are not the only thing in microarray data that can play a crucial role. In some cases, similarities can provide useful, and sometimes even more important, information. Given three classes, the goal could be to establish, with a certain level of confidence, whether the third is more similar to the first or to the second. In this work we show that the Relevance Vector Machine (RVM) [2] could be a possible solution to the limitations of standard supervised classification. RVM offers many advantages compared with, for example, its well-known precursor, the Support Vector Machine (SVM) [3]. Among these advantages, the estimate of the posterior probability of class membership is a key feature for addressing the similarity issue. This is a highly important, but often overlooked, feature of any practical pattern recognition system. We focused on a three-class tumor-grade problem, with 67 samples of grade 1 (G1), 54 samples of grade 3 (G3), and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, and then to use G2 as a test set to obtain the probability that each G2 sample belongs to class G1 or class G3. The analysis showed that grade 2 breast cancer samples have a molecular profile more similar to that of grade 1 breast cancer samples. This result had been conjectured in the literature, but no measure of significance had previously been given.
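The strategy of training on two classes and reading class-membership probabilities for a third can be sketched with any classifier that outputs posterior probabilities. The sketch below swaps in a plain logistic regression fitted by batch gradient descent. This is not an RVM, and it is trained here on hypothetical one-feature data. It then reports, for each "G2" sample, the probability of membership in G1. All names, data, and hyperparameters are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent
    on the mean log-loss. xs: floats, ys: 0/1 labels."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            gw += (p - y) * x
            gb += p - y
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def prob_class1(w, b, x):
    """Posterior probability of class 1 (here: G1) for feature value x."""
    return sigmoid(w * x + b)


if __name__ == "__main__":
    # Hypothetical single-feature scores: G1 labelled 1, G3 labelled 0.
    g1 = [2.1, 1.8, 2.4, 2.0, 1.9]
    g3 = [-1.9, -2.2, -1.7, -2.0, -2.3]
    w, b = train_logistic(g1 + g3, [1] * len(g1) + [0] * len(g3))
    # Held-out third class: probability of membership in G1 for each sample.
    g2 = [1.2, 0.8, 1.5, -0.4, 0.9]
    p_g1 = [prob_class1(w, b, x) for x in g2]
    frac = sum(p > 0.5 for p in p_g1) / len(g2)
    print([round(p, 2) for p in p_g1], frac)
```

With real expression data, the single feature would be replaced by many probes, and a sparse Bayesian model such as an RVM would provide the posterior probabilities. The significance of the G1-versus-G3 split of G2 could then be assessed, for example, with a binomial or permutation test.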

Abstract:

The primary cutaneous lymphomas recognized in the WHO/EORTC classification present as “distinct clinical entities” on clinical, morphological, immunophenotypic, and molecular grounds. The CD4+ T-helper lymphocyte phenotype characterizes CTCL, but some entities with an aggressive prognosis show a CD8+ cytotoxic immunophenotype. Numerous cytogenetic (CGH) and gene-expression profiling (GEP) studies of CTCL have been carried out in recent years, revealing many chromosomal aberrations related to cell-cycle control mechanisms. The aim of our study is to evaluate the genomic alterations involved in the tumorigenesis of some aggressive CTCL: extranodal NK/T-cell lymphoma, nasal type; primary cutaneous aggressive epidermotropic lymphoma (AECTCL); and the group of pleomorphic CD8+ PTCL/NOS. Patients' biopsy material was analyzed by array-CGH to identify chromosomal abnormalities; in some AECTCL cases, GEP was also applied to characterize the gene expression profile of the neoplastic cells. The data were evaluated statistically, highlighting the significant recurrent chromosomal alterations of each entity. By CGH, some aberrations common to the entities studied were found: deletion of 9p21.3 and amplification of 17q, 19p13, 19q13.11-q13.32, 12q13, and 16p13.3, which result in deletion of the CDKN2A and CDKN2B genes and activation of the JAK/STAT signaling pathway. Other alterations involve amplification of c-MYC (8q24) and CCND1/CDK4-6 (11q13). In particular, numerous genomic abnormalities were shared between AECTCL and pleomorphic PTCL/NOS cases. GEP of 5 AECTCL cases confirmed altered expression of the CDKN2A, JAK3, and STAT6 genes, which may play a direct role in lymphomagenesis.
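Array-CGH results are often summarized by calling each genomic segment as a gain, a loss, or neutral from its log2 test/reference ratio, and then computing how often each call recurs across cases. A minimal sketch with hypothetical thresholds (±0.25) and hypothetical per-case segment values. The region names are taken from the text, but the study's calling algorithm and cut-offs are not stated, and the function names are illustrative.

```python
from collections import defaultdict


def call_segment(log2_ratio, gain=0.25, loss=-0.25):
    """Classify a segment by its mean log2 ratio (thresholds are assumptions)."""
    if log2_ratio >= gain:
        return "gain"
    if log2_ratio <= loss:
        return "loss"
    return "neutral"


def recurrence(cases):
    """cases: dict case_id -> dict region -> mean log2 ratio.

    Returns dict (region, call) -> fraction of cases with that call,
    for gains and losses only.
    """
    counts = defaultdict(int)
    for segments in cases.values():
        for region, ratio in segments.items():
            call = call_segment(ratio)
            if call != "neutral":
                counts[(region, call)] += 1
    return {key: n / len(cases) for key, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical log2 ratios for three cases.
    cases = {
        "case1": {"9p21.3": -1.1, "8q24": 0.6, "11q13": 0.1},
        "case2": {"9p21.3": -0.9, "8q24": 0.05, "11q13": 0.4},
        "case3": {"9p21.3": -0.2, "8q24": 0.5, "11q13": 0.0},
    }
    for (region, call), freq in sorted(recurrence(cases).items()):
        print(region, call, round(freq, 2))
```

Whether a recurrence frequency is "significant" still requires a statistical test against a background model, which this sketch does not attempt.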
Studying a larger number of cases by GEP and introducing new molecular investigations, such as miRNA analysis and whole-exome and whole-genome sequencing, will make it possible to identify molecular alterations correlated with prognosis and to define new therapeutic targets.

Abstract:

MYC is a transcription factor that can activate transcription of several targets by binding directly to specific DNA sequences (E-boxes) in their promoters. Recent findings have also shown that it can exert its biological role by repressing the transcription of another set of genes. C-MYC can mediate repression of its target genes through interaction with factors bound to promoter regions, rather than through direct recognition of typical E-boxes. In this thesis, we investigated whether MYCN can also repress gene transcription and how this is mechanistically achieved. Expression of TRKA, P75NTR, and ABCC3 is attenuated in aggressive MYCN-amplified tumors, suggesting a causal link between elevated MYCN activity and transcriptional repression of these three genes. We found that MYCN is physically associated with gene promoters in vivo near the transcriptional start sites, and that this association requires interactions with SP1 and/or MIZ-1. Furthermore, we show that this interaction could interfere with the activation functions of SP1 and MIZ-1 by recruiting co-repressors such as DNMT3a or HDACs. In vitro studies suggest that MYCN interacts with SP1, MIZ-1, and HDAC1 through distinct domains, supporting the idea that MYCN may form different complexes by interacting with different proteins. Re-expression of endogenous TRKA and P75NTR upon exposure to TSA sensitizes neuroblastoma cells to NGF-mediated apoptosis, whereas ectopic expression of ABCC3 decreases cell motility without interfering with growth. Finally, using a whole-genome shRNA library, we dissected P75NTR repression in an attempt to identify novel factors inside and/or outside the MYCN complex for future therapeutic approaches. Overall, our results support a model in which MYCN can repress gene transcription by direct interaction with SP1 and/or MIZ-1, and provide further evidence of the importance of Myc-induced transcriptional repression in tumor biology.

Abstract:

Epigenetic variability is a new mechanism for studying human microevolution, because it creates phenotypic diversity both within an individual and within a population. This mechanism is an important reservoir for adaptation in response to new stimuli, and recent studies have shown that selective pressures shape not only the genetic code but also DNA methylation profiles. The aim of this thesis is to study the role of DNA methylation changes in human adaptive processes, considering both the Italian peninsula and macro-geographical areas. A whole-genome analysis of DNA methylation profiles across the Italian peninsula identified some genes whose methylation levels differ between individuals from different Italian districts (South, Centre, and North of Italy). These genes are involved in nitrogen compound metabolism and in the response to pathogens. Considering individuals of different macro-geographical origins (individuals of Asian, European, and African ancestry), more significant DMRs (differentially methylated regions) were identified, located in genes involved in glucuronidation, immune response, and cell communication processes. A "profile" of each ancestry (African, Asian, and European) was described. Moreover, an in-depth analysis of three candidate genes (KRTCAP3, MAD1L and BRSK2) was performed in a cohort of individuals from different countries (Morocco, Nigeria, China, and the Philippines) living in Bologna, to explore genetic and epigenetic diversity. This thesis has also paved the way for applying DNA methylation to the study of historical remains, in particular for estimating the age of individuals from biological samples (such as teeth or blood).
Notably, a mathematical model based on methylation values of DNA extracted from the cementum and pulp of living individuals can estimate chronological age with high accuracy (the median absolute difference between age estimated from DNA methylation and chronological age was 1.2 years).
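The accuracy figure quoted above is a median absolute difference between estimated and chronological age. The sketch below shows how such a figure is computed, together with an ordinary least-squares fit of age on a single methylation value. The study's actual model, which used cementum and pulp methylation and possibly several sites, is not specified here. All numbers and function names are hypothetical.

```python
import statistics


def fit_linear(x, y):
    """Ordinary least squares for y = a + b * x with one predictor.
    Returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def median_absolute_difference(predicted, observed):
    """Median of |predicted - observed| (for example, in years)."""
    return statistics.median(abs(p - o) for p, o in zip(predicted, observed))


if __name__ == "__main__":
    # Hypothetical methylation fractions at one site and chronological ages.
    methylation = [0.20, 0.30, 0.40, 0.50, 0.60]
    ages = [21, 30, 42, 50, 59]
    a, b = fit_linear(methylation, ages)
    predicted = [a + b * m for m in methylation]
    print(round(a, 2), round(b, 2))
    print(round(median_absolute_difference(predicted, ages), 2))
```

Computed on the training samples themselves, as here, the error is optimistic. A figure meant to describe accuracy on new individuals should come from held-out samples or cross-validation.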

Abstract:

The objective of this work is to characterize chromosome 1 of A. thaliana, a small flowering plant used as a model organism in biology and genetics, on the basis of a recent mathematical model of the genetic code. I analyze and compare different portions of the genome: genes, exons, coding sequences (CDS), introns, long introns, intergenic regions, untranslated regions (UTR), and regulatory sequences. To do this, I transformed nucleotide sequences into binary sequences based on the definitions of three different dichotomic classes. Descriptive analysis of the binary strings indicates the presence of regularities in each portion of the genome considered. In particular, there are remarkable differences between coding sequences (CDS and exons) and non-coding sequences, suggesting that the reading frame matters only for coding sequences and that dichotomic classes can be useful for recognizing them. I then assessed whether short-range dependence exists in the binary sequences computed from the different dichotomic classes, using three measures of dependence: the well-known chi-squared test and two entropy-based indices, Mutual Information (MI) and Sρ, a normalized version of the “Bhattacharya Hellinger Matusita distance”. The results show a significant short-range dependence structure only in coding sequences, which is a clue to an underlying error detection and correction mechanism. Further studies are certainly needed to assess how the information carried by dichotomic classes could discriminate between coding and non-coding sequences and, therefore, help unveil the role of the mathematical structure in error detection and correction mechanisms. Still, I have shown the potential of the presented approach for understanding the management of genetic information.
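The two steps described above (nucleotides to binary strings, then a test for short-range dependence) can be sketched as follows. The dichotomic classes of the underlying mathematical model are not reproduced here. The sketch substitutes a hypothetical purine/pyrimidine split (A, G → 1; C, T → 0) and computes a plug-in estimate of the mutual information, in bits, between each symbol and the one `lag` positions later. Function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical dichotomy (purine = 1, pyrimidine = 0); this is NOT one of
# the dichotomic classes of the mathematical model used in the thesis.
PURINE = {"A": 1, "G": 1, "C": 0, "T": 0}


def to_binary(seq, mapping=PURINE):
    """Map each nucleotide to 0/1, skipping symbols outside the mapping."""
    return [mapping[base] for base in seq.upper() if base in mapping]


def mutual_information(bits, lag=1):
    """Plug-in estimate of I(X_t; X_{t+lag}) in bits."""
    pairs = list(zip(bits, bits[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


if __name__ == "__main__":
    for name, seq in [("alternating", "AC" * 50), ("constant", "A" * 20)]:
        bits = to_binary(seq)
        print(name, round(mutual_information(bits, lag=1), 4))
```

The plug-in estimate is biased upward for short sequences. The link to the chi-squared test mentioned above is direct: for the 2×2 table of consecutive pairs, the G statistic equals 2·n·ln(2)·MI (with MI in bits), and under independence it is asymptotically chi-squared distributed with 1 degree of freedom.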