8 results for Sequencing data

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevance:

60.00%

Publisher:

Abstract:

Eukaryotic ribosomal DNA constitutes a multigene family organized in a cluster called the nucleolar organizer region (NOR); this region is usually composed of hundreds to thousands of tandemly repeated units. Ribosomal genes, being repeated sequences, evolve following the typical pattern of concerted evolution. The autonomous retroelement R2 inserts in the 28S ribosomal gene, leading to defective 28S rDNA genes. The R2 element, being a retrotransposon, multiplies its copy number in the genome through a “copy and paste” mechanism called target primed reverse transcription. This consists of the reverse transcription of the element’s mRNA into DNA, which is then integrated at the target site. Since reverse transcription can be interrupted while integration is carried out anyway, truncated copies of the element are also present in the genome. The study of these truncated variants is a tool to examine the activity of the element. R2 phylogeny appears, in general, inconsistent with that of its hosts, with some exceptions (e.g. Drosophila spp. and Reticulitermes spp.); moreover, R2 is absent in some species (Fugu rubripes, human, mouse, etc.), while other species have multiple R2 lineages in their genome (the turtle Mauremys reevesii, the Japanese beetle Popillia japonica, etc.). The R2 elements presented here were isolated in four species of notostracan branchiopods and in two species of stick insects, whose reproductive strategies range from strict gonochorism to unisexuality. Sequencing data show that in Triops cancriformis (Spanish gonochoric population), in Lepidurus arcticus (two putatively unisexual populations from Iceland) and in Bacillus rossius (gonochoric population from Capalbio) the R2 elements are complete and encode functional proteins, reflecting the general features of this family of transposable elements. On the other hand, R2 from Italian and Austrian populations of T. cancriformis (respectively unisexual and hermaphroditic), Lepidurus lubbocki (two elements within the same Italian population, gonochoric but with non-functional males) and Bacillus grandii grandii (gonochoric population from Ponte Manghisi) have sequences that encode incomplete or non-functional proteins in which only part of the characteristic domains can be recognized. In Lepidurus couesii (Italian gonochoric populations) different elements were found, as in L. lubbocki, and the sequencing is still in progress. Two hypotheses have been proposed to explain the inconsistency between R2 and host phylogenies: vertical inheritance of the element followed by extinction/diversification, or horizontal transmission. My data support previous studies that indicate vertical transmission as the most likely explanation; nevertheless, horizontal transfer events cannot be excluded. I also studied the element’s activity in Spanish populations of T. cancriformis, in L. lubbocki, in L. arcticus and in gonochoric and parthenogenetic populations of B. rossius. In gonochoric populations of T. cancriformis and B. rossius I found that each individual has its own private set of truncated variants. The situation is the opposite for the remaining hermaphroditic/parthenogenetic species and populations, in which all individuals share the majority of variants in the samples analyzed so far. This is very interesting, because it is not concordant with the Muller’s ratchet theory, which hypothesizes that parthenogenetic populations are either devoid of transposable elements or overloaded with them. My data suggest a possible epigenetic mechanism that can block retrotransposon activity, so that deleterious mutations do not accumulate.

Relevance:

60.00%

Publisher:

Abstract:

Autism Spectrum Disorder (ASD) is a heterogeneous and highly heritable neurodevelopmental disorder with a complex genetic architecture, consisting of a combination of common low-risk variants and more penetrant rare variants. This PhD project aimed to explore the contribution of rare variants to ASD susceptibility through NGS approaches in a cohort of 106 ASD families including 125 ASD individuals. First, I explored the contribution of inherited rare variants to the ASD phenotype in a girl with a maternally inherited pathogenic NRXN1 deletion. Whole exome sequencing of the family trio identified an increased burden of deleterious variants in the proband that could modulate the CNV penetrance and determine the development of the disease. In the second part of the project, I investigated the role of rare variants emerging from whole genome sequencing in ASD aetiology. To properly manage and analyse sequencing data, a robust and efficient variant filtering and prioritization pipeline was developed, and its application yielded a stringent set of rare recessive-acting and ultra-rare variants. As a first follow-up, I performed a preliminary analysis of de novo variants, identifying the most likely deleterious variants and highlighting candidate genes for further analyses. In the third part of the project, considering the well-established involvement of calcium signalling in the molecular bases of ASD, I investigated the role of rare variants in voltage-gated calcium channel genes, which mainly regulate intracellular calcium concentration and whose alterations have been correlated with enhanced ASD risk. Specifically, I functionally tested the effect of rare damaging variants identified in CACNA1H, showing that CACNA1H variation may be involved in ASD development by additively combining with other high-risk variants. This project highlights the challenges in the analysis and interpretation of variants from NGS analysis in ASD, and underlines the importance of a comprehensive assessment of the genomic landscape of ASD individuals.
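The filtering and prioritization step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not describe the pipeline itself, so everything below is an assumption made for illustration: the `Variant` fields, the `prioritize` function, the set of damaging impact classes and both allele-frequency thresholds are hypothetical, and compound heterozygous variants are not handled.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    population_af: float  # allele frequency in a reference population (hypothetical field)
    genotype: str         # "het" or "hom" in the affected individual (hypothetical field)
    impact: str           # e.g. "missense", "loss_of_function", "synonymous"


def prioritize(variants, rare_af=0.01, ultra_rare_af=0.0001):
    """Split damaging variants into ultra-rare and rare homozygous (recessive-acting) sets.

    Both thresholds are illustrative assumptions, not values from the thesis.
    """
    damaging = {"missense", "loss_of_function"}
    kept = {"rare_recessive": [], "ultra_rare": []}
    for v in variants:
        if v.impact not in damaging:
            continue  # drop variants not predicted to alter the protein
        if v.population_af <= ultra_rare_af:
            kept["ultra_rare"].append(v)
        elif v.population_af <= rare_af and v.genotype == "hom":
            kept["rare_recessive"].append(v)
    return kept
```

Calling `prioritize` on a list of `Variant` objects returns a dictionary with the two candidate sets; a real pipeline would add quality filters and inheritance checks within each family.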

Relevance:

60.00%

Publisher:

Abstract:

Ewing sarcoma (EWS) and CIC-DUX4 sarcoma (CDS) are pediatric fusion gene-driven tumors of mesenchymal origin characterized by an extremely stable genome and limited clinical solutions. Post-transcriptional regulatory mechanisms are crucial for understanding the development of this class of tumors. RNA binding proteins (RBPs) play a crucial role in the aggressiveness of these tumors. Numerous RBP families are dysregulated in cancer, including IGF2BPs. Among these, IGF2BP3 is a negative prognostic factor in EWS because it promotes cell growth and chemoresistance and induces the metastatic process. Based on preliminary RNA sequencing data from clinical samples of EWS vs CDS patients, three major axes that are more expressed in CDS have been identified, two of which are dissected in this PhD work. The first involves the transcription factor HMGA2, IGF2BP2-3, and IGF2; the other involves the ephrin receptor system, particularly EphA2. EphA2 is involved in numerous cellular functions during embryonic stages, and its increased expression in adult tissues is often associated with pathological conditions. In tumors, its role is controversial because it can be associated with both pro- and anti-tumoral mechanisms. In EWS, it has been shown to play a role in promoting cell migration and neoangiogenesis. Our study has confirmed that the HMGA2/IGF2BPs/IGF2 axis contributes to CDS malignancy, and that Akt hyperactivation has a strong impact on migration. Using loss/gain of function models for EphA2, we confirmed that it is a substrate of Akt, and that Akt hyperactivation in CDS triggers ligand-independent activation of EphA2 through phosphorylation of S897. Moreover, the combination of Trabectedin and NVP/BEZ235 partially inhibits Akt/mTOR activation, resulting in reduced tumor growth in vivo. Inhibition of EphA2 through ALWII 41_27 significantly reduces migration in vitro. The project aims to identify target molecules in CDS that distinguish it from EWS and can thus support the development of new targeted therapeutic strategies.

Relevance:

30.00%

Publisher:

Abstract:

In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has led to enormous progress in the life sciences. Among the most important innovations is microarray technology. It allows the expression of thousands of genes to be quantified simultaneously by measuring the hybridization from a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousands, and a sample size in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex diseases such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. These samples are hybridized to microarrays in an effort to find a small number of genes that are strongly correlated with the group of individuals. Even though methods to analyse the data are today well developed and close to reaching a standard organization (through the efforts of dedicated international projects such as the Microarray Gene Expression Data (MGED) Society [1]), it is not infrequent to stumble upon a clinician's question for which no compelling statistical method is available. The contribution of this dissertation to deciphering disease is the development of new approaches aimed at handling open problems posed by clinicians in specific experimental designs. In Chapter 1, starting from a necessary biological introduction, we review microarray technologies and all the important steps of an experiment, from the production of the array to the quality controls, ending with the preprocessing steps that will be used for data analysis in the rest of the dissertation. In Chapter 2 a critical review of standard analysis methods is provided, stressing most of the problems that [...]. In Chapter 3 a method is introduced to address the issue of unbalanced design in microarray experiments. In microarray experiments, experimental design is a crucial starting point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples should be collected for the two classes. However, in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists of a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC). A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each probe is given a "score" ranging from 0 to 1,000 based on its recurrence in the 1,000 lists as differentially expressed. The performance of MultiSAM was compared to the performance of SAM and LIMMA [3] on two data sets simulated from beta and exponential distributions. The results of all three algorithms on low-noise data sets seem acceptable. However, on a real unbalanced two-channel data set regarding Chronic Lymphocytic Leukemia, LIMMA finds no significant probe, SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering. We also report extra-assay validation in terms of differentially expressed genes. Although standard algorithms perform well on low-noise simulated data sets, MultiSAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. In Chapter 4 a method to address the evaluation of similarities in a three-class problem by means of the Relevance Vector Machine [4] is described. Indeed, when looking at microarray data in a prognostic and diagnostic clinical framework, differences are not the only thing that can play a crucial role. In some cases similarities can give useful, and sometimes even more important, information. The goal, given three classes, could be to establish, with a certain level of confidence, whether the third one is similar to the first or to the second. In this work we show that the Relevance Vector Machine (RVM) [2] could be a possible solution to the limitations of standard supervised classification. In fact, RVM offers many advantages compared, for example, with its well-known precursor (Support Vector Machine - SVM [3]). Among these advantages, the estimate of the posterior probability of class membership represents a key feature to address the similarity issue. This is a highly important, but often overlooked, option of any practical pattern recognition system. We focused on a three-class tumor grade problem, with 67 samples of grade 1 (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, then evaluate the third class G2 as a test set to obtain the probability for G2 samples of being members of class G1 or class G3. The analysis showed that breast cancer samples of grade 2 have a molecular profile more similar to breast cancer samples of grade 1. This result had been conjectured in the literature, but no measure of significance had been given before.
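The resampling scheme described for MultiSAM in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: a Welch t statistic with a fixed cut-off stands in for the SAM analysis, and the names `welch_t`, `multi_resample_scores`, `threshold` and `seed` are hypothetical.

```python
import random
import statistics


def welch_t(a, b):
    """Welch t statistic; a simple stand-in for the SAM statistic (assumption)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    denom = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / denom if denom > 0 else 0.0


def multi_resample_scores(lpc, mpc, n_iter=1000, threshold=4.0, seed=0):
    """Count, per probe, how often it is called differentially expressed.

    lpc, mpc: lists of samples for the less and more populated class;
    each sample is a list of probe values.
    Returns one score per probe, ranging from 0 to n_iter.
    """
    rng = random.Random(seed)
    n_probes = len(lpc[0])
    scores = [0] * n_probes
    for _ in range(n_iter):
        # Draw a random subset of the larger class, same size as the smaller one.
        subset = rng.sample(mpc, len(lpc))
        for p in range(n_probes):
            a = [s[p] for s in lpc]
            b = [s[p] for s in subset]
            if abs(welch_t(a, b)) > threshold:
                scores[p] += 1  # probe appears in this iteration's list
    return scores
```

Probes can then be ranked by score and selected with a recurrence cut-off, in the spirit of the "score >300" selection reported in the abstract.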

Relevance:

30.00%

Publisher:

Abstract:

The recent advent of next-generation sequencing technologies has revolutionized the way the genome is analyzed. This innovation yields deeper information at a lower cost and in less time, and provides data in the form of discrete measurements. One of the most important applications of these data is differential analysis, that is, investigating whether a gene exhibits a different expression level under two (or more) biological conditions (such as disease states, treatments received and so on). As for the statistical analysis, the final aim is statistical testing, and the Negative Binomial distribution is considered the most adequate model for these data, especially because it allows for "overdispersion". However, the estimation of the dispersion parameter is a very delicate issue because little information is usually available for estimating it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates, and in this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled type I errors. We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Three consistent statistical tests are then developed for differential expression analysis. We show that the proposed method improves the sensitivity of detecting differentially expressed genes with respect to the common procedures, since it comes closest to the nominal value for the type I error while maintaining high power. The method is finally illustrated on prostate cancer RNA-seq data.
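The overdispersion mentioned in this abstract can be made concrete with the common parametrization in which a Negative Binomial count with mean mu and dispersion phi has variance mu + phi * mu^2. The sketch below is not the mixture model proposed in the thesis; it only shows a naive per-gene method-of-moments estimate, the kind of plug-in quantity that becomes unstable with few replicates. The function names are hypothetical.

```python
import statistics


def nb_variance(mu, phi):
    """Variance of a Negative Binomial count with mean mu and dispersion phi."""
    return mu + phi * mu ** 2


def moment_dispersion(counts):
    """Method-of-moments estimate of phi from replicate counts of one gene.

    Solves var = mu + phi * mu^2 for phi; negative estimates are clipped
    to 0, which corresponds to the Poisson case.
    """
    mu = statistics.mean(counts)
    var = statistics.variance(counts)
    if mu == 0:
        return 0.0
    return max(0.0, (var - mu) / mu ** 2)
```

With only two or three replicates per condition the sample variance is very noisy, so the estimate can swing widely between genes with similar true variability, which is the motivation for sharing information across genes.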

Relevance:

30.00%

Publisher:

Abstract:

Pediatric acute myeloid leukemia (AML) is a molecularly heterogeneous disease that arises from genetic alterations in pathways that regulate self-renewal and myeloid differentiation. While the majority of patients carry recurrent chromosomal translocations, almost 20% of childhood AML cases do not show any recognizable cytogenetic alteration and are defined as cytogenetically normal (CN)-AML. CN-AML patients have always shown great variability in response to therapy and overall outcome, underlining the presence of unknown genetic changes that are not detectable by conventional analyses but are relevant for the pathogenesis and outcome of AML. The development of novel genome-wide techniques such as next-generation sequencing has tremendously improved our ability to interrogate the cancer genome. Based on this background, the aim of this research study was to investigate, through whole-transcriptome sequencing (RNA-seq), the mutational landscape of pediatric CN-AML patients negative for all the currently known somatic mutations reported in AML. RNA-seq performed on diagnostic leukemic blasts from 19 pediatric CN-AML cases revealed a considerable incidence of cryptic chromosomal rearrangements, with the identification of 21 putative fusion genes. Several of the fusion genes identified in this study are recurrent and might have prognostic and/or therapeutic relevance. A paradigmatic example is the CBFA2T3-GLIS2 fusion, which has been demonstrated to be a common alteration in pediatric CN-AML, predicting poor outcome. Important findings have also been obtained in the identification of novel therapeutic targets. On the one hand, the identification of the NUP98-JARID1A fusion suggests the use of disulfiram; on the other, we describe alterations activating tyrosine kinases, providing functional data supporting the use of tyrosine kinase inhibitors to specifically inhibit leukemia cells. This study provides new insights into the genetic alterations underlying pediatric AML, defines novel prognostic markers and putative therapeutic targets, and prospectively ensures correct risk stratification and risk-adapted therapy also for the “all-neg” AML subgroup.

Relevance:

30.00%

Publisher:

Abstract:

The aging process is characterized by a progressive decline in fitness at all levels of physiological organization, from single molecules up to the whole organism. Studies have confirmed inflammaging, a chronic low-level inflammation, as a deeply intertwined partner of the aging process, which may provide the “common soil” upon which age-related diseases develop and flourish. Thus, although inflammation per se is a physiological process, it can rapidly become detrimental if it goes out of control, causing an excessive local and systemic inflammatory response, a striking risk factor for the elderly population. Developing interventions to counteract the establishment of this state is thus a top priority. Diet, among other factors, is a good candidate to regulate inflammation. Building on this consideration, the EU project NU-AGE is now trying to assess whether a Mediterranean diet, fortified for the needs of the elderly population, may help in modulating inflammaging. To do so, NU-AGE enrolled a total of 1250 subjects, half of whom followed a 1-year diet, and characterized them by means of the most advanced -omics and non-omics analyses. The aim of this thesis was the development of a solid data management pipeline able to cope efficiently with the results of these assays, which are now flowing into a centralized database, ready to be used to test the most disparate scientific hypotheses. At the same time, the work described here encompasses the data analysis of the GEHA project, which was focused on identifying the genetic determinants of longevity, with a particular focus on developing and applying a method for detecting epistatic interactions in human mtDNA. Finally, in an effort to propel the adoption of NGS technologies in everyday pipelines, we developed an NGS variant calling pipeline devoted to solving the sequencing-related issues of mtDNA.

Relevance:

30.00%

Publisher:

Abstract:

Gastrointestinal stromal tumors (GISTs) are the most common mesenchymal tumors of the gastrointestinal tract, arising from the interstitial cells of Cajal (ICCs) or their precursors. The vast majority of GISTs (75–85%) harbor KIT or PDGFRA mutations. A small percentage of GISTs (about 10-15%) do not harbor any of these driver mutations and have historically been called wild-type (WT). Among them, from 20% to 40% show loss of function of the succinate dehydrogenase (SDH) complex and are also defined as SDH-deficient GISTs. SDH-deficient GISTs display distinctive clinical and pathological features, and can be sporadic or associated with Carney triad or Carney-Stratakis syndrome. These tumors arise most frequently in the stomach, with a predilection for the distal stomach and antrum, have a multi-nodular growth pattern, display a histological epithelioid phenotype, and present frequent lympho-vascular invasion. Occurrence of lymph node metastases and an indolent course are representative features of SDH-deficient GISTs. This subset of GISTs is known for the immunohistochemical loss of succinate dehydrogenase subunit B (SDHB), which signals the loss of function of the entire SDH complex. The overall aim of my PhD project is the comprehensive characterization of SDH-deficient GISTs. Throughout the project, clinical, molecular and cellular characterizations were performed using next-generation sequencing (NGS) technologies, which have the potential to allow the identification of molecular patterns useful for diagnosis and for the development of novel treatments. Moreover, while there are many different cell lines and preclinical models of KIT/PDGFRA mutant GIST, no reliable cell model of SDH-deficient GIST has yet been developed that could be used for studies on tumor evolution and in vitro assessments of drug response. Therefore, another aim of this project was to develop a pre-clinical model of SDH-deficient GIST using the novel technology of induced pluripotent stem cells (iPSCs).