958 results for RNA-seq data


Relevance:

100.00%

Publisher:

Abstract:

In the brain, mutations in the SLC25A12 gene encoding AGC1 cause an ultra-rare genetic disease reported as a developmental and epileptic encephalopathy associated with global cerebral hypomyelination. Symptoms of the disease include diffuse hypomyelination, arrested psychomotor development, severe hypotonia, and seizures, and are common to other neurological and developmental disorders. Among the biological components believed to be most affected by AGC1 deficiency are oligodendrocytes, the glial cells responsible for myelination. Recent studies (Poeta et al, 2022) have also shown how altered levels of transcription factors and epigenetic modifications greatly affect proliferation and differentiation in oligodendrocyte precursor cells (OPCs). In this study we explore the transcriptomic landscape of Agc1 deficiency in two different model systems: OPCs silenced for Agc1 and iPSCs from human patients differentiated to neural progenitors. Analyses include differential expression analysis, alternative splicing analysis, and master regulator analysis. ATAC-seq results on OPCs were integrated with RNA-Seq results to assess the activity of each transcription factor (TF) based on the accessibility of its putative targets and to infer its role as either an activator or a repressor. All the findings for this model were also integrated with early RNA-seq results from iPSCs, looking for commonalities between the two model systems. Among these we find a downregulation of genes encoding SREBP, a transcription factor regulating fatty acid biosynthesis, a key process for myelination, which could explain the hypomyelinated state of patients. We also find that in both systems cells tend to form more neurites, likely losing their ability to differentiate, considering their progenitor state. 
We also report several alterations in the chromatin state of cells lacking Agc1, which supports the hypothesis that AGC1 deficiency is not restricted to metabolic alterations in the cells but also involves a profound shift in their regulatory state.

Relevance:

100.00%

Publisher:

Abstract:

Machine learning makes computers capable of performing tasks that typically require human intelligence. One domain where it is having a considerable impact is the life sciences, where it makes it possible to devise new biological analysis protocols, develop patient treatments faster and more efficiently, and reduce healthcare costs. This thesis presents new machine learning methods and pipelines for the life sciences, focusing on unsupervised learning. At the methodological level, two methods are presented. The first is an “Ab Initio Local Principal Path”, a revised and improved version of a pre-existing algorithm in the manifold learning realm. The second contribution is an improvement over the Import Vector Domain Description (one-class learning) through the Kullback-Leibler divergence. It hybridizes kernel methods with deep learning, obtaining a scalable solution, an improved probabilistic model, and state-of-the-art performance. Both methods are tested through several experiments, with a central focus on their relevance in the life sciences. Results show that they improve on the performance achieved by their previous versions. At the application level, two pipelines are presented. The first is for the analysis of RNA-Seq datasets, both transcriptomic and single-cell data, and is aimed at identifying genes that may be involved in biological processes (e.g., the transition of tissues from normal to cancer). In this project, an R package is released on CRAN to make the pipeline accessible to the bioinformatics community through high-level APIs. The second pipeline is in the drug discovery domain and is useful for identifying druggable pockets, namely regions of a protein with a high probability of accepting a small molecule (a drug). Both these pipelines achieve remarkable results. Lastly, a detour application is developed to identify the strengths and limitations of the “Principal Path” algorithm by analyzing vector spaces induced by Convolutional Neural Networks. 
This application is conducted in the music and visual arts domains.

Relevance:

90.00%

Publisher:

Abstract:

Telomerase RNAs (TERs) are highly divergent between species, varying in size and sequence composition. Here, we identify a candidate for the telomerase RNA component of the genus Leishmania, which includes species that cause leishmaniasis, a neglected tropical disease. Combining a thorough computational screening with RNA-seq evidence, we mapped a non-coding RNA gene localized in a syntenic locus on chromosome 25 of five Leishmania species that shares partial synteny with both the Trypanosoma brucei TER locus and a putative TER candidate-containing locus of Crithidia fasciculata. Using target-driven molecular biology approaches, we detected a ∼2,100 nt transcript (LeishTER) that contains a 5' spliced leader (SL) cap, a putative 3' polyA tail and a predicted C/D box snoRNA domain. LeishTER is expressed at similar levels in the logarithmic and stationary growth phases of promastigote forms. A 5'SL capped LeishTER co-immunoprecipitated and co-localized with the telomerase protein component (TERT) in a cell cycle-dependent manner. Prediction of its secondary structure strongly suggests the existence of a bona fide single-stranded template sequence and a conserved C[U/C]GUCA motif-containing helix II, representing the template boundary element. This study paves the way for further investigations on the biogenesis of parasite TERT ribonucleoproteins (RNPs) and their role in parasite telomere biology.

Relevance:

90.00%

Publisher:

Abstract:

Thanks to recent advances in molecular biology, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drug design, as well as for planning new high-throughput experiments. Methods have been developed for gene network modeling and identification from expression profiles. However, an important open problem is how to validate such approaches and their results. This work presents an objective approach for the validation of gene network modeling and identification, which comprises the following three main aspects: (1) generation of Artificial Gene Network (AGN) models through theoretical models of complex networks, which are used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdos-Renyi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabasi-Albert (BA), and geographical networks (GG). 
The experimental results indicate that the inference method was sensitive to variation in the average degree k, with its network recovery rate decreasing as k increased. The signal size was important for the accuracy of the inference method in network identification, which nevertheless presented very good results with small expression profiles. However, the adopted inference method was not sensitive to distinct structures of interaction among genes, presenting similar behavior when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for the validation of the inferred networks by identifying some properties of the evaluated method, and it can be extended to other inference methods.
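
The validation step of this framework, comparing an identified network with the artificial network it was simulated from, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the directed-edge representation and the precision/recall scores are hypothetical choices, not the authors' implementation, and the "inferred" network below is a stand-in rather than the output of their feature-selection method.

```python
import random


def erdos_renyi_edges(n_genes, avg_degree, seed=0):
    """Directed Erdos-Renyi network (assumed representation: a set of (i, j) edges).

    Each ordered pair i != j becomes an edge with probability p = k / (n - 1),
    so the expected out-degree of a gene is avg_degree.
    """
    rng = random.Random(seed)
    p = avg_degree / (n_genes - 1)
    return {
        (i, j)
        for i in range(n_genes)
        for j in range(n_genes)
        if i != j and rng.random() < p
    }


def recovery_scores(true_edges, inferred_edges):
    """Precision and recall of the inferred edges against the original network."""
    true_positives = len(true_edges & inferred_edges)
    precision = true_positives / len(inferred_edges) if inferred_edges else 0.0
    recall = true_positives / len(true_edges) if true_edges else 0.0
    return precision, recall


# Original artificial network with average degree k = 3.
original = erdos_renyi_edges(n_genes=30, avg_degree=3, seed=1)

# Stand-in for an inference result (hypothetical): every second true edge.
inferred = set(sorted(original)[::2])

precision, recall = recovery_scores(original, inferred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Repeating the comparison for several values of the average degree, or for other generators such as small-world or scale-free models, would give the kind of sensitivity profile the abstract describes.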

Relevance:

90.00%

Publisher:

Abstract:

Alternative splicing of gene transcripts greatly expands the functional capacity of the genome, and certain splice isoforms may indicate specific disease states such as cancer. Splice junction microarrays interrogate thousands of splice junctions, but data analysis is difficult and error prone because of the increased complexity compared to differential gene expression analysis. We present Rank Change Detection (RCD) as a method to identify differential splicing events based upon a straightforward probabilistic model comparing the over- or underrepresentation of two or more competing isoforms. RCD has advantages over commonly used methods because it is robust to false positive errors due to nonlinear trends in microarray measurements. Further, RCD does not depend on prior knowledge of splice isoforms, yet it takes advantage of the inherent structure of mutually exclusive junctions, and it is conceptually generalizable to other types of splicing arrays or to RNA-Seq. RCD specifically identifies the biologically important cases in which a splice junction becomes more or less prevalent compared to other mutually exclusive junctions. The example data are from different glioblastoma cell lines assayed with Agilent microarrays.

Relevance:

90.00%

Publisher:

Abstract:

ABSTRACT: Currently, the only chance of cure for patients with pancreatic adenocarcinoma (PDAC) is surgical resection. At the beginning of my thesis studies, we asked whether the classical clinicopathologic predictors of outcome could be validated in a large cohort of patients with early stage pancreatic cancer and whether other clinical predictors could have a role in deciding which patients would benefit from surgery. In chapter 2, we found that up to 30% of patients die within the first year after curative intent surgery for pancreatic adenocarcinoma. We aimed to determine pre-operative factors that correlate with early mortality following resection for pancreatic cancer using a statistically validated tool, the Charlson-Age Comorbidity Index (CACI). We found that a CACI score greater than 4 was predictive of increased length of stay (p<0.001), post-operative complications (p=0.042), and mortality within 1 year of pancreatic resection (p<0.001). A CACI score of 6 or greater increased 3-fold the odds of death within the first year. Patients with a high CACI score have less than a 50% likelihood of being alive 1 year after surgery. In chapter 3, we aimed to identify a surface protein that correlates with patients' outcome and distinguishes sub-groups of patients according to their molecular differences, and asked whether this protein could be a cancer stem cell marker. 
The most abundant class of circulating tumor cells identified in our previous work was found to have biphenotypic features of epithelial to mesenchymal transition, enrichment for stem-cell associated genes (ALDH1A1/ALDH1A2 and KLF4), and an overexpression of extracellular matrix genes (Collagens, SPARC, and DCN) normally found in the stromal microenvironment of PDAC primary tumors. Upon evaluation of matched primary tumors with RNA-ISH, many of the genes identified were found to co-localize in a sub-population of cells at the basal region of malignant pancreatic ducts. In addition, these cells expressed the neuroendocrine marker SV2A, and the stem cell marker ALDH1A1/2. Compared to SV2 negative tumors, patients with SV2 positive tumors were more likely to present with lower CA 19-9 (69% vs. 52%, p = 0.012), bigger tumors (size > 4 cm, 23% vs. 10%, p= 0.0430), less nodal involvement (69% vs. 86%, p = 0.005) and lower histologic grade (69% vs. 57%, p = 0.047). The presence of SV2A expressing cells was associated with an improved disease free survival (HR: 0.49 p=0.009) and overall survival (HR: 0.54 p=0.018) and correlated linearly with ALDH1A2. Together, this information points to two different sub-types of pancreatic adenocarcinoma, and these sub-types correlated with patients’ outcome and were defined by the presence of a SV2A/ ALDH1A1/2 expressing clone with neuroendocrine features. In Chapter 4, SV2A expression in cancer was validated in primary cell lines. We were able to demonstrate pancreatic adenocarcinoma heterogeneity according to neuroendocrine clonal features. When comparing SV2 expressing cell lines with SV2 negative cell lines, we found that SV2+ cell lines were more differentiated and differ from SV2 negative cell lines regarding KRAS mutation, proliferation and response to chemotherapy. In Chapter 5 we aimed at determining if this SV2 positive clone could explain chemoresistance observed in patients. 
We found an absolute increase in SV2A expressing cells, with multiple lines of evidence, in patients, primary cell lines and xenografts. Although we have been able to show evidence that pancreatic adenocarcinoma is a heterogeneous disease, our findings warrant further investigation. First, to further characterize SV2A expressing clones after treatment with neoadjuvant chemotherapy in our cohort, we have performed laser capture microdissection of the paraffin embedded tissue in this study and will analyze the tissue for known genetic mutations in pancreatic adenocarcinoma; second, we will knock down SV2A expression in our cell lines and then treat them with gemcitabine to determine whether SV2A is functionally important; finally, since our previous efforts with a promoter-reporter and SmartFlare™ have failed, we will use a novel PrimeFlow™ RNA-ISH assay followed by FACS and RNA sequencing to further characterize this cellular clone. Overall, our data show, with multiple lines of evidence, that pancreatic adenocarcinoma is a heterogeneous disease, defined by a clone of SV2A expressing cells with neuroendocrine features. The presence of this clone in patients' tissue correlates with disease free survival and overall survival. Together with patterns of proliferation and ALDH1A1/2 co-expression, this clone seems to present a stem-cell-like behavior and is associated with chemoresistance, since it increases after chemotherapy, both in patients and in primary cell lines.

Relevance:

90.00%

Publisher:

Abstract:

SUMMARY: Eukaryotic DNA interacts with nuclear proteins through non-covalent ionic interactions. Proteins can recognize specific nucleotide sequences based on steric interactions with the DNA, and these specific protein-DNA interactions are the basis for many nuclear processes, e.g. gene transcription, chromosomal replication, and recombination. A new technology termed ChIP-Seq has recently been developed for the analysis of protein-DNA interactions on a whole-genome scale; it is based on immunoprecipitation of chromatin followed by high-throughput DNA sequencing. ChIP-Seq is a novel technique with great potential to replace older techniques for mapping protein-DNA interactions. In this thesis, we bring some new insights into ChIP-Seq data analysis. First, we point out some common and so far unknown artifacts of the method. The distribution of sequence tags in the genome is not uniform, and we have found extreme hot-spots of tag accumulation over specific loci in the human and mouse genomes. These artifactual sequence tag accumulations will create false peaks in every ChIP-Seq dataset, and we propose different filtering methods to reduce the number of false positives. Next, we propose random sampling as a powerful analytical tool in ChIP-Seq data analysis that could be used to infer biological knowledge from massive ChIP-Seq datasets. We created an unbiased random sampling algorithm and used this methodology to reveal some of the important biological properties of Nuclear Factor I DNA binding proteins. Finally, by analyzing the ChIP-Seq data in detail, we revealed that Nuclear Factor I (NFI) transcription factors mainly act as activators of transcription, and that they are associated with specific chromatin modifications that are markers of open chromatin. We speculate that NFI factors only interact with the DNA wrapped around the nucleosome. 
We also found multiple loci that indicate possible chromatin barrier activity of NFI proteins, which could suggest the use of NFI binding sequences as chromatin insulators in biotechnology applications.
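
The two ideas mentioned in this summary, filtering tag hot-spots and randomly sampling tags, can be illustrated with a minimal sketch. The summary does not describe the thesis's actual filters or sampling algorithm, so everything here is an assumption made for illustration: tags are modelled as `(chromosome, position, strand)` tuples, hot-spots are capped at a fixed number of tags per position, and sampling is uniform without replacement.

```python
import random
from collections import Counter


def cap_hotspots(tags, max_per_position):
    """Keep at most max_per_position tags at each (chrom, pos, strand).

    A simple, assumed filter: positions that pile up far more tags than
    expected contribute only a bounded number of tags downstream.
    """
    seen = Counter()
    kept = []
    for tag in tags:
        if seen[tag] < max_per_position:
            seen[tag] += 1
            kept.append(tag)
    return kept


def subsample(tags, n, seed=0):
    """Draw n tags uniformly at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(tags, n)


# Toy data: one extreme hot-spot plus two ordinary tags.
tags = [("chr1", 100, "+")] * 50 + [("chr1", 200, "+"), ("chr2", 5, "-")]

filtered = cap_hotspots(tags, max_per_position=2)
sample = subsample(filtered, n=3, seed=42)
print(len(tags), len(filtered), len(sample))
```

Sampling datasets down to a common number of tags is one way to make peak counts comparable between experiments of different sequencing depth, which is the kind of use the summary hints at.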

Relevance:

90.00%

Publisher:

Abstract:

Cancer genomes frequently contain somatic copy number alterations (SCNA) that can significantly perturb the expression level of affected genes and thus disrupt pathways controlling normal growth. In melanoma, many studies have focussed on the copy number and gene expression levels of the BRAF, PTEN and MITF genes, but little has been done to identify new genes using these parameters at the genome-wide scale. Using karyotyping, SNP and CGH arrays, and RNA-seq, we have identified SCNA affecting gene expression ('SCNA-genes') in seven human metastatic melanoma cell lines. We showed that the combination of these techniques is useful to identify candidate genes potentially involved in tumorigenesis. Since few of these alterations were recurrent across our samples, we used a protein network-guided approach to determine whether any pathways were enriched in SCNA-genes in one or more samples. From this unbiased genome-wide analysis, we identified 28 significantly enriched pathway modules. Comparison with two large, independent melanoma SCNA datasets showed less than 10% overlap at the individual gene level, but network-guided analysis revealed 66% shared pathways, including all but three of the pathways identified in our data. Frequently altered pathways included WNT, cadherin signalling, angiogenesis and melanogenesis. Additionally, our results emphasize the potential of the EPHA3 and FRS2 gene products, involved in angiogenesis and migration, as possible therapeutic targets in melanoma. Our study demonstrates the utility of network-guided approaches, for both large and small datasets, to identify pathways recurrently perturbed in cancer.

Relevance:

90.00%

Publisher:

Abstract:

In addition to differences in protein-coding gene sequences, changes in expression resulting from mutations in regulatory sequences have long been hypothesized to be responsible for phenotypic differences between species. However, in contrast to comparisons of genome sequences, few studies have assessed between-species differences at the transcriptome level, and these have generally been restricted to pairwise comparisons of closely related mammalian species. They reported that gene expression evolves at different rates in various organs and in a pattern that is overall consistent with neutral models of evolution. In the first part of my thesis, I investigated the evolution of gene expression in therian mammals (i.e., placentals and marsupials), based on microarray data from human, mouse and the gray short-tailed opossum (Monodelphis domestica). In addition to autosomal genes, a special focus was given to the evolution of X-linked genes. The therian X chromosome was recently shown to be younger than previously thought and to harbor a specific gene content (e.g., genes involved in brain or reproductive functions) that is thought to have been shaped by specific sex-related evolutionary forces. Sex chromosomes derive from ordinary autosomes and their differentiation led to the degeneration of the Y chromosome (in mammals) or W chromosome (in birds). Consequently, X- or Z-linked genes differ in gene dose between males and females such that the heterogametic sex has half the X/Z gene dose compared to the ancestral state. To cope with this dosage imbalance, mammals have been reported to have evolved mechanisms of dosage compensation. In the first project, I could first show that transcriptomes evolve at different rates in different organs. Out of the five tissues I investigated, the testis is the most rapidly evolving organ at the gene expression level while the brain has the most conserved transcriptome. 
Second, my analyses revealed that mammalian gene expression evolution is compatible with a neutral model, where the rate of change in gene expression levels is linked to the efficiency of purifying selection in a given lineage, which, in turn, is determined by the long-term effective population size in that lineage. Thus, the rate of DNA sequence evolution, which could be expected to determine the rate of regulatory sequence change, does not seem to be a major determinant of the rate of gene expression evolution. Thus, most gene expression changes seem to be (slightly) deleterious. Finally, X-linked genes seem to have experienced elevated rates of gene expression change during the early stage of X evolution. To further investigate the evolution of mammalian gene expression, we generated an extensive RNA-Seq gene expression dataset for nine mammalian species and a bird. The analyses of this dataset confirmed the patterns previously observed with microarrays and helped to significantly deepen our view on gene expression evolution. In a specific project based on these data, I sought to assess in detail patterns of evolution of dosage compensation in amniotes. My analyses revealed the absence of male to female dosage compensation in monotremes and its presence in marsupials and, in addition, confirmed patterns previously described for placental mammals and birds. I then assessed the global level of expression of X/Z chromosomes and contrasted this with its ancestral gene expression levels estimated from orthologous autosomal genes in species with non-homologous sex chromosomes. This analysis revealed a lack of up-regulation for placental mammals, the level of expression of X-linked genes being proportional to gene dose. Interestingly, the ancestral gene expression level was at least partially restored in marsupials as well as in the heterogametic sex of monotremes and birds. 
Finally, I investigated alternative mechanisms of dosage compensation and found that gene duplication did not seem to be a widespread mechanism to restore the ancestral gene dose. However, I could show that placental mammals have preferentially down-regulated autosomal genes interacting with X-linked genes which underwent gene expression decrease, and thus identified a novel alternative mechanism of dosage compensation.
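
A rough way to see what "expression proportional to gene dose" means is to compare the typical expression of X-linked genes with that of autosomal genes. The sketch below uses a simple within-sample X-to-autosome median ratio, which is a swapped-in simplification: the thesis instead compares X/Z expression with ancestral levels estimated from orthologous autosomal genes in other species. The gene names, expression values and the `min_expr` threshold are hypothetical.

```python
from statistics import median


def x_to_autosome_ratio(expr, chrom_of, min_expr=1.0):
    """Median expression of X-linked genes divided by that of autosomal genes.

    expr maps gene -> expression level; chrom_of maps gene -> chromosome name.
    Genes below min_expr are ignored (an assumed filter for unexpressed genes).
    """
    x_values = [v for g, v in expr.items() if chrom_of[g] == "X" and v >= min_expr]
    auto_values = [
        v for g, v in expr.items() if chrom_of[g] not in ("X", "Y") and v >= min_expr
    ]
    if not x_values or not auto_values:
        raise ValueError("need at least one expressed X-linked and one autosomal gene")
    return median(x_values) / median(auto_values)


# Toy example (hypothetical values): X-linked genes at half the autosomal level.
expr = {"geneA": 10.0, "geneB": 20.0, "geneC": 5.0, "geneD": 10.0}
chrom_of = {"geneA": "1", "geneB": "2", "geneC": "X", "geneD": "X"}

ratio = x_to_autosome_ratio(expr, chrom_of)
print(ratio)
```

Under this simplified measure, a ratio near 0.5 in the heterogametic sex would be consistent with no up-regulation of the single X, and a ratio near 1 with restored ancestral expression.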

Relevance:

90.00%

Publisher:

Abstract:

Abstract: Although the genomes of any two human individuals are more than 99.99% identical at the sequence level, some structural variation can be observed. Differences between genomes include single nucleotide polymorphisms (SNPs), inversions and copy number changes (gain or loss of DNA). The latter can range from submicroscopic events (CNVs, at least 1kb in size) to complete chromosomal aneuploidies. Small copy number variations often have no (lethal) consequences for the cell, but a few have been associated with disease susceptibility and phenotypic variation. Larger re-arrangements (e.g. complete chromosome gain) are frequently associated with more severe consequences for health, such as genomic disorders and cancer. High-throughput technologies like DNA microarrays enable the detection of CNVs in a genome-wide fashion. Since the initial catalogue of CNVs in the human genome in 2006, there has been tremendous interest in CNVs in the context of both population and medical genetics. Understanding CNV patterns within and between human populations is essential to elucidate their possible contribution to disease. But genome analysis is a challenging task; the technology evolves rapidly, creating a need for novel, efficient and robust analytical tools, which need to be compared with existing ones. Also, while the link between CNVs and disease has been established, the relative contribution of CNVs is not fully understood and the predisposition to disease from CNVs in the general population has not yet been investigated. During my PhD thesis, I worked on several aspects related to CNVs. As I will report in chapter 3, I was interested in computational methods to detect CNVs in the general population. I had access to the CoLaus dataset, a population-based study with more than 6,000 participants from the Lausanne area. All these individuals were analysed on SNP arrays and extensive clinical information was available. 
My work explored existing CNV detection methods, and I developed a variety of metrics to compare their performance. Since these methods did not produce entirely satisfactory results, I implemented my own method, which outperformed two existing methods. I also devised strategies to combine CNVs from different individuals into CNV regions. I was also interested in the clinical impact of CNVs in common disease (chapter 4). Through an international collaboration led by the Centre Hospitalier Universitaire Vaudois (CHUV) and Imperial College London, I was involved as a main data analyst in the investigation of a rare deletion at chromosome 16p11 detected in obese patients. Specifically, we compared 8,456 obese patients and 11,856 individuals from the general population and found that the deletion accounted for 0.7% of the morbid obesity cases and was absent in healthy non-obese controls. This highlights the importance of rare variants with strong impact and provides new insights into the design of clinical studies to identify the missing heritability in common disease. Furthermore, I was interested in the detection of somatic copy number alterations (SCNAs) and their consequences in cancer (chapter 5). This project was a collaboration initiated by the Ludwig Institute for Cancer Research and involved other groups from the Swiss Institute of Bioinformatics, the CHUV and the Universities of Lausanne and Geneva. The focus of my work was to identify genes with altered expression levels within SCNAs in seven metastatic melanoma cell lines, using CGH and SNP arrays, RNA-seq, and karyotyping. Very few SCNA genes were shared by even two melanoma samples, making it difficult to draw any conclusions at the individual gene level. To overcome this limitation, I used a network-guided analysis to determine whether any pathways, defined by amplified or deleted genes, were common among the samples.
Six of the melanoma samples were potentially altered in four pathways, and five samples harboured copy-number and expression changes in components of six pathways. In total, this approach identified 28 pathways. Validation with two large external melanoma datasets confirmed all but three of the detected pathways and demonstrated the utility of network-guided approaches for the analysis of both large and small datasets.
Summary: Although the genomes of two individuals are more than 99.99% similar, structural differences can be observed. These differences include single nucleotide polymorphisms, inversions and copy number changes (gain or loss of DNA). The latter range from small, so-called submicroscopic events (less than 1 kb in size), called CNVs (copy number variants), to larger events that can affect entire chromosomes. Small variations are generally without consequence for the cell; however, some have been implicated in predisposition to certain diseases and in phenotypic variation in the general population. Larger rearrangements (for example, an additional copy of a chromosome, commonly called trisomy) have more serious repercussions for health, for example in certain genomic syndromes and in cancer. High-throughput technologies such as DNA microarrays allow the detection of CNVs at the scale of the human genome. The mapping of CNVs in the human genome in 2006 aroused strong interest in population genetics and medical genetics. Detecting differences within and between populations is a key element in elucidating the possible contribution of CNVs to disease. However, genome analysis remains a difficult task; the technology evolves very rapidly, creating new needs for the development of tools, the improvement of existing ones, and the comparison of different methods.
Moreover, although the link between CNVs and disease has been established, their precise contribution is not yet understood. Likewise, studies of the predisposition to disease conferred by CNVs detected in the general population have not yet been carried out. During my doctorate, I focused on three main axes relating to CNVs. In chapter 3, I detail my work on methods for analysing DNA microarrays. I had access to data from the CoLaus project, a study of the population of Lausanne. In this study, the genomes of more than 6,000 individuals were analysed with SNP arrays, and extensive clinical information was collected. During my work, I used and compared several CNV detection methods. As the results were not completely satisfactory, I implemented my own method, which performs better than two of the three other methods used. I was also interested in strategies for combining CNVs from different individuals into regions. I was also interested in the clinical impact of CNVs in common genetic diseases (chapter 4). This project was made possible by a close collaboration with the Centre Hospitalier Universitaire Vaudois (CHUV) and Imperial College London. In this project, I was one of the main analysts and worked on the clinical impact of a rare deletion on chromosome 16p11 present in patients with obesity. In this multidisciplinary collaboration, we compared 8,456 patients with obesity and 11,856 individuals from the general population. We found that the deletion was implicated in 0.7% of cases of morbid obesity and was absent in healthy (non-obese) controls. Our study illustrates the importance of rare CNVs, which can have a very strong clinical impact.
Moreover, it suggests an alternative to association studies for improving our understanding of the aetiology of common genetic diseases. I also worked on the detection of somatic copy number alterations (SCNAs) and their consequences for cancer (chapter 5). This project was a collaboration initiated by the Ludwig Institute for Cancer Research and involving the Swiss Institute of Bioinformatics, the CHUV and the Universities of Lausanne and Geneva. I focused on identifying genes affected by SCNAs and showing over- or under-expression in cell lines derived from metastatic melanomas. The data used were generated with DNA microarrays (CGH and SNP) and high-throughput transcriptome sequencing. My research showed that few genes are recurrent between melanomas, which makes the results difficult to interpret. To circumvent these limitations, I used a network analysis to determine whether signalling networks enriched in amplified or lost genes were common to the different samples. Of the 28 networks detected, four are potentially deregulated in six melanomas, and six additional networks are affected in five melanomas. Validation of these results with two large public datasets confirmed all but three of these networks. This demonstrates the usefulness of this approach for the analysis of both small and large datasets.
Lay summary: The advent of molecular biology, particularly over the last ten years, has revolutionised research in medical genetics. Thanks to the availability of the human reference genome from 2001, new technologies such as DNA microarrays emerged and made it possible to study the genome as a whole at a so-called submicroscopic resolution, previously impossible with traditional cytogenetic techniques.
One of the most important examples is the study of structural variation in the genome, in particular the study of gene copy number. It had been established as early as 1959, with the identification of trisomy 21 by Professor Jérôme Lejeune, that the gain of an extra chromosome was the cause of a genetic syndrome with serious repercussions for the patient's health. Such observations have also been made in oncology on cancer cells, which frequently accumulate copy number aberrations (such as the loss or gain of one or more chromosomes). From 2004, several research groups catalogued copy number changes in individuals from the general population (that is, without visible clinical symptoms). In 2006, Dr Richard Redon established the first map of copy number variation in the general population. These discoveries demonstrated that variations in the genome are frequent and that most of them are benign, that is, without clinical consequence for the individual's health. This aroused great interest in understanding natural variation between individuals, and also in better understanding genetic predisposition to certain diseases. During my thesis, I developed new computational tools for analysing DNA microarrays with the aim of mapping these variations at the genomic scale. I used these tools to establish the variations in the Swiss population, and I then devoted myself to studying factors that could explain predisposition to diseases such as obesity. This study, in collaboration with the Centre Hospitalier Universitaire Vaudois, led to the identification of a deletion on chromosome 16 explaining 0.7% of cases of morbid obesity. This study has several implications.
First, it makes it possible to perform diagnosis in unborn children to determine their predisposition to obesity. Second, this locus involves about twenty genes. This allows new working hypotheses to be formulated and guides research to improve our understanding of the disease, with the hope of discovering a new treatment. Finally, our study provides an alternative to genetic association studies, which have so far had only mixed success. In the last part of my thesis, I was interested in the analysis of copy number aberrations in cancer. I chose to study melanomas, which are involved in skin cancer. Melanoma is a very aggressive tumour; it is responsible for 80% of skin cancer deaths and is often resistant to the treatments used in oncology (chemotherapy, radiotherapy). In a collaboration between the Ludwig Institute for Cancer Research, the Swiss Institute of Bioinformatics, the CHUV and the Universities of Lausanne and Geneva, we sequenced the exome (the genes) and the transcriptome (gene expression) of seven metastatic melanomas and carried out copy number analyses using DNA microarrays and karyotypes. My work led to the development of new analysis methods adapted to cancer, established the list of cell signalling networks recurrently affected in melanoma, and identified two potential therapeutic targets previously overlooked in skin cancers.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

BACKGROUND: Modern sequencing technologies have massively increased the amount of data available for comparative genomics. Whole-transcriptome shotgun sequencing (RNA-seq) provides a powerful basis for comparative studies. In particular, this approach holds great promise for emerging model species in fields such as evolutionary developmental biology (evo-devo). RESULTS: We have sequenced early embryonic transcriptomes of two non-drosophilid dipteran species: the moth midge Clogmia albipunctata, and the scuttle fly Megaselia abdita. Our analysis includes a third, published, transcriptome for the hoverfly Episyrphus balteatus. These emerging models for comparative developmental studies close an important phylogenetic gap between Drosophila melanogaster and other insect model systems. In this paper, we provide a comparative analysis of early embryonic transcriptomes across species, and use our data for a phylogenomic re-evaluation of dipteran phylogenetic relationships. CONCLUSIONS: We show how comparative transcriptomics can be used to create useful resources for evo-devo, and to investigate phylogenetic relationships. Our results demonstrate that de novo assembly of short (Illumina) reads yields high-quality, high-coverage transcriptomic data sets. We use these data to investigate deep dipteran phylogenetic relationships. Our results, based on a concatenation of 160 orthologous genes, provide support for the traditional view of Clogmia being the sister group of Brachycera (Megaselia, Episyrphus, Drosophila), rather than that of Culicomorpha (which includes mosquitoes and blackflies).

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Background: MicroRNAs (miRNAs) are short non-coding regulatory RNAs that control gene expression, usually producing translational repression and gene silencing. High-throughput sequencing technologies have revealed heterogeneity at the length and sequence level for the majority of mature miRNAs (isomiRs). Most isomiRs can be explained by variability in either Dicer1 or Drosha cleavage during miRNA biogenesis at the 5′ or 3′ end of the miRNA (trimming variants). Although isomiRs have been described in different tissues and organisms, their functional validation as modulators of gene expression remains elusive. Here we have characterized the expression and function of a highly abundant miR-101 5′-trimming variant (5′-isomiR-101). Results: The analysis of small RNA sequencing data in several human tissues and cell lines indicates that 5′-isomiR-101 is ubiquitously detected and highly abundant, especially in the brain. 5′-isomiR-101 was found in Ago-2 immunocomplexes, and complementary approaches showed that 5′-isomiR-101 interacted with different members of the silencing (RISC) complex. In addition, 5′-isomiR-101 decreased the expression of five validated miR-101 targets, suggesting that it is a functional variant. Both the binding to RISC members and the degree of silencing were less efficient for 5′-isomiR-101 compared with miR-101. For some targets, both miR-101 and 5′-isomiR-101 significantly decreased protein expression with no changes in the respective mRNA levels. Although a high number of overlapping predicted targets suggests similar targeted biological pathways, a correlation analysis of the expression profiles of miR-101 variants and predicted mRNA targets in human brains at different ages suggests specific functions for miR-101 and 5′-isomiR-101.
Conclusions: These results suggest that isomiRs are functional variants and further indicate that, for a given miRNA, the different isomiRs may contribute to the overall effect as quantitative and qualitative fine-tuners of gene expression.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In vivo transcriptional analyses of microbial pathogens are often hampered by low proportions of pathogen biomass in host organs, hindering coverage of the full pathogen transcriptome. We aimed to characterize the transcriptome profiles of Candida albicans, the most prevalent fungal pathogen in systemically infected immunocompromised patients, during systemic infection in different hosts. We developed a strategy for high-resolution quantitative analysis of the C. albicans transcriptome directly from early and late stages of systemic infection in two different host models, mouse and the insect Galleria mellonella. Our results show that transcriptome sequencing (RNA-seq) libraries were enriched for fungal transcripts up to 1,600-fold using biotinylated bait probes to capture C. albicans sequences. This enrichment biased the read counts of only ~3% of the genes, which can be identified and removed based on a priori criteria. This allowed an unprecedented resolution of the C. albicans transcriptome in vivo, with detection of over 86% of its genes. The transcriptional response of the fungus was surprisingly similar during infection of the two hosts and at the two time points, although some host- and time point-specific genes could be identified. Genes that were highly induced during infection were involved, for instance, in stress response, adhesion, iron acquisition, and biofilm formation. Of the in vivo-regulated genes, 10% are still of unknown function, and their future study will be of great interest. The fungal RNA enrichment procedure used here will help to better characterize the C. albicans response in infected hosts and may be applied to other microbial pathogens. IMPORTANCE: Understanding the mechanisms utilized by pathogens to infect and cause disease in their hosts is crucial for rational drug development. Transcriptomic studies may help investigations of these mechanisms by determining which genes are expressed specifically during infection.
This task has been difficult so far, since the proportion of microbial biomass in infected tissues is often extremely low, thus limiting the depth of sequencing and comprehensive transcriptome analysis. Here, we adapted a technology to capture and enrich C. albicans RNA, which was then used for deep RNA sequencing directly from infected tissues from two different host organisms. The high-resolution transcriptome revealed a large number of genes that were not previously known to participate in infection, which will likely constitute a focus of study in the future. More importantly, this method may be adapted to perform transcript profiling of other microbes during host infection or colonization.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

RNA polymerase 3 transcribes a small group of highly expressed genes involved in several molecular mechanisms. Transfer RNAs (tRNAs) represent roughly half of the RNA polymerase 3 transcriptome. They are directly involved in protein translation, acting as carriers of the amino acids that are incorporated into the nascent polypeptide chain. In yeast grown until nutrient exhaustion, Maf1 represses transcription by RNA polymerase 3, thereby promoting cellular energy economy. In a mammalian cell model, MAF1 also represses RNA polymerase 3 transcription under stress conditions; however, no data exist on its role in a living mammal. During my doctorate, I used a mouse deleted for the Maf1 gene to determine the effects of this gene in a mammal. Surprisingly, the Maf1-/- mouse is resistant to obesity even when fed a high-fat diet. Molecular and metabolomic studies showed that futile cycles of lipid and tRNA production and degradation exist, which increases energy expenditure and promotes resistance to obesity. In addition to characterizing the Maf1-/- mouse, during my thesis I also developed a method to normalize ChIP-sequencing data. This method relies on an internal control, here a fixed amount of chromatin from an organism different from the one studied. The method considerably improved the reproducibility of values between biological replicates. It also revealed differences between samples from different conditions.
Higher RNA polymerase 3 occupancy on Pol 3 genes in Maf1 KO mice leads to an increase in the level of tRNA precursors, with the probable effect of saturating the tRNA maturation machinery. Indeed, in Maf1 KO mice, the percentage of modified tRNAs is lower than in wild-type mice. This imbalance between the levels of precursor and mature tRNAs leads to a decrease in protein translation. These results identified new functions for the MAF1 protein as a regulator of both transcription and translation, and as a potential target for obesity treatment. -- RNA polymerase III (Pol 3) transcribes a small set of highly expressed genes involved in different molecular mechanisms. tRNAs account for almost half of the Pol 3 transcriptome and are involved in translation, bringing new amino acids into the nascent polypeptide chain. In yeast, under nutrient deprivation, Maf1 promotes cellular energy economy by repressing Pol 3 transcription. In mammalian cells, MAF1 also represses Pol 3 activity under conditions of serum deprivation or DNA damage, but nothing is known about its role in a mammalian organism. During my thesis studies, I used a Maf1 KO mouse model to characterize the effects of Maf1 deletion in a living animal. Surprisingly, the Maf1 KO mouse developed an unexpected phenotype, being resistant to high-fat-diet-induced obesity and displaying an extended lifespan. Molecular and metabolomic characterization revealed futile cycles of lipids and tRNAs, which are produced and immediately degraded; this increases energy consumption in the Maf1 KO mouse and probably explains in part the protection against obesity. In addition to the mouse characterization, I also developed a method to normalize ChIP-seq data, based on the addition of foreign chromatin to be used as an internal control.
The method improved reproducibility between replicates and revealed differences in Pol 3 occupancy between WT and Maf1 KO samples that were not seen without normalization to the internal control. I then established that increased Pol 3 occupancy in the Maf1 KO mouse liver was associated with increased levels of tRNA precursors but not of mature tRNAs, the effective molecules involved in translation. The overproduction of precursor tRNAs associated with the deletion of Maf1 apparently overwhelms the tRNA processing machinery, as the Maf1 KO mice have lower levels of fully modified tRNAs. This maturation defect directly affects translation efficiency, as polysomic fractions and newly synthesized protein levels were reduced in the liver of the Maf1 KO mouse. Altogether, these results indicate new functions for MAF1 as a regulator of both transcription and translation, as well as a potential target for obesity treatment.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

ChIP-seq is a technology that combines chromatin immunoprecipitation with high-throughput sequencing, enabling large-scale in vivo analysis of transcription factors. Processing the large amounts of data generated requires powerful computing resources, and many tools have recently emerged. However, this proliferation of software, each tool performing one step of the analysis, creates compatibility problems and complicates analyses. There is thus a strong need for a high-performance, flexible software suite for motif identification. We propose here a complete ChIP-seq data analysis pipeline, freely available in R and composed of three modules: PICS, rGADEM and MotIV. Through the analysis of four datasets for the transcription factors CTCF, STAT1, FOXA1 and ER, we demonstrated the effectiveness of our pipeline and highlighted its novel features, notably the processing of results by MotIV, which led to the discovery of motifs not detected by other algorithms.