915 resultados para RNA-seq


Relevância:

60.00% 60.00%

Publicador:

Resumo:

RESUMO: Actualmente, a única possibilidade de cura para doentes com adenocarcinoma do pâncreas (PDAC) é a ressecção cirúrgica, no início deste estudo, perguntamo-nos se os predictores clínico-patológicos clássicos de prognostico poderiam ser validados em uma grande cohort de doentes com cancro do pâncreas ressecável e se outros predictores clínicos poderiam ter um papel na decisão de que doentes beneficiariam de ressecção cirúrgica. No capítulo 2, observamos que até 30% dos doentes morrem no primeiro ano após a ressecção cirúrgica, pelo que o nosso objectivo foi determinar factores pré-operatórios que se correlacionam com mortalidade precoce após ressecação cirúrgica com recurso a um instrumento estatisticamente validado, o Charlson-Age Comorbidity Index (CACI), determinamos que um CACI score superior a 4 foi preditivo de internamentos prolongados (p <0,001), complicações pós-operatórias (p = 0,042), e mortalidade em 1 ano pós- ressecção cirúrgica (p <0,001). Um CACI superior a 6 triplicou a mortalidade no primeiro ano pós-cirurgia e estes doentes têm menos de 50% de probabilidade de estarem vivos um ano após a cirurgia. No capítulo 3, o nosso objectivo foi identificar uma proteína de superfície que se correlacionasse estatisticamente com o prognostico de doentes com adenocarcinoma do pâncreas e permitisse a distinção de subgrupos de doentes de acordo com as suas diferenças moleculares, perguntamo-nos ainda se essa proteína poderia ser um marcador de células-estaminais. No nosso trabalho anterior observamos que as células tumorais na circulação sanguínea apresentavam genes com características bifenotípica epitelial e mesenquimal, enriquecimento para genes de células estaminais (ALDH1A1 / ALDH1A2 e KLF4), e uma super-expressão de genes da matriz extracelular (colagénios, SPARC, e DCN) normalmente identificados no estroma de PDAC. Após a avaliação dos tumores primários com RNA-ISH, muitos dos genes identificados, foram encontrados co-localizando em uma sub-população de células na região basal dos ductos pancreáticos malignos. Além disso, observamos que estas células expressam o marcador SV2A neuroendócrino, e o marcador de células estaminais ALDH1A1/2. Em comparação com tumores negativos para SV2, os doentes com tumores SV2 positivos apresentaram níveis mais baixos de CA 19-9 (69% vs. 52%, p = 0,012), tumores maiores (> 4 cm, 23% vs. 10%, p = 0,0430), menor invasão de gânglios linfáticos (69% vs. 86%, p = 0,005) e tumores mais diferenciados (69% vs. 57%, p = 0,047). A presença de SV2A foi associada com uma sobrevida livre de doença mais longa (HR: 0,49 p = 0,009) bem como melhor sobrevida global (HR: 0,54 p = 0,018). Em conjunto, esta informação aponta para dois subtipos diferentes de adenocarcinoma do pâncreas, e estes subtipos co-relacionam estatisticamente com o prognostico de doentes, sendo este subgrupo definido pela presença do clone celular SV2A / ALDH1A1/2 positivo com características neuroendócrinas. No Capítulo 4, a expressão de SV2A no cancro do pâncreas foi validado em linhas celulares primárias. Demonstramos a heterogeneidade do adenocarcinoma do pâncreas de acordo com características clonais neuroendócrinas. Ao comparar as linhas celulares expressando SV2 com linhas celulares negativas, verificamos que as linhas celulares SV2+ eram mais diferenciadas, diferindo de linhas celulares SV2 negativas no que respeita a mutação KRAS, proliferação e a resposta à quimioterapia. No capítulo 5, perguntamo-nos se o clone celular SV2 positivo poderia explicar a resistência a quimioterapia observada em doentes. Observamos um aumento absoluto de clones celulares expressando SV2A, em múltiplas linhas de evidência - doentes, linhas de células primárias e xenotransplantes. Embora, tenhamos sido capazes de demonstrar que o adenocarcinoma do pâncreas é uma doença heterogénea, consideramos que a caracterização genética destes clones celulares expressando SV2A é de elevada importância. Pretendemos colmatar esta limitação com as seguintes estratégias: Após o tratamento com quimioterapia neoadjuvante na nossa coorte, realizamos microdissecação a laser das amostras primarias em parafina, de forma a analisar mutações genéticas observadas no adenocarcinoma pancreático; em segundo lugar, pretendemos determinar consequências de knockdown da expressão de SV2A em nossas linhas celulares seguindo-se o tratamento com gemicitabina para determinação do papel funcional de SV2A; finalmente, uma vez que os nossos esforços anteriores com um promotor - repórter e SmartFlare ™ falharam, o próximo passo será realizar RNA-ISH PrimeFlow™ seguido de FACS e RNA-seq para caracterização deste clone celular. Em conjunto, conseguimos provar com várias linhas de evidência, que o adenocarcinoma pancreático é uma doença heterogénea, definido por um clone de células que expressam SV2A, com características neuroendócrinas. A presença deste clone no tecido de doentes correlaciona-se estatisticamente com o prognostico da doença, incluindo sobrevida livre de doença e sobrevida global. Juntamente com padrões de proliferação e co-expressão de ALDH1A1/2, este clone parece apresentar um comportamento de células estaminais e está associado a resistência a quimioterapia, uma vez que a sua expressão aumenta após agressão química, quer em doentes, quer em linhas de células primárias.----------------------------- ABSTRACT: Currently, the only chance of cure for patients with pancreatic adenocarcinoma is surgical resection, at the beginning of my thesis studies, we asked if the classical clinicopathologic predictors of outcome could be validated in a large cohort of patients with early stage pancreatic cancer and if other clinical predictors could have a role on deciding which patients would benefit from surgery. In chapter 2, we found that up to 30% of patients die within the first year after curative intent surgery for pancreatic adenocarcinoma. We aimed at determining pre-operative factors that would correlate with early mortality following resection for pancreatic cancer using a statistically validated tool, the Charlson-Age Comorbidity Index (CACI). We found that a CACI score greater than 4 was predictive of increased length of stay (p<0.001), post-operative complications (p=0.042), and mortality within 1-year of pancreatic resection (p<0.001). A CACI score of 6 or greater increased 3-fold the odds of death within the first year. Patients with a high CACI score have less than 50% likelihood of being alive 1 year after surgery. In chapter 3 we aimed at identifying a surface protein that correlates with patient’s outcome and distinguishes sub-groups of patients according to their molecular differences and if this protein could be a cancer stem cell marker. The most abundant class of circulating tumor cells identified in our previous work was found to have biphenotypic features of epithelial to mesenchymal transition, enrichment for stem-cell associated genes (ALDH1A1/ALDH1A2 and KLF4), and an overexpression of extracellular matrix genes (Collagens, SPARC, and DCN) normally found in the stromal microenvironment of PDAC primary tumors. Upon evaluation of matched primary tumors with RNA-ISH, many of the genes identified were found to co-localize in a sub-population of cells at the basal region of malignant pancreatic ducts. In addition, these cells expressed the neuroendocrine marker SV2A, and the stem cell marker ALDH1A1/2. Compared to SV2 negative tumors, patients with SV2 positive tumors were more likely to present with lower CA 19-9 (69% vs. 52%, p = 0.012), bigger tumors (size > 4 cm, 23% vs. 10%, p= 0.0430), less nodal involvement (69% vs. 86%, p = 0.005) and lower histologic grade (69% vs. 57%, p = 0.047). The presence of SV2A expressing cells was associated with an improved disease free survival (HR: 0.49 p=0.009) and overall survival (HR: 0.54 p=0.018) and correlated linearly with ALDH1A2. Together, this information points to two different sub-types of pancreatic adenocarcinoma, and these sub-types correlated with patients’ outcome and were defined by the presence of a SV2A/ ALDH1A1/2 expressing clone with neuroendocrine features. In Chapter 4, SV2A expression in cancer was validated in primary cell lines. We were able to demonstrate pancreatic adenocarcinoma heterogeneity according to neuroendocrine clonal features. When comparing SV2 expressing cell lines with SV2 negative cell lines, we found that SV2+ cell lines were more differentiated and differ from SV2 negative cell lines regarding KRAS mutation, proliferation and response to chemotherapy. In Chapter 5 we aimed at determining if this SV2 positive clone could explain chemoresistance observed in patients. We found an absolute increase in SV2A expressing cells, with multiple lines of evidence, in patients, primary cell lines and xenografts. Although, we have been able to show evidence that pancreatic adenocarcinoma is a heterogeneous disease, our findings warrant further investigation. To further characterize SV2A expressing clones after treatment with neoadjuvant chemotherapy in our cohort, we have performed laser capture microdissection of the paraffin embedded tissue in this study and will analyze the tissue for known genetic mutations in pancreatic adenocarcinoma; secondly, we want to know what will happen after knocking down SV2A expression in our cell lines followed by treatment with gemcitabine to determine if SV2A is functionally important; finally, since our previous efforts with a promoter – reporter and SmartFlare™ have failed, we will utilize a novel PrimeFlow™ RNA-ISH assay followed by FACS and RNA sequencing to further characterize this cellular clone. Overall our data proves, with multiple lines of evidence, that pancreatic adenocarcinoma is a heterogeneous disease, defined by a clone of SV2A expressing cells, with neuroendocrine features. The presence of this clone in patients’ tissue correlates with patient’s disease free survival and overall survival. Together with patterns of proliferation and ALDH1A1/2 co-expression, this clone seems to present a stem-cell-like behavior and is associated with chemoresistance, since it increases after chemotherapy, both in patients and primary cell lines.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In mammalian circadian clockwork, the CLOCK-BMAL1 complex binds to DNA enhancers of target genes and drives circadian oscillation of transcription. Here we identified 7,978 CLOCK-binding sites in mouse liver by chromatin immunoprecipitation-sequencing (ChIP-Seq), and a newly developed bioinformatics method, motif centrality analysis of ChIP-Seq (MOCCS), revealed a genome-wide distribution of previously unappreciated noncanonical E-boxes targeted by CLOCK. In vitro promoter assays showed that CACGNG, CACGTT, and CATG(T/C)G are functional CLOCK-binding motifs. Furthermore, we extensively revealed rhythmically expressed genes by poly(A)-tailed RNA-Seq and identified 1,629 CLOCK target genes within 11,926 genes expressed in the liver. Our analysis also revealed rhythmically expressed genes that have no apparent CLOCK-binding site, indicating the importance of indirect transcriptional and posttranscriptional regulations. Indirect transcriptional regulation is represented by rhythmic expression of CLOCK-regulated transcription factors, such as Krüppel-like factors (KLFs). Indirect posttranscriptional regulation involves rhythmic microRNAs that were identified by small-RNA-Seq. Collectively, CLOCK-dependent direct transactivation through multiple E-boxes and indirect regulations polyphonically orchestrate dynamic circadian outputs.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

As part of the development of the database Bgee (a dataBase for Gene Expression Evolution), we annotate and analyse expression data from different types and different sources, notably Affymetrix data from GEO and ArrayExpress, and RNA-Seq data from SRA. During our quality control procedure, we have identified duplicated content in GEO and ArrayExpress, affecting ∼14% of our data: fully or partially duplicated experiments from independent data submissions, Affymetrix chips reused in several experiments, or reused within an experiment. We present here the procedure that we have established to filter such duplicates from Affymetrix data, and our procedure to identify future potential duplicates in RNA-Seq data. Database URL: http://bgee.unil.ch/

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Access to online repositories for genomic and associated "-omics" datasets is now an essential part of everyday research activity. It is important therefore that the Tuberculosis community is aware of the databases and tools available to them online, as well as for the database hosts to know what the needs of the research community are. One of the goals of the Tuberculosis Annotation Jamboree, held in Washington DC on March 7th-8th 2012, was therefore to provide an overview of the current status of three key Tuberculosis resources, TubercuList (tuberculist.epfl.ch), TB Database (www.tbdb.org), and Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org). Here we summarize some key updates and upcoming features in TubercuList, and provide an overview of the PATRIC site and its online tools for pathogen RNA-Seq analysis.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Aleppo pine (Pinus halepensis Mill.) is a relevant conifer species for studying adaptive responses to drought and fire regimes in the Mediterranean region. In this study, we performed Illumina next-generation sequencing of two phenotypically divergent Aleppo pine accessions with the aims of (i) characterizing the transcriptome through Illumina RNA-Seq on trees phenotypically divergent for adaptive traits linked to fire adaptation and drought, (ii) performing a functional annotation of the assembled transcriptome, (iii) identifying genes with accelerated evolutionary rates, (iv) studying the expression levels of the annotated genes and (v) developing gene-based markers for population genomic and association genetic studies. The assembled transcriptome consisted of 48,629 contigs and covered about 54.6 Mbp. The comparison of Aleppo pine transcripts to Picea sitchensis protein-coding sequences resulted in the detection of 34,014 SNPs across species, with a Ka /Ks average value of 0.216, suggesting that the majority of the assembled genes are under negative selection. Several genes were differentially expressed across the two pine accessions with contrasted phenotypes, including a glutathione-s-transferase, a cellulose synthase and a cobra-like protein. A large number of new markers (3334 amplifiable SSRs and 28,236 SNPs) have been identified which should facilitate future population genomics and association genetics in this species. A 384-SNP Oligo Pool Assay for genotyping with the Illumina VeraCode technology has been designed which showed an high overall SNP conversion rate (76.6%). Our results showed that Illumina next-generation sequencing is a valuable technology to obtain an extensive overview on whole transcriptomes of nonmodel species with large genomes.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Cancer genomes frequently contain somatic copy number alterations (SCNA) that can significantly perturb the expression level of affected genes and thus disrupt pathways controlling normal growth. In melanoma, many studies have focussed on the copy number and gene expression levels of the BRAF, PTEN and MITF genes, but little has been done to identify new genes using these parameters at the genome-wide scale. Using karyotyping, SNP and CGH arrays, and RNA-seq, we have identified SCNA affecting gene expression ('SCNA-genes') in seven human metastatic melanoma cell lines. We showed that the combination of these techniques is useful to identify candidate genes potentially involved in tumorigenesis. Since few of these alterations were recurrent across our samples, we used a protein network-guided approach to determine whether any pathways were enriched in SCNA-genes in one or more samples. From this unbiased genome-wide analysis, we identified 28 significantly enriched pathway modules. Comparison with two large, independent melanoma SCNA datasets showed less than 10% overlap at the individual gene level, but network-guided analysis revealed 66% shared pathways, including all but three of the pathways identified in our data. Frequently altered pathways included WNT, cadherin signalling, angiogenesis and melanogenesis. Additionally, our results emphasize the potential of the EPHA3 and FRS2 gene products, involved in angiogenesis and migration, as possible therapeutic targets in melanoma. Our study demonstrates the utility of network-guided approaches, for both large and small datasets, to identify pathways recurrently perturbed in cancer.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Epigenetic post-transcriptional modifications of histone tails are thought to help in coordinating gene expression during development. An epigenetic signature is set in pluripotent cells and interpreted later at the onset of differentiation. In pluripotent cells, epigenetic marks normally associated with active genes (H3K4me3) and with silent genes (H3K27me3) atypically co-occupy chromatin regions surrounding the promoters of important developmental genes. However, it is unclear how these epigenetic marks are recognized when cell differentiation starts and what precise role they play. Here, we report the essential role of the nuclear receptor peroxisome proliferator-activated receptor β (PPARβ, NR1C2) in Xenopus laevis early development. By combining loss-of-function approaches, large throughput transcript expression analysis by the mean of RNA-seq and intensive chromatin immunoprecipitation experiments, we unveil an important cooperation between epigenetic marks and PPARβ. During Xenopus laevis gastrulation PPARβ recognizes H3K27me3 marks that have been deposited earlier at the pluripotent stage to activate early differentiation genes. Thus, PPARβis the first identified transcription factor that interprets an epigenetic signature of pluripotency, in vivo, during embryonic development. This work paves the way for a better mechanistic understanding of how the activation of hundreds of genes is coordinated during early development.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Since the advent of high-throughput DNA sequencing technologies, the ever-increasing rate at which genomes have been published has generated new challenges notably at the level of genome annotation. Even if gene predictors and annotation softwares are more and more efficient, the ultimate validation is still in the observation of predicted gene product( s). Mass-spectrometry based proteomics provides the necessary high throughput technology to show evidences of protein presence and, from the identified sequences, confirmation or invalidation of predicted annotations. We review here different strategies used to perform a MS-based proteogenomics experiment with a bottom-up approach. We start from the strengths and weaknesses of the different database construction strategies, based on different genomic information (whole genome, ORF, cDNA, EST or RNA-Seq data), which are then used for matching mass spectra to peptides and proteins. We also review the important points to be considered for a correct statistical assessment of the peptide identifications. Finally, we provide references for tools used to map and visualize the peptide identifications back to the original genomic information.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

AbstractIn addition to genetic changes affecting the function of gene products, changes in gene expression have been suggested to underlie many or even most of the phenotypic differences among mammals. However, detailed gene expression comparisons were, until recently, restricted to closely related species, owing to technological limitations. Thus, we took advantage of the latest technologies (RNA-Seq) to generate extensive qualitative and quantitative transcriptome data for a unique collection of somatic and germline tissues from representatives of all major mammalian lineages (placental mammals, marsupials and monotremes) and birds, the evolutionary outgroup.In the first major project of my thesis, we performed global comparative analyses of gene expression levels based on these data. Our analyses provided fundamental insights into the dynamics of transcriptome change during mammalian evolution (e.g., the rate of expression change across species, tissues and chromosomes) and allowed the exploration of the functional relevance and phenotypic implications of transcription changes at a genome-wide scale (e.g., we identified numerous potentially selectively driven expression switches).In a second project of my thesis, which was also based on the unique transcriptome data generated in the context of the first project we focused on the evolution of alternative splicing in mammals. Alternative splicing contributes to transcriptome complexity by generating several transcript isoforms from a single gene, which can, thus, perform various functions. To complete the global comparative analysis of gene expression changes, we explored patterns of alternative splicing evolution. This work uncovered several general and unexpected patterns of alternative splicing evolution (e.g., we found that alternative splicing evolves extremely rapidly) as well as a large number of conserved alternative isoforms that may be crucial for the functioning of mammalian organs.Finally, the third and final project of my PhD consisted in analyzing in detail the unique functional and evolutionary properties of the testis by exploring the extent of its transcriptome complexity. This organ was previously shown to evolve rapidly both at the phenotypic and molecular level, apparently because of the specific pressures that act on this organ and are associated with its reproductive function. Moreover, my analyses of the amniote tissue transcriptome data described above, revealed strikingly widespread transcriptional activity of both functional and nonfunctional genomic elements in the testis compared to the other organs. To elucidate the cellular source and mechanisms underlying this promiscuous transcription in the testis, we generated deep coverage RNA-Seq data for all major testis cell types as well as epigenetic data (DNA and histone methylation) using the mouse as model system. The integration of these complete dataset revealed that meiotic and especially post-meiotic germ cells are the major contributors to the widespread functional and nonfunctional transcriptome complexity of the testis, and that this "promiscuous" spermatogenic transcription is resulting, at least partially, from an overall transcriptionally permissive chromatin state. We hypothesize that this particular open state of the chromatin results from the extensive chromatin remodeling that occurs during spermatogenesis which ultimately leads to the replacement of histones by protamines in the mature spermatozoa. Our results have important functional and evolutionary implications (e.g., regarding new gene birth and testicular gene expression evolution).Generally, these three large-scale projects of my thesis provide complete and massive datasets that constitute valuables resources for further functional and evolutionary analyses of mammalian genomes.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In addition to differences in protein-coding gene sequences, changes in expression resulting from mutations in regulatory sequences have long been hypothesized to be responsible for phenotypic differences between species. However, unlike comparison of genome sequences, few studies, generally restricted to pairwise comparisons of closely related mammalian species, have assessed between-species differences at the transcriptome level. They reported that gene expression evolves at different rates in various organs and in a pattern that is overall consistent with neutral models of evolution. In the first part of my thesis, I investigated the evolution of gene expression in therian mammals (i.e.7 placental and marsupials), based on microarray data from human, mouse and the gray short-tailed opossum (Monodelphis domestica). In addition to autosomal genes, a special focus was given to the evolution of X-linked genes. The therian X chromosome was recently shown to be younger than previously thought and to harbor a specific gene content (e.g., genes involved in brain or reproductive functions) that is thought to have been shaped by specific sex-related evolutionary forces. Sex chromosomes derive from ordinary autosomes and their differentiation led to the degeneration of the Y chromosome (in mammals) or W chromosome (in birds). Consequently, X- or Z-linked genes differ in gene dose between males and females such that the heterogametic sex has half the X/Z gene dose compared to the ancestral state. To cope with this dosage imbalance, mammals have been reported to have evolved mechanisms of dosage compensation.¦In the first project, I could first show that transcriptomes evolve at different rates in different organs. Out of the five tissues I investigated, the testis is the most rapidly evolving organ at the gene expression level while the brain has the most conserved transcriptome. Second, my analyses revealed that mammalian gene expression evolution is compatible with a neutral model, where the rates of change in gene expression levels is linked to the efficiency of purifying selection in a given lineage, which, in turn, is determined by the long-term effective population size in that lineage. Thus, the rate of DNA sequence evolution, which could be expected to determine the rate of regulatory sequence change, does not seem to be a major determinant of the rate of gene expression evolution. Thus, most gene expression changes seem to be (slightly) deleterious. Finally, X-linked genes seem to have experienced elevated rates of gene expression change during the early stage of X evolution. To further investigate the evolution of mammalian gene expression, we generated an extensive RNA-Seq gene expression dataset for nine mammalian species and a bird. The analyses of this dataset confirmed the patterns previously observed with microarrays and helped to significantly deepen our view on gene expression evolution.¦In a specific project based on these data, I sought to assess in detail patterns of evolution of dosage compensation in amniotes. My analyses revealed the absence of male to female dosage compensation in monotremes and its presence in marsupials and, in addition, confirmed patterns previously described for placental mammals and birds. I then assessed the global level of expression of X/Z chromosomes and contrasted this with its ancestral gene expression levels estimated from orthologous autosomal genes in species with non-homologous sex chromosomes. This analysis revealed a lack of up-regulation for placental mammals, the level of expression of X-linked genes being proportional to gene dose. Interestingly, the ancestral gene expression level was at least partially restored in marsupials as well as in the heterogametic sex of monotremes and birds. Finally, I investigated alternative mechanisms of dosage compensation and found that gene duplication did not seem to be a widespread mechanism to restore the ancestral gene dose. However, I could show that placental mammals have preferentially down-regulated autosomal genes interacting with X-linked genes which underwent gene expression decrease, and thus identified a novel alternative mechanism of dosage compensation.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Understanding the extent of genomic transcription and its functional relevance is a central goal in genomics research. However, detailed genome-wide investigations of transcriptome complexity in major mammalian organs have been scarce. Here, using extensive RNA-seq data, we show that transcription of the genome is substantially more widespread in the testis than in other organs across representative mammals. Furthermore, we reveal that meiotic spermatocytes and especially postmeiotic round spermatids have remarkably diverse transcriptomes, which explains the high transcriptome complexity of the testis as a whole. The widespread transcriptional activity in spermatocytes and spermatids encompasses protein-coding and long noncoding RNA genes but also poorly conserves intergenic sequences, suggesting that it may not be of immediate functional relevance. Rather, our analyses of genome-wide epigenetic data suggest that this prevalent transcription, which most likely promoted the birth of new genes during evolution, is facilitated by an overall permissive chromatin in these germ cells that results from extensive chromatin remodeling.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

BACKGROUND: There is an ever-increasing volume of data on host genes that are modulated during HIV infection, influence disease susceptibility or carry genetic variants that impact HIV infection. We created GuavaH (Genomic Utility for Association and Viral Analyses in HIV, http://www.GuavaH.org), a public resource that supports multipurpose analysis of genome-wide genetic variation and gene expression profile across multiple phenotypes relevant to HIV biology. FINDINGS: We included original data from 8 genome and transcriptome studies addressing viral and host responses in and ex vivo. These studies cover phenotypes such as HIV acquisition, plasma viral load, disease progression, viral replication cycle, latency and viral-host genome interaction. This represents genome-wide association data from more than 4,000 individuals, exome sequencing data from 392 individuals, in vivo transcriptome microarray data from 127 patients/conditions, and 60 sets of RNA-seq data. Additionally, GuavaH allows visualization of protein variation in ~8,000 individuals from the general population. The publicly available GuavaH framework supports queries on (i) unique single nucleotide polymorphism across different HIV related phenotypes, (ii) gene structure and variation, (iii) in vivo gene expression in the setting of human infection (CD4+ T cells), and (iv) in vitro gene expression data in models of permissive infection, latency and reactivation. CONCLUSIONS: The complexity of the analysis of host genetic influences on HIV biology and pathogenesis calls for comprehensive motors of research on curated data. The tool developed here allows queries and supports validation of the rapidly growing body of host genomic information pertinent to HIV research.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

AbstractAlthough the genomes from any two human individuals are more than 99.99% identical at the sequence level, some structural variation can be observed. Differences between genomes include single nucleotide polymorphism (SNP), inversion and copy number changes (gain or loss of DNA). The latter can range from submicroscopic events (CNVs, at least 1kb in size) to complete chromosomal aneuploidies. Small copy number variations have often no (lethal) consequences to the cell, but a few were associated to disease susceptibility and phenotypic variations. Larger re-arrangements (i.e. complete chromosome gain) are frequently associated with more severe consequences on health such as genomic disorders and cancer. High-throughput technologies like DNA microarrays enable the detection of CNVs in a genome-wide fashion. Since the initial catalogue of CNVs in the human genome in 2006, there has been tremendous interest in CNVs both in the context of population and medical genetics. Understanding CNV patterns within and between human populations is essential to elucidate their possible contribution to disease. But genome analysis is a challenging task; the technology evolves rapidly creating needs for novel, efficient and robust analytical tools which need to be compared with existing ones. Also, while the link between CNV and disease has been established, the relative CNV contribution is not fully understood and the predisposition to disease from CNVs of the general population has not been yet investigated.During my PhD thesis, I worked on several aspects related to CNVs. As l will report in chapter 3, ! was interested in computational methods to detect CNVs from the general population. I had access to the CoLaus dataset, a population-based study with more than 6,000 participants from the Lausanne area. All these individuals were analysed on SNP arrays and extensive clinical information were available. My work explored existing CNV detection methods and I developed a variety of metrics to compare their performance. Since these methods were not producing entirely satisfactory results, I implemented my own method which outperformed two existing methods. I also devised strategies to combine CNVs from different individuals into CNV regions.I was also interested in the clinical impact of CNVs in common disease (chapter 4). Through an international collaboration led by the Centre Hospitalier Universitaire Vaudois (CHUV) and the Imperial College London I was involved as a main data analyst in the investigation of a rare deletion at chromosome 16p11 detected in obese patients. Specifically, we compared 8,456 obese patients and 11,856 individuals from the general population and we found that the deletion was accounting for 0.7% of the morbid obesity cases and was absent in healthy non- obese controls. This highlights the importance of rare variants with strong impact and provides new insights in the design of clinical studies to identify the missing heritability in common disease.Furthermore, I was interested in the detection of somatic copy number alterations (SCNA) and their consequences in cancer (chapter 5). This project was a collaboration initiated by the Ludwig Institute for Cancer Research and involved other groups from the Swiss Institute of Bioinformatics, the CHUV and Universities of Lausanne and Geneva. The focus of my work was to identify genes with altered expression levels within somatic copy number alterations (SCNA) in seven metastatic melanoma ceil lines, using CGH and SNP arrays, RNA-seq, and karyotyping. Very few SCNA genes were shared by even two melanoma samples making it difficult to draw any conclusions at the individual gene level. To overcome this limitation, I used a network-guided analysis to determine whether any pathways, defined by amplified or deleted genes, were common among the samples. Six of the melanoma samples were potentially altered in four pathways and five samples harboured copy-number and expression changes in components of six pathways. In total, this approach identified 28 pathways. Validation with two external, large melanoma datasets confirmed all but three of the detected pathways and demonstrated the utility of network-guided approaches for both large and small datasets analysis.RésuméBien que le génome de deux individus soit similaire à plus de 99.99%, des différences de structure peuvent être observées. Ces différences incluent les polymorphismes simples de nucléotides, les inversions et les changements en nombre de copies (gain ou perte d'ADN). Ces derniers varient de petits événements dits sous-microscopiques (moins de 1kb en taille), appelés CNVs (copy number variants) jusqu'à des événements plus large pouvant affecter des chromosomes entiers. Les petites variations sont généralement sans conséquence pour la cellule, toutefois certaines ont été impliquées dans la prédisposition à certaines maladies, et à des variations phénotypiques dans la population générale. Les réarrangements plus grands (par exemple, une copie additionnelle d'un chromosome appelée communément trisomie) ont des répercutions plus grave pour la santé, comme par exemple dans certains syndromes génomiques et dans le cancer. Les technologies à haut-débit telle les puces à ADN permettent la détection de CNVs à l'échelle du génome humain. La cartographie en 2006 des CNV du génome humain, a suscité un fort intérêt en génétique des populations et en génétique médicale. La détection de différences au sein et entre plusieurs populations est un élément clef pour élucider la contribution possible des CNVs dans les maladies. Toutefois l'analyse du génome reste une tâche difficile, la technologie évolue très rapidement créant de nouveaux besoins pour le développement d'outils, l'amélioration des précédents, et la comparaison des différentes méthodes. De plus, si le lien entre CNV et maladie a été établit, leur contribution précise n'est pas encore comprise. De même que les études sur la prédisposition aux maladies par des CNVs détectés dans la population générale n'ont pas encore été réalisées.Pendant mon doctorat, je me suis concentré sur trois axes principaux ayant attrait aux CNV. Dans le chapitre 3, je détaille mes travaux sur les méthodes d'analyses des puces à ADN. J'ai eu accès aux données du projet CoLaus, une étude de la population de Lausanne. Dans cette étude, le génome de plus de 6000 individus a été analysé avec des puces SNP et de nombreuses informations cliniques ont été récoltées. Pendant mes travaux, j'ai utilisé et comparé plusieurs méthodes de détection des CNVs. Les résultats n'étant pas complètement satisfaisant, j'ai implémenté ma propre méthode qui donne de meilleures performances que deux des trois autres méthodes utilisées. Je me suis aussi intéressé aux stratégies pour combiner les CNVs de différents individus en régions.Je me suis aussi intéressé à l'impact clinique des CNVs dans le cas des maladies génétiques communes (chapitre 4). Ce projet fut possible grâce à une étroite collaboration avec le Centre Hospitalier Universitaire Vaudois (CHUV) et l'Impérial College à Londres. Dans ce projet, j'ai été l'un des analystes principaux et j'ai travaillé sur l'impact clinique d'une délétion rare du chromosome 16p11 présente chez des patients atteints d'obésité. Dans cette collaboration multidisciplinaire, nous avons comparés 8'456 patients atteint d'obésité et 11 '856 individus de la population générale. Nous avons trouvés que la délétion était impliquée dans 0.7% des cas d'obésité morbide et était absente chez les contrôles sains (non-atteint d'obésité). Notre étude illustre l'importance des CNVs rares qui peuvent avoir un impact clinique très important. De plus, ceci permet d'envisager une alternative aux études d'associations pour améliorer notre compréhension de l'étiologie des maladies génétiques communes.Egalement, j'ai travaillé sur la détection d'altérations somatiques en nombres de copies (SCNA) et de leurs conséquences pour le cancer (chapitre 5). Ce projet fut une collaboration initiée par l'Institut Ludwig de Recherche contre le Cancer et impliquant l'Institut Suisse de Bioinformatique, le CHUV et les Universités de Lausanne et Genève. Je me suis concentré sur l'identification de gènes affectés par des SCNAs et avec une sur- ou sous-expression dans des lignées cellulaires dérivées de mélanomes métastatiques. Les données utilisées ont été générées par des puces ADN (CGH et SNP) et du séquençage à haut débit du transcriptome. Mes recherches ont montrées que peu de gènes sont récurrents entre les mélanomes, ce qui rend difficile l'interprétation des résultats. Pour contourner ces limitations, j'ai utilisé une analyse de réseaux pour définir si des réseaux de signalisations enrichis en gènes amplifiés ou perdus, étaient communs aux différents échantillons. En fait, parmi les 28 réseaux détectés, quatre réseaux sont potentiellement dérégulés chez six mélanomes, et six réseaux supplémentaires sont affectés chez cinq mélanomes. La validation de ces résultats avec deux larges jeux de données publiques, a confirmée tous ces réseaux sauf trois. Ceci démontre l'utilité de cette approche pour l'analyse de petits et de larges jeux de données.Résumé grand publicL'avènement de la biologie moléculaire, en particulier ces dix dernières années, a révolutionné la recherche en génétique médicale. Grâce à la disponibilité du génome humain de référence dès 2001, de nouvelles technologies telles que les puces à ADN sont apparues et ont permis d'étudier le génome dans son ensemble avec une résolution dite sous-microscopique jusque-là impossible par les techniques traditionnelles de cytogénétique. Un des exemples les plus importants est l'étude des variations structurales du génome, en particulier l'étude du nombre de copies des gènes. Il était établi dès 1959 avec l'identification de la trisomie 21 par le professeur Jérôme Lejeune que le gain d'un chromosome supplémentaire était à l'origine de syndrome génétique avec des répercussions graves pour la santé du patient. Ces observations ont également été réalisées en oncologie sur les cellules cancéreuses qui accumulent fréquemment des aberrations en nombre de copies (telles que la perte ou le gain d'un ou plusieurs chromosomes). Dès 2004, plusieurs groupes de recherches ont répertorié des changements en nombre de copies dans des individus provenant de la population générale (c'est-à-dire sans symptômes cliniques visibles). En 2006, le Dr. Richard Redon a établi la première carte de variation en nombre de copies dans la population générale. Ces découvertes ont démontrées que les variations dans le génome était fréquentes et que la plupart d'entre elles étaient bénignes, c'est-à-dire sans conséquence clinique pour la santé de l'individu. Ceci a suscité un très grand intérêt pour comprendre les variations naturelles entre individus mais aussi pour mieux appréhender la prédisposition génétique à certaines maladies.Lors de ma thèse, j'ai développé de nouveaux outils informatiques pour l'analyse de puces à ADN dans le but de cartographier ces variations à l'échelle génomique. J'ai utilisé ces outils pour établir les variations dans la population suisse et je me suis consacré par la suite à l'étude de facteurs pouvant expliquer la prédisposition aux maladies telles que l'obésité. Cette étude en collaboration avec le Centre Hospitalier Universitaire Vaudois a permis l'identification d'une délétion sur le chromosome 16 expliquant 0.7% des cas d'obésité morbide. Cette étude a plusieurs répercussions. Tout d'abord elle permet d'effectuer le diagnostique chez les enfants à naître afin de déterminer leur prédisposition à l'obésité. Ensuite ce locus implique une vingtaine de gènes. Ceci permet de formuler de nouvelles hypothèses de travail et d'orienter la recherche afin d'améliorer notre compréhension de la maladie et l'espoir de découvrir un nouveau traitement Enfin notre étude fournit une alternative aux études d'association génétique qui n'ont eu jusqu'à présent qu'un succès mitigé.Dans la dernière partie de ma thèse, je me suis intéressé à l'analyse des aberrations en nombre de copies dans le cancer. Mon choix s'est porté sur l'étude de mélanomes, impliqués dans le cancer de la peau. Le mélanome est une tumeur très agressive, elle est responsable de 80% des décès des cancers de la peau et est souvent résistante aux traitements utilisés en oncologie (chimiothérapie, radiothérapie). Dans le cadre d'une collaboration entre l'Institut Ludwig de Recherche contre le Cancer, l'Institut Suisse de Bioinformatique, le CHUV et les universités de Lausanne et Genève, nous avons séquencés l'exome (les gènes) et le transcriptome (l'expression des gènes) de sept mélanomes métastatiques, effectués des analyses du nombre de copies par des puces à ADN et des caryotypes. Mes travaux ont permis le développement de nouvelles méthodes d'analyses adaptées au cancer, d'établir la liste des réseaux de signalisation cellulaire affectés de façon récurrente chez le mélanome et d'identifier deux cibles thérapeutiques potentielles jusqu'alors ignorées dans les cancers de la peau.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Fungi are a large group of eukaryotes found in nearly all ecosystems. More than 250 fungal genomes have already been sequenced, greatly improving our understanding of fungal evolution, physiology, and development. However, for the Pezizomycetes, an early-diverging lineage of filamentous ascomycetes, there is so far only one genome available, namely that of the black truffle, Tuber melanosporum, a mycorrhizal species with unusual subterranean fruiting bodies. To help close the sequence gap among basal filamentous ascomycetes, and to allow conclusions about the evolution of fungal development, we sequenced the genome and assayed transcriptomes during development of Pyronema confluens, a saprobic Pezizomycete with a typical apothecium as fruiting body. With a size of 50 Mb and ~13,400 protein-coding genes, the genome is more characteristic of higher filamentous ascomycetes than the large, repeat-rich truffle genome; however, some typical features are different in the P. confluens lineage, e.g. the genomic environment of the mating type genes that is conserved in higher filamentous ascomycetes, but only partly conserved in P. confluens. On the other hand, P. confluens has a full complement of fungal photoreceptors, and expression studies indicate that light perception might be similar to distantly related ascomycetes and, thus, represent a basic feature of filamentous ascomycetes. Analysis of spliced RNA-seq sequence reads allowed the detection of natural antisense transcripts for 281 genes. The P. confluens genome contains an unusually high number of predicted orphan genes, many of which are upregulated during sexual development, consistent with the idea of rapid evolution of sex-associated genes. Comparative transcriptomics identified the transcription factor gene pro44 that is upregulated during development in P. confluens and the Sordariomycete Sordaria macrospora. The P. confluens pro44 gene (PCON_06721) was used to complement the S. macrospora pro44 deletion mutant, showing functional conservation of this developmental regulator.