201 results for Human Genome Project.
at Université de Lausanne, Switzerland
Abstract:
2 Abstract

The completion of the human genome sequence is a prerequisite to fully understanding the biology of human beings. With this project completed, scientists faced another challenging task: understanding the meaning of the 3 billion letters that make up this genome. As a logical continuation of the Human Genome Project, the ENCODE (ENCyclopedia Of DNA Elements) consortium was formed with the aim of annotating all of the genome's functional elements. These elements include transcribed regions, transcription factor binding sites, DNase I hypersensitive sites and histone modification marks. As part of my PhD thesis, I was involved in two ENCODE sub-projects. First, I developed and optimized a high-throughput method to validate gene models, which allowed me to assess the quality of the most recent manually curated annotation. This novel RT-PCR-based experimental validation pipeline is extremely effective, far more so than transcriptome profiling through RNA sequencing (RNA-seq), which is becoming the norm. This targeted RT-PCR-seq approach is also particularly efficient at identifying novel exons: we discovered unannotated exons in about 10% of the loci examined. Second, I participated in a study aiming to identify the boundaries of all genes on human chromosomes 21 and 22. This study led to the large-scale identification of chimeric transcripts composed of sequences from two distinct genes that can map far away from each other.
Abstract:
In recent years, analysis of the genomes of many organisms has received increasing international attention. The bulk of the effort to date has centred on the Human Genome Project and analysis of model organisms such as yeast, Drosophila and Caenorhabditis elegans. More recently, the revolution in genome sequencing and gene identification has begun to have an impact on infectious disease organisms. Initially, much of the effort was concentrated on prokaryotes, but small eukaryotic genomes, including those of the protozoan parasites Plasmodium, Toxoplasma and trypanosomatids (Leishmania, Trypanosoma brucei and T. cruzi), as well as some multicellular organisms such as Brugia and Schistosoma, are benefiting from the technological advances of the genome era. These advances promise a radical new approach to the development of novel diagnostic tools, chemotherapeutic targets and vaccines for infectious disease organisms, as well as to the more detailed analysis of cell biology and function. Several networks or consortia linking laboratories around the world have been established to support these parasite genome projects[1] (for more information, see http://www.ebi.ac.uk/parasites/paratable.html). Five of these networks were supported by an initiative launched in 1994 by the Special Programme for Research and Training in Tropical Diseases (TDR) of the WHO[2, 3, 4, 5, 6]. The Leishmania Genome Network (LGN) is one of these[3]. Its activities are reported at http://www.ebi.ac.uk/parasites/leish.html, and its current aim is to map and sequence the genome of Leishmania by the year 2002. All the mapping, hybridization and sequence data are also publicly available from LeishDB, an AceDB-based genome database (http://www.ebi.ac.uk/parasites/LGN/leissssoft.html).
Abstract:
We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
Abstract:
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of annotated alternatively spliced transcripts has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available, with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to PeptideAtlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two-exon models, indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
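The 35% CAGE-support figure rests on asking whether each annotated transcription start site (TSS) lies close to a CAGE cluster. The following is a minimal Python sketch of that kind of check, not GENCODE's actual procedure: it assumes each CAGE cluster is reduced to a single representative position and uses a hypothetical ±50 bp window; the function name `tss_support_fraction` and the example coordinates are illustrative only.

```python
import bisect
from typing import Dict, List, Tuple


def tss_support_fraction(
    tss: List[Tuple[str, int]],
    cage_peaks: Dict[str, List[int]],
    window: int = 50,
) -> float:
    """Fraction of TSSs with at least one CAGE peak within +/- window bp.

    Assumption for this sketch: each CAGE cluster is summarized by one position.
    """
    if not tss:
        raise ValueError("no TSSs given")
    # Sort peak positions per chromosome so each lookup is a binary search.
    sorted_peaks = {chrom: sorted(pos) for chrom, pos in cage_peaks.items()}
    supported = 0
    for chrom, pos in tss:
        peaks = sorted_peaks.get(chrom, [])
        # First peak at or after the left edge of the window.
        i = bisect.bisect_left(peaks, pos - window)
        if i < len(peaks) and peaks[i] <= pos + window:
            supported += 1
    return supported / len(tss)
```

For example, with TSSs at chr1:1000, chr1:5000, chr2:300 and chr3:10 and peaks at chr1:1030, chr1:9000 and chr2:400, only the first TSS falls within 50 bp of a peak, giving 0.25. The result is sensitive to the chosen window, which is why it is left as a parameter.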
Abstract:
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
Abstract:
Dramatic improvements in DNA sequencing technologies have led to a more than 1,000-fold reduction in sequencing costs over the past five years. Genome-wide research approaches can thus now be applied beyond medically relevant questions to examine the molecular-genetic basis of behavior, development and unique life histories in almost any organism. A first step for an emerging model organism is usually establishing a reference genome sequence. I offer insight gained from the fire ant genome project. First, I detail how the project came to be and how sequencing, assembly and annotation strategies were chosen. Subsequently, I describe some of the issues linked to working with data from recently sequenced genomes. Finally, I discuss an approach undertaken in a follow-up project based on the fire ant genome sequence.
Abstract:
Functional RNA structures play an important role both in the context of noncoding RNA transcripts and as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack characteristic signals in primary sequence, comparative approaches evaluating the evolutionary conservation of structures are the most promising. We used three recently introduced programs based on either a phylogenetic stochastic context-free grammar (EvoFold) or energy-directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures (corresponding to approximately 2.7% of the ENCODE regions). EvoFold has its highest sensitivity in highly conserved and relatively AU-rich regions, while RNAz favors slightly GC-rich regions, resulting in a relatively small overlap between methods. Comparison with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a substantial false discovery rate of approximately 50%-70%, many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Five hundred seventy RNAz structure predictions fall into regions that also show signs of selection pressure at the sequence level (i.e., conserved elements). More than 700 predictions overlap with noncoding transcripts detected by oligonucleotide tiling arrays. One hundred seventy-five selected candidates were tested by RT-PCR in six tissues, and expression could be verified in 43 cases (24.6%).
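To illustrate how the overlap between two predictors (such as RNAz and EvoFold) and the RT-PCR verification rate can be tallied, here is a minimal Python sketch. The interval representation, the function names and the example coordinates are hypothetical; the published analysis may have used different overlap criteria (for example, a minimum number of shared bases).

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates [start, end)
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_shared(preds_a: List[Interval], preds_b: List[Interval]) -> int:
    """Count predictions in preds_a that overlap at least one prediction in preds_b.

    Quadratic pairwise scan: fine for a sketch, too slow for genome-wide sets.
    """
    return sum(any(overlaps(a, b) for b in preds_b) for a in preds_a)


def validation_rate(n_verified: int, n_tested: int) -> float:
    """Percentage of tested candidates whose expression was verified."""
    return 100.0 * n_verified / n_tested
```

With the counts from this abstract, `validation_rate(43, 175)` is about 24.57, which rounds to the reported 24.6%. For real data, an interval tree or a sorted sweep would replace the pairwise scan in `count_shared`.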
Abstract:
A report of the annual meeting of the European Society of Human Genetics, Amsterdam, 6-9 May 2006.
Abstract:
Most approaches aiming at finding genes involved in adaptive events have focused on the detection of outlier loci, which resulted in the discovery of individually "significant" genes with strong effects. However, a collection of small-effect mutations could have a large effect on a given biological pathway that includes many genes, and such a polygenic mode of adaptation has not been systematically investigated in humans. Here we propose to detect polygenic selection by looking for signals of adaptation at the level of pathways or gene sets instead of analyzing single independent genes. Using a gene-set enrichment test to identify genome-wide signals of adaptation among human populations, we find that most pathways globally enriched for signals of positive selection are either directly or indirectly involved in immune response. We also find evidence for long-distance genotypic linkage disequilibrium, suggesting functional epistatic interactions between members of the same pathway. Our results show that past interactions with pathogens have elicited widespread and coordinated genomic responses, and suggest that adaptation to pathogens can be considered a primary example of polygenic selection.
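The idea of testing a whole pathway rather than single genes can be sketched as a simple permutation test: compare the mean per-gene selection score of a gene set with the mean scores of random gene sets of the same size. This is a generic illustration, not necessarily the enrichment statistic used in the study; the input scores, `n_perm` and the function name are assumptions, and a real analysis would also need to control for confounders such as gene length and linkage between neighboring genes.

```python
import random
from typing import Dict, Sequence


def gene_set_enrichment_pvalue(
    scores: Dict[str, float],
    gene_set: Sequence[str],
    n_perm: int = 10_000,
    seed: int = 0,
) -> float:
    """One-sided permutation p-value for an unusually high mean score in gene_set.

    scores maps each gene to a hypothetical per-gene selection statistic.
    """
    members = [g for g in gene_set if g in scores]
    if not members:
        raise ValueError("gene set shares no genes with the score table")
    k = len(members)
    observed = sum(scores[g] for g in members) / k
    universe = list(scores)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Draw a random gene set of the same size from all scored genes.
        sample = rng.sample(universe, k)
        if sum(scores[g] for g in sample) / k >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Many weak signals spread across a pathway can push its mean score into the tail of the permutation distribution even when no single gene would pass a genome-wide outlier threshold, which is the intuition behind looking for polygenic selection at the gene-set level.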
Abstract:
Hypertension is one of the most common complex genetic disorders. We previously described 38 single nucleotide polymorphisms (SNPs) with suggestive association with hypertension in Japanese individuals. In this study we extend our previous findings by analyzing the most strongly associated SNPs in a large sample of Japanese individuals (n=14,105). We also conducted replication analyses in Japanese individuals of hypertension susceptibility loci recently identified by genome-wide association studies in populations of European ancestry. Association analysis revealed significant association of the ATP2B1 rs2070759 polymorphism with hypertension (P=5.3×10⁻⁵; allelic odds ratio: 1.17 [95% CI: 1.09 to 1.26]). Additional SNPs in ATP2B1 were subsequently genotyped, and the most significant association was with rs11105378 (odds ratio: 1.31 [95% CI: 1.21 to 1.42]; P=4.1×10⁻¹¹). Association of rs11105378 with hypertension was cross-validated by replication analysis with the Global Blood Pressure Genetics consortium data set (odds ratio: 1.13 [95% CI: 1.05 to 1.21]; P=5.9×10⁻⁴). Mean adjusted systolic blood pressure was highly significantly associated with the same SNP in a meta-analysis with individuals of European descent (P=1.4×10⁻¹⁸). ATP2B1 mRNA expression levels in umbilical artery smooth muscle cells were found to differ significantly among rs11105378 genotypes. Seven SNPs discovered in published genome-wide association studies were also genotyped in the Japanese population. In the combined analysis of the three replicated genes (FGF5 rs1458038, CYP17A1 rs1004467 and CSK rs1378942), the odds ratio of the highest-risk group was 2.27 (95% CI: 1.65 to 3.12; P=4.6×10⁻⁷) compared with the lower-risk group. In summary, this study confirmed that common genetic variation in ATP2B1, as well as in FGF5, CYP17A1 and CSK, is associated with blood pressure levels and risk of hypertension.
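Allelic odds ratios with 95% confidence intervals like those reported in this abstract are derived from a 2×2 table of allele counts in cases and controls. A minimal Python sketch using the standard Woolf (logit) interval is shown here; the abstract does not say whether the authors used this exact interval or covariate-adjusted models, and the counts in the usage example are invented.

```python
import math
from typing import Tuple


def allelic_odds_ratio(
    case_risk: int,
    case_other: int,
    ctrl_risk: int,
    ctrl_other: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Allelic odds ratio with a Woolf (logit) confidence interval.

    Counts are alleles, not individuals (each genotyped person contributes two).
    All four counts must be positive; z=1.96 gives an approximate 95% interval.
    """
    if min(case_risk, case_other, ctrl_risk, ctrl_other) <= 0:
        raise ValueError("all allele counts must be positive")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

With hypothetical counts of 600 risk and 400 other alleles in cases and 500 of each in controls, `allelic_odds_ratio(600, 400, 500, 500)` gives an odds ratio of 1.5 with an interval of roughly 1.26 to 1.79. The interval is symmetric on the log scale, which is why published intervals such as 1.09 to 1.26 around 1.17 look slightly lopsided on the raw scale.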
Abstract:
The goals of the human genome project did not include sequencing of the heterochromatic regions. We describe here an initial sequence of 1.1 Mb of the short arm of human chromosome 21 (HSA21p), estimated to be 10% of 21p. This region contains extensive euchromatic-like sequence and includes on average one transcript every 100 kb. These transcripts show multiple inter- and intrachromosomal copies, and extensive copy number and sequence variability. The sequencing of the "heterochromatic" regions of the human genome is likely to reveal many additional functional elements and provide important evolutionary information.
Abstract:
Given that retroposed copies of genes are presumed to lack the regulatory elements required for their expression, retroposition has long been considered a mechanism without functional relevance. However, through an in silico assay for transcriptional activity, we identify here >1,000 transcribed retrocopies in the human genome, of which at least approximately 120 have evolved into bona fide genes. Among these, approximately 50 retrogenes have evolved functions in testes, more than half of which were recruited as functional autosomal counterparts of X-linked genes during spermatogenesis. Generally, retrogenes emerge "out of the testis," because they are often initially transcribed in testis and later evolve stronger and sometimes more diverse spatial expression patterns. We find a significant excess of transcribed retrocopies close to other genes or within introns, suggesting that retrocopies can exploit the regulatory elements and/or open chromatin of neighboring genes to become transcribed. In direct support of this hypothesis, we identify 36 retrocopy-host gene fusions, including primate-specific chimeric genes. Strikingly, 27 intergenic retrogenes have acquired untranslated exons de novo during evolution to achieve high expression levels. Notably, our screen for highly transcribed retrocopies also uncovered a retrogene linked to a human recessive disorder, gelatinous drop-like corneal dystrophy, a form of blindness. These functional implications for retroposition notwithstanding, we find that the insertion of retrocopies into genes is generally deleterious, because it may interfere with the transcription of host genes. Our results demonstrate that natural selection has been fundamental in shaping the retrocopy repertoire of the human genome.
Abstract:
Within the ENCODE Consortium, GENCODE aimed to accurately annotate all protein-coding genes, pseudogenes, and noncoding transcribed loci in the human genome through manual curation and computational methods. Annotated transcript structures were assessed, and less well-supported loci were systematically validated experimentally. Predicted exon-exon junctions were evaluated by RT-PCR amplification followed by highly multiplexed sequencing readout, a method we called RT-PCR-seq. Seventy-nine percent of all assessed junctions are confirmed by this evaluation procedure, demonstrating the high quality of the GENCODE gene set. RT-PCR-seq also proved efficient for screening gene models predicted from the Human Body Map (HBM) RNA-seq data. We validated 73% of these predictions, thus confirming 1168 novel genes, mostly noncoding, which will further complement the GENCODE annotation. Our novel experimental validation pipeline is extremely sensitive, far more so than unbiased transcriptome profiling through RNA sequencing, which is becoming the norm. For example, exon-exon junctions unique to GENCODE annotated transcripts are five times more likely to be corroborated by our targeted approach than by extensive human transcriptome profiling. Data sets such as the HBM and ENCODE RNA-seq data fail to sample lowly expressed transcripts. Our targeted RT-PCR-seq approach also has the advantage of identifying novel exons of known genes, as we discovered unannotated exons in ~11% of assessed introns. We thus estimate that at least 18% of known loci have yet-unannotated exons. Our work demonstrates that cataloging all of the genic elements encoded in the human genome will require a coordinated effort between unbiased and targeted approaches, such as RNA-seq and RT-PCR-seq.
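The RT-PCR-seq readout ultimately reduces to asking, for each predicted exon-exon junction, whether enough sequencing reads span exactly that junction. A minimal Python sketch of that bookkeeping follows; the junction representation, the `min_reads` threshold of 2 and the example data are hypothetical and not taken from the GENCODE pipeline, which would also involve read alignment and amplicon-level quality filters.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical junction key: (chromosome, last base of upstream exon, first base of downstream exon)
Junction = Tuple[str, int, int]


def confirmed_fraction(
    predicted: Set[Junction],
    observed_reads: Iterable[Junction],
    min_reads: int = 2,
) -> float:
    """Fraction of predicted junctions spanned by at least min_reads reads.

    observed_reads yields one junction per split read; reads that match no
    prediction are ignored.
    """
    if not predicted:
        raise ValueError("no predicted junctions")
    support = Counter(j for j in observed_reads if j in predicted)
    confirmed = sum(1 for j in predicted if support[j] >= min_reads)
    return confirmed / len(predicted)
```

For instance, if four junctions are predicted and split reads support two of them at least twice, one only once and one not at all, the confirmed fraction is 0.5 at `min_reads=2` and 0.75 at `min_reads=1`. The threshold trades sensitivity against false confirmations, which is one reason deep targeted sequencing can corroborate weakly expressed junctions that shallow whole-transcriptome data miss.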