GENCODE: producing a reference annotation for ENCODE


Author(s): Harrow, Jennifer; Denoeud, France; Frankish, Adam; Reymond, Alexandre; Chen, Chao-Kung; Chrast, Jacqueline; Lagarde, Julien; Gilbert, James GR; Storey, Roy; Swarbreck, David; Rossier, Colette; Ucla, Catherine; Hubbard, Tim; Antonarakis, Stylianos E.; Guigó Serra, Roderic
Contributor(s)

Universitat Pompeu Fabra

Date(s)

02/07/2013

Abstract

Background: The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results.

Results: The GENCODE gene features are divided into eight different categories, of which only the first two (known and novel coding sequence) are confidently predicted to be protein-coding genes. 5’ rapid amplification of cDNA ends (RACE) and RT-PCR were used to experimentally verify the initial annotation. Of the 420 coding loci tested, 229 RACE products have been sequenced. They supported 5’ extensions of 30 loci and new splice variants in 50 loci. In addition, 46 loci without evidence for a coding sequence were validated, consisting of 31 novel and 15 putative transcripts. We assessed the comprehensiveness of the GENCODE annotation by attempting to validate all the predicted exon boundaries outside the GENCODE annotation. Out of 1,215 tested in a subset of the ENCODE regions, 14 novel exon pairs were validated, only two of them in intergenic regions.

Conclusions: In total, 487 loci, of which 434 are coding, have been annotated as part of the GENCODE reference set available from the UCSC browser. Comparison of the GENCODE annotation with RefSeq and ENSEMBL shows that only 40% of GENCODE exons are contained within the two sets, which is a reflection of the high number of alternative splice forms with unique exons annotated. Over 50% of coding loci have been experimentally verified by 5’ RACE for EGASP, and the GENCODE collaboration is continuing to refine its annotation of 1% of the human genome with the aid of experimental validation.

Identifier

http://hdl.handle.net/10230/12925

Language(s)

eng

Publisher

BioMed Central

Rights

© 2006 Harrow et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. This article is also available at http://genomebiology.com/2006/7/S1/S4

http://creativecommons.org/licenses/by/2.0

Keywords #Bioinformatics #Molecular biology -- Technique #GENCODE #ENCODE #cDNA #RACE #Gene Loci
Type

info:eu-repo/semantics/article

info:eu-repo/semantics/publishedVersion