3 results for Molecular Sequence Annotation

in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain


Relevance:

40.00%

Abstract:

Background: The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results. Results: The GENCODE gene features are divided into eight different categories of which only the first two (known and novel coding sequence) are confidently predicted to be protein-coding genes. 5’ rapid amplification of cDNA ends (RACE) and RT-PCR were used to experimentally verify the initial annotation. Of the 420 coding loci tested, 229 RACE products have been sequenced. They supported 5’ extensions of 30 loci and new splice variants in 50 loci. In addition, 46 loci without evidence for a coding sequence were validated, consisting of 31 novel and 15 putative transcripts. We assessed the comprehensiveness of the GENCODE annotation by attempting to validate all the predicted exon boundaries outside the GENCODE annotation. Out of 1,215 tested in a subset of the ENCODE regions, 14 novel exon pairs were validated, only two of them in intergenic regions. Conclusions: In total, 487 loci, of which 434 are coding, have been annotated as part of the GENCODE reference set available from the UCSC browser. Comparison of GENCODE annotation with RefSeq and ENSEMBL shows only 40% of GENCODE exons are contained within the two sets, which is a reflection of the high number of alternative splice forms with unique exons annotated. Over 50% of coding loci have been experimentally verified by 5’ RACE for EGASP and the GENCODE collaboration is continuing to refine its annotation of 1% of the human genome with the aid of experimental validation.

Relevance:

40.00%

Abstract:

Background: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a ‘reference set’ of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. Conclusions: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.
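
The nucleotide-level figures quoted in this abstract can be illustrated with a minimal sketch. It assumes the convention common in gene-prediction evaluation, where sensitivity is TP / (TP + FN) and "specificity" is TP / (TP + FP); the abstract itself does not define the measures, so this is an assumption. The function names and the toy half-open exon coordinates are hypothetical and are not taken from EGASP.

```python
def coding_positions(exons):
    """Expand half-open (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end))
    return positions


def nucleotide_accuracy(annotated_exons, predicted_exons):
    """Return (sensitivity, specificity) at the coding nucleotide level.

    Assumed definitions (not stated in the abstract):
      sensitivity = TP / (TP + FN)
      specificity = TP / (TP + FP)
    """
    annotated = coding_positions(annotated_exons)
    predicted = coding_positions(predicted_exons)
    tp = len(annotated & predicted)   # coding in both annotation and prediction
    fn = len(annotated - predicted)   # annotated coding positions that were missed
    fp = len(predicted - annotated)   # predicted coding positions not annotated
    sensitivity = tp / (tp + fn) if annotated else 0.0
    specificity = tp / (tp + fp) if predicted else 0.0
    return sensitivity, specificity


# Toy data: one annotated exon and one prediction shifted by 10 nucleotides.
sn, sp = nucleotide_accuracy([(0, 100)], [(10, 110)])
print(sn, sp)
```

In this toy case 90 of the 100 annotated positions are recovered and 90 of the 100 predicted positions are annotated, so both measures come out at 0.9.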

Relevance:

30.00%

Abstract:

The aim of this study was to analyze a private germplasm collection of Vitis vinifera L., comprising 338 cultivars from 24 countries, and to characterize them by building a database using 11 microsatellite or SSR (Simple Sequence Repeat) markers. Some of the samples analyzed showed an identical SSR profile, indicating synonymy (the same variety under a different name). A total of 293 unique profiles were detected. In addition, 15 pairs of varieties differed at a single locus and another 7 groups differed at 2 loci, which would indicate close genetic proximity between those varieties without their being the same variety. The germplasm analyzed holds a complex varietal biodiversity that should be preserved. The study was scheduled so that the first year covered a detailed literature review, the collection of leaves and the start of method optimization; in the second year, method optimization was completed, the leaves were ground and the DNA was extracted. The third year was used to amplify the DNA fragments by PCR (polymerase chain reaction), determine fragment lengths with an ABI PRISM 310 sequencer, evaluate the results and perform repeats. In the final year, the results obtained were analyzed, any necessary repeats were performed and the writing of scientific articles began.
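
The profile comparison described in this abstract (identical SSR profiles as synonyms, and varieties differing at one or two loci) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: each sample is assumed to be a list of per-locus allele-size pairs in a fixed locus order, and the sample names, allele sizes and function names are all hypothetical. Only 3 loci are used here instead of the 11 markers in the study.

```python
from collections import defaultdict
from itertools import combinations


def normalise(profile):
    """Sort the alleles at each locus so (180, 176) equals (176, 180)."""
    return tuple(tuple(sorted(alleles)) for alleles in profile)


def group_synonyms(samples):
    """Group sample names that share an identical SSR profile."""
    groups = defaultdict(list)
    for name, profile in samples.items():
        groups[normalise(profile)].append(name)
    return dict(groups)


def loci_differing(profile_a, profile_b):
    """Count the loci at which two normalised profiles differ."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def near_matches(groups, max_diff=2):
    """Map a number of differing loci (1..max_diff) to pairs of synonym groups."""
    result = defaultdict(list)
    for (prof_a, names_a), (prof_b, names_b) in combinations(groups.items(), 2):
        diff = loci_differing(prof_a, prof_b)
        if 1 <= diff <= max_diff:
            result[diff].append((names_a, names_b))
    return dict(result)


# Hypothetical samples, three loci each, allele sizes in base pairs.
samples = {
    "A": [(176, 180), (200, 204), (150, 150)],
    "B": [(180, 176), (204, 200), (150, 150)],  # same as A, alleles listed in reverse
    "C": [(176, 180), (200, 204), (150, 152)],  # differs from A at one locus
    "D": [(170, 172), (210, 212), (150, 152)],  # differs from C at two loci
}

groups = group_synonyms(samples)
near = near_matches(groups)
print(len(groups), "unique profiles")
print(near)
```

Comparing unique profiles rather than raw samples keeps synonyms from being counted repeatedly as near matches. The pairwise comparison is quadratic in the number of unique profiles, which is not a concern at the scale of a few hundred profiles.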