1000 results for data curation
Abstract:
WormBase (http://www.wormbase.org) is a web-based resource for the Caenorhabditis elegans genome and its biology. It builds upon the existing ACeDB database of the C. elegans genome by providing data curation services, a significantly expanded range of subject areas and a user-friendly front end.
Abstract:
We present the Hungarian National Scientific Bibliography project: the MTMT. We argue that presently available commercial systems cannot serve as a comprehensive national bibliometric tool. The new database was created from existing databases of the Hungarian Academy of Sciences, but it is expected to be re-engineered in the future. The data curation model combines harvesting, the work of expert bibliographers, and author feedback. MTMT will interoperate with other services in the web of scientific information, using standard protocols and formats, and will act as a hub. It will present the scientific output of Hungary together with the repositories containing the full text, wherever available. The database will be open for non-commercial use, but not freely harvestable.
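The abstract mentions harvesting over standard protocols without naming one; OAI-PMH is the usual standard for bibliographic metadata exchange. A minimal sketch under that assumption, using the Python sickle library; the endpoint URL is hypothetical:

    # Harvest Dublin Core records from a hypothetical OAI-PMH endpoint.
    from sickle import Sickle

    repo = Sickle("https://repo.example.hu/oai")  # hypothetical endpoint
    for record in repo.ListRecords(metadataPrefix="oai_dc"):
        md = record.metadata  # dict mapping Dublin Core fields to value lists
        print(md.get("title"), md.get("creator"))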
Abstract:
We present recent trends in digital cultural heritage management and applications, including digital cultural data curation, interoperability, linked open data publishing, crowdsourcing, visualization, and platforms for digital cultural heritage. We present examples from research and development projects of MUSIC/TUC in those areas.
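As a concrete illustration of the linked-open-data publishing mentioned above (not drawn from the MUSIC/TUC projects themselves), a minimal sketch using the Python rdflib library and Dublin Core terms; the object URI and metadata values are hypothetical:

    # Publish a cultural-heritage object as RDF (Turtle serialization).
    from rdflib import Graph, URIRef, Literal, Namespace

    DCTERMS = Namespace("http://purl.org/dc/terms/")
    g = Graph()
    artifact = URIRef("http://example.org/museum/object/42")  # hypothetical URI
    g.add((artifact, DCTERMS.title, Literal("Minoan amphora")))
    g.add((artifact, DCTERMS.medium, Literal("terracotta")))
    print(g.serialize(format="turtle"))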
Abstract:
We present the results of a study of the university training that information professionals receive in project management, based on an analysis of Library and Information Science degree programmes worldwide. To this end, sources and directories on international Library and Information Science education were consulted, and a database was compiled containing 106 records of project-management courses offered within these degree programmes. The parameters of analysis were geographic location, higher-education institutions, academic degrees, and the project-management courses themselves. Among the most notable conclusions are the compulsory status of the courses, their face-to-face delivery, and the prominence of public universities in the United States, Germany and France.
Abstract:
Background: A major goal in the post-genomic era is to identify and characterise disease susceptibility genes and to apply this knowledge to disease prevention and treatment. Rodents and humans have remarkably similar genomes and share closely related biochemical, physiological and pathological pathways. In this work we utilised the latest information on the mouse transcriptome as revealed by the RIKEN FANTOM2 project to identify novel human disease-related candidate genes. We define a new term, 'patholog', to mean a homolog of a human disease-related gene encoding a product (transcript, anti-sense or protein) potentially relevant to disease. Rather than focus solely on Mendelian inheritance, we applied the analysis to all potential pathologs regardless of their inheritance pattern. Results: Bioinformatic analysis and human curation of 60,770 RIKEN full-length mouse cDNA clones produced 2,578 sequences that showed similarity (70-85% identity) to known human disease genes. Using a newly developed biological information extraction and annotation tool (FACTS) in parallel with human expert analysis of 17,051 MEDLINE scientific abstracts, we identified 182 novel potential pathologs. Of these, 36 were identified by computational tools only, 49 by human expert analysis only and 97 by both methods. These pathologs were related to neoplastic (53%), hereditary (24%), immunological (5%), cardiovascular (4%) or other (14%) disorders. Conclusions: Large-scale genome projects continue to produce a vast amount of data with potential application to the study of human disease. For this potential to be realised, we need intelligent strategies for data categorisation and the ability to link sequence data with relevant literature. This paper demonstrates the power of combining human expert annotation with FACTS, a newly developed bioinformatics tool, to identify novel pathologs from within large-scale mouse transcript datasets.
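The 70-85% identity window is the abstract's stated screen for candidate pathologs. A minimal sketch of such a filter over tabular BLAST output (outfmt 6, whose first three columns are query, subject and percent identity); the file name is hypothetical and the paper's actual pipeline is not reproduced here:

    # Keep alignment hits whose percent identity falls in the patholog window.
    def candidate_pathologs(blast_tsv, low=70.0, high=85.0):
        hits = []
        with open(blast_tsv) as fh:
            for line in fh:
                if not line.strip():
                    continue
                query, subject, pct_identity = line.split("\t")[:3]
                if low <= float(pct_identity) <= high:
                    hits.append((query, subject, float(pct_identity)))
        return hits

    # e.g. candidate_pathologs("mouse_vs_disease_genes.blast.tsv")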
Abstract:
Genome-scale metabolic models are valuable tools in the metabolic engineering process, based on the ability of these models to integrate diverse sources of data to produce global predictions of organism behavior. At the most basic level, these models require only a genome sequence to construct, and once built, they may be used to predict essential genes, culture conditions, pathway utilization, and the modifications required to enhance a desired organism behavior. In this chapter, we address two key challenges associated with the reconstruction of metabolic models: (a) leveraging existing knowledge of microbiology, biochemistry, and available omics data to produce the best possible model; and (b) applying available tools and data to automate the reconstruction process. We consider these challenges as we progress through the model reconstruction process, beginning with genome assembly, and culminating in the integration of constraints to capture the impact of transcriptional regulation. We divide the reconstruction process into ten distinct steps: (1) genome assembly from sequenced reads; (2) automated structural and functional annotation; (3) phylogenetic tree-based curation of genome annotations; (4) assembly and standardization of a biochemistry database; (5) genome-scale metabolic reconstruction; (6) generation of a core metabolic model; (7) generation of a biomass composition reaction; (8) completion of a draft metabolic model; (9) curation of the metabolic model; and (10) integration of regulatory constraints. Each of these ten steps is documented in detail.
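As an illustration of the predictions that steps (8)-(9) enable once a draft model exists, a minimal sketch using the COBRApy library (an assumption; the chapter's own toolchain is not specified here), with a hypothetical SBML file name:

    # Load a draft reconstruction, run flux balance analysis, and predict
    # essential genes by single-gene knockout simulation.
    from cobra.io import read_sbml_model
    from cobra.flux_analysis import single_gene_deletion

    model = read_sbml_model("draft_model.xml")   # hypothetical draft model
    growth = model.optimize().objective_value    # wild-type FBA growth rate
    knockouts = single_gene_deletion(model)      # DataFrame of knockout growth
    essential = knockouts[knockouts["growth"] < 1e-6]
    print(f"wild-type growth {growth:.3f}; {len(essential)} essential genes")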
Abstract:
Animal toxins are of interest to a wide range of scientists, due to their numerous applications in pharmacology, neurology, hematology, medicine, and drug research. This interest, and to a lesser extent the development of new high-performance tools in transcriptomics and proteomics, has led to an increase in toxin discovery. In this context, providing publicly available data on animal toxins has become essential. The UniProtKB/Swiss-Prot Tox-Prot program (http://www.uniprot.org/program/Toxins) plays a crucial role by providing such access to venom protein sequences and functions from all venomous species. The program has so far curated more than 5000 venom proteins to the high-quality standards of UniProtKB/Swiss-Prot (release 2012_02). Proteins targeted by these toxins are also available in the knowledgebase. This paper describes in detail the types of information provided by UniProtKB/Swiss-Prot for toxins, as well as the structured format of the knowledgebase.
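A minimal sketch of retrieving such curated toxin entries programmatically, assuming the current UniProt REST endpoint (which postdates the 2012-era URL in the abstract) and assuming the 'Toxin' keyword KW-0800 as the filter:

    # Query reviewed (Swiss-Prot) entries annotated with the Toxin keyword.
    import requests

    url = "https://rest.uniprot.org/uniprotkb/search"
    params = {
        "query": "(keyword:KW-0800) AND (reviewed:true)",
        "format": "json",
        "size": 5,
    }
    for entry in requests.get(url, params=params).json().get("results", []):
        print(entry.get("primaryAccession"), entry.get("uniProtkbId"))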
Abstract:
Expert curation and complete collection of mutations in genes that affect human health are essential for proper genetic healthcare and research. Expert curation is provided by the curators of gene-specific mutation databases, or locus-specific databases (LSDBs). While there are over 700 such databases, they vary in their content, completeness, time available for curation, and the expertise of the curator. Curation and LSDBs have been discussed and written about, and protocols have been provided, for over 10 years, but there have been no formal recommendations for the ideal form of these entities. This work initiates a discussion on this topic to assist future efforts in human genetics. Further discussion is welcome.
Genetic Variations and Diseases in UniProtKB/Swiss-Prot: The Ins and Outs of Expert Manual Curation.
Abstract:
During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype.
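A brief illustration (not from the article itself) of how such expert-curated variant annotations can be retrieved programmatically; the EBI Proteins API endpoint and the JSON field names below are assumptions, and P04637 (human p53) is only an example accession:

    # Fetch curated variant features for one UniProtKB accession.
    import requests

    acc = "P04637"  # example accession (human p53)
    url = f"https://www.ebi.ac.uk/proteins/api/variation/{acc}"
    data = requests.get(url, headers={"Accept": "application/json"}).json()
    for feat in data.get("features", [])[:5]:
        # position, wild-type residue, and variant residue (assumed field names)
        print(feat.get("begin"), feat.get("wildType"),
              feat.get("alternativeSequence"))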
Abstract:
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvements in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR, the European infrastructure for biological information, that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations, including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.
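A minimal sketch of querying the registry programmatically; the /api/tool/ endpoint, its 'name' parameter, and the 'list' response key are assumptions about the bio.tools REST interface rather than details given in the abstract:

    # Search the bio.tools registry for resources matching a name.
    import requests

    resp = requests.get("https://bio.tools/api/tool/",
                        params={"name": "blast", "format": "json"})
    for tool in resp.json().get("list", []):
        print(tool.get("name"), "-", tool.get("homepage"))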
Abstract:
Workshop at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
This paper discusses many of the issues associated with formally publishing data in academia, focusing primarily on the structures that need to be put in place for peer review and formal citation of datasets. Data publication is becoming increasingly important to the scientific community, as it will provide a mechanism for those who create data to receive academic credit for their work and will allow the conclusions arising from an analysis to be more readily verifiable, thus promoting transparency in the scientific process. Peer review of data will also provide a mechanism for ensuring the quality of datasets, and we provide suggestions on the types of activities one expects to see in the peer review of data. A simple taxonomy of data publication methodologies is presented and evaluated, and the paper concludes with a discussion of dataset granularity, transience and semantics, along with a recommended human-readable citation syntax.
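To make the idea of a human-readable citation syntax concrete, a small sketch that assembles a dataset citation string; the field order follows common practice (roughly DataCite style) and is not necessarily the syntax the paper recommends, and all values are hypothetical:

    # Build a human-readable dataset citation from its core metadata fields.
    def cite_dataset(creator, year, title, publisher, doi, version=None):
        ver = f", version {version}" if version else ""
        return (f"{creator} ({year}): {title}{ver}. {publisher}. "
                f"https://doi.org/{doi}")

    print(cite_dataset("Author, A. et al.", 2011,
                       "Example ocean temperature dataset",
                       "Example Data Centre", "10.1234/example"))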