28 results for "metadata repository"
at Université de Lausanne, Switzerland
MetaNetX.org: a website and repository for accessing, analysing and manipulating metabolic networks.
Abstract:
SUMMARY: MetaNetX.org is a website for accessing, analysing and manipulating genome-scale metabolic networks (GSMs) as well as biochemical pathways. It consistently integrates data from various public resources and makes the data accessible in a standardized format using a common namespace. Currently, it provides access to hundreds of GSMs and pathways that can be interactively compared (two or more), analysed (e.g. detection of dead-end metabolites and reactions, flux balance analysis or simulation of reaction and gene knockouts), manipulated and exported. Users can also upload their own metabolic models, choose to automatically map them into the common namespace and subsequently make use of the website's functionality. Availability and implementation: MetaNetX.org is available at http://metanetx.org. CONTACT: help@metanetx.org.
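One of the analyses listed is the detection of dead-end metabolites. The sketch below is an illustration only and is not MetaNetX.org's own implementation. It applies a common topological definition: a metabolite is a dead end if the network can only produce it, only consume it, or if it takes part in a single reaction. The reaction encoding, the toy network and the `dead_end_metabolites` function are hypothetical, and exchange reactions are modelled as one-sided reactions.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Hypothetical encoding: reaction id -> (stoichiometry, reversible).
# Negative coefficients are consumed, positive coefficients are produced.
Reaction = Tuple[Dict[str, float], bool]

def dead_end_metabolites(reactions: Dict[str, Reaction]) -> Set[str]:
    """Metabolites that can only be produced, only be consumed, or occur in one reaction."""
    produced: Set[str] = set()
    consumed: Set[str] = set()
    occurrences: Counter = Counter()
    for stoich, reversible in reactions.values():
        for met, coef in stoich.items():
            occurrences[met] += 1
            # A reversible reaction can run in both directions.
            if reversible or coef > 0:
                produced.add(met)
            if reversible or coef < 0:
                consumed.add(met)
    return {m for m in occurrences
            if occurrences[m] == 1 or m not in produced or m not in consumed}

toy_network: Dict[str, Reaction] = {
    "EX_A": ({"A": 1.0}, False),            # uptake of A from the medium
    "R1": ({"A": -1.0, "B": 1.0}, False),
    "R2": ({"B": -1.0, "C": 1.0}, True),
    "R3": ({"B": -1.0, "D": 1.0}, False),   # nothing consumes D
    "EX_C": ({"C": -1.0}, False),           # secretion of C
}

print(dead_end_metabolites(toy_network))  # {'D'}
```

This is a single pass. A fuller analysis would iterate, because blocking the reactions that touch a dead end can expose further dead ends.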
Abstract:
A population register is an inventory of residents within a country, with their characteristics (date of birth, sex, marital status, etc.) and other socio-economic data, such as occupation or education. However, data on population are also stored in numerous other public registers such as tax, land, building and housing, military, foreigners, vehicles, etc. Altogether they contain vast amounts of personal and sensitive information. Access to public information is granted by law in many countries, but this transparency is generally subject to tensions with data protection laws. This paper proposes a framework to analyze data access (or protection) requirements, as well as a model of metadata for data exchange.
Abstract:
The broad aim of biomedical science in the postgenomic era is to link genomic and phenotype information to allow deeper understanding of the processes leading from genomic changes to altered phenotype and disease. The EuroPhenome project (http://www.EuroPhenome.org) is a comprehensive resource for raw and annotated high-throughput phenotyping data arising from projects such as EUMODIC. EUMODIC is gathering data from the EMPReSSslim pipeline (http://www.empress.har.mrc.ac.uk/) which is performed on inbred mouse strains and knock-out lines arising from the EUCOMM project. The EuroPhenome interface allows the user to access the data via the phenotype or genotype. It also allows the user to access the data in a variety of ways, including graphical display, statistical analysis and access to the raw data via web services. The raw phenotyping data captured in EuroPhenome is annotated by an annotation pipeline which automatically identifies statistically different mutants from the appropriate baseline and assigns ontology terms for that specific test. Mutant phenotypes can be quickly identified using two EuroPhenome tools: PhenoMap, a graphical representation of statistically relevant phenotypes, and mining for a mutant using ontology terms. To assist with data definition and cross-database comparisons, phenotype data is annotated using combinations of terms from biological ontologies.
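The abstract does not say which statistical test the EuroPhenome annotation pipeline applies. The sketch below illustrates the general idea of comparing a mutant line with its baseline and attaching a label when the difference is significant. It uses a two-sided permutation test on the difference in means. The choice of test, the `alpha` threshold, the measurements and the free-text labels (stand-ins for real ontology terms) are all assumptions.

```python
import random
from statistics import mean
from typing import Optional, Sequence

def permutation_p_value(mutant: Sequence[float], baseline: Sequence[float],
                        n_perm: int = 9999, seed: int = 0) -> float:
    """Two-sided permutation test on the absolute difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(mutant) - mean(baseline))
    pooled = list(mutant) + list(baseline)
    k = len(mutant)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of animals into two groups
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)

def annotate(parameter: str, mutant: Sequence[float], baseline: Sequence[float],
             alpha: float = 0.05) -> Optional[str]:
    """Return a placeholder phenotype label if the mutant differs from baseline."""
    if permutation_p_value(mutant, baseline) >= alpha:
        return None
    direction = "increased" if mean(mutant) > mean(baseline) else "decreased"
    return f"{direction} {parameter}"  # hypothetical label, not a real ontology ID

baseline = [20.1, 19.8, 20.5, 20.0, 19.9, 20.3, 20.2, 19.7]  # invented values
mutant = [24.0, 23.5, 24.2, 23.8, 24.1, 23.9]
print(annotate("body weight", mutant, baseline))
```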
Abstract:
BACKGROUND: Over the last decade, several European HIV observational databases have accumulated a substantial number of resistance test results and developed large sample repositories. There is a need to link these efforts together. Here we describe the development of a novel tool that binds these databases together in a distributed fashion, so that control over the data remains with the cohorts instead of passing to a classic data merger. METHODS: As a proof of concept, we entered two basic queries into the tool: available resistance tests and available samples. We asked for patients still alive after 1998-01-01 and between 180 and 195 cm in height, and how many samples or resistance tests would be available for these patients. The queries were uploaded with the tool to a central web server, from which each participating cohort downloaded them with the tool and ran them against its own database. The resulting counts were then submitted back to the server, where we accumulated the numbers of available samples and resistance tests. RESULTS: The cohorts reported the following numbers of available samples/resistance tests: EuResist: not available/11,194; EuroSIDA: 20,716/1,992; ICONA: 3,751/500; Rega: 302/302; SHCS: 53,783/1,485. In total, 78,552 samples and 15,473 resistance tests were available across these five cohorts.
Once these data items have been identified, it is trivial to generate lists of relevant samples that would be useful for ultra-deep sequencing in addition to the resistance tests already available. Soon the tool will include small analysis packages that allow each cohort to pull a report on its cohort profile and to survey emerging resistance trends in its own cohort. CONCLUSIONS: We plan to provide this tool to all cohorts within the Collaborative HIV and Anti-HIV Drug Resistance Network (CHAIN), and will provide it free of charge to others for any non-commercial use. The tool has the potential to ease collaboration in projects that need data to speed up the identification of novel resistance mutations, by increasing the number of observations across multiple cohorts instead of waiting for single cohorts or studies to reach the critical number needed to address such issues.
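The distributed workflow can be sketched as follows: the query travels to each cohort, and only counts travel back. The `Patient` record layout and the `run_local_query` function are hypothetical stand-ins for each cohort's own database, whose schema the abstract does not describe. The per-cohort figures are the ones reported in the results, and the aggregation step reproduces the published totals.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple

@dataclass
class Patient:
    """Hypothetical local record; the real cohort schemas are not described."""
    last_known_alive: date
    height_cm: float
    n_samples: int
    n_resistance_tests: int

def run_local_query(patients: List[Patient]) -> Tuple[int, int]:
    """Runs inside a cohort; only the two aggregate counts leave the site."""
    eligible = [p for p in patients
                if p.last_known_alive > date(1998, 1, 1) and 180 <= p.height_cm <= 195]
    return (sum(p.n_samples for p in eligible),
            sum(p.n_resistance_tests for p in eligible))

# Counts as reported in the abstract (samples, resistance tests); None = not available.
reported: Dict[str, Tuple[Optional[int], Optional[int]]] = {
    "EuResist": (None, 11_194),
    "EuroSIDA": (20_716, 1_992),
    "ICONA": (3_751, 500),
    "Rega": (302, 302),
    "SHCS": (53_783, 1_485),
}

def aggregate(results: Dict[str, Tuple[Optional[int], Optional[int]]]) -> Tuple[int, int]:
    """Central-server step: sum the counts each cohort submitted, skipping gaps."""
    samples = sum(s for s, _ in results.values() if s is not None)
    tests = sum(t for _, t in results.values() if t is not None)
    return samples, tests

print(aggregate(reported))  # (78552, 15473)
```

No patient-level record is ever transmitted in this design. That is the property that lets control of the data stay with each cohort.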
Abstract:
BACKGROUND: The Nuclear Factor I (NFI) family of DNA-binding proteins (also called CCAAT box transcription factors or CTF) is involved in both DNA replication and the regulation of gene expression. Using chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq), we performed a genome-wide mapping of NFI DNA-binding sites in primary mouse embryonic fibroblasts. RESULTS: We found that in vivo and in vitro NFI DNA-binding specificities are indistinguishable, as in vivo ChIP-Seq NFI binding sites matched predictions based on previously established position weight matrix models of its in vitro binding specificity. Combining ChIP-Seq with mRNA profiling data, we found that NFI preferentially associates with highly expressed genes that it up-regulates, while binding sites were under-represented at expressed but unregulated genes. Genomic binding also correlated with markers of transcribed genes such as the histone modifications H3K4me3 and H3K36me3, even outside annotated transcribed loci, implicating NFI in controlling the deposition of these modifications. Positional correlation between + and - strand ChIP-Seq tags revealed that, in contrast to other transcription factors, NFI associates with a nucleosome-length stretch of cleavage-resistant DNA, suggesting an interaction with positioned nucleosomes. In addition, NFI binding occurred prominently at boundaries displaying discontinuities in histone modifications specific to expressed and silent chromatin, such as loci subject to parental allele-specific imprinted expression. CONCLUSIONS: Our data thus suggest that the interaction of NFI with nucleosomes may contribute to the partitioning of distinct chromatin domains and to the epigenetic regulation of gene expression. NFI ChIP-Seq and input control DNA data were deposited in the Gene Expression Omnibus (GEO) repository under accession number GSE15844. Gene expression microarray data for mouse embryonic fibroblasts are available under GEO accession number GSE15871.
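Matching binding sites against position weight matrix (PWM) predictions can be illustrated with a minimal log-odds scan. The four aligned "sites" below are invented to resemble the TTGGC(N5)GCCAA palindrome commonly cited for NFI. They are not the matrix used in the study, and the pseudocount and uniform background are assumptions. Both strands are scored because a ChIP-Seq peak does not reveal the orientation of the site.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
PWM = List[Dict[str, float]]

def build_pwm(sites: List[str], pseudocount: float = 0.5,
              background: float = 0.25) -> PWM:
    """Log2-odds matrix from equal-length aligned sites against a uniform background."""
    total = len(sites) + 4 * pseudocount
    return [{b: math.log2((column.count(b) + pseudocount) / total / background)
             for b in BASES}
            for column in zip(*sites)]

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def best_hit(pwm: PWM, seq: str) -> Tuple[int, str, float]:
    """Highest-scoring window on either strand as (start, strand, score)."""
    width = len(pwm)
    best = (-1, "+", float("-inf"))
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        for strand, site in (("+", window), ("-", revcomp(window))):
            score = sum(col[base] for col, base in zip(pwm, site))
            if score > best[2]:
                best = (i, strand, score)
    return best

# Invented aligned sites shaped like the TTGGC(N5)GCCAA consensus.
sites = ["TTGGCAGCTAGCCAA", "TTGGCTTTGAGCCAA", "TTGGCGAACTGCCAA", "TGGGCATGCTGCCAA"]
pwm = build_pwm(sites)
sequence = "ACGTACGT" + "TTGGCAAAAAGCCAA" + "ACGTACGT"
print(best_hit(pwm, sequence))
```

In practice, a score threshold, or a comparison with scores on shuffled sequence, decides which windows count as predicted sites.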
Abstract:
The HUPO Proteomics Standards Initiative has developed several standardized data formats to facilitate data sharing in mass spectrometry (MS)-based proteomics. These allow researchers to report their complete results in a unified way. However, at present, there is no format to describe the final qualitative and quantitative results of proteomics and metabolomics experiments in a simple tabular format. Many downstream analyses are only concerned with the final results of an experiment and require an easily accessible format, compatible with tools such as Microsoft Excel or R. We developed the mzTab file format for MS-based proteomics and metabolomics results to meet this need. mzTab is intended as a lightweight supplement to the existing standard XML-based file formats (mzML, mzIdentML, mzQuantML), providing a comprehensive summary, similar in concept to the supplemental material of a scientific publication. mzTab files can contain protein, peptide, and small molecule identifications together with experimental metadata and basic quantitative information. The format is not intended to store the complete experimental evidence but provides mechanisms to report results at different levels of detail. These range from a simple summary of the final results to a representation of the results including the experimental design. This format is ideally suited to making MS-based proteomics and metabolomics results available to a wider biological community outside the field of MS. Several software tools for proteomics and metabolomics have already adopted the format as an output format. The comprehensive mzTab specification document and extensive additional documentation can be found online.
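mzTab is plain tab-separated text in which the first field of every line says what the line holds, so a small reader needs only the standard library. The sketch below follows our reading of the mzTab 1.0 line prefixes:

- MTD for metadata
- PRH/PRT, PEH/PEP, PSH/PSM and SMH/SML for the header and data rows of the protein, peptide, PSM and small-molecule sections
- COM for comments

The example content is invented and is far from a complete, valid mzTab file. Consult the specification for the required fields.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Section header prefix -> prefix of the data rows that follow it.
SECTION_ROWS = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}

def parse_mztab(text: str) -> Tuple[Dict[str, str], Dict[str, List[Dict[str, str]]]]:
    """Split mzTab text into metadata key/value pairs and per-section row dicts."""
    metadata: Dict[str, str] = {}
    columns: Dict[str, List[str]] = {}
    tables: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]
        elif prefix in SECTION_ROWS:
            # Remember the column names for the data rows of this section.
            columns[SECTION_ROWS[prefix]] = fields[1:]
        elif prefix in columns:
            tables[prefix].append(dict(zip(columns[prefix], fields[1:])))
        # COM lines, blank lines and rows without a preceding header are skipped.
    return metadata, dict(tables)

example = "\n".join([
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tmzTab-mode\tSummary",
    "COM\tinvented example, not a complete mzTab file",
    "PRH\taccession\tdescription\tbest_search_engine_score[1]",
    "PRT\tPROT_A\tHypothetical protein A\t0.99",
    "PRT\tPROT_B\tHypothetical protein B\t0.87",
])

meta, tables = parse_mztab(example)
print(meta["mzTab-version"], [row["accession"] for row in tables["PRT"]])
```

Each section becomes a list of row dictionaries. This maps directly onto a spreadsheet or an R data frame, which is the kind of downstream use the format targets.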
Abstract:
BACKGROUND: Over the past fifteen years, DNA sequence integrity, mRNA concentrations and protein-DNA interactions have been subjected to genome-wide analyses based on microarrays, with ever-increasing efficiency and reliability. Very recently, however, novel technologies for Ultra High-Throughput DNA Sequencing (UHTS) have been harnessed to study these phenomena with unprecedented precision. As a consequence, the extensive bioinformatics environment available for array data management, analysis, interpretation and publication must be extended to include these novel sequencing data types. DESCRIPTION: MIMAS was originally conceived as a simple, convenient, local Microarray Information Management and Annotation System focused on GeneChips for expression profiling studies. MIMAS 3.0 enables users to manage data from high-density oligonucleotide SNP Chips, expression arrays (both 3'UTR and tiling), promoter arrays, BeadArrays and UHTS data using a MIAME-compliant standardized vocabulary. Importantly, researchers can export data in MAGE-TAB format and upload them to the EBI's ArrayExpress certified data repository in a single step. CONCLUSION: We have vastly extended the capabilities of the system so that it processes the data output of six types of GeneChips (Affymetrix), two different BeadArrays for mRNA and miRNA (Illumina) and the Genome Analyzer (a popular Ultra High-Throughput DNA Sequencer, Illumina), without compromising its flexibility and user-friendliness. MIMAS, appropriately renamed the Multiomics Information Management and Annotation System, is currently used by scientists in approximately 50 academic laboratories and genomics platforms in Switzerland and France. MIMAS 3.0 is freely available via http://multiomics.sourceforge.net/.
Abstract:
BACKGROUND: Genital herpes is one of the most prevalent sexually transmitted diseases and accounts for substantial morbidity. Genital herpes puts newborns at risk of very severe disease and also increases the risk of horizontal HIV transmission. It is thus an important public health problem. The recent availability of type-specific gG-based assays detecting IgG against HSV-1 and HSV-2 makes it possible to establish the prevalence of each subtype. Worldwide, few data have been published on the seroprevalence of HSV-2, the major causative agent of genital herpes, in general populations, and no data exist for the Swiss population. METHODS: To evaluate the prevalence of IgG antibodies against HSV-1 and HSV-2 in Switzerland, we used a population-based serum repository from a health examination survey conducted in western and southern Switzerland in 1992-93. A total of 3,120 sera were analysed by type-specific gG-based ELISA, and seroprevalence was correlated with the available characteristics of the volunteers by logistic regression. RESULTS: Overall, seroprevalence rates in adults aged 35-64 years were 80.0 ± 0.9% (SE; 95% CI: 78.1-81.8) for HSV-1 and 19.3 ± 0.9% (SE; 95% CI: 17.6-21.1) for HSV-2. HSV-1 and HSV-2 seroprevalence increased with age, with a peak HSV-2 seroprevalence in elderly men, possibly seroarchaeological evidence of sexually transmitted disease epidemics during World War II. Risk factors for HSV-2 infection included female sex, marital status other than married, and residence in a town of more than 1,500 inhabitants. Unexpectedly, and in contrast to HSV-1, HSV-2 seroprevalence increased with educational level. HSV-2 infection was less prevalent among HSV-1-infected individuals than among HSV-1-uninfected individuals. This effect was most apparent among women at high risk of HSV-2 infection. CONCLUSIONS: Our data demonstrate that by the early nineties, HSV-2 had spread widely in the Swiss population.
However, the epidemiology of HSV-2 in Switzerland presents paradoxical characteristics, e.g. positive correlation with education level, that have not been observed elsewhere.
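The reported intervals are consistent with the usual normal-approximation relation: 95% CI ≈ estimate ± 1.96 × SE. The sketch below applies this relation to the published estimates. It checks the arithmetic only and does not reconstruct the survey's analysis. The computed bounds differ from the published ones by 0.1 percentage point, which may reflect rounding of the published estimates or a different interval method.

```python
from typing import Tuple

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution

def wald_ci(estimate_pct: float, se_pct: float, z: float = Z_95) -> Tuple[float, float]:
    """Normal-approximation interval estimate ± z·SE, rounded to 0.1 percentage point."""
    half_width = z * se_pct
    return round(estimate_pct - half_width, 1), round(estimate_pct + half_width, 1)

print(wald_ci(80.0, 0.9))  # HSV-1: (78.2, 81.8); published 78.1-81.8
print(wald_ci(19.3, 0.9))  # HSV-2: (17.5, 21.1); published 17.6-21.1
```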
Abstract:
Next-generation sequencing offers an unprecedented opportunity to jointly analyze cellular and viral transcriptional activity without prerequisite knowledge of the nature of the transcripts. SupT1 cells were infected with a vesicular stomatitis virus G envelope protein (VSV-G)-pseudotyped HIV vector. At 24 h postinfection, both cellular and viral transcriptomes were analyzed by serial analysis of gene expression followed by high-throughput sequencing (SAGE-Seq). Read mapping resulted in 33 to 44 million tags aligning with the human transcriptome and 0.23 to 0.25 million tags aligning with the genome of the HIV-1 vector. Thus, at peak infection, 1 transcript in 143 is of viral origin (0.7%), including a small component of antisense viral transcription. Of the detected cellular transcripts, 826 (2.3%) were differentially expressed between mock- and HIV-infected samples. The approach also assessed whether HIV-1 infection modulates the expression of repetitive elements or endogenous retroviruses. We observed very active transcription of these elements, with 1 transcript in 237 being of such origin, corresponding on average to 123,123 reads in mock-infected samples (0.40%) and 129,149 reads in HIV-1-infected samples (0.45%) mapping to the genomic Repbase repository. This analysis highlights key details in the generation and interpretation of high-throughput data in the setting of HIV-1 cellular infection.
Abstract:
When searching for information, learners are very often confronted with problems of guidance and personalization. These problems are all the more significant when the search takes place in an open environment such as the Web, where there is currently no control over the relevance of the resources offered, nor over how well they actually match the learner's specific needs. Through a study of the state of the art, we found that there is no reference model addressing the issues related (i) on the one hand to learning resources, in particular the heterogeneity of their structure and description and their copyright protection, and (ii) on the other hand to the learner as a user, in particular the acquisition of the elements that characterize the learner and the adaptation strategy to offer. Our objective is to propose an adaptive system based on learning resources drawn from an environment of controlled openness. The system automatically generates a personalized learning path, without the intervention of an educational expert, from resources made available through trusted sources. The originality of our work lies in the proposal of a reference model, called the Lausanne model, based on what we consider to be the best practices of three communities: (i) the Web, in terms of means of openness; (ii) adaptive hypermedia, in terms of adaptation strategy; and (iii) distance learning, in terms of handling learning resources. In our model, personalized paths are generated on the basis of (i) indexed learning resources whose degree of granularity favours sharing and reuse, with the trusted sources guaranteeing their usefulness and quality; (ii) user characteristics that are compatible with existing standards, allowing learners to move from one environment to another; and (iii) adaptation that is both individual and social. To this end, the Lausanne model proposes (i) using ISO/MLR (Metadata for Learning Resources) as the description formalism; (ii) describing the user model with XUM (eXtended User Model), our proposal for a model compatible with the IEEE/PAPI and IMS/LIP standards; and (iii) adapting the ant colony algorithm to the distance-learning context in order to generate personalized paths. The individual dimension is also taken into account by matching MLR with XUM. To validate our model, we developed an application and tested several scenarios involving different users at different times. We then compared the system's output with the expert's suggestions. The results proved satisfactory, in that each time the system returned a path similar to the one the expert would have proposed, which confirms our approach.
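The abstract does not describe how the ant algorithm was adapted. The sketch below is a generic ant colony optimisation, offered as an illustration only, over a hypothetical graph of learning resources. Each edge weight is an assumed suitability of the next resource for a given learner. At each step, ants choose the next resource in proportion to pheromone^alpha × suitability^beta. Pheromone evaporates every iteration, and the edges of every path that reaches the goal are reinforced by that path's mean suitability. The graph, the scores and all parameter values are invented.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical graph: resource -> {next resource: suitability for this learner, in (0, 1]}
Graph = Dict[str, Dict[str, float]]

def ant_colony_path(graph: Graph, start: str, goal: str, n_ants: int = 20,
                    n_iterations: int = 50, evaporation: float = 0.3,
                    alpha: float = 1.0, beta: float = 2.0, seed: int = 0) -> List[str]:
    """Best start-to-goal path found by a basic ant colony optimisation."""
    if start == goal:
        return [start]
    rng = random.Random(seed)
    pheromone = {(u, v): 1.0 for u, nbrs in graph.items() for v in nbrs}
    best_path: List[str] = []
    best_quality = float("-inf")
    for _ in range(n_iterations):
        walks: List[Tuple[List[str], float]] = []
        for _ in range(n_ants):
            path, node = [start], start
            while node != goal:
                choices = [v for v in graph.get(node, {}) if v not in path]
                if not choices:
                    break  # stuck: this ant does not reach the goal
                weights = [pheromone[(node, v)] ** alpha * graph[node][v] ** beta
                           for v in choices]
                node = rng.choices(choices, weights=weights)[0]
                path.append(node)
            if node == goal:
                steps = list(zip(path, path[1:]))
                quality = sum(graph[u][v] for u, v in steps) / len(steps)
                walks.append((path, quality))
        for edge in pheromone:  # evaporation on every edge
            pheromone[edge] *= 1.0 - evaporation
        for path, quality in walks:  # reinforcement by ants that reached the goal
            for edge in zip(path, path[1:]):
                pheromone[edge] += quality
            if quality > best_quality:
                best_path, best_quality = path, quality
    return best_path

resources: Graph = {
    "intro": {"basics_video": 0.9, "basics_text": 0.4},
    "basics_video": {"exercise": 0.8},
    "basics_text": {"exercise": 0.5},
    "exercise": {"assessment": 0.9},
}
print(ant_colony_path(resources, "intro", "assessment"))
```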
Abstract:
SUMMARY: We present a tool designed for visualizing large-scale genetic and genomic data, exemplified by results from genome-wide association studies. This software provides an integrated framework to facilitate the interpretation of SNP association studies in their genomic context. Gene annotations can be retrieved from Ensembl, linkage disequilibrium data downloaded from HapMap, and custom data imported in BED or WIG format. AssociationViewer integrates functions that enable the aggregation or intersection of data tracks. It implements an efficient cache system and can display several very large genomic datasets. AVAILABILITY: The Java code for AssociationViewer is distributed under the GNU General Public Licence and has been tested on Microsoft Windows XP, Mac OS X and GNU/Linux operating systems. It is available from the SourceForge repository, which also includes Java Web Start, documentation and example data files.
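Aggregating and intersecting tracks reduce to merging and intersecting sorted genomic intervals. The sketch below is not AssociationViewer's Java code. It is a minimal Python illustration over BED-style intervals (0-based start, exclusive end). The intersection is a two-pointer sweep, which assumes that the intervals within each track do not overlap one another. `merge_track` establishes that assumption.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end): 0-based start, exclusive end

def merge_track(track: List[Interval]) -> List[Interval]:
    """Aggregate a track by merging its overlapping or book-ended intervals."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(track):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged

def intersect_tracks(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Overlapping segments of two tracks whose own intervals do not overlap."""
    a, b = sorted(a), sorted(b)
    result: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:  # advance whichever chromosome sorts first
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        start, end = max(start_a, start_b), min(end_a, end_b)
        if start < end:
            result.append((chrom_a, start, end))
        if end_a < end_b:  # drop the interval that finishes first
            i += 1
        else:
            j += 1
    return result

genes = merge_track([("chr1", 100, 200), ("chr1", 150, 250), ("chr2", 0, 50)])
hits = [("chr1", 220, 400), ("chr2", 40, 60)]
print(intersect_tracks(genes, hits))
```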
Abstract:
Understanding how communities of living organisms assemble has been a central question in ecology since the early days of the discipline. Disentangling the different processes involved in community assembly is not only interesting in itself but also crucial for an understanding of how communities will behave under future environmental scenarios. The traditional concept of assembly rules reflects the notion that species do not co-occur randomly but are restricted in their co-occurrence by interspecific competition. This concept can be redefined in a more general framework where the co-occurrence of species is a product of chance, historical patterns of speciation and migration, dispersal, abiotic environmental factors, and biotic interactions, with none of these processes being mutually exclusive. Here we present a survey and meta-analyses of 59 papers that compare observed patterns in plant communities with null models simulating random patterns of species assembly. According to the type of data under study and the different methods that are applied to detect community assembly, we distinguish four main types of approach in the published literature: species co-occurrence, niche limitation, guild proportionality and limiting similarity. Results from our meta-analyses suggest that non-random co-occurrence of plant species is not a widespread phenomenon. However, whether this finding reflects the individualistic nature of plant communities or is caused by methodological shortcomings associated with the studies considered cannot be discerned from the available metadata. We advocate that more thorough surveys be conducted using a set of standardized methods to test for the existence of assembly rules in data sets spanning larger biological and geographical scales than have been considered until now. We underpin this general advice with guidelines that should be considered in future assembly rules research. 
This will enable us to draw more accurate and general conclusions about the non-random aspect of assembly in plant communities.
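The meta-analysed studies used several indices and null models. The sketch below illustrates one of them, the species co-occurrence approach. It computes the C-score of Stone and Roberts, the mean number of "checkerboard units" (r_i - S)(r_j - S) per species pair. Here r_i and r_j are the numbers of plots occupied by species i and j, and S is the number of plots they share. The observed score is then compared with matrices randomised by swaps that preserve both species frequencies and plot richness. The swap-based null model, the number of swaps, the p-value convention and the data are all assumptions.

```python
import itertools
import random
from typing import List, Tuple

Matrix = List[List[int]]  # rows = species, columns = plots, 1 = present

def c_score(m: Matrix) -> float:
    """Mean number of checkerboard units, (r_i - S)(r_j - S), over species pairs."""
    pairs = list(itertools.combinations(range(len(m)), 2))
    total = 0
    for i, j in pairs:
        shared = sum(a & b for a, b in zip(m[i], m[j]))
        total += (sum(m[i]) - shared) * (sum(m[j]) - shared)
    return total / len(pairs)

def swap_randomise(m: Matrix, n_swaps: int, rng: random.Random) -> Matrix:
    """Null matrix from 2x2 checkerboard swaps; row and column totals stay fixed."""
    m = [row[:] for row in m]
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        r1, r2 = rng.sample(range(len(m)), 2)
        c1, c2 = rng.sample(range(len(m[0])), 2)
        if m[r1][c1] == m[r2][c2] == 1 and m[r1][c2] == m[r2][c1] == 0:
            m[r1][c1] = m[r2][c2] = 0
            m[r1][c2] = m[r2][c1] = 1
            done += 1
    return m

def co_occurrence_test(m: Matrix, n_null: int = 999, n_swaps: int = 1000,
                       seed: int = 1) -> Tuple[float, float]:
    """Observed C-score and one-tailed p-value for more segregation than random."""
    rng = random.Random(seed)
    observed = c_score(m)
    null = [c_score(swap_randomise(m, n_swaps, rng)) for _ in range(n_null)]
    p_value = (sum(s >= observed for s in null) + 1) / (n_null + 1)
    return observed, p_value

plots = [  # invented presence/absence data: 3 species x 4 plots
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
print(c_score(plots))  # 2.0
print(co_occurrence_test(plots, n_null=99, n_swaps=100))
```

Fixing both row and column totals is one of several null-model choices. Those choices are among the methodological differences the meta-analysis points to.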