926 results for Genomic data integration
Abstract:
Traceability is a concept that arose from the need to monitor production processes; it is usually applied in sectors related to food production or in activities involving some kind of direct risk to people. Agribusiness in the cotton industry lacks a comprehensive infrastructure covering all stages of the processes involved in production. Mapping and defining the data that enable product traceability amounts to assigning responsibilities to everyone involved in production. Aggregate data on cotton production are collected in specific, pre-defined stages, from the choice of variety through processing; this article specifically addresses the production of lint cotton. The paper presents a proposal based on service-oriented architecture (SOA) for data integration processes in the cotton industry; the proposal supports the implementation of platform-independent solutions.
Abstract:
Background: Ontologies have increasingly been used in the biomedical domain, which has prompted the emergence of different initiatives to facilitate their development and integration. The Open Biological and Biomedical Ontologies (OBO) Foundry consortium provides a repository of life-science ontologies, which are developed according to a set of shared principles. This consortium has developed an ontology called OBO Relation Ontology aiming at standardizing the different types of biological entity classes and associated relationships. Since ontologies are primarily intended to be used by humans, the use of graphical notations for ontology development facilitates the capture, comprehension and communication of knowledge between its users. However, OBO Foundry ontologies are captured and represented basically using text-based notations. The Unified Modeling Language (UML) provides a standard and widely-used graphical notation for modeling computer systems. UML provides a well-defined set of modeling elements, which can be extended using a built-in extension mechanism named Profile. Thus, this work aims at developing a UML profile for the OBO Relation Ontology to provide a domain-specific set of modeling elements that can be used to create standard UML-based ontologies in the biomedical domain. Results: We have studied the OBO Relation Ontology, the UML metamodel and the UML profiling mechanism. Based on these studies, we have proposed an extension to the UML metamodel in conformance with the OBO Relation Ontology and we have defined a profile that implements the extended metamodel. Finally, we have applied the proposed UML profile in the development of a number of fragments from different ontologies. Particularly, we have considered the Gene Ontology (GO), the PRotein Ontology (PRO) and the Xenopus Anatomy and Development Ontology (XAO). 
Conclusions: The use of an established and well-known graphical language in the development of biomedical ontologies provides a more intuitive form of capturing and representing knowledge than using only text-based notations. The use of the profile requires the domain expert to reason about the underlying semantics of the concepts and relationships being modeled, which helps prevent the introduction of inconsistencies in an ontology under development and facilitates the identification and correction of errors in an already defined ontology.
Abstract:
The University of São Paulo (USP) has seen growth in content in electronic and digital formats, distributed by different suppliers and hosted remotely or in the cloud. It faces increasing difficulty in facilitating user access to this digital collection while still maintaining traditional physical collections. A possible solution was identified in the new generation of systems called Web Scale Discovery, which allow better management, data integration and search agility. To determine whether and how such a system would meet USP's demands and expectations and, if so, what criteria should be used to analyze such a tool, an analytical study with an essentially documentary basis was structured, drawing on a literature review and on data available on official websites and on the websites of libraries using this kind of resource. The conceptual basis of the study was defined after identifying software assessment methods already available, yielding a standard with 40 analysis criteria, including details on the single access interface to information content, web 2.0 characteristics, an intuitive interface and faceted navigation, among others. The details of the studies of four of the major systems currently available in this software category are presented, providing input for decision-making by other libraries interested in such systems.
Abstract:
Genome-wide association studies have failed to establish common variant risk for the majority of common human diseases. The underlying reasons for this failure are explained by recent studies of resequencing and comparison of over 1200 human genomes and 10 000 exomes, together with the delineation of DNA methylation patterns (epigenome) and full characterization of coding and noncoding RNAs (transcriptome) being transcribed. These studies have provided the most comprehensive catalogues of functional elements and genetic variants that are now available for global integrative analysis and experimental validation in prospective cohort studies. With these datasets, researchers will have unparalleled opportunities for the alignment, mining, and testing of hypotheses for the roles of specific genetic variants, including copy number variations, single nucleotide polymorphisms, and indels as the cause of specific phenotypes and diseases. Through the use of next-generation sequencing technologies for genotyping and standardized ontological annotation to systematically analyze the effects of genomic variation on human and model organism phenotypes, we will be able to find candidate genes and new clues for disease etiology and treatment. This article describes essential concepts in genetics and genomic technologies as well as the emerging computational framework to comprehensively search websites and platforms available for the analysis and interpretation of genomic data.
Abstract:
In the post-genomic era, with the massive production of biological data, understanding the factors that affect protein stability is one of the most important and challenging tasks for highlighting the role of mutations in human disease. The problem lies at the basis of what is referred to as molecular medicine, whose underlying idea is that pathologies can be described at the molecular level. To this end, scientific efforts focus on characterising mutations that hamper protein function and thereby affect the biological processes underlying cell physiology. New techniques have been developed to catalogue single nucleotide polymorphisms (SNPs) across all human chromosomes, and as a result the information in dedicated databases is increasing exponentially. Mutations found at the DNA level, when they occur in transcribed regions, may lead to mutated proteins, and this can be a serious medical problem that largely affects the phenotype. Bioinformatics tools are urgently needed to cope with the flood of genomic data stored in databases and to analyse the role of SNPs at the protein level. In principle, several experimental and theoretical observations suggest that protein stability in the solvent-protein space is responsible for correct protein function. Mutations found to be disease related during DNA analysis are therefore often assumed to perturb protein stability as well. However, so far no extensive analysis at the proteome level has investigated whether this is the case. Computational methods have also been developed to infer whether a mutation is disease related and, independently, whether it affects protein stability. Whether the perturbation of protein stability is related to what is routinely referred to as a disease therefore remains an open question.
In this work we explore, for the first time, the relation between mutations at the protein level and their relevance to disease through a large-scale computational study of data from different databases. To this aim, in the first part of the thesis we derive two probabilistic indices for each mutation type (for 141 out of 150 possible SNPs): the perturbing index (Pp), which indicates the probability that a given mutation affects protein stability, considering all the available in vitro thermodynamic data, and the disease index (Pd), which indicates the probability that a mutation is disease related, given all the mutations that have been clinically associated so far. We find, with robust statistics, that the two indices correlate, with the exception of the mutations that are related to somatic cancer. In this way each of the 150 mutations can be coded by two values that allow a direct comparison with database information. Furthermore, we implement a computational method that predicts the effect of a mutation on protein stability starting from the protein structure, and find that it outperforms a set of other predictors performing the same task. The predictor is based on support vector machines and takes protein tertiary structures as input. We show that the predicted data correlate well with the data from the databases. Our efforts therefore add to the SNP annotation process and, more importantly, establish the relationship between protein stability perturbation and the human variome leading to the diseasome.
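The abstract does not say how Pp and Pd are computed. As a rough illustration only, one simple way to obtain a per-mutation-type probability is as an empirical frequency: the fraction of observations of a given substitution that carry a positive label (stability-perturbing for a Pp-like index, disease-associated for a Pd-like index). The function name `mutation_type_index`, the record layout and the frequency estimator below are assumptions made for this sketch, not the thesis's actual procedure.

```python
from collections import defaultdict


def mutation_type_index(records):
    """Estimate a per-mutation-type probability as an empirical frequency.

    `records` is an iterable of (wild_type, mutant, flagged) tuples, where
    `flagged` is True when the observation counts as positive (for example,
    stability-perturbing or disease-associated). This layout is hypothetical.
    Returns a dict mapping (wild_type, mutant) to the fraction flagged.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [positives, total]
    for wild_type, mutant, flagged in records:
        entry = counts[(wild_type, mutant)]
        if flagged:
            entry[0] += 1
        entry[1] += 1
    return {key: pos / total for key, (pos, total) in counts.items()}


# Illustrative toy records, not real data.
toy = [("A", "V", True), ("A", "V", False), ("G", "R", True)]
index = mutation_type_index(toy)
```

Applying the same function once to thermodynamic observations and once to clinical annotations would give two values per mutation type that can then be compared, which is the kind of pairing the abstract describes.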
Abstract:
Background: Rates of molecular evolution vary widely among species. While significant deviations from the molecular clock have been found in many taxa, the effects of life histories on molecular evolution are not fully understood. In plants, annual/perennial life history traits have long been suspected to influence evolutionary rates at the molecular level. To date, however, the number of genes investigated on this subject is limited and the conclusions are mixed. To evaluate the possible heterogeneity in evolutionary rates between annual and perennial plants at the genomic level, we investigated 85 nuclear housekeeping genes, 10 non-housekeeping families, and 34 chloroplast genes using genomic data from model plants, including Arabidopsis thaliana and Medicago truncatula for annuals and grape (Vitis vinifera) and poplar (Populus trichocarpa) for perennials. Results: According to the cross-comparisons among the four species, 74-82% of the nuclear genes and 71-97% of the chloroplast genes suggested higher rates of molecular evolution in the two annuals than in the two perennials. The significant heterogeneity in evolutionary rate between annuals and perennials was consistently found at both nonsynonymous and synonymous sites. While a linear correlation of evolutionary rates in orthologous genes between species was observed at nonsynonymous sites, the correlation was weak or absent at synonymous sites. This tendency was clearer in nuclear genes than in chloroplast genes, in which the overall evolutionary rate was small. The slope of the regression line was consistently lower than unity, further confirming the higher evolutionary rate in annuals at the genomic level. Conclusions: The higher evolutionary rate in annuals than in perennials appears to be a universal phenomenon in both nuclear and chloroplast genomes in the four dicot model plants we investigated.
Therefore, such heterogeneity in evolutionary rate should result from factors that have genome-wide influence, most likely those associated with annual/perennial life history. Although we acknowledge the current limitations of this kind of study, mainly due to the small sample size available and the distant taxonomic relationship of the model organisms, our results indicate that the genome-wide survey is a promising approach toward further understanding of the mechanism determining the molecular evolutionary rate at the genomic level.
Abstract:
Simulation-based assessment is a popular and frequently necessary approach to evaluation of statistical procedures. Sometimes overlooked is the ability to take advantage of underlying mathematical relations and we focus on this aspect. We show how to take advantage of large-sample theory when conducting a simulation using the analysis of genomic data as a motivating example. The approach uses convergence results to provide an approximation to smaller-sample results, results that are available only by simulation. We consider evaluating and comparing a variety of ranking-based methods for identifying the most highly associated SNPs in a genome-wide association study, derive integral equation representations of the pre-posterior distribution of percentiles produced by three ranking methods, and provide examples comparing performance. These results are of interest in their own right and set the framework for a more extensive set of comparisons.
Abstract:
Gap junctions are clustered channels between contacting cells through which direct intercellular communication via diffusion of ions and metabolites can occur. Two hemichannels, each built up of six connexin protein subunits in the plasma membrane of adjacent cells, can dock to each other to form conduits between cells. We have recently screened mouse and human genomic databases and have found 19 connexin (Cx) genes in the mouse genome and 20 connexin genes in the human genome. One mouse connexin gene and two human connexin genes do not appear to have orthologs in the other genome. With three exceptions, the characterized connexin genes comprise two exons, with the complete reading frame located on the second exon. Targeted ablation of eleven mouse connexin genes revealed basic insights into the functional diversity of the connexin gene family. In addition, the phenotypes of human genetic disorders caused by mutated connexin genes further complement our understanding of connexin functions in the human organism. In this review we compare currently identified connexin genes in both the mouse and human genome and discuss the functions of gap junctions deduced from targeted mouse mutants and human genetic disorders.
Abstract:
Salmonella enterica subspecies I serovars are common bacterial pathogens causing diseases ranging from enterocolitis to systemic infections. Some serovars are adapted to specific hosts, whereas others have a broad host range. The molecular mechanisms defining the virulence characteristics and the host range of a given S. enterica serovar are unknown. Streptomycin-pretreated mice provide a surrogate host model for studying molecular aspects of the intestinal inflammation (colitis) caused by serovar Typhimurium (S. Hapfelmeier and W. D. Hardt, Trends Microbiol. 13:497-503, 2005). Here, we studied whether this animal model is also useful for studying other S. enterica subspecies I serovars. All three tested strains of the broad-host-range serovar Enteritidis (125109, 5496/98, and 832/99) caused pronounced colitis and systemic infection in streptomycin-pretreated mice. Different levels of virulence were observed among three tested strains of the host-adapted serovar Dublin (SARB13, SD2229, and SD3246). Several strains of host-restricted serovars were also studied. Two serovar Pullorum strains (X3543 and 449/87) caused intermediate levels of colitis. No intestinal inflammation was observed upon infection with three different serovar Paratyphi A strains (SARB42, 2804/96, and 5314/98) and one serovar Gallinarum strain (X3796). A second serovar Gallinarum strain (287/91) was highly virulent and caused severe colitis. This strain awaits future analysis. In conclusion, the streptomycin-pretreated mouse model can provide an additional tool to study virulence factors (i.e., those involved in enteropathogenesis) of various S. enterica subspecies I serovars. Five of these strains (125109, 2229, 287/91, 449/87, and SARB42) are the subject of Salmonella genome sequencing projects. The streptomycin-pretreated mouse model may be useful for testing hypotheses derived from these genomic data.
Abstract:
Objective: Blood-borne biomarkers reflecting atherosclerotic plaque burden have great potential to improve clinical management of atherosclerotic coronary artery disease and acute coronary syndrome (ACS). Approach and Results: Using data integration from gene expression profiling of coronary thrombi versus peripheral blood mononuclear cells and proteomic analysis of atherosclerotic plaque-derived secretomes versus healthy tissue secretomes, we identified fatty acid-binding protein 4 (FABP4) as a biomarker candidate for coronary artery disease. Its diagnostic and prognostic performance was validated in 3 different clinical settings: (1) in a cross-sectional cohort of patients with stable coronary artery disease, ACS, and healthy individuals (n=820), (2) in a nested case-control cohort of patients with ACS with 30-day follow-up (n=200), and (3) in a population-based nested case-control cohort of asymptomatic individuals with 5-year follow-up (n=414). Circulating FABP4 was marginally higher in patients with ST-segment-elevation myocardial infarction (24.9 ng/mL) compared with controls (23.4 ng/mL; P=0.01). However, elevated FABP4 was associated with adverse secondary cerebrovascular or cardiovascular events during 30-day follow-up after index ACS, independent of age, sex, renal function, and body mass index (odds ratio, 1.7; 95% confidence interval, 1.1-2.5; P=0.02). Circulating FABP4 predicted adverse events with similar prognostic performance as the GRACE in-hospital risk score or N-terminal pro-brain natriuretic peptide. Finally, no significant difference between baseline FABP4 was found in asymptomatic individuals with or without coronary events during 5-year follow-up. Conclusions: Circulating FABP4 may prove useful as a prognostic biomarker in risk stratification of patients with ACS.
Abstract:
Phylogenetic reconstruction of the evolutionary history of closely related organisms may be difficult because of the presence of unsorted lineages and of a relatively high proportion of heterozygous sites that are usually not handled well by phylogenetic programs. Genomic data may provide enough fixed polymorphisms to resolve phylogenetic trees, but the diploid nature of sequence data remains analytically challenging. Here, we performed a phylogenomic reconstruction of the evolutionary history of the common vole (Microtus arvalis) with a focus on the influence of heterozygosity on the estimation of intraspecific divergence times. We used genome-wide sequence information from 15 voles distributed across the European range. We provide a novel approach to integrate heterozygous information in existing phylogenetic programs by repeated random haplotype sampling from sequences with multiple unphased heterozygous sites. We evaluated the impact of the use of full, partial, or no heterozygous information for tree reconstructions on divergence time estimates. All results consistently showed four deep and strongly supported evolutionary lineages in the vole data. These lineages undergoing divergence processes split only at the end or after the last glacial maximum based on calibration with radiocarbon-dated paleontological material. However, the incorporation of information from heterozygous sites had a significant impact on absolute and relative branch length estimations. Ignoring heterozygous information led to an overestimation of divergence times between the evolutionary lineages of M. arvalis. We conclude that the exclusion of heterozygous sites from evolutionary analyses may cause biased and misleading divergence time estimates in closely related taxa.
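The repeated random haplotype sampling described above can be pictured with a minimal sketch. It assumes that each individual's sequence is a diploid consensus in which heterozygous sites are written as standard IUPAC two-base ambiguity codes, and that alleles at different sites are drawn independently because the phase is unknown. The function names and the input encoding are assumptions for illustration; the abstract does not describe the authors' actual implementation.

```python
import random

# Standard IUPAC codes for two-base ambiguities (heterozygous sites).
IUPAC_HET = {
    "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC",
}


def sample_haplotype(seq, rng):
    """Draw one haplotype: pick a random allele at each heterozygous site.

    Homozygous bases (A, C, G, T) and any other symbols are kept unchanged.
    """
    return "".join(
        rng.choice(IUPAC_HET[base]) if base in IUPAC_HET else base
        for base in seq
    )


def sample_haplotypes(seq, n, seed=0):
    """Repeat the random draw n times with a seeded generator."""
    rng = random.Random(seed)
    return [sample_haplotype(seq, rng) for _ in range(n)]


# Hypothetical unphased sequence with two heterozygous sites (R and Y).
replicates = sample_haplotypes("ACRTYG", n=10, seed=42)
```

Each replicate could then be passed to an existing phylogenetic program, and the spread of divergence time estimates across replicates would show how sensitive the result is to the unknown phase.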
Abstract:
The interaction between sibling species that share a zone of contact is a multifaceted relationship affected by climate change [1, 2]. Between sibling species, interactions may occur at whole-organism (direct or indirect competition) or genomic (hybridization and introgression) levels [3–5]. Tracking hybrid zone movements can provide insights about influences of environmental change on species interactions [1]. Here, we explore the extent and mechanism of movement of the contact zone between black-capped chickadees (Poecile atricapillus) and Carolina chickadees (Poecile carolinensis) at whole-organism and genomic levels. We find strong evidence that winter temperatures limit the northern extent of P. carolinensis by demonstrating a current-day association between the range limit of this species and minimum winter temperatures. We further show that this temperature limitation has been consistent over time because we are able to accurately hindcast the previous northern range limit under earlier climate conditions. Using genomic data, we confirm northward movement of this contact zone over the past decade and highlight temporally consistent differential, but limited, geographic introgression of alleles. Our results provide an informative example of the influence of climate change on a contact zone between sibling species.
Abstract:
Based on bacterial genomic data, we developed a one-step multiplex PCR assay to identify Salmonella and simultaneously differentiate the two invasive avian-adapted S. enterica serovar Gallinarum biotypes Gallinarum and Pullorum, and the most frequent, specific, and asymptomatic colonizers of chickens, serovars Enteritidis, Heidelberg, and Kentucky.