22 results for Genomic data integration

in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo


Relevance:

100.00%

Publisher:

Abstract:

Background: A current challenge in gene annotation is to define gene function in the context of a network of relationships rather than gene by gene. The inference of gene networks (GNs) has emerged as an approach to better understand the biology of the system and to study how the components of the network interact with each other and keep their functions stable. However, there are generally not enough data to accurately recover GNs from expression levels, which leads to the curse of dimensionality: the number of variables exceeds the number of samples. One way to mitigate this problem is to integrate biological data into the inference process instead of using only the expression profiles. The use of diverse biological information in inference methods has increased significantly in recent years, in order to better recover the connections between genes and reduce false positives. What makes this strategy so interesting is the possibility of confirming known connections through the included biological data, and of discovering new relationships between genes when the expression data are observed. Although several works in data integration have increased the performance of network inference methods, the real contribution of each type of biological information to the obtained improvement is not clear. Methods: We propose a methodology to include biological information in an inference algorithm in order to assess the prediction gain obtained by using biological information and expression profiles together. We also evaluated and compared the gain from adding four types of biological information: (a) protein-protein interaction, (b) Rosetta stone fusion proteins, (c) KEGG and (d) KEGG+GO. Results and conclusions: This work presents a first comparison of the gain from using prior biological information in the inference of GNs, considering a eukaryotic organism (P. falciparum). 
Our results indicate that information based on direct interaction can produce a larger gain than data about less specific relationships such as GO or KEGG. Also, as expected, the results show that the use of biological information is a very important approach for improving the inference. We also compared the gain in the inference of the global network and of the hubs only. The results indicate that the use of biological information can improve the identification of the most connected proteins.

Relevance:

100.00%

Publisher:

Abstract:

Background: The main goal of translational medicine is to use the knowledge produced by the sciences to promote human health. To make this feasible, we need computational methods that handle the large amount of information arising from bench to bedside and deal with its heterogeneity. One computational challenge is to integrate clinical, socio-demographic and biological data. In this effort, ontologies play an essential role as a powerful artifact for knowledge representation. Chado is a modular ontology-oriented database model that gained popularity due to its robustness and flexibility as a generic platform to store biological data; however, it lacks support for representing clinical and socio-demographic information. Results: We have implemented an extension of Chado, the Clinical Module, to allow the representation of this kind of information. Our approach consists of a framework for data integration through the use of a common reference ontology. The design of this framework has four levels: data level, to store the data; semantic level, to integrate and standardize the data by the use of ontologies; application level, to manage clinical databases, ontologies and the data integration process; and web interface level, to allow interaction between the user and the system. The Clinical Module was built based on the Entity-Attribute-Value (EAV) model. We also proposed a methodology to migrate data from legacy clinical databases to the integrative framework. A Chado instance was initialized using a relational database management system. The Clinical Module was implemented and the framework was loaded using data from a factual clinical research database. Clinical and demographic data as well as biomaterial data were obtained from patients with tumors of head and neck. 
We implemented the IPTrans tool, a complete environment for data migration, which comprises: the construction of an ontology-based model to describe the legacy clinical data; the Extraction, Transformation and Load (ETL) process to extract the data from the source clinical database and load it into the Clinical Module of Chado; and the development of a web tool and a Bridge Layer to adapt the web tool, as well as other applications, to Chado. Conclusions: Open-source computational solutions currently available for translational science do not have a model to represent biomolecular information and are not integrated with existing bioinformatics tools. On the other hand, existing genomic data models do not represent clinical patient data. A framework was developed to support translational research by integrating biomolecular information coming from different “omics” technologies with patients’ clinical and socio-demographic data. This framework should present some features: flexibility, compression and robustness. The experiments carried out on a use case demonstrated that the proposed system meets the requirements of flexibility and robustness, leading to the desired integration. The Clinical Module can be accessed at http://dcm.ffclrp.usp.br/caib/pg=iptrans.
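The Entity-Attribute-Value (EAV) model mentioned in this abstract can be illustrated with a minimal sketch. This is not the Clinical Module's actual schema, which the abstract does not describe: the table name `clinical_eav`, the helper functions and the sample values are all hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3


def create_clinical_eav(conn):
    """Create one generic table: each row holds a single fact about an entity."""
    conn.execute(
        "CREATE TABLE clinical_eav ("
        " entity TEXT NOT NULL,"      # e.g. a patient identifier
        " attribute TEXT NOT NULL,"   # e.g. an ontology term naming the field
        " value TEXT,"                # stored as text in this sketch
        " PRIMARY KEY (entity, attribute))"
    )


def set_value(conn, entity, attribute, value):
    """Insert or overwrite one (entity, attribute) fact."""
    conn.execute(
        "INSERT OR REPLACE INTO clinical_eav VALUES (?, ?, ?)",
        (entity, attribute, str(value)),
    )


def get_record(conn, entity):
    """Pivot the EAV rows of one entity back into a dictionary."""
    rows = conn.execute(
        "SELECT attribute, value FROM clinical_eav WHERE entity = ?",
        (entity,),
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_clinical_eav(conn)
    # Hypothetical sample values, for illustration only.
    set_value(conn, "patient-1", "age", 61)
    set_value(conn, "patient-1", "tumor_site", "larynx")
    print(get_record(conn, "patient-1"))
```

The appeal of this layout is that new clinical attributes need no schema change, only new rows; the cost is that every value shares one column type and records must be pivoted on read.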

Relevance:

90.00%

Publisher:

Abstract:

Background: Trypanosomatids of the genera Angomonas and Strigomonas live in a mutualistic association characterized by extensive metabolic cooperation with obligate endosymbiotic Betaproteobacteria. However, the role played by the symbiont has been inferred by indirect means more than it has been demonstrated. Symbiont-harboring trypanosomatids, in contrast to their counterparts lacking symbionts, exhibit lower nutritional requirements and are autotrophic for essential amino acids. To demonstrate the symbiont’s contributions to this autotrophy, entire genomes of symbionts and of trypanosomatids with and without symbionts were sequenced here. Results: Analyses of the essential amino acid pathways revealed that most biosynthetic routes are in the symbiont genome. By contrast, the host trypanosomatid genome contains fewer genes, about half of which originated from different bacterial groups, perhaps only one of which (ornithine cyclodeaminase, EC:4.3.1.12) derived from the symbiont. Nutritional, enzymatic, and genomic data were jointly analyzed to construct an integrated view of essential amino acid metabolism in symbiont-harboring trypanosomatids. This comprehensive analysis showed perfect concordance among all these data, and revealed that the symbiont contains genes for enzymes that complete essential biosynthetic routes for the host amino acid production, thus explaining the low requirement for these elements in symbiont-harboring trypanosomatids. Phylogenetic analyses show that the cooperation between symbionts and their hosts is complemented by multiple horizontal gene transfers, from bacterial lineages to trypanosomatids, that occurred several times in the course of their evolution. Transfers occur preferentially in parts of the pathways that are missing from other eukaryotes. 
Conclusion: We have uncovered the genetic and evolutionary bases of essential amino acid biosynthesis in several trypanosomatids with and without endosymbionts, explaining and complementing decades of experimental results. We uncovered remarkable plasticity in the evolution of essential amino acid biosynthesis pathways in these protozoans, demonstrating the heavy influence of horizontal gene transfer events, from Bacteria to trypanosomatid nuclei, on the evolution of these pathways.

Relevance:

80.00%

Publisher:

Abstract:

Sediment quality from Paranagua Estuarine System (PES), a highly important port and ecological zone, was evaluated by assessing three lines of evidence: (1) sediment physical-chemical characteristics; (2) sediment toxicity (elutriates, sediment-water interface, and whole sediment); and (3) benthic community structure. Results revealed a gradient of increasing degradation of sediments (i.e. higher concentrations of trace metals, higher toxicity, and impoverishment of benthic community structure) towards inner PES. Data integration by principal component analysis (PCA) showed positive correlation between some contaminants (mainly As, Cr, Ni, and Pb) and toxicity in samples collected from stations located in upper estuary and one station placed away from contamination sources. Benthic community structure seems to be affected by both pollution and natural fine characteristics of the sediments, which reinforces the importance of a weight-of-evidence approach to evaluate sediments of PES. (C) 2008 Elsevier Inc. All rights reserved.

Relevance:

80.00%

Publisher:

Abstract:

XML similarity evaluation has become a central issue in the database and information communities, its applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operations costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.
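To make the idea of an edit distance between ordered labeled trees concrete, the sketch below implements a simplified top-down distance in the style of Selkow, not the framework described in the abstract. It assumes unit costs, treats insertion or deletion of a child as insertion or deletion of its whole subtree, and ignores semantic similarity between labels entirely. The tuple representation `(label, [children])` and the function names are hypothetical.

```python
def size(tree):
    """Number of nodes in a tree given as (label, [children])."""
    label, children = tree
    return 1 + sum(size(child) for child in children)


def tree_distance(a, b):
    """Top-down edit distance between two ordered labeled trees.

    Roots are always matched (cost 1 if their labels differ). The child
    sequences are then aligned with a string-edit style dynamic program in
    which deleting or inserting a child costs the size of its subtree and
    matching two children costs their recursive distance.
    """
    relabel = 0 if a[0] == b[0] else 1
    kids_a, kids_b = a[1], b[1]
    m, n = len(kids_a), len(kids_b)

    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(kids_a[i - 1])      # delete subtree
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(kids_b[j - 1])      # insert subtree
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(kids_a[i - 1]),
                d[i][j - 1] + size(kids_b[j - 1]),
                d[i - 1][j - 1] + tree_distance(kids_a[i - 1], kids_b[j - 1]),
            )
    return relabel + d[m][n]


if __name__ == "__main__":
    doc_a = ("book", [("title", []), ("author", [])])
    doc_b = ("book", [("title", []), ("editor", []), ("year", [])])
    print(tree_distance(doc_a, doc_b))
```

In the usage example, one relabeling (`author` to `editor`) and one insertion (`year`) turn the first document into the second. A semantic-aware variant would replace the 0/1 relabeling cost with a graded label-similarity score, which is the kind of extension the abstract argues for.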

Relevance:

80.00%

Publisher:

Abstract:

Traceability is a concept that arose from the need to monitor production processes; it is usually applied in sectors related to food production or to activities involving some kind of direct risk to people. Agribusiness in the cotton industry does not have a comprehensive infrastructure covering all stages of the production processes. Mapping and defining the data that enable product traceability means delegating responsibilities to everyone involved in production. Aggregate data on cotton production are collected at specific, pre-defined stages, from the choice of variety through processing; the scope of this article is specifically the production of lint cotton. The paper presents a proposal based on service-oriented architecture (SOA) for data integration processes in the cotton industry; this proposal provides support for the implementation of platform-independent solutions.

Relevance:

80.00%

Publisher:

Abstract:

Background: Ontologies have increasingly been used in the biomedical domain, which has prompted the emergence of different initiatives to facilitate their development and integration. The Open Biological and Biomedical Ontologies (OBO) Foundry consortium provides a repository of life-science ontologies, which are developed according to a set of shared principles. This consortium has developed an ontology called OBO Relation Ontology aiming at standardizing the different types of biological entity classes and associated relationships. Since ontologies are primarily intended to be used by humans, the use of graphical notations for ontology development facilitates the capture, comprehension and communication of knowledge between its users. However, OBO Foundry ontologies are captured and represented basically using text-based notations. The Unified Modeling Language (UML) provides a standard and widely-used graphical notation for modeling computer systems. UML provides a well-defined set of modeling elements, which can be extended using a built-in extension mechanism named Profile. Thus, this work aims at developing a UML profile for the OBO Relation Ontology to provide a domain-specific set of modeling elements that can be used to create standard UML-based ontologies in the biomedical domain. Results: We have studied the OBO Relation Ontology, the UML metamodel and the UML profiling mechanism. Based on these studies, we have proposed an extension to the UML metamodel in conformance with the OBO Relation Ontology and we have defined a profile that implements the extended metamodel. Finally, we have applied the proposed UML profile in the development of a number of fragments from different ontologies. Particularly, we have considered the Gene Ontology (GO), the PRotein Ontology (PRO) and the Xenopus Anatomy and Development Ontology (XAO). 
Conclusions: The use of an established and well-known graphical language in the development of biomedical ontologies provides a more intuitive form of capturing and representing knowledge than using only text-based notations. The use of the profile requires the domain expert to reason about the underlying semantics of the concepts and relationships being modeled, which helps prevent the introduction of inconsistencies in an ontology under development and facilitates the identification and correction of errors in an already defined ontology.

Relevance:

80.00%

Publisher:

Abstract:

The University of São Paulo has seen growth in content in electronic and digital formats, distributed by different suppliers and hosted remotely or in clouds, and faces increasing difficulty in giving its users access to this digital collection while it coexists with the traditional world of physical collections. A possible solution was identified in the new generation of systems called Web Scale Discovery, which allow better management, data integration and search agility. To determine whether and how such a system would meet USP's demands and expectations and, if so, what the criteria for analyzing such a tool would be, an analytical study on an essentially documentary basis was structured, drawing on a literature review and on data available on official websites and on the websites of libraries that use this kind of resource. The conceptual basis of the study was defined after identifying the software assessment methods already available, generating a standard with 40 analysis criteria, ranging from details of the single access interface to information content to web 2.0 characteristics, intuitive interface and faceted navigation, among others. The details of the studies conducted on four of the major systems currently available in this software category are presented, providing support for the decision-making of other libraries interested in such systems.

Relevance:

80.00%

Publisher:

Abstract:

Genome-wide association studies have failed to establish common variant risk for the majority of common human diseases. The underlying reasons for this failure are explained by recent studies of resequencing and comparison of over 1200 human genomes and 10 000 exomes, together with the delineation of DNA methylation patterns (epigenome) and full characterization of coding and noncoding RNAs (transcriptome) being transcribed. These studies have provided the most comprehensive catalogues of functional elements and genetic variants that are now available for global integrative analysis and experimental validation in prospective cohort studies. With these datasets, researchers will have unparalleled opportunities for the alignment, mining, and testing of hypotheses for the roles of specific genetic variants, including copy number variations, single nucleotide polymorphisms, and indels as the cause of specific phenotypes and diseases. Through the use of next-generation sequencing technologies for genotyping and standardized ontological annotation to systematically analyze the effects of genomic variation on humans and model organism phenotypes, we will be able to find candidate genes and new clues for disease’s etiology and treatment. This article describes essential concepts in genetics and genomic technologies as well as the emerging computational framework to comprehensively search websites and platforms available for the analysis and interpretation of genomic data.

Relevance:

40.00%

Publisher:

Abstract:

Background: The study and analysis of gene expression measurements is the primary focus of functional genomics. Once expression data is available, biologists are faced with the task of extracting (new) knowledge associated with the underlying biological phenomenon. Most often, in order to perform this task, biologists execute a number of analysis activities on the available gene expression dataset rather than a single analysis activity. The integration of heterogeneous tools and data sources to create an integrated analysis environment represents a challenging and error-prone task. Semantic integration enables the assignment of unambiguous meanings to data shared among different applications in an integrated environment, allowing the exchange of data in a semantically consistent and meaningful way. This work aims at developing an ontology-based methodology for the semantic integration of gene expression analysis tools and data sources. The proposed methodology relies on software connectors to support not only the access to heterogeneous data sources but also the definition of transformation rules on exchanged data. Results: We have studied the different challenges involved in the integration of computer systems and the role software connectors play in this task. We have also studied a number of gene expression technologies, analysis tools and related ontologies in order to devise basic integration scenarios and propose a reference ontology for the gene expression domain. Then, we have defined a number of activities and associated guidelines to prescribe how the development of connectors should be carried out. Finally, we have applied the proposed methodology in the construction of three different integration scenarios involving the use of different tools for the analysis of different types of gene expression data. 
Conclusions: The proposed methodology facilitates the development of connectors capable of semantically integrating different gene expression analysis tools and data sources. The methodology can be used in the development of connectors supporting both simple and nontrivial processing requirements, thus assuring accurate data exchange and information interpretation from exchanged data.

Relevance:

30.00%

Publisher:

Abstract:

The objective of this work was to evaluate extreme water table depths in a watershed, using methods for geographical spatial data analysis. Groundwater spatio-temporal dynamics were evaluated in an outcrop of the Guarani Aquifer System. Water table depths were estimated from water-level monitoring in 23 piezometers and from time series modeling, with data available from April 2004 to April 2011. To generate spatial scenarios, geostatistical techniques were used that incorporated into the prediction ancillary information related to the geomorphological patterns of the watershed, using a digital elevation model. This procedure improved the estimates, owing to the high correlation between water levels and elevation, and gave the predictions physical meaning. The scenarios showed differences regarding the extreme levels (too deep or too shallow) and can support water planning, efficient water use, and sustainable water management in the watershed.

Relevance:

30.00%

Publisher:

Abstract:

Background: The development of sugarcane as a sustainable crop has unlimited applications. The crop is one of the most economically viable for renewable energy production, and CO2 balance. Linkage maps are valuable tools for understanding genetic and genomic organization, particularly in sugarcane due to its complex polyploid genome of multispecific origins. The overall objective of our study was to construct a novel sugarcane linkage map, compiling AFLP and EST-SSR markers, and to generate data on the distribution of markers anchored to sequences of scIvana_1, a complete sugarcane transposable element, and member of the Copia superfamily. Results: The mapping population parents ('IAC66-6' and 'TUC71-7') contributed equally to polymorphisms, independent of marker type, and generated markers that were distributed into nearly the same number of co-segregation groups (or CGs). Bi-parentally inherited alleles provided the integration of 19 CGs. The marker number per CG ranged from two to 39. The total map length was 4,843.19 cM, with a marker density of 8.87 cM. Markers were assembled into 92 CGs that ranged in length from 1.14 to 404.72 cM, with an estimated average length of 52.64 cM. The greatest distance between two adjacent markers was 48.25 cM. The scIvana_1-based markers (56) were positioned on 21 CGs, but were not regularly distributed. Interestingly, the distance between adjacent scIvana_1-based markers was less than 5 cM, and was observed on five CGs, suggesting a clustered organization. Conclusions: Results indicated the use of a NBS-profiling technique was efficient to develop retrotransposon-based markers in sugarcane. The simultaneous maximum-likelihood estimates of linkage and linkage phase based strategies confirmed the suitability of its approach to estimate linkage, and construct the linkage map. 
Interestingly, using our genetic data it was possible to estimate the number of copies of the retrotransposon scIvana_1 (approximately 60) in the sugarcane genome, confirming previously reported molecular results. In addition, this research may have indirect implications for crop economics, e.g., productivity enhancement via QTL studies, as the mapping population parents differ in their response to an important fungal disease.
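The summary figures reported in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that the average co-segregation group length is total map length divided by the number of groups, and that "marker density" means total map length divided by the number of mapped markers. The second definition is an assumption, since the abstract does not state the marker count; the function names are hypothetical.

```python
def mean_group_length(total_length_cm, n_groups):
    """Average length of a co-segregation group, in cM."""
    return total_length_cm / n_groups


def implied_marker_count(total_length_cm, density_cm_per_marker):
    """Marker count implied by a density given in cM per marker.

    Assumes density = total length / number of markers.
    """
    return total_length_cm / density_cm_per_marker


if __name__ == "__main__":
    total_length_cm = 4843.19   # total map length from the abstract
    n_groups = 92               # number of co-segregation groups
    density = 8.87              # reported marker density, cM

    print(round(mean_group_length(total_length_cm, n_groups), 2))
    print(round(implied_marker_count(total_length_cm, density)))
```

Under these assumptions the first value agrees with the reported average of 52.64 cM, and the reported density would correspond to roughly 546 mapped markers.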

Relevance:

30.00%

Publisher:

Abstract:

In this paper we continue the development of the differential calculus started in Aragona et al. (Monatsh. Math. 144: 13-29, 2005). Guided by the so-called sharp topology and the interpretation of Colombeau generalized functions as point functions on generalized point sets, we introduce the notion of membranes and extend the definition of integrals, given in Aragona et al. (Monatsh. Math. 144: 13-29, 2005), to integrals defined on membranes. We use this to prove a generalized version of the Cauchy formula and to obtain the Goursat Theorem for generalized holomorphic functions. A number of results from classical differential and integral calculus, like the inverse and implicit function theorems and Green's theorem, are transferred to the generalized setting. Further, we indicate that solution formulas for transport and wave equations with generalized initial data can be obtained as well.
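For orientation, the classical result that the abstract's generalized Cauchy formula extends can be stated as follows. This is the standard textbook form, not the generalized version proved in the paper: $f$ is assumed holomorphic on an open set containing a closed disc, $\gamma$ is the positively oriented boundary circle of that disc, and $z_0$ lies in its interior.

```latex
f(z_0) = \frac{1}{2\pi i} \oint_{\gamma} \frac{f(z)}{z - z_0}\, dz
```

In the generalized setting described above, the path integral is replaced by an integral over a membrane and $f$ by a generalized holomorphic function; the precise hypotheses are those given in the paper itself.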

Relevance:

30.00%

Publisher:

Abstract:

Costa-Silva JH, Zoccal DB, Machado BH. Chronic intermittent hypoxia alters glutamatergic control of sympathetic and respiratory activities in the commissural NTS of rats. Am J Physiol Regul Integr Comp Physiol 302: R785-R793, 2012. First published December 28, 2011; doi:10.1152/ajpregu.00363.2011. Sympathetic overactivity and altered respiratory control are commonly observed after chronic intermittent hypoxia (CIH) exposure. However, the central mechanisms underlying such neurovegetative dysfunctions remain unclear. Herein, we hypothesized that CIH (6% O2 every 9 min, 8 h/day, 10 days) in juvenile rats alters glutamatergic transmission in the commissural nucleus tractus solitarius (cNTS), a pivotal site for integration of peripheral chemoreceptor inputs. Using an in situ working heart-brain stem preparation, we found that L-glutamate microinjections (1, 3, and 10 mM) into the cNTS of control rats (n = 8) evoked increases in thoracic sympathetic nerve (tSN) and central vagus nerve (cVN) activities combined with inhibition of phrenic nerve (PN) activity. In addition, ionotropic glutamatergic receptor antagonism with kynurenic acid (KYN; 250 mM) in the cNTS of the control group (n = 7) increased PN burst duration and frequency. In the CIH group (n = 10), the magnitude of L-glutamate-induced cVN excitation was smaller, and the PN inhibitory response was blunted (P < 0.05). In addition, KYN microinjections into the cNTS of CIH rats (n = 9) did not alter PN burst duration and produced smaller increases in its frequency compared with controls. Moreover, KYN microinjections into the cNTS attenuated the sympathoexcitatory response to peripheral chemoreflex activation in control but not in CIH rats (P < 0.05). These functional CIH-induced alterations were accompanied by a significant 10% increase of N-methyl-D-aspartate receptor 1 (NMDAR1) and glutamate receptor 2/3 (GluR2/3) receptor subunit density in the cNTS (n = 3-8, P < 0.05), evaluated by Western blot analysis. 
These data indicate that glutamatergic transmission is altered in the cNTS of CIH rats and may contribute to the sympathetic and respiratory changes observed in this experimental model.

Relevance:

30.00%

Publisher:

Abstract:

Over the last decade, Brazil has pioneered an innovative model of branchless banking, known as correspondent banking, involving distribution partnership between banks, several kinds of retailers and a variety of other participants, which have allowed an unprecedented growth in bank outreach and became a reference worldwide. However, despite the extensive number of studies recently developed focusing on Brazilian branchless banking, there exists a clear research gap in the literature. It is still necessary to identify the different business configurations involving network integration through which the branchless banking channel can be structured, as well as the way they relate to the range of bank services delivered. Given this gap, our objective is to investigate the relationship between network integration models and services delivered through the branchless banking channel. Based on twenty interviews with managers involved with the correspondent banking business and data collected on almost 300 correspondent locations, our research is developed in two steps. First, we created a qualitative taxonomy through which we identified three classes of network integration models. Second, we performed a cluster analysis to explain the groups of financial services that fit each model. By contextualizing correspondents' network integration processes through the lens of transaction costs economics, our results suggest that the more suited to deliver social-oriented, "pro-poor'' services the channel is, the more it is controlled by banks. This research offers contributions to managers and policy makers interested in understanding better how different correspondent banking configurations are related with specific portfolios of services. 
Researchers interested in the subject of branchless banking can also benefit from the taxonomy presented and the transaction costs analysis of this kind of banking channel, which has been adopted in a number of developing countries all over the world now. (C) 2011 Elsevier B.V. All rights reserved.