16 results for data integration

in the Biblioteca Digital da Produção Intelectual da Universidade de São Paulo


Relevance: 100.00%

Abstract:

Background: A current challenge in gene annotation is to define gene function in the context of a network of relationships rather than for single genes. The inference of gene networks (GNs) has emerged as an approach to better understand the biology of the system and to study how the several components of this network interact with each other and keep their functions stable. In general, however, there are not enough data to accurately recover GNs from expression levels alone, leading to the curse of dimensionality, in which the number of variables is higher than the number of samples. One way to mitigate this problem is to integrate biological data instead of using only the expression profiles in the inference process. The use of several types of biological information in inference methods has increased significantly in recent years, in order to better recover the connections between genes and to reduce false positives. What makes this strategy so interesting is the possibility of confirming known connections through the included biological data, and of discovering new relationships between genes when the expression data are examined. Although several works on data integration have increased the performance of network inference methods, the real contribution of each type of biological information to the obtained improvement is not clear. Methods: We propose a methodology to include biological information in an inference algorithm in order to assess the prediction gain of using biological information and expression profiles together. We also evaluated and compared the gain of adding four types of biological information: (a) protein-protein interaction, (b) Rosetta stone fusion proteins, (c) KEGG and (d) KEGG+GO. Results and conclusions: This work presents a first comparison of the gain from the use of prior biological information in the inference of GNs, considering a eukaryotic organism (P. falciparum).
Our results indicate that information based on direct interaction can produce a higher improvement in the gain than data about less specific relationships, such as GO or KEGG. Also, as expected, the results show that the use of biological information is a very important approach for improving the inference. We also compared the gain in the inference of the global network and of the hubs only. The results indicate that the use of biological information can improve the identification of the most connected proteins.
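
As a hedged illustration of the general idea of blending expression evidence with prior biological knowledge (not the inference algorithm of this work), the sketch below scores candidate edges by absolute correlation and boosts pairs supported by a prior interaction. The data, the blending weight `alpha` and the threshold are invented for illustration:

```python
import numpy as np

def infer_network(expr, prior_edges, alpha=0.5, threshold=0.6):
    """Score gene-gene edges by |Pearson correlation|, boosted by
    prior biological evidence (e.g., a protein-protein interaction).
    expr: (genes x samples) matrix; prior_edges: set of (i, j) pairs."""
    corr = np.abs(np.corrcoef(expr))  # |r| between every gene pair
    n = expr.shape[0]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            has_prior = (i, j) in prior_edges or (j, i) in prior_edges
            # blend expression evidence with prior knowledge
            score = alpha * corr[i, j] + (1 - alpha) * (1.0 if has_prior else 0.0)
            if score >= threshold:
                edges.add((i, j))
    return edges

# toy example: genes 0 and 1 are co-expressed, gene 2 is independent
rng = np.random.default_rng(0)
g0 = rng.normal(size=50)
expr = np.vstack([g0, g0 + rng.normal(scale=0.1, size=50), rng.normal(size=50)])
net = infer_network(expr, prior_edges={(0, 1)})
```

With these settings only the pair that is both co-expressed and supported by the prior survives, mimicking how prior information can confirm connections and filter false positives.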

Relevance: 80.00%

Abstract:

Background: The use of the knowledge produced by the sciences to promote human health is the main goal of translational medicine. To make it feasible, we need computational methods to handle the large amount of information that arises from bench to bedside and to deal with its heterogeneity. A computational challenge that must be faced is to promote the integration of clinical, socio-demographic and biological data. In this effort, ontologies play an essential role as a powerful artifact for knowledge representation. Chado is a modular, ontology-oriented database model that gained popularity due to its robustness and flexibility as a generic platform to store biological data; however, it lacks support for representing clinical and socio-demographic information. Results: We have implemented an extension of Chado, the Clinical Module, to allow the representation of this kind of information. Our approach consists of a framework for data integration through the use of a common reference ontology. The design of this framework has four levels: the data level, to store the data; the semantic level, to integrate and standardize the data through the use of ontologies; the application level, to manage clinical databases, ontologies and the data integration process; and the web interface level, to allow interaction between the user and the system. The Clinical Module was built based on the Entity-Attribute-Value (EAV) model. We also propose a methodology to migrate data from legacy clinical databases to the integrative framework. A Chado instance was initialized using a relational database management system. The Clinical Module was implemented and the framework was loaded using data from a factual clinical research database. Clinical and demographic data, as well as biomaterial data, were obtained from patients with tumors of the head and neck.
We implemented the IPTrans tool, a complete environment for data migration, which comprises: the construction of a model to describe the legacy clinical data, based on an ontology; the Extraction, Transformation and Load (ETL) process to extract the data from the source clinical database and load it into the Clinical Module of Chado; and the development of a web tool and a Bridge Layer to adapt the web tool to Chado, as well as to other applications. Conclusions: Open-source computational solutions currently available for translational science do not have a model to represent biomolecular information, nor are they integrated with existing bioinformatics tools. On the other hand, existing genomic data models do not represent clinical patient data. A framework was developed to support translational research by integrating biomolecular information coming from different "omics" technologies with patients' clinical and socio-demographic data. This framework should present some features: flexibility, compression and robustness. The experiments carried out on a use case demonstrated that the proposed system meets the requirements of flexibility and robustness, leading to the desired integration. The Clinical Module can be accessed at http://dcm.ffclrp.usp.br/caib/pg=iptrans.
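
The Entity-Attribute-Value pattern on which the Clinical Module is based can be sketched as follows; the table, attribute names and values are illustrative assumptions, not the module's actual schema. Each clinical fact becomes one (patient, attribute, value) row, so new attributes require no schema change:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE clinical_eav (
    patient_id INTEGER,
    attribute  TEXT,   -- ideally a term from a reference ontology
    value      TEXT)""")

rows = [
    (1, "tumor_site", "larynx"),
    (1, "smoking_status", "former"),
    (2, "tumor_site", "tongue"),
]
conn.executemany("INSERT INTO clinical_eav VALUES (?, ?, ?)", rows)

# reconstruct one patient's record by pivoting attribute/value pairs
record = dict(conn.execute(
    "SELECT attribute, value FROM clinical_eav WHERE patient_id = 1"))
```

The trade-off of EAV is that queries must pivot rows back into records, which is why such designs are usually paired with an ontology that standardizes the attribute names.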

Relevance: 60.00%

Abstract:

Sediment quality from the Paranagua Estuarine System (PES), a highly important port and ecological zone, was evaluated by assessing three lines of evidence: (1) sediment physical-chemical characteristics; (2) sediment toxicity (elutriates, sediment-water interface, and whole sediment); and (3) benthic community structure. Results revealed a gradient of increasing degradation of sediments (i.e. higher concentrations of trace metals, higher toxicity, and impoverishment of benthic community structure) towards the inner PES. Data integration by principal component analysis (PCA) showed positive correlation between some contaminants (mainly As, Cr, Ni, and Pb) and toxicity in samples collected from stations located in the upper estuary and one station located away from contamination sources. Benthic community structure seems to be affected by both pollution and the naturally fine characteristics of the sediments, which reinforces the importance of a weight-of-evidence approach to evaluate sediments of the PES. (C) 2008 Elsevier Inc. All rights reserved.
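
The PCA-based integration of the lines of evidence can be sketched with a standardize-then-SVD decomposition; the station-by-variable matrix below is invented (not the PES data), chosen so that contaminant levels and toxicity rise together along a gradient:

```python
import numpy as np

def pca_scores(data):
    """Project standardized samples onto principal components via SVD."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)  # standardize columns
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt.T, vt  # scores (samples x PCs) and loadings (PCs x vars)

# hypothetical station-by-variable matrix: [As, Cr, toxicity index]
data = np.array([[1.0, 2.0, 0.1],
                 [2.0, 4.1, 0.3],
                 [3.0, 6.0, 0.5],
                 [4.0, 7.9, 0.8]])
scores, loadings = pca_scores(data)
```

When variables co-vary along a single contamination gradient, the first component captures most of the variance and its loadings share a sign, which is the pattern read as "positive correlation between contaminants and toxicity".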

Relevance: 60.00%

Abstract:

XML similarity evaluation has become a central issue in the database and information communities, its applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operations costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.
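
As a rough illustration of edit-distance-based comparison of Ordered Labeled Trees (a crude proxy, not the fine-grained framework described above), the sketch below flattens two small XML-like trees to pre-order label sequences and computes a classic unit-cost edit distance between them:

```python
def preorder(tree):
    """Flatten an ordered labeled tree (label, [children]) to a label list."""
    label, children = tree
    out = [label]
    for c in children:
        out += preorder(c)
    return out

def edit_distance(a, b):
    """Levenshtein distance with unit insert/delete/rename costs."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # delete a[i-1]
                           dp[i][j - 1] + 1,      # insert b[j-1]
                           dp[i - 1][j - 1] + cost)  # keep or rename
    return dp[len(a)][len(b)]

# two documents differing by one renamed element
doc1 = ("book", [("title", []), ("author", [])])
doc2 = ("book", [("title", []), ("editor", [])])
dist = edit_distance(preorder(doc1), preorder(doc2))
```

True tree edit distance additionally constrains operations to preserve ancestry, and the semantic extension discussed in the paper would further lower the rename cost for semantically related labels (e.g., author/editor).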

Relevance: 60.00%

Abstract:

Traceability is a concept that arose from the need to monitor production processes; it is usually applied in sectors related to food production or to activities involving some kind of direct risk to people. Agribusiness in the cotton industry does not have a comprehensive infrastructure covering all stages of the processes involved in production. Mapping and defining the data that enable the traceability of products is synonymous with delegating responsibilities to all involved in production. The collection of aggregate data on cotton production is done at specific, pre-defined stages, from the choice of the variety through processing; the scope of this article specifically addresses the production of cotton lint. The paper presents a proposal based on service-oriented architecture (SOA) for data integration processes in the cotton industry; this proposal provides support for the implementation of platform-independent solutions.
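
The stage-by-stage data collection described above can be sketched as a chain of traceability records, each delegating responsibility to a named actor, as could be exposed through platform-independent (e.g., SOA/web service) endpoints. The stage names, actors and fields below are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class CottonLot:
    lot_id: str
    history: list = field(default_factory=list)

    def record_stage(self, stage, responsible, data):
        # each record assigns responsibility for that stage to a named actor
        self.history.append({"stage": stage, "responsible": responsible,
                             "data": data})

    def trace(self):
        """Return the ordered chain of stages for auditing."""
        return [h["stage"] for h in self.history]

lot = CottonLot("LOT-001")
lot.record_stage("variety_selection", "farm_a", {"variety": "FM 985"})
lot.record_stage("harvest", "farm_a", {"date": "2012-07-01"})
lot.record_stage("ginning", "gin_b", {"lint_yield_pct": 38.5})
```

In an SOA deployment, `record_stage` and `trace` would be the operations of a service contract, so each participant in the chain can report its stage regardless of platform.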

Relevance: 60.00%

Abstract:

Background: Ontologies have increasingly been used in the biomedical domain, which has prompted the emergence of different initiatives to facilitate their development and integration. The Open Biological and Biomedical Ontologies (OBO) Foundry consortium provides a repository of life-science ontologies, which are developed according to a set of shared principles. This consortium has developed an ontology called OBO Relation Ontology aiming at standardizing the different types of biological entity classes and associated relationships. Since ontologies are primarily intended to be used by humans, the use of graphical notations for ontology development facilitates the capture, comprehension and communication of knowledge between its users. However, OBO Foundry ontologies are captured and represented basically using text-based notations. The Unified Modeling Language (UML) provides a standard and widely-used graphical notation for modeling computer systems. UML provides a well-defined set of modeling elements, which can be extended using a built-in extension mechanism named Profile. Thus, this work aims at developing a UML profile for the OBO Relation Ontology to provide a domain-specific set of modeling elements that can be used to create standard UML-based ontologies in the biomedical domain. Results: We have studied the OBO Relation Ontology, the UML metamodel and the UML profiling mechanism. Based on these studies, we have proposed an extension to the UML metamodel in conformance with the OBO Relation Ontology and we have defined a profile that implements the extended metamodel. Finally, we have applied the proposed UML profile in the development of a number of fragments from different ontologies. Particularly, we have considered the Gene Ontology (GO), the PRotein Ontology (PRO) and the Xenopus Anatomy and Development Ontology (XAO). 
Conclusions: The use of an established and well-known graphical language in the development of biomedical ontologies provides a more intuitive form of capturing and representing knowledge than using only text-based notations. The use of the profile requires the domain expert to reason about the underlying semantics of the concepts and relationships being modeled, which helps prevent the introduction of inconsistencies in an ontology under development and facilitates the identification and correction of errors in an already defined ontology.

Relevance: 60.00%

Abstract:

The University of São Paulo has been experiencing an increase in content in electronic and digital formats, distributed by different suppliers and hosted remotely or in clouds, and faces correspondingly increasing difficulties in facilitating access to this digital collection by its users while coexisting with the traditional world of physical collections. A possible solution was identified in the new generation of systems called Web Scale Discovery, which allow better management, data integration and agility of search. Aiming to identify whether and how such a system would meet USP's demands and expectations and, if so, to identify the analysis criteria for such a tool, an analytical study with an essentially documental basis was structured, based on a review of the literature and on data available in official websites and from libraries using this kind of resource. The conceptual basis of the study was defined after the identification of software assessment methods already available, generating a standard with 40 analysis criteria, ranging from details of the single access interface to information contents, web 2.0 characteristics, intuitive interface and faceted navigation, among others. The details of the studies conducted on four of the major systems currently available in this software category are presented, providing support for the decision-making of other libraries interested in such systems.
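
A criteria-based comparison of this kind can be sketched as a weighted scoring function; the criteria, weights and ratings below are invented for illustration and are not the study's 40-criterion standard:

```python
def score_system(ratings, weights):
    """Weighted sum of 0-1 criterion ratings, normalized to 0-100."""
    total = sum(weights[c] * ratings.get(c, 0.0) for c in weights)
    return 100.0 * total / sum(weights.values())

# hypothetical criteria and weights, in the spirit of the study's standard
weights = {"single_search_interface": 3, "faceted_navigation": 2,
           "web_2_0_features": 1, "intuitive_interface": 2}

# hypothetical ratings for one candidate discovery system
system_a = {"single_search_interface": 1.0, "faceted_navigation": 1.0,
            "web_2_0_features": 0.5, "intuitive_interface": 0.5}

score = score_system(system_a, weights)
```

Scoring each candidate against the same weighted standard makes the comparison between the four systems reproducible and lets a library re-weight criteria to reflect its own priorities.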

Relevance: 40.00%

Abstract:

Background: The study and analysis of gene expression measurements is the primary focus of functional genomics. Once expression data are available, biologists are faced with the task of extracting (new) knowledge associated with the underlying biological phenomenon. Most often, in order to perform this task, biologists execute a number of analysis activities on the available gene expression dataset rather than a single analysis activity. The integration of heterogeneous tools and data sources to create an integrated analysis environment represents a challenging and error-prone task. Semantic integration enables the assignment of unambiguous meanings to data shared among different applications in an integrated environment, allowing the exchange of data in a semantically consistent and meaningful way. This work aims at developing an ontology-based methodology for the semantic integration of gene expression analysis tools and data sources. The proposed methodology relies on software connectors to support not only the access to heterogeneous data sources but also the definition of transformation rules on exchanged data. Results: We have studied the different challenges involved in the integration of computer systems and the role software connectors play in this task. We have also studied a number of gene expression technologies, analysis tools and related ontologies in order to devise basic integration scenarios and propose a reference ontology for the gene expression domain. Then, we have defined a number of activities and associated guidelines to prescribe how the development of connectors should be carried out. Finally, we have applied the proposed methodology in the construction of three different integration scenarios involving the use of different tools for the analysis of different types of gene expression data.
Conclusions The proposed methodology facilitates the development of connectors capable of semantically integrating different gene expression analysis tools and data sources. The methodology can be used in the development of connectors supporting both simple and nontrivial processing requirements, thus assuring accurate data exchange and information interpretation from exchanged data.
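
The connector idea can be sketched minimally: a connector maps the fields of a source record onto a target schema and applies transformation rules to the exchanged data. The field names and the rule below are illustrative assumptions, not the methodology's actual artifacts:

```python
import math

def make_connector(field_map, transforms):
    """Build a connector that renames fields and applies per-field rules."""
    def connector(record):
        out = {}
        for src, dst in field_map.items():
            value = record[src]
            if dst in transforms:  # transformation rule applied on exchange
                value = transforms[dst](value)
            out[dst] = value
        return out
    return connector

# e.g., one tool reports raw intensities, another expects log2 values
to_analysis = make_connector(
    field_map={"probe": "gene_id", "intensity": "log2_expr"},
    transforms={"log2_expr": lambda v: math.log2(v)})

converted = to_analysis({"probe": "AFFX-001", "intensity": 8.0})
```

Keeping the mapping and the rules declarative is what lets the same connector machinery serve both the simple (rename-only) and the nontrivial (value-transforming) exchange requirements mentioned above.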

Relevance: 30.00%

Abstract:

The objective of this work was to evaluate extreme water table depths in a watershed, using methods for geographical spatial data analysis. Groundwater spatio-temporal dynamics was evaluated in an outcrop of the Guarani Aquifer System. Water table depths were estimated from the monitoring of water levels in 23 piezometers and from time series modeling available from April 2004 to April 2011. For the generation of spatial scenarios, geostatistical techniques were used, which incorporated into the prediction ancillary information related to the geomorphological patterns of the watershed, using a digital elevation model. This procedure improved the estimates, due to the high correlation between water levels and elevation, and added physical meaning to the predictions. The scenarios showed differences regarding the extreme levels, whether too deep or too shallow, and can support water planning, efficient water use, and sustainable water management in the watershed.
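
The role of elevation as ancillary information can be sketched (hypothetically; the study used geostatistical prediction with a digital elevation model, not this simple regression): regress observed water levels on piezometer elevation, then use the trend to predict levels, and hence depths, at unsampled elevations. All numbers below are invented:

```python
import numpy as np

# hypothetical piezometer data: ground elevation and observed water level (m)
elev  = np.array([540.0, 555.0, 560.0, 575.0, 590.0])
level = np.array([536.5, 550.8, 556.1, 570.4, 585.9])

slope, intercept = np.polyfit(elev, level, 1)  # linear trend on elevation

def predict_level(e):
    """Water table level predicted from surface elevation."""
    return slope * e + intercept

def depth_at(e):
    """Water table depth = ground surface minus predicted water level."""
    return e - predict_level(e)
```

A slope close to 1 reflects the high water-level/elevation correlation reported above; in the full geostatistical treatment, the spatially correlated residuals of this trend are what kriging interpolates.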

Relevance: 30.00%

Abstract:

In this paper we continue the development of the differential calculus started in Aragona et al. (Monatsh. Math. 144: 13-29, 2005). Guided by the so-called sharp topology and the interpretation of Colombeau generalized functions as point functions on generalized point sets, we introduce the notion of membranes and extend the definition of integrals, given in Aragona et al. (Monatsh. Math. 144: 13-29, 2005), to integrals defined on membranes. We use this to prove a generalized version of the Cauchy formula and to obtain the Goursat Theorem for generalized holomorphic functions. A number of results from classical differential and integral calculus, like the inverse and implicit function theorems and Green's theorem, are transferred to the generalized setting. Further, we indicate that solution formulas for transport and wave equations with generalized initial data can be obtained as well.
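
For reference, the classical result being generalized is the Cauchy integral formula for a holomorphic function f and a point z enclosed by the contour γ; in the generalized setting described above, the contour of integration is replaced by a membrane:

```latex
f(z) \;=\; \frac{1}{2\pi i} \oint_{\gamma} \frac{f(\zeta)}{\zeta - z}\, d\zeta
```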

Relevance: 30.00%

Abstract:

Over the last decade, Brazil has pioneered an innovative model of branchless banking, known as correspondent banking, involving distribution partnerships between banks, several kinds of retailers and a variety of other participants, which have allowed unprecedented growth in bank outreach and have become a reference worldwide. However, despite the extensive number of studies recently developed on Brazilian branchless banking, there is a clear research gap in the literature. It is still necessary to identify the different business configurations involving network integration through which the branchless banking channel can be structured, as well as the way they relate to the range of bank services delivered. Given this gap, our objective is to investigate the relationship between network integration models and services delivered through the branchless banking channel. Based on twenty interviews with managers involved in the correspondent banking business and data collected on almost 300 correspondent locations, our research is developed in two steps. First, we created a qualitative taxonomy through which we identified three classes of network integration models. Second, we performed a cluster analysis to explain the groups of financial services that fit each model. By contextualizing correspondents' network integration processes through the lens of transaction cost economics, our results suggest that the more suited the channel is to deliver social-oriented, "pro-poor" services, the more it is controlled by banks. This research offers contributions to managers and policy makers interested in better understanding how different correspondent banking configurations are related to specific portfolios of services.
Researchers interested in the subject of branchless banking can also benefit from the taxonomy presented and the transaction costs analysis of this kind of banking channel, which has been adopted in a number of developing countries all over the world now. (C) 2011 Elsevier B.V. All rights reserved.

Relevance: 30.00%

Abstract:

Introduction: The aim of this study was to assess the epidemiological and operational characteristics of the Leprosy Program before and after its integration into the Primary Healthcare Services of the municipality of Aracaju-Sergipe, Brazil. Methods: Data were drawn from the national database. The study periods were divided into preintegration (1996-2000) and postintegration (2001-2007). Annual rates of epidemiological detection were calculated. Frequency data on clinico-epidemiological variables of cases detected and treated in the two periods were compared using the Chi-squared (chi(2)) test, adopting a 5% level of significance. Results: Rates of detection overall, and in subjects younger than 15 years, were greater for the postintegration period and were higher than rates recorded for Brazil as a whole during the same periods. A total of 780 and 1,469 cases were registered during the preintegration and postintegration periods, respectively. Observations for the postintegration period were as follows: I) a higher proportion of cases with disability grade assessed at diagnosis, with an increase from 60.9% to 78.8% (p < 0.001), and at end of treatment, from 41.4% to 44.4% (p < 0.023); II) an increase in the proportion of cases detected by contact examination, from 2.1% to 4.1% (p < 0.001); and III) a lower level of treatment default, with a decrease from 5.64 to 3.35 (p < 0.008). Only 34% of cases registered from 2001 to 2007 were examined. Conclusions: The shift observed in rates of detection overall, and in subjects younger than 15 years, during the postintegration period indicates an increased level of health care access. The fall in the number of patients abandoning treatment indicates greater adherence to treatment. However, previous shortcomings in key actions, pivotal to attaining the outcomes and impact envisaged for the program, persisted in the postintegration period.
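
The chi-squared comparison between periods can be sketched on a 2x2 table. The counts below are reconstructed from the reported totals (780 and 1,469 cases) and the proportions of cases with disability grade assessed at diagnosis (60.9% vs. 78.8%), so they are approximate rather than the study's raw data:

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a 2x2 contingency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

# disability grade assessed vs. not assessed, in the two periods
stat = chi_squared([(475, 305),      # preintegration  (~60.9% of 780)
                    (1158, 311)])    # postintegration (~78.8% of 1,469)
significant = stat > 3.841  # 5% critical value, 1 degree of freedom
```

The statistic far exceeds the critical value, consistent with the reported p < 0.001 for this comparison.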

Relevance: 30.00%

Abstract:

Produced water in oil fields is one of the main sources of wastewater generated in the industry. It contains several organic compounds, such as benzene, toluene, ethyl benzene and xylene (BTEX), whose disposal is regulated by law. The aim of this study is to investigate a treatment of produced water integrating two processes, i.e., induced air flotation (IAF) and photo-Fenton. The experiments were conducted in a flotation column and an annular lamp reactor for the flotation and photodegradation steps, respectively. The first-order kinetic constant of IAF for the wastewater studied was determined to be 0.1765 min(-1) for the surfactant EO 7. Degradation efficiencies of the organic load were assessed using factorial design. Statistical analysis of the data shows that the H2O2 concentration is a determining factor in process efficiency. Degradations above 90% were reached in all cases after 90 min of reaction, attaining 100% mineralization at the optimized concentrations of the Fenton reagents. Process integration was adequate, with 100% organic load removal in 20 min. The integration of IAF with the photo-Fenton process allowed the effluent to meet the disposal limits established by Brazilian legislation. (C) 2011 Elsevier B.V. All rights reserved.
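
The reported first-order kinetics of the flotation step imply C(t) = C0 exp(-kt) with k = 0.1765 min^-1. A small sketch of what this constant implies for removal times (for the IAF step alone, not the integrated IAF + photo-Fenton process):

```python
import math

k = 0.1765  # first-order kinetic constant of IAF, min^-1 (surfactant EO 7)

def removal_fraction(t_min, k=k):
    """Fraction removed after t_min minutes: 1 - C(t)/C0 = 1 - exp(-k t)."""
    return 1.0 - math.exp(-k * t_min)

# time to reach 90% removal: C/C0 = 0.1  =>  t = ln(10) / k
t90 = math.log(10) / k  # about 13 minutes
```

So the flotation step alone removes roughly 90% of its target load in about 13 minutes, which is consistent with the integrated process achieving complete organic load removal within 20 minutes.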

Relevance: 30.00%

Abstract:

The transposition of the São Francisco River is considered one of the greatest engineering works in Brazil of all time, since it will cross an extensive agricultural region of continental dimensions, involving environmental impacts, water, soil, irrigation, payment for water and other multidisciplinary themes. Taking into account its importance, this subject was incorporated into a discipline at UFSCar (Federal University of São Carlos, Brazil) named "Pollution and Environmental Impacts". A strong reaction against the project was noted, even before its presentation. To allow a critical analysis, the first objective was to compile the main technical data and environmental impacts. The second objective was to detect the three most important aspects causing the reaction, and the following reasons were identified: the assumption that the volume of water to be transferred was much greater than what is actually proposed in the project; lack of knowledge about similar projects already carried out in Brazil; and the idea that the artificial canal to be built was much broader than that proposed by the project. The participants' opinion about the "volume to be transferred" was surveyed quantitatively in four groups: two of undergraduate students, one of graduate students, and one from the outside community. The average estimate was 14 times larger than the volume proposed in the project, a statistically significant difference according to a t-test. It was concluded that the reaction to the water transfer project is due in part to ignorance, combined with a preconceived idea that tends to overestimate the magnitude of environmental impacts.
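
A hedged sketch of the comparison described, a one-sample t-test of participants' estimates against the project's proposed volume. The individual estimates below are invented, chosen only so that their mean matches the reported ratio of 14 times the proposed volume:

```python
import math

def one_sample_t(samples, mu0):
    """t statistic for H0: mean(samples) == mu0."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # sample variance
    return (mean - mu0) / math.sqrt(var / n)

# hypothetical group estimates, expressed as multiples of the proposed
# volume (so the reference value is mu0 = 1); their mean is 14
estimates = [10.0, 18.0, 12.0, 16.0]
t = one_sample_t(estimates, mu0=1.0)
overestimated = t > 3.182  # two-sided 5% critical value, 3 degrees of freedom
```

With the mean estimate so far above the reference value, the t statistic easily exceeds the critical value, mirroring the significant overestimation reported in the study.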