948 results for data integration
Abstract:
This study used a multi-analytical approach based on traditional microbiological methods for the cultivation and isolation of heterotrophic bacteria in the laboratory, combined with molecular identification of the isolates and physicochemical analysis of environmental samples. The model chosen for data integration was supported by knowledge from computational neuroscience and composed of three modules: (i) microbiological parameters, comprising taxonomic data obtained from partial sequencing of the 16S rRNA gene of 80 colonies of heterotrophic bacteria isolated by the plating method on PCA media. For the isolation of bacterial colonies, water samples were taken from the Atibaia and Jaguarí rivers, collected at the site of water capture for use in effluent treatment, upstream of the entrance of treated effluent from the Paulínia refinery (REPLAN/Petrobras) located in the municipality of Paulínia-SP, from the output of the biological treatment plant with stabilization pond, and from the raw refinery wastewater; (ii) chemical parameters, comprising measurements of dissolved oxygen (DO), chemical oxygen demand (COD), biochemical oxygen demand (BOD), chloride, acidity (as CaCO3), alkalinity, ammonia, nitrite, nitrate, dissolved ions, sulfides, and oils and greases; and (iii) physical parameters, comprising pH, conductivity, temperature, transparency, settleable solids, suspended and soluble solids, volatile material, remaining fixed material (RFM), apparent color, and turbidity. The results revealed interesting theoretical relationships involving two families of bacteria (Carnobacteriaceae and Aeromonadaceae). Carnobacteriaceae showed positive theoretical relationships with COD, BOD, nitrate, chloride, temperature, conductivity, and apparent color, and negative theoretical relationships with DO. Positive theoretical relationships were found between Aeromonadaceae and both DO and nitrate, while this bacterial family showed negative theoretical...
Abstract:
Graduate Program in Geography - IGCE
Abstract:
Sediment quality from the Paranagua Estuarine System (PES), a highly important port and ecological zone, was evaluated by assessing three lines of evidence: (1) sediment physical-chemical characteristics; (2) sediment toxicity (elutriates, sediment-water interface, and whole sediment); and (3) benthic community structure. Results revealed a gradient of increasing degradation of sediments (i.e. higher concentrations of trace metals, higher toxicity, and impoverishment of benthic community structure) towards the inner PES. Data integration by principal component analysis (PCA) showed a positive correlation between some contaminants (mainly As, Cr, Ni, and Pb) and toxicity in samples collected from stations located in the upper estuary and one station located away from contamination sources. Benthic community structure seems to be affected by both pollution and the naturally fine characteristics of the sediments, which reinforces the importance of a weight-of-evidence approach to evaluating sediments of the PES. (C) 2008 Elsevier Inc. All rights reserved.
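To illustrate the kind of PCA-based integration described in this abstract, the following Python sketch runs a PCA over a hypothetical station-by-variable matrix of trace metals, a toxicity endpoint, and a benthic descriptor; all station names, variables, and values are assumptions for illustration, not the study's data.

```python
# Minimal sketch of PCA-based integration of sediment lines of evidence.
# Station names, variables, and values are hypothetical, not data from the study.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame(
    {
        "As": [2.1, 5.4, 8.9, 1.2],
        "Cr": [12.0, 30.5, 44.2, 9.8],
        "Ni": [6.3, 14.1, 21.7, 5.0],
        "Pb": [10.2, 25.8, 39.4, 8.1],
        "amphipod_mortality_pct": [5.0, 35.0, 60.0, 8.0],  # toxicity endpoint
        "benthic_richness": [24.0, 14.0, 7.0, 26.0],       # community descriptor
    },
    index=["outer_estuary", "mid_estuary", "upper_estuary", "reference"],
)

# Standardize so metals, toxicity, and community metrics contribute comparably
pca = PCA(n_components=2)
scores = pca.fit_transform(StandardScaler().fit_transform(data))

# Station scores reflect the contamination gradient; loadings show which
# variables (e.g., metals and toxicity) load together on the same component.
print(pd.DataFrame(scores, columns=["PC1", "PC2"], index=data.index))
print(pd.DataFrame(pca.components_, index=["PC1", "PC2"], columns=data.columns))
```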
Abstract:
XML similarity evaluation has become a central issue in the database and information communities, its applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operations costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.
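As a rough illustration of the tree edit distance computation such frameworks build on (a simplified top-down variant with unit costs, not the framework or cost model described in this abstract), the Python sketch below compares two small, made-up XML fragments modeled as ordered labeled trees.

```python
# Simplified ordered-tree edit distance over XML elements (top-down variant).
# Illustrative sketch only; example documents and costs are hypothetical.
import xml.etree.ElementTree as ET

def tree_size(node):
    """Number of elements in the subtree rooted at `node` (insert/delete cost)."""
    return 1 + sum(tree_size(c) for c in node)

def tree_dist(a, b):
    """Edit distance between two ordered labeled trees.

    Relabeling a node costs 1 if tags differ; deleting or inserting a whole
    subtree costs its size. Child sequences are aligned with a DP that works
    like string edit distance, with recursive tree distances as match costs.
    """
    relabel = 0 if a.tag == b.tag else 1
    ca, cb = list(a), list(b)

    dp = [[0] * (len(cb) + 1) for _ in range(len(ca) + 1)]
    for i in range(1, len(ca) + 1):
        dp[i][0] = dp[i - 1][0] + tree_size(ca[i - 1])
    for j in range(1, len(cb) + 1):
        dp[0][j] = dp[0][j - 1] + tree_size(cb[j - 1])
    for i in range(1, len(ca) + 1):
        for j in range(1, len(cb) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + tree_size(ca[i - 1]),                 # delete subtree
                dp[i][j - 1] + tree_size(cb[j - 1]),                 # insert subtree
                dp[i - 1][j - 1] + tree_dist(ca[i - 1], cb[j - 1]),  # match subtrees
            )
    return relabel + dp[len(ca)][len(cb)]

doc1 = ET.fromstring("<paper><title/><authors><author/><author/></authors></paper>")
doc2 = ET.fromstring("<paper><title/><authors><author/></authors><year/></paper>")
print(tree_dist(doc1, doc2))  # 2: one <author> deleted, one <year> inserted
```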
Abstract:
Traceability is a concept that arose from the need to monitor production processes; it is usually applied in sectors related to food production or to activities involving some kind of direct risk to people. Agribusiness in the cotton industry does not have a comprehensive infrastructure covering all stages of the processes involved in production. Mapping and defining the data that enable product traceability is synonymous with delegating responsibilities to everyone involved in production; the collection of aggregate data on cotton production is done in specific, pre-defined stages, from the choice of the variety through processing. The scope of this article specifically addresses the production of lint cotton. The paper presents a proposal based on service-oriented architecture (SOA) for data integration processes in the cotton industry; this proposal provides support for the implementation of platform-independent solutions.
Abstract:
Background: Ontologies have increasingly been used in the biomedical domain, which has prompted the emergence of different initiatives to facilitate their development and integration. The Open Biological and Biomedical Ontologies (OBO) Foundry consortium provides a repository of life-science ontologies, which are developed according to a set of shared principles. This consortium has developed an ontology called OBO Relation Ontology aiming at standardizing the different types of biological entity classes and associated relationships. Since ontologies are primarily intended to be used by humans, the use of graphical notations for ontology development facilitates the capture, comprehension and communication of knowledge between its users. However, OBO Foundry ontologies are captured and represented essentially using text-based notations. The Unified Modeling Language (UML) provides a standard and widely-used graphical notation for modeling computer systems. UML provides a well-defined set of modeling elements, which can be extended using a built-in extension mechanism named Profile. Thus, this work aims at developing a UML profile for the OBO Relation Ontology to provide a domain-specific set of modeling elements that can be used to create standard UML-based ontologies in the biomedical domain. Results: We have studied the OBO Relation Ontology, the UML metamodel and the UML profiling mechanism. Based on these studies, we have proposed an extension to the UML metamodel in conformance with the OBO Relation Ontology and we have defined a profile that implements the extended metamodel. Finally, we have applied the proposed UML profile in the development of a number of fragments from different ontologies. Particularly, we have considered the Gene Ontology (GO), the PRotein Ontology (PRO) and the Xenopus Anatomy and Development Ontology (XAO). Conclusions: The use of an established and well-known graphical language in the development of biomedical ontologies provides a more intuitive form of capturing and representing knowledge than using only text-based notations. The use of the profile requires the domain expert to reason about the underlying semantics of the concepts and relationships being modeled, which helps prevent the introduction of inconsistencies in an ontology under development and facilitates the identification and correction of errors in an already defined ontology.
Abstract:
The University of São Paulo has been experiencing an increase in content in electronic and digital formats, distributed by different suppliers and hosted remotely or in clouds, and faces growing difficulties in facilitating access to this digital collection by its users while coexisting with the traditional world of physical collections. A possible solution was identified in the new generation of systems called Web Scale Discovery, which allow better management, data integration, and faster searching. Aiming to identify whether and how such a system would meet USP's demands and expectations and, if so, what the analysis criteria for such a tool would be, an analytical study with an essentially documental basis was structured, based on a review of the literature and on data available from official websites and from libraries using this kind of resource. The conceptual basis of the study was defined after identifying software assessment methods already available, generating a standard with 40 analysis criteria, ranging from details of the single access interface to information content, Web 2.0 characteristics, intuitive interface, and faceted navigation, among others. Details of the studies conducted on four of the major systems currently available in this software category are presented, providing support for the decision-making of other libraries interested in such systems.
Abstract:
OBJECTIVE Blood-borne biomarkers reflecting atherosclerotic plaque burden have great potential to improve clinical management of atherosclerotic coronary artery disease and acute coronary syndrome (ACS). APPROACH AND RESULTS Using data integration from gene expression profiling of coronary thrombi versus peripheral blood mononuclear cells and proteomic analysis of atherosclerotic plaque-derived secretomes versus healthy tissue secretomes, we identified fatty acid-binding protein 4 (FABP4) as a biomarker candidate for coronary artery disease. Its diagnostic and prognostic performance was validated in 3 different clinical settings: (1) in a cross-sectional cohort of patients with stable coronary artery disease, ACS, and healthy individuals (n=820), (2) in a nested case-control cohort of patients with ACS with 30-day follow-up (n=200), and (3) in a population-based nested case-control cohort of asymptomatic individuals with 5-year follow-up (n=414). Circulating FABP4 was marginally higher in patients with ST-segment-elevation myocardial infarction (24.9 ng/mL) compared with controls (23.4 ng/mL; P=0.01). However, elevated FABP4 was associated with adverse secondary cerebrovascular or cardiovascular events during 30-day follow-up after index ACS, independent of age, sex, renal function, and body mass index (odds ratio, 1.7; 95% confidence interval, 1.1-2.5; P=0.02). Circulating FABP4 predicted adverse events with prognostic performance similar to that of the GRACE in-hospital risk score or N-terminal pro-brain natriuretic peptide. Finally, no significant difference in baseline FABP4 was found between asymptomatic individuals with and without coronary events during 5-year follow-up. CONCLUSIONS Circulating FABP4 may prove useful as a prognostic biomarker in risk stratification of patients with ACS.
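For readers unfamiliar with how a covariate-adjusted odds ratio like the one above is obtained, here is a hedged Python sketch using logistic regression on simulated data; the variable names, values, and event model are assumptions and do not reproduce the study's cohort or results.

```python
# Illustrative sketch: covariate-adjusted odds ratio via logistic regression.
# All data below are simulated; they do not reproduce the cohort described above.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame(
    {
        "fabp4": rng.lognormal(mean=3.2, sigma=0.3, size=n),  # ng/mL, hypothetical
        "age": rng.normal(63, 10, size=n),
        "sex": rng.integers(0, 2, size=n),
        "egfr": rng.normal(75, 15, size=n),                   # renal function proxy
        "bmi": rng.normal(27, 4, size=n),
    }
)

# Simulated 30-day adverse event, loosely dependent on FABP4
logit = -2 + 0.05 * (df["fabp4"] - df["fabp4"].mean())
df["event"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Adjust for age, sex, renal function, and BMI; exponentiate to get odds ratios
model = smf.logit("event ~ fabp4 + age + sex + egfr + bmi", data=df).fit(disp=0)
odds_ratios = np.exp(model.params)
print(odds_ratios["fabp4"])  # adjusted OR per unit increase in FABP4
```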
Abstract:
It is well accepted that tumorigenesis is a multi-step process involving aberrant functioning of genes regulating cell proliferation, differentiation, apoptosis, genome stability, angiogenesis and motility. To obtain a full understanding of tumorigenesis, it is necessary to collect information on all aspects of cell activity. Recent advances in high-throughput technologies allow biologists to generate massive amounts of data, more than might have been imagined decades ago. These advances have made it possible to launch comprehensive projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), which systematically characterize the molecular fingerprints of cancer cells using gene expression, methylation, copy number, microRNA and SNP microarrays as well as next-generation sequencing assays interrogating somatic mutation, insertion, deletion, translocation and structural rearrangements. Given the massive amount of data, a major challenge is to integrate information from multiple sources and formulate testable hypotheses. This thesis focuses on developing methodologies for integrative analyses of genomic assays profiled on the same set of samples. We have developed several novel methods for integrative biomarker identification and cancer classification. We introduce a regression-based approach to identify biomarkers predictive of therapy response or survival by integrating multiple assays, including gene expression, methylation and copy number data, through penalized regression. To identify key cancer-specific genes accounting for multiple mechanisms of regulation, we have developed the integIRTy software, which provides robust and reliable inferences about gene alteration by automatically adjusting for sample heterogeneity as well as technical artifacts using Item Response Theory. To cope with the increasing need for accurate cancer diagnosis and individualized therapy, we have developed a robust and powerful algorithm called SIBER to systematically identify bimodally expressed genes using next-generation RNA-seq data. We have shown that prediction models built from these bimodal genes have the same accuracy as models built from all genes. Further, prediction models with dichotomized gene expression measurements based on their bimodal shapes still perform well. The effectiveness of outcome prediction using discretized signals paves the way for more accurate and interpretable cancer classification by integrating signals from multiple sources.
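As a sketch of the penalized-regression style of multi-assay integration mentioned above (illustrative only; not the thesis's actual pipeline, nor the integIRTy or SIBER software), the following Python example fits a cross-validated Lasso on simulated expression, methylation, and copy-number features profiled on the same samples.

```python
# Illustrative sketch: penalized regression over concatenated multi-assay features.
# Data are simulated; this is not the pipeline, data, or software from the thesis.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 120

# Hypothetical assays profiled on the same set of samples
expression = rng.normal(size=(n_samples, 50))
methylation = rng.uniform(0, 1, size=(n_samples, 30))
copy_number = rng.integers(0, 5, size=(n_samples, 20)).astype(float)

X = np.hstack([expression, methylation, copy_number])
feature_groups = ["expr"] * 50 + ["meth"] * 30 + ["cnv"] * 20

# Simulated continuous outcome (e.g., therapy response) driven by a few features
beta = np.zeros(X.shape[1])
beta[[3, 57, 85]] = [1.5, -2.0, 1.0]
y = X @ beta + rng.normal(scale=0.5, size=n_samples)

# Standardize, then let cross-validated Lasso select a sparse set of biomarkers
model = LassoCV(cv=5).fit(StandardScaler().fit_transform(X), y)
selected = [(feature_groups[i], i) for i in np.flatnonzero(model.coef_)]
print("selected (assay, column index):", selected)
```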
Abstract:
Digital atlases of animal development provide a quantitative description of morphogenesis, opening the path toward process modeling. Prototypic atlases offer a data integration framework in which to gather information from cohorts of individuals with phenotypic variability. Relevant information for further theoretical reconstruction includes measurements in time and space of cell behaviors and gene expression. The latter, as well as data integration into a prototypic model, relies on image processing strategies. Developing the tools to integrate and analyze biological multidimensional data is highly relevant for assessing chemical toxicity or performing preclinical drug testing. This article surveys some of the most prominent efforts to assemble these prototypes, categorizes them according to salient criteria, and discusses the key questions in the field and the future challenges toward the reconstruction of multiscale dynamics in model organisms.
Abstract:
INFOBIOMED is a European Network of Excellence (NoE) funded by the Information Society Directorate-General of the European Commission (EC). A consortium of European organizations from ten different countries is involved in the network. Four pilots, all related to linking clinical and genomic information, are being carried out. From an informatics perspective, various challenges related to data integration and mining are involved.