27 results for literature-data integration
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Abstract:
Background: A current challenge in gene annotation is to define gene function in the context of a network of relationships rather than for single genes in isolation. The inference of gene networks (GNs) has emerged as an approach to better understand the biology of the system and to study how the components of the network interact with each other and keep their functions stable. However, there are generally not enough data to accurately recover GNs from expression levels, which leads to the curse of dimensionality: the number of variables exceeds the number of samples. One way to mitigate this problem is to integrate biological data instead of using only the expression profiles in the inference process. The use of several types of biological information in inference methods has increased significantly, with the aim of better recovering the connections between genes and reducing false positives. What makes this strategy so interesting is the possibility of confirming known connections through the included biological data, and of discovering new relationships between genes from the expression data. Although several works on data integration have improved the performance of network inference methods, the real contribution of each type of biological information to the obtained improvement is not clear. Methods: We propose a methodology to include biological information in an inference algorithm in order to assess the prediction gain from using biological information and expression profiles together. We also evaluated and compared the gain from adding four types of biological information: (a) protein-protein interaction, (b) Rosetta stone fusion proteins, (c) KEGG and (d) KEGG+GO. Results and conclusions: This work presents a first comparison of the gain from using prior biological information in the inference of GNs for a eukaryotic organism (P. falciparum). Our results indicate that information based on direct interaction can produce a larger gain than data about a less specific relationship, such as GO or KEGG. Also, as expected, the results show that the use of biological information is a very important approach for improving the inference. We also compared the gain in the inference of the global network and of the hubs only. The results indicate that the use of biological information can improve the identification of the most connected proteins.
Abstract:
Background: The use of the knowledge produced by the sciences to promote human health is the main goal of translational medicine. To make it feasible, we need computational methods to handle the large amount of information that arises from bench to bedside and to deal with its heterogeneity. A computational challenge that must be faced is the integration of clinical, socio-demographic and biological data. In this effort, ontologies play an essential role as a powerful artifact for knowledge representation. Chado is a modular ontology-oriented database model that gained popularity due to its robustness and flexibility as a generic platform to store biological data; however, it lacks support for representing clinical and socio-demographic information. Results: We have implemented an extension of Chado, the Clinical Module, to allow the representation of this kind of information. Our approach consists of a framework for data integration through the use of a common reference ontology. The design of this framework has four levels: data level, to store the data; semantic level, to integrate and standardize the data by the use of ontologies; application level, to manage clinical databases, ontologies and the data integration process; and web interface level, to allow interaction between the user and the system. The Clinical Module was built based on the Entity-Attribute-Value (EAV) model. We also proposed a methodology to migrate data from legacy clinical databases to the integrative framework. A Chado instance was initialized using a relational database management system. The Clinical Module was implemented and the framework was loaded using data from a factual clinical research database. Clinical and demographic data as well as biomaterial data were obtained from patients with head and neck tumors. We implemented the IPTrans tool, a complete environment for data migration, which comprises: the construction of a model to describe the legacy clinical data, based on an ontology; the Extraction, Transformation and Load (ETL) process to extract the data from the source clinical database and load it into the Clinical Module of Chado; and the development of a web tool and a Bridge Layer to adapt the web tool to Chado, as well as to other applications. Conclusions: Open-source computational solutions currently available for translational science do not have a model to represent biomolecular information and are not integrated with the existing bioinformatics tools. On the other hand, existing genomic data models do not represent clinical patient data. A framework was developed to support translational research by integrating biomolecular information coming from different "omics" technologies with patients' clinical and socio-demographic data. This framework should present some features: flexibility, compression and robustness. The experiments carried out on a use case demonstrated that the proposed system meets the requirements of flexibility and robustness, leading to the desired integration. The Clinical Module can be accessed at http://dcm.ffclrp.usp.br/caib/pg=iptrans.
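The Entity-Attribute-Value layout mentioned in the abstract above can be illustrated with a minimal sketch. This is not the Chado Clinical Module schema: the table name `clinical_eav`, the helper functions, and the patient attributes are hypothetical and chosen only to show how an EAV table stores one row per (entity, attribute) pair, so that new attributes need no schema change.

```python
import sqlite3


def build_eav_store():
    """Create an in-memory database with one generic EAV table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE clinical_eav ("
        " entity_id TEXT NOT NULL,"   # e.g. a patient identifier
        " attribute TEXT NOT NULL,"   # e.g. a term taken from a reference ontology
        " value TEXT,"                # stored as text; typing is left to the application
        " PRIMARY KEY (entity_id, attribute))"
    )
    return conn


def add_fact(conn, entity_id, attribute, value):
    """Store a single attribute value for an entity as one row."""
    conn.execute(
        "INSERT INTO clinical_eav (entity_id, attribute, value) VALUES (?, ?, ?)",
        (entity_id, attribute, str(value)),
    )


def load_record(conn, entity_id):
    """Pivot all rows of one entity back into an attribute -> value dictionary."""
    rows = conn.execute(
        "SELECT attribute, value FROM clinical_eav WHERE entity_id = ?",
        (entity_id,),
    )
    return dict(rows.fetchall())


# Illustrative, made-up values: two patients with different attribute sets.
conn = build_eav_store()
add_fact(conn, "patient-001", "age_at_diagnosis", 57)
add_fact(conn, "patient-001", "tumor_site", "larynx")
add_fact(conn, "patient-002", "tumor_site", "oral cavity")
print(load_record(conn, "patient-001"))
```

The trade-off of this design is that flexibility comes at the cost of typed columns and simple queries: every read has to pivot rows back into a record, as `load_record` does.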
Abstract:
XML similarity evaluation has become a central issue in the database and information communities, its applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operations costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.
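As a rough illustration of the edit-distance idea that the abstract above builds on, the following sketch computes a unit-cost edit distance between two ordered labeled trees. It is a plain memoized forest recursion, not the framework described in the abstract: the tuple encoding `(label, children)`, the function name, and the unit costs for insert, delete and relabel are assumptions made here, and no structural or semantic sub-tree similarity is taken into account.

```python
from functools import lru_cache


def tree_edit_distance(tree_a, tree_b):
    """Unit-cost edit distance between two ordered labeled trees.

    A tree is a tuple (label, children), where children is a tuple of trees.
    """

    @lru_cache(maxsize=None)
    def forest_dist(f, g):
        # f and g are forests: tuples of trees, compared via their rightmost roots.
        if not f and not g:
            return 0
        if not f:
            _, kids = g[-1]
            # Insert the rightmost root of g; its children take its place.
            return forest_dist(f, g[:-1] + kids) + 1
        if not g:
            _, kids = f[-1]
            # Delete the rightmost root of f; its children take its place.
            return forest_dist(f[:-1] + kids, g) + 1
        (label_f, kids_f), (label_g, kids_g) = f[-1], g[-1]
        return min(
            forest_dist(f[:-1] + kids_f, g) + 1,   # delete rightmost root of f
            forest_dist(f, g[:-1] + kids_g) + 1,   # insert rightmost root of g
            forest_dist(kids_f, kids_g)            # match the two roots:
            + forest_dist(f[:-1], g[:-1])          # children and remaining forests
            + (label_f != label_g),                # relabel cost if labels differ
        )

    return forest_dist((tree_a,), (tree_b,))


# Two small XML-like trees that differ in one element name.
doc_a = ("article", (("title", ()), ("author", ())))
doc_b = ("article", (("title", ()), ("editor", ())))
print(tree_edit_distance(doc_a, doc_b))  # 1
```

A semantic variant would replace the `label_f != label_g` term with a cost derived from label similarity, which is the kind of adjustment the abstract refers to when it mentions tree-based edit operation costs.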
Abstract:
The University of São Paulo (USP) has seen growth in its content in electronic and digital formats, distributed by different suppliers and hosted remotely or in the cloud. It faces increasing difficulty in giving users access to this digital collection while it coexists with the traditional world of physical collections. A possible solution was identified in the new generation of systems called Web Scale Discovery, which allow better management, data integration and search agility. To determine whether and how such a system would meet USP's demands and expectations and, if so, what criteria should be used to analyze such a tool, an analytical study with an essentially documentary basis was structured, drawing on a literature review and on data available on official websites and from libraries that use this kind of resource. The conceptual basis of the study was defined after identifying software assessment methods already available, generating a standard with 40 analysis criteria, ranging from details of the single access interface to information content to web 2.0 characteristics, intuitive interface and faceted navigation, among others. The details of the studies conducted on four of the major systems currently available in this software category are presented, providing support for decision-making by other libraries interested in such systems.
Abstract:
Sediment quality from Paranagua Estuarine System (PES), a highly important port and ecological zone, was evaluated by assessing three lines of evidence: (1) sediment physical-chemical characteristics; (2) sediment toxicity (elutriates, sediment-water interface, and whole sediment); and (3) benthic community structure. Results revealed a gradient of increasing degradation of sediments (i.e. higher concentrations of trace metals, higher toxicity, and impoverishment of benthic community structure) towards inner PES. Data integration by principal component analysis (PCA) showed positive correlation between some contaminants (mainly As, Cr, Ni, and Pb) and toxicity in samples collected from stations located in upper estuary and one station placed away from contamination sources. Benthic community structure seems to be affected by both pollution and natural fine characteristics of the sediments, which reinforces the importance of a weight-of-evidence approach to evaluate sediments of PES. (C) 2008 Elsevier Inc. All rights reserved.
Abstract:
Life-history information constitutes the raw data for building population models used in species conservation. We provide life-history data for the endangered Santa Catalina Island Rattlesnake, Crotalus catalinensis. We use data from 277 observations of C. catalinensis made between 2002 and 2011 on the island. Mean snout-vent length (SVL) of adult C. catalinensis was 643 mm for males and 631 mm for females; the difference was not significant. The degree of sexual size dimorphism (SSD; using SVL) was -0.02. However, sexes were dimorphic in total length (SVL + tail length), relative tail length, and stoutness. Juvenile recruitment occurs during late summer. In their first year of life, juveniles seem to grow at a rate of about 1.7 cm/mo. Females seem to become mature around 570 mm SVL, probably in the year when they become 2 y old. Scattered literature data corroborate the time of juvenile recruitment described herein. Growth in C. catalinensis seems to be slower than that of C. ruber, its sister taxon, but similar to that of other rattlesnakes.
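The abstract above does not say how the SSD value was computed. As a worked check, the sketch below assumes a commonly used ratio index: the size of the larger sex divided by the size of the smaller sex, minus one, taken as negative when males are the larger sex. The function name and this choice of formula are assumptions; under them, the reported mean SVLs reproduce the reported value of -0.02.

```python
def ssd_index(male_size, female_size):
    """Ratio-based sexual size dimorphism index (assumed formula, see lead-in).

    Returns (larger / smaller) - 1, positive when females are larger
    and negative when males are larger.
    """
    if male_size <= 0 or female_size <= 0:
        raise ValueError("sizes must be positive")
    magnitude = max(male_size, female_size) / min(male_size, female_size) - 1.0
    return magnitude if female_size >= male_size else -magnitude


# Mean adult snout-vent lengths reported in the abstract, in millimetres.
male_svl_mm = 643.0
female_svl_mm = 631.0
print(round(ssd_index(male_svl_mm, female_svl_mm), 2))  # -0.02
```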
Abstract:
The dichloromethane extract from taproots of Hortia oreadica afforded six limonoids, namely 9,11-dehydro-12α-acetoxyhortiolide A, hortiolide C, 11α-acetoxy-15-deoxy-6-hydroxyhortiolide C, hortiolide D, hortiolide E and 12β-hydroxyhortiolide E, in addition to the known limonoid guyanin. The dichloromethane extract from stems of H. oreadica also afforded two limonoids, 9,11-dehydro-12α-hydroxyhortiolide A and 6-hydroxyhortiolide C. As a result of this study and literature data, Hortia has been shown to produce highly specialized limonoids that are similar to those from Flindersia (Flindersioideae). The taxonomy of Hortia has been debatable, with most authors placing it in the Toddalioideae. Considering the complexity of the isolated limonoids, Hortia does not show any close affinity to the genera of Toddalioideae. That is, the limonoids appear to be of little value in resolving the taxonomic situation of Hortia. (C) 2012 Elsevier Ltd. All rights reserved.
Abstract:
The present study reports on a survey of the gelatinous zooplankton fauna (Cnidaria, Ctenophora and Thaliacea) from the proposed Baía da Babitonga marine protected area (southern Brazil; ~26°S), based on collections from multiple sites over different seasons and on published literature. In order to sample both small and large gelatinous animals, plankton hauls (n = 255) and fishing trawls (n = 126) were employed. More than 20,000 organisms were studied, which, including literature data, totaled 48 species: one cubomedusa, three scyphomedusae, four siphonophores, 36 hydromedusae, two ctenophores, and two thaliaceans. Among these, the hydromedusae Cnidostoma fallax Vanhoffen and Helgicirrha sp. are recorded for the first time from the southwestern Atlantic coast, and Paulinum sp. and Protiara sp. are recorded for the first time from the South Atlantic. A description of young stages of the hydromedusa Gossea brachymera Bigelow is presented and shows that Octobulbacea montehermosensis Zamponi is a junior synonym of the former. Although comprehensive local assessment of diverse taxonomic groups is still lacking, the high diversity observed herein underscores the importance of Baía da Babitonga as a high priority site for conservation of regional marine biodiversity.
Abstract:
Some aspects of the biology, reproduction and morphology of Gundlachia ticaga (MARCUS & MARCUS 1962) were studied and compared with literature data. A sample of ~100 specimens, collected in a dam in Ribeirão Preto, was maintained in Petri dishes in the laboratory for no longer than a month. The oviposition, number of capsules, and incubation time of eggs were recorded for each individual. A morphological study was carried out on both live and fixed specimens. Although some morphological discrepancies were identified, the majority of the data confirm what was previously described for the species. Regarding the biology, however, the results were somewhat different from those previously reported. The incubation time of the eggs varied from 10 to 13 days, and the number of embryos inside the ovigerous capsules varied from 1 to 4. In addition, each individual laid 9 eggs during the first month of the reproductive period. The formation of septa occurred in some animals, and intermediary phases are figured.
Abstract:
One new meroterpene, 4-[(2'E)-3',7'-dimethylocta-2',6'-dien-1'-yl]-5-methyl-2-(3''-methylbut-2''-enyl)-benzene-1,3-diol, together with eight known compounds, was isolated from the MeOH extract from the leaves of Peperomia oreophila Hesch. The prenylated phenol was also isolated as the main compound from the CH2Cl2:MeOH extract from leaves of Peperomia arifolia Miq. The structures of the substances were established on the basis of spectral evidence and supported by literature data.
Abstract:
The time is ripe for a comprehensive mission to explore and document Earth's species. This calls for a campaign to educate and inspire the next generation of professional and citizen species explorers, investments in cyber-infrastructure and collections to meet the unique needs of the producers and consumers of taxonomic information, and the formation and coordination of a multi-institutional, international, transdisciplinary community of researchers, scholars and engineers with the shared objective of creating a comprehensive inventory of species and detailed map of the biosphere. We conclude that an ambitious goal to describe 10 million species in less than 50 years is attainable based on the strength of 250 years of progress, worldwide collections, existing experts, technological innovation and collaborative teamwork. Existing digitization projects are overcoming obstacles of the past, facilitating collaboration and mobilizing literature, data, images and specimens through cyber technologies. Charting the biosphere is enormously complex, yet necessary expertise can be found through partnerships with engineers, information scientists, sociologists, ecologists, climate scientists, conservation biologists, industrial project managers and taxon specialists, from agrostologists to zoophytologists. Benefits to society of the proposed mission would be profound, immediate and enduring, from detection of early responses of flora and fauna to climate change to opening access to evolutionary designs for solutions to countless practical problems. The impacts on the biodiversity, environmental and evolutionary sciences would be transformative, from ecosystem models calibrated in detail to comprehensive understanding of the origin and evolution of life over its 3.8 billion year history. 
The resultant cyber-enabled taxonomy, or cybertaxonomy, would open access to biodiversity data to developing nations, assure access to reliable data about species, and change how scientists and citizens alike access, use and think about biological diversity information.
Abstract:
In Brazil, reported cases of degenerative diseases of articular cartilage increase by 20% per year, meaning that 200,000 Brazilians develop degenerative joint diseases every year, which have a negative impact on bone mass. This study shows evidence that the production of sex steroid hormones (estrogens, progestogens, and androgens) influences cartilage quality as well as bone mass. Therefore, this review aimed to analyze literature data on the molecular and genic action of sex steroids on hyaline cartilage and bone physiology, as well as the interference of osteoarthritis with the quality of these structures.
Abstract:
OBJECTIVE: To analyze the association between indicators of exposure to vehicular traffic pollution and mortality from circulatory system diseases in adult men. METHODS: Information on roads and traffic volume in 2007, provided by the local traffic engineering company, was analyzed. Data on mortality from circulatory system diseases in 2005 among men aged ≥ 40 years were obtained from the mortality registry of the Programa de Aprimoramento de Informações de Mortalidade do Município de São Paulo, SP. Socioeconomic data from the 2000 Census and information on the location of health services were also collected. Exposure was assessed by road density and traffic volume for each administrative district. Regression (α = 5%) was calculated between these exposure indicators and the standardized mortality rates, adjusting the models for socioeconomic variables, number of health services in the districts, and spatial autocorrelation. RESULTS: The correlation between road density and traffic volume was modest (r² = 0.28). The central districts had the highest road density values. The spatial regression model for road density indicated an association with mortality from circulatory system diseases (p = 0.017). No association was observed in the traffic volume model. In both models, road density and traffic volume (light/heavy vehicles), the socioeconomic variable was statistically significant. CONCLUSIONS: The association between mortality from circulatory system diseases and road density is consistent with the literature and encourages further epidemiological studies at the individual level and with more accurate methods of exposure assessment.
Abstract:
With the introduction of fluoride as the main anticaries agent used in preventive dentistry, and perhaps an increase in fluoride in our food chain, dental fluorosis has become an increasing worldwide problem. Visible signs of fluorosis first appear on the enamel surface as opacities, implying some porosity in the tissue. The mechanisms that lead to the formation of fluorotic enamel are unknown, but should involve modifications in the basic physicochemical reactions of demineralization and remineralization of tooth enamel, which are the same reactions that form the enamel's hydroxyapatite (HAp) in the maturation phase. An increase in the amount of fluoride inside the apatite will result in a gradual increase of the lattice parameters. The aim of this work is to characterize healthy and fluorotic enamel in human teeth using synchrotron X-ray diffraction. All the scattering profile measurements were carried out at the X-ray diffraction beamline (XRD1) at the Brazilian Synchrotron Light Laboratory (LNLS), Campinas, Brazil. X-ray diffraction experiments were performed on both powder samples and polished surfaces. The powder samples were analyzed to characterize a typical healthy enamel pattern. The polished surfaces were analyzed in specific areas that had been identified as fluorotic. X-ray diffraction data were obtained for all samples, and these data were compared with the control samples and also with literature data. (c) 2012 Elsevier Ltd. All rights reserved.
Abstract:
To estimate causal relationships, time series econometricians must be aware of spurious correlation, a problem first mentioned by Yule (1926). To deal with this problem, one can work either with differenced series or with multivariate models: VAR (VEC or VECM) models. These models usually include at least one cointegration relation. Although the Bayesian literature on VAR/VEC is quite advanced, Bauwens et al. (1999) highlighted that "the topic of selecting the cointegrating rank has not yet given very useful and convincing results". The present article applies the Full Bayesian Significance Test (FBST), especially designed to deal with sharp hypotheses, to cointegration rank selection tests in VECM time series models. It shows the FBST implementation using both simulated data sets and data sets available in the literature. As an illustration, standard noninformative priors are used.