980 results for Data-bank
Abstract:
The principal topic of this work is the application of data mining techniques, in particular machine learning, to the discovery of knowledge in a protein database. The first chapter presents general background: section 1.1 gives an overview of the methodology of a data mining project and its main algorithms, and section 1.2 introduces proteins and their supporting file formats. The chapter concludes with section 1.3, which defines the main problem we intend to address in this work: determining whether an amino acid is exposed or buried in a protein, in a discrete way (i.e., not continuous), for five exposure levels: 2%, 10%, 20%, 25% and 30%. The second chapter, following the CRISP-DM methodology closely, presents the whole process of constructing the database that supported this work. Namely, it describes the process of loading data from the Protein Data Bank, DSSP and SCOP. An initial data exploration is then performed, and a simple baseline prediction model of the relative solvent accessibility of an amino acid is introduced. The Data Mining Table Creator, a program developed to produce the data mining tables required for this problem, is also introduced. In the third chapter the results obtained are analyzed with statistical significance tests. First, the classifiers used (Neural Networks, C5.0, CART and CHAID) are compared, and it is concluded that C5.0 is the most suitable for the problem at hand. The influence of parameters such as the amino acid information level, the amino acid window size and the SCOP class type on the accuracy of the predictive models is also compared. The fourth chapter starts with a brief review of the literature on amino acid relative solvent accessibility. We then give an overview of the main results achieved and finally discuss possible future work. The fifth and last chapter consists of appendices. Appendix A has the schema of the database that supported this thesis. Appendix B has a set of tables with additional information. Appendix C describes the software provided on the DVD accompanying this thesis, which allows the reconstruction of the present work.
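As a rough illustration of the discrete exposed/buried labelling described in this abstract, the following Python sketch turns a relative solvent accessibility (RSA) value into one binary label per exposure level. The function name, the use of percentages, and the convention that a residue counts as exposed when its RSA is at or above the threshold are assumptions made for illustration, not details taken from the thesis.

```python
# Exposure levels (in percent) listed in the abstract.
EXPOSURE_LEVELS = (2, 10, 20, 25, 30)


def label_exposure(rsa_percent, levels=EXPOSURE_LEVELS):
    """Return {level: "exposed" | "buried"} for one residue.

    Assumption: a residue is "exposed" at a given level when its relative
    solvent accessibility (in percent) is >= that level; the thesis may
    use a different convention.
    """
    return {
        level: "exposed" if rsa_percent >= level else "buried"
        for level in levels
    }


if __name__ == "__main__":
    # Hypothetical RSA value, chosen only to show the output shape.
    print(label_exposure(22.5))
```

Producing one label per level yields five separate binary classification targets, which matches the abstract's framing of the problem as discrete rather than continuous.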
Abstract:
Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
Includes bibliography
Abstract:
The Protein Data Bank (PDB; http://www.rcsb.org/pdb/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the data uniformity project that is underway to address the inconsistency in PDB data.
Abstract:
Currently there is an overwhelming number of scientific publications in the Life Sciences, especially in Genetics and Biotechnology. This huge amount of information is structured in corporate Data Warehouses (DW) or in Biological Databases (e.g. UniProt, RCSB Protein Data Bank, CEREALAB or GenBank), whose main drawback is the cost of updating them, which makes them quickly obsolete. However, these Databases are the main tool for enterprises when they want to update their internal information, for example when a plant breeding enterprise needs to enrich its genetic information (internal structured Database) with recently discovered genes related to specific phenotypic traits (external unstructured data) in order to choose the desired parentals for breeding programs. In this paper, we propose to complement the internal information with external data from the Web using Question Answering (QA) techniques. We go a step further by providing a complete framework for integrating unstructured and structured information by combining traditional Database and DW architectures with QA systems. The great advantage of our framework is that decision makers can instantaneously compare internal data with external data from competitors, thereby enabling quick strategic decisions based on richer data.
Abstract:
Universidade Estadual de Campinas. Faculdade de Educação Física
Abstract:
In the present research, we studied wines from three different south Brazilian winemaking regions with the purpose of differentiating them by the geographical origin of the grapes. Brazil's wide territory and climate diversity allow grape cultivation and winemaking in many regions with different and unique characteristics. Wine grape cultivation is concentrated in the South Region, mainly in the Serra Gaúcha, the mountain area of the state of Rio Grande do Sul, which is responsible for 90% of domestic wine production. However, in recent years, two new production regions have developed: the Campanha, the plains to the south, and the Serra do Sudeste, the hills to the southeast of the state. Analyses of the isotopic ratios ¹⁸O/¹⁶O of wine water and ¹³C/¹²C of ethanol, and of minerals, were used to characterize wines from the different regions. The δ¹⁸O of wine water and the minerals Mg and Rb were the most effective at differentiating the regions. By using isotope and mineral analysis together with discriminant analysis, it was possible to classify the wines from south Brazil.
Abstract:
Biomass burning is an important source of atmospheric particulate matter (PM) in Brazil; the burning of forests in the northwest and of sugar cane plantations in the southeast are important examples. The objective of this work is to measure the PM emission profile of sugar cane burning and of other characteristic vegetation burning in the region of São Carlos, SP, Brazil. Samples of PM10 and PM2.5 were collected under different conditions, including small laboratory-controlled burns and real ones. The samples were analysed by X-ray fluorescence (XRF), and 14 chemical elements were quantified. Student's t-tests were performed to compare the profiles obtained, using as a reference a vegetative burning profile taken from the USEPA data bank SPECIATE. All measured profiles presented significant amounts of Cl and K, which are confirmed as tracers of sugar cane foliage burning.
Abstract:
Allergies represent a significant medical and industrial problem. Molecular and clinical data on allergens are growing exponentially and in this article we have reviewed nine specialized allergen databases and identified data sources related to protein allergens contained in general purpose molecular databases. An analysis of allergens contained in public databases indicates a high level of redundancy of entries and a relatively low coverage of allergens by individual databases. From this analysis we identify current database needs for allergy research and, in particular, highlight the need for a centralized reference allergen database.
Abstract:
CysView is a web-based application tool that identifies and classifies proteins according to their disulfide connectivity patterns. It accepts a dataset of annotated protein sequences in various formats and returns a graphical representation of cysteine pairing patterns. CysView displays cysteine patterns for those records in the data with disulfide annotations. It allows the viewing of records grouped by connectivity patterns. CysView's utility as an analysis tool was demonstrated by the rapid and correct classification of scorpion toxin entries from GenPept on the basis of their disulfide pairing patterns. It has proved useful for rapid detection of irrelevant and partial records, or those with incomplete annotations. CysView can be used to support distant homology between proteins. CysView is publicly available at http://research.i2r.a-star.edu.sg/CysView/.
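To make the idea of grouping records by disulfide connectivity pattern concrete, here is a minimal Python sketch. It is not CysView's implementation: the input format (a list of bonded cysteine position pairs per record), the function names, and the "1-6 2-4 3-5" style of pattern notation are assumptions chosen for illustration.

```python
from collections import defaultdict


def connectivity_pattern(bonds):
    """Map bonded cysteine positions to a pattern such as '1-6 2-4 3-5'.

    `bonds` is a list of (position_a, position_b) pairs. Each cysteine is
    replaced by its 1-based rank among all bonded cysteines in the
    sequence, so records of different lengths become comparable.
    """
    positions = sorted({pos for pair in bonds for pos in pair})
    rank = {pos: i + 1 for i, pos in enumerate(positions)}
    ranked = sorted(tuple(sorted((rank[a], rank[b]))) for a, b in bonds)
    return " ".join(f"{a}-{b}" for a, b in ranked)


def group_by_pattern(records):
    """Group record identifiers by their connectivity pattern."""
    groups = defaultdict(list)
    for record_id, bonds in records.items():
        groups[connectivity_pattern(bonds)].append(record_id)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical records; the positions are made up for illustration.
    records = {
        "toxin_a": [(12, 63), (16, 36), (22, 46)],
        "toxin_b": [(3, 40), (7, 25), (11, 30)],
        "toxin_c": [(5, 20), (9, 30)],
    }
    for pattern, ids in group_by_pattern(records).items():
        print(pattern, ids)
```

Ranking the cysteines rather than using raw sequence positions is the design choice that lets two records with different lengths fall into the same group when their pairing topology is identical.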
Abstract:
A polyclonal antibody (C4), raised against the head domain of chicken myosin Va, reacted strongly towards a 65 kDa polypeptide (p65) on Western blots of extracts from squid optic lobes but did not recognize the heavy chain of squid myosin V. This peptide was not recognized by other myosin Va antibodies, nor by an antibody specific for squid myosin V. In an attempt to identify it, p65 was purified from optic lobes of Loligo plei by cationic exchange and reverse phase chromatography. Several peptide sequences were obtained by mass spectroscopy from p65 cut from sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS-PAGE) gels. BLAST analysis and partial matching with expressed sequence tags (ESTs) from a Loligo pealei data bank indicated that p65 contains consensus signatures for the heterogeneous nuclear ribonucleoprotein (hnRNP) A/B family of RNA-binding proteins. Centrifugation of post-mitochondrial extracts from optic lobes on sucrose gradients after treatment with RNase gave biochemical evidence that p65 associates with cytoplasmic RNP complexes in an RNA-dependent manner. Immunohistochemistry and immunofluorescence studies using the C4 antibody showed partial co-labeling with an antibody against squid synaptotagmin in bands within the outer plexiform layer of the optic lobes and at the presynaptic zone of the stellate ganglion. Also, punctate labeling by the C4 antibody was observed within isolated optic lobe synaptosomes. The data indicate that p65 is a novel RNA-binding protein localized to the presynaptic terminal within squid neurons and may have a role in synaptic localization of RNA and its translation or processing. © 2010 IBRO. Published by Elsevier Ltd. All rights reserved.
Abstract:
The Fundação Getulio Vargas, São Paulo, Public Management and Citizenship Program was set up in 1996 with Ford Foundation support to identify and disseminate Brazilian subnational government initiatives in service provision that have a direct effect on citizenship. Already, the program has 2,500 different experiences in its data bank, the results of four annual cycles. The article draws some initial conclusions about the possibilities of a rights-based approach to public management and about the engagement of other agencies and civil society organizations.