884 results for Incomplete relational database
Abstract:
XML has become an important medium for data exchange, and is frequently used as an interface to - i.e. a view of - a relational database. Although much work has been done on querying relational databases through XML views, the problem of updating relational databases through XML views has not received much attention. In this work, we give the first steps towards solving this problem. Using query trees to capture the notions of selection, projection, nesting, grouping, and heterogeneous sets found throughout most XML query languages, we show how XML views expressed using query trees can be mapped to a set of corresponding relational views. Thus, we transform the problem of updating relational databases through XML views into a classical problem of updating relational databases through relational views. We then show how updates on the XML view are mapped to updates on the corresponding relational views. Existing work on updating relational views can then be leveraged to determine whether or not the relational views are updatable with respect to the relational updates, and if so, to translate the updates to the underlying relational database. Since query trees are a formal characterization of view definition queries, they are not well suited for end-users. We then investigate how a subset of XQuery can be used as a top level language, and show how query trees can be used as an intermediate representation of view definitions expressed in this subset.
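The mapping described in this abstract can be illustrated with a minimal sketch. It is not the authors' query-tree formalism: the `vendor`/`product` schema, the view names `v_vendor` and `v_product`, and the helper functions are all hypothetical, and SQLite is used only because it ships with Python. Each nesting level of the XML view is backed by one relational view, and an insertion into the XML view is mapped to an insertion into the corresponding relational view, which an `INSTEAD OF` trigger translates to the base table.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_db():
    """Create a hypothetical base schema plus one relational view per XML nesting level."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE vendor (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE product (
            id INTEGER PRIMARY KEY,
            vendor_id INTEGER NOT NULL REFERENCES vendor(id),
            name TEXT NOT NULL
        );
        INSERT INTO vendor VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO product VALUES (10, 1, 'anvil');

        -- Outer level of the XML view: one <vendor> element per row.
        CREATE VIEW v_vendor AS SELECT id, name FROM vendor;
        -- Nested level: <product> elements grouped under their vendor.
        CREATE VIEW v_product AS
            SELECT p.id, p.vendor_id, p.name
            FROM product p JOIN vendor v ON v.id = p.vendor_id;

        -- SQLite views are read-only, so the translation of a view insert
        -- to the base table is spelled out in an INSTEAD OF trigger.
        CREATE TRIGGER v_product_insert INSTEAD OF INSERT ON v_product
        BEGIN
            INSERT INTO product (id, vendor_id, name)
            VALUES (NEW.id, NEW.vendor_id, NEW.name);
        END;
    """)
    return conn


def render_xml_view(conn):
    """Materialize the nested XML view from the two relational views."""
    root = ET.Element("vendors")
    for vid, vname in conn.execute("SELECT id, name FROM v_vendor ORDER BY id"):
        vendor = ET.SubElement(root, "vendor", id=str(vid), name=vname)
        rows = conn.execute(
            "SELECT id, name FROM v_product WHERE vendor_id = ? ORDER BY id",
            (vid,),
        )
        for pid, pname in rows:
            ET.SubElement(vendor, "product", id=str(pid), name=pname)
    return root


def insert_product(conn, vendor_id, product_xml):
    """Map 'insert this <product> under vendor N' to an insert on v_product."""
    elem = ET.fromstring(product_xml)
    conn.execute(
        "INSERT INTO v_product (id, vendor_id, name) VALUES (?, ?, ?)",
        (int(elem.get("id")), vendor_id, elem.get("name")),
    )


if __name__ == "__main__":
    db = build_db()
    insert_product(db, 2, '<product id="11" name="widget"/>')
    print(ET.tostring(render_xml_view(db), encoding="unicode"))
```

In this sketch the question of whether a view is updatable is settled by hand, by writing the trigger. The abstract instead defers that decision to existing work on relational view updates.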
Abstract:
Genome sequencing efforts are providing us with complete genetic blueprints for hundreds of organisms. We are now faced with assigning, understanding, and modifying the functions of proteins encoded by these genomes. DBMODELING is a relational database of annotated comparative protein structure models and their metabolic pathway characterization, when identified. This procedure was applied to complete genomes such as Mycobacterium tuberculosis and Xylella fastidiosa. The main interest in the study of metabolic pathways is that some of these pathways are not present in humans, which makes them selective targets for drug design, decreasing the impact of drugs in humans. In the database, there are currently 1116 proteins from two genomes. It can be accessed by any researcher at http://www.biocristalografia.df.ibilce.unesp.br/tools/. This project confirms that homology modeling is a useful tool in structural bioinformatics and that it can be very valuable in annotating genome sequence information, contributing to structural and functional genomics, and analyzing protein-ligand docking.
Abstract:
Cancer is the second main cause of death in Brazil, and according to statistics disclosed by INCA (the National Cancer Institute), 466,730 new cases of the disease are forecast for 2008. The storage and analysis of tumour tissues of various types, together with patients' clinical data, genetic profiles, disease characteristics and epidemiological data, may enable more precise diagnoses and more effective treatments, with a higher chance of cure. In this paper we present a Web system with a client-server architecture, which manages a relational database containing all information relating to the tumour tissues and their location in freezers, patients, medical forms, physicians, users, and others. We also discuss the software engineering used to develop the system.
Abstract:
Background: The functional and structural characterisation of enzymes that belong to microbial metabolic pathways is very important for structure-based drug design. The main interest in studying shikimate pathway enzymes involves the fact that they are essential for bacteria but do not occur in humans, making them selective targets for design of drugs that do not directly impact humans. Description: The ShiKimate Pathway DataBase (SKPDB) is a relational database applied to the study of shikimate pathway enzymes in microorganisms and plants. The current database is updated regularly with the addition of new data; there are currently 8902 enzymes of the shikimate pathway from different sources. The database contains extensive information on each enzyme, including detailed descriptions about sequence, references, and structural and functional studies. All files (primary sequence, atomic coordinates and quality scores) are available for downloading. The modeled structures can be viewed using the Jmol program. Conclusions: The SKPDB provides a large number of structural models to be used in docking simulations, virtual screening initiatives and drug design. It is freely accessible at http://lsbzix.rc.unesp.br/skpdb/. © 2010 Arcuri et al; licensee BioMed Central Ltd.
Abstract:
The multi-relational data mining approach has emerged as an alternative for the analysis of structured data, such as relational databases. Unlike traditional algorithms, multi-relational approaches mine multiple tables directly, avoiding costly join operations. This paper presents a comparative study of the traditional Patricia Mine algorithm and its proposed multi-relational counterpart, MR-Radix, in order to evaluate the performance of the two approaches for mining association rules in relational databases. The study makes two original contributions: the proposal of a multi-relational algorithm, MR-Radix, which is efficient for use in relational databases in terms of both execution time and memory usage; and empirical evidence that the multi-relational approach performs better over several tables because it avoids the costly join operations between them. © 2011 IEEE.
Abstract:
Multi-relational data mining enables pattern mining from multiple tables. Existing multi-relational association rule mining algorithms are not able to process large volumes of data, because the amount of memory required exceeds the amount available. The proposed algorithm, MR-Radix, provides a framework that optimizes memory usage. It also uses the concept of partitioning to handle large volumes of data. The original contribution of this proposal is to enable superior performance compared with other related algorithms and, moreover, to complete the task of mining association rules in large databases, bypassing the problem of limited available memory. One of the tests showed that MR-Radix uses fourteen times less memory than GFP-growth. © 2011 IEEE.
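The partitioning idea mentioned in this abstract can be sketched as follows. This is not MR-Radix, whose internals the abstract does not describe. It is a generic two-pass partition scheme for frequent itemsets over a single list of transactions, and the function names and the `max_size` cap are assumptions made for illustration. It relies on the fact that an itemset that is frequent overall must be frequent in at least one partition, so each partition can be mined on its own and only the resulting candidates need a global count.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def local_frequent(transactions, min_frac, max_size):
    """Itemsets (as sorted tuples) whose support in this partition is >= min_frac."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    threshold = min_frac * len(transactions)
    return {itemset for itemset, c in counts.items() if c >= threshold}


def partitioned_frequent(transactions, min_frac, n_parts, max_size=2):
    """Two-pass partition mining: local candidates first, then one global count."""
    size = max(1, ceil(len(transactions) / n_parts))
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    # Pass 1: mine each partition independently. In a real system only one
    # partition would be held in memory at a time.
    candidates = set()
    for part in parts:
        candidates |= local_frequent(part, min_frac, max_size)

    # Pass 2: count only the candidates over the full data set.
    support = Counter()
    for t in transactions:
        items = set(t)
        for cand in candidates:
            if items.issuperset(cand):
                support[cand] += 1
    threshold = min_frac * len(transactions)
    return {cand: n for cand, n in support.items() if n >= threshold}


if __name__ == "__main__":
    data = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"], ["a"]]
    print(partitioned_frequent(data, min_frac=0.5, n_parts=2))
```

Here the whole list is in memory, so the sketch shows only the logic of partitioning, not the memory savings reported in the abstract.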
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Information overload is a significant problem for modern medicine. Searching MEDLINE for common topics often retrieves more relevant documents than users can review. Therefore, we must identify documents that are not only relevant, but also important. Our system ranks articles using citation counts and the PageRank algorithm, incorporating data from the Science Citation Index. However, citation data is usually incomplete. Therefore, we explore the relationship between the quantity of citation information available to the system and the quality of the result ranking. Specifically, we test the ability of citation count and PageRank to identify "important articles" as defined by experts from large result sets with decreasing citation information. We found that PageRank performs better than simple citation counts, but both algorithms are surprisingly robust to information loss. We conclude that even an incomplete citation database is likely to be effective for importance ranking.
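The comparison described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' system: the function names, the damping factor of 0.85, the fixed iteration count and the random edge-dropping scheme are all assumptions. Citations are `(citing, cited)` pairs, and `drop_citations` simulates an incomplete citation database by keeping only a fraction of them.

```python
import random


def citation_counts(citations, papers):
    """Number of incoming citations per paper."""
    counts = {p: 0 for p in papers}
    for _citing, cited in citations:
        counts[cited] += 1
    return counts


def pagerank(citations, papers, damping=0.85, iterations=100):
    """Power-iteration PageRank; papers citing nothing spread their rank uniformly."""
    n = len(papers)
    out_links = {p: [] for p in papers}
    for citing, cited in citations:
        out_links[citing].append(cited)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in papers if not out_links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        for p in papers:
            if out_links[p]:
                share = damping * rank[p] / len(out_links[p])
                for q in out_links[p]:
                    new_rank[q] += share
        rank = new_rank
    return rank


def drop_citations(citations, keep_fraction, seed=0):
    """Simulate incomplete citation data by keeping a random subset of citations."""
    rng = random.Random(seed)
    return rng.sample(citations, round(keep_fraction * len(citations)))


def top_k(scores, k):
    """The k highest-scoring papers, ties broken by paper identifier."""
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


if __name__ == "__main__":
    papers = ["A", "B", "C", "D"]
    citations = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B")]
    for frac in (1.0, 0.5):
        kept = drop_citations(citations, frac)
        print(frac, top_k(citation_counts(kept, papers), 2),
              top_k(pagerank(kept, papers), 2))
```

Comparing the `top_k` lists at decreasing `keep_fraction` values against an expert-chosen set of important articles is one way to measure how robust each ranking is to information loss.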
Abstract:
The HIV Reverse Transcriptase and Protease Sequence Database is an on-line relational database that catalogs evolutionary and drug-related sequence variation in the human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease enzymes, the molecular targets of anti-HIV therapy (http://hivdb.stanford.edu). The database contains a compilation of nearly all published HIV RT and protease sequences, including submissions from International Collaboration databases and sequences published in journal articles. Sequences are linked to data about the source of the sequence sample and the antiretroviral drug treatment history of the individual from whom the isolate was obtained. During the past year 3500 sequences have been added and the data model has been expanded to include drug susceptibility data on sequenced isolates. Database content has also been integrated with didactic text and the output of two sequence analysis programs.
Abstract:
The Conserved Key Amino Acid Positions DataBase (CKAAPs DB) provides access to an analysis of structurally similar proteins with dissimilar sequences where key residues within a common fold are identified. The derivation and significance of CKAAPs starting from pairwise structure alignments is described fully in Reddy et al. [Reddy,B.V.B., Li,W.W., Shindyalov,I.N. and Bourne,P.E. (2000) Proteins, in press]. The CKAAPs identified from this theoretical analysis are provided to experimentalists and theoreticians for potential use in protein engineering and modeling. It has been suggested that CKAAPs may be crucial features for protein folding, structural stability and function. Over 170 substructures, as defined by the Combinatorial Extension (CE) database, which are found in approximately 3000 representative polypeptide chains have been analyzed and are available in the CKAAPs DB. CKAAPs DB also provides CKAAPs of the representative set of proteins derived from the CE and FSSP databases. Thus the database contains over 5000 representative polypeptide chains, covering all known structures in the PDB. A web interface to a relational database permits fast retrieval of structure-sequence alignments, CKAAPs and associated statistics. Users may query by PDB ID, protein name, function and Enzyme Classification number. Users may also submit protein alignments of their own to obtain CKAAPs. An interface to display CKAAPs on each structure from a web browser is also being implemented. CKAAPs DB is maintained by the San Diego Supercomputer Center and accessible at the URL http://ckaaps.sdsc.edu.
Abstract:
GOBASE (http://megasun.bch.umontreal.ca/gobase/) is a network-accessible biological database, which is unique in bringing together diverse biological data on organelles with taxonomically broad coverage, and in furnishing data that have been exhaustively verified and completed by experts. So far, we have focused on mitochondrial data: GOBASE contains all published nucleotide and protein sequences encoded by mitochondrial genomes, selected RNA secondary structures of mitochondria-encoded molecules, genetic maps of completely sequenced genomes, taxonomic information for all species whose sequences are present in the database and organismal descriptions of key protistan eukaryotes. All of these data have been integrated and organized in a formal database structure to allow sophisticated biological queries using terms that are inherent in biological concepts. Most importantly, data have been validated, completed, corrected and standardized, a prerequisite of meaningful analysis. In addition, where critical data are lacking, such as genetic maps and RNA secondary structures, they are generated by the GOBASE team and collaborators, and added to the database. The database is implemented in a relational database management system, but features an object-oriented view of the biological data through a Web/Genera-generated World Wide Web interface. Finally, we have developed software for database curation (i.e. data updates, validation and correction), which will be described in some detail in this paper.
Abstract:
The Oral Cancer Gene Database (OrCGDB; http://www.tumor-gene.org/Oral/oral.html) was developed to provide the biomedical community with easy access to the latest information on the genes involved in oral cancer. The information is stored in a relational database and accessed through a WWW interface. The OrCGDB is organized by gene name, which is linked to information describing properties of the gene. This information is stored as a collection of findings (‘facts’) that are entered by the database curator in a semi-structured format from information in primary publications using a WWW interface. These facts include causes of oncogenic activation, chromosomal localization of the gene, mutations associated with the gene, the biochemical identity and activity of the gene product, synonyms for the gene name and a variety of clinical information. Each fact is associated with a MEDLINE citation. The user can search the OrCGDB by gene name or by entering a textword. The OrCGDB is part of a larger WWW-based tumor gene database and represents a new approach to catalog and display the research literature.
Abstract:
The Plasmodium falciparum Genome Database (http://PlasmoDB.org) integrates sequence information, automated analyses and annotation data emerging from the P.falciparum genome sequencing consortium. To date, raw sequence coverage is available for >90% of the genome, and two chromosomes have been finished and annotated. Data in PlasmoDB are organized by chromosome (1–14), and can be accessed using a variety of tools for graphical and text-based browsing or downloaded in various file formats. The GUS (Genomics Unified Schema) implementation of PlasmoDB provides a multi-species genomic relational database, incorporating data from human and mouse, as well as P.falciparum. The relational schema uses a highly structured format to accommodate diverse data sets related to genomic sequence and gene expression. Tools have been designed to facilitate complex biological queries, including many that are specific to Plasmodium parasites and malaria as a disease. Additional projects seek to integrate genomic information with the rich data sets now becoming available for RNA transcription, protein expression, metabolic pathways, genetic and physical mapping, antigenic and population diversity, and phylogenetic relationships with other apicomplexan parasites. The overall goal of PlasmoDB is to facilitate Internet- and CD-ROM-based access to both finished and unfinished sequence information by the global malaria research community.