942 results for Knowledge Discovery Database
Abstract:
To date, more than 16 million citations of published articles in the biomedical domain are available in the MEDLINE database. These articles describe the new discoveries that have accompanied the tremendous development of biomedicine over the last decade. It is crucial for biomedical researchers to retrieve and mine specific knowledge from this huge quantity of published articles with high efficiency. Researchers have therefore been developing text mining tools to find knowledge, such as protein-protein interactions, that is most relevant and useful for specific analysis tasks. This chapter provides a road map to the various information extraction methods in the biomedical domain, such as protein name recognition and discovery of protein-protein interactions. Disciplines involved in analyzing and processing unstructured text are summarized, current work in biomedical information extraction is categorized, and challenges in the field are presented together with possible solutions.
Abstract:
Document detailing plans to develop the Medical Library's knowledge base collection. Provides an overview of databases and knowledge bases, as well as recommended databases.
Abstract:
RNA is an underutilized target for drug discovery. Once thought to be a passive carrier of genetic information, RNA is now known to play a critical role in essentially all aspects of biology, including signaling, gene regulation, catalysis, and retroviral infection. It is now well established that RNA does not exist as a single static structure but instead populates an ensemble of energetic minima along a free-energy landscape. Knowledge of this structural landscape has become an important goal for understanding RNA's diverse biological functions. NMR spectroscopy has emerged as an important player in the characterization of RNA structural ensembles, with solution-state techniques accounting for almost half of the deposited RNA structures in the PDB; yet the rate of RNA structure publication has been stagnant over the past decade. Several bottlenecks limit the pace of RNA structure determination by NMR: the high cost of isotopic labeling, tedious and ambiguous resonance assignment methods, and a limited database of RNA-optimized pulse programs. We have addressed some of these challenges to NMR characterization of RNA structure, with applications to various RNA drug targets. These approaches will increasingly become integral to designing new therapeutics targeting RNA.
Abstract:
Macro- and microarrays are well-established technologies to determine gene functions through repeated measurements of transcript abundance. We constructed a chicken skeletal muscle-associated array based on a muscle-specific EST database, which was used to generate a tissue expression dataset of approximately 4500 chicken genes across 5 adult tissues (skeletal muscle, heart, liver, brain, and skin). Only a small number of ESTs were sufficiently well characterized by BLAST searches to determine their probable cellular functions. Evidence of a particular tissue-characteristic expression can be considered an indication that the transcript is likely to be functionally significant. The skeletal muscle macroarray platform was first used to search for evidence of tissue-specific expression, focusing on the biological function of genes/transcripts, since gene expression profiles generated across tissues were found to be reliable and consistent. Hierarchical clustering analysis revealed consistent clustering among genes assigned to 'developmental growth' ontology terms and germ layers. The accuracy of the expression data was supported by comparing information from known transcripts, and the tissues from which they were derived, with the macroarray data. Hybridization assays resulted in consistent tissue expression profiles, which will be useful for dissecting tissue-regulatory networks and predicting functions of novel genes identified after extensive sequencing of the genomes of model organisms. Screening our skeletal-muscle platform against 5 adult chicken tissues allowed us to identify 43 'tissue-specific' transcripts and 112 co-expressed uncharacterized transcripts with 62 putative motifs. This platform also represents an important tool for the functional investigation of novel genes: to determine expression patterns across developmental stages, to evaluate differences in muscular growth potential between chicken lines, and to identify tissue-specific genes.
Abstract:
Knowledge of residual perturbations in the orbit of Uranus in the early 1840s did not lead to the refutation of Newton's law of gravitation but instead to the discovery of Neptune in 1846. Karl Popper asserts that this case is atypical of science and that the law of gravitation was at least prima facie falsified by these perturbations. I argue that these assertions are the product of a false, a priori methodological position I call 'Weak Popperian Falsificationism' (WPF). Further, on the evidence, the law was not prima facie false and was not generally considered so by astronomers at the time. Many of Popper's commentators (Kuhn, Lakatos, Feyerabend and others) presuppose WPF, and their views on this case and its implications for scientific rationality and method suffer from the same defect.
Abstract:
In the last few years two factors have helped to significantly advance our understanding of the Myxozoa. First, the phenomenal increase in finfish aquaculture in the 1990s has led to the increased importance of these parasites; in turn, this has led to intensified research efforts, which have increased knowledge of the development, diagnosis, and pathogenesis of myxozoans. The hallmark discovery in the 1980s that the life cycle of Myxobolus cerebralis requires development of an actinosporean stage in the oligochaete Tubifex tubifex led to the elucidation of the life cycles of several other myxozoans. Also, the life cycle and taxonomy of the enigmatic PKX myxozoan have been resolved: it is the alternate stage of the unusual myxozoan Tetracapsula bryosalmonae, from bryozoans. The 18S rDNA gene of many species has been sequenced, and here we add 22 new sequences to the data set. Phylogenetic analyses using all these sequences indicate that: 1) the Myxozoa are closely related to the Cnidaria (also supported by morphological data); 2) marine taxa at the genus level branch separately from genera that usually infect freshwater fishes; 3) taxa cluster more by development and tissue location than by spore morphology; 4) the tetracapsulids branched off early in myxozoan evolution, perhaps reflected by their having bryozoan rather than annelid hosts; 5) the morphology of actinosporeans offers little information for determining their myxosporean counterparts (assuming that they exist); and 6) the marine actinosporeans from Australia appear to form a clade within the platysporinid myxosporeans. Ribosomal DNA sequences have also enabled the development of diagnostic tests for myxozoans. PCR and in situ hybridisation tests based on rDNA sequences have been developed for Myxobolus cerebralis, Ceratomyxa shasta, Kudoa spp., and Tetracapsula bryosalmonae (PKX). Lectin-based and antibody tests have also been developed for certain myxozoans, such as PKX and C. shasta.
We also review important diseases caused by myxozoans, which are emerging or re-emerging. Epizootics of whirling disease in wild rainbow trout (Oncorhynchus mykiss) have recently been reported throughout the Rocky Mountain states of the USA. With a dramatic increase in aquaculture of fishes using marine netpens, several marine myxozoans have been recognized or elevated in status as pathological agents. Kudoa thyrsites infections have caused severe post-harvest myoliquefaction in pen-reared Atlantic salmon (Salmo salar), and Ceratomyxa spp., Sphaerospora spp., and Myxidium leei cause disease in pen-reared sea bass (Dicentrarchus labrax) and sea bream species (family Sparidae) in Mediterranean countries.
Abstract:
Two studies assessed the development of children's understanding of life as a biological goal of body functioning. In Study 1, 4- to 10-year-old children were given an interview consisting of a series of structured questions about the location and function of various body organs. Their responses were coded both for factual correctness and for appeals to the goal of maintaining life. The results showed a gradual increase in children's factual knowledge across this age range but an abrupt increase in appeals to life between the ages of 4 and 6. Analyses of the 4-year-olds' responses suggested that appeals to life were associated with increased knowledge of organ function, but not of organ location. Study 2 was designed to replicate the pattern found in Study 1. A continuous sample of 4- to 5-year-old children was administered an abbreviated version of the interview from Study 1. Children's understanding of life as a biological goal was again found to be predictive of their knowledge of organ function, but not of organ location. These results indicate a reorganization in children's understanding of the body between the ages of 4 and 6, which coincides with children's discovery of 'life' as a biological goal of bodily function.
Abstract:
One of the most important advantages of database systems is that the underlying mathematics is rich enough to specify very complex operations with a small number of statements in the database language. This research covers an aspect of biological informatics, the marriage of information technology and biology, involving the study of real-world phenomena using virtual plants derived from L-system simulation. L-systems were introduced by Aristid Lindenmayer as a mathematical model of multicellular organisms. Little consideration has been given to the problem of persistent storage for these simulations, and current procedures for querying data generated by L-systems for scientific experiments, simulations and measurements are also inadequate. To address these problems, this paper presents a generic data-modeling process (L-DBM) between L-systems and database systems. It shows how L-system productions can be generically and automatically represented in database schemas and how a database can be populated from the L-system strings. It further describes the idea of pre-computing recursive structures in the data into derived attributes using compiler generation, and supplies a method for establishing a correspondence between biologists' terms and compiler-generated terms in a biologist's computing environment. Given any specific set of L-system productions and declarations, L-DBM can generate the corresponding schema, covering both simple correspondence terminology and complex recursive-structure data attributes and relationships.
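The mapping the abstract above describes, from L-system productions to rows in a relational store, can be sketched in miniature. The productions below are Lindenmayer's classic two-symbol "algae" system, and the single-table layout is an invented stand-in for L-DBM's generated schemas, not the paper's actual output:

```python
# Sketch: derive strings from an L-system, then persist each
# generation in a relational table. The table layout is an
# illustrative assumption, not L-DBM's real schema.
import sqlite3

# Lindenmayer's "algae" system: A -> AB, B -> A
PRODUCTIONS = {"A": "AB", "B": "A"}

def derive(axiom: str, steps: int) -> list[str]:
    """Apply all productions in parallel for `steps` generations."""
    strings = [axiom]
    for _ in range(steps):
        strings.append("".join(PRODUCTIONS.get(ch, ch) for ch in strings[-1]))
    return strings

def store(strings: list[str]) -> sqlite3.Connection:
    """Populate an in-memory table with one row per generation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE derivation (generation INTEGER PRIMARY KEY, string TEXT)")
    conn.executemany("INSERT INTO derivation VALUES (?, ?)", list(enumerate(strings)))
    conn.commit()
    return conn

gens = derive("A", 5)
conn = store(gens)
for generation, s in conn.execute("SELECT generation, string FROM derivation ORDER BY generation"):
    print(generation, s)
```

The string lengths grow as the Fibonacci numbers (1, 2, 3, 5, 8, 13), which hints at why the paper cares about pre-computing recursive structure into derived attributes rather than recomputing it on every query.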
Abstract:
With electricity market liberalization, distribution and retail companies are looking for better market strategies based on adequate information about the consumption patterns of their electricity customers. In this environment all consumers are free to choose their electricity supplier, and a fair insight into customers' behaviour will permit the definition of specific contract aspects based on the different consumption patterns. In this paper Data Mining (DM) techniques are applied to electricity consumption data from a utility client database. To form the different customer classes and find a set of representative consumption patterns, we have used the Two-Step algorithm, a hierarchical clustering algorithm. Each consumer class is represented by the load profile resulting from the clustering operation. Next, to characterize each consumer class, a classification model is constructed with the C5.0 classification algorithm.
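The first stage of the pipeline described above, clustering load profiles into customer classes and representing each class by its load profile, can be sketched as follows. A toy single-linkage agglomerative merge stands in for the commercial Two-Step algorithm, the four-customer, four-hour profiles are invented for illustration, and the C5.0 classification stage is omitted:

```python
# Sketch: group daily load profiles into customer classes by
# agglomerative clustering, then represent each class by its
# hour-by-hour mean profile. Single linkage is an illustrative
# substitute for the Two-Step algorithm used in the paper.
from itertools import combinations

def distance(p, q):
    """Euclidean distance between two load profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def agglomerate(profiles, k):
    """Merge the two closest clusters (single linkage) until k remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: min(
                distance(profiles[i], profiles[j])
                for i in clusters[pair[0]] for j in clusters[pair[1]]
            ),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters

def load_profile(profiles, members):
    """Class representative: hour-by-hour mean of member profiles."""
    return [sum(profiles[i][h] for i in members) / len(members)
            for h in range(len(profiles[0]))]

# Four customers, 4-hour demand profiles (kW): two "daytime-peak"
# and two "morning/evening-peak" shapes (invented data).
profiles = [
    [1.0, 5.0, 5.0, 1.0],
    [1.2, 4.8, 5.1, 0.9],
    [4.0, 1.0, 1.0, 4.0],
    [4.2, 0.8, 1.1, 3.9],
]
classes = agglomerate(profiles, k=2)
for members in classes:
    print(sorted(members), load_profile(profiles, members))
```

In the paper's setting each class's mean load profile becomes the label that the subsequent C5.0 model learns to predict from customer attributes.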
Abstract:
The emergence of new business models, namely the establishment of partnerships between organizations and the chance companies have of adding existing data on the web, especially the semantic web, to their information, has highlighted some long-standing problems in databases, particularly those related to data quality. Poor data can result in a loss of competitiveness for the organizations holding them, and may even lead to their disappearance, since many decision-making processes are based on these data. For this reason, data cleaning is essential. Current approaches to these problems are closely tied to database schemas and specific domains. For data cleaning to be usable across different repositories, computer systems must be able to understand the data, i.e., an associated semantics is needed. The solution presented in this paper uses ontologies (i) for the specification of data cleaning operations and (ii) as a way of solving the semantic heterogeneity problems of data stored in different sources. With data cleaning operations defined at a conceptual level, and given mappings between domain ontologies and an ontology derived from a database, the operations can be instantiated and proposed to the expert/specialist for execution over that database, thus enabling interoperability.
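The core idea, cleaning operations declared against ontology concepts and instantiated over a concrete schema via mappings, can be illustrated with a minimal sketch. The concept names, column mappings and normalization rules below are invented examples, not the paper's actual ontology:

```python
# Sketch: conceptual cleaning rules are bound to concrete columns
# through a concept-to-column mapping, then applied to rows.
# All names here are hypothetical illustrations.

# Cleaning rules at the conceptual (ontology) level.
conceptual_rules = {
    "PersonName": lambda v: v.strip().title(),
    "CountryCode": lambda v: v.strip().upper(),
}

# Mapping from domain-ontology concepts to this database's columns.
concept_to_column = {
    "PersonName": "customer_name",
    "CountryCode": "country",
}

def instantiate(rules, mapping):
    """Rewrite conceptual rules as column-level cleaning operations."""
    return {mapping[c]: fn for c, fn in rules.items() if c in mapping}

def clean(rows, column_rules):
    """Apply each column's cleaning function to every row."""
    return [
        {col: column_rules.get(col, lambda v: v)(val) for col, val in row.items()}
        for row in rows
    ]

rows = [{"customer_name": "  ana silva ", "country": "pt "}]
cleaned = clean(rows, instantiate(conceptual_rules, concept_to_column))
print(cleaned)  # [{'customer_name': 'Ana Silva', 'country': 'PT'}]
```

Because the rules live at the concept level, re-targeting them at a second repository only requires a new concept-to-column mapping, which is the interoperability point the abstract makes.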
Abstract:
Final Master's project submitted for the degree of Master in Electronics and Telecommunications Engineering.
Abstract:
The principal topic of this work is the application of data mining techniques, in particular machine learning, to the discovery of knowledge in a protein database. The first chapter presents the general background: section 1.1 overviews the methodology of a Data Mining project and its main algorithms; section 1.2 introduces proteins and their supporting file formats; and section 1.3 defines the main problem we intend to address with this work: determining, in a discrete (i.e. not continuous) way, whether an amino acid is exposed or buried in a protein, for five exposure levels: 2%, 10%, 20%, 25% and 30%. The second chapter, following closely the CRISP-DM methodology, presents the whole process of constructing the database that supported this work, describing how data were loaded from the Protein Data Bank, DSSP and SCOP. An initial data exploration is then performed, and a simple prediction model (baseline) of the relative solvent accessibility of an amino acid is introduced, as is the Data Mining Table Creator, a program developed to produce the data mining tables required for this problem. In the third chapter the results obtained are analyzed with statistical significance tests. The several classifiers used (Neural Networks, C5.0, CART and CHAID) are first compared, and it is concluded that C5.0 is the most suitable for the problem at hand. The influence of parameters such as the amino acid information level, the amino acid window size and the SCOP class type on the accuracy of the predictive models is also compared. The fourth chapter starts with a brief review of the literature on amino acid relative solvent accessibility, then overviews the main results achieved and discusses possible future work. The fifth and last chapter consists of appendices. Appendix A has the schema of the database that supported this thesis.
Appendix B has a set of tables with additional information. Appendix C describes the software provided on the DVD accompanying this thesis, which allows the reconstruction of the present work.
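The prediction target and baseline described above can be illustrated with a small sketch: relative solvent accessibility (RSA) is discretized as exposed/buried at a cut-off, and a naive baseline predicts the majority class per residue type. The tiny training set and the exact form of the baseline are invented for illustration, not the thesis's actual model:

```python
# Sketch: discretize relative solvent accessibility (RSA) at the
# five exposure cut-offs named in the abstract, and fit a naive
# majority-class baseline per amino-acid type (invented data).
from collections import Counter, defaultdict

THRESHOLDS = (0.02, 0.10, 0.20, 0.25, 0.30)  # 2%, 10%, 20%, 25%, 30%

def discretize(rsa, threshold):
    """Binary target: exposed if RSA is at or above the cut-off."""
    return "exposed" if rsa >= threshold else "buried"

def fit_baseline(samples, threshold):
    """Majority exposed/buried label per amino-acid type."""
    counts = defaultdict(Counter)
    for aa, rsa in samples:
        counts[aa][discretize(rsa, threshold)] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}

# (residue type, RSA) pairs; values are invented for illustration.
train = [("LEU", 0.05), ("LEU", 0.08), ("LYS", 0.60), ("LYS", 0.45), ("LEU", 0.35)]
model = fit_baseline(train, threshold=0.20)
print(model)  # e.g. hydrophobic LEU mostly buried, charged LYS mostly exposed
```

A baseline of this kind gives the accuracy floor that the thesis's real classifiers (Neural Networks, C5.0, CART, CHAID) have to beat at each exposure level.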
Abstract:
OBJECTIVE To analyze Brazilian literature on body image and the theoretical and methodological advances that have been made. METHODS A detailed review was undertaken of the Brazilian literature on body image, selecting published articles, dissertations and theses from the SciELO, SCOPUS, LILACS and PubMed databases and the CAPES thesis database. Google Scholar was also used. There was no start date for the search, which used the following search terms: “body image” AND “Brazil” AND “scale(s)”; “body image” AND “Brazil” AND “questionnaire(s)”; “body image” AND “Brazil” AND “instrument(s)”; “body image” limited to Brazil and “body image”. RESULTS The majority of available measures were intended for use with college students, with half of them evaluating satisfaction/dissatisfaction with the body. Females and adolescents of both sexes were the most studied populations. There has been a significant increase in the number of available instruments. Nevertheless, numerous published studies have used non-validated instruments, with much confusion in the use of the appropriate terms (e.g., perception, dissatisfaction, distortion). CONCLUSIONS Much more is needed to understand body image within the Brazilian population, especially in terms of evaluating different age groups and diversifying the components/dimensions assessed. However, interest in this theme is increasing, and important steps have been taken in a short space of time.
Abstract:
Paper presented at the 9th European Conference on Knowledge Management, Southampton Solent University, Southampton, UK, 4-5 Sep. 2008. URL: http://academic-conferences.org/eckm/eckm2008/eckm08-home.htm
Abstract:
To meet the increasing demands of complex inter-organizational processes and the demand for continuous innovation and internationalization, new forms of organisation are evidently being adopted, fostering more intensive collaboration processes and sharing of resources, in what can be called collaborative networks (Camarinha-Matos, 2006:03). Information and knowledge are crucial resources in collaborative networks, and their management is a fundamental process to optimize. Knowledge organisation and collaboration systems are thus important instruments for the success of collaborative networks of organisations, and have been researched over the last decade in the areas of computer science, information science, management sciences, terminology and linguistics. Nevertheless, research in this area has paid little attention to multilingual contexts of collaboration, which pose specific and challenging problems. It is clear that access to and representation of knowledge will happen more and more in multilingual settings, which implies overcoming the difficulties inherent in the presence of multiple languages, through processes such as the localization of ontologies. Although localization, like other processes that involve multilingualism, is a rather well-developed practice whose methodologies and tools are fruitfully employed by the language industry in the development and adaptation of multilingual content, it has not yet been sufficiently explored as an element of support for the development of knowledge representations, in particular ontologies, expressed in more than one language. Multilingual knowledge representation is thus an open research area calling for cross-contributions from knowledge engineering, terminology, ontology engineering, cognitive sciences, computational linguistics, natural language processing, and management sciences.
This workshop brought together researchers interested in multilingual knowledge representation, in a multidisciplinary environment, to debate the possibilities of cross-fertilization between these disciplines as applied to contexts where multilingualism continuously creates new and demanding challenges for current knowledge representation methods and techniques. Six papers dealing with different approaches to multilingual knowledge representation are presented, most of them describing tools, approaches and results obtained in ongoing projects. In the first, Andrés Domínguez Burgos, Koen Kerremans and Rita Temmerman present a software module that is part of a workbench for terminological and ontological mining: Termontospider, a wiki crawler that aims to optimally traverse Wikipedia in search of domain-specific texts from which to extract terminological and ontological information. The crawler is part of a tool suite for automatically developing multilingual termontological databases, i.e. ontologically underpinned multilingual terminological databases. The authors describe the basic principles behind the crawler and summarize the research setting in which the tool is currently being tested. In the second paper, Fumiko Kano compares four feature-based similarity measures derived from the cognitive sciences. The purpose of the comparative analysis is to identify the potentially most effective model for mapping independent ontologies in a culturally influenced domain. For that, datasets based on standardized pre-defined feature dimensions and values, obtainable from the UNESCO Institute for Statistics (UIS), have been used for the comparative analysis of the similarity measures.
The comparison thus verifies the similarity measures against objectively developed datasets; according to the author, the results demonstrate that the Bayesian Model of Generalization provides the most effective cognitive model for identifying the most similar corresponding concepts for a targeted socio-cultural community. In another presentation, Thierry Declerck, Hans-Ulrich Krieger and Dagmar Gromann present ongoing work and propose an approach to the automatic extraction of information from multilingual financial Web resources, providing candidate terms for building ontology elements or instances of ontology concepts. The authors present an approach complementary to the direct localization/translation of ontology labels: acquiring terminologies by accessing and harvesting the multilingual Web presences of structured information providers in the field of finance. This leads to the detection of candidate terms in various multilingual sources in the financial domain that can be used not only as labels of ontology classes and properties but also for the possible generation of (multilingual) domain ontologies themselves. In the next paper, Manuel Silva, António Lucas Soares and Rute Costa claim that despite the availability of tools, resources and techniques aimed at the construction of ontological artifacts, developing a shared conceptualization of a given reality still raises questions about the principles and methods that support the initial phases of conceptualization. These questions become, according to the authors, more complex when the conceptualization occurs in a multilingual setting.
To tackle these issues the authors present a collaborative platform, conceptME, where terminological and knowledge representation processes support domain experts throughout a conceptualization framework, allowing the inclusion of multilingual data as a way to promote knowledge sharing, enhance conceptualization and support a multilingual ontology specification. In another presentation, Frieda Steurs and Hendrik J. Kockaert present TermWise, a large project dealing with legal terminology and phraseology for the Belgian public services, i.e. the translation office of the Ministry of Justice. The project aims to develop an advanced tool that embeds expert knowledge in the algorithms that extract specialized language from textual data (legal documents); its outcome is a knowledge database of Dutch/French equivalents for legal concepts, enriched with the phraseology related to the terms under discussion. Finally, Deborah Grbac, Luca Losito, Andrea Sada and Paolo Sirito report on the preliminary results of a pilot project currently ongoing at the UCSC Central Library, where they propose to adapt, for subject librarians employed in large and multilingual academic institutions, the model used by translators working within European Union institutions. The authors are using User Experience (UX) analysis to provide subject librarians with visual support, by means of "ontology tables" depicting conceptual links and connections of words with concepts presented according to their semantic and linguistic meaning. The organizers hope that the selection of papers presented here will be of interest to a broad audience and will be a starting point for further discussion and cooperation.