990 results for Knowledge Database
Abstract:
Advances in sequencing technologies in recent years have made it possible to catalogue genetic variants in human samples, leading to new discoveries and insights in medical, pharmaceutical, evolutionary, and population research. The volume of sequences produced is very large, and identifying the variants requires several stages of processing of the genetic information, each of which generates further data. Alongside this immense accumulation of data, the scientific community has felt the need to organize the data into repositories, at first merely to share research results, and later to enable statistical studies directly on the genetic data. Large-scale studies involve data volumes on the order of petabytes, whose maintenance remains a challenge for infrastructures. Given the variety and quantity of the data produced, databases play a role of primary importance in this challenge. Data models and data organization in this field can make a difference not only for scalability, but above all for suitability for data mining. Indeed, the storage of these data in files with quasi-standard formats, the size of these files, and the computational requirements involved make it difficult to write efficient analysis software and discourage large-scale studies on heterogeneous data. Before designing the database, we therefore studied the evolution, over the last twenty years, of the quasi-standard formats for biological flat files, which contain heterogeneous metadata alongside the actual nucleotide sequences, with records lacking structural relationships. Recently this evolution has culminated in the use of the XML standard, but delimited flat files remain the formats best supported by tools and online platforms.
This was followed by an analysis of the internal data organization of public biological databases. These databases contain genes, genetic variants, protein structures, phenotype ontologies, disease-gene relationships, and drug-gene relationships. The public databases studied include OMIM, Entrez, KEGG, UniProt, and GO. The main goal in studying and modelling the genetic database was to structure the data so as to integrate the heterogeneous data produced and make data-mining processes computationally feasible. The choice of Hadoop/MapReduce technology proves particularly effective in this case, both for the scalability it guarantees and for its efficiency in the more complex, parallel statistical analyses, such as those concerning multi-locus allelic variants.
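The kind of parallel variant analysis mentioned above can be illustrated with a minimal map/reduce-style allele count in pure Python; the record layout, sample IDs, and locus names here are invented for illustration and are not the thesis's actual schema:

```python
from collections import defaultdict

# Hypothetical variant records: (sample_id, locus, allele)
records = [
    ("S1", "chr1:12345", "A"),
    ("S2", "chr1:12345", "G"),
    ("S3", "chr1:12345", "G"),
    ("S1", "chr2:67890", "T"),
]

def map_phase(record):
    """Emit one ((locus, allele), 1) pair per observed variant."""
    sample, locus, allele = record
    yield (locus, allele), 1

def reduce_phase(pairs):
    """Sum counts per (locus, allele) key, as a MapReduce reducer would."""
    counts = defaultdict(int)
    for key, n in pairs:
        counts[key] += n
    return dict(counts)

pairs = [p for r in records for p in map_phase(r)]
allele_counts = reduce_phase(pairs)
print(allele_counts[("chr1:12345", "G")])  # → 2
```

In an actual Hadoop deployment the map and reduce phases would run on distributed shards of the sequence data, which is what makes petabyte-scale multi-locus counts tractable.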
Abstract:
In this thesis, the author presents a query language for an RDF (Resource Description Framework) database and discusses its applications in the context of the HELM project (the Hypertextual Electronic Library of Mathematics). This language aims at meeting the main requirements coming from the RDF community. In particular, it includes: a human-readable textual syntax and a machine-processable XML (Extensible Markup Language) syntax both for queries and for query results; a rigorously exposed formal semantics; a graph-oriented RDF data access model capable of exploring an entire RDF graph (including both RDF Models and RDF Schemata); a full set of Boolean operators to compose the query constraints; fully customizable and highly structured query results having a 4-dimensional geometry; and some constructions taken from ordinary programming languages that simplify the formulation of complex queries. The HELM project aims at integrating the modern tools for the automation of formal reasoning with the most recent electronic publishing technologies, in order to create and maintain a hypertextual, distributed virtual library of formal mathematical knowledge. In the spirit of the Semantic Web, the documents of this library include RDF metadata describing their structure and content in a machine-understandable form. Using the author's query engine, HELM exploits this information to implement functionalities allowing the interactive and automatic retrieval of documents on the basis of content-aware requests that take into account the mathematical nature of these documents.
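As a rough illustration of graph-oriented access to RDF triples, the pattern-matching core of any such query language can be sketched in plain Python; the triples, property names, and `match` function below are invented and bear no relation to the thesis's actual syntax:

```python
# Toy RDF-style triple store: (subject, predicate, object)
triples = [
    ("doc:thm1", "dc:title", "Fundamental Theorem of Algebra"),
    ("doc:thm1", "helm:uses", "doc:def_polynomial"),
    ("doc:thm2", "helm:uses", "doc:def_polynomial"),
    ("doc:thm2", "dc:title", "Factor Theorem"),
]

def match(triples, s=None, p=None, o=None):
    """Return all triples matching the given pattern; None is a wildcard."""
    return [(ts, tp, to) for ts, tp, to in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# Content-aware request: "which documents use the definition of polynomial?"
users = [s for s, _, _ in match(triples, p="helm:uses", o="doc:def_polynomial")]
print(users)  # → ['doc:thm1', 'doc:thm2']
```

A real engine would additionally traverse RDF Schemata and compose such patterns with Boolean operators, as the abstract describes.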
Abstract:
The molecular biology of humans is a highly complex and diverse field, with research ongoing in many areas. The focus here is in particular on genomics, proteomics, transcriptomics, and metabolomics, and years of research have accumulated large amounts of valuable data. This collection is growing steadily, and no stagnation is foreseeable. By now, however, this permanent flood of information has buried valuable knowledge in unmanageable digital mountains of data and has turned the gathering of research-specific, reliable information into a major challenge. The work presented in this dissertation has generated a comprehensive compendium of human tissues for biomedical analyses. It is called medicalgenomics.org, and it has solved various biomedical problems involved in the search for specific knowledge across numerous databases. The compendium is the first of its kind, and the knowledge it contains will help scientists gain a better systematic overview of specific genes or functional profiles, with respect to regulation as well as pathological and physiological conditions. In addition, various query methods enable efficient analysis of signalling events and metabolic pathways, as well as the study of genes at the expression level. The full range of these query options enables scientists to create highly specialized genetic road maps with which future experiments can be planned more precisely. As a result, valuable resources and time can be saved while the prospects of success increase. Furthermore, the comprehensive knowledge of the compendium can be used to generate and test biomedical hypotheses.
Abstract:
Much research has focused on desertification and land degradation assessments without putting sufficient emphasis on prevention and mitigation, although the concept of sustainable land management (SLM) is increasingly being acknowledged. A variety of SLM measures have already been applied at the local level, but they are rarely adequately recognised, evaluated, shared or used for decision support. WOCAT (World Overview of Conservation Approaches and Technologies) has developed an internationally recognised, standardised methodology to document and evaluate SLM technologies and approaches, including their spatial distribution, allowing SLM knowledge to be shared worldwide. The recent integration of the methodology into a participatory process now allows this knowledge to be analysed and used for decision support at the local and national levels. The use of the WOCAT tools stimulates evaluation (self-evaluation as well as learning from comparing experiences) within SLM initiatives, where all too often there is not only insufficient monitoring but also a lack of critical analysis. The comprehensive questionnaires and database system make it possible to document, evaluate and disseminate local experiences with SLM technologies and their implementation approaches. This evaluation process - in a team of experts and together with land users - greatly enhances understanding of the reasons behind successful (or failed) local practices. It has now been integrated into a new methodology for appraising and selecting SLM options. The methodology combines a local collective learning and decision approach with the use of the evaluated global best practices from WOCAT in a concise three-step process: i) identifying land degradation and locally applied solutions in a stakeholder learning workshop; ii) assessing local solutions with the standardised WOCAT tool; iii) jointly selecting promising strategies for implementation with the help of a decision support tool.
The methodology has been implemented in various countries and study sites around the world, mainly within the FAO LADA project (Land Degradation Assessment in Drylands) and the EU-funded DESIRE project. Investments in SLM must be carefully assessed and planned on the basis of properly documented experiences and evaluated impacts and benefits: concerted efforts are needed and sufficient resources must be mobilised to tap the wealth of knowledge and learn from SLM successes.
Abstract:
The global World Overview of Conservation Approaches and Technologies (WOCAT) initiative has developed standardised tools and methods to compile and evaluate knowledge available about SLM. This knowledge is now combined and enriched with audiovisual information in order to give a voice to land users, reach a broad range of stakeholders, and assist in scaling up SLM to reverse trends of degradation, desertification, and drought. Five video products, adapted to the needs of different target groups, are created and embedded in already existing platforms for knowledge sharing of SLM such as the WOCAT database and Google Earth application. A pilot project was carried out in Kenya and Tajikistan to verify ideas and tools while at the same time assessing the usefulness of the suggested products on the ground. Video has the potential to bridge the gap between different actor groups and enable communication and sharing on different levels and scales: locally, regionally, and globally. Furthermore, it is an innovative tool to link local and scientific knowledge, raise awareness, and support advocacy for SLM.
Keywords: Sustainable Land Management (SLM), knowledge sharing, audiovisual messages, video, World Overview of Conservation Approaches and Technologies (WOCAT)
Abstract:
A vast amount of temporal information is provided on the Web. Even though many facts expressed in documents are time-related, the temporal properties of Web presentations have not received much attention. In database research, temporal databases have become a mainstream topic in recent years. In Web documents, temporal data may exist as metadata in the header and as user-directed data in the body of a document. Whereas temporal data can easily be identified in the semi-structured metadata, it is more difficult to determine temporal data and its role in the body. We propose procedures for maintaining the temporal integrity of Web pages and outline different approaches to applying bitemporal data concepts to Web documents. In particular, we consider desirable functionalities of Web repositories and other Web-related tools that may support Webmasters in managing the temporal data of their Web documents. Some properties of a prototype environment are described.
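The bitemporal idea - each statement carrying both a real-world validity interval and a transaction (recording) interval - can be sketched as follows; the example facts, dates, and field names are hypothetical and not taken from the paper:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class BitemporalFact:
    """One Web-page statement with both time dimensions attached."""
    content: str
    valid_from: date      # when the fact holds in the real world
    valid_to: date
    recorded_from: date   # when the page actually stated it
    recorded_to: date     # open-ended statements use a far-future sentinel

facts = [
    BitemporalFact("Conference in Oslo", date(2001, 6, 1), date(2001, 6, 5),
                   date(2001, 1, 10), date(2001, 3, 1)),
    BitemporalFact("Conference in Bergen", date(2001, 6, 1), date(2001, 6, 5),
                   date(2001, 3, 1), date(9999, 12, 31)),
]

def as_of(facts, valid_on, known_on):
    """Bitemporal query: what did the page say on `known_on` about `valid_on`?"""
    return [f.content for f in facts
            if f.valid_from <= valid_on < f.valid_to
            and f.recorded_from <= known_on < f.recorded_to]

print(as_of(facts, date(2001, 6, 2), date(2001, 2, 1)))  # → ['Conference in Oslo']
```

The two intervals let a repository answer both "what was true then?" and "what did the page claim then?", which is exactly what makes corrections to published Web content auditable.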
Abstract:
At present, there is a lack of knowledge on the interannual climate-related variability of zooplankton communities of the tropical Atlantic, central Mediterranean Sea, Caspian Sea, and Aral Sea, due to the absence of appropriate databases. In the mid latitudes, the North Atlantic Oscillation (NAO) is the dominant mode of atmospheric fluctuations over eastern North America, the northern Atlantic Ocean and Europe. Therefore, one of the issues that needs to be addressed through data synthesis is the evaluation of interannual patterns in species abundance and species diversity over these regions with regard to the NAO. The database has been used to investigate the ecological role of the NAO in interannual variations of mesozooplankton abundance and biomass along the zonal array of the NAO's influence. The basic approach of the proposed research involved: (1) developing co-operation between experts and data holders in Ukraine, Russia, Kazakhstan, Azerbaijan, the UK, and the USA to rescue and compile the oceanographic data sets and release them on CD-ROM; (2) organizing and compiling a database based on FSU cruises to the above regions; (3) analysing the basin-scale interannual variability of zooplankton species abundance, biomass, and species diversity.
Abstract:
This article describes work performed on the database of questions belonging to the various opinion polls carried out in Spain during the last 50 years. Approximately half of the questions are provided with a title, while the other half remain untitled. The techniques implemented to automatically generate titles for the untitled questions are described. This process operates on very short texts, and the generated titles are subject to strong stylistic conventions and should be fully grammatical pieces of Spanish.
Abstract:
The Kabat Database was initially started in 1970 to determine the combining site of antibodies based on the available amino acid sequences. The precise delineation of complementarity determining regions (CDR) of both light and heavy chains provides the first example of how properly aligned sequences can be used to derive structural and functional information of biological macromolecules. This knowledge has subsequently been applied to the construction of artificial antibodies with prescribed specificities, and to many other studies. The Kabat database now includes nucleotide sequences, sequences of T cell receptors for antigens (TCR), major histocompatibility complex (MHC) class I and II molecules, and other proteins of immunological interest. While new sequences are continually added into this database, we have undertaken the task of developing more analytical methods to study the information content of this collection of aligned sequences. New examples of analysis will be illustrated on a yearly basis. The Kabat Database and its applications are freely available at http://immuno.bme.nwu.edu.
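One classic analysis of a collection of aligned sequences of this kind is the Wu-Kabat variability coefficient: the number of distinct residues at an alignment position divided by the frequency of the most common residue there. A minimal sketch, with an invented alignment column rather than real Kabat data:

```python
from collections import Counter

def wu_kabat_variability(column):
    """Wu-Kabat variability of one alignment column.

    variability = (# distinct residues) / (fraction held by the most common residue)
    A fully conserved column scores 1.0; diverse columns (e.g. CDRs) score higher.
    """
    counts = Counter(column)
    most_common_fraction = counts.most_common(1)[0][1] / len(column)
    return len(counts) / most_common_fraction

# One column across six hypothetical antibody sequences: A x4, G, S
column = list("AAAAGS")
print(round(wu_kabat_variability(column), 2))  # → 4.5
```

Plotting this coefficient along the aligned chains is what reveals the hypervariable CDR positions against the conserved framework.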
Abstract:
Familial structural rearrangements of chromosomes represent a factor of malformation risk that can vary over a large range, making genetic counseling difficult. However, they also represent a powerful tool for increasing knowledge of the genome, particularly through the study of breakpoints and viable imbalances of the genome. We have developed a collaborative database that now includes data on more than 4100 families, from which we have developed a web site called HC Forum® (http://HCForum.imag.fr). It offers geneticists assistance in diagnosis and in genetic counseling by assessing the malformation risk with statistical models. For researchers, interactive interfaces display the distribution of chromosomal breakpoints and of the genome regions observed at birth in trisomy or in monosomy. Dedicated tools, including an interactive pedigree, allow electronic submission of data, which is anonymously shown in a forum for discussion. After validation, the data are definitively registered in the database together with the sender's email address, allowing direct location of biological material. HC Forum® thus constitutes a link between diagnostic laboratories and genome research centers, and after one year it already has more than 700 users from about 40 different countries.
Abstract:
Arabidopsis thaliana, a small annual plant belonging to the mustard family, is the subject of study by an estimated 7000 researchers around the world. In addition to the large body of genetic, physiological and biochemical data gathered for this plant, it will be the first higher plant genome to be completely sequenced, with completion expected at the end of the year 2000. The sequencing effort has been coordinated by an international collaboration, the Arabidopsis Genome Initiative (AGI). The rationale for intensive investigation of Arabidopsis is that it is an excellent model for higher plants. In order to maximize use of the knowledge gained about this plant, there is a need for a comprehensive database and information retrieval and analysis system that will provide user-friendly access to Arabidopsis information. This paper describes the initial steps we have taken toward realizing these goals in a project called The Arabidopsis Information Resource (TAIR) (www.arabidopsis.org).
Abstract:
ACTIVITY is a database on DNA/RNA site sequences with known activity magnitudes, measurement systems, sequence-activity relationships under fixed experimental conditions and procedures to adapt these relationships from one measurement system to another. This database deposits information on DNA/RNA affinities to proteins and cell nuclear extracts, cutting efficiencies, gene transcription activity, mRNA translation efficiencies, mutability and other biological activities of natural sites occurring within promoters, mRNA leaders, and other regulatory regions in pro- and eukaryotic genomes, their mutant forms and synthetic analogues. Since activity magnitudes are heavily system-dependent, the current version of ACTIVITY is supplemented by three novel sub-databases: (i) SYSTEM, measurement systems; (ii) KNOWLEDGE, sequence-activity relationships under fixed experimental conditions; and (iii) CROSS_TEST, procedures adapting a relationship from one measurement system to another. These databases are useful in molecular biology, pharmacogenetics, metabolic engineering, drug design and biotechnology. The databases can be queried using SRS and are available through the Web, http://wwwmgs.bionet.nsc.ru/systems/Activity/.
Abstract:
The present study aims to inventory and analyse ethnobotanical knowledge about medicinal plants in the Serra de Mariola Natural Park. With respect to traditional uses, of the species reported by local informants, 93 were therapeutic, 27 food, 4 natural dyes and 13 handicrafts. We developed a methodology that allowed the location of individuals or vegetation communities with a specific popular use. We prepared a geographic information system (GIS) that included genus, family, scientific nomenclature and common names in Spanish and Catalan for each species. We also classified 39 medicinal uses according to the ATC (Anatomical, Therapeutic, Chemical) classification system. Labiatae (n=19), Compositae (n=9) and Leguminosae (n=6) were the families most represented among the plants used for different purposes in humans. The species with the highest cultural importance index (CI) values were Thymus vulgaris (CI=1.431), Rosmarinus officinalis (CI=1.415), Eryngium campestre (CI=1.325), Verbascum sinuatum (CI=1.106) and Sideritis angustifolia (CI=1.041). The collected plants with the most therapeutic uses were Lippia triphylla (12), Thymus vulgaris and Allium roseum (9 each) and Eryngium campestre (8). The most frequent ATC uses were G04 (urological), D03 (treatment of wounds and ulcers) and R02 (throat diseases). These results were plotted on a geographic map in which each point represents an individual of a species, and a database was created with the corresponding therapeutic uses. This application is useful for identifying individuals and selecting species with specific medicinal properties. Finally, knowledge of these useful plants may help revive the local economy and, in some cases, promote their cultivation.
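A commonly used formulation of the cultural importance index divides the total number of use-reports for a species, summed over informants and use categories, by the number of informants; whether the study uses exactly this variant is an assumption on my part, and the survey figures below are invented:

```python
def cultural_importance(use_reports, n_informants):
    """CI = (sum of use-reports over all use categories) / (number of informants).

    `use_reports` maps a use category to the number of informants citing that use.
    """
    return sum(use_reports.values()) / n_informants

# Hypothetical survey: 40 informants, three use categories for one species
reports = {"urological": 25, "wounds": 20, "throat": 12}
ci = cultural_importance(reports, 40)
print(round(ci, 3))  # → 1.425
```

Under this formulation a CI above 1 simply means informants reported, on average, more than one use per person, which is consistent with the magnitudes quoted for Thymus vulgaris and the other top-ranked species.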
Abstract:
Decision support systems (DSS) support business or organizational decision-making activities, which require access to information stored internally in databases or data warehouses, and externally on the Web, accessed by Information Retrieval (IR) or Question Answering (QA) systems. Graphical interfaces to query these sources of information make it easy to constrain query formulation dynamically based on user selections, but they lack flexibility, since their expressive power is limited by the user interface design. Natural language interfaces (NLI) are expected to be the optimal solution; however, especially for non-expert users, truly natural communication is the most difficult to realize effectively. In this paper, we propose an NLI that improves the interaction between the user and the DSS by resolving references to previous questions or their answers (i.e. anaphora, such as the pronoun reference in "What traits are affected by them?") and elided parts of a question (i.e. ellipsis, such as "And to glume colour?" after the question "Tell me the QTLs related to awn colour in wheat"). Moreover, to overcome one of the main problems of NLIs - the difficulty of adapting an NLI to a new domain - our proposal is based on ontologies that are obtained semi-automatically from a framework that allows the integration of internal and external, structured and unstructured information. Therefore, our proposal can interface with databases, data warehouses, QA and IR systems. Because of the high ambiguity of the natural-language resolution process, our proposal is presented as an authoring tool that helps the user query efficiently in natural language. Finally, our proposal is tested on a DSS case scenario in biotechnology and agriculture, whose knowledge base is the CEREALAB database as internal structured data and the Web (e.g. PubMed) as external unstructured information.
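The anaphora resolution described in the abstract can be caricatured as a substitution of the pronoun by the entities of the previous answer; this naive sketch uses invented dialogue state and entity names, not the paper's actual resolution algorithm:

```python
import re

def resolve_anaphora(question, last_entities):
    """Naively expand the pronoun 'them' with entities from the previous answer."""
    if re.search(r"\bthem\b", question):
        return re.sub(r"\bthem\b", " and ".join(last_entities), question)
    return question

# Previous answer listed two QTLs; the follow-up refers to them by pronoun.
q = resolve_anaphora("What traits are affected by them?", ["QTL1", "QTL2"])
print(q)  # → What traits are affected by QTL1 and QTL2?
```

A real system must of course choose among several candidate antecedents using the ontology, which is where the ambiguity the authors mention arises.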
Abstract:
Narcolepsy with cataplexy is a rare disease with an estimated prevalence of 0.02% in European populations. Narcolepsy shares many features of rare disorders, in particular the lack of awareness of the disease, with serious consequences for healthcare provision. As with other rare diseases, only a few European countries have registered narcolepsy cases in databases of the International Classification of Diseases or in registries of the European health authorities. A promising approach to identifying disease-specific adverse health effects and needs in healthcare delivery in the field of rare diseases is to establish a distributed expert network. A first and important step is to create a database that allows the collection, storage and dissemination of data on narcolepsy in a comprehensive and systematic way. Here, the first prospective web-based European narcolepsy database, hosted by the European Narcolepsy Network, is introduced. The database structure, the standardization of data acquisition and the quality-control procedures are described, and an overview is provided of the first 1079 patients from 18 European specialized centres. Owing to its standardization, this continuously growing data pool is highly promising as a means to gain better insight into many unsolved aspects of narcolepsy and related disorders, including clear phenotype characterization of the subtypes of narcolepsy, more precise epidemiological data and knowledge of the natural history of narcolepsy, and expectations about treatment effects and the identification of post-marketing medication side-effects; it will also contribute to improved clinical trial designs and provide facilities to further develop phase III trials.