937 resultados para Relational Databases
Resumo:
Conceptual Information Systems provide a multi-dimensional conceptually structured view on data stored in relational databases. On restricting the expressiveness of the retrieval language, they allow the visualization of sets of realted queries in conceptual hierarchies, hence supporting the search of something one does not have a precise description, but only a vague idea of. Information Retrieval is considered as the process of finding specific objects (documents etc.) out of a large set of objects which fit to some description. In some data analysis and knowledge discovery applications, the dual task is of interest: The analyst needs to determine, for a subset of objects, a description for this subset. In this paper we discuss how Conceptual Information Systems can be extended to support also the second task.
Resumo:
Die Auszeichnungssprache XML dient zur Annotation von Dokumenten und hat sich als Standard-Datenaustauschformat durchgesetzt. Dabei entsteht der Bedarf, XML-Dokumente nicht nur als reine Textdateien zu speichern und zu transferieren, sondern sie auch persistent in besser strukturierter Form abzulegen. Dies kann unter anderem in speziellen XML- oder relationalen Datenbanken geschehen. Relationale Datenbanken setzen dazu bisher auf zwei grundsätzlich verschiedene Verfahren: Die XML-Dokumente werden entweder unverändert als binäre oder Zeichenkettenobjekte gespeichert oder aber aufgespalten, sodass sie in herkömmlichen relationalen Tabellen normalisiert abgelegt werden können (so genanntes „Flachklopfen“ oder „Schreddern“ der hierarchischen Struktur). Diese Dissertation verfolgt einen neuen Ansatz, der einen Mittelweg zwischen den bisherigen Lösungen darstellt und die Möglichkeiten des weiterentwickelten SQL-Standards aufgreift. SQL:2003 definiert komplexe Struktur- und Kollektionstypen (Tupel, Felder, Listen, Mengen, Multimengen), die es erlauben, XML-Dokumente derart auf relationale Strukturen abzubilden, dass der hierarchische Aufbau erhalten bleibt. Dies bietet zwei Vorteile: Einerseits stehen bewährte Technologien, die aus dem Bereich der relationalen Datenbanken stammen, uneingeschränkt zur Verfügung. Andererseits lässt sich mit Hilfe der SQL:2003-Typen die inhärente Baumstruktur der XML-Dokumente bewahren, sodass es nicht erforderlich ist, diese im Bedarfsfall durch aufwendige Joins aus den meist normalisierten und auf mehrere Tabellen verteilten Tupeln zusammenzusetzen. In dieser Arbeit werden zunächst grundsätzliche Fragen zu passenden, effizienten Abbildungsformen von XML-Dokumenten auf SQL:2003-konforme Datentypen geklärt. Darauf aufbauend wird ein geeignetes, umkehrbares Umsetzungsverfahren entwickelt, das im Rahmen einer prototypischen Applikation implementiert und analysiert wird. Beim Entwurf des Abbildungsverfahrens wird besonderer Wert auf die Einsatzmöglichkeit in Verbindung mit einem existierenden, ausgereiften relationalen Datenbankmanagementsystem (DBMS) gelegt. Da die Unterstützung von SQL:2003 in den kommerziellen DBMS bisher nur unvollständig ist, muss untersucht werden, inwieweit sich die einzelnen Systeme für das zu implementierende Abbildungsverfahren eignen. Dabei stellt sich heraus, dass unter den betrachteten Produkten das DBMS IBM Informix die beste Unterstützung für komplexe Struktur- und Kollektionstypen bietet. Um die Leistungsfähigkeit des Verfahrens besser beurteilen zu können, nimmt die Arbeit Untersuchungen des nötigen Zeitbedarfs und des erforderlichen Arbeits- und Datenbankspeichers der Implementierung vor und bewertet die Ergebnisse.
Resumo:
The Short-term Water Information and Forecasting Tools (SWIFT) is a suite of tools for flood and short-term streamflow forecasting, consisting of a collection of hydrologic model components and utilities. Catchments are modeled using conceptual subareas and a node-link structure for channel routing. The tools comprise modules for calibration, model state updating, output error correction, ensemble runs and data assimilation. Given the combinatorial nature of the modelling experiments and the sub-daily time steps typically used for simulations, the volume of model configurations and time series data is substantial and its management is not trivial. SWIFT is currently used mostly for research purposes but has also been used operationally, with intersecting but significantly different requirements. Early versions of SWIFT used mostly ad-hoc text files handled via Fortran code, with limited use of netCDF for time series data. The configuration and data handling modules have since been redesigned. The model configuration now follows a design where the data model is decoupled from the on-disk persistence mechanism. For research purposes the preferred on-disk format is JSON, to leverage numerous software libraries in a variety of languages, while retaining the legacy option of custom tab-separated text formats when it is a preferred access arrangement for the researcher. By decoupling data model and data persistence, it is much easier to interchangeably use for instance relational databases to provide stricter provenance and audit trail capabilities in an operational flood forecasting context. For the time series data, given the volume and required throughput, text based formats are usually inadequate. A schema derived from CF conventions has been designed to efficiently handle time series for SWIFT.
Resumo:
In DNA microarray experiments, the gene fragments that are spotted on the slides are usually obtained by the synthesis of specific oligonucleotides that are able to amplify genes through PCR. Shotgun library sequences are an alternative to synthesis of primers for the study of each gene in the genome. The possibility of putting thousands of gene sequences into a single slide allows the use of shotgun clones in order to proceed with microarray analysis without a completely sequenced genome. We developed an OC Identifier tool (optimal clone identifier for genomic shotgun libraries) for the identification of unique genes in shotgun libraries based on a partially sequenced genome; this allows simultaneous use of clones in projects such as transcriptome and phylogeny studies, using comparative genomic hybridization and genome assembly. The OC Identifier tool allows comparative genome analysis, biological databases, query language in relational databases, and provides bioinformatics tools to identify clones that contain unique genes as alternatives to primer synthesis. The OC Identifier allows analysis of clones during the sequencing phase, making it possible to select genes of interest for construction of a DNA microarray. ©FUNPEC-RP.
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
Pós-graduação em Ciência da Computação - IBILCE
Resumo:
Software dependencies play a vital role in programme comprehension, change impact analysis and other software maintenance activities. Traditionally, these activities are supported by source code analysis; however, the source code is sometimes inaccessible or difficult to analyse, as in hybrid systems composed of source code in multiple languages using various paradigms (e.g. object-oriented programming and relational databases). Moreover, not all stakeholders have adequate knowledge to perform such analyses. For example, non-technical domain experts and consultants raise most maintenance requests; however, they cannot predict the cost and impact of the requested changes without the support of the developers. We propose a novel approach to predicting software dependencies by exploiting the coupling present in domain-level information. Our approach is independent of the software implementation; hence, it can be used to approximate architectural dependencies without access to the source code or the database. As such, it can be applied to hybrid systems with heterogeneous source code or legacy systems with missing source code. In addition, this approach is based solely on information visible and understandable to domain users; therefore, it can be efficiently used by domain experts without the support of software developers. We evaluate our approach with a case study on a large-scale enterprise system, in which we demonstrate how up to 65 of the source code dependencies and 77% of the database dependencies are predicted solely based on domain information.
Resumo:
This paper describes a model of persistence in (C)LP languages and two different and practically very useful ways to implement this model in current systems. The fundamental idea is that persistence is a characteristic of certain dynamic predicates (Le., those which encapsulate state). The main effect of declaring a predicate persistent is that the dynamic changes made to such predicates persist from one execution to the next one. After proposing a syntax for declaring persistent predicates, a simple, file-based implementation of the concept is presented and some examples shown. An additional implementation is presented which stores persistent predicates in an external datábase. The abstraction of the concept of persistence from its implementation allows developing applications which can store their persistent predicates alternatively in files or databases with only a few simple changes to a declaration stating the location and modality used for persistent storage. The paper presents the model, the implementation approach in both the cases of using files and relational databases, a number of optimizations of the process (using information obtained from static global analysis and goal clustering), and performance results from an implementation of these ideas.
Resumo:
This paper describes a model of persistence in (C)LP languages and two different and practically very useful ways to implement this model in current systems. The fundamental idea is that persistence is a characteristic of certain dynamic predicates (i.e., those which encapsulate state). The main effect of declaring a predicate persistent is that the dynamic changes made to such predicates persist from one execution to the next one. After proposing a syntax for declaring persistent predicates, a simple, file-based implementation of the concept is presented and some examples shown. An additional implementation is presented which stores persistent predicates in an external database. The abstraction of the concept of persistence from its implementation allows developing applications which can store their persistent predicates alternatively in files or databases with only a few simple changes to a declaration stating the location and modality used for persistent storage. The paper presents the model, the implementation approach in both the cases of using files and relational databases, a number of optimizations of the process (using information obtained from static global analysis and goal clustering), and performance results from an implementation of these ideas.
Resumo:
Ciao is a public domain, next generation multi-paradigm programming environment with a unique set of features: Ciao offers a complete Prolog system, supporting ISO-Prolog, but its novel modular design allows both restricting and extending the language. As a result, it allows working with fully declarative subsets of Prolog and also to extend these subsets (or ISO-Prolog) both syntactically and semantically. Most importantly, these restrictions and extensions can be activated separately on each program module so that several extensions can coexist in the same application for different modules. Ciao also supports (through such extensions) programming with functions, higher-order (with predicate abstractions), constraints, and objects, as well as feature terms (records), persistence, several control rules (breadth-first search, iterative deepening, ...), concurrency (threads/engines), a good base for distributed execution (agents), and parallel execution. Libraries also support WWW programming, sockets, external interfaces (C, Java, TclTk, relational databases, etc.), etc. Ciao offers support for programming in the large with a robust module/object system, module-based separate/incremental compilation (automatically -no need for makefiles), an assertion language for declaring (optional) program properties (including types and modes, but also determinacy, non-failure, cost, etc.), automatic static inference and static/dynamic checking of such assertions, etc. Ciao also offers support for programming in the small producing small executables (including only those builtins used by the program) and support for writing scripts in Prolog. The Ciao programming environment includes a classical top-level and a rich emacs interface with an embeddable source-level debugger and a number of execution visualization tools. The Ciao compiler (which can be run outside the top level shell) generates several forms of architecture-independent and stand-alone executables, which run with speed, efficiency and executable size which are very competive with other commercial and academic Prolog/CLP systems. Library modules can be compiled into compact bytecode or C source files, and linked statically, dynamically, or autoloaded. The novel modular design of Ciao enables, in addition to modular program development, effective global program analysis and static debugging and optimization via source to source program transformation. These tasks are performed by the Ciao preprocessor ( ciaopp, distributed separately). The Ciao programming environment also includes lpdoc, an automatic documentation generator for LP/CLP programs. It processes Prolog files adorned with (Ciao) assertions and machine-readable comments and generates manuals in many formats including postscript, pdf, texinfo, info, HTML, man, etc. , as well as on-line help, ascii README files, entries for indices of manuals (info, WWW, ...), and maintains WWW distribution sites.
Resumo:
RDB2RDF systems generate RDF from relational databases, operating in two dierent manners: materializing the database content into RDF or acting as virtual RDF datastores that transform SPARQL queries into SQL. In the former, inferences on the RDF data (taking into account the ontologies that they are related to) are normally done by the RDF triple store where the RDF data is materialised and hence the results of the query answering process depend on the store. In the latter, existing RDB2RDF systems do not normally perform such inferences at query time. This paper shows how the algorithm used in the REQUIEM system, focused on handling run-time inferences for query answering, can be adapted to handle such inferences for query answering in combination with RDB2RDF systems.
Resumo:
RDB2RDF systems generate RDF from relational databases, operating in two di�erent manners: materializing the database content into RDF or acting as virtual RDF datastores that transform SPARQL queries into SQL. In the former, inferences on the RDF data (taking into account the ontologies that they are related to) are normally done by the RDF triple store where the RDF data is materialised and hence the results of the query answering process depend on the store. In the latter, existing RDB2RDF systems do not normally perform such inferences at query time. This paper shows how the algorithm used in the REQUIEM system, focused on handling run-time inferences for query answering, can be adapted to handle such inferences for query answering in combination with RDB2RDF systems.
Resumo:
El trabajo ha sido realizado dentro del marco de los proyectos EURECA (Enabling information re-Use by linking clinical REsearch and Care) e INTEGRATE (Integrative Cancer Research Through Innovative Biomedical Infrastructures), en los que colabora el Grupo de Informática Biomédica de la UPM junto a otras universidades e instituciones sanitarias europeas. En ambos proyectos se desarrollan servicios e infraestructuras con el objetivo principal de almacenar información clínica, procedente de fuentes diversas (como por ejemplo de historiales clínicos electrónicos de hospitales, de ensayos clínicos o artículos de investigación biomédica), de una forma común y fácilmente accesible y consultable para facilitar al máximo la investigación de estos ámbitos, de manera colaborativa entre instituciones. Esta es la idea principal de la interoperabilidad semántica en la que se concentran ambos proyectos, siendo clave para el correcto funcionamiento del software del que se componen. El intercambio de datos con un modelo de representación compartido, común y sin ambigüedades, en el que cada concepto, término o dato clínico tendrá una única forma de representación. Lo cual permite la inferencia de conocimiento, y encaja perfectamente en el contexto de la investigación médica. En concreto, la herramienta a desarrollar en este trabajo también está orientada a la idea de maximizar la interoperabilidad semántica, pues se ocupa de la carga de información clínica con un formato estandarizado en un modelo común de almacenamiento de datos, implementado en bases de datos relacionales. El trabajo ha sido desarrollado en el periodo comprendido entre el 3 de Febrero y el 6 de Junio de 2014. Se ha seguido un ciclo de vida en cascada para la organización del trabajo realizado en las tareas de las que se compone el proyecto, de modo que una fase no puede iniciarse sin que se haya terminado, revisado y aceptado la fase anterior. Exceptuando la tarea de documentación del trabajo (para la elaboración de esta memoria), que se ha desarrollado paralelamente a todas las demás. ----ABSTRACT--- The project has been developed during the second semester of the 2013/2014 academic year. This Project has been done inside EURECA and INTEGRATE European biomedical research projects, where the GIB (Biomedical Informatics Group) of the UPM works as a partner. Both projects aim is to develop platforms and services with the main goal of storing clinical information (e.g. information from hospital electronic health records (EHRs), clinical trials or research articles) in a common way and easy to access and query, in order to support medical research. The whole software environment of these projects is based on the idea of semantic interoperability, which means the ability of computer systems to exchange data with unambiguous and shared meaning. This idea allows knowledge inference, which fits perfectly in medical research context. The tool to develop in this project is also "semantic operability-oriented". Its purpose is to store standardized clinical information in a common data model, implemented in relational databases. The project has been performed during the period between February 3rd and June 6th, of 2014. It has followed a "Waterfall model" of software development, in which progress is seen as flowing steadily downwards through its phases. Each phase starts when its previous phase has been completed and reviewed. The task of documenting the project‟s work is an exception; it has been performed in a parallel way to the rest of the tasks.
Resumo:
Desde el inicio de los tiempos el ser humano ha tenido la necesidad de comprender y analizar todo lo que nos rodea, para ello se ha valido de diferentes herramientas como las pinturas rupestres, la biblioteca de Alejandría, bastas colecciones de libros y actualmente una enorme cantidad de información informatizada. Todo esto siempre se ha almacenado, según la tecnología de la época lo permitía, con la esperanza de que fuera útil mediante su consulta y análisis. En la actualidad continúa ocurriendo lo mismo. Hasta hace unos años se ha realizado el análisis de información manualmente o mediante bases de datos relacionales. Ahora ha llegado el momento de una nueva tecnología, Big Data, con la cual se puede realizar el análisis de extensas cantidades de datos de todo tipo en tiempos relativamente pequeños. A lo largo de este libro, se estudiarán las características y ventajas de Big Data, además de realizar un estudio de la plataforma Hadoop. Esta es una plataforma basada en Java y puede realizar el análisis de grandes cantidades de datos de diferentes formatos y procedencias. Durante la lectura de estas páginas se irá dotando al lector de los conocimientos previos necesarios para su mejor comprensión, así como de ubicarle temporalmente en el desarrollo de este concepto, de su uso, las previsiones y la evolución y desarrollo que se prevé tenga en los próximos años. ABSTRACT. Since the beginning of time, human being was in need of understanding and analyzing everything around him. In order to do that, he used different media as cave paintings, Alexandria library, big amount of book collections and nowadays massive amount of computerized information. All this information was stored, depending on the age and technology capability, with the expectation of being useful though it consulting and analysis. Nowadays they keep doing the same. In the last years, they have been processing the information manually or using relational databases. Now it is time for a new technology, Big Data, which is able to analyze huge amount of data in a, relatively, small time. Along this book, characteristics and advantages of Big Data will be detailed, so as an introduction to Hadoop platform. This platform is based on Java and can perform the analysis of massive amount of data in different formats and coming from different sources. During this reading, the reader will be provided with the prior knowledge needed to it understanding, so as the temporal location, uses, forecast, evolution and growth in the next years.
Resumo:
The W3C Linked Data Platform (LDP) candidate recom- mendation defines a standard HTTP-based protocol for read/write Linked Data. The W3C R2RML recommendation defines a language to map re- lational databases (RDBs) and RDF. This paper presents morph-LDP, a novel system that combines these two W3C standardization initiatives to expose relational data as read/write Linked Data for LDP-aware ap- plications, whilst allowing legacy applications to continue using their relational databases.