17 resultados para SQL query equivalence
em Universidad Politécnica de Madrid
Resumo:
R2RML is used to specify transformations of data available in relational databases into materialised or virtual RDF datasets. SPARQL queries evaluated against virtual datasets are translated into SQL queries according to the R2RML mappings, so that they can be evaluated over the underlying relational database engines. In this paper we describe an extension of a well-known algorithm for SPARQL to SQL translation, originally formalised for RDBMS-backed triple stores, that takes into account R2RML mappings. We present the result of our implementation using queries from a synthetic benchmark and from three real use cases, and show that SPARQL queries can be in general evaluated as fast as the SQL queries that would have been generated by SQL experts if no R2RML mappings had been used.
Resumo:
This document is the result of a process of web development to create a tool that will allow to Cracow University of Technology consult, create and manage timetables. The technologies chosen for this purpose are Apache Tomcat Server, My SQL Community Server, JDBC driver, Java Servlets and JSPs for the server side. The client part counts on Javascript, jQuery, AJAX and CSS technologies to perform the dynamism. The document will justify the choice of these technologies and will explain some development tools that help in the integration and development of all this elements: specifically, NetBeans IDE and MySQL workbench have been used as helpful tools. After explaining all the elements involved in the development of the web application, the architecture and the code developed are explained through UML diagrams. Some implementation details related to security are also deeper explained through sequence diagrams. As the source code of the application is provided, an installation manual has been developed to run the project. In addition, as the platform is intended to be a beta that will be grown, some unimplemented ideas for future development are also exposed. Finally, some annexes with important files and scripts related to the initiation of the platform are attached. This project started through an existing tool that needed to be expanded. The main purpose of the project along its development has focused on setting the roots for a whole new platform that will replace the existing one. For this goal, it has been needed to make a deep inspection on the existing web technologies: a web server and a SQL database had to be chosen. Although the alternatives were a lot, Java technology for the server was finally selected because of the big community backwards, the easiness of modelling the language through UML diagrams and the fact of being free license software. Apache Tomcat is the open source server that can use Java Servlet and JSP technology. Related to the SQL database, MySQL Community Server is the most popular open-source SQL Server, with a big community after and quite a lot of tools to manage the server. JDBC is the driver needed to put in contact Java and MySQL. Once we chose the technologies that would be part of the platform, the development process started. After a detailed explanation of the development environment installation, we used UML use case diagrams to set the main tasks of the platform; UML class diagrams served to establish the existing relations between the classes generated; the architecture of the platform was represented through UML deployment diagrams; and Enhanced entity–relationship (EER) model were used to define the tables of the database and their relationships. Apart from the previous diagrams, some implementation issues were explained to make a better understanding of the developed code - UML sequence diagrams helped to explain this. Once the whole platform was properly defined and developed, the performance of the application has been shown: it has been proved that with the current state of the code, the platform covers the use cases that were set as the main target. Nevertheless, some requisites needed for the proper working of the platform have been specified. As the project is aimed to be grown, some ideas that could not be added to this beta have been explained in order not to be missed for future development. Finally, some annexes containing important configuration issues for the platform have been added after proper explanation, as well as an installation guide that will let a new developer get the project ready. In addition to this document some other files related to the project are provided: - Javadoc. The Javadoc containing the information of every Java class created is necessary for a better understanding of the source code. - database_model.mwb. This file contains the model of the database for MySQL Workbench. This model allows, among other things, generate the MySQL script for the creation of the tables. - ScheduleManager.war. The WAR file that will allow loading the developed application into Tomcat Server without using NetBeans. - ScheduleManager.zip. The source code exported from NetBeans project containing all Java packages, JSPs, Javascript files and CSS files that are part of the platform. - config.properties. The configuration file to properly get the names and credentials to use the database, also explained in Annex II. Example of config.properties file. - db_init_script.sql. The SQL query to initiate the database explained in Annex III. SQL statements for MySQL initialization. RESUMEN. Este proyecto tiene como punto de partida la necesidad de evolución de una herramienta web existente. El propósito principal del proyecto durante su desarrollo se ha centrado en establecer las bases de una completamente nueva plataforma que reemplazará a la existente. Para lograr esto, ha sido necesario realizar una profunda inspección en las tecnologías web existentes: un servidor web y una base de datos SQL debían ser elegidos. Aunque existen muchas alternativas, la tecnología Java ha resultado ser elegida debido a la gran comunidad de desarrolladores que tiene detrás, además de la facilidad que proporciona este lenguaje a la hora de modelarlo usando diagramas UML. Tampoco hay que olvidar que es una tecnología de uso libre de licencia. Apache Tomcat es el servidor de código libre que permite emplear Java Servlets y JSPs para hacer uso de la tecnología de Java. Respecto a la base de datos SQL, el servidor más popular de código libre es MySQL, y cuenta también con una gran comunidad detrás y buenas herramientas de modelado, creación y gestión de la bases de datos. JDBC es el driver que va a permitir comunicar las aplicaciones Java con MySQL. Tras elegir las tecnologías que formarían parte de esta nueva plataforma, el proceso de desarrollo tiene comienzo. Tras una extensa explicación de la instalación del entorno de desarrollo, se han usado diagramas de caso de UML para establecer cuáles son los objetivos principales de la plataforma; los diagramas de clases nos permiten realizar una organización del código java desarrollado de modo que sean fácilmente entendibles las relaciones entre las diferentes clases. La arquitectura de la plataforma queda definida a través de diagramas de despliegue. Por último, diagramas EER van a definir las relaciones entre las tablas creadas en la base de datos. Aparte de estos diagramas, algunos detalles de implementación se van a justificar para tener una mejor comprensión del código desarrollado. Diagramas de secuencia ayudarán en estas explicaciones. Una vez que toda la plataforma haya quedad debidamente definida y desarrollada, se va a realizar una demostración de la misma: se demostrará cómo los objetivos generales han sido alcanzados con el desarrollo actual del proyecto. No obstante, algunos requisitos han sido aclarados para que la plataforma trabaje adecuadamente. Como la intención del proyecto es crecer (no es una versión final), algunas ideas que se han podido llevar acabo han quedado descritas de manera que no se pierdan. Por último, algunos anexos que contienen información importante acerca de la plataforma se han añadido tras la correspondiente explicación de su utilidad, así como una guía de instalación que va a permitir a un nuevo desarrollador tener el proyecto preparado. Junto a este documento, ficheros conteniendo el proyecto desarrollado quedan adjuntos. Estos ficheros son: - Documentación Javadoc. Contiene la información de las clases Java que han sido creadas. - database_model.mwb. Este fichero contiene el modelo de la base de datos para MySQL Workbench. Esto permite, entre otras cosas, generar el script de iniciación de la base de datos para la creación de las tablas. - ScheduleManager.war. El fichero WAR que permite desplegar la plataforma en un servidor Apache Tomcat. - ScheduleManager.zip. El código fuente exportado directamente del proyecto de Netbeans. Contiene todos los paquetes de Java generados, ficheros JSPs, Javascript y CSS que forman parte de la plataforma. - config.properties. Ejemplo del fichero de configuración que permite obtener los nombres de la base de datos - db_init_script.sql. Las consultas SQL necesarias para la creación de la base de datos.
Resumo:
RDB to RDF Mapping Language (R2RML) es una recomendación del W3C que permite especificar reglas para transformar bases de datos relacionales a RDF. Estos datos en RDF se pueden materializar y almacenar en un sistema gestor de tripletas RDF (normalmente conocidos con el nombre triple store), en el cual se pueden evaluar consultas SPARQL. Sin embargo, hay casos en los cuales la materialización no es adecuada o posible, por ejemplo, cuando la base de datos se actualiza frecuentemente. En estos casos, lo mejor es considerar los datos en RDF como datos virtuales, de tal manera que las consultas SPARQL anteriormente mencionadas se traduzcan a consultas SQL que se pueden evaluar sobre los sistemas gestores de bases de datos relacionales (SGBD) originales. Para esta traducción se tienen en cuenta los mapeos R2RML. La primera parte de esta tesis se centra en la traducción de consultas. Se propone una formalización de la traducción de SPARQL a SQL utilizando mapeos R2RML. Además se proponen varias técnicas de optimización para generar consultas SQL que son más eficientes cuando son evaluadas en sistemas gestores de bases de datos relacionales. Este enfoque se evalúa mediante un benchmark sintético y varios casos reales. Otra recomendación relacionada con R2RML es la conocida como Direct Mapping (DM), que establece reglas fijas para la transformación de datos relacionales a RDF. A pesar de que ambas recomendaciones se publicaron al mismo tiempo, en septiembre de 2012, todavía no se ha realizado un estudio formal sobre la relación entre ellas. Por tanto, la segunda parte de esta tesis se centra en el estudio de la relación entre R2RML y DM. Se divide este estudio en dos partes: de R2RML a DM, y de DM a R2RML. En el primer caso, se estudia un fragmento de R2RML que tiene la misma expresividad que DM. En el segundo caso, se representan las reglas de DM como mapeos R2RML, y también se añade la semántica implícita (relaciones de subclase, 1-N y M-N) que se puede encontrar codificada en la base de datos. Esta tesis muestra que es posible usar R2RML en casos reales, sin necesidad de realizar materializaciones de los datos, puesto que las consultas SQL generadas son suficientemente eficientes cuando son evaluadas en el sistema gestor de base de datos relacional. Asimismo, esta tesis profundiza en el entendimiento de la relación existente entre las dos recomendaciones del W3C, algo que no había sido estudiado con anterioridad. ABSTRACT. RDB to RDF Mapping Language (R2RML) is a W3C recommendation that allows specifying rules for transforming relational databases into RDF. This RDF data can be materialized and stored in a triple store, so that SPARQL queries can be evaluated by the triple store. However, there are several cases where materialization is not adequate or possible, for example, if the underlying relational database is updated frequently. In those cases, RDF data is better kept virtual, and hence SPARQL queries over it have to be translated into SQL queries to the underlying relational database system considering that the translation process has to take into account the specified R2RML mappings. The first part of this thesis focuses on query translation. We discuss the formalization of the translation from SPARQL to SQL queries that takes into account R2RML mappings. Furthermore, we propose several optimization techniques so that the translation procedure generates SQL queries that can be evaluated more efficiently over the underlying databases. We evaluate our approach using a synthetic benchmark and several real cases, and show positive results that we obtained. Direct Mapping (DM) is another W3C recommendation for the generation of RDF data from relational databases. While R2RML allows users to specify their own transformation rules, DM establishes fixed transformation rules. Although both recommendations were published at the same time, September 2012, there has not been any study regarding the relationship between them. The second part of this thesis focuses on the study of the relationship between R2RML and DM. We divide this study into two directions: from R2RML to DM, and from DM to R2RML. From R2RML to DM, we study a fragment of R2RML having the same expressive power than DM. From DM to R2RML, we represent DM transformation rules as R2RML mappings, and also add the implicit semantics encoded in databases, such as subclass, 1-N and N-N relationships. This thesis shows that by formalizing and optimizing R2RML-based SPARQL to SQL query translation, it is possible to use R2RML engines in real cases as the resulting SQL is efficient enough to be evaluated by the underlying relational databases. In addition to that, this thesis facilitates the understanding of bidirectional relationship between the two W3C recommendations, something that had not been studied before.
Resumo:
RDB2RDF systems generate RDF from relational databases, operating in two dierent manners: materializing the database content into RDF or acting as virtual RDF datastores that transform SPARQL queries into SQL. In the former, inferences on the RDF data (taking into account the ontologies that they are related to) are normally done by the RDF triple store where the RDF data is materialised and hence the results of the query answering process depend on the store. In the latter, existing RDB2RDF systems do not normally perform such inferences at query time. This paper shows how the algorithm used in the REQUIEM system, focused on handling run-time inferences for query answering, can be adapted to handle such inferences for query answering in combination with RDB2RDF systems.
Resumo:
RDB2RDF systems generate RDF from relational databases, operating in two di�erent manners: materializing the database content into RDF or acting as virtual RDF datastores that transform SPARQL queries into SQL. In the former, inferences on the RDF data (taking into account the ontologies that they are related to) are normally done by the RDF triple store where the RDF data is materialised and hence the results of the query answering process depend on the store. In the latter, existing RDB2RDF systems do not normally perform such inferences at query time. This paper shows how the algorithm used in the REQUIEM system, focused on handling run-time inferences for query answering, can be adapted to handle such inferences for query answering in combination with RDB2RDF systems.
Resumo:
El paradigma de procesamiento de eventos CEP plantea la solución al reto del análisis de grandes cantidades de datos en tiempo real, como por ejemplo, monitorización de los valores de bolsa o el estado del tráfico de carreteras. En este paradigma los eventos recibidos deben procesarse sin almacenarse debido a que el volumen de datos es demasiado elevado y a las necesidades de baja latencia. Para ello se utilizan sistemas distribuidos con una alta escalabilidad, elevado throughput y baja latencia. Este tipo de sistemas son usualmente complejos y el tiempo de aprendizaje requerido para su uso es elevado. Sin embargo, muchos de estos sistemas carecen de un lenguaje declarativo de consultas en el que expresar la computación que se desea realizar sobre los eventos recibidos. En este trabajo se ha desarrollado un lenguaje declarativo de consultas similar a SQL y un compilador que realiza la traducción de este lenguaje al lenguaje nativo del sistema de procesamiento masivo de eventos. El lenguaje desarrollado en este trabajo es similar a SQL, con el que se encuentran familiarizados un gran número de desarrolladores y por tanto aprender este lenguaje no supondría un gran esfuerzo. Así el uso de este lenguaje logra reducir los errores en ejecución de la consulta desplegada sobre el sistema distribuido al tiempo que se abstrae al programador de los detalles de este sistema.---ABSTRACT---The complex event processing paradigm CEP has become the solution for high volume data analytics which demand scalability, high throughput, and low latency. Examples of applications which use this paradigm are financial processing or traffic monitoring. A distributed system is used to achieve the performance requisites. These same requisites force the distributed system not to store the events but to process them on the fly as they are received. These distributed systems are complex systems which require a considerably long time to learn and use. The majority of such distributed systems lack a declarative language in which to express the computation to perform over incoming events. In this work, a new SQL-like declarative language and a compiler have been developed. This compiler translates this new language to the distributed system native language. Due to its similarity with SQL a vast amount of developers who are already familiar with SQL will need little time to learn this language. Thus, this language reduces the execution failures at the time the programmer no longer needs to know every single detail of the underlying distributed system to submit a query.
Resumo:
Learning the structure of a graphical model from data is a common task in a wide range of practical applications. In this paper, we focus on Gaussian Bayesian networks, i.e., on continuous data and directed acyclic graphs with a joint probability density of all variables given by a Gaussian. We propose to work in an equivalence class search space, specifically using the k-greedy equivalence search algorithm. This, combined with regularization techniques to guide the structure search, can learn sparse networks close to the one that generated the data. We provide results on some synthetic networks and on modeling the gene network of the two biological pathways regulating the biosynthesis of isoprenoids for the Arabidopsis thaliana plant
Resumo:
This paper describes the participation of DAEDALUS at the LogCLEF lab in CLEF 2011. This year, the objectives of our participation are twofold. The first topic is to analyze if there is any measurable effect on the success of the search queries if the native language and the interface language chosen by the user are different. The idea is to determine if this difference may condition the way in which the user interacts with the search application. The second topic is to analyze the user context and his/her interaction with the system in the case of successful queries, to discover out any relation among the user native language, the language of the resource involved and the interaction strategy adopted by the user to find out such resource. Only 6.89% of queries are successful out of the 628,607 queries in the 320,001 sessions with at least one search query in the log. The main conclusion that can be drawn is that, in general for all languages, whether the native language matches the interface language or not does not seem to affect the success rate of the search queries. On the other hand, the analysis of the strategy adopted by users when looking for a particular resource shows that people tend to use the simple search tool, frequently first running short queries build up of just one specific term and then browsing through the results to locate the expected resource
Resumo:
Sensor networks are increasingly being deployed in the environment for many different purposes. The observations that they produce are made available with heterogeneous schemas, vocabularies and data formats, making it difficult to share and reuse this data, for other purposes than those for which they were originally set up. The authors propose an ontology-based approach for providing data access and query capabilities to streaming data sources, allowing users to express their needs at a conceptual level, independent of implementation and language-specific details. In this article, the authors describe the theoretical foundations and technologies that enable exposing semantically enriched sensor metadata, and querying sensor observations through SPARQL extensions, using query rewriting and data translation techniques according to mapping languages, and managing both pull and push delivery modes.
Resumo:
Some verification and validation techniques have been evaluated both theoretically and empirically. Most empirical studies have been conducted without subjects, passing over any effect testers have when they apply the techniques. We have run an experiment with students to evaluate the effectiveness of three verification and validation techniques (equivalence partitioning, branch testing and code reading by stepwise abstraction). We have studied how well able the techniques are to reveal defects in three programs. We have replicated the experiment eight times at different sites. Our results show that equivalence partitioning and branch testing are equally effective and better than code reading by stepwise abstraction. The effectiveness of code reading by stepwise abstraction varies significantly from program to program. Finally, we have identified project contextual variables that should be considered when applying any verification and validation technique or to choose one particular technique.
Resumo:
Testbeds proposed so far to evaluate, compare, and eventually improve SPARQL query federation systems have still some limitations. Some variables and con�gurations that may have an impact on the behavior of these systems (e.g., network latency, data partitioning and query properties) are not su�ciently de�ned; this a�ects the results and repeatability of independent evaluation studies, and hence the insights that can be obtained from them. In this paper we evaluate FedBench, the most comprehensive testbed up to now, and empirically probe the need of considering additional dimensions and variables. The evaluation has been conducted on three SPARQL query federation systems, and the analysis of these results has allowed to uncover properties of these systems that would normally be hidden with the original testbeds.
Resumo:
Query rewriting is one of the fundamental steps in ontologybased data access (OBDA) approaches. It takes as inputs an ontology and a query written according to that ontology, and produces as an output a set of queries that should be evaluated to account for the inferences that should be considered for that query and ontology. Different query rewriting systems give support to different ontology languages with varying expressiveness, and the rewritten queries obtained as an output do also vary in expressiveness. This heterogeneity has traditionally made it difficult to compare different approaches, and the area lacks in general commonly agreed benchmarks that could be used not only for such comparisons but also for improving OBDA support. In this paper we compile data, dimensions and measurements that have been used to evaluate some of the most recent systems, we analyse and characterise these assets, and provide a unified set of them that could be used as a starting point towards a more systematic benchmarking process for such systems. Finally, we apply this initial benchmark with some of the most relevant OBDA approaches in the state of the art.
Resumo:
La gestión del conocimiento (KM) es el proceso de recolectar datos en bruto para su análisis y filtrado, con la finalidad de obtener conocimiento útil a partir de dichos datos. En este proyecto se pretende hacer un estudio sobre la gestión de la información en las redes de sensores inalámbricos como inicio para sentar las bases para la gestión del conocimiento en las mismas. Las redes de sensores inalámbricos (WSN) son redes compuestas por sensores (también conocidos como motas) distribuidos sobre un área, cuya misión es monitorizar una o varias condiciones físicas del entorno. Las redes de sensores inalámbricos se caracterizan por tener restricciones de consumo para los sensores que utilizan baterías, por su capacidad para adaptarse a cambios y ser escalables, y también por su habilidad para hacer frente a fallos en los sensores. En este proyecto se hace un estudio sobre la gestión de la información en redes de sensores inalámbricos. Se comienza introduciendo algunos conceptos básicos: arquitectura, pila de protocolos, topologías de red, etc.… Después de esto, se ha enfocado el estudio hacia TinyDB, el cual puede ser considerado como parte de las tecnologías más avanzadas en el estado del arte de la gestión de la información en redes de sensores inalámbricos. TinyDB es un sistema de procesamiento de consultas para extraer información de una red de sensores. Proporciona una interfaz similar a SQL y permite trabajar con consultas contra la red de sensores inalámbricos como si se tratara de una base de datos tradicional. Además, TinyDB implementa varias optimizaciones para manejar los datos eficientemente. En este proyecto se describe también la implementación de una sencilla aplicación basada en redes de sensores inalámbricos. Las motas en la aplicación son capaces de medir la corriente a través de un cable. El objetivo de esta aplicación es monitorizar el consumo de energía en diferentes zonas de un área industrial o doméstico, utilizando redes de sensores inalámbricas. Además, se han implementado las optimizaciones más importantes que se han aprendido en el análisis de la plataforma TinyDB. Para desarrollar esta aplicación se ha utilizado como sensores la plataforma open-source de creación de prototipos electrónicos Arduino, y el ordenador de placa reducida Raspberry Pi como coordinador. ABSTRACT. Knowledge management (KM) is the process of collecting raw data for analysis and filtering, to get a useful knowledge from this data. In this project the information management in wireless sensor networks is studied as starting point before knowledge management. Wireless sensor networks (WSN) are networks which consists of sensors (also known as motes) distributed over an area, to monitor some physical conditions of the environment. Wireless sensor networks are characterized by power consumption constrains for sensors which are using batteries, by the ability to be adaptable to changes and to be scalable, and by the ability to cope sensor failures. In this project it is studied information management in wireless sensor networks. The document starts introducing basic concepts: architecture, stack of protocols, network topology… After this, the study has been focused on TinyDB, which can be considered as part of the most advanced technologies in the state of the art of information management in wireless sensor networks. TinyDB is a query processing system for extracting information from a network of sensors. It provides a SQL-like interface and it lets us to work with queries against the wireless sensor network like if it was a traditional database. In addition, TinyDB implements a lot of optimizations to manage data efficiently. In this project, it is implemented a simple wireless sensor network application too. Application’s motes are able to measure amperage through a cable. The target of the application is, by using a wireless sensor network and these sensors, to monitor energy consumption in different areas of a house. Additionally, it is implemented the most important optimizations that we have learned from the analysis of TinyDB platform. To develop this application it is used Arduino open-source electronics prototyping platform as motes, and Raspberry Pi single-board computer as coordinator.
Resumo:
In this paper we study query answering and rewriting in ontologybased data access. Specifically, we present an algorithm for computing a perfect rewriting of unions of conjunctive queries posed over ontologies expressed in the description logic ELHIO, which covers the OWL 2 QL and OWL 2 EL profiles. The novelty of our algorithm is the use of a set of ABox dependencies, which are compiled into a so-called EBox, to limit the expansion of the rewriting. So far, EBoxes have only been used in query rewriting in the case of DL-Lite, which is less expressive than ELHIO. We have extensively evaluated our new query rewriting technique, and in this paper we discuss the tradeoff between the reduction of the size of the rewriting and the computational cost of our approach.
Resumo:
Query rewriting is one of the fundamental steps in ontologybased data access (OBDA) approaches. It takes as inputs an ontology and a query written according to that ontology, and produces as an output a set of queries that should be evaluated to account for the inferences that should be considered for that query and ontology. Different query rewriting systems give support to different ontology languages with varying expressiveness, and the rewritten queries obtained as an output do also vary in expressiveness. This heterogeneity has traditionally made it difficult to compare different approaches, and the area lacks in general commonly agreed benchmarks that could be used not only for such comparisons but also for improving OBDA support. In this paper we compile data, dimensions and measurements that have been used to evaluate some of the most recent systems, we analyse and characterise these assets, and provide a unified set of them that could be used as a starting point towards a more systematic benchmarking process for such systems. Finally, we apply this initial benchmark with some of the most relevant OBDA approaches in the state of the art.