57 results for grafana, SEPA, Plugin, RDF, SPARQL
at Universidad Politécnica de Madrid
Abstract:
We introduce SRBench, a general-purpose benchmark primarily designed for streaming RDF/SPARQL engines and based entirely on real-world data sets from the Linked Open Data cloud. Faced with a growing volume of streaming data but too few tools to extract knowledge from it, researchers have sought solutions in which Semantic Web technologies are adapted and extended for publishing, sharing, analysing and understanding streaming data. To help researchers and users compare streaming RDF/SPARQL (strRS) engines in a standardised application scenario, we have designed SRBench, with which one can assess the ability of a strRS engine to cope with a broad range of use cases typically encountered in real-world scenarios. The data sets used in the benchmark have been carefully chosen so that they represent a realistic and relevant use of streaming data. The benchmark defines a concise yet comprehensive set of queries that cover the major aspects of strRS processing. Finally, our work is complemented by a functional evaluation of three representative strRS engines: SPARQLStream, C-SPARQL and CQELS. The presented results are meant to provide a first baseline and illustrate the state of the art.
Abstract:
Two complementary benchmarks have been proposed so far for the evaluation and continuous improvement of RDF stream processors: SRBench and LSBench. They focus on different features of the evaluated systems, including coverage of the streaming extensions of SPARQL supported by each processor, query processing throughput, and an early analysis of query evaluation correctness based on comparing the results that different processors obtain for a set of queries. However, neither of them analyses the operational semantics of these processors in order to assess the correctness of query evaluation results. In this paper, we propose a characterization of the operational semantics of RDF stream processors, adapting two well-known models from the stream processing engine community: CQL and SECRET. Through this formalization, we address correctness in RDF stream processor benchmarks, allowing us to determine the multiple answers that systems should provide. Finally, we present CSRBench, an extension of SRBench that verifies query result correctness using an automatic method.
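As a minimal illustration of why operational semantics matter for correctness, the Python sketch below computes the contents of a time-based sliding window over a timestamped triple stream under two boundary conventions. The stream layout, the function names and the `left_closed` flag are hypothetical; the sketch does not reproduce CQL or SECRET, it only shows how one parameter that such models make explicit (whether a window boundary is open or closed) changes which triples a query sees.

```python
from typing import List, Tuple

# Hypothetical stream element: (subject, predicate, object, timestamp in seconds).
StreamItem = Tuple[str, str, str, int]
Triple = Tuple[str, str, str]


def in_window(ts: int, close: int, width: int, left_closed: bool) -> bool:
    """(close - width, close] by default; [close - width, close) if left_closed."""
    if left_closed:
        return close - width <= ts < close
    return close - width < ts <= close


def window_contents(stream: List[StreamItem], width: int, slide: int, t0: int,
                    t_end: int, left_closed: bool = False) -> List[Tuple[int, List[Triple]]]:
    """Triples in each window closing at t0, t0 + slide, ... up to t_end."""
    results = []
    close = t0
    while close <= t_end:
        content = [(s, p, o) for (s, p, o, ts) in stream
                   if in_window(ts, close, width, left_closed)]
        results.append((close, content))
        close += slide
    return results


stream = [("s1", "p", "o1", 1), ("s2", "p", "o2", 4)]
print(window_contents(stream, width=2, slide=2, t0=2, t_end=4))
print(window_contents(stream, width=2, slide=2, t0=2, t_end=4, left_closed=True))
```

With a width and slide of 2, the triple timestamped 4 falls in the window closing at 4 under the default convention but is deferred to a later window under the left-closed one, so two engines that are each internally consistent can return different answers for the same query.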
Abstract:
This work develops a REST service that transforms natural-language sentences into RDF graphs. The generated graphs are directed: nodes are formed from the nouns or adjectives of the sentences, and arcs from the verbs. The service is used within the p-medicine project to support the following functionality. Natural-language search: the p-medicine platform currently provides a programmatic interface for SPARQL queries, and the service would allow those queries to be generated automatically from natural-language sentences.
Natural-language database annotation: the p-medicine platform includes a tool, developed by the Biomedical Engineering Group of the Universidad Politécnica de Madrid, for annotating RDF databases. These annotations are required to subsequently translate the databases into a central schema. The annotation process requires users to build by hand the RDF views they want to annotate, which means displaying the RDF schema graphically and having the user construct views by selecting the required classes and relations. This process is often complex and too difficult for non-technical users. The system will be integrated so that these views can be built using natural language.
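The node/arc rule described above can be sketched in a few lines. This is a hypothetical simplification, not the service's actual algorithm: it assumes the sentence has already been tagged with coarse parts of speech (a real service would run a POS tagger), and it links each verb to the nearest noun or adjective on either side.

```python
from typing import List, Optional, Tuple

# Hypothetical input format: (word, coarse part-of-speech tag) pairs.
Token = Tuple[str, str]
NODE_TAGS = {"NOUN", "ADJ"}  # nouns and adjectives become graph nodes


def sentence_to_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Each verb becomes an arc from the nearest node on its left to the nearest node on its right."""
    triples = []
    for i, (word, tag) in enumerate(tokens):
        if tag != "VERB":
            continue
        subject: Optional[str] = next(
            (w for w, t in reversed(tokens[:i]) if t in NODE_TAGS), None)
        obj: Optional[str] = next(
            (w for w, t in tokens[i + 1:] if t in NODE_TAGS), None)
        if subject is not None and obj is not None:
            triples.append((subject, word, obj))
    return triples


tagged = [("doctor", "NOUN"), ("treats", "VERB"), ("patient", "NOUN"),
          ("who", "PRON"), ("has", "VERB"), ("cancer", "NOUN")]
print(sentence_to_triples(tagged))
```

Each resulting (node, verb, node) triple corresponds to one directed arc of the generated graph; mapping words to ontology IRIs is left out of this sketch.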
Abstract:
R2RML is used to specify transformations of data available in relational databases into materialised or virtual RDF datasets. SPARQL queries evaluated against virtual datasets are translated into SQL queries according to the R2RML mappings, so that they can be evaluated by the underlying relational database engines. In this paper we describe an extension of a well-known algorithm for SPARQL-to-SQL translation, originally formalised for RDBMS-backed triple stores, that takes R2RML mappings into account. We present the results of our implementation using queries from a synthetic benchmark and from three real use cases, and show that SPARQL queries can in general be evaluated as fast as the SQL queries that SQL experts would have written if no R2RML mappings had been used.
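To make mapping-driven translation concrete, the sketch below turns a star-shaped basic graph pattern (several triple patterns sharing one subject variable) into a single SQL query over one table. The `TriplesMap` class is a hypothetical, heavily simplified stand-in for an R2RML triples map (one table, a subject IRI template over a single column, and predicate-to-column pairs), and the EMP/ENAME/JOB names are illustrative; this is not the extended translation algorithm the paper describes.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TriplesMap:
    """Hypothetical, simplified stand-in for an R2RML triples map."""
    table: str                   # logical table the rows come from
    subject_template: str        # IRI template, e.g. "http://example.org/emp/{EMPNO}"
    subject_column: str          # single column substituted into the template
    predicates: Dict[str, str]   # predicate IRI -> column holding the object value


def subject_iri(tm: TriplesMap, value) -> str:
    """Build the subject IRI for one row value of the subject column."""
    return tm.subject_template.replace("{" + tm.subject_column + "}", str(value))


def translate_star(tm: TriplesMap, subject_var: str, patterns: Dict[str, str]) -> str:
    """Translate {?s p1 ?o1 . ?s p2 ?o2 ...} into one SELECT over tm.table (no self-joins)."""
    columns = [f"{tm.subject_column} AS {subject_var}"]
    conditions = []
    for predicate, var in patterns.items():
        col = tm.predicates[predicate]           # KeyError if the predicate is unmapped
        columns.append(f"{col} AS {var}")
        conditions.append(f"{col} IS NOT NULL")  # NULL cells generate no triple
    sql = f"SELECT {', '.join(columns)} FROM {tm.table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


emp = TriplesMap("EMP", "http://example.org/emp/{EMPNO}", "EMPNO",
                 {"http://example.org/name": "ENAME", "http://example.org/job": "JOB"})
print(translate_star(emp, "s", {"http://example.org/name": "name",
                                "http://example.org/job": "job"}))
print(subject_iri(emp, 7369))
```

Emitting one SELECT over EMP, rather than one subquery per triple pattern joined on the subject, is the kind of query an SQL expert would write by hand; the subject values returned by the database are turned into IRIs afterwards with `subject_iri`.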
Abstract:
RDB2RDF systems generate RDF from relational databases in one of two ways: by materializing the database content into RDF, or by acting as virtual RDF datastores that transform SPARQL queries into SQL. In the former case, inferences on the RDF data (taking into account the ontologies the data relate to) are normally performed by the RDF triple store where the data are materialised, so the results of query answering depend on that store. In the latter case, existing RDB2RDF systems do not normally perform such inferences at query time. This paper shows how the algorithm used in the REQUIEM system, which handles run-time inferences for query answering, can be adapted to perform such inferences in combination with RDB2RDF systems.
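The general idea of query-time inference by rewriting can be illustrated with a deliberately small case. The sketch below handles only atomic subclass axioms ("A subClassOf B") and rewrites each class atom of a conjunctive query into each of its subclasses, producing a union of queries that a virtual RDB2RDF system could then translate to SQL without materialising inferred triples. REQUIEM itself supports far more expressive ontology languages; the function names and data layout here are hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple

Atom = Tuple[str, str]   # class atom C(x) as (class name, variable)
Axiom = Tuple[str, str]  # (sub, sup) meaning sub subClassOf sup


def subclasses(cls: str, axioms: List[Axiom]) -> Set[str]:
    """cls plus every class that is (transitively) declared a subclass of it."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for sub, sup in axioms:
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return result


def rewrite(query: List[Atom], axioms: List[Axiom]) -> List[List[Atom]]:
    """Union of conjunctive queries: replace each atom's class by each of its subclasses."""
    options = [sorted(subclasses(cls, axioms)) for cls, _ in query]
    return [[(cls, var) for cls, (_, var) in zip(choice, query)]
            for choice in product(*options)]


axioms = [("Surgeon", "Doctor"), ("Doctor", "Person")]
for q in rewrite([("Person", "x")], axioms):
    print(q)
```

Asking for `Person(x)` thus becomes three queries, over `Doctor`, `Person` and `Surgeon`, each of which can be evaluated directly against the relational data through the mappings.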
Abstract:
This paper introduces a semantic language designed for use in a semantic analyzer based on linguistic and world knowledge. Linguistic knowledge is provided by a Combinatorial Dictionary and several sets of rules; extra-linguistic information is stored in an Ontology. The meaning of a text is represented as a series of RDF-type triples of the form predicate(subject, object). The semantic analyzer is one of the options of the multifunctional ETAP-3 linguistic processor and can be used for Information Extraction and Question Answering. We describe the semantic representation of expressions that assess the number of objects involved and/or give a quantitative evaluation of different types of attributes. We focus on the following aspects: 1) parametric and non-parametric attributes; 2) gradable and non-gradable attributes; 3) ontological representation of different classes of attributes; 4) absolute and relative quantitative assessment; 5) punctual and interval quantitative assessment; 6) intervals with precise and fuzzy boundaries.
Abstract:
Ontology antipatterns are structures that reflect ontology modelling problems; they lead to inconsistencies, poor reasoning performance or poor formalisation of domain knowledge. Antipatterns typically appear in ontologies developed by people who are not experts in ontology engineering. Based on our experience in ontology design, we previously created a catalogue of such antipatterns; in this paper we describe how SPARQL-DL can be used to detect them. We conduct experiments to detect them in a large corpus of OWL ontologies obtained from the Watson ontology search portal. Our results show that each antipattern needs a specialised detection method.
Abstract:
Ontology antipatterns are structures that reflect ontology modelling problems because they lead to inconsistencies, poor reasoning performance or poor formalisation of domain knowledge. We propose four methods for detecting antipatterns using SPARQL queries. We conduct experiments to detect antipatterns in a corpus of OWL ontologies.
Abstract:
The testbeds proposed so far to evaluate, compare, and eventually improve SPARQL query federation systems still have some limitations. Some variables and configurations that may affect the behavior of these systems (e.g., network latency, data partitioning and query properties) are not sufficiently defined; this affects the results and repeatability of independent evaluation studies, and hence the insights that can be obtained from them. In this paper we evaluate FedBench, the most comprehensive testbed to date, and empirically show the need to consider additional dimensions and variables. The evaluation was conducted on three SPARQL query federation systems, and the analysis of its results uncovered properties of these systems that would normally remain hidden with the original testbeds.
Abstract:
Given the sustained growth in the number of available SPARQL endpoints, the need to send federated SPARQL queries across them has also grown. To address this use case, the W3C SPARQL working group is defining a federation extension for SPARQL 1.1 that allows graph patterns evaluated over several endpoints to be combined within a single query. In this paper, we describe the syntax of that extension and formalize its semantics. Additionally, we describe how a query evaluation system can be implemented for that extension, presenting some static optimization techniques and reusing a query engine used for data-intensive science so as to deal with large amounts of intermediate and final results. Finally, we carry out a series of experiments showing that our optimizations speed up federated query evaluation.
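A central piece of the semantics of the SERVICE keyword in SPARQL 1.1 is that the solutions obtained from the remote endpoint are joined with the local ones on shared variables. The sketch below shows that join over solution mappings represented as dictionaries. It is a naive nested-loop version for illustration only: remote evaluation is simulated with a fixed list, and the endpoint URL and vocabulary in the sample query are hypothetical.

```python
from typing import Dict, List

Mapping = Dict[str, str]  # solution mapping: variable name -> RDF term

# Illustrative query (hypothetical endpoint and vocabulary), not executed here:
QUERY = """
SELECT ?drug ?name ?target WHERE {
  ?drug <http://example.org/name> ?name .
  SERVICE <http://example.org/sparql> { ?drug <http://example.org/target> ?target }
}
"""


def compatible(m1: Mapping, m2: Mapping) -> bool:
    """Two mappings are compatible if they agree on every variable they share."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)


def join(left: List[Mapping], right: List[Mapping]) -> List[Mapping]:
    """Nested-loop join of solution mappings, following SPARQL's Join definition."""
    return [{**m1, **m2} for m1 in left for m2 in right if compatible(m1, m2)]


local = [{"drug": "ex:d1", "name": "Aspirin"}, {"drug": "ex:d2", "name": "Ibuprofen"}]
remote = [{"drug": "ex:d1", "target": "ex:t9"}]  # stands in for the SERVICE results
print(join(local, remote))
```

Mappings that share no variables are always compatible, so the join degenerates to a cross product; this is one reason why, with large remote result sets, the order and placement of SERVICE patterns matter for performance.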
Abstract:
In recent years, the relentless growth of biomedical data sources, driven by the development of high-throughput data generation techniques (mainly in genomics) and by the spread of technologies for communicating and sharing information, has led biomedical research to rely almost exclusively on the distributed analysis of information and on finding relationships between different data sources. This is a complex task because of the heterogeneity of the data sources involved (in formats, technologies, or domain models). Some existing work aims to homogenize these sources so that information can be presented in an integrated way, as if it came from a single database. However, no existing work fully automates this process of semantic integration. There are two main approaches to integrating heterogeneous data sources: centralized and distributed. Both require translating data from one model to another. This task relies on formalizations of the semantic relationships between the underlying models and the central model, commonly called annotations. In the context of semantic information integration, database annotations define relationships between terms with the same meaning, so that information can be translated automatically. Depending on the problem at hand, these relationships hold between individual concepts or between entire sets of concepts (views); this work focuses on the latter. The European project p-medicine (FP7-ICT-2009-270089) follows the centralized approach, uses view-based annotations, and models its databases in RDF.
The data extracted from the different sources are translated and integrated into a Data Warehouse. Within the p-medicine platform, the Biomedical Informatics Group (GIB) of the Universidad Politécnica de Madrid, where I carried out this work, provides a tool for generating the required annotations of the RDF databases. This tool, called Ontology Annotator, allows view-based annotations to be created manually. However, although it displays the data sources to be annotated graphically, most users find it hard to use and spend too much time on the annotation process. Hence the need for a more advanced tool that can assist users in annotating databases in p-medicine. The goal is to automate the most complex parts of the annotation process and to present information about RDF database annotations in a natural, understandable way. This tool, called Ontology Annotator Assistant, is the subject of this work, which describes its design and development as well as several novel algorithms the author created for it. The tool offers functionality not previously available in any other tool in the field of automatic annotation and semantic integration of databases.
Abstract:
The Web of Data currently comprises about 62 billion triples from more than 2,000 different datasets covering many fields of knowledge. This volume of structured Linked Data can be seen as a particular case of Big Data, referred to as Big Semantic Data [4]. Powerful computational configurations have traditionally been required to deal with the scalability problems that Big Semantic Data raises. Unsurprisingly, this "data revolution" has run in parallel with the growth of mobile computing. Smartphones and tablets are massively used at the expense of traditional computers but, to date, mobile devices have more limited computational resources. This raises a question: can (potentially large) semantic datasets be consumed natively on mobile devices? Currently, only a few mobile apps (e.g., [1, 9, 2, 8]) use semantic data stored on the device itself, while many others access existing SPARQL endpoints or Linked Data directly. There are two main reasons for this. On the one hand, in spite of some initial approaches [6, 3], there are no well-established triplestores for mobile devices. This is an important limitation because any potential app must take care of both RDF storage and SPARQL resolution. On the other hand, the particular features of these devices (little storage space, less computational power, more limited bandwidth) limit the adoption of semantic data for different uses and purposes.
This paper introduces our HDTourist mobile application prototype. It consumes urban data from DBpedia to help tourists visiting a foreign city. Although it is a simple app, its functionality illustrates how semantic data can be stored and queried with limited resources. Our prototype is implemented for Android, but its foundations, explained in Section 2, can be deployed on any other platform. The app is described in Section 3, and Section 4 summarizes our current achievements and outlines future work.
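A common way to keep RDF compact enough for constrained devices is dictionary encoding: each distinct term is replaced by an integer ID, and the ID-triples are kept sorted so that lookups by subject reduce to a binary search. HDT (Header-Dictionary-Triples), which the prototype's name suggests, builds on this idea; the class below is only a simplified, hypothetical sketch and does not follow HDT's actual dictionary sections or compressed triple layout. The DBpedia-style terms are illustrative.

```python
import bisect
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


class CompactTripleStore:
    """Hypothetical dictionary-encoded triple store (a simplification, not HDT's layout)."""

    def __init__(self, triples: List[Triple]) -> None:
        # Dictionary: every distinct term gets an integer ID (its rank in sorted order).
        self.id_to_term = sorted({term for triple in triples for term in triple})
        self.term_to_id = {term: i for i, term in enumerate(self.id_to_term)}
        # Triples: sorted, de-duplicated ID tuples, so all triples of a subject are contiguous.
        self.ids = sorted({tuple(self.term_to_id[t] for t in triple) for triple in triples})

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Triples matching the pattern; None acts as a variable."""
        candidates = self.ids
        if s is not None:
            if s not in self.term_to_id:
                return []
            sid = self.term_to_id[s]
            # (sid,) sorts before every (sid, p, o); binary search finds the subject's block.
            lo = bisect.bisect_left(self.ids, (sid,))
            hi = bisect.bisect_left(self.ids, (sid + 1,))
            candidates = self.ids[lo:hi]
        result = []
        for sid_, pid, oid in candidates:
            triple = (self.id_to_term[sid_], self.id_to_term[pid], self.id_to_term[oid])
            if (p is None or triple[1] == p) and (o is None or triple[2] == o):
                result.append(triple)
        return result


store = CompactTripleStore([
    ("dbr:Madrid", "dbo:country", "dbr:Spain"),
    ("dbr:Prado_Museum", "dbo:location", "dbr:Madrid"),
    ("dbr:Madrid", "rdfs:label", "Madrid"),
])
print(store.match(s="dbr:Madrid"))
```

Patterns with a bound subject only touch one contiguous block of IDs; other patterns fall back to a scan here, whereas a real compact format would add further indexes for them.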
Abstract:
In recent years, the amount of real-time data being generated has increased. Sensors attached to things are transforming how we interact with our environment. Extracting meaningful information from these data streams is essential in some application areas and requires processing systems that scale to varying conditions in data sources, complex queries, and system failures. This paper describes ongoing research on the development of a scalable RDF streaming engine.
Abstract:
In many applications (such as social or sensor networks), the information generated can be represented as a continuous stream of RDF items, where each item describes an application event (a social network post, a sensor measurement, etc.). In this paper we focus on compressing RDF streams. In particular, we propose an approach for lossless RDF stream compression, named RDSZ (RDF Differential Stream compressor based on Zlib). This approach takes advantage of the structural similarities among items in a stream by combining a differential item encoding mechanism with the general-purpose stream compressor Zlib. An empirical evaluation using several RDF stream datasets shows that this combination improves compression ratios compared with using Zlib alone.
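Combining a differential item encoding with Zlib can be sketched as follows. In this simplified, hypothetical encoding (not RDSZ's actual scheme), each item is a list of N-Triples-style lines, and a line identical to the line at the same position in the previous item is replaced by a one-character back-reference before the whole stream is compressed with Python's standard `zlib` module. The sketch only shows that such a scheme remains lossless.

```python
import zlib
from typing import List

Item = List[str]  # one stream item as N-Triples-style lines (no embedded newlines)


def diff_encode(items: List[Item]) -> str:
    """Replace each line equal to the previous item's line at the same position by '='."""
    out: List[str] = []
    prev: Item = []
    for item in items:
        out.append(f"#{len(item)}")          # header: number of lines in this item
        for i, line in enumerate(item):
            if i < len(prev) and prev[i] == line:
                out.append("=")              # back-reference to the previous item
            else:
                out.append("+" + line)       # literal line, prefixed to stay unambiguous
        prev = item
    return "\n".join(out)


def diff_decode(encoded: str) -> List[Item]:
    """Inverse of diff_encode."""
    if not encoded:
        return []
    lines = encoded.split("\n")
    items: List[Item] = []
    prev: Item = []
    pos = 0
    while pos < len(lines):
        count = int(lines[pos][1:])
        pos += 1
        item = []
        for i in range(count):
            entry = lines[pos]
            pos += 1
            item.append(prev[i] if entry == "=" else entry[1:])
        items.append(item)
        prev = item
    return items


def compress(items: List[Item]) -> bytes:
    return zlib.compress(diff_encode(items).encode("utf-8"))


def decompress(data: bytes) -> List[Item]:
    return diff_decode(zlib.decompress(data).decode("utf-8"))


stream = [
    ['<sensor1> <observes> <temperature> .', '<sensor1> <value> "20.5" .'],
    ['<sensor1> <observes> <temperature> .', '<sensor1> <value> "21.0" .'],
]
print(diff_encode(stream))
```

The gain of such a scheme depends on how often consecutive items repeat lines at the same positions; comparing `len(compress(stream))` with `len(zlib.compress(...))` on the plain serialization is a simple way to check it for a given dataset.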