61 resultados para OWL web ontology language
em Universidad Politécnica de Madrid
Resumo:
Although the computational complexity of the logic underlying the standard OWL 2 for the Web Ontology Language (OWL) appears discouraging for real applications, several contributions have shown that reasoning with OWL ontologies is feasible in practice. It turns out that reasoning in practice is often far less complex than is suggested by the established theoretical complexity bound, which reflects the worstcase scenario. State-of-the reasoners like FACT++, HERMIT, PELLET and RACER have demonstrated that, even with fairly expressive fragments of OWL 2, acceptable performances can be achieved. However, it is still not well understood why reasoning is feasible in practice and it is rather unclear how to study this problem. In this paper, we suggest first steps that in our opinion could lead to a better understanding of practical complexity. We also provide and discuss some initial empirical results with HERMIT on prominent ontologies
Resumo:
The conformance of semantic technologies has to be systematically evaluated to measure and verify the real adherence of these technologies to the Semantic Web standards. Currente valuations of semantic technology conformance are not exhaustive enough and do not directly cover user requirements and use scenarios, which raises the need for a simple, extensible and parameterizable method to generate test data for such evaluations. To address this need, this paper presents a keyword-driven approach for generating ontology language conformance test data that can be used to evaluate semantic technologies, details the definition of a test suite for evaluating OWL DL conformance using this approach,and describes the use and extension of this test suite during the evaluation of some tools.
Resumo:
In the context of the Semantic Web, natural language descriptions associated with ontologies have proven to be of major importance not only to support ontology developers and adopters, but also to assist in tasks such as ontology mapping, information extraction, or natural language generation. In the state-of-the-art we find some attempts to provide guidelines for URI local names in English, and also some disagreement on the use of URIs for describing ontology elements. When trying to extrapolate these ideas to a multilingual scenario, some of these approaches fail to provide a valid solution. On the basis of some real experiences in the translation of ontologies from English into Spanish, we provide a preliminary set of guidelines for naming and labeling ontologies in a multilingual scenario.
Resumo:
The W3C Semantic Sensor Network Incubator group (the SSN-XG) produced an OWL 2 ontology to describe sensors and observations ? the SSN ontology, available at http://purl.oclc.org/NET/ssnx/ssn. The SSN ontology can describe sensors in terms of capabilities, measurement processes, observations and deployments. This article describes the SSN ontology. It further gives an example and describes the use of the ontology in recent research projects.
Resumo:
Internet está evolucionando hacia la conocida como Live Web. En esta nueva etapa en la evolución de Internet, se pone al servicio de los usuarios multitud de streams de datos sociales. Gracias a estas fuentes de datos, los usuarios han pasado de navegar por páginas web estáticas a interacturar con aplicaciones que ofrecen contenido personalizado, basada en sus preferencias. Cada usuario interactúa a diario con multiples aplicaciones que ofrecen notificaciones y alertas, en este sentido cada usuario es una fuente de eventos, y a menudo los usuarios se sienten desbordados y no son capaces de procesar toda esa información a la carta. Para lidiar con esta sobresaturación, han aparecido múltiples herramientas que automatizan las tareas más habituales, desde gestores de bandeja de entrada, gestores de alertas en redes sociales, a complejos CRMs o smart-home hubs. La contrapartida es que aunque ofrecen una solución a problemas comunes, no pueden adaptarse a las necesidades de cada usuario ofreciendo una solucion personalizada. Los Servicios de Automatización de Tareas (TAS de sus siglas en inglés) entraron en escena a partir de 2012 para dar solución a esta liminación. Dada su semejanza, estos servicios también son considerados como un nuevo enfoque en la tecnología de mash-ups pero centra en el usuarios. Los usuarios de estas plataformas tienen la capacidad de interconectar servicios, sensores y otros aparatos con connexión a internet diseñando las automatizaciones que se ajustan a sus necesidades. La propuesta ha sido ámpliamante aceptada por los usuarios. Este hecho ha propiciado multitud de plataformas que ofrecen servicios TAS entren en escena. Al ser un nuevo campo de investigación, esta tesis presenta las principales características de los TAS, describe sus componentes, e identifica las dimensiones fundamentales que los defines y permiten su clasificación. En este trabajo se acuña el termino Servicio de Automatización de Tareas (TAS) dando una descripción formal para estos servicios y sus componentes (llamados canales), y proporciona una arquitectura de referencia. De igual forma, existe una falta de herramientas para describir servicios de automatización, y las reglas de automatización. A este respecto, esta tesis propone un modelo común que se concreta en la ontología EWE (Evented WEb Ontology). Este modelo permite com parar y equiparar canales y automatizaciones de distintos TASs, constituyendo un aporte considerable paraa la portabilidad de automatizaciones de usuarios entre plataformas. De igual manera, dado el carácter semántico del modelo, permite incluir en las automatizaciones elementos de fuentes externas sobre los que razonar, como es el caso de Linked Open Data. Utilizando este modelo, se ha generado un dataset de canales y automatizaciones, con los datos obtenidos de algunos de los TAS existentes en el mercado. Como último paso hacia el lograr un modelo común para describir TAS, se ha desarrollado un algoritmo para aprender ontologías de forma automática a partir de los datos del dataset. De esta forma, se favorece el descubrimiento de nuevos canales, y se reduce el coste de mantenimiento del modelo, el cual se actualiza de forma semi-automática. En conclusión, las principales contribuciones de esta tesis son: i) describir el estado del arte en automatización de tareas y acuñar el término Servicio de Automatización de Tareas, ii) desarrollar una ontología para el modelado de los componentes de TASs y automatizaciones, iii) poblar un dataset de datos de canales y automatizaciones, usado para desarrollar un algoritmo de aprendizaje automatico de ontologías, y iv) diseñar una arquitectura de agentes para la asistencia a usuarios en la creación de automatizaciones. ABSTRACT The new stage in the evolution of the Web (the Live Web or Evented Web) puts lots of social data-streams at the service of users, who no longer browse static web pages but interact with applications that present them contextual and relevant experiences. Given that each user is a potential source of events, a typical user often gets overwhelmed. To deal with that huge amount of data, multiple automation tools have emerged, covering from simple social media managers or notification aggregators to complex CRMs or smart-home Hub/Apps. As a downside, they cannot tailor to the needs of every single user. As a natural response to this downside, Task Automation Services broke in the Internet. They may be seen as a new model of mash-up technology for combining social streams, services and connected devices from an end-user perspective: end-users are empowered to connect those stream however they want, designing the automations they need. The numbers of those platforms that appeared early on shot up, and as a consequence the amount of platforms following this approach is growing fast. Being a novel field, this thesis aims to shed light on it, presenting and exemplifying the main characteristics of Task Automation Services, describing their components, and identifying several dimensions to classify them. This thesis coins the term Task Automation Services (TAS) by providing a formal definition of them, their components (called channels), as well a TAS reference architecture. There is also a lack of tools for describing automation services and automations rules. In this regard, this thesis proposes a theoretical common model of TAS and formalizes it as the EWE ontology This model enables to compare channels and automations from different TASs, which has a high impact in interoperability; and enhances automations providing a mechanism to reason over external sources such as Linked Open Data. Based on this model, a dataset of components of TAS was built, harvesting data from the web sites of actual TASs. Going a step further towards this common model, an algorithm for categorizing them was designed, enabling their discovery across different TAS. Thus, the main contributions of the thesis are: i) surveying the state of the art on task automation and coining the term Task Automation Service; ii) providing a semantic common model for describing TAS components and automations; iii) populating a categorized dataset of TAS components, used to learn ontologies of particular domains from the TAS perspective; and iv) designing an agent architecture for assisting users in setting up automations, that is aware of their context and acts in consequence.
Resumo:
Dada la amplia información que rodea el dominio de programas de estudios superiores, específicamente los estudios de grado, este trabajo fin de grado propone la construcción de un modelo para representar dicha información mediante la construcción de una red de ontologías, proporcionando una definición común de conceptos importantes, y que posteriormente puede ser reutilizada para la construcción de aplicaciones que ayuden a las partes interesadas, como, estudiantes, personal académico y administrativo, a la búsqueda y acceso de información oportuna. Para la construcción de esta red de ontologías, se siguen las recomendaciones y pautas propuestas por la metodología NeOn [1a] [1b], que sigue un paradigma basado en la reutilización de recursos de conocimiento. Por otra parte, se realiza una populación de dicha red de ontologías mediante datos específicos del grado en Ingeniería Informática de la Universidad Politécnica de Madrid. La construcción de una red de ontologías siguiendo las directrices de la metodología NeOn requiere la realización de distintas actividades y tareas, como el estudio del dominio, estudio de la viabilidad, especificación de requisitos, conceptualización, formalización, implementación y mantenimiento. Se realizan también muchas otras actividades y tareas dependiendo del contexto en el que se construye la ontología. En este proyecto se hace un especial énfasis en las actividades de especificación de requisitos, conceptualización e implementación, además de la actividad de búsqueda de recursos ontológicos para su posterior reutilización. Se ha construido una red de ontologías llamada: European Bachelor Degree Ontology (EBDO) que incluye términos y conceptos importantes que se han detectado en la etapa de especificación de requisitos y que las ontologías a reutilizar no contemplan. Las decisiones de diseño para la construcción de esta nueva red de ontologías y su alineamiento con las ontologías a reutilizar se han basado en la especificación de requisitos ontológicos. Una vez definidos los conceptos relevantes de la red de ontologías, se ha implementado la red de ontologías en un lenguaje computable. Una vez que la red de ontologías se ha implementado se han realizado tareas de evaluación para corregir posible errores. Finalmente, cuando se ha obtenido una versión estable de la ontología, se ha realizado la instanciación de individuos del plan de estudios del grado en Ingeniería Informática de la Universidad Politécnica de Madrid.---ABSTRACT---Given the extensive information surrounding the domain of higher education programs, specifically bachelor degree studies, this bachelor degree project proposes the construction of a model to represent this information by building an ontology network, providing a common definition of important concepts. This network can be reused to build semantic applications that help stakeholders, such as, students, academic and organisational staff, to search and access to timely information. For the construction of this network of ontologies, guidelines and recommendations proposed by the NeOn Methodology [1a] [1b] have been followed. This methodology follows a paradigm based on the reuse of knowledge resources. Moreover, a population of this ontology network is performed with specific data of the Computer Science Degree from Universidad Politécnica de Madrid Building a network of ontologies following the guidelines of the NeOn Methodology requires the completion of various activities and tasks such as, the study of the domain, study of the feasibility, requirements specification, conceptualization, formalization, implementation and maintenance. Many other activities and tasks are also performed depending on the context in which the ontology is built. In this project a special emphasis is made on the activities of requirements specification, conceptualization and search of ontological resources for reuse and implementation. A new network of ontologies, named European Bachelor Degree Ontology (EBDO), has been built. This network includes terms and concepts that have been detected at the stage of requirements specification and that the reused ontologies have not contemplated. Design principles for the construction of this new network of ontologies and for the reused ontologies alignment have been based on the ontological specification requirements. Once the relevant concepts of the ontology network are defined, the network has been implemented in a computable ontology language. Once the network ontology is implemented, the evaluation activity has been conducted to correct the errors that the network presented. Finally, when a stable version of the ontology has been obtained, the instantiation of individuals of the study program of the Bachelor Degree in Computer Science from Universidad Politécnica de Madrid has been performed.
Resumo:
OntoTag - A Linguistic and Ontological Annotation Model Suitable for the Semantic Web
1. INTRODUCTION. LINGUISTIC TOOLS AND ANNOTATIONS: THEIR LIGHTS AND SHADOWS
Computational Linguistics is already a consolidated research area. It builds upon the results of other two major ones, namely Linguistics and Computer Science and Engineering, and it aims at developing computational models of human language (or natural language, as it is termed in this area). Possibly, its most well-known applications are the different tools developed so far for processing human language, such as machine translation systems and speech recognizers or dictation programs.
These tools for processing human language are commonly referred to as linguistic tools. Apart from the examples mentioned above, there are also other types of linguistic tools that perhaps are not so well-known, but on which most of the other applications of Computational Linguistics are built. These other types of linguistic tools comprise POS taggers, natural language parsers and semantic taggers, amongst others. All of them can be termed linguistic annotation tools.
Linguistic annotation tools are important assets. In fact, POS and semantic taggers (and, to a lesser extent, also natural language parsers) have become critical resources for the computer applications that process natural language. Hence, any computer application that has to analyse a text automatically and ‘intelligently’ will include at least a module for POS tagging. The more an application needs to ‘understand’ the meaning of the text it processes, the more linguistic tools and/or modules it will incorporate and integrate.
However, linguistic annotation tools have still some limitations, which can be summarised as follows:
1. Normally, they perform annotations only at a certain linguistic level (that is, Morphology, Syntax, Semantics, etc.).
2. They usually introduce a certain rate of errors and ambiguities when tagging. This error rate ranges from 10 percent up to 50 percent of the units annotated for unrestricted, general texts.
3. Their annotations are most frequently formulated in terms of an annotation schema designed and implemented ad hoc.
A priori, it seems that the interoperation and the integration of several linguistic tools into an appropriate software architecture could most likely solve the limitations stated in (1). Besides, integrating several linguistic annotation tools and making them interoperate could also minimise the limitation stated in (2). Nevertheless, in the latter case, all these tools should produce annotations for a common level, which would have to be combined in order to correct their corresponding errors and inaccuracies. Yet, the limitation stated in (3) prevents both types of integration and interoperation from being easily achieved.
In addition, most high-level annotation tools rely on other lower-level annotation tools and their outputs to generate their own ones. For example, sense-tagging tools (operating at the semantic level) often use POS taggers (operating at a lower level, i.e., the morphosyntactic) to identify the grammatical category of the word or lexical unit they are annotating. Accordingly, if a faulty or inaccurate low-level annotation tool is to be used by other higher-level one in its process, the errors and inaccuracies of the former should be minimised in advance. Otherwise, these errors and inaccuracies would be transferred to (and even magnified in) the annotations of the high-level annotation tool.
Therefore, it would be quite useful to find a way to
(i) correct or, at least, reduce the errors and the inaccuracies of lower-level linguistic tools;
(ii) unify the annotation schemas of different linguistic annotation tools or, more generally speaking, make these tools (as well as their annotations) interoperate.
Clearly, solving (i) and (ii) should ease the automatic annotation of web pages by means of linguistic tools, and their transformation into Semantic Web pages (Berners-Lee, Hendler and Lassila, 2001). Yet, as stated above, (ii) is a type of interoperability problem. There again, ontologies (Gruber, 1993; Borst, 1997) have been successfully applied thus far to solve several interoperability problems. Hence, ontologies should help solve also the problems and limitations of linguistic annotation tools aforementioned.
Thus, to summarise, the main aim of the present work was to combine somehow these separated approaches, mechanisms and tools for annotation from Linguistics and Ontological Engineering (and the Semantic Web) in a sort of hybrid (linguistic and ontological) annotation model, suitable for both areas. This hybrid (semantic) annotation model should (a) benefit from the advances, models, techniques, mechanisms and tools of these two areas; (b) minimise (and even solve, when possible) some of the problems found in each of them; and (c) be suitable for the Semantic Web. The concrete goals that helped attain this aim are presented in the following section.
2. GOALS OF THE PRESENT WORK
As mentioned above, the main goal of this work was to specify a hybrid (that is, linguistically-motivated and ontology-based) model of annotation suitable for the Semantic Web (i.e. it had to produce a semantic annotation of web page contents). This entailed that the tags included in the annotations of the model had to (1) represent linguistic concepts (or linguistic categories, as they are termed in ISO/DCR (2008)), in order for this model to be linguistically-motivated; (2) be ontological terms (i.e., use an ontological vocabulary), in order for the model to be ontology-based; and (3) be structured (linked) as a collection of ontology-based
Resumo:
Recently, the Semantic Web has experienced significant advancements in standards and techniques, as well as in the amount of semantic information available online. Nevertheless, mechanisms are still needed to automatically reconcile information when it is expressed in different natural languages on the Web of Data, in order to improve the access to semantic information across language barriers. In this context several challenges arise [1], such as: (i) ontology translation/localization, (ii) cross-lingual ontology mappings, (iii) representation of multilingual lexical information, and (iv) cross-lingual access and querying of linked data. In the following we will focus on the second challenge, which is the necessity of establishing, representing and storing cross-lingual links among semantic information on the Web. In fact, in a “truly” multilingual Semantic Web, semantic data with lexical representations in one natural language would be mapped to equivalent or related information in other languages, thus making navigation across multilingual information possible for software agents.
Resumo:
Biomedical ontologies are key elements for building up the Life Sciences Semantic Web. Reusing and building biomedical ontologies requires flexible and versatile tools to manipulate them efficiently, in particular for enriching their axiomatic content. The Ontology Pre Processor Language (OPPL) is an OWL-based language for automating the changes to be performed in an ontology. OPPL augments the ontologists’ toolbox by providing a more efficient, and less error-prone, mechanism for enriching a biomedical ontology than that obtained by a manual treatment. Results We present OPPL-Galaxy, a wrapper for using OPPL within Galaxy. The functionality delivered by OPPL (i.e. automated ontology manipulation) can be combined with the tools and workflows devised within the Galaxy framework, resulting in an enhancement of OPPL. Use cases are provided in order to demonstrate OPPL-Galaxy’s capability for enriching, modifying and querying biomedical ontologies. Conclusions Coupling OPPL-Galaxy with other bioinformatics tools of the Galaxy framework results in a system that is more than the sum of its parts. OPPL-Galaxy opens a new dimension of analyses and exploitation of biomedical ontologies, including automated reasoning, paving the way towards advanced biological data analyses.
Resumo:
This paper presents a Focused Crawler in order to Get Semantic Web Resources (CSR). Structured data web are available in formats such as Extensible Markup Language (XML), Resource Description Framework (RDF) and Ontology Web Language (OWL) that can be used for processing. One of the main challenges for performing a manual search and download semantic web resources is that this task consumes a lot of time. Our research work propose a focused crawler which allow to download these resources automatically and store them on disk in order to have a collection that will be used for data processing. CRS consists of three layers: (a) The User Interface Layer, (b) The Focus Crawler Layer and (c) The Base Crawler Layer. CSR uses as a selection policie the Shark-Search method. CSR was conducted with two experiments. The first one starts on December 15 2012 at 7:11 am and ends on December 16 2012 at 4:01 were obtained 448,123,537 bytes of data. The CSR ends by itself after to analyze 80,4375 seeds with an unlimited depth. CSR got 16,576 semantic resources files where the 89 % was RDF, the 10 % was XML and the 1% was OWL. The second one was based on the Web Data Commons work of the Research Group Data and Web Science at the University of Mannheim and the Institute AIFB at the Karlsruhe Institute of Technology. This began at 4:46 am of June 2 2013 and 1:37 am June 9 2013. After 162.51 hours of execution the result was 285,279 semantic resources where predominated the XML resources with 99 % and OWL and RDF with 1 % each one.
Resumo:
In the beginning of the 90s, ontology development was similar to an art: ontology developers did not have clear guidelines on how to build ontologies but only some design criteria to be followed. Work on principles, methods and methodologies, together with supporting technologies and languages, made ontology development become an engineering discipline, the so-called Ontology Engineering. Ontology Engineering refers to the set of activities that concern the ontology development process and the ontology life cycle, the methods and methodologies for building ontologies, and the tool suites and languages that support them. Thanks to the work done in the Ontology Engineering field, the development of ontologies within and between teams has increased and improved, as well as the possibility of reusing ontologies in other developments and in final applications. Currently, ontologies are widely used in (a) Knowledge Engineering, Artificial Intelligence and Computer Science, (b) applications related to knowledge management, natural language processing, e-commerce, intelligent information integration, information retrieval, database design and integration, bio-informatics, education, and (c) the Semantic Web, the Semantic Grid, and the Linked Data initiative. In this paper, we provide an overview of Ontology Engineering, mentioning the most outstanding and used methodologies, languages, and tools for building ontologies. In addition, we include some words on how all these elements can be used in the Linked Data initiative.
Resumo:
The Web has witnessed an enormous growth in the amount of semantic information published in recent years. This growth has been stimulated to a large extent by the emergence of Linked Data. Although this brings us a big step closer to the vision of a Semantic Web, it also raises new issues such as the need for dealing with information expressed in different natural languages. Indeed, although the Web of Data can contain any kind of information in any language, it still lacks explicit mechanisms to automatically reconcile such information when it is expressed in different languages. This leads to situations in which data expressed in a certain language is not easily accessible to speakers of other languages. The Web of Data shows the potential for being extended to a truly multilingual web as vocabularies and data can be published in a language-independent fashion, while associated language-dependent (linguistic) information supporting the access across languages can be stored separately. In this sense, the multilingual Web of Data can be realized in our view as a layer of services and resources on top of the existing Linked Data infrastructure adding i) linguistic information for data and vocabularies in different languages, ii) mappings between data with labels in different languages, and iii) services to dynamically access and traverse Linked Data across different languages. In this article we present this vision of a multilingual Web of Data. We discuss challenges that need to be addressed to make this vision come true and discuss the role that techniques such as ontology localization, ontology mapping, and cross-lingual ontology-based information access and presentation will play in achieving this. Further, we propose an initial architecture and describe a roadmap that can provide a basis for the implementation of this vision.
Resumo:
Lexica and terminology databases play a vital role in many NLP applications, but currently most such resources are published in application-specific formats, or with custom access interfaces, leading to the problem that much of this data is in ‘‘data silos’’ and hence difficult to access. The Semantic Web and in particular the Linked Data initiative provide effective solutions to this problem, as well as possibilities for data reuse by inter-lexicon linking, and incorporation of data categories by dereferencable URIs. The Semantic Web focuses on the use of ontologies to describe semantics on the Web, but currently there is no standard for providing complex lexical information for such ontologies and for describing the relationship between the lexicon and the ontology. We present our model, lemon, which aims to address these gaps
Resumo:
This PhD thesis contributes to the problem of resource and service discovery in the context of the composable web. In the current web, mashup technologies allow developers reusing services and contents to build new web applications. However, developers face a problem of information flood when searching for appropriate services or resources for their combination. To contribute to overcoming this problem, a framework is defined for the discovery of services and resources. In this framework, three levels are defined for performing discovery at content, discovery and agente levels. The content level involves the information available in web resources. The web follows the Representational Stateless Transfer (REST) architectural style, in which resources are returned as representations from servers to clients. These representations usually employ the HyperText Markup Language (HTML), which, along with Content Style Sheets (CSS), describes the markup employed to render representations in a web browser. Although the use of SemanticWeb standards such as Resource Description Framework (RDF) make this architecture suitable for automatic processes to use the information present in web resources, these standards are too often not employed, so automation must rely on processing HTML. This process, often referred as Screen Scraping in the literature, is the content discovery according to the proposed framework. At this level, discovery rules indicate how the different pieces of data in resources’ representations are mapped onto semantic entities. By processing discovery rules on web resources, semantically described contents can be obtained out of them. The service level involves the operations that can be performed on the web. The current web allows users to perform different tasks such as search, blogging, e-commerce, or social networking. To describe the possible services in RESTful architectures, a high-level feature-oriented service methodology is proposed at this level. This lightweight description framework allows defining service discovery rules to identify operations in interactions with REST resources. The discovery is thus performed by applying discovery rules to contents discovered in REST interactions, in a novel process called service probing. Also, service discovery can be performed by modelling services as contents, i.e., by retrieving Application Programming Interface (API) documentation and API listings in service registries such as ProgrammableWeb. For this, a unified model for composable components in Mashup-Driven Development (MDD) has been defined after the analysis of service repositories from the web. The agent level involves the orchestration of the discovery of services and contents. At this level, agent rules allow to specify behaviours for crawling and executing services, which results in the fulfilment of a high-level goal. Agent rules are plans that allow introspecting the discovered data and services from the web and the knowledge present in service and content discovery rules to anticipate the contents and services to be found on specific resources from the web. By the definition of plans, an agent can be configured to target specific resources. The discovery framework has been evaluated on different scenarios, each one covering different levels of the framework. Contenidos a la Carta project deals with the mashing-up of news from electronic newspapers, and the framework was used for the discovery and extraction of pieces of news from the web. Similarly, in Resulta and VulneraNET projects the discovery of ideas and security knowledge in the web is covered, respectively. The service level is covered in the OMELETTE project, where mashup components such as services and widgets are discovered from component repositories from the web. The agent level is applied to the crawling of services and news in these scenarios, highlighting how the semantic description of rules and extracted data can provide complex behaviours and orchestrations of tasks in the web. The main contributions of the thesis are the unified framework for discovery, which allows configuring agents to perform automated tasks. Also, a scraping ontology has been defined for the construction of mappings for scraping web resources. A novel first-order logic rule induction algorithm is defined for the automated construction and maintenance of these mappings out of the visual information in web resources. Additionally, a common unified model for the discovery of services is defined, which allows sharing service descriptions. Future work comprises the further extension of service probing, resource ranking, the extension of the Scraping Ontology, extensions of the agent model, and contructing a base of discovery rules. Resumen La presente tesis doctoral contribuye al problema de descubrimiento de servicios y recursos en el contexto de la web combinable. En la web actual, las tecnologías de combinación de aplicaciones permiten a los desarrolladores reutilizar servicios y contenidos para construir nuevas aplicaciones web. Pese a todo, los desarrolladores afrontan un problema de saturación de información a la hora de buscar servicios o recursos apropiados para su combinación. Para contribuir a la solución de este problema, se propone un marco de trabajo para el descubrimiento de servicios y recursos. En este marco, se definen tres capas sobre las que se realiza descubrimiento a nivel de contenido, servicio y agente. El nivel de contenido involucra a la información disponible en recursos web. La web sigue el estilo arquitectónico Representational Stateless Transfer (REST), en el que los recursos son devueltos como representaciones por parte de los servidores a los clientes. Estas representaciones normalmente emplean el lenguaje de marcado HyperText Markup Language (HTML), que, unido al estándar Content Style Sheets (CSS), describe el marcado empleado para mostrar representaciones en un navegador web. Aunque el uso de estándares de la web semántica como Resource Description Framework (RDF) hace apta esta arquitectura para su uso por procesos automatizados, estos estándares no son empleados en muchas ocasiones, por lo que cualquier automatización debe basarse en el procesado del marcado HTML. Este proceso, normalmente conocido como Screen Scraping en la literatura, es el descubrimiento de contenidos en el marco de trabajo propuesto. En este nivel, un conjunto de reglas de descubrimiento indican cómo los diferentes datos en las representaciones de recursos se corresponden con entidades semánticas. Al procesar estas reglas sobre recursos web, pueden obtenerse contenidos descritos semánticamente. El nivel de servicio involucra las operaciones que pueden ser llevadas a cabo en la web. Actualmente, los usuarios de la web pueden realizar diversas tareas como búsqueda, blogging, comercio electrónico o redes sociales. Para describir los posibles servicios en arquitecturas REST, se propone en este nivel una metodología de alto nivel para descubrimiento de servicios orientada a funcionalidades. Este marco de descubrimiento ligero permite definir reglas de descubrimiento de servicios para identificar operaciones en interacciones con recursos REST. Este descubrimiento es por tanto llevado a cabo al aplicar las reglas de descubrimiento sobre contenidos descubiertos en interacciones REST, en un nuevo procedimiento llamado sondeo de servicios. Además, el descubrimiento de servicios puede ser llevado a cabo mediante el modelado de servicios como contenidos. Es decir, mediante la recuperación de documentación de Application Programming Interfaces (APIs) y listas de APIs en registros de servicios como ProgrammableWeb. Para ello, se ha definido un modelo unificado de componentes combinables para Mashup-Driven Development (MDD) tras el análisis de repositorios de servicios de la web. El nivel de agente involucra la orquestación del descubrimiento de servicios y contenidos. En este nivel, las reglas de nivel de agente permiten especificar comportamientos para el rastreo y ejecución de servicios, lo que permite la consecución de metas de mayor nivel. Las reglas de los agentes son planes que permiten la introspección sobre los datos y servicios descubiertos, así como sobre el conocimiento presente en las reglas de descubrimiento de servicios y contenidos para anticipar contenidos y servicios por encontrar en recursos específicos de la web. Mediante la definición de planes, un agente puede ser configurado para descubrir recursos específicos. El marco de descubrimiento ha sido evaluado sobre diferentes escenarios, cada uno cubriendo distintos niveles del marco. El proyecto Contenidos a la Carta trata de la combinación de noticias de periódicos digitales, y en él el framework se ha empleado para el descubrimiento y extracción de noticias de la web. De manera análoga, en los proyectos Resulta y VulneraNET se ha llevado a cabo un descubrimiento de ideas y de conocimientos de seguridad, respectivamente. El nivel de servicio se cubre en el proyecto OMELETTE, en el que componentes combinables como servicios y widgets se descubren en repositorios de componentes de la web. El nivel de agente se aplica al rastreo de servicios y noticias en estos escenarios, mostrando cómo la descripción semántica de reglas y datos extraídos permiten proporcionar comportamientos complejos y orquestaciones de tareas en la web. Las principales contribuciones de la tesis son el marco de trabajo unificado para descubrimiento, que permite configurar agentes para realizar tareas automatizadas. Además, una ontología de extracción ha sido definida para la construcción de correspondencias y extraer información de recursos web. Asimismo, un algoritmo para la inducción de reglas de lógica de primer orden se ha definido para la construcción y el mantenimiento de estas correspondencias a partir de la información visual de recursos web. Adicionalmente, se ha definido un modelo común y unificado para el descubrimiento de servicios que permite la compartición de descripciones de servicios. Como trabajos futuros se considera la extensión del sondeo de servicios, clasificación de recursos, extensión de la ontología de extracción y la construcción de una base de reglas de descubrimiento.
Resumo:
Sensor networks are increasingly becoming one of the main sources of Big Data on the Web. However, the observations that they produce are made available with heterogeneous schemas, vocabularies and data formats, making it difficult to share and reuse these data for other purposes than those for which they were originally set up. In this thesis we address these challenges, considering how we can transform streaming raw data to rich ontology-based information that is accessible through continuous queries for streaming data. Our main contribution is an ontology-based approach for providing data access and query capabilities to streaming data sources, allowing users to express their needs at a conceptual level, independent of implementation and language-specific details. We introduce novel query rewriting and data translation techniques that rely on mapping definitions relating streaming data models to ontological concepts. Specific contributions include: • The syntax and semantics of the SPARQLStream query language for ontologybased data access, and a query rewriting approach for transforming SPARQLStream queries into streaming algebra expressions. • The design of an ontology-based streaming data access engine that can internally reuse an existing data stream engine, complex event processor or sensor middleware, using R2RML mappings for defining relationships between streaming data models and ontology concepts. Concerning the sensor metadata of such streaming data sources, we have investigated how we can use raw measurements to characterize streaming data, producing enriched data descriptions in terms of ontological models. Our specific contributions are: • A representation of sensor data time series that captures gradient information that is useful to characterize types of sensor data. • A method for classifying sensor data time series and determining the type of data, using data mining techniques, and a method for extracting semantic sensor metadata features from the time series.