23 resultados para Estrazione informazioni, analisi dati non strutturati, Web semantico, data mining, text mining, big data, open data, classificazione di testi.

em Universidad Politécnica de Madrid


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the paper we report on the results of our experiments on the construction of the opinion ontology. Our aim is to show the benefits of publishing in the open, on the Web, the results of the opinion mining process in a structured form. On the road to achieving this, we attempt to answer the research question to what extent opinion information can be formalized in a unified way. Furthermore, as part of the evaluation, we experiment with the usage of Semantic Web technologies and show particular use cases that support our claims.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recently, the Semantic Web has experienced signi�cant advancements in standards and techniques, as well as in the amount of semantic information available online. Even so, mechanisms are still needed to automatically reconcile semantic information when it is expressed in di�erent natural languages, so that access to Web information across language barriers can be improved. That requires developing techniques for discovering and representing cross-lingual links on the Web of Data. In this paper we explore the different dimensions of such a problem and reflect on possible avenues of research on that topic.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Web has witnessed an enormous growth in the amount of semantic information published in recent years. This growth has been stimulated to a large extent by the emergence of Linked Data. Although this brings us a big step closer to the vision of a Semantic Web, it also raises new issues such as the need for dealing with information expressed in different natural languages. Indeed, although the Web of Data can contain any kind of information in any language, it still lacks explicit mechanisms to automatically reconcile such information when it is expressed in different languages. This leads to situations in which data expressed in a certain language is not easily accessible to speakers of other languages. The Web of Data shows the potential for being extended to a truly multilingual web as vocabularies and data can be published in a language-independent fashion, while associated language-dependent (linguistic) information supporting the access across languages can be stored separately. In this sense, the multilingual Web of Data can be realized in our view as a layer of services and resources on top of the existing Linked Data infrastructure adding i) linguistic information for data and vocabularies in different languages, ii) mappings between data with labels in different languages, and iii) services to dynamically access and traverse Linked Data across different languages. In this article we present this vision of a multilingual Web of Data. We discuss challenges that need to be addressed to make this vision come true and discuss the role that techniques such as ontology localization, ontology mapping, and cross-lingual ontology-based information access and presentation will play in achieving this. Further, we propose an initial architecture and describe a roadmap that can provide a basis for the implementation of this vision.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Cross‐lingual link discovery in the Web of Data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Semantic Web is growing at a fast pace, recently boosted by the creation of the Linked Data initiative and principles. Methods, standards, techniques and the state of technology are becoming more mature and therefore are easing the task of publication and consumption of semantic information on the Web.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

There are several different standardised and widespread formats to represent emotions. However, there is no standard semantic model yet. This paper presents a new ontology, called Onyx, that aims to become such a standard while adding concepts from the latest Semantic Web models. In particular, the ontology focuses on the representation of Emotion Analysis results. But the model is abstract and inherits from previous standards and formats. It can thus be used as a reference representation of emotions in any future application or ontology. To prove this, we have translated resources from EmotionML representation to Onyx. We also present several ways in which developers could benefit from using this ontology instead of an ad-hoc presentation. Our ultimate goal is to foster the use of semantic technologies for emotion Analysis while following the Linked Data ideals.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Internet está evolucionando hacia la conocida como Live Web. En esta nueva etapa en la evolución de Internet, se pone al servicio de los usuarios multitud de streams de datos sociales. Gracias a estas fuentes de datos, los usuarios han pasado de navegar por páginas web estáticas a interacturar con aplicaciones que ofrecen contenido personalizado, basada en sus preferencias. Cada usuario interactúa a diario con multiples aplicaciones que ofrecen notificaciones y alertas, en este sentido cada usuario es una fuente de eventos, y a menudo los usuarios se sienten desbordados y no son capaces de procesar toda esa información a la carta. Para lidiar con esta sobresaturación, han aparecido múltiples herramientas que automatizan las tareas más habituales, desde gestores de bandeja de entrada, gestores de alertas en redes sociales, a complejos CRMs o smart-home hubs. La contrapartida es que aunque ofrecen una solución a problemas comunes, no pueden adaptarse a las necesidades de cada usuario ofreciendo una solucion personalizada. Los Servicios de Automatización de Tareas (TAS de sus siglas en inglés) entraron en escena a partir de 2012 para dar solución a esta liminación. Dada su semejanza, estos servicios también son considerados como un nuevo enfoque en la tecnología de mash-ups pero centra en el usuarios. Los usuarios de estas plataformas tienen la capacidad de interconectar servicios, sensores y otros aparatos con connexión a internet diseñando las automatizaciones que se ajustan a sus necesidades. La propuesta ha sido ámpliamante aceptada por los usuarios. Este hecho ha propiciado multitud de plataformas que ofrecen servicios TAS entren en escena. Al ser un nuevo campo de investigación, esta tesis presenta las principales características de los TAS, describe sus componentes, e identifica las dimensiones fundamentales que los defines y permiten su clasificación. En este trabajo se acuña el termino Servicio de Automatización de Tareas (TAS) dando una descripción formal para estos servicios y sus componentes (llamados canales), y proporciona una arquitectura de referencia. De igual forma, existe una falta de herramientas para describir servicios de automatización, y las reglas de automatización. A este respecto, esta tesis propone un modelo común que se concreta en la ontología EWE (Evented WEb Ontology). Este modelo permite com parar y equiparar canales y automatizaciones de distintos TASs, constituyendo un aporte considerable paraa la portabilidad de automatizaciones de usuarios entre plataformas. De igual manera, dado el carácter semántico del modelo, permite incluir en las automatizaciones elementos de fuentes externas sobre los que razonar, como es el caso de Linked Open Data. Utilizando este modelo, se ha generado un dataset de canales y automatizaciones, con los datos obtenidos de algunos de los TAS existentes en el mercado. Como último paso hacia el lograr un modelo común para describir TAS, se ha desarrollado un algoritmo para aprender ontologías de forma automática a partir de los datos del dataset. De esta forma, se favorece el descubrimiento de nuevos canales, y se reduce el coste de mantenimiento del modelo, el cual se actualiza de forma semi-automática. En conclusión, las principales contribuciones de esta tesis son: i) describir el estado del arte en automatización de tareas y acuñar el término Servicio de Automatización de Tareas, ii) desarrollar una ontología para el modelado de los componentes de TASs y automatizaciones, iii) poblar un dataset de datos de canales y automatizaciones, usado para desarrollar un algoritmo de aprendizaje automatico de ontologías, y iv) diseñar una arquitectura de agentes para la asistencia a usuarios en la creación de automatizaciones. ABSTRACT The new stage in the evolution of the Web (the Live Web or Evented Web) puts lots of social data-streams at the service of users, who no longer browse static web pages but interact with applications that present them contextual and relevant experiences. Given that each user is a potential source of events, a typical user often gets overwhelmed. To deal with that huge amount of data, multiple automation tools have emerged, covering from simple social media managers or notification aggregators to complex CRMs or smart-home Hub/Apps. As a downside, they cannot tailor to the needs of every single user. As a natural response to this downside, Task Automation Services broke in the Internet. They may be seen as a new model of mash-up technology for combining social streams, services and connected devices from an end-user perspective: end-users are empowered to connect those stream however they want, designing the automations they need. The numbers of those platforms that appeared early on shot up, and as a consequence the amount of platforms following this approach is growing fast. Being a novel field, this thesis aims to shed light on it, presenting and exemplifying the main characteristics of Task Automation Services, describing their components, and identifying several dimensions to classify them. This thesis coins the term Task Automation Services (TAS) by providing a formal definition of them, their components (called channels), as well a TAS reference architecture. There is also a lack of tools for describing automation services and automations rules. In this regard, this thesis proposes a theoretical common model of TAS and formalizes it as the EWE ontology This model enables to compare channels and automations from different TASs, which has a high impact in interoperability; and enhances automations providing a mechanism to reason over external sources such as Linked Open Data. Based on this model, a dataset of components of TAS was built, harvesting data from the web sites of actual TASs. Going a step further towards this common model, an algorithm for categorizing them was designed, enabling their discovery across different TAS. Thus, the main contributions of the thesis are: i) surveying the state of the art on task automation and coining the term Task Automation Service; ii) providing a semantic common model for describing TAS components and automations; iii) populating a categorized dataset of TAS components, used to learn ontologies of particular domains from the TAS perspective; and iv) designing an agent architecture for assisting users in setting up automations, that is aware of their context and acts in consequence.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Interlinking text documents with Linked Open Data enables the Web of Data to be used as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, we developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to congure the annotations to their specic needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation condence. We compare our approach with the state of the art in disambiguation, and evaluate our results in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of our system. DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Interlinking text documents with Linked Open Data enables the Web of Data to be used as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, we developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to configure the annotations to their specific needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. We compare our approach with the state of the art in disambiguation, and evaluate our results in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of our system. DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract Web 2.0 applications enabled users to classify information resources using their own vocabularies. The bottom-up nature of these user-generated classification systems have turned them into interesting knowledge sources, since they provide a rich terminology generated by potentially large user communities. Previous research has shown that it is possible to elicit some emergent semantics from the aggregation of individual classifications in these systems. However the generation of ontologies from them is still an open research problem. In this thesis we address the problem of how to tap into user-generated classification systems for building domain ontologies. Our objective is to design a method to develop domain ontologies from user-generated classifications systems. To do so, we rely on ontologies in the Web of Data to formalize the semantics of the knowledge collected from the classification system. Current ontology development methodologies have recognized the importance of reusing knowledge from existing resources. Thus, our work is framed within the NeOn methodology scenario for building ontologies by reusing and reengineering non-ontological resources. The main contributions of this work are: An integrated method to develop ontologies from user-generated classification systems. With this method we extract a domain terminology from the classification system and then we formalize the semantics of this terminology by reusing ontologies in the Web of Data. Identification and adaptation of existing techniques for implementing the activities in the method so that they can fulfill the requirements of each activity. A novel study about emerging semantics in user-generated lists. Resumen La web 2.0 permitió a los usuarios clasificar recursos de información usando su propio vocabulario. Estos sistemas de clasificación generados por usuarios son recursos interesantes para la extracción de conocimiento debido principalmente a que proveen una extensa terminología generada por grandes comunidades de usuarios. Se ha demostrado en investigaciones previas que es posible obtener una semántica emergente de estos sistemas. Sin embargo la generación de ontologías a partir de ellos es todavía un problema de investigación abierto. Esta tesis trata el problema de cómo aprovechar los sistemas de clasificación generados por usuarios en la construcción de ontologías de dominio. Así el objetivo de la tesis es diseñar un método para desarrollar ontologías de dominio a partir de sistemas de clasificación generados por usuarios. El método propuesto reutiliza conceptualizaciones existentes en ontologías publicadas en la Web de Datos para formalizar la semántica del conocimiento que se extrae del sistema de clasificación. Por tanto, este trabajo está enmarcado dentro del escenario para desarrollar ontologías mediante la reutilización y reingeniería de recursos no ontológicos que se ha definido en la Metodología NeOn. Las principales contribuciones de este trabajo son: Un método integrado para desarrollar una ontología de dominio a partir de sistemas de clasificación generados por usuarios. En este método se extrae una terminología de dominio del sistema de clasificación y posteriormente se formaliza su semántica reutilizando ontologías en la Web de Datos. La identificación y adaptación de un conjunto de técnicas para implementar las actividades propuestas en el método de tal manera que puedan cumplir automáticamente los requerimientos de cada actividad. Un novedoso estudio acerca de la semántica emergente en las listas generadas por usuarios en la Web.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recently, the Semantic Web has experienced significant advancements in standards and techniques, as well as in the amount of semantic information available online. Nevertheless, mechanisms are still needed to automatically reconcile information when it is expressed in different natural languages on the Web of Data, in order to improve the access to semantic information across language barriers. In this context several challenges arise [1], such as: (i) ontology translation/localization, (ii) cross-lingual ontology mappings, (iii) representation of multilingual lexical information, and (iv) cross-lingual access and querying of linked data. In the following we will focus on the second challenge, which is the necessity of establishing, representing and storing cross-lingual links among semantic information on the Web. In fact, in a “truly” multilingual Semantic Web, semantic data with lexical representations in one natural language would be mapped to equivalent or related information in other languages, thus making navigation across multilingual information possible for software agents.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Telecommunications networks have been always expanding and thanks to it, new services have appeared. The old mechanisms for carrying packets have become obsolete due to the new service requirements, which have begun working in real time. Real time traffic requires strict service guarantees. When this traffic is sent through the network, enough resources must be given in order to avoid delays and information losses. When browsing through the Internet and requesting web pages, data must be sent from a server to the user. If during the transmission there is any packet drop, the packet is sent again. For the end user, it does not matter if the webpage loads in one or two seconds more. But if the user is maintaining a conversation with a VoIP program, such as Skype, one or two seconds of delay in the conversation may be catastrophic, and none of them can understand the other. In order to provide support for this new services, the networks have to evolve. For this purpose MPLS and QoS were developed. MPLS is a packet carrying mechanism used in high performance telecommunication networks which directs and carries data using pre-established paths. Now, packets are forwarded on the basis of labels, making this process faster than routing the packets with the IP addresses. MPLS also supports Traffic Engineering (TE). This refers to the process of selecting the best paths for data traffic in order to balance the traffic load between the different links. In a network with multiple paths, routing algorithms calculate the shortest one, and most of the times all traffic is directed through it, causing overload and packet drops, without distributing the packets in the other paths that the network offers and do not have any traffic. But this is not enough in order to provide the real time traffic the guarantees it needs. In fact, those mechanisms improve the network, but they do not make changes in how the traffic is treated. That is why Quality of Service (QoS) was developed. Quality of service is the ability to provide different priority to different applications, users, or data flows, or to guarantee a certain level of performance to a data flow. Traffic is distributed into different classes and each of them is treated differently, according to its Service Level Agreement (SLA). Traffic with the highest priority will have the preference over lower classes, but this does not mean it will monopolize all the resources. In order to achieve this goal, a set policies are defined to control and alter how the traffic flows. Possibilities are endless, and it depends in how the network must be structured. By using those mechanisms it is possible to provide the necessary guarantees to the real-time traffic, distributing it between categories inside the network and offering the best service for both real time data and non real time data. Las Redes de Telecomunicaciones siempre han estado en expansión y han propiciado la aparición de nuevos servicios. Los viejos mecanismos para transportar paquetes se han quedado obsoletos debido a las exigencias de los nuevos servicios, que han comenzado a operar en tiempo real. El tráfico en tiempo real requiere de unas estrictas garantías de servicio. Cuando este tráfico se envía a través de la red, necesita disponer de suficientes recursos para evitar retrasos y pérdidas de información. Cuando se navega por la red y se solicitan páginas web, los datos viajan desde un servidor hasta el usuario. Si durante la transmisión se pierde algún paquete, éste se vuelve a mandar de nuevo. Para el usuario final, no importa si la página tarda uno o dos segundos más en cargar. Ahora bien, si el usuario está manteniendo una conversación usando algún programa de VoIP (como por ejemplo Skype) uno o dos segundos de retardo en la conversación podrían ser catastróficos, y ninguno de los interlocutores sería capaz de entender al otro. Para poder dar soporte a estos nuevos servicios, las redes deben evolucionar. Para este propósito se han concebido MPLS y QoS MPLS es un mecanismo de transporte de paquetes que se usa en redes de telecomunicaciones de alto rendimiento que dirige y transporta los datos de acuerdo a caminos preestablecidos. Ahora los paquetes se encaminan en función de unas etiquetas, lo cual hace que sea mucho más rápido que encaminar los paquetes usando las direcciones IP. MPLS también soporta Ingeniería de Tráfico (TE). Consiste en seleccionar los mejores caminos para el tráfico de datos con el objetivo de balancear la carga entre los diferentes enlaces. En una red con múltiples caminos, los algoritmos de enrutamiento actuales calculan el camino más corto, y muchas veces el tráfico se dirige sólo por éste, saturando el canal, mientras que otras rutas se quedan completamente desocupadas. Ahora bien, esto no es suficiente para ofrecer al tráfico en tiempo real las garantías que necesita. De hecho, estos mecanismos mejoran la red, pero no realizan cambios a la hora de tratar el tráfico. Por esto es por lo que se ha desarrollado el concepto de Calidad de Servicio (QoS). La calidad de servicio es la capacidad para ofrecer diferentes prioridades a las diferentes aplicaciones, usuarios o flujos de datos, y para garantizar un cierto nivel de rendimiento en un flujo de datos. El tráfico se distribuye en diferentes clases y cada una de ellas se trata de forma diferente, de acuerdo a las especificaciones que se indiquen en su Contrato de Tráfico (SLA). EL tráfico con mayor prioridad tendrá preferencia sobre el resto, pero esto no significa que acapare la totalidad de los recursos. Para poder alcanzar estos objetivos se definen una serie de políticas para controlar y alterar el comportamiento del tráfico. Las posibilidades son inmensas dependiendo de cómo se quiera estructurar la red. Usando estos mecanismos se pueden proporcionar las garantías necesarias al tráfico en tiempo real, distribuyéndolo en categorías dentro de la red y ofreciendo el mejor servicio posible tanto a los datos en tiempo real como a los que no lo son.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Spanish National Library (Biblioteca Nacional de España1. BNE) and the Ontology Engineering Group2 of Universidad Politécnica de Madrid are working on the joint project ?Preliminary Study of Linked Data?, whose aim is to enrich the Web of Data with the BNE authority and bibliographic records. To this end, they are transforming the BNE information to RDF following the Linked Data principles3 proposed by Tim Berners Lee.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present a methodology for reducing a straight line fitting regression problem to a Least Squares minimization one. This is accomplished through the definition of a measure on the data space that takes into account directional dependences of errors, and the use of polar descriptors for straight lines. This strategy improves the robustness by avoiding singularities and non-describable lines. The methodology is powerful enough to deal with non-normal bivariate heteroscedastic data error models, but can also supersede classical regression methods by making some particular assumptions. An implementation of the methodology for the normal bivariate case is developed and evaluated.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As the number of data sources publishing their data on the Web of Data is growing, we are experiencing an immense growth of the Linked Open Data cloud. The lack of control on the published sources, which could be untrustworthy or unreliable, along with their dynamic nature that often invalidates links and causes conflicts or other discrepancies, could lead to poor quality data. In order to judge data quality, a number of quality indicators have been proposed, coupled with quality metrics that quantify the “quality level” of a dataset. In addition to the above, some approaches address how to improve the quality of the datasets through a repair process that focuses on how to correct invalidities caused by constraint violations by either removing or adding triples. In this paper we argue that provenance is a critical factor that should be taken into account during repairs to ensure that the most reliable data is kept. Based on this idea, we propose quality metrics that take into account provenance and evaluate their applicability as repair guidelines in a particular data fusion setting.