881 results for information extraction, unstructured data analysis, Semantic Web, data mining, text mining, big data, open data, text classification.


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Recently, the Semantic Web has experienced significant advancements in standards and techniques, as well as in the amount of semantic information available online. Even so, mechanisms are still needed to automatically reconcile semantic information when it is expressed in different natural languages, so that access to Web information across language barriers can be improved. That requires developing techniques for discovering and representing cross-lingual links on the Web of Data. In this paper we explore the different dimensions of such a problem and reflect on possible avenues of research on that topic.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The Web has witnessed an enormous growth in the amount of semantic information published in recent years. This growth has been stimulated to a large extent by the emergence of Linked Data. Although this brings us a big step closer to the vision of a Semantic Web, it also raises new issues such as the need for dealing with information expressed in different natural languages. Indeed, although the Web of Data can contain any kind of information in any language, it still lacks explicit mechanisms to automatically reconcile such information when it is expressed in different languages. This leads to situations in which data expressed in a certain language is not easily accessible to speakers of other languages. The Web of Data shows the potential for being extended to a truly multilingual web as vocabularies and data can be published in a language-independent fashion, while associated language-dependent (linguistic) information supporting the access across languages can be stored separately. In this sense, the multilingual Web of Data can be realized in our view as a layer of services and resources on top of the existing Linked Data infrastructure adding i) linguistic information for data and vocabularies in different languages, ii) mappings between data with labels in different languages, and iii) services to dynamically access and traverse Linked Data across different languages. In this article we present this vision of a multilingual Web of Data. We discuss challenges that need to be addressed to make this vision come true and discuss the role that techniques such as ontology localization, ontology mapping, and cross-lingual ontology-based information access and presentation will play in achieving this. Further, we propose an initial architecture and describe a roadmap that can provide a basis for the implementation of this vision.
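
As a rough illustration of points i) and ii) above, the sketch below keeps language-tagged labels separate from language-independent identifiers and uses the shared identifier to move from a label in one language to a label in another. The URIs, labels, and function names are invented for the example; this is not the architecture proposed in the article.

```python
from typing import Optional

# Language-independent identifiers mapped to language-tagged labels.
# All URIs and labels are illustrative assumptions.
LABELS: dict[str, dict[str, str]] = {
    "http://example.org/resource/River": {"en": "river", "es": "río", "de": "Fluss"},
    "http://example.org/resource/Mountain": {"en": "mountain", "es": "montaña"},
}


def find_resource(label: str, lang: str) -> Optional[str]:
    """Return the identifier whose label in `lang` matches `label`, if any."""
    for uri, by_lang in LABELS.items():
        if lang in by_lang and by_lang[lang].casefold() == label.casefold():
            return uri
    return None


def label_in(uri: str, lang: str, fallback: str = "en") -> Optional[str]:
    """Return the label of `uri` in `lang`, or in `fallback` if it is missing."""
    by_lang = LABELS.get(uri, {})
    return by_lang.get(lang) or by_lang.get(fallback)


def translate_label(label: str, source_lang: str, target_lang: str) -> Optional[str]:
    """Cross-lingual access: map a label to another language via the shared identifier."""
    uri = find_resource(label, source_lang)
    return label_in(uri, target_lang) if uri else None
```

Here the fallback language stands in for the case where no label exists in the requested language; a real service would more likely rely on explicit cross-lingual mappings.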

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Cross‐lingual link discovery in the Web of Data

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The Semantic Web is growing at a fast pace, recently boosted by the creation of the Linked Data initiative and principles. Methods, standards, techniques and the state of technology are becoming more mature and therefore are easing the task of publication and consumption of semantic information on the Web.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

There are several standardised and widespread formats for representing emotions, but there is no standard semantic model yet. This paper presents a new ontology, called Onyx, that aims to become such a standard while adding concepts from the latest Semantic Web models. The ontology focuses on the representation of Emotion Analysis results, but the model is abstract and inherits from previous standards and formats, so it can be used as a reference representation of emotions in any future application or ontology. To demonstrate this, we have translated resources from the EmotionML representation to Onyx. We also present several ways in which developers could benefit from using this ontology instead of an ad-hoc presentation. Our ultimate goal is to foster the use of semantic technologies for Emotion Analysis while following the Linked Data ideals.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The Internet is evolving toward what is known as the Live Web. In this new stage of its evolution, a multitude of social data streams are placed at the service of users. Thanks to these data sources, users have moved from browsing static web pages to interacting with applications that offer personalized content based on their preferences. Each user interacts daily with multiple applications that issue notifications and alerts; in this sense each user is a source of events, and users often feel overwhelmed and unable to process all of this on-demand information. To cope with this overload, many tools have appeared that automate the most common tasks, ranging from inbox managers and social network alert managers to complex CRMs and smart-home hubs. The drawback is that, although they solve common problems, they cannot adapt to the needs of each user with a personalized solution. Task Automation Services (TAS) came onto the scene from 2012 onward to address this limitation. Given their similarity, these services are also regarded as a new, user-centered approach to mash-up technology. Users of these platforms can interconnect services, sensors, and other Internet-connected devices, designing the automations that fit their needs. The proposal has been widely accepted by users, which has prompted a multitude of platforms offering TAS services to appear. As this is a new field of research, this thesis presents the main characteristics of TAS, describes their components, and identifies the fundamental dimensions that define them and allow them to be classified.
This work coins the term Task Automation Service (TAS), giving a formal description of these services and their components (called channels), and provides a reference architecture. There is also a lack of tools for describing automation services and automation rules. In this regard, this thesis proposes a common model, realized as the EWE (Evented WEb Ontology) ontology. This model makes it possible to compare and match channels and automations from different TASs, which is a considerable contribution to the portability of user automations across platforms. In addition, given the semantic nature of the model, automations can include elements from external sources over which to reason, such as Linked Open Data. Using this model, a dataset of channels and automations has been generated from data obtained from some of the TASs currently on the market. As a final step toward a common model for describing TAS, an algorithm has been developed to learn ontologies automatically from the data in the dataset. This favors the discovery of new channels and reduces the maintenance cost of the model, which is updated semi-automatically. In conclusion, the main contributions of this thesis are: i) describing the state of the art in task automation and coining the term Task Automation Service; ii) developing an ontology for modeling the components of TASs and automations; iii) populating a dataset of channels and automations, used to develop an algorithm for automatic ontology learning; and iv) designing an agent architecture to assist users in creating automations.
ABSTRACT The new stage in the evolution of the Web (the Live Web or Evented Web) puts many social data streams at the service of users, who no longer browse static web pages but interact with applications that present them with contextual and relevant experiences. Given that each user is a potential source of events, a typical user often gets overwhelmed. To deal with that huge amount of data, multiple automation tools have emerged, ranging from simple social media managers or notification aggregators to complex CRMs or smart-home hubs and apps. As a downside, they cannot be tailored to the needs of every single user. As a natural response to this downside, Task Automation Services burst onto the Internet. They may be seen as a new model of mash-up technology for combining social streams, services and connected devices from an end-user perspective: end-users are empowered to connect those streams however they want, designing the automations they need. Usage of the platforms that appeared early on shot up, and as a consequence the number of platforms following this approach is growing fast. As this is a novel field, this thesis aims to shed light on it, presenting and exemplifying the main characteristics of Task Automation Services, describing their components, and identifying several dimensions to classify them. This thesis coins the term Task Automation Services (TAS) by providing a formal definition of them and their components (called channels), as well as a TAS reference architecture. There is also a lack of tools for describing automation services and automation rules. In this regard, this thesis proposes a theoretical common model of TAS and formalizes it as the EWE ontology. This model makes it possible to compare channels and automations from different TASs, which has a high impact on interoperability, and enhances automations by providing a mechanism to reason over external sources such as Linked Open Data.
Based on this model, a dataset of TAS components was built by harvesting data from the web sites of actual TASs. Going a step further towards this common model, an algorithm for categorizing them was designed, enabling their discovery across different TASs. Thus, the main contributions of the thesis are: i) surveying the state of the art on task automation and coining the term Task Automation Service; ii) providing a semantic common model for describing TAS components and automations; iii) populating a categorized dataset of TAS components, used to learn ontologies of particular domains from the TAS perspective; and iv) designing an agent architecture for assisting users in setting up automations, one that is aware of their context and acts accordingly.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Exploratory analysis of data seeks to find common patterns to gain insights into the structure and distribution of the data. In geochemistry it is a valuable means of gaining insights into the complicated processes making up a petroleum system. Typically, linear visualisation methods like principal components analysis, linked plots, or brushing are used. These methods cannot be employed directly when dealing with missing data, and they struggle to capture global non-linear structures in the data, although they can do so locally. This thesis discusses a complementary approach based on a non-linear probabilistic model. The generative topographic mapping (GTM) enables the visualisation of the effects of very many variables on a single plot, which is able to incorporate more structure than a two-dimensional principal components plot. The model can deal with uncertainty and missing data, and allows for the exploration of the non-linear structure in the data. In this thesis a novel approach to initialising the GTM with arbitrary projections is developed. This makes it possible to combine GTM with algorithms like Isomap and fit complex non-linear structures like the Swiss roll. Another novel extension is the incorporation of prior knowledge about the structure of the covariance matrix. This extension greatly enhances the modelling capabilities of the algorithm, resulting in a better fit to the data and better imputation capabilities for missing data. Additionally, an extensive benchmark study of the missing data imputation capabilities of GTM is performed. Further, a novel approach based on missing data is introduced to benchmark the fit of probabilistic visualisation algorithms on unlabelled data. Finally, the work is complemented by evaluating the algorithms on real-life datasets from geochemical projects.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The Electronic Product Code Information Service (EPCIS) is an EPCglobal standard that aims to bridge the gap between the physical world of RFID-tagged artifacts and information systems that enable their tracking and tracing via the Electronic Product Code (EPC). Central to the EPCIS data model are "events" that describe specific occurrences in the supply chain. EPCIS events, recorded and registered against EPC-tagged artifacts, encapsulate the "what", "when", "where" and "why" of these artifacts as they flow through the supply chain. In this paper we propose an ontological model for representing EPCIS events on the Web of Data. Our model provides a scalable approach for the representation, integration and sharing of EPCIS events as linked data via RESTful interfaces, thereby facilitating interoperability, collaboration and exchange of EPC-related data across enterprises on a Web scale.
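
To make the "what", "when", "where" and "why" of an event concrete, the sketch below models one event as a small record and flattens it into subject-predicate-object triples, the general shape of linked data. The namespace, property names, and sample values are hypothetical placeholders; they are not taken from the paper's ontology or from the EPCIS specification.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

EX = "http://example.org/epcis#"  # hypothetical vocabulary namespace


@dataclass(frozen=True)
class Event:
    event_id: str            # identifier used to mint the event's URI
    epcs: tuple[str, ...]    # "what": EPCs of the observed objects
    time: datetime           # "when"
    location: str            # "where"
    business_step: str       # "why"


def to_triples(event: Event) -> list[tuple[str, str, str]]:
    """Flatten an event into (subject, predicate, object) triples."""
    subject = EX + "event/" + event.event_id
    triples = [
        (subject, EX + "eventTime", event.time.isoformat()),
        (subject, EX + "location", event.location),
        (subject, EX + "businessStep", event.business_step),
    ]
    # One triple per observed object, so each EPC links back to the event.
    triples += [(subject, EX + "associatedEPC", epc) for epc in event.epcs]
    return triples


# Illustrative event; every value here is made up.
sample = Event(
    event_id="e1",
    epcs=("urn:epc:id:sgtin:0614141.107346.2017", "urn:epc:id:sgtin:0614141.107346.2018"),
    time=datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc),
    location="http://example.org/location/warehouse-1",
    business_step="shipping",
)
```

Calling `to_triples(sample)` yields five triples that share one subject URI, which is what would let a RESTful interface expose the event as a single dereferenceable resource.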

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In order to reconstruct regional vegetation changes and local conditions during the fen-bog transition in the Borsteler Moor (northwestern Germany), a sediment core covering the period between 7.1 and 4.5 cal kyrs BP was palynologically investigated. The pollen diagram demonstrates the dominance of oak forests and a gradual replacement of trees by raised bog vegetation with the wetter conditions in the Late Atlantic. At ~ 6 cal kyrs BP, the non-pollen palynomorphs (NPP) demonstrate the succession from mesotrophic conditions, clearly indicated by a number of fungal spore types, to oligotrophic conditions, indicated by Sphagnum spores, Bryophytomyces sphagni, and testate amoebae Amphitrema, Assulina and Arcella, etc. Four relatively dry phases during the transition from fen to bog are clearly indicated by the dominance of Calluna and associated fungi as well as by the increase of microcharcoal. Several new NPP types are described and known NPP types are identified. All NPP are discussed in the context of their palaeoecological indicator values.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The aim of the Bird-A project is to provide an ontology-based tool for designing a collaborative web interface for creating, viewing, editing, and deleting RDF data, and to deliver a first working implementation of it. The vision that has driven the Semantic Web community in recent years is to build a Web based on interlinked structured data rather than on documents. This architectural model is called Linked Data and rests on the ability to treat things, concepts, and people as resources identifiable by URIs, and to provide information about these resources and describe links between them using standard formats such as RDF. What has held back the spread of such structured, interconnected data, however, is the high level of technical skill required both to create and to use it. The Bird-A project sets out to simplify the creation and use of RDF data, encouraging its sharing and dissemination even among people without specific technical knowledge.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis addresses video tracking, analyzing the main techniques, methodologies, and tools for video analytics. The entire work was carried out at the company BitBang, from gathering information and useful material through to writing the thesis. At the same company I completed my internship, during which I explored the practical aspects of web and video analytics, observing the fieldwork of specialists in the sector and becoming familiar with data analysis tools by using the main web analytics platforms. To fully understand the topic, it was first necessary to learn basic web analytics. The classic web analytics methodologies are therefore presented, that is, how to analyze visitor behavior on web pages with the metrics best suited to different types of business, leading up to the newer technique of event tracking. Event tracking arose right after multimedia content spread across web pages; that content changed how users browse and, consequently, created the need to track the new actions performed on it in order to get a complete picture of the visitor experience on the site. Data obtained with traditional web analytics methods is no longer sufficient; it must be complemented with new techniques, which are indispensable for a 360-degree view of everything that happens on the site. This is where video tracking, known as video analytics, is introduced. The main metrics for the analysis are presented, along with how best to exploit them depending on the type of website and the business purpose the video serves.
To understand how video can be used as a marketing tool and how to analyze visitor behavior on it, it is first necessary to take a step back and survey the main aspects related to it: its production, its embedding in web pages, the players used to do so, and its distribution through social network sites and across all the new devices and platforms connected to the network. To this end, a general overview of the more technical aspects is given, covering the differences between file formats and video formats, web transmission techniques, how to optimize the embedding of content in pages, a description of the best-known players for uploading, and finally a brief look at the current state of the war between open-source and proprietary video formats on the web. The final section covers the most practical and experimental part of the work. Chapter 7 describes the main features of two of the most widely used web analytics platforms, one free, Google Analytics, and one paid, Omniture SiteCatalyst, with particular attention to video tracking metrics and the differences between the two products. I also found it worthwhile to describe the characteristics of some platforms dedicated to video analytics, analyzing their most interesting features, although I was not able to test how they work in practice. The last chapter presents some practical applications of video analytics that I was able to observe during my internship and thesis period at the company. In particular, it describes the problems encountered with the products used for tracking, the proposed solutions, and the issues that remain unresolved in this field.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The thesis studies the economic and financial conditions of Italian households, using microeconomic data from the Survey on Household Income and Wealth (SHIW) over the period 1998-2006. It develops along two lines of enquiry. First, it studies the determinants of households' holdings of assets and liabilities and estimates their degree of correlation. After a review of the literature, it estimates two non-linear multivariate models of the interactions between assets and liabilities with repeated cross-sections. Second, it analyses households' financial difficulties. It defines a quantitative measure of financial distress and tests, by means of non-linear dynamic probit models, whether the probability of experiencing financial difficulties is persistent over time. Chapter 1 provides a critical review of the theoretical and empirical literature on the estimation of asset and liability holdings, on their interactions, and on households' net wealth. The review stresses that a large part of the literature explains households' debt holdings as a function of, among other things, net wealth, an assumption that runs into possible endogeneity problems. Chapter 2 defines two non-linear multivariate models to study the interactions between assets and liabilities held by Italian households. Estimation is based on pooled cross-sections of the SHIW. The first model is a bivariate tobit that estimates the factors affecting assets and liabilities and their degree of correlation, with results consistent with theoretical expectations. To tackle non-normality and heteroskedasticity in the error term, which make tobit estimators inconsistent, semi-parametric estimates are provided that confirm the results of the tobit model. The second model is a quadrivariate probit on three different assets (safe, risky and real) and total liabilities; the results show the expected patterns of interdependence suggested by theoretical considerations.
Chapter 3 reviews the methodologies for estimating non-linear dynamic panel data models, drawing attention to the problems that must be dealt with to obtain consistent estimators. Specific attention is given to the initial condition problem raised by the inclusion of the lagged dependent variable in the set of explanatory variables. The advantage of using dynamic panel data models is that they make it possible to account simultaneously for true state dependence, via the lagged variable, and for unobserved heterogeneity, via the specification of individual effects. Chapter 4 applies the models reviewed in Chapter 3 to analyse the financial difficulties of Italian households, using information on net wealth as provided in the panel component of the SHIW. The aim is to test whether households persistently experience financial difficulties over time. A thorough discussion is provided of the alternative approaches proposed in the literature (subjective/qualitative indicators versus quantitative indexes) to identify households in financial distress. Households in financial difficulties are identified as those holding amounts of net wealth lower than the value corresponding to the first quartile of the net wealth distribution. Estimation is conducted via four different methods: the pooled probit model, the random effects probit model with exogenous initial conditions, the Heckman model and the recently developed Wooldridge model. Results obtained from all estimators accept the null hypothesis of true state dependence and show that, in line with the literature, less sophisticated models, namely the pooled and exogenous models, over-estimate such persistence.
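
The first-quartile rule used to identify households in financial difficulties can be sketched as follows. The household identifiers and net wealth figures are invented for illustration, and the quartile is computed with Python's default (exclusive) method, which may differ from the estimator or survey weighting used in the thesis.

```python
from statistics import quantiles

# Hypothetical net wealth per household (not SHIW data).
net_wealth = {
    "h1": 10.0, "h2": 20.0, "h3": 30.0, "h4": 40.0,
    "h5": 50.0, "h6": 60.0, "h7": 70.0, "h8": 80.0,
}


def first_quartile(values: list[float]) -> float:
    """Return the first quartile (the first of the three quartile cut points)."""
    return quantiles(values, n=4)[0]


def in_distress(wealth_by_household: dict[str, float]) -> dict[str, bool]:
    """Flag households whose net wealth is strictly below the first quartile."""
    threshold = first_quartile(list(wealth_by_household.values()))
    return {hid: wealth < threshold for hid, wealth in wealth_by_household.items()}


flags = in_distress(net_wealth)
distressed = [hid for hid, flagged in flags.items() if flagged]
```

Applied to each survey wave, the resulting binary indicator, together with its one-period lag, is the kind of dependent variable a dynamic probit model would take as input.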