10 resultados para Open Information Extraction

em AMS Tesi di Laurea - Alm@DL - Università di Bologna


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In questo lavoro si introducono i concetti di base di Natural Language Processing, soffermandosi su Information Extraction e analizzandone gli ambiti applicativi, le attività principali e la differenza rispetto a Information Retrieval. Successivamente si analizza il processo di Named Entity Recognition, focalizzando l’attenzione sulle principali problematiche di annotazione di testi e sui metodi per la valutazione della qualità dell’estrazione di entità. Infine si fornisce una panoramica della piattaforma software open-source di language processing GATE/ANNIE, descrivendone l’architettura e i suoi componenti principali, con approfondimenti sugli strumenti che GATE offre per l'approccio rule-based a Named Entity Recognition.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Il periodo in cui viviamo rappresenta la cuspide di una forte e rapida evoluzione nella comprensione del linguaggio naturale, raggiuntasi prevalentemente grazie allo sviluppo di modelli neurali. Nell'ambito dell'information extraction, tali progressi hanno recentemente consentito di riconoscere efficacemente relazioni semantiche complesse tra entità menzionate nel testo, quali proteine, sintomi e farmaci. Tale task -- reso possibile dalla modellazione ad eventi -- è fondamentale in biomedicina, dove la crescita esponenziale del numero di pubblicazioni scientifiche accresce ulteriormente il bisogno di sistemi per l'estrazione automatica delle interazioni racchiuse nei documenti testuali. La combinazione di AI simbolica e sub-simbolica può consentire l'introduzione di conoscenza strutturata nota all'interno di language model, rendendo quest'ultimi più robusti, fattuali e interpretabili. In tale contesto, la verbalizzazione di grafi è uno dei task su cui si riversano maggiori aspettative. Nonostante l'importanza di tali contributi (dallo sviluppo di chatbot alla formulazione di nuove ipotesi di ricerca), ad oggi, risultano assenti contributi capaci di verbalizzare gli eventi biomedici espressi in letteratura, apprendendo il legame tra le interazioni espresse in forma a grafo e la loro controparte testuale. La tesi propone il primo dataset altamente comprensivo su coppie evento-testo, includendo diverse sotto-aree biomediche, quali malattie infettive, ricerca oncologica e biologia molecolare. Il dataset introdotto viene usato come base per l'addestramento di modelli generativi allo stato dell'arte sul task di verbalizzazione, adottando un approccio text-to-text e illustrando una tecnica formale per la codifica di grafi evento mediante testo aumentato. Infine, si dimostra la validità degli eventi per il miglioramento delle capacità di comprensione dei modelli neurali su altri task NLP, focalizzandosi su single-document summarization e multi-task learning.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Rappresentazione della conoscenza in banca di dati testuali non strutturati in lingua Italiana.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The central objective of research in Information Retrieval (IR) is to discover new techniques to retrieve relevant information in order to satisfy an Information Need. The Information Need is satisfied when relevant information can be provided to the user. In IR, relevance is a fundamental concept which has changed over time, from popular to personal, i.e., what was considered relevant before was information for the whole population, but what is considered relevant now is specific information for each user. Hence, there is a need to connect the behavior of the system to the condition of a particular person and his social context; thereby an interdisciplinary sector called Human-Centered Computing was born. For the modern search engine, the information extracted for the individual user is crucial. According to the Personalized Search (PS), two different techniques are necessary to personalize a search: contextualization (interconnected conditions that occur in an activity), and individualization (characteristics that distinguish an individual). This movement of focus to the individual's need undermines the rigid linearity of the classical model overtaken the ``berry picking'' model which explains that the terms change thanks to the informational feedback received from the search activity introducing the concept of evolution of search terms. The development of Information Foraging theory, which observed the correlations between animal foraging and human information foraging, also contributed to this transformation through attempts to optimize the cost-benefit ratio. This thesis arose from the need to satisfy human individuality when searching for information, and it develops a synergistic collaboration between the frontiers of technological innovation and the recent advances in IR. The search method developed exploits what is relevant for the user by changing radically the way in which an Information Need is expressed, because now it is expressed through the generation of the query and its own context. As a matter of fact the method was born under the pretense to improve the quality of search by rewriting the query based on the contexts automatically generated from a local knowledge base. Furthermore, the idea of optimizing each IR system has led to develop it as a middleware of interaction between the user and the IR system. Thereby the system has just two possible actions: rewriting the query, and reordering the result. Equivalent actions to the approach was described from the PS that generally exploits information derived from analysis of user behavior, while the proposed approach exploits knowledge provided by the user. The thesis went further to generate a novel method for an assessment procedure, according to the "Cranfield paradigm", in order to evaluate this type of IR systems. The results achieved are interesting considering both the effectiveness achieved and the innovative approach undertaken together with the several applications inspired using a local knowledge base.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Hybrid vehicles represent the future for automakers, since they allow to improve the fuel economy and to reduce the pollutant emissions. A key component of the hybrid powertrain is the Energy Storage System, that determines the ability of the vehicle to store and reuse energy. Though electrified Energy Storage Systems (ESS), based on batteries and ultracapacitors, are a proven technology, Alternative Energy Storage Systems (AESS), based on mechanical, hydraulic and pneumatic devices, are gaining interest because they give the possibility of realizing low-cost mild-hybrid vehicles. Currently, most literature of design methodologies focuses on electric ESS, which are not suitable for AESS design. In this contest, The Ohio State University has developed an Alternative Energy Storage System design methodology. This work focuses on the development of driving cycle analysis methodology that is a key component of Alternative Energy Storage System design procedure. The proposed methodology is based on a statistical approach to analyzing driving schedules that represent the vehicle typical use. Driving data are broken up into power events sequence, namely traction and braking events, and for each of them, energy-related and dynamic metrics are calculated. By means of a clustering process and statistical synthesis methods, statistically-relevant metrics are determined. These metrics define cycle representative braking events. By using these events as inputs for the Alternative Energy Storage System design methodology, different system designs are obtained. Each of them is characterized by attributes, namely system volume and weight. In the last part the work, the designs are evaluated in simulation by introducing and calculating a metric related to the energy conversion efficiency. Finally, the designs are compared accounting for attributes and efficiency values. In order to automate the driving data extraction and synthesis process, a specific script Matlab based has been developed. Results show that the driving cycle analysis methodology, based on the statistical approach, allows to extract and synthesize cycle representative data. The designs based on cycle statistically-relevant metrics are properly sized and have satisfying efficiency values with respect to the expectations. An exception is the design based on the cycle worst-case scenario, corresponding to same approach adopted by the conventional electric ESS design methodologies. In this case, a heavy system with poor efficiency is produced. The proposed new methodology seems to be a valid and consistent support for Alternative Energy Storage System design.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Ontology design and population -core aspects of semantic technologies- re- cently have become fields of great interest due to the increasing need of domain-specific knowledge bases that can boost the use of Semantic Web. For building such knowledge resources, the state of the art tools for ontology design require a lot of human work. Producing meaningful schemas and populating them with domain-specific data is in fact a very difficult and time-consuming task. Even more if the task consists in modelling knowledge at a web scale. The primary aim of this work is to investigate a novel and flexible method- ology for automatically learning ontology from textual data, lightening the human workload required for conceptualizing domain-specific knowledge and populating an extracted schema with real data, speeding up the whole ontology production process. Here computational linguistics plays a fundamental role, from automati- cally identifying facts from natural language and extracting frame of relations among recognized entities, to producing linked data with which extending existing knowledge bases or creating new ones. In the state of the art, automatic ontology learning systems are mainly based on plain-pipelined linguistics classifiers performing tasks such as Named Entity recognition, Entity resolution, Taxonomy and Relation extraction [11]. These approaches present some weaknesses, specially in capturing struc- tures through which the meaning of complex concepts is expressed [24]. Humans, in fact, tend to organize knowledge in well-defined patterns, which include participant entities and meaningful relations linking entities with each other. In literature, these structures have been called Semantic Frames by Fill- 6 Introduction more [20], or more recently as Knowledge Patterns [23]. Some NLP studies has recently shown the possibility of performing more accurate deep parsing with the ability of logically understanding the structure of discourse [7]. In this work, some of these technologies have been investigated and em- ployed to produce accurate ontology schemas. The long-term goal is to collect large amounts of semantically structured information from the web of crowds, through an automated process, in order to identify and investigate the cognitive patterns used by human to organize their knowledge.

Relevância:

30.00% 30.00%

Publicador:

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Le sfide dell'Information Visualisation ed i limiti dei sistemi di visualizzazione esistenti hanno portato alla creazione di un nuovo sistema per la generazione automatica di visualizzazioni di Open Data quantitativi, presentato in questa tesi.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Most of the existing open-source search engines, utilize keyword or tf-idf based techniques to find relevant documents and web pages relative to an input query. Although these methods, with the help of a page rank or knowledge graphs, proved to be effective in some cases, they often fail to retrieve relevant instances for more complicated queries that would require a semantic understanding to be exploited. In this Thesis, a self-supervised information retrieval system based on transformers is employed to build a semantic search engine over the library of Gruppo Maggioli company. Semantic search or search with meaning can refer to an understanding of the query, instead of simply finding words matches and, in general, it represents knowledge in a way suitable for retrieval. We chose to investigate a new self-supervised strategy to handle the training of unlabeled data based on the creation of pairs of ’artificial’ queries and the respective positive passages. We claim that by removing the reliance on labeled data, we may use the large volume of unlabeled material on the web without being limited to languages or domains where labeled data is abundant.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Oggigiorno il concetto di informazione è diventato cruciale in fisica, pertanto, siccome la migliore teoria che abbiamo per compiere predizioni riguardo l'universo è la meccanica quantistica, assume una particolare importanza lo sviluppo di una versione quantistica della teoria dell'informazione. Questa centralità è confermata dal fatto che i buchi neri hanno entropia. Per questo motivo, in questo lavoro sono presentati elementi di teoria dell'informazione quantistica e della comunicazione quantistica e alcuni sono illustrati riferendosi a modelli quantistici altamente idealizzati della meccanica di buco nero. In particolare, nel primo capitolo sono forniti tutti gli strumenti quanto-meccanici per la teoria dell'informazione e della comunicazione quantistica. Successivamente, viene affrontata la teoria dell'informazione quantistica e viene trovato il limite di Bekenstein alla quantità di informazione chiudibile entro una qualunque regione spaziale. Tale questione viene trattata utilizzando un modello quantistico idealizzato della meccanica di buco nero supportato dalla termodinamica. Nell'ultimo capitolo, viene esaminato il problema di trovare un tasso raggiungibile per la comunicazione quantistica facendo nuovamente uso di un modello quantistico idealizzato di un buco nero, al fine di illustrare elementi della teoria. Infine, un breve sommario della fisica dei buchi neri è fornito in appendice.