855 resultados para Semantic Search
Resumo:
Questa tesi riguarda lo sviluppo di un'applicazione che sfrutta le tecnologie del Web Semantico e del Text Mining. L'applicazione rappresenta l'estensione di un lavoro relativo ad una tesi precedente, aggiungendo ad esso la funzionalità di ricerca semantica. Tale funzionalità permette il recupero di informazioni che con il metodo di ricerca normale non verrebbero considerate. Per raggiungere questo risultato si utilizza WordNet, un database semantico-lessicale, e una libreria per la Latent Semantic Analysis, una tecnica del Text Mining.
Resumo:
The emergence of cloud datacenters enhances the capability of online data storage. Since massive data is stored in datacenters, it is necessary to effectively locate and access interest data in such a distributed system. However, traditional search techniques only allow users to search images over exact-match keywords through a centralized index. These techniques cannot satisfy the requirements of content based image retrieval (CBIR). In this paper, we propose a scalable image retrieval framework which can efficiently support content similarity search and semantic search in the distributed environment. Its key idea is to integrate image feature vectors into distributed hash tables (DHTs) by exploiting the property of locality sensitive hashing (LSH). Thus, images with similar content are most likely gathered into the same node without the knowledge of any global information. For searching semantically close images, the relevance feedback is adopted in our system to overcome the gap between low-level features and high-level features. We show that our approach yields high recall rate with good load balance and only requires a few number of hops.
Resumo:
En esta tesis se estudia la representación, modelado y comparación de colecciones mediante el uso de ontologías en el ámbito de la Web Semántica. Las colecciones, entendidas como agrupaciones de objetos o elementos con entidad propia, son construcciones que aparecen frecuentemente en prácticamente todos los dominios del mundo real, y por tanto, es imprescindible disponer de conceptualizaciones de estas estructuras abstractas y de representaciones de estas conceptualizaciones en los sistemas informáticos, que definan adecuadamente su semántica. Mientras que en muchos ámbitos de la Informática y la Inteligencia Artificial, como por ejemplo la programación, las bases de datos o la recuperación de información, las colecciones han sido ampliamente estudiadas y se han desarrollado representaciones que responden a multitud de conceptualizaciones, en el ámbito de la Web Semántica, sin embargo, su estudio ha sido bastante limitado. De hecho hasta la fecha existen pocas propuestas de representación de colecciones mediante ontologías, y las que hay sólo cubren algunos tipos de colecciones y presentan importantes limitaciones. Esto impide la representación adecuada de colecciones y dificulta otras tareas comunes como la comparación de colecciones, algo crítico en operaciones habituales como las búsquedas semánticas o el enlazado de datos en la Web Semántica. Para solventar este problema esta tesis hace una propuesta de modelización de colecciones basada en una nueva clasificación de colecciones de acuerdo a sus características estructurales (homogeneidad, unicidad, orden y cardinalidad). Esta clasificación permite definir una taxonomía con hasta 16 tipos de colecciones distintas. Entre otras ventajas, esta nueva clasificación permite aprovechar la semántica de las propiedades estructurales de cada tipo de colección para realizar comparaciones utilizando las funciones de similitud y disimilitud más apropiadas. De este modo, la tesis desarrolla además un nuevo catálogo de funciones de similitud para las distintas colecciones, donde se han recogido las funciones de (di)similitud más conocidas y también algunas nuevas. Esta propuesta se ha implementado mediante dos ontologías paralelas, la ontología E-Collections, que representa los distintos tipos de colecciones de la taxonomía y su axiomática, y la ontología SIMEON (Similarity Measures Ontology) que representa los tipos de funciones de (di)similitud para cada tipo de colección. Gracias a estas ontologías, para comparar dos colecciones, una vez representadas como instancias de la clase más apropiada de la ontología E-Collections, automáticamente se sabe qué funciones de (di)similitud de la ontología SIMEON pueden utilizarse para su comparación. Abstract This thesis studies the representation, modeling and comparison of collections in the Semantic Web using ontologies. Collections, understood as groups of objects or elements with their own identities, are constructions that appear frequently in almost all areas of the real world. Therefore, it is essential to have conceptualizations of these abstract structures and representations of these conceptualizations in computer systems, that define their semantic properly. While in many areas of Computer Science and Artificial Intelligence, such as Programming, Databases or Information Retrieval, the collections have been extensively studied and there are representations that match many conceptualizations, in the field Semantic Web, however, their study has been quite limited. In fact, there are few representations of collections using ontologies so far, and they only cover some types of collections and have important limitations. This hinders a proper representation of collections and other common tasks like comparing collections, something critical in usual operations such as semantic search or linking data on the Semantic Web. To solve this problem this thesis makes a proposal for modelling collections based on a new classification of collections according to their structural characteristics (homogeneity, uniqueness, order and cardinality). This classification allows to define a taxonomy with up to 16 different types of collections. Among other advantages, this new classification can leverage the semantics of the structural properties of each type of collection to make comparisons using the most appropriate (dis)similarity functions. Thus, the thesis also develops a new catalog of similarity functions for the different types of collections. This catalog contains the most common (dis)similarity functions as well as new ones. This proposal is implemented through two parallel ontologies, the E-Collections ontology that represents the different types of collections in the taxonomy and their axiomatic, and the SIMEON ontology (Similarity Measures Ontology) that represents the types of (dis)similarity functions for each type of collection. Thanks to these ontologies, to compare two collections, once represented as instances of the appropriate class of E-Collections ontology, we can know automatically which (dis)similarity functions of the SIMEON ontology are suitable for the comparison. Finally, the feasibility and usefulness of this modeling and comparison of collections proposal is proved in the field of oenology, applying both E-Collections and SIMEON ontologies to the representation and comparison of wines with the E-Baco ontology.
Resumo:
Este trabajo presenta el uso de una ontología en el dominio financiero para la expansión de consultas con el fin de mejorar los resultados de un sistema de recuperación de información (RI) financiera. Este sistema está compuesto por una ontología y un índice de Lucene que permite recuperación de conceptos identificados mediante procesamiento de lenguaje natural. Se ha llevado a cabo una evaluación con un conjunto limitado de consultas y los resultados indican que la ambigüedad sigue siendo un problema al expandir la consulta. En ocasiones, la elección de las entidades adecuadas a la hora de expandir las consultas (filtrando por sector, empresa, etc.) permite resolver esa ambigüedad.
Resumo:
Automatic Term Recognition (ATR) is a fundamental processing step preceding more complex tasks such as semantic search and ontology learning. From a large number of methodologies available in the literature only a few are able to handle both single and multi-word terms. In this paper we present a comparison of five such algorithms and propose a combined approach using a voting mechanism. We evaluated the six approaches using two different corpora and show how the voting algorithm performs best on one corpus (a collection of texts from Wikipedia) and less well using the Genia corpus (a standard life science corpus). This indicates that choice and design of corpus has a major impact on the evaluation of term recognition algorithms. Our experiments also showed that single-word terms can be equally important and occupy a fairly large proportion in certain domains. As a result, algorithms that ignore single-word terms may cause problems to tasks built on top of ATR. Effective ATR systems also need to take into account both the unstructured text and the structured aspects and this means information extraction techniques need to be integrated into the term recognition process.
Resumo:
The main challenges of multimedia data retrieval lie in the effective mapping between low-level features and high-level concepts, and in the individual users' subjective perceptions of multimedia content. ^ The objectives of this dissertation are to develop an integrated multimedia indexing and retrieval framework with the aim to bridge the gap between semantic concepts and low-level features. To achieve this goal, a set of core techniques have been developed, including image segmentation, content-based image retrieval, object tracking, video indexing, and video event detection. These core techniques are integrated in a systematic way to enable the semantic search for images/videos, and can be tailored to solve the problems in other multimedia related domains. In image retrieval, two new methods of bridging the semantic gap are proposed: (1) for general content-based image retrieval, a stochastic mechanism is utilized to enable the long-term learning of high-level concepts from a set of training data, such as user access frequencies and access patterns of images. (2) In addition to whole-image retrieval, a novel multiple instance learning framework is proposed for object-based image retrieval, by which a user is allowed to more effectively search for images that contain multiple objects of interest. An enhanced image segmentation algorithm is developed to extract the object information from images. This segmentation algorithm is further used in video indexing and retrieval, by which a robust video shot/scene segmentation method is developed based on low-level visual feature comparison, object tracking, and audio analysis. Based on shot boundaries, a novel data mining framework is further proposed to detect events in soccer videos, while fully utilizing the multi-modality features and object information obtained through video shot/scene detection. ^ Another contribution of this dissertation is the potential of the above techniques to be tailored and applied to other multimedia applications. This is demonstrated by their utilization in traffic video surveillance applications. The enhanced image segmentation algorithm, coupled with an adaptive background learning algorithm, improves the performance of vehicle identification. A sophisticated object tracking algorithm is proposed to track individual vehicles, while the spatial and temporal relationships of vehicle objects are modeled by an abstract semantic model. ^
Resumo:
Increasing the size of training data in many computer vision tasks has shown to be very effective. Using large scale image datasets (e.g. ImageNet) with simple learning techniques (e.g. linear classifiers) one can achieve state-of-the-art performance in object recognition compared to sophisticated learning techniques on smaller image sets. Semantic search on visual data has become very popular. There are billions of images on the internet and the number is increasing every day. Dealing with large scale image sets is intense per se. They take a significant amount of memory that makes it impossible to process the images with complex algorithms on single CPU machines. Finding an efficient image representation can be a key to attack this problem. A representation being efficient is not enough for image understanding. It should be comprehensive and rich in carrying semantic information. In this proposal we develop an approach to computing binary codes that provide a rich and efficient image representation. We demonstrate several tasks in which binary features can be very effective. We show how binary features can speed up large scale image classification. We present learning techniques to learn the binary features from supervised image set (With different types of semantic supervision; class labels, textual descriptions). We propose several problems that are very important in finding and using efficient image representation.
Resumo:
Most of the existing open-source search engines, utilize keyword or tf-idf based techniques to find relevant documents and web pages relative to an input query. Although these methods, with the help of a page rank or knowledge graphs, proved to be effective in some cases, they often fail to retrieve relevant instances for more complicated queries that would require a semantic understanding to be exploited. In this Thesis, a self-supervised information retrieval system based on transformers is employed to build a semantic search engine over the library of Gruppo Maggioli company. Semantic search or search with meaning can refer to an understanding of the query, instead of simply finding words matches and, in general, it represents knowledge in a way suitable for retrieval. We chose to investigate a new self-supervised strategy to handle the training of unlabeled data based on the creation of pairs of ’artificial’ queries and the respective positive passages. We claim that by removing the reliance on labeled data, we may use the large volume of unlabeled material on the web without being limited to languages or domains where labeled data is abundant.
Resumo:
L’Exploratory Search, paradigma di ricerca basato sulle attività di scoperta e d’apprendimento, è stato per diverso tempo ignorato dai motori di ricerca tradizionali. Invece, è spesso dalle ricerche esplorative che nascono le idee più innovative. Le recenti tecnologie del Semantic Web forniscono le soluzioni che permettono d’implementare dei motori di ricerca capaci di accompagnare gli utenti impegnati in tale tipo di ricerca. Aemoo, motore di ricerca sul quale s’appoggia questa tesi ne è un esempio efficace. A partire da quest’ultimo e sempre con l’aiuto delle tecnologie del Web of Data, questo lavoro si propone di fornire una metodologia che permette di prendere in considerazione la singolarità del profilo di ciascun utente al fine di guidarlo nella sua ricerca esplorativa in modo personalizzato. Il criterio di personalizzazione che abbiamo scelto è comportamentale, ovvero basato sulle decisioni che l’utente prende ad ogni tappa che ritma il processo di ricerca. Implementando un prototipo, abbiamo potuto testare la validità di quest’approccio permettendo quindi all’utente di non essere più solo nel lungo e tortuoso cammino che porta alla conoscenza.
Resumo:
The web is continuously evolving into a collection of many data, which results in the interest to collect and merge these data in a meaningful way. Based on that web data, this paper describes the building of an ontology resting on fuzzy clustering techniques. Through continual harvesting folksonomies by web agents, an entire automatic fuzzy grassroots ontology is built. This self-updating ontology can then be used for several practical applications in fields such as web structuring, web searching and web knowledge visualization.A potential application for online reputation analysis, added value and possible future studies are discussed in the conclusion.
Resumo:
The expansion of the Internet has made the task of searching a crucial one. Internet users, however, have to make a great effort in order to formulate a search query that returns the required results. Many methods have been devised to assist in this task by helping the users modify their query to give better results. In this paper we propose an interactive method for query expansion. It is based on the observation that documents are often found to contain terms with high information content, which can summarise their subject matter. We present experimental results, which demonstrate that our approach significantly shortens the time required in order to accomplish a certain task by performing web searches.
Resumo:
The design of interfaces to facilitate user search has become critical for search engines, ecommercesites, and intranets. This study investigated the use of targeted instructional hints to improve search by measuring the quantitative effects of users' performance and satisfaction. The effects of syntactic, semantic and exemplar search hints on user behavior were evaluated in an empirical investigation using naturalistic scenarios. Combining the three search hint components, each with two levels of intensity, in a factorial design generated eight search engine interfaces. Eighty participants participated in the study and each completed six realistic search tasks. Results revealed that the inclusion of search hints improved user effectiveness, efficiency and confidence when using the search interfaces, but with complex interactions that require specific guidelines for search interface designers. These design guidelines will allow search designers to create more effective interfaces for a variety of searchapplications.
Resumo:
Question Answering systems that resort to the Semantic Web as a knowledge base can go well beyond the usual matching words in documents and, preferably, find a precise answer, without requiring user help to interpret the documents returned. In this paper, the authors introduce a Dialogue Manager that, through the analysis of the question and the type of expected answer, provides accurate answers to the questions posed in Natural Language. The Dialogue Manager not only represents the semantics of the questions, but also represents the structure of the discourse, including the user intentions and the questions context, adding the ability to deal with multiple answers and providing justified answers. The authors’ system performance is evaluated by comparing with similar question answering systems. Although the test suite is slight dimension, the results obtained are very promising.
Resumo:
This paper contains a new proposal for the definition of the fundamental operation of query under the Adaptive Formalism, one capable of locating functional nuclei from descriptions of their semantics. To demonstrate the method`s applicability, an implementation of the query procedure constrained to a specific class of devices is shown, and its asymptotic computational complexity is discussed.