889 results for Topic Ontology, User Profiles, Relevance Assessment, Information Retrieval


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Intelligent agents are an advanced technology used in Web Intelligence. When searching for information in a distributed Web environment, information is retrieved by multiple agents on the client site and fused on the broker site. Current information fusion techniques rely on the cooperation of agents to provide statistics. Such techniques are computationally expensive and unrealistic in the real world. In this paper, we introduce a model that uses a world ontology constructed from the Dewey Decimal Classification to acquire user profiles. By searching with specific and exhaustive user profiles, information fusion techniques no longer rely on the statistics provided by agents. The model has been successfully evaluated using the large INEX data set to simulate the distributed Web environment.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis targets a challenging issue: enhancing users' experience with massive and overloaded web information. The novel pattern-based topic model proposed in this thesis generates high-quality multi-topic user interest models by combining statistical topic modelling and pattern mining. We have successfully applied the pattern-based topic model to both information filtering and information retrieval. The success of the proposed model in finding the information most relevant to users comes mainly from its precise semantic representations of documents and its accurate classification of topics at both the document level and the collection level.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The central objective of research in Information Retrieval (IR) is to discover new techniques for retrieving relevant information in order to satisfy an Information Need. The Information Need is satisfied when relevant information can be provided to the user. In IR, relevance is a fundamental concept that has changed over time, from popular to personal: what was once considered relevant was information for the whole population, whereas what is considered relevant now is specific information for each user. Hence, there is a need to connect the behavior of the system to the condition of a particular person and his or her social context; from this need an interdisciplinary sector called Human-Centered Computing was born. For the modern search engine, the information extracted for the individual user is crucial. According to Personalized Search (PS), two different techniques are necessary to personalize a search: contextualization (interconnected conditions that occur in an activity) and individualization (characteristics that distinguish an individual). This shift of focus to the individual's need undermines the rigid linearity of the classical model, which has been overtaken by the "berry picking" model. That model explains that search terms change thanks to the informational feedback received from the search activity, introducing the concept of the evolution of search terms. The development of Information Foraging theory, which observed the correlations between animal foraging and human information foraging, also contributed to this transformation through attempts to optimize the cost-benefit ratio. This thesis arose from the need to satisfy human individuality when searching for information, and it develops a synergistic collaboration between the frontiers of technological innovation and the recent advances in IR.
The search method developed exploits what is relevant for the user by radically changing the way in which an Information Need is expressed: it is now expressed through the generation of the query and its own context. The method was conceived with the aim of improving search quality by rewriting the query based on contexts automatically generated from a local knowledge base. Furthermore, the idea of optimizing each IR system led to developing the method as middleware that mediates between the user and the IR system. The system therefore has just two possible actions: rewriting the query and reordering the results. Equivalent actions are described in PS, which generally exploits information derived from analysis of user behavior, whereas the proposed approach exploits knowledge provided by the user. The thesis goes further and proposes a novel assessment procedure, following the "Cranfield paradigm", to evaluate this type of IR system. The results are interesting considering both the effectiveness achieved and the innovative approach undertaken, together with the several applications inspired by the use of a local knowledge base.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Document ranking is an important process in information retrieval (IR). It presents retrieved documents in order of their estimated degree of relevance to the query. Traditional document ranking methods are mostly based on similarity computations between the documents and the query. In this paper we argue that similarity-based document ranking is insufficient in some cases, for two reasons. The first is increased information variety: there are now far too many different types of documents available for users to search. The second is user variety: in many cases a user may want to retrieve documents that are not only similar but also general or broad regarding a certain topic. This is particularly the case in domains such as bio-medical IR. In this paper we propose a novel approach that re-ranks the retrieved documents by combining their similarity with their generality. Document generality can be quantified by an ontology-based analysis of the semantic cohesion of the text. The retrieved documents are then re-ranked by a combined score of similarity and the closeness of each document’s generality to the query’s. Our experiments have shown encouraging performance on a large bio-medical document collection, OHSUMED, containing 348,566 medical journal references and 101 test queries.
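
The re-ranking idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how similarity and generality closeness are combined, so the linear mix below, the weight `alpha`, the closeness measure `1 - |g_doc - g_query|`, and the assumption that all scores lie in [0, 1] are illustrative choices, not the authors' method.

```python
from typing import List, Tuple


def rerank(
    docs: List[Tuple[str, float, float]],
    query_generality: float,
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank documents by a mix of similarity and generality closeness.

    docs: (doc_id, similarity, generality) triples, scores assumed in [0, 1].
    alpha: hypothetical weight on similarity; (1 - alpha) goes to closeness.
    """
    scored = []
    for doc_id, similarity, generality in docs:
        # Closeness is 1.0 when document and query generality coincide.
        closeness = 1.0 - abs(generality - query_generality)
        scored.append((doc_id, alpha * similarity + (1.0 - alpha) * closeness))
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up scores purely for illustration.
docs = [("d1", 0.9, 0.2), ("d2", 0.8, 0.7), ("d3", 0.5, 0.8)]
ranking = rerank(docs, query_generality=0.8, alpha=0.5)
similarity_only = rerank(docs, query_generality=0.8, alpha=1.0)
```

With these made-up scores, the most similar document `d1` drops to last place once generality closeness is taken into account, while `alpha=1.0` reproduces plain similarity ranking.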

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The rapid growth of the Internet and advances in Web technologies have given users access to large amounts of online music data, including acoustic signals, lyrics, style/mood labels, and user-assigned tags. This progress has made music listening more fun, but it has raised the issue of how to organize this data and, more generally, how computer programs can assist users in their music experience. An important subject in computer-aided music listening is music retrieval, i.e., efficiently helping users locate the music they are looking for. Traditionally, songs were organized in a hierarchical structure such as genre->artist->album->track to facilitate navigation. However, users’ intentions are often hard to capture in such a simple structure. Users may want to listen to music of a particular mood, style, or topic, and/or to songs similar to given music samples. This motivated us to work on a user-centric music retrieval system to improve users’ satisfaction. Traditional music information retrieval research was mainly concerned with classification, clustering, identification, and similarity search of acoustic music data by way of feature extraction algorithms and machine learning techniques. More recently, music information retrieval research has focused on utilizing other types of data, such as lyrics, user-access patterns, and user-defined tags, and on targeting non-genre categories for classification, such as mood labels and styles. This dissertation investigates and develops effective data mining techniques for (1) organizing and annotating music data with styles, moods, and user-assigned tags; (2) analyzing music data with features from diverse information sources; and (3) recommending songs to users utilizing both content features and user access patterns.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Information Retrieval is an important albeit imperfect component of information technologies. The problem of insufficient diversity of retrieved documents is one of the primary issues studied in this research. This study shows that this problem leads to a decrease in precision and recall, the traditional measures of information retrieval effectiveness. This thesis presents an adaptive IR system based on the theory of adaptive dual control. The aim of the approach is to optimize retrieval precision after all feedback has been issued. This is done by increasing the diversity of retrieved documents. This study shows that the value of recall reflects this diversity. The Probability Ranking Principle is viewed in the literature as the “bedrock” of current probabilistic Information Retrieval theory. Neither the proposed approach nor other methods from the literature for diversifying retrieved documents conform to this principle. This study shows by counterexample that the Probability Ranking Principle does not in general lead to optimal precision in a search session with feedback (for which it may not have been designed but is actively used). To accomplish the aim, the retrieval precision of the search session should be optimized with a multistage stochastic programming model. However, such models are computationally intractable. Therefore, approximate linear multistage stochastic programming models are derived in this study, where the multistage improvement of the probability distribution is modelled using the proposed feedback correctness method. The proposed optimization models are based on several assumptions, starting with the assumption that Information Retrieval is conducted in units of topics. The use of clusters is the primary reason why a new method of probability estimation is proposed. The adaptive dual control of the topic-based IR system was evaluated in a series of experiments conducted on the Reuters, Wikipedia and TREC collections of documents.
The Wikipedia experiment revealed that the dual control feedback mechanism improves precision and S-recall when all the underlying assumptions are satisfied. In the TREC experiment, this feedback mechanism was compared to a state-of-the-art adaptive IR system based on BM-25 term weighting and the Rocchio relevance feedback algorithm. The baseline system exhibited better effectiveness than the cluster-based optimization model of ADTIR. The main reason for this was insufficient quality of the generated clusters in the TREC collection that violated the underlying assumption.
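
The baseline named in this abstract uses the Rocchio relevance feedback algorithm. As a rough illustration of what that kind of feedback does, the following is a minimal sketch of the textbook Rocchio update on sparse term-weight vectors. The weights `alpha`, `beta`, and `gamma`, the dictionary representation, and the dropping of non-positive weights are assumptions chosen for the example; the abstract does not describe the baseline's actual configuration.

```python
from typing import Dict, List

Vector = Dict[str, float]


def centroid_weight(docs: List[Vector], term: str) -> float:
    """Average weight of a term over a set of documents (0.0 if the set is empty)."""
    if not docs:
        return 0.0
    return sum(doc.get(term, 0.0) for doc in docs) / len(docs)


def rocchio(
    query: Vector,
    relevant: List[Vector],
    non_relevant: List[Vector],
    alpha: float = 1.0,   # illustrative weight on the original query
    beta: float = 0.75,   # illustrative weight on the relevant centroid
    gamma: float = 0.15,  # illustrative weight on the non-relevant centroid
) -> Vector:
    """Move the query toward relevant documents and away from non-relevant ones."""
    terms = set(query)
    for doc in relevant + non_relevant:
        terms |= set(doc)
    updated: Vector = {}
    for term in terms:
        weight = (
            alpha * query.get(term, 0.0)
            + beta * centroid_weight(relevant, term)
            - gamma * centroid_weight(non_relevant, term)
        )
        # Assumption: terms whose weight is not positive are dropped.
        if weight > 0.0:
            updated[term] = weight
    return updated


# Toy feedback round with made-up judgements.
query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "cat": 1.0}]
non_relevant = [{"jaguar": 1.0, "car": 1.0}]
updated_query = rocchio(query, relevant, non_relevant)
```

In this toy round the updated query gains the term `cat` from the relevant document and does not pick up `car`, which appears only in the non-relevant one.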

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Nowadays, everyone can effortlessly access a range of information on the World Wide Web (WWW). As information resources on the web continue to grow tremendously, it becomes progressively more difficult to meet users' high expectations and find relevant information. Although existing search engine technologies can find valuable information, they suffer from the problems of information overload and information mismatch. This paper presents a hybrid Web Information Retrieval approach that allows personalised search using an ontology, a user profile and collaborative filtering. The approach uses the ontology to find the context of a user query with minimal user involvement. At the same time, it uses time-based automatic updating of the user profile to follow the user’s changing behaviour. It then uses recommendations from similar users obtained through collaborative filtering. The proposed method is evaluated with the FIRE 2010 dataset and a manually generated dataset. Empirical analysis reveals that the Precision, Recall and F-Score of most queries for many users are improved with the proposed method.
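
For readers unfamiliar with the measures named in this abstract, the following minimal sketch computes set-based Precision, Recall and the balanced F-Score (F1) for a single query. The function name and the document identifiers are hypothetical, and the abstract does not state which variant of the F-Score the authors report.

```python
from typing import Iterable, Tuple


def precision_recall_f1(
    retrieved: Iterable[str], relevant: Iterable[str]
) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    total = precision + recall
    f1 = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up result list and relevance judgements.
precision, recall, f1 = precision_recall_f1(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d5"],
)
```

Here two of the four retrieved documents are relevant and two of the three relevant documents are retrieved, so precision is 1/2, recall is 2/3 and F1 is 4/7.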

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Relevation! is a system for performing relevance judgements for information retrieval evaluation. Relevation! is web-based, fully configurable and expandable; it allows researchers to effectively collect assessments and additional qualitative data. The system is easily deployed allowing assessors to smoothly perform their relevance judging tasks, even remotely. Relevation! is available as an open source project at: http://ielab.github.io/relevation.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis presents new methods for the classification and thematic grouping of billions of web pages, at scales not previously achievable. This process is also known as document clustering, where similar documents are automatically associated with clusters that represent distinct topics. These automatically discovered topics are in turn used to improve search engine performance by searching only the topics deemed relevant to a particular user query.
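
The idea of searching only the topics deemed relevant to a query can be sketched as follows. This is a toy illustration rather than the thesis's method: the cosine similarity on sparse term vectors, the `top_clusters` parameter, and the cluster and document structures are all assumptions introduced for the example.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_search(query: Vector, clusters: List[dict], top_clusters: int = 1) -> List[dict]:
    """Rank only the documents in the clusters closest to the query."""
    # Step 1: pick the clusters whose centroids best match the query.
    nearest = sorted(
        clusters, key=lambda c: cosine(query, c["centroid"]), reverse=True
    )[:top_clusters]
    # Step 2: score documents from the selected clusters only.
    candidates = [doc for cluster in nearest for doc in cluster["docs"]]
    return sorted(candidates, key=lambda d: cosine(query, d["vector"]), reverse=True)


# Two hypothetical topic clusters with hand-written centroids.
clusters = [
    {
        "centroid": {"football": 1.0, "match": 1.0},
        "docs": [
            {"id": "s1", "vector": {"football": 1.0}},
            {"id": "s2", "vector": {"match": 1.0, "football": 0.2}},
        ],
    },
    {
        "centroid": {"bank": 1.0, "stock": 1.0},
        "docs": [{"id": "f1", "vector": {"bank": 1.0}}],
    },
]
results = cluster_search({"football": 1.0}, clusters, top_clusters=1)
```

Only the documents in the sports cluster are scored for this query, so the cost of ranking depends on the size of the selected clusters rather than on the whole collection.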