873 results for 080704 Information Retrieval and Web Search
Abstract:
Clearly identifying the boundary between positive and negative streams is a major challenge. Several attempts have used negative feedback to address it; however, two issues arise when using negative relevance feedback to improve the effectiveness of information filtering. The first is how to select constructive negative samples in order to reduce the space of negative documents. The second is how to decide which noisy extracted features should be updated based on the selected negative samples. This paper proposes a pattern-mining-based approach that selects offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on RCV1, and substantial experiments show that the proposed approach achieves encouraging performance.
Abstract:
Over the years, people have often hypothesized that negative feedback should substantially improve the performance of information filtering systems; however, very effective models supporting this hypothesis have not yet been obtained. This paper proposes an effective model that uses negative relevance feedback, based on a pattern mining approach, to improve extracted features. This study focuses on two main issues of using negative relevance feedback: the selection of constructive negative examples to reduce the space of negative examples, and the revision of existing features based on the selected negative examples. The former selects some offender documents, which are the negative documents most likely to be classified in the positive group. The latter groups the extracted features into three categories (positive specific, general, and negative specific) so that their weights can be updated easily. An iterative algorithm is also proposed to implement this approach on the RCV1 data collection, and substantial experiments show that the proposed approach achieves encouraging performance.
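The two steps described in this abstract, offender selection and the three-way grouping of terms, can be illustrated with a minimal sketch. It is not the authors' method: the names `score`, `select_offenders` and `classify_terms` are hypothetical, the additive term-weight scoring is an assumption, and the set-based grouping is a simplified stand-in for whatever pattern-mining criteria the paper actually uses.

```python
def score(doc_terms, weights):
    """Sum the weights of the distinct known terms that occur in a document.

    Assumption: a document is a list of terms and `weights` maps terms
    learned from positive documents to non-negative weights.
    """
    return sum(weights.get(term, 0.0) for term in set(doc_terms))


def select_offenders(negative_docs, weights, k):
    """Return up to k negative documents with the highest positive score.

    These are the negatives most likely to be mistaken for positives.
    Documents with a score of zero are never treated as offenders.
    """
    ranked = sorted(negative_docs, key=lambda d: score(d, weights), reverse=True)
    return [d for d in ranked[:k] if score(d, weights) > 0]


def classify_terms(positive_docs, offenders):
    """Group terms into three categories (simplified, set-based assumption)."""
    pos = {term for doc in positive_docs for term in doc}
    neg = {term for doc in offenders for term in doc}
    return {
        "positive_specific": pos - neg,  # appear only in positive documents
        "general": pos & neg,            # appear in both groups
        "negative_specific": neg - pos,  # appear only in offenders
    }
```

With the groups in hand, a revision step could, for example, raise the weights of positive specific terms and lower those of negative specific terms; the exact update rules are left open here because the abstract does not state them.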
Abstract:
Recommender systems are one of the effective tools for dealing with information overload. Like explicit ratings and implicit rating behaviours such as purchases, click streams, and browsing history, tagging information reflects users' personal interests and preferences, and can be used to recommend personalized items to users. This paper explores how to use tagging information to make personalized recommendations. Based on the distinctive three-dimensional relationships among users, tags and items, a new user profiling and similarity measure method is proposed. The experiments suggest that the proposed approach performs better than traditional collaborative filtering recommender systems that use only rating data.
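As a rough illustration of tag-based user profiling, the sketch below builds a tag-frequency profile per user from (user, tag, item) triples and compares users with cosine similarity. The abstract does not specify the profiling or similarity method, so both choices, and the names `build_tag_profiles` and `cosine`, are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_tag_profiles(taggings):
    """Build one tag-frequency profile per user.

    `taggings` is an iterable of (user, tag, item) triples. The item is
    ignored in this simplified profile.
    """
    profiles = defaultdict(Counter)
    for user, tag, _item in taggings:
        profiles[user][tag] += 1
    return profiles


def cosine(a, b):
    """Cosine similarity between two tag-frequency profiles."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Users with similar tag profiles could then serve as neighbours in a standard collaborative filtering step, in place of neighbours found from rating data alone.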
Abstract:
Given the size and state of the Internet today, a good-quality approach to organizing this mass of information is of great importance. Clustering web pages into groups of similar documents is one such approach, but it relies heavily on good feature extraction and document representation as well as a good clustering algorithm. Because the changing nature of the Internet results in a dynamic dataset, an incremental approach is preferred. In this work we propose an enhanced incremental clustering approach that helps to better organize the information available on the Internet. Experiments show that the enhanced algorithm outperforms the original histogram-based algorithm by up to 7.5%.
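To make the idea of incremental clustering concrete, here is a minimal single-pass sketch in which each arriving document joins the most similar existing cluster or starts a new one. It is not the histogram-based algorithm the abstract refers to, nor the enhanced version: the class `IncrementalClusterer`, the Jaccard similarity on term sets, and the threshold value are all assumptions chosen to keep the example short.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class IncrementalClusterer:
    """Assign documents to clusters one at a time, without re-clustering."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # minimum average similarity to join a cluster
        self.clusters = []          # each cluster is a list of term sets

    def add(self, doc_terms):
        """Place one document and return the index of its cluster."""
        doc = set(doc_terms)
        best_idx, best_sim = None, 0.0
        for i, cluster in enumerate(self.clusters):
            # Average similarity of the new document to the cluster's members.
            sim = sum(jaccard(doc, member) for member in cluster) / len(cluster)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx is not None and best_sim >= self.threshold:
            self.clusters[best_idx].append(doc)
            return best_idx
        self.clusters.append([doc])
        return len(self.clusters) - 1
```

Because each document is compared only against the clusters that exist when it arrives, new pages can be added without reprocessing the whole collection, which is the property the abstract argues for.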
Abstract:
Current multimedia Web search engines still use keywords as the primary means of searching. Due to the richness of multimedia content, general users often have difficulty formulating textual queries that adequately represent their needs. As a result, query reformulation becomes an inevitable part of most multimedia searches. Previous Web query formulation studies did not investigate modification sequences and could therefore report only limited findings on reformulation behavior. In this study, we propose an automatic approach to examining multimedia query reformulation using large-scale transaction logs. The key findings show that search term replacement is the most dominant type of modification in visual searches but is less important in audio searches. Image search users prefer the specified search strategy more than video and audio users do. There is also a clear tendency to replace terms with synonyms or associated terms in visual queries. The analysis of the search strategies in different types of multimedia searching provides some insights into users' searching behavior, which can contribute to the design of future query formulation assistance for keyword-based Web multimedia retrieval systems.
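An automatic analysis of this kind needs some rule for labelling the change between two consecutive queries in a log. The sketch below shows one simple possibility based on comparing term sets. The function name `classify_reformulation` and the five labels are illustrative assumptions, not the taxonomy used in the study.

```python
def classify_reformulation(prev_query, next_query):
    """Label how next_query modifies prev_query, using whitespace-split terms.

    Labels (assumed for illustration):
      repeat          same set of terms
      new             no terms in common
      specification   terms were only added
      generalization  terms were only removed
      replacement     some terms kept, others swapped
    """
    prev = set(prev_query.lower().split())
    nxt = set(next_query.lower().split())
    if prev == nxt:
        return "repeat"
    if not prev & nxt:
        return "new"
    if prev < nxt:
        return "specification"
    if nxt < prev:
        return "generalization"
    return "replacement"
```

Applying such a function to every pair of consecutive queries in a session yields a sequence of labels, which can then be counted per media type (image, video, audio) to compare reformulation behavior.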