996 resultados para Relevance feedback


Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper analyses the pairwise distances of signatures produced by the TopSig retrieval model on two document collections. The distribution of the distances are compared to purely random signatures. It explains why TopSig is only competitive with state of the art retrieval models at early precision. Only the local neighbourhood of the signatures is interpretable. We suggest this is a common property of vector space models.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Term-based approaches can extract many features in text documents, but most include noise. Many popular text-mining strategies have been adapted to reduce noisy information from extracted features; however, text-mining techniques suffer from low frequency. The key issue is how to discover relevance features in text documents to fulfil user information needs. To address this issue, we propose a new method to extract specific features from user relevance feedback. The proposed approach includes two stages. The first stage extracts topics (or patterns) from text documents to focus on interesting topics. In the second stage, topics are deployed to lower level terms to address the low-frequency problem and find specific terms. The specific terms are determined based on their appearances in relevance feedback and their distribution in topics or high-level patterns. We test our proposed method with extensive experiments in the Reuters Corpus Volume 1 dataset and TREC topics. Results show that our proposed approach significantly outperforms the state-of-the-art models.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A long query provides more useful hints for searching relevant documents, but it is likely to introduce noise which affects retrieval performance. In order to smooth such adverse effect, it is important to reduce noisy terms, introduce and boost additional relevant terms. This paper presents a comprehensive framework, called Aspect Hidden Markov Model (AHMM), which integrates query reduction and expansion, for retrieval with long queries. It optimizes the probability distribution of query terms by utilizing intra-query term dependencies as well as the relationships between query terms and words observed in relevance feedback documents. Empirical evaluation on three large-scale TREC collections demonstrates that our approach, which is automatic, achieves salient improvements over various strong baselines, and also reaches a comparable performance to a state of the art method based on user’s interactive query term reduction and expansion.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX'12 evaluation campaign, which consisted of a five tracks: Linked Data, Relevance Feedback, Snippet Retrieval, Social Book Search, and Tweet Contextualization. INEX'12 was an exciting year for INEX in which we joined forces with CLEF and for the first time ran our workshop as part of the CLEF labs in order to facilitate knowledge transfer between the evaluation forums.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This thesis studies document signatures, which are small representations of documents and other objects that can be stored compactly and compared for similarity. This research finds that document signatures can be effectively and efficiently used to both search and understand relationships between documents in large collections, scalable enough to search a billion documents in a fraction of a second. Deliverables arising from the research include an investigation of the representational capacity of document signatures, the publication of an open-source signature search platform and an approach for scaling signature retrieval to operate efficiently on collections containing hundreds of millions of documents.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The increased availability of image capturing devices has enabled collections of digital images to rapidly expand in both size and diversity. This has created a constantly growing need for efficient and effective image browsing, searching, and retrieval tools. Pseudo-relevance feedback (PRF) has proven to be an effective mechanism for improving retrieval accuracy. An original, simple yet effective rank-based PRF mechanism (RB-PRF) that takes into account the initial rank order of each image to improve retrieval accuracy is proposed. This RB-PRF mechanism innovates by making use of binary image signatures to improve retrieval precision by promoting images similar to highly ranked images and demoting images similar to lower ranked images. Empirical evaluations based on standard benchmarks, namely Wang, Oliva & Torralba, and Corel datasets demonstrate the effectiveness of the proposed RB-PRF mechanism in image retrieval.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

需求跟踪是一种描述和跟踪整个需求生命周期的手段和方法,它是需求工程的重要组成部分。一个全面的需求跟踪关系能够辅助软件开发生命周期中很多活动的执行,并为决策分析和权衡提供依据。传统手工建立和维护需求跟踪关系的方法,在实际应用中面临着成本过高、容易出错、难以维护等问题。 为了帮助克服手工建立、维护需求跟踪关系的这些缺点,研究者提出了动态需求跟踪方法。动态需求跟踪使用信息检索、自然语言处理等技术,自动化地建立需求和工作产品之间的跟踪关系,提高了需求跟踪的效率。然而已有的方法依然需要进一步完善,其中比较突出的有两个方面:一是需求跟踪关系建立的精度有待提高,二是需求跟踪关系维护的自动化程度有待增强。这两个问题制约着动态需求跟踪方法在实际软件开发过程中的应用。如果动态需求跟踪不能保证足够的精度,建立的结果对于需求跟踪的辅助作用就会大打折扣,也就会降低方法在实际使用的价值。同时,需求跟踪关系的建立并非一劳永逸,需求和工作产品随着开发过程的进行而不断变化,这就导致跟踪关系需要不断地更新和维护。自动化、半自动化的维护跟踪关系将会促进动态需求跟踪方法在实际项目中的使用。 本文研究了现有的动态跟踪技术,重点探讨了基于信息检索技术的动态需求跟踪方法。基于以上分析,本文使用相关反馈技术改善动态需求跟踪在需求跟踪关系建立和维护中面临的问题。在跟踪关系建立上,提出了一种基于相关反馈的需求跟踪关系建立方法MERTRF(A Method for Establishing Requirement Traceability based on Relevance Feedback)。该方法使用信息检索技术建立需求跟踪关系,并使用相关反馈提高跟踪建立的精度。在跟踪关系维护上,提出了一种针对工作产品变更的跟踪关系维护方法MMRTRF(A Method for Maintaining Requirement Traceability based on Relevance Feedback from work product changes)。该方法针对工作产品的不同变更类型采用不同的维护方法自动更新跟踪关系,并使用相关反馈技术提高维护的精度和效率。同时,本文实现了提出的方法,完成了一个动态需求跟踪管理工具DRTMTool(Dynamic Requirement Traceability Management Tool)。该工具使用信息检索技术来实现跟踪关系的建立,使用向量空间模型等信息检索模型计算需求文档和工作产品之间的相似度,通过自动翻译解决中文文本和英文代码之间的匹配问题。最后本文通过使用实际的项目数据进行实验,验证了动态跟踪工具的效果。

Relevância:

60.00% 60.00%

Publicador:

Resumo:

ImageRover is a search by image content navigation tool for the world wide web. To gather images expediently, the image collection subsystem utilizes a distributed fleet of WWW robots running on different computers. The image robots gather information about the images they find, computing the appropriate image decompositions and indices, and store this extracted information in vector form for searches based on image content. At search time, users can iteratively guide the search through the selection of relevant examples. Search performance is made efficient through the use of an approximate, optimized k-d tree algorithm. The system employs a novel relevance feedback algorithm that selects the distance metrics appropriate for a particular query.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The wealth of information available freely on the web and medical image databases poses a major problem for the end users: how to find the information needed? Content –Based Image Retrieval is the obvious solution.A standard called MPEG-7 was evolved to address the interoperability issues of content-based search.The work presented in this thesis mainly concentrates on developing new shape descriptors and a framework for content – based retrieval of scoliosis images.New region-based and contour based shape descriptor is developed based on orthogonal Legendre polymomials.A novel system for indexing and retrieval of digital spine radiographs with scoliosis is presented.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Co-training is a semi-supervised learning method that is designed to take advantage of the redundancy that is present when the object to be identified has multiple descriptions. Co-training is known to work well when the multiple descriptions are conditional independent given the class of the object. The presence of multiple descriptions of objects in the form of text, images, audio and video in multimedia applications appears to provide redundancy in the form that may be suitable for co-training. In this paper, we investigate the suitability of utilizing text and image data from the Web for co-training. We perform measurements to find indications of conditional independence in the texts and images obtained from the Web. Our measurements suggest that conditional independence is likely to be present in the data. Our experiments, within a relevance feedback framework to test whether a method that exploits the conditional independence outperforms methods that do not, also indicate that better performance can indeed be obtained by designing algorithms that exploit this form of the redundancy when it is present.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent ";topics"; using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature here applied to a bag of visual words representation for each image, and subsequently, training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag of visual words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). We achieve superior classification performance to recent publications that have used a bag of visual word representation, in all cases, using the authors' own data sets and testing protocols. We also investigate the gain in adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Traditional content-based image retrieval (CBIR) systems use low-level features such as colors, shapes, and textures of images. Although, users make queries based on semantics, which are not easily related to such low-level characteristics. Recent works on CBIR confirm that researchers have been trying to map visual low-level characteristics and high-level semantics. The relation between low-level characteristics and image textual information has motivated this article which proposes a model for automatic classification and categorization of words associated to images. This proposal considers a self-organizing neural network architecture, which classifies textual information without previous learning. Experimental results compare the performance results of the text-based approach to an image retrieval system based on low-level features. (c) 2008 Wiley Periodicals, Inc.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Conventional relevance feedback schemes may not be suitable to all practical applications of content-based image retrieval (CBIR), since most ordinary users would like to complete their search in a single interaction, especially on the web search. In this paper, we explore a new approach to improve the retrieval performance based on a new concept, bag of images, rather than relevance feedback. We consider that image collection comprises of image bags instead of independent individual images. Each image bag includes some relevant images with the same perceptual meaning. A theoretical case study demonstrates that image retrieval can benefit from the new concept. A number of experimental results show that the CBIR scheme based on bag of images can improve the retrieval performance dramatically.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In conventional content based image retrieval (CBIR) employing relevance feedback, one implicit assumption is that both pure positive and negative examples are available. However it is not always true in the practical applications of CBIR. In this paper, we address a new problem of image retrieval using several unclean positive examples, named noisy query, in which some mislabeled images or weak relevant images present. The proposed image retrieval scheme measures the image similarity by combining multiple feature distances. Incorporating data cleaning and noise tolerant classifier, a twostep strategy is proposed to handle noisy positive examples. Experiments carried out on a subset of Corel image collection show that the proposed scheme outperforms the competing image retrieval schemes.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Conventional content-based image retrieval (CBIR) schemes employing relevance feedback may suffer from some problems in the practical applications. First, most ordinary users would like to complete their search in a single interaction especially on the web. Second, it is time consuming and difficult to label a lot of negative examples with sufficient variety. Third, ordinary users may introduce some noisy examples into the query. This correspondence explores solutions to a new issue that image retrieval using unclean positive examples. In the proposed scheme, multiple feature distances are combined to obtain image similarity using classification technology. To handle the noisy positive examples, a new two-step strategy is proposed by incorporating the methods of data cleaning and noise tolerant classifier. The extensive experiments carried out on two different real image collections validate the effectiveness of the proposed scheme.