891 resultados para 080704 Information Retrieval and Web Search
Resumo:
Information Retrieval systems normally have to work with rather heterogeneous sources, such as Web sites or documents from Optical Character Recognition tools. The correct conversion of these sources into flat text files is not a trivial task since noise may easily be introduced as a result of spelling or typeset errors. Interestingly, this is not a great drawback when the size of the corpus is sufficiently large, since redundancy helps to overcome noise problems. However, noise becomes a serious problem in restricted-domain Information Retrieval specially when the corpus is small and has little or no redundancy. This paper devises an approach which adds noise-tolerance to Information Retrieval systems. A set of experiments carried out in the agricultural domain proves the effectiveness of the approach presented.
Resumo:
Turner-Fairbank Highway Research Center, McLean, Va.
Resumo:
COO-1469-0174.
Resumo:
Mode of access: Internet.
Resumo:
"AD 273 115."
Resumo:
The role of gender differences in the consumption of goods and services is well established in many areas of consumer behaviour and computer use and yet there has been only limited research into such gender-based differences in the information search behaviour of Internet users. This paper reports the gender-based results of an exploratory study of consumer external information search of the web. The study investigated consumer characteristics, web search behaviour, and the post web search outcomes of purchase decision status and consumer judgements of search usefulness and satisfaction. Gender-based differences are reported in all three areas. Consideration of the results suggests they are issues which could inhibit the adoption of online purchasing by female web users. The implications of these results are discussed and a future research agenda proposed.
Resumo:
Document ranking is an important process in information retrieval (IR). It presents retrieved documents in an order of their estimated degrees of relevance to query. Traditional document ranking methods are mostly based on the similarity computations between documents and query. In this paper we argue that the similarity-based document ranking is insufficient in some cases. There are two reasons. Firstly it is about the increased information variety. There are far too many different types documents available now for user to search. The second is about the users variety. In many cases user may want to retrieve documents that are not only similar but also general or broad regarding a certain topic. This is particularly the case in some domains such as bio-medical IR. In this paper we propose a novel approach to re-rank the retrieved documents by incorporating the similarity with their generality. By an ontology-based analysis on the semantic cohesion of text, document generality can be quantified. The retrieved documents are then re-ranked by their combined scores of similarity and the closeness of documents’ generality to the query’s. Our experiments have shown an encouraging performance on a large bio-medical document collection, OHSUMED, containing 348,566 medical journal references and 101 test queries.
Resumo:
Domain specific information retrieval has become in demand. Not only domain experts, but also average non-expert users are interested in searching domain specific (e.g., medical and health) information from online resources. However, a typical problem to average users is that the search results are always a mixture of documents with different levels of readability. Non-expert users may want to see documents with higher readability on the top of the list. Consequently the search results need to be re-ranked in a descending order of readability. It is often not practical for domain experts to manually label the readability of documents for large databases. Computational models of readability needs to be investigated. However, traditional readability formulas are designed for general purpose text and insufficient to deal with technical materials for domain specific information retrieval. More advanced algorithms such as textual coherence model are computationally expensive for re-ranking a large number of retrieved documents. In this paper, we propose an effective and computationally tractable concept-based model of text readability. In addition to textual genres of a document, our model also takes into account domain specific knowledge, i.e., how the domain-specific concepts contained in the document affect the document’s readability. Three major readability formulas are proposed and applied to health and medical information retrieval. Experimental results show that our proposed readability formulas lead to remarkable improvements in terms of correlation with users’ readability ratings over four traditional readability measures.
Resumo:
This article explores consumer Web-search satisfaction. It commences with a brief overview of the concepts consumer information search and consumer satisfaction. Consumer Web adoption issues are then briefly discussed and the importance of consumer search satisfaction is highlighted in relation to the adoption of the Web as an additional source of consumer information. Research hypotheses are developed and the methodology of a large scale consumer experiment to record consumer Web search behaviour is described. The hypotheses are tested and the data explored in relation to post-Web-search satisfaction. The results suggest that consumer post-Web-search satisfaction judgments may be derived from subconscious judgments of Web search efficiency, an empirical calculation of which is problematic in unlimited information environments such as the Web. The results are discussed and a future research agenda is briefly outlined.
Resumo:
Electronic publishing exploits numerous possibilities to present or exchange information and to communicate via most current media like the Internet. By utilizing modern Web technologies like Web Services, loosely coupled services, and peer-to-peer networks we describe the integration of an intelligent business news presentation and distribution network. Employing semantics technologies enables the coupling of multinational and multilingual business news data on a scalable international level and thus introduce a service quality that is not achieved by alternative technologies in the news distribution area so far. Architecturally, we identified the loose coupling of existing services as the most feasible way to address multinational and multilingual news presentation and distribution networks. Furthermore we semantically enrich multinational news contents by relating them using AI techniques like the Vector Space Model. Summarizing our experiences we describe the technical integration of semantics and communication technologies in order to create a modern international news network.
Resumo:
This paper introduces an encoding of knowledge representation statements as regular languages and proposes a two-phase approach to processing of explicitly declared conceptual information. The idea is presented for the simple conceptual graphs where conceptual pattern search is implemented by the so called projection operation. Projection calculations are organised into off-line preprocessing and run-time computations. This enables fast run-time treatment of NP-complete problems, given that the intermediate results of the off-line phase are kept in suitable data structures. The experiments with randomly-generated, middle-size knowledge bases support the claim that the suggested approach radically improves the run-time conceptual pattern search.
Resumo:
An ontological representation of buyer interests’ knowledge in process of e-commerce is proposed to use. It makes it more efficient to make a search of the most appropriate sellers via multiagent systems. An algorithm of a comparison of buyer ontology with one of e-shops (the taxonomies) and an e-commerce multiagent system are realised using ontology of information retrieval in distributed environment.
Resumo:
This is an extended version of an article presented at the Second International Conference on Software, Services and Semantic Technologies, Sofia, Bulgaria, 11–12 September 2010.
Resumo:
Published in Electronic Handling of Information: Testing and Evaluation, Kent, Taubee, Beltzer, and Goldstein (ed.), Academic Press, London(1967), pp. 123–147.