886 resultados para Information Retrieval, Document Databases, Digital Libraries


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Peer to peer systems have been widely used in the internet. However, most of the peer to peer information systems are still missing some of the important features, for example cross-language IR (Information Retrieval) and collection selection / fusion features. Cross-language IR is the state-of-art research area in IR research community. It has not been used in any real world IR systems yet. Cross-language IR has the ability to issue a query in one language and receive documents in other languages. In typical peer to peer environment, users are from multiple countries. Their collections are definitely in multiple languages. Cross-language IR can help users to find documents more easily. E.g. many Chinese researchers will search research papers in both Chinese and English. With Cross-language IR, they can do one query in Chinese and get documents in two languages. The Out Of Vocabulary (OOV) problem is one of the key research areas in crosslanguage information retrieval. In recent years, web mining was shown to be one of the effective approaches to solving this problem. However, how to extract Multiword Lexical Units (MLUs) from the web content and how to select the correct translations from the extracted candidate MLUs are still two difficult problems in web mining based automated translation approaches. Discovering resource descriptions and merging results obtained from remote search engines are two key issues in distributed information retrieval studies. In uncooperative environments, query-based sampling and normalized-score based merging strategies are well-known approaches to solve such problems. However, such approaches only consider the content of the remote database but do not consider the retrieval performance of the remote search engine. This thesis presents research on building a peer to peer IR system with crosslanguage IR and advance collection profiling technique for fusion features. Particularly, this thesis first presents a new Chinese term measurement and new Chinese MLU extraction process that works well on small corpora. An approach to selection of MLUs in a more accurate manner is also presented. After that, this thesis proposes a collection profiling strategy which can discover not only collection content but also retrieval performance of the remote search engine. Based on collection profiling, a web-based query classification method and two collection fusion approaches are developed and presented in this thesis. Our experiments show that the proposed strategies are effective in merging results in uncooperative peer to peer environments. Here, an uncooperative environment is defined as each peer in the system is autonomous. Peer like to share documents but they do not share collection statistics. This environment is a typical peer to peer IR environment. Finally, all those approaches are grouped together to build up a secure peer to peer multilingual IR system that cooperates through X.509 and email system.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Information Retrieval is an important albeit imperfect component of information technologies. A problem of insufficient diversity of retrieved documents is one of the primary issues studied in this research. This study shows that this problem leads to a decrease of precision and recall, traditional measures of information retrieval effectiveness. This thesis presents an adaptive IR system based on the theory of adaptive dual control. The aim of the approach is the optimization of retrieval precision after all feedback has been issued. This is done by increasing the diversity of retrieved documents. This study shows that the value of recall reflects this diversity. The Probability Ranking Principle is viewed in the literature as the “bedrock” of current probabilistic Information Retrieval theory. Neither the proposed approach nor other methods of diversification of retrieved documents from the literature conform to this principle. This study shows by counterexample that the Probability Ranking Principle does not in general lead to optimal precision in a search session with feedback (for which it may not have been designed but is actively used). Retrieval precision of the search session should be optimized with a multistage stochastic programming model to accomplish the aim. However, such models are computationally intractable. Therefore, approximate linear multistage stochastic programming models are derived in this study, where the multistage improvement of the probability distribution is modelled using the proposed feedback correctness method. The proposed optimization models are based on several assumptions, starting with the assumption that Information Retrieval is conducted in units of topics. The use of clusters is the primary reasons why a new method of probability estimation is proposed. The adaptive dual control of topic-based IR system was evaluated in a series of experiments conducted on the Reuters, Wikipedia and TREC collections of documents. The Wikipedia experiment revealed that the dual control feedback mechanism improves precision and S-recall when all the underlying assumptions are satisfied. In the TREC experiment, this feedback mechanism was compared to a state-of-the-art adaptive IR system based on BM-25 term weighting and the Rocchio relevance feedback algorithm. The baseline system exhibited better effectiveness than the cluster-based optimization model of ADTIR. The main reason for this was insufficient quality of the generated clusters in the TREC collection that violated the underlying assumption.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a framework for evaluating information retrieval of medical records. We use the BLULab corpus, a large collection of real-world de-identified medical records. The collection has been hand coded by clinical terminol- ogists using the ICD-9 medical classification system. The ICD codes are used to devise queries and relevance judge- ments for this collection. Results of initial test runs using a baseline IR system are provided. Queries and relevance judgements are online to aid further research in medical IR. Please visit: http://koopman.id.au/med_eval.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

RÉSUMÉ. La prise en compte des troubles de la communication dans l’utilisation des systèmes de recherche d’information tels qu’on peut en trouver sur le Web est généralement réalisée par des interfaces utilisant des modalités n’impliquant pas la lecture et l’écriture. Peu d’applications existent pour aider l’utilisateur en difficulté dans la modalité textuelle. Nous proposons la prise en compte de la conscience phonologique pour assister l’utilisateur en difficulté d’écriture de requêtes (dysorthographie) ou de lecture de documents (dyslexie). En premier lieu un système de réécriture et d’interprétation des requêtes entrées au clavier par l’utilisateur est proposé : en s’appuyant sur les causes de la dysorthographie et sur les exemples à notre disposition, il est apparu qu’un système combinant une approche éditoriale (type correcteur orthographique) et une approche orale (système de transcription automatique) était plus approprié. En second lieu une méthode d’apprentissage automatique utilise des critères spécifiques , tels que la cohésion grapho-phonémique, pour estimer la lisibilité d’une phrase, puis d’un texte. ABSTRACT. Most applications intend to help disabled users in the information retrieval process by proposing non-textual modalities. This paper introduces specific parameters linked to phonological awareness in the textual modality. This will enhance the ability of systems to deal with orthographic issues and with the adaptation of results to the reader when for example the reader is dyslexic. We propose a phonology based sentence level rewriting system that combines spelling correction, speech synthesis and automatic speech recognition. This has been evaluated on a corpus of questions we get from dyslexic children. We propose a specific sentence readability measure that involves phonetic parameters such as grapho-phonemic cohesion. This has been learned on a corpus of reading time of sentences read by dyslexic children.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Purpose – Interactive information retrieval (IR) involves many human cognitive shifts at different information behaviour levels. Cognitive science defines a cognitive shift or shift in cognitive focus as triggered by the brain's response and change due to some external force. This paper aims to provide an explication of the concept of “cognitive shift” and then report results from a study replicating Spink's study of cognitive shifts during interactive IR. This work aims to generate promising insights into aspects of cognitive shifts during interactive IR and a new IR evaluation measure – information problem shift. Design/methodology/approach – The study participants (n=9) conducted an online search on an in-depth personal medical information problem. Data analysed included the pre- and post-search questionnaires completed by each study participant. Implications for web services and further research are discussed. Findings – Key findings replicated the results in Spink's study, including: all study participants reported some level of cognitive shift in their information problem, information seeking and personal knowledge due to their search interaction; and different study participants reported different levels of cognitive shift. Some study participants reported major cognitive shifts in various user-based variables such as information problem or information-seeking stage. Unlike Spink's study, no participant experienced a negative shift in their information problem stage or level of information problem understanding. Originality/value – This study builds on the previous study by Spink using a different dataset. The paper provides valuable insights for further research into cognitive shifts during interactive IR.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This position paper provides an overview of work conducted and an outlook of future directions within the field of Information Retrieval (IR) that aims to develop novel models, methods and frameworks inspired by Quantum Theory (QT).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nowadays, everyone can effortlessly access a range of information on the World Wide Web (WWW). As information resources on the web continue to grow tremendously, it becomes progressively more difficult to meet high expectations of users and find relevant information. Although existing search engine technologies can find valuable information, however, they suffer from the problems of information overload and information mismatch. This paper presents a hybrid Web Information Retrieval approach allowing personalised search using ontology, user profile and collaborative filtering. This approach finds the context of user query with least user’s involvement, using ontology. Simultaneously, this approach uses time-based automatic user profile updating with user’s changing behaviour. Subsequently, this approach uses recommendations from similar users using collaborative filtering technique. The proposed method is evaluated with the FIRE 2010 dataset and manually generated dataset. Empirical analysis reveals that Precision, Recall and F-Score of most of the queries for many users are improved with proposed method.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

For more than a decade research in the field of context aware computing has aimed to find ways to exploit situational information that can be detected by mobile computing and sensor technologies. The goal is to provide people with new and improved applications, enhanced functionality and better use experience (Dey, 2001). Early applications focused on representing or computing on physical parameters, such as showing your location and the location of people or things around you. Such applications might show where the next bus is, which of your friends is in the vicinity and so on. With the advent of social networking software and microblogging sites such as Facebook and Twitter, recommender systems and so on context-aware computing is moving towards mining the social web in order to provide better representations and understanding of context, including social context. In this paper we begin by recapping different theoretical framings of context. We then discuss the problem of context- aware computing from a design perspective.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper demonstrates an experimental study that examines the accuracy of various information retrieval techniques for Web service discovery. The main goal of this research is to evaluate algorithms for semantic web service discovery. The evaluation is comprehensively benchmarked using more than 1,700 real-world WSDL documents from INEX 2010 Web Service Discovery Track dataset. For automatic search, we successfully use Latent Semantic Analysis and BM25 to perform Web service discovery. Moreover, we provide linking analysis which automatically links possible atomic Web services to meet the complex requirements of users. Our fusion engine recommends a final result to users. Our experiments show that linking analysis can improve the overall performance of Web service discovery. We also find that keyword-based search can quickly return results but it has limitation of understanding users’ goals.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As computers approach the physical limits of information storable in memory, new methods will be needed to further improve information storage and retrieval. We propose a quantum inspired vector based approach, which offers a contextually dependent mapping from the subsymbolic to the symbolic representations of information. If implemented computationally, this approach would provide exceptionally high density of information storage, without the traditionally required physical increase in storage capacity. The approach is inspired by the structure of human memory and incorporates elements of Gardenfors’ Conceptual Space approach and Humphreys et al.’s matrix model of memory.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background This paper presents a novel approach to searching electronic medical records that is based on concept matching rather than keyword matching. Aim The concept-based approach is intended to overcome specific challenges we identified in searching medical records. Method Queries and documents were transformed from their term-based originals into medical concepts as defined by the SNOMED-CT ontology. Results Evaluation on a real-world collection of medical records showed our concept-based approach outperformed a keyword baseline by 25% in Mean Average Precision. Conclusion The concept-based approach provides a framework for further development of inference based search systems for dealing with medical data.