954 resultados para Concept-based Retrieval


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Entity-oriented retrieval aims to return a list of relevant entities rather than documents to provide exact answers for user queries. The nature of entity-oriented retrieval requires identifying the semantic intent of user queries, i.e., understanding the semantic role of query terms and determining the semantic categories which indicate the class of target entities. Existing methods are not able to exploit the semantic intent by capturing the semantic relationship between terms in a query and in a document that contains entity related information. To improve the understanding of the semantic intent of user queries, we propose concept-based retrieval method that not only automatically identifies the semantic intent of user queries, i.e., Intent Type and Intent Modifier but introduces concepts represented by Wikipedia articles to user queries. We evaluate our proposed method on entity profile documents annotated by concepts from Wikipedia category and list structure. Empirical analysis reveals that the proposed method outperforms several state-of-the-art approaches.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

For more than a decade research in the field of context aware computing has aimed to find ways to exploit situational information that can be detected by mobile computing and sensor technologies. The goal is to provide people with new and improved applications, enhanced functionality and better use experience (Dey, 2001). Early applications focused on representing or computing on physical parameters, such as showing your location and the location of people or things around you. Such applications might show where the next bus is, which of your friends is in the vicinity and so on. With the advent of social networking software and microblogging sites such as Facebook and Twitter, recommender systems and so on context-aware computing is moving towards mining the social web in order to provide better representations and understanding of context, including social context. In this paper we begin by recapping different theoretical framings of context. We then discuss the problem of context- aware computing from a design perspective.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background This paper presents a novel approach to searching electronic medical records that is based on concept matching rather than keyword matching. Aim The concept-based approach is intended to overcome specific challenges we identified in searching medical records. Method Queries and documents were transformed from their term-based originals into medical concepts as defined by the SNOMED-CT ontology. Results Evaluation on a real-world collection of medical records showed our concept-based approach outperformed a keyword baseline by 25% in Mean Average Precision. Conclusion The concept-based approach provides a framework for further development of inference based search systems for dealing with medical data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Australian e-Health Research Centre and Queensland University of Technology recently participated in the TREC 2011 Medical Records Track. This paper reports on our methods, results and experience using a concept-based information retrieval approach. Our concept-based approach is intended to overcome specific challenges we identify in searching medical records. Queries and documents are transformed from their term-based originals into medical concepts as de ned by the SNOMED-CT ontology. Results show our concept-based approach performed above the median in all three performance metrics: bref (+12%), R-prec (+18%) and Prec@10 (+6%).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Search technologies are critical to enable clinical sta to rapidly and e ectively access patient information contained in free-text medical records. Medical search is challenging as terms in the query are often general but those in rel- evant documents are very speci c, leading to granularity mismatch. In this paper we propose to tackle granularity mismatch by exploiting subsumption relationships de ned in formal medical domain knowledge resources. In symbolic reasoning, a subsumption (or `is-a') relationship is a parent-child rela- tionship where one concept is a subset of another concept. Subsumed concepts are included in the retrieval function. In addition, we investigate a number of initial methods for combining weights of query concepts and those of subsumed concepts. Subsumption relationships were found to provide strong indication of relevant information; their inclusion in retrieval functions yields performance improvements. This result motivates the development of formal models of rela- tionships between medical concepts for retrieval purposes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Domain specific information retrieval has become in demand. Not only domain experts, but also average non-expert users are interested in searching domain specific (e.g., medical and health) information from online resources. However, a typical problem to average users is that the search results are always a mixture of documents with different levels of readability. Non-expert users may want to see documents with higher readability on the top of the list. Consequently the search results need to be re-ranked in a descending order of readability. It is often not practical for domain experts to manually label the readability of documents for large databases. Computational models of readability needs to be investigated. However, traditional readability formulas are designed for general purpose text and insufficient to deal with technical materials for domain specific information retrieval. More advanced algorithms such as textual coherence model are computationally expensive for re-ranking a large number of retrieved documents. In this paper, we propose an effective and computationally tractable concept-based model of text readability. In addition to textual genres of a document, our model also takes into account domain specific knowledge, i.e., how the domain-specific concepts contained in the document affect the document’s readability. Three major readability formulas are proposed and applied to health and medical information retrieval. Experimental results show that our proposed readability formulas lead to remarkable improvements in terms of correlation with users’ readability ratings over four traditional readability measures.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Sharing of information with those in need of it has always been an idealistic goal of networked environments. With the proliferation of computer networks, information is so widely distributed among systems, that it is imperative to have well-organized schemes for retrieval and also discovery. This thesis attempts to investigate the problems associated with such schemes and suggests a software architecture, which is aimed towards achieving a meaningful discovery. Usage of information elements as a modelling base for efficient information discovery in distributed systems is demonstrated with the aid of a novel conceptual entity called infotron.The investigations are focused on distributed systems and their associated problems. The study was directed towards identifying suitable software architecture and incorporating the same in an environment where information growth is phenomenal and a proper mechanism for carrying out information discovery becomes feasible. An empirical study undertaken with the aid of an election database of constituencies distributed geographically, provided the insights required. This is manifested in the Election Counting and Reporting Software (ECRS) System. ECRS system is a software system, which is essentially distributed in nature designed to prepare reports to district administrators about the election counting process and to generate other miscellaneous statutory reports.Most of the distributed systems of the nature of ECRS normally will possess a "fragile architecture" which would make them amenable to collapse, with the occurrence of minor faults. This is resolved with the help of the penta-tier architecture proposed, that contained five different technologies at different tiers of the architecture.The results of experiment conducted and its analysis show that such an architecture would help to maintain different components of the software intact in an impermeable manner from any internal or external faults. The architecture thus evolved needed a mechanism to support information processing and discovery. This necessitated the introduction of the noveI concept of infotrons. Further, when a computing machine has to perform any meaningful extraction of information, it is guided by what is termed an infotron dictionary.The other empirical study was to find out which of the two prominent markup languages namely HTML and XML, is best suited for the incorporation of infotrons. A comparative study of 200 documents in HTML and XML was undertaken. The result was in favor ofXML.The concept of infotron and that of infotron dictionary, which were developed, was applied to implement an Information Discovery System (IDS). IDS is essentially, a system, that starts with the infotron(s) supplied as clue(s), and results in brewing the information required to satisfy the need of the information discoverer by utilizing the documents available at its disposal (as information space). The various components of the system and their interaction follows the penta-tier architectural model and therefore can be considered fault-tolerant. IDS is generic in nature and therefore the characteristics and the specifications were drawn up accordingly. Many subsystems interacted with multiple infotron dictionaries that were maintained in the system.In order to demonstrate the working of the IDS and to discover the information without modification of a typical Library Information System (LIS), an Information Discovery in Library Information System (lDLIS) application was developed. IDLIS is essentially a wrapper for the LIS, which maintains all the databases of the library. The purpose was to demonstrate that the functionality of a legacy system could be enhanced with the augmentation of IDS leading to information discovery service. IDLIS demonstrates IDS in action. IDLIS proves that any legacy system could be augmented with IDS effectively to provide the additional functionality of information discovery service.Possible applications of IDS and scope for further research in the field are covered.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Cultural objects are increasingly generated and stored in digital form, yet effective methods for their indexing and retrieval still remain an important area of research. The main problem arises from the disconnection between the content-based indexing approach used by computer scientists and the description-based approach used by information scientists. There is also a lack of representational schemes that allow the alignment of the semantics and context with keywords and low-level features that can be automatically extracted from the content of these cultural objects. This paper presents an integrated approach to address these problems, taking advantage of both computer science and information science approaches. We firstly discuss the requirements from a number of perspectives: users, content providers, content managers and technical systems. We then present an overview of our system architecture and describe various techniques which underlie the major components of the system. These include: automatic object category detection; user-driven tagging; metadata transform and augmentation, and an expression language for digital cultural objects. In addition, we discuss our experience on testing and evaluating some existing collections, analyse the difficulties encountered and propose ways to address these problems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Process-Aware Information Systems (PAISs) support executions of operational processes that involve people, resources, and software applications on the basis of process models. Process models describe vast, often infinite, amounts of process instances, i.e., workflows supported by the systems. With the increasing adoption of PAISs, large process model repositories emerged in companies and public organizations. These repositories constitute significant information resources. Accurate and efficient retrieval of process models and/or process instances from such repositories is interesting for multiple reasons, e.g., searching for similar models/instances, filtering, reuse, standardization, process compliance checking, verification of formal properties, etc. This paper proposes a technique for indexing process models that relies on their alternative representations, called untanglings. We show the use of untanglings for retrieval of process models based on process instances that they specify via a solution to the total executability problem. Experiments with industrial process models testify that the proposed retrieval approach is up to three orders of magnitude faster than the state of the art.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We consider the following problem: members in a dynamic group retrieve their encrypted data from an untrusted server based on keywords and without any loss of data confidentiality and member’s privacy. In this paper, we investigate common secure indices for conjunctive keyword-based retrieval over encrypted data, and construct an efficient scheme from Wang et al. dynamic accumulator, Nyberg combinatorial accumulator and Kiayias et al. public-key encryption system. The proposed scheme is trapdoorless and keyword-field free. The security is proved under the random oracle, decisional composite residuosity and extended strong RSA assumptions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We consider the following problem: users in a dynamic group store their encrypted documents on an untrusted server, and wish to retrieve documents containing some keywords without any loss of data confidentiality. In this paper, we investigate common secure indices which can make multi-users in a dynamic group to obtain securely the encrypted documents shared among the group members without re-encrypting them. We give a formal definition of common secure index for conjunctive keyword-based retrieval over encrypted data (CSI-CKR), define the security requirement for CSI-CKR, and construct a CSI-CKR based on dynamic accumulators, Paillier’s cryptosystem and blind signatures. The security of proposed scheme is proved under strong RSA and co-DDH assumptions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A dynamic accumulator is an algorithm, which merges a large set of elements into a constant-size value such that for an element accumulated, there is a witness confirming that the element was included into the value, with a property that accumulated elements can be dynamically added and deleted into/from the original set. Recently Wang et al. presented a dynamic accumulator for batch updates at ICICS 2007. However, their construction suffers from two serious problems. We analyze them and propose a way to repair their scheme. We use the accumulator to construct a new scheme for common secure indices with conjunctive keyword-based retrieval.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This project is a step forward in the study of text mining where enhanced text representation with semantic information plays a significant role. It develops effective methods of entity-oriented retrieval, semantic relation identification and text clustering utilizing semantically annotated data. These methods are based on enriched text representation generated by introducing semantic information extracted from Wikipedia into the input text data. The proposed methods are evaluated against several start-of-art benchmarking methods on real-life data-sets. In particular, this thesis improves the performance of entity-oriented retrieval, identifies different lexical forms for an entity relation and handles clustering documents with multiple feature spaces.