4 resultados para Keyword search
em Digital Commons at Florida International University
Resumo:
Graph-structured databases are widely prevalent, and the problem of effective search and retrieval from such graphs has been receiving much attention recently. For example, the Web can be naturally viewed as a graph. Likewise, a relational database can be viewed as a graph where tuples are modeled as vertices connected via foreign-key relationships. Keyword search querying has emerged as one of the most effective paradigms for information discovery, especially over HTML documents in the World Wide Web. One of the key advantages of keyword search querying is its simplicity—users do not have to learn a complex query language, and can issue queries without any prior knowledge about the structure of the underlying data. The purpose of this dissertation was to develop techniques for user-friendly, high quality and efficient searching of graph structured databases. Several ranked search methods on data graphs have been studied in the recent years. Given a top-k keyword search query on a graph and some ranking criteria, a keyword proximity search finds the top-k answers where each answer is a substructure of the graph containing all query keywords, which illustrates the relationship between the keyword present in the graph. We applied keyword proximity search on the web and the page graph of web documents to find top-k answers that satisfy user’s information need and increase user satisfaction. Another effective ranking mechanism applied on data graphs is the authority flow based ranking mechanism. Given a top- k keyword search query on a graph, an authority-flow based search finds the top-k answers where each answer is a node in the graph ranked according to its relevance and importance to the query. We developed techniques that improved the authority flow based search on data graphs by creating a framework to explain and reformulate them taking in to consideration user preferences and feedback. We also applied the proposed graph search techniques for Information Discovery over biological databases. Our algorithms were experimentally evaluated for performance and quality. The quality of our method was compared to current approaches by using user surveys.
Resumo:
Background As the use of electronic health records (EHRs) becomes more widespread, so does the need to search and provide effective information discovery within them. Querying by keyword has emerged as one of the most effective paradigms for searching. Most work in this area is based on traditional Information Retrieval (IR) techniques, where each document is compared individually against the query. We compare the effectiveness of two fundamentally different techniques for keyword search of EHRs. Methods We built two ranking systems. The traditional BM25 system exploits the EHRs' content without regard to association among entities within. The Clinical ObjectRank (CO) system exploits the entities' associations in EHRs using an authority-flow algorithm to discover the most relevant entities. BM25 and CO were deployed on an EHR dataset of the cardiovascular division of Miami Children's Hospital. Using sequences of keywords as queries, sensitivity and specificity were measured by two physicians for a set of 11 queries related to congenital cardiac disease. Results Our pilot evaluation showed that CO outperforms BM25 in terms of sensitivity (65% vs. 38%) by 71% on average, while maintaining the specificity (64% vs. 61%). The evaluation was done by two physicians. Conclusions Authority-flow techniques can greatly improve the detection of relevant information in EHRs and hence deserve further study.
Resumo:
The Three-Layer distributed mediation architecture, designed by Secure System Architecture laboratory, employed a layered framework of presence, integration, and homogenization mediators. The architecture does not have any central component that may affect the system reliability. A distributed search technique was adapted in the system to increase its reliability. An Enhanced Chord-like algorithm (E-Chord) was designed and deployed in the integration layer. The E-Chord is a skip-list algorithm based on Distributed Hash Table (DHT) which is a distributed but structured architecture. DHT is distributed in the sense that no central unit is required to maintain indexes, and it is structured in the sense that indexes are distributed over the nodes in a systematic manner. Each node maintains three kind of routing information: a frequency list, a successor/predecessor list, and a finger table. None of the nodes in the system maintains all indexes, and each node knows about some other nodes in the system. These nodes, also called composer mediators, were connected in a P2P fashion. ^ A special composer mediator called a global mediator initiates the keyword-based matching decomposition of the request using the E-Chord. It generates an Integrated Data Structure Graph (IDSG) on the fly, creates association and dependency relations between nodes in the IDSG, and then generates a Global IDSG (GIDSG). The GIDSG graph is a plan which guides the global mediator how to integrate data. It is also used to stream data from the mediators in the homogenization layer which connected to the data sources. The connectors start sending the data to the global mediator just after the global mediator creates the GIDSG and just before the global mediator sends the answer to the presence mediator. Using the E-Chord and GIDSG made the mediation system more scalable than using a central global schema repository since all the composers in the integration layer are capable of handling and routing requests. Also, when a composer fails, it would only minimally affect the entire mediation system. ^