5 resultados para Graph databases

em DigitalCommons@The Texas Medical Center


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The biomedical literature is extensively catalogued and indexed in MEDLINE. MEDLINE indexing is done by trained human indexers, who identify the most important concepts in each article, and is expensive and inconsistent. Automating the indexing task is difficult: the National Library of Medicine produces the Medical Text Indexer (MTI), which suggests potential indexing terms to the indexers. MTI’s output is not good enough to work unattended. In my thesis, I propose a different way to approach the indexing task called MEDRank. MEDRank creates graphs representing the concepts in biomedical articles and their relationships within the text, and applies graph-based ranking algorithms to identify the most important concepts in each article. I evaluate the performance of several automated indexing solutions, including my own, by comparing their output to the indexing terms selected by the human indexers. MEDRank outperformed all other evaluated indexing solutions, including MTI, in general indexing performance and precision. MEDRank can be used to cluster documents, index any kind of biomedical text with standard vocabularies, or could become part of MTI itself.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The current state of health and biomedicine includes an enormity of heterogeneous data ‘silos’, collected for different purposes and represented differently, that are presently impossible to share or analyze in toto. The greatest challenge for large-scale and meaningful analyses of health-related data is to achieve a uniform data representation for data extracted from heterogeneous source representations. Based upon an analysis and categorization of heterogeneities, a process for achieving comparable data content by using a uniform terminological representation is developed. This process addresses the types of representational heterogeneities that commonly arise in healthcare data integration problems. Specifically, this process uses a reference terminology, and associated "maps" to transform heterogeneous data to a standard representation for comparability and secondary use. The capture of quality and precision of the “maps” between local terms and reference terminology concepts enhances the meaning of the aggregated data, empowering end users with better-informed queries for subsequent analyses. A data integration case study in the domain of pediatric asthma illustrates the development and use of a reference terminology for creating comparable data from heterogeneous source representations. The contribution of this research is a generalized process for the integration of data from heterogeneous source representations, and this process can be applied and extended to other problems where heterogeneous data needs to be merged.