Using incomplete citation data for MEDLINE results ranking.


Author(s): Herskovic, Jorge R; Bernstam, Elmer V
Date(s)

01/01/2005

Abstract

Information overload is a significant problem for modern medicine. Searching MEDLINE for common topics often retrieves more relevant documents than users can review. Therefore, we must identify documents that are not only relevant, but also important. Our system ranks articles using citation counts and the PageRank algorithm, incorporating data from the Science Citation Index. However, citation data is usually incomplete. Therefore, we explore the relationship between the quantity of citation information available to the system and the quality of the result ranking. Specifically, we test the ability of citation count and PageRank to identify "important articles" as defined by experts from large result sets with decreasing citation information. We found that PageRank performs better than simple citation counts, but both algorithms are surprisingly robust to information loss. We conclude that even an incomplete citation database is likely to be effective for importance ranking.
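The experiment described above can be illustrated with a minimal sketch. It is not the authors' system: the five-article citation graph, the function names, the damping factor of 0.85, and the use of uniform random removal to simulate "decreasing citation information" are all assumptions made for illustration. The abstract does not say how citations were removed or how PageRank was configured.

```python
import random

def citation_counts(nodes, edges):
    """Count incoming citations for each article."""
    counts = {node: 0 for node in nodes}
    for _citing, cited in edges:
        counts[cited] += 1
    return counts

def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Basic power-iteration PageRank over (citing, cited) pairs.

    Rank held by articles that cite nothing (dangling nodes) is
    redistributed uniformly, so the scores always sum to 1.
    """
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            if out_links[node]:
                share = damping * rank[node] / len(out_links[node])
                for cited in out_links[node]:
                    new_rank[cited] += share
        rank = new_rank
    return rank

def drop_citations(edges, keep_fraction, seed=0):
    """Keep a random subset of citations (assumed loss model: uniform)."""
    rng = random.Random(seed)
    keep = round(len(edges) * keep_fraction)
    return rng.sample(edges, keep)

def top_k(scores, k):
    """Return the k highest-scoring articles, ties broken by name."""
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]

# Hypothetical toy data: each pair is (citing article, cited article).
NODES = ["A", "B", "C", "D", "E"]
EDGES = [("B", "A"), ("C", "A"), ("D", "A"),
         ("D", "B"), ("E", "B"), ("E", "C")]

if __name__ == "__main__":
    for keep_fraction in (1.0, 0.5):
        kept = drop_citations(EDGES, keep_fraction)
        print(keep_fraction,
              top_k(citation_counts(NODES, kept), 2),
              top_k(pagerank(NODES, kept), 2))
```

Comparing the top-ranked articles at each `keep_fraction` against an expert-defined list of important articles would give a rough analogue of the robustness test the abstract describes. A realistic evaluation would need a far larger graph and repeated random samples.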

Identifier

http://digitalcommons.library.tmc.edu/uthshis_docs/35

http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1560575&tool=pmcentrez

Publisher

DigitalCommons@The Texas Medical Center

Source

UT SBMI Journal Articles

Keywords

#Algorithms #Bibliometrics #Information Storage and Retrieval #MEDLINE #PubMed #Medicine and Health Sciences

Type

text