Use of Medical Subject Headings (MeSH) in Portuguese for categorizing web-based healthcare content


Author(s): MANCINI, Felipe; SOUSA, Fernando Sequeira; TEIXEIRA, Fabio Oliveira; FALCAO, Alex Esteves Jacoud; HUMMEL, Anderson Diniz; COSTA, Thiago Martini da; CALADO, Pavel Pereira; ARAUJO, Luciano Vieira de; PISA, Ivan Torres
Contributor(s)

UNIVERSIDADE DE SÃO PAULO

Date(s)

18/10/2012

2011

Abstract

Introduction: Internet users are increasingly using the worldwide web to search for information relating to their health. This situation makes it necessary to create specialized tools capable of supporting users in their searches.
Objective: To apply and compare strategies that were developed to investigate the use of the Portuguese version of Medical Subject Headings (MeSH) for constructing an automated classifier for Brazilian Portuguese-language web-based content within or outside of the field of healthcare, focusing on the lay public.
Methods: 3658 Brazilian web pages were used to train the classifier and 606 Brazilian web pages were used to validate it. The strategies proposed were constructed using content-based vector methods for text classification, such that Naive Bayes was used for the task of classifying vector patterns with characteristics obtained through the proposed strategies.
Results: A strategy named InDeCS was developed specifically to adapt MeSH for the problem that was put forward. This approach achieved better accuracy for this pattern classification task (0.94 sensitivity, specificity and area under the ROC curve).
Conclusions: Because of the significant results achieved by InDeCS, this tool has been successfully applied to the Brazilian healthcare search portal known as Busca Saude. Furthermore, it could be shown that MeSH presents important results when used for the task of classifying web-based content focusing on the lay public. It was also possible to show from this study that MeSH was able to map out mutable non-deterministic characteristics of the web.
(c) 2010 Elsevier Inc. All rights reserved.
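
The Methods and Results summarized in the abstract (term-based document vectors, a Naive Bayes classifier, and evaluation by sensitivity and specificity) can be illustrated with a small sketch. This is not the InDeCS strategy, whose details the abstract does not give. It is a generic multinomial Naive Bayes with Laplace smoothing, written under stated assumptions: the five-word `VOCABULARY` is a hypothetical stand-in for a controlled vocabulary (not the real MeSH term list), the `<other>` bucket for out-of-vocabulary tokens is an illustrative choice, and the training and validation texts are invented toy data.

```python
import math
from collections import Counter

# Hypothetical stand-in for a controlled vocabulary (NOT the real MeSH term list).
VOCABULARY = {"diabetes", "insulina", "vacina", "febre", "sintoma"}
OTHER = "<other>"                      # single bucket for out-of-vocabulary tokens
FEATURE_COUNT = len(VOCABULARY) + 1    # vocabulary terms plus the OTHER bucket


def vectorize(text):
    """Map a text to counts of vocabulary terms, pooling all other tokens."""
    tokens = text.lower().split()
    return Counter(t if t in VOCABULARY else OTHER for t in tokens)


def train(samples):
    """Collect per-class document and feature counts.

    samples: list of (text, label) pairs; label is True for healthcare content.
    Assumes both classes occur at least once in the training data.
    """
    doc_counts = Counter()
    term_counts = {True: Counter(), False: Counter()}
    for text, label in samples:
        doc_counts[label] += 1
        term_counts[label].update(vectorize(text))
    return doc_counts, term_counts


def predict(model, text):
    """Return True if the healthcare class has the higher log-posterior."""
    doc_counts, term_counts = model
    total_docs = sum(doc_counts.values())
    features = vectorize(text)
    scores = {}
    for label in (True, False):
        total_terms = sum(term_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for term, n in features.items():
            # Laplace (add-one) smoothed likelihood of the feature given the class
            p = (term_counts[label][term] + 1) / (total_terms + FEATURE_COUNT)
            score += n * math.log(p)
        scores[label] = score
    return scores[True] > scores[False]


def sensitivity_specificity(model, samples):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for text, label in samples:
        guess = predict(model, text)
        if label:
            tp, fn = tp + guess, fn + (not guess)
        else:
            tn, fp = tn + (not guess), fp + guess
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy data, for illustration only.
TRAIN = [
    ("sintoma de diabetes e uso de insulina", True),
    ("vacina contra febre amarela", True),
    ("resultado do jogo de futebol", False),
    ("receita de bolo de chocolate", False),
]
VALIDATION = [
    ("febre e sintoma", True),
    ("vacina diabetes", True),
    ("jogo de futebol hoje", False),
    ("bolo de chocolate", False),
]

model = train(TRAIN)
sensitivity, specificity = sensitivity_specificity(model, VALIDATION)
```

On real pages a sketch like this would need proper tokenization and a full term list, and the scores it produces on toy data say nothing about the 0.94 figures reported in the abstract.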

Identifier

JOURNAL OF BIOMEDICAL INFORMATICS, v.44, n.2, Special Issue, p.299-309, 2011

1532-0464

http://producao.usp.br/handle/BDPI/17132

10.1016/j.jbi.2010.12.002

http://dx.doi.org/10.1016/j.jbi.2010.12.002

Language(s)

eng

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

Relation

Journal of Biomedical Informatics

Rights

restrictedAccess

Copyright ACADEMIC PRESS INC ELSEVIER SCIENCE

Keywords

#Information storage and retrieval #Medical Subject Headings #Consumer health information #Indexing #Internet #TEXT CATEGORIZATION #CLASSIFICATION #INFORMATION #INTERNET #DOCUMENTS #SYSTEM #Computer Science, Interdisciplinary Applications #Medical Informatics

Type

article

original article

publishedVersion