Automatic de-identification of electronic health records: an Australian perspective


Author(s): Zuccon, G.; Strachan, M.; Nguyen, A.; Bergheim, A.; Grayson, N.
Date(s)

2013

Abstract

We present an approach to automatically de-identify health records. In our approach, personal health information is identified using a Conditional Random Fields machine learning classifier, a large set of linguistic and lexical features, and pattern matching techniques. Identified personal information is then removed from the reports. The de-identification of personal health information is fundamental for the sharing and secondary use of electronic health records, for example for data mining and disease monitoring. The effectiveness of our approach is first evaluated on the 2007 i2b2 Shared Task dataset, a widely adopted dataset for evaluating de-identification techniques. Subsequently, we investigate the robustness of the approach to limited training data, and we study its effectiveness on data of a different type and quality by evaluating it on scanned pathology reports from an Australian institution. This data contains optical character recognition errors, as well as linguistic conventions that differ from those in the i2b2 dataset, for example different date formats. The findings suggest that our approach is comparable to the best approach from the 2007 i2b2 Shared Task; in addition, the approach is found to be robust to variations in training size, data type and quality, in the presence of sufficient training data.
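
The two ingredients named in the abstract, pattern matching and per-token lexical features for a sequence classifier, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the date pattern, the placeholder string, the function names and the feature set are hypothetical and are not taken from the paper, and no Conditional Random Fields model is trained here.

```python
import re

# Hypothetical pattern (not from the paper): numeric dates written
# day/month/year, e.g. 03/02/2013 or 3-2-13.
DATE_PATTERN = re.compile(r"\b\d{1,2}[/.-]\d{1,2}[/.-](?:\d{4}|\d{2})\b")


def mask_dates(text, placeholder="[DATE]"):
    """Replace every substring matching DATE_PATTERN with a placeholder."""
    return DATE_PATTERN.sub(placeholder, text)


def token_features(tokens, i):
    """Build an illustrative lexical feature dictionary for tokens[i].

    Dictionaries of this shape are a common input format for CRF toolkits;
    the specific features are assumptions, not the paper's feature set.
    """
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "suffix3": tok[-3:],
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


if __name__ == "__main__":
    print(mask_dates("Collected 03/02/2013 by Dr Smith"))
    print(token_features(["Dr", "Smith", "reported"], 1))
```

In a full pipeline, feature dictionaries like these would be passed to a trained sequence classifier, with the pattern-based masking used alongside it for highly regular identifiers such as dates.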

Format

application/pdf

Identifier

http://eprints.qut.edu.au/69301/

Relation

http://eprints.qut.edu.au/69301/1/zuccon2013b.pdf

http://nicta.com.au/__data/assets/pdf_file/0011/37658/louhi2013_submission_9.pdf

Zuccon, G., Strachan, M., Nguyen, A., Bergheim, A., & Grayson, N. (2013) Automatic de-identification of electronic health records: an Australian perspective. In NICTA - Louhi 2013, 11-12 February 2013, Sydney, NSW.

Rights

Copyright 2013 [please consult the author]

Source

Institute for Future Environments; School of Information Systems; Science & Engineering Faculty

Type

Conference Paper