Annotating German Clinical Documents for De-Identification.

Abstract:

We devised annotation guidelines for the de-identification of German clinical documents and assembled a corpus of 1,106 discharge summaries and transfer letters with 44K annotated protected health information (PHI) items. After three iteration rounds, our annotation team finally reached an inter-annotator agreement of 0.96 on the instance level and 0.97 on the token level of annotation (averaged pair-wise F1 score). To establish a baseline for automatic de-identification on our corpus, we trained a recurrent neural network (RNN) and achieved F1 scores greater than 0.9 on most major PHI categories.
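The agreement figure above is an averaged pair-wise F1 score. As a rough illustration of how such a number can be computed, the sketch below treats each annotator's output as a set of items and averages F1 over all annotator pairs. The function names, the `(start, end, category)` item layout, the category labels and the toy data are assumptions made for this example; the record does not describe the authors' actual evaluation code.

```python
from itertools import combinations


def f1(a: set, b: set) -> float:
    """Exact-match F1 between two annotation sets.

    With exact matching, F1 is symmetric, so it does not matter
    which annotator is treated as the reference.
    """
    if not a and not b:
        return 1.0  # two empty annotation sets agree trivially
    overlap = len(a & b)
    return 2 * overlap / (len(a) + len(b))


def averaged_pairwise_f1(annotations: dict) -> float:
    """Mean F1 over all unordered pairs of annotators."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(f1(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: each item is (start, end, category).
annotators = {
    "A": {(0, 2, "NAME"), (5, 6, "DATE"), (9, 10, "CITY")},
    "B": {(0, 2, "NAME"), (5, 6, "DATE")},
    "C": {(0, 2, "NAME"), (5, 7, "DATE"), (9, 10, "CITY")},
}

score = averaged_pairwise_f1(annotators)
print(round(score, 3))
```

The same functions would cover a token-level variant if each item were a `(token_index, category)` pair instead of a span, which is one plausible reading of the instance-level versus token-level distinction in the abstract.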

PubMed ID: 31437914

Projects: SMITH - Smart Medical Information Technology for Healthcare

Publication type: InProceedings

Journal: Stud Health Technol Inform

Human Diseases: No Human Disease specified

Citation: Stud Health Technol Inform. 2019 Aug 21;264:203-207. doi: 10.3233/SHTI190212.

Date Published: 21st Aug 2019

Registered Mode: by PubMed ID

Authors: T. Kolditz, C. Lohr, J. Hellrich, L. Modersohn, B. Betz, M. Kiehntopf, U. Hahn

Activity

Views: 1711

Created: 7th Sep 2020 at 13:24

Last updated: 30th Jan 2023 at 12:00
