GGPONC 2.0 — The German Clinical Guideline Corpus for Oncology: Curation Workflow, Annotation Policy, Baseline NER Taggers
Despite remarkable advances in the development of language resources in recent years, there is still a shortage of annotated, publicly available corpora covering (German) medical language. With the initial release of the German Guideline Program in Oncology NLP Corpus (GGPONC), we demonstrated how such corpora can be built from clinical guidelines, a resource that is widely available in many natural languages and offers reasonable coverage of medical terminology. In this work, we describe a major new release of GGPONC. The corpus has been substantially extended in size and re-annotated with a new annotation scheme based on SNOMED CT top-level hierarchies, reaching high inter-annotator agreement (γ=.94). Moreover, we annotated elliptical coordinated noun phrases and their resolutions, a common language phenomenon in scientific documents (not only German ones). We also trained BERT-based named entity recognition models on this new data set; they achieve high performance on short, coarse-grained entity spans (F1=.89), while the rate of boundary errors increases for long entity spans. GGPONC is freely available through a data use agreement. The trained named entity recognition models and the detailed annotation guide are also publicly available.
Projects: SMITH - Smart Medical Information Technology for Healthcare
Publication type: Journal article
Book Title: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Publisher: European Language Resources Association
Citation: [GGPONC 2.0 - The German Clinical Guideline Corpus for Oncology: Curation Workflow, Annotation Policy, Baseline NER Taggers](https://aclanthology.org/2022.lrec-1.389) (Borchert et al., LREC 2022)
Date Published: 19th Jun 2022
URL: http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.389.pdf