Comparative evaluation of automated information extraction from pathology reports in three German cancer registries

Abstract:

Feeding cancer registries with data extracted from textual reports while maintaining high data quality has always been labour-intensive because the sources are heterogeneous. IT support is expected to accelerate and optimise this process. To this end, the commercial text mining system Averbis Health Discovery was tailored to extract information from free text at the cancer registry of the federal state of Baden-Württemberg. The following entity types were extracted from German-language pathology reports: tumour localisation and morphology, pTNM, grading, (sentinel) lymph nodes examined and affected, laterality and R-class. Depending on the entity type, several machine learning approaches as well as rules were used for the tumour types breast, prostate, colorectal and skin. For the pilot site, F values ranged between 0.800 and 0.996. Values dropped when the extraction pipeline was applied to two new sites (the cancer registries of Rhineland-Palatinate and Lower Saxony): for morphology from 0.950 to 0.657 and 0.933, and for localisation (topography) from 0.902 to 0.675 and 0.768. Differences were much smaller for R-class and lymph node counts. A thorough error analysis revealed numerous issues that explain these differences, such as different workflows between the sites, disagreements between textual and coded content, and different handling of missing values.
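
For readers unfamiliar with the F values reported above, the following is a minimal sketch of how such a score can be computed for one entity type. The abstract does not state the exact evaluation protocol, so the exact-match comparison on (report, entity type, value) triples, the function name f_score and the sample ICD-O morphology codes are assumptions made purely for illustration, not the authors' method.

```python
from typing import Set, Tuple

# Hypothetical annotation format: (report_id, entity_type, value)
Annotation = Tuple[str, str, str]


def f_score(gold: Set[Annotation], predicted: Set[Annotation], beta: float = 1.0) -> float:
    """F-beta score under exact matching of annotation triples (assumed protocol)."""
    tp = len(gold & predicted)   # extracted and present in the reference
    fp = len(predicted - gold)   # extracted but not in the reference
    fn = len(gold - predicted)   # in the reference but not extracted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative data only: three reports, one morphology code wrongly extracted.
gold = {
    ("r1", "morphology", "8500/3"),
    ("r2", "morphology", "8140/3"),
    ("r3", "morphology", "8070/3"),
}
predicted = {
    ("r1", "morphology", "8500/3"),
    ("r2", "morphology", "8140/3"),
    ("r3", "morphology", "8010/3"),
}

score = f_score(gold, predicted)
print(round(score, 3))
```

In this toy case two of three morphology codes match, so precision and recall are both 2/3 and the F value is also 2/3. A mismatch between the text-derived value and the registry's coded value counts as both a false positive and a false negative under this scheme, which is one way the disagreements between textual and coded content mentioned above could lower the score at a new site.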

Projects: SMITH - Smart Medical Information Technology for Healthcare

Publication type: Journal article

Publisher: German Medical Science GMS Publishing House

Human Diseases: No Human Disease specified

Date Published: 2021

Registered Mode: imported from a bibtex file

Authors: Stefan Schulz, Sonja Fix, Peter Klügl, Tamira Bachmayer, Tobias Hartz, Martin Richter, Nils Herm-Stapelberg, Philipp Daumke


Created: 24th Feb 2023 at 17:05
