We describe the creation of GRASCCO, a novel German-language corpus composed of some 60 clinical documents with more than 43,000 tokens. GRASCCO is a synthetic corpus resulting from a series of alienation steps that obfuscate the privacy-sensitive information contained in real clinical documents, the true origin of all GRASCCO texts. Therefore, it is publicly shareable without any legal restrictions. We also explore whether this corpus still represents common clinical language use by comparing it with a real (non-shareable) clinical corpus we developed as a contribution to the Medical Informatics Initiative in Germany (MII) within the SMITH consortium. We find evidence that such a claim can indeed be made.
Projects: SMITH - Smart Medical Information Technology for Healthcare
Publication type: InCollection
Book Title: Studies in Health Technology and Informatics
Publisher: IOS Press
Human Diseases: No Human Disease specified
Citation: In Studies in Health Technology and Informatics, IOS Press
Date Published: 1st Aug 2022
Registered Mode: imported from a bibtex file
Created: 24th Feb 2023 at 17:05