Modern machine learning and deep learning methods require large datasets to achieve reliable and robust results. This requirement is often difficult to meet in the medical field, due to data sharing limitations imposed by privacy regulations or the small number of patients available (e.g., for rare diseases). To address this data scarcity, generative models such as Generative Adversarial Networks (GANs) have been widely used to generate synthetic data that mimic real data by representing features that reflect health-related information without reference to real patients. In this paper, we consider several GAN models to generate synthetic data for training binary (malignant/benign) classifiers, and compare the resulting classification accuracy with that obtained when only real data are used. We aim to investigate how synthetic data can improve classification accuracy, especially when only a small amount of data is available. To this end, we have developed and implemented an evaluation framework in which binary classifiers are trained on extended datasets containing both real and synthetic data. The results show improved accuracy for classifiers trained with data generated by more advanced GAN models, even when limited amounts of original data are available.
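The comparison described in the abstract (a classifier trained on real data only versus one trained on real data extended with synthetic data, both scored on held-out real data) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's framework: the toy two-feature dataset, the nearest-centroid classifier, and every function name here are hypothetical, and `sample_synthetic` is a simple per-class Gaussian sampler standing in for a trained GAN generator. It does not reproduce the reported results.

```python
import random
import statistics


def make_real_data(n, rng):
    """Toy stand-in for real patient data: two features, label 1 (malignant) or 0 (benign)."""
    data = []
    for i in range(n):
        label = i % 2  # alternate labels so both classes are always present
        center = 2.0 if label == 1 else 0.0
        data.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return data


def sample_synthetic(real, n_per_class, rng):
    """Placeholder for a GAN generator (assumption): per-class independent Gaussians."""
    synthetic = []
    for label in (0, 1):
        rows = [x for x, y in real if y == label]
        if len(rows) < 2:
            continue  # a standard deviation needs at least two points
        columns = list(zip(*rows))
        means = [statistics.mean(c) for c in columns]
        stds = [statistics.stdev(c) for c in columns]
        for _ in range(n_per_class):
            synthetic.append(([rng.gauss(m, s) for m, s in zip(means, stds)], label))
    return synthetic


def fit_centroids(train):
    """Nearest-centroid 'training': one mean vector per class."""
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [statistics.mean(c) for c in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def accuracy(centroids, test):
    return sum(predict(centroids, x) == y for x, y in test) / len(test)


def evaluate(n_real=20, n_synth_per_class=100, n_test=500, seed=0):
    """Score both training regimes on the same held-out real test set."""
    rng = random.Random(seed)
    real_train = make_real_data(n_real, rng)
    real_test = make_real_data(n_test, rng)
    synthetic = sample_synthetic(real_train, n_synth_per_class, rng)
    return {
        "real_only": accuracy(fit_centroids(real_train), real_test),
        "real_plus_synthetic": accuracy(fit_centroids(real_train + synthetic), real_test),
    }


if __name__ == "__main__":
    print(evaluate())
```

Both classifiers are scored on real data only, so any difference between the two numbers reflects the effect of the added synthetic training samples. In practice the Gaussian sampler would be replaced by samples drawn from a trained GAN and the centroid classifier by the model under study.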
DOI: 10.3390/app12147075
Projects: Synthetica: generation and evaluation of synthetic data
Publication type: Journal article
Journal: Applied Sciences
Citation: Applied Sciences 12(14):7075
Date Published: 1st Jul 2022
URL: https://gitlab.com/ul-mds/data-science/synthetic-data/gan-collection
Registered Mode: by DOI
Created: 11th Sep 2023 at 10:12
Last updated: 11th Sep 2023 at 10:20