
Mimicking clinical trials with synthetic acute myeloid leukemia patients using generative artificial intelligence

J. Eckardt, W. Hahn, C. Röllig, S. Stasik, U. Platzbecker, C. Müller-Tidow, H. Serve, C. Baldus, C. Schliemann, K. Schäfer-Eckart, M. Hanoun, M. Kaufmann, A. Burchert, C. Thiede, J. Schetelig, M. Sedlmayr, M. Bornhäuser, M. Wolfien, and J. Middeke. npj Digital Medicine, 7 (1): 76 (March 2024)
DOI: 10.1038/s41746-024-01076-x

Abstract

Clinical research relies on high-quality patient data; however, obtaining big data sets is costly, and access to existing data is often hindered by privacy and regulatory concerns. Synthetic data generation holds the promise of effectively bypassing these boundaries, allowing for simplified data accessibility and the prospect of synthetic control cohorts. We employed two different methodologies of generative artificial intelligence – CTAB-GAN+ and normalizing flows (NFlow) – to synthesize patient data derived from 1606 patients with acute myeloid leukemia, a heterogeneous hematological malignancy, who were treated within four multicenter clinical trials. Both generative models accurately captured distributions of demographic, laboratory, molecular and cytogenetic variables, as well as patient outcomes, yielding high performance scores regarding fidelity and usability of both synthetic cohorts (n = 1606 each). Survival analysis demonstrated close resemblance of survival curves between original and synthetic cohorts. Inter-variable relationships were preserved in univariable outcome analysis, enabling explorative analysis in our synthetic data. Additionally, training sample privacy is safeguarded, mitigating possible patient re-identification, which we quantified using Hamming distances. We provide not only a proof-of-concept for synthetic data generation in multimodal clinical data for rare diseases, but also full public access to synthetic data sets to foster further research.
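The abstract mentions Hamming distances as the privacy measure but does not spell out the procedure. As a rough illustration only, the sketch below counts the variables in which a synthetic record differs from each training record and reports the smallest such count; a distance of 0 would mean the synthetic record exactly copies a real patient. The record layout, the toy values, and the function names are all hypothetical and are not taken from the publication.

```python
from typing import Sequence

# A record is an ordered sequence of categorical/binary variable values.
Record = Sequence[object]


def hamming_distance(a: Record, b: Record) -> int:
    """Number of positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of variables")
    return sum(x != y for x, y in zip(a, b))


def min_distance_to_training(synthetic: Record, training: Sequence[Record]) -> int:
    """Smallest Hamming distance from one synthetic record to any training record."""
    return min(hamming_distance(synthetic, row) for row in training)


# Hypothetical toy records: (sex, age group, mutation A present, mutation B present)
training = [
    ("f", "60-69", True, False),
    ("m", "50-59", False, False),
]
synthetic = ("f", "60-69", False, False)

# Distance to the closest real record; larger values suggest less risk
# that the synthetic record reproduces an individual training patient.
closest = min_distance_to_training(synthetic, training)
print(closest)
```

Continuous variables such as laboratory values would need to be binned or handled with a different metric before a comparison of this kind makes sense; how the authors treated them is not stated here.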

Cite this publication

%0 Journal Article
%1 eckardt_mimicking_2024
%A Eckardt, Jan-Niklas
%A Hahn, Waldemar
%A Röllig, Christoph
%A Stasik, Sebastian
%A Platzbecker, Uwe
%A Müller-Tidow, Carsten
%A Serve, Hubert
%A Baldus, Claudia D.
%A Schliemann, Christoph
%A Schäfer-Eckart, Kerstin
%A Hanoun, Maher
%A Kaufmann, Martin
%A Burchert, Andreas
%A Thiede, Christian
%A Schetelig, Johannes
%A Sedlmayr, Martin
%A Bornhäuser, Martin
%A Wolfien, Markus
%A Middeke, Jan Moritz
%D 2024
%J npj Digital Medicine
%K topic_lifescience
%N 1
%P 76
%R 10.1038/s41746-024-01076-x
%T Mimicking clinical trials with synthetic acute myeloid leukemia patients using generative artificial intelligence
%U https://doi.org/10.1038/s41746-024-01076-x
%V 7
%X Clinical research relies on high-quality patient data, however, obtaining big data sets is costly and access to existing data is often hindered by privacy and regulatory concerns. Synthetic data generation holds the promise of effectively bypassing these boundaries allowing for simplified data accessibility and the prospect of synthetic control cohorts. We employed two different methodologies of generative artificial intelligence – CTAB-GAN+ and normalizing flows (NFlow) – to synthesize patient data derived from 1606 patients with acute myeloid leukemia, a heterogeneous hematological malignancy, that were treated within four multicenter clinical trials. Both generative models accurately captured distributions of demographic, laboratory, molecular and cytogenetic variables, as well as patient outcomes yielding high performance scores regarding fidelity and usability of both synthetic cohorts (n = 1606 each). Survival analysis demonstrated close resemblance of survival curves between original and synthetic cohorts. Inter-variable relationships were preserved in univariable outcome analysis enabling explorative analysis in our synthetic data. Additionally, training sample privacy is safeguarded mitigating possible patient re-identification, which we quantified using Hamming distances. We provide not only a proof-of-concept for synthetic data generation in multimodal clinical data for rare diseases, but also full public access to synthetic data sets to foster further research.

@article{eckardt_mimicking_2024,
  abstract = {Clinical research relies on high-quality patient data, however, obtaining big data sets is costly and access to existing data is often hindered by privacy and regulatory concerns. Synthetic data generation holds the promise of effectively bypassing these boundaries allowing for simplified data accessibility and the prospect of synthetic control cohorts. We employed two different methodologies of generative artificial intelligence – CTAB-GAN+ and normalizing flows (NFlow) – to synthesize patient data derived from 1606 patients with acute myeloid leukemia, a heterogeneous hematological malignancy, that were treated within four multicenter clinical trials. Both generative models accurately captured distributions of demographic, laboratory, molecular and cytogenetic variables, as well as patient outcomes yielding high performance scores regarding fidelity and usability of both synthetic cohorts (n = 1606 each). Survival analysis demonstrated close resemblance of survival curves between original and synthetic cohorts. Inter-variable relationships were preserved in univariable outcome analysis enabling explorative analysis in our synthetic data. Additionally, training sample privacy is safeguarded mitigating possible patient re-identification, which we quantified using Hamming distances. We provide not only a proof-of-concept for synthetic data generation in multimodal clinical data for rare diseases, but also full public access to synthetic data sets to foster further research.},
  added-at = {2024-09-10T10:41:24.000+0200},
  author = {Eckardt, Jan-Niklas and Hahn, Waldemar and Röllig, Christoph and Stasik, Sebastian and Platzbecker, Uwe and Müller-Tidow, Carsten and Serve, Hubert and Baldus, Claudia D. and Schliemann, Christoph and Schäfer-Eckart, Kerstin and Hanoun, Maher and Kaufmann, Martin and Burchert, Andreas and Thiede, Christian and Schetelig, Johannes and Sedlmayr, Martin and Bornhäuser, Martin and Wolfien, Markus and Middeke, Jan Moritz},
  biburl = {https://puma.scadsai.uni-leipzig.de/bibtex/20704545795a8ec914505a27e16461207/scadsfct},
  doi = {10.1038/s41746-024-01076-x},
  interhash = {79c5013e8419bd4ae93e627486aa503f},
  intrahash = {0704545795a8ec914505a27e16461207},
  issn = {2398-6352},
  journal = {npj Digital Medicine},
  keywords = {topic_lifescience},
  month = mar,
  number = 1,
  pages = 76,
  timestamp = {2024-09-10T12:00:15.000+0200},
  title = {Mimicking clinical trials with synthetic acute myeloid leukemia patients using generative artificial intelligence},
  url = {https://doi.org/10.1038/s41746-024-01076-x},
  volume = 7,
  year = 2024
}
