Adaptive sample size determination for the development of clinical prediction models


Bibliographic Details
Main Authors: Evangelia Christodoulou, Maarten van Smeden, Michael Edlinger, Dirk Timmerman, Maria Wanitschek, Ewout W. Steyerberg, Ben Van Calster
Format: Article
Language: English
Published: BMC 2021-03-01
Series: Diagnostic and Prognostic Research
Subjects: Adaptive design; Clinical prediction models; Events per variable; Model development; Model validation; Sample size
Online Access: https://doi.org/10.1186/s41512-021-00096-5
id doaj-3f5342bc0a0846e8ada6cbf9038c95b1
record_format Article
spelling doaj-3f5342bc0a0846e8ada6cbf9038c95b1 2021-03-28T11:38:41Z
Language: eng
Publisher: BMC
Series: Diagnostic and Prognostic Research (ISSN 2397-7523)
Published: 2021-03-01, 5(1):1-12
DOI: 10.1186/s41512-021-00096-5
Title: Adaptive sample size determination for the development of clinical prediction models
Authors and affiliations:
Evangelia Christodoulou, Department of Development & Regeneration, KU Leuven
Maarten van Smeden, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht
Michael Edlinger, Department of Development & Regeneration, KU Leuven
Dirk Timmerman, Department of Development & Regeneration, KU Leuven
Maria Wanitschek, University Clinic of Internal Medicine III - Cardiology and Angiology, Tirol Kliniken
Ewout W. Steyerberg, Department of Biomedical Data Sciences, Leiden University Medical Center
Ben Van Calster, Department of Development & Regeneration, KU Leuven
Subjects: Adaptive design; Clinical prediction models; Events per variable; Model development; Model validation; Sample size
Online Access: https://doi.org/10.1186/s41512-021-00096-5
collection DOAJ
language English
format Article
sources DOAJ
author Evangelia Christodoulou
Maarten van Smeden
Michael Edlinger
Dirk Timmerman
Maria Wanitschek
Ewout W. Steyerberg
Ben Van Calster
spellingShingle Evangelia Christodoulou
Maarten van Smeden
Michael Edlinger
Dirk Timmerman
Maria Wanitschek
Ewout W. Steyerberg
Ben Van Calster
Adaptive sample size determination for the development of clinical prediction models
Diagnostic and Prognostic Research
Adaptive design
Clinical prediction models
Events per variable
Model development
Model validation
Sample size
author_facet Evangelia Christodoulou
Maarten van Smeden
Michael Edlinger
Dirk Timmerman
Maria Wanitschek
Ewout W. Steyerberg
Ben Van Calster
author_sort Evangelia Christodoulou
title Adaptive sample size determination for the development of clinical prediction models
title_short Adaptive sample size determination for the development of clinical prediction models
title_full Adaptive sample size determination for the development of clinical prediction models
title_fullStr Adaptive sample size determination for the development of clinical prediction models
title_full_unstemmed Adaptive sample size determination for the development of clinical prediction models
title_sort adaptive sample size determination for the development of clinical prediction models
publisher BMC
series Diagnostic and Prognostic Research
issn 2397-7523
publishDate 2021-03-01
description Abstract Background We suggest an adaptive sample size calculation method for developing clinical prediction models, in which model performance is monitored sequentially as new data come in. Methods We illustrate the approach using data for the diagnosis of ovarian cancer (n = 5914, 33% event fraction) and obstructive coronary artery disease (CAD; n = 4888, 44% event fraction). We used logistic regression to develop a prediction model consisting only of a priori selected predictors, and we assumed linear relations for continuous predictors. We mimicked prospective patient recruitment by developing the model on 100 randomly selected patients, using bootstrapping to internally validate the model. We sequentially added 50 random new patients until we reached a sample size of 3000, re-estimating model performance at each step. We examined the sample size required to satisfy the following stopping rule: a calibration slope ≥ 0.9 and optimism in the c-statistic (or AUC) ≤ 0.02 at two consecutive sample sizes. This procedure was repeated 500 times. We also investigated the impact of alternative modeling strategies: modeling nonlinear relations for continuous predictors, and correcting for bias in the model estimates (Firth's correction). Results Better discrimination was achieved in the ovarian cancer data (c-statistic 0.9 with 7 predictors) than in the CAD data (c-statistic 0.7 with 11 predictors). Adequate calibration and limited optimism in discrimination were achieved after a median of 450 patients (interquartile range 450–500) for the ovarian cancer data (22 events per parameter (EPP), 20–24) and 850 patients (750–900) for the CAD data (33 EPP, 30–35). A stricter criterion, requiring AUC optimism ≤ 0.01, was met with a median of 500 (23 EPP) and 1500 (59 EPP) patients, respectively. These sample sizes were much higher than those from the well-known 10 EPP rule of thumb, and slightly higher than those from a recently published fixed sample size calculation method by Riley et al. Higher sample sizes were required when nonlinear relations were modeled, and lower sample sizes when Firth's correction was used. Conclusions Adaptive sample size determination can be a useful supplement to fixed a priori sample size calculations, because it allows the sample size to be tailored to the specific prediction modeling context in a dynamic fashion.
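The sequential monitoring loop described in the abstract can be separated from the modeling details. Below is a minimal, hypothetical Python sketch of that loop: `evaluate` is a stand-in for refitting the logistic model on the first n recruited patients and returning the calibration slope and bootstrap-estimated AUC optimism (the paper's actual estimation procedure is not reproduced here, and `mock_evaluate` is an invented trajectory for illustration only).

```python
# Sketch of the adaptive stopping rule: start at 100 patients, add 50 at a
# time up to 3000, and stop once calibration slope >= 0.9 and AUC optimism
# <= 0.02 hold at two consecutive sample sizes.

def adaptive_sample_size(evaluate, start=100, step=50, max_n=3000,
                         min_slope=0.9, max_optimism=0.02):
    """Return the first sample size at which the stopping rule is met;
    return max_n if the rule is never satisfied."""
    consecutive = 0
    n = start
    while n <= max_n:
        slope, optimism = evaluate(n)
        if slope >= min_slope and optimism <= max_optimism:
            consecutive += 1
            if consecutive == 2:   # criteria met at two consecutive sizes
                return n
        else:
            consecutive = 0        # reset the streak on any failure
        n += step
    return max_n

# Hypothetical performance trajectory: calibration slope approaches 1 and
# optimism shrinks as the development sample grows (roughly how both behave
# in practice; the numbers are not from the paper).
def mock_evaluate(n):
    slope = 1.0 - 30.0 / n
    optimism = 8.0 / n
    return slope, optimism

print(adaptive_sample_size(mock_evaluate))  # → 450
```

With real data, `evaluate(n)` would refit the a priori specified logistic model on the current sample and bootstrap the optimism; the loop itself is what makes the design adaptive, since recruitment stops as soon as the performance criteria are stably met.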
topic Adaptive design
Clinical prediction models
Events per variable
Model development
Model validation
Sample size
url https://doi.org/10.1186/s41512-021-00096-5
work_keys_str_mv AT evangeliachristodoulou adaptivesamplesizedeterminationforthedevelopmentofclinicalpredictionmodels
AT maartenvansmeden adaptivesamplesizedeterminationforthedevelopmentofclinicalpredictionmodels
AT michaeledlinger adaptivesamplesizedeterminationforthedevelopmentofclinicalpredictionmodels
AT dirktimmerman adaptivesamplesizedeterminationforthedevelopmentofclinicalpredictionmodels
AT mariawanitschek adaptivesamplesizedeterminationforthedevelopmentofclinicalpredictionmodels
AT ewoutwsteyerberg adaptivesamplesizedeterminationforthedevelopmentofclinicalpredictionmodels
AT benvancalster adaptivesamplesizedeterminationforthedevelopmentofclinicalpredictionmodels
_version_ 1724199716016619520