Reclassification calibration test for censored survival data: performance and comparison to goodness-of-fit criteria

Bibliographic Details
Main Authors: Olga V. Demler, Nina P. Paynter, Nancy R. Cook
Format: Article
Language: English
Published: BMC 2018-07-01
Series: Diagnostic and Prognostic Research
Subjects: Risk reclassification; Calibration; Goodness-of-fit test; Survival analysis; Hosmer-Lemeshow; Grønnesby-Borgan
Online Access: http://link.springer.com/article/10.1186/s41512-018-0034-5
DOI: 10.1186/s41512-018-0034-5
ISSN: 2397-7523
Collection: DOAJ

Abstract
Background: The risk reclassification table assesses the clinical performance of a biomarker in terms of movements across relevant risk categories. The Reclassification-Calibration (RC) statistic was developed for binary outcomes, but its performance for survival data with moderate to high censoring rates has not been evaluated.
Methods: We develop an RC statistic for survival data with higher censoring rates using the Greenwood-Nam-D’Agostino approach (RC-GND). We examine its performance characteristics and compare its performance and utility to the Hosmer-Lemeshow goodness-of-fit test under various assumptions about the censoring rate and the shape of the baseline hazard.
Results: The RC-GND test was robust to high (up to 50%) censoring rates and did not exceed the targeted 5% Type I error rate in a variety of simulated scenarios. It achieved 80% power to detect better calibration with respect to clinical categories when an important predictor with a hazard ratio of at least 1.7 to 2.2 was added to the model, while the Hosmer-Lemeshow goodness-of-fit (GOF) test had a power of 5% in this scenario.
Conclusions: The RC-GND test should be used to test the improvement in calibration with respect to clinically relevant risk strata. When an important predictor is omitted, the Hosmer-Lemeshow goodness-of-fit test is usually not significant, while the RC-GND test is sensitive to such an omission.
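For readers unfamiliar with the Greenwood-Nam-D’Agostino approach named in the abstract, the following is a minimal Python sketch, not the authors' code. It assumes the GND form commonly described for survival calibration: within each risk group, the Kaplan-Meier failure probability at a fixed horizon t is compared with the group's mean predicted risk, and the squared difference is scaled by the Greenwood variance of the Kaplan-Meier estimate. The function names, the grouping format, and the toy data are hypothetical. This record does not state how RC-GND forms its groups from reclassification-table cells or which chi-square degrees of freedom it uses, so the sketch returns only the statistic.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers: a sketch of a GND-style calibration statistic,
# not the RC-GND reference implementation.


def km_failure_and_greenwood(
    times: Sequence[float], events: Sequence[int], t: float
) -> Tuple[float, float]:
    """Return the Kaplan-Meier failure probability 1 - S(t) and its Greenwood variance."""
    surv = 1.0
    greenwood_sum = 0.0
    # Distinct observed event times (events == 1) up to the horizon t.
    event_times = sorted({x for x, e in zip(times, events) if e and x <= t})
    for u in event_times:
        n_at_risk = sum(1 for x in times if x >= u)
        d = sum(1 for x, e in zip(times, events) if e and x == u)
        surv *= 1.0 - d / n_at_risk
        # Skip the Greenwood term when everyone at risk fails (S drops to 0).
        if n_at_risk > d:
            greenwood_sum += d / (n_at_risk * (n_at_risk - d))
    return 1.0 - surv, surv ** 2 * greenwood_sum


# Each group: (follow-up times, event indicators, per-subject predicted risks at t).
Group = Tuple[Sequence[float], Sequence[int], Sequence[float]]


def gnd_statistic(groups: List[Group], t: float) -> float:
    """Sum over groups of (KM failure - mean predicted risk)^2 / Greenwood variance."""
    stat = 0.0
    for times, events, predicted in groups:
        observed, var = km_failure_and_greenwood(times, events, t)
        expected = sum(predicted) / len(predicted)
        if var == 0.0:
            raise ValueError("zero Greenwood variance; collapse sparse groups first")
        stat += (observed - expected) ** 2 / var
    return stat


# Toy example with two hypothetical risk groups and horizon t = 3.
groups = [
    ([1, 2, 3, 4], [1, 0, 1, 0], [0.5] * 4),
    ([1, 1, 2, 5], [1, 1, 0, 0], [0.3] * 4),
]
print(gnd_statistic(groups, t=3))  # approximately 0.83
```

Using the Greenwood variance in place of a binomial variance is what lets the comparison account for censoring; in practice the statistic would be referred to a chi-square distribution, and groups with very few events would be merged before testing.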