Validation measures for prognostic models for independent and correlated binary and survival outcomes
Prognostic models are developed to guide the clinical management of patients or to assess the performance of health institutions. It is essential that the performance of these models is evaluated using appropriate validation measures. Despite the proposal of several validation measures for survival ou...
Main Author: | Rahman, M. S. |
---|---|
Published: | University College London (University of London), 2012 |
Subjects: | 519.5 |
Online Access: | http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.565774 |
id |
ndltd-bl.uk-oai-ethos.bl.uk-565774 |
---|---|
record_format |
oai_dc |
collection |
NDLTD |
sources |
NDLTD |
topic |
519.5 |
description |
Prognostic models are developed to guide the clinical management of patients or to assess the performance of health institutions. It is essential that the performance of these models is evaluated using appropriate validation measures. Although several validation measures have been proposed for survival outcomes, it is still unclear which measures should generally be used in practice. In this thesis, a simulation study was performed to investigate a range of validation measures for survival outcomes in order to make practical recommendations regarding their use. Measures were evaluated with respect to their robustness to censoring and their sensitivity to the omission of important predictors. Based on the simulation results, among the discrimination measures, Gonen and Heller's K statistic can be recommended for validating a survival risk model developed using the Cox proportional hazards model, since it is both robust to censoring and reasonably sensitive to predictor omission. Royston and Sauerbrei's D statistic can be recommended provided that the distribution of the prognostic index is approximately normal. Harrell's C-index was affected by censoring and cannot be recommended for use with data with more than 30% censoring. The calibration slope can be recommended as a measure of calibration since it is not affected by censoring. The measures of predictive accuracy and explained variation (Graf et al.'s integrated Brier score and its R-squared version, and Schemper and Henderson's V) cannot be recommended due to their poor performance in the presence of censored data. In multicentre studies, patients are typically clustered within centres and their outcomes are likely to be correlated. Typically, random effects logistic and frailty models are fitted to clustered binary and survival outcomes, respectively. However, limited work has been done to assess the predictive ability of these models.
This research extended existing validation measures for independent data, such as the C-index, D statistic, calibration slope, Brier score, and the K statistic, for use with random effects/frailty models. Two approaches are proposed: 'overall' and 'pooled cluster-specific'. The 'overall' approach incorporates comparisons of subjects both within and between clusters. The 'pooled cluster-specific' measures are obtained by pooling the cluster-specific estimates based on comparisons of subjects within each cluster; the pooling is achieved using a random effects summary statistics method. Each approach can produce three different values for the validation measures, depending on the type of prediction: conditional predictions using the estimates of the random effects, conditional predictions setting the random effects to zero, and marginal predictions obtained by integrating out the random effects. Their performance was investigated using simulation studies. The 'overall' measures based on the conditional predictions including the random effects performed reasonably well in a range of scenarios and are recommended for validating models when using subjects from the same clusters as the development data. The measures based on the marginal predictions and on the conditional predictions that set the random effects to zero were biased when the intra-cluster correlation was moderate to high; they can be used for subjects in new clusters when the intra-cluster correlation coefficient is less than 0.05. The 'pooled cluster-specific' measures performed well when the clusters had a reasonable number of events. Generally, both the 'overall' and 'pooled' measures are recommended for use in practice. In choosing a validation measure, the following characteristics of the validation data should be investigated: the level of censoring (for survival outcomes), the distribution of the prognostic index, whether the clusters are the same as or different from those in the development data, the level of clustering, and the cluster size. |
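As a rough illustration of the distinction the abstract draws between comparing subjects across all clusters and comparing them only within clusters, the following minimal sketch computes Harrell's C-index for right-censored data in both ways. It is not the thesis's implementation: the function names `harrell_c` and `cluster_specific_c`, the toy data, and the choice to skip pairs with tied times are all assumptions made for this example, and the random effects pooling of the cluster-specific estimates described in the abstract is not reproduced.

```python
from itertools import combinations


def harrell_c(times, events, risks):
    """Harrell's C-index for right-censored data (illustrative sketch).

    A pair is usable only if the subject with the shorter time had an
    event. A usable pair is concordant if that subject also has the
    higher predicted risk; tied risks count as half.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject censored: ordering is unknown
        usable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    return concordant / usable if usable else float("nan")


def cluster_specific_c(times, events, risks, clusters):
    """C-index per cluster, using within-cluster comparisons only."""
    groups = {}
    for idx, label in enumerate(clusters):
        groups.setdefault(label, []).append(idx)
    return {
        label: harrell_c(
            [times[k] for k in members],
            [events[k] for k in members],
            [risks[k] for k in members],
        )
        for label, members in groups.items()
    }


# Hypothetical toy data: four subjects in two centres.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]          # 1 = event observed, 0 = censored
risks = [0.9, 0.7, 0.8, 0.1]   # higher value = higher predicted risk
clusters = ["A", "A", "B", "B"]

print(harrell_c(times, events, risks))
print(cluster_specific_c(times, events, risks, clusters))
```

In this toy data, centre B has no usable pair because its earlier subject is censored, so its cluster-specific value is undefined. This is one way to see why within-cluster measures depend on clusters having enough events.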
author |
Rahman, M. S. |
title |
Validation measures for prognostic models for independent and correlated binary and survival outcomes |
publisher |
University College London (University of London) |
publishDate |
2012 |
url |
http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.565774 |
spelling |
ndltd-bl.uk-oai-ethos.bl.uk-565774; 2015-12-03T03:28:43Z; Validation measures for prognostic models for independent and correlated binary and survival outcomes; Rahman, M. S.; 2012; 519.5; University College London (University of London); http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.565774; http://discovery.ucl.ac.uk/1367069/; Electronic Thesis or Dissertation |