Quantitative Assessment of Tissue Biomarkers and Construction of a Model to Predict Outcome in Breast Cancer Using Multiple Imputation
Missing data pose one of the greatest challenges in the rigorous evaluation of biomarkers. The limited availability of specimens with complete clinical annotation and quality biomaterial often leads to underpowered studies. Tissue microarray studies, for example, may be further handicapped by the lo...
Main Authors: | John W. Emerson, Marisa Dolled-Filhart, Lyndsay Harris, David L. Rimm, David P. Tuck |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2009-01-01 |
Series: | Cancer Informatics |
Online Access: | https://doi.org/10.4137/CIN.S911 |
id |
doaj-14e3d118d99a4628bf994dec1262382e |
---|---|
record_format |
Article |
spelling |
doaj-14e3d118d99a4628bf994dec1262382e; 2020-11-25T03:15:28Z; eng; SAGE Publishing; Cancer Informatics; 1176-9351; 2009-01-01; 7; 10.4137/CIN.S911; Quantitative Assessment of Tissue Biomarkers and Construction of a Model to Predict Outcome in Breast Cancer Using Multiple Imputation; John W. Emerson (0), Marisa Dolled-Filhart (1), Lyndsay Harris (2), David L. Rimm (3), David P. Tuck (4); (0) Department of Statistics, Yale University, New Haven, Connecticut 06520. (1) Department of Pathology, Yale University School of Medicine, New Haven, Connecticut 06510. (2) Medical Oncology, Yale University School of Medicine, New Haven, Connecticut 06510. (3) Department of Pathology, Yale University School of Medicine, New Haven, Connecticut 06510. (4) Department of Pathology, Yale University School of Medicine, New Haven, Connecticut 06510.; Missing data pose one of the greatest challenges in the rigorous evaluation of biomarkers. The limited availability of specimens with complete clinical annotation and quality biomaterial often leads to underpowered studies. Tissue microarray studies, for example, may be further handicapped by the loss of data points because of unevaluable staining, core loss, or the lack of tumor in the histospot. This paper presents a novel approach to these common problems in the context of a tissue protein biomarker analysis in a cohort of patients with breast cancer. Our analysis develops techniques based on multiple imputation to address the missing value problem. We first select markers using a training cohort, identifying a small subset of protein expression levels that are most useful in predicting patient survival. The best model is obtained by including both protein markers (including COX6C, GATA3, NAT1, and ESR1) and lymph node status. The use of either lymph node status or the four protein expression levels provides similar improvements in goodness-of-fit, with both significantly better than a baseline clinical model. Using the same multiple imputation strategy, we then validate the results out-of-sample on a larger independent cohort. Our approach of integrating multiple imputation with each stage of the analysis serves as an example that may be replicated or adapted in future studies with missing values.; https://doi.org/10.4137/CIN.S911 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
John W. Emerson Marisa Dolled-Filhart Lyndsay Harris David L. Rimm David P. Tuck |
author_sort |
John W. Emerson |
title |
Quantitative Assessment of Tissue Biomarkers and Construction of a Model to Predict Outcome in Breast Cancer Using Multiple Imputation |
publisher |
SAGE Publishing |
series |
Cancer Informatics |
issn |
1176-9351 |
publishDate |
2009-01-01 |
description |
Missing data pose one of the greatest challenges in the rigorous evaluation of biomarkers. The limited availability of specimens with complete clinical annotation and quality biomaterial often leads to underpowered studies. Tissue microarray studies, for example, may be further handicapped by the loss of data points because of unevaluable staining, core loss, or the lack of tumor in the histospot. This paper presents a novel approach to these common problems in the context of a tissue protein biomarker analysis in a cohort of patients with breast cancer. Our analysis develops techniques based on multiple imputation to address the missing value problem. We first select markers using a training cohort, identifying a small subset of protein expression levels that are most useful in predicting patient survival. The best model is obtained by including both protein markers (including COX6C, GATA3, NAT1, and ESR1) and lymph node status. The use of either lymph node status or the four protein expression levels provides similar improvements in goodness-of-fit, with both significantly better than a baseline clinical model. Using the same multiple imputation strategy, we then validate the results out-of-sample on a larger independent cohort. Our approach of integrating multiple imputation with each stage of the analysis serves as an example that may be replicated or adapted in future studies with missing values. |
url |
https://doi.org/10.4137/CIN.S911 |
_version_ |
1724639242743709696 |