Quantitative Assessment of Tissue Biomarkers and Construction of a Model to Predict Outcome in Breast Cancer Using Multiple Imputation

Missing data pose one of the greatest challenges in the rigorous evaluation of biomarkers. The limited availability of specimens with complete clinical annotation and quality biomaterial often leads to underpowered studies. Tissue microarray studies, for example, may be further handicapped by the lo...

Bibliographic Details
Main Authors: John W. Emerson, Marisa Dolled-Filhart, Lyndsay Harris, David L. Rimm, David P. Tuck
Format: Article
Language: English
Published: SAGE Publishing 2009-01-01
Series: Cancer Informatics
Online Access: https://doi.org/10.4137/CIN.S911
id doaj-14e3d118d99a4628bf994dec1262382e
record_format Article
spelling
id doaj-14e3d118d99a4628bf994dec1262382e
timestamp 2020-11-25T03:15:28Z
language eng
publisher SAGE Publishing
series Cancer Informatics
issn 1176-9351
publishDate 2009-01-01
volume 7
doi 10.4137/CIN.S911
title Quantitative Assessment of Tissue Biomarkers and Construction of a Model to Predict Outcome in Breast Cancer Using Multiple Imputation
author John W. Emerson, Department of Statistics, Yale University, New Haven, Connecticut 06520.
Marisa Dolled-Filhart, Department of Pathology, Yale University School of Medicine, New Haven, Connecticut 06510.
Lyndsay Harris, Medical Oncology, Yale University School of Medicine, New Haven, Connecticut 06510.
David L. Rimm, Department of Pathology, Yale University School of Medicine, New Haven, Connecticut 06510.
David P. Tuck, Department of Pathology, Yale University School of Medicine, New Haven, Connecticut 06510.
description Missing data pose one of the greatest challenges in the rigorous evaluation of biomarkers. The limited availability of specimens with complete clinical annotation and quality biomaterial often leads to underpowered studies. Tissue microarray studies, for example, may be further handicapped by the loss of data points because of unevaluable staining, core loss, or the lack of tumor in the histospot. This paper presents a novel approach to these common problems in the context of a tissue protein biomarker analysis in a cohort of patients with breast cancer. Our analysis develops techniques based on multiple imputation to address the missing value problem. We first select markers using a training cohort, identifying a small subset of protein expression levels that are most useful in predicting patient survival. The best model is obtained by including both protein markers (including COX6C, GATA3, NAT1, and ESR1) and lymph node status. The use of either lymph node status or the four protein expression levels provides similar improvements in goodness-of-fit, with both significantly better than a baseline clinical model. Using the same multiple imputation strategy, we then validate the results out-of-sample on a larger independent cohort. Our approach of integrating multiple imputation with each stage of the analysis serves as an example that may be replicated or adapted in future studies with missing values.
url https://doi.org/10.4137/CIN.S911
collection DOAJ
issn 1176-9351