Methodological considerations for identifying multiple plasma proteins associated with all-cause mortality in a population-based prospective cohort

Abstract Novel methods to characterize the plasma proteome have made it possible to examine a wide range of proteins in large longitudinal cohort studies, but the complexity of the human proteome makes it difficult to identify robust protein-disease associations. Nevertheless, identification of indiv...

Bibliographic Details
Main Authors: Isabel Drake, George Hindy, Peter Almgren, Gunnar Engström, Jan Nilsson, Olle Melander, Marju Orho-Melander
Format: Article
Language: English
Published: Nature Publishing Group 2021-03-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-021-85991-z
id doaj-825622f704a643f1ae22d3d8a2adeb44
record_format Article
spelling doaj-825622f704a643f1ae22d3d8a2adeb44 2021-03-28T11:31:40Z eng Nature Publishing Group Scientific Reports 2045-2322 2021-03-01 volume 11, issue 1, pages 1-10 10.1038/s41598-021-85991-z
Isabel Drake (0): Diabetes and Cardiovascular Disease - Genetic Epidemiology, Department of Clinical Sciences in Malmö, Lund University
George Hindy (1): Diabetes and Cardiovascular Disease - Genetic Epidemiology, Department of Clinical Sciences in Malmö, Lund University
Peter Almgren (2): Diabetes and Cardiovascular Disease - Genetic Epidemiology, Department of Clinical Sciences in Malmö, Lund University
Gunnar Engström (3): Cardiovascular Epidemiology, Department of Clinical Sciences in Malmö, Lund University
Jan Nilsson (4): Experimental Cardiovascular Research, Department of Clinical Sciences in Malmö, Lund University
Olle Melander (5): Hypertension and Cardiovascular Disease, Department of Clinical Sciences in Malmö, Lund University
Marju Orho-Melander (6): Diabetes and Cardiovascular Disease - Genetic Epidemiology, Department of Clinical Sciences in Malmö, Lund University
collection DOAJ
language English
format Article
sources DOAJ
author Isabel Drake
George Hindy
Peter Almgren
Gunnar Engström
Jan Nilsson
Olle Melander
Marju Orho-Melander
publisher Nature Publishing Group
series Scientific Reports
issn 2045-2322
publishDate 2021-03-01
description Abstract Novel methods to characterize the plasma proteome have made it possible to examine a wide range of proteins in large longitudinal cohort studies, but the complexity of the human proteome makes it difficult to identify robust protein-disease associations. Nevertheless, identification of individuals at high risk of early mortality is a central issue in clinical decision-making, and novel biomarkers may be useful to improve risk stratification. With adjustment for established risk factors, we examined the associations between 138 plasma proteins measured using two proximity extension assays and long-term risk of all-cause mortality in 3,918 participants of the population-based Malmö Diet and Cancer Study. To examine the reproducibility of protein-mortality associations, we used a two-step random-split approach to simulate a discovery and a replication cohort and conducted analyses using four different methods: Cox regression, stepwise Cox regression, Lasso-Cox regression, and random survival forest (RSF). In the total study population, we identified eight proteins that were associated with all-cause mortality after adjustment for established risk factors and with Bonferroni correction for multiple testing. In the two-step analyses, the number of proteins selected for model inclusion in both random samples ranged from 6 to 21 depending on the method used. However, only three proteins were consistently included in both samples across all four methods (growth/differentiation factor-15 (GDF-15), N-terminal pro-B-type natriuretic peptide, and epididymal secretory protein E4). Using the total study population, the C-statistic for a model including established risk factors was 0.7222 and increased to 0.7284 with inclusion of the most predictive protein (GDF-15; P < 0.0001). All multiple-protein models showed additional improvement in the C-statistic compared to the single-protein model (all P < 0.0001).
We identified several plasma proteins associated with increased risk of all-cause mortality independently of established risk factors. Further investigation into the putatively causal role of these proteins for longevity is needed. In addition, the examined methods for identifying multiple proteins tended to overfit, including several putatively false-positive findings. Thus, the reproducibility of findings using such approaches may be limited.
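The bookkeeping behind the abstract's two-step random-split approach, Bonferroni correction, and C-statistic can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The significance level of 0.05, the equal-halves split, the seed, and every function name are assumptions made here; the abstract states only the number of proteins (138) and participants (3,918). The sketch does not fit any Cox, Lasso-Cox, or RSF model: the lists of "selected" proteins are hypothetical inputs that such a model would produce.

```python
import random

N_PROTEINS = 138      # number of proteins tested (from the abstract)
N_PARTICIPANTS = 3918 # cohort size (from the abstract)
ALPHA = 0.05          # assumed family-wise error rate; not stated in the abstract


def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value threshold under Bonferroni correction."""
    return alpha / n_tests


def random_split(ids, seed):
    """Shuffle participant ids and cut them into two halves.

    Equal halves are an assumption; the abstract only says 'random-split'.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def replicated(selected_discovery, selected_replication):
    """Proteins selected in both the discovery and the replication sample."""
    return sorted(set(selected_discovery) & set(selected_replication))


def harrell_c(times, events, risk):
    """Harrell's C-statistic for right-censored data.

    A pair (i, j) is comparable when subject i had the event and an earlier
    time than subject j. It is concordant when i also has the higher risk
    score; tied risk scores count as half.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


if __name__ == "__main__":
    print(bonferroni_threshold(ALPHA, N_PROTEINS))
    discovery, replication = random_split(range(N_PARTICIPANTS), seed=1)
    print(len(discovery), len(replication))
    # Hypothetical selections; "X" and "Y" stand in for non-replicating proteins.
    print(replicated(["GDF-15", "NT-proBNP", "HE4", "X"],
                     ["HE4", "GDF-15", "NT-proBNP", "Y"]))
```

Under these assumptions the per-protein threshold is 0.05 / 138, roughly 3.6e-4, and a protein counts as reproduced only if the chosen method selects it in both halves. That is the criterion under which the abstract reports between 6 and 21 proteins per method and only three across all four methods.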
url https://doi.org/10.1038/s41598-021-85991-z