Machine learning enables detection of early-stage colorectal cancer by whole-genome sequencing of plasma cell-free DNA

Abstract. Background: Blood-based methods using cell-free DNA (cfDNA) are under development as an alternative to existing screening tests. However, early-stage detection of cancer using tumor-derived cfDNA has proven challenging because of the small proportion of cfDNA derived from tumor tissue in ear...

Bibliographic Details
Main Authors: Nathan Wan, David Weinberg, Tzu-Yu Liu, Katherine Niehaus, Eric A. Ariazi, Daniel Delubac, Ajay Kannan, Brandon White, Mitch Bailey, Marvin Bertin, Nathan Boley, Derek Bowen, James Cregg, Adam M. Drake, Riley Ennis, Signe Fransen, Erik Gafni, Loren Hansen, Yaping Liu, Gabriel L. Otte, Jennifer Pecson, Brandon Rice, Gabriel E. Sanderson, Aarushi Sharma, John St. John, Catherina Tang, Abraham Tzou, Leilani Young, Girish Putcha, Imran S. Haque
Format: Article
Language: English
Published: BMC 2019-08-01
Series: BMC Cancer
Subjects: Cell-free DNA; Colorectal cancer; Screening; Whole-genome sequencing; Early-stage cancer
Online Access: http://link.springer.com/article/10.1186/s12885-019-6003-8
id doaj-fe732f6285b646618c64630c5a91ae05
record_format Article
spelling doaj-fe732f6285b646618c64630c5a91ae05
2020-11-25T02:58:47Z
eng
BMC
BMC Cancer
1471-2407
2019-08-01
191110
10.1186/s12885-019-6003-8
Freenome (affiliation, repeated for each author)
collection DOAJ
language English
format Article
sources DOAJ
author Nathan Wan
David Weinberg
Tzu-Yu Liu
Katherine Niehaus
Eric A. Ariazi
Daniel Delubac
Ajay Kannan
Brandon White
Mitch Bailey
Marvin Bertin
Nathan Boley
Derek Bowen
James Cregg
Adam M. Drake
Riley Ennis
Signe Fransen
Erik Gafni
Loren Hansen
Yaping Liu
Gabriel L. Otte
Jennifer Pecson
Brandon Rice
Gabriel E. Sanderson
Aarushi Sharma
John St. John
Catherina Tang
Abraham Tzou
Leilani Young
Girish Putcha
Imran S. Haque
author_sort Nathan Wan
title Machine learning enables detection of early-stage colorectal cancer by whole-genome sequencing of plasma cell-free DNA
publisher BMC
series BMC Cancer
issn 1471-2407
publishDate 2019-08-01
description Abstract
Background: Blood-based methods using cell-free DNA (cfDNA) are under development as an alternative to existing screening tests. However, early-stage detection of cancer using tumor-derived cfDNA has proven challenging because of the small proportion of cfDNA derived from tumor tissue in early-stage disease. A machine learning approach to discover signatures in cfDNA, potentially reflective of both tumor and non-tumor contributions, may represent a promising direction for the early detection of cancer.
Methods: Whole-genome sequencing was performed on cfDNA extracted from plasma samples (N = 546 colorectal cancer and 271 non-cancer controls). Reads aligning to protein-coding gene bodies were extracted, and read counts were normalized. cfDNA tumor fraction was estimated using IchorCNA. Machine learning models were trained using k-fold cross-validation and confounder-based cross-validations to assess generalization performance.
Results: In a colorectal cancer cohort heavily weighted towards early-stage cancer (80% stage I/II), we achieved a mean AUC of 0.92 (95% CI 0.91–0.93) with a mean sensitivity of 85% (95% CI 83–86%) at 85% specificity. Sensitivity generally increased with tumor stage and increasing tumor fraction. Stratification by age, sequencing batch, and institution demonstrated the impact of these confounders and provided a more accurate assessment of generalization performance.
Conclusions: A machine learning approach using cfDNA achieved high sensitivity and specificity in a large, predominantly early-stage, colorectal cancer cohort. The possibility of systematic technical and institution-specific biases warrants similar confounder analyses in other studies. Prospective validation of this machine learning method and evaluation of a multi-analyte approach are underway.
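The abstract mentions confounder-based cross-validation and reports sensitivity at a fixed specificity, but this record does not describe how either was computed. The following is a minimal illustrative sketch only, not the authors' pipeline. It assumes that a confounder-based split can be approximated by holding out one confounder group (for example, one institution or sequencing batch) at a time, and that a sample is called positive when its score exceeds a threshold chosen on the controls. The function names and the group labels are hypothetical.

```python
import math
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), one fold per group.

    `groups` holds one label per sample (hypothetical institution or
    batch IDs), so no group contributes to both train and test.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g in sorted(by_group):
        test_idx = by_group[g]
        train_idx = [i for i in range(len(groups)) if groups[i] != g]
        yield g, train_idx, test_idx


def sensitivity_at_specificity(scores, labels, target_specificity=0.85):
    """Sensitivity at a threshold that keeps specificity >= the target.

    labels: 1 = cancer, 0 = control. A sample is called positive when
    its score is strictly greater than the threshold.
    """
    controls = sorted(s for s, y in zip(scores, labels) if y == 0)
    cases = [s for s, y in zip(scores, labels) if y == 1]
    if not controls or not cases:
        raise ValueError("need at least one case and one control")
    # Number of controls that must fall at or below the threshold.
    n_needed = max(1, math.ceil(target_specificity * len(controls)))
    threshold = controls[n_needed - 1]
    return sum(s > threshold for s in cases) / len(cases)


# Hypothetical group labels, used only to show the fold structure.
groups = ["site_a", "site_a", "site_b", "site_c", "site_b"]
for held_out, train_idx, test_idx in leave_one_group_out(groups):
    print(held_out, train_idx, test_idx)
```

Comparing the metric from ordinary k-fold splits with the metric from group-held-out splits is one way to see how much a confounder such as institution or batch contributes to apparent performance.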
topic Cell-free DNA
Colorectal cancer
Screening
Whole-genome sequencing
Early-stage cancer
url http://link.springer.com/article/10.1186/s12885-019-6003-8