Machine learning enables detection of early-stage colorectal cancer by whole-genome sequencing of plasma cell-free DNA
Abstract. Background: Blood-based methods using cell-free DNA (cfDNA) are under development as an alternative to existing screening tests. However, early-stage detection of cancer using tumor-derived cfDNA has proven challenging because of the small proportion of cfDNA derived from tumor tissue in ear...
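Main Authors: | Nathan Wan, David Weinberg, Tzu-Yu Liu, Katherine Niehaus, Eric A. Ariazi, Daniel Delubac, Ajay Kannan, Brandon White, Mitch Bailey, Marvin Bertin, Nathan Boley, Derek Bowen, James Cregg, Adam M. Drake, Riley Ennis, Signe Fransen, Erik Gafni, Loren Hansen, Yaping Liu, Gabriel L. Otte, Jennifer Pecson, Brandon Rice, Gabriel E. Sanderson, Aarushi Sharma, John St. John, Catherina Tang, Abraham Tzou, Leilani Young, Girish Putcha, Imran S. Haque |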
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2019-08-01 |
Series: | BMC Cancer |
Subjects: | Cell-free DNA, Colorectal cancer, Screening, Whole-genome sequencing, Early-stage cancer |
Online Access: | http://link.springer.com/article/10.1186/s12885-019-6003-8 |
id |
doaj-fe732f6285b646618c64630c5a91ae05 |
---|---|
record_format |
Article |
spelling |
doaj-fe732f6285b646618c64630c5a91ae05; 2020-11-25T02:58:47Z; eng; BMC; BMC Cancer; 1471-2407; 2019-08-01; 19; 1; 1; 10; 10.1186/s12885-019-6003-8; Machine learning enables detection of early-stage colorectal cancer by whole-genome sequencing of plasma cell-free DNA; Nathan Wan; David Weinberg; Tzu-Yu Liu; Katherine Niehaus; Eric A. Ariazi; Daniel Delubac; Ajay Kannan; Brandon White; Mitch Bailey; Marvin Bertin; Nathan Boley; Derek Bowen; James Cregg; Adam M. Drake; Riley Ennis; Signe Fransen; Erik Gafni; Loren Hansen; Yaping Liu; Gabriel L. Otte; Jennifer Pecson; Brandon Rice; Gabriel E. Sanderson; Aarushi Sharma; John St. John; Catherina Tang; Abraham Tzou; Leilani Young; Girish Putcha; Imran S. Haque (all authors: Freenome) |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Nathan Wan; David Weinberg; Tzu-Yu Liu; Katherine Niehaus; Eric A. Ariazi; Daniel Delubac; Ajay Kannan; Brandon White; Mitch Bailey; Marvin Bertin; Nathan Boley; Derek Bowen; James Cregg; Adam M. Drake; Riley Ennis; Signe Fransen; Erik Gafni; Loren Hansen; Yaping Liu; Gabriel L. Otte; Jennifer Pecson; Brandon Rice; Gabriel E. Sanderson; Aarushi Sharma; John St. John; Catherina Tang; Abraham Tzou; Leilani Young; Girish Putcha; Imran S. Haque |
author_sort |
Nathan Wan |
title |
Machine learning enables detection of early-stage colorectal cancer by whole-genome sequencing of plasma cell-free DNA |
publisher |
BMC |
series |
BMC Cancer |
issn |
1471-2407 |
publishDate |
2019-08-01 |
description |
Background: Blood-based methods using cell-free DNA (cfDNA) are under development as an alternative to existing screening tests. However, early-stage detection of cancer using tumor-derived cfDNA has proven challenging because of the small proportion of cfDNA derived from tumor tissue in early-stage disease. A machine learning approach to discover signatures in cfDNA, potentially reflective of both tumor and non-tumor contributions, may represent a promising direction for the early detection of cancer. Methods: Whole-genome sequencing was performed on cfDNA extracted from plasma samples (N = 546 colorectal cancer and 271 non-cancer controls). Reads aligning to protein-coding gene bodies were extracted, and read counts were normalized. cfDNA tumor fraction was estimated using IchorCNA. Machine learning models were trained using k-fold cross-validation and confounder-based cross-validations to assess generalization performance. Results: In a colorectal cancer cohort heavily weighted towards early-stage cancer (80% stage I/II), we achieved a mean AUC of 0.92 (95% CI 0.91–0.93) with a mean sensitivity of 85% (95% CI 83–86%) at 85% specificity. Sensitivity generally increased with tumor stage and increasing tumor fraction. Stratification by age, sequencing batch, and institution demonstrated the impact of these confounders and provided a more accurate assessment of generalization performance. Conclusions: A machine learning approach using cfDNA achieved high sensitivity and specificity in a large, predominantly early-stage, colorectal cancer cohort. The possibility of systematic technical and institution-specific biases warrants similar confounder analyses in other studies. Prospective validation of this machine learning method and evaluation of a multi-analyte approach are underway. |
topic |
Cell-free DNA; Colorectal cancer; Screening; Whole-genome sequencing; Early-stage cancer |
url |
http://link.springer.com/article/10.1186/s12885-019-6003-8 |
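
The abstract mentions two evaluation ideas that a short example may help make concrete: confounder-based cross-validation (holding out all samples that share a sequencing batch, institution, or age group) and reporting sensitivity at a fixed specificity. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `group_folds` and `sensitivity_at_specificity`, the leave-one-group-out scheme, and the strict-threshold rule are all illustrative choices that do not come from the record.

```python
import math
from collections import defaultdict


def group_folds(groups):
    """Yield (held_out_group, train_idx, test_idx), one fold per confounder level.

    `groups` holds one label per sample (for example a batch or institution id).
    Every sample from the held-out level goes to the test set, so the model is
    never evaluated on a level it has seen during training.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g in sorted(by_group):
        test_idx = by_group[g]
        held_out = set(test_idx)
        train_idx = [i for i in range(len(groups)) if i not in held_out]
        yield g, train_idx, test_idx


def sensitivity_at_specificity(scores, labels, specificity=0.85):
    """Fraction of cases called positive at a threshold set on the controls.

    labels: 1 = cancer, 0 = control; a higher score means more cancer-like.
    The threshold is the smallest control score such that at least
    `specificity` of controls lie at or below it; a sample is called positive
    only when its score is strictly above that threshold.
    """
    controls = sorted(s for s, y in zip(scores, labels) if y == 0)
    cases = [s for s, y in zip(scores, labels) if y == 1]
    if not controls or not cases:
        raise ValueError("need at least one control and one case")
    # Number of controls that must be classified as negative.
    k = math.ceil(specificity * len(controls))
    threshold = controls[k - 1] if k > 0 else float("-inf")
    return sum(s > threshold for s in cases) / len(cases)
```

In use, a model would be fitted on `train_idx` and scored on `test_idx` for each fold, and the per-fold sensitivities compared against those from ordinary k-fold splits; a large gap between the two would suggest that the confounder is contributing to apparent performance.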