Evaluation of parameters affecting performance and reliability of machine learning-based antibiotic susceptibility testing from whole genome sequencing data.

Prediction of antibiotic resistance phenotypes from whole genome sequencing data by machine learning methods has been proposed as a promising platform for the development of sequence-based diagnostics. However, there has been no systematic evaluation of factors that may influence performance of such...

Bibliographic Details
Main Authors:	Allison L Hicks, Nicole Wheeler, Leonor Sánchez-Busó, Jennifer L Rakeman, Simon R Harris, Yonatan H Grad
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2019-09-01
Series:	PLoS Computational Biology
Online Access:	https://doi.org/10.1371/journal.pcbi.1007349

id	doaj-3e5ab5d53c7942a1ab9339a65098e286
record_format	Article
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Allison L Hicks Nicole Wheeler Leonor Sánchez-Busó Jennifer L Rakeman Simon R Harris Yonatan H Grad
title	Evaluation of parameters affecting performance and reliability of machine learning-based antibiotic susceptibility testing from whole genome sequencing data.
publisher	Public Library of Science (PLoS)
series	PLoS Computational Biology
issn	1553-734X 1553-7358
publishDate	2019-09-01
description	Prediction of antibiotic resistance phenotypes from whole genome sequencing data by machine learning methods has been proposed as a promising platform for the development of sequence-based diagnostics. However, there has been no systematic evaluation of factors that may influence performance of such models, how they might apply to and vary across clinical populations, and what the implications might be in the clinical setting. Here, we performed a meta-analysis of seven large Neisseria gonorrhoeae datasets, as well as Klebsiella pneumoniae and Acinetobacter baumannii datasets, with whole genome sequence data and antibiotic susceptibility phenotypes using set covering machine classification, random forest classification, and random forest regression models to predict resistance phenotypes from genotype. We demonstrate how model performance varies by drug, dataset, resistance metric, and species, reflecting the complexities of generating clinically relevant conclusions from machine learning-derived models. Our findings underscore the importance of incorporating relevant biological and epidemiological knowledge into model design and assessment and suggest that doing so can inform tailored modeling for individual drugs, pathogens, and clinical populations. We further suggest that continued comprehensive sampling and incorporation of up-to-date whole genome sequence data, resistance phenotypes, and treatment outcome data into model training will be crucial to the clinical utility and sustainability of machine learning-based molecular diagnostics.
url	https://doi.org/10.1371/journal.pcbi.1007349

Similar Items