Oropharyngeal cancer patient stratification using random forest based-learning over high-dimensional radiomic features

Bibliographic Details
Main Authors: Harsh Patel, David M. Vock, G. Elisabeta Marai, Clifton D. Fuller, Abdallah S. R. Mohamed, Guadalupe Canahuate
Format: Article
Language: English
Published: Nature Publishing Group, 2021-07-01
Series: Scientific Reports
Online Access:https://doi.org/10.1038/s41598-021-92072-8
ISSN: 2045-2322

Affiliations
Harsh Patel: Department of Electrical and Computer Engineering, University of Iowa
David M. Vock: Division of Biostatistics, University of Minnesota
G. Elisabeta Marai: Department of Computer Science, University of Illinois at Chicago
Clifton D. Fuller: Department of Radiation Oncology, MD Anderson Cancer Center
Abdallah S. R. Mohamed: Department of Radiation Oncology, MD Anderson Cancer Center
Guadalupe Canahuate: Department of Electrical and Computer Engineering, University of Iowa

Abstract
The aim of this study was to improve risk prediction for oropharyngeal cancer (OPC) patients by applying cluster analysis to radiomic features extracted from pre-treatment computed tomography (CT) scans. A total of 553 OPC patients, randomly split into training (80%) and validation (20%) sets, were classified into 2 or 3 risk groups by applying hierarchical clustering to the co-occurrence matrix obtained from a random survival forest (RSF) trained on 301 radiomic features. The cluster label was combined with clinical data to train an ensemble of five predictive models (Cox, random forest, RSF, logistic regression, and logistic elastic net). Ensemble performance was evaluated on the independent test set for both recurrence-free survival (RFS) and overall survival (OS). Kaplan–Meier curves for OS stratified by cluster label showed significant differences in both the training and test sets (p < 0.0001). Compared with models trained on clinical data only, including the cluster label improved test AUC from 0.62 to 0.79 for OS and from 0.66 to 0.80 for RFS.

Representing the high-dimensional radiomic feature space by a single extracted feature, the cluster label, reduces the dimensionality and sparsity of the data. Moreover, including the cluster label improves model performance over clinical data alone and performs comparably to models that include the raw radiomic features.
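The central step of the abstract, clustering patients over a forest co-occurrence matrix, can be illustrated with a minimal sketch. It assumes the forest has already been trained and that the terminal-node (leaf) id of every patient in every tree is available as an integer matrix; the co-occurrence (proximity) of two patients is then the fraction of trees in which they reach the same leaf, and 1 minus that value is used as a distance. The function names `proximity_matrix` and `average_linkage`, the choice of average linkage, and the toy data are all hypothetical illustrations: the abstract does not state which linkage criterion or software the authors used.

```python
import numpy as np


def proximity_matrix(leaves):
    """Co-occurrence (proximity) matrix from forest terminal nodes.

    leaves: (n_patients, n_trees) integer array; entry [i, t] is the id of
    the leaf that patient i reaches in tree t. Entry [i, j] of the result is
    the fraction of trees in which patients i and j share a leaf.
    """
    leaves = np.asarray(leaves)
    same_leaf = leaves[:, None, :] == leaves[None, :, :]  # (n, n, n_trees)
    return same_leaf.mean(axis=2)


def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering with average linkage (hypothetical helper).

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance until n_clusters remain. Roughly cubic in the number of
    patients, so only suitable for small illustrative inputs.
    """
    clusters = [[i] for i in range(dist.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(dist.shape[0], dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Toy example (made-up data): 6 patients, 4 trees, two obvious leaf patterns.
leaves = np.array([
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [1, 3, 5, 8],
    [2, 4, 6, 9],
    [2, 4, 6, 9],
    [2, 4, 6, 9],
])
prox = proximity_matrix(leaves)
cluster_label = average_linkage(1.0 - prox, n_clusters=2)
print(cluster_label)  # [0 0 0 1 1 1]
```

The resulting `cluster_label` is the single feature that would be appended to the clinical variables. For a real cohort the naive loop is slow; SciPy's `linkage(squareform(1 - prox, checks=False), method="average")` followed by `fcluster(Z, t=k, criterion="maxclust")` performs average-linkage clustering efficiently. If an RSF implementation is not at hand, scikit-learn forests expose leaf ids through their `apply(X)` method, though that would be a substitute for the survival forest described here.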
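The Kaplan–Meier curves stratified by cluster label can be sketched with the product-limit estimator, computing one curve per risk group. This is a minimal illustration assuming arrays of follow-up times, event indicators (1 = event observed, such as death for OS; 0 = censored) and cluster labels; the helper names `kaplan_meier` and `km_by_cluster` are hypothetical. The abstract does not say which test produced the reported p-value, and no significance test is implemented here.

```python
import numpy as np


def kaplan_meier(time, event):
    """Product-limit (Kaplan-Meier) survival estimate.

    time:  follow-up times
    event: 1 if the event was observed, 0 if the patient was censored
    Returns the distinct event times and S(t) just after each of them.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv = np.empty(len(event_times))
    s = 1.0
    for k, t in enumerate(event_times):
        at_risk = np.sum(time >= t)           # still under observation at t
        deaths = np.sum((time == t) & event)  # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        surv[k] = s
    return event_times, surv


def km_by_cluster(time, event, labels):
    """One Kaplan-Meier curve per cluster label (hypothetical helper)."""
    time, event, labels = map(np.asarray, (time, event, labels))
    return {int(c): kaplan_meier(time[labels == c], event[labels == c])
            for c in np.unique(labels)}


# Toy data (made up): five patients, two of them censored.
t, s = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(t, s)
```

Censored patients contribute to the at-risk count up to their last follow-up but never reduce the survival estimate themselves, which is why the patient censored at time 2 still counts as at risk at that time.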