An improved clear cell renal cell carcinoma stage prediction model based on gene sets

Abstract Background: Clear cell renal cell carcinoma (ccRCC) is the most common subtype of renal cell carcinoma and accounts for cancer-related deaths. Survival rates are very low when the tumor is discovered at a late stage. Thus, developing an efficient strategy to stratify patients by the stage...


Bibliographic Details
Main Authors: Fangjun Li, Mu Yang, Yunhe Li, Mingqiang Zhang, Wenjuan Wang, Dongfeng Yuan, Dongqi Tang
Format: Article
Language: English
Published: BMC 2020-06-01
Series: BMC Bioinformatics
Subjects: Feature selection; Machine learning; Clear cell renal cell carcinoma; Cancer stage
Online Access: http://link.springer.com/article/10.1186/s12859-020-03543-0
id doaj-b936896bfe6a4391b81ed4226cd87d91
record_format Article
spelling doaj-b936896bfe6a4391b81ed4226cd87d91
2020-11-25T03:18:24Z
eng
BMC
BMC Bioinformatics
1471-2105
2020-06-01
21(1):1-15
10.1186/s12859-020-03543-0
An improved clear cell renal cell carcinoma stage prediction model based on gene sets
Fangjun Li (School of Information Science and Engineering, Shandong University, supported by Shandong Provincial Key Laboratory of Wireless Communication Technologies)
Mu Yang (Center for Gene and Immunotherapy, The Second Hospital of Shandong University)
Yunhe Li (School of Information Science and Engineering, Shandong University, supported by Shandong Provincial Key Laboratory of Wireless Communication Technologies)
Mingqiang Zhang (School of Information Science and Engineering, Shandong University, supported by Shandong Provincial Key Laboratory of Wireless Communication Technologies)
Wenjuan Wang (Center for Gene and Immunotherapy, The Second Hospital of Shandong University)
Dongfeng Yuan (School of Information Science and Engineering, Shandong University, supported by Shandong Provincial Key Laboratory of Wireless Communication Technologies)
Dongqi Tang (Center for Gene and Immunotherapy, The Second Hospital of Shandong University)
collection DOAJ
language English
format Article
sources DOAJ
author Fangjun Li
Mu Yang
Yunhe Li
Mingqiang Zhang
Wenjuan Wang
Dongfeng Yuan
Dongqi Tang
title An improved clear cell renal cell carcinoma stage prediction model based on gene sets
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2020-06-01
description Abstract. Background: Clear cell renal cell carcinoma (ccRCC) is the most common subtype of renal cell carcinoma and accounts for cancer-related deaths. Survival rates are very low when the tumor is discovered at a late stage. Developing an efficient strategy to stratify patients by cancer stage, and to understand the underlying mechanisms that drive the development and progression of cancers, is therefore critical for early prevention and treatment. Results: In this study, we developed new strategies to extract important gene features and trained machine learning-based classifiers to predict the stages of ccRCC samples. The novelty of our approach is threefold. (i) We improved the feature preprocessing procedure by binning and coding, which increased the stability of the data and the robustness of the classification model. (ii) We proposed a joint gene selection algorithm that combines the Fast-Correlation-Based Filter (FCBF) search with the information value, the linear correlation coefficient, and the variance inflation factor to remove irrelevant and redundant features. A logistic regression-based feature selection method was then used to determine influencing factors. (iii) Classification models were developed using machine learning algorithms. The method was evaluated on RNA expression values of clear cell renal cell carcinoma samples derived from The Cancer Genome Atlas (TCGA). On the testing set, our model (accuracy of 81.15% and AUC of 0.86) outperformed state-of-the-art models (accuracy of 72.64% and AUC of 0.81). We also developed a gene set, FJL-set, which contains 23 genes, far fewer than 64. Furthermore, a gene function analysis was used to explore molecular mechanisms that might affect cancer development. Conclusions: The results suggest that our model can extract more prognostic information and is worthy of further investigation and validation in order to understand the progression mechanism.
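The binning, coding, and information value steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the binning scheme, so equal-frequency bins, weight-of-evidence (WOE) coding, additive smoothing (`eps`), and a binary early/late stage label are all assumptions made here, and the function names `quantile_edges`, `woe_iv`, and `woe_encode` are hypothetical.

```python
import math
from bisect import bisect_right


def quantile_edges(values, n_bins):
    """Interior cut points for (roughly) equal-frequency bins (assumed scheme)."""
    s = sorted(values)
    n = len(s)
    edges = []
    for i in range(1, n_bins):
        e = s[(i * n) // n_bins]
        # Skip repeated cut points so that bins stay distinct.
        if not edges or e > edges[-1]:
            edges.append(e)
    return edges


def woe_iv(values, labels, n_bins=5, eps=0.5):
    """Return (edges, woe_per_bin, information_value) for one gene.

    labels are assumed to be 0/1 (for example, early vs. late stage).
    eps is additive smoothing so that empty bins do not produce log(0).
    """
    edges = quantile_edges(values, n_bins)
    k = len(edges) + 1
    pos = [0.0] * k
    neg = [0.0] * k
    for v, y in zip(values, labels):
        b = bisect_right(edges, v)  # index of the bin that v falls into
        if y == 1:
            pos[b] += 1
        else:
            neg[b] += 1
    total_pos = sum(pos) + eps * k
    total_neg = sum(neg) + eps * k
    woe = []
    iv = 0.0
    for b in range(k):
        p = (pos[b] + eps) / total_pos  # share of positives in bin b
        q = (neg[b] + eps) / total_neg  # share of negatives in bin b
        w = math.log(p / q)
        woe.append(w)
        iv += (p - q) * w  # each term is non-negative
    return edges, woe, iv


def woe_encode(values, edges, woe):
    """Replace each raw expression value by the WOE of its bin."""
    return [woe[bisect_right(edges, v)] for v in values]
```

Under these assumptions, genes with a higher information value separate the two stage groups more strongly, so ranking genes by `iv` gives one possible relevance filter before redundancy checks such as the correlation coefficient or the variance inflation factor.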
topic Feature selection
Machine learning
Clear cell renal cell carcinoma
Cancer stage
url http://link.springer.com/article/10.1186/s12859-020-03543-0