Integrating Somatic Mutations for Breast Cancer Survival Prediction Using Machine Learning Methods

Breast cancer is the most common malignancy in women, and because it has a high mortality rate, it is urgent to develop computational methods to increase the accuracy of breast cancer survival predictive models. Although multi-omics data such as gene expression have been extensively used in recent s...

Bibliographic Details
Main Authors: Zongzhen He, Junying Zhang, Xiguo Yuan, Yuanyuan Zhang
Format: Article
Language: English
Published: Frontiers Media S.A. 2021-01-01
Series: Frontiers in Genetics
Subjects:
breast cancer
multi-omics
survival prediction
somatic mutation
mRMR
MKL
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2020.632901/full
id doaj-51e4dac65aac44f3a2e4039da1f18b43
record_format Article
spelling doaj-51e4dac65aac44f3a2e4039da1f18b43
2021-01-18T05:54:10Z
eng
Frontiers Media S.A.
Frontiers in Genetics
1664-8021
2021-01-01
11
10.3389/fgene.2020.632901
632901
Integrating Somatic Mutations for Breast Cancer Survival Prediction Using Machine Learning Methods
Zongzhen He (School of Computer Science and Technology, Xidian University, Xi’an, China)
Junying Zhang (School of Computer Science and Technology, Xidian University, Xi’an, China)
Xiguo Yuan (School of Computer Science and Technology, Xidian University, Xi’an, China)
Yuanyuan Zhang (School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China)
collection DOAJ
language English
format Article
sources DOAJ
author Zongzhen He
Junying Zhang
Xiguo Yuan
Yuanyuan Zhang
title Integrating Somatic Mutations for Breast Cancer Survival Prediction Using Machine Learning Methods
publisher Frontiers Media S.A.
series Frontiers in Genetics
issn 1664-8021
publishDate 2021-01-01
description Breast cancer is the most common malignancy in women, and because it has a high mortality rate, it is urgent to develop computational methods that increase the accuracy of breast cancer survival prediction models. Although multi-omics data such as gene expression have been used extensively in recent studies, accurate prognosis of breast cancer remains a challenge. Somatic mutations are another important and promising data source for studying cancer development, and their effect on the prognosis of breast cancer remains to be further explored. Meanwhile, these omics datasets are high-dimensional and redundant. Therefore, we adopted multiple kernel learning (MKL) to efficiently integrate somatic mutations with the molecular data currently in use, including gene expression, copy number variation (CNV), methylation, and protein expression data, for the prediction of breast cancer survival. Before integration, the maximum relevance minimum redundancy (mRMR) feature selection method was used to select, for each type of data, features that are highly relevant to survival and have low redundancy among themselves. The experimental results demonstrated that the proposed method achieved the best performance and that prediction performance improved markedly when somatic mutations were included, indicating that somatic mutations are critical for improving breast cancer survival predictions. Moreover, mRMR was superior to other feature selection methods used in previous studies. Furthermore, MKL outperformed the other traditional classifiers in multi-omics data integration. Our analysis indicated that by employing promising omics data such as somatic mutations and harnessing proper feature selection methods and effective integration frameworks, the accuracy of breast cancer survival prediction can be further increased, thereby supporting better clinical diagnosis and more effective treatment for breast cancer patients.
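As a rough illustration of the mRMR step described in the abstract, the sketch below greedily picks features that correlate strongly with the outcome and weakly with the features already chosen. It is a simplified stand-in, not the authors' implementation: absolute Pearson correlation replaces the mutual information used in standard mRMR, and the names `pearson` and `mrmr_select` are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def mrmr_select(columns, y, k):
    """Greedily choose up to k feature indices, most relevant and least redundant first.

    columns: list of feature columns (one sequence of values per feature).
    y: outcome values, one per sample.
    Assumption of this sketch: |Pearson correlation| is used for both relevance
    and redundancy instead of mutual information.
    """
    if k <= 0 or not columns:
        return []
    relevance = [abs(pearson(col, y)) for col in columns]
    # Start with the single most relevant feature (first one wins ties).
    selected = [max(range(len(columns)), key=relevance.__getitem__)]
    while len(selected) < min(k, len(columns)):
        best, best_score = None, -math.inf
        for j in range(len(columns)):
            if j in selected:
                continue
            # Redundancy: mean absolute correlation with already selected features.
            redundancy = sum(
                abs(pearson(columns[j], columns[s])) for s in selected
            ) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

Applied to each omics type separately, a call such as `mrmr_select(columns, y, 50)` would return the indices of the retained features; the value 50 is arbitrary and not taken from the record.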
topic breast cancer
multi-omics
survival prediction
somatic mutation
mRMR
MKL
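The kernel integration idea behind MKL, as summarized in the description, can be sketched as one Gram matrix per omics type combined into a weighted sum. This is only a minimal sketch under stated assumptions: the RBF kernel, the `gamma` parameter, and the helper names `rbf_kernel` and `combine_kernels` are illustrative choices not taken from the record, and the weights are supplied by hand, whereas an MKL solver learns them together with the classifier.

```python
import math


def rbf_kernel(rows, gamma):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) for one omics view.

    rows: one feature vector per patient.
    """
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj))) for xj in rows]
        for xi in rows
    ]


def combine_kernels(kernels, weights):
    """Weighted sum of same-sized Gram matrices.

    Weights must be non-negative; they are normalised to sum to 1.
    Hand-set weights are an assumption of this sketch, not learned MKL weights.
    """
    if len(kernels) != len(weights) or any(w < 0 for w in weights):
        raise ValueError("need one non-negative weight per kernel")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    n = len(kernels[0])
    return [
        [sum(w / total * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]
```

The combined matrix can then be passed to any classifier that accepts a precomputed kernel. Because a non-negative weighted sum of valid kernels is itself a valid kernel, each data type can contribute according to its weight without concatenating raw features.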
url https://www.frontiersin.org/articles/10.3389/fgene.2020.632901/full