Breast cancer risk prediction in African women using Random Forest Classifier

Introduction: Early and accurate diagnosis is one of the most important steps in combating breast cancer. Unfortunately, breast cancer is asymptomatic at the early stage; symptoms appear later, but by the symptomatic stage treatment could be complicated or even become impo...


Bibliographic Details
Published in: Cancer Treatment and Research Communications
Main Authors: Babafemi Oluropo Macaulay, Benjamin Segun Aribisala, Soji Alabi Akande, Boluwaji Ade Akinnuwesi, Olusola Aanu Olabanjo
Format: Article
Language: English
Published: Elsevier 2021-01-01
Subjects: Breast cancer; Random forest; Machine learning; African women; Risk prediction; Feature selection
Online Access: http://www.sciencedirect.com/science/article/pii/S2468294221000940
author Babafemi Oluropo Macaulay
Benjamin Segun Aribisala
Soji Alabi Akande
Boluwaji Ade Akinnuwesi
Olusola Aanu Olabanjo
collection DOAJ
container_title Cancer Treatment and Research Communications
description Introduction: Early and accurate diagnosis is one of the most important steps in combating breast cancer. Unfortunately, breast cancer is asymptomatic at the early stage; symptoms appear later, but by the symptomatic stage treatment can be complicated or even impossible, leading to death. Proper risk assessment is therefore very important in reducing mortality. Some computational techniques have been developed for breast cancer risk assessment in the developed world, but such techniques do not work well in Africa because the risk profiles of African women differ, e.g. later menarche, low drug abuse and a low smoking rate. Aim: In this work, we propose a bespoke risk prediction model for African women using the Random Forest Classifier (RFC) machine learning technique. Methods: A total of 180 subjects were studied, of whom 90 were confirmed cases of breast cancer and 90 were benign. Twenty-five risk factors were included, for example smoking, alcohol intake, occupational hazards and age at menopause. Four feature-selection approaches were compared empirically: Chi-Square, mutual information gain, Spearman correlation and use of the entire feature set. The RFC algorithm was used to develop the prediction model. Results: We found that family history of breast cancer, dense breasts, deliberate abortion, age at first child, fruit intake and regular exercise are predictors of breast cancer. With all risk factors included, the RFC model gave an accuracy of 91.67%, sensitivity of 87.10%, specificity of 96.55% and area under the curve (AUC) of 92%; with correlation-selected features it gave an accuracy of 96.67%, sensitivity of 93.75%, specificity of 100% and AUC of 97%. The Chi-Square-selected features gave the best performance, with 98.33% accuracy, 100% sensitivity, 96.55% specificity and 98% AUC. Features selected by mutual information gain gave the same results as the Chi-Square-selected features. Conclusion: The Random Forest Classifier has good potential for predicting the risk of breast cancer in African women. The study helped to identify the risk factors for breast cancer in African women. This is valuable information that can help African women pay attention to those risk factors, with the aim of reducing the incidence of breast cancer in Africa.
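The accuracy, sensitivity and specificity figures quoted in the abstract are all derived from a confusion matrix. The sketch below shows that arithmetic in plain Python. The abstract does not report the test-set size or the underlying counts, so the counts used here are hypothetical: they assume a held-out set of 60 subjects and were chosen only because they reproduce the quoted percentages. The function name `binary_metrics` is likewise illustrative. AUC is not included because it depends on the ranked prediction scores, not on the counts alone.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity as percentages.

    tp/fn: cancer cases predicted correctly / missed.
    tn/fp: benign cases predicted correctly / flagged as cancer.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),  # true positive rate
        "specificity": 100.0 * tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts (NOT reported in the abstract), assuming 60 test subjects.
all_features = binary_metrics(tp=27, fn=4, tn=28, fp=1)
chi_square = binary_metrics(tp=31, fn=0, tn=28, fp=1)

for name, metrics in [("all features", all_features), ("chi-square", chi_square)]:
    summary = ", ".join(f"{key} {value:.2f}%" for key, value in metrics.items())
    print(f"{name}: {summary}")
```

Under these assumed counts, the all-features row reproduces 91.67% / 87.10% / 96.55% and the Chi-Square row reproduces 98.33% / 100% / 96.55%, which shows how a single additional misclassification moves each metric on a test set of this size.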
format Article
id doaj-art-e7eb63e5fc9644ebb7f949e397dd00a0
institution Directory of Open Access Journals
issn 2468-2942
language English
publishDate 2021-01-01
publisher Elsevier
record_format Article
spelling doaj-art-e7eb63e5fc9644ebb7f949e397dd00a0
2025-08-19T20:13:57Z
eng
Elsevier
Cancer Treatment and Research Communications
2468-2942
2021-01-01
28
100396
10.1016/j.ctarc.2021.100396
Breast cancer risk prediction in African women using Random Forest Classifier
Babafemi Oluropo Macaulay (Department of Computer Science, Lagos State University, Nigeria)
Benjamin Segun Aribisala (Department of Computer Science, Lagos State University, Nigeria; Corresponding author: Department of Computer Science, Lagos State University, Nigeria.)
Soji Alabi Akande (Department of Surgery, Lagos State University Teaching Hospital, Nigeria)
Boluwaji Ade Akinnuwesi (Department of Computer Science, Lagos State University, Nigeria)
Olusola Aanu Olabanjo (Department of Computer Science, Lagos State University, Nigeria)
title Breast cancer risk prediction in African women using Random Forest Classifier
topic Breast cancer
Random forest
Machine learning
African women
Risk prediction
Feature selection
url http://www.sciencedirect.com/science/article/pii/S2468294221000940