Breast cancer risk prediction in African women using Random Forest Classifier
Introduction: One of the most important steps in combating breast cancer is early and accurate diagnosis. Unfortunately, breast cancer is asymptomatic at the early stage; symptoms appear later, but by the symptomatic stage treatment can be complicated or even become impo...
| Published in: | Cancer Treatment and Research Communications |
|---|---|
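| Main Authors: | Babafemi Oluropo Macaulay, Benjamin Segun Aribisala, Soji Alabi Akande, Boluwaji Ade Akinnuwesi, Olusola Aanu Olabanjo |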
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2021-01-01 |
| Subjects: | Breast cancer, Random forest, Machine learning, African women, Risk prediction, Feature selection |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2468294221000940 |
| _version_ | 1856929331964542976 |
|---|---|
| author | Babafemi Oluropo Macaulay, Benjamin Segun Aribisala, Soji Alabi Akande, Boluwaji Ade Akinnuwesi, Olusola Aanu Olabanjo |
| collection | DOAJ |
| container_title | Cancer Treatment and Research Communications |
| description | Introduction: One of the most important steps in combating breast cancer is early and accurate diagnosis. Unfortunately, breast cancer is asymptomatic at the early stage; symptoms appear later, but by the symptomatic stage treatment can be complicated or even impossible, leading to death. Proper risk assessment is hence very important in reducing mortality. Some computational techniques have been developed for breast cancer risk assessment in the developed world, but such techniques do not work well in Africa because the risk profiles of African women differ, e.g. later menarche, low drug abuse and low smoking rates. Aim: In this work, we propose a bespoke risk prediction model for African women using the Random Forest Classifier (RFC) machine learning technique. Methods: A total of 180 subjects were studied, of whom 90 were confirmed cases of breast cancer and 90 were benign. Twenty-five risk factors were included, for example smoking, alcohol intake, occupational hazards and age at menopause. Four feature-selection approaches were compared empirically: Chi-Square, mutual information gain, Spearman correlation, and using the entire feature set. The RFC algorithm was used to develop the prediction model. Results: We found that family history of breast cancer, dense breasts, deliberate abortion, age at first child, fruit intake and regular exercise are predictors of breast cancer. The RFC model gave an accuracy of 91.67%, sensitivity of 87.10%, specificity of 96.55% and area under the curve (AUC) of 92% when all the risk factors were included, while an accuracy of 96.67%, sensitivity of 93.75%, specificity of 100% and AUC of 97% were obtained with correlation-selected features. The Chi-Square-selected features gave the best performance, with 98.33% accuracy, 100% sensitivity, 96.55% specificity and 98% AUC. The features selected by mutual information gain gave the same results as the Chi-Square-selected features. Conclusion: The Random Forest Classifier has good potential for predicting the risk of breast cancer in African women. The study helped to identify the risk factors for breast cancer in African women. This is valuable information that can help African women pay attention to those risk factors, with the aim of reducing the incidence of breast cancer in Africa. |
| format | Article |
| id | doaj-art-e7eb63e5fc9644ebb7f949e397dd00a0 |
| institution | Directory of Open Access Journals |
| issn | 2468-2942 |
| language | English |
| publishDate | 2021-01-01 |
| publisher | Elsevier |
| record_format | Article |
| spelling | doaj-art-e7eb63e5fc9644ebb7f949e397dd00a0; 2025-08-19T20:13:57Z; eng; Elsevier; Cancer Treatment and Research Communications; ISSN 2468-2942; 2021-01-01; vol. 28, article 100396; DOI 10.1016/j.ctarc.2021.100396; Breast cancer risk prediction in African women using Random Forest Classifier; Babafemi Oluropo Macaulay (Department of Computer Science, Lagos State University, Nigeria), Benjamin Segun Aribisala (Department of Computer Science, Lagos State University, Nigeria; corresponding author), Soji Alabi Akande (Department of Surgery, Lagos State University Teaching Hospital, Nigeria), Boluwaji Ade Akinnuwesi (Department of Computer Science, Lagos State University, Nigeria), Olusola Aanu Olabanjo (Department of Computer Science, Lagos State University, Nigeria); http://www.sciencedirect.com/science/article/pii/S2468294221000940; Breast cancer, Random forest, Machine learning, African women, Risk prediction, Feature selection |
| title | Breast cancer risk prediction in African women using Random Forest Classifier |
| topic | Breast cancer, Random forest, Machine learning, African women, Risk prediction, Feature selection |
| url | http://www.sciencedirect.com/science/article/pii/S2468294221000940 |
| work_keys_str_mv | AT babafemioluropomacaulay breastcancerriskpredictioninafricanwomenusingrandomforestclassifier AT benjaminsegunaribisala breastcancerriskpredictioninafricanwomenusingrandomforestclassifier AT sojialabiakande breastcancerriskpredictioninafricanwomenusingrandomforestclassifier AT boluwajiadeakinnuwesi breastcancerriskpredictioninafricanwomenusingrandomforestclassifier AT olusolaaanuolabanjo breastcancerriskpredictioninafricanwomenusingrandomforestclassifier |
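
The abstract reports accuracy, sensitivity and specificity for each feature set but does not give the underlying confusion matrix or the size of the test set. As a minimal sketch of how these three metrics relate, the Python below computes them from confusion counts. The counts are hypothetical: they were chosen only because they reproduce the percentages reported for the all-features model, and they should not be read as the study's actual data. The names `ConfusionCounts` and `all_features` are illustrative as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary classifier on a held-out test set."""
    tp: int  # cancer cases predicted as cancer
    fn: int  # cancer cases predicted as benign
    tn: int  # benign cases predicted as benign
    fp: int  # benign cases predicted as cancer


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all subjects classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true cancer cases that the model detects."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of benign cases that the model correctly clears."""
    return c.tn / (c.tn + c.fp)


# ASSUMPTION: hypothetical counts, not taken from the paper. They are one
# set of small integers consistent with the figures reported for the model
# trained on all risk factors (91.67% / 87.10% / 96.55%).
all_features = ConfusionCounts(tp=27, fn=4, tn=28, fp=1)

for name, metric in [("accuracy", accuracy),
                     ("sensitivity", sensitivity),
                     ("specificity", specificity)]:
    print(f"{name}: {metric(all_features):.2%}")
```

With these assumed counts the script prints 91.67%, 87.10% and 96.55%. Sensitivity and specificity are kept as separate functions because, in a screening setting, a missed cancer case (low sensitivity) and a false alarm (low specificity) carry different costs that a single accuracy figure hides.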
