Next-Generation Diagnostics: The Impact of Synthetic Data Generation on the Detection of Breast Cancer from Ultrasound Imaging

Breast cancer is one of the most lethal and widespread diseases affecting women worldwide. As a result, it is necessary to diagnose breast cancer accurately and efficiently utilizing the most cost-effective and widely used methods. In this research, we demonstrated that synthetically created high-qu...

Bibliographic Details
Published in: Mathematics
Main Authors: Hari Mohan Rai, Serhii Dashkevych, Joon Yoo
Format: Article
Language: English
Published: MDPI AG 2024-09-01
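Subjects: ultrasound imaging; breast cancer detection; deep learning; synthetic dataset generation; StyleGAN3; EfficientNet-B7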
Online Access: https://www.mdpi.com/2227-7390/12/18/2808
_version_ 1850305787990114304
author Hari Mohan Rai
Serhii Dashkevych
Joon Yoo
author_facet Hari Mohan Rai
Serhii Dashkevych
Joon Yoo
author_sort Hari Mohan Rai
collection DOAJ
container_title Mathematics
description Breast cancer is one of the most lethal and widespread diseases affecting women worldwide, so it must be diagnosed accurately and efficiently using cost-effective, widely available methods. In this research, we demonstrate that synthetically generated, high-quality ultrasound data outperform conventional augmentation strategies for diagnosing breast cancer with deep learning. We trained a deep-learning model based on the EfficientNet-B7 architecture on a dataset of 3186 ultrasound images acquired from multiple publicly available sources, together with 10,000 images synthetically generated by a generative adversarial network (StyleGAN3). The model was trained with five-fold cross-validation and evaluated using four metrics: accuracy, recall, precision, and F1 score. Integrating the synthetic data into the training set raised classification performance, as measured by the F1 score, from 88.72% to 92.01%, an improvement of more than 3 percentage points over the genuine dataset with common augmentation. This demonstrates the power of generative models to expand and improve the quality of training datasets in medical-imaging applications. Various data augmentation procedures were also investigated to improve the training set’s diversity and representativeness. This research emphasizes the relevance of modern artificial intelligence and machine-learning technologies in medical imaging by providing an effective strategy for classifying ultrasound images, which may lead to increased diagnostic accuracy and optimal treatment options. The proposed techniques are promising and have strong potential for future clinical application in the diagnosis of breast cancer.
format Article
id doaj-art-8f7554d8a8e347fd8f2851c59cd601b2
institution Directory of Open Access Journals
issn 2227-7390
language English
publishDate 2024-09-01
publisher MDPI AG
record_format Article
spelling doaj-art-8f7554d8a8e347fd8f2851c59cd601b2 2025-08-19T23:29:16Z eng MDPI AG Mathematics 2227-7390 2024-09-01 12 18 2808 10.3390/math12182808
Next-Generation Diagnostics: The Impact of Synthetic Data Generation on the Detection of Breast Cancer from Ultrasound Imaging
Hari Mohan Rai (School of Computing, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si 13120, Gyeonggi-do, Republic of Korea)
Serhii Dashkevych (Department of Computer Engineering, Vistula University, Stokłosy 3, 02-787 Warszawa, Poland)
Joon Yoo (School of Computing, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si 13120, Gyeonggi-do, Republic of Korea)
https://www.mdpi.com/2227-7390/12/18/2808
ultrasound imaging
breast cancer detection
deep learning
synthetic dataset generation
StyleGAN3
EfficientNet-B7
spellingShingle Hari Mohan Rai
Serhii Dashkevych
Joon Yoo
Next-Generation Diagnostics: The Impact of Synthetic Data Generation on the Detection of Breast Cancer from Ultrasound Imaging
ultrasound imaging
breast cancer detection
deep learning
synthetic dataset generation
StyleGAN3
EfficientNet-B7
title Next-Generation Diagnostics: The Impact of Synthetic Data Generation on the Detection of Breast Cancer from Ultrasound Imaging
title_full Next-Generation Diagnostics: The Impact of Synthetic Data Generation on the Detection of Breast Cancer from Ultrasound Imaging
title_fullStr Next-Generation Diagnostics: The Impact of Synthetic Data Generation on the Detection of Breast Cancer from Ultrasound Imaging
title_full_unstemmed Next-Generation Diagnostics: The Impact of Synthetic Data Generation on the Detection of Breast Cancer from Ultrasound Imaging
title_short Next-Generation Diagnostics: The Impact of Synthetic Data Generation on the Detection of Breast Cancer from Ultrasound Imaging
title_sort next generation diagnostics the impact of synthetic data generation on the detection of breast cancer from ultrasound imaging
topic ultrasound imaging
breast cancer detection
deep learning
synthetic dataset generation
StyleGAN3
EfficientNet-B7
url https://www.mdpi.com/2227-7390/12/18/2808
work_keys_str_mv AT harimohanrai nextgenerationdiagnosticstheimpactofsyntheticdatagenerationonthedetectionofbreastcancerfromultrasoundimaging
AT serhiidashkevych nextgenerationdiagnosticstheimpactofsyntheticdatagenerationonthedetectionofbreastcancerfromultrasoundimaging
AT joonyoo nextgenerationdiagnosticstheimpactofsyntheticdatagenerationonthedetectionofbreastcancerfromultrasoundimaging
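
Note on the evaluation described in the abstract: the description field reports accuracy, recall, precision, and F1 score under five-fold cross-validation. The following is a minimal sketch of how those four metrics and a five-fold split could be computed. It is not the authors' code. It assumes a binary labeling for simplicity, and the function names (binary_metrics, k_fold_indices, average_over_folds) and the confusion counts in the usage lines are hypothetical illustrations, not values from the article.

```python
from statistics import mean


def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts.

    Assumes a binary task (an illustrative simplification).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous, near-equal validation folds."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def average_over_folds(per_fold):
    """Average each metric across the per-fold metric dictionaries."""
    return {name: mean(m[name] for m in per_fold) for name in per_fold[0]}


if __name__ == "__main__":
    # 3186 is the number of real images stated in the abstract.
    folds = k_fold_indices(3186, k=5)
    print([len(f) for f in folds])
    # Made-up confusion counts, for illustration only.
    print(binary_metrics(tp=8, fp=2, fn=2, tn=8))
```

In practice the folds would be shuffled and stratified by class before training; the contiguous split here only illustrates the partitioning.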