Assessment of deep neural networks for the diagnosis of benign and malignant skin neoplasms in comparison with dermatologists: A retrospective validation study.

<h4>Background</h4>The diagnostic performance of convolutional neural networks (CNNs) for diagnosing several types of skin neoplasms has been demonstrated as comparable with that of dermatologists using clinical photography. However, the generalizability should be demonstrated using a la...


Bibliographic Details
Main Authors:	Seung Seog Han, Ik Jun Moon, Seong Hwan Kim, Jung-Im Na, Myoung Shin Kim, Gyeong Hun Park, Ilwoo Park, Keewon Kim, Woohyung Lim, Ju Hee Lee, Sung Eun Chang
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2020-11-01
Series:	PLoS Medicine
Online Access:	https://doi.org/10.1371/journal.pmed.1003381

id	doaj-0751bff114bd476d9161891ef4cf7400
record_format	Article
spelling	doaj-0751bff114bd476d9161891ef4cf7400 2021-04-21T18:38:54Z eng Public Library of Science (PLoS) PLoS Medicine 1549-1277 1549-1676 2020-11-01 17 11 e1003381 10.1371/journal.pmed.1003381 Assessment of deep neural networks for the diagnosis of benign and malignant skin neoplasms in comparison with dermatologists: A retrospective validation study. Seung Seog Han; Ik Jun Moon; Seong Hwan Kim; Jung-Im Na; Myoung Shin Kim; Gyeong Hun Park; Ilwoo Park; Keewon Kim; Woohyung Lim; Ju Hee Lee; Sung Eun Chang https://doi.org/10.1371/journal.pmed.1003381
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Seung Seog Han Ik Jun Moon Seong Hwan Kim Jung-Im Na Myoung Shin Kim Gyeong Hun Park Ilwoo Park Keewon Kim Woohyung Lim Ju Hee Lee Sung Eun Chang
author_sort	Seung Seog Han
title	Assessment of deep neural networks for the diagnosis of benign and malignant skin neoplasms in comparison with dermatologists: A retrospective validation study.
publisher	Public Library of Science (PLoS)
series	PLoS Medicine
issn	1549-1277 1549-1676
publishDate	2020-11-01
description	<h4>Background</h4>The diagnostic performance of convolutional neural networks (CNNs) for diagnosing several types of skin neoplasms has been demonstrated as comparable with that of dermatologists using clinical photography. However, the generalizability should be demonstrated using a large-scale external dataset that includes most types of skin neoplasms. In this study, the performance of a neural network algorithm was compared with that of dermatologists in both real-world practice and experimental settings.<h4>Methods and findings</h4>To demonstrate generalizability, the skin cancer detection algorithm (https://rcnn.modelderm.com) developed in our previous study was used without modification. We conducted a retrospective study with all single-lesion biopsied cases (43 disorders; 40,331 clinical images from 10,426 cases: 1,222 malignant cases and 9,204 benign cases; mean age [standard deviation (SD)], 52.1 [18.3] years; 4,701 men [45.1%]) obtained from the Department of Dermatology, Severance Hospital in Seoul, Korea, between January 1, 2008 and March 31, 2019. Using the external validation dataset, the predictions of the algorithm were compared with the clinical diagnoses of 65 attending physicians who had recorded the clinical diagnoses with thorough examinations in real-world practice. In addition, the results obtained by the algorithm for the data of randomly selected batches of 30 patients were compared with those obtained by 44 dermatologists in experimental settings; the dermatologists were only provided with multiple images of each lesion, without clinical information. With regard to the determination of malignancy, the area under the curve (AUC) achieved by the algorithm was 0.863 (95% confidence interval [CI] 0.852-0.875) when unprocessed clinical photographs were used. The sensitivity and specificity of the algorithm at the predefined high-specificity threshold were 62.7% (95% CI 59.9-65.1) and 90.0% (95% CI 89.4-90.6), respectively. Furthermore, the sensitivity and specificity of the first clinical impression of 65 attending physicians were 70.2% and 95.6%, respectively, which were superior to those of the algorithm (McNemar test; p < 0.0001). The positive and negative predictive values of the algorithm were 45.4% (CI 43.7-47.3) and 94.8% (CI 94.4-95.2), respectively, whereas those of the first clinical impression were 68.1% and 96.0%, respectively. In the reader test conducted using images corresponding to batches of 30 patients, the sensitivity and specificity of the algorithm at the predefined threshold were 66.9% (95% CI 57.7-76.0) and 87.4% (95% CI 82.5-92.2), respectively. Furthermore, the sensitivity and specificity derived from the first impression of 44 of the participants were 65.8% (95% CI 55.7-75.9) and 85.7% (95% CI 82.4-88.9), respectively, which are values comparable with those of the algorithm (Wilcoxon signed-rank test; p = 0.607 and 0.097). Limitations of this study include the exclusive use of high-quality clinical photographs taken in hospitals and the lack of ethnic diversity in the study population.<h4>Conclusions</h4>Our algorithm could diagnose skin tumors with nearly the same accuracy as a dermatologist when the diagnosis was performed solely with photographs. However, as a result of limited data relevancy, the performance was inferior to that of actual medical examination. To achieve more accurate predictive diagnoses, clinical information should be integrated with imaging information.
url	https://doi.org/10.1371/journal.pmed.1003381
_version_	1714664528619241472
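
The abstract in this record reports the algorithm's sensitivity, specificity, and positive and negative predictive values on the external validation set. As a rough consistency check (not the authors' analysis code), the sketch below recomputes the predictive values from the reported sensitivity (62.7%), specificity (90.0%), and malignancy prevalence (1,222 of 10,426 cases) using Bayes' rule. The function name `predictive_values` is hypothetical, and the inputs are the rounded figures from the abstract, so small rounding differences are possible.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence.

    Works with expected fractions of the population in each cell of the
    2x2 confusion table, so no raw counts are needed.
    """
    tp = sensitivity * prevalence                # malignant, flagged malignant
    fn = (1 - sensitivity) * prevalence          # malignant, flagged benign
    tn = specificity * (1 - prevalence)          # benign, flagged benign
    fp = (1 - specificity) * (1 - prevalence)    # benign, flagged malignant
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Figures taken from the abstract (rounded as reported there).
prevalence = 1222 / 10426  # malignant cases / all cases
ppv, npv = predictive_values(0.627, 0.900, prevalence)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 45.4%, NPV = 94.8%
```

The recomputed values agree with the 45.4% and 94.8% given in the abstract. The low positive predictive value despite 90.0% specificity follows from the low prevalence of malignancy (about 11.7%) in this biopsied cohort.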


Similar Items