The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study


Bibliographic Details
Published in: Journal of Medical Internet Research
Main Authors: Tomoyuki Kuroiwa, Aida Sarcon, Takuya Ibara, Eriku Yamada, Akiko Yamamoto, Kazuya Tsukamoto, Koji Fujita
Format: Article
Language: English
Published: JMIR Publications, 2023-09-01
ISSN: 1438-8871
DOI: 10.2196/47621
Online Access: https://www.jmir.org/2023/1/e47621
Abstract

Background: Artificial intelligence (AI) has gained tremendous popularity recently, especially the use of natural language processing (NLP). ChatGPT is a state-of-the-art chatbot capable of holding natural conversations using NLP. The use of AI in medicine could have a tremendous impact on health care delivery. Although some studies have evaluated ChatGPT’s accuracy in self-diagnosis, there is no research regarding its precision or the degree to which it recommends medical consultation.

Objective: The aim of this study was to evaluate ChatGPT’s ability to accurately and precisely self-diagnose common orthopedic diseases, as well as the strength of the recommendations it provides for medical consultation.

Methods: Over a 5-day period, each of the study authors submitted the same questions to ChatGPT. The conditions evaluated were carpal tunnel syndrome (CTS), cervical myelopathy (CM), lumbar spinal stenosis (LSS), knee osteoarthritis (KOA), and hip osteoarthritis (HOA). Answers were categorized as correct, partially correct, incorrect, or a differential diagnosis. The percentage of correct answers and the reproducibility were calculated; reproducibility between days and between raters was measured with the Fleiss κ coefficient. Answers that recommended that the patient seek medical attention were recategorized according to the strength of the recommendation as defined by the study.

Results: The ratios of correct answers were 25/25, 1/25, 24/25, 16/25, and 17/25 for CTS, CM, LSS, KOA, and HOA, respectively. The ratios of incorrect answers were 23/25 for CM and 0/25 for all other conditions. The reproducibility between days was 1.0, 0.15, 0.7, 0.6, and 0.6, and the reproducibility between raters was 1.0, 0.1, 0.64, –0.12, and 0.04, for CTS, CM, LSS, KOA, and HOA, respectively. Among the answers recommending medical attention, the phrases “essential,” “recommended,” “best,” and “important” were used: “essential” occurred in 4 of the 125 answers, “recommended” in 12, “best” in 6, and “important” in 94. Additionally, 7 of the 125 answers did not include a recommendation to seek medical attention.

Conclusions: The accuracy and reproducibility of ChatGPT in self-diagnosing five common orthopedic conditions were inconsistent. Accuracy could potentially be improved by adding symptoms that more easily identify a specific location. Only a few answers were accompanied by a strong recommendation to seek medical attention according to our study standards. Although ChatGPT could serve as a potential first step in accessing care, we found variability in accurate self-diagnosis. Given the risk of harm from self-diagnosis without medical follow-up, it would be prudent for an NLP chatbot to include clear language alerting patients to seek an expert medical opinion. We hope to shed further light on the use of AI in a future clinical study.
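The reproducibility figures in the Results are Fleiss κ coefficients, which measure chance-corrected agreement when several raters assign categorical labels. As an illustration of how such a coefficient is computed, here is a minimal sketch in Python; the `fleiss_kappa` function and the rating matrix are hypothetical and do not reproduce the study’s data.

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects x categories matrix of rating counts.

    counts[i, j] = number of raters assigning subject i to category j.
    Every row must sum to the same number of raters n."""
    counts = np.asarray(counts, dtype=float)
    N = counts.shape[0]                       # number of subjects
    n = counts.sum(axis=1)[0]                 # raters per subject
    p_j = counts.sum(axis=0) / (N * n)        # overall category proportions
    # Per-subject observed agreement among the n raters
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))
    P_bar = P_i.mean()                        # mean observed agreement
    P_e = np.square(p_j).sum()                # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical example: 5 answers, 3 raters, 4 categories
# (correct, partially correct, incorrect, differential diagnosis)
ratings = [
    [3, 0, 0, 0],
    [3, 0, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 0],
    [2, 0, 0, 1],
]
print(round(fleiss_kappa(ratings), 3))  # → 0.538 (moderate agreement)
```

A κ of 1.0 (as for CTS in the study) means perfect agreement across all repetitions, while values near or below 0 (as for KOA and HOA between raters) indicate agreement no better than chance.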