Artificial Intelligence vs. Human Cognition: A Comparative Analysis of ChatGPT and Candidates Sitting the European Board of Ophthalmology Diploma Examination

Bibliographic Details
Published in: Vision
Main Authors: Anna P. Maino, Jakub Klikowski, Brendan Strong, Wahid Ghaffari, Michał Woźniak, Tristan Bourcier, Andrzej Grzybowski
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Subjects: artificial intelligence; ophthalmology; medical examinations
Online Access: https://www.mdpi.com/2411-5150/9/2/31
author Anna P. Maino
Jakub Klikowski
Brendan Strong
Wahid Ghaffari
Michał Woźniak
Tristan Bourcier
Andrzej Grzybowski
collection DOAJ
container_title Vision
description Background/Objectives: This paper aims to assess ChatGPT’s performance in answering European Board of Ophthalmology Diploma (EBOD) examination papers and to compare these results with pass benchmarks and candidate results. Methods: This cross-sectional study used a sample of past exam papers from the 2012, 2013, and 2020–2023 EBOD examinations. It analyzed ChatGPT’s responses to 440 multiple choice questions (MCQs), each containing five true/false statements (2200 statements in total), and 48 single best answer (SBA) questions. Results: For MCQs, ChatGPT scored 64.39% on average, and its strongest metric performance was precision (68.76%). ChatGPT performed best at answering pathology MCQs (Grubbs test p < 0.05), while optics and refraction was the lowest-scoring MCQ topic across all metrics. ChatGPT-3.5 Turbo performed worse than human candidates and ChatGPT-4o on easy questions (75% vs. 100% accuracy) but outperformed both humans and ChatGPT-4o on challenging questions (50% vs. 28% accuracy). ChatGPT’s SBA performance averaged 28.43%, with the highest score and strongest performance in precision (29.36%). Pathology SBA questions were consistently the lowest-scoring topic across most metrics. ChatGPT showed a nonsignificant tendency to select option 1 more frequently (p = 0.19). When answering SBAs, human candidates scored higher than ChatGPT on all metrics measured. Conclusions: ChatGPT performed better on true/false questions, achieving a pass mark in most instances. Its poorer performance on SBA questions suggests that ChatGPT is better at information retrieval than at knowledge integration. ChatGPT could become a valuable tool in ophthalmic education, allowing exam boards to check that their papers are pitched at the right level, as well as supporting the marking of open-ended questions and the provision of detailed feedback.
format Article
id doaj-art-1a4e9d2cae7447eeaefadcc23db8e073
institution Directory of Open Access Journals
issn 2411-5150
language English
publishDate 2025-04-01
publisher MDPI AG
record_format Article
doi 10.3390/vision9020031
author_affiliations Anna P. Maino: Manchester Royal Eye Hospital, Manchester M13 9WL, UK
Jakub Klikowski: Department of Systems and Computer Networks, Wrocław University of Science and Technology, 50-370 Wrocław, Poland
Brendan Strong: European Board of Ophthalmology Examination Headquarters, RP56PT10 Kilcullen, Ireland
Wahid Ghaffari: Department of Medical Education, Stockport NHS Foundation Trust, Stockport SK2 7JE, UK
Michał Woźniak: Department of Systems and Computer Networks, Wrocław University of Science and Technology, 50-370 Wrocław, Poland
Tristan Bourcier: Department of Ophthalmology, University of Strasbourg, 67081 Strasbourg, France
Andrzej Grzybowski: Institute for Research in Ophthalmology, 60-836 Poznań, Poland
title Artificial Intelligence vs. Human Cognition: A Comparative Analysis of ChatGPT and Candidates Sitting the European Board of Ophthalmology Diploma Examination
topic artificial intelligence
ophthalmology
medical examinations
url https://www.mdpi.com/2411-5150/9/2/31
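The abstract above reports binary classification metrics (accuracy, precision) over ChatGPT’s answers to the 2200 true/false statements. A minimal sketch of how such scores can be derived is given below; the sample data and variable names are illustrative assumptions, not the study’s actual grading pipeline.

```python
# Illustrative sketch only: computing accuracy and precision when each
# true/false statement is graded as a binary prediction. The sample data
# below is made up and does not come from the EBOD study.

truth = [True, False, True, True, False, True, False, False]   # hypothetical answer key
model = [True, False, True, False, False, True, True, False]   # hypothetical ChatGPT answers

correct = sum(t == m for t, m in zip(truth, model))            # statements answered correctly
true_positives = sum(t and m for t, m in zip(truth, model))    # marked "true" and actually true
predicted_true = sum(model)                                    # statements the model marked "true"

accuracy = correct / len(truth)              # share of all statements answered correctly
precision = true_positives / predicted_true  # share of "true" answers that were correct

print(f"accuracy:  {accuracy:.2%}")
print(f"precision: {precision:.2%}")
```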