Artificial Intelligence vs. Human Cognition: A Comparative Analysis of ChatGPT and Candidates Sitting the European Board of Ophthalmology Diploma Examination
Background/Objectives: This paper aims to assess ChatGPT’s performance in answering European Board of Ophthalmology Diploma (EBOD) examination papers and to compare these results to pass benchmarks and candidate results. Methods: This cross-sectional study used a sample of past exam papers from the 2012, 2013, and 2020–2023 EBOD examinations. The study analyzed ChatGPT’s responses to 440 multiple choice questions (MCQs), each containing five true/false statements (2200 statements in total), and 48 single best answer (SBA) questions. Results: For MCQs, ChatGPT scored 64.39% on average. Its strongest metric performance for MCQs was precision (68.76%). ChatGPT performed best at answering pathology MCQs (Grubbs test, *p* < 0.05). Optics and refraction had the lowest-scoring MCQ performance across all metrics. ChatGPT-3.5 Turbo performed worse than human candidates and ChatGPT-4o on easy questions (75% vs. 100% accuracy) but outperformed both on challenging questions (50% vs. 28% accuracy). ChatGPT’s SBA performance averaged 28.43%, with the highest score and strongest performance in precision (29.36%). Pathology SBA questions were consistently the lowest-scoring topic across most metrics. ChatGPT showed a nonsignificant tendency to select option 1 more frequently (*p* = 0.19). When answering SBAs, human candidates scored higher than ChatGPT in all metrics measured. Conclusions: ChatGPT performed better on true/false questions, reaching the pass mark in most instances. Performance was poorer on SBA questions, suggesting that ChatGPT is better at information retrieval than at knowledge integration. ChatGPT could become a valuable tool in ophthalmic education, allowing exam boards to test their exam papers to ensure they are pitched at the right level, to mark open-ended questions, and to provide detailed feedback.
| Published in: | Vision |
|---|---|
| Main Authors: | Anna P. Maino, Jakub Klikowski, Brendan Strong, Wahid Ghaffari, Michał Woźniak, Tristan Bourcier, Andrzej Grzybowski |
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| ISSN: | 2411-5150 |
| DOI: | 10.3390/vision9020031 |
| Subjects: | artificial intelligence; ophthalmology; medical examinations |
| Online Access: | https://www.mdpi.com/2411-5150/9/2/31 |

| Author | Affiliation |
|---|---|
| Anna P. Maino | Manchester Royal Eye Hospital, Manchester M13 9WL, UK |
| Jakub Klikowski | Department of Systems and Computer Networks, Wrocław University of Science and Technology, 50-370 Wrocław, Poland |
| Brendan Strong | European Board of Ophthalmology Examination Headquarters, RP56PT10 Kilcullen, Ireland |
| Wahid Ghaffari | Department of Medical Education, Stockport NHS Foundation Trust, Stockport SK2 7JE, UK |
| Michał Woźniak | Department of Systems and Computer Networks, Wrocław University of Science and Technology, 50-370 Wrocław, Poland |
| Tristan Bourcier | Department of Ophthalmology, University of Strasbourg, 67081 Strasbourg, France |
| Andrzej Grzybowski | Institute for Research in Ophthalmology, 60-836 Poznań, Poland |
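
The abstract reports statement-level accuracy and precision for ChatGPT's true/false (MCQ) responses, broken down by topic. As a rough illustration of how such metrics can be computed, here is a minimal sketch assuming a Python/scikit-learn workflow and hypothetical data; it is not the authors' actual scoring code, and the EBOD answer data are not public.

```python
# Minimal sketch with made-up data; not the authors' scoring pipeline.
from sklearn.metrics import accuracy_score, precision_score

# Hypothetical answer key and ChatGPT responses for one topic's true/false statements.
answer_key = [True, False, True, True, False, True, False, False]
responses  = [True, False, False, True, True, True, False, False]

# Accuracy: fraction of statements answered correctly.
# Precision: of the statements marked "true", the fraction that are actually true.
print(f"accuracy  = {accuracy_score(answer_key, responses):.2%}")
print(f"precision = {precision_score(answer_key, responses):.2%}")
```

Per-topic figures like those quoted in the abstract would come from grouping statements by examination topic and repeating this calculation for each group.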
