Classification of Dental Radiographs Using Deep Learning


Bibliographic Details
Main Authors: Jose E. Cejudo, Akhilanand Chaurasia, Ben Feldberg, Joachim Krois, Falk Schwendicke
Format: Article
Language: English
Published: MDPI AG 2021-04-01
Series: Journal of Clinical Medicine
Online Access: https://www.mdpi.com/2077-0383/10/7/1496
Description
Summary: Objectives: To retrospectively assess radiographic data and to prospectively classify radiographs (namely, panoramic, bitewing, periapical, and cephalometric images), we compared three deep learning architectures for their classification performance. Methods: Our dataset consisted of 31,288 panoramic, 43,598 periapical, 14,326 bitewing, and 1176 cephalometric radiographs from two centers (Berlin/Germany; Lucknow/India). For a subset of images <i>L</i> (32,381 images), image classifications were available and manually validated by an expert. The remaining subset of images <i>U</i> was iteratively annotated using active learning: a ResNet-34 was trained on <i>L</i>, least-confidence informative sampling was performed on <i>U</i>, and the most uncertain image classifications from <i>U</i> were reviewed by a human expert and iteratively used for re-training. We then employed a baseline convolutional neural network (CNN), a residual network (another ResNet-34, pretrained on ImageNet), and a capsule network (CapsNet) for classification. Early stopping was used to prevent overfitting. Model performance was evaluated using stratified k-fold cross-validation. Gradient-weighted Class Activation Mapping (Grad-CAM) was used to visualize the weighted activation maps. Results: All three models showed high accuracy (>98%), with ResNet showing significantly higher accuracy, F1-score, precision, and sensitivity than the baseline CNN and CapsNet (<i>p</i> < 0.05). Specificity did not differ significantly. ResNet achieved the best performance, with the smallest variance and fastest convergence. Misclassification was most common between bitewings and periapicals. For bitewings, model activation was most notable in the inter-arch space; for periapicals, interdentally; for panoramics, on the bony structures of the maxilla and mandible; and for cephalometrics, on the viscerocranium. Conclusions: Regardless of the model, high classification accuracies were achieved.
Image features considered for classification were consistent with expert reasoning.
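The least-confidence informative sampling step of the active learning loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and the example probabilities are hypothetical, and the selected samples would, in the study's workflow, be sent to a human expert for review before re-training.

```python
import numpy as np

def least_confidence_sampling(probs: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k most uncertain samples.

    probs: (n_samples, n_classes) softmax outputs of the current model
           on the unlabeled pool U.
    Least-confidence uncertainty is 1 minus the top class probability;
    the k samples with the highest uncertainty are selected for expert review.
    """
    uncertainty = 1.0 - probs.max(axis=1)          # low top probability -> high uncertainty
    return np.argsort(uncertainty)[::-1][:k]        # indices sorted by descending uncertainty

# Hypothetical softmax outputs for three pool images over three classes:
p = np.array([[0.90, 0.05, 0.05],
              [0.40, 0.35, 0.25],
              [0.70, 0.20, 0.10]])
print(least_confidence_sampling(p, 2))  # → [1 2]
```

Sample 1 is selected first because its top class probability (0.40) is lowest; confidently classified images such as sample 0 stay in the pool.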
ISSN: 2077-0383