Where single-cell transcriptomics fails T cells: The misuse of unsupervised clustering for T-cell annotation
This perspective used publicly available data (DOI: 10.5281/zenodo.12569981).

Bibliographic Details
Container / Database: ImmunoInformatics
Main Authors: Kerry A. Mullan, Sebastiaan Valkiers, Nicky de Vrij, Chen Li, Sara Verbandt, Ting Pu, Pieter Meysman
Format: Article
Language: English
Published: Elsevier 2025-12-01
Online Access: http://www.sciencedirect.com/science/article/pii/S2667119025000163
Description
Abstract: The current state of single-cell transcriptomic interrogation typically consists of an unsupervised clustering approach followed by expert opinion-based annotation. The underlying assumption is that this process will accurately identify transcriptional differences between cellular subsets and will thus, for example, cluster CD8+ T cells apart from CD4+ T cells. However, this widely applied assumption that the clustering reflects T-cell biology has never been validated. We used a large T-cell atlas (V2) that combined twelve 10x Genomics single T-cell transcriptomics datasets (~500,000 cells), as well as an independent CITE-seq dataset, to assess whether the unsupervised clustering produced by Seurat reflected the biology. Annotations were then evaluated using the expression of key marker genes. The main T-cell markers CD8 and CD4 were mixed in most clusters, regardless of the feature selection and of whether principal components, Harmony components, or the features themselves were used. The factors driving the clustering were related to cellular functions (glucose metabolism) and to T-cell receptor (TCR), immunoglobulin, and HLA transcripts, rather than to typical T-cell markers. Against current assumptions, the clustering was not driven by the T-cell phenotypes and could not accurately segregate CD4+ from CD8+ T cells, let alone the sub-classifications. This implies that many of the T cells would be incorrectly classified under the standard cluster-based annotation approach. Methods relying on unsupervised clustering should be used with care, as improper handling can misrepresent the data, and alternatives such as semi-supervised approaches with TCR-seq or protein-based annotations should be preferred.
ISSN: 2667-1190
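
The abstract reports that CD4 and CD8 markers were mixed within most clusters. As a rough illustration of how such mixing could be quantified, the sketch below tallies marker-positive cells per cluster and reports a simple purity score. It is not the authors' pipeline: the function name `marker_mixing`, the `threshold` parameter, the purity definition, and the toy counts are all assumptions made for this example.

```python
from collections import defaultdict


def marker_mixing(cluster_labels, cd4_counts, cd8_counts, threshold=0):
    """Summarize CD4/CD8 marker calls per cluster (hypothetical helper).

    A cell is called positive for a marker when its count exceeds `threshold`.
    Purity is the share of the majority lineage among single-positive cells,
    or None when a cluster has no single-positive cells.
    """
    tallies = defaultdict(
        lambda: {"cd4_only": 0, "cd8_only": 0, "both": 0, "neither": 0}
    )
    for label, cd4, cd8 in zip(cluster_labels, cd4_counts, cd8_counts):
        is_cd4 = cd4 > threshold
        is_cd8 = cd8 > threshold
        if is_cd4 and is_cd8:
            key = "both"
        elif is_cd4:
            key = "cd4_only"
        elif is_cd8:
            key = "cd8_only"
        else:
            key = "neither"
        tallies[label][key] += 1

    summary = {}
    for label, t in tallies.items():
        single = t["cd4_only"] + t["cd8_only"]
        purity = max(t["cd4_only"], t["cd8_only"]) / single if single else None
        summary[label] = {**t, "purity": purity}
    return summary


# Toy, made-up counts for eight cells in two clusters (illustration only).
clusters = ["A", "A", "A", "A", "B", "B", "B", "B"]
cd4 = [3, 0, 2, 0, 0, 0, 0, 1]
cd8 = [0, 4, 0, 1, 5, 2, 3, 0]

result = marker_mixing(clusters, cd4, cd8)
for label in sorted(result):
    print(label, result[label])
```

A purity near 0.5 indicates a cluster in which CD4-only and CD8-only cells are evenly mixed, while a value near 1.0 indicates a cluster dominated by one lineage. The "neither" tally is kept separate so that cells with no detected marker transcripts do not inflate either lineage.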