| Abstract: | Single-cell transcriptomic data are typically analysed by unsupervised clustering followed by expert, opinion-based annotation. The underlying assumption is that this process accurately identifies transcriptional differences between cellular subsets and will therefore separate, for example, CD8+ T cells from CD4+ T cells. However, this widely applied assumption that the clustering reflects T-cell biology has never been validated. We used a large T-cell atlas (V2) that combined twelve 10x Genomics single T-cell transcriptomics datasets (approximately 500,000 cells), as well as an independent CITE-seq dataset, to assess whether the unsupervised clustering produced by Seurat reflected the biology. Annotations were then evaluated using the expression of key marker genes. The main T-cell markers CD8 and CD4 were mixed in most clusters, regardless of the feature selection and of whether clustering was run on principal components, Harmony components, or the features themselves. The factors driving the clustering were related to cellular functions (glucose metabolism) and to T-cell receptor (TCR), immunoglobulin, and HLA transcripts rather than to typical markers. Against current assumptions, the clustering was not driven by T-cell phenotypes and could not accurately segregate CD4+ from CD8+ T cells, let alone their sub-classifications. This implies that many T cells would be incorrectly classified under the standard cluster-based annotation approach. Methods relying on unsupervised clustering should be used with care, as improper handling can misrepresent the data, and alternatives such as semi-supervised approaches with TCR-seq or protein-based annotations should be preferred.
|
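The check described in the abstract, whether CD4 and CD8 marker expression is mixed within a cluster, can be illustrated with a minimal sketch. This is not the authors' Seurat pipeline: the function name `lineage_mix_per_cluster`, the `min_expr` and `mixed_cutoff` thresholds, and the use of a single expression value per marker per cell are all assumptions made for illustration.

```python
from collections import defaultdict


def lineage_mix_per_cluster(clusters, cd4, cd8, min_expr=0.0, mixed_cutoff=0.2):
    """Summarise CD4/CD8 single-positive cells in each cluster.

    clusters: cluster label per cell
    cd4, cd8: marker expression per cell (hypothetical inputs, e.g. normalised counts)
    min_expr: expression above this value counts as positive (assumed threshold)
    mixed_cutoff: a cluster is flagged as mixed when the minority lineage
                  makes up at least this fraction of single-positive cells
    """
    counts = defaultdict(lambda: {"CD4": 0, "CD8": 0})
    for label, e4, e8 in zip(clusters, cd4, cd8):
        entry = counts[label]  # register the cluster even if no cell is single-positive
        pos4, pos8 = e4 > min_expr, e8 > min_expr
        if pos4 and not pos8:
            entry["CD4"] += 1
        elif pos8 and not pos4:
            entry["CD8"] += 1
        # double-positive and double-negative cells are ignored in this sketch

    report = {}
    for label, n in counts.items():
        total = n["CD4"] + n["CD8"]
        minority = min(n["CD4"], n["CD8"]) / total if total else 0.0
        report[label] = {
            "CD4": n["CD4"],
            "CD8": n["CD8"],
            "minority_fraction": minority,
            "mixed": minority >= mixed_cutoff,
        }
    return report


# Toy data: cluster "0" contains both lineages, cluster "1" only CD4+ cells.
toy_clusters = ["0", "0", "0", "0", "1", "1", "1", "1"]
toy_cd4 = [1, 2, 0, 0, 3, 1, 2, 1]
toy_cd8 = [0, 0, 4, 1, 0, 0, 0, 0]
toy_report = lineage_mix_per_cluster(toy_clusters, toy_cd4, toy_cd8)
for label in sorted(toy_report):
    print(label, toy_report[label])
```

In practice the cluster labels would come from the clustering output and the marker values from the expression matrix or, for CITE-seq data, from the protein measurements. The cutoff is a free parameter and would need to be justified for a given dataset.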