Quantification of Inter-Sample Differences in T-Cell Receptor Repertoires Using Sequence-Based Information

Inter-sample comparisons of T-cell receptor (TCR) repertoires are crucial for gaining a better understanding of the immunological states determined by different collections of T cells from different donor sites, cell types, and genetic and pathological backgrounds. For quantitative comparison, most...

Bibliographic Details
Main Authors: Ryo Yokota, Yuki Kaminaga, Tetsuya J. Kobayashi
Format: Article
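Language: English
Published: Frontiers Media S.A. 2017-11-01
Series: Frontiers in Immunology
Subjects: T cell; TCR repertoire; inter-repertoire comparison; pairwise sequence alignment; sequence dissimilarity; manifold learning
Online Access: http://journal.frontiersin.org/article/10.3389/fimmu.2017.01500/full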
id doaj-e4e5ce563a454535b955c91ec0399361
record_format Article
spelling doaj-e4e5ce563a454535b955c91ec0399361; 2020-11-24T21:20:57Z; eng; Frontiers Media S.A.; Frontiers in Immunology; 1664-3224; 2017-11-01; 8; 10.3389/fimmu.2017.01500; 276648
Quantification of Inter-Sample Differences in T-Cell Receptor Repertoires Using Sequence-Based Information
Ryo Yokota (0); Yuki Kaminaga (1); Tetsuya J. Kobayashi (2, 3, 4)
0: Institute of Industrial Science, The University of Tokyo, Tokyo, Japan
1: Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo, Tokyo, Japan
2: Institute of Industrial Science, The University of Tokyo, Tokyo, Japan
3: Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo, Tokyo, Japan
4: PRESTO, Japan Science and Technology Agency (JST), Saitama, Japan
http://journal.frontiersin.org/article/10.3389/fimmu.2017.01500/full
T cell; TCR repertoire; inter-repertoire comparison; pairwise sequence alignment; sequence dissimilarity; manifold learning
collection DOAJ
language English
format Article
sources DOAJ
author Ryo Yokota
Yuki Kaminaga
Tetsuya J. Kobayashi
author_sort Ryo Yokota
title Quantification of Inter-Sample Differences in T-Cell Receptor Repertoires Using Sequence-Based Information
publisher Frontiers Media S.A.
series Frontiers in Immunology
issn 1664-3224
publishDate 2017-11-01
description Inter-sample comparisons of T-cell receptor (TCR) repertoires are crucial for gaining a better understanding of the immunological states determined by different collections of T cells from different donor sites, cell types, and genetic and pathological backgrounds. For quantitative comparison, most previous studies utilized conventional methods in ecology, which focus on TCR sequences that overlap between pairwise samples. Some recent studies attempted another approach that is categorized into Poisson abundance models using the abundance distribution of observed TCR sequences. However, these methods ignore the details of the measured sequences and are consequently unable to identify sub-repertoires that might have important contributions to the observed inter-sample differences. Moreover, the sparsity of sequence data due to the huge diversity of repertoires hampers the performance of these methods, especially when few overlapping sequences exist. In this paper, we propose a new approach for REpertoire COmparison in Low Dimensions (RECOLD) based on TCR sequence information, which can estimate the low-dimensional structure by embedding the pairwise sequence dissimilarities in high-dimensional sequence space. The inter-sample differences between repertoires are then quantified by information-theoretic measures among the distributions of data estimated in the embedded space. Using datasets of mouse and human TCR repertoires, we demonstrate that RECOLD can accurately identify the inter-sample hierarchical structures, which have a good correspondence with our intuitive understanding about sample conditions. Moreover, for the dataset of transgenic mice that have strong restrictions on the diversity of their repertoires, our estimated inter-sample structure was consistent with the structure estimated by previous methods based on abundance or overlapping sequence information. For the dataset of human healthy donors and Sézary syndrome patients, our method also showed robust estimation performance even under the condition of high sparsity in TCR sequences, while previous studies failed to estimate the structure. In addition, we identified the sequences that contribute to the pairwise-sample differences between the repertoires with the different genetic backgrounds of mice. Such identification of the sequences contributing to variation in immune cell repertoires may provide substantial insight for the development of new immunotherapies and vaccines.
topic T cell
TCR repertoire
inter-repertoire comparison
pairwise sequence alignment
sequence dissimilarity
manifold learning
url http://journal.frontiersin.org/article/10.3389/fimmu.2017.01500/full
work_keys_str_mv AT ryoyokota quantificationofintersampledifferencesintcellreceptorrepertoiresusingsequencebasedinformation
AT yukikaminaga quantificationofintersampledifferencesintcellreceptorrepertoiresusingsequencebasedinformation
AT tetsuyajkobayashi quantificationofintersampledifferencesintcellreceptorrepertoiresusingsequencebasedinformation
_version_ 1726001992974729216