RepSeq Data Representativeness and Robustness Assessment by Shannon Entropy

High-throughput sequencing (HTS) has the potential to decipher the diversity of T cell repertoires and their dynamics during immune responses. Applied to T cell subsets such as T effector and T regulatory cells, it should help identify novel biomarkers of diseases. However, given the extreme diversi...


Bibliographic Details
Main Authors: Wahiba Chaara, Ariadna Gonzalez-Tort, Laura-Maria Florez, David Klatzmann, Encarnita Mariotti-Ferrandiz, Adrien Six
Format: Article
Language: English
Published: Frontiers Media S.A. 2018-05-01
Series: Frontiers in Immunology
Subjects: TCR repertoire; diversity; sampling; normalization; bioinformatics
Online Access: http://journal.frontiersin.org/article/10.3389/fimmu.2018.01038/full
id doaj-d5d6af61a1ec4ad38b52dc24ba70901c
record_format Article
spelling doaj-d5d6af61a1ec4ad38b52dc24ba70901c 2020-11-25T00:15:31Z eng
Frontiers Media S.A.
Frontiers in Immunology
1664-3224
2018-05-01
9
10.3389/fimmu.2018.01038
346983
RepSeq Data Representativeness and Robustness Assessment by Shannon Entropy
Wahiba Chaara 0; Wahiba Chaara 1; Ariadna Gonzalez-Tort 2; Laura-Maria Florez 3; David Klatzmann 4; David Klatzmann 5; Encarnita Mariotti-Ferrandiz 6; Encarnita Mariotti-Ferrandiz 7; Adrien Six 8; Adrien Six 9
Sorbonne Université, INSERM, UMR_S 959, Immunology-Immunopathology-Immunotherapy (i3), Paris, France
AP-HP, Hôpital Pitié-Salpêtrière, Biotherapy (CIC-BTi) and Inflammation-Immunopathology-Biotherapy Department (i2B), Paris, France
Sorbonne Université, INSERM, UMR_S 959, Immunology-Immunopathology-Immunotherapy (i3), Paris, France
Sorbonne Université, INSERM, UMR_S 959, Immunology-Immunopathology-Immunotherapy (i3), Paris, France
Sorbonne Université, INSERM, UMR_S 959, Immunology-Immunopathology-Immunotherapy (i3), Paris, France
AP-HP, Hôpital Pitié-Salpêtrière, Biotherapy (CIC-BTi) and Inflammation-Immunopathology-Biotherapy Department (i2B), Paris, France
Sorbonne Université, INSERM, UMR_S 959, Immunology-Immunopathology-Immunotherapy (i3), Paris, France
AP-HP, Hôpital Pitié-Salpêtrière, Biotherapy (CIC-BTi) and Inflammation-Immunopathology-Biotherapy Department (i2B), Paris, France
Sorbonne Université, INSERM, UMR_S 959, Immunology-Immunopathology-Immunotherapy (i3), Paris, France
AP-HP, Hôpital Pitié-Salpêtrière, Biotherapy (CIC-BTi) and Inflammation-Immunopathology-Biotherapy Department (i2B), Paris, France
High-throughput sequencing (HTS) has the potential to decipher the diversity of T cell repertoires and their dynamics during immune responses. Applied to T cell subsets such as T effector and T regulatory cells, it should help identify novel biomarkers of diseases. However, given the extreme diversity of TCR repertoires, understanding how the sequencing conditions, including cell numbers, biological and technical sampling and sequencing depth, impact the experimental outcome is critical to proper use of these data. Here, we assessed the representativeness and robustness of TCR repertoire diversity assessment according to experimental conditions. By comparative analyses of experimental datasets and computer simulations, we found that (i) for small samples, the number of clonotypes recovered is often higher than the number of cells per sample, even after removing the singletons; (ii) high-sequencing depth for small samples alters the clonotype distributions, which can be corrected by filtering the datasets using Shannon entropy as a threshold; and (iii) a single sequencing run at high depth does not ensure a good coverage of the clonotype richness in highly polyclonal populations, which can be better covered using multiple sequencing. Altogether, our results warrant better understanding and awareness of the limitation of TCR diversity analyses by HTS and justify the development of novel computational tools for improved modeling of the highly complex nature of TCR repertoires.
http://journal.frontiersin.org/article/10.3389/fimmu.2018.01038/full
TCR repertoire
diversity
sampling
normalization
bioinformatics
collection DOAJ
language English
format Article
sources DOAJ
author Wahiba Chaara
Ariadna Gonzalez-Tort
Laura-Maria Florez
David Klatzmann
Encarnita Mariotti-Ferrandiz
Adrien Six
author_sort Wahiba Chaara
title RepSeq Data Representativeness and Robustness Assessment by Shannon Entropy
publisher Frontiers Media S.A.
series Frontiers in Immunology
issn 1664-3224
publishDate 2018-05-01
description High-throughput sequencing (HTS) has the potential to decipher the diversity of T cell repertoires and their dynamics during immune responses. Applied to T cell subsets such as T effector and T regulatory cells, it should help identify novel biomarkers of diseases. However, given the extreme diversity of TCR repertoires, understanding how the sequencing conditions, including cell numbers, biological and technical sampling and sequencing depth, impact the experimental outcome is critical to proper use of these data. Here, we assessed the representativeness and robustness of TCR repertoire diversity assessment according to experimental conditions. By comparative analyses of experimental datasets and computer simulations, we found that (i) for small samples, the number of clonotypes recovered is often higher than the number of cells per sample, even after removing the singletons; (ii) high-sequencing depth for small samples alters the clonotype distributions, which can be corrected by filtering the datasets using Shannon entropy as a threshold; and (iii) a single sequencing run at high depth does not ensure a good coverage of the clonotype richness in highly polyclonal populations, which can be better covered using multiple sequencing. Altogether, our results warrant better understanding and awareness of the limitation of TCR diversity analyses by HTS and justify the development of novel computational tools for improved modeling of the highly complex nature of TCR repertoires.
topic TCR repertoire
diversity
sampling
normalization
bioinformatics
url http://journal.frontiersin.org/article/10.3389/fimmu.2018.01038/full
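
The description above mentions filtering datasets "using Shannon entropy as a threshold" but this record does not spell out the procedure. As a rough illustration only, the sketch below computes the Shannon entropy H of a clonotype count table and keeps the exp(H) most abundant clonotypes. The function names, the use of natural logarithms, and the choice of exp(H) as the cut-off are all assumptions made for this example, not the authors' published method.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in nats) of a collection of clonotype counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)


def filter_by_entropy(clonotype_counts):
    """Hypothetical filter: keep the round(exp(H)) most abundant clonotypes.

    exp(H) is the 'effective number' of clonotypes; using it as a rank
    cut-off is an assumption for illustration, not the paper's procedure.
    """
    h = shannon_entropy(clonotype_counts.values())
    keep = max(1, round(math.exp(h)))
    ranked = sorted(clonotype_counts.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:keep])


if __name__ == "__main__":
    # Toy count table: one dominant clonotype plus three singletons.
    sample = {"CASSLGTDTQYF": 97, "CASSPGQGNEQFF": 1, "CASSQDRGYEQYF": 1, "CASRTGGNTEAFF": 1}
    print(shannon_entropy(sample.values()))
    print(filter_by_entropy(sample))
```

On a perfectly even table the effective number equals the number of clonotypes, so nothing is dropped; on a skewed table dominated by one clone, the low-count tail is removed.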