The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories

The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase the understanding of ada...

Bibliographic Details
Main Authors: Syed Ahmad Chan Bukhari, Martin J. O’Connor, Marcos Martínez-Romero, Attila L. Egyedi, Debra Willrett, John Graybeal, Mark A. Musen, Florian Rubelt, Kei-Hoi Cheung, Steven H. Kleinstein
Format: Article
Language: English
Published: Frontiers Media S.A. 2018-08-01
Series: Frontiers in Immunology
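Subjects: immune-repertoire sequencing, Rep-seq, antibody, B cell receptor, T cell receptor, National Center for Biotechnology Information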
Online Access: https://www.frontiersin.org/article/10.3389/fimmu.2018.01877/full
id doaj-e28f95a5137843d3a78e695d29ec81b2
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Syed Ahmad Chan Bukhari
Martin J. O’Connor
Marcos Martínez-Romero
Attila L. Egyedi
Debra Willrett
John Graybeal
Mark A. Musen
Florian Rubelt
Kei-Hoi Cheung
Steven H. Kleinstein
title The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories
publisher Frontiers Media S.A.
series Frontiers in Immunology
issn 1664-3224
publishDate 2018-08-01
description The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase the understanding of adaptive immune responses in vaccinology, infectious disease, autoimmunity, and cancer. The increasingly wide application of AIRR-seq is leading to a critical mass of studies being deposited in the public domain, offering the possibility of novel scientific insights through secondary analyses and meta-analyses. However, effective sharing of these large-scale data remains a challenge. The AIRR community has proposed minimal information about adaptive immune receptor repertoire (MiAIRR), a standard for reporting AIRR-seq studies. The MiAIRR standard has been operationalized using the National Center for Biotechnology Information (NCBI) repositories. Submissions of AIRR-seq data to the NCBI repositories typically use a combination of web-based and flat-file templates and include only a minimal amount of terminology validation. As a result, AIRR-seq studies at the NCBI are often described using inconsistent terminologies, limiting scientists’ ability to access, find, interoperate, and reuse the data sets. In order to improve metadata quality and ease submission of AIRR-seq studies to the NCBI, we have leveraged the software framework developed by the Center for Expanded Data Annotation and Retrieval (CEDAR), which develops technologies involving the use of data standards and ontologies to improve metadata quality. The resulting CEDAR-AIRR (CAIRR) pipeline enables data submitters to: (i) create web-based templates whose entries are controlled by ontology terms, (ii) generate and validate metadata, and (iii) submit the ontology-linked metadata and sequence files (FASTQ) to the NCBI BioProject, BioSample, and Sequence Read Archive databases. 
Overall, CAIRR provides a web-based metadata submission interface that supports compliance with the MiAIRR standard. This pipeline is available at http://cairr.miairr.org, and will facilitate the NCBI submission process and improve the metadata quality of AIRR-seq studies.
topic immune-repertoire sequencing
Rep-seq
antibody
B cell receptor
T cell receptor
National Center for Biotechnology Information
url https://www.frontiersin.org/article/10.3389/fimmu.2018.01877/full
doi 10.3389/fimmu.2018.01877
affiliation Syed Ahmad Chan Bukhari: Department of Pathology, Yale School of Medicine, Yale University, New Haven, CT, United States
Martin J. O’Connor: Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States
Marcos Martínez-Romero: Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States
Attila L. Egyedi: Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States
Debra Willrett: Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States
John Graybeal: Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States
Mark A. Musen: Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States
Florian Rubelt: Department of Microbiology and Immunology, Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA, United States
Kei-Hoi Cheung: Department of Emergency Medicine, Yale School of Medicine, Yale University, New Haven, CT, United States; Yale Center for Medical Informatics, Yale School of Medicine, Yale University, New Haven, CT, United States; Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States
Steven H. Kleinstein: Department of Pathology, Yale School of Medicine, Yale University, New Haven, CT, United States; Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States