Predicting B cell receptor substitution profiles using public repertoire data.

B cells develop high affinity receptors during the course of affinity maturation, a cyclic process of mutation and selection. At the end of affinity maturation, a number of cells sharing the same ancestor (i.e. in the same "clonal family") are released from the germinal center; their amino...

Bibliographic Details
Main Authors: Amrit Dhar, Kristian Davidsen, Frederick A Matsen, Vladimir N Minin
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-10-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC6205660?pdf=render
id doaj-81bd538e6bad475499757b4572e3e6d0
record_format Article
spelling doaj-81bd538e6bad475499757b4572e3e6d0; 2020-11-25T01:48:09Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2018-10-01; volume 14, issue 10, e1006388; 10.1371/journal.pcbi.1006388; Predicting B cell receptor substitution profiles using public repertoire data.; Amrit Dhar; Kristian Davidsen; Frederick A Matsen; Vladimir N Minin; http://europepmc.org/articles/PMC6205660?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Amrit Dhar
Kristian Davidsen
Frederick A Matsen
Vladimir N Minin
title Predicting B cell receptor substitution profiles using public repertoire data.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2018-10-01
description B cells develop high affinity receptors during the course of affinity maturation, a cyclic process of mutation and selection. At the end of affinity maturation, a number of cells sharing the same ancestor (i.e. in the same "clonal family") are released from the germinal center; their amino acid frequency profile reflects the allowed and disallowed substitutions at each position. These clonal-family-specific frequency profiles, called "substitution profiles", are useful for studying the course of affinity maturation as well as for antibody engineering purposes. However, most often only a single sequence is recovered from each clonal family in a sequencing experiment, making it impossible to construct a clonal-family-specific substitution profile. Given the public release of many high-quality large B cell receptor datasets, one may ask whether it is possible to use such data in a prediction model for clonal-family-specific substitution profiles. In this paper, we present the method "Substitution Profiles Using Related Families" (SPURF), a penalized tensor regression framework that integrates information from a rich assemblage of datasets to predict the clonal-family-specific substitution profile for any single input sequence. Using this framework, we show that substitution profiles from similar clonal families can be leveraged together with simulated substitution profiles and germline gene sequence information to improve prediction. We fit this model on a large public dataset and validate the robustness of our approach on two external datasets. Furthermore, we provide a command-line tool in an open-source software package (https://github.com/krdav/SPURF) implementing these ideas and providing easy prediction using our pre-fit models.
url http://europepmc.org/articles/PMC6205660?pdf=render
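
The description above defines a substitution profile as the per-position amino acid frequency profile of a clonal family. The following is a minimal sketch of that definition only, assuming the family's sequences are already aligned to equal length. It is not the SPURF model, which predicts such a profile from a single input sequence; the function name `substitution_profile` and the toy sequences are hypothetical and chosen for illustration.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitution_profile(aligned_seqs):
    """Return one {amino acid: frequency} dict per alignment position.

    Assumes all sequences belong to one clonal family and are aligned
    to the same length. Characters outside AMINO_ACIDS (e.g. gaps) are
    ignored when computing frequencies.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    profile = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned_seqs)
        total = sum(counts[aa] for aa in AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
        )
    return profile


# Toy clonal family (hypothetical sequences, not real repertoire data).
family = ["CARDY", "CARDY", "CAKDY", "CTRDY"]
profile = substitution_profile(family)
```

With only a single recovered sequence per family, every position in this empirical profile collapses to frequency 1 for the observed residue, which is the limitation the description gives as the motivation for a prediction model.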