Low-level variant calling for non-matched samples using a position-based and nucleotide-specific approach

Bibliographic Details
Main Authors: Jeffrey N. Dudley, Celine S. Hong, Marwan A. Hawari, Jasmine Shwetar, Julie C. Sapp, Justin Lack, Henoke Shiferaw, NISC Comparative Sequencing Program, Jennifer J. Johnston, Leslie G. Biesecker
Format: Article
Language: English
Published: BMC 2021-04-01
Series: BMC Bioinformatics (ISSN 1471-2105)
Subjects: Mosaic variants; Prediction of mosaic variants; Somatic overgrowth disorder
Online Access: https://doi.org/10.1186/s12859-021-04090-y
Affiliations:
National Human Genome Research Institute, National Institutes of Health (Jeffrey N. Dudley, Celine S. Hong, Marwan A. Hawari, Jasmine Shwetar, Julie C. Sapp, Henoke Shiferaw, Jennifer J. Johnston, Leslie G. Biesecker)
NIAID Collaborative Bioinformatics Resource, National Institute of Allergy and Infectious Diseases, National Institutes of Health (Justin Lack)
NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health (NISC Comparative Sequencing Program)

Abstract
Background: The widespread use of next-generation sequencing has identified an important role for somatic mosaicism in many diseases. However, detecting low-level mosaic variants from next-generation sequencing data remains challenging.
Results: Here, we present a method for Position-Based Variant Identification (PBVI) that uses empirically derived distributions of alternate nucleotides from a control dataset. We modeled this approach on 11 segmental overgrowth genes. We show that this method improves detection of single-nucleotide mosaic variants with a variant allele fraction of 0.01–0.05 compared to other low-level variant callers. At depths of 600× and 1200×, we observed >85% and >95% sensitivity, respectively. In a cohort of 26 individuals with somatic overgrowth disorders, PBVI showed an improved signal-to-noise ratio, identifying pathogenic variants in 17 individuals.
Conclusion: PBVI can facilitate the identification of low-level mosaic variants, thus increasing the utility of next-generation sequencing data for research and diagnostic purposes.
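
The abstract describes PBVI only at a high level: alternate-nucleotide distributions are derived per position from a control dataset and used to judge a sample's low-level signal. The following is a minimal sketch of that general idea under our own assumptions, not the published PBVI implementation. The input layout (`{position: {nucleotide: read_count}}`), the "control mean + k standard deviations" rule, the parameters `k` and `min_sd`, and every function name are hypothetical. It also includes a simple binomial read-sampling model (again an assumption, not the paper's sensitivity analysis) to show why deeper coverage raises the chance of observing a 1% variant at all.

```python
from math import comb
from statistics import mean, pstdev

NUCLEOTIDES = ("A", "C", "G", "T")


def fractions(counts):
    """Per-nucleotide read fractions at one position; None if there is no coverage."""
    depth = sum(counts.values())
    if depth == 0:
        return None
    return {nt: counts.get(nt, 0) / depth for nt in NUCLEOTIDES}


def build_background(control_samples):
    """Collect, for every (position, nucleotide) pair, the fractions seen across controls.

    control_samples: list of {position: {nucleotide: read_count}} dicts (hypothetical layout).
    """
    background = {}
    for sample in control_samples:
        for pos, counts in sample.items():
            fracs = fractions(counts)
            if fracs is None:
                continue  # an uncovered position adds no background information
            for nt, frac in fracs.items():
                background.setdefault((pos, nt), []).append(frac)
    return background


def call_low_level(sample, reference, background, k=5.0, min_sd=0.001):
    """Flag non-reference nucleotides whose fraction exceeds the control mean by >= k SDs.

    min_sd is a hypothetical floor so near-zero control noise does not inflate scores.
    """
    calls = []
    for pos, counts in sample.items():
        fracs = fractions(counts)
        if fracs is None:
            continue
        for nt in NUCLEOTIDES:
            controls = background.get((pos, nt))
            if nt == reference[pos] or not controls:
                continue  # skip the reference base and pairs with no control data
            mu = mean(controls)
            sd = max(pstdev(controls), min_sd)
            z = (fracs[nt] - mu) / sd
            if z >= k:
                calls.append((pos, nt, fracs[nt], z))
    return calls


def prob_at_least(min_alt, depth, vaf):
    """Binomial chance of sampling >= min_alt alternate reads at a given depth and VAF."""
    return 1.0 - sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i) for i in range(min_alt)
    )


if __name__ == "__main__":
    # Toy counts: three controls and one test sample, all at depth 1000.
    controls = [
        {100: {"A": 995, "C": 2, "G": 3}},
        {100: {"A": 996, "C": 1, "G": 3}},
        {100: {"A": 994, "C": 2, "G": 4}},
    ]
    test_sample = {100: {"A": 970, "C": 2, "G": 28}}
    background = build_background(controls)
    for pos, nt, frac, z in call_low_level(test_sample, {100: "A"}, background):
        print(f"pos {pos} {nt}: fraction {frac:.3f}, z = {z:.1f}")
    # Chance of seeing at least 5 alternate reads for a 1% variant at two depths.
    for depth in (600, 1200):
        print(depth, round(prob_at_least(5, depth, 0.01), 3))
```

With the toy counts above, only the G at position 100 (2.8% of reads versus roughly 0.3% in the controls) is flagged; the C and T fractions stay within control noise. Scoring each nucleotide separately at each position is the design choice that matters here: sequencing error rates can differ by substitution type and genomic context, so a single per-position threshold would be too strict for some alternates and too loose for others.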