Variant callers for next-generation sequencing data: a comparison study.

Next generation sequencing (NGS) has been leading the genetic study of human disease into an era of unprecedented productivity. Many bioinformatics pipelines have been developed to call variants from NGS data. The performance of these pipelines depends crucially on the variant caller used and on the...


Bibliographic Details
Main Authors: Xiangtao Liu, Shizhong Han, Zuoheng Wang, Joel Gelernter, Bao-Zhu Yang
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2013-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3785481?pdf=render
id doaj-f0790c86b2f941ff85162eed0767165f
record_format Article
spelling doaj-f0790c86b2f941ff85162eed0767165f | 2020-11-25T02:34:22Z | eng | Public Library of Science (PLoS) | PLoS ONE | ISSN 1932-6203 | 2013-01-01 | vol. 8, no. 9, e75619 | doi:10.1371/journal.pone.0075619 | Variant callers for next-generation sequencing data: a comparison study. | Xiangtao Liu; Shizhong Han; Zuoheng Wang; Joel Gelernter; Bao-Zhu Yang | http://europepmc.org/articles/PMC3785481?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Xiangtao Liu
Shizhong Han
Zuoheng Wang
Joel Gelernter
Bao-Zhu Yang
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2013-01-01
description Next-generation sequencing (NGS) has been leading the genetic study of human disease into an era of unprecedented productivity. Many bioinformatics pipelines have been developed to call variants from NGS data. The performance of these pipelines depends crucially on the variant caller used and on the calling strategies implemented. We studied the performance of four prevailing callers, SAMtools, GATK, glftools and Atlas2, using single-sample and multiple-sample variant-calling strategies. Using the same aligner, BWA, we built four single-sample and three multiple-sample calling pipelines and applied them to whole-exome sequencing data from 20 individuals. We obtained genotypes generated by the Illumina Infinium HumanExome v1.1 BeadChip for validation analysis and then used Sanger sequencing as a "gold-standard" method to resolve discrepancies in selected regions of high discordance. Finally, we compared the sensitivity of three of the single-sample calling pipelines using known simulated whole-genome sequence data as a gold standard. Overall, for single-sample calling, the called variants were highly consistent across callers, with a pairwise overlapping rate of about 0.9. Compared with the other callers, GATK had the highest rediscovery rate (0.9969) and specificity (0.99996), and the Ti/Tv ratio of the GATK calls was closest to the expected value of 3.02. Multiple-sample calling increased the sensitivity. Results from the simulated data suggested that GATK outperformed SAMtools and glfSingle in sensitivity, especially for low-coverage data. Further, for the selected discrepant regions evaluated by Sanger sequencing, variant genotypes called by exome sequencing were more accurate than those from the exome array, although the average variant sensitivity and overall genotype consistency rate were as high as 95.87% and 99.82%, respectively. In conclusion, GATK showed several advantages over the other variant callers for general-purpose NGS analyses. The GATK pipelines we developed perform very well.
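The evaluation metrics named in the description (Ti/Tv ratio, sensitivity, specificity, pairwise overlap) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the representation of variants as hashable site keys, and the use of a Jaccard index for the pairwise overlapping rate are all assumptions made for illustration, and the study may define these quantities differently.

```python
# Illustrative helpers only; names and definitions are assumptions,
# not taken from the study's pipelines.
BASES = {"A", "C", "G", "T"}
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
NAN = float("nan")


def ti_tv_ratio(snvs):
    """Transition/transversion ratio over (ref, alt) single-base pairs."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # skip indels, no-calls and non-variant records
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else NAN


def sensitivity(called, truth):
    """Fraction of true variant sites that were also called."""
    truth = set(truth)
    return len(truth & set(called)) / len(truth) if truth else NAN


def specificity(called, truth, all_sites):
    """Fraction of non-variant sites that were correctly left uncalled."""
    negatives = set(all_sites) - set(truth)
    false_positives = negatives & set(called)
    return 1 - len(false_positives) / len(negatives) if negatives else NAN


def overlap_rate(calls_a, calls_b):
    """Pairwise overlap as a Jaccard index (assumed definition)."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    return len(a & b) / len(union) if union else NAN
```

Sites can be keyed by any hashable value, for example a hypothetical `(chrom, pos, ref, alt)` tuple, so that the same key identifies the same variant across callers and across the validation set.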
url http://europepmc.org/articles/PMC3785481?pdf=render