A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy

Abstract. Background: Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unr...

Bibliographic Details
Main Authors: Xiang Gao, Huaiying Lin, Kashi Revanna, Qunfeng Dong
Format: Article
Language: English
Published: BMC 2017-05-01
Series: BMC Bioinformatics
Subjects: 16S rRNA gene; Taxonomic classification
Online Access: http://link.springer.com/article/10.1186/s12859-017-1670-4
id doaj-3f160480a5c8438294c3004ac16fb5a9
record_format Article
spelling doaj-3f160480a5c8438294c3004ac16fb5a9; 2020-11-24T21:15:34Z; eng; BMC; BMC Bioinformatics; ISSN 1471-2105; 2017-05-01; vol. 18, no. 1, pp. 1-10; doi 10.1186/s12859-017-1670-4
A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy
Xiang Gao, Huaiying Lin, Kashi Revanna, Qunfeng Dong (all four: Department of Public Health Sciences, Loyola University Chicago Health Sciences Division)
collection DOAJ
language English
format Article
sources DOAJ
author Xiang Gao
Huaiying Lin
Kashi Revanna
Qunfeng Dong
author_sort Xiang Gao
title A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2017-05-01
description Abstract
Background: Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement.
Results: We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes.
Conclusions: Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .
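The weighting idea in the abstract (each database hit contributes to the taxonomic assignment in proportion to a posterior probability derived from its similarity to the query) can be illustrated with a small Python sketch. This is not the BLCA code. It assumes a uniform prior over hits and a likelihood proportional to a non-negative similarity score, and it leaves out the pairwise alignment and the bootstrap step entirely. The function names `posterior_weights` and `classify`, the `RANKS` list, and the example scores are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Ranks walked from species up to phylum, as in the abstract.
RANKS = ["species", "genus", "family", "order", "class", "phylum"]


def posterior_weights(scores):
    """Turn non-negative similarity scores into weights that sum to 1.

    Assumption (not from the paper): uniform prior over hits and a
    likelihood proportional to the score, so the posterior for each hit
    is simply score / sum(scores).
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one score must be positive")
    return [s / total for s in scores]


def classify(hits):
    """Weighted vote over database hits at every rank.

    hits: list of (lineage, score) pairs, where lineage maps each rank in
    RANKS to a taxon name. Returns {rank: (best_taxon, support)}, where
    support is the summed weight of the hits that agree with best_taxon.
    """
    weights = posterior_weights([score for _, score in hits])
    result = {}
    for rank in RANKS:
        support = defaultdict(float)
        for (lineage, _), weight in zip(hits, weights):
            support[lineage[rank]] += weight
        best = max(support, key=support.get)
        result[rank] = (best, support[best])
    return result


def lineage(species):
    """Build an example lineage for a Lactobacillus species (toy data)."""
    return {
        "species": species,
        "genus": "Lactobacillus",
        "family": "Lactobacillaceae",
        "order": "Lactobacillales",
        "class": "Bacilli",
        "phylum": "Firmicutes",
    }


# Made-up scores for three hypothetical database hits of one query.
example_hits = [
    (lineage("Lactobacillus crispatus"), 50.0),
    (lineage("Lactobacillus crispatus"), 30.0),
    (lineage("Lactobacillus gasseri"), 20.0),
]

if __name__ == "__main__":
    for rank, (taxon, support) in classify(example_hits).items():
        print(f"{rank}: {taxon} (support {support:.2f})")
```

In this toy input the hits disagree at the species level but agree from genus upward, so support is lower at the species rank than at the higher ranks. A reader could apply a support threshold per rank to decide how deep an assignment to report; that threshold is a design choice of the sketch, not something stated in the abstract.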
topic 16S rRNA gene
Taxonomic classification
url http://link.springer.com/article/10.1186/s12859-017-1670-4