Meta-QC-Chain: Comprehensive and Fast Quality Control Method for Metagenomic Data

Next-generation sequencing (NGS) technology has revolutionized metagenomic research. However, NGS data usually contain sequencing artifacts such as low-quality reads and contaminating reads, which significantly compromise downstream analysis. Many quality control...

Bibliographic Details
Main Authors: Qian Zhou, Xiaoquan Su, Gongchao Jing, Kang Ning
Format: Article
Language: English
Published: Elsevier 2014-02-01
Series: Genomics, Proteomics & Bioinformatics
Subjects: Quality control; Metagenomic data; Parallel computing; Next-generation sequencing
Online Access: http://www.sciencedirect.com/science/article/pii/S1672022914000060
id doaj-e8345657479a43fa82ff8efa4ffb03ef
record_format Article
spelling doaj-e8345657479a43fa82ff8efa4ffb03ef
2020-11-25T01:02:27Z
eng
Elsevier
Genomics, Proteomics & Bioinformatics
ISSN 1672-0229
2014-02-01
volume 12, issue 1, pages 52-56
doi:10.1016/j.gpb.2014.01.002
collection DOAJ
language English
format Article
sources DOAJ
author Qian Zhou
Xiaoquan Su
Gongchao Jing
Kang Ning
title Meta-QC-Chain: Comprehensive and Fast Quality Control Method for Metagenomic Data
publisher Elsevier
series Genomics, Proteomics & Bioinformatics
issn 1672-0229
publishDate 2014-02-01
description Next-generation sequencing (NGS) technology has revolutionized metagenomic research. However, NGS data usually contain sequencing artifacts such as low-quality reads and contaminating reads, which significantly compromise downstream analysis. Many quality control (QC) tools have been proposed; however, few of them have been verified to be suitable or efficient for metagenomic data, which are composed of multiple genomes and are more complex than other kinds of NGS data. Here we present a metagenomic data QC method named Meta-QC-Chain. Meta-QC-Chain combines multiple QC functions: technical tests describe the status of the input data and identify potential errors, quality trimming filters out bases and reads of poor sequencing quality, and contamination screening identifies higher eukaryotic species, which are considered contamination in metagenomic data. Most computing processes are optimized using parallel programming. Testing on a real 8-GB dataset showed that Meta-QC-Chain trimmed reads of low sequencing quality and contaminating reads, and that the whole quality control procedure was completed within 20 min. Meta-QC-Chain therefore provides a comprehensive, useful and high-performance QC tool for metagenomic data. Meta-QC-Chain is freely available at: http://computationalbioenergy.org/meta-qc-chain.html.
topic Quality control
Metagenomic data
Parallel computing
Next-generation sequencing
url http://www.sciencedirect.com/science/article/pii/S1672022914000060
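The quality-trimming function mentioned in the description can be illustrated with a minimal sketch. This is not Meta-QC-Chain's actual algorithm or interface, which the record does not describe; the function names, the Phred+33 quality encoding, and the thresholds `min_q` and `min_len` are all assumptions chosen for illustration. The sketch trims low-quality bases from the 3' end of each read and discards reads that end up too short.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumption: qualities use the Phred+33 (Sanger / Illumina 1.8+) encoding.
PHRED_OFFSET = 33


def phred_scores(quality: str) -> List[int]:
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


def trim_read(seq: str, quality: str,
              min_q: int = 20, min_len: int = 30) -> Optional[Tuple[str, str]]:
    """Trim trailing bases with Phred score below min_q.

    Returns the trimmed (sequence, quality) pair, or None when the
    trimmed read is shorter than min_len. Both thresholds are
    hypothetical defaults, not values taken from the paper.
    """
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], quality[:end]


def filter_reads(reads: Iterable[Tuple[str, str, str]],
                 min_q: int = 20,
                 min_len: int = 30) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) for every read that survives trimming."""
    for name, seq, quality in reads:
        trimmed = trim_read(seq, quality, min_q, min_len)
        if trimmed is not None:
            yield name, trimmed[0], trimmed[1]


if __name__ == "__main__":
    demo = [("r1", "ACGTACGT", "IIIII###"), ("r2", "ACGT", "####")]
    for record in filter_reads(demo, min_q=20, min_len=4):
        print(record)
```

Each read is processed independently of the others, so a step of this kind can be split across worker processes or threads without coordination between reads.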