Meta-QC-Chain: Comprehensive and Fast Quality Control Method for Metagenomic Data
Next-generation sequencing (NGS) technology has revolutionized metagenomic research. However, NGS data usually contain sequencing artifacts such as low-quality reads and contaminating reads, which will significantly compromise downstream analysis. Many quality control...
Main Authors: | Qian Zhou, Xiaoquan Su, Gongchao Jing, Kang Ning |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2014-02-01 |
Series: | Genomics, Proteomics & Bioinformatics |
Subjects: | Quality control; Metagenomic data; Parallel computing; Next-generation sequencing |
Online Access: | http://www.sciencedirect.com/science/article/pii/S1672022914000060 |
id |
doaj-e8345657479a43fa82ff8efa4ffb03ef |
---|---|
record_format |
Article |
spelling |
doaj-e8345657479a43fa82ff8efa4ffb03ef; 2020-11-25T01:02:27Z; eng; Elsevier; Genomics, Proteomics & Bioinformatics; ISSN 1672-0229; 2014-02-01; vol. 12, no. 1, pp. 52-56; doi:10.1016/j.gpb.2014.01.002; Meta-QC-Chain: Comprehensive and Fast Quality Control Method for Metagenomic Data; Qian Zhou, Xiaoquan Su, Gongchao Jing, Kang Ning; http://www.sciencedirect.com/science/article/pii/S1672022914000060; Quality control; Metagenomic data; Parallel computing; Next-generation sequencing |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Qian Zhou, Xiaoquan Su, Gongchao Jing, Kang Ning |
author_sort |
Qian Zhou |
title |
Meta-QC-Chain: Comprehensive and Fast Quality Control Method for Metagenomic Data |
publisher |
Elsevier |
series |
Genomics, Proteomics & Bioinformatics |
issn |
1672-0229 |
publishDate |
2014-02-01 |
description |
Next-generation sequencing (NGS) technology has revolutionized metagenomic research. However, NGS data usually contain sequencing artifacts such as low-quality reads and contaminating reads, which will significantly compromise downstream analysis. Many quality control (QC) tools have been proposed; however, few of them have been verified to be suitable or efficient for metagenomic data, which are composed of multiple genomes and are more complex than other kinds of NGS data. Here we present a metagenomic data QC method named Meta-QC-Chain. Meta-QC-Chain combines multiple QC functions: technical tests describe the status of the input data and identify potential errors, quality trimming filters out bases and reads of poor sequencing quality, and contamination screening identifies higher eukaryotic species, which are considered contamination in metagenomic data. Most computing processes are optimized through parallel programming. Testing on an 8-GB real dataset showed that Meta-QC-Chain trimmed low sequencing-quality reads and contaminating reads, and the whole quality control procedure was completed within 20 min. Therefore, Meta-QC-Chain provides a comprehensive, useful and high-performance QC tool for metagenomic data. Meta-QC-Chain is publicly available for free at: http://computationalbioenergy.org/meta-qc-chain.html. |
topic |
Quality control; Metagenomic data; Parallel computing; Next-generation sequencing |
url |
http://www.sciencedirect.com/science/article/pii/S1672022914000060 |
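
The description in this record mentions quality trimming, which filters out bases and reads of poor sequencing quality. As a rough illustration of that general idea, the Python sketch below trims low-quality bases from the 3' end of FASTQ reads and drops reads that become too short. It is not Meta-QC-Chain's actual algorithm or interface: the function names, the Phred+33 encoding, and the thresholds `min_q` and `min_len` are all assumptions made for this example.

```python
from typing import Iterable, Iterator, List, Tuple

# A read is (header, sequence, quality string); this layout is an assumption
# made for the sketch, not a Meta-QC-Chain data structure.
Read = Tuple[str, str, str]


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding by default."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Cut bases off the 3' end until the last base has quality >= min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield reads from well-formed four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line is ignored
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def quality_filter(reads: Iterable[Read], min_q: int = 20,
                   min_len: int = 4) -> Iterator[Read]:
    """Trim every read, then keep only those still at least min_len long."""
    for header, seq, qual in reads:
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_len:
            yield header, seq, qual


if __name__ == "__main__":
    example = [
        "@r1", "ACGTACGT", "+", "IIIIII##",  # two poor bases at the 3' end
        "@r2", "ACGT", "+", "I###",          # mostly poor; dropped after trimming
    ]
    for header, seq, qual in quality_filter(parse_fastq(example)):
        print(header, seq, qual)
```

In the example, `I` decodes to a quality of 40 and `#` to 2, so the first read loses its last two bases and the second read is discarded. A production tool would typically also handle other quality encodings, paired-end reads, and truncated input.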