VFM: Identification of Bacteriophages From Metagenomic Bins and Contigs Based on Features Related to Gene and Genome Composition

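Bacteriophages are widespread on Earth and act as the main regulator of microbial community composition. However, because they are hidden within metagenomes, most of them remain unknown. To identify phages in metagenomes more effectively, this paper presents a new tool, VFM (Virus Finding & Mining), which comes in two versions: bin-VFM and unbin-VFM. Eighteen new features are introduced to improve classifier performance. They describe codon usage bias, the proportion of hits to clusters of orthologous groups of proteins (COG), and 1-mer and 2-mer frequencies. Through missing value interpolation, bin-VFM significantly improves classification performance on bins of short sequences. Compared with previous virus-mining tools, bin-VFM performs much better on simulated and real metagenomes with short sequences, and unbin-VFM performs much better on those with long sequences. VFM may therefore help in studies of metagenome-related problems such as horizontal gene transfer and antibiotic resistance. VFM is freely available at https://github.com/liuql2019/VFM.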
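The abstract mentions 1-mer and 2-mer frequency features and missing value interpolation, but it does not list the eighteen features or name the interpolation method. The sketch below is therefore only an illustration under stated assumptions. It uses overlapping k-mer counting over A/C/G/T, which yields all 4 + 16 = 20 composition values and not VFM's own feature subset. It also uses simple column-mean imputation as a stand-in for the paper's interpolation. The function names `kmer_frequencies`, `composition_features` and `impute_column_means` are hypothetical and are not part of VFM.

```python
import math
from itertools import product

BASES = "ACGT"


def kmer_frequencies(seq: str, k: int) -> list[float]:
    """Relative frequencies of all 4**k k-mers (A/C/G/T), counted with overlap.

    Windows containing other characters (e.g. 'N') are skipped. If no valid
    window exists, every value is NaN, i.e. the feature is 'missing'.
    """
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [math.nan] * len(kmers)
    return [counts[km] / total for km in kmers]


def composition_features(seq: str) -> list[float]:
    """Hypothetical feature vector: 4 mononucleotide + 16 dinucleotide frequencies."""
    return kmer_frequencies(seq, 1) + kmer_frequencies(seq, 2)


def impute_column_means(rows: list[list[float]]) -> list[list[float]]:
    """Replace NaN entries with the mean of the observed values in the same column.

    Assumed stand-in for missing value interpolation; columns with no observed
    value are filled with 0.0.
    """
    if not rows:
        return []
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(r)] for r in rows]
```

For example, `impute_column_means([composition_features(s) for s in ["ATGCGTAC", "GGCC", "NNNN"]])` fills the all-`N` sequence's features with the means of the other two sequences. Mean imputation is chosen here only because it is the simplest, dependency-free option. The method VFM actually uses should be taken from the paper or the repository.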

Bibliographic Details
Main Authors: Qiaoliang Liu, Fu Liu, Jiaxue He, Miaolei Zhou, Tao Hou, Yun Liu
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: COG
Online Access:https://ieeexplore.ieee.org/document/8924706/
Record ID: doaj-d5492e383afd458087aff33081fb829d
Record updated: 2021-03-29
Volume: 7
Pages: 177529-177538
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2957833
IEEE document number: 8924706
Affiliations: Qiaoliang Liu, Fu Liu, Miaolei Zhou, Tao Hou, Yun Liu: College of Communication Engineering, Jilin University, Changchun, China. Jiaxue He: Genetic Diagnosis Center, The First Hospital of Jilin University, Changchun, China
ORCID: Qiaoliang Liu 0000-0002-6832-3223; Miaolei Zhou 0000-0002-8040-2768
Keywords: Codon usage bias; COG; missing value interpolation; metagenomic viruses; phage mining; short k-mer frequency
Source code: https://github.com/liuql2019/VFM