GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient

Abstract. Background: Gene prediction is a key step in genome annotation. Ab initio gene prediction enables gene annotation of new genomes regardless of the availability of homologous sequences. A number of ab initio gene prediction tools exist and have been widely used for gene annotation in various spe...

Bibliographic Details
Main Authors: Prapaporn Techa-Angkoon, Kevin L. Childs, Yanni Sun
Format: Article
Language: English
Published: BMC 2019-12-01
Series: BMC Bioinformatics
Subjects: Gene finding; Plant genome gene prediction; Hidden Markov model; GC contents; Grass genomes
Online Access: https://doi.org/10.1186/s12859-019-3047-3
id doaj-6c3784b70a1c4679879b31b813d1442d
record_format Article
spelling doaj-6c3784b70a1c4679879b31b813d1442d
2020-12-27T12:21:42Z
eng
BMC
BMC Bioinformatics
1471-2105
2019-12-01
20
S15
1
15
10.1186/s12859-019-3047-3
GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient
Prapaporn Techa-Angkoon (Department of Computer Science and Engineering, Michigan State University)
Kevin L. Childs (Department of Plant Biology, Michigan State University)
Yanni Sun (Department of Electronic Engineering, City University of Hong Kong)
collection DOAJ
language English
format Article
sources DOAJ
author Prapaporn Techa-Angkoon
Kevin L. Childs
Yanni Sun
title GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2019-12-01
description Abstract. Background: Gene prediction is a key step in genome annotation. Ab initio gene prediction enables gene annotation of new genomes regardless of the availability of homologous sequences. A number of ab initio gene prediction tools exist and have been widely used for gene annotation in various species. However, existing tools are not optimized for identifying genes with highly variable GC content. In addition, some genes in grass genomes exhibit a sharp 5′-3′ decreasing GC content gradient, which is not carefully modeled by available gene prediction tools. Thus, there is still room to improve the sensitivity and accuracy of predicting genes with GC gradients. Results: In this work, we designed and implemented a new hidden Markov model (HMM)-based ab initio gene prediction tool, which is optimized for finding genes with highly variable GC content, such as the genes with negative GC gradients in grass genomes. We tested the tool on three datasets from Arabidopsis thaliana and Oryza sativa. The results showed that our tool can identify genes that existing tools miss because of their highly variable GC content. Conclusions: GPRED-GC can effectively predict genes with highly variable GC content without manual intervention. It provides a useful complement to existing tools such as Augustus for more sensitive gene discovery. The source code is freely available at https://sourceforge.net/projects/gpred-gc/.
topic Gene finding
Plant genome gene prediction
Hidden Markov model
GC contents
Grass genomes
url https://doi.org/10.1186/s12859-019-3047-3