Estimation of AT and GC content distributions of nucleotide substitution rates in bacterial core genomes

Abstract. Background: Genomic GC content varies both within and, substantially, between microbial genomes. While some of this variation can be explained by evolutionary divergence and environmental factors, a notable portion is not understood. To investigate further, we explore a non-linear mathematic...

Bibliographic Details
Main Authors: Jon Bohlin, Brittany Rose, John H.-O. Pettersson
Format: Article
Language: English
Published: BMC 2019-08-01
Series: Big Data Analytics
Subjects: Bacterial genomics; Core genome analysis; Single nucleotide polymorphisms; Evolutionary biology
Online Access: http://link.springer.com/article/10.1186/s41044-019-0042-7
id doaj-3b789efbd39440d8873fc3f2092c4ff2
record_format Article
spelling doaj-3b789efbd39440d8873fc3f2092c4ff2
2020-11-25T03:45:54Z
eng
BMC
Big Data Analytics
2058-6345
2019-08-01
411111
10.1186/s41044-019-0042-7
Estimation of AT and GC content distributions of nucleotide substitution rates in bacterial core genomes
Jon Bohlin (Norwegian Institute of Public Health)
Brittany Rose (Norwegian Institute of Public Health)
John H.-O. Pettersson (Department of Medical Biochemistry and Microbiology, Zoonosis Science Center, Uppsala University)
http://link.springer.com/article/10.1186/s41044-019-0042-7
Bacterial genomics
Core genome analysis
Single nucleotide polymorphisms
Evolutionary biology
collection DOAJ
language English
format Article
sources DOAJ
author Jon Bohlin
Brittany Rose
John H.-O. Pettersson
title Estimation of AT and GC content distributions of nucleotide substitution rates in bacterial core genomes
publisher BMC
series Big Data Analytics
issn 2058-6345
publishDate 2019-08-01
description Abstract
Background: Genomic GC content varies both within and, substantially, between microbial genomes. While some of this variation can be explained by evolutionary divergence and environmental factors, a notable portion is not understood. To investigate further, we explore a non-linear mathematical model (gcMOD) of single-nucleotide polymorphism (SNP) GC content (sbGC, the GC content of substituted bases) as a function of core genome GC content (cgGC). We estimate the model’s parameters using Bayesian inference on empirical genetic data from the microbial core genomes of 35 bacterial species, each of which contains at least 10 representative strains. We utilize 716 bacterial genomes in total. We also explore some possible implications that result from the mathematical properties of gcMOD.
Results: We find that the median GC → AT substitution rates (β) are almost always considerably higher than the corresponding AT → GC substitution rates (α) for all 35 core genomes. The distribution of β is also noticeably more concentrated (i.e. thinner) than the corresponding distribution of α for almost all species, excepting the bacteria with the most GC-rich genomes. We also demonstrate that at the singularity point of gcMOD (where α = β), the model is reduced to a linear equation. By analyzing the linear model, we show that due to the constraints on gcMOD, the mutation rates can have profound influence on both cgGC as well as sbGC. Moreover, by examining the mathematical properties of gcMOD’s inverse function, we find that change in cgGC, and hence in genomic GC content, can potentially occur quite rapidly.
Conclusions: Examining the distributions of the GC → AT and AT → GC substitution rates for 35 bacterial species, we demonstrate that the former (β) are remarkably similar for all species examined. In addition, GC → AT substitution rate distributions were considerably more concentrated for all species, with the mode consistently peaking at higher rates than for AT → GC substitution rates.
topic Bacterial genomics
Core genome analysis
Single nucleotide polymorphisms
Evolutionary biology
url http://link.springer.com/article/10.1186/s41044-019-0042-7