Estimation of AT and GC content distributions of nucleotide substitution rates in bacterial core genomes
Abstract. Background: Genomic GC content varies both within and, substantially, between microbial genomes. While some of this variation can be explained by evolutionary divergence and environmental factors, a notable portion is not understood. To investigate further, we explore a non-linear mathematic...
Main Authors: | Jon Bohlin, Brittany Rose, John H.-O. Pettersson |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2019-08-01 |
Series: | Big Data Analytics |
Subjects: | Bacterial genomics, Core genome analysis, Single nucleotide polymorphisms, Evolutionary biology |
Online Access: | http://link.springer.com/article/10.1186/s41044-019-0042-7 |
id |
doaj-3b789efbd39440d8873fc3f2092c4ff2 |
---|---|
record_format |
Article |
spelling |
doaj-3b789efbd39440d8873fc3f2092c4ff2; 2020-11-25T03:45:54Z; eng; BMC; Big Data Analytics; 2058-6345; 2019-08-01; 41111; 10.1186/s41044-019-0042-7; Estimation of AT and GC content distributions of nucleotide substitution rates in bacterial core genomes; Jon Bohlin [0]; Brittany Rose [1]; John H.-O. Pettersson [2]; Norwegian Institute of Public Health; Norwegian Institute of Public Health; Department of Medical Biochemistry and Microbiology, Zoonosis Science Center, Uppsala University; Abstract. Background: Genomic GC content varies both within and, substantially, between microbial genomes. While some of this variation can be explained by evolutionary divergence and environmental factors, a notable portion is not understood. To investigate further, we explore a non-linear mathematical model (gcMOD) of single-nucleotide polymorphism (SNP) GC content (sbGC, the GC content of substituted bases) as a function of core genome GC content (cgGC). We estimate the model’s parameters using Bayesian inference on empirical genetic data from the microbial core genomes of 35 bacterial species, each of which contains at least 10 representative strains. We utilize 716 bacterial genomes in total. We also explore some possible implications that result from the mathematical properties of gcMOD. Results: We find that the median GC → AT substitution rates (β) are almost always considerably higher than the corresponding AT → GC substitution rates (α) for all 35 core genomes. The distribution of β is also noticeably more concentrated (i.e. thinner) than the corresponding distribution of α for almost all species, excepting the bacteria with the most GC-rich genomes. We also demonstrate that at the singularity point of gcMOD (where α = β), the model is reduced to a linear equation. By analyzing the linear model, we show that due to the constraints on gcMOD, the mutation rates can have profound influence on both cgGC as well as sbGC. Moreover, by examining the mathematical properties of gcMOD’s inverse function, we find that change in cgGC, and hence in genomic GC content, can potentially occur quite rapidly. Conclusions: Examining the distributions of the GC → AT and AT → GC substitution rates for 35 bacterial species, we demonstrate that the former (β) are remarkably similar for all species examined. In addition, GC → AT substitution rate distributions were considerably more concentrated for all species, with the mode consistently peaking at higher rates than for AT → GC substitution rates.; http://link.springer.com/article/10.1186/s41044-019-0042-7; Bacterial genomics; Core genome analysis; Single nucleotide polymorphisms; Evolutionary biology |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Jon Bohlin Brittany Rose John H.-O. Pettersson |
spellingShingle |
Jon Bohlin Brittany Rose John H.-O. Pettersson Estimation of AT and GC content distributions of nucleotide substitution rates in bacterial core genomes Big Data Analytics Bacterial genomics Core genome analysis Single nucleotide polymorphisms Evolutionary biology |
author_facet |
Jon Bohlin Brittany Rose John H.-O. Pettersson |
author_sort |
Jon Bohlin |
title |
Estimation of AT and GC content distributions of nucleotide substitution rates in bacterial core genomes |
title_short |
Estimation of AT and GC content distributions of nucleotide substitution rates in bacterial core genomes |
title_full |
Estimation of AT and GC content distributions of nucleotide substitution rates in bacterial core genomes |
title_fullStr |
Estimation of AT and GC content distributions of nucleotide substitution rates in bacterial core genomes |
title_full_unstemmed |
Estimation of AT and GC content distributions of nucleotide substitution rates in bacterial core genomes |
title_sort |
estimation of at and gc content distributions of nucleotide substitution rates in bacterial core genomes |
publisher |
BMC |
series |
Big Data Analytics |
issn |
2058-6345 |
publishDate |
2019-08-01 |
description |
Abstract. Background: Genomic GC content varies both within and, substantially, between microbial genomes. While some of this variation can be explained by evolutionary divergence and environmental factors, a notable portion is not understood. To investigate further, we explore a non-linear mathematical model (gcMOD) of single-nucleotide polymorphism (SNP) GC content (sbGC, the GC content of substituted bases) as a function of core genome GC content (cgGC). We estimate the model’s parameters using Bayesian inference on empirical genetic data from the microbial core genomes of 35 bacterial species, each of which contains at least 10 representative strains. We utilize 716 bacterial genomes in total. We also explore some possible implications that result from the mathematical properties of gcMOD. Results: We find that the median GC → AT substitution rates (β) are almost always considerably higher than the corresponding AT → GC substitution rates (α) for all 35 core genomes. The distribution of β is also noticeably more concentrated (i.e. thinner) than the corresponding distribution of α for almost all species, excepting the bacteria with the most GC-rich genomes. We also demonstrate that at the singularity point of gcMOD (where α = β), the model is reduced to a linear equation. By analyzing the linear model, we show that due to the constraints on gcMOD, the mutation rates can have profound influence on both cgGC as well as sbGC. Moreover, by examining the mathematical properties of gcMOD’s inverse function, we find that change in cgGC, and hence in genomic GC content, can potentially occur quite rapidly. Conclusions: Examining the distributions of the GC → AT and AT → GC substitution rates for 35 bacterial species, we demonstrate that the former (β) are remarkably similar for all species examined. In addition, GC → AT substitution rate distributions were considerably more concentrated for all species, with the mode consistently peaking at higher rates than for AT → GC substitution rates. |
topic |
Bacterial genomics Core genome analysis Single nucleotide polymorphisms Evolutionary biology |
url |
http://link.springer.com/article/10.1186/s41044-019-0042-7 |
work_keys_str_mv |
AT jonbohlin estimationofatandgccontentdistributionsofnucleotidesubstitutionratesinbacterialcoregenomes AT brittanyrose estimationofatandgccontentdistributionsofnucleotidesubstitutionratesinbacterialcoregenomes AT johnhopettersson estimationofatandgccontentdistributionsofnucleotidesubstitutionratesinbacterialcoregenomes |
_version_ |
1724509043661209600 |