Distinct gene number-genome size relationships for eukaryotes and non-eukaryotes: gene content estimation for dinoflagellate genomes.

The ability to predict gene content is highly desirable for characterization of not-yet sequenced genomes like those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome...

Bibliographic Details
Main Authors: Yubo Hou, Senjie Lin
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2009-09-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC2737104?pdf=render
id doaj-1e604444328e4ac39cbddc4ce1e7eccd
record_format Article
spelling doaj-1e604444328e4ac39cbddc4ce1e7eccd; 2020-11-24T21:50:24Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2009-09-01; vol. 4, no. 9, e6978; doi 10.1371/journal.pone.0006978; Yubo Hou; Senjie Lin
collection DOAJ
language English
format Article
sources DOAJ
author Yubo Hou
Senjie Lin
title Distinct gene number-genome size relationships for eukaryotes and non-eukaryotes: gene content estimation for dinoflagellate genomes.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2009-09-01
description The ability to predict gene content is highly desirable for characterization of not-yet sequenced genomes like those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log10-transformed protein-coding gene number (Y') and log10-transformed genome size (X', genome size in kbp) were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y' = ln(-46.200 + 22.678X'), whereas non-eukaryotes best fit a linear model, Y' = 0.045 + 0.977X', both with high significance (p < 0.001, R^2 > 0.91). In both groups, total gene number follows trends similar to those of the respective protein-coding regressions. The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%-1%) compared to higher and relatively stable percentages in prokaryotes and viruses (97%-47%). The eukaryotic regression models project that the smallest dinoflagellate genome (3 x 10^6 kbp) contains 38,188 protein-coding (40,086 total) genes and the largest (245 x 10^6 kbp) 87,688 protein-coding (92,013 total) genes, corresponding to 1.8% and 0.05% gene-coding percentages. These estimates likely do not reflect extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes, as evidenced by the high gene copy numbers documented for various dinoflagellate species.
url http://europepmc.org/articles/PMC2737104?pdf=render
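The two regression models quoted in the description can be evaluated with a short Python sketch. It is illustrative only, under these assumptions: "ln" is read as the natural logarithm, the coefficients are taken as printed in the abstract (they are rounded, so the output is approximate and need not reproduce the gene counts reported there), and the function names are hypothetical rather than from the paper.

```python
import math


def eukaryote_protein_genes(genome_kbp: float) -> float:
    """Estimate protein-coding gene number from genome size (kbp) using the
    eukaryotic model Y' = ln(-46.200 + 22.678 X'), where X' and Y' are
    log10-transformed genome size and gene number."""
    x = math.log10(genome_kbp)
    inner = -46.200 + 22.678 * x
    if inner <= 0:
        # The logarithm is undefined here (genomes below roughly 110 kbp).
        raise ValueError("genome size is outside the domain of the model")
    return 10 ** math.log(inner)  # math.log is the natural logarithm


def non_eukaryote_protein_genes(genome_kbp: float) -> float:
    """Estimate protein-coding gene number from genome size (kbp) using the
    non-eukaryotic linear model Y' = 0.045 + 0.977 X'."""
    x = math.log10(genome_kbp)
    return 10 ** (0.045 + 0.977 * x)


if __name__ == "__main__":
    # Smallest and largest dinoflagellate genome sizes cited in the abstract.
    for size_kbp in (3e6, 245e6):
        estimate = eukaryote_protein_genes(size_kbp)
        print(f"{size_kbp:.3g} kbp: about {estimate:,.0f} protein-coding genes")
```

Both functions undo the log10 transform at the end, so they return gene counts rather than Y'. The domain check matters only for the eukaryotic model, whose inner term becomes non-positive for very small genomes.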