Predicting statistical properties of open reading frames in bacterial genomes.

An analytical model based on the statistical properties of Open Reading Frames (ORFs) of eubacterial genomes such as codon composition and sequence length of all reading frames was developed. This new model predicts the average length, maximum length as well as the length distribution of the ORFs of...


Bibliographic Details
Main Authors: Katharina Mir, Klaus Neuhaus, Siegfried Scherer, Martin Bossert, Steffen Schober
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2012-01-01
Series: PLoS ONE
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/23028785/?tool=EBI
id doaj-0ade4f91d18a4724ab34f6e8400ee4a9
record_format Article
spelling doaj-0ade4f91d18a4724ab34f6e8400ee4a9; 2021-03-04T00:16:43Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2012-01-01; 7; 9; e45103; 10.1371/journal.pone.0045103; Predicting statistical properties of open reading frames in bacterial genomes.; Katharina Mir; Klaus Neuhaus; Siegfried Scherer; Martin Bossert; Steffen Schober; https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/23028785/?tool=EBI
collection DOAJ
sources DOAJ
issn 1932-6203
description An analytical model based on the statistical properties of Open Reading Frames (ORFs) of eubacterial genomes such as codon composition and sequence length of all reading frames was developed. This new model predicts the average length, maximum length as well as the length distribution of the ORFs of 70 species with GC contents varying between 21% and 74%. Furthermore, the number of annotated genes is predicted with high accordance. However, the ORF length distribution in the five alternative reading frames shows interesting deviations from the predicted distribution. In particular, long ORFs appear more often than expected statistically. The unexpected depletion of stop codons in these alternative open reading frames cannot completely be explained by a biased codon usage in the +1 frame. While it is unknown if the stop codon depletion has a biological function, it could be due to a protein coding capacity of alternative ORFs exerting a selection pressure which prevents the fixation of stop codon mutations. The comparison of the analytical model with bacterial genomes, therefore, leads to a hypothesis suggesting novel gene candidates which can now be investigated in subsequent wet lab experiments.