SkewIT: The Skew Index Test for large-scale GC Skew analysis of bacterial genomes.

GC skew is a phenomenon observed in many bacterial genomes, wherein the two replication strands of the same chromosome contain different proportions of guanine and cytosine nucleotides. Here we demonstrate that this phenomenon, which was first discovered in the mid-1990s, can be used today as an ana...

Bibliographic Details
Main Authors: Jennifer Lu, Steven L Salzberg
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2020-12-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1008439
id doaj-abbf24ff6bf74a54abfff3cad421e07a
record_format Article
spelling doaj-abbf24ff6bf74a54abfff3cad421e07a 2021-04-21T16:40:02Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2020-12-01 16 12 e1008439 10.1371/journal.pcbi.1008439 SkewIT: The Skew Index Test for large-scale GC Skew analysis of bacterial genomes. Jennifer Lu, Steven L Salzberg https://doi.org/10.1371/journal.pcbi.1008439
collection DOAJ
language English
format Article
sources DOAJ
author Jennifer Lu
Steven L Salzberg
title SkewIT: The Skew Index Test for large-scale GC Skew analysis of bacterial genomes.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2020-12-01
description GC skew is a phenomenon observed in many bacterial genomes, wherein the two replication strands of the same chromosome contain different proportions of guanine and cytosine nucleotides. Here we demonstrate that this phenomenon, which was first discovered in the mid-1990s, can be used today as an analysis tool for the 15,000+ complete bacterial genomes in NCBI's Refseq library. In order to analyze all 15,000+ genomes, we introduce a new method, SkewIT (Skew Index Test), that calculates a single metric representing the degree of GC skew for a genome. Using this metric, we demonstrate how GC skew patterns are conserved within certain bacterial phyla, e.g. Firmicutes, but show different patterns in other phylogenetic groups such as Actinobacteria. We also discovered that outlier values of SkewIT highlight potential bacterial mis-assemblies. Using our newly defined metric, we identify multiple mis-assembled chromosomal sequences in previously published complete bacterial genomes. We provide a SkewIT web app https://jenniferlu717.shinyapps.io/SkewIT/ that calculates SkewI for any user-provided bacterial sequence. The web app also provides an interactive interface for the data generated in this paper, allowing users to further investigate the SkewI values and thresholds of the Refseq-97 complete bacterial genomes. Individual scripts for analysis of bacterial genomes are provided in the following repository: https://github.com/jenniferlu717/SkewIT.
url https://doi.org/10.1371/journal.pcbi.1008439