HOTSPOT: hierarchical host prediction for assembled plasmid contigs with transformer

Bibliographic Details
Main Authors: Ji, Y. (Author), Shang, J. (Author), Sun, Y. (Author), Tang, X. (Author)
Format: Article
Language: English
Published: NLM (Medline) 2023
LEADER 02785nam a2200349Ia 4500
001 10.1093-bioinformatics-btad283
008 230529s2023 CNT 000 0 und d
020 |a 13674811 (ISSN) 
245 1 0 |a HOTSPOT: hierarchical host prediction for assembled plasmid contigs with transformer 
260 0 |b NLM (Medline)  |c 2023 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1093/bioinformatics/btad283 
856 |z View in Scopus  |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-85159552726&doi=10.1093%2fbioinformatics%2fbtad283&partnerID=40&md5=b09beba7676df603876bf9f6c333a991 
520 3 |a MOTIVATION: As prevalent extrachromosomal replicons in many bacteria, plasmids play an essential role in their hosts' evolution and adaptation. The host range of a plasmid refers to the taxonomic range of bacteria in which it can replicate and thrive. Understanding host ranges of plasmids sheds light on studying the roles of plasmids in bacterial evolution and adaptation. Metagenomic sequencing has become a major means to obtain new plasmids and derive their hosts. However, host prediction for assembled plasmid contigs still needs to tackle several challenges: different sequence compositions and copy numbers between plasmids and the hosts, high diversity in plasmids, and limited plasmid annotations. Existing tools have not yet achieved an ideal tradeoff between sensitivity and precision on metagenomic assembled contigs. RESULTS: In this work, we construct a hierarchical classification tool named HOTSPOT, whose backbone is a phylogenetic tree of the bacterial hosts from phylum to species. By incorporating the state-of-the-art language model, Transformer, in each node's taxon classifier, the top-down tree search achieves an accurate host taxonomy prediction for the input plasmid contigs. We rigorously tested HOTSPOT on multiple datasets, including RefSeq complete plasmids, artificial contigs, simulated metagenomic data, mock metagenomic data, the Hi-C dataset, and the CAMI2 marine dataset. All experiments show that HOTSPOT outperforms other popular methods. AVAILABILITY AND IMPLEMENTATION: The source code of HOTSPOT is available via: https://github.com/Orin-beep/HOTSPOT. © The Author(s) 2023. Published by Oxford University Press. 
650 0 4 |a Bacteria 
650 0 4 |a bacterium 
650 0 4 |a genetics 
650 0 4 |a metagenome 
650 0 4 |a Metagenome 
650 0 4 |a metagenomics 
650 0 4 |a Metagenomics 
650 0 4 |a phylogeny 
650 0 4 |a Phylogeny 
650 0 4 |a plasmid 
650 0 4 |a Plasmids 
650 0 4 |a procedures 
650 0 4 |a software 
650 0 4 |a Software 
700 1 0 |a Ji, Y.  |e author 
700 1 0 |a Shang, J.  |e author 
700 1 0 |a Sun, Y.  |e author 
700 1 0 |a Tang, X.  |e author 
773 |t Bioinformatics (Oxford, England)