SeMPI 2.0—A Web Server for PKS and NRPS Predictions Combined with Metabolite Screening in Natural Product Databases

Microorganisms produce secondary metabolites with a remarkable range of bioactive properties. The constantly increasing amount of published genomic data provides the opportunity for efficient identification of biosynthetic gene clusters by genome mining. On the other hand, for many natural products...

Bibliographic Details
Main Authors: Paul F. Zierep, Adriana T. Ceci, Ilia Dobrusin, Sinclair C. Rockwell-Kollmann, Stefan Günther
Format: Article
Language: English
Published: MDPI AG 2021-12-01
Series: Metabolites
Subjects: secondary metabolites; natural compounds; machine learning; nonribosomal peptides; polyketides
Online Access: https://www.mdpi.com/2218-1989/11/1/13
id doaj-39b960dc09c1497ab8da5242a3275b66
record_format Article
spelling doaj-39b960dc09c1497ab8da5242a3275b66
2020-12-30T00:01:29Z
eng
MDPI AG
Metabolites
2218-1989
2021-12-01
11
1
13
13
10.3390/metabo11010013
SeMPI 2.0—A Web Server for PKS and NRPS Predictions Combined with Metabolite Screening in Natural Product Databases
Paul F. Zierep (Institute of Pharmaceutical Sciences, Albert-Ludwigs-Universität Freiburg, Hermann-Herder-Straße 9, 79104 Freiburg, Germany)
Adriana T. Ceci (Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Via Sommarive 9, Povo, 38123 Trento, Italy)
Ilia Dobrusin (Institute of Pharmaceutical Sciences, Albert-Ludwigs-Universität Freiburg, Hermann-Herder-Straße 9, 79104 Freiburg, Germany)
Sinclair C. Rockwell-Kollmann (Institute of Pharmaceutical Sciences, Albert-Ludwigs-Universität Freiburg, Hermann-Herder-Straße 9, 79104 Freiburg, Germany)
Stefan Günther (Institute of Pharmaceutical Sciences, Albert-Ludwigs-Universität Freiburg, Hermann-Herder-Straße 9, 79104 Freiburg, Germany)
Microorganisms produce secondary metabolites with a remarkable range of bioactive properties. The constantly increasing amount of published genomic data provides the opportunity for efficient identification of biosynthetic gene clusters by genome mining. On the other hand, for many natural products with resolved structures, the encoding biosynthetic gene clusters have not been identified yet. Of those secondary metabolites, the scaffolds of nonribosomal peptides and polyketides (type I modular) can be predicted due to their building block-like assembly. SeMPI v2 provides a comprehensive prediction pipeline, which includes the screening of the scaffold in publicly available natural compound databases. The screening algorithm was designed to detect homologous structures even for partial, incomplete clusters. The pipeline allows linking of gene clusters to known natural products and therefore also provides a metric to estimate the novelty of the cluster if a matching scaffold cannot be found. Whereas currently available tools attempt to provide comprehensive information about a wide range of gene clusters, SeMPI v2 aims to focus on precise predictions. Therefore, the cluster detection algorithm, including building block generation and domain substrate prediction, was thoroughly refined and benchmarked, to provide high-quality scaffold predictions. In a benchmark based on 559 gene clusters, SeMPI v2 achieved comparable or better results than antiSMASH v5. Additionally, the SeMPI v2 web server provides features that can help to further investigate a submitted gene cluster, such as the incorporation of a genome browser, and the possibility to modify a predicted scaffold in a workbench before the database screening.
https://www.mdpi.com/2218-1989/11/1/13
secondary metabolites
natural compounds
machine learning
nonribosomal peptides
polyketides
collection DOAJ
language English
format Article
sources DOAJ
author Paul F. Zierep
Adriana T. Ceci
Ilia Dobrusin
Sinclair C. Rockwell-Kollmann
Stefan Günther
title SeMPI 2.0—A Web Server for PKS and NRPS Predictions Combined with Metabolite Screening in Natural Product Databases
publisher MDPI AG
series Metabolites
issn 2218-1989
publishDate 2021-12-01
description Microorganisms produce secondary metabolites with a remarkable range of bioactive properties. The constantly increasing amount of published genomic data provides the opportunity for efficient identification of biosynthetic gene clusters by genome mining. On the other hand, for many natural products with resolved structures, the encoding biosynthetic gene clusters have not been identified yet. Of those secondary metabolites, the scaffolds of nonribosomal peptides and polyketides (type I modular) can be predicted due to their building block-like assembly. SeMPI v2 provides a comprehensive prediction pipeline, which includes the screening of the scaffold in publicly available natural compound databases. The screening algorithm was designed to detect homologous structures even for partial, incomplete clusters. The pipeline allows linking of gene clusters to known natural products and therefore also provides a metric to estimate the novelty of the cluster if a matching scaffold cannot be found. Whereas currently available tools attempt to provide comprehensive information about a wide range of gene clusters, SeMPI v2 aims to focus on precise predictions. Therefore, the cluster detection algorithm, including building block generation and domain substrate prediction, was thoroughly refined and benchmarked, to provide high-quality scaffold predictions. In a benchmark based on 559 gene clusters, SeMPI v2 achieved comparable or better results than antiSMASH v5. Additionally, the SeMPI v2 web server provides features that can help to further investigate a submitted gene cluster, such as the incorporation of a genome browser, and the possibility to modify a predicted scaffold in a workbench before the database screening.
topic secondary metabolites
natural compounds
machine learning
nonribosomal peptides
polyketides
url https://www.mdpi.com/2218-1989/11/1/13