Automatic reconstruction of metabolic pathways from identified biosynthetic gene clusters

Abstract. Background: A wide range of bioactive compounds is produced by enzymes and enzymatic complexes encoded in biosynthetic gene clusters (BGCs). These BGCs can be identified and functionally annotated based on their DNA sequence. Candidates for further research and development may be prioritized...


Bibliographic Details
Main Authors: Snorre Sulheim, Fredrik A. Fossheim, Alexander Wentzel, Eivind Almaas
Format: Article
Language: English
Published: BMC 2021-02-01
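Series: BMC Bioinformatics
Subjects: Biosynthetic gene clusters; Genome-scale metabolic model; AntiSMASH; Polyketide synthases; Natural products; Heterologous expression
Online Access: https://doi.org/10.1186/s12859-021-03985-0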
id doaj-534272f42bf541ef99219604d7b07f68
record_format Article
spelling doaj-534272f42bf541ef99219604d7b07f68
2021-02-23T09:33:20Z
eng
BMC
BMC Bioinformatics
1471-2105
2021-02-01
Volume 22, issue 1, pages 1-15
10.1186/s12859-021-03985-0
Automatic reconstruction of metabolic pathways from identified biosynthetic gene clusters
Snorre Sulheim (Department of Biotechnology and Food Science, NTNU - Norwegian University of Science and Technology)
Fredrik A. Fossheim (Department of Biotechnology and Food Science, NTNU - Norwegian University of Science and Technology)
Alexander Wentzel (Department of Biotechnology and Nanomedicine, SINTEF Industry)
Eivind Almaas (Department of Biotechnology and Food Science, NTNU - Norwegian University of Science and Technology)
https://doi.org/10.1186/s12859-021-03985-0
Biosynthetic gene clusters
Genome-scale metabolic model
AntiSMASH
Polyketide synthases
Natural products
Heterologous expression
collection DOAJ
language English
format Article
sources DOAJ
author Snorre Sulheim
Fredrik A. Fossheim
Alexander Wentzel
Eivind Almaas
author_sort Snorre Sulheim
title Automatic reconstruction of metabolic pathways from identified biosynthetic gene clusters
title_sort automatic reconstruction of metabolic pathways from identified biosynthetic gene clusters
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2021-02-01
description Abstract. Background: A wide range of bioactive compounds is produced by enzymes and enzymatic complexes encoded in biosynthetic gene clusters (BGCs). These BGCs can be identified and functionally annotated based on their DNA sequence. Candidates for further research and development may be prioritized based on properties such as their functional annotation, (dis)similarity to known BGCs, and bioactivity assays. Production of the target compound in the native strain is often not achievable, making heterologous expression in an optimized host strain a promising alternative. Genome-scale metabolic models are frequently used to guide strain development, but large-scale incorporation and testing of heterologous production of complex natural products in this framework is hampered by the amount of manual work required to translate annotated BGCs to metabolic pathways. To this end, we have developed a pipeline for the automated reconstruction of BGC-associated metabolic pathways responsible for the synthesis of non-ribosomal peptides and polyketides, two of the dominant classes of bioactive compounds.
Results: The developed pipeline correctly predicts 72.8% of the metabolic reactions in a detailed evaluation of 8 different BGCs comprising 228 functional domains. By introducing the reconstructed pathways into a genome-scale metabolic model, we demonstrate that this level of accuracy is sufficient to make reliable in silico predictions with respect to production rate and gene knockout targets. Furthermore, we apply the pipeline to a large BGC database and reconstruct 943 metabolic pathways. We identify 17 enzymatic reactions using high-throughput assessment of potential knockout targets for increasing the production of any of the associated compounds. However, the targets only provide a relative increase of up to 6% compared to wild-type production rates.
Conclusion: With this pipeline we pave the way for an extended use of genome-scale metabolic models in strain design of heterologous expression hosts. In this context, we identified generic knockout targets for the increased production of heterologous compounds. However, as the predicted increase is minor for any of the single-reaction knockout targets, these results indicate that more sophisticated strain-engineering strategies are necessary for the development of efficient BGC expression hosts.
topic Biosynthetic gene clusters
Genome-scale metabolic model
AntiSMASH
Polyketide synthases
Natural products
Heterologous expression
url https://doi.org/10.1186/s12859-021-03985-0