Codon optimization with deep learning to enhance protein expression

Abstract Heterologous expression is the main approach for recombinant protein production in genetic synthesis, for which codon optimization is necessary. The existing optimization methods are based on biological indexes. In this paper, we propose a novel codon optimization method based on deep learni...

Bibliographic Details
Main Authors: Hongguang Fu, Yanbing Liang, Xiuqin Zhong, ZhiLing Pan, Lei Huang, HaiLin Zhang, Yang Xu, Wei Zhou, Zhong Liu
Format: Article
Language: English
Published: Nature Publishing Group 2020-10-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-020-74091-z
id doaj-099d6ae85f104ec1a497335bc64d5ae2
record_format Article
spelling doaj-099d6ae85f104ec1a497335bc64d5ae2
2020-12-08T10:40:02Z
eng
Nature Publishing Group
Scientific Reports
2045-2322
2020-10-01
10 1 1 9
10.1038/s41598-020-74091-z
Codon optimization with deep learning to enhance protein expression
Hongguang Fu (University of Electronic Science and Technology of China)
Yanbing Liang (University of Electronic Science and Technology of China)
Xiuqin Zhong (University of Electronic Science and Technology of China)
ZhiLing Pan (State Key Laboratory of Biotherapy, West China Hospital, Sichuan University)
Lei Huang (University of Electronic Science and Technology of China)
HaiLin Zhang (State Key Laboratory of Biotherapy, West China Hospital, Sichuan University)
Yang Xu (University of Electronic Science and Technology of China)
Wei Zhou (University of Electronic Science and Technology of China)
Zhong Liu (Chengdu Institute of Computer Applications, Chinese Academy of Sciences)
https://doi.org/10.1038/s41598-020-74091-z
collection DOAJ
language English
format Article
sources DOAJ
author Hongguang Fu
Yanbing Liang
Xiuqin Zhong
ZhiLing Pan
Lei Huang
HaiLin Zhang
Yang Xu
Wei Zhou
Zhong Liu
author_sort Hongguang Fu
title Codon optimization with deep learning to enhance protein expression
publisher Nature Publishing Group
series Scientific Reports
issn 2045-2322
publishDate 2020-10-01
description Abstract: Heterologous expression is the main approach for recombinant protein production in genetic synthesis, for which codon optimization is necessary. Existing optimization methods are based on biological indexes. In this paper, we propose a novel codon optimization method based on deep learning. First, we introduce the concept of codon boxes, via which DNA sequences can be recoded into codon box sequences while ignoring the order of bases. The problem of codon optimization can then be converted to sequence annotation of the corresponding amino acids with codon boxes. The codon optimization models for Escherichia coli were trained with a Bidirectional Long Short-Term Memory Conditional Random Field (BiLSTM-CRF). Theoretically, deep learning is a good method for capturing the distribution characteristics of DNA. In addition to a comparison of the codon adaptation index, protein expression experiments for a Plasmodium falciparum candidate vaccine and polymerase acidic protein were carried out for comparison with the original sequences and with the optimized sequences from Genewiz and ThermoFisher. The results show that our method for enhancing protein expression is efficient and competitive.
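The recoding step described in the abstract can be illustrated with a small sketch. It assumes, based only on the phrase "ignoring the order of bases", that a codon box is the unordered collection of the three bases in a codon; the function names `codon_box` and `to_codon_boxes` are hypothetical and are not taken from the paper, and the authors' actual encoding may differ.

```python
from itertools import product


def codon_box(codon: str) -> str:
    """Return an order-independent label for a codon (assumed definition).

    The label is the three bases sorted alphabetically, so codons that
    contain the same bases in a different order share one label.
    """
    codon = codon.upper()
    if len(codon) != 3 or set(codon) - set("ACGT"):
        raise ValueError(f"not a DNA codon: {codon!r}")
    return "".join(sorted(codon))


def to_codon_boxes(dna: str) -> list[str]:
    """Split a coding sequence into codons and recode each as its box."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [codon_box(dna[i:i + 3]) for i in range(0, len(dna), 3)]


# Under this assumed definition the 64 codons collapse into 20 boxes
# (the number of size-3 multisets over the four bases).
ALL_BOXES = {codon_box("".join(c)) for c in product("ACGT", repeat=3)}

if __name__ == "__main__":
    print(to_codon_boxes("ATGGCTTAA"))
    print(len(ALL_BOXES))
```

In the framing given by the abstract, a model would then label each amino acid in a protein sequence with one such box; that tagging step (the BiLSTM-CRF) is not shown here.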
url https://doi.org/10.1038/s41598-020-74091-z
_version_ 1715005112148033536