Genome re-sequencing and reannotation of the Escherichia coli ER2566 strain and transcriptome sequencing under overexpression conditions

Abstract. Background: The Escherichia coli ER2566 strain (NC_CP014268.2) was developed as a BL21 (DE3) derivative strain and has been widely used in recombinant protein expression. However, like many other current RefSeq annotations, the annotation of the ER2566 strain was incomplete, with missing gen...

Bibliographic Details
Main Authors: Lizhi Zhou, Hai Yu, Kaihang Wang, Tingting Chen, Yue Ma, Yang Huang, Jiajia Li, Liqin Liu, Yuqian Li, Zhibo Kong, Qingbing Zheng, Yingbin Wang, Ying Gu, Ningshao Xia, Shaowei Li
Format: Article
Language: English
Published: BMC 2020-06-01
Series: BMC Genomics
Subjects: Escherichia coli ER2566; Genome reannotation; Transcriptome sequencing; Engineer bacteria
Online Access: http://link.springer.com/article/10.1186/s12864-020-06818-1
id doaj-388fd0938fdc4f28a0ae8428109a789c
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2020-06-01
description Abstract. Background: The Escherichia coli ER2566 strain (NC_CP014268.2) was developed as a BL21 (DE3) derivative strain and has been widely used in recombinant protein expression. However, like many other current RefSeq annotations, the annotation of the ER2566 strain was incomplete, with missing gene names and miscellaneous RNAs, as well as uncorrected annotations of some pseudogenes. Here, we performed a systematic reannotation of the ER2566 genome by combining multiple annotation tools with manual revision to provide a comprehensive understanding of the E. coli ER2566 strain, and used high-throughput sequencing to explore how the strain adapted under external pressure. Results: The reannotation included noteworthy corrections to all protein-coding genes, led to the exclusion of 190 hypothetical genes or pseudogenes, and resulted in the addition of 237 coding sequences, 230 miscellaneous noncoding RNAs, and 2 tRNAs. In addition, we manually examined all 194 pseudogenes in the RefSeq annotation and directly identified 123 (63%) as coding genes. We then used whole-genome sequencing and high-throughput RNA sequencing to assess mutational adaptations under consecutive subculture or overexpression burden. Whereas no mutations were detected in response to consecutive subculture, overexpression of the human papillomavirus type 16 capsid led to the identification of a mutation (position 1,094,824, within the 3′ non-coding region) located 19 bp from the lacI gene in the transcribed RNA; this mutation was not detected at the genomic level by Sanger sequencing. Conclusion: The ER2566 strain is used by both the general scientific community and the biotechnology industry. Reannotation of the E. coli ER2566 strain not only improved the RefSeq data but also uncovered a key site that might be involved in the transcription and translation of genes encoding the lactose operon repressor. We propose that our pipeline might offer a universal method for the reannotation of other bacterial genomes with high speed and accuracy. This study might facilitate a better understanding of gene function for the ER2566 strain under external burden and provide more clues for engineering bacteria for biotechnological applications.
topic Escherichia coli ER2566
Genome reannotation
Transcriptome sequencing
Engineer bacteria
url http://link.springer.com/article/10.1186/s12864-020-06818-1
spelling doaj-388fd0938fdc4f28a0ae8428109a789c 2020-11-25T03:14:56Z eng BMC BMC Genomics 1471-2164 2020-06-01 211111 10.1186/s12864-020-06818-1
Author affiliations:
State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, School of Public Health, Xiamen University: Lizhi Zhou, Hai Yu, Qingbing Zheng, Yingbin Wang, Ying Gu, Ningshao Xia, Shaowei Li
National Institute of Diagnostics and Vaccine Development in Infectious Disease, School of Life Sciences, Xiamen University: Kaihang Wang, Tingting Chen, Yue Ma, Yang Huang, Jiajia Li, Liqin Liu, Yuqian Li, Zhibo Kong