Design parameters to control synthetic gene expression in Escherichia coli.

BACKGROUND: Production of proteins as therapeutic agents, research reagents and molecular tools frequently depends on expression in heterologous hosts. Synthetic genes are increasingly used for protein production because sequence information is easier to obtain than the corresponding physical DNA. Pr...

Bibliographic Details
Main Authors: Mark Welch, Sridhar Govindarajan, Jon E Ness, Alan Villalobos, Austin Gurney, Jeremy Minshull, Claes Gustafsson
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2009-09-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC2736378?pdf=render
id doaj-51a890eac0a24abb9125ca2ab658fb60
volume 4
issue 9
article_number e7002
doi 10.1371/journal.pone.0007002
collection DOAJ
issn 1932-6203
description BACKGROUND: Production of proteins as therapeutic agents, research reagents and molecular tools frequently depends on expression in heterologous hosts. Synthetic genes are increasingly used for protein production because sequence information is easier to obtain than the corresponding physical DNA. Protein-coding sequences are commonly re-designed to enhance expression, but there are no experimentally supported design principles.
PRINCIPAL FINDINGS: To identify sequence features that affect protein expression we synthesized and expressed in E. coli two sets of 40 genes encoding two commercially valuable proteins, a DNA polymerase and a single-chain antibody. Genes differing only in synonymous codon usage expressed protein at levels ranging from undetectable to 30% of cellular protein. Using partial least squares regression we tested the correlation of protein production levels with parameters that have been reported to affect expression. We found that the amount of protein produced in E. coli was strongly dependent on the codons used to encode a subset of amino acids. Favorable codons were predominantly those read by tRNAs that are most highly charged during amino acid starvation, not codons that are most abundant in highly expressed E. coli proteins. Finally, we confirmed the validity of our models by designing, synthesizing and testing new genes using codon biases predicted to perform well.
CONCLUSION: The systematic analysis of gene design parameters shown in this study has allowed us to identify codon usage within a gene as a critical determinant of achievable protein expression levels in E. coli. We propose a biochemical basis for this, as well as design algorithms to ensure high protein production from synthetic genes. Replication of this methodology should allow similar design algorithms to be empirically derived for any expression system.