First Genetic Code of an Organism Revised in a Research Laboratory

This achievement is expected to facilitate the development of proteins with novel functionalities.

  • We succeeded in rewriting a rule that has remained unchanged in most organisms for billions of years.
  • We developed E. coli strains that produce "alloproteins", which contain non-natural amino acids in addition to the standard 20 amino acids.
  • This achievement is expected to facilitate the development of novel proteins as therapeutic agents or biomaterials.

The genetic information in DNA directs the production of proteins in our body, a process called "translation". The "dictionary" used for the translation is the genetic code, and any revision of the code would change the interpretation of the genetic information, with lethal consequences for the organism. This is probably why our genetic code has not changed over billions of years, since it was established in the common ancestor of all living things. We set out to see how the code could be changed and what conditions would be necessary for achieving any change in the code, using E. coli cells. Through much experimenting and great effort by Mukai and Sakamoto's team members, the bacterial genetic "dictionary" was finally revised successfully, with the definition of a "word" being changed completely. The required modifications to the bacterial genome were so few that vigorously multiplying E. coli cells could accumulate them in a week. This implies that the genetic code is flexible, in contrast to the view of the code as "frozen".

Proteins consist of the 20 standard amino acids. Revising the genetic "dictionary" so that one word specifies a non-natural amino acid would allow the production of proteins built from more than 20 kinds of amino acids. We actually created E. coli cells that have a non-natural amino acid as an essential component of translation, and such bacteria would be useful for developing proteins with novel structures and functionalities.

DNA is the genetic material, consisting of 4 different chemical compounds, collectively called "bases". DNA can be thought of as a "book" written with these 4 bases as the letters of its alphabet. The genetic information carried by DNA directs the production of proteins, which consist of 20 kinds of amino acids, so protein biosynthesis is like translating a book written with 4 kinds of letters (bases) into a text written with a different system of 20 letters (amino acids). Any translation needs a dictionary, and the dictionary used by organisms for protein synthesis is called the genetic code. A triplet of bases is called a codon, and the genetic code assigns each of the 64 possible codons either to one of the 20 amino acids or to a stop signal for protein synthesis (Figure 1). The stop signals are the UAA, UAG, and UGA codons, which specify no amino acid and serve like the period at the end of a sentence. The genetic code is implemented through a number of interactions between molecules in the cell, and the ribosome plays a central role in synthesizing proteins.
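The "dictionary" described above can be sketched in a few lines of Python. This is only an illustration: the names BASES, AMINO_ACIDS, CODON_TABLE and translate are chosen here and do not come from the study, amino acids are written with their standard one-letter symbols, "*" marks a stop signal, and the short example sequence is made up.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid symbols for the 64 codons, listed in the order
# UUU, UUC, UUA, UUG, UCU, ... (third base changes fastest); "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# The genetic code as a lookup table: codon -> amino acid (or stop).
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna, table=CODON_TABLE):
    """Read the message codon by codon until a stop signal is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = table[mrna[i:i + 3]]
        if residue == "*":  # stop signal: the "period" ending the sentence
            break
        protein.append(residue)
    return "".join(protein)

stop_codons = sorted(c for c, r in CODON_TABLE.items() if r == "*")
example = translate("AUGGCUUAG")  # hypothetical message: AUG GCU UAG
```

The table has 64 entries, 3 of which are stop signals, and the remaining 61 codons are shared among the 20 amino acids.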

Most organisms use the same genetic code, the "universal" genetic code (Figure 1), which means that the meaning of each codon is identical among most organisms. Humans as well as E. coli cells presumably inherited this code from the common ancestor of all living things. If the meaning of any codon were to change abruptly, the genetic information would not be translated correctly, preventing the production of the functional proteins necessary for supporting our bodily activities. In the exceptional cases of mycoplasmas and ciliates, which use non-standard codes, the meanings of certain codons supposedly changed very gradually until these codons acquired novel definitions. This change must have taken tens or hundreds of millions of years to manifest itself. A number of mutations need to accumulate in an organism to change such a basic rule for living things, and this is probably why changes in the genetic code occurred only in rare and exceptional organisms.

Figure 1, The universal genetic code.

We aimed to completely change the meaning of the UAG codon from a stop signal to an amino acid. A tRNA molecule, carrying its specific amino acid, binds to the corresponding codon on the ribosome. There is usually no tRNA in the cell that binds to the UAG codon; instead, a release factor (RF1) binds to the codon to stop protein synthesis, defining the end of the protein. There are two requirements for redefining UAG as a codon specifying an amino acid (a sense codon). First, a tRNA that binds to the UAG codon must be introduced into E. coli, to allow the translation of UAG into an amino acid. Such tRNAs are known as "amber suppressor tRNAs". Second, to remove RF1 from the bacterium, its gene, prfA, needs to be deleted from the chromosome. However, this gene is essential for E. coli growth, and the challenge was to find conditions that avoid the lethal effect of deleting it from the E. coli genome.

Among the roughly 4000 protein-coding genes in E. coli, about 300 end with a UAG codon. Only seven of these UAG-ending genes are reportedly necessary for supporting bacterial growth. Redefining UAG as a sense codon would prevent the production of functional products from these seven genes; therefore, we replaced their UAG codons with another stop signal, UAA, and introduced the engineered genes into E. coli.
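The stop-codon replacement described above amounts to a very small edit at the end of each gene. A minimal sketch, assuming the coding sequence is given as an mRNA string; the function name swap_terminal_amber and the example sequences are hypothetical and are not the actual E. coli genes:

```python
def swap_terminal_amber(coding_sequence):
    """Return the sequence with a terminal UAG stop codon replaced by UAA.

    Sequences that end with another stop signal are returned unchanged.
    """
    if len(coding_sequence) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if coding_sequence.endswith("UAG"):
        return coding_sequence[:-3] + "UAA"
    return coding_sequence

# Hypothetical toy genes, not real E. coli sequences.
engineered = swap_terminal_amber("AUGGCUUAG")
untouched = swap_terminal_amber("AUGGCUUGA")
```

Because UAA is still read as a stop signal after UAG is redefined, the engineered genes keep ending at the intended position.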

Since we found that the seven engineered genes by themselves did not allow the deletion of prfA from the chromosome, a suppressor tRNA was also introduced into the cells. When these two conditions were fulfilled at the same time, prfA could be removed from E. coli. We call the E. coli strains lacking prfA "RFzero" strains. We subjected them to genetic analyses, which confirmed that the UAG codon had been completely stripped of its role as a stop codon and served only as a sense codon specifying the amino acid carried by the introduced suppressor tRNA.
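The effect of the redefinition can be illustrated by translating the same message with two versions of the "dictionary". This is only a sketch under stated assumptions: the codon table is deliberately reduced to the handful of codons used in the example, "X" is a placeholder for whichever amino acid the suppressor tRNA carries, and the message itself is made up.

```python
# Toy codon table: only the codons used in this example (an assumption
# made for brevity); "*" marks a stop signal.
STANDARD_CODE = {
    "AUG": "M", "GCU": "A", "GGU": "G",
    "UAA": "*", "UAG": "*", "UGA": "*",
}
# After the redefinition, UAG is a sense codon. "X" is a placeholder for
# the amino acid carried by the introduced suppressor tRNA.
REASSIGNED_CODE = {**STANDARD_CODE, "UAG": "X"}

def translate(mrna, table):
    """Read the message codon by codon until a stop signal is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = table[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

message = "AUGGCUUAGGGUUAA"  # hypothetical: AUG GCU UAG GGU UAA
before = translate(message, STANDARD_CODE)   # synthesis stops at UAG
after = translate(message, REASSIGNED_CODE)  # UAG is read through; stops at UAA
```

With the standard table the protein ends at UAG, whereas with the reassigned table translation continues through UAG until the next stop signal. This is also why a gene that must end at its original position needs its UAG replaced with UAA.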

Finally, an enzyme that attaches a non-natural amino acid to the tRNA was introduced into RFzero cells, and UAG codons were successfully translated into this amino acid. This enzyme was previously developed by our team. The present achievement will facilitate the incorporation of such amino acids into a protein at many sites.

Figure 2, Translation of UAG codons into 3-iodotyrosine, a non-natural amino acid.
The chemical structure of 3-iodotyrosine (left). A mass spectrometric analysis showed that the analyzed protein contained six iodotyrosines at the positions corresponding to the six UAG codons in the gene, marked with asterisks in fragments A and B (right).