Reply to NCSE on Universal Genetic Code
The NCSE asserts that Explore Evolution‘s discussion of the universal genetic code, its variants, and puzzles about the origin of the variant codes, “is based on misunderstanding and/or misrepresentation of the available knowledge and of the scientific record.” The NCSE’s own discussion of this area, however, is very deeply confused. The NCSE does not grasp basic facts about the variant code puzzle, and this leads to several serious errors in their discussion.
1. Making a hash of a fascinating puzzle — the NCSE discussion
We begin with the NCSE’s most serious error. They write:
First, contrary to the key assertion [of EE], scientists have been aware of natural genetic code mutants since at least the 1960s, and the actual molecular mechanism of some of these mutations (such as “suppressors of amber”) was elucidated in both bacteria and yeast (Goodman HM, Abelson J, Landy A, Brenner S, Smith JD. “Amber suppression: a nucleotide change in the anticodon of a tyrosine transfer RNA.” Nature. 1968; 217:1019-24; Capecchi MR, Hughes SH, Wahl GM. “Yeast super-suppressors are altered tRNAs capable of translating a nonsense codon in vitro.” Cell. 1975; 6:269-77.) Amber suppressor mutations change the read-out of certain codons from STOP to an amino acid by altering the structure of one of the transfer RNAs. This tRNA recognizes the codons in messenger RNAs and allows the addition of the correct amino acid during protein synthesis.
EE does not treat these well-known mutations because they are simply irrelevant. While mutations such as “suppressor of amber” affect parts of the coding system — e.g., transfer RNAs — they do not give rise to novel codon assignments that persist in species. The mechanistic challenge of permanently altering the genetic code (i.e., evolving and fixing a variant codon assignment) in a population of organisms requires far more than amber suppressor mutations, or mutations to tRNAs generally (see below). Variant genetic codes are fundamentally different from “natural genetic code mutants,” which explains why the discovery of mitochondrial (1979) and nuclear (1985) variant codes came as such a surprise.
Thus, the NCSE’s next statement (in italics) is a complete non-sequitur:
These mutants showed how new variant genetic codes can evolve, and what kind of selective pressures can favor such changes (in this case, the need for reversion of point mutations which introduce deleterious STOP codons in critical genes). (emphasis added)
This is false. Current theories of codon reassignment invoke complicated multi-step scenarios for the origin of novel tRNAs, the elimination of wild-type tRNAs, and so on. Reversion of an in-frame STOP (termination) mutation, on the other hand, would restore the original amino acid in a protein sequence, not establish a novel codon assignment.
The NCSE’s next statement is also a non-sequitur, and moreover historically false:
Therefore, it was recognized fairly early that the genetic code did not need to be absolutely invariant to be fundamentally shared between all organisms (“universal”).
Recognized by a handful of investigators, perhaps, but not held as mainstream theory. Francis Crick and a few others may have speculated about the possibility of variant codes, but these speculations found little support in the literature. From the mid-1960s, when the code was elucidated, until the mid-to-late 1980s (or, in some cases, early 1990s, depending on the textbook in question), the prediction of a necessarily invariant, or universal, genetic code was widely held to follow from the theory of common descent. Almost any biology textbook from this period (1966-1990) carries the prediction.1
2. Cell biology basics
To understand why, we should review the relevant cellular information-processing functions, and their parts.
Before any protein can be synthesized by the cell, its corresponding messenger RNA (mRNA) molecule must be produced by DNA transcription processes in the nucleus.2 The mRNA is then exported to the cell’s cytoplasm, where the molecular apparatus of protein synthesis takes over. First, the small ribosomal subunit binds to the mRNA molecule. Next, a unique initiator transfer RNA (tRNA) molecule locates the small ribosomal subunit over a special start codon on the mRNA. The large ribosomal subunit attaches to complete the ribosome, and the elongation phase of protein synthesis begins. The polypeptide (nascent protein) grows in length as amino acids are added, step-wise, at the carboxyl-terminal end of the chain, via a three-phase cycle: (1) aminoacyl-tRNA molecules bind to the mRNA, (2) a peptide bond forms, and (3) the ribosome translocates, or moves, to the next site on the mRNA. Moving from codon to codon in the 5′-to-3′ direction along the mRNA molecule, the ribosome stops when it reaches a stop codon. Then a specialized protein known as a release factor (RF, or, in the case of eukaryotes, eRF) binds to the stop codon. Translation terminates, and the completed polypeptide (protein) is released from the ribosome.
This entire process is mediated by the genetic code (see Table 1). In particular, specific aminoacyl-tRNA molecules function as decoding devices, which allow particular sequences of three ribonucleotides in the mRNA, the codons, to be translated as unique amino acids in the newly synthesized protein. Each amino acid is matched to its special mRNA codons via a two-step recognition process: (i) the amino acid is recognized by a unique aminoacyl-tRNA synthetase enzyme, which links it to a specific tRNA molecule; and (ii) the anticodon of the tRNA molecule recognizes the specific codon, or sequence of three nucleotides in the mRNA chain. Thus, in summary, the genetic code determines amino acid identity in protein assembly, by assigning messenger RNA (mRNA) triplets (codons) to specific amino acids, via transfer RNA (tRNA) and aminoacyl tRNA synthetases.
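The codon-to-amino-acid mapping described above can be pictured as a simple lookup table read three letters at a time. The following is a toy sketch only, not a model of the actual molecular machinery (the dictionary names are hypothetical, and only a handful of standard-code assignments are included):

```python
# Toy sketch of codon-by-codon translation under the standard genetic code.
# Only a few of the 64 assignments are shown; names are illustrative.
STANDARD_CODE = {
    "AUG": "Met", "UUU": "Phe", "UCU": "Ser", "GGU": "Gly",
    "UGG": "Trp", "CAA": "Gln",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}

def translate(mrna, code):
    """Read codons 5'-to-3' and collect amino acids until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if code[codon] == "STOP":
            break  # in the cell, a release factor recognizes the stop codon
        protein.append(code[codon])
    return protein

print(translate("AUGUUUUCUGGUUAA", STANDARD_CODE))
# -> ['Met', 'Phe', 'Ser', 'Gly']
```

The point of the sketch is only that every protein-coding gene in the genome is read through this one table, which is why a change to any single entry touches many proteins at once.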
3. Is the code necessarily invariant?
Functionally speaking, this would appear to be a system that cannot vary. As Watson et al. (1987, 453) express the point, in their widely-used molecular genetics textbook:
Consider what might happen if a mutation changed the genetic code. Such a mutation might, for example, alter the sequence of the serine tRNA molecule of the class that corresponds to UCU, causing them to recognize UUU sequences instead. This would be a lethal mutation in haploid cells containing only one gene directing the production of tRNAser, for serine would not be inserted into many of its normal positions in proteins. Even if there were more than one gene…this type of mutation would still be lethal, since it would cause the simultaneous replacement of many phenylalanine residues by serine in cell proteins.
Lehman (2001, R63) calls this functional gulf “a ‘Death Valley’ in the adaptive landscape.” Retrospectively considering the prediction of universality, he writes,
The standard view of the evolution of the genetic code had been that, once the code became fixed in some primitive lineage of organisms, then any coding change would be precluded because the transitory coding stage that a population must experience to change its code would be lethal. Consider, for example, mutations that change the charging specificity of a tRNA aminoacyl synthetase, such that it charged a glycyl-tRNA with arginine instead. Suddenly glycines are replaced by arginines throughout the genome, which would undoubtedly cause irreparable cellular chaos. This could be thought of as the quintessential case of stabilizing selection: a ‘Death Valley’ in the adaptive landscape. (2001, R63; reference numbers omitted)
Davis (1985, 256) provides a characteristic formulation of the prediction:
If organisms had arisen independently they could perfectly well have used different codes to connect the 64 trinucleotide codons to the 20 amino acids; but if they arose by common descent any alteration of the code would be lethal, because it would change too many proteins at once. Hence the finding of the same genetic code in microbes, plants and animals…spectacularly confirms a strong evolutionary prediction.
Students and teachers should know about these predictions, which (as noted) were widespread in the biological literature for nearly three decades. Understanding how predictions from evolutionary theory may fail, or be modified, is an important aspect of biological knowledge. What students and teachers should not be told is “well, we knew that all along — no surprise here.” Such a response passes beyond historical revisionism into outright falsehood.
4. The discovery of variant codes
In 1979, variant codes were discovered in mitochondria (energy-producing cell organelles with a small genetic complement of their own), where “it was found that the code in vertebrate mitochondria differed from the universal code by using codons AUA for methionine and UGA for tryptophan” (Osawa, Muto, Jukes, and Ohama 1990, 19). As Fox (1985, 132) argued, however, “mitochondria could be thought of as exceptions that prove the rule: their genetic systems produce only a very limited number of proteins and so might tolerate changes.”
Then, in the mid-1980s, variants in the nuclear code were discovered (see Table 2). In a commentary written in response to the first wave of discoveries, Fox (1985, 132) argued, “Some ‘real’ [nuclear] exceptions have come to light in both eukaryotic and prokaryotic free-living organisms, and the notion of universality will have to be discarded.” Osawa (1995) reviews the history of the discovery of a wide range of variant codes between 1979 and 1995, and the National Center for Biotechnology Information now maintains a web page where variant codes are catalogued.3
Do the variant codes challenge Common Descent? No, say investigators in the field. “Our ever-expanding list of nonstandard genetics,” argues Lehman (2001, R66), “is not serving to unravel the unity of biology.” One might assume, then, that we understand the pathways of natural transformation leading from the “universal” code to the variant codes, so that the domain of Common Descent can expand without strain to accommodate the new observations. On this view, the variant codon assignments differ “in pretty minor ways” (Wolfe 1996, 320) from the universal code — and thus the variants, while not strictly predicted by Common Descent, nevertheless represent only inconsequential (rare but viable) departures from universality: molecular noise, as it were.
This line of argument, however, soon runs aground on our ignorance. We do not understand mechanistically how codon assignments change, although that is not for want of hypotheses. This can be illustrated by considering the variant code of the ciliated protozoan Tetrahymena thermophila.
5. Let’s change the code in a ciliated protozoan: what needs to happen?
Tetrahymena thermophila has a single stop codon, UGA, and assigns the other two canonical stop codons, UAA and UAG, to glutamine (Lozupone et al. 2001). Thus, its release factor protein, Tt-eRF1, recognizes only UGA at the A site of the ribosome (Karamyshev et al. 1999).4 To explain how UAA and UAG were reassigned in Tetrahymena, Osawa and Jukes (1989) presented the following hypothesis (see Figure 5):
1. Start with the universal code: UGA, UAA, UAG → stop, and Tt-eRF1 recognizes all three stop codons.
2. Tt-eRF1 would then have evolved to be specific to UGA.
3. UAA and UAG would have been removed from the termination sites of any gene where they existed, and become unassigned, i.e., untranslatable nonsense codons.
4. Then the gene for tRNAgln, with the anticodon UmUG, would have duplicated.
5. The duplicate mutated to UmUA, pairing with UAA and UAG so that these now are translated as Gln. (Osawa 1995, 99)
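The end state of the Osawa-Jukes scenario can be pictured by translating the same mRNA under the standard code and under the Tetrahymena variant. This is a toy sketch with hypothetical names and only the codons used below; it simply restates steps 4-5 (UAA and UAG read as Gln, UGA the sole stop):

```python
# Sketch: one mRNA read under the standard code vs. the Tetrahymena
# thermophila variant (UAA/UAG -> Gln; UGA is the only stop codon).
# Illustrative subset only, not a complete code table.
STANDARD = {"AUG": "Met", "GGU": "Gly",
            "UAA": "STOP", "UAG": "STOP", "UGA": "STOP"}
TETRAHYMENA = dict(STANDARD, UAA="Gln", UAG="Gln")  # steps 4-5 of the scenario

def translate(mrna, code):
    """Read codons 5'-to-3' until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = code[mrna[i:i + 3]]
        if aa == "STOP":
            break
        protein.append(aa)
    return protein

mrna = "AUGGGUUAAGGUUGA"             # Met-Gly-[UAA]-Gly-[UGA]
print(translate(mrna, STANDARD))     # -> ['Met', 'Gly']
print(translate(mrna, TETRAHYMENA))  # -> ['Met', 'Gly', 'Gln', 'Gly']
```

Note that the very same transcript yields different proteins under the two codes, which is why the transition between them is the crux of the problem.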
What are the problems with this scenario? They lie between steps (2) and (3).5 As Tt-eRF1 is evolving its specificity for UGA, UAA and UAG codons would still be present in the Tetrahymena genome as termination codons:
The model outlined by Jukes and Osawa . . . would lead to a potentially awkward intermediate stage where some genes end in [UAA and UAG], but neither [UAA or UAG] recognizing tRNAs nor…release factors exist in the cell. The outcome of this state in eukaryotes is not known, but in eubacteria the cognate tRNA of the penultimate codon remains covalently attached to the carboxyl-terminus of the protein. (Keeling 1997, 208)
. . . during the appearance of code deviations, ancient termination codons are acquiring a new sense and new UAA and UAG codons are accumulating in the reading frames. This will generate ambiguity in the length of translation products. (Cohen and Adoutte 1995, 105)
If even a small percentage of the protein-coding mRNAs present in Tetrahymena (at step 2 in the Jukes-Osawa scenario) terminate with the stop codons UAA and UAG, it is unclear how mutations to the codon-recognizing domains of Tt-eRF1, causing that protein to recognize only UGA, would affect cellular viability. If the ribosomes translating UAA- and UAG-terminating mRNAs “idle” (i.e., halt) at the penultimate codon of those mRNAs, awaiting the action of Tt-eRF1 (which now no longer recognizes UAA and UAG as stop signals), the nascent polypeptides will not release from their ribosomes, and will not fold properly.6
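The awkward intermediate stage can be sketched in the same toy terms. Here UAA and UAG are unassigned, as between steps 2 and 3: no tRNA reads them and the release factor no longer recognizes them, so a ribosome reaching one simply stalls. All names and the stall-reporting convention are hypothetical illustration, not a model of real ribosome kinetics:

```python
# Sketch of the transitional stage between steps 2 and 3: the release
# factor recognizes only UGA, but some genes still end in UAA or UAG,
# which are now unassigned (no tRNA, no release factor).
INTERMEDIATE = {"AUG": "Met", "GGU": "Gly", "UGA": "STOP",
                "UAA": None, "UAG": None}  # None = unassigned codon

def translate_or_stall(mrna, code):
    """Translate 5'-to-3'; report a stall at any unassigned codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = code[mrna[i:i + 3]]
        if aa == "STOP":
            return protein                # normal termination and release
        if aa is None:
            return ("STALLED", protein)   # nascent chain never released
        protein.append(aa)
    return ("RAN OFF 3' END", protein)

print(translate_or_stall("AUGGGUUAA", INTERMEDIATE))
# -> ('STALLED', ['Met', 'Gly'])  : a UAA-terminating gene is never released
print(translate_or_stall("AUGGGUUGA", INTERMEDIATE))
# -> ['Met', 'Gly']               : a UGA-terminating gene still works
```

The sketch makes the viability question concrete: every gene still ending in UAA or UAG yields a stalled ribosome and an unreleased polypeptide during the transition.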
Recent work elucidating the molecular variation of release factors in ciliates (Lozupone, Knight, and Landweber 2001) does not address the problem of the translation of UAA- and UAG-terminating mRNAs during the transition phase. And no experiments have yet been attempted to modify eRF1, in order to restrict its specificity, in ciliates using the canonical code. Thus we do not really understand, in the step-by-step fashion envisioned by Charles Darwin, how the variant code in Tetrahymena thermophila evolved. The natural pathway is unknown.7
NCSE, Critique of Explore Evolution. Sept. 30, 2008. “Is the Genetic Code Universal, and Can It Change?” Available online (2008) at https://ncse.com/creationism/analysis/is-genetic-code-universal-can-it-change
[1.] As Judson (1979, 278) notes, the universality of the code was indeed predicted prior to its experimental elucidation. The year is 1954: “They [Crick, Brenner, and Watson] assumed, with some apprehension, that the genetic code would be the same for all living things. There was no evidence whatever for this; indeed, the very data in which Chargaff had perceived that the ratios of adenine to thymine and of guanine to cytosine were always unity also demonstrated that, except for those regularities, the nucleotide composition of DNA — that is, the cross ratio adenine plus thymine to guanine plus cytosine — varied widely from one species to another. Yet universality of the code seemed inevitable for an obvious reason: since a mutation that changed even one word or letter in the code would alter most of a creature’s proteins, it looked sure to be lethal.” Another sense of “prediction” relevant here is sketched by Brush (1989, 1125): “In looking at the technical literature one has to recognize that scientists, especially physicists, frequently use the word ‘prediction’ in a more general sense [than temporal precedence] that includes the deduction of previously known facts.” Wallace (1966, 156) provides an early formulation: “All cellular factories, regardless of the organism from which they are derived, interpret the instructions contained in artificial RNAs in the same manner. The genetic code, in other words, appears to be universal. There are literally billions of ways in which nucleic acid code words [codons] could have been assigned to the twenty amino acids. 
That all living organisms use precisely the same code is elegant evidence that these organisms have arisen from but one source, that life on earth has had but a single source from which all present forms have evolved.” For other formulations, see, e.g., Cairns-Smith (1971, 148); Dobzhansky (1973, 125-9); Maynard Smith (1975, 82); Dobzhansky, Ayala, Stebbins, and Valentine (1977, 28); Futuyma (1979, 38); Huxley (1982, 148); Smith and Morowitz (1982, 278); Mayr (1983, 30-31); Raup and Valentine (1983, 2981); Dawkins (1986, 270); Ridley (1986, 119-20); Patterson (1988b, 61); Sober (1988, 9); Hoffman (1989, 8-9); Mayr (1991, 23); and Mayr (1997, 180).
[2.] We are describing eukaryotic protein synthesis in this paragraph.
[3.] See http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c. Osawa expects that the discovery of variant codes will continue as a wider range of organisms are examined. “Despite an enormous diversity of organisms,” he writes (1995, 172-73), “. . . until recently only a handful of standard organisms, such as Escherichia coli, Bacillus subtilis, Saccharomyces cerevisae, Drosophila spp., and vertebrates, had been examined genetically. With the development of molecular phylogenetic studies and the rapid progress in gene technology, interest has begun to focus on various odd organisms. As a result, a relatively high incidence of non-universal codes has been discovered . . . . widely distributed in various groups of organisms and organelles . . . . New changes will be discovered as more organisms or organelles are examined.” Given however that relatively few laboratories now sequence proteins directly, the discovery of variant codes may be biased towards changes in stop codons. “A curious problem with the discovery of non-canonical codes,” writes Lehman (2001, R63), “. . . is that they may be biased to reveal changes involving stop codons, because these are the easiest to detect from nucleotide sequence data. Without corresponding protein sequences, the coding relationships of a gene are usually only found aberrant when canonical stop codons appear in the midst of a gene, and when these codons can be matched with amino acids appearing at the same positions in orthologous sequences from other organisms. Only the amino-acid sequence of a gene product or the identification of an unusual tRNA can confirm the existence of a non-canonical code.” O’Sullivan et al. (2001) argue that predicting amino-acid sequences from nucleotide data, via the universal code, may generate incorrect protein predictions, given that the code can no longer be assumed to be universal. 
They write that “the ultimate solution is a simple and largely unambiguous one: the assignment of all 64 codons must be confirmed by comparative DNA and protein sequencing before a genome sequence is released for a given species” (2001, 22).
[4.] Karamyshev et al. (1999) found no polypeptide release activity using Tetrahymena thermophila eRF1 (Tt-eRF1) in other eukaryotic systems. “In spite of the overall conservative protein structure of Tt-eRF1 compared with mammalian and yeast eRF1s,” they write (1999, 487), “the soluble recombinant Tt-eRF1 did not show any polypeptide release activity in vitro using rat or Artemia ribosomes . . . . It is noteworthy that most known eRF1s from different eukaryotic organisms including Xenopus, human and yeast are functionally exchangeable in vitro and that Tt-eRF1 is the first exception to this property.”
[5.] The same difficulty affects the other main theory for codon reassignment, i.e., the Schultz-Yarus (1994, 1996) “ambiguous intermediate” hypothesis, in which codons are recognized by more than one tRNA, or by a tRNA and release factor, simultaneously. Schultz and Yarus (1996, 598) pass over the matter of modifying release factors, saying “we also neglect certain considerations. For example, though translational release factors are involved in the reassignment of stop codons to sense . . . we do not discuss them because mutations of RFs is similarly required by both schemes” (i.e., their own and the competing Jukes-Osawa codon capture hypothesis). Keeling argues that the problem “has been avoided altogether by other models such as that of Schultz and Yarus” (1997, 208). For his part, Keeling suggests that the termination codons to be reassigned must first “be drastically reduced in number, or even lost, and that this allows the loss of release factor without deleterious effect,” but concedes that “why these codons would become reduced in number is not obvious” (1997, 208). Schultz and Yarus (1996, 597) find “the total disappearance of hundreds, thousands, or tens of thousands of examples of a codon by mutation pressure alone, in diverse independent cases, an improbable evolutionary scenario.” See also Santos and Tuite (1995, 1485), who argue that “it is very unlikely that codons disappear from the entire set of mRNAs due to GC or AT [mutation] pressure.”
[6.] It is interesting to note that the real possibility of a “Death Valley” of inviability — i.e., the requirement that essential cellular function(s) be preserved by any evolutionary scenario — stands in the background of debates about the mechanism of codon reassignment. Whatever scenario one postulates must preserve viability. Thus, if reassignment occurs “in large genomes that encode many proteins,” observe Schultz and Yarus (1994, 1377), “any evolutionary mechanism must therefore account for a transitional organism’s survival despite potentially lethal amino acid substitutions as a consequence of coding reassignments.” Osawa (1995, 175-76) stresses that “the importance of specificity of coding for survival must be emphasized.”
[7.] Current investigations disagree about the mechanism(s) by which eRF (and Tt-eRF1) function, and how the protein was modified evolutionarily. “The crucial questions of the catalytic mechanism of peptide release,” writes Ramakrishnan (2002, 568), “as well as stop codon recognition remain unanswered.” While Lozupone et al. (2001) favor mutations to eRF1 causing changes in its stop codon recognition specificity, Moreira et al. (looking at the same question) oppose the idea: “Our results are not in favor of the hypothesis that eRF1 abruptly lose [sic] its ability to recognize one of the stop codons….changes in codon assignment in ciliates cannot be attributed to a particular region of the eRF1 polypeptide, not even to the domain involved in codon recognition” (2002, pp. 197-8). Frolova et al. (2002, 134) argue that “in eukaryotes, the decoding of stop codons within the ribosome is a complex process not yet understood.” See also Inagaki et al. 2002.