MtDNA Codon Usage
Abstract
Research into the nucleotide composition of metazoan mitochondrial genomes has revealed a large disparity in GC content among species. Several hypotheses have been proposed to explain how these interspecific differences in mitochondrial GC content arose. These include thermal stability, horizontal transfer, nucleotide availability, asymmetric replication and repair, abundance of tRNAs, metabolic rate, and tRNA binding affinity. We chose to investigate the effect of tRNA binding affinity on the codon usage of metazoan mitochondrial genomes. To do so, bioinformatic methods were used to analyse 910 whole metazoan mitochondrial genomes that were available in GenBank, and anticodon choice and codon usage were compared across all of these genomes. The results show that anticodon choice is highly similar within each genetic code and among different genetic codes. This suggests that tRNA binding affinity is not the major mechanism influencing the GC content of metazoan mitochondrial genomes, although it may still have a discernible effect. The results also show that the function of many tRNAs may be impaired. The function of these tRNAs is proposed to be restored either through RNA editing or by importing the host's tRNAs. The similarity of anticodon choice across different genetic codes also suggests that there is strong pressure for anticodon choice to remain the same.
Introduction
Metazoan mitochondrial genomes provide the ideal basis for large-scale analysis, mainly because there are a large number of them available for study. This is the consequence of a small size (relative to nuclear genomes) and a large copy number, which makes mitochondrial DNA easier to extract, purify, and sequence. They are very well characterised, with the majority of the genomes stored in GenBank having: 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, a control region (containing the D-loop and heavy strand origin of replication), and a light strand origin of replication (Clayton 1992; Morlais and Severson 2002; Saccone et al. 2002). They have regions of rapidly-evolving sequence that are useful for population-level studies and for resolving the relationships of recently diverged species, and regions of slowly-evolving sequence that are useful for inferring the relationships of anciently diverged species (Kim et al. 1998). The mitochondrial genomes utilise many genetic codes to translate their genes. The GenBank repository currently covers the vertebrate, cnidarian, invertebrate, echinoderm, and ascidian mitochondrial genetic codes. The yeast, flatworm, chlorophycean, trematode, Scenedesmus obliquus, and thraustochytrium mitochondrial genetic codes have also been discovered, but are not yet used by the whole mitochondrial genomes contained within GenBank. The bacterial and plant plastid genetic code, and the euplotid, yeast, and blepharisma nuclear genetic codes are also available from GenBank (Benson et al. 2002). A large copy number, and the need to import the majority of its proteins from external sources, means that changes to a mitochondrion's genetic code may not have a great impact on the fitness of its host. Thus we see many more different mitochondrial genetic codes than nuclear genetic codes (Knight et al. 2001). 
In mitochondrial genomes, the synonymous codons (codons that code for the same amino acid) in the genetic code can be grouped together into codon families, where the first two positions of the codons always contain the same nucleotide, and the last position, corresponding to the wobble position, is variable. This creates codon families containing up to 4 codons. Six-fold degenerate amino acids such as Leucine have two codon families, one containing four codons and the other containing two. Given that there are 22 tRNA genes in most metazoan mitochondrial genomes and 22 codon families in most mitochondrial genetic codes, it is expected that each tRNA gene would encode a tRNA that would recognise one codon family (Table 1).
The analysis of whole metazoan mitochondrial genomes has revealed that the GC content of these genomes can range from as little as 3% in the stingless bee (Melipona bicolor) to as much as 61% in the beaked salmon (Gonorynchus greyi). These differences have been attributed to directional mutation pressure (DMP) (Asakawa et al. 1991; Jermiin et al. 1994; Jermiin et al. 1995; Foster et al. 1997; Saccone et al. 1999).
First proposed by Sueoka and Freese in 1962 and subsequently modified by Jermiin et al. in 1995, DMP can be described as a non-random evolutionary force that causes the base composition of certain genomes to “drift” towards AT or GC richness (Sueoka 1962; Jermiin et al. 1996). It is believed to alter amino acid usage (Jermiin et al. 1994; Knight et al. 2001) and is proposed to be the primary cause of the interspecific compositional heterogeneity observed in metazoan mitochondria. The effect of DMP is most clearly seen in non-functional sequences, where substitutions are not known to have any effect on the fitness of an individual. The mechanisms causing DMP are unknown, but several hypotheses attempt to explain strand-specific compositional skew and biased usage of synonymous codons. These include:
• Thermal stability (Marmur and Doty 1959). There are three hydrogen bonds between G and C, whereas A and T share only two. For this reason, GC-rich sequences tend to be more stable and therefore have a higher melting temperature than AT-rich sequences. This hypothesis proposes that GC content may be influenced by the need for stability in hot environments, as seen among thermophilic bacteria.
• Horizontal transfer (Lawrence and Ochman 1998). Prokaryotes are known to acquire genes from other prokaryotes through horizontal, interspecific gene transfer. The transferred region retains its own base composition over a period of evolutionary time and can cause the average base composition of the genome to change during that period. While horizontal transfer might occur in mitochondria, this mechanism appears to be moving genes out of the mitochondrial genome and into the nuclear genome (Saccone et al. 2002).
• Nucleotide (dNTP) availability (Reyes et al. 1998). The availability of dNTPs during the replication of a mitochondrial genome may influence the base composition. In situations where one pool of dNTPs is low, the replication mechanism may attempt to substitute another type of dNTP. This hypothesis has not been experimentally investigated.
• Asymmetric replication and repair (Clayton 1992). During transcription, the light strand of a mitochondrial genome is exposed to the damaging effects of the highly oxidative environment inside a mitochondrion for longer than the heavy strand, as replication of the light strand begins later than replication of the heavy strand. Experimental evidence shows that the oxidative environment inside a mitochondrion causes the mutation of C to T (Beletskii and Bhagwat 1996). An increase of T on the non-transcribed strand results in an increase of A on the transcribed strand. This is also supported by evidence that the DNA is subject to strand-specific DMP (Jermiin et al. 1995).
• Abundance of tRNAs (Osawa 1995). The abundance of a particular type of tRNA is believed to affect the types of codon used. The more abundant a type of tRNA is (tRNAs using the same anticodon), the more likely it is that the cognate codon of its anticodon will have a higher usage than the other codons in the codon family. This has been observed in some nuclear genomes; however, each mitochondrial codon family is associated with only one type of tRNA. Therefore, this hypothesis is not applicable to mitochondria.
• Metabolic rate. The higher the metabolic rate of a mitochondrion, the more oxidising agents it produces; this is thought to lead to an increase in damage to the DNA sequence. However, evidence that organisms with slow metabolisms can have high rates of mitochondrial mutation suggests that metabolic rate may have only a small influence on mutation rates (Janke and Arnason 1997).
• tRNA binding affinity (Bulmer 1991). The types of tRNAs that are present in a mitochondrion are believed to affect the types of codon used to code for amino acids. In particular, the exact complement of a tRNA's anticodon (the cognate codon) will be used more than the other codons within the same family, due to a greater binding affinity and therefore possibly higher translational accuracy. For example, within the proline codon family, the possible anticodons are AGG, GGG, UGG, and CGG. If the proline tRNA of a mitochondrion uses the anticodon UGG, then it is expected that the codon CCA will have a higher usage than the codons CCU, CCC, and CCG.
It is believed that DMP acts uniformly throughout a genome, with possible exceptions due to structural considerations, such as methylation or histone binding (Wei 1998). Not all mutations become fixed, due to the action of purifying selection on deleterious changes. Therefore, the greatest effects of DMP will be observed in sites that are not subject to selection, such as intergenic sites, introns, and most third codon sites.
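The proline example above rests on a simple rule: with both sequences written 5' to 3', the cognate codon is the reverse complement of the anticodon. A minimal Python sketch of that rule follows. The function names are hypothetical, and only strict Watson-Crick pairing is modelled, so wobble pairing and base modifications are deliberately ignored.

```python
# Strict Watson-Crick pairing only; wobble and modified bases are not modelled.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def cognate_codon(anticodon):
    """Return the exact complement of an anticodon, both written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(anticodon.upper()))

def codons_in_family(anticodon):
    """All four codons sharing the first two positions of the cognate codon."""
    prefix = cognate_codon(anticodon)[:2]
    return [prefix + third for third in "UCAG"]

print(cognate_codon("UGG"))      # CCA
print(codons_in_family("UGG"))   # ['CCU', 'CCC', 'CCA', 'CCG']
```

Under the binding affinity hypothesis, the codon returned by cognate_codon is the one expected to dominate its family.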
The anticodon of a tRNA is able to bind to several different codons and not just the cognate codon. Crick proposed that the second and third positions of an anticodon must base-pair according to what are now the standard Watson and Crick base-pairing rules, but that the first position of an anticodon is allowed to “wobble” to some degree to pair with different types of bases in the third codon position; thus, the first anticodon position is called the “wobble position” (Crick 1966). By modelling the stereochemistry of the anticodon and codon molecules, Crick was able to propose basic rules for the different types of pairing that may occur in the wobble position. It soon became apparent, however, that his rules, although accurate, were incomplete. The modification of key bases in the tRNA molecule, and unusual bonding between the middle anticodon base and the invariant U immediately 5’ to the anticodon, allow many more combinations to occur (Table 2) (Yokoyama et al. 1985; Wolstenholme 1992; Osawa 1995; Watanabe 1997; Tomita et al. 1999; Auffinger and Westhof 2001; Knight et al. 2001; Westhof and Auffinger 2001; Yasukawa et al. 2001).
We hypothesise that, if anticodon binding affinity has a strong effect on codon usage, then the cognate codons of the anticodon set of each genome will have high usage. Because GC content varies widely among mitochondrial genomes of different species, we expect the anticodon sets to show high interspecific variation. Within a single genome, we expect each tRNA to carry the anticodon that is cognate to the most frequently used codon in its codon family.
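The within-genome prediction can be expressed as a simple check: for each codon family, take the most-used codon, derive the anticodon that would be its exact complement, and compare it with the anticodon actually observed. The sketch below is an illustration under stated assumptions; the function names and the codon counts are hypothetical and are not taken from the study.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def expected_anticodon(family_usage):
    """Anticodon (5'->3') that exactly complements the most-used codon."""
    # Ties are resolved in favour of the first codon listed.
    top_codon = max(family_usage, key=family_usage.get)
    return "".join(COMPLEMENT[base] for base in reversed(top_codon))

def fraction_matching(usage_by_family, anticodon_by_family):
    """Share of codon families whose observed anticodon fits the prediction."""
    matches = sum(
        expected_anticodon(usage) == anticodon_by_family[family]
        for family, usage in usage_by_family.items()
    )
    return matches / len(usage_by_family)

# Hypothetical codon counts for two families of one genome.
usage = {
    "Pro": {"CCU": 40, "CCC": 25, "CCA": 60, "CCG": 5},
    "Asp": {"GAU": 70, "GAC": 30},
}
anticodons = {"Pro": "UGG", "Asp": "GUC"}
print(fraction_matching(usage, anticodons))  # 0.5
```

In this toy case the Pro family fits the prediction (CCA is most used and UGG is its complement), while the Asp family does not (GAU is most used, but GUC is cognate to GAC).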
Method
The data set, comprising 910 whole metazoan mitochondrial genomes, was downloaded from GenBank. The genomes were analysed separately according to their associated genetic code in order to avoid differences arising from the use of different genetic codes. The 22 tRNA genes of each genome were extracted using a program designed and coded in-house.
The extracted tRNA sequences were folded into their secondary structure using a computer program called tRNAscan-SE (Lowe and Eddy 1997), which provides a 99-100% detection rate of true positives and a 0-1% detection rate of false positives when analysing nuclear genomes. However, the detection rate of tRNAscan-SE was notably lower when applied to organellar genomes. Thus the structures of only roughly 65% of all the tRNA sequences, covering 21 of the 22 different types of tRNA, were inferred. In order to infer the structure of the remaining type of tRNA, a program called mfold (Zuker 2003) was used. Multiple sequence alignments were then used to assign loop and stem structures, and anticodon positions, to the remaining tRNA sequences. The tRNA sequences were aligned only with others of the same tRNA type and genetic code (e.g., all vertebrate mitochondrial tRNAAla molecules were aligned together) using the computer program ClustalW (Thompson et al. 1994). The combination of these alignments with the corresponding inferred secondary structures allowed the identification of all the anticodons, as well as the stem and loop regions, for all but a few tRNA sequences.
The anticodon set for each genome was assembled by retrieving the predicted anticodon for each tRNA sequence from the alignments. The proportion of genomes using the same genetic code and the same anticodon set was then calculated. This proportion was then compared with the average codon usage for the entire genetic code. This information was used to determine if anticodon choice has an influence on interspecific heterogeneity in codon usage.
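The proportion calculation described above can be sketched as follows. This is not the in-house program, only a minimal illustration; the function name, the input layout (genome name mapped to a dictionary of tRNA type and anticodon), and the toy data are all assumptions.

```python
from collections import Counter

def anticodon_set_proportions(genomes):
    """Map each distinct anticodon set to the proportion of genomes using it.

    `genomes` maps a genome name to a dict of tRNA type -> anticodon.
    """
    # A frozenset of (tRNA type, anticodon) pairs is hashable and order-independent.
    counts = Counter(frozenset(anticodons.items()) for anticodons in genomes.values())
    total = sum(counts.values())
    return {anticodon_set: n / total for anticodon_set, n in counts.items()}

# Hypothetical toy data for genomes that share one genetic code.
genomes = {
    "genome_1": {"Ala": "UGC", "Asp": "GUC"},
    "genome_2": {"Ala": "UGC", "Asp": "GUC"},
    "genome_3": {"Ala": "UGC", "Asp": "GUC"},
    "genome_4": {"Ala": "CGC", "Asp": "GUC"},
}
proportions = anticodon_set_proportions(genomes)
majority = max(proportions, key=proportions.get)
print(sorted(majority), proportions[majority])
```

Here three of the four toy genomes share one anticodon set, so the majority set has a proportion of 0.75.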
Results
Except for minor variations occurring in some closely related mitochondria, all genomes using the same genetic code also used the same anticodon set.
Vertebrate Mitochondrial Genetic Code (Table 3)
There were 279 vertebrate mitochondrial genomes in the data set obtained from GenBank. The vertebrate mitochondrial genetic code has 22 codon families. All five marsupial mitochondrial genomes in the data set use GCC instead of GUC as the anticodon for tRNAAsp and do not have an identifiable tRNALys. The two sea toad (Chaunax) mitochondrial genomes use CGC instead of UGC as the anticodon for tRNAAla.
Cnidarian Mitochondrial Genetic Code (Table 4)
There were 2 cnidarian mitochondrial genomes in the data set obtained from GenBank. The cnidarian mitochondrial genetic code has 23 codon families. Only two of the available genomes use the mold/protozoan/coelenterate mitochondrial genetic code, both of which are cnidarian mitochondria. These have only tRNAMet(CAU) and tRNATrp(UCA) encoded in the mitochondria, so the other tRNA genes are assumed to be encoded in the nuclear genome and their products imported from the cytosol. The cnidarian mitochondria have severely reduced genomes.
Invertebrate Mitochondrial Genetic Code (Table 5)
There were 65 invertebrate mitochondrial genomes in the data set obtained from GenBank. The invertebrate mitochondrial genetic code has 22 codon families. The set of anticodons used by each genome is largely similar to the set used by vertebrate mitochondrial genomes. The secondary structure of tRNAAsn for one genome could not be inferred. The mitochondrial genome of Trichinella spiralis (a nematode) uses CGC instead of UGC as the anticodon for tRNAAla. Approximately half the invertebrate mitochondrial genomes use a UUU anticodon to decode Lys codons, while the other half use CUU. Although there is no clear division along phylogenetic lines, the majority of mitochondria using tRNALys(CUU) are arthropod mitochondria, while tRNALys(UUU) is used mostly by the remaining invertebrate mitochondria. Approximately half of the invertebrate mitochondria use a GCU anticodon to decode the Ser (AGN) family of codons, while the other half use UCU. Again, there is no clear division, although all nematode mitochondria and various arthropod mitochondria use tRNASer(UCU). Seven genomes, forming the majority of the nematode mitochondria, use ACG instead of UCG as the anticodon for tRNAArg.
Echinoderm Mitochondrial Genetic Code (Table 6)
There were 17 echinoderm mitochondrial genomes in the data set obtained from GenBank. The echinoderm mitochondrial genetic code has 22 codon families. Except for using tRNALys(CUU) instead of tRNALys(UUU), the echinoderm mitochondrial genomes use the same anticodon set as the vertebrate mitochondrial genomes. Despite being invertebrate mitochondria, the platyhelminth mitochondria are listed as using the echinoderm mitochondrial genetic code and have been analysed along with the echinoderm mitochondrial genomes. All five cestode mitochondrial genomes use ACG instead of UCG as the anticodon for tRNAArg.
Ascidian Mitochondrial Genetic Code (Table 7)
There were 3 completely sequenced ascidian mitochondrial genomes in the data set obtained from GenBank. The ascidian mitochondrial genetic code has 23 codon families and 24 tRNA genes, thus breaking the pattern of one tRNA per codon family. The amino acid Gly has an extra codon family: AGR. All ascidian genomes in this data set used two tRNAMet, one with CAU and the other with UAU as the anticodon.
Discussion
The first question to be addressed is whether or not the anticodon choice of metazoan mitochondrial genomes is responsible for the heterogeneity in codon usage observed between the genomes. From a theoretical viewpoint, an anticodon would bind more stably with its cognate codon, thereby increasing the accuracy of the translation process. Accepting this assumption, the results should reveal one outcome: the metazoan mitochondrial genomes have widely varying anticodon sets, where the anticodon choice of each genome closely mirrors the usage of the cognate codon. GC-rich genomes should have a high proportion of Gs and Cs in the wobble position of the anticodon, and AT-rich genomes a high proportion of As and Us. It is this assumption that will be reassessed in light of the results.
The results suggest that the wobble position of tRNA anticodons is under extreme pressure to remain the same, although there are a few exceptions that will be addressed below. In nearly all tRNA types (i.e., tRNAAsnAAY(GUU), tRNACysUGY(GCA), tRNAGlnCAR(UUG), tRNAGluGAR(UUC), tRNAGlyGGN(UCC), tRNAHisCAY(GUG), tRNAIleAUY(GAU), tRNALeuUUR(UAA), tRNALeuCUN(UAG), tRNAMetAUR(CAU), tRNAPheUUY(GAA), tRNAProCCN(UGG), tRNASerUCN(UGA), tRNAThrACN(UGU), tRNATrpUGR(UCA), tRNATyrUAY(GUA), tRNAValGUN(UAC)), the anticodon choice remains the same across all genomes that utilise the same genetic code and even among different genetic codes. A brief investigation into nuclear genomes shows that, among the tRNAs identified, there are several tRNAs for each codon family and a variety of anticodons are used. This contrasts with metazoan mitochondrial genomes, which often have only one tRNA per codon family and which use an extremely conserved set of anticodons. These results show that the heterogeneity in codon usage observed among mitochondrial genomes cannot, for the most part, be due to different anticodon sets, as most of them are identical. This is an extremely important observation, because it means that interspecific heterogeneity in codon usage, and hence also in nucleotide content at the three codon sites, must be due to other evolutionary forces or directional mutation pressures. If, as originally assumed, anticodon binding affinity has an effect on codon usage, this effect must be minimal or at least secondary to some stronger factor. With the exception of the vertebrate mitochondrial genetic code, there is no strong correlation between the cognate codons of the anticodon set and high codon usage. Instead, there appears to be a clearly defined hierarchy of base usage at the third codon site.
• Vertebrate: A > U > C > G
• Cnidarian: U > A > G > C
• Invertebrate: U > A > C > G
• Echinoderm: U > A > C > G
• Ascidian: U > A > G > C
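A hierarchy of this kind can be derived by counting which base occupies the third position of each codon and ranking the four bases by frequency. The Python sketch below shows one way to do so. It is an assumption about the calculation rather than the procedure used in the study, the function name is hypothetical, and the short coding sequence is invented for illustration.

```python
from collections import Counter

def third_position_hierarchy(coding_sequence):
    """Rank the four bases by how often they occupy the third codon position."""
    seq = coding_sequence.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i] for i in range(2, usable, 3))
    return sorted("ACGU", key=lambda base: counts[base], reverse=True)

# Hypothetical in-frame coding sequence, built from codons for readability.
cds = "".join(["CCA", "GCA", "ACU", "CCA", "GGC", "CUA", "AAU", "CCG", "GAC", "ACU"])
print(" > ".join(third_position_hierarchy(cds)))  # A > U > C > G
```

For a real comparison the counts would be pooled over all protein-coding genes of all genomes that share a genetic code.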
The hierarchy of base usage suggests that there are other mechanisms that have a greater influence over codon usage than anticodon binding affinity. These mechanisms may work with, or in opposition to, anticodon binding affinity when altering base usage. In the case of the mitochondrial genomes using the vertebrate mitochondrial genetic code, some form (or forms) of directional mutation pressure coincidentally, or otherwise, produces a similar bias to one that anticodon binding affinity would induce. Other research indicates that asymmetric replication and repair may be the main mechanism altering base usage (S. Y. W. Ho unpublished data). Although it is not the major force affecting codon usage, anticodon binding affinity may still have a small effect. This is supported by the observation that the cognate codons are usually the second most frequently used codons in each codon family.
Many phylogenetic groups exhibit identical anticodon sets that vary from the majority and would appear not to be able to recognise the full range of codons. These include:
• The sea toads (Chaunax tosaensis and Chaunax abei) and Trichinella spiralis (not related to the sea toads) all use tRNAAla(CGC) instead of tRNAAla(UGC). Currently, the wobble rules state that C and its modifications/derivatives are only able to pair with A or G (Table 2). However, the codons GCU and GCA are still being used (1.62% and 4.27% of the total number of codons, respectively). One solution is that C might be able to pair with all four bases. RNA editing or base modification may provide an alternative explanation by changing the wobble position to a more suitable nucleotide (Janke et al. 1994, Janke et al. 1997, Janke et al. 2002).
• All marsupial mitochondria appear to utilise tRNAAsp(GCC) instead of tRNAAsp(GUC). According to the current base-pairing rules, this would change the recognised codon family from Asp (GAY) to Gly (GGN); however, Asp is incorporated into 1.77% of the total protein product of marsupials. The tRNA's function is perhaps restored through RNA editing, which changes the middle C to a U (Janke et al. 1994, Janke et al. 1997, Janke et al. 2002).
• The tRNALys in marsupials appears to have a highly irregular structure and, if folded, would exhibit an anticodon that would recognise Tyr. However, the AAR codon must still be recognised and translated into Lys, as Lys is incorporated into 2.5% of the total protein product. Janke et al. (1997) proposed that tRNALys either undergoes RNA modification or is imported from the nucleus. However, it is argued that the metatherian mitochondrial tRNALys is unlikely to be modified, as it would take a minimum of five changes to make it functional, and it is therefore also unlikely that the gene is transcribed into a functional product. If so, then tRNALys is most likely another mitochondrial pseudo-gene whose function is provided by the nuclear genome (Bensasson et al. 2001).
• The variation that occurs in the cnidarian genomes is peculiar in that they appear to have only two functional tRNAs, tRNAMet and tRNATrp. Mitochondria are not thought to import tRNA from the nucleus (Roe et al. 1981), but in this case tRNAs must be imported to provide all the components required for translation (van Oppen et al. 2000). The two remaining tRNAs are required to compensate for the differences between the cnidarian genetic code and the standard genetic code. Tryptophan is encoded by both UGA and UGG in the cnidarian genetic code, while in the standard genetic code UGA is a stop codon and only UGG encodes tryptophan. Cnidarian mitochondrial tRNATrp is therefore required to translate UGA into Trp. tRNAMet may be required in cnidarian mitochondrial genomes to incorporate formyl-methionine (used by mitochondria as the first amino acid), because nuclear tRNAMet can only incorporate normal methionine.
• All nematodes, except for Trichinella spiralis, use tRNAArg(ACG) instead of the more widely used tRNAArg(UCG). Experimental evidence has demonstrated that it is possible for an unmodified A to confer four-fold degeneracy to the tRNA when in the wobble position, although it may be necessary for the tRNA to have m1A8 (methyl-1-adenosine in position 8 of the tRNA) for the wobble to occur (Watanabe et al. 1997).
• Although echinoderms use tRNAAsn(GUU), which should only decode the codons AAU and AAC, they frequently (1.73% of the total number of codons) decode the codon AAA into the amino acid Asn. This is made possible by RNA editing of the second position of the anticodon from U to pseudouridine (Ψ). The presence of Ψ in the second anticodon position appears to allow the three-fold degeneracy of the Asn codon family in echinoderms (Tomita et al. 1999).
In each of the above cases, RNA editing, base modification, or the import of tRNA from the host cell is proposed to restore the impaired function.
The anticodon sets of the invertebrate mitochondrial genomes can be approximately divided into two groups. Although the majority of the members of each group share the same anticodon set, there are some exceptions within each of the groups. The anticodon of tRNALys in genomes utilising the invertebrate mitochondrial genetic code is either UUU or CUU. The majority of arthropods use tRNALys(CUU), while most of the rest (annelids, nematodes, molluscs, cephalochordates, and brachiopods) use tRNALys(UUU). The anticodon of tRNASerAGN in genomes utilising the invertebrate mitochondrial genetic code is either GCU or UCU. All nematodes, annelids, and brachiopods use tRNASerAGN(UCU), while the majority of the rest use tRNASerAGN(GCU).
Throughout this survey, it has been noted that the changes that have occurred in the anticodon sets, although independent, are often of the same type. The anticodon for Arg has changed from UCG -> ACG in genomes using the invertebrate and echinoderm mitochondrial genetic codes, the anticodon for Ala has changed from UGC -> CGC in genomes using the vertebrate and invertebrate mitochondrial genetic codes, and the change of the Lys anticodon from CUU -> UUU appears to have arisen several times independently. There may be several explanations for this. One or more mechanisms could have caused the same anticodons to undergo the same change with relative regularity throughout evolutionary history. Alternatively, changes could have occurred across all codon families but been restricted through selection or other means. Some changes appear to be linked in some manner. In invertebrates, the change from tRNASer(GCU) to tRNASer(UCU) is often accompanied by the change from tRNALys(CUU) to tRNALys(UUU).
Conclusion
The majority of anticodon sets possessed by metazoan mitochondrial genomes are identical, which means that anticodon choice in metazoan mitochondria cannot be the cause of the heterogeneity in codon usage observed among the different metazoan mitochondrial genomes. However, anticodon binding affinity may still have an effect and cause some degree of homogeneity among the mitochondrial genomes. Asymmetric replication and repair appears to be a much more viable mechanism for causing interspecific heterogeneity. There appears to be a strong (possibly selective) pressure that prevents variation from being seen in the anticodon sets. Variations among the anticodon sets that do occur often appear to impair the translation of some codons. Possible mechanisms that counter this either repair or replace the lost functions. Such variations, although independent, are often similar, suggesting that they are facilitated by a common mechanism.
