

Computational Biology and Chemistry 41 (2012) 18–34

Contents lists available at SciVerse ScienceDirect

Computational Biology and Chemistry

journal homepage: www.elsevier.com/locate/compbiolchem

Research article

Overlapping genetic codes for overlapping frameshifted genes in Testudines, and Lepidochelys olivacea as special case

Hervé Seligmann a,b,∗

a Center for Ecological and Evolutionary Synthesis, Department of Biological Sciences, University of Oslo, Blindern, N-0316 Oslo, Norway
b Department of Ecology, Evolution and Behaviour, The Hebrew University of Jerusalem, Jerusalem 91404, Israel

Article info

Article history: Received 6 May 2011; received in revised form 14 March 2012; accepted 5 August 2012

Keywords: Codon reassignment; tRNA synthetase; Codon usage; Synonymous codon optimization; Homology; Ribosomal frameshift; Termination codon

Abstract

Mitochondrial genes code for additional proteins after +2 frameshifts by reassigning stops to code for amino acids, which defines overlapping genetic codes for overlapping genes. Turtles recode stops UAR → Trp and AGR → Lys (AGR → Gly in the marine Olive Ridley turtle, Lepidochelys olivacea). In Lepidochelys, the +2 frameshifted mitochondrial Cytb gene lacks stops, and open reading frames from other genes code for unknown proteins, and for regular mitochondrial proteins after frameshifts according to the overlapping genetic code. Lepidochelys' inversion between proteins coded by regular and overlapping genetic codes substantiates the existence of overlap coding. ND4 differs among Lepidochelys mitochondrial genomes: it is regular in DQ486893; in NC_011516, the open reading frame codes for another protein, and the regular ND4 protein is coded by the frameshifted sequence, reassigning stops as in other turtles. These systematic patterns are incompatible with Genbank/sequencing errors and DNA decay. Random mixing of synonymous codons, conserving main frame coding properties, shows that natural sequences are optimized for overlap coding; Ka/Ks analyses show high positive (directional) selection on overlapping genes. Tests based on circular genetic codes confirm programmed frameshifts in ND3 and ND4l genes, and predicted frameshift sites for overlap coding in Lepidochelys. Chelonian mitochondria adapt for overlapping gene expression: cloverleaf formation by antisense tRNAs with predicted anticodons matching stops coevolves with overlap coding; antisense tRNAs with predicted expanded anticodons (frameshift suppressor tRNAs) associate with frameshift-coding in ND3 and ND4l, a potential regulation of frameshifted overlap coding. Anaerobiosis may switch expression between regular and overlap coding genes in Lepidochelys.

1. Introduction

Genetic overprinting is the situation where two genes are coded by the same DNA sequence (Grassé, 1977). Such genes are also frequently called 'overlapping genes'; they code for two proteins when two frames of a DNA sequence are 'open reading frames', meaning that both are stopless (e.g., Delaye et al., 2008) and can be translated according to the genetic code into protein sequences. Overlapping genes, where two frames of the same sequence are stopless, are relatively rare. However, recent evidence suggests that upon expression of tRNAs with anticodons matching stop codons, numerous overlapping genes occur, at least in mitochondria (for primates, Seligmann, 2011a; Faure et al., 2011; and for Drosophila, Seligmann, 2012a,b). In these cases, overlap coding (which apparently occurs in 50% of the total length of regular mitochondrial protein coding genes) is enabled by switching to a different, probably stopless genetic code, induced by the presence and translational activity of antisense antitermination (or suppressor) tRNAs.

∗ Correspondence address: Department of Ecology, Evolution and Behaviour, The Hebrew University of Jerusalem, Jerusalem 91404, Israel.

E-mail address: [email protected]

1476-9271/$ – see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.compbiolchem.2012.08.002

The empirical evidence indicating the existence of these overlapping genes and the associated overlapping genetic codes in primates and in Drosophila is presented below. Here I present similar analyses for a third taxonomic group, Testudines. This taxon was chosen for a third study of overlap coding by overlapping genetic codes because, besides providing data that independently replicate the analyses from primates and Drosophila, the structure of its overlapping genes enables a new type of evidence for this phenomenon, of a more logical and less quantitative nature. Because the analyses in turtles present both quantitative and logical evidence, they are an important test of the hypothesis that overlap coding more than doubles the number of protein coding genes in mitochondrial genomes.

1.1. General background on vertebrate mitochondrial genomes

Vertebrate mitochondria possess highly compact genomes, with a conserved gene arrangement (Satoh et al., 2010). They typically include 13 protein coding genes, mostly associated with oxidative phosphorylation; all code for transmembrane proteins (Anderson et al., 1981). Genes are organized along the genome in the following order: ND1, ND2, CO1, CO2, AT8, AT6, CO3, ND3, ND4l, ND4, ND5, ND6 and Cytb. ND6 is the only gene coded by the heavy strand; the others are on the light strand. Most proteins necessary for the mitochondrion's 'housekeeping' are imported from the cytosol, coded by nuclear genes presumably transferred early in the organelle's symbiotic history to the host cell's nucleus. Beyond coding for proteins, protein coding genes also have other properties that help regulate translation, such as secondary structure formation (Seligmann and Pollock, 2003a; Krishnan et al., 2004a,b, 2008) and off frame stop codons (Seligmann and Pollock, 2003b, 2004; Seligmann, 2007).

1.2. Stop codons after frameshifts

Off frame stop codons terminate protein synthesis early after accidental ribosomal slippages, or frameshifts, which usually produce dysfunctional proteins (e.g., van Leeuwen et al., 2006). The organization of the genetic code maximizes densities of off frame stops (Itzkovitz and Alon, 2007), and selection at specific sites at third codon positions causes overrepresentation of off frame stops in frameshifted protein coding genes (Tse et al., 2010). In vertebrate mitochondria, densities of off frame stops are inversely proportional to stabilities of secondary structures formed by ribosomal RNAs (presumably inversely proportional to slippage frequency), and to developmental instability (Seligmann, 2010a). The functional importance of off frame stops is stressed by their systematic digression from deamination gradients (Seligmann, 2012a).
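Off frame stops can be counted directly from a gene sequence. The sketch below is illustrative only: the function name is hypothetical, and it assumes the four vertebrate mitochondrial stop codons (UAA, UAG, AGA, AGG, written in the DNA alphabet) and that "+1" and "+2" mean reading codons from one or two nucleotides downstream of the annotated start.

```python
# Vertebrate mitochondrial stop codons (NCBI translation table 2), DNA alphabet.
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def count_off_frame_stops(seq: str) -> dict:
    """Count stop codons in the 0, +1 and +2 frames of a coding sequence.

    Frame f reads codons starting at 0-based offset f; an incomplete
    trailing codon is ignored. RNA input (U) is converted to DNA (T).
    """
    seq = seq.upper().replace("U", "T")
    counts = {}
    for frame in (0, 1, 2):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        counts[frame] = sum(codon in VERT_MITO_STOPS for codon in codons)
    return counts


print(count_off_frame_stops("ATGGTAAGGC"))  # {0: 1, 1: 1, 2: 0}
```

In the toy sequence, frame 0 reads ATG GTA AGG (one AGG stop), frame +1 reads TGG TAA GGC (one TAA stop) and frame +2 reads GGT AAG (no stop).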

1.3. Readthrough at stops

However, recent analyses of +1 and +2 frameshifted sequences of mitochondrial protein coding genes reveal that about 30% of their length participates in same strand coding, despite the presence of off frame stops (in primates, Seligmann, 2011a; in Drosophila, Seligmann, 2012a). According to alignment analyses, stops in these overlapping genes are reassigned to code for amino acids (in primates, AGR → Arg; in Drosophila, UAR → Ser). The translational apparatus of mitochondria adapts for coding by overlapping genes that reassign stop codons to amino acids: the predicted anticodons of some antisense tRNAs (Seligmann, 2010b,c, 2011b) match stop codons (antitermination, or suppressor, tRNAs; Seligmann, 2010c). These suppressor tRNAs coevolve with overlap coding (Seligmann, 2011a, 2012b). In this author's view, this coevolution is the strongest bioinformatic evidence yet favoring the existence of overlap coding by overlapping genetic codes.

1.4. Replication gradients and overlapping genes

Other analyses confirm coding by overlapping genes. These consist of examining deamination gradients at third codon positions, which typically follow the replicational or translational time that the DNA region spends single stranded, following the principle that the longer a region remains single stranded, the more deaminations occur. Examination of mutational deamination levels (A → G and C → T) made it possible to describe genes that had remained undescribed until then (Glusman et al., 2006). Transitions do not alter the coding properties of third codon positions in vertebrate mitochondria. Hence third codon positions usually fit well the deamination mutation gradients due to replication, which result from the spontaneous chemistry of DNA in the single stranded state (Krishnan et al., 2004a,b; Seligmann et al., 2006; Seligmann and Krishnan, 2006; Seligmann, 2008; and Seligmann, 2010d). In Homo, third codon positions involved in overlap coding, and therefore also functioning as 1st or 2nd codon positions in the overlapping gene, do not fit deamination gradients (Seligmann, 2012a). These analyses confirm the existence of hidden coding sequences using a method totally different from alignments and conservation. This method even confirms the details of the coding structure of the overlapping gene, indicating whether a main frame 3rd codon position is expected to function as a 1st, 2nd or 3rd codon position in the overlapping gene: the digression from the expected deamination gradient is least when the nucleotides function as 3rd codon positions in the overlapping gene, and greatest when they function as 2nd codon positions, with intermediate levels for 1st codon positions.
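The codon-position bookkeeping behind this test can be made explicit. The sketch below (the function name is hypothetical, and it covers only same-strand overlaps) computes which codon position a main frame 3rd codon position occupies in a frame offset by 0, +1 or +2 nucleotides. It says nothing about how the deamination gradients themselves were fitted. Under this bookkeeping a main frame 3rd position is a 2nd position in a +1 overlapping frame and a 1st position in a +2 overlapping frame; it remains a 3rd position only where the overlapping sequence is back in the main frame.

```python
def role_of_main_third_position(overlap_frame: int) -> int:
    """Codon position (1, 2 or 3) that a main-frame 3rd codon position
    occupies in a same-strand reading frame offset by overlap_frame nucleotides.

    Main-frame codon k spans nucleotides 3k, 3k+1, 3k+2 (0-based), so its 3rd
    position is nucleotide 3k+2. Codons of a frame offset by f start at 3m+f,
    so nucleotide n sits at codon position ((n - f) mod 3) + 1.
    """
    f = overlap_frame % 3  # -1 and +2 denote the same frame
    return (2 - f) % 3 + 1


for frame in (0, 1, 2):
    print(f"frame +{frame}: position {role_of_main_third_position(frame)}")
# frame +0: position 3, frame +1: position 2, frame +2: position 1
```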

1.5. Expression of overlapping genes

Other evidence is not solely of computational nature: in Drosophila, computationally estimated adaptations of antisense tRNAs and overlapping genes for translational activity correlate with observed abundances of corresponding RNAs (Seligmann, 2012b); and monoclonal antibodies designed to detect a protein predicted by alignment analyses to be coded by an overlapping mitochondrial gene in CO1, GAU, detect a signal strictly localized in human mitochondria (Faure et al., 2011).

1.6. Frameshift coding in regular main frame genes

Note that frameshift recoding also exists for 'regular' genes, in particular in turtles. In Testudines, ND3 and ND4l include frameshift recoding, and some of these frameshifted sequences imply reassignments of stop codons to unknown amino acid(s) (Russell and Beckenbach, 2008). This situation, and the fact that stop codon reassignments vary among taxa, led me to analyze the frameshifted sequences of chelonian mitochondrial protein coding genes with special attention.

Preliminary examinations detected a so far unique situation in the mitochondrial genome of a sea turtle, the Olive Ridley, Lepidochelys olivacea (NC_011516): there are no off frame stops in the +2 frameshifted (equivalent to −1 frameshifted) sequence of Cytb (Seligmann, 2011d). In Cytb genes of other turtles, this frame contains about 25 off frame stops (standard deviation 3, for all complete turtle Cytb genes available in Genbank in November 2010, n = 66), suggesting that Cytb codes for an additional protein in Lepidochelys.
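The size of this departure can be expressed as a standardized distance. The arithmetic below uses the rounded mean (25) and standard deviation (3) quoted above, and reading the result as a probability would assume roughly normally distributed stop counts, which the text does not examine. The helper name is hypothetical.

```python
def z_score(observed: float, mean: float, sd: float) -> float:
    """Distance of an observation from a reference mean, in standard deviations."""
    return (observed - mean) / sd


# Frame offsets are defined modulo 3, so a +2 frameshift equals a -1 frameshift.
assert (-1) % 3 == 2

# Lepidochelys Cytb, +2 frame: 0 off frame stops.
# Other turtles (n = 66): mean about 25, standard deviation 3.
z = z_score(0, 25, 3)
print(f"z = {z:.2f}")  # prints "z = -8.33"
```

A Cytb lacking all off frame stops in this frame therefore lies more than eight standard deviations below the turtle mean.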

Here I examine off frame sequences of Cytb and other genes in Lepidochelys, and compare them with those of other turtles. Analyses detect a number of unusual coding properties of the mitochondrial protein coding genes of Lepidochelys, as well as overlapping genes and two overlapping genetic codes that reassign stop codons to amino acids during translation of these overlapping genes in Testudines. Various tests confirm the validity of these findings: simulations show that synonymous codon usages are optimized for overlap coding; analyses show that circular genetic code properties (Arqués and Michel, 1996, 1997; Michel, 2008; Ahmed and Michel, 2011; Gonzalez et al., 2011) associate with frameshifting sites in turtles, as observed for known overlapping genes (Ahmed et al., 2007, 2010; Ahmed and Michel, 2011); and antisense tRNAs coevolve with requirements for frameshifted overlap coding. In all, these various lines of evidence confirm overlap coding in turtle mitochondria, and show reassignments of stop codons to amino acids that differ from those previously suggested for primates and Drosophila (Seligmann, 2011b, 2012b).
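The simulations mentioned above are not detailed in this section. One plausible version, sketched here under stated assumptions, permutes codons among main frame positions that encode the same amino acid: the main frame protein and the gene's codon usage stay fixed, while the off frame sequence changes. If the natural gene has fewer +2 frame stops than most shuffles, that is consistent with optimization for overlap coding. The helper names (`synonymous_shuffle`, `shuffle_test`) are hypothetical, and the input is assumed to be an uppercase ACGT sequence whose length is a multiple of 3.

```python
import random
from collections import defaultdict

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial); codons ordered TTT, TTC, TTA, ..., GGG.
TABLE2_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: TABLE2_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOPS = {codon for codon, aa in CODON_TO_AA.items() if aa == "*"}  # TAA, TAG, AGA, AGG


def frame_stops(seq: str, frame: int) -> int:
    """Number of stop codons read in the given frame offset (0, 1 or 2)."""
    return sum(seq[i:i + 3] in STOPS for i in range(frame, len(seq) - 2, 3))


def synonymous_shuffle(seq: str, rng: random.Random) -> str:
    """Permute codons among main-frame positions encoding the same amino acid."""
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    positions = defaultdict(list)  # amino acid -> indices of codons encoding it
    for idx, codon in enumerate(codons):
        positions[CODON_TO_AA[codon]].append(idx)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def shuffle_test(seq: str, frame: int = 2, n: int = 1000, seed: int = 0):
    """Observed off frame stop count, and the fraction of n synonymous shuffles
    with as few or fewer stops (a one-sided empirical p-value)."""
    rng = random.Random(seed)
    observed = frame_stops(seq, frame)
    as_low = sum(
        frame_stops(synonymous_shuffle(seq, rng), frame) <= observed for _ in range(n)
    )
    return observed, as_low / n
```

Shuffling within each amino acid class, rather than drawing new codons from genome-wide frequencies, is a design choice that keeps each gene's own codon usage exactly; other null models are possible.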

2. Results

2.1. Lepidochelys: Cytb

The sequence of Cytb in Lepidochelys is unique among all (more than 200, mainly vertebrate) mitochondrial genomes examined, in that in two frames it totally lacks stop codons of any of the four types in the vertebrate mitochondrial genetic code (UAA, UAG, AGA and AGG). Interestingly, the annotation of the Cytb gene, in December 2010, does not present a protein sequence that aligns with other cytochrome B proteins, but an unknown protein. The actual Cytb gene that codes for cytochrome B is the +1 frameshifted sequence of the sequence as it is annotated in Genbank. This means that if the annotation correctly indicated the frame coding for cytochrome B, the overlapping (stopless) gene would be its +2 frameshifted sequence. A search for alignments by blast (Altschul et al., 1997, 2005), using blast's standard procedure with default parameter values, shows that the first 92 amino acids of this +2 frameshifted sequence fit, with some insertions, the first 77 amino acids of other chelonian cytochrome B proteins. This has two implications: (a) at least two variants of cytochrome B can be produced by that gene, one coded by the regular main frame, and one produced by the +2 frameshifted sequence, which requires a (programmed) frameshift to the main frame after reaching residue 92; and (b) this situation probably explains the inaccuracy of Genbank's annotation, as the leading region of the protein sequence translated from the frameshifted gene matches Cytb. The function of the protein sequence coded by the stopless rest (and major part) of the frameshifted Cytb sequence remains unknown. The amino acid composition of the putative protein coded by this frameshifted sequence suggests it is not a transmembrane protein (and hence not a variant form of cytochrome B), because it is not biased towards amino acids with high hydrophobicity. This amino acid composition does not minimize costs of protein synthesis by avoiding large amino acids (Seligmann, 2003); there is no avoidance of amino acids sensitive to oxidative agents (Archetti and Di Giulio, 2007), which is particularly surprising for mitochondria; nor is there a bias in favor of amino acids with high tendencies to form beta sheets or alpha helices in protein secondary structures. It is almost certain that this frameshifted open frame is translated and expressed, but further studies are necessary to confirm that it produces a functional protein, and to determine its physiological activities. If it is expressed, this should be under conditions of low oxidative potential. Alternatively, this protein could trap oxidative agents, protecting regular physiologically functional proteins.
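The hydrophobicity argument can be made quantitative. A common summary is the grand average of hydropathy (GRAVY) on the Kyte–Doolittle scale. The text does not say which scale or statistic it used, so this is a swapped-in, illustrative choice, and the function name is hypothetical. Transmembrane proteins such as cytochrome B typically score clearly higher than soluble proteins on this scale.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Mean Kyte-Doolittle hydropathy of a protein sequence.

    Residues not in the scale (e.g. '*' for unassigned stops, 'X') are skipped.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("no scorable residues")
    return sum(values) / len(values)


print(round(gravy("LIVF"), 3))   # 3.825, strongly hydrophobic
print(round(gravy("DEK*R"), 3))  # -3.85, strongly hydrophilic; '*' is skipped
```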

2.2. Programmed frameshifts in Lepidochelys: ND1, CO2–3, ND3, and ND5–6

I examined the +1 and +2 frameshifted sequences of all 12 other protein coding genes in the mitochondrial genome of Lepidochelys, translating these frameshifted sequences according to the vertebrate mitochondrial genetic code (using the online service EMBOSS transeq, http://www.ebi.ac.uk/Tools/emboss/transeq/index.html) and inserting asterisks (*) for stop codons. Hence the translated amino acid sequences putatively coded by the frameshifted genes do not contain any information on the identities of amino acids aligning with stop codons. The sequence coding for ND1, as annotated in Genbank, possesses only two regions, at its 5′ and 3′ extremities, recognized by blast as ND1. The +1 frameshifted sequence codes for a protein that aligns well with known chelonian ND1 sequences. Hence the regular ND1 protein is not coded by the open reading (stopless) sequence as annotated in Genbank, but by its +1 frameshifted sequence, which contains 8 stops (4 AGR and 4 UAR). This implies readthrough at stops and hence a probable reassignment of stops to amino acids, while the open reading frame, as annotated in Genbank, probably codes for variants of ND1, by programmed frameshifts between this frame and the frame that includes stops but aligns well with regular ND1 proteins. It is also possible that the sequence lacking stops but not aligning with ND1 proteins (the major part of the open reading frame) codes for a totally different protein of unknown function. Blast does not detect any alignments with the +2 frameshifted sequence of the gene as annotated in Genbank.
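The translation step described above can be sketched with NCBI translation table 2, the vertebrate mitochondrial code. This is not the EMBOSS transeq implementation: `translate_frame` and the optional `reassign` mapping are hypothetical helpers. Stops are written as '*', as in the procedure above, unless a reassignment is supplied, for example AGR → Gly as the abstract proposes for Lepidochelys.

```python
BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial); codons ordered TTT, TTC, TTA, ..., GGG.
TABLE2_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: TABLE2_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical reassignment: both AGR stops read as glycine.
AGR_TO_GLY = {"AGA": "G", "AGG": "G"}


def translate_frame(seq: str, frame: int = 0, reassign: dict = None) -> str:
    """Translate one frame (offset 0, 1 or 2) with the vertebrate mitochondrial code.

    Stops appear as '*' unless remapped by `reassign` (codon -> one-letter
    amino acid). Codons containing non-ACGT characters become 'X'.
    """
    table = dict(CODON_TO_AA)
    if reassign:
        table.update(reassign)
    seq = seq.upper().replace("U", "T")
    return "".join(table.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3))


print(translate_frame("ATGGTAAGGC"))                       # MV*
print(translate_frame("ATGGTAAGGC", reassign=AGR_TO_GLY))  # MVG
print(translate_frame("ATGGTAAGGC", frame=1))              # W*G
```

Comparing the '*' version with a reassigned version at the same alignment positions is one way to read off which amino acid a stop codon aligns with in other species.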


There was nothing exceptional in the various frames of ND2, AT8 and AT6: the annotated sequences code for the adequate proteins, and the frameshifted sequences did not align with any known protein sequences. These are the only three 'normal' protein coding genes in the mitochondrial genome of Lepidochelys.

Another case of probable programmed frameshift exists for CO2. The annotated open frame does code for a protein matching CO2, but a stopless, short region of the +1 frameshifted sequence (coding for residues 37–70) matches the homologous region coded by the open frame of CO2 in 32 other turtles (highest similarity, 98%, with Cyclemys atripons, ABJ99492).

Similar cases of alignments between regular proteins from other chelonians and +1 frameshifted sequences of Lepidochelys exist for longer regions in other genes. Residues 73–162 of the +1 frameshifted sequence of CO3 match the homologous residues of the open reading frame of CO3 in 28 other turtles (highest similarity, 95%, with Trachemys scripta, ACJ11685). This region differs from that in CO2 because it includes 4 AGR stop codons, which implies a reassignment of AGR stops to some amino acid(s).

For ND3, the situation is slightly more complex: residues 50–116 of the +1 frameshifted sequence match residues 1–67 of the open reading frames of ND3 from 11 other turtles (the highest similarity (85%) was in this case with ND3 of a bird, Premnoplex brunnescens (ADO17751), followed by 84% similarity with the ND3 protein from the chelonian Gopherus flavomarginatus (ABG74337)). Note that ND3 is the gene for which programmed frameshifts have been described previously (Mindell et al., 1998). This region includes an AGR stop in Lepidochelys. If this is a programmed frameshift involving the open reading frame and this +1 frame, it should produce an ND3 containing twice the functional regions encoded by the first half of the gene, and none of the second part of ND3.

For ND5, the open reading frame as annotated in Genbank codes for the regular ND5, but parts of both the +1 and +2 frameshifted sequences apparently code for programmed frameshifts: in the +1 frameshifted sequence, residues 219–591 match residues 217–589 in 34 other turtles (highest similarity, 87%, with Eretmochelys imbricata, ABF66063) and include 9 AGR stops. In the +2 frameshifted sequence of ND5 from Lepidochelys, 2 stopless regions (residues 26–120 and 235–256) align with open reading frame sequences of ND5 from 32 other turtles (highest similarities, 80% and 91%, with homologous regions from Chelonia mydas (NP_008774)).

In ND6 too, the annotated open reading frame codes for the regular ND6 protein, and regions from both the +1 and +2 frameshifted sequences align with homologous regions of ND6 from other turtles: residues 100–174 in the +1 frameshifted sequence (highest similarity, 65%, with C. mydas (NP_008775)) include one AGR stop codon; and residues 92–136 in the +2 frameshifted region (highest similarity, 87%, with E. imbricata (ABF66064)) include 2 AGR and 2 UAR stop codons.

2.3. Gene overlap and programmed frameshift in CO1 of Lepidochelys

The situation for CO1 is even more complex than in the previous genes, because the open reading frame annotated in Genbank does not match the regular protein coded by this gene, but does match various regions of CO1 genes from different, phylogenetically distant, mainly non-vertebrate species. This open reading frame does not align with regular CO1 proteins, and hence probably codes for totally different protein(s). The relation with the presumed CO1 from distant species is unclear and should be investigated independently. The regular CO1 is coded by a combination of +1 and +2 frameshifted sequences. For the +1 frameshifted sequence, residues 2–271 (including 7 AGR stop codons) and 478–507 match homologous residues from open reading frames of CO1 genes of 28 turtles, but the highest similarities were with CO1 genes of amphibian origin (highest similarity 94%, CO1 from the salamander Batrachuperus mustersi (ABC56114)). The second most similar CO1 was chelonian (Indotestudo elongata, ABF83436), followed by 28 amphibians, and only after that by the bulk of chelonian sequences. The second half of CO1 is coded by the +2 frameshifted sequence. Its residues 263–515 match homologous residues from 19 other turtles and include one AGA and one UAA stop codon. Residues 478–507 translated from each of the +1 and +2 frameshifted sequences match CO1 from other species. This suggests that combinations of programmed frameshifts could produce several similar, but not identical, CO1 proteins in Lepidochelys.
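Residue ranges reported for different frames are easier to compare in nucleotide coordinates. The sketch below assumes that residue 1 of a frame is the first complete codon starting at that frame's offset from the annotated gene start, which matches Table 1's footnote that residues are numbered from the start of the translated sequence. The function name is hypothetical.

```python
def residue_to_nucleotides(residue: int, frame: int) -> tuple:
    """1-based nucleotide span of a residue translated from a given frame.

    frame is the offset of the reading frame from the annotated gene start
    (0, 1 or 2); residue numbering starts at 1 in every frame.
    """
    if residue < 1 or frame not in (0, 1, 2):
        raise ValueError("residue must be >= 1 and frame in {0, 1, 2}")
    start = 3 * (residue - 1) + frame + 1
    return start, start + 2


# CO1 of Lepidochelys: +1 frame residues 2-271, +2 frame residues 263-515.
plus1 = (residue_to_nucleotides(2, 1)[0], residue_to_nucleotides(271, 1)[1])
plus2 = (residue_to_nucleotides(263, 2)[0], residue_to_nucleotides(515, 2)[1])
print(plus1, plus2)  # (5, 814) (789, 1547)
```

Under this numbering, the +1 and +2 regions share nucleotides 789–814. That stretch is where a switch from the +1 to the +2 frame could occur, if the table's numbering follows this convention.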

2.4. ND4l and ND4 in Lepidochelys

In two adjacent genes, ND4l and ND4, the open reading frame annotated in Genbank does not align with the corresponding proteins from other species, but probably codes for other, unknown proteins. The +1 frameshifted ND4l sequence of Lepidochelys aligns with proteins coded by the open reading frames of ND4l from 31 other turtles, and includes two UAA stop codons. It is possible that the 3′ region of the open reading frame codes for the first few residues of ND4l, but the major part of the protein is coded by the +1 frameshifted sequence. This is, as for ND3, a known case of recoding by frameshift in turtles (Russell and Beckenbach, 2008), and it confirms the method used to detect frameshift-coding on these known cases (ND3 and ND4l). Similarly, the open reading frame of ND4 annotated in Genbank does not match ND4 proteins from other species, but the +1 frameshifted sequence codes for the entirety of ND4, including 10 AGR stop codons. It matches the regular ND4 of 38 other turtles. Information about frames and alignments with other sequences is given in Table 1.

able 1rameshift-coding in protein coding genes of two mitochondrial genomes from Lepidoche

Genomea Geneb Framec Residuesd

DQ486893 ND1 0 1–254

NC 011516 ND1 0 1–323

DQ486893 ND1 +1 39–322

NC 011516 ND1 +1 39–344

CO1 0 Various regions

DQ486893 CO1 +1 2–274

NC 011516 CO1 +1 2–271 and 478–509CO1 +2 263–515

CO2 +1 37–70

CO2 +2

DQ486893 CO3 +1 73–162

NC 011516 CO3 +1 73–158

DQ486893 ND3 +1 58–116

NC 011516 ND3 +1 50–116

ND4l 0

ND4l +1 13–98

DQ486893 ND4 0 1–460

NC 011516 ND4 0 1–460

DQ486893 ND4 +2 1–459

NC 011516 ND4 +1 2–460

ND5 0 1–207

ND5 +1 219–591

ND5 +2 26–120 and 235–25ND6 0 1–140

DQ486893 ND6 +1 100–174

NC 011516 ND6 +1 76–174

ND6 +2 92–136

Cytb 0 1–92

Cytb +1 Various regions

Cytb +2 18–380

a Indicates which Lepidochelys genome when these differ.b Regular gene usually coded by open reading frame.c 0—stopless, open reading frame, +1 to +1 frameshifted sequence, +2 to +2 frameshifted Residues numbered from start of translated sequence aligning with proteins from Gee Identity of Genbank protein aligning with residues in “d”.f Mainly stop codon reassignments.

and Chemistry 41 (2012) 18– 34 21

2.5. Other alignments

The case of alignments between the open reading frame ofCO1 and different regions of CO1 from other distant, mainly non-vertebrate species remains unexplained. A similar situation existsfor the open reading frame of ND4. Other interesting and unex-plained cases are for the +2 frameshifted sequence of CO2, whichaligns (residues 81–209) with CO2 of the termite Coptotermes curvi-gnathus (CAH69508, residues 74–202, similarity 51%), and the +1frameshifted sequence of CytB, which also has several regionsthat align with other CytB proteins from distant organisms. Theseare probably convergent cases of programmed frameshifts, sug-gesting adaptive causes. The +2 frameshifted sequence of AT8is less mysterious: it aligns (residues 2–38) with a short regionof an anuran alpha-2-macroglobulin-like 1 from Xenopus tropi-calis (NP 001106498, 63% similarity for residues 156–192 in thatspecies). The putative protein translated from the +2 frameshiftedsequence of CO3 (residues 180–243) aligns with an unnamed pro-tein from the teleostean fish Tetraodon nigroviridis (CAG14365,similarity 74%, residues 3–66). It is unlikely that all these casesare statistical artefacts of blast’s alignment search procedure, someof them at least indicate overlap coding, some potentially for dif-ferent proteins, and some might indicate coding for programmedframeshifts producing different or very different variants of thesame functional protein.

2.6. The overlapping genetic code for overlapping genes in

Lepidochelys

[Table 1 (caption fragment: "…lys olivacea."): columns include "Alignment" and "Note", with notes such as "AGR → Gly" and "AGR → Lys, UAR → Trp". The table body is not recoverable from the extraction.]

An important aspect of the previous sections is that, if the mitochondrion of Lepidochelys is to fulfill its regular function, it has to express frameshifted sequences that include stop codons, because the proteins normally encoded by four genes (ND1, CO1, ND4l, and ND4) cannot be translated without readthrough at stop codons. This is a strong indication for stop codon reassignments to amino acids in the mitochondrial genome of this species. The open reading frames of five other genes (CO2, CO3, ND3, ND5 and ND6) code for the usual proteins associated with these genes, but also include presumed programmed frameshifts potentially coding for variants of these proteins; some of these also require readthrough at stop codons. The analyses of alignments for these 9 proteins can indicate the most likely candidate cognate amino acid for these stop codons, if a given stop codon in Lepidochelys systematically aligns with a given amino acid in other turtles. In this respect, potential overlapping sequences with proteins from distant species (e.g. a termite for CO2) are less informative, because amino acid replacements are likely to confound the codon's amino acid reassignment, but this is less relevant for alignments with proteins from other turtles. The frameshifted overlapping genes include 38 AGR stops (32 AGA), which align with glycine in other turtles at 34 sites (AGA, 29; AGG, 5). Two cases, one for AGA and one for AGG, aligned with lysine, one AGA mapped to an alanine, and another one to a gap. The fewer UAR codons (9 cases) do not yield a clear picture, and do not make it possible to decide whether codon reassignment occurred and, if so, to which amino acid. UAG mapped twice with tryptophan and once each with glycine and tyrosine. UAA mapped 4 times with gaps and once with leucine. The alignments with gaps suggest the possibility that UAR codons function as stops. This nevertheless defines a new vertebrate mitochondrial genetic code, where AGR is reassigned to glycine, as an alignment of 34 of the 38 AGR codons (about 90%) with glycine cannot plausibly result from chance, sequencing artefacts, annotation errors, or DNA decay taphonomy. Fig. 1 shows the alignment for the +1 frameshifted ND4 of Lepidochelys with the regular ND4 from other turtles. It is possible that this genetic code functions without stops, or that UAR stops are ambiguous, sometimes coding for some unknown amino acid, and sometimes for a stop. This situation is not surprising, as recent advances on termination of mitochondrial translation (Lightowlers and Chrzanowska-Lightowlers, 2010) imply that translational termination might not be well understood.
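The inference from the AGR and UAR alignment counts above amounts to a majority-vote tally: for each stop codon class, count the amino acids it aligns with in other turtles and check whether one amino acid dominates. The sketch below encodes only the totals reported in the text (per-site data are not given here) and adds a rough chance check. The null probability `p_null` that a site aligns with glycine by chance is a hypothetical placeholder, not a value from this study; a real test would need to estimate it, for example from glycine frequency in the aligned turtle proteins. Function names are hypothetical.

```python
from collections import Counter
from math import comb

# Counts of (stop codon, residue aligned in other turtles) reported in the
# text above; "-" stands for an alignment gap.
REPORTED = {
    ("AGA", "G"): 29, ("AGA", "K"): 1, ("AGA", "A"): 1, ("AGA", "-"): 1,
    ("AGG", "G"): 5, ("AGG", "K"): 1,
    ("UAG", "W"): 2, ("UAG", "G"): 1, ("UAG", "Y"): 1,
    ("UAA", "-"): 4, ("UAA", "L"): 1,
}
CODON_CLASS = {"AGA": "AGR", "AGG": "AGR", "UAA": "UAR", "UAG": "UAR"}


def tally(pair_counts):
    """Group aligned residues by stop-codon class: {class: Counter}."""
    by_class = {}
    for (codon, residue), n in pair_counts.items():
        by_class.setdefault(CODON_CLASS[codon], Counter())[residue] += n
    return by_class


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


counts = tally(REPORTED)
for codon_class, residues in counts.items():
    top, n_top = residues.most_common(1)[0]
    total = sum(residues.values())
    print(f"{codon_class}: {top} at {n_top}/{total} sites ({n_top / total:.1%})")

# HYPOTHETICAL per-site chance of aligning with Gly; not a value from the study.
p_null = 0.1
agr = counts["AGR"]
print("P(X >= 34 | n = 38, p_null) =",
      binomial_upper_tail(agr["G"], sum(agr.values()), p_null))
```

With these counts, AGR aligns with glycine at 34 of 38 sites (89.5%, the "about 90%" above), whereas the most frequent UAR partner is a gap (4 of 9 sites), matching the weaker and more ambiguous evidence for UAR.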

2.7. Variation within Lepidochelys and switches between frames and codes

Two complete mitochondrial genomes from Lepidochelys olivacea are available in Genbank (NC 011516, DQ486893). For most genes, alignment analyses yielded similar results about frames for both genomes, but Table 1 shows a significant difference between these two genomes for gene ND4. In ND4 from DQ486893, the open reading frame codes for the regular ND4-encoded protein, while in NC 011516 it codes for an unknown protein (as described in previous sections). In the latter genome, the regular ND4 is coded by the +1 frameshifted sequence, which aligns well with the protein coded by the main frame of ND4 from DQ486893. The alignment clearly reassigns AGR codons to glycine, as found for other frameshifted mitochondrial protein coding gene sequences from Lepidochelys.

The protein translated from the +2 frameshifted sequence of ND4 of DQ486893 aligns with the protein coded by the open reading frame of ND4 from NC 011516. In this case, AGR codons match lysine, and UAR codons match tryptophan. This means that coding in ND4 is switched between frames in NC 011516, as it is for many other genes in Lepidochelys. However, for ND4 from DQ486893, coding is as usual for the open reading frame. The Genbank records do not indicate the origins of the two genomes. These could be from different individuals with different age, sex, or geographic origins (suggesting in the latter case species level differences), or from different organs of the same individual. The matter deserves further investigation. However, both similarities and differences between the two Lepidochelys genomes follow systematic patterns that cannot result from sequencing or annotation errors, nor from DNA denaturation during DNA decay in dead or dying animals: such phenomena would not systematically alter stop codons to code for Lys and Trp rather than for a variety of other amino acids, and such alterations would not be restricted to these two amino acids and to a single gene, ND4 in DQ486893.

2.8. Circular genetic codes and frameshift sites

Though programmed frameshifts require a clear mechanism causing ribosomal frameshift (Farabaugh, 1996), most known cases of frameshift recoding are detected by chance. However, a yet unexplained statistical association exists between frameshifts and the usage of the most commonly used codons, which form circular genetic codes (Ahmed et al., 2007). The latter codons are avoided at frameshifting sites, while the trinucleotides AAA, CCC, GGG and UUU seem favored at and around sites causing ribosomal frameshifts, following the principle that DNA repeats produce insertions and deletions by polymerases (e.g. Ripley et al., 1986).

This negative association between frameshifts and circular codes can be confirmed for turtle mitochondria by examining the frameshifting sites that are known to exist within ND3 and ND4l. The alignment method described in previous sections was used to detect the frameshifting codon in these genes, in all complete turtle mitochondrial genomes available in Genbank. All three frames of these two genes were examined in these species, according to the main circular genetic codes associated with each frame. Trinucleotides matching AAA, CCC, GGG or UUU were scored −1, those matching codons belonging to the circular genetic codes were scored 1, and other codons were scored 0. All three frames of each gene were examined according to each of the three circular codes (called X0, X1 and X2, each corresponding to one frame), because the main frame moves between frames, and these genes are a mixture of frames. For ND3, across turtle species, the mean score at codons where frameshifts are predicted is, as expected, lower than at other positions according to each of the three circular codes (one-sided t-tests; X0, P = 0.0001; X1, P = 0.00044; X2, P = 0.0006). Results were not significant for ND4l (X0, P = 0.42; X1, P = 0.67; X2, P = 0.61), but combining the P values from both genes with Fisher's method confirms that predicted frameshift positions in ND3 and ND4l differ from other positions in relation to the main 'C3' circular genetic codes (X0, P = 0.0005; X1, P = 0.0003; X2, P = 0.0003). Because these are known frameshift positions, this test confirms that the method based on circular codes can be applied to confirm new frameshifting sites predicted in other Lepidochelys genes. The average circular code score for predicted frameshift positions in Lepidochelys (X0, 0.26 ± 1.54; X1, 0.39 ± 1.41; X2, 0.09 ± 1.24; N = 23) was, as expected, lower than for other positions (X0, 0.72 ± 0.9, P = 0.0074; X1, 0.69 ± 0.94, P = 0.068; X2, 0.72 ± 0.98, P = 0.0011; N = 3782; one-sided t-tests).
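The scoring rule described above can be written as a small function. This is a sketch, not the author's code: `score_codon`, `frame_codes` and `mean_scores` are hypothetical names, the circular code is passed in as a set of trinucleotides, and X1 and X2 are derived from X0 by circularly permuting each trinucleotide (abc → bca). That relation between the three frame-associated codes is an assumption here and should be checked against Ahmed et al. (2007). The two-trinucleotide code used in the test is a toy, not the real X0.

```python
from statistics import mean

# Trinucleotide repeats scored -1 (RNA alphabet).
HOMOPOLYMERS = {"AAA", "CCC", "GGG", "UUU"}


def _rna(seq):
    """Upper-case and convert T to U so DNA and RNA input compare equal."""
    return seq.upper().replace("T", "U")


def permute(code):
    """Circularly permute every trinucleotide of a code: abc -> bca."""
    return {t[1:] + t[0] for t in code}


def frame_codes(x0):
    """Return (X0, X1, X2), ASSUMING X1 = P(X0) and X2 = P(P(X0))."""
    x0 = {_rna(t) for t in x0}
    x1 = permute(x0)
    return x0, x1, permute(x1)


def score_codon(codon, circular_code):
    """-1 for AAA/CCC/GGG/UUU, 1 for a member of the circular code, else 0."""
    codon = _rna(codon)
    if codon in HOMOPOLYMERS:
        return -1
    return 1 if codon in {_rna(t) for t in circular_code} else 0


def mean_scores(codons, shift_positions, circular_code):
    """Mean score at predicted frameshift codons and at all other codons.

    `shift_positions` holds 0-based codon indices of predicted frameshifts.
    """
    at_shift, elsewhere = [], []
    for i, codon in enumerate(codons):
        target = at_shift if i in shift_positions else elsewhere
        target.append(score_codon(codon, circular_code))
    return mean(at_shift), mean(elsewhere)
```

The sketch returns only the two group means; the per-codon scores of each group could then be compared with a one-sided two-sample t-test, for instance `scipy.stats.ttest_ind(..., alternative="less")` in SciPy 1.6 or later.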
This test independently confirms the frameshifts predicted by the alignment methods used in previous sections. It has the caveat that the cause for the association between frameshifts and circular codes is not yet understood. It seems that frameshifts associate with the ambiguity of repeats, and that frameshifting associates with rare codons, as observed before (Mindell et al., 1998). This result is nevertheless important from a heuristic point of view, as it is an independent substantiation of the predicted frameshifts.
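Fisher's method, used above to combine the ND3 and ND4l P values, sums −2 ln p over the k tests and compares that sum with a χ² distribution with 2k degrees of freedom. For an even number of degrees of freedom the χ² survival function has a closed form, so a sketch needs only the standard library. `fisher_combine` is a hypothetical helper name. Applied to the X0 values reported for ND3 (P = 0.0001) and ND4l (P = 0.42), it returns about 0.00047, consistent with the combined X0 value of P = 0.0005 given above.

```python
import math


def fisher_combine(p_values):
    """Fisher's combined probability test.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the null; for even df its survival function is
    exp(-X/2) * sum_{j<k} (X/2)**j / j!.
    """
    k = len(p_values)
    if k == 0 or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("need at least one P value in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j  # (X/2)**j / j!
        total += term
    return math.exp(-half_x) * total


if __name__ == "__main__":
    # One-sided t-test P values reported for X0 in ND3 and ND4l.
    print(round(fisher_combine([0.0001, 0.42]), 5))
```

SciPy users can obtain the same statistic with `scipy.stats.combine_pvalues(pvalues, method="fisher")`.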

2.9. Antisense antitermination tRNAs in Lepidochelys

[Fig. 1 (alignment in three panels): ND4, frame +1, 2-460, UAR-X; AGR-Z. The alignment rows are not recoverable from the extraction.]

Reassignment of stops to amino acids implies that stop codons are read by tRNAs with anticodons matching stop codons. This possibility exists for AGR stops, when cytosolic tRNAs are imported into the mitochondrion (Schneider and Drouard-Maréchal, 2000; Bhattacharyya and Adhya, 2004; Mager-Heckel et al., 2008; Rubio et al., 2008; Duchène et al., 2009), but these cytosolic tRNAs are probably loaded with arginine, the amino acid that matches AGR in nuclear genes. It is possible that the relevant mitochondrial tRNA synthetase differs in this respect and aminoacylates glycine to these tRNAs from the cytosol. However, a combination of further bioinformatic analyses indicates possible mitochondrial origins for tRNAs matching stops. It is likely that antisense tRNAs are expressed at least in some conditions (Seligmann, 2010b, 2011b), and some of these have anticodons matching stops (antisense antitermination tRNAs; Seligmann, 2010c). Running the tRNA search program tRNAscan-SE (http://lowelab.ucsc.edu/tRNAscan-SE/, Lowe and Eddy, 1997; Schattner et al., 2005) on the antisense sequences of the sense tRNAs of Lepidochelys (after decreasing the COVE cutoff value to −100) detects, for the antisense of tRNA CUA (anticodon UAG, matches leucine), an anticodon which is the exact inverse complement of the sense anticodon (anticodon CUA). This antisense anticodon matches stop codon UAG. For another antisense tRNA (of sense tRNA Ser UCA), the anticodon is not the exact inverse complement of the anticodon of the sense tRNA (a case of sense-antisense anticodon asymmetry, see Seligmann, 2010b,c, 2011b); the anticodon's location is shifted and corresponds to stop codon AGG.
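The relation between a sense anticodon, the corresponding antisense anticodon and the codon it reads is reverse complementation, which a short sketch makes explicit. It assumes strict Watson–Crick pairing (no wobble), RNA written 5'→3', and only the "exact inverse complement" case described above; the shifted anticodon of the antisense of tRNA Ser UCA cannot be derived this way. The function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Vertebrate mitochondrial stop codons (NCBI translation table 2).
MITO_STOPS = {"UAA", "UAG", "AGA", "AGG"}


def reverse_complement(rna):
    """Reverse complement of a 5'->3' RNA string (T is read as U)."""
    rna = rna.upper().replace("T", "U")
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def codon_read_by(anticodon):
    """Codon (5'->3') paired by an anticodon (5'->3'), ignoring wobble."""
    return reverse_complement(anticodon)


def antisense_anticodon(sense_anticodon):
    """Anticodon at the same site on the antisense strand (exact case only)."""
    return reverse_complement(sense_anticodon)


if __name__ == "__main__":
    sense = "UAG"  # anticodon of the Leu tRNA that reads codon CUA
    anti = antisense_anticodon(sense)
    codon = codon_read_by(anti)
    print(sense, "->", anti, "reads", codon,
          "(stop)" if codon in MITO_STOPS else "(sense)")
```

Because reverse complementation is its own inverse, an antisense tRNA whose anticodon is the exact inverse complement of the sense anticodon reads a codon with the same sequence as the sense anticodon; for sense anticodon UAG, that codon is the stop UAG.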

A further bioinformatic application, tfam (http://tfam.lcb.uu.se, Tåquist et al., 2007), aligns input sequences of putative tRNAs with a collection of tRNAs with experimentally known cognate amino acid. Comparisons between the similarities of the input sequence and each functional group of reference tRNAs suggest which is the cognate of the input tRNA. This prediction is based on the primary structure (the linear sequence), and does not use cloverleaf secondary structure prediction to detect anticodons in anticodon loops as done by tRNAscan-SE. According to tfam's alignments, the antisense sequence of sense tRNA Ser UCN is most similar to the functional tRNA group with cognate glycine. This result also matches well with other analyses (see last figure in Seligmann, 2011b) indicating that the antisense of tRNA Ser UCN (at least in primates) is the antisense tRNA most adapted for translational activity. Hence all the evidence available (alignments between protein sequences and alignments between tRNA sequences) clearly indicates that AGR codons are reassigned to glycine in the overlapping genetic code of Lepidochelys. For the antisense of tRNA CUA, the result is less clear-cut, but a positive association exists between the similarities with functional tRNA groups output by tfam and the number of times their cognate amino acid was found matching UAR codons in the alignments described in previous sections. Accordingly, tryptophan would be the most probable amino acid for UAR codon reassignments in Lepidochelys, though the evidence is weak and the reassignment of UAR to tryptophan in Lepidochelys should be considered speculative at this point.

.10. Overlapping genes in other turtles

It is important to note that blast’s alignment search can onlyetect alignments with proteins that are in Genbank’s database. For

ig. 1. Alignment of the putative protein translated from the +1 frameshifted sequence

onian ND4 proteins, as detected by Genbank’s blast. The first row marked by ‘Lepi’ is forf Lepidochelys olivacea (DQ486893) in the second line. Sites matching stop codons UAR aatch stop codons in Lepidochelys. Other turtles are: Eret – Eretmochelys imbricata, NC

her – Testudo hermanni, DQ080046; Ifor – Indotestudo forstenii, NC 007696; Tkle – Testuerpentina, NC 011198; Mtor – Malacochersus tormieri, NC 007700; Mimp – Manouria imprata, NC 007695; Memy – Manouria emys, NC 007693; Gpar – Geochelone pardalis, NC 00CB47140*; Tscr – Trachemys scripta, NC 011573; Thor – Testudo horsfieldii, NC 007697; Cpalbinifrons, NC 014102; Camb – Cuora amboiensis; Cree – Chinemys reevesi, NC 006082; Aftata, NC 009509; Psin – Pelodiscus sinensis, NC 006132; Pmou – Pixidea mouhotii, NC 0109Cuora flavimarginata, NC 012054. Other complete genomes used in this study but not paogania subplana, NC 002780; Kleu – Kinosternon leucostomum, NC 014577; Lpun – Lyssem

ubrufa, NC 001947; Squa – Sacalia quadriocellata, NC 011819; Ttyr – Tryonix triunguis, NC

and Chemistry 41 (2012) 18– 34 25

example, searching for alignments with the protein sequence of a chelonian cytochrome b cannot retrieve the cytochrome b of Lepidochelys because, as the gene is annotated in Genbank at present (December 2010), the protein it codes differs considerably from cytochrome b. Only the +1 frameshifted sequence of the gene (relative to the current annotation) matches cytochrome b, but this protein sequence is not yet in Genbank's database: the gene has two overlapping open reading frames, and only one of them is annotated. To detect potential overlapping genes in turtles other than Lepidochelys (only for species with a complete mitochondrial genome available in Genbank), I also translated the frameshifted sequences of all their protein coding genes and used blast, as done for the Lepidochelys sequences, to search for matching protein sequences in Genbank. In most cases, the proteins matching frameshifted sequences were the proteins translated from the open reading frames of Lepidochelys described in previous sections, which are not the typical conserved proteins coded by these mitochondrial genes. Such alignments were detected in a large percentage of turtle species, with proteins from most Lepidochelys genes (frames of the frameshifted proteins are indicated between parentheses): ND1 (+2), CO1 (+2), CO2 (+2), CO3 (+2), ND3 (+1, +2), ND4l (+2), ND4 (+2), ND5 (+1, +2), ND6 (+1, +2) and Cytb (+1). In most cases, the overlapping gene is in the +2 frameshifted sequence. Note that this method occasionally detects programmed frameshifts. For example, residues 43–76 of the +1 frameshifted sequence of ND2 in Carettochelys insculpta (ACO83360) align with 85% similarity with homologous sites in the protein coded by the regular open reading frame of the ND2 gene of C. mydas (NP 008765) and of 8 other turtles (with lower similarities). This frameshifted sequence in Carettochelys is stopless. The data collected across all genes and species could enable a systematic analysis of these cases in terms of species, genes and regions of genes, as well as of the evolution of these programmed frameshifts and of the properties of the nucleotide sequences associated with them (notably their primary and secondary structures, the presence or absence of stops, and their associations with circular codes), all topics of great interest. Here I focus on results indicating the existence of overlapping genes and overlapping genetic codes, although these cases are not necessarily stopless like the ND2 example of Carettochelys described above (see also the ND3 examples in Russell and Beckenbach, 2008). Alignments with proteins of non-chelonian origin, though also interesting, are not included in these analyses because they are less useful for detecting stop codon reassignments to amino acids.
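The screening step above begins by translating each gene in its +1 and +2 frames. The following is a minimal Python sketch, assuming the NCBI vertebrate mitochondrial code (translation table 2), that frame +k means starting translation k nucleotides downstream of the annotated start, and the labelling convention of Fig. 2, in which frameshifted UAR (TAA/TAG) codons are written X and AGR (AGA/AGG) codons Z instead of being treated as stops. `translate_frame` is a hypothetical helper; the blast search itself is not shown.

```python
# Vertebrate mitochondrial code (NCBI translation table 2); codons are indexed
# with the bases T, C, A, G at each of the three positions.
BASES = "TCAG"
TABLE2_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: TABLE2_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_frame(dna: str, frame: int) -> str:
    """Translate `dna` starting `frame` nucleotides downstream (0, 1 or 2).

    Stops are labelled by codon family, as in Fig. 2: UAR -> 'X', AGR -> 'Z'.
    Trailing bases that do not fill a complete codon are dropped.
    """
    seq = dna.upper().replace("U", "T")
    protein = []
    for pos in range(frame, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        if codon in ("TAA", "TAG"):
            protein.append("X")
        elif codon in ("AGA", "AGG"):
            protein.append("Z")
        else:
            # '?' marks codons with ambiguous bases such as N
            protein.append(CODON_TABLE.get(codon, "?"))
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGTAGAGGTGA"  # toy sequence, not real turtle data
    for frame in (0, 1, 2):
        print(f"+{frame}: {translate_frame(gene, frame)}")
```

On the toy sequence this prints `+0: MXZW`, `+1: CZG` and `+2: VEV`. Writing stops as family-specific symbols, rather than truncating at them, keeps the full frameshifted protein available for alignment, which is what the reassignment analyses below rely on.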

The frameshifted sequences typically contain stops. Hence, if these proteins are to be expressed (at least occasionally) in turtle species other than Lepidochelys, as they apparently are routinely in Lepidochelys (because they match the regular open reading frames in Lepidochelys), stop codons have to be reassigned in these turtle species too. The alignments with the proteins translated from the open reading frame of Lepidochelys can indicate stop codon

of gene ND4 in the Olive Ridley Lepidochelys olivacea (NC 011516) with other che-
the sequence from NC 011516, which aligns with the regular open reading frame
re indicated by X, those matching AGR are indicated by Z. Underlined amino acids
012398; Cmyd – Chelonia mydas, NC 000886; Tgra – Testudo graeca, NC 007692;
do kleinmanni, NC 007699; Tmar – Testudo marginata, NC 007698; Cser – Chelydra
essa, NC 011815; Mtem – Mauremys temminckii, NC 009260; Ielo – Indotestudo elon-
7694; Pmeg – Platysternon megacephalum, NC 007970; Mter – Malaclemys terrapin,
ic – Chrysemys picta, NC 002073; Mmut – Mauremys mutica, NC 009330; Cgal – Cuora
er – Apalone ferox, NC 014054; Cpan – Cuora pani, NC 014401; Caur – Cuora aurocap-
73; Pste – Palea steindachneri, NC 013841; Catr – Cyclemys atripons, NC 010970; Cfl
art of the alignment in this figure: Cins – Carettochelys insculpta, NC 014048; Dsub –
ys punctata, NC 012414; Mmut – Mauremys mutica, NC 009330; Psub – Pelomedusa

012833. *Species for which no complete mitochondrial genome is available.


(a) Frame +1 of cytochrome b, Chelonia mydas

Chelonia       19  SSTYQAPPTSLHDETSDHYXPPAXHYKLSPESFXQYTTHQTSPXLFHQLLTSPEMYNTAD  78
Lepidochelys   20  SSIYQAPLTSLHDETSDHYWPPAWHYKSLLESSWQYITHQMSPWPFHQSPMSPETYNTDD  79

Chelonia       79  LSVTYMLTAPPYSSYVSTSMLDEEFTTVPTYMKKPETLESSFYYXSXLPHSXATSYHEDK  138
Lepidochelys   80  LSVTHMPTEHPYSSYASTSTLGEESTTVPTFTKKPETPELSSYYWWWPLHLWATSYHEDK  139

Chelonia      139  YHFEGPPSSQTYSQPSHTSATHXYNESEEGFQXTMQPXPDSSPSTSYYHLPLPALQQYIY  198
Lepidochelys  140  YHFEGPPSLPTYYQLFHTSEMHWYNESEGDSQWTTQPWPDSSPSTSYYHSPLQVLQWYIY  199

Chelonia      199  YSCTKQDQTTQQDXIQMPTKSPSTPTSPTKTYXDSFXYXLSSXPXHFSPPTYXETQTTSH  258
Lepidochelys  200  YSYTKQAQTTQQDWTQTLTKSLSTPTSPTKTYWDLSWYWPFSWPWHFSPHTYWETQTTSH  259

Chelonia      259  QPTLYPLLPTSNQNDTSYLPTQSYDQSQTNXAEYXPYYSPSXFYSXYPPYTRQNNEQPHS  318
Lepidochelys  260  QLTPYPPLPTSNQNDTSYSPMQSYDQFQTNWAEYWPYYPPFWYYSWYPPYTHQNNEQLHF  319

Chelonia      319  DHSPKSYSDPXXLTSXYXHESEDNQSKTHSLSSAKXPLHFTSXSYYSLYLLQVXSKTKY  377
Lepidochelys  320  DHSPKSYFDPWWLIYWYWHESEANPSKIHLSPLAKQPLAFTSWFYFSSYPWRAWSKTKY  378

(b) Frame +2 of ND4, Chelonia mydas (UAR-X; AGR-Z)

Chelonia        5  TPNNYTITNNHIMXTKTTMTFYTNPXPNNRHSKFTMIXTLHZTNHKLLQLLPZGZPNFSP  64
Lepidochelys    6  TPNNYITTNNHTMWTKTTMTFHINPQPNNCHSKLTMIWTFHKTNHKLFQWLYKNKPNLSS  65

Chelonia       65  FTNPIMLTYPPNNLSQPKPFNHWTNFTKTNLYLHHYPTTNLTNSSFLNHZINYIFYHIWN  124
Lepidochelys   66  TTNPIMLTYPYNNLGQPKPLSPKTIFTKTNLYFNYHLTTNLTNTSLLNHKTNYILYHIWN  125

Chelonia      125  HTHPNISNHHTMZXSNZTTKCZNLLPILYPYWISPTTNRPPISKHRKWLPIYTYNTTKPT  184
Lepidochelys  126  HTYPNTSNHHTMKWPNKTTKCKNLLPILYPHWISPTTNRPLIPMHRKRFLINMYNTTKST  185

Chelonia      185  YYTKLMNPHNMMICTTNSFYNXNTTMWITPVITKSTRZSSNCZLNNSSRSITKTZZMWYY  244
Lepidochelys  186  HYTKLMNLYNMMICTTNSLYNQNTTMRLTFMMTKSTCKSPNCKLNNPSRSITKTKKMWYY  245

Chelonia      245  PHYNNTKPLIKNTFLPLHSTRTMZSNHNXLHLLTTNZPKIINCLLISKSYZPSYCLNTNT  304
Lepidochelys  246  PHYNNTKPLIKNTLLPFHGTRIMKGNHNWFYLLTTNKSKIINRLLISKPHKTNHRRNTNT  305

Chelonia      305  NSMSMHWRYHTNNCPRPNIINTLLLSQHKLRTDPQPNTIISSKHTTLTPTNKPMMTTCQL  364
Lepidochelys  306  NPMSLHRRNYTYNRPWLNVINTLLPSQHKLRTNSQPNTTTSPKYTTPTPPNKSMMTTCQL  365

Chelonia      365  NQHSPSTNHQSNZZINHYCLTVQLIQHYNSNNZIZDPNHRHLHPMHTIHNTMZZNTLMHX  424
Lepidochelys  366  NWHSPPTNHWPNKKINYHYFTIQLIQHYNSNNKIKNPNYRYLHSMHIIYNTMKKNTLMHK  425

Chelonia      425  NYPTNPHTZTPPHITTYPTNNSTNNKTZTNL  455
Lepidochelys  426  NYPPNPHTKTSPHIITHPTNNSTNNKTKTNL  456

Fig. 2. Alignment between putative proteins translated from frameshifted sequences of Chelonia mydas genes and open reading sequences of Lepidochelys. (a) For the +1 frameshifted sequence from CytB of Chelonia mydas and (b) for the +2 frameshifted sequence from ND4 of Chelonia mydas. Sites matching stop codons UAR are indicated by X, those matching AGR are indicated by Z. Underlined amino acids match stop codons in Lepidochelys.

reassignments, as done for frameshifted sequences of Lepidochelys compared with proteins from the open reading frames of other turtles (i.e. Fig. 1). I focus here on alignments between putative proteins translated from frameshifted sequences of C. mydas and those from open reading sequences of Lepidochelys. Fig. 2a and b show alignments for the putative protein translated from the +1 frameshifted sequence of the Cytb gene of Chelonia with the unknown protein coded by the overlapping gene in the Cytb gene of Lepidochelys (Fig. 2a), and for the protein from the +2 frame of ND4 of Chelonia with the unknown protein from the open reading frame of Lepidochelys (Fig. 2b). The +1 frameshifted sequence does not contain AGR codons, and hence is uninformative about amino acid reassignment for that stop codon family. It contains 28 UAR codons, which in 27 cases match tryptophan in the unknown protein coded by the overlapping gene in the Cytb gene of Lepidochelys. Hence, if codons UGA and UGG, which code for tryptophan in the regular vertebrate mitochondrial genetic code, indeed code for tryptophan in Lepidochelys (there are no a priori reasons for that not to be the case), UAR codons in Chelonia (and in the other turtles examined) are reassigned to tryptophan in the overlapping genetic code for overlapping genes. This reassignment matches the reassignment of UAR codons in ND4 of one of the two available Lepidochelys genomes, DQ486893. Here again, the fact that 96% (27 of 28) of these UAR stops align with Trp precludes that this systematic pattern results from chance or from artefacts such as annotation errors and/or DNA decay. This suggests that the reassignment of UAR to Trp is a true biological phenomenon.
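Because the Fig. 2 alignments contain no gaps, the counts above can be re-derived by tallying the Lepidochelys residue found opposite each X (UAR) or Z (AGR) of the frameshifted Chelonia protein. The minimal Python sketch below uses the Fig. 2 rows joined end to end; `tally_stop_alignments` is a hypothetical helper, and it assumes an ungapped alignment of equal-length sequences. The ND4 panel (Fig. 2b) is included because its counts are discussed in the next paragraph.

```python
from collections import Counter

# Fig. 2a: +1 frame of Chelonia mydas Cytb (residues 19-377) against the
# Lepidochelys open reading frame (residues 20-378); rows joined, no gaps.
CHELONIA_CYTB_F1 = (
    "SSTYQAPPTSLHDETSDHYXPPAXHYKLSPESFXQYTTHQTSPXLFHQLLTSPEMYNTAD"
    "LSVTYMLTAPPYSSYVSTSMLDEEFTTVPTYMKKPETLESSFYYXSXLPHSXATSYHEDK"
    "YHFEGPPSSQTYSQPSHTSATHXYNESEEGFQXTMQPXPDSSPSTSYYHLPLPALQQYIY"
    "YSCTKQDQTTQQDXIQMPTKSPSTPTSPTKTYXDSFXYXLSSXPXHFSPPTYXETQTTSH"
    "QPTLYPLLPTSNQNDTSYLPTQSYDQSQTNXAEYXPYYSPSXFYSXYPPYTRQNNEQPHS"
    "DHSPKSYSDPXXLTSXYXHESEDNQSKTHSLSSAKXPLHFTSXSYYSLYLLQVXSKTKY"
)
LEPIDOCHELYS_CYTB = (
    "SSIYQAPLTSLHDETSDHYWPPAWHYKSLLESSWQYITHQMSPWPFHQSPMSPETYNTDD"
    "LSVTHMPTEHPYSSYASTSTLGEESTTVPTFTKKPETPELSSYYWWWPLHLWATSYHEDK"
    "YHFEGPPSLPTYYQLFHTSEMHWYNESEGDSQWTTQPWPDSSPSTSYYHSPLQVLQWYIY"
    "YSYTKQAQTTQQDWTQTLTKSLSTPTSPTKTYWDLSWYWPFSWPWHFSPHTYWETQTTSH"
    "QLTPYPPLPTSNQNDTSYSPMQSYDQFQTNWAEYWPYYPPFWYYSWYPPYTHQNNEQLHF"
    "DHSPKSYFDPWWLIYWYWHESEANPSKIHLSPLAKQPLAFTSWFYFSSYPWRAWSKTKY"
)

# Fig. 2b: +2 frame of Chelonia mydas ND4 (residues 5-455) against the
# Lepidochelys open reading frame (residues 6-456).
CHELONIA_ND4_F2 = (
    "TPNNYTITNNHIMXTKTTMTFYTNPXPNNRHSKFTMIXTLHZTNHKLLQLLPZGZPNFSP"
    "FTNPIMLTYPPNNLSQPKPFNHWTNFTKTNLYLHHYPTTNLTNSSFLNHZINYIFYHIWN"
    "HTHPNISNHHTMZXSNZTTKCZNLLPILYPYWISPTTNRPPISKHRKWLPIYTYNTTKPT"
    "YYTKLMNPHNMMICTTNSFYNXNTTMWITPVITKSTRZSSNCZLNNSSRSITKTZZMWYY"
    "PHYNNTKPLIKNTFLPLHSTRTMZSNHNXLHLLTTNZPKIINCLLISKSYZPSYCLNTNT"
    "NSMSMHWRYHTNNCPRPNIINTLLLSQHKLRTDPQPNTIISSKHTTLTPTNKPMMTTCQL"
    "NQHSPSTNHQSNZZINHYCLTVQLIQHYNSNNZIZDPNHRHLHPMHTIHNTMZZNTLMHX"
    "NYPTNPHTZTPPHITTYPTNNSTNNKTZTNL"
)
LEPIDOCHELYS_ND4 = (
    "TPNNYITTNNHTMWTKTTMTFHINPQPNNCHSKLTMIWTFHKTNHKLFQWLYKNKPNLSS"
    "TTNPIMLTYPYNNLGQPKPLSPKTIFTKTNLYFNYHLTTNLTNTSLLNHKTNYILYHIWN"
    "HTYPNTSNHHTMKWPNKTTKCKNLLPILYPHWISPTTNRPLIPMHRKRFLINMYNTTKST"
    "HYTKLMNLYNMMICTTNSLYNQNTTMRLTFMMTKSTCKSPNCKLNNPSRSITKTKKMWYY"
    "PHYNNTKPLIKNTLLPFHGTRIMKGNHNWFYLLTTNKSKIINRLLISKPHKTNHRRNTNT"
    "NPMSLHRRNYTYNRPWLNVINTLLPSQHKLRTNSQPNTTTSPKYTTPTPPNKSMMTTCQL"
    "NWHSPPTNHWPNKKINYHYFTIQLIQHYNSNNKIKNPNYRYLHSMHIIYNTMKKNTLMHK"
    "NYPPNPHTKTSPHIITHPTNNSTNNKTKTNL"
)


def tally_stop_alignments(frameshifted: str, reference: str) -> dict:
    """For each stop symbol in the frameshifted protein (X = UAR, Z = AGR),
    count the residues occupying the same position in the reference."""
    if len(frameshifted) != len(reference):
        raise ValueError("expected an ungapped alignment of equal lengths")
    tallies = {"X": Counter(), "Z": Counter()}
    for site, partner in zip(frameshifted, reference):
        if site in tallies:
            tallies[site][partner] += 1
    return tallies


if __name__ == "__main__":
    for label, query, ref in [
        ("Fig. 2a, Cytb +1", CHELONIA_CYTB_F1, LEPIDOCHELYS_CYTB),
        ("Fig. 2b, ND4 +2", CHELONIA_ND4_F2, LEPIDOCHELYS_ND4),
    ]:
        for stop, counts in tally_stop_alignments(query, ref).items():
            print(label, stop, sum(counts.values()), dict(counts))
```

For Fig. 2a this reports 28 X positions (27 opposite W, 1 opposite Q) and no Z; for Fig. 2b it reports 7 X positions (4 W, 2 Q, 1 K) and 22 Z positions, all opposite K.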

This reassignment is also confirmed by the alignment in Fig. 2b for putative overlapping genes from ND4. The region aligning with proteins translated from the open reading frame of Lepidochelys contains 7 UAR codons, of which 4 align with tryptophan, 2 with glutamine and 1 with lysine. All 22 AGR codons are matched by lysine. Hence the association between AGR and lysine
indicates that the overlapping code reassigns AGR codons to lysine. The reassignment of UAR to tryptophan is slightly less clear in this case, perhaps because amino acid replacements confound codon reassignment. Indeed, the alignment between the frameshifted sequence of ND4 from DQ486893 and the open reading frame from NC 011516, both from Lepidochelys, matches all UAR codons with tryptophan (amino acid replacements are less likely between closely related individuals). The reassignment UAR → Trp is also confirmed by alignments of proteins from frameshifted sequences of other genes with proteins translated from the open reading frame of Lepidochelys.

This would suggest that the overlapping genetic code of turtles (excluding Lepidochelys NC 011516) is stopless, unless some other codon reassignment occurred and another, yet undetected, codon signals translational termination. The sequences translated from frameshifted gene sequences contain very few codons with guanine at their first codon position (in C. mydas I did not find any GCR codon, nor any GAA codon, in +2 frameshifted sequences); hence it is possible that these (or some of these) codons were reassigned to stops. It is also possible that other, unknown mechanisms stop translation in this context of expression of overlapping genes.

2.11. Coevolution between antisense antitermination tRNAs and overlapping genes

Translation of most overlapping genes requires tRNAs with anticodons matching stop codons. Bioinformatic analyses predict that such tRNAs exist, in the form of antisense tRNAs, which correspond to sequences complementary to sense tRNAs. Translational activity by such antisense tRNAs is plausible for a number of previously exposed reasons (Seligmann, 2010b,c, 2011b). A major one is that the capacities of regular sense tRNAs to form cloverleaf secondary structures correlate positively with genome-wide usages of their cognate amino acid (Seligmann, 2010b, 2012b). This coevolution between the translational apparatus and the genes shows that translational requirements drive the microevolution of cloverleaf formation capacities. Such coevolution was also observed for the majority of antisense tRNAs (Seligmann, 2010b), one of the lines of evidence suggesting that antisense tRNAs are also translationally active. For antisense antitermination tRNAs, the rationale of the translational constraint yields the opposite prediction, because antitermination disrupts proper protein synthesis. Hence negative coevolution is expected between cloverleaf formation by antisense antitermination tRNAs and in-frame usage of the corresponding stop codons. Indeed, such negative coevolution exists (Seligmann, 2010c), and is evidence that antisense antitermination tRNAs are expressed, but that several mechanisms suppress their translational activity 'in frame'. However, if proteins corresponding to the frameshifted overlapping genes are translated, one expects positive coevolution between the capacities of antisense antitermination tRNAs to form cloverleaves and overlap coding in the same genomes, because the frameshifted genes include stops. Though direct experimental evidence is still lacking, and the existence of both antisense tRNAs and overlapping genes is predicted only or mainly by bioinformatic analyses, such coevolution between the two systems (coding and translational) is strong preliminary evidence in favor of the working hypothesis, in addition to the alignments from the blast analyses described in previous sections. One could argue that the formation of cloverleaf secondary structures by antisense tRNAs is trivial, given the close symmetry between complementary strands; however, coevolution between variation in this capacity and an independent estimate of the functional need for translational activity by these putative antisense antitermination tRNAs is not trivial. Indeed, Fig. 3 shows positive coevolution between the numbers of +2 frameshifted chelonian mitochondrial protein coding genes for which blast analyses find at least one alignment with proteins in Genbank, and the structural component of the COVE index of antisense antitermination tRNAs from the same chelonian genomes possessing anticodons matching AGR 'stop' codons. As explained above, this coevolution between putative antisense antitermination tRNAs and putative overlapping genes confirms the working hypothesis that overlapping genes are translated according to an overlapping genetic code induced by the expression of antisense antitermination tRNAs, presumably loaded with lysine as cognate amino acid in chelonians (and with glycine in Lepidochelys). The result in Fig. 3 confirms similar coevolution trends observed between antisense antitermination tRNAs and overlapping genes in primates and Drosophila (Seligmann, 2011a, 2012b; Faure et al., 2011).

Fig. 3. Coevolution between the structural component of the COVE index (tRNAscan-SE; Lowe and Eddy, 1997) for cloverleaf formation of predicted antisense antitermination tRNAs matching AGR codons in mitochondrial genomes of chelonian species (x-axis) and numbers of alignments of Genbank sequences with +2 frameshifted proteins translated from mitochondrial protein coding genes detected by blast for the same chelonian mitochondrial genome.
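The antisense antitermination tRNAs discussed above are defined by anticodons that pair with stop codons. The anticodon that pairs with a codon is its reverse complement when both are written 5'→3'. The minimal Python sketch below lists the anticodons expected to read UAR and AGR codons; it assumes strict Watson–Crick pairing and ignores wobble at the first anticodon position (which lets a single anticodon read both members of a codon family). `anticodon_for` is a hypothetical helper.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the 5'->3' anticodon that pairs with a 5'->3' codon.

    DNA input (T) is converted to RNA (U) first. Wobble pairing is ignored.
    """
    rna = codon.upper().replace("T", "U")
    return "".join(COMPLEMENT[base] for base in reversed(rna))


if __name__ == "__main__":
    for codon in ("UAA", "UAG", "AGA", "AGG"):
        print(codon, "->", anticodon_for(codon))
```

This prints UAA -> UUA, UAG -> CUA, AGA -> UCU and AGG -> CCU; the last two are the anticodons of the tRNAs matching AGR 'stop' codons that enter the Fig. 3 analysis.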

2.12. Optimization of synonymous codon usages for overlap coding

The statistical significance of the correlation in Fig. 3 independently confirms the working hypothesis of overlapping genes coded by overlapping genetic codes in a third taxonomic group, in addition to primates (Seligmann, 2011a; Faure et al., 2011) and Drosophila (Seligmann, 2012b). A further approach could yield evidence in relation to overlap coding. Overlap coding after frameshift implies that synonymous codon usages are optimized to enable coding in more than one frame. This is testable using simulations that randomize synonymous codon usages while maintaining the natural genomic synonymous codon frequencies and the coding properties of the gene's regular main frame. The protein sequences translated from the +1 and +2 frames of these simulated sequences are then blasted to test whether they align with sequences existing in Genbank, exactly as done for the natural sequences. In total, 20 simulated sequences were created for each gene, and the simulated protein sequences translated from their +1 and +2 frames were blasted. Table 2 compares results from these simulated overlapping genes with those from the natural sequences. For Lepidochelys, blast detects on average 5.85 and 5.35 hits per simulated genome for the +1 and +2 frameshifted sequences, respectively, as compared to 10 and 8 for the natural sequences. For each frame, the natural sequences yielded more hits than any of the 20 simulated genomes. Not only do simulated sequences match existing proteins in Genbank less frequently than natural sequences; on average, their alignments are also shorter, for 8 among 10 genes and 6 among 8 genes, for +1 and +2 frameshifted sequences, respectively (one-sided sign tests, P = 0.027 and P = 0.072). In addition, mean similarities between simulated sequences and proteins existing in Genbank were also lower than for the natural Lepidochelys sequences, for 9 among 10 genes and 6 among 8 genes for +1 and +2 frameshifted sequences, respectively (one-sided sign tests, P = 0.0054 and P = 0.072). Hence simulated sequences match existing Genbank proteins less well in terms of the basic probability of matching one, alignment length, and alignment quality, especially for the +1 frameshifted sequences from Lepidochelys.

Table 2
Comparisons between blast analyses of 20 simulated sequences per Lepidochelys gene and of the natural genes, for +1 and +2 frameshifted sequences.

         | Simulations, Fr +1               | Simulations, Fr +2               | Natural, Fr +1        | Natural, Fr +2
Gene  Id | Hits   Start  Length Simil     r | Hits   Start  Length Simil     r |   Start  Length Simil |   Start  Length Simil
AT6   39 |    1   24.00   50.00    56    62 |    2  125.50   90.50    46     8 |       -       -     - |       -       -     -
AT8   42 |    0       -       -     -     - |    1    1.00   33.00    61    −1 |       -       -     - |    2.00   37.00    62
CO1   39 |   20    6.35  264.80    43    65 |   20  263.90  211.25    60     2 |    2.00  302.00    92 |  263.00  257.00    82
CO2   44 |    3  136.67   35.33    60     9 |   16  113.19   95.56    53   −23 |   37.00   34.00    97 |   81.00  129.00    50
CO3   39 |    7   84.43   92.14    48    47 |   18  179.44   68.61    59    18 |   73.00   86.00    94 |  180.00   64.00    73
Cytb  39 |   10  140.90   85.90    47     7 |   20   21.25  359.95    63    25 |   40.00  300.00    37 |   31.00  327.00    38
ND1   39 |   19   65.63  192.53    43   −29 |    3  225.67   50.00    54    −4 |    1.00  267.00    56 |   39.00  292.00    76
ND2   42 |    0       -       -     -     - |    5  164.00   70.00    56   −44 |       -       -     - |       -       -     -
ND3   39 |    6   55.33   42.83    56    18 |    1   33.00   48.00    56    19 |   50.00   67.00    84 |       -       -     -
ND4   40 |   20   17.95  427.80    45    77 |    0       -       -     -     - |    2.00  459.00    97 |       -       -     -
ND4l  38 |    8   29.25   60.13    55    42 |    0       -       -     -     - |   13.00   87.00    82 |       -       -     -
ND5   41 |   20  256.65  336.15    46    27 |   18   51.50  100.22    57    34 |  219.00  380.00    87 |   26.00  117.00    82
ND6   34 |    3   43.33   46.33    57    18 |    3  103.67   38.67    58    15 |  100.00   75.00    65 |   92.00   45.00    87

Id: mean percentage of codons remaining identical with the natural sequence after simulations randomizing synonymous codon usage. Hits: number of simulated sequences aligning, according to blast, with a Genbank sequence. Start: mean location of the first residue in the blast alignment. Length: mean total alignment length. Simil: similarity of the alignment with the Genbank sequence. r: Pearson correlation coefficient (100×) between Id and Simil for simulations yielding alignments according to blast. For comparison, the same indicators are given for the +1 and +2 frameshifted natural sequences. A dash indicates that no alignment was obtained.
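The randomization step can be sketched as follows. This minimal Python sketch rests on a simplifying assumption: codons are permuted among the positions that code the same amino acid in the main frame, which keeps the main-frame protein and the gene's own codon counts unchanged. The simulations described above instead preserve genome-wide synonymous codon frequencies, so this is a stand-in rather than the procedure used here. `shuffle_synonymous`, `split_codons` and `codon_identity` are hypothetical helpers, the codon table is the vertebrate mitochondrial code (NCBI table 2), and input is assumed to contain only A, C, G and T.

```python
import random
from collections import defaultdict

BASES = "TCAG"
TABLE2_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: TABLE2_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(dna: str) -> list:
    """Split into complete codons, dropping any trailing partial codon."""
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def shuffle_synonymous(dna: str, rng: random.Random) -> str:
    """Permute codons among positions coding the same amino acid."""
    codons = split_codons(dna.upper())
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = codons[:]
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def codon_identity(a: str, b: str) -> float:
    """Percentage of codons identical between two aligned sequences (Id)."""
    ca, cb = split_codons(a), split_codons(b)
    return 100.0 * sum(x == y for x, y in zip(ca, cb)) / len(ca)


if __name__ == "__main__":
    rng = random.Random(1)
    natural = "ATGCTACTTCTCCTGTTAGGAGGCGGTGGG"  # toy sequence, not a real gene
    simulated = [shuffle_synonymous(natural, rng) for _ in range(20)]
    ids = [codon_identity(natural, s) for s in simulated]
    print(f"mean Id over 20 simulations = {sum(ids) / len(ids):.1f}%")
```

The +1 and +2 frames of each simulated sequence would then be translated and blasted as for the natural sequences. Amino acids with many, evenly used synonymous codons (here Leu and Gly) are where the shuffle changes most codons, which is why Id varies among genes.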

Similar analyses were done for the genome of C. mydas, which is relatively closely related to Lepidochelys (in terms of chelonian phylogeny) and whose putative overlapping genes and genetic code are similar to those of other turtles (unlike Lepidochelys, which appears as a special case). Simulations for Chelonia yield results qualitatively similar to those presented for Lepidochelys in Table 2 and above in this section. The average total number of alignments for simulated sequences is 12.45, less than the 15 found for the natural sequences; mean alignment length was shorter in 11 among 13 combinations of frame and gene (one-sided sign test, P = 0.016); but note that similarities were not consistently lower (only in 6 among 13 cases).
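The sign tests quoted in this section count how many genes (or frame and gene combinations) fall on the expected side, for example a shorter alignment for simulated than for natural sequences. Under the null hypothesis that either side is equally likely, the count K follows a Binomial(n, 1/2) distribution, and the exact one-sided P value is P(K ≥ k). A minimal Python sketch; `sign_test_upper_tail` is a hypothetical helper.

```python
from math import comb


def sign_test_upper_tail(successes: int, n: int) -> float:
    """Exact one-sided sign test: P(K >= successes) for K ~ Binomial(n, 1/2)."""
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


if __name__ == "__main__":
    # (successes, n) pairs taken from the comparisons reported in the text
    for k, n in [(8, 10), (6, 8), (9, 10), (10, 11), (7, 11), (11, 13)]:
        print(f"{k} of {n}: P = {sign_test_upper_tail(k, n):.4f}")
```

With this exact convention, 8 of 10 gives 56/1024 ≈ 0.055, 9 of 10 gives 11/1024 ≈ 0.011 and 6 of 8 gives 37/256 ≈ 0.145. The values quoted for the Lepidochelys comparisons (0.027, 0.0054 and 0.072) equal half of these exact tails, so the convention used should be checked when reproducing them.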

2.13. Overlap coding and simulated sequences

Analyses in the previous section show that results from blast analyses of natural sequences are not due to Genbank being so large that any random protein sequence could align with an existing protein by chance. The natural sequences align better and more frequently with proteins in Genbank than randomized simulated sequences constrained to keep the main frame coding properties. This holds even though, in Lepidochelys, these constraints keep on average 40% of the codons unchanged. This percentage varies slightly among genes, because randomization is affected by the evenness of synonymous codon usages and by whether the amino acid is coded by 2, 4 or 6 synonymous codons (the more even the usages and the more synonymous codons an amino acid has, the greater the probability that simulations alter its codons). These considerations yield a further way of analyzing results from simulations. This approach was already applied successfully to Drosophila sequences (Seligmann, 2012b), indicating how natural sequences are optimized for overlap coding. If synonymous codon usage is optimized for overlap coding, then the less the simulation process altered synonymous codons relative to the natural sequence, the greater the similarity should be between the simulated frameshifted sequence and the protein from Genbank. Hence positive correlations are expected between the percentages of identity between simulated and natural sequences and the similarities of the blast alignments of the same simulated sequences with the protein in Genbank.

Alternatively, a negative correlation would indicate optimization to avoid overlap coding, a possibility that should not be ruled out a priori and that would also be compatible with the general spirit of the working hypothesis. A lack of correlation would mean that alignments are random. The two correlation columns in Table 2 give the correlation coefficients between identity percentage (between simulated and natural Lepidochelys sequences) and the corresponding blast alignment similarities, for each frame and gene for which the data enable such calculations in Lepidochelys. Correlations were positive for 10 and 7 among 11 cases for +1 and +2 frameshifted genes, respectively (one-sided sign tests, P = 0.0029 and P = 0.137, respectively). Among these correlations, five were significant at P < 0.05, all for +1 frameshifted sequences. Three correlations remained significant (P < 0.05) after Bonferroni's overconservative correction for multiple tests, which resets significance levels according to the number of tests done (in this case P = 0.05/22 = 0.00227). Hence the general result is verified: the more simulations altered the natural sequence, the less the simulated frameshifted proteins resemble proteins in Genbank. This means that blast hits for simulated sequences mainly result from the fact that the simulated sequences resemble the natural sequence by chance, keeping intact much of the synonymous codon usage that is optimized for overlap coding.

Fig. 4. Pearson correlation coefficients r between alterations in natural sequences due to simulations randomly changing synonymous codon usage and similarities between alignments from blast between simulated frameshifted sequences and proteins in Genbank (y-axis, see text), as a function of mean identity between simulated and natural sequences for that gene (x-axis) in Lepidochelys. Positive values on the y-axis indicate that the more simulations altered the natural sequences, the less these simulated sequences resemble proteins existing in Genbank, indicating optimization of synonymous codon usage for overlap coding. Negative values on the y-axis indicate that synonymous codons in the natural sequence are optimized to avoid similarity with existing proteins, which also indicates an adaptation at the overlap coding level and hence supports the working hypothesis in that sense.
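The correlation and the multiple-test correction used in this section can be sketched in a few lines. This minimal Python sketch assumes one (Id, Simil) pair per simulated sequence that yielded a blast alignment for a given gene and frame; 22 is the number of correlation tests quoted in the text (11 per frame). The helper names and the sample values in the usage block are hypothetical.

```python
from math import sqrt


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    ids = [38.2, 40.1, 41.7, 39.5, 42.3]     # hypothetical Id values (%)
    simils = [44.0, 47.0, 52.0, 45.0, 55.0]  # hypothetical blast similarities
    print(f"100 x r = {100 * pearson_r(ids, simils):.0f}")
    print(f"Bonferroni threshold, 22 tests: {bonferroni_threshold(0.05, 22):.5f}")
```

The second line prints 0.00227, the threshold quoted above. Table 2 reports r multiplied by 100, as in the first line of output.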

2.14. Synonymous codon optimizations for and against overlap coding

The considerations in the previous section suggest that the specificities of the amino acid contents of a gene determine to a large extent the level of identity between the simulated sequences and the natural sequences, and hence this factor may influence the results of correlation analyses as described in the previous section. Indeed, the mean level of identity between simulated and natural sequences affects the correlation between blast's alignment similarity and the identity between the simulated and the natural sequence.

Fig. 4 shows that for genes with relatively low identity between simulated and natural gene sequences, correlations are positive: the less the simulation affected the synonymous codon usage, the more the simulated frameshifted sequences resemble Genbank proteins, indicating that the natural synonymous codon usage is optimized to match overlap coding constraints. This is the case for a majority of genes, as described in the previous section. However, for some genes with relatively high mean identity between simulated and natural sequences, the opposite is the case. In these genes, general properties in terms of amino acid usages affect synonymous codon usage such that random perturbations (simulations) do not much affect frameshifted coding properties. Synonymous codon usages in these genes seem to have been optimized to avoid high similarities with existing Genbank proteins. It seems that the overlapping genes in these cases had to adapt to requirements specific to this species (Lepidochelys), though these cases are relatively few and the correlations are at best borderline significant in statistical terms (for the +1 frameshifted sequence of ND1). However, the evidence in favor of a possible optimization that avoids too high similarities of putative overlapping genes with existing proteins, especially for proteins whose synonymous codon usages cannot easily be altered by random processes such as evolutionary drift or simulations, is the general negative trend observed in Fig. 4: the phenomenon is indicated not only by the few weak negative correlation coefficients found for some genes, but by the tendency of the dataset as a whole. Similar analyses for simulations of genes from C. mydas yield tendencies very similar to those described for Lepidochelys in this and the previous section. These results from turtles independently confirm data suggesting occasional avoidance of overlap coding in Drosophila (Seligmann, 2012b).

2.15. Ratios between non-synonymous and synonymous mutations in overlapping genes

The ratio between non-synonymous and synonymous mutations (Ka/Ks) is frequently interpreted as an estimate of selection on a protein coding gene (e.g., Yang and Nielsen, 2000). Evidence for selection on a sequence, as estimated by such ratios, can indicate whether the sequence is expressed or not. Hence tests designed to detect overlapping genes, or to confirm their existence, have focused on Ka/Ks, suggesting positive selection in one frame and purifying (also termed stabilizing) selection in the other frame (e.g., Pavesi, 2006; McCauley et al., 2007), but the fact that overlapping frames are not independent of each other makes this rather complex (for tentative solutions to this problem, see Sabath et al., 2008; Sabath and Graur, 2010).

Ka/Ks analyses suffer from the assumption that synonymous mutations are neutral, an approximation that is very frequently incorrect, also for non-overlapping genes. To cite an example other than the off-frame stop codon hypothesis described in the sections above, codon–anticodon mismatches differ between synonymous codons (Seligmann, 2010e, 2011d), and this property associates with frequencies of pathogenic synonymous mutations in mitochondria, suggesting selection against synonymous codons with high mismatch probabilities and in favor of those with lower mismatch probabilities, especially if these are matched with misacylation of the corresponding tRNA by the non-cognate amino acid coded by the mismatched codon (Seligmann, 2012a).

In addition, Ka/Ks analyses are based upon phylogenetic reconstructions of ancestral sequences, which are themselves frequently inaccurate (see for example the discussion in Seligmann, 2010f). The tests designed to account for the lack of independence between overlapping frames also fail to consider a possible hierarchy between the overlapping genes. However, the mitochondrial gene overlaps described here and elsewhere (Seligmann, 2011a, 2012b) suggest that the main frame is primarily expressed, and that the other frame is expressed under special, rarer conditions. The simulation analyses described in previous sections do not suffer from the caveats of the Ka/Ks approach, but they are not a perfect alternative either. However, some special cases among the candidate overlapping genes do enable analyses without complex phylogenetic reconstructions, because sequences from closely related taxa or from the same taxon vary in terms of overlap coding, such as ND4 and CytB in Lepidochelys, and CO1 in the silk moth Samia (Seligmann, 2012b). A further case was observed for a large section of ND4 in the snake Cylindrophis ruffus, and is also analyzed here. This is not an extensive survey of such clear cases of mitochondrial overlap coding, and other cases exist.

These four cases split into two groups: those where both genes are stopless according to the regular genetic code in one sequence (CytB, CO1, and ND4 of Cylindrophis), and those where the genetic codes associated with each protein are switched between sequences (ND4 in Lepidochelys, comparing the sequence from DQ486893, where ND4 coding is 'normal', with that from NC 011516, where ND4 coding is unusual, occurs after a frameshift, and includes numerous stops). In the latter case, as in all the other cases, I consider that the unusual coding frame structure of the gene is derived, while the 'usual' gene reflects the ancestral state. These special cases were chosen because this hierarchy is unequivocal.

In ND4 from Lepidochelys, the unusual sequence mutated at 31 locations compared to the normally coded ND4. All differences were nonsynonymous, including 10 mutations creating stop codons, which always coded for glycine in the regular ND4. This extreme situation, where no synonymous mutation is observed, is highly unusual. When the frameshifted sequence in DQ486893 that codes for the unknown protein is compared with the stopless main frame of ND4 in NC 011516, which matches it, there are 32 differences; none is synonymous, and 31 of these correspond to stops in DQ486893 that were systematically mutated to match either Lys or Trp in NC 011516, according to which stop codon family, AGR or UAR, occurs in DQ486893 (as reported in a previous section). This again shows a highly unusual situation, with no synonymous mutations at all. In addition, the nonsynonymous mutations are highly directed, to match specific residues with stops. This indicates very high levels of positive selection on each overlapping gene. I emphasize again that such a situation, where all mutations are restricted to very specific, directed coding changes, cannot result from sequencing or annotation/typing errors, nor from DNA decay or denaturation, which would result in much more diverse, random patterns. The fact that the two Lepidochelys genomes differ specifically in ND4, but not in other genes, and in this non-random way, cannot be interpreted as a probable error.

Cytb in Lepidochelys has two stopless open reading frames: one codes for the regular cytochrome b, and the other for an unknown protein. This situation occurs in both genomes from that species, and can be compared to Cytb in E. imbricata, which has only a single, regular open reading frame coding for cytochrome b. The situation in Eretmochelys is again considered ancestral, and that in Lepidochelys derived. The comparison shows 112 differences between the actual Cytb of these two species, among which 104, the wide majority (92.9%), were synonymous. For the unknown protein, coded by a frame that usually contains stops (as in Eretmochelys), there were 352 mutations, among which 251 (71%) were synonymous. This difference indicates that there is more positive selection (relatively more nonsynonymous mutations) on the unusual overlapping gene than on Cytb, and hence that the unusual overlapping gene is functional.
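The tallies above (for example, 104 synonymous among 112 differing Cytb codons) classify each differing codon by whether the encoded amino acid changes. The following is a minimal sketch of such a tally, assuming two already aligned, ungapped, in-frame sequences and the vertebrate mitochondrial code (NCBI translation table 2). The helper name is hypothetical and this is not the author's pipeline; codon-level counting like this ignores multiple substitutions within a codon, and a change to or from a stop codon counts as nonsynonymous.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial), codons ordered TTT, TTC, TTA, ...
# TGA = Trp, ATA = Met, AGA/AGG = stop ("*").
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
VERT_MITO = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), VERT_MITO_AA)}


def classify_codon_differences(seq_a, seq_b, code=VERT_MITO):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Hypothetical helper: both sequences must be aligned in the same reading
    frame and have equal length. Codons containing gaps or ambiguity codes
    are skipped.
    """
    seq_a = seq_a.upper().replace("U", "T")
    seq_b = seq_b.upper().replace("U", "T")
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be equal-length multiples of 3")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        ca, cb = seq_a[i:i + 3], seq_b[i:i + 3]
        if ca == cb or ca not in code or cb not in code:
            continue
        if code[ca] == code[cb]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


syn, nonsyn = classify_codon_differences("TTTCTAAGA", "TTCCTGGGA")
# TTT->TTC and CTA->CTG are synonymous; AGA (stop) -> GGA (Gly) is not.
print(syn, nonsyn)
```

Percentages such as those reported above are then simply `syn / (syn + nonsyn)`.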

For CO1 from the silkmoth Samia, two frames are stopless in a given sequence (AAZ06647). According to this sequence, expression of the regular CO1 protein requires a frameshift. In contrast, for most CO1 sequences from Samia found in Genbank, CO1 is coded as usual, and the frameshifted sequence includes numerous stops (see Seligmann, 2012b). Here too, the unusual sequence where two frames are stopless is considered derived, and is compared to the presumed ancestral sequence with only one open reading frame corresponding to CO1 (JN215366). At codon 238 of CO1 (about the middle of CO1), a frameshifting mutation in sequence AAZ06647 switches between frames. While in the regular sequence that frameshifted region includes 30 UAR stop codons, in AAZ06647 these mutated to code systematically for serine. The actual frame coding for the second half of CO1 in AAZ06647 mutated at 47 locations, 31 (66%) of which are nonsynonymous. The corresponding stopless region that does not code for the regular CO1 includes 44 mutated codons, all of which (100%) are nonsynonymous, indicating very high levels of directional/positive selection on that frame.

A similar frameshifting mutation occurs at codon 228 of ND4 of the snake C. ruffus in a given sequence (ABN55914). This region contains 10 stop codons in the frameshifted sequence of the regular ND4 gene of that species (NC_007401), but in ABN55914 all these codons mutated to code for various amino acids, and no stops occur in that region in two frames: one coding, after the frameshift, for the regular ND4, and the other for an unknown protein. The frame coding for ND4 in ABN55914 differs at 109 codons from the regular gene (NC_007401), among which 26 (24%) are nonsynonymous mutations. The corresponding region that does not code for ND4 but is stopless in ABN55914 has 107 mutated codons, among which 103 (96%) are nonsynonymous mutations.
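Whether a region is "stopless" in a given frame, as in the two frames of ABN55914 above, can be checked by counting stop codons in each of the three forward frames. A minimal sketch, assuming the vertebrate mitochondrial code (NCBI table 2) and frames numbered as offsets 0, 1 and 2 from the start of the region; the helper name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial); "*" marks stops (TAA, TAG, AGA, AGG).
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
VERT_MITO = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), VERT_MITO_AA)}


def stops_per_frame(region, code=VERT_MITO):
    """Count stop codons in the frames starting at offsets 0, 1 and 2 of `region`."""
    region = region.upper().replace("U", "T")
    counts = []
    for offset in range(3):
        codons = (region[i:i + 3] for i in range(offset, len(region) - 2, 3))
        counts.append(sum(code.get(c) == "*" for c in codons))
    return counts


counts = stops_per_frame("AAATAGAAA")
stopless = [offset for offset, n in enumerate(counts) if n == 0]
# TAG lies in frame 0, AGA in frame 1, and frame 2 has no stop.
print(counts, stopless)
```

Applied to a region of the regular sequence and to the same region of the variant, this shows whether stops in a frame have been removed, as described for NC_007401 versus ABN55914.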

The analyses of these various cases of overlapping genes show varying situations, but in all four cases positive selection, in terms of high (or even very high) proportions of nonsynonymous mutations, is observed for the stopless frame that codes for the unusual protein. There was a similar, extreme tendency for positive selection in the frame coding for ND4 in Lepidochelys, but in all other cases the frame coding for the regular protein included larger percentages of synonymous mutations. This indicates that regular main-frame genes are under selection to undergo mutations that accommodate overlap coding, specifically mutations that alter the amino acids coded by the overlapping gene. This results in very few synonymous mutations at the level of the overlapping genes, which is interpreted as high positive selection on that frame of the sequence. The pressures to accommodate overlap coding are strong enough to cause majorities of nonsynonymous mutations also in the regular gene, at least in ND4 of Lepidochelys and CO1 of Samia, but this is not the case in Cytb of Lepidochelys and ND4 of Cylindrophis. More examples probably exist and have to be analyzed to better understand these phenomena. They nevertheless clearly suggest nonrandom patterns in terms of joint Ka/Ks ratios in these overlapping genes, with extremely high tendencies for nonsynonymous mutations in the 'new' overlapping gene.

3. General discussion

3.1. Anaeroby and alignments

The fact that the system of overlapping genes in the mitochondrion of the marine turtle Lepidochelys is particularly developed, as opposed to other turtles and to other taxa where the matter has been explored, primates and Drosophila (Seligmann, 2011a, 2012b), could suggest that this phenomenon, and the expression of the unknown proteins, is associated with anaerobic conditions. A further indication in this respect comes from the putative alternate CO1 protein translated from the +1 frameshifted sequence of CO1 from Lepidochelys. This is the only case where the greatest similarity was with a distant, non-chelonian vertebrate species (a salamander). Molecular convergence with such a distant group suggests adaptation. It is probable that the variant CO1 proteins coded after programmed frameshifts are adapted to more or less aerobic conditions, which are part of the regular variations in the lifestyle of this species. It is also possible that in other genes and species, the variants are associated with other conditions. Convergence in some cases with proteins from birds could also fit the rationale of modulating the synthesis of protein variants adapted to different levels of aerobic tissue activity and/or metabolic rates. This convergence with birds could be interpreted either as the result of adaptation to high anaerobic metabolism or as the result of the relatively anaerobic conditions occurring during egg incubation (e.g., Seymour and Ackerman, 1980). For Cytb, the unknown protein in Lepidochelys' overlapping open reading frame could also act as a trap for excess free radicals, because its amino acid composition includes many oxyphobic (reactive with free radicals) amino acids.

3.2. Alignments for antitermination tRNAs confirm stop/amino acid reassignments

The overlapping genetic code in Lepidochelys apparently reassigns AGR codons to glycine. This is also confirmed by alignment analysis of the antisense tRNA of Ser UCN of this species, whose anticodon would match AGR codons. This antisense tRNA most resembles regular tRNAs loaded with glycine (online software TFAM, Taquist et al., 2007). For UAR, the evidence is less clear in Lepidochelys. But a main candidate in this case is tryptophan, which agrees with the relatively clear indication for UAR reassignment to tryptophan in other turtles and in the ND4 gene of Lepidochelys DQ486893. TFAM's alignment analyses of antisense tRNAs with anticodons matching AGR codons from turtles (Lepidochelys excluded) yield results compatible with a reassignment to lysine. Some of these tRNAs are the antisense of tRNA Ser UCN (11 species), some of tRNA Lys (4 species), and some of tRNA Ala (7 species). In all these antisense tRNAs, lysine was a probable cognate according to TFAM's alignment method (probable cognates were calculated as in the analyses of primate antisense tRNAs in Seligmann, 2010b). For UAR, the majority of antisense tRNAs with anticodons matching UAR codons were the antisense of sense tRNA Leu CUA (24 species). According to TFAM, tryptophan is the most likely cognate for these antisense tRNAs. In six other species the tRNA possessing an anticodon matching UAR codons was the antisense of tRNA His, in four species the antisense of tRNA Asn, in three that of tRNA Lys, and in one that of tRNA Ser UCN. In all cases, according to TFAM, tryptophan was a more probable cognate for the antisense tRNA than the majority of the other 19 amino acids.
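The reassignments discussed above can be expressed as a small override table applied on top of the vertebrate mitochondrial code when translating a frameshifted sequence. A minimal sketch, assuming UAR to Trp with AGR to Lys for most turtles, and AGR to Gly for Lepidochelys (where UAR to Trp is, as stated above, only the main candidate). The table names and the helper are hypothetical, and DNA letters (T rather than U) are used.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial); "*" marks stops.
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
VERT_MITO = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), VERT_MITO_AA)}

# Hypothetical override tables for the putative overlapping genetic codes.
TURTLE_OVERLAP = {"TAA": "W", "TAG": "W", "AGA": "K", "AGG": "K"}
LEPIDOCHELYS_OVERLAP = {"TAA": "W", "TAG": "W", "AGA": "G", "AGG": "G"}


def translate(seq, offset=0, reassign=None, code=VERT_MITO):
    """Translate `seq` from `offset`, applying optional stop-codon reassignments.

    Codons not found in the table (gaps, ambiguity codes) become "X".
    """
    table = dict(code)
    if reassign:
        table.update(reassign)
    seq = seq.upper().replace("U", "T")
    return "".join(table.get(seq[i:i + 3], "X") for i in range(offset, len(seq) - 2, 3))


seq = "AAATAGAAA"
print(translate(seq))                                           # K*K (regular code)
print(translate(seq, reassign=TURTLE_OVERLAP))                  # KWK
print(translate(seq, offset=1, reassign=LEPIDOCHELYS_OVERLAP))  # AAT AGA -> NG
```

Keeping the reassignments in a separate dictionary makes it easy to compare, for the same frameshifted sequence, the AGR-to-Lys and AGR-to-Gly readings discussed for the two Lepidochelys genomes.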

3.3. Antitermination tRNAs and overlapping genetic codes

Only three chelonian species (Dogania subplana, Lissemys punctata and Platysternon megacephalum) had no antisense tRNA matching either UAR or AGR codons. Seventeen among the 38 species examined had at least one antisense tRNA matching each of the UAR and AGR codon families. Only 4 species (C. mydas, E. imbricata, Kinosternon leucostomum and Palea steindachneri) lacked antisense tRNAs with anticodons matching UAR codons, but 14 species had no antisense tRNA matching AGR codons. This is compatible with the possibility that tRNAs recognizing AGR codons are imported from the cytosol (AGR codons code for arginine according to the nuclear genetic code). Import of tRNAs matching UAR codons, which function as stops also according to the nuclear genetic code, is improbable. This distribution of antisense antitermination tRNAs is compatible with translation of overlapping genes according to overlapping genetic code(s) in the great majority of turtle species examined. It could also suggest the mechanism by which overlapping genetic codes coexist with the regular genetic code: the putative expression of antisense tRNAs, as well as the import of cytosolic tRNAs, probably occurs only in specific, regulated conditions, possibly stresses, as previously suggested (Seligmann, 2010a,b, 2011a,b,c,d). Hence the overlapping genes associated with the overlapping genetic codes would be expressed according to the regulated presence of antitermination tRNAs. The switch between proteins expressed according to the regular and the overlapping genetic code described in Lepidochelys suggests that the conditions occurring most of the time for that species are associated with these 'atypical' proteins. Because this species spends most of its life in the sea, these conditions may be anaerobic.
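The species counts above depend on deciding whether a predicted antisense-tRNA anticodon "matches" the UAR or AGR codon families. The section does not spell out the pairing rule, so the minimal sketch below assumes standard Watson-Crick pairing at the first two codon positions and simplified Crick wobble at the third. The names are hypothetical, and real decoding also depends on nucleoside modifications that are not modeled here.

```python
# Watson-Crick complements (RNA alphabet).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Simplified Crick wobble: 5' anticodon base -> codon 3' bases it may pair with.
# Modified nucleosides (e.g. inosine) are ignored in this sketch.
WOBBLE = {"U": "AG", "G": "CU", "C": "G", "A": "U"}

UAR = {"UAA", "UAG"}
AGR = {"AGA", "AGG"}


def codons_read_by(anticodon):
    """Codons (5'->3') that a 5'->3' anticodon may read under simplified wobble rules."""
    ac = anticodon.upper().replace("T", "U")
    if len(ac) != 3 or set(ac) - set(COMPLEMENT):
        raise ValueError("expected a 3-letter A/C/G/U anticodon")
    # Anticodon base 3 pairs with codon base 1, base 2 with base 2,
    # and the wobble base 1 with codon base 3.
    first, second = COMPLEMENT[ac[2]], COMPLEMENT[ac[1]]
    return sorted(first + second + third for third in WOBBLE[ac[0]])


def matches_family(anticodon, family):
    """True if the anticodon may read at least one codon of `family`."""
    return bool(set(codons_read_by(anticodon)) & family)


for ac in ("CUA", "UUA", "UCU"):
    print(ac, codons_read_by(ac), matches_family(ac, UAR), matches_family(ac, AGR))
```

Under this rule an anticodon such as CUA reads only UAG, UUA reads both UAR codons, and UCU reads both AGR codons; a stricter or looser criterion would change which species are counted.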


3.4. Functions of unknown proteins

The analysis of the amino acid sequences of the unknown proteins coded by the overlapping genes will probably suggest, to some extent, their functions. This probably also holds for comparisons between amino acid sequences coded by the regular open reading frame and by the frameshifted variant of genes. For example, amino acids are more or less oxyphobic (Archetti and Di Giulio, 2007), and this is particularly relevant to the hypothesis that the unknown proteins could be related to anaerobic conditions. The analysis of the amino acid replacements between the regular protein and the one coded by the frameshifted region could reveal systematic tendencies in oxyphobicity between the two variant proteins. This can also be done for other amino acid properties. It is nevertheless evident that direct experiments will be required to assess the function(s) of the unknown proteins coded by overlapping genes. In this respect, Lepidochelys is the best system to study these proteins.
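The replacement analysis proposed above could be sketched as follows: compare the aligned regular and frameshift-encoded proteins position by position and summarize the change in a per-residue property. Oxyphobicity values from Archetti and Di Giulio (2007) are not reproduced here; the Kyte-Doolittle hydropathy scale is used only as a stand-in for "other amino acid properties", and the function name is hypothetical. Aligned, equal-length protein strings are assumed.

```python
# Kyte-Doolittle hydropathy index: a stand-in property, not oxyphobicity.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_shift(regular, variant, scale=KYTE_DOOLITTLE):
    """Summarize property changes at aligned positions where residues differ.

    Returns (mean change variant - regular, number of increases, number of
    decreases). Positions with gaps or residues missing from `scale` are skipped.
    """
    if len(regular) != len(variant):
        raise ValueError("proteins must be aligned to equal length")
    deltas = [scale[b] - scale[a]
              for a, b in zip(regular.upper(), variant.upper())
              if a != b and a in scale and b in scale]
    if not deltas:
        return 0.0, 0, 0
    mean = sum(deltas) / len(deltas)
    return mean, sum(d > 0 for d in deltas), sum(d < 0 for d in deltas)


# I->V and K->R both lower the hydropathy value.
print(property_shift("AIK", "AVR"))
```

A consistent excess of increases or decreases across many replacements is the kind of "systematic tendency" the paragraph refers to; a sign test on the two counts would be one simple way to assess it.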

The variation detected for ND4 (see Table 1) in two mitochondrial genomes from Lepidochelys shows that switches between the protein coded by the main open reading frame (which uses the regular vertebrate mitochondrial genetic code) and the protein coded by the frameshifted sequence (which uses the genetic code that reassigns stops to amino acids) occur even in closely related mitochondria. The data suggest that for ND4 in the Lepidochelys genome DQ486893, the frameshifted sequence reassigns AGR to lysine, as found in other turtles, excluding Lepidochelys. For the rest of the genes from DQ486893, alignments suggest that AGR is reassigned to glycine, as found for all genes, including ND4, of NC_011516 of the same species. This could indicate a complex situation in DQ486893, with two overlapping genetic codes. The situation in DQ486893 might be intermediate between NC_011516 and other turtle species, because the codon reassignments for ND4 and that gene's frameshifted coding structure resemble those of other turtle species, while for other genes DQ486893 resembles (or is identical to) NC_011516 (Table 1). This means that a large-scale survey of variation in mitochondrial genomes of Lepidochelys, especially if combined with physiological experiments, could reveal the function of the frameshifted, overlapping genes. The patterns of similarity and of differences between the two genomes from Lepidochelys are systematic and could not result from artefacts, errors or DNA decay. Such phenomena could not explain systematic reassignments of stop codons to specific amino acids, nor extreme Ka/Ks ratios in both overlapping frames of the gene.

3.5. Coevolution between antitermination tRNAs and overlapping genes

The coevolution shown by the correlation in Fig. 3 between the cloverleaf formation capacities of antisense tRNAs matching AGR stops and the numbers of overlapping genes requiring these tRNAs for translation is the closest functional evidence for the working hypothesis, in the absence of direct evidence on the expression of overlapping genes and antisense tRNAs. However, the fact that a second frame of Cytb of Lepidochelys lacks stops and is translatable by the regular vertebrate mitochondrial genetic code is a strong indication that proteins corresponding to the frameshifted overlapping genes are translated. Again, I stress here that such results, where stops are systematically avoided in a presumably non-coding frame, are unlikely to be due to chance, especially if the codons that code for stops in other species are systematically mutated to code for specific amino acids, as found here. The fact that variation in the cloverleaf formation capacity of antisense tRNAs matching AGR stops "follows" variation in the numbers of overlapping genes requiring such tRNAs for translation is as close to functional evidence as bioinformatic analyses can yield.
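The correlation invoked above pairs, for each species, a measure of antisense-tRNA cloverleaf formation capacity with the number of overlapping genes that require those tRNAs. The section does not state which coefficient underlies Fig. 3; the minimal sketch below computes a Pearson coefficient, with hypothetical names, and a rank-based coefficient would be a natural alternative for count data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between paired per-species values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Toy input only, not data from this study.
print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
```

Because species share ancestry, a plain correlation across species treats them as independent points; phylogenetically informed methods are the more cautious choice when testing coevolution.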

Fig. 5. Putative antisense tRNA with expanded anticodon loop from Testudo graeca, as predicted by tRNAscan-SE. This is the antisense of tRNA Lys in that species. The expanded anticodon CAAU could induce a frameshift in gene translation, suppressing a frameshifting mutation in case an insertion occurred.

3.6. Expanded anticodons in antisense tRNAs as frameshifting mechanism

It is known that some tRNAs recognize quadruplet codons (tetracodons) and 'suppress' frameshifting mutations (Riddle and Carbon, 1973; Sroga et al., 1992; Tuohy et al., 1992; Moore et al., 2000; Magliery et al., 2001; Dunham et al., 2007). Not all, but some of these tRNAs possess easily recognizable expanded anticodons (Dunham et al., 2007). In most turtle genomes examined, several antisense tRNAs had such unusual expanded anticodons, as predicted by tRNAscan-SE (see the example from the antisense of tRNA Lys from the mitochondrion of Testudo graeca, Fig. 5). Their number ranged from none (in Chelus fimbriata, Pelodiscus sinensis, and Platysternon megacephalum) to five (K. leucostomum), with on average 1.9 ± 1.12 antisense tRNAs with expanded anticodons per genome. Such tRNAs could play a role in the expression of frameshifted sequences by causing the frameshifts. Consider the chelonian genes ND3 and ND4l, in which frameshifts are known to be required to express the regular proteins coded by these genes: the percentage of these two genes requiring a frameshift, among species with a given predicted number of antisense tRNAs with expanded anticodons, increases with that number (Fig. 6). This association could indicate that tRNAs with expanded anticodons are part of the frameshifting mechanisms used for the expression of proteins coded by frameshifted overlapping genes.

Fig. 6. Percent ND3 and ND4l genes requiring a frameshift for expressing ND3 and ND4l proteins as a function of numbers of antisense tRNAs with predicted expanded anticodon in chelonian mitochondrial genomes. Numbers next to datapoints indicate the number of genomes for which the number of antisense tRNAs with expanded anticodons matched the number on the x-axis.

3.7. Simulations and synonymous codon usages

The lack of direct experimental evidence for the working hypothesis is also partly compensated by further analyses based on simulations. In addition to converging evidence from two different Lepidochelys mitochondrial genomes, which indicates that results are not due to sequencing or annotation errors, the simulations show that the alignments that blast detects between putative proteins translated from frameshifted gene sequences and existing Genbank proteins are not a matter of chance: simulated sequences match Genbank proteins less frequently, and to lesser extents. The simulations also indicate that natural sequences have optimized synonymous codon usages to enable overlap coding.
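The simulations described above keep the protein coded by the main frame but randomize synonymous codons. One way to do this, sketched below under stated assumptions, permutes codons among positions that encode the same amino acid, which preserves both the main-frame protein and the overall codon usage. The section does not specify the randomization scheme, and here the number of off-frame stop codons serves as a cheap stand-in score instead of the BLAST alignments used in the study; all names are hypothetical.

```python
import random
from itertools import product

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial); "*" marks stops.
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
VERT_MITO = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), VERT_MITO_AA)}


def permute_synonymous(seq, rng):
    """Shuffle codons among positions coding the same amino acid (main frame kept).

    A trailing partial codon, if any, is dropped.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    positions = {}
    for idx, codon in enumerate(codons):
        aa = VERT_MITO.get(codon)
        if aa is not None:  # codons with gaps/ambiguity codes stay in place
            positions.setdefault(aa, []).append(idx)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def offframe_stops(seq):
    """Stand-in score: number of stop codons in the +1 and +2 frames."""
    seq = seq.upper()
    return sum(VERT_MITO.get(seq[i:i + 3]) == "*"
               for offset in (1, 2)
               for i in range(offset, len(seq) - 2, 3))


def simulate(seq, n=1000, seed=0):
    """Observed score and fraction of simulated sequences scoring <= observed."""
    rng = random.Random(seed)
    observed = offframe_stops(seq)
    sims = [offframe_stops(permute_synonymous(seq, rng)) for _ in range(n)]
    return observed, sum(s <= observed for s in sims) / n


print(simulate("CTACTGCTTTTATTGAAAAAG", n=200))
```

Permuting existing codons, rather than drawing synonymous codons uniformly, avoids confounding the test with differences in codon usage between natural and simulated sequences.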

3.8. Circular genetic codes and Ka/Ks ratios

Empirical evidence has shown that specific groups of codons, called circular codes, are avoided at frameshifting sites. Though the causes of this association are not yet well understood, analyses based on circular codes independently confirm the frameshifting sites indicated by the alignment analyses.
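As an illustration of a circular-code score, the sketch below computes the fraction of codons in a window that belong to the circular code X0 identified by Arqués and Michel (1996), so that a window around a putative frameshifting site can be compared with the whole gene. The section does not say which circular code or window size was used (the mitochondrial circular code of Arqués and Michel, 1997, may be the more appropriate choice), so the window size and all names here are hypothetical.

```python
# Circular code X0 (Arqués and Michel, 1996), DNA alphabet.
X0 = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC GAG "
    "GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def x0_fraction(seq, start=0, end=None, offset=0):
    """Fraction of codons in seq[start:end], read from start + offset, that belong to X0."""
    seq = seq.upper().replace("U", "T")
    end = len(seq) if end is None else end
    codons = [seq[i:i + 3] for i in range(start + offset, end - 2, 3)]
    if not codons:
        return float("nan")
    return sum(c in X0 for c in codons) / len(codons)


def site_vs_gene(seq, site, flank=30):
    """X0 content in a window around `site` versus the whole gene, both in the main frame."""
    lo = max(0, site - flank)
    lo -= lo % 3  # keep the window aligned on the main reading frame
    return x0_fraction(seq, lo, min(len(seq), site + flank)), x0_fraction(seq)


# AAC and TTC belong to X0, GGG does not.
print(x0_fraction("AACGGGTTC"))
```

Under the pattern reported above, windows around frameshifting sites would tend to show a lower X0 fraction than the gene as a whole.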

Another technique frequently used to detect coding properties of DNA sequences is the ratio of nonsynonymous to synonymous substitution rates (Ka/Ks), which indicates selection on the sequence. This method can also be used for overlapping genes, though it involves several difficulties. In four overlapping genes for which the technical difficulties of phylogenetic reconstruction are avoidable, Ka/Ks ratios indicate extreme positive selection on the unusual overlapping gene.
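As a hedged illustration of the Ka/Ks idea for a pair of aligned sequences, the sketch below follows the site-counting logic of the Nei-Gojobori method: synonymous and nonsynonymous sites are counted per codon, and observed differences are divided by them. It is simplified: codons differing at more than one position and codons involving stops are skipped rather than averaged over mutational pathways, no Jukes-Cantor correction is applied (so it returns proportions pN and pS rather than Ka and Ks), and it does not address the overlapping-frame complications mentioned above (see Sabath et al., 2008). All names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial); "*" marks stops.
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
VERT_MITO = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), VERT_MITO_AA)}


def synonymous_sites(codon, code=VERT_MITO):
    """Number of synonymous sites in a codon (0 to 3).

    Each position contributes the fraction of its 3 possible point mutations
    that keep the amino acid; mutations to stops count as nonsynonymous here.
    """
    aa = code[codon]
    total = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += code[mutant] == aa
    return total / 3


def pn_ps(seq_a, seq_b, code=VERT_MITO):
    """Simplified Nei-Gojobori-style proportions (pN, pS, pN/pS) for aligned coding sequences."""
    S = N = syn_diffs = nonsyn_diffs = 0.0
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        ca = seq_a[i:i + 3].upper().replace("U", "T")
        cb = seq_b[i:i + 3].upper().replace("U", "T")
        if ca not in code or cb not in code or "*" in (code[ca], code[cb]):
            continue  # skip ambiguous codons and stops
        diffs = sum(x != y for x, y in zip(ca, cb))
        if diffs > 1:
            continue  # pathway averaging is not implemented in this sketch
        s = (synonymous_sites(ca, code) + synonymous_sites(cb, code)) / 2
        S += s
        N += 3 - s
        if diffs == 1:
            if code[ca] == code[cb]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    p_s = syn_diffs / S if S else float("nan")
    p_n = nonsyn_diffs / N if N else float("nan")
    ratio = p_n / p_s if p_s else float("nan")
    return p_n, p_s, ratio


# One synonymous difference (TTT->TTC) and no nonsynonymous ones.
print(pn_ps("TTTCTA", "TTCCTA"))
```

For overlapping genes, the same nucleotide change can be synonymous in one frame and nonsynonymous in the other, which is why dedicated methods are needed for formal estimates.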

3.9. Recapitulation of evidence favoring the working hypothesis of overlapping genes

Despite a lack of direct evidence, the bioinformatic evidence presented here is relatively clear and diverse, coming from independent backgrounds and methods, so that the provocative working hypothesis deserves consideration. The major points are: (a) the lack of stops in a second frame of the Cytb gene of Lepidochelys, potentially coding for an unknown protein; (b) alignments between putative proteins translated from frameshifted genes and existing Genbank proteins, especially the fact that (c) stops in the frameshifted gene sequences systematically match the same amino acid in the alignments with Genbank proteins, an association strongly suggesting reassignments of stop codons to these amino acids, and hence an overlapping genetic code; (d) the antisense tRNAs with anticodons matching these stop codons resemble regular tRNAs loaded with the cognate amino acid predicted by the alignments mentioned in (c); (e) numbers of overlapping genes requiring these stop-reading tRNAs for their translation coevolve with the cloverleaf formation capacity of these tRNAs; (f) synonymous codon usages in natural gene sequences are optimized to enable overlap coding, as revealed by comparisons between blast analyses and alignments of the natural sequences and those from simulations keeping the coding properties of the main frame but randomizing synonymous codons; (g) codon usages at frameshifting sites concur with predictions from circular codes; and (h) nonsynonymous/synonymous (Ka/Ks) ratios indicate high positive selection on the unusual overlapping frame. The results confirm similar analyses done for primates and Drosophila, which also include direct evidence from RNA expression profiles (Seligmann, 2012b) and the detection of a protein, GAU, expected to be coded by one of the overlapping genes in CO1 (Faure et al., 2011).

The hypothesis that some kind of artifact causes the observed patterns is highly unlikely, whether it be sequencing/typing errors, DNA decay, or the 'pollution' of the mitochondrial genome data by nuclear pseudogenes of mitochondrial origin. This is because errors, as well as non-functional pseudogenes, would not yield systematic patterns that fit functional predictions. Among other things, the regular mitochondrion-encoded proteins would be far more 'degenerated' if these were non-functional, precluding the hypothesis of nuclear pseudogenes, including for Lepidochelys.

While analyses of data from turtles confirm several bioinformatic analyses from primates and Drosophila, mainly the coevolution between overlap coding and antisense antitermination tRNAs, they lack direct experimental (manipulative) evidence. However, the body of evidence from Lepidochelys includes strong logical proof that overlap coding by overlapping genetic codes occurs. This is because several proteins that are normally encoded by vertebrate mitochondria could not be expressed if such an overlapping genetic code, which reassigns stops to specific amino acids, did not exist. At the same time, the proteins coded by the regular open reading frames in this species are in several cases clearly not the regular mitochondrion-encoded proteins, but their expression is very likely, because they are coded according to the regular vertebrate mitochondrial genetic code. These match proteins that would be encoded in other species by frameshifted sequences, using a stopless overlapping genetic code. These systematic switches between proteins coded by the regular and the overlapping genetic codes prove the existence of both the proteins associated with the overlapping genes and the overlapping genetic code. This dissociation in Lepidochelys seems very unusual and is worth further exploration.

It is probable that overlapping genetic codes for overlapping genes exist in genomes of other organisms, especially bacteria. But at this point, Lepidochelys is probably the most straightforward system in which to investigate the microevolution of mitochondrial overlapping genes and genetic codes, as well as their function. This system might reflect the dual lifestyles of the symbiont or presymbiotic organism from which modern mitochondria evolved. The evolution of alternative mitochondrial genetic codes is associated with genome size reduction (Massey and Garey, 2007; McCutcheon et al., 2009), but there is no indication of this in chelonian mitochondria: numbers of overlapping frameshifted genes detected by the alignment method do not correlate with mitochondrial genome size in Testudines. Neither do the number and cloverleaf foldability of antisense antitermination tRNAs correlate with mitochondrial genome size. However, among the potential mechanisms for codon reassignments, which frequently involve stop codons (Abascal et al., 2006; Sengupta et al., 2007; Johnson, 2010), one should add the possibility of switches between the regular and the overlapping genetic codes, though this speculation is not central to the present issues.

Acknowledgments

I am indebted to an anonymous reviewer of a manuscript presenting similar results on Drosophila mitochondria for suggesting tests of synonymous codon optimization by simulations, and to reviewer 2 of this manuscript for suggesting analyses involving circular codes.

References

Abascal, F., Posada, D., Knight, R.D., Zardoya, R., 2006. Parallel evolution of the genetic code in arthropod mitochondrial genomes. PLoS Biology 4, e127.

Ahmed, A., Frey, G., Michel, C.J., 2007. Frameshift signals in genes associated with the circular code. In Silico Biology 7, 155–168.

Ahmed, A., Frey, G., Michel, C.J., 2010. Essential molecular functions associated with circular code evolution. Journal of Theoretical Biology 264, 613–622.

Ahmed, A., Michel, C.J., 2011. Circular code signal in frameshift genes. Journal of Computer Science & Systems Biology 4, 7–15.

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25, 3389–3402.

Altschul, S.F., Wootton, J.C., Gertz, E.M., Agarwala, R., Morgulis, A., Schäffer, A.A., Yu, Y.K., 2005. Protein database searches using compositionally adjusted substitution matrices. FEBS Journal 272, 5101–5109.

Anderson, S., Bankier, A.T., Barrell, B.G., de Bruijn, M.H., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J., Staden, R., Young, I.G., 1981. Sequence and organization of the human mitochondrial genome. Nature 290, 457–465.

Archetti, M., Di Giulio, M., 2007. The evolution of the genetic code took place in an anaerobic environment. Journal of Theoretical Biology 245, 169–174.

Arqués, D.G., Michel, C.J., 1996. A complementary circular code in the protein coding genes. Journal of Theoretical Biology 182, 45–58.

Arqués, D.G., Michel, C.J., 1997. A circular code in the protein coding genes of mitochondria. Journal of Theoretical Biology 189, 273–290.

Bhattacharyya, S.N., Adhya, S., 2004. The complexity of mitochondrial tRNA import. RNA Biology 1, 84–88.

Delaye, L., DeLuna, A., Lazcano, A., Becerra, A., 2008. The origin of a novel gene through overprinting in Escherichia coli. BMC Evolutionary Biology 8, 31.

Duchène, A.M., Pujol, C., Drouard-Maréchal, L., 2009. Import of tRNAs and aminoacyl-tRNA synthetases into mitochondria. Current Genetics 55, 1–18.

Dunham, C.M., Selmer, M., Phelps, S.S., Kelley, A.C., Suzuki, T., Joseph, S., Ramakrishnan, V., 2007. Structures of tRNAs with an expanded anticodon loop in the decoding center of the 30S ribosomal subunit. RNA 13, 817–823.

Farabaugh, P.J., 1996. Programmed translational frameshifting. Microbiological Reviews 60, 103–134.

Faure, E., Delaye, L., Tribolo, S., Levasseur, A., Seligmann, H., Barthélémy, R.-M., 2011. Probable presence of an ubiquitous cryptic mitochondrial gene on the antisense strand of the cytochrome oxidase I gene. Biology Direct 6, 56.

Glusman, G., Qin, S.Z., El-Gewely, R., Siegel, A.F., Roach, J.C., Smit, A.F.A., 2006. A third approach to gene prediction suggests thousands of additional human transcribed regions. PLoS Computational Biology 2, e18.

Gonzalez, D.L., Giannerini, S., Rosa, R., 2011. Circular codes revisited: a statistical approach. Journal of Theoretical Biology 275, 21–28.

Grassé, P.P., 1977. Evolution of Living Organisms. Academic Press, New York, p. 307.

Itzkovitz, S., Alon, U., 2007. The genetic code is nearly optimal for allowing additional information within protein-coding sequences. Genome Research 17, 405–412.

Johnson, L.J., 2010. Pseudogene rescue: an adaptive mechanism of codon reassignment. Journal of Evolutionary Biology 23, 1623–1630.

Krishnan, N.M., Seligmann, H., Raina, S.Z., Pollock, D.D., 2004a. Phylogenetic analysis of site-specific perturbations in asymmetric mutation gradients. In: Granada, A., Bourne, P.E. (Eds.), Curr. Comp. Mol. Biol. ACM Press, San Diego, CA, pp. 266–267.

Krishnan, N.M., Seligmann, H., Raina, S.Z., Pollock, D.D., 2004b. Detecting gradients of asymmetry in site-specific substitutions in mitochondrial genomes. DNA and Cell Biology 23, 707–714.

Krishnan, N.M., Seligmann, H., Rao, B.J., 2008. Relationship between mRNA secondary structure and sequence variability in Chloroplast genes: possible life history implications. BMC Genomics 9, 48.

Lightowlers, R.N., Chrzanowska-Lightowlers, Z.M.A., 2010. Terminating human mitochondrial protein synthesis: a shift in our thinking. RNA Biology 7, 282–286.

Lowe, T.M., Eddy, S.R., 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research 25, 955–964.

Mager-Heckel, A.M., Entelis, N., Brandina, I., Kamenski, P., Krasheninnikov, I.A., Martin, R.P., Tarassov, I., 2008. The analysis of tRNA import into mammalian mitochondria. Methods in Molecular Biology 372, 235–253.

Magliery, T.J., Anderson, J.C., Schultz, P.G., 2001. Expanding the genetic code: selection of efficient suppressors of four-base codons and identification of shifty four-base codons with a library approach in Escherichia coli. Journal of Molecular Biology 307, 755–769.

Massey, S.E., Garey, J.R., 2007. A comparative genomics analysis of codon reassignments reveals a link with mitochondrial proteome size and a mechanism of genetic code change via suppressor tRNAs. Journal of Molecular Evolution 64, 399–410.

McCauley, S., Groot, S., Mailund, T., Hein, J., 2007. Annotation of selection strengths in viral genomes. Bioinformatics 23, 2978–2986.

McCutcheon, J.P., McDonald, B.R., Moran, N.A., 2009. Origin of an alternative genetic code in the extremely small and GC-rich genome of a bacterial symbiont. PLoS Genetics 5, e1000565.

Michel, C.J., 2008. A 2006 review of circular codes in genes. Computers & Mathematics with Applications 55, 984–988.


Mindell, D.P., Sorenson, M.D., Dimcheff, D.E., 1998. An extra nucleotide is not translated in mitochondrial ND3 of some birds and turtles. Molecular Biology and Evolution 15, 1568–1571.

Moore, B., Persson, B.C., Nelson, C.C., Gesteland, R.F., Atkins, J.F., 2000. Quadruplet codons: implications for code expansion and the specification of translation step size. Journal of Molecular Biology 298, 195–209.

Pavesi, A., 2006. Origin and evolution of overlapping genes in the family Microviridae. Journal of General Virology 87, 1013–1017.

Riddle, D.L., Carbon, J., 1973. Frameshift suppression: a nucleotide addition in the anticodon of a glycine transfer RNA. Nature. New Biology 242, 230–234.

Ripley, L.S., Clark, A., de Boer, J.G., 1986. Spectrum of spontaneous frameshift mutations. Sequences of bacteriophage T4 rII gene frameshifts. Journal of Molecular Biology 191, 601–613.

Rubio, M.A.T., Rinehart, J.J., Duvezin-Caubet, S., Reichert, A.S., Söll, D., Alfonzo, J.D., 2008. Mammalian mitochondria have the innate ability to import tRNAs by a mechanism distinct from protein import. Proceedings of the National Academy of Sciences of the United States of America 105, 9186–9191.

Russell, R.D., Beckenbach, A.T., 2008. Recoding of translation in turtle mitochondrial genomes: programmed frameshift mutations and evidence of a modified genetic code. Journal of Molecular Evolution 67, 682–695.

Sabath, N., Graur, D., 2010. Detection of functional overlapping genes: simulation and case studies. Journal of Molecular Evolution 71, 308–316.

Sabath, N., Landan, G., Graur, D., 2008. A method for the simultaneous estimation of selection intensities in overlapping genes. PLoS ONE 3, e3996.

Satoh, T.P., Sato, Y., Masuyama, N., Miya, M., Nishida, M., 2010. Transfer RNA gene arrangement and codon usage in vertebrate mitochondrial genomes: a new insight into gene order conservation. BMC Genomics 11, 479.

Schattner, P., Brooks, A.N., Lowe, T.M., 2005. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Research 33, W686–W689.

Schneider, A., Drouard-Maréchal, L., 2000. Mitochondrial tRNA import: are there distinct mechanisms? Trends in Cell Biology 10, 509–513.

Seligmann, H., 2003. Cost minimization of amino acid usage. Journal of Molecular Evolution 56, 151–161.

Seligmann, H., 2007. Cost minimization of ribosomal frameshifts. Journal of Theoretical Biology 249, 162–167.

Seligmann, H., 2008. Hybridization between mitochondrial heavy strand tDNA and expressed light strand tRNA modulates the function of heavy strand tDNA as light strand replication origin. Journal of Molecular Biology 379, 188–199.

Seligmann, H., 2010a. The ambush hypothesis at the whole-organism level: Off-frame, 'hidden' stops in vertebrate mitochondrial genes increase developmental stability. Computational Biology and Chemistry 34, 80–85.

Seligmann, H., 2010b. Undetected antisense tRNAs in mitochondrial genomes? Biology Direct 5, 39.

Seligmann, H., 2010c. Avoidance of antisense, antiterminator tRNA anticodons in vertebrate mitochondria. BioSystems 101, 42–50.

Seligmann, H., 2010d. Mitochondrial tRNAs as light strand replication origins; similarity between anticodon loops and the loop of the light strand replication origin predicts initiation of DNA replication. BioSystems 99, 85–93.

Seligmann, H., 2010e. Do anticodons of misacylated tRNAs preferentially mismatch codons coding for the misloaded amino acid? BMC Molecular Biology 11, 41.

Seligmann, H., 2010f. Positive correlations between molecular and morphological rates of evolution. Journal of Theoretical Biology 264, 799–807.

Seligmann, H., 2011a. Two genetic codes, one genome: frameshifted primate mitochondrial genes code for additional proteins in presence of antisense antitermination tRNAs. BioSystems 105, 271–285.


Seligmann, H., 2011b. Pathogenic mutations in antisense mitochondrial tRNAs. Jour-nal of Theoretical Biology 269, 287–296.

Seligmann, H., 2011c. Error compensation of tRNA misacylation by codon–anticodon mismatch prevents translational amino acid misinsertion. Computational Biology and Chemistry 35, 81–95.

Seligmann, H., 2011d. Frameshifted chelonian mitochondrial genes code for additional proteins in presence of antitermination tRNAs and the special case of Lepidochelys. Talk, Societas Europaea Herpetologica European Congress of Herpetology and Deutsche Gesellschaft fuer Herpetologie und Terrarienkunde Deutscher Herpetologentag, Luxemburg und Trier, 25–29 September 2011. Abstract, pp. S85–S86.

Seligmann, H., 2012a. Coding constraints modulate chemically spontaneous mutational replication gradients in mitochondrial genomes. Current Genomics 13, 37–54.

Seligmann, H., 2012b. An overlapping genetic code for frameshifted overlapping genes in Drosophila mitochondria: antisense antitermination tRNAs UAR insert serine. Journal of Theoretical Biology 298, 51–76.

Seligmann, H., Krishnan, N.M., 2006. Mitochondrial replication origin stability and propensity of adjacent tRNA genes to form putative replication origins increase developmental stability in lizards. Journal of Experimental Zoology Part B 306, 433–449.

Seligmann, H., Krishnan, N.M., Rao, B.J., 2006. Possible multiple origins of replication in primate mitochondria: alternative role of tRNA sequences. Journal of Theoretical Biology 241, 321–332.

Seligmann, H., Pollock, D.D., 2003a. Function and Evolution of Secondary Structure in Human Mitochondrial mRNAs. Midsouth Computational Biology and Bioinformatics Society (Abstract 26).

Seligmann, H., Pollock, D.D., 2003b. The Ambush hypothesis: Hidden Stop Codons Prevent Off-frame Gene Reading. Midsouth Computational Biology and Bioinformatics Society (Abstract 36).

Seligmann, H., Pollock, D.D., 2004. The ambush hypothesis: Hidden stop codons prevent off-frame gene reading. DNA and Cell Biology 23, 701–705.

Sengupta, S., Yang, X.G., Higgs, P.G., 2007. The mechanisms of codon reassignments in mitochondrial genetic codes. Journal of Molecular Evolution 64, 662–688.

Seymour, R.S., Ackerman, R.A., 1980. Adaptations to underground nesting in birds and reptiles. American Zoologist 20, 437–447.

Sroga, G.E., Nemoto, F., Kuchino, Y., Bjork, G.R., 1992. Insertion (sufB) in the anticodon loop or base substitution (sufC) in the anticodon stem of tRNA(Pro)2 from Salmonella typhimurium induces suppression of frameshift mutations. Nucleic Acids Research 20, 3463–3469.

Taquist, H., Cui, Y., Ardell, D.H., 2007. TFAM 1.0: an online tRNA function classifier. Nucleic Acids Research 35, W350–W353.

Tse, H., Cai, J.J., Tsoi, H.W., Lam, E.P.T., Yuen, K.Y., 2010. Natural selection retains overrepresented out-of-frame stop codons against frameshift peptides in prokaryotes. BMC Genomics 11, 491.

Tuohy, T.M., Thompson, S., Gesteland, R.F., Atkins, J.F., 1992. Seven, eight and nine-membered anticodon loop mutants of tRNA(2Arg) which cause +1 frameshifting. Tolerance of DHU arm and other secondary mutations. Journal of Molecular Biology 228, 1042–1054.

van Leeuwen, F.W., Hol, E.M., Fischer, D.F., 2006. Frameshift proteins in Alzheimer’s disease and in other conformational disorders: time for the ubiquitin–proteasome system. Journal of Alzheimer’s Disease, 319–325.

Yang, Z., Nielsen, R., 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Molecular Biology and Evolution 17, 32–43.
