

Please cite this article in press as: Seligmann, H., Polymerization of non-complementary RNA: Systematic symmetric nucleotide exchanges mainly involving uracil produce mitochondrial RNA transcripts coding for cryptic overlapping genes. BioSystems (2013), http://dx.doi.org/10.1016/j.biosystems.2013.01.011


BioSystems xxx (2013) xxx– xxx

Contents lists available at SciVerse ScienceDirect

BioSystems

journal homepage: www.elsevier.com/locate/biosystems

Polymerization of non-complementary RNA: Systematic symmetric nucleotide exchanges mainly involving uracil produce mitochondrial RNA transcripts coding for cryptic overlapping genes

Hervé Seligmann a,b,∗

a National Natural History Museum Collections, The Hebrew University of Jerusalem, 91904 Jerusalem, Israel
b Department of Life Sciences, Ben Gurion University, 84105 Beer Sheva, Israel

Article info

Article history:
Received 24 October 2012
Received in revised form 24 January 2013
Accepted 29 January 2013

Keywords:
Expressed sequence tags
Nucleotide misinsertion
Human DNA polymerase gamma
Genome compression
Antitermination tRNA
Termination codon

Abstract

Usual DNA→RNA transcription exchanges T→U. Assuming different systematic symmetric nucleotide exchanges during transcription, some GenBank RNAs match exactly human mitochondrial sequences (exchange rules listed in decreasing transcript frequencies): C↔U, A↔U, A↔U+C↔G (two nucleotide pairs exchanged), G↔U, A↔G, C↔G, none for A↔C, A↔G+C↔U, and A↔C+G↔U. Most unusual transcripts involve exchanging uracil. Independent measures of rates of rare replicational enzymatic DNA nucleotide misinsertions predict frequencies of RNA transcripts systematically exchanging the corresponding misinserted nucleotides. Exchange transcripts self-hybridize less than other gene regions, and self-hybridization increases with length, suggesting endoribonuclease-limited elongation. Blast detects stop codon depleted putative protein coding overlapping genes within exchange-transcribed mitochondrial genes. These align with existing GenBank proteins (mainly metazoan origins, prokaryotic and viral origins underrepresented). These GenBank proteins frequently interact with RNA/DNA, are membrane transporters, or are typical of mitochondrial metabolism. Nucleotide exchange transcript frequencies increase with overlapping gene densities and stop densities, indicating finely tuned counterbalancing regulation of expression of systematic symmetric nucleotide exchange-encrypted proteins. Such expression necessitates combined activities of suppressor tRNAs matching stops, and nucleotide exchange transcription. Two independent properties confirm predicted exchanged overlap coding genes: discrepancy of third codon nucleotide contents from replicational deamination gradients, and codon usage according to circular code predictions. Predictions from both properties converge, especially for frequent nucleotide exchange types. Nucleotide exchanging transcription apparently increases coding densities of protein coding genes without lengthening genomes, revealing unsuspected functional DNA coding potential.

© 2013 Published by Elsevier Ireland Ltd.

1. Introduction

The question ‘why are there several stop codons?’ (Krizek and Krizek, 2012) has an apparently satisfying answer: off frame, protein coding genes include numerous stops (Seligmann and Pollock, 2004a,b; Singh and Pardasani, 2009; Tse et al., 2010) which decrease protein synthesis costs due to unprogrammed ribosomal slippage (Seligmann, 2007, 2010a; Warnecke and Hurst, 2011). In addition, the genetic code’s codon–amino acid assignments maximize off frame stop numbers (Itzkovitz and Alon, 2007), and third codon positions that are part of off frame stops tend to mutate less than comparable positions (Seligmann, 2012a). However, this explanation hides a further function that stop codons play in off frame sequences: it seems that when antitermination (suppressor) tRNAs are active in translation, the regular genetic code is de facto transformed into another, stopless genetic code (Seligmann, 2010b). Translating sequences into proteins according to that overlapping code reveals numerous previously undetected genes and proteins, their number coevolving with capacities of antitermination tRNAs (tRNAs with anticodons matching stops) to translate the stops they include (Faure et al., 2011; Seligmann, 2011a, 2012a,b). Inclusion of stop codons in the regular genetic code enables a double coding system, based on the same sequences, and whose expression is efficiently regulated by the presence or absence of suppressor (antitermination) tRNAs. That way, numbers of coded proteins can be high while keeping a relatively short genome, by switching from the regular genetic code to a stopless code.

∗ Correspondence address: National Natural History Museum Collections, The Hebrew University of Jerusalem, 91904 Jerusalem, Israel. E-mail address: [email protected]

Genome length is an important factor limiting replication and cellular multiplication rates, apparently affecting also developmental rates of metazoan organisms (Sessions and Larson, 1987; Gregory and Hebert, 1999; Chipman et al., 2001). Ample


data suggest that even at the level of single amino acids, protein sequences minimize metabolic synthesis costs (Akashi and Gojobori, 2002; Seligmann, 2003; Barton et al., 2010), notably of cognate amino acids (Perlstein et al., 2007; Alves and Savageau, 2005; Seligmann, 2012b). Protein length reduction apparently follows similar principles (Brocchieri and Karlin, 2005; Warringer and Blomberg, 2006; Seligmann, 2012b). Considering this, it is very probable that similar forces decrease genome length. Accordingly, there would be a strong advantage in being able to code for more proteins while keeping the genome short, a phenomenon that increases coding density by coding compression, such as overlapping genes, including those induced by antitermination tRNA activity (Seligmann, 2011a, 2012c,f; Faure et al., 2011). Recent analyses suggest that mitochondrial genomes include several overlapping genes coded in the 3′-to-5′ direction of regular protein coding genes, apparently expressed upon putative ‘invertase’ activity, which would invert the sequence polymerized into RNA in the 3′-to-5′ direction (Seligmann, 2012d). A further mechanism apparently increasing coding density is that of protein coding genes based on tetracodons, quadruplet codons recognized by (among others) tRNAs with expanded anticodons (Seligmann, 2012e). Mitochondrial genes for ribosomal RNAs seem also to include overlapping protein coding genes (Seligmann, in press).

It is in this context that a group of phenomena called RNA recoding is considered here. These typically involve changing frames (Namy et al., 2005) and various phenomena of exon/intron reshuffling (e.g., Jin et al., 2007; Lev-Maor et al., 2007). In some cases, recoding alters the nucleotides used, such as adenosine-to-inosine RNA editing (Reenan, 2005; Paz et al., 2007; Daniel et al., 2011).

1.1. Nucleotide exchanges as a working hypothesis for cryptic overlapping genes

The systematic ‘recoding’ of T (thymidine) to U (uracil) in transcription from DNA to RNA is also a type of recoding, by DNA→RNA polymerases that systematically replace T by U, and U by T for reverse transcriptases. This suggests the hypothesis that coding density might be increased by other types of systematic nucleotide exchanges, e.g., A by C and C by A (or any other symmetric exchange of this type). The fact that during regular DNA replication, ribonucleotides are frequently inserted instead of deoxynucleotides by the mitochondrial DNA polymerase gamma (Kasiviswanathan and Copeland, 2011) indicates that polymerases have some flexibility in that respect. Misinsertion of non-complementary nucleotides is also a basic property of polymerase (mis)function (Lee and Johnson, 2006). The possibility of polymerase activity implying systematic misinsertions, producing non-complementary DNA and/or RNA strands, cannot be excluded.

Such recoded RNA, based on the template of regular DNA sequence, could code for additional protein coding gene(s). Interestingly, if this occurs at DNA level, this could be a mechanism for producing new genes, but in this case the assumed mechanism of transcription exchanging between nucleotides implies that genes code according to ‘direct’ (non-exchanging) and exchange transcription. In some ways, the former can be seen as explicit, and the latter as implicit coding; nevertheless, both levels would be inherent simultaneously to the gene’s primary structure.

Hence if such nucleotide exchanging activity exists, by some kind of unknown or modified DNA→RNA polymerases during RNA polymerization or editing, inducing such activity might unleash a very large coding potential, enabling coding for additional proteins without increasing genome size. In addition, this system implies very simple regulation, as each set of genes associated with a given type of nucleotide exchange would be induced by the expression of its specific ‘nucleotide exchanger’ polymerase/editing activity.

Table 1
The nine different RNA sequences produced from transcription of a single DNA sequence (ACGT) according to the nine types of symmetric nucleotide exchange rules. The amino acid coded by the three first nucleotides according to the vertebrate mitochondrial genetic code is also indicated, as well as the percentage of nucleotides that remain identical after that type of exchange transcription.

Exchange rule    Initial DNA 5′-ACGT-3′    Codon for Thr    Similarity to initial DNA sequence
A↔C              5′-CAGU-3′                Gln              50%
A↔G              5′-GCAU-3′                Ala              50%
A↔U              5′-UCGA-3′                Ser              50%
C↔G              5′-AGCU-3′                Ser              50%
C↔U              5′-AUGC-3′                Met              50%
G↔U              5′-ACUG-3′                Thr              50%
A↔C and G↔U      5′-CAUG-3′                His              0%
A↔G and C↔U      5′-GUAC-3′                Val              0%
A↔U and C↔G      5′-UGCA-3′                Cys              0%

In total, considering only the four usual nucleotides, nine symmetric nucleotide exchanges are possible, multiplying by nine the coding potential of any single sequence. Six of these involve only two types of nucleotides (A↔C, A↔G, A↔U, C↔G, C↔U, G↔U) and three all four types of nucleotides, implying two symmetric exchanges (A↔C+G↔U, A↔G+C↔U, and A↔U+C↔G). Table 1 shows the different RNA sequences produced by each of these rules from a single, given initial DNA sequence. Note that this procedure alters at least 50% of the nucleotides in the initial sequence used in Table 1, and that the amino acid coded by the three first nucleotides in that sequence is changed in almost all cases after systematic symmetric nucleotide exchange.
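The nine rules can be written down mechanically. The following minimal Python sketch (the rule labels and function names are illustrative assumptions, not part of the original analysis) applies each symmetric exchange to the regular RNA copy of 5′-ACGT-3′ and reports the resulting sequence together with the percentage of unchanged positions, which can be compared with Table 1.

```python
# Illustrative sketch only; rule labels and function names are assumptions.
RULES = {
    "A<->C": ("AC",), "A<->G": ("AG",), "A<->U": ("AU",),
    "C<->G": ("CG",), "C<->U": ("CU",), "G<->U": ("GU",),
    "A<->C+G<->U": ("AC", "GU"),
    "A<->G+C<->U": ("AG", "CU"),
    "A<->U+C<->G": ("AU", "CG"),
}

def exchange_transcribe(dna, pairs):
    """Transcribe DNA to RNA (T->U), then swap the two nucleotides of each pair."""
    rna = dna.upper().replace("T", "U")
    src = "".join(pairs)                    # e.g. "ACGU"
    dst = "".join(p[::-1] for p in pairs)   # e.g. "CAUG"
    # str.translate maps every character at once, so swaps cannot undo each other.
    return rna.translate(str.maketrans(src, dst))

def similarity(dna, exchanged_rna):
    """Percentage of positions left unchanged relative to regular transcription."""
    regular_rna = dna.upper().replace("T", "U")
    same = sum(a == b for a, b in zip(regular_rna, exchanged_rna))
    return 100.0 * same / len(regular_rna)

if __name__ == "__main__":
    for label, pairs in RULES.items():
        rna = exchange_transcribe("ACGT", pairs)
        print(f"{label:12s} 5'-{rna}-3'  {similarity('ACGT', rna):.0f}%")
```

Mapping all characters in a single pass is a design choice that avoids the intermediate placeholder symbol needed when replacements are applied one after another.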

Along the same lines, asymmetric nucleotide recodings are also possible, such as an exchange rule including three nucleotide exchanges (e.g., A→C, C→G and G→A). In total, 14 asymmetric exchange possibilities exist (including also rules with four asymmetric nucleotide exchanges). For practical reasons, I explore here only symmetric exchanges. Separating symmetric from asymmetric exchanges is also justified by the possibility that symmetric and asymmetric nucleotide exchanges may depend upon different types of polymerization (or editing) mechanisms.
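The counts of 9 symmetric and 14 asymmetric rules can be checked by enumeration. The sketch below rests on one assumption that is mine rather than stated in the text: each exchange rule is treated as a permutation of the four nucleotides, with symmetric rules being those that return every nucleotide to itself when applied twice. Function and variable names are illustrative.

```python
from itertools import permutations

NUCLEOTIDES = "ACGU"

def classify_exchange_rules():
    """Split the non-trivial permutations of A, C, G, U into symmetric and asymmetric rules."""
    symmetric, asymmetric = [], []
    for image in permutations(NUCLEOTIDES):
        rule = dict(zip(NUCLEOTIDES, image))
        if all(rule[n] == n for n in NUCLEOTIDES):
            continue  # identity: regular transcription, no exchange
        if all(rule[rule[n]] == n for n in NUCLEOTIDES):
            symmetric.append(rule)   # applying the rule twice restores the sequence
        else:
            asymmetric.append(rule)  # three- or four-nucleotide cycles
    return symmetric, asymmetric

if __name__ == "__main__":
    sym, asym = classify_exchange_rules()
    print(len(sym), "symmetric rules;", len(asym), "asymmetric rules")
```

Under this reading, the asymmetric rules split into those leaving one nucleotide unchanged (three exchanges) and those changing all four (four exchanges).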

First, I explore GenBank’s EST (expressed sequence tags) RNA databank for sequences matching the ‘exchanged’ human mitochondrial genome according to each of the nine symmetric exchange rules and report the results for the various types of exchanges. Then Blast alignment analyses explore whether RNA recoded by each of these exchanges could be coding for proteins, using various bioinformatics methods to indicate whether the detected putative overlapping genes seem functional or not. A meta-analysis of the data shows that frequencies of RNAs associated with the different types of symmetric exchanges are proportional to the bioinformatics estimations of overlap protein coding gene functionalities, indicating that coding compression through RNA exchange/editing occurs, and this at different frequencies for different types of nucleotide exchanges. Most notably, DNA nucleotide misinsertion rates during replication predict rates of nucleotide exchanging RNA transcription.

2. Materials and methods

2.1. Sequence manipulations and alignments with existing RNA transcripts

All analyses are done for GenBank’s reference complete human mitochondrial genome (NC_012920). Its entire sequence is copied and pasted from GenBank into a blank Microsoft Word file. In ‘Word’, the sequence of the genome is altered by using the software’s ‘Replace’ function, mimicking a putative systematic nucleotide exchange. For example, for the symmetric exchange rule A↔C, the function ‘Replace’ is used to replace all ‘A’s in the genome by ‘X’, then all ‘C’s by ‘A’, and then all ‘X’s by ‘C’. The intermediate stage using ‘X’ (or any other arbitrary symbol differing from the four letters used to symbolize the four nucleotides) is necessary so that ‘A’s changed into ‘C’s at the first step are not changed back into ‘A’ at the second step. The resulting


sequence, where all ‘A’s present in the initial genome are replaced by ‘C’ and all ‘C’s in the initial genome are replaced by ‘A’, is copied and pasted from Word into GenBank’s online alignment software ‘Blastn’. Blastn is then requested to search, according to standard default search parameters, for RNA sequences in its EST database that match that altered sequence, i.e., the human mitochondrial genome assuming systematic symmetric nucleotide exchange A↔C. This procedure combining Word and Blastn is repeated for each of the nine systematic symmetric exchange rules in Table 1.
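The three-step replacement with an intermediate symbol can be mimicked in a few lines of code. The following Python sketch is only an illustration of that logic, not the procedure used here (which relied on Word and Blastn); the function names are hypothetical. It also shows why skipping the placeholder gives a wrong result.

```python
def swap_with_placeholder(seq, a, b, placeholder="X"):
    """Swap nucleotides a and b through three successive replacements."""
    if placeholder in seq:
        raise ValueError("placeholder already occurs in the sequence")
    step1 = seq.replace(a, placeholder)   # all a -> X
    step2 = step1.replace(b, a)           # all b -> a
    return step2.replace(placeholder, b)  # all X -> b

def naive_swap(seq, a, b):
    """Two replacements without a placeholder: the second undoes the first."""
    return seq.replace(a, b).replace(b, a)

if __name__ == "__main__":
    genome_fragment = "ACGTAC"  # toy sequence, not taken from NC_012920
    print(swap_with_placeholder(genome_fragment, "A", "C"))  # A and C exchanged
    print(naive_swap(genome_fragment, "A", "C"))             # every C is lost
```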

2.2. Prediction of secondary structure

Mfold (Zuker, 2003) is used to predict secondary structure formation. This is done for the complete (exchanged) sequence of genes for which exchange transcripts are detected. Mfold’s output presents the secondary structures that are within 5% of the optimal (most stable) secondary structure. The number of secondary structures in which a site does not participate in self-hybridization is indicated by Mfold’s ‘ss-number’. This number is averaged across all nucleotides and divided by the total number of secondary structures predicted by Mfold. The result represents the average ‘loopiness’ (tendency to form loops) for that RNA sequence. It is calculated separately for regions belonging to RNA transcripts that are transcribed according to a systematic nucleotide exchange rule, and for the rest of that gene. The difference between the latter loopiness and the loopiness of the region that has been transcribed by symmetric nucleotide exchange estimates the loopiness of the exchange transcribed region as compared to the rest of the gene, assuming it were also exchange transcribed. Potentially, this subtraction can indicate whether regions that were transcribed according to nucleotide exchange rules differ in secondary structure formation propensities from other regions of that gene.
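As a minimal sketch of this calculation, assume that the per-site single-strandedness counts have already been read from Mfold’s output into a Python list (the parsing step, the function names and the toy numbers below are assumptions made for illustration):

```python
def loopiness(ss_counts, n_structures):
    """Mean fraction of predicted structures in which a site is unpaired."""
    if not ss_counts or n_structures <= 0:
        raise ValueError("need at least one site and one structure")
    return sum(ss_counts) / (len(ss_counts) * n_structures)

def loopiness_difference(ss_counts, n_structures, start, end):
    """Loopiness of the rest of the gene minus that of the region [start, end).

    start and end are 0-based positions of the exchange-transcribed region.
    """
    region = ss_counts[start:end]
    rest = ss_counts[:start] + ss_counts[end:]
    return loopiness(rest, n_structures) - loopiness(region, n_structures)

if __name__ == "__main__":
    # Toy example: six sites, four predicted structures.
    counts = [4, 4, 0, 0, 2, 2]
    print(loopiness(counts, 4))
    print(loopiness_difference(counts, 4, 2, 4))
```

A positive difference means the exchange-transcribed region is predicted to be unpaired less often than the rest of the gene.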

2.3. Candidate overlapping genes detected by Blastp alignments

In order to investigate potential protein coding by nucleotide exchange, I translated into putative protein sequences all six frames of all 13 human mitochondrial protein coding genes, after exchanging nucleotides along each of the 9 exchanging rules. Translation of exchanged RNAs was done by the online available software ‘transeq’ at the EMBL-EBI site (http://www.ebi.ac.uk/Tools/st/), according to the regular vertebrate mitochondrial genetic code, inserting asterisks (*) where stop codons occur in the exchange transcribed sequence. Hence putative proteins do not determine the identities of amino acids inserted where stop codons occur in the exchange transcribed RNA. For any single gene, a total of 6 × 9 = 54 hypothetical protein sequences were produced across frames and exchange rules, and a total of 13 × 54 = 702 hypothetical protein sequences for the 13 regular protein coding sequences in the human mitochondrial genome were examined.

These 702 hypothetical protein sequences were analyzed by GenBank’s Blastp (Altschul et al., 1997, 2005) using standard default parameters of Blastp as has been used and described in previous publications (Seligmann, 2011a, 2012c,d,e,f, in press). Blastp indicates whether the putative peptide is similar to proteins existing in GenBank. It produces a homology hypothesis that indicates the candidate overlapping genes coded after nucleotide exchange transcription.
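The bookkeeping behind the 54 sequences per gene can be sketched as follows. This is a plain-Python stand-in for ‘transeq’, not the tool itself. The codon table is the vertebrate mitochondrial code as I understand it (standard code with AGA and AGG as stops, ATA as Met and TGA as Trp), and the function names and the toy sequence are assumptions.

```python
BASES = "TCAG"
# Vertebrate mitochondrial code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Nine symmetric rules, written on the DNA alphabet (T stands for U).
RULES = [("AC",), ("AG",), ("AT",), ("CG",), ("CT",), ("GT",),
         ("AC", "GT"), ("AG", "CT"), ("AT", "CG")]

def exchange(seq, pairs):
    """Swap the two nucleotides of each pair in a single pass."""
    src = "".join(pairs)
    dst = "".join(p[::-1] for p in pairs)
    return seq.translate(str.maketrans(src, dst))

def translate(seq):
    """Translate one frame; stops appear as '*', unknown codons as 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

def six_frames(seq):
    """Three frames of the sequence and three of its inverse complement."""
    seq = seq.upper().replace("U", "T")
    rev = seq.translate(COMPLEMENT)[::-1]
    return [translate(s[f:]) for s in (seq, rev) for f in range(3)]

def hypothetical_proteins(gene):
    """All 6 x 9 putative translations of one gene after nucleotide exchange."""
    return [p for pairs in RULES for p in six_frames(exchange(gene.upper(), pairs))]

if __name__ == "__main__":
    toy_gene = "ATGACCGGTTAA"  # toy sequence, not a real mitochondrial gene
    print(len(hypothetical_proteins(toy_gene)), "sequences for one gene")
```

Each of these sequences would then be submitted to Blastp, which is not reproduced here.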

2.4. Duration spent single stranded by DNA during replication by mitochondrial protein coding genes

Some analyses below describe patterns in nucleotide contents due to replicational deamination gradients along a gradient of duration of DNA single strandedness during DNA replication. This is because single strandedness increases A→G and C→T deamination rates more than it increases the opposite mutations G→A and T→C (Frederico et al., 1990).

These spontaneous mutations are counterselected at coding sites, but have detectable effects on nucleotide contents at third codon positions in protein coding genes (Krishnan et al., 2004a,b; Seligmann et al., 2006). Third codon positions usually have also an additional coding role when a sequence is involved in overlap coding. Systematic nucleotide exchanges may reveal such overlapping protein coding genes. The replicational gradient should not be detectable in such overlap coding regions if these are functional (Seligmann, 2012a,d). Hence analyses of effects of replicational gradients on nucleotide contents at third codon positions should highlight the coding status of overlapping genes.

For that purpose, durations spent single stranded during replication are calculated for each human mitochondrial protein coding gene, using the gene’s midpoint location along the genome. Duration spent single stranded by a site is a function of the distance of that site from the heavy strand replication origin (OH) and the light strand replication origin (OL). This duration spent single stranded is 2 × b/N for the genes ND1 and ND2 (genes located between the OH and OL), where b is the midlocation of the gene in the number of nucleotides counted from the OH, in the 5′→3′ direction, of the genome’s heavy strand, and N is the total genome length. Note that standard mitochondrial genome annotations in GenBank indicate the numberings according to the light strand, which may cause some confusion in calculating durations spent single stranded during replication (Tanaka and Ozawa, 1994; Raina et al., 2005; Seligmann et al., 2006). For the other genes, replicational single strandedness is 2 × (OL − b)/N, where OL indicates the midlocation of the light strand replicational origin, according to heavy strand numbering.
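In compact form, the two cases read as follows, where D is a symbol introduced here only for convenience to denote the duration spent single stranded, and b, N and O_L are as defined in the text:

```latex
D(b) =
\begin{cases}
  \dfrac{2\,b}{N}, & \text{genes located between } O_H \text{ and } O_L \text{ (ND1, ND2)},\\[1.5ex]
  \dfrac{2\,(O_L - b)}{N}, & \text{all other protein coding genes}.
\end{cases}
```

Both b and O_L are midlocations counted from the O_H along the heavy strand, so positions taken from standard GenBank annotations (light strand numbering) would first have to be converted.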

2.5. Circular code analyses

The circular code theory indicates that a set of 20 autocomplementary codons (the 20 codons include the inverse complement of each of these codons) is overrepresented in the coding frame of regular protein coding genes (Arqués and Michel, 1996, 1997). Coded alphabetical communication in human languages consists typically of letters forming words, and of punctuation symbols (comma, question mark, etc.). Besides stop codons, in the genetic code, codons coding for amino acids apparently have also ‘punctuation’ roles: the circular code codons apparently regulate the reading frame, as suggested also by circular code properties of ribosomal RNA that interacts with the mRNA (Michel, 2012).

It seems that when more than one frame in a sequence is coding, the statistical property of overrepresentation of circular code codons is lost, perhaps because ‘punctuation’ signals of several frames are mixed, or nonexistent to facilitate passage between frames. On the other hand, homopolymer codons (AAA, CCC, GGG, TTT), which tend to cause frameshifts (one of the main mechanisms for switching between coding frames), are relatively overrepresented in overlap coding regions (Ahmed et al., 2007; Ahmed and Michel, 2011).

Sequences solely composed of these 20 codons have a non-redundant feature: if nucleotide triplets are not read according to the frame of the codons that compose the sequence, one will soon find a codon that is not part of the initial set of 20 codons, indicating that the reading frame is ‘incorrect’. This lack of redundancy between frames is one of the characteristics of circular codes, and could be related to the reason why circular code codons are underrepresented in overlapping genes. Hence the proportion of homopolymers among the sum of homopolymers and circular code codons should be greater in predicted overlap coding sequences than in adjacent regions of a gene. Statistical confirmation of this prediction by sequence data should be considered as constituting independent evidence for the function of that sequence as overlap coding, in this case after systematic symmetric nucleotide exchange.

Note that the natural circular code is characterized by a set of 20 autocomplementary codons associated with each frame, where the 20 circular codons of frames +1 and +2 are produced by specific permutations of the nucleotides in the circular codons in frame ‘0’ (frame +1: the first nucleotide in frame ‘0’ is permuted to the third position; frame +2: the third nucleotide in the frame ‘0’ circular codon is permuted to first codon position, producing the circular code codon of frame +2). None of these three sets of 20 circular codons includes any of the four homopolymers. Hence tests performed here are on averages of homopolymer/circular code proportions calculated over the three frames for each set of 20 circular code codons (frame 0, +1 and +2 circular codes).
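The permutations and the proportion described above can be sketched in Python. Two points are assumptions on my part and should be checked before reuse: the list of 20 frame ‘0’ codons is given as I recall it from the circular code literature (to be verified against Arqués and Michel, 1996), and the averaging is read as a mean of the three proportions obtained with the frame 0, +1 and +2 codon sets. Function names are illustrative.

```python
# Frame '0' circular code codons (assumed list; verify against the cited source).
X0 = {"AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
      "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC"}
X1 = {c[1:] + c[0] for c in X0}   # frame +1: first nucleotide moved to third position
X2 = {c[2] + c[:2] for c in X0}   # frame +2: third nucleotide moved to first position
HOMOPOLYMERS = {"AAA", "CCC", "GGG", "TTT"}

def codons(seq, frame=0):
    """Split a sequence into complete codons starting at the given frame."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

def homopolymer_proportion(seq, frame=0):
    """Mean over the three codon sets of homopolymers / (homopolymers + circular codons)."""
    cds = codons(seq.upper().replace("U", "T"), frame)
    n_homo = sum(c in HOMOPOLYMERS for c in cds)
    proportions = []
    for code in (X0, X1, X2):
        n_circ = sum(c in code for c in cds)
        if n_homo + n_circ:
            proportions.append(n_homo / (n_homo + n_circ))
    return sum(proportions) / len(proportions) if proportions else float("nan")

if __name__ == "__main__":
    print(len(X0 | X1 | X2 | HOMOPOLYMERS), "distinct codons in the four sets")
    print(homopolymer_proportion("AAAAACGTT"))  # toy sequence
```

The proportion would be computed separately for the predicted overlap coding region and for the adjacent regions of the same gene, and the two values compared.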

2.6. Kinetics of nucleotide misinsertions and systematic nucleotide exchanges

It is plausible that rates (or frequencies) of the various types of systematic nucleotide exchanges during RNA transcription correspond to known rates of occasional nucleotide misinsertions by polymerases. These are not known for the RNA polymerase, but one might use as proxy those known for the human mitochondrial DNA polymerase gamma (Lee and Johnson, 2006). These kinetic parameters are indicated as kd and kpol in Table 2 from Lee and Johnson (2006). For each type of systematic symmetric nucleotide exchange, I averaged the corresponding kds, and separately, kpols from Lee and Johnson (2006). For example, for the systematic symmetric nucleotide exchange A↔C, the kd’s averaged were 540 (A→C), 160 (C→A), 150 (G→G) and 57 (T→T), resulting in the mean kd for that type of nucleotide exchange of 226.75 μM. One expects that some proportionality exists between these averages and independent estimates of frequencies or rates of nucleotide exchange polymerization. Positive results would be strong confirmation of the working hypothesis, as they would explain observations on transcripts existing in GenBank by independent parameters of DNA misinsertion polymerization kinetics.
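The averaging in the worked example is simple arithmetic and can be checked in a few lines. The sketch below uses only the four kd values quoted above for the A↔C rule; the variable and function names are illustrative, and the values for the other rules would have to be taken from Lee and Johnson (2006).

```python
from statistics import mean

# The four kd values (micromolar) quoted in the text for the A<->C rule:
# two misinsertions plus the two correct insertions of the unexchanged nucleotides.
KD_A_C_RULE = [160, 540, 150, 57]

def mean_kinetic_parameter(values):
    """Average one kinetic parameter (kd or kpol) over the four insertions of a rule."""
    if len(values) != 4:
        raise ValueError("expected one value per template nucleotide")
    return mean(values)

if __name__ == "__main__":
    print(mean_kinetic_parameter(KD_A_C_RULE), "micromolar")
```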

3. Results and discussion

3.1. RNAs in GenBank

A priori, there is no evidence that systematic nucleotide exchanges occur, but the large online databases of RNA sequences (expressed sequence tags, EST, in GenBank) allow searching for RNAs that match the assumed exchange-based recoding of regular genes. I explore, for all 9 symmetric nucleotide exchanges presented in Table 1, whether such RNAs exist in the database for the complete human mitochondrial genome. Table 2 presents all RNAs detected by Blastn (Zhang et al., 2000) for GenBank’s EST database that align with some parts of the human mitochondrial genome, after that genome has been recoded according to each of the nine systematic symmetric nucleotide exchanges.

There are 51 such RNA sequences originating from 12 independent studies of RNA expression. No RNA sequence was detected for



Table 2
Human RNA transcripts detected by Blastn in GenBank’s EST database and aligning with human mitochondrial genome sequences after symmetrically exchanging nucleotides in the sequence. Columns are: 1. exchange nucleotide rule (Sub); 2. gene, and DNA strand matching EST transcript (Gene); 3. alignment first and last nucleotides (Loc); 4. alignment length (N); 5. similarity (%) between aligning sequences (Si); 6. description of EST (Origin); 7. EST entry in GenBank (ID); 8. EST reference (Ref).

Sub | Gene | Loc | N | Si | Origin | ID | Ref
AG | ND1− | 687–812 | 131 | 77 | Similar to NADH1, renal cell tumor | AI367501 62 | 1
AG | ND5− | 1423–1577 | 165 | 95 | Homo sapiens hypothalamus | AV721614 30 | 2
AG | 12s− | 403–634 | 138 | 88 | Normalized rat brain | AI230934 53 | 3
AG | Ser4− | 2–69 | 68 | 100 | Female pectoral muscle after mastectomy | AJ574322 50, AJ574357 51 | 4
AU | ND1+ | 357–851 | 495 | 99 | Colon | AW176957 53 | 5
AU | ND1+ | 1–231 | 231 | 97 | Colon ins | BF798658 52, BF798678 53 | 6
AU | ND1+ | 27–231 | 210 | 93 | Colon ins | BF798657 50 | 6
AU | ND2+ | 202–593 | 395 | 99 | Head neck, FAPESP/LICR Human Cancer Genome Project | AI940581 57 | 5
AU | CO1− | 122–634 | 513 | 99 | Colon, The FAPESP/LICR Human Cancer Genome Project | AW176982 50 | 5
AU | ND4+ | 852–1240 | 391 | 99 | Colon | CK327105 54 | 6
AU | 12s+ | 2–268 | 271 | 97 | Colon ins | BF798660 51, BF798653 51, BF798647 (–) 52 | 6
AU | 16s+ | 1405–1559 | 156 | 99 | Colon ins | BF798658 52, BF798678 53 | 6
AU | Leu2+ | 1–75 | 75 | 100 | Colon ins | BF798658 52, BF798678 53 | 6
CG | AT6+ | 486–669 | 188 | 83 | Thymus | BX457166 47 | 7
CG | 16s+ | 565–715 | 151 | 97 | Adult heart, female pectoral muscle after mastectomy | N41204 38, AJ574341 36, AJ574283 49 | 8, 4
CU | ND1+ | 770–952 | 183 | 99 | Female pectoral muscle after mastectomy | AJ574326 63 | 4
CU | CO1+ | 1509–1542 | 34 | 100 | Female pectoral muscle after mastectomy | AJ574346 48, AJ574322 45 | 4
CU | ND4+ | 851–1042 | 196 | 95 | Prostate normal | BF370011 56 | 6
CU | ND4+ | 1265–1378 | 114 | 96 | Female pectoral muscle after mastectomy | AJ574311 60 | 4
CU | CytB+ | 852–1133 | 285 | 99 | Hypothalamus | AV722273 59, AJ574321 56, AJ574347 57, AJ574344 58, AJ574371 55, AJ574334 53, AJ574333 54 | 2, 4
CU | 16s+ | 24–401 | 378 | 97 | Hypothalamus | AV722267 45 | 2
CU | 16s+ | 361–556 | 196 | 95 | Female pectoral muscle after mastectomy | AJ574370 46 | 4
CU | 16s+ | 411–776 | 367 | 96 | Hypothalamus, female pectoral muscle after mastectomy | AV721363 48, AJ574341 46, AJ574327 45 | 2, 4
CU | 16s+ | 122–382 | 211 | 99 | Female pectoral muscle after mastectomy | AJ574335 40, AJ574332 42 | 4
CU | 16s+ | 724–937 | 548 | 99 | Female pectoral muscle after mastectomy | AJ574378 49, AJ574291 49 | 4
GU | ND1+ | 12–377 | 369 | 92 | Cell line | AI525967 49 | 9
GU | ND2− | 564–902 | 353 | 84 | Nervous normal | BI032899 59 | 6
GU | AT6+ | 32–150 | 155 | 88 | Gastric epithelial progenitor Mus musculus, ATP6 | CF425368 36 | 10
GU | AT6− | 386–536 | 120 | 80 | Head neck | AW381066 61 | 5
GU | CytB+ | 188–331 | 147 | 94 | Adult heart | AA413440 46 | 11
GU | 16s+ | 328–780 | 456 | 95 | Tissue culture | AI541277 48 | 9
GU | 16s+ | 46–423 | 302 | 96 | Cell line | AI525977 60 | 9
GU | 16s+ | 57–1130 | 281 | 80 | Human aorta polyA+ mRNA | C15855 41 | 12

1. Strausberg, 1997. National Cancer Institute, Cancer Genome Anatomy Project (CGAP), Tumor Gene Index, unpub.
2. Gu, Y., Peng, Y., Song, H., Huang, Q., Yang, Y., Gao, G., Xiao, H., Xu, X., Li, N., Qian, B., Liu, F., Qu, J., Gao, X., Cheng, Z., Xu, Z., Zeng, L., Xu, S., Gu, W., Tu, Y., Jia, J., Fu, G., Ren, S., Zhong, M., Lu, G., Hu, R., Chen, J., Chen, Z., Han, Z., 2000. Homo sapiens cDNA HTB clones, unpub.
3. Lee, N.H., Glodek, A., Chandra, I., Mason, T.M., Quackenbush, J., Kerlavage, A.R., Adams, M.D., 1998. Rat Genome Project: Generation of a Rat EST (REST) Catalog & Rat Gene Index, unpub.
4. Laveder, P., De Pitta, C., Vitulo, N., Valle, G., Lanfranchi, G., 2003. Oligo-directed RNase H cleavage of abundant mRNAs in skeletal muscle, unpub.
5. Simpson, A.J.G., 1999. The FAPESP/LICR Human Cancer Genome Project, unpub.
6. Dias Neto et al. (2000).
7. Li, W.B., Gruber, C., Jessee, J., Polayes, D., 2001. Full-length cDNA libraries and normalization, unpub.
8. Lui et al. (1995).
9. Huang et al. (1999).
10. Tidwell, R., Clifton, S., Marra, M., Hillier, L., Pape, D., Martin, J., Wylie, T., Theising, B., Bowers, Y., Gibbons, M., Ritter, E., Bennet, J., Ronko, I., Tsagareishvili, R., Belaygorod, L., Grow, A., Maguire, L., Waterston, R., Wilson, R., 2002. Unpublished.
11. Liew et al. (1994).
12. Fujiwara, T., Hirano, H., Katagiri, T., Kawai, A., Kuga, Y., Nagata, M., Okuno, S., Ozaki, K., Shimizu, F., Shimada, Y., Shinomiya, H., Takaichi, A., Takeda, S., Watanabe, T., Takahashi, E., Hirai, Y., Maekawa, H., Shin, S., Nakamura, Y., 1995, unpub.

three exchange types: the exchange A↔C, and two of the three exchange rules involving all four nucleotide types, A↔C+G↔U and A↔G+C↔U. Among nucleotide exchanges involving only two nucleotides, the most common were RNAs where recoding exchanges involve uracil: C↔U (21 sequences), A↔U (14 sequences) and G↔U (8 sequences). The systematic exchanges A↔G and C↔G were found in 5 RNA sequences each. The exchange involving all four nucleotides A↔U+C↔G was quite common (10 sequences, data not presented in Table 2) and is analyzed in detail separately (Seligmann, 2012d). It is first of all notable that the three most common exchanges are those between uracil and the three other nucleotides. Hence uracil, which replaces thymine during regular transcription, also seems most frequently involved in ‘unusual’ exchanging transcription.

Blastn analyses detect in total 61 ‘exchanged’ sequences, including the 10 for the A↔U+C↔G exchange rule that are not presented in Table 2 (these 10 transcripts are presented in Table 1 of Seligmann, 2012d). This is 0.56% of the 10899 ESTs annotated as of human mitochondrial origin in GenBank’s database by June 2012. It would also be very interesting in this context to explore the high accuracy transcript data available for the human mitochondrial transcriptome (Mercer et al., 2011a,b). These data, available at http://mitochondria.matticklab.com, are not searchable at this point along the guidelines of nucleotide exchanging



Fig. 1. Loopiness of transcripts in Table 2 as a function of their relative length. Secondary structure predictions estimate the average number of structures in which the average site does not form a stem by self-hybridization in RNA (loopiness), assuming symmetric exchanging transcription. The y axis is the subtraction of that mean for gene regions that are not within such nucleotide exchanging transcripts, from the mean loopiness of the regions transcribed by exchanging transcription and listed in Table 2. The x axis is the relative proportion the exchanging transcript represents from the total length of that gene. Gene identities, and the types of symmetric nucleotide exchange, are indicated next to each datapoint.

RNA transcription, but this database would probably yield additional insights into the frequencies of the various types of exchange transcriptions.

3.1.1. Exchanging artifacts

The various sequences in Table 2 originate from 12 studies of RNA, with RNAs matching different types of exchanges originating in some cases from the same study, and RNAs matching some types of exchanges originating from several studies. These data are important material evidence suggesting a family of previously undescribed RNA recoding types, a potentially major discovery for genomics and molecular biology. For that reason, these sequence data are examined along the lines of a number of possible artifacts. First, if all or most sequences originated from mainly one study, one could have suggested that exchanges were due to specific conditions associated with that study. Possibly, erroneous sequence manipulation, perhaps while incorrectly or only partly inverse complementing sequences by semi-automated methods, could have created the sequences in Table 2. For example, the only symmetric exchange involving all four nucleotides for which corresponding RNA has been detected (A↔U+C↔G) can result from complementing a sequence without reversing the nucleotide order, a potential sequence manipulation error that could create the ten BLAST hits matching this exchange rule (which are not reported in Table 2). For analyses excluding the possibility of artifacts for these 10 sequences, see Fig. 1 in Seligmann (2012d), which shows that their length increases with their relative secondary structure formation capacities. Erroneous partial complementing (of A↔U or C↔G) could create the RNAs detected and matching these two additional types of nucleotide exchanges. However, these annotation artifacts could not explain the occurrences of RNA corresponding to A↔G, C↔U and G↔U. It is most probable that the data in Table 2 are not the result of such in silico sequence manipulations, especially since as many as 12 studies produced such sequences.
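The manipulation errors considered above can be made concrete with a short Python sketch. It assumes an RNA alphabet and uses hypothetical function names and a toy sequence; it only illustrates that complementing without reversing the nucleotide order is the same operation as the A↔U+C↔G exchange, and that complementing only A and U is the same operation as the A↔U exchange.

```python
# Translation tables for a full and a partial (A/U only) complement, RNA alphabet.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")
_AU_ONLY = str.maketrans("AU", "UA")


def complement(rna):
    """Complement without reversing: identical to the A<->U+C<->G exchange."""
    return rna.translate(_COMPLEMENT)


def reverse_complement(rna):
    """Correct inverse complement: complement, then reverse the order."""
    return complement(rna)[::-1]


def partial_complement_au(rna):
    """Erroneous partial complement of A and U only: identical to A<->U."""
    return rna.translate(_AU_ONLY)


if __name__ == "__main__":
    toy = "AUGGCCAUU"  # toy sequence, not taken from Table 2
    print("complement only   :", complement(toy))
    print("reverse complement:", reverse_complement(toy))
    print("partial (A/U only):", partial_complement_au(toy))
```

No comparable complementing error swaps A with G, C with U or G with U, which is why those three exchange types cannot be explained in this way.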

Another possibility is that of a statistical artifact. The exchanged sequences usually exchange between two nucleotides, so they remain identical to the original, regular sequence for the two other nucleotides. Hence on average, half of the nucleotides are exchanged, and the expected mean similarity between the exchanged RNA and the regular transcript is 50%. However, all (but one) similarities in Table 2 are ≥80%, and only 7 are below 90%. Nucleotide ratios vary locally, so high similarities that do not imply exchange transcription are possible where the exchanged nucleotides have very low local frequencies. However, the sequences in Table 2 and their high similarities are not compatible with extreme local nucleotide biases creating the illusion of exchange transcription: the exchanged nucleotides never represent less than 30% of the EST sequence, which would yield at best a similarity of 70% with the regular transcript, assuming that no nucleotide exchange actually occurred and that the RNA reported in Table 2 results not from exchanging transcription but from the low local proportion of the exchanged nucleotides in its composition. In fact, all the RNA sequences include all four nucleotides, in proportions that seem incompatible with the high similarities observed if no systematic transcriptional exchange occurred (see Table 2; percentages are indicated next to GenBank entries). Hence nucleotide biases did not create false positives for exchanging transcription for the vast majority of transcripts presented in Table 2. Therefore, most data in Table 2 do not result from statistical artifacts involving local nucleotide biases. The specific cases of potential exceptions, three transcripts in Table 2 with low similarities, are examined in some detail in a section below.
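The arithmetic behind this argument can be checked with a small Python sketch. The function names and toy sequences are hypothetical. For a two-nucleotide exchange, the identity between a sequence and its exchanged version equals one minus the fraction of exchanged nucleotides, so 30% exchanged nucleotides leave at most 70% identity if no exchange actually took place.

```python
def swap(seq, x, y):
    """Return seq with nucleotides x and y exchanged in both directions."""
    return seq.translate(str.maketrans(x + y, y + x))


def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(p == q for p, q in zip(a, b)) / len(a)


def exchanged_fraction(seq, x, y):
    """Fraction of seq made up of the two exchanged nucleotides."""
    return (seq.count(x) + seq.count(y)) / len(seq)


if __name__ == "__main__":
    # Toy sequences, not real ESTs: A+C make up 60% and 30% of them.
    for toy in ("ACGUACGUAA", "AGGUCGUUGA"):
        f = exchanged_fraction(toy, "A", "C")
        sim = identity(toy, swap(toy, "A", "C"))
        print(toy, "exchanged fraction:", f, "identity after A<->C:", sim)
```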

3.1.2. Alternative biological explanations

The next potential problems with the nucleotide exchange interpretation of the data in Table 2 are of a biological nature. It is possible that regular transcription of other, nuclear DNA sequences produces the transcripts in Table 2. This possibility cannot be totally ruled out a priori. The RNAs in Table 2 have high and even very high similarities with the mitochondrial sequences after assuming exchanging transcription. This means that if these RNAs are produced by regular transcription of nuclear (or cytosolic) sequences, and not by exchanging transcription of mitochondrial sequences, these nuclear sequences resulted from exchanging reverse transcription of mitochondrial RNA into nuclear DNA, or some other exchanging process creating a nuclear mitochondrial (pseudo)gene that involved systematic nucleotide exchanges.

Hence even if the RNAs in Table 2 were not the result of exchanging transcription (= exchanging RNA polymerization), they would reflect exchanging DNA polymerization. Searching GenBank’s human genome data with BLAST does not yield any positive hits for the mitochondrial sequences transformed according to any of the nine symmetric exchanging transcription rules. This negative result does not totally rule out the possibility that regular transcription of nuclear or cytosolic DNA is at the origin of the RNAs in Table 2, but there is no evidence to sustain this possibility at this point. Hence this biological interpretation seems unlikely. However, even if this nuclear origin were true, it would indicate that DNA polymerization following exchanging rules occurs. Such exchanging DNA polymerization would still be important indirect evidence in favor of the working hypothesis that exchanging RNA polymerization occurs, and would be compatible with the existence of protein coding genes within these exchanged sequences. It would be direct evidence for the creation of new genes through nucleotide exchanges.

The last considered biological alternative to exchanging transcription relates to the fact that all the transcripts in Table 2 originate from studies that date from before the year 2004. This suggests that the RNAs might result from rare dysfunctions by the



reverse transcriptases used in the creation of cDNA libraries from extracted RNA transcripts, which form the EST databases. It is indeed possible that such flaws were discovered at some point and that EST libraries produced after 2003 are exempt from these flaws. If this is the case, the data in Table 2 do not directly reflect exchanging transcription activity, but exchanging reverse transcription. At this stage, the correct interpretation of the data presented in Table 2 could be that natural exchanging transcription occasionally occurs, or that exchanging reverse transcription by the enzymes used to create the EST libraries occasionally occurs. However, even in the latter case, such exchanging reverse transcription would be at least indirect evidence that exchanging transcription and associated protein coding genes might exist, as RNA and DNA polymerases have great similarities. It is notable in this context that the human mitochondrial polymerase gamma, which usually replicates the mitochondrial genome, also has reverse transcription activity (Kasiviswanathan and Copeland, 2011).

Even if the RNAs in Table 2 are not of natural origin, but result from some kind of dysfunction of the reverse transcriptase used to produce the EST libraries, the frequencies of the various types of nucleotide exchanges suggested by the data in Table 2 would still be informative from a biological point of view: these dysfunction frequencies would probably indicate the occurrence of natural dysfunctions of these types. In the next section, analyses of secondary structures formed by exchanged transcripts suggest that the reverse transcriptase ‘artifact’ is the less likely explanation.

3.1.3. Secondary structure formation by exchanging transcripts

According to the scenarios described in Section 3.1.2, the data in Table 2 would reflect a biological reality, which has 2–3 alternative interpretations, all based on the principle of ‘exchanging’ polymerization: of RNA on the basis of DNA (transcription), of DNA on the basis of DNA (replication), or of DNA on the basis of RNA (reverse transcription).

A further analysis confirms that the sequences in Table 2 reflect a biological phenomenon, most probably due to RNA polymerization, though transcript editing cannot be ruled out. Using Mfold (Zuker, 2003), secondary structure formation was predicted for the complete sequence of each gene for which exchange transcripts were found (Table 2), after recoding that sequence according to the specific exchanging rule. I calculated the mean percentage participation of nucleotides in loops separately for the regions belonging to the RNA sequences presented in Table 2 and for the rest of that gene. Loopiness of the ‘exchanged RNA’ was greater in regions that underwent exchanging transcription according to Table 2 than in the rest of the gene (assuming exchanging transcription for that region too, though no such exchange transcription was detected for it), in 23 of the 33 (62%) sequences for which ‘exchanging’ transcripts were detected. Table 2 lists more sequences because in some cases several transcripts were found matching the same genome region. This slight majority is statistically significant according to a one tailed sign test (P = 0.047), suggesting that the transcripts produced by exchanging transcription usually tend to form less secondary structure than the rest of the gene (assuming it was also exchange transcribed). This means that the transcripts in Table 2 have a common feature, and are not a random sample of potential transcripts.
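The comparison described above, and plotted in Fig. 1, can be sketched in Python under stated assumptions. The per-site loop percentages below are made up, the function name is hypothetical, and parsing of Mfold output is not shown. The sketch only shows how the loopiness difference (transcript region minus rest of gene) and the relative transcript length would be derived from per-site values.

```python
def relative_loopiness(loop_pct, start, end):
    """Compare loopiness inside and outside an exchanging transcript.

    loop_pct: per-site percentage of predicted structures in which the
              site is unpaired (hypothetical input, one value per site).
    start, end: 1-based inclusive coordinates of the transcript in the gene.
    Returns (mean inside minus mean outside, relative transcript length).
    """
    inside = loop_pct[start - 1:end]
    outside = loop_pct[:start - 1] + loop_pct[end:]
    if not inside or not outside:
        raise ValueError("transcript must cover part, not all, of the gene")
    mean_in = sum(inside) / len(inside)
    mean_out = sum(outside) / len(outside)
    return mean_in - mean_out, len(inside) / len(loop_pct)


if __name__ == "__main__":
    # Made-up values for a 10-nucleotide 'gene'; the transcript covers sites 3-5.
    toy = [20, 40, 60, 80, 100, 50, 30, 10, 10, 0]
    diff, rel_len = relative_loopiness(toy, 3, 5)
    print("loopiness difference:", round(diff, 2), "relative length:", rel_len)
```

A positive difference corresponds to a datapoint above zero on the y axis of Fig. 1.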

The tendency for high loopiness differed among the various types of exchange transcription: it is weakest for A↔U exchanges (33%), intermediate for C↔G and G↔U exchanges (50%), 60% for A↔U+C↔G exchanges (not in Table 2), and strongest for A↔G and C↔U exchanges (100 and 90%, respectively). These differences are in no way statistically significant, as the number of cases is too low for even considering statistical tests (one transcript for A↔G and two for C↔G).

However, for C↔U, considering that there are 10 cases, a one tailed sign test indicates a statistically significant tendency for greater loopiness in regions that underwent C↔U exchanging transcription than in other regions of the same gene, assuming they too had undergone nucleotide exchange transcription along the C↔U rule. For the 10 transcripts following the C↔U rule, the probability of getting 9 among 10 cases where loopiness is greater than for the rest of the gene yields, according to a binomial distribution (the distribution used in sign tests), the statistical significance P = 0.0054. In other words, if one were to assume that loopiness in exchange transcribed regions is as likely to be above as below the loopiness in surrounding regions, the probability of getting 9 among 10 exchange transcripts with greater loopiness is about half a percent. Hence it is unlikely that exchange transcribed regions have on average the same loopiness as other regions. This tendency indicates that self-hybridization disfavors the production of ‘exchanged’ transcripts. This strengthens the possibility that exchanges result from editing of transcripts, where secondary structure might prevent or at least impede editing after polymerization. However, this does not preclude that exchanges occurred during transcription itself. If RNA polymerization that systematically exchanges nucleotides is relatively slow, it might be particularly impeded by secondary structure formation, and hence loopiness might promote it.

3.1.4. The length of exchanging transcripts

An additional observation, relating the capacity for secondary structure formation by the RNAs in Table 2 to their length, might give a clue to the nature of the process involved: the loopiness of exchanging transcripts, as compared to the rest of the gene sequence, decreases with the relative length of the exchanging transcript (Fig. 1). This suggests that exchange transcription (or editing) is favored by free access to the elongating RNA polymer, but that in order for that polymer to reach a sizeable proportion of the total length of the gene, it should form secondary structure. The ‘paradox’ between the requirement that an exchange-transcribed region forms little secondary structure, and the requirement, for its elongation, that it self-hybridizes, could explain why transcripts produced by systematic nucleotide exchanges are rare.

I propose in this context the following interpretation. By definition, exchanging transcription does not produce RNA that is the inverse complement of its template DNA strand, and hence the elongating RNA does not form a duplex with DNA during its elongation. As a result, it is single stranded and open to digestion by endoribonucleases, which would shorten it. Hence in order to reach relatively great lengths, polymerization of non-complementary RNA (or DNA) requires protection by self-hybridization (secondary structure formation), as it cannot be protected by hybridization with existing DNA (or RNA) as regular transcripts are. Regular transcripts are protected by both hybridization with the ‘maternal’ strand and self-hybridization, but for transcripts produced by exchanging transcription, complementarity is much lower, and hence elongation is much more dependent on protection due to self-hybridization. In the extreme case of the exchange rule that involves two pairs of exchanged nucleotides (A↔U+C↔G), there is no complementarity at all, and protection can only result from self-hybridization. Accordingly, for the 10 transcripts following that rule, the correlation between relative loopiness and transcript length is much stronger than for the other exchange transcription types: r = −0.65. The association for the rest of the transcripts, in Fig. 1, is much weaker and is only statistically significant if transcripts are split into two groups, those below and those above the relative length of 20% of the length of their gene. A one tailed Fisher exact test indicates that there are more transcripts with more loopiness in the exchange transcribed part than



in the rest of the gene (positive loopiness values on the y axis in Fig. 1) for transcripts with relative length <20% than for those with relative length >20% (P = 0.052). Excluding the extreme length outlier indicated by a triangle in Fig. 1 yields P = 0.042. Including the ten sequences of the A↔U+C↔G exchange type (from Table 1 in Seligmann, 2012d), the test yields P = 0.023.

These negative correlations between exchange transcript lengths and loopiness indicate that endoribonucleases (or other enzymes with similar activities) are active during exchanging transcript production. This situation is not totally incompatible with the possibility that the sequences listed in Table 2 were produced by occasional dysfunctional reverse transcription during the creation of the cDNA libraries in GenBank, but seems more in line with transcription occurring under natural physiological circumstances. Hence the working hypothesis that various types of exchanging transcription occasionally occur under natural physiological conditions seems the most probable explanation for the data in Table 2, and is not incompatible with the alternative explanations that could not be totally ruled out (exchanging replication or exchanging reverse transcription).

3.1.5. Putative protein coding genes in exchanging transcripts

Table 2 presents some data favoring the working hypothesis of exchange transcription. The working hypothesis is formulated on the basis of an evolutionary principle of minimizing costs due to genome size, assuming that overlap coding (which in this case results from exchange transcription) increases the number of genes and the genome’s coding density without increasing its size. Hence evidence confirming that transcripts of protein coding genes, after systematically exchanging nucleotides, potentially include regions that code for proteins would strengthen the hypothesis on two grounds: first, because it would confirm the basic evolutionary principle underlying the advantage associated with exchange transcription by indicating its role in revealing coding potential; and second, because consistent patterns in (exchange transcription) overlap coding genes would be in themselves evidence that exchange transcription occurs, independently of physical evidence for RNA transcripts presumably produced by exchange transcription (Table 2). In addition, if analyses of coding properties of RNA after nucleotide exchange converge with those in Table 2, for example if coding seems more probable for exchange types that are relatively more represented in Table 2, and less probable in those for which no transcripts were detected, this coherence between different types of independent data and analyses would, in the context of a meta-analysis, be strong evidence for (overlap) protein coding based on exchange transcription.

There are 702 hypothetical peptides for the 13 human mitochondrial protein coding genes. These were analyzed by GenBank’s Blastp (Altschul et al., 1997, 2005) and hits with proteins existing in GenBank were recorded (Table 3). These analyses produced numerous hits, from 9 for A↔C exchanges to 36 for G↔U exchanges, in total between 483 codons (for A↔C exchanges) and 2801 codons (for A↔G exchanges) putatively involved in overlap coding associated with exchange transcription. It is notable that several hits, mainly for exchanges involving transitions C↔U and A↔G, were for the frame corresponding with the gene’s regular main frame, and with proteins that are homologous to the protein coded by the regular main frame of that gene. These cases may be of interest, but are excluded from analyses of overlapping genes presented here, and also from the statistics on putative overlapping genes at the beginning of this paragraph.
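The count of 702 hypothetical peptides can be reproduced with a short Python sketch, assuming that each of the 13 genes is recoded by each of the nine exchange rules and translated in six reading frames (13 × 9 × 6). The gene, rule and frame labels below are hypothetical identifiers chosen for illustration. Translation itself is not shown.

```python
from itertools import product

# The 13 human mitochondrial protein coding genes (labels are illustrative).
GENES = ["ND1", "ND2", "CO1", "CO2", "AT8", "AT6", "CO3",
         "ND3", "ND4L", "ND4", "ND5", "ND6", "CytB"]

# The nine symmetric exchange rules (hypothetical labels).
RULES = ["A<->C", "A<->G", "A<->U", "C<->G", "C<->U", "G<->U",
         "A<->C+G<->U", "A<->G+C<->U", "A<->U+C<->G"]

# Assumed six reading frames per recoded gene (hypothetical labels).
FRAMES = ["+0", "+1", "+2", "-0", "-1", "-2"]

# One hypothetical peptide per (gene, rule, frame) combination.
candidates = list(product(GENES, RULES, FRAMES))

if __name__ == "__main__":
    print(len(candidates), "hypothetical peptides")
```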

It is notable that the average length of putative alignments detected by Blastp for a type of nucleotide exchange increases with the number of transcripts detected for that type of exchange as reported in Table 2 (Pearson parametric correlation coefficient r = 0.747, P = 0.0104; Spearman nonparametric correlation coefficient rs = 0.75, P = 0.0166, one-tailed tests, Fig. 2). This result is a type of meta-analysis of the data of all exchange transcription types and indicates that, overall, overlap coding by nucleotide exchange might occur, in proportion to the observed frequency of exchange transcription.

Fig. 2. Mean length of putative overlapping protein coding genes predicted by Blastp analyses (from Table 3) as a function of the number of exchanging transcripts according to that exchange rule (from Table 2). The type of nucleotide exchange assumed by analyses is indicated next to each datapoint, followed by the number of alignments with GenBank proteins interacting with DNA or RNA, membrane proteins, and proteins with physiological functions typical of mitochondria.
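As a rough check of the Pearson test reported above, the one-tailed P value can be recovered from r with the usual t transformation, t = r·sqrt((n − 2)/(1 − r²)). The Python sketch below assumes n = 9 data points (one per exchange type, an assumption not stated explicitly in the text) and integrates the Student t density numerically. It illustrates the standard test and is not necessarily the procedure or software used for the reported values.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p(r, n, steps=2000):
    """Return (t, upper-tail P) for a positive Pearson r over n points."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps must be even) for the density integral on [0, t].
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # The density is symmetric, so the upper tail is 0.5 minus that area.
    return t, 0.5 - area


# n = 9 is an assumption: one data point per exchange type.
t_stat, p_value = one_tailed_p(0.747, 9)
print(round(t_stat, 2), round(p_value, 4))
```

Under these assumptions the result is close to the reported P = 0.0104, which supports reading the test as a one-tailed test over nine exchange types.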

3.1.6. Functions of proteins coded by ‘nucleotide exchange’ encrypted overlapping genes

Table 3 suggests that Blastp alignment analyses of the 702 peptides translated from the exchange-transcribed human protein coding genes detect 168 previously undetected polypeptides putatively coded by exchange overlap coding. This means that 23.9% of the hypothetical translated sequences (the percentage ranges from 11.5% for A↔C exchanges to 46.2% for G↔U exchanges) are potentially protein coding. For the sake of comparison, note that for ‘regular’ overlap coding in the same sequences of that species, induced by suppressor tRNA activity, the same Blastp analyses yield 24 putative overlapping genes (36.9% of the hypothetical translated sequences from the five alternative frames for the 13 genes, see Seligmann, 2011a). According to Table 3, these putative exchange overlapping genes include alignments with 11 proteins interacting with DNA or RNA (4 for G↔U exchanges, 3 each for A↔G and A↔U+C↔G exchanges (data for the latter exchange type are not in Table 3; it is analyzed in detail by Seligmann, 2012d), and one for A↔G+C↔U exchanges). These putative overlapping genes might themselves code for protein(s) involved in the production of the exchange transcripts. The positive correlation between their percentage within the sample of putative overlapping genes and observed exchange transcript numbers is compatible with this possibility, although it is weak (r = 0.45, not statistically significant even at P < 0.20).
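The percentages quoted in this paragraph can be checked against the hit counts. The short Python sketch below assumes 78 hypothetical peptides per exchange type (13 genes in 6 frames) and 65 for regular overlap coding (13 genes in 5 alternative frames); the per-type hit counts 9 and 36 are the smallest and largest counts reported earlier, for A↔C and G↔U exchanges respectively. The variable and function names are illustrative only.

```python
N_GENES = 13
PEPTIDES_PER_TYPE = N_GENES * 6    # six frames per gene: 78 peptides
TOTAL_PEPTIDES = 702               # total stated in the text
REGULAR_ALTERNATIVE = N_GENES * 5  # five alternative frames: 65 peptides


def percent(hits, candidates):
    """Percentage of candidate peptides with a Blastp hit, to one decimal."""
    return round(100 * hits / candidates, 1)


overall = percent(168, TOTAL_PEPTIDES)       # all exchange types pooled
lowest = percent(9, PEPTIDES_PER_TYPE)       # 9 hits, A<->C exchanges
highest = percent(36, PEPTIDES_PER_TYPE)     # 36 hits, G<->U exchanges
regular = percent(24, REGULAR_ALTERNATIVE)   # regular overlap coding

print(overall, lowest, highest, regular)     # 23.9 11.5 46.2 36.9
```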

Fig. 2 indicates the number of such candidate overlapping genes for each type of nucleotide exchange, the number of predicted membrane proteins, and of proteins with functions frequently associated with typical mitochondrial metabolism. The latter are most numerous, in total 29, and occur in all nucleotide exchange types (least (one) for A↔U, most (6) for G↔U) and include sequences aligning with an alkyl hyperoxide reductase for G↔U exchange

Table 3
List of GenBank proteins aligning according to Blastp with putative peptide sequences translated from mitochondrial protein coding genes assuming nucleotide exchanging transcription. Columns are: 1. gene identity and frame (1–3, + strand; 4–6, − strand); 2. first and last amino acids in alignment; 3. alignment length; 4. alignment similarity; 5. entry of GenBank protein aligning with peptide translated from exchanging transcription; 6. description of GenBank protein; 7. number of stops in alignments; 8. type of exchanging transcription.

Gene Frame Loc N Si Id Origin Ter Exchange
ND1 1 48–180 139 45 AEQ35744 ND1 Pan troglodytes 5 G-T
ND1 3 5–98 104 41 AAB39554 Nitrate reductase Agrostemma githago 1 G-T
ND1 4 49–282 246 35 EFC39457 Hyp. Naegleria gruberi 4 G-T
ND1 6 188–290 104 49 EGH13632 Exonuclease Pseudomonas syringa 0 G-T
ND2 1 3–341 353 41 AAL48391 ND2 Homo sapiens 29 G-T
ND2 5 297–340 47 57 EFZ00925 Hyp. Metarhizium anisopliae 0 G-T
ND2 6 169–282 126 46 ABM38496 Antitermination Polaromonas naphthalenivorans 0 G-T
CO1 6 150–220 73 51 EFQ64751 Hyp. Pseudomonas fluorescens 0 G-T
CO1 6 307–460 154 43 XP002804370 Hyp. Macaca mulatta 3 G-T
CO2 3 14–107 96 42 XP003230408 Brevican core -like Anolis carolinensis 0 G-T
CO2 6 7–35 29 69 EGT47210 Hyp. Caenorhabditis brenneri 0 G-T
AT8 1 2–68 67 55 ACR09286 AT8 Homo sapiens 7 G-T
AT8 4 1–58 61 46 EGF97967 Hyp. Melampsora larici-populina 0 G-T
AT8 4 28–68 40 58 AEE71795 Lipoprotein Propionibacterium acnes 0 G-T
AT8 5 1–45 45 64 AAG44787 DC48 Homo sapiens 1 G-T
AT8 5 41–59 19 84 EEV38750 Glycosyl transferase I Enterococcus casseliflavus 0 G-T
AT8 6 20–51 32 59 BAL01249 Sodium/glutamate symporter Oscillibacter valericigenes 0 G-T
AT8 6 7–57 51 51 CAM64590 Hyp. Mycobacterium abscessus 0 G-T
AT8 6 14–59 47 55 EEY51301 Glycosyltransferase 2 Bacteroides sp. 2 1 33B 0 G-T
AT6 1 11–226 227 43 ADU77956 AT6 Homo sapiens 22 G-T
AT6 3 80–149 70 47 AF467769 Glycoprotein precursor Crimean-Congo hemorrhagic fever virus 3 G-T
AT6 5 115–170 58 52 EGV19704 Hyp. Thiocapsa marina 0 G-T
AT6 6 151–224 74 61 AAG44787 DC48 Homo sapiens 1 G-T
AT6 6 9–75 68 49 EED92930 Hyp. Thalassiosira pseudonana 0 G-T
AT6 6 115–172 59 53 CCB67068 Hyp. Hyphomicrobium sp. MC1 0 G-T
CO3 1 2–158 170 40 ADL31200 CO3 Homo sapiens 12 G-T
CO3 3 16–166 119 50 BAB93516 OK/SW-CL.16, Homo sapiens 6 G-T
CO3 5 111–169 59 49 EGU76154 Hyp. Fusarium oxysporum 1 G-T
CO3 5 103–141 39 64 EDP05187 Hyp. Chlamydomonas reinhardtii 0 G-T
ND3 4 31–95 65 43 EGW6025 Transcriptional regulator Dechlorosoma suillum 0 G-T
ND3 6 41–76 36 53 CAJ86300 H0124B04.17 Oryza sativa Indica 0 G-T
ND4l 3 39–75 37 62 EGU85298 Hyp. Fusarium oxysporum 0 G-T
ND4l 5 4–92 89 46 EEA93813 Alkyl hyperoxide reductase Pseudovibrio 1 G-T
ND4l 6 31–92 62 48 BQ76407 Diguanylate cyclase/phosphodiesterase with PAS/PAC sensor Pseudomonas putida 0 G-T
ND4 1 2–451 463 39 ADL31476 ND4 Homo sapiens 50 G-T
ND4 3 92–216 125 46 BAC5228 Hyp. Homo sapiens 6 G-T
ND4 3 137–265 121 44 AAG44628 DC24 Homo sapiens 3 G-T
ND4 5 246–368 138 42 EAT38723 Hyp. Aedes aegypti 0 G-T
ND5 1 2–581 614 43 ACU09622 ND5 Homo sapiens 50 G-T
ND5 6 188–255 71 46 ADH63862 O-Acetylhomoserine/O-acetylserine sulfhydrylase Meiothermus silvanus 1 G-T
ND6 6 112–172 66 45 CAB07382 Caenorhabditis elegans 0 G-T
CytB 6 199–276 78 46 Y86845 Serine esterase family Metarhizium acridum 0 G-T
CytB 6 229–293 67 31 ACA19730 Transcriptional regulator Methylobacterium 1 G-T
ND1 1 21–229 218 42 ACT75317 ND1 Phaeoceros laevis 4 C-T
ND1 1 229–305 80 46 EEB33262 Hyp. Desulfovibrio piger 1 C-T
ND2 6 51–93 45 56 ADQ43189 Oligopeptide transporter Eutrema parvulum 0 C-T
ND2 6 212–319 108 47 AFB2830 Hyp. Rickettsia rickettsii 6 C-T
CO1 1 11–395 398 32 ACT75318 CO1 Phaeoceros laevis 4 C-T
CO1 6 157–326 171 38 CAZ61577 CO1 Sciadicleithrum variabilum 8 C-T
CO1 6 316–504 190 41 AEI55877 CO1 Penicillium polonicum 13 C-T
CO2 1 19–124 109 39 ABG40996 Redoxin Pseudoalteromonas atlantica 1 C-T
CO2 1 66–149 90 48 EFH6873 Possible ribosomal prot. Clostridium difficile 1 C-T
CO3 1 57–250 194 39 ACS71775 CO3 Isoetes engelmannii 7 C-T
ND3 1 51–106 59 56 EFQ96882 Ankyrin repeat domain-containing Arthroderma gypseum 1 C-T
ND3 6 11–60 53 58 EFW38918 Efflux ABC transporter, permease Treponema phagedenis 0 C-T
ND4l 2 3–62 61 49 XP002121272 Hyp. Ciona intestinalis 0 C-T
ND4l 2 10–49 40 50 ADN36160 Glycosyl transferase Methanoplanus petrolearius 0 C-T
ND5 6 60–125 78 41 ABF33578 Oligohyaluronate lyase Streptococcus pyogenes 1 C-T
ND6 1 26–174 149 41 ADT82255 ND6 Hylobates muelleri 0 C-T
ND6 5 38–114 83 48 EDY73983 GA28377 Drosophila pseudoobscura 1 C-T
CytB 1 15–292 281 41 ACI01099 Apocytochrome b Isoetes engelmannii 2 C-T
ND1 1 1–317 317 58 CAA66304 ND1 Pongo pygmaeus 20 A-G
ND1 2 161–224 69 46 EEU48753 Hyp. Nectria haematococca 1 A-G
ND1 2 28–117 109 42 BAJ78673 RNA polymerase II largest subunit Bemisia tabaci 2 A-G
ND1 3 85–197 113 41 BAE91117 Macaca fascicularis 2 A-G
ND2 1 1–347 347 57 AEL64185 ND2 Homo sapiens 17 A-G
ND2 2 187–286 138 34 CAF93389 Tetraodon nigroviridis 3 A-G
ND2 3 45–208 182 37 ADZ45521 GREBP cGMP-response element-binding Homo sapiens 1 A-G
ND2 6 119–333 229 40 XP002611624 Hyp. Branchiostoma floridae 13 A-G
CO1 1 7–505 499 54 CAC37979 Co1 Macaca sylvanus 23 A-G
CO1 3 337–440 104 41 XP002801723 Hyp. Macaca sylvanus 2 A-G
CO1 4 83–193 111 47 EAL47953 DENN domain protein Entamoeba histolytica 1 A-G

CO2 1 6–227 222 55 ABB78341 CO2 Homo sapiens 15 A-G
CO2 2 32–143 116 37 ABL11420 Ammonia monooxygenase B uncultured crenarchaeote 3 A-G
CO2 3 7–73 72 39 EGI53545 FAD dependent oxidoreductase Sphingomonas 1 A-G
AT8 1 1–68 68 60 AEQ36342 AT8 Pan paniscus 3 A-G
AT8 2 10–54 51 41 CAA04176 DNA gyrase B subunit Myxococcus xanthus 1 A-G
AT8 3 11–57 48 46 ACB74250 N-acetylmuramyl-l-alanine amidase, negative regulator of AmpC, AmpD Opitutus terrae 1 A-G
AT8 3 1–42 42 50 XP003199307 KH domain-containing, RNA-binding, signal transduction-associated Danio rerio 0 A-G
AT8 4 2–67 66 42 XP003500291 Interferon-activable protein 203-like Cricetulus griseus 2 A-G
AT8 6 12–56 45 56 EEQ30817 Hyp. Arthroderma otae 2 A-G
AT6 1 4–225 222 62 ADG46521 AT6 Homo sapiens 5 A-G
AT6 2 7–108 102 39 XP535203 Nucleolar GTP-binding protein 1 Canis lupus familiaris 5 A-G
AT6 3 44–104 62 47 ABZ06095 Hyp. uncultured marine microorganism 0 A-G
AT6 3 146–223 83 42 ACX89269 Type VI secretion system Vgr Pectobacterium wasabiae 1 A-G
CO3 1 1–256 256 54 ABU64439 CO2 Homo sapiens 17 A-G
CO3 3 15–148 134 48 BAB93516 OK/SW-CL.16 Homo sapiens 0 A-G
CO3 5 11–86 77 47 AEQ53502 Potassium efflux system KefA protein/small-conductance mechanosensitive channel Pelagibacterium halotolerans 1 A-G
ND3 1 1–102 102 69 AEQ35815 ND3 Pan troglodytes 5 A-G
ND3 2 2–31 30 57 AAA33306 Regulatory protein Emericella nidulans 0 A-G
ND3 2 23–74 52 44 EGT79756 4-Alpha-glucanotransferase Haemophilus haemolyticus 0 A-G
ND3 3 67–103 37 51 GAA30488 Zinc finger protein Clonorchis sinensis 0 A-G
ND3 5 1–101 104 43 EAW14848 Histone acetylase complex Aspergillus clavatus 5 A-G
ND3 6 52–81 30 70 ACZ10040 Hyp. Sebaldella termitidis 0 A-G
ND4l 1 1–97 97 65 ADT82590 ND4l Nomascus siki 2 A-G
ND4l 2 30–67 39 59 EGC44959 Peptidyl-prolyl cis-trans isomerase Ajellomyces capsulatus 1 A-G
ND4l 2 2–98 100 39 EEE79199 Hyp. Populus trichocarpa 3 A-G
ND4l 3 1–25 58 52 EGZ30743 Hyp. Phytophthora sojae 0 A-G
ND4l 3 24–89 66 45 EET89248 Phosphoketolase Clostridium carboxidivorans 1 A-G
ND4l 4 12–97 87 43 EHA98470 Sodium/potassium/calcium exchanger 1 Heterocephalus glaber 2 A-G
ND4 1 1–459 459 58 ABU47843 ND4 Pan troglodytes 22 A-G
ND4 3 100–216 117 43 BAC85228 Homo sapiens 1 A-G
ND4 3 135–214 80 46 AAG44628 DC24 Homo sapiens 1 A-G
ND4 3 61–108 48 52 XP003388243 Mitochondrial import inner membrane translocase Tim22-like Amphimedon queenslandica 0 A-G
ND5 1 16–602 587 55 CAR95863 ND5 Homo sapiens 21 A-G
ND6 1 5–173 169 54 CAA77005 ND6 Papio hamadryas 15 A-G
ND6 6 82–129 51 61 EDP00960 Flagella associated membrane Chlamydomonas reinhardtii 0 A-G
CytB 1 1–372 372 57 ABU67123 CytB Homo sapiens 14 A-G
CytB 3 239–371 134 40 BAB12147 Hyp. Macaca fascicularis 1 A-G
ND1 2 21–319 320 40 EDL19272 Zonadhesin Mus musculus 7 A-T
CO1 1 23–228 210 44 AAY22220 CO1 Macaca nemestrina 5 A-T
CO2 1 68–129 62 48 ACK71749 GCN5-related N-acetyltransferase Cyanothece 2 A-T
CO2 2 67–131 65 55 EEE25729 Hyp. Toxoplasma gondii 0 A-T
CO2 3 3–76 75 51 EDQ71631 Hyp. Physcomitrella patens 0 A-T
AT8 1 14–69 59 51 EAY11801 Hyp. Trichomonas vaginalis 1 A-T
CO3 2 77–139 83 46 ADE84631 Arginine exporter Rhodobacter capsulatus 0 A-T
ND3 2 20–85 66 47 NP001186517 Adenosine monophosphate deaminase Gallus gallus 0 A-T
ND3 2 4–73 72 46 EFQ33388 Hyp. Glomerella graminicola 0 A-T
ND3 5 42–78 40 60 EDS34822 Hyp. Culex quinquefasciatus 0 A-T
ND4l 5 22–44 24 54 EFM59912 Efflux transporter Brucella 1 A-T
ND4 3 53–201 150 42 XP001503289 Synaptopodin-2 Equus caballus 5 A-T
CytB 1 79–151 91 49 EAL66426 Alpha adducin Dictyostelium discoideum 3 A-T
CytB 1 8–141 136 38 XP002733786 Hyp. Saccoglossus kowalevskii 2 A-T
ND1 3 191–266 76 55 ACU14018 Hyp. Glycine max 1 C-G
ND1 5 203–266 64 58 EFH42847 Photosystem I reaction center subunit psi-N Arabidopsis lyrata 1 C-G
ND1 5 49–126 78 42 CAN60489 Hyp. Vitis vinifera 1 C-G
CO1 1 237–395 159 47 AAX37529 CO1 Dolichopoda euxina 12 C-G
AT8 5 18–67 50 50 CCD46646 Hyp. Botryotinia fuckeliana 1 C-G
AT6 5 120–171 52 54 ZP00052904 Glutamate decarboxylase Magnetospirillum magnetotacticum 0 C-G
CO3 5 136–211 76 41 ZP04750338 Short chain dehydrogenase Mycobacterium kansasii 1 C-G
CO3 5 146–228 84 40 ABL93860 Short-chain dehydrogenase/reductase SDR Mycobacterium 1 C-G
ND3 6 13–71 62 55 EGC31827 Hyp. Dictyostelium purpureum 0 C-G
ND3 6 2–46 48 58 EHB94143 NLP/P60 Pseudoxanthomonas spadix 0 C-G
ND4l 1 3–97 95 58 ADZ37133 ND4l Rhinopithecus avunculus 6 C-G
ND4l 4 39–74 37 59 EED89742 Hyp. Thalassiosira pseudonana 1 C-G
ND4 3 103–205 103 56 BAC85228 Homo sapiens 0 C-G
ND4 6 81–159 79 43 BAD92431 UDP-Gal:betaGlcNAc beta 1,4-galactosyltransferase 6 variant Homo sapiens 0 C-G
ND5 1 101–526 426 48 ABB97838 ND5 Homo sapiens 34 C-G
ND6 1 1–174 174 55 ADO19968 ND6 Homo sapiens 2 C-G
ND6 3 9–105 101 46 EFX75312 Hyp. Daphnia pulex 3 C-G
CO1 1 15–477 463 45 ACY39526 CO1 Tetropium fuscum 0 A-C
CO2 1 100–227 128 39 ACM71926 CO2 Homo sapiens 0 A-C
CO2 1 2–58 57 51 CCA59686 Hyp. Streptomyces venezuelae 0 A-C
CO2 1 143–175 35 66 ZP10139372 Polyketide synthase Fluoribacter dumoffii 0 A-C
CO2 3 28–64 37 65 EGT86488 MmpL4 7 Mycobacterium colombiense 1 A-C

AT8 1 29–58 32 63 EAU75829 AGAP012039-PA Anopheles gambiae 0 A-C
AT8 3 8–58 51 61 EAA32833 Nitrate reductase Neurospora crassa 1 A-C
CO3 1 10–260 253 43 ABU64439 CO3 Homo sapiens 9 A-C
ND3 3 66–101 37 59 CCB89733 Hyp. Simkania negevensis 0 A-C
ND4l 3 33–95 63 56 CAG82479 YALI0C22946p Yarrowia lipolytica 1 A-C
ND5 1 19–468 455 39 ADH8404 ND5 Sinogastromyzon sichangensis 16 A-C
ND6 1 1–170 170 61 AFF86056 ND6 Homo sapiens 1 A-C
ND6 2 91–172 83 46 EDX74154 Hyp. Microcoleus chthonoplastes 1 A-C
ND6 2 41–128 88 45 CAG99131 KLLA0E02135p Kluyveromyces lactis 0 A-C
CytB 1 30–351 324 43 CAP58774 CytB Cobitis sinensis 9 A-C
ND1 4 165–256 110 44 XP002807608 Adenylate cyclase type 3 Callithrix jacchus 3 ACGT
ND1 5 54–99 49 61 CBJ30960 Ectocarpus siliculosus 0
CO2 6 90–139 51 59 ZP08847098 Pectate lyase Anaerophaga thermohalophila 0
AT8 2 18–69 54 56 BAD93095 TEA domain family member 1 Homo sapiens 1
25–69 51 51 XP003582797 Leucine-rich repeat transmembrane Bos taurus 1
AT8 3 20–56 37 57 AAC49419 Repellent protein Ustilago maydis 2
AT8 4 19–49 31 55 AAO43936 Ca2+ homeostasis Arabidopsis thaliana 1
AT8 6 15–54 40 58 ACB76837 Multi-sensor hybrid histidine kinase Opitutus terrae 0
ND3 4 23–70 48 56 ZP06460666 Major facilitator transporter Pseudomonas syringae 0
ND4l 5 5–46 42 64 CBK22878 ABC transporter type 1 Blastocystis hominis 1
ND4 5 104–186 85 52 EAQ38580 Hyp. Dokdonia donghaensis 1
ND6 1 88–120 33 48 AES66793 Cysteine-rich receptor-like kinase Medicago truncatula 0
CytB 5 139–199 61 61 CAI76264 Proton translocating inorganic pyrophosphatase Theileria annulata 1
CytB 6 260–319 60 47 EGI65165 Purity of essence Acromyrmex echinatior 0
ND1 3 86–149 74 41 ACR37944 Zea mays 1 AGCT
ND2 2 264–344 88 51 EAZ06402 Hyp. Oryza sativa 0
CO1 1 173–218 46 57 GAB46836 Sodium/sulphate symporter Gordonia terrae 0
CO1 6 275–356 82 45 CBY23304 Oikopleura dioica 1
CO2 2 41–99 61 54 ADE11632 Diguanylate cyclase/phosphodiesterase with PAS/PAC sensor(s) Sideroxydans lithotrophicus 0
CO2 3 14–65 58 43 EHL87055 Aspartate-ammonia ligase Tannerella 0
AT8 1 4–48 46 57 EHI12376 Regulator Mycobacterium thermoresistibile 0
AT8 4 1–47 50 42 EAQ88006 Hyp. Chaetomium globosum 1
AT8 5 9–42 34 56 EFK66562 ABC transporter ATP-binding Streptomyces 1
AT8 6 39–68 32 66 EEN61262 Hyp. Branchiostoma floridae 0
AT6 2 117–151 35 60 AAH73232 MGC80562 Xenopus laevis 0
AT6 3 95–178 103 38 EGO64651 DEAD 2 domain protein Acetonema longum 1
17–71 56 48 CCC14912 Sordaria macrospora 0
AT6 6 60–119 115 45 EEW37665 HMP/thiamine-binding Granulicatella adiacens 3
CO3 6 111–161 56 45 XP002826409 Cardiotrophin-2-like Pongo abelii 1
ND3 6 4–42 39 51 ACAZ21358 Oxidoreductase Sanguibacter keddieii 0
36–63 28 64 CCG92029 Phosphoenolpyruvate carboxylase Methylacidiphilum fumariolicum 0
ND4l 3 18–47 32 72 EGC17145 Transcriptional regulator Thiocapsa marina 0
47–79 33 55 EFV05470 Beta-glucosidase Prevotella salivae 1
ND4l 5 15–93 83 49 EEC06142 Dihydrodipicolinate synthase Ixodes scapularis 3
ND6 5 41–151 110 42 ZP09248514 Ammonium transporter Acaryochloris 0
ND6 6 29–120 97 48 ACQ69531 7TM receptor with intracellular metal dependent phosphohydrolase Exiguobacterium 1
81–131 51 51 AEA47822 Phosphoesterase RecJ domain Archaeoglobus veneficus 0
53–133 81 48 AFI46966 Drug resistance transporter Pasteurella multocida 0

coding, a redoxin for C↔U exchange coding, a FAD dependent oxidoreductase for A↔G exchange coding, and a short chain dehydrogenase for C↔G exchange coding. A detailed discussion of each case would not be constructive at this preliminary stage of exploration of nucleotide exchange coding. However, the distribution of functions does not seem random, especially in relation to functions typically associated with mitochondrial metabolism, including for nucleotide exchanges for which no or few transcripts were found in Table 2. Hence protein alignment data suggest that one cannot exclude the occurrence of any type of nucleotide exchange, though some seem more frequent than others.

All regular mitochondrial main frame-encoded proteins are membrane proteins, and these are also frequent among the alignment data in Table 3 (25 cases). These include numerous transporters and symporters, for example the mitochondrial import inner membrane translocase Tim22-like for the A↔G nucleotide exchange. Here again, the data at hand suggest protein functions that seem non-random in relation to known mitochondrial functions in the cell’s metabolism. Note that for the A↔C+G↔U exchange, a type of nucleotide exchange for which no transcript was detected, alignments with membrane proteins were most numerous (7), while no alignments with proteins interacting with DNA or RNA were found, and only few (2) with physiological functions associated with mitochondrial metabolism. This would suggest that this type of RNA recoding by nucleotide exchange specifically produces membrane-bound proteins, under the apparently rare conditions that induce that type of transcriptional nucleotide exchange. This supports the view that at this preliminary stage, no type of nucleotide exchange should be excluded, even if RNA alignment data in Table 2 are non-existent for that type of nucleotide exchange. It is indeed plausible that each type of nucleotide exchange is induced by specific, perhaps stress- and/or ontogeny-associated conditions, and some might be rarer than others. Further bioinformatics analyses of the putative nucleotide exchange overlap coding sequences yield clues in this respect.

3.1.7. Origins of proteins in Table 3

The distribution of the proteins in Table 3 among broad systematic groups is informative to some extent. Table 3 includes only one alignment with a protein of viral origin and one with a protein of archaeal origin (0.6% each). Most common were bacterial origins (41.5%), followed by metazoan (25.7%), fungal (17.5%) and ‘vegetal’ (from

Viridiplantae) origins (11.11%). Fig. 3 compares these with the relative representation of proteins from these respective origins in GenBank’s database. This clearly shows that eukaryotic origins (organisms that contain mitochondria) are overrepresented in Table 3, especially metazoan origins. This general ranking and pattern varied little between different types of nucleotide exchange codings, though for A↔G and A↔U exchanges, metazoan origins were more common than bacterial origins, and for G↔U, proteins from Viridiplantae were more frequent than from Metazoa. However, these variations might be stochastic and are small in relation to the overall pattern found when pooling all data from Table 3 and presented in Fig. 3.

Fig. 3. Relative proportion of proteins from broad systematic evolutionary classes of organisms and viruses in Table 3 as a function of their proportion in GenBank’s protein database. Proteins from eukaryotic origins (especially metazoan) are overrepresented in Table 3. The line indicates x = y.

Considering that the sequences analyzed are of metazoan origin (Homo sapiens), this suggests that overlap coding through nucleotide exchange transcription is usually not due to horizontal transfers (including viruses), but probably evolves gradually within phylogenetic groups, and that occasionally the genome will include a gene coding for that protein in the regular (non-exchange) form. This might result from occasional reverse transcription of nucleotide exchange transcripts and their integration in the nuclear genome of organisms that possess mitochondria. This phenomenon would be compatible with the positive correlation between detected protein alignment lengths and exchange RNA transcripts in Fig. 2. If this is the mechanism underlying the integration of a gene coding without nucleotide exchange for a protein usually coded by mitochondrial nucleotide exchange transcription, the fact that viruses are underrepresented in Table 3 suggests that proteins coded by nucleotide exchange are probably proteins with some adaptive physiological function. This agrees with the fact that Table 3 includes numerous proteins that seem adequate for mitochondrial metabolism.

Fig. 4. Mean number of stops per putative protein coding region as detected by Blastp for nucleotide exchange recoded RNAs of human mitochondrial protein coding genes (from Table 3) as a function of the number of exchanging transcripts according to that exchange rule (from Table 2). Relatively common nucleotide exchange transcripts tend to include more stops, indicating that protein expression is limited by the fact that transcription frequency is counterbalanced by the presence of stop codons necessitating translational activity by suppressor tRNAs.

Along that rationale, the organism indicated in Table 2 in which the protein is coded directly by DNA, without nucleotide exchange, and whose protein aligns with the human mitochondrial sequence translated after nucleotide exchange, could be an organism where the physiological function of the protein coded by nucleotide exchange became more frequently required, justifying the inclusion of a gene that explicitly (= without nucleotide exchange) codes for that protein. The methods used here would not detect any nucleotide exchange-encoded protein coding gene if this did not occur occasionally. It is probable that not all actual nucleotide exchange encoded genes have been detected by this method, because direct integration of nucleotide exchange coding contents into the genome might not have yet occurred for all nucleotide exchange-encrypted genes, and because GenBank may not include sequences of organisms where this has occurred. Hence it is very likely that Table 3 underestimates numbers of nucleotide exchange encoded genes.

The difference between pro- and eukaryotic origins could have an alternative explanation, namely that exchange transcription and coding is rarer in prokaryotes. Though this possibility exists, it is not very likely, especially as the genome analyzed here is mitochondrial, which probably reflects its prokaryotic ancestor. The evolutionary scenario for overlapping genes is a more probable explanation for the overrepresentation of proteins from eukaryotic origins in Table 3.

3.1.8. Stop codons in putative overlap coding genes

Table 3 indicates stop codon numbers within putative overlap coding gene sequences. Considering the density of stops within these sequences across genes, but for each type of nucleotide exchange, stops are much less frequent within putative overlap coding regions than within the rest of the genes after nucleotide exchange. This ranges from 2.26 times less frequent for putative proteins coded by C↔U exchange transcription, to 8.71 times less frequent for those coded by A↔C exchange transcription. The latter (A↔C) is the nucleotide exchange type represented by the least, the former (C↔U) by the most frequent RNA transcript data in Table 2. Hence it seems that stops within putative overlap coding sequences according to exchange transcription modulate the expression of overlapping genes associated with each exchange transcription type, by constraining this expression to conditions where suppressor tRNA activity occurs. Indeed, Fig. 4 shows that stop codon numbers per overlapping gene (from Table 3) increase with numbers of exchange transcripts observed for that type of exchange transcription (from Table 2): r = 0.747, P = 0.0104; rs = 0.854, P = 0.0078, one tailed tests. This suggests that the expression of proteins coded by nucleotide exchanges is finely tuned by two regulatory mechanisms, one positive, the frequency of specific nucleotide exchange transcription and the associated frequency of transcripts it produces (from Fig. 2), and one negative, the frequency of stops making the expression of the proteins dependent on the joint activities of nucleotide exchange transcription and suppressor tRNAs, both probably rare events.
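
The comparison of stop densities inside and outside a candidate region can be illustrated by a minimal sketch. It assumes the RNA has already been recoded by the nucleotide exchange, that the region is given in codon units, and that stops are those of the vertebrate mitochondrial genetic code (UAA, UAG, AGA, AGG). The function names are hypothetical, and this is not the procedure actually used for Table 3.

```python
# Stop codons of the vertebrate mitochondrial genetic code, RNA alphabet.
MITO_STOPS = {"UAA", "UAG", "AGA", "AGG"}


def codons(rna):
    """Split an in-frame RNA string into complete codons."""
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


def stop_density(codon_list):
    """Fraction of codons that are stops (0.0 for an empty list)."""
    if not codon_list:
        return 0.0
    return sum(c in MITO_STOPS for c in codon_list) / len(codon_list)


def stop_depletion(rna, first_codon, last_codon):
    """How many times less frequent stops are inside the candidate region
    (codons first_codon..last_codon-1) than in the rest of the gene."""
    all_codons = codons(rna)
    inside = all_codons[first_codon:last_codon]
    outside = all_codons[:first_codon] + all_codons[last_codon:]
    d_in, d_out = stop_density(inside), stop_density(outside)
    return d_out / d_in if d_in > 0 else float("inf")


if __name__ == "__main__":
    # Toy sequence: codons UAA GCU GCU GCU | UAA UAG
    print(stop_depletion("UAAGCUGCUGCUUAAUAG", 0, 4))
```

A ratio above 1 means stops are rarer inside the candidate region than in the rest of the gene, which is the direction reported in the text.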

The close match between transcript frequencies and mean numbers of stops per putative protein coding gene (Fig. 4) suggests that the system is finely tuned so that the two regulatory forces, one positive, one negative, balance each other: translation of transcripts produced by rare types of nucleotide exchange transcription is relatively unhindered by stop signals, while relatively frequent exchange transcription is constrained by numerous stops. This suggests that expression levels of proteins associated with the various types of nucleotide exchanges are regulated so as to be relatively equal, despite differences in frequencies at which the different types of nucleotide exchange transcriptions apparently occur. This highly structured pattern between a positive and a negative regulatory mechanism is a strong indication that analyses reveal real biological coding phenomena with important, even if probably rare, physiological functions. Hence it seems that the expression of genes associated with frequent types of exchange transcription is subject to a further condition, that of suppressor tRNA activity. The pattern in Fig. 4 is so strong that it suggests an adaptive component to this, where suppressor tRNA activity downregulates these types of nucleotide exchanges, especially if transcripts are frequent. This result is a further indication that such transcription and expression is a physiological reality in very specific and unknown conditions.

3.1.9. Deamination along replicational gradients in genomic single strandedness and nucleotide exchange overlapping genes in human mitochondrial protein coding genes

Single stranded DNA is very mutable, as compared to duplex DNA. This situation occurs when DNA is replicated, and when RNA is transcribed. Mitochondrial DNA replication is unidirectional, involving a heavy strand and a light strand replication origin (OH and OL). Distances of sites in relation to each OH and OL determine the duration sites remain single stranded during replication (see for example Krishnan et al., 2004a,b; Seligmann et al., 2006; Seligmann, 2008, 2011b).

In the single stranded state, hydrolytic deaminations A→G and C→T are most frequent (note that in this context, A→G and C→T are spontaneous mutations occurring at the DNA level; these are not systematic nucleotide exchanges during RNA transcription). Replicational single strandedness creates gradients in mitochondrial genome nucleotide contents that reflect these spontaneous mutations (partial review in Seligmann, 2012a). Their effect on nucleotide contents is counterbalanced by functional constraints when the nucleotide has crucial coding functions at the protein level, and hence nucleotide contents at second codon positions barely reflect the mutational single stranded gradients. However, these are detectable at third codon positions; the situation is intermediate for first codon positions (Seligmann et al., 2006).

Fig. 5. A/(A+G) nucleotide ratios at third codon positions (light strand DNA) in human mitochondrial protein coding genes as a function of the time spent single stranded during replication by that gene. The base ratios reflect the C→T deamination that occurs on heavy strand DNA during replication, until the complementary lagging (light) strand is polymerized. Filled datapoints are for predicted overlap coding regions after A↔C nucleotide exchange transcription, as presented in Table 2. Hollow datapoints are for the rest of the gene, not predicted involved in overlap coding after A↔C exchange transcription. Both datasets fit well the same predicted deamination gradient, which suggests the putative overlap coding genes are not functional. Functionality would imply that overlap coding regions cannot mutate according to the replicational gradients, in order to preserve coding properties, and should hence not fit the gradient observed for other genome regions. This lack of difference is compatible with the fact that no RNA transcripts fitting human mitochondrial genes have been found in GenBank for nucleotide exchange rule A↔C (Table 2), and that A↔C exchange transcription is predicted to include few overlap coding genes according to Table 3.

Analysing nucleotide contents at third codon positions of the regular human main frame protein coding genes in relation to overlapping genes predicted by Blastp analyses assuming suppressor tRNA activity confirmed that these regions are involved in overlap coding, as they fit deamination gradients less well than the adjacent regions that are not involved in overlap protein coding (Seligmann, 2012a). This method also confirmed overlapping genes coded by codons of four nucleotides (tetragenes coded by tetracodons, Seligmann, 2001 [Q5]) and protein coding genes coded in the 3′-to-5′ direction of mitochondrial sequences (Seligmann, 2012). This method is used here to confirm the existence of the putative overlapping genes associated with exchange transcription and predicted by Blastp (Table 3), for each of the nine symmetric nucleotide exchange types. Only analyses for two nucleotide exchange types are presented here, though such detailed analyses were done for each of the nine types of nucleotide exchange.
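
For readers who want to reproduce the recoding step, the nine symmetric exchange rules named in the text can be applied to an RNA string as in the sketch below. The rule strings use the ASCII form `<->` of ↔, and the names `EXCHANGE_RULES` and `exchange` are hypothetical helpers introduced for illustration only.

```python
# The nine symmetric nucleotide exchanges considered in the text:
# six single-pair swaps and three double-pair swaps.
EXCHANGE_RULES = [
    "A<->C", "A<->G", "A<->U", "C<->G", "C<->U", "G<->U",
    "A<->C+G<->U", "A<->G+C<->U", "A<->U+C<->G",
]


def exchange_table(rule):
    """Build a translation table that swaps each nucleotide pair in the rule."""
    mapping = {}
    for pair in rule.split("+"):
        x, y = pair.split("<->")
        mapping[x], mapping[y] = y, x
    return str.maketrans(mapping)


def exchange(rna, rule):
    """Return the RNA sequence after systematic symmetric exchange."""
    return rna.translate(exchange_table(rule))


if __name__ == "__main__":
    for rule in EXCHANGE_RULES:
        print(rule, exchange("AUGCUUAG", rule))
```

Because each rule only swaps nucleotides pairwise, applying the same rule twice returns the original sequence.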

The test of the replicational deamination gradient expects that if a region functions as an overlapping gene, it does not fit the deamination gradient well. However, if candidate overlapping genes are not expected to be frequently expressed and hence are not expected to be functional, these sequences should fit well within the replicational deamination gradient observed for other regions, not expected to function as overlapping genes, and involved only in regular main frame coding. The overlapping genes coded by A↔C nucleotide exchange transcription are expected to be the least functional ones according to both criteria available at this point: no RNA transcripts were detected fitting this type of exchange transcription, and Blastp analyses of hypothetical protein sequences translated from these exchange transcripts yield the lowest number of alignments with proteins existing in GenBank. No RNA transcripts fitting predictions from nucleotide exchange transcription types along the rules A↔C+G↔U and A↔G+C↔U were detected, but RNAs transcribed along these exchange transcription rules apparently code for numerous proteins, according to Table 3. Hence their situation is much less clear: they might be more functional than indicated by transcript numbers in Table 2.

Sticky Note (HE): the reference should be "Seligmann, 2012e"

Sticky Note (HE): this reference should be "Seligmann, 2012d"

Fig. 6. A/(A+G) nucleotide ratios at third codon positions (light strand DNA) in human mitochondrial protein coding genes as a function of the time spent single stranded during replication by that gene for overlapping genes coded by G↔U exchange transcription. Filled datapoints are for predicted overlap coding regions after G↔U nucleotide exchange transcription, as presented in Table 2. Hollow datapoints are for the rest of the gene, not predicted involved in overlap coding after G↔U exchange transcription. The latter fit well the predicted deamination gradient, but the former much less, as expected if overlap coding genes were functional. This fits with the fact that Tables 2 and 3 include numerous G↔U exchange transcripts and predicted overlapping genes, respectively. This pattern contrasts with the one observed in Fig. 5; together they confirm that the test’s results reflect overlapping gene functionalities.

Fig. 5 plots the nucleotide contents A/(A+G) ratio at the third codon position (according to regular main frame codons of the regular protein coding gene) for human mitochondrial protein coding genes as a function of the duration spent single stranded during replication by that gene, separately for putative overlapping genes (filled symbols) and for the rest of the genes (open symbols) for A↔C nucleotide exchange transcripts. Note that the nucleotide ratios are for the regular human mitochondrial DNA contents of the light strand, which encodes most of the regular human mitochondrial protein coding genes, and not after A↔C exchange.
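
The base ratio plotted on the y axis can be computed as in the following sketch, which assumes a light strand DNA string that starts in the regular main frame. The function name and its parameters are hypothetical; the same helper gives the C/(C+T) ratio when called with "C" and "T".

```python
def third_position_ratio(dna, numerator="A", partner="G"):
    """Ratio numerator/(numerator+partner) over third codon positions.

    The sequence is assumed to start at the first position of a codon,
    so third positions are indices 2, 5, 8, ...
    Returns None when neither nucleotide occurs at third positions.
    """
    thirds = dna[2::3]
    n = thirds.count(numerator)
    total = n + thirds.count(partner)
    return n / total if total else None


if __name__ == "__main__":
    # Toy region: codons GCA GCG GCA GCT, third positions A, G, A, T
    print(third_position_ratio("GCAGCGGCAGCT"))            # A/(A+G)
    print(third_position_ratio("GCAGCGGCAGCT", "C", "T"))  # C/(C+T)
```

In the analyses described here, one such ratio would be computed per gene for the predicted overlap coding region and another for the rest of that gene, each paired with the time the gene spends single stranded.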

The replicational gradient in light strand A/(A+G) reflects the increase in T after C→T (deamination) mutations in single stranded heavy strand DNA (Krishnan et al., 2004a,b). The correlation for putative overlapping genes is r = 0.50. This Pearson correlation coefficient is stronger than that for the rest of the genome (r = 0.43). This is the opposite of what would be expected for overlap coding genes, for which the data should not fit a gradient. In addition, if one excludes from analyses the outlier datapoint for the non-overlap coding region of the gene AT8, which is based on very few codons because most of that gene is predicted to be involved in overlap coding, the regression lines for both putative overlap and non-overlap coding regions become almost identical (dashed line in Fig. 5). This means that the deamination gradient test does not confirm the predicted overlap coding status for A↔C exchange transcripts. These regions behave exactly along the predictions of the deamination gradients, as do the other regions of the same genes. In other terms, deamination mutations occur according to the same rules in these regions as in regions not expected to be involved in overlap coding after A↔C exchanges.

At the other extreme, G↔U exchange transcribed genes are predicted to include the largest number of overlapping protein coding genes, and several RNA transcripts were detected in GenBank matching G↔U exchange transcription of the human mitochondrial genome. Fig. 6 presents the replicational deamination gradient analysis of the same human mitochondrial protein coding genes as in Fig. 5, but separating putative overlapping genes from the rest of the gene for overlap coding predicted for G↔U (and not A↔C) transcribed genes. The gradient is clear for regions not predicted involved in overlap coding (r = 0.55, one tailed P = 0.0398), but weaker for predicted overlap coding regions (r = 0.43, one tailed P = 0.0548). This situation fits what is predicted if overlap coding genes are functional.

Similar analyses for the other nucleotide exchange types yield qualitatively similar results (deamination gradient weaker for predicted overlap coding regions than for other regions) in all the remaining six types of symmetric nucleotide exchanges. Hence qualitatively, only for A↔C nucleotide exchanges, gradient analyses do not fit predictions that overlapping genes are functional (analysis presented in Fig. 5). This functions as a kind of negative positive control (negative because overlapping and other regions behave similarly, positive because the null hypothesis expects the detection of a deamination gradient). According to a one tailed sign test, the probability of getting by chance the qualitative result expected if predicted overlap coding genes are functional (that the gradient should be weaker for predicted overlapping regions than other regions) 8 among 9 times is P = 0.0098. Hence the replication gradient analyses apparently confirm that overlap coding according to nucleotide exchange transcription predicted by Blastp analyses (Table 3) is generally not an artifact.
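
As a point of reference for the sign test, the sketch below computes the exact upper-tail binomial probability of observing at least a given number of successes among n comparisons when each outcome has probability 0.5. The function name is hypothetical. Under this standard definition, 8 among 9 gives 10/512 ≈ 0.0195; the value quoted in the text (0.0098) is half of that, so the text apparently follows a different convention, which the sketch does not attempt to reproduce.

```python
from math import comb


def sign_test_upper_tail(successes, n):
    """Exact one-sided sign test: P(X >= successes) for X ~ Binomial(n, 0.5)."""
    favourable = sum(comb(n, k) for k in range(successes, n + 1))
    return favourable / 2 ** n


if __name__ == "__main__":
    # 8 among 9 comparisons in the expected direction
    print(sign_test_upper_tail(8, 9))
```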

The only type of nucleotide exchange transcription for which the qualitative result of comparisons between deamination gradients observed for predicted overlap coding regions and other regions does not confirm the functionality of the overlapping genes is the nucleotide exchange type that according to other analyses (in Tables 1 and 2) is the least likely to occur. This also strengthens the working hypothesis, as well as the adequacy of deamination gradient analyses as a test for functionality of predicted mitochondrion-encoded overlapping genes.

One obtains qualitatively similar results when performing these gradient analyses for C/(C+T) nucleotide contents at third codon positions. In that case, the replicational gradient is stronger in regular regions than in predicted overlap coding regions for seven among nine types of nucleotide exchanges, which is also a statistically significant majority of cases according to a one sided sign test (P = 0.0449). While weaker, this result nevertheless confirms overlap coding status for most predicted candidate overlap coding regions and most types of nucleotide exchanges.

3.2. Circular code analyses confirm overlap coding

The possibility that nucleotide exchange transcription increases the coding potential of genes could be a major discovery, but at this point, evidence on proteins translated from the predicted overlapping genes is still totally missing. For that reason, an additional computational test is used to strengthen the status of the predicted overlap coding genes presented in Table 3. This test uses a theoretical background that totally differs from the deamination gradient analyses presented in the previous section, is based on different information and sequence properties, and hence is totally independent of the deamination gradient test.

Sticky Note (HE): this should be "Tables 2 and 3"

Empirical observations have shown that some codons are overrepresented in overlapping genes, as compared to regular genes, while other codons are underrepresented (Ahmed et al., 2007, 2010; Ahmed and Michel, 2011). The overrepresented codons are homopolymer codons, hence AAA, CCC, GGG and UUU. The underrepresented ones form a circular code (Arqués and Michel, 1996, 1997 [Q6]; Michel, 2008; Ahmed and Michel, 2011; Gonzalez et al., 2011). The reasons for that remain unclear, but from an empirical point of view, this makes it possible to test whether codon usages in predicted overlap coding genes are indeed optimized along the lines of avoiding circular code codons and preferred usage of homogenous codons. They might have to do with ribosomal frame-maintenance, as parts of ribosomal RNA involved in interacting with the mRNA also form circular codes (Michel, 2012).

Independently of the interesting theoretical underpinnings to the links between circular codes and overlap coding, circular codes can be used to test whether codon usages in predicted overlap coding regions fit the circular code. In that context, I analyzed each of the three frames of each human protein coding gene, in relation to each set of 20 ‘circular code codons’, each set associated with one frame, separating predicted overlap coding regions (according to Table 3) from other regions, for each of the nine nucleotide exchange types. Homogenous codons were scored −1, and codons belonging to the circular code for that frame were scored 1. These scores were averaged and compared between putative overlap coding regions and other regions, expecting lower scores for overlapping genes (as predicted by analyses presented in Table 2) than for other regions if the predicted codons are over- and underrepresented within the predicted overlapping gene as compared to regular coding regions.
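
The scoring rule just described can be sketched as follows. Two points are assumptions rather than statements from the text: codons that are neither homogenous nor in the circular code are scored 0 and included in the mean, and the circular code is passed in as a parameter. The two-codon set used in the example is a hypothetical toy set, not one of the actual 20-codon circular codes.

```python
HOMOGENOUS_CODONS = {"AAA", "CCC", "GGG", "UUU"}


def circular_code_score(codon_list, circular_code):
    """Mean score: -1 for homogenous codons, +1 for circular code codons,
    0 for all other codons (assumption)."""
    if not codon_list:
        return 0.0
    total = 0
    for codon in codon_list:
        if codon in HOMOGENOUS_CODONS:
            total -= 1
        elif codon in circular_code:
            total += 1
    return total / len(codon_list)


def score_difference(overlap_codons, other_codons, circular_code):
    """Score of the predicted overlap coding region minus the score of the
    rest of the gene; lower values are expected for overlapping genes."""
    return (circular_code_score(overlap_codons, circular_code)
            - circular_code_score(other_codons, circular_code))


if __name__ == "__main__":
    toy_code = {"GAA", "GUC"}  # hypothetical toy set, for illustration only
    overlap = ["AAA", "GCU", "UUU", "GAA"]
    rest = ["GAA", "GUC", "GCU", "AAA"]
    print(score_difference(overlap, rest, toy_code))
```

In the analyses of the text, one such comparison would be made per predicted overlap coding region and per frame, using the circular code set associated with that frame.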

Mean scores for the two types of regions could be compared by t-tests, but here I restrict analyses to the statistically robust nonparametric sign test. Comparing mean scores obtained for each putative overlap coding region with the mean score of the rest of that gene, I tested whether the number of times that the score was lower in the overlap coding region is significantly more frequent than the 50% expected if no pattern exists in the data. This yields three nucleotide exchange types with P < 0.05 according to one tailed sign tests: A↔U exchange (14 positive results among 21, P = 0.0473); A↔G+C↔U (27 among 39 positive comparisons between regular and predicted overlap coding regions, P = 0.0059); and A↔U+C↔G (37 positive among 45 comparisons, P = 0.0000014). Note that several transcripts for two of these types of nucleotide exchanges have been detected (A↔U, in Table 2; and A↔U+C↔G, Seligmann, 2012d), and though Table 2 does not include any transcript fitting A↔G+C↔U exchange transcription, this type of nucleotide exchange has the most numerous overlap coding genes according to analyses in Table 3.

One can test the working hypothesis by combining the one tailed P values obtained from sign tests for all nine types of nucleotide exchanges. Fisher’s method for combining P values sums −2 × ln Pi, where Pi is the P value obtained for the ith test, and i ranges from 1 to k. This sum is a chi-square statistic with 2 × k degrees of freedom, in the present case 43.82, which with 18 degrees of freedom has P = 0.00061. Hence the null hypothesis for the combined data is rejected: predicted overlap coding genes tend to avoid circular code codons and prefer homogenous codons, as compared to regular coding regions, when considering all types of nucleotide exchanges altogether. This confirms their coding status according to the circular code approach.
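
Fisher’s method as described above can be checked with a short sketch. It relies on the closed form of the chi-square upper tail for an even number of degrees of freedom, so only the standard library is needed; the function names are hypothetical. Since the nine individual P values are not all listed in the text, the example only verifies the reported tail probability for a statistic of 43.82 with 18 degrees of freedom.

```python
from math import exp, log


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # j = 0 term
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return exp(-half) * total


def fisher_combined(p_values):
    """Return (statistic, degrees of freedom, combined P) for k P values."""
    statistic = -2.0 * sum(log(p) for p in p_values)
    df = 2 * len(p_values)
    return statistic, df, chi2_sf_even_df(statistic, df)


if __name__ == "__main__":
    # Tail probability for the statistic reported in the text
    print(chi2_sf_even_df(43.82, 18))
```

The printed value is close to 0.0006, in agreement with the P = 0.00061 reported above.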

3.3. Convergence between functionality predictions of overlap coding genes by deamination gradient and circular code tests

Fig. 7. Circular code overlapping gene score versus absolute residual of third codon position A/(A+G) from deamination gradient. The y axis is the subtraction of the circular code score calculated for gene regions coding only in the regular main frame, from the score obtained for regions predicted involved in overlap coding according to G↔U exchange transcription (from Table 3). Presumably, the lower this score is, the greater the functionality of the predicted overlap coding gene. The x axis is the absolute residual of the A/(A+G) base ratio for the same overlap coding regions for G↔U exchange transcription from the replicational deamination gradient presented in Fig. 6. Functionality of overlap coding genes is assumed proportional to this absolute residual. Hence the negative association in Fig. 7 suggests that functionality estimates for the same putative overlapping genes, but from different methods, tend to converge.

Examination of Figs. 5 and 6 shows that nucleotide ratios at third codon positions for some predicted overlap coding genes fit replicational deamination gradients better than for other predicted overlap coding genes. The extent by which the datapoint digresses from the deamination gradient might be proportional to gene functionality. In this respect, predicted overlap coding regions match the gradient approximately as well as other regions for A↔C exchange transcription, while digressions were much greater for G↔U exchange transcription, which seems to match the greater abundance of G↔U RNA transcripts in Table 2. By extension, this rationale could apply to different overlapping genes from the same type of nucleotide exchange. Possibly, those with greater absolute digression from the gradient are more functional than those matching the gradient more closely. Hence functionality might be proportional to the absolute value of the residual of the A/(A+G) ratio at third codon position for a putative overlap coding gene from the deamination gradient observed for regular regions.
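
The absolute residual used here as a functionality estimate can be sketched as follows, assuming the gradient is an ordinary least-squares line fitted to the regular regions (x: time spent single stranded, y: A/(A+G) ratio) and that the overlap coding regions are then compared to that line. Function names and the toy numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def absolute_residuals(xs, ys, slope, intercept):
    """Absolute vertical distance of each point from the fitted line."""
    return [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    # Gradient fitted on regular regions (toy numbers)
    slope, intercept = fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
    # Residuals of predicted overlap coding regions from that gradient
    print(absolute_residuals([1.0, 2.0], [3.5, 4.0], slope, intercept))
```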

A similar rationale can be developed for the subtraction of the mean ‘circular code’ scores for gene regions involved only in regular coding from the ‘circular code’ score obtained for predicted overlap coding genes in that gene. One might assume that overlap coding functionality decreases the more positive the value obtained from that subtraction.

According to these functionality rationales, absolute residuals (from deamination gradients) and circular code score subtractions should be negatively correlated, because according to that interpretation, they would estimate the same phenomenon. It is important to recall in this context that the two tests are totally independent from each other in terms of theoretical backgrounds, and analyze different properties of the sequences. Hence a positive result (meaning a negative correlation in this context) is not trivial. Fig. 7 plots the circular code score for predicted overlap coding regions for G↔U exchange transcription according to Table 3, as a function of the absolute value of residuals for A/(A+G) ratios at third codon positions for these putative overlap coding regions from the deamination gradient analysis in Fig. 6. The presumed functionality estimates from these independent tests are indeed negatively correlated (r = −0.6272, one tailed P = 0.0082; but note that rs = −0.36, one tailed P = 0.095), as one would expect if these estimates reflect functionality of the different predicted overlap coding genes. Hence gene-wise results for the two tests of overlapping gene functionality might confirm each other. The fact that the more robust but less sensitive nonparametric Spearman rank correlation analysis, rs, does not confirm the result of the parametric analysis does not invalidate the principle, but at this point does not allow high confidence in the result.
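
The difference between the parametric and the rank-based coefficient can be made concrete with a small sketch. It implements Pearson's r directly and Spearman's rs as Pearson's r on average ranks; the function names and the toy data are hypothetical and are not the data of Fig. 7. In the toy example a single extreme point produces a strongly negative r while rs stays weak, which is one way the two coefficients can disagree.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 10]
    ys = [2, 1, 4, 3, -10]   # last point is an extreme value
    print(pearson(xs, ys), spearman(xs, ys))
```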


Analyses similar to those in Fig. 7 were done for each of the eight remaining types of nucleotide exchanges, and the correlation was negative (as expected) in 5 among 9. It was statistically significant according to a one tailed test for Pearson correlation coefficients for putative overlap coding genes due to C↔U exchange transcription (r = −0.538, P = 0.044) and those due to A↔U+C↔G exchange transcription (r = −0.675, P = 0.0029). Hence convergence between functionality estimates from replicational mutation gradient analyses and circular score analyses was statistically significant at P < 0.05 for three among the nine types of potential transcriptional nucleotide exchanges. All three are, according to Table 2, types of nucleotide exchanges that are relatively frequently encountered at the level of RNA transcripts. The fact that two additional analyses yield qualitatively the same statistically significant result does increase the confidence level in the result, despite inconclusive confirmation of the trend in Fig. 7 by the nonparametric rs analysis. This is because the replicability of a result, by independent tests, is the best insurance against false rejection of the null hypothesis.

3.4. A meta-analysis of exchange nucleotide transcripts and coding

The analysis in Fig. 7 indicates that deamination gradient and circular code analyses converge for putative overlap coding genes according to G↔U nucleotide exchanges. Similar levels of convergence were found for two other nucleotide exchanges (C↔U and A↔U+C↔G); all three are among the nucleotide exchanges with the most abundant transcript data in Table 2. It is possible that the level of convergence between deamination gradient and circular code analyses, as estimated by r² such as the one in Fig. 7, is inversely proportional to RNA transcription. Fig. 8 plots this r² while keeping the sign of the correlation coefficient (the more negative, the greater the convergence; the value 1 is added to avoid negative numbers) as a function of the number of genome regions for which RNA transcripts were found (Table 2). The expected negative trend is detected by Pearson’s parametric correlation coefficient (r = −0.611, one-tailed P = 0.04), but cannot be statistically confirmed at P < 0.05 by Spearman’s nonparametric rank correlation (rs = −0.5, one-tailed P = 0.078).
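The two coefficients reported above, and the transformation used for the y axis of Fig. 8 as described in the main text (signed r² plus 1), can be illustrated with a minimal sketch. It assumes plain Python without external libraries; the function names (pearson_r, spearman_rs, signed_r2_plus_one) are hypothetical and are not taken from the original analyses, and the sketch computes only the coefficients, not the one-tailed P values.

```python
from math import copysign, sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rs(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


def signed_r2_plus_one(r):
    """Assumed Fig. 8 y value: r squared, keeping the sign of r, plus 1."""
    return copysign(r * r, r) + 1
```

Under this reading, a correlation of r = −0.5 maps to 1 − 0.25 = 0.75, so values below 1 indicate the negative correlations taken here as convergence.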

Nevertheless, the fact that nucleotide exchange types for which a high level of convergence between tests for overlap coding exists are also those for which RNA transcription is relatively frequent is not at all trivial. It is not simply a confirmation of cryptic coding after nucleotide exchange, and of nucleotide exchange transcription. It shows that the independent evidence for each of these phenomena tends to be coherently integrated. Hence Fig. 8 integrates all the evidence presented here, and shows consistency between all the types of analyses. Despite the speculative impression given by the working hypothesis, due to its presumed revolutionary meaning in relation to accepted principles of molecular biology, the data at hand are a strong confirmation that the working hypothesis is a valid approach for understanding coding properties of DNA and the way these are expressed. This implies that the number of protein coding genes in the presumably well-known vertebrate mitochondrial genome is approximately one order of magnitude greater than believed until now.

3.5. Human DNA polymerase gamma misinsertion polymerization rates and systematic symmetric nucleotide exchange polymerization

There is a further important piece of evidence confirming the working hypothesis, in relation to the existence of nucleotide exchange polymerization. Unlike the analyses of putative overlapping genes, this evidence is based solely on direct experimental observation, and is therefore a very strong argument favoring the working hypothesis. It is plausible that systematic nucleotide exchanges during RNA polymerization follow, in principle, very similar physico-chemical and enzymatic processes as the occasional nucleotide misinsertions (corresponding to the same replacing and replaced nucleotides as in exchange transcription) known for the human mitochondrial DNA polymerase gamma (Lee and Johnson, 2006) and some other polymerases (e.g., Bertram et al., 2010; Zamft et al., 2012). Hence this approach assumes that properties of misinsertions, such as their rate parameters, should be proportional to the abundance of RNAs produced by systematic nucleotide exchanges corresponding to the nucleotides replaced and replacing in that DNA misinsertion. In short, systematic nucleotide exchanges should follow kinetic principles observed for occasional (erroneous) nucleotide exchanges (misinsertions).

Fig. 8. Convergence between deamination gradient and circular code analyses as a function of the number of genome regions that are exchange transcribed according to Table 2. The y axis is the Pearson correlation coefficient r (+1) between the absolute value of residual A/(A+G) at 3rd codon positions for regions predicted to function as overlapping genes according to a given nucleotide exchange rule (according to Table 3) and the circular code score for that putative overlapping gene, for each of the nine types of nucleotide exchanges. The lower the value on the y axis, the greater the convergence between deamination and circular code analyses in confirming overlap coding for that type of nucleotide exchange. Fig. 7 shows the data used to calculate that Pearson correlation coefficient for G↔U exchange transcription; analyses similar to those in Fig. 7 were done for each of the nine nucleotide exchange rules, and Pearson correlation coefficients from these analyses are used in the y axis of Fig. 8. The x axis is the number of genome regions for which RNA was detected according to that specific nucleotide exchange rule. The trend in Fig. 8 shows that nucleotide exchange types according to which RNA has been detected for numerous regions are also those for which analyses such as those in Fig. 7 indicate a high degree of convergence between deamination and circular code analyses. This shows that convergence between two types of independent bioinformatics analyses converges with detected frequencies of RNA transcripts, an ‘experimental’ confirmation (x axis) of complex computational results (y axis).

Transcription is a DNA→RNA directed process, but no data on the mitochondrial RNA polymerase’s fidelity are available. Because DNA and RNA are quite similar, misinsertion rates by the mitochondrial DNA polymerase gamma were used for these analyses. This is also adequate because one cannot exclude that this enzyme is responsible for nucleotide exchanging RNA polymerization, perhaps in combination with specific conditions and/or other proteins. Which type of systematic nucleotide exchanging RNA polymerization occurs could be determined by such interactions with the polymerase(s) responsible for nucleotide exchanging RNA polymerization. According to a simplistic Michaelis–Menten approach to enzymatic reaction kinetics,


Fig. 9. Mean kd for nucleotide misinsertions by the human mitochondrial DNA polymerase gamma as a function of numbers of RNA transcripts with systematic symmetric RNA nucleotide exchanges corresponding to nucleotide misinsertions in the DNA. Transcript abundances are from Table 2, mean kds (affinities) are from table 2 in Lee and Johnson (2006). This shows that RNAs produced by nucleotide exchange transcription are predicted by misinsertion kinetics of exchanged nucleotides.

reactions are parametrized according to the affinity and the maximal reaction rate, the first reflecting initial reaction rates when the substrate is rare (the medium has few free nucleotides for insertion, and the enzyme is relatively frequent as compared to its substrate), the second when it is saturated (the medium is rich in free nucleotides to be (mis)inserted). These parameters are indicated as kd and kpol, respectively, in table 2 from Lee and Johnson (2006).
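The two kinetic regimes described above can be made concrete with a short sketch of the Michaelis–Menten rate law, v = kpol·[S]/(kd + [S]). This is the textbook form, assumed here for illustration only; the function names are hypothetical and the numerical values are arbitrary placeholders, not values from Lee and Johnson (2006).

```python
def mm_rate(substrate, kpol, kd):
    """Michaelis-Menten rate at a given free nucleotide concentration."""
    return kpol * substrate / (kd + substrate)


def low_substrate_rate(substrate, kpol, kd):
    """Linear approximation of the rate, valid when substrate << kd."""
    return (kpol / kd) * substrate


# Arbitrary placeholder parameters (NOT measured values).
KPOL = 2.0
KD = 4.0

if __name__ == "__main__":
    # Scarce nucleotides, nucleotides at kd, and saturating nucleotides.
    for concentration in (0.001, KD, 1e6):
        print(concentration, mm_rate(concentration, KPOL, KD))
```

In this form the rate equals half of kpol when the nucleotide concentration equals kd, approaches kpol at saturation, and at very low concentrations is governed by the ratio kpol/kd, which is why kd matters most when nucleotides are scarce.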

For each type of systematic symmetric nucleotide exchange, I averaged the corresponding kds and, separately, kpols from Lee and Johnson (2006). The mean polymerization kd (calculated for each type of nucleotide exchange) is negatively correlated with the mean length of RNA transcripts detected for the corresponding nucleotide exchanges (r = −0.689, P = 0.02; rs = −0.6, P = 0.045, one-tailed tests, Fig. 9). The association with kpol is statistically weaker, and positive, which is not surprising because kd and kpol are inversely proportional, a well-known phenomenon in kinetics: a high enzymatic specificity for its substrate (high affinity, kd) comes at the expense of its maximal rate. Hence results suggest that frequent types of nucleotide exchanges correspond to nucleotide misinsertions with high kpol (low kd). The observation that, statistically, correlations are strongest with kd suggests that symmetric systematic nucleotide exchanges are limited by conditions where nucleotides are relatively rare. Putatively, these systematic symmetric nucleotide exchanges occur when nucleotides are relatively scarce, which would explain a stronger effect of kd than kpol on their elongation.

This result is remarkable because it means that some physico-chemical and/or enzymatic principles inherent to nucleotide misinsertions coherently explain the data in Table 2. This excludes the possibility that artifacts created the RNAs in Table 2. The phenomena described here are shown to be meaningful on both chemical and biological grounds.

The mean kd also predicts levels of expression of predicted overlapping genes, as estimated by the difference between the strength of the replicational deamination gradients observed at (main frame) third codon positions for regions that are predicted to be involved in overlap coding (after systematic nucleotide exchange) versus third codon positions in other regions of the same genes (see analyses in Section 2.2.4 and corresponding Figs. 5 and 6). In that approach, the stronger the gradient for non-overlap coding regions as compared to predicted overlap coding regions, the weaker the expression of the predicted overlapping genes encoded by that type of symmetric systematic nucleotide exchange.

Fig. 10. Difference between strengths of replicational deamination gradient in regions not involved in overlap coding and in those involved in overlap coding as a function of mean kd for nucleotide misinsertions by the human mitochondrial DNA polymerase gamma corresponding to the nucleotide exchanges observed in the RNA. Open circles are for A→G, closed circles for C→T deamination gradients (light strand annotation; not to be confused with nucleotide exchanges during RNA transcription, A→G and C→T in this case represent spontaneous mutations by deaminations during DNA replication). The y axis is calculated, for each type of nucleotide exchange, from an analysis such as that presented in Figs. 5 and 6. The x axis is identical to that in Fig. 9. The result shows, as for Fig. 8, that computational results from bioinformatics analyses converge with misinsertion kinetics of exchanged nucleotides.

Fig. 10 plots this difference as a function of the mean kd, after a z transformation of the Pearson correlation coefficients (Amzallag, 2001) that estimate the strengths of the replicational deamination gradients; the z transformation accounts for sample size effects (Seligmann et al., 2007). The gradient analyses were done separately for the two types of transitions predicted to follow the replicational gradient, A→G and C→T (open and closed circles in Fig. 10, respectively). Note that in this case A→G and C→T are mutations due to deaminations that occur during DNA replication, not nucleotide exchanges occurring during RNA transcription. For each of the A→G and C→T gradients, the replicational gradient is stronger for regions not expected to be involved in overlap coding than for those expected to be involved in overlap coding in a majority of types of symmetric nucleotide exchanges (values above ‘1’ on the y axis in Fig. 10), and this difference increases with mean kd (A→G gradient, r = 0.622, P = 0.037, rs = 0.533, P = 0.0655; C→T gradient, r = 0.722, P = 0.014, rs = 0.717, P = 0.021, one-tailed tests). Hence types of nucleotide exchange polymerizations that are expected to have high rates of polymerization at low nucleotide concentrations seem to be most expressed, and therefore predicted overlapping protein coding genes are proportionally more conserved as compared to the replicational deamination gradient observed in other regions of the genes.
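The z transformation mentioned above is not spelled out in the text. A minimal sketch, assuming it is the standard Fisher transformation z = atanh(r) with standard error 1/sqrt(n − 3), is given below. The function names and the way two gradients are compared are illustrative assumptions, and may differ from the procedure of Amzallag (2001) or Seligmann et al. (2007).

```python
from math import atanh, sqrt


def fisher_z(r):
    """Fisher z transformation of a correlation coefficient (-1 < r < 1)."""
    return atanh(r)


def z_standard_error(n):
    """Standard error of Fisher's z for a sample of n points (n > 3)."""
    if n <= 3:
        raise ValueError("Fisher's z needs more than 3 data points")
    return 1.0 / sqrt(n - 3)


def z_difference(r1, n1, r2, n2):
    """Standardized difference between two independent correlations.

    Assumed use here: r1 would be the gradient strength in regions not
    involved in overlap coding, r2 the strength in overlap coding regions.
    """
    combined_se = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / combined_se
```

Because the standard error depends only on n, the transformation puts correlations estimated from different numbers of codon positions on a comparable scale, which is the sample size effect referred to in the text.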

The same principle is observed in relation to circular code analyses (Section 2.3 and y axis of Fig. 7) as estimating expression of predicted overlapping protein coding genes. The relative usage of homopolymers as opposed to circular code codons is expected to be greater in expressed overlap coding regions than in other regions,


Fig. 11. Difference between proportions of homopolymers among circular code codons in predicted overlap coding regions and in other regions for different types of systematic symmetric nucleotide exchanges as a function of the mean kpol of corresponding nucleotide misinsertions (from table 2 in Lee and Johnson, 2006). Overlapping genes expected to be more expressed according to the y axis correspond to types of nucleotide misinsertions with high DNA polymerization rates (x axis). The result shows, as for Figs. 8 and 9, that computational results from bioinformatics analyses converge with the known DNA misinsertion kinetics of the RNA exchanged nucleotides.

and this difference is expected to increase with expression levels. Fig. 11 shows that high mean kpols correspond to types of systematic nucleotide exchanges where this difference is large, and vice versa (r = 0.708, P = 0.016; rs = 0.70, P = 0.024, one-tailed tests). Hence here, bioinformatics analyses estimating high expression levels correspond to types of nucleotide exchange polymerizations that are expected to have high rates of polymerization at high nucleotide concentrations.

These results suggest that deamination gradient analyses better estimate expression of predicted overlapping genes encrypted by systematic symmetric nucleotide exchanges at low nucleotide concentrations. Hence these might be associated with stressful conditions such as low resource availability and low metabolism, as suggested for other types of alternative mitochondrial gene expression (Seligmann, 2010c, 2011a), which putatively would favor deaminations. Figs. 7 and 8 show that deamination gradient and circular code analyses tend to converge in their overall patterns, yet it seems that each approach better fits specific conditions. Circular code analyses seem to better estimate expression of overlapping genes encrypted by nucleotide exchanges at high concentrations of free nucleotides.

4. General discussion

The analyses presented above confirm the hypothesis that transcription that systematically exchanges nucleotides (in a symmetric manner) reveals protein coding genes that were not detected until now in the human mitochondrial genome. A number of lines of evidence suggest this: (1) RNA transcripts fitting polymerization according to several nucleotide exchange rules are detected in GenBank’s EST database (Table 2); (2) Blastp analyses of putative polypeptides translated from ‘exchange transcribed’ sequences yield numerous alignments with proteins existing in GenBank; (3) identities of aligning proteins seem non-random in relation to mitochondrial metabolism and include numerous proteins interacting with DNA and RNA (putatively, future studies will find that some of the proteins responsible for exchange transcription are among these); (4) these putative overlapping protein coding genes include few stop codons; (5) bias against stop codons within putative overlapping protein coding genes is inversely proportional to transcript abundances of nucleotide exchange types, suggesting a balance between positive and negative regulation of expression of overlapping genes coded by nucleotide exchange transcription (upregulation) and stop codon presence (downregulation); (6) replicational deamination gradient analyses tend to confirm the coding status of putative overlapping protein coding genes; (7) circular code analyses of codon usages in putative overlap coding regions also confirm this status; (8) results of (6) and (7) tend to converge; (9) that level of convergence is consistent with the number of genome regions that are found ‘exchange transcribed’; (10) frequencies and lengths of RNA transcripts corresponding to different types of nucleotide exchanges are explained by kinetic parameters of occasional nucleotide misinsertions by the human mitochondrial DNA polymerase gamma that reflect the assumed transcriptional nucleotide exchanges. It is particularly notable that results for each of the 10 levels are independent, yet yield a highly integrated overall picture.

This confirms that the coding system is much more complex than usually believed (Mercer et al., 2011a,b), and that some types of coding/recoding events, though apparently rare or very rare, actually exist. At this point, the next major steps are to perform similar analyses for nucleotide exchanges that are not symmetric, and to investigate whether the proteins predicted by the analyses can be found in and extracted from mitochondria. It is important to note that the analyses suggest that some of the overlap coding genes seem more optimized than others. This could have two meanings: their expression level is greater, and/or their function is more important. It is not certain that nucleotide exchange types that seem more frequent are necessarily those that are most important from a functional point of view. Hence transcript abundance does not need to be perfectly correlated with optimization. It seems plausible that the importance of a coding system associated with a given type of nucleotide exchange is reflected not only by the abundance of transcripts detected (Table 2), but also by the number of putative protein coding genes detected (Table 3) and the extent to which overlap coding is independent of transcription.

The analyses compare different types of nucleotide exchanges. It is possible that these are not all variations of the same phenomenon. Besides the fact that six nucleotide exchange rules involve only one pair of nucleotides and three involve two pairs, some of these pairs exchange nucleotides of the same type (purine to purine, or pyrimidine to pyrimidine), while others do not. This might imply mechanisms of different natures. In addition, the nucleotide exchange A↔U+C↔G could be compatible with a different type of polymerization, which does not necessarily imply nucleotide exchange, but would result in the same transcript sequence. It might result from regular 5′-to-3′ RNA polymerization where the progression follows the 3′-to-5′ direction of a sequence, a phenomenon that has not yet been observed, but for which evidence exists (Seligmann, 2012d); note that 3′-to-5′ directed RNA polymerization occurs, also in mitochondria (Jackman et al., 2012). Such RNA is also compatible with RNA that forms DNA–RNA triplexes according to antiparallel Hoogsteen base pairings, which have been observed in vertebrate mitochondria (Annex and Williams, 1990; Rocher et al., 2002; Takamatsu et al., 2002).
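The nine symmetric exchange rules discussed above (six involving one pair of nucleotides, three involving two pairs) can be enumerated and applied to a sequence with a short sketch. The function names are hypothetical and any sequence passed in is arbitrary; the sketch only illustrates the combinatorics and does not reproduce the transcript searches.

```python
from itertools import combinations

BASES = "ACGU"
PURINES = {"A", "G"}


def single_pair_rules():
    """Six rules that swap one pair of nucleotides."""
    return [((a, b),) for a, b in combinations(BASES, 2)]


def double_pair_rules():
    """Three rules that swap two disjoint pairs of nucleotides at once."""
    rules = []
    for partner in "CGU":
        # The two bases not involved in the A<->partner swap form pair two.
        rest = [base for base in BASES if base not in ("A", partner)]
        rules.append((("A", partner), (rest[0], rest[1])))
    return rules


def all_rules():
    return single_pair_rules() + double_pair_rules()


def rule_name(rule):
    """Readable name, with '<->' standing for the exchange arrow."""
    return "+".join(f"{a}<->{b}" for a, b in rule)


def apply_rule(seq, rule):
    """Systematically exchange nucleotides in an RNA sequence."""
    table = {}
    for a, b in rule:
        table[ord(a)] = b
        table[ord(b)] = a
    return seq.translate(table)


def same_type(a, b):
    """True for purine-purine or pyrimidine-pyrimidine exchanges."""
    return (a in PURINES) == (b in PURINES)
```

Each rule is its own inverse, so applying it twice restores the original sequence. Applying A↔U+C↔G returns the complement of the sequence without reversing it, and only A↔G and C↔U exchange nucleotides of the same type.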

There are other notable observations pertaining to overlap coding through systematic nucleotide exchanges. Most proteins aligning with sequences translated from such exchange transcribed human mitochondrial sequences have eukaryotic, mainly


metazoan origins. The alignment data suggest that occasionally, the nucleotide exchange-coded genes are recoded and integrated in the genome so that the protein is coded without nucleotide exchange. This seems to occur relatively rarely, as in most cases only one protein from one organism aligns with the protein translated from nucleotide exchange transcripts. The data indicate some phyletic clustering for mitochondrially nucleotide-exchange encoded overlap coding genes and those that are encoded without nucleotide exchange: organisms possessing mitochondria are overrepresented among the latter.

The analyses clearly exclude the possibility that transcripts detected as exchanging nucleotides are due to some kind of annotation error or statistical artifact (e.g., Fig. 1), especially because exchange transcription rates are predicted by rate parameters of misinsertion kinetics for corresponding nucleotides: systematic nucleotide exchange rates during RNA transcription are proportional to occasional replicational mutation rates due to DNA misinsertion of the same nucleotide types (Figs. 9–11). Hence nucleotide exchanging transcription fits basic biochemical nucleotide properties that also affect their DNA misinsertion rate kinetics. However, it is also possible that these ESTs are the product of dysfunctional polymerases during the process creating the cDNA libraries. In that case, the data in Table 2 would not directly reflect frequencies of naturally occurring nucleotide exchanging transcription in the mitochondrion. These would only be indirectly estimated, from the production of cDNAs by RNA→DNA reverse transcription. Both possibilities are plausible, and are not mutually exclusive. However, even if the ‘unnatural’ scenario for exchange transcript production were correct, the transcript abundances produced by that ‘unnatural’ mechanism are proportional to computational predictions of overlap protein coding genes embedded in nucleotide exchange RNA transcripts (Figs. 2, 4, 8, 10 and 11). This coherence between gene contents and transcript abundance indicates that abundances from Table 2 reflect a natural reality of mitochondria (and cells), even if RNA→DNA reverse transcription, and not DNA→RNA transcription, produced the suspected transcripts. In that case, occasional RNA→DNA reverse transcriptase dysfunctions would have given insights into the existence of a previously unknown family of related types of polymerization.

Nucleotide exchange coding, as a way to encode more genes without increasing genome length, seems particularly adequate for the dense vertebrate mitochondrial genome. However, there is no a priori ground to assume that such coding is limited to the mitochondrial genome. It is very probable that at various levels, this type of coding occurs also in the nucleus, and in prokaryotes. Hence protein coding genes encoded by genomes might be much more numerous than believed.

References

Ahmed, A., Frey, G., Michel, C.J., 2007. Frameshift signals in genes associated with the circular code. In Silico Biol. 7, 155–168.

Ahmed, A., Frey, G., Michel, C.J., 2010. Essential molecular functions associated with circular code evolution. J. Theor. Biol. 264, 613–622.

Ahmed, A., Michel, C.J., 2011. Circular code signal in frameshift genes. J. Comp. Sci. Syst. Biol. 4, 7–15.

Akashi, H., Gojobori, T., 2002. Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc. Natl. Acad. Sci. U.S.A. 99, 3695–3700.

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res. 25, 3389–3402.

Altschul, S.F., Wootton, J.C., Gertz, E.M., Agarwala, R., Morgulis, A., Schäffer, A.A., Yu, Y.K., 2005. Protein database searches using compositionally adjusted substitution matrices. FEBS J. 272, 5101–5109.

Alves, R., Savageau, M.A., 2005. Evidence of selection for low cognate amino acid bias in amino acid biosynthetic enzymes. Mol. Microbiol. 56, 1017–1034.

Amzallag, G.N., 2001. Data analysis in plant physiology: are we missing the reality? Plant Cell Environ. 24, 881–890.

Annex, B.H., Williams, R.S., 1990. Mitochondrial DNA structure and expression in specialized subtypes of mammalian striated muscle. Mol. Cell. Biol. 10, 5671–5678.

Arqués, D.G., Michel, C.J., 1996. A complementary circular code in the protein coding genes. J. Theor. Biol. 182, 45–58.

Arqués, D.G., Michel, C.J., 1997. A circular code in the protein coding genes of mitochondria. J. Theor. Biol. 189, 273–290.

Barton, M.D., Delneri, D., Oliver, S.G., Rattray, M., Bergman, M.C., 2010. Evolutionary systems biology of amino acid biosynthetic cost in yeast. PLoS One 5, e11935.

Bertram, J.G., Oertell, K., Petruska, J., Goodman, M.F., 2010. DNA polymerase fidelity: comparing direct competition of right and wrong dNTP substrates with steady state and pre-steady state kinetics. Biochemistry 49, 20–28.

Brocchieri, L., Karlin, S., 2005. Protein length in eukaryotic and prokaryotic proteomes. Nucl. Acids Res. 33, 3390–3400.

Chipman, A.D., Khaner, O., Haas, A., Tchernov, E., 2001. The evolution of genome size: what can be learned from anuran development? J. Exp. Zool. A 291, 364–374.

Daniel, C., Wahlstedt, H., Ohlson, J., Bjork, P., Ohman, M., 2011. Adenosine-to-inosine RNA editing affects trafficking of the γ-aminobutyric acid type A (GABAA) receptor. J. Biol. Chem. 286, 2031–2040.

Dias Neto, E., Garcia Correa, R., Verjovski-Almeida, S., Briones, M.R., Nagai, M.A., da Silva Jr., W., Zago, M.A., Bordin, S., Costa, F.F., Goldman, G.H., Carvalho, A.F., Matsukuma, A., Baia, G.S., Simpson, D.H., Brunstein, A., deOliveira, P.S., Bucher, P., Jongeneel, C.V., O’Hare, M.J., Soares, F., Brentani, R.R., Reis, L.F., de Souza, S.J., Simpson, A.J., 2000. Shotgun sequencing of the human transcriptome with ORF expressed sequence tags. Proc. Natl. Acad. Sci. U.S.A. 97, 3491–3496.

Faure, E., Delaye, L., Tribolo, S., Levasseur, A., Seligmann, H., Barthélémy, R.-M., 2011. Probable presence of an ubiquitous cryptic mitochondrial gene on the antisense strand of the cytochrome oxidase I gene. Biol. Direct 6, 56.

Fredrico, A., Kunkel, T.A., Shaw, B.R., 1990. A sensitive genetic assay for the detection of cytosine deamination: determination of rate constants and the activation energy. Biochemistry 29, 2532–2537.

Gonzalez, D.L., Giannerini, S., Rosa, R., 2011. Circular codes revisited: a statistical approach. J. Theor. Biol. 275, 21–28.

Gregory, T.R., Hebert, P.D.N., 1999. The modulation of DNA content: proximate causes and ultimate consequences. Genome Res. 9, 317–324.

Huang, G.M., Ng, W., Farkas, L., He, J., Liang, L., Gordon, H.A., Yu, D., Hood, J.L., 1999. Prostate cancer expression profiling by cDNA sequencing analysis. Genomics 59, 178–186.

Itzkovitz, S., Alon, Y., 2007. The genetic code is nearly optimal for allowing additional information within protein-coding sequences. Genome Res. 17, 405–412.

Jackman, J.E., Gott, J.M., Gray, M.W., 2012. Doing it in the reverse: 3′-to-5′ polymerization by the Thg1 superfamily. RNA 18, 886–899.

Jin, Y., Tian, N., Cao, J., Liang, J., Yang, Z., Mv, J., 2007. RNA editing and alternative splicing of the insect nAChR subunit alpha6 transcript: evolutionary conservation, divergence and regulation. BMC Evol. Biol. 7, 98.

Kasiviswanathan, R., Copeland, W.C., 2011. Ribonucleotide discrimination and reverse transcription by the human mitochondrial DNA polymerase. J. Biol. Chem. 286, 31490–31500.

Krishnan, N.M., Seligmann, H., Raina, S.Z., Pollock, D.D., 2004a. Detecting gradients of asymmetry in site-specific substitutions in mitochondrial genomes. DNA Cell Biol. 23, 707–714.

Krishnan, N.M., Seligmann, H., Raina, S.Z., Pollock, D.D., 2004b. Phylogenetic analyses detect site-specific perturbations in asymmetric mutation gradients. Curr. Comput. Mol. Biol. 2004, 266–267.

Krizek, M., Krizek, P., 2012. Why has nature invented three stop codons of DNA and only one start codon? J. Theor. Biol. 304, 183–187.

Lee, H.R., Johnson, K.A., 2006. Fidelity of the human mitochondrial DNA polymerase. J. Biol. Chem. 281, 36236–36240.

Lev-Maor, G., Sorek, R., Levanon, E.Y., Paz, N., Eisenberg, E., Ast, G., 2007. RNA-editing- 1450

mediated exon evolution. Genome Biol. 8, R29. 1451

Liew, C.C., Hwang, D.M., Fung, Y.W., Laurenssen, C., Cukerman, E., Tsui, S., Lee, 1452

C.Y., 1994. A catalogue of genes in the cardiovascular system as identified by 1453

expressed sequence tags. Proc. Natl. Acad. Sci. U.S.A. 91, 10645–10649. 1454

Lui, V.W.Y., Luk, S.C.W., Tsui, S.K.W., Tung, C.K.C., Yam, N.Y.H., Liew, C.C., Lee, C.Y., 1455

1995. Gene expression of adult human heart as revealed by random sequencing 1456

of cDNA library. In: Miami Winter BioTechnol. Symp. Proc., vol. 6, p. 90. 1457

Mercer, T.R., Dinger, M.E., Crawford, J., Smith, M.A., Shearwood, A.M., Haugen, E., Bracken, C.P., Rackham, O., Stamatoyannopoulos, J.A., Filipovska, A., Mattick, J.S., 2011a. The human mitochondrial transcriptome. Cell 146, 645–658.

Mercer, T.R., Gerhardt, D.J., Dinger, M.E., Crawford, J., Trapnell, C., Jeddeloh, J.A., Mattick, J.S., Rinn, J.L., 2011b. Targeted RNA sequencing reveals the deep complexity of the human transcriptome. Nat. Biotechnol. 30, 99–104.

Michel, C.J., 2012. Circular code motifs in transfer RNA and 16S ribosomal RNAs: a possible translation code in genes. Comput. Biol. Chem. 34, 24–37.

Namy, O., Lecointe, F., Grosjean, H., Rousset, J.-H., 2005. Translational recoding and RNA modifications. Fine-tuning of RNA functions by modification and editing. Top. Curr. Genet. 12, 2005–2340.

Paz, N., Levanon, E.Y., Amariglio, N., Heimberger, A.B., Ram, Z., Constantini, S., Barbashi, Z.S., Adamsky, K., Safran, M., Hirschberg, A., Krupsky, M., Ben-Dov, I., Cazacu, S., Mikkelsen, T., Brodie, C., Eisenberg, E., Rechavi, G., 2007. Altered adenosine-to-inosine RNA editing in human cancer. Genome Res. 17, 1586–1595.

Perlstein, E.O., de Bivort, B.L., Schreiber, S.L., 2007. Evolutionary conserved optimization of amino acid biosynthesis. J. Mol. Evol. 98, 186–196.


HE

Sticky Note

please delete "s", this should be "suggest"

HE

Sticky Note

"a priori" should be in italics

HE

Sticky Note

"In Silico" should be in italics

HE

Sticky Note

please insert here missing reference to "Michel, 2008": Michel, C.J., 2008. A 2006 review of circular codes in genes. Computer and Mathematics with Applications 55, 984-988.





Raina, S.Z., Faith, J.J., Disotell, T.R., Seligmann, H., Stewart, C.B., Pollock, D.D., 2005. Evolution of base-substitution gradients in primate mitochondrial genomes. Genome Res. 15, 665–673.

Reenan, R.A., 2005. Molecular determinants and guided evolution of species-specific RNA editing. Nature 434, 409–413.

Rocher, C., Letellier, T., Copeland, W.C., Lestienne, P., 2002. Base composition at mtDNA boundaries suggests a DNA triple helix model for human mitochondrial DNA large-scale rearrangements. Mol. Genet. Metab. 76, 123–132.

Seligmann, H., 2007. Cost minimization of ribosomal frameshifts. J. Theor. Biol. 249, 162–167.

Seligmann, H., 2008. Hybridization between mitochondrial heavy strand tDNA and expressed light strand tRNA modulates the function of heavy strand tDNA as light strand replication origin. J. Mol. Biol. 379, 188–199.

Seligmann, H., 2010a. The ambush hypothesis at the whole-organism level: off frame, ‘hidden’ stops in vertebrate mitochondrial genes increase developmental stability. Comp. Biol. Chem. 34, 80–85.

Seligmann, H., 2010b. Avoidance of antisense antiterminator tRNA anticodons in vertebrate mitochondria. Biosystems 101, 42–50.

Seligmann, H., 2010c. Undetected antisense tRNAs in mitochondrial genomes? Biol. Direct 5, 39.

Seligmann, H., 2011a. Two genetic codes, one genome: frameshifted primate mitochondrial genes code for additional proteins in presence of antisense antitermination tRNAs. Biosystems 106, 271–286.

Seligmann, H., 2011b. Mutation patterns due to converging mitochondrial replication and transcription increase lifespan, and cause growth rate-longevity tradeoffs. In: Seligmann, H. (Ed.), DNA Replication—Current Advances. InTech, pp. 151–180 (Chapter 6).

Seligmann, H., 2012a. Coding constraints modulate chemically spontaneous mutational replication gradients in mitochondrial genomes. Curr. Genomics 13, 37–54.

Seligmann, H., 2012b. Positive and negative cognate amino acid bias affects compositions of aminoacyl-tRNA synthetases and reflects functional constraints on protein structure. BIO 2, 11–26.

Seligmann, H., 2012c. An overlapping genetic code for frameshifted overlapping genes in Drosophila mitochondria: antisense antitermination tRNAs UAR insert serine. J. Theor. Biol. 296, 61–76.

Seligmann, H. Overlapping genetic codes for overlapping frameshifted genes in testudines, and Lepidochelys olivacea as a special case. Comp. Biol. Chem., in press.

Seligmann, H., 2012d. Overlapping genes coded in the 3′-to-5′ direction in mitochondrial genes and 3′-to-5′ polymerization of non-complementary RNA by an ‘invertase’. J. Theor. Biol. 315, 38–52.

Seligmann, H., 2012e. Putative mitochondrial polypeptides coded by expanded quadruplet codons, decoded by antisense tRNAs with unusual anticodons. Biosystems 110, 84–106.

Seligmann, H. Putative protein-encoding genes within mitochondrial rDNA and the D-loop region. In: Lin, Z., Liu, W. (Eds.), Ribosomes: Molecular Structure, Role in Biological Functions and Implications for Genetic Diseases, Nova Publishers, in press.

Seligmann, H., Anderson, S.C., Autumn, K., Bouskila, A., Saf, R., Tuniyev, B.S., Werner, Y.L., 2007. Analysis of the locomotor activity of a nocturnal desert lizard (Reptilia: Gekkonidae: Teratoscincus scincus) under varying moonlight. Zoology 110, 104–117.

Seligmann, H., Krishnan, N.M., Rao, B.J., 2006. Possible multiple origins of replication in primate mitochondria: alternative role of tRNA sequences. J. Theor. Biol. 241, 321–332.

Seligmann, H., Pollock, D.D., 2004a. The ambush hypothesis: hidden stop codons prevent off-frame gene reading. In: Midsouth Computational Biology and Bioinformatics Society, vol. 36, Abstract.

Seligmann, H., Pollock, D.D., 2004b. The ambush hypothesis: hidden stop codons prevent off-frame gene reading. DNA Cell Biol. 23, 701–705.

Sessions, S.K., Larson, A., 1987. Developmental correlates of genome size in plethodontid salamanders and their implications for genome evolution. Evolution 41, 1239–1251.

Singh, T.R., Pardasani, K.R., 2009. Ambush hypothesis revisited: evidences for phylogenetic trends. Comput. Biol. Chem. 33, 239–244.

Takamatsu, C., Umeda, S., Ohsato, T., Ohno, T., Abe, Y., Fukuoh, A., Shinagawa, H., Hamasaki, N., Kang, D., 2002. Regulation of mitochondrial D-loops by transcription factor A and single-stranded DNA-binding protein. EMBO Rep. 3, 451–456.

Tanaka, M., Ozawa, T., 1994. Strand asymmetry in human mitochondrial mutations. Genomics 22, 327–335.

Tse, H., Cai, J.J., Tsoi, H.-W., Lam, E.P.T., Yuen, K.-Y., 2010. Natural selection retains overrepresented out-of-frame stop codons against frameshift peptides in prokaryotes. BMC Genomics 11, 491.

Warnecke, T., Hurst, L.D., 2011. Error prevention and mitigation as forces in the evolution of genes and genomes. Nat. Rev. Genet. 12, 875–881.

Warringer, J., Blomberg, A., 2006. Evolutionary constraints on yeast protein size. BMC Evol. Biol. 6, 61.

Zamft, B.M., Marblestone, A.H., Kording, K., Schmidt, D., Martin-Alarcon, D., Tyo, K., Boyden, E.S., Church, G., 2012. Measuring cation dependent DNA polymerase fidelity landscapes by deep sequencing. PLoS One 7, e43876.

Zhang, Z., Schwartz, S., Wagner, L., Miller, W., 2000. A greedy algorithm for aligning DNA sequences. J. Comp. Biol. 7, 203–214.

Zuker, M., 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucl. Acids Res. 31, 3406–3415.


HE

Sticky Note

add here reference related to Q3 Seligmann, H., 2003. Cost minimization of amino acid usage. J. Mol. Evol. 56, 151-161.

HE

Sticky Note

The updated reference is: Seligmann, H., 2012f. Overlapping genetic codes for overlapping frameshifted genes in Testudines, and Lepidochelys olivacea as special case. Comp. Biol. Chem. 41, 18-34.

HE

Sticky Note

this chapter will probably be published in 2013

