Swetlana Nikolajewa, Andreas Beyer, Maik Friedel, Jens Hollunder, Thomas Wilhelm
Institute of Molecular Biotechnology, Jena, Germany
Overview: Purine-Pyrimidine Patterns
Part 1: New Classification Scheme of the Genetic Code
Part 2: Type II Restriction Enzyme Binding Sites
Part 1. The purine-pyrimidine scheme of the genetic code shows
amino acid patterns and regularities of codons
symmetry characteristics
possible predecessors of our contemporary quaternary triplet code
an explanation for the number (22) of tRNA genes in the mammalian mitochondrial genome
Overview: Genetic Code
Triplets of 3 nucleobases (A, G, C, U) code for 20 amino acids (AAs)
64 possible codons (4 × 4 × 4 = 4^3)
3 termination codons: UGA, UAG, UAA
Met (AUG) codon is also the start codon
1st base | 2nd base U | 2nd base C | 2nd base A | 2nd base G | 3rd base
U        | UUU Phe    | UCU Ser    | UAU Tyr    | UGU Cys    | U
U        | UUC Phe    | UCC Ser    | UAC Tyr    | UGC Cys    | C
U        | UUA Leu    | UCA Ser    | UAA Stop   | UGA Stop   | A
U        | UUG Leu    | UCG Ser    | UAG Stop   | UGG Trp    | G
C        | CUU Leu    | CCU Pro    | CAU His    | CGU Arg    | U
C        | CUC Leu    | CCC Pro    | CAC His    | CGC Arg    | C
C        | CUA Leu    | CCA Pro    | CAA Gln    | CGA Arg    | A
C        | CUG Leu    | CCG Pro    | CAG Gln    | CGG Arg    | G
A        | AUU Ile    | ACU Thr    | AAU Asn    | AGU Ser    | U
A        | AUC Ile    | ACC Thr    | AAC Asn    | AGC Ser    | C
A        | AUA Ile    | ACA Thr    | AAA Lys    | AGA Arg    | A
A        | AUG Met    | ACG Thr    | AAG Lys    | AGG Arg    | G
G        | GUU Val    | GCU Ala    | GAU Asp    | GGU Gly    | U
G        | GUC Val    | GCC Ala    | GAC Asp    | GGC Gly    | C
G        | GUA Val    | GCA Ala    | GAA Glu    | GGA Gly    | A
G        | GUG Val    | GCG Ala    | GAG Glu    | GGG Gly    | G
The Common Genetic Code Table
The Common Genetic Code Table contains 64 fields…
Purine-Pyrimidine Classification Scheme of the Genetic Code
binary representation of nucleobases
purines: A, G → 1
pyrimidines: C, U → 0
C-G binds via 3 hydrogen bonds in complementary base pairing
A-U binds via 2 hydrogen bonds in complementary base pairing
2^3 = 8 different binary triplets: 000, 001, …, 111
each of these again has 8 possibilities, for instance:
000 stands for three pyrimidines: CCC, CCU, UUC, …, UUU
111 stands for three purines: GGG, GGA, GAA, …, AAA
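The binary coding above can be sketched in a few lines of Python. This is only an illustration of the rule "purine → 1, pyrimidine → 0"; the function names `binary_pattern` and `group_by_pattern` are made up for this example and do not come from the talk.

```python
from collections import defaultdict
from itertools import product

PURINES = {"A", "G"}  # coded as 1; the pyrimidines C and U are coded as 0


def binary_pattern(codon: str) -> str:
    """Return the purine (1) / pyrimidine (0) pattern of an RNA codon."""
    return "".join("1" if base in PURINES else "0" for base in codon)


def group_by_pattern() -> dict[str, list[str]]:
    """Group all 64 codons by their binary pattern."""
    groups = defaultdict(list)
    for bases in product("UCAG", repeat=3):
        codon = "".join(bases)
        groups[binary_pattern(codon)].append(codon)
    return dict(groups)


groups = group_by_pattern()
print(len(groups))            # number of binary triplets
print(sorted(groups["000"]))  # the eight all-pyrimidine codons
print(binary_pattern("AUG"))  # pattern of the start codon
```

Each of the eight binary triplets collects exactly eight codons, which is the 8 × 8 = 64 decomposition stated above.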
Codon | Strong codons, 6 H bonds | Mixed codons, 5 H bonds | Mixed codons, 5 H bonds | Weak codons, 4 H bonds
000 | Pro CC (C/U) Proline | Ser UC (C/U) Serine | Leu CU (C/U) Leucine | Phe UU (C/U) Phenylalanine
001 | Pro CC (A/G) Proline | Ser UC (A/G) Serine | Leu CU (A/G) Leucine | Leu UU (A/G) Leucine
100 | Ala GC (C/U) Alanine | Thr AC (C/U) Threonine | Val GU (C/U) Valine | Ile AU (C/U) Isoleucine
101 | Ala GC (A/G) Alanine | Thr AC (A/G) Threonine | Val GU (A/G) Valine | Ile/Met AU (A/G) Isoleucine/Methionine
010 | Arg CG (C/U) Arginine | Cys UG (C/U) Cysteine | His CA (C/U) Histidine | Tyr UA (C/U) Tyrosine
011 | Arg CG (A/G) Arginine | Stop/Trp UG (A/G) Tryptophan | Gln CA (A/G) Glutamine | Stop UA (A/G)
110 | Gly GG (C/U) Glycine | Ser AG (C/U) Serine | Asp GA (C/U) Aspartic acid | Asn AA (C/U) Asparagine
111 | Gly GG (A/G) Glycine | Arg AG (A/G) Arginine | Glu GA (A/G) Glutamic acid | Lys AA (A/G) Lysine
Purine-Pyrimidine Table of the Genetic Code
…the new scheme contains the same information in only 32 fields.
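How 64 codons collapse into 32 fields can be checked with a short Python sketch. It assumes that a field is a pair of codons sharing the first two bases and the class of the third base (C/U or A/G), and that the column (strong, mixed, weak) is given by the summed hydrogen bonds of the first two bases; this is one reading of the column headers, not a statement from the slides. The names `CODE`, `H_BONDS` and `fields` are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
H_BONDS = {"C": 3, "G": 3, "A": 2, "U": 2}  # hydrogen bonds per base pair


def fields():
    """Yield (first two bases, third-base class, H bonds, amino acids) per field."""
    for first, second in product(BASES, repeat=2):
        bonds = H_BONDS[first] + H_BONDS[second]  # 6 = strong, 5 = mixed, 4 = weak
        for third_class in ("CU", "AG"):
            aas = {CODE[first + second + third] for third in third_class}
            yield first + second, third_class, bonds, aas


table = list(fields())
ambiguous = [(prefix, cls) for prefix, cls, _, aas in table if len(aas) > 1]
print(len(table))  # number of fields
print(ambiguous)   # fields whose two codons differ in meaning
```

In the standard code only two fields hold two different meanings, UG (A/G) (Stop/Trp) and AU (A/G) (Ile/Met), which is why the 32-field table loses almost no information.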
Amino Acid Patterns: Polar Requirement of NCN and NUN Codons
C. R. Woese, G. J. Olsen, M. Ibba, D. Söll. Aminoacyl-tRNA Synthetases, the Genetic Code, and the Evolutionary Process. MMBR 2000 (64) 202-236
Kyte & Doolittle, 1982, http://biology-pages.info
Amino Acid Patterns: Hydrophobicity
Codon-Anticodon Symmetry
Point Symmetry
D. Halitsky. Extending the (Hexa-)Rhombic Dodecahedral Model of the Genetic Code: the Code's Four 6-fold Degeneracies and the Ten Orthogonal Projections of the 5-cube as 3-cube. Computer Systems Technology 2004
Codon-Reverse Codon (XYZ↔ZYX) Symmetry
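The slides name these symmetries without spelling out the mappings. The following Python sketch shows how two of them act on the binary patterns, assuming that the anticodon is the reverse complement of the codon written 5'→3' and that the reverse codon is XYZ → ZYX as in the title above. The point symmetry is not modelled here, and all function names are illustrative.

```python
from itertools import product

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}


def binary_pattern(codon: str) -> str:
    """Purine (1) / pyrimidine (0) pattern of a codon."""
    return "".join("1" if base in PURINES else "0" for base in codon)


def anticodon(codon: str) -> str:
    """Reverse complement of the codon, written 5'->3' (assumed convention)."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def reverse_codon(codon: str) -> str:
    """XYZ -> ZYX."""
    return codon[::-1]


def invert(pattern: str) -> str:
    """Swap 0 and 1 in a binary pattern."""
    return pattern.translate(str.maketrans("01", "10"))


for bases in product("UCAG", repeat=3):
    codon = "".join(bases)
    pattern = binary_pattern(codon)
    # Complementation swaps purines and pyrimidines, reversal flips the order.
    assert binary_pattern(anticodon(codon)) == invert(pattern)[::-1]
    assert binary_pattern(reverse_codon(codon)) == pattern[::-1]

print(anticodon("CCU"), binary_pattern(anticodon("CCU")))
```

Because C pairs with G and A pairs with U, the anticodon keeps the hydrogen-bond strength of each position while turning every 0 into a 1 and vice versa.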
Evolution of the Genetic Code
binary doublet code: 4^1 = 4 fields (00, 01, 10, 11)
quaternary doublet code: 4^2 = 16 fields
our contemporary code is the quaternary triplet code: 4^3 = 64 fields (CGU, UAC, …)
Evolution: Scenario 1
Evolution: Scenario 2
Mitochondrial genomes have several surprising features
genetic code of mitochondria
only 22 tRNAs are required for mammalian mitochondrial protein synthesis
Codon | Strong, 6 H bonds | Mixed, 5 H bonds | Mixed, 5 H bonds | Weak, 4 H bonds
000 | Pro CC (C/U) | Ser UC (C/U) | Leu CU (C/U) | Phe UU (C/U)
001 | Pro CC (A/G) | Ser UC (A/G) | Leu CU (A/G) | Leu UU (A/G)
100 | Ala GC (C/U) | Thr AC (C/U) | Val GU (C/U) | Ile AU (C/U)
101 | Ala GC (A/G) | Thr AC (A/G) | Val GU (A/G) | Met AU (A/G)
010 | Arg CG (C/U) | Cys UG (C/U) | His CA (C/U) | Tyr UA (C/U)
011 | Arg CG (A/G) | Trp UG (A/G) | Gln CA (A/G) | Stop UA (A/G)
110 | Gly GG (C/U) | Ser AG (C/U) | Asp GA (C/U) | Asn AA (C/U)
111 | Gly GG (A/G) | Stop AG (A/G) | Glu GA (A/G) | Lys AA (A/G)
The Mammalian Mitochondrial Genetic Code
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
Codon | Strong, 6 H bonds | Mixed, 5 H bonds | Mixed, 5 H bonds | Weak, 4 H bonds
000, 001 | tRNAPro CC | tRNASer1 UC | tRNALeu1 CU | tRNAPhe UU (C/U); tRNALeu2 UU (A/G)
100, 101 | tRNAAla GC | tRNAThr AC | tRNAVal GU | tRNAIle AU (C/U); tRNAMet AU (A/G)
010, 011 | tRNAArg CG | tRNACys UG (C/U); tRNATrp UG (A/G) | tRNAHis CA (C/U); tRNAGln CA (A/G) | tRNATyr UA (C/U); STOP UA (A/G)
110, 111 | tRNAGly GG | tRNASer2 AG (C/U); STOP AG (A/G) | tRNAAsp GA (C/U); tRNAGlu GA (A/G) | tRNAAsn AA (C/U); tRNALys AA (A/G)
The Mammalian Mitochondrial Code: 8 tRNAs for family codons + 14 tRNAs for non-family codons = 22
http://mamit-trna.u-strasbg.fr/2DStructures.html
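The count 8 + 14 = 22 can be reproduced with a small Python sketch. It assumes the vertebrate mitochondrial code (NCBI translation table 2) and a simplified counting rule: one tRNA per family box (all four codons with the same first two bases share one meaning), otherwise one tRNA per (C/U) or (A/G) half that is not a stop signal. The function name `count_trnas` is illustrative.

```python
from itertools import product

BASES = "UCAG"
# Vertebrate mitochondrial code (NCBI translation table 2), codons ordered
# UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
MITO_AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
MITO_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), MITO_AAS)}


def count_trnas(code):
    """Return (family tRNAs, non-family tRNAs) under the assumed counting rule."""
    family = non_family = 0
    for first, second in product(BASES, repeat=2):
        box = {code[first + second + third] for third in BASES}
        if len(box) == 1:
            family += 1  # one tRNA reads all four codons of the box
            continue
        for third_class in ("UC", "AG"):
            half = {code[first + second + third] for third in third_class}
            if half != {"*"}:
                non_family += 1  # one tRNA per half-box that is not a stop
    return family, non_family


family, non_family = count_trnas(MITO_CODE)
print(family, non_family, family + non_family)  # 8 14 22
```

The rule works here because every (C/U) and (A/G) pair has a single meaning in the mitochondrial code.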
Restriction Enzyme (Endonuclease)
Restriction enzymes
recognize short, specific DNA sequences (e.g. GAATTC)
enable bacteria to destroy foreign DNA
are useful tools in biotechnology
The best-studied class of restriction enzymes (REs) is type II, which cleaves DNA within its recognition sequence
Many recognition sequences are palindromic
Are restriction endonucleases (REases) similar in their binding sites?
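"Palindromic" here means that the sequence equals its own reverse complement. A minimal Python check follows, with illustrative function names; the non-palindromic sequence GAATTA is a made-up counterexample, not a real recognition site.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base and reverse the strand direction."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if both strands read the same in the 5'->3' direction."""
    return seq == reverse_complement(seq)


print(is_palindromic("GAATTC"))  # sequence shown on the slide
print(is_palindromic("GAATTA"))  # hypothetical counterexample
```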
Restriction Enzyme | Source | Recognition Sequence | Pur (1)–pyr (0) pattern
AluI | Arthrobacter luteus | AG↓CT | 11↓00
HaeIII | Haemophilus aegyptius | GG↓CC | 11↓00
BamHI | Bacillus amyloliquefaciens | G↓GATCC | 1↓11000
HindIII | Haemophilus influenzae | A↓AGCTT | 1↓11000
EcoRI | Escherichia coli | G↓AATTC | 1↓11000
Examples from Kimball's Biology Pages
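The pattern column can be derived mechanically from the recognition sequences. A short Python sketch, using the five sites listed above and keeping the cut mark ↓ in place; the names `SITES` and `pur_pyr_pattern` are illustrative.

```python
SITES = {
    "AluI": "AG↓CT",
    "HaeIII": "GG↓CC",
    "BamHI": "G↓GATCC",
    "HindIII": "A↓AGCTT",
    "EcoRI": "G↓AATTC",
}


def pur_pyr_pattern(site: str) -> str:
    """Map purines (A, G) to 1 and pyrimidines (C, T) to 0; keep other symbols."""
    coding = {"A": "1", "G": "1", "C": "0", "T": "0"}
    return "".join(coding.get(symbol, symbol) for symbol in site)


for enzyme, site in SITES.items():
    print(enzyme, site, pur_pyr_pattern(site))
```

All five examples reduce to a run of purines followed by a run of pyrimidines.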
How significant is the Pattern RR/YY (11/00)?
Frequencies of dinucleotides, trinucleotides and tetranucleotides, coded in three possible coding schemes:
R vs Y (G, A vs C, T)
K vs M (G, T vs C, A)
S vs W (G, C vs A, T)
Type II: 3726
Symmetrical recognition sequences (98%)
Asymmetrical recognition sequences (2%)
In the symmetrical set the most significant dinucleotides are RR (or 11) (p-value < 10^-63) and YY (or 00) (p-value < 10^-29).
In the asymmetric set RRR, YYY and YYYY are even more significant, but RR and YY also stand out.
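The counting step behind such an analysis can be sketched as follows. The Python example recodes sequences under the three schemes and counts k-letter words; it uses only the five example sites from the table of restriction enzymes, not the full set of Type II sequences, and it does not reproduce the significance test or the p-values quoted on the slide. The names `SCHEMES`, `encode` and `kmer_counts` are illustrative.

```python
from collections import Counter

SCHEMES = {
    "R/Y": {"G": "R", "A": "R", "C": "Y", "T": "Y"},  # purine vs pyrimidine
    "K/M": {"G": "K", "T": "K", "C": "M", "A": "M"},  # keto vs amino
    "S/W": {"G": "S", "C": "S", "A": "W", "T": "W"},  # strong vs weak pairing
}


def encode(seq: str, scheme: str) -> str:
    """Recode a DNA sequence into the two-letter alphabet of one scheme."""
    return "".join(SCHEMES[scheme][base] for base in seq)


def kmer_counts(seqs, scheme: str, k: int) -> Counter:
    """Count all overlapping words of length k in the recoded sequences."""
    counts = Counter()
    for seq in seqs:
        coded = encode(seq, scheme)
        counts.update(coded[i:i + k] for i in range(len(coded) - k + 1))
    return counts


sites = ["AGCT", "GGCC", "GGATCC", "AAGCTT", "GAATTC"]  # example sites only
for scheme in SCHEMES:
    print(scheme, dict(kmer_counts(sites, scheme, 2)))
```

A real analysis would compare these counts with the counts expected from a background model before calling any word significant.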
Why is the Motif RR..YY preferred?
specific geometrical properties:
minimal slide values
strong tilt in the negative direction
positive roll
low stacking energy
Figure 1 Example of an interaction between an H-bond donor cluster (resulting from two adjacent purines AA) and an H-bond acceptor.
Dinucleotides RR..YY are characterized by:
stronger H-bond donor and acceptor clusters
Outlook
Looking for binary patterns in the genomes
Additional information
Thank you for your attention!
http://www.imb-jena.de/tsb