
RNA Bioinformatics

Peter F. Stadler

Bioinformatics Group, Dept. of Computer Science & Interdisciplinary Center for Bioinformatics, University of Leipzig

Institute for Theoretical Chemistry, Univ. of Vienna (external faculty)

The Santa Fe Institute (external faculty)

CSSS, June 2006

Overview

PART 1: RNA Structures and How to Compute Them

PART 2: RNA Landscapes

PART 3: The Modern RNA World

PART 1

Why RNA?

Until relatively recently: Central Dogma of Molecular Biology, DNA → RNA → Protein

DNA = “genetic memory”, RNA = working copy, proteins do the work

Around 1980: discovery of catalytic RNAs (Nobel Prize for Tom Cech and Sidney Altman); nevertheless long considered “exotic” remnants from the ancient RNA world

Around 2000: the structure of the ribosome shows that the ribosome is an “RNA enzyme”

Around 2000: microRNAs are discovered as a large class of regulatory RNAs that inhibit translation of proteins

2006: the ENCODE project shows that human gene expression is quite different from textbook knowledge

RNA Bioinformatics

RNA Secondary Structures are an appropriate level of description

explain the thermodynamics of RNA Structures

often highly conserved in evolution

can be computed efficiently

Many Functional RNAs are Structured

(a) Group I intron P4-P6 domain

(b) Hammerhead ribozyme

(c) HDV ribozyme

(d) Yeast tRNAphe

(e) L1 domain of 23S rRNA

Hermann & Patel, JMB 294, 1999

The RNA Model

[Figure: secondary structure drawing of the sequence given below.]

GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA

Formal Definition

A secondary structure on a sequence s is a collection Ω of pairs (i, j) with i < j such that

Base pairing rules are respected, i.e., (i, j) ∈ Ω implies that (s_i, s_j) form an allowed pair (GC, CG, AU, UA, GU, UG).

Each base is involved in at most one pair, i.e., Ω is a matching: (i, j), (i, k) ∈ Ω implies j = k, and (i, k), (j, k) ∈ Ω implies i = j.

(i, j) ∈ Ω implies |j − i| > 3 (steric constraint).

No-crossing rule: (i, j), (k, l) ∈ Ω and i < k implies either i < k < l < j or i < j < k < l. This excludes so-called pseudoknots.

Let’s count the structures . . .

Counting secondary structures. Given a sequence of length n.

Π_kl = 1 if sequence positions k, l can form a pair GC, CG, AU, UA, GU, UG, and Π_kl = 0 otherwise.

N_kl = number of structures of the subsequence from k to l.

Basic recursion: position k is either unpaired (•xxxxxxx) or paired with some position j ((xxxx)xxxx):

$$N_{kl} = N_{k+1,l} + \sum_{j=k+m}^{l} \Pi_{kj}\, N_{k+1,j-1}\, N_{j+1,l}$$
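The counting recursion can be sketched directly in Python. This is a minimal illustration, not code from the lecture: the function name `count_structures` is hypothetical, positions are 0-based, and the offset m is assumed to be 4 so that the steric constraint |j − i| > 3 holds.

```python
from functools import lru_cache

# Allowed base pairs, as listed in the formal definition.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def count_structures(seq, m=4):
    """Count secondary structures on seq; a pair (k, j) needs j - k >= m (assumed m = 4)."""
    n = len(seq)

    @lru_cache(maxsize=None)
    def N(k, l):
        if l - k < m:              # interval too short (or empty): only the open chain
            return 1
        total = N(k + 1, l)        # case 1: position k is unpaired
        for j in range(k + m, l + 1):          # case 2: k pairs with j
            if (seq[k], seq[j]) in PAIRS:      # this test plays the role of Pi_kj
                total += N(k + 1, j - 1) * N(j + 1, l)
        return total

    return N(0, n - 1)

print(count_structures("GGAAACC"))
```

For the short sequence GGAAACC the six structures can be checked by hand: the open chain, four structures with a single G-C pair, and one with two nested pairs.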

RNA Folding in a nutshell

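[Figure: decomposition of the structures on [i, j]: either i is unpaired, or i pairs with some k, which splits the interval into [i+1, k−1] and [k+1, j].]

$$N_{ij} = N_{i+1,j} + \sum_{k:\ (i,k)\ \text{pair}} N_{i+1,k-1}\, N_{k+1,j}$$

$$E_{ij} = \min\Big\{ E_{i+1,j},\ \min_{k:\ (i,k)\ \text{pair}} \big( E_{i+1,k-1} + E_{k+1,j} + \varepsilon_{ik} \big) \Big\}$$

$$Z_{ij} = Z_{i+1,j} + \sum_{k:\ (i,k)\ \text{pair}} Z_{i+1,k-1}\, Z_{k+1,j}\, \exp(-\varepsilon_{ik}/RT)$$

Partition function: $Z = \sum_{\Omega} \exp(-E(\Omega)/RT)$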
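The energy and partition function recursions share one decomposition, which a short sketch makes visible. The assumptions here are illustrative only: every base pair contributes the same energy `eps` (a toy stand-in for ε_ik, not the realistic loop-based model), RT is set to 0.6 kcal/mol, and the name `fold_simple` is hypothetical.

```python
import math
from functools import lru_cache

PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def fold_simple(seq, eps=-1.0, RT=0.6, m=4):
    """Return (minimum energy, partition function) under a toy per-pair energy eps."""
    n = len(seq)

    @lru_cache(maxsize=None)
    def E(i, j):
        if j - i < m:
            return 0.0                       # no pair possible: open chain, energy 0
        best = E(i + 1, j)                   # i unpaired
        for k in range(i + m, j + 1):        # i paired with k
            if (seq[i], seq[k]) in PAIRS:
                best = min(best, E(i + 1, k - 1) + E(k + 1, j) + eps)
        return best

    @lru_cache(maxsize=None)
    def Z(i, j):
        if j - i < m:
            return 1.0                       # only the open chain, Boltzmann weight 1
        total = Z(i + 1, j)                  # i unpaired
        for k in range(i + m, j + 1):        # i paired with k
            if (seq[i], seq[k]) in PAIRS:
                total += Z(i + 1, k - 1) * Z(k + 1, j) * math.exp(-eps / RT)
        return total

    return E(0, n - 1), Z(0, n - 1)

mfe, Z = fold_simple("GGAAACC")
print(mfe, -0.6 * math.log(Z))   # ground state energy and ensemble free energy -RT ln Z
```

Replacing (min, +) by (+, ×) turns the energy minimization into the partition function, which is the point of writing the two recursions side by side.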

A word on the Partition Function

The partition function is the link between the combinatorics of the structures (in general: states in an ensemble) and the thermodynamic properties of the physical ensemble, e.g.:

Free energy: $G = -RT \ln Z$

Expected energy: $\langle E \rangle = RT^2\, \frac{\partial \ln Z}{\partial T}$

Heat capacity: $C_p = -T\, \frac{\partial^2 G}{\partial T^2}$

[Figure: heat capacity C(T) in kcal·(mol·K)^-1 versus temperature T in °C (0 to 100) for the sequence CUGUAUUGUUGUAUAGCCCGUGUGGUAAUAUGG.]

Realistic Energy Model

[Figure: loop decomposition of a secondary structure. Each loop is delimited by a closing base pair and, where present, interior base pairs. Loop types shown: hairpin loop, stacking pair, bulge, interior loop, multi-loop.]

Parameters from a large number of melting experiments by Douglas Turner, David Mathews, John SantaLucia, and others

Recursions for Linear RNAs

[Figure: graphical form of the recursions for F, C, M, and M1 (hairpin, interior loop, and multiloop decompositions).]


F_ij: free energy of the optimal substructure on the subsequence x[i, j].

C_ij: free energy of the optimal substructure on the subsequence x[i, j] subject to the constraint that i and j form a base pair.

M_ij: free energy of the optimal substructure on the subsequence x[i, j] subject to the constraint that x[i, j] is part of a multiloop and has at least one component, i.e., a subsequence that is enclosed by a base pair.

M1_ij: free energy of the optimal substructure on the subsequence x[i, j] subject to the constraint that x[i, j] is part of a multiloop and has exactly one component, which has the closing pair (i, h) for some h satisfying i ≤ h < j.


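$$F_{ij} = \min\Big\{ F_{i+1,j},\ \min_{i<k\le j} C_{ik} + F_{k+1,j} \Big\}$$

$$C_{ij} = \min\Big\{ H(i,j),\ \min_{i<k<l<j} C_{kl} + I(i,j;k,l),\ \min_{i<u<j} M_{i+1,u} + M^1_{u+1,j-1} + a \Big\}$$

$$M_{ij} = \min\Big\{ \min_{i<u<j} (u-i-1)c + C_{u+1,j} + b,\ \min_{i<u<j} M_{i,u} + C_{u+1,j} + b,\ M_{i,j-1} + c \Big\}$$

$$M^1_{ij} = \min\Big\{ M^1_{i,j-1} + c,\ C_{ij} + b \Big\}$$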

Backward Recursion: Base Pairing Probabilities

$$p_{ij} = \frac{Z_{1,i-1}\, Z_{i,j}\, Z_{j+1,n}}{Z_{1,n}} + \sum_{k<i} \sum_{l>j} p_{kl}\, \Xi_{ij,kl}.$$

$\Xi_{ij,kl}$ is a ratio of the two partition functions:

$Z_{ij,kl}$ ... both i, j and k, l pair

$Z_{kl}$ ... k, l pair.

Simplest case: $Z_{ij,kl} = Z_{k+1,i-1}\, Z_{ij}\, Z_{j+1,l-1}\, \zeta_{kl}$, where $\zeta_{kl} = \exp(-\beta_{kl}/RT)$ is the Boltzmann factor of the pairing energy.

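Backward recursion: full model

$$P_{kl} = P_{kl} + \sum_{p<k;\ q>l} P_{pq}\, \frac{Z^B_{k,l}}{Z^B_{p,q}} \Big\{ e^{-I(p,q,k,l)} + \Big( \sum_{p<u<k} Z^M_{p+1,u}\, Z^{M1}_{u+1,k-1} \Big) e^{-(a+(q-l-1)c)} + \Big( \sum_{l<u<q} Z^M_{l+1,u}\, Z^{M1}_{u+1,q-1} \Big) e^{-(a+(k-p-1)c)} + Z^M_{p+1,k-1}\, Z^M_{l+1,q-1} \Big\}$$

[Figure: base pair (k, l) enclosed by base pair (p, q).]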

Single-Stranded Circular RNAs

Viroid RNA

Hepatitis Delta Virus Genome

Cryptic by-products of splicing formed from intronic sequences

Circularized C/D box snoRNAs were recently reported in Pyrococcus furiosus

Synthetic constructs for in vitro selection

Circular, Linear, and Interacting RNAs

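In the maximum matching case ⟹ same algorithm for all three cases

[Figure: three panels, CIRCULAR FOLDING, LINEAR FOLDING, and BINARY COFOLDING, each showing a base pair (i, j) relative to the sequence ends 1 and n (and 1', n' for the second strand).]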

Linear versus Circular Folding

Linear folding: energy contributions inside a pair (i, j) only.

Co-folding: additional contribution for the loop spanning [n, 1].

[Figure: linear folding, where the external loop makes no energy contribution, versus circular folding, where there is no external loop and the loop spanning [n, 1] makes an extra contribution.]

Strategy 1 (e.g. Michael Zuker’s mfold): for each pair (i, j), compute the energy both inside and outside the pair ⇒ doubles memory requirements

Strategy 2 (Vienna RNA Package): first compute linear folding energies, then compute energies for the loop spanning [n, 1].

[Figure: the loop spanning [n, 1] can be a hairpin loop, an interior loop or bulge, or a multi-branch loop.]

Implementing Circular Folding

Relative to linear folding, only the loop containing the cut has to be re-evaluated.

Three cases: cut in hairpin, interior, or multi-loop

$$F = \min\{F^H, F^I, F^M\}$$

[Figure: graphical decomposition of the three exterior loop cases (hairpin, interior, multi-loop) and of the auxiliary array M2 into two M1 components.]

Exterior Hairpin.

$$F^H = \min_{p<q} C_{pq} + H(q,p)$$

Exterior Interior Loop.

$$F^I = \min_{k<l<p<q} C_{pq} + C_{kl} + I(q,p,l,k)$$

Exterior Multi-Loop.

Modified decomposition: one or more components $M_{1,k}$ + exactly two components $M^2_{k+1,n}$

$$M^2_{kn} = \min_{k<u<n} \big( M^1_{ku} + M^1_{u+1,n} \big)$$

$$F^M = \min_{1<k<n} M_{1,k} + M^2_{k+1,n} + a$$

Folding energy: $F = \min\{F^H, F^I, F^M\}$

Applications of Circular Folding

It does make a difference

[Figure: citrus viroid IV. Structure distance and energy difference between the circular fold and linearized folds, plotted against the starting point of the linearized sequence (1 to 284), together with the predicted secondary structure. Computed with RNAfold -circ in the Vienna RNA Package.]

Local structures

Idea: Restrict the recursion to base pairs (i, j) with j − i < L.

Special interest in robust structures:

$Z^{u,L}_{ij}$ . . . partition function of sub-sequence [i, j] when sequence window [u, u + L] is folded

$p^{u,L}_{ij}$ . . . probability that i and j form a base pair when window [u, u + L] is folded.

$$Z^{u,L}_{ij} = \begin{cases} Z_{ij} & \text{if } [i,j] \subseteq [u,u+L] \\ 0 & \text{otherwise} \end{cases}$$

$$p^{u,L}_{ij} = \frac{Z^{u,L}_{1,i-1}\, Z^{u,L}_{i,j}\, Z^{u,L}_{j+1,n}}{Z^{u,L}_{u,u+L}} + \sum_{k<i} \sum_{l>j} p^{u,L}_{kl}\, \Xi^{u,L}_{ij,kl} = \frac{Z_{u,i-1}\, Z_{i,j}\, Z_{j+1,u+L}}{Z_{u,u+L}} + \sum_{k<i} \sum_{l>j} p^{u,L}_{kl}\, \Xi_{ij,kl}.$$

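Robust local structures

Average probability of an (i, j) pair over all folding windows containing the sequence interval [i, j]:

$$\pi^L_{ij} = \frac{1}{L-(j-i)+1} \sum_{u=j-L}^{i} p^{u,L}_{ij}.$$

Direct Recursion:

$$\pi^L_{ij} = \underbrace{\frac{1}{L-(j-i)+1} \sum_{u=j-L}^{i} \frac{Z^{u,L}_{1,i-1}\, \widehat{Z}^{u,L}_{i,j}\, Z^{u,L}_{j+1,n}}{Z^{u,L}_{1,n}}}_{\pi^{*L}_{ij}} + \frac{1}{L-(j-i)+1} \sum_{u=j-L}^{i} \sum_{k<i} \sum_{l>j} p^{u,L}_{kl}\, \Xi_{ij,kl}$$

$$= \pi^{*L}_{ij} + \sum_{k=j-L}^{i-1} \sum_{l=j+1}^{i+L} \sum_{u=l-L}^{k} \frac{p^{u,L}_{kl}\, \Xi_{ij,kl}}{L-(j-i)+1}$$

$$= \pi^{*L}_{ij} + \sum_{k=j-L}^{i-1} \sum_{l=j+1}^{i+L} \frac{L-(l-k)+1}{L-(j-i)+1}\, \pi^L_{kl}\, \Xi_{ij,kl}. \qquad (1)$$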

C C U U G G C C A U G U A A A A G U G C U U A C A G U G C A G G U A G C U U U U U G A G A U C U A C U G C A A U G U A A G C A C U U C U U A C A U U A C C A U G G U G A U U U A G U C A A U G G C U A C U G A G A A C U G U A G U U U G U G C A U A A U U A A G U A G U U G A U G C U U U U G A G C U G C U U C U U A U A A U G U G U C U C U U G U G U U A A G G U G C A U C U A G U G C A G U U A G U G A A G C A G C U U A G A A U C U A C U G C C C U A A A U G C C C C U U C U G G C A C A G G C U G C C U A A U A U A C A G C A U U U U A A A A G U A U G C C U U G A G U A G U A A U U U G A A U A G G A C A C A U U U C A G U G G U U U G U U U U U U G C C U U U U U A U U G U U U G U U G G G A A C A G A U G G U G G G G A C U G U G C A G U G U A C A G U U G U G U A C A G A G G A U A A G A U U G G G U C C U A G U A G U A C C A A A G U G C U C A U A G U G C A G G U A G U U U U G G C A U G A C U C U A C U G U A G U A U G G G C A C U U C C A G U A C U C U U G G A U A A C A A A U C U C U U G U U G A U G G A G A G A A U A U U C A A A G A C A U U G C U A C U U A C A A U U A G U U U U G C A G G U U U G C A U U U C A G C G U A U A U A U G U A U A U G U G G C U G U G C A A A U C C A U G C A A A A C U G A U U G U G A U A A U G U G U G C U U C C U A C G U C U G U G U G A A C A C A C C U U C A U G C G U A U C U C C A G C A C U C A U G C C C A U U C A U C C C U G G G U G G G G A U U U G U U G C A U U A C U U G U G U U C U A U A U A A A G U A U U G C A C U U G U C C C G G C C U G U G G A A G A

Human chromosome X (minus strand), positions 133029828 to 133029088

Annotated microRNAs: mir-92-2, mir-19b-2, mir-20b, mir-18b, mir-106a

Local structures (L=100) in a 740 nt region of human X chromosome

Cofold: How to deal with Concentration?

Algorithmically the same as linear folding, with a special energy contribution for the “loop with the cut”

Additional energy contribution for forming the duplex

At least 5 molecular species need to be taken into account (Dimitrov & Zuker, 2005): A, B, A2, B2, AB.

Their folding energies and partition functions are easily computed

Cofold

[Figure: dot plot and mfe structure drawing of the two cofolded molecules.]

Dot plot (left) and mfe structure representation (right) of the cofolding structure of the two RNA molecules AUGAAGAUGA (red) and CUGUCUGUCUUGAGACA.

Cofold: Concentration dependencies

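$$Q = V^{n}\, a!\, b!\, \frac{(Z'_A)^{n_A}\, (Z'_{AA})^{n_{AA}}\, (Z'_{AB})^{n_{AB}}\, (Z'_{BB})^{n_{BB}}\, (Z'_B)^{n_B}}{n_A!\, n_B!\, 2^{n_{AA}}\, n_{AA}!\, 2^{n_{BB}}\, n_{BB}!\, n_{AB}!}$$

where $a = n_A + 2n_{AA} + n_{AB}$. The system minimizes the free energy $-kT \ln Q$.

Solving this optimization problem yields the equilibria:

$$[AA] = K_{AA}\,[A]^2, \qquad [BB] = K_{BB}\,[B]^2, \qquad [AB] = K_{AB}\,[A]\,[B],$$

with $[A] = 6.023 \times 10^{23}\, n_A$, etc., and

$$K_{AA} = \frac{Z'_{AA}}{(Z_A)^2} = \frac{(Z_{AA} - (Z_A)^2)\, e^{-\Theta_I/RT}/2}{(Z_A)^2} = \frac{1}{2}\, e^{-\Theta_I/RT} \left( \frac{Z_{AA}}{(Z_A)^2} - 1 \right)$$

$$K_{BB} = \frac{1}{2}\, e^{-\Theta_I/RT} \left( \frac{Z_{BB}}{(Z_B)^2} - 1 \right)$$

$$K_{AB} = e^{-\Theta_I/RT} \left( \frac{Z_{AB}}{Z_A Z_B} - 1 \right)$$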
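Given the three equilibrium constants, the species concentrations follow from mass conservation, a = [A] + 2[AA] + [AB] and b = [B] + 2[BB] + [AB]. The sketch below is one possible way to solve these two equations (bisection on [A], with [B] obtained from a quadratic); it is not necessarily the method used in the lecture, and the function names and the numerical values in the usage line are hypothetical.

```python
import math

def free_b(A, b, K_BB, K_AB):
    """Positive root of 2*K_BB*B^2 + (1 + K_AB*A)*B - b = 0, in a cancellation-free form."""
    lin = 1.0 + K_AB * A
    return 2.0 * b / (lin + math.sqrt(lin * lin + 8.0 * K_BB * b))

def equilibrium(a, b, K_AA, K_BB, K_AB, iters=200):
    """Concentrations of A, B, AA, BB, AB for total strand concentrations a and b."""
    lo, hi = 0.0, a                      # the free monomer [A] lies between 0 and a
    for _ in range(iters):
        A = 0.5 * (lo + hi)
        B = free_b(A, b, K_BB, K_AB)
        # total A bound up at this guess, minus the A that is actually available
        excess = A + 2.0 * K_AA * A * A + K_AB * A * B - a
        if excess > 0.0:
            hi = A                       # guess too large
        else:
            lo = A                       # guess too small
    A = 0.5 * (lo + hi)
    B = free_b(A, b, K_BB, K_AB)
    return {"A": A, "B": B, "AA": K_AA * A * A, "BB": K_BB * B * B, "AB": K_AB * A * B}

# Illustrative values only (molar concentrations, constants in 1/M).
print(equilibrium(a=1e-6, b=2e-6, K_AA=1e3, K_BB=5e2, K_AB=1e7))
```

Bisection works here because the amount of A accounted for grows monotonically with the free monomer concentration [A].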

[Figure: concentration in nmol (0 to 20) of the monomer and dimer species versus total siRNA concentration b in nmol (1 to 10, logarithmic axis).]

Example of the concentration dependency for two mRNA-siRNA binding experiments.

RNAup: Small RNAs Binding to Large Ones

RNA folding excludes pseudoknots, i.e., non-outerplanar graphs

cofold thus does not allow small RNAs to bind into loop regions of large ones

... but this happens in reality

Remedy: compute the energy/partition function for the case that subsequence [i, j] is unpaired, and the energy of binding a short molecule in this location

$$P_u[i,j] = \underbrace{\frac{Z[1,i-1] \times 1 \times Z[j+1,N]}{Z}}_{\text{exterior}} + \underbrace{\sum_{p,q:\ p<i\le j<q} P_{pq} \times \frac{Z_{pq}[i,j]}{Z^b[p,q]}}_{\text{enclosed}}$$

RNAup

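[Figure: the cases in which the unpaired interval [i, j] lies in a loop closed by (p, q): (a) hairpin loop, (b’) and (b”) interior loop, (c), (d), (e) multiloop.]

$$Z_{pq}[i,j] = \underbrace{\exp(-\beta H(p,q))}_{(a)} + \underbrace{\sum_{\substack{p<i\le j<k \ \text{or} \\ l<i\le j<q}} Z^b[k,l]\, e^{-\beta I(p,q;k,l)}}_{(b)} + \underbrace{\sum_{p<i\le j<q} Z^{m2}[p+1,i-1]\, e^{-\beta c (q-i)}}_{(c)}$$

$$\qquad + \underbrace{\sum_{p<i\le j<q} Z^{m}[p+1,i-1]\, Z^{m}[j+1,q-1]\, e^{-\beta c (j-i+1)}}_{(d)} + \underbrace{\sum_{p<i\le j<q} Z^{m2}[j+1,q-1]\, e^{-\beta c (j-p)}}_{(e)}$$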

RNAup: Interaction part

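$$Z^I[i,j,i^*,j^*] = \sum_{\substack{i<k<j \\ i^*>k^*>j^*}} Z^I[i,k,i^*,k^*]\, e^{-\beta I(k,k^*;j,j^*)}$$

$$Z^*[i,j] = P_u[i,j] \sum_{i^*>j^*} Z^I[i,j,i^*,j^*]; \qquad P^*[i,j] = Z^*[i,j] \Big/ \sum_{k<l} Z^*[k,l]$$

[Figure: interaction region between positions i to j of the long RNA (5’ end 1, 3’ end n) and positions i* to j* of the short RNA (5’ end 1, 3’ end m), extended by the pair (k, k*).]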

RNAup: Application

[Figure: four panels (VR1 straight, VR1 HP5_16, VR1 HP5_11, VR1 HP5_6) showing probabilities (0 to 1) and ΔG_i in kcal/mol (−25 to 0) versus sequence position, with expression (0 to 0.4) plotted below.]

Binding of siRNAs to VR mRNA. P_u[i, i] (dashed line), P*_i (thick black line), ΔG_i (thick red line). Below: activity of siRNA

Alternative Approach

Consider RNA Folding as a Machine Learning Problem

Context Free Grammar + probabilities for production rules ⇒ Stochastic Context Free Grammars

see work by Sean Eddy, Jotun Hein, and collaborators

Folding Kinetics

RNA molecules may have kinetic traps which prevent them from reaching equilibrium within the lifetime of the molecule. Long molecules are often trapped in such meta-stable states during transcription.

Possible solutions are

Stochastic folding simulations can predict folding pathways and final structures. Computationally expensive, few programs available.

Predicting structures for growing fragments of the sequence can show whether large scale re-folding will occur during transcription. Cheap but inaccurate.

Analysis of the energy landscape based on complete suboptimal folding can identify possible traps (local minima).

Kinetic Folding Algorithm

Simulate folding kinetics by a Monte-Carlo type algorithm:

Generate all neighbors using the move-set

Assign rates to each move, e.g.

$$P_i = \min\left\{ 1, \exp\left( -\frac{\Delta E}{kT} \right) \right\}$$

Select a move with probability proportional to its rate

Advance clock by $1/\sum_i P_i$.

[Figure: a structure and its neighbors, with the moves labeled by their rates P1 to P8.]
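The loop described above can be sketched in a few lines. Everything specific here is an assumption for illustration: the move set is opening or closing a single base pair, the energy is a toy value of −1 per base pair, kT is 0.6, and the names `neighbors` and `simulate` are hypothetical.

```python
import math
import random

PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def neighbors(seq, pairs, m=4):
    """All structures that differ from `pairs` by opening or closing one base pair."""
    out = [pairs - {p} for p in pairs]                 # open one existing pair
    paired = {x for p in pairs for x in p}
    for i in range(len(seq)):
        for j in range(i + m, len(seq)):
            if i in paired or j in paired or (seq[i], seq[j]) not in PAIRS:
                continue
            # (i, j) crosses (k, l) if exactly one of i, j lies strictly inside (k, l)
            if any((k < i < l) != (k < j < l) for k, l in pairs):
                continue
            out.append(pairs | {(i, j)})               # close a new pair
    return out

def simulate(seq, steps, kT=0.6, seed=0):
    """Return a list of (time, structure) along one stochastic folding trajectory."""
    rng = random.Random(seed)
    energy = lambda s: -1.0 * len(s)                   # toy energy: -1 per base pair
    state, t = frozenset(), 0.0                        # start from the open chain
    traj = [(t, state)]
    for _ in range(steps):
        nbrs = neighbors(seq, state)
        rates = [min(1.0, math.exp(-(energy(y) - energy(state)) / kT)) for y in nbrs]
        t += 1.0 / sum(rates)                          # advance the clock
        state = rng.choices(nbrs, weights=rates)[0]    # pick a move proportional to its rate
        traj.append((t, state))
    return traj

for t, s in simulate("GGGAAACCC", steps=5):
    print(round(t, 3), sorted(s))
```

Because the clock advances by the inverse of the total rate, states with many fast exits are left quickly, while deep local minima hold the trajectory for long times.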

Characterization of Landscapes

A landscape consists of a configuration space V, a move set within that configuration space, and an energy function f : V → R.

Simplest move set for secondary structure: opening and closing of base pairs.

Speed of optimization depends on the roughness of the landscape.

Measures of roughness suggested in the literature:

Number of local optima

Correlation lengths (e.g. along a random walk)

Lengths of adaptive walks

Folding temperature vs. glass temperature Tf/Tg

Energy barriers between the local optima. Especially, the maximum barrier height (“depth” in SA literature)

Energy barriers

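$$E[s,w] = \min\Big\{ \max\big[ f(z) \,\big|\, z \in p \big] \;\Big|\; p : \text{path from } s \text{ to } w \Big\},$$

$$B(s) = \min\Big\{ E[s,w] - f(s) \;\Big|\; w : f(w) < f(s) \Big\}$$

Depth and Difficulty (borrowed from simulated annealing theory)

$$D = \max\Big\{ B(s) \;\Big|\; s \text{ is not a global minimum} \Big\}$$

$$\psi = \max\Big\{ \frac{B(s)}{f(s) - f(\min)} \;\Big|\; s \text{ is not a global minimum} \Big\}$$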

Energy Barriers and Barrier Trees

Some topological definitions: A structure is a

local minimum if its energy is lower than the energy of all neighbors

local maximum if its energy is higher than the energy of all neighbors

saddle point if there are at least two local minima that can be reached by a downhill walk starting at this point

Calculating barrier trees

[Figure: flooding of a landscape with local minima M1, M2, M3 and saddle points S12, S23, and the barrier tree that results.]

The flooding algorithm:

Read conformations in energy-sorted order.

For each conformation x we have three cases:

(a) x is a local minimum if it has no neighbors we have already seen

(b) x belongs to basin B(s) if all known neighbors belong to B(s)

(c) if x has neighbors in several basins B(s1) . . . B(sk), then it is a saddle point that merges these basins. Basins B(s1), . . . , B(sk) are then united and assigned to the deepest local minimum.
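The three cases of the flooding algorithm translate into a short routine. This is a minimal sketch under stated assumptions, not the barriers program itself: the landscape is passed in as a dictionary of energies plus a neighbor function, merged basins are tracked with a small union-find, and the names `barrier_tree`, `energy`, and `neighbors` are hypothetical.

```python
def barrier_tree(energy, neighbors):
    """Flood a landscape.

    energy:    dict mapping each state to its energy
    neighbors: function mapping a state to an iterable of neighboring states
    Returns (minima, merges); each merge is (saddle, absorbed_minimum, surviving_minimum).
    """
    basin = {}     # state -> local minimum it was assigned to when first seen
    parent = {}    # union-find over local minima, to follow later basin merges

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    minima, merges = [], []
    for x in sorted(energy, key=energy.get):           # energy-sorted order
        seen = {find(basin[y]) for y in neighbors(x) if y in basin}
        if not seen:                                   # case (a): new local minimum
            parent[x] = x
            basin[x] = x
            minima.append(x)
            continue
        deepest = min(seen, key=energy.get)
        for m in seen - {deepest}:                     # case (c): x is a saddle point
            parent[m] = deepest
            merges.append((x, m, deepest))
        basin[x] = deepest                             # cases (b) and (c): assign x
    return minima, merges

# Toy one-dimensional landscape: states 0..4 on a line.
E = {0: 0.0, 1: 2.0, 2: 1.0, 3: 3.0, 4: 0.5}
line = lambda i: [j for j in (i - 1, i + 1) if j in E]
minima, merges = barrier_tree(E, line)
for saddle, absorbed, survivor in merges:
    print(absorbed, "joins", survivor, "with barrier", E[saddle] - E[absorbed])
```

Each recorded merge corresponds to one internal node of the barrier tree, and the energy difference between the saddle and the absorbed minimum is that minimum's barrier height.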

Information from the Barrier Trees

Local minima

Saddle points

Barrier heights

Gradient basins

Partition functions and free energies of (gradient) basins

Depth and Difficulty of the landscape

N.B.: A gradient basin is the set of all initial points from which a

gradient walk (steepest descent) ends in the same local minimum.

Energy Landscape of a Toy Sequence

[Figure: energy landscape of a toy sequence: secondary structure drawings of the numbered structures 1 to 11, their energies in kcal/mol plotted against steps (arbitrary units), and the barrier tree with barrier heights.]

Folding Kinetics

Transition rates from x to y:

$$r_{yx} = r_0\, e^{-\frac{E^{\neq}_{yx} - E(x)}{RT}} \quad \text{for } x \neq y, \qquad r_{xx} = -\sum_{y \neq x} r_{yx}$$

Kinetics as a Markov process:

$$\frac{dp_x}{dt} = \sum_{y \in X} r_{xy}\, p_y(t).$$

Transition states: $E^{\neq}_{yx} = \max\{E(x), E(y)\}$

or more complex models (Tacker et al. 1994, Schmitz et al. 1996)
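The rate definition and the master equation can be tried out on a tiny state space. The following is a sketch with assumed ingredients: three states on a chain with made-up energies, r0 = 1, RT = 0.6, and plain explicit Euler integration (chosen for brevity, not accuracy). The function names are hypothetical.

```python
import math

def rate_matrix(E, edges, RT=0.6, r0=1.0):
    """r[y][x] is the rate from x to y, using the transition state max(E(x), E(y))."""
    n = len(E)
    r = [[0.0] * n for _ in range(n)]
    for x, y in edges:
        barrier = max(E[x], E[y])
        r[y][x] = r0 * math.exp(-(barrier - E[x]) / RT)
        r[x][y] = r0 * math.exp(-(barrier - E[y]) / RT)
    for x in range(n):
        r[x][x] = -sum(r[y][x] for y in range(n) if y != x)   # columns sum to zero
    return r

def integrate(r, p0, dt, steps):
    """Explicit Euler integration of dp_x/dt = sum_y r[x][y] * p[y]."""
    p, n = list(p0), len(p0)
    for _ in range(steps):
        dp = [sum(r[x][y] * p[y] for y in range(n)) for x in range(n)]
        p = [p[x] + dt * dp[x] for x in range(n)]
    return p

E = [0.0, 1.0, 0.5]                       # toy energies, hypothetical
r = rate_matrix(E, edges=[(0, 1), (1, 2)])
p = integrate(r, p0=[0.0, 0.0, 1.0], dt=0.01, steps=20000)
Z = sum(math.exp(-e / 0.6) for e in E)
print(p, [math.exp(-e / 0.6) / Z for e in E])
```

With this choice of transition state energies the rates satisfy detailed balance, so the population converges to the Boltzmann distribution regardless of the starting state.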

Reduced Description of the Folding Dynamics

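Macrostates = Classes of a partition of the state space.

Partition function for a macrostate:

$$Z_\alpha = \sum_{x \in \alpha} \exp(-E(x)/RT)$$

Free energy of a macrostate:

$$G(\alpha) = -RT \ln Z_\alpha$$

$$r_{\beta\alpha} = \sum_{y \in \beta} \sum_{x \in \alpha} r_{yx}\, \mathrm{Prob}[x \mid \alpha] = \frac{1}{Z_\alpha} \sum_{y \in \beta} \sum_{x \in \alpha} r_{yx}\, e^{-E(x)/RT} \quad \text{for } \alpha \neq \beta$$

$r_{\beta\alpha}$ is computed “on the fly” while executing the barriers program.

Transition state free energy:

$$G^{\neq}_{\beta\alpha} = -RT \ln \sum_{y \in \beta} \sum_{x \in \alpha} e^{-\frac{E^{\neq}_{xy}}{RT}}$$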

[Figure: barrier tree of “lilly”, a simple model sequence, and plots of population probability (0 to 1) versus time (logarithmic axis) for the mfe structure and further local minima.]

Refolding of a tRNA molecule.

Summary I:

RNA structures can be computed efficiently by means of dynamic programming

Computations are based on a set of carefully measured energy parameters and an additive energy model

Algorithms exist for ground state energy and structure, full partition functions, density of states, interacting structures, . . .

The folding kinetics of a given RNA sequence can also be investigated at the level of secondary structures

VIENNA RNA PACKAGE

PART II: How Do RNAs Evolve

Basic Assumption

Selection acts on secondary structures, mutation acts on the underlying sequences

⇒ We need to understand the sequence-to-structure map of RNAs

(hang on, we’ll discuss the empirical evidence for that a bit later)

Sewall Wright’s Fitness Landscapes

[Figure: sketch of a fitness landscape, fitness plotted over phenotype.]

What do realistic fitness landscapes look like?

Biological Landscapes

The RNA case is a special case of a very general paradigm:

genotype ↦ phenotype ↦ fitness

What is the relationship between Genotype and Phenotype?

Central topic in any theory of evolution because:
* Selection acts on the Phenotype
* Mutation/Recombination acts on the Genotype

Biopolymers as the simplest model: The molecule is both genotype (sequence) and phenotype (structure).

The map from genotype to phenotype is determined by physical chemistry:

⇐⇒ folding problem

Computational Analysis of the RNA Map

There are many more sequences than structures.

(.)-string: 3 letters (with constraints)

⟹ fewer than $3^n$ structures

but $4^n$ sequences.

⟹ Redundancy

How are sequences folding into the same structure distributed in sequence space?

Neutral set: $S(\psi) = \{ x \in Q^n_\alpha \mid f(x) = \psi \}$

Sensitivity and Neutrality

[Figure: left, effect of a single point mutation on the predicted secondary structure; right, distribution of structure distances (frequency on a logarithmic axis versus structure distance, 0 to 300).]

The Random Graph Model

Approach: Model S(ψ) as a random induced subgraph Γ with a given value

$$\lambda = \frac{\langle \#\text{neutral neighbors} \rangle}{(\alpha - 1)\, n}$$

Threshold value:

$$\lambda^* = 1 - \left( \frac{1}{\alpha} \right)^{\frac{1}{\alpha - 1}}$$

Theorem. [Reidys, Stadler, Schuster] If λ > λ∗ then Γ is a.s. dense and connected; if λ < λ∗ then Γ is a.s. neither dense nor connected.
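To get a feeling for the threshold, it helps to evaluate it for a few alphabet sizes. The snippet below is a small illustration; the function name `threshold` is hypothetical, and the choice of α = 2, 4, 6 (binary alphabet, nucleotides, base pairs) is an assumption about which cases are of interest.

```python
def threshold(alpha):
    """Critical neutrality lambda* = 1 - (1/alpha)^(1/(alpha - 1))."""
    return 1.0 - alpha ** (-1.0 / (alpha - 1))

for alpha in (2, 4, 6):
    print(alpha, round(threshold(alpha), 4))
```

The threshold is 0.5 for a binary alphabet, about 0.37 for α = 4, and about 0.30 for α = 6, so larger alphabets need a smaller fraction of neutral neighbors for the network to be connected.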

A complication: Base Pairing Rules

Unpaired bases: Alphabet $\mathcal{A} = \{A, U, G, C\}$

Paired bases: 5’ and 3’ side correlated: Alphabet $\mathcal{B} = \{AU, UA, GC, CG, GU, UG\}$.

Thus consider only the set of compatible sequences C(ψ):

$$S(\psi) \subseteq C(\psi) \equiv Q^{n_u}_4 \times Q^{n_p}_6.$$

⟹ Two neutrality parameters λu and λp

Connected Components of Neutral Networks

[Figure: connected components of neutral networks in the (λu, λp) plane, both axes from 0.0 to 1.0.]

gray: many small components

red: 1 connected component

green: 2 equal sized components

yellow: 3 components size 2:1

blue: 4 equal sized components

Explanation for this deviation from the random graph model in terms of the energy model: some structures can be made only with a significant bias in the G/C ratio.

[Figure: left, distribution P(d) of the length d of neutral paths (0 to 100); right, covering radius (0 to 30) versus chain length n (0 to 120), from enumeration, from inverse folding, and as a lower bound. Panel titles: distance to target structure, covering radius, length of neutral paths.]

Closest Approach

Intersection Theorem. For any two secondary structures φ and ψ,

$$C(\varphi) \cap C(\psi) \neq \emptyset$$

What is the distance of neutral networks?

$$\delta(\varphi, \psi) = \min\{ d(x,y) \mid f(x) = \varphi \text{ and } f(y) = \psi \}$$

Random graph theory: If λ > λ∗ then δ(φ, ψ) ≈ 2.

Computer simulations: upper bounds on δ(φ, ψ):

n      GC      AU      AUGC
50     5.6     2.6     2.1
70     9.3     4.6     3.4
100    13.0    7.8     5.6

Accessibility

Fontana & Schuster 1998

Idea: The “interface” between two structures is large if they are “similar”.

More precisely: Structure ψ is accessible from φ if x ∈ S(φ) is likely to have a neighbor (mutant) x′ ∈ S(ψ).

Structural characterization of “easy” (continuous) transitions:

Shortening of stacks

Elongation of stacks

Opening of constrained stacks

Closing of constrained stacks

SUMMARY: Sequence-Structure Map of RNA

1. Redundancy: Many more sequences than structures

2. Sensitivity: Small changes in the sequences may lead to large changes in the structure

3. Neutrality: A substantial fraction of mutations does not alter the structure.

4. Isotropy: S(ψ) is “randomly” embedded in C(ψ).

Implications:

1. Neutral Networks: S(ψ) forms a connected “percolating” network in sequence space for all “common” structures.

2. Shape Space Covering: Almost all structures can be found in a relatively small neighborhood of almost every sequence.

3. Mutual Accessibility: The neutral networks of any two structures almost touch each other somewhere in sequence space.

Simulated Trajectories

[Figure: simulated trajectories: structure distance and projection coordinate versus time (arbitrary units).]

Punctuated equilibria = diffusion of neutral networks + constant rate of innovation + exponential selection of rare mutants

Proc.Natl.Acad.Sci. 93: 397-401 (1996)

Diffusion Constant

. . . can be deduced from Moran model:

$$D = \lambda\, \frac{6Anp}{3 + 4Np}\, (1 + 1/N) \sim \begin{cases} (3/2)\, A\, (n/N) & p \gg 0 \ \text{or}\ N \gg 1 \\ 2Anp & p \ll 1 \end{cases}$$

A . . . replication rate

n . . . sequence length

N . . . population size

p . . . mutation rate

λ . . . neutrality of network

Dynamics of Interacting Replicators

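$$I_k + I_j \longrightarrow I_l + I_k + I_j$$

With mutation:

$$\dot{x}_k = x_k \Big[ \sum_j A_{kj} x_j - \sum_{i,j} A_{ij} x_i x_j \Big] + \sum_{l,j} \big( Q_{kl} A_{lj} x_j x_l - Q_{lk} A_{kj} x_k x_j \big)$$

where

$$Q_{kl} = (1-p)^{n - d(k,l)} \left( \frac{p}{\alpha - 1} \right)^{d(k,l)}$$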
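The mutation matrix Q can be tabulated explicitly for very short sequences, which also confirms that it is a proper stochastic matrix. This is a sketch under the assumption that d(k, l) is the Hamming distance between sequences k and l; the function name `mutation_matrix` is hypothetical.

```python
from itertools import product

def mutation_matrix(n, p, alphabet="AUGC"):
    """Q[k][l] = (1-p)^(n-d) * (p/(alpha-1))^d, with d the Hamming distance of k and l."""
    seqs = ["".join(s) for s in product(alphabet, repeat=n)]
    alpha = len(alphabet)

    def q(k, l):
        d = sum(c1 != c2 for c1, c2 in zip(k, l))
        return (1 - p) ** (n - d) * (p / (alpha - 1)) ** d

    return seqs, [[q(k, l) for l in seqs] for k in seqs]

seqs, Q = mutation_matrix(n=2, p=0.1)
print(len(seqs), sum(Q[0]))   # 16 sequences; each row sums to 1 up to rounding
```

The enumeration grows as α^n, so this is only practical for toy lengths; it is meant to make the formula concrete, not to be used in a simulation.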

How does this behave in sequence space?

Simplest case: $A_{kl} = A_0 (1 - d(I_k, I_l)/n)$:

[Figure: left, diversity (0.0 to 0.8) versus time (0 to 5000); right, g(τ) versus time lag τ.]

$$g(\tau) = \frac{1}{T_2 - T_1 + 1} \sum_{t=T_1}^{T_2} \| p(t+\tau) - p(t) \|^2$$

B.M.R. Stadler, Adv. Complex Syst. (2003)

[Figure: left panel, D versus p on logarithmic axes; right panel, plotted against N/n on logarithmic axes.]

Left: Diffusion coefficient D as a function of the mutation rate for N = 10, 20, 30, 40, 80 and n = 10, 20, 30, 40, 80 such that N/n = 1, after equilibration for 10^5 timesteps. Right: Dependence of the ratio D/p on N/n.

An RNA-Based Model in the Plane

Target hypercycle with 8 members.

Model: Hypercyclically coupled species; each sequence has a function that depends on its structure.

Spatial Extension: CA Model

[Figure: cellular automaton neighborhood showing possible replicators, possible catalysts, and actual catalysts.]

Rules of replication. For each of the neighbors (•) of the empty cell (marked by a bold outline), the replication rate ρz is computed, taking into account their neighbors in the direction of the replication as potential catalysts. The neighbor with the largest value of ρz invades the empty position. In this example, for the chosen replicator, only three of its neighbours are catalysts according to the hypercycle topology.

Spirals formed after 3000 generations in an evolution experiment started with 300 random sequences in the absence of parasites.

See also Boerlijst & Hogeweg (1993)

Diffusion in Sequence Space

[Figure: left, g(tau) versus time lag tau (2000 to 10000); right, diffusion constant (0.002 to 0.004) versus mutation rate (0.0005 to 0.0020).]

Summary

Neutrality of the Sequence-Structure Map implies diffusion/drift-like motion in sequence space, independent of the details of the selection/mutation mechanisms and of whether spatial extension is taken into account or not.

⟹ The basic assumption of molecular phylogenetics, namely a dominating influence of drift in sequence evolution, holds true even when phenotypic evolution is dominated by interactions (co-evolution).

TODO: Development of a rigorous mathematical theory describing the motion in sequence space of a population with strong interactions.

Evolutionary histories of some structured RNAs

Ribosomal RNAs (rRNAs) are the most frequently used sequence data for reconstructing phylogenies from molecular data.

How does that work? In a nutshell:

(1) compute evolutionary distances from the sequence data

(2) “fit” an additive tree to the distances

(In reality, there are other methods such as maximum parsimony and maximum likelihood approaches, but the basic idea is the same.)

Observation: all tRNAs have more or less the same clover-leaf structure.

MicroRNAs

processed from precursor hairpins

short (∼ 22 nt) RNA molecules

highly conserved

Function

bind to 3’UTRs of mRNA targets

suppress expression of this mRNA

mark mRNA molecule for degradation

in plants involved in DNA methylation

[Figure: secondary structure of a microRNA precursor hairpin.]

MicroRNAs — processing and function

MicroRNAs ...

transcribed in longer transcripts (primary-miRNA)

in some cases: polycistronic “clusters”

Drosha processing → precursor miRNA

export to cytoplasm: Exportin-5 pathway

Dicer processing → mature miRNA

Evolution of microRNA Families: mir-17 clusters

Many miRNAs are transcribed from polycistronic transcripts.

Most spectacular example: Human mir-17 clusters

[Figure: genomic organization (scale bar 100 nt) of the human mir-17 clusters on Chr-13 (17, 18, 19a, 20, 19b-1, 92-1), Chr-X (106a, 18X, 20X, 19b-2, 92-2), and Chr-7 (106b, 93, 25), with the corresponding cluster types I-1, I-X, and II-3.]

J. Mol. Biol. 339: 327-335 (2004)

Case Study: mir-17 clusters

[Figure, panels (a) and (b): predicted secondary structure with the hairpins X-106a, X-18X, X-20X, X-19b-2, and X-92-2 marked.]

. . . . . . . . ( ( ( ( ( ( ( ( ( ( ( ( ( ( . . . . ( ( ( ( ( . ( ( ( ( ( ( . . . ) ) ) . . ) ) ) ) ) ) ) ) . . . . ) ) ) ) ) ) ) ) . ) ) ) ) ) ) ( ( ( . . . . ) ) ) . . . ( ( ( . ( ( ( ( ( . . . . . . . . . . . . . . . . . . . . . . . . . ( ( ( ( . ( ( ( . . . ( ( ( . . . . . . . . ( ( ( ( ( . . ( ( ( ( . . . . . . . . ( ( ( . . . ( ( ( ( ( ( ( ( ( ( . ( ( ( ( . ( ( ( . ( ( ( ( ( . . . . . ( ( ( . . . ) ) ) . . . . . . . ) ) ) ) ) . ) ) ) . ) ) ) ) . ) ) ) ) . . ) ) ) ) ) ) ) ) ) ( ( ( . . . . . . . . . . . . ( ( ( ( ( ( ( . . . . . . . . . . . . . . . . . . . . ) ) ) ) ) ) ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ( ( ( ( ( ( ( ( ( . . . . . . . . . . . ) ) ) ) ) ) ) ) ) . . . . . . . . . . . . . . . . . . . . ) ) ) . . . . . ) ) ) ) . . ) ) ) ) ) . . . ( ( ( . . . . ( ( ( ( ( . . ( ( ( ( ( ( ( ( ( ( ( . ( ( ( ( ( . ( ( ( . . . . . . . . . . . . . ) ) ) ) ) ) ) ) . ) ) ) ) ) ) ) ) ) ) ) . . . ) ) ) ) ) . . . . ) ) ) . . . . ) ) ) . . . ) ) ) . ) ) ) ) . . ( ( ( . . . . . . . ) ) ) . . . . . ) ) ) ) ) ) ) ) . . . . . . . ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( . . . ( ( ( ( . . . . . . . . . . . . . . . . . ) ) ) ) ) ) ) ) ) ) ) ) . . ) ) ) ) ) ) ) ) ) ) ) . . . . . . . . . . . ( ( ( ( . . . . . . . ) ) ) ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ( ( ( . . . . ) ) ) . . . ( ( ( . ( ( ( . ( ( ( ( ( . ( ( ( ( ( . ( ( ( ( . . . . ( ( ( ( ( ( ( ( ( . . . ) ) ) ) . ) ) ) ) ) . . . ) ) ) ) . ) ) ) ) ) . ) ) ) ) ) . ) ) ) . ) ) ) . . . . . . . .

[Axis: sequence position 0 to 700, with the hairpins X-106a, X-18X, X-20X, X-19b-2, and X-92-2 marked.]

Structure of the pri-pre-mir-17 at the human X chromosome.

Construction of Gene Trees from concatenated sequences in the cluster

[Figure: gene trees with bootstrap values and scale bars of 0.1; leaves include HsX, PtX, MmX, RnX, CcX, Hs1, Pt1, Mm1, Rn1, Xt, DrA, DrB, Dr14, TnA, TnB, TnC, TrA, TrB, TrC in one tree and Hs-II, Pt-II, Mm-II, Rn-II, Xt-II, Dr-II-D, Tr-II-D in the other.]

Distant Homologies with unreliable Alignments

How to quantify sequence similarity when we cannot get a good alignment?

measure pairwise sequence similarity s(x, y)

compare to the distribution of similarity values of alignments of shuffled sequences

define a z-score

$$z(x,y) = \frac{s(x,y) - \langle s(\pi(x), \pi'(y)) \rangle_{\pi,\pi'}}{\sqrt{\mathrm{var}_{\pi,\pi'}\big( s(\pi(x), \pi'(y)) \big)}}$$

use z(x, y) as similarity measure in WPGMA clustering
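The z-score can be estimated by sampling shuffled sequences. The sketch below makes several assumptions that are not from the lecture: the similarity s(x, y) is a plain global alignment score (match +1, mismatch −1, gap −1), the permutations π and π′ are uniform random shuffles, and 200 samples are used. All function names are hypothetical.

```python
import random
import statistics

def nw_score(x, y, match=1, mismatch=-1, gap=-1):
    """Global alignment (Needleman-Wunsch) score with linear gap costs."""
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * gap]
        for j in range(1, len(y) + 1):
            diag = prev[j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def shuffled(s, rng):
    """Random permutation of the letters of s (composition is preserved)."""
    chars = list(s)
    rng.shuffle(chars)
    return "".join(chars)

def z_score(x, y, samples=200, seed=0):
    """z-score of s(x, y) against alignments of independently shuffled x and y."""
    rng = random.Random(seed)
    null = [nw_score(shuffled(x, rng), shuffled(y, rng)) for _ in range(samples)]
    sd = statistics.pstdev(null)
    if sd == 0:
        return 0.0                       # degenerate null distribution
    return (nw_score(x, y) - statistics.mean(null)) / sd

x = "UAAGGUGCAUCUAGUGCAGAUAG"
print(z_score(x, x), z_score(x, shuffled(x, random.Random(1))))
```

Shuffling keeps the base composition fixed, so the z-score measures similarity beyond what composition alone would produce.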

[Figure: gene tree of mir-17 cluster members; the vertical axis gives the z-score (0.0 to 22.0).]

Collapsed tree of microRNA subgroups

[Figure: tree of the microRNA subgroups I-17, I-18, I-19a, I-20, I-19b, I-92, II-106b, II-93, II-19b, II-25, Ce-92 and Dm-92 on a z-score axis from 18.0 to 0.0, with the mir-17 group, mir-92 group and mir-19 group marked.]

obtained by collapsing vertebrate, insect, and nematode species trees to single vertices

next step: combine gene trees and synteny information to a duplication history

Scenario for the evolution of the mir17 family

1. The ancestral mir17 cluster probably contained mir-17, mir-19 and mir-92.

2. First detectable duplication event: branch mir-17 and mir-18.

3. Series of duplications: branch mir-19 and mir-19b, mir-17 and mir-93.

4. Genome wide duplication: duplication of the whole cluster and loss of individual miRNAs.

5. Independent miRNA duplications in the type I cluster.

6. Split of teleosts and mammalia; teleost specific genome duplication.

[Figure, built up over these steps: cluster diagram with the miRNAs 17, 18, 19, 19b, 20, 25, 92, 93 and 106b. Annotations: 18 is copy of 17; 19b is copy of 19; 93 is copy of 17; 20 is copy of 17; cluster duplication; deletions. Resulting clusters: I and II; in Mammalia I-1, I-X and II; in Teleostei I-A, I-B, I-C and II-D.]

History of the mir-17 cluster: updated data

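[Figure: updated duplication history for Teleostei and Tetrapoda. Clusters: I, II, I-1, I-X, I-A, I-B, I-C and II-D. miRNAs: 17/106a, 18, 19, 19b, 19b-II, 20, 25, 92, 93 and 106b. Annotations: cluster duplication; deletion; 20 is copy of 17; 18 is copy of 17; 93 is copy of 17.]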

Further Examples: let-7 family

[Figure: evolution of the let-7 family (let-7 paralogs and mir98). Stages: ancestral Eubilaterian (also Drosophila and nematodes); ancestral vertebrate; teleost genome duplication; Tetrapoda only. Annotations: lost in mammals; loss of mir100 or mir125 in one paralog in most species. Legend, mammalian transcription from: intron of coding sequence, intron of non-coding sequence, exon.]

Further Examples: mir-1 and mir-30

[Figure: gene trees of mir-1 and mir-30, each with scale bar 0.1. mir-1: paralogs 1-1 and 1-2, each with a teleost and a tetrapod clade, together with Nematoda, Urochordata, Arthropoda and Sea Urchin sequences. mir-30: teleost and tetrapod clades for the family members a, b, c1, c2, d and e.]

Further Examples: mir-9, mir-23, mir130/301

[Figure, mir-9: gene tree (scale bar 0.1) with tetrapod and teleost paralogs (9-2, 9-3, 9-4), Diptera (mir-9a, mir-9b, mir-9c, mir-79, 306), Apis, Sea Urchin, Nematoda and Schistosoma (?).]

[Figure, mir-23 cluster: cluster (1) with 23a, 27a and 24.2 (Hs19) and cluster (2) with 23b, 27b and 24.1 (Hs9); zebrafish (Dr) and Tetraodon (Tn) loci are indicated, one copy is marked "lost in Tn/Tr".]

[Figure, mir-130 cluster: mir-301, mir-130a and mir-130b in Gg, Md, Xt, Rn, Mm, Cf, Pt, Hs, Dr, Fr and Tv; two positions are marked "?".]

Expansion of the Metazoan MicroRNA Repertoire

[Figure: miRNA gains mapped onto the metazoan species tree (Hs, Mm, Rn, Bt, Cf, Pt, Md, Gg, Xt, Dr, Tn, Tr, Cs, Ci, Od, Sp, Ag, Bm, Tc, Am, Cb, Ce, D.sp., Sm), with a horizontal axis from 0 to 80. Legend: miRNA innovations, non-local duplications, local duplications. Counts are attached to the individual branches.]

Similar Situation: snoRNA

snoRNAs direct chemical modification of other RNAs (mostly rRNA, snRNA, and (some?) messenger RNAs)

two classes: box-H/ACA and box-C/D

known in eukaryotes and archaea, not in eubacteria

H/ACA box snoRNAs in Vertebrates

[Figure: gene trees of the snoRNA families E1, E2 and E3 with percent values at the inner nodes and distance axes from 0.00 to 0.10. E1: clades Xenopus, Teleosts, Chick and Mammals. E2: clades Tetrapoda-1 and Tetrapoda-2. E3: clades Mammals-1, Mammals-2 and Teleosts. Leaf labels such as Hs_1_E1_1, Mm_9_E2_1 and Dr_25_E3_3.]

Vertebrate Y RNAs

[Figure: tree of vertebrate Y RNAs with the groups Y1, Y3, Y4 and Y5 and an outgroup. Leaves include HsY1, HsY3, HsY4, HsY5, MmY3, CfY4, XlYa, XlY3, XlY4, OmY1, OcY1 and ApY3.]

Summary

The genotype-phenotype map of RNA is characterized by an interplay of “ruggedness” and neutrality

Selection plus drift results in diffusion on neutral networks

Many non-coding RNAs have highly constrained (i.e., evolutionarily very well conserved) structures but fairly rapidly evolving sequences

Drift of sequences is independent of the details of the selection mechanism

Ongoing research: elucidate the evolutionary histories of structured ncRNAs

PART III: The Modern RNA World

[Figure: schematic of RNA processing pathways in the cell, with the NUCLEUS, the NUCLEOLUS and the CAJAL BODY marked. Labeled RNAs and complexes: mRNA, pre-mRNA, introns, spliceosome, snRNA, scaRNA, tRNA, pre-tRNA, RNAse P, MRP, rRNA, pre-rRNA, snoRNA, ribosome, pri-miRNA, pre-miRNA, miRNA, Drosha, Dicer, RISC, RNAi, translational inhibition, 7SL + proteins (SRP), Y RNA + proteins (Ro RNP), vRNA + proteins (vRNP), proteins.]

Multiple Origins of ncRNAs

[Figure: origins of ncRNA classes mapped onto the tree of life, from LUCA to Bacteria, Archaea and Eukarya (Green Plants, other protists, Rhodophyta, Stramenopiles, Alveolates, Microsporidia, Fungi, Choanoflagellata, Metazoa, Protostomia, Hemichordata, Echinodermata, Chordata, Urochordata, Cephalochordata, Craniata). Annotations: rRNA, tRNA, RNAse P, 7SL/SRP; tmRNA and small bacterial RNAs; snoRNA C/D and snoRNA H/ACA; RNAse MRP, telomerase RNA, most snRNAs, vault RNAs (?); 7SK (?); U7; Y RNA; gRNA in Kinetoplastids only; microRNA in multicellular animals and plants only (?).]

Surveys for noncoding RNAs

> 5% of the human genome is under stabilizing selection (from man/mouse comparison), less than 1/3 of this codes for protein

Virtually the entire genome is transcribed as primary nuclear transcripts in at least one direction (ENCODE Genes & Transcripts group, unpublished data)

∼ 80% of the ENCODE regions are transcribed as parts of protein coding transcripts, including introns and UTRs

Only a tiny part of the primary transcripts is protein coding

Large fraction of apparently non-protein-coding cDNAs

The functions of most of these transcripts are unclear.

“There is need for reliable experimental and computational methods

for comprehensive identification of non-coding RNAs.”

–International Human Genome Sequencing Consortium, Nature 431, p.943, October 2004

The ENCODE Project

ENCyclopedia Of DNA Elements

Public research consortium launched by NHGRI in 2003

Purpose: “testing and comparing existing methods to rigorously analyze a defined portion of the human genome sequence”.

Focus: specified 30 megabases (≈ 1% of genome) in more than 20 species

Informally organized in subgroups: Sequencing Technology, Comparative Genomics, Genes and Transcripts, Genetic Variation, ...

Results from 1st phase currently under review

Phase 2: scale-up to complete genome

Highlights from

ENCODE Genes and Transcripts Analysis Group

(Data presented by Tom Gingeras in Bethesda, Jan 12 2006)

Only a fraction of processed RNA transcripts correspond to GENCODE annotated transcripts: 70% correlate with annotated (m)RNAs, 52% correlate with annotated protein coding sequences

A substantial fraction of transcription is specific to cellular conditions: only 2.6% of transfrags are common to all 11 cell lines.

The same genomic sequence may be processed into multiple RNA sequences with different fates

Virtually the entire genome is transcribed as primary nuclear transcript in at least one direction.

Transcriptional output is MUCH more extensive AND much more

complex than previously thought.

Recall: Sequence-Structure Map of RNA

1. Redundancy: Many more sequences than structures

2. Sensitivity: Small changes in the sequences may lead to large changes in the structure

3. Neutrality: A substantial fraction of mutations does not alter the structure.

4. Isotropy: S(ψ) is “randomly” embedded in C(ψ).

Implications:

1. Neutral Networks: S(ψ) forms a connected “percolating” network in sequence space for all “common” structures.

2. Shape Space Covering: Almost all structures can be found in a relatively small neighborhood of almost every sequence.

3. Mutual Accessibility: The neutral networks of any two structures almost touch each other somewhere in sequence space.

Proc.Roy.Soc.B 255 279-284 (1994), Proc. Natl. Acad. Sci. USA 93, 397-401 (1996),

Bull. Math. Biol. 59, 339-397 (1997), RNA 7: 254-265 (2000).

[Flowchart, minimum energy variant. Boxes: RNA Sequences; Multiple Sequence Alignment (CLUSTAL W); Minimum Energy Folding (Vienna RNA Package); Secondary Structures; Mountain Representations; Aligned Structures; Detect conserved sub-structures; CHECK: compensatory mutations; Confirmed conserved sub-structures.]

[Flowchart, base pairing probability variant. Boxes: RNA Sequences; Multiple Sequence Alignment (CLUSTAL W); Dot Plots (McCaskill’s Algorithm); Combined Pair Table (sequence and pairing probability, e.g. entries 0.99, 0.01, 0.45, 0.00, 0.77, 0.34 for the sequence UGUGGUCGAUAU); Credibility Ranking, Reduce Pair List; CHECK: compensatory mutations; Conserved sub-structures. The two variants are labeled “Minimum Energy” and “Base Pairing Probabilities”.]

Nucl. Acids Res. 26: 3825-3836 (1998), Comp. & Chem. 23: 401-414 (1999)
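The “check compensatory mutations” step can be illustrated with a small Python sketch. It is not the procedure of the cited papers: the function names, the 0-based column indices and the rule that a gap counts as a non-pairing sequence are assumptions made for this example. For two alignment columns it counts sequences that cannot form the pair and reports whether the pairing sequences show a consistent mutation (one side changed, e.g. GC to GU) or a compensatory mutation (both sides changed, e.g. GC to AU).

```python
from itertools import combinations

# Allowed base pairs of the secondary structure model (including GU wobble).
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def column_pair_support(alignment, i, j):
    """Summarise how well alignment columns i and j (0-based) support a base pair.

    alignment: list of equally long, already aligned RNA sequences.
    """
    pairs = [(row[i], row[j]) for row in alignment]
    pairing = {p for p in pairs if p in CANONICAL}
    incompatible = sum(1 for p in pairs if p not in CANONICAL)
    # Compensatory: two pairing combinations that differ at both positions.
    compensatory = any(
        p[0] != q[0] and p[1] != q[1] for p, q in combinations(pairing, 2)
    )
    # Consistent: more than one pairing combination, but only one side changes.
    consistent = len(pairing) > 1 and not compensatory
    return {
        "incompatible": incompatible,
        "consistent": consistent,
        "compensatory": compensatory,
    }


def is_confirmed(alignment, i, j, max_incompatible=0):
    """Accept a predicted pair if few sequences contradict it and it covaries."""
    support = column_pair_support(alignment, i, j)
    return support["incompatible"] <= max_incompatible and (
        support["consistent"] or support["compensatory"]
    )
```

A pair that is conserved without any sequence variation is neither confirmed nor contradicted by this test; it simply carries no covariation evidence.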

Examples: HIV-1 TAR-hairpin

. . . . . (((((((((((. (((((. . . ((((. . . . . . )))))))))))))))))))).

[Figure: the structure above drawn along a position axis from 0 to 50, and as a secondary structure drawing of the TAR hairpin.]

Flaviviridae: Nucl. Acids Res. 29: 5079-5089 (2001), Picornaviridae: J. Gen. Virol. 85: 1113-1124 (2004), Broad survey: Bioinformatics 20: 1495-1499 (2004)

Examples: Picornaviridae: Cis-acting-Replication Element (CRE)

The function of the CRE probably involves the initiation of the synthesis of the negative-sense strand template RNA during virus replication.

[Figure: predicted CRE hairpin structures for the seven groups listed below.]

Group:   Aphthovirus  Enterovirus  Cardiovirus  HRV-A  HRV-B  Teschov.  Hepatov.
Region:  2C           2C           1B           2A     1B     2C        2C

Aphto   ~~~~CGAC-GGUU------ACA-CCAAGCA------GACCGUCG~~~~~
Entero  CAUACAGU-UCAAG--------UCCAAAU-GCCGUAUUGAACCUGUAUG
Cardio  ~~~~~ACG-GCCA---CAAACACCCAAUCAACUGU-UGGCCGU~~~~~~
HRV-A   ~~~AUCAUAUACCGAACAAACA---------CUAUAGGUGAUGAU~~~~
HRV-B   GAAGUCAU-CGUUGAGAAAACG---AAACA------GACGGUGGCCUC~
Tescho  ~~~~~~AC-GGCU--ACAAACA-----ACA------AGCUGU~~~~~~~
Hepato  UUUUGCAU-UUUG---CAAA--------------UUCAAGAUGUAGAG~
        ~~~(((((-((((.......................)))))))))~~~~
        1.......10........20........30........40.........

predicted in Nucl. Acids Res. 29: 5079-5089 (2001),

experimentally detected by Gerber, Wimmer & Paul, J. Virol. 75: 10979-10990 (2001).

A Method for Large Genomes: RNAz

Two ingredients: Thermodynamic Stability & Structure Conservation

Measuring thermodynamic stability of ncRNAs

Do naturally occurring structured RNAs have a lower folding energy than random sequences of the same size and base composition?

1. Calculate native MFE m.

2. Calculate mean µ and standard deviation σ of MFEs of a large number of shuffled random sequences.

3. Express significance in standard deviations from the mean as z-score

$$z = \frac{m - \mu}{\sigma}$$

Negative z-scores indicate that the native RNA is more stable than the random RNAs.
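The three steps can be sketched in Python as follows. To keep the example self-contained, the MFE is replaced by a toy energy, namely minus the maximum number of nested base pairs found by a Nussinov-style recursion; this is an assumption made purely for illustration and not a thermodynamic energy model. In practice the `energy` argument would be an MFE routine such as the one provided by the Vienna RNA Package. All function names and default parameters are hypothetical.

```python
import random
import statistics

PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def toy_energy(seq, min_loop=3):
    """Toy stand-in for the MFE: minus the maximal number of nested base pairs.

    min_loop is the minimal number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            value = best[i][j - 1]  # case: position j is unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    value = max(value, left + 1 + best[k + 1][j - 1])
            best[i][j] = value
    return -float(best[0][n - 1])


def stability_z_score(seq, energy=toy_energy, samples=100, seed=0):
    """z = (m - mu) / sigma, with mu and sigma taken from shuffled sequences."""
    rng = random.Random(seed)
    m = energy(seq)  # step 1: energy of the native sequence
    background = []
    for _ in range(samples):  # step 2: same length and base composition
        letters = list(seq)
        rng.shuffle(letters)
        background.append(energy("".join(letters)))
    mu = statistics.fmean(background)
    sigma = statistics.pstdev(background)
    if sigma == 0:
        return 0.0  # all shuffles have the same energy, no meaningful z-score
    return (m - mu) / sigma  # step 3
```

A simple shuffle preserves only the mononucleotide composition; whether dinucleotide frequencies should also be preserved is a separate design decision that this sketch does not address.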

Efficient calculation of stability z-scores

The mean µ and standard deviation σ of random samples of a given sequence are functions of the length and the base composition:

$$\mu, \sigma\left(\text{length}, \frac{GC}{AT}, \frac{G}{C}, \frac{A}{T}\right)$$

Calculating z-scores is thus a 5 dimensional regression problem.

The regression problem is solved using a Support Vector Machine regression algorithm.

The SVM was trained on 10,000 synthetic sequences spaced evenly in the variable space.

The regression calculation is of the same accuracy as the sampling procedure.
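As a small illustration of the input variables of this regression, the following Python sketch computes the length and the three composition ratios for a sequence. It assumes that “GC/AT” on the slide means (G+C)/(A+T) and that U is counted as T; the function name and the handling of a zero denominator are choices made for this example only.

```python
def regression_features(seq):
    """Length and base composition ratios used as regression inputs (sketch)."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA alphabets alike
    a, c, g, t = (seq.count(base) for base in "ACGT")

    def ratio(numerator, denominator):
        # Assumption: an undefined ratio is reported as infinity.
        return numerator / denominator if denominator else float("inf")

    return {
        "length": len(seq),
        "GC/AT": ratio(g + c, a + t),
        "G/C": ratio(g, c),
        "A/T": ratio(a, t),
    }
```

A trained regression model would map such a feature vector to µ and σ, so that no shuffled sequences have to be folded at screening time.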

[Figure: scatter plot of calculated z-scores (vertical axis) against sampled z-scores (horizontal axis), both from -8 to 3.]

z-scores of known ncRNAs

ncRNA Type                  No. of Seqs.   Mean z-score
tRNA                        579            −1.84
5S rRNA                     606            −1.62
Hammerhead ribozyme III     251            −3.08
Group II catalytic intron   116            −3.88
SRP RNA                     73             −3.37
U5 spliceosomal RNA         199            −2.73

Functional RNAs are clearly more stable than random sequences.

However: The scores are too small to discriminate reliably in genome-wide screens since the z-score distributions have heavy tails.

Consensus folding using RNAalifold

RNAalifold uses the same algorithms and energy parameters as RNAfold

Energy contributions of the single sequences are averaged

Covariance information (e.g. compensatory mutations) is incorporated in the energy model.

It calculates a consensus MFE consisting of an energy term and a covariance term:

J.Mol.Biol. 319:1059-1066 (2002)

The structure conservation index

The SCI is an efficient and convenient measure for secondary structure conservation.
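The extracted slide text carries no formula for the SCI. In the RNAz paper cited further down (Proc. Natl. Acad. Sci. USA 102, 2005) it is the ratio of the consensus MFE of the alignment to the mean MFE of the individual sequences. The Python sketch below assumes that definition; the function name, the argument names and the treatment of a zero mean are hypothetical.

```python
def structure_conservation_index(consensus_energy, single_energies):
    """SCI = consensus MFE of the alignment / mean MFE of the single sequences.

    consensus_energy: consensus MFE (e.g. from a consensus folding program).
    single_energies:  MFEs of the individual sequences folded on their own.
    """
    if not single_energies:
        raise ValueError("need at least one single-sequence energy")
    mean_single = sum(single_energies) / len(single_energies)
    if mean_single == 0:
        return 0.0  # assumption: no stable single structures, report no conservation
    return consensus_energy / mean_single
```

Values near 0 indicate that the sequences do not share a common fold, values near 1 indicate a well conserved structure, and compensatory mutations can push the index above 1 because of the covariance term in the consensus energy.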

Separation of native ncRNAs from random controls in two dimensions

[Figure: scatter plots of structure conservation index against z-score, with axis ticks from 0 to 1.2, one panel each for 5S rRNA, tRNA, Signal recognition particle RNA, RNAseP, U2 spliceosomal RNA and U5 spliceosomal RNA.]

Classification based on both scores

[Figure]

Implementation and availability

The approach is implemented in ANSI C in the program RNAz.

The z-score regression is limited to 400 nucleotides.

The classification model is currently limited to alignments of six sequences.

At least an order of magnitude faster than other programs.

RNAz is freely available: Download from www.tbi.univie.ac.at/~wash/RNAz

Proc. Natl. Acad. Sci. USA 102: 2454-2459 (2005)

Screening the human genome

Large scale comparative screen including: human, mouse, rat, dog, chicken, fugu, zebrafish

Reduction of the ≈ 3,095 MB human genome:

1. Take ≈ 5% of the best conserved regions

2. Remove all annotated coding exons

3. Only take alignments strictly conserved in all 4 mammals.

→ 438,788 alignments covering 82.64 MB

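[Figure: genome browser views at two scales (92.0M to 98.0M, and around position 90801000). Tracks: most conserved noncoding regions (present in at least human/mouse/rat/dog); RNAz structural RNAs (P>0.5); RNAz structural RNAs (P>0.9); RefSeq Genes; miRNAs mir-17, mir-18, mir-19a, mir-20, mir-19b-1 and mir-92-1. The consensus structure and alignment of one hit follow.]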

(((((..((((((..((((((((.((.(((((...(((........)))...))))).)).))))))))...))))))....)))))

GTCAGAATAATGTCAAAGTGCTTACAGTGCAGGTAGTGATATGT-GCATCTACTGCAGTGAAGGCACTTGTAGCATTA-TG-GTGAC

GTCAGAATAATGTCAAAGTGCTTACAGTGCAGGTAGTGATGTGT-GCATCTACTGCAGTGAGGGCACTTGTAGCATTA-TG-CTGAC

GTCAGGATAATGTCAAAGTGCTTACAGTGCAGGTAGTGGTGTGT-GCATCTACTGCAGTGAAGGCACTTGTGGCATTG-TG-CTGAC

GTCAGAGTAATGTCAAAGTGCTTACAGTGCAGGTAGTGATATATAGAACCTACTGCAGTGAAGGCACTTGTAGCATTA-TG-TTGAC

GTCAATGTATTGTCAAAGTGCTTACAGTGCAGGTAGTATTATGGAATATCTACTGCAGTGGAGGCACTTCTAGCAATA-CACTTGAC

GTCTGTGTATTGCCAAAGTGCTTACAGTGCAGGTAGTTCTATGTGACACCTACTGCAATGGAGGCACTTACAGCAGTACTC-TTGAC

Rows of the alignment, top to bottom: Human, Mouse, Rat, Chicken, Zebrafish, Fugu

[Figure: secondary structure drawing of the consensus hairpin.]

[Figure, panels a to d: genome browser views (e.g. 93104k to 93108k) on Chr. 13 and Chr. 11 showing RNAz structural RNAs (P>0.5) together with known H/ACA snoRNAs (ACA25, ACA32, ACA1, ACA8, ACA18, ACA40) and C/D-box snoRNAs (mgh28S-2412, mgh28S-2410).]

Results of Human Genome Screen

                             Genome Coverage        Alignments   RNAz hits p > 0.9
                             Size (MB)  Fraction (%)  Number     Size (MB)  Fraction of input (%)  Number
Human genome                 3,095.02   100.00        –
PhastCons most conserved     137.85     4.81          1,601,903
without coding regions       110.04     3.84          1,291,385
without alignments < 50nt    103.83     3.33          564,455
Set 1: 4 Mammals             82.64      2.88          438,788    5.46       6.62                   35,985
Set 2: + Chicken             24.00      0.85          104,266    1.34       5.50                   8,802
Set 3: + Fugu or zebrafish   6.86       0.24          30,896     0.14       2.03                   996

Nature Biotechn. 23: 1383-1390 (2005)

Pictures instead of Numbers

[Figure: bar chart of the number of structural elements, observed versus expected, at P > 0.5 and P > 0.9 for the three sets (4 Mammals; 4 Mammals + chicken; All vertebrates), with a vertical axis up to 90,000. A second panel divides the conserved noncoding elements into structural RNA, estimated false positives and other conserved noncoding elements; the shares 15.1% and 6.6% belong to P > 0.5 and P > 0.9.]

Distribution related to known protein gene annotation

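[Figure: charts of RNAz hits by location relative to protein gene annotation. Categories of the first chart: known gene; < 10 kb from nearest gene; > 10 kb from nearest gene. Categories of the second chart: intron of coding region; 3’-UTR (exon or intron); 5’-UTR (exon or intron).]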

Sensitivity on known classes of ncRNAs

[Figure: charts for microRNA (207), C/D snoRNA (256) and H/ACA (86). Categories: Detected (P > 0.9); Detected (0.5 < P < 0.9); Not detected; Not in input set.]

Not all ncRNAs have conserved secondary structures!

[Figure: genome browser view of the HOXA cluster on chr7 (26.90m to 27.05m), from HOXA1 to HOXA13 and EVX1, including hoxa11-as. Tracks: RNAz_set1_50 (hits chr7.279, chr7.283, chr7.287, chr7.290, chr7.295), EvoFold, sno/miRNA, Conservation, RepeatMasker, Affy Transcription, GENCODE, GENCODE putative, mRNA, alternative splicing.]

Other RNAz Screens

Urochordates: Ciona intestinalis & Ciona savignyi

only a few conserved RNAs with Oikopleura dioica

Bioinformatics 21(S2): i77-i78

Nematodes: Caenorhabditis elegans & Caenorhabditis briggsae

JEZ:MDE 2006 epub

Teleost fishes: Danio rerio, Takifugu rubripes, Tetraodon nigroviridis, Oryzias latipes (partial) (in progress)

Trypanosomatids: Trypanosoma and Leishmania species

Yeasts (joint work with Kay Nieselt and Stephan Steigele)

Summary

Predicted structured RNAs (RNAz predictions, p > 0.9)

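[Figure: numbers of predicted structured RNAs per clade for Mammals (36000), Teleosts, Urochordates, Insects, Nematodes, Yeasts and Trypanosomatids; further values shown are >10000, 4000, 2500, 2000, 500 and <200, and two clades are marked “?”. Annotations: rRNA, tRNA, snoRNA, snRNA ...; 1000 unknown conserved.]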

Novel Human ncRNA Candidates

[Figure: predicted secondary structure of a novel human ncRNA candidate, with a position axis from 0 to 120.]

Novel ncRNA Candidates in Caenorhabditis

[Figure: predicted consensus secondary structures of six candidates. First row: CeN23 (UM1), unknown; CeN74 (UM3), sb-RNA; CeN77 (UM3), sb-RNA. Second row: 513253 (UM2); 515948 (UM3); 513590 (UM1).]

Efforts to Annotate the RNAz Results

ongoing effort

Large number of microRNA candidates

approximately 30-40 good H/ACA-box snoRNAs

only 6% of hits (comparable to estimated false positive rate) overlap with predicted coding regions

few clusters of signals with high sequence-similarity

work in progress: structure-based clustering (joint work with Rolf Backofen’s lab in Freiburg)

BOTTOM LINE: most signals still unclassified. We need MUCH better methods to recognize members of known RNA classes

RNAmicro: A classifier for microRNA Precursors

Input: Multiple Sequence alignment

Preprocessing: non-restrictive check for almost-hairpin structure. Some known microRNA precursors, notably some let-7 family members, have small branches!

SVM Classification with few descriptors:

Property                  #   Descriptors
Structure                 2   l_s, l_h
Sequence composition      1   G+C
Sequence conservation     4   S_5′, S_3′, S_0, S_min
Thermodynamic stability   4   E, ε, η, z
Structure conservation    1   E_cons

ISMB 2006, in press
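One way to picture a “non-restrictive check for almost-hairpin structure” is the following Python sketch on dot-bracket strings. It is not RNAmicro’s actual test: the criterion (number of base pairs that lie outside the deepest chain of nested pairs) and the tolerance `max_minor_pairs` are assumptions chosen for illustration. A perfect hairpin has all pairs nested inside each other, so total pairs and maximal nesting depth agree; small side branches add a few extra pairs.

```python
def is_almost_hairpin(structure, max_minor_pairs=3):
    """Accept dot-bracket structures that are a single stem-loop up to small branches.

    max_minor_pairs: hypothetical tolerance for base pairs outside the main stem.
    """
    depth = 0
    max_depth = 0
    total_pairs = 0
    for symbol in structure:
        if symbol == "(":
            depth += 1
            total_pairs += 1
            max_depth = max(max_depth, depth)
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure: unmatched ')'")
    if depth != 0:
        raise ValueError("unbalanced structure: unmatched '('")
    if total_pairs == 0:
        return False  # an open chain is not a hairpin
    # Pairs on the deepest nesting chain form the main stem; the rest are branches.
    minor_pairs = total_pairs - max_depth
    return minor_pairs <= max_minor_pairs
```

With the default tolerance a precursor with a two-pair side branch still passes, while two separate stem-loops of similar size are rejected.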

Results: Caenorhabditis

[Figure: Venn diagram of the Caenorhabditis predictions. Sets: RNAz; RNAmicro (P > 0.5 and P > 0.9); miRNA registry 7.1; Grad et al 2003; other RNAs. Counts are given for the individual regions of the diagram.]

Results: Mammals

[Figure: Venn diagram of the mammalian predictions. Sets: RNAz; RNAmicro (P > 0.5 and P > 0.9); miRNA registry 7.1; Berezikov et al. 2005. Counts are given for the individual regions of the diagram.]

Clustering

[Figure: clustering illustration with five items A, B, C, D and E, their pairwise comparisons (AB, AC, AD, AE, BC, BD, BE, CD, CE, DE) and the resulting grouping into AB, DE and C, shown together with base pairing probability dot plots (dot.ps) for the example sequences UACGACGGACUUACGGACUUACG and AUCACUCGUACUGUAC.]

Proof of Concept: tRNAs in Ciona intestinalis

[Figure: cluster tree of Ciona intestinalis tRNAs on a distance axis from 0.0 to 0.4. Clusters are labeled by amino acid, e.g. Arg/Asn, Arg, Thr, Ile, Phe, Cys, Lys, Gln, Gly, Val, Glu, Pro, Met, Ala, Ser, Leu and Tyr.]

Summary

Some classes of ncRNAs, namely the structured ones, can be found efficiently by means of comparative genomics

There are tens of thousands of structured RNAs of unknown function in the human genome

Some of them probably act, like microRNAs and snoRNAs, by binding to other RNAs. These could be investigated using RNA cofolding approaches (ongoing research).

So far, we know only the proverbial tip of the iceberg of the complexity of cellular regulation

& RNA bioinformatics is a really cool research topic ...

Acknowledgments: It’s not my fault . . .

Leipzig: Kristin Missal, Dominic Rose, Jana Hertel, Manja Lindemeyer, Matthias Kruspe, Sonja J. Prohaska, Claudia Fried, Roman Stocsits, Axel Mosig, Bettina Muller (FH Weihenstephan), Katrin Sameith (U Jena)

Vienna: Stefan Washietl, Ivo L. Hofacker, Christoph Flamm, Andrea Tanzer, Stefan Bernhart, Susanne Rauscher, Caroline Thurner, Christina Witwer, Ingrid Abfalter, and many others

Harvard: Walter Fontana

Beijing: Xiaopeng Zhu, Wei Deng, Geir Skogerbø, Runsheng Chen

Tübingen: Stephan Steigele, Kay Nieselt

Copenhagen: Jan Gorodkin, Stefan Seemann

Freiburg: Rolf Backofen, Sebastian Will
