RNA Secondary Structure What is RNA? Definition of RNA secondary Structure RNA molecule evolution...

Post on 29-Dec-2015


RNA Secondary Structure

What is RNA?

Definition of RNA secondary Structure

RNA molecule evolution

Algorithms for base pair maximisation

Chomsky’s Linguistic Hierarchy

Stochastic Context Free Grammars & Evolution

Miscellaneous topics

Base Pairing

From Przytycka

[Figure: chemical structures of the four RNA bases. Pyrimidines: cytosine and uracil. Purines: adenine and guanine. Hydrogen-bond donor and acceptor sites are marked.]

An Example: t-RNA

From Paul Higgs

Known RNAs

t-RNA (transfer-)

m-RNA (messenger-)

mi-RNA (micro-)

sn-RNA (small nuclear)

RNAi (interfering)

Srp-RNA (Signal Recognition Particle)

5S RNA

16S RNA

23S RNA

RNA viruses: Retroviruses (HIV), Coronavirus (SARS), ….

Functions of RNAs

Information Transfer: mRNA

Codon -> Amino Acid adapter: tRNA

Enzymatic Reactions:

Other base pairing functions: ???

Structural:

Metabolic: ???

Regulatory: RNAi

Known RNA Structures

http://www.rnabase.org/metaanalysis/

http://www.sanger.ac.uk/Software/rfam

http://www.scor.lbl.gov

Figure 1: The cumulative number of publicly available RNA-containing structures determined by X-ray crystallography (red), NMR spectroscopy (purple) or all techniques combined (blue) has been steadily increasing since the first RNA-containing structure was released in 1978. There has been a substantial acceleration in RNA structure determinations since the mid-1990s.

Figure 2: In a positive new trend, the average number of conformational map outliers per residue solved has shown a consistent downtrend recently. Interestingly, most of the improvement can be attributed to structures determined by x-ray crystallography. There has been no consistent trend for structures determined by NMR spectroscopy.

Rfam – database of RNA alignments and secondary structure models

Scor – database of experimentally solved RNA structures

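RNA SS: recursive definition

Nussinov (1978), remade from Durbin et al., 1998

[Figure: the four cases of the recursion on the interval [i,j]. i,j pair: leaves the interval [i+1,j-1]. i unpaired: leaves [i+1,j]. j unpaired: leaves [i,j-1]. Bifurcation at k: leaves the two intervals [i,k] and [k+1,j].]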

Secondary Structure: Set of paired positions on the interval [i,j].

A-U and C-G can base pair. Some other pairings can occur, and triple interactions exist.

Pseudoknot – non-nested pairing: i < j < k < l with pairs i-k and j-l.
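The pseudoknot condition above can be checked mechanically. The following is a minimal sketch, assuming a structure is given as a list of 1-based position pairs; the function name `has_pseudoknot` is hypothetical and not from the slides.

```python
def has_pseudoknot(pairs):
    """Return True if two pairs cross, i.e. i < j < k < l with pairs i-k and j-l."""
    # Normalise every pair to (smaller, larger) and sort by the opening position.
    ordered = sorted((min(a, b), max(a, b)) for a, b in pairs)
    for idx, (i, k) in enumerate(ordered):
        for j, l in ordered[idx + 1:]:
            # The later pair opens inside (i, k) but closes outside it: crossing.
            if i < j < k < l:
                return True
    return False


if __name__ == "__main__":
    print(has_pseudoknot([(1, 9), (2, 8)]))  # nested pairs
    print(has_pseudoknot([(1, 5), (3, 8)]))  # crossing pairs
```

The quadratic loop is enough for illustration; a structure without crossings is exactly what the recursive definition on intervals can represent.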

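RNA Secondary Structure

The number of secondary structures (Waterman, 1978):

$$T(n+1) = T(n) + \sum_{k=0}^{n-2} T(k)\,T(n-k-1), \qquad T(0) = T(1) = T(2) = 1$$

$$T(n) \sim \sqrt{\frac{15 + 7\sqrt{5}}{8\pi}}\; n^{-3/2} \left(\frac{3 + \sqrt{5}}{2}\right)^{n}$$

[Figure: decomposition of a structure on positions N1 ... NL. Either NL is unpaired, or NL pairs with a position Nk+1, which splits the structure into a part on N1 ... Nk and a part enclosed by the pair.]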
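The counting recursion can be evaluated directly. This is a small sketch, assuming the Waterman recursion T(n+1) = T(n) + sum over k from 0 to n-2 of T(k) T(n-k-1) with T(0) = T(1) = T(2) = 1; the function names are illustrative only.

```python
from math import pi, sqrt


def count_structures(n_max):
    """Return [T(0), ..., T(n_max)], the number of secondary structures per length."""
    T = [1, 1, 1]
    for n in range(2, n_max):
        # Either the new position is unpaired (T[n]) or it pairs and splits the rest.
        T.append(T[n] + sum(T[k] * T[n - k - 1] for k in range(0, n - 1)))
    return T[: n_max + 1]


def asymptotic(n):
    """Leading-order approximation of T(n) for large n."""
    return sqrt((15 + 7 * sqrt(5)) / (8 * pi)) * n ** -1.5 * ((3 + sqrt(5)) / 2) ** n


if __name__ == "__main__":
    for n, t in enumerate(count_structures(9)):
        print(n, t)
```

The growth rate (3 + sqrt(5))/2 is about 2.62, so the number of structures grows exponentially with sequence length, which is why exhaustive enumeration is not an option.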

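RNA: Matching Maximisation

Remade from Durbin et al., 1998

Example: GGGAAAUCC (A-U & G-C)

Initialisation: $\gamma(i,i-1) = 0$ and $\gamma(i,i) = 0$.

$$\gamma(i,j) = \max \begin{cases} \gamma(i+1,j) \\ \gamma(i,j-1) \\ \gamma(i+1,j-1) + \delta(i,j) \\ \max_{i<k<j} \left[\gamma(i,k) + \gamma(k+1,j)\right] \end{cases}$$

where $\delta(i,j) = 1$ if positions i and j can pair and 0 otherwise.

Matrix of $\gamma(i,j)$ for i ≤ j (rows i, columns j):

i \ j   G  G  G  A  A  A  U  C  C
1  G    0  0  0  0  0  0  1  2  3
2  G       0  0  0  0  0  1  2  3
3  G          0  0  0  0  1  2  2
4  A             0  0  0  1  1  1
5  A                0  0  1  1  1
6  A                   0  1  1  1
7  U                      0  0  0
8  C                         0  0
9  C                            0

[Figure: traceback through the matrix and one resulting structure with three base pairs.]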
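The matrix fill for base pair maximisation can be sketched in a few lines. This is an illustrative implementation of the Nussinov recursion, assuming only A-U and G-C pairs and no minimum loop length, as in the GGGAAAUCC example; the function name `nussinov` and its `pairs` parameter are not from the slides.

```python
WATSON_CRICK = frozenset({("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")})


def nussinov(seq, pairs=WATSON_CRICK):
    """Fill gamma[i][j], the maximal number of base pairs in seq[i..j] (0-based)."""
    n = len(seq)
    gamma = [[0] * n for _ in range(n)]  # diagonal initialised to 0
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            delta = 1 if (seq[i], seq[j]) in pairs else 0
            # gamma(i+1, j-1) is 0 by initialisation when the interval is empty.
            inner = gamma[i + 1][j - 1] if i + 1 <= j - 1 else 0
            best = max(gamma[i + 1][j], gamma[i][j - 1], inner + delta)
            for k in range(i + 1, j):  # bifurcation
                best = max(best, gamma[i][k] + gamma[k + 1][j])
            gamma[i][j] = best
    return gamma


if __name__ == "__main__":
    table = nussinov("GGGAAAUCC")
    print(table[0][-1])  # maximal number of pairs for the whole sequence
```

The three nested loops over i, j and k give a running time proportional to the cube of the sequence length.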

RNA Secondary Structure Evolution

From Durbin et al. (1998) Biological Sequence Analysis

Inference about hidden structure

[Figure: the sequence is observable; the folded secondary structure is unobservable.]

$$P(\text{Sequence}, \text{Structure}) = P(\text{Sequence} \mid \text{Structure})\,P(\text{Structure}) = P(\text{Structure} \mid \text{Sequence})\,P(\text{Sequence})$$

Goldman, Thorne & Jones, 96

Knudsen & Hein, 99

Pedersen & Hein, 03

Goldman, Thorne & Jones: ”Structure” + ”Evolution”

1 A S D F G H J K L P
2 A S D F G H J K L P
3 D S D F G K J K L C
4 D S D F G K J K L C

HMM x x x x x x x L x x x

[Figure: phylogenetic tree relating sequences 1, 2, 3 and 4.]

Three Questions

What is the probability of the data?

What is the most probable ”hidden” configuration?

What is the probability of specific ”hidden” state?

Training: Given a set of instances, find the parameters that make them most probable, treating the instances as independent.

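The Basic Calculations

What is the probability of the data?

[Figure: trellis of observations O1 ... O10 against hidden states H1, H2, H3.]

$$P^{O_5}_{H_5=2} = P(O_5 \mid H_5 = 2) \sum_{j} P^{O_4}_{H_4=j}\, p_{j,2}$$

Here $P^{O_k}_{H_k=i}$ is the probability of the observations up to $O_k$ with hidden state $i$ at position $k$, and $p_{j,i}$ is the probability of a transition from state $j$ to state $i$.

What is the most probable ”hidden” configuration?

[Figure: the same trellis of observations O1 ... O10 against hidden states H1, H2, H3.]

What is the probability of specific ”hidden” state?

[Figure: the same trellis of observations O1 ... O10 against hidden states H1, H2, H3.]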

The time required for these calculations is proportional to $K^2 \cdot L$, where K is the number of hidden states and L the length of the sequence.
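The first two calculations can be sketched for a generic HMM as follows. All parameter values (`INIT`, `TRANS`, `EMIT`) are made up for illustration and do not come from the slides; the point is only the loop structure, in which each of the L positions requires a sum or maximum over K times K state pairs.

```python
def forward(obs, init, trans, emit):
    """Probability of the data, summing over all hidden paths."""
    K = len(init)
    f = [init[k] * emit[k][obs[0]] for k in range(K)]
    for o in obs[1:]:
        f = [emit[i][o] * sum(f[j] * trans[j][i] for j in range(K)) for i in range(K)]
    return sum(f)


def viterbi(obs, init, trans, emit):
    """Most probable hidden configuration and its joint probability."""
    K = len(init)
    v = [init[k] * emit[k][obs[0]] for k in range(K)]
    back = []
    for o in obs[1:]:
        ptr = [max(range(K), key=lambda j: v[j] * trans[j][i]) for i in range(K)]
        v = [emit[i][o] * v[ptr[i]] * trans[ptr[i]][i] for i in range(K)]
        back.append(ptr)
    state = max(range(K), key=lambda k: v[k])
    path = [state]
    for ptr in reversed(back):  # follow the back pointers from the end
        state = ptr[state]
        path.append(state)
    return path[::-1], max(v)


# Hypothetical two-state model over observations {0, 1}.
INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.1, 0.9]]
EMIT = [[0.8, 0.2], [0.2, 0.8]]

if __name__ == "__main__":
    print(forward([0, 0, 1, 1], INIT, TRANS, EMIT))
    print(viterbi([0, 0, 1, 1], INIT, TRANS, EMIT))
```

The third question, the probability of a specific hidden state, combines the forward quantities with the analogous backward quantities and has the same cost.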

Empirical Doublet Models

Partial Doublet Model

        AU      UA      GC      CG      UG      GU
AU    -1.16    0.18    0.50    0.12    0.02    0.27
UA     0.18   -1.16    0.12    0.50    0.27    0.02
GC     0.33    0.08   -0.82    0.13    0.02    0.23
CG     0.08    0.33    0.13   -0.82    0.23    0.02
UG     0.08    1.00    0.10    1.26   -2.56    0.04
GU     1.00    0.08    1.26    0.10    0.04   -2.56

Singlet/Marginalized Doublet Model

        A              C              G              U
A    -0.75/-1.15     0.16/0.13      0.32/0.79      0.26/0.23
C     0.40/0.09     -1.57/-0.84     0.24/0.16      0.93/0.59
G     0.55/0.45      0.17/0.13     -0.96/-0.70     0.24/0.11
U     0.35/0.18      0.51/0.70      0.19/0.16     -1.05/-1.03

Alignment of N slowly evolving, related molecules, each L long

AUUGCAUUCCAAUUGCAUUCCA
AUUGCAUUCCAAUUGCAUUCCA
AUUGCAUUCCAAUUGCAUUCCA
AUUGCAUUCCAAUUGCAUUCCA

$$r_{N_1,N_2} = \frac{\#(N_1 \to N_2,\; N_2 \to N_1)}{N_{P/U}\,(N_{P/U} - 1)/2}, \qquad N_1 \neq N_2$$

where $N_{P/U}$ is the number of paired/unpaired positions in the alignment.

$$r'_{N_1,N_2} = \frac{\#N_1 \cdot r_{N_1,N_2}}{\#N_2}$$

Doublet Evolution

From Bjarne Knudsen

Structure Dependent Evolution: RNA

[Figure: aligned copies of the sequence U A C A C C G U, its folded structure, and the evolutionary histories of its columns.]

For columns i and j, the paired and unpaired hypotheses are compared through

$$\frac{P(\text{History}_{i,j} \mid \text{Paired})}{P(\text{History}_{i,j} \mid \text{Unpaired})}$$

Grammars: Finite Set of Rules for Generating Strings

i. A starting symbol.

ii. A set of substitution rules applied to variables in the present string.

A string is finished when it contains no variables.

Grammar classes: Regular, Context Free, Context Sensitive, General (also erasing).

Chomsky Linguistic Hierarchy

Source: Biological Sequence Analysis

W is a nonterminal sign, a is any terminal sign, and α, β, γ are strings of signs; α and γ may be empty, but β is not the null string. ε denotes the empty string.

Regular Grammars: W --> aW' and W --> a

Context-Free Grammars: W --> β

Context-Sensitive Grammars: α1Wα2 --> α1βα2

Unrestricted Grammars: α1Wα2 --> γ

The listing above is in order of increasing power of string generation. For instance, context-free grammars can generate every language that regular grammars can, plus some more.

Simple String Generators

Non-terminals (capital) --- Terminals (small); ε is the empty string

i. Start with S. S --> aT | bS   T --> aS | bT | ε

One sentence (odd # of a’s): S -> aT -> aaS -> aabS -> aabaT -> aaba

ii. S --> aSa | bSb | aa | bb

One sentence (even length palindromes): S -> aSa -> abSba -> abaaba
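The two generators can be run mechanically to list the short strings in each language. The sketch below assumes the rule sets S --> aT | bS, T --> aS | bT | ε for the first grammar and S --> aSa | bSb | aa | bb for the second; the dictionary encoding and the function `generate` are illustrative choices, not part of the slides.

```python
from collections import deque

# Non-terminals are the dictionary keys; "" stands for the empty string.
GRAMMAR_ODD_A = {"S": ["aT", "bS"], "T": ["aS", "bT", ""]}
GRAMMAR_PALINDROME = {"S": ["aSa", "bSb", "aa", "bb"]}


def generate(grammar, start="S", max_len=4):
    """Return every finished string with at most max_len terminals."""
    finished = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, if any is left.
        pos = next((i for i, c in enumerate(form) if c in grammar), None)
        if pos is None:
            finished.add(form)  # no variables: the string is finished
            continue
        for rhs in grammar[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            n_terminals = sum(1 for c in new if c not in grammar)
            if n_terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return finished


if __name__ == "__main__":
    print(sorted(generate(GRAMMAR_ODD_A, max_len=3)))
    print(sorted(generate(GRAMMAR_PALINDROME, max_len=4)))
```

Every non-empty rule in both grammars adds at least one terminal, so bounding the number of terminals keeps the search finite.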

Stochastic Grammars

The grammars above classify every string as either belonging to the language or not.

Every variable has a finite set of substitution rules. Assigning a probability to each rule assigns probabilities to the strings in the language.

If a string has a unique (1-1) derivation, the probability of the string is the product of the probabilities of the applied rules.

i. Start with S. S --> (0.3)aT | (0.7)bS   T --> (0.2)aS | (0.4)bT | (0.2)ε

S -> aT -> aaS -> aabS -> aabaT -> aaba, with probability 0.3 * 0.2 * 0.7 * 0.3 * 0.2

ii. S --> (0.3)aSa | (0.5)bSb | (0.1)aa | (0.1)bb

S -> aSa -> abSba -> abaaba, with probability 0.3 * 0.5 * 0.1
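The product rule for derivation probabilities can be checked with a short script. This is a sketch that replays a derivation step by step; the rule probabilities are copied as printed on the slide (the three T rules are listed there with 0.2, 0.4 and 0.2), and the function `derive` is a hypothetical helper.

```python
RULES_I = {
    ("S", "aT"): 0.3, ("S", "bS"): 0.7,
    ("T", "aS"): 0.2, ("T", "bT"): 0.4, ("T", ""): 0.2,
}
RULES_II = {
    ("S", "aSa"): 0.3, ("S", "bSb"): 0.5, ("S", "aa"): 0.1, ("S", "bb"): 0.1,
}


def derive(rules, steps, start="S"):
    """Apply each (variable, replacement) step and multiply the rule probabilities."""
    form, prob = start, 1.0
    for lhs, rhs in steps:
        if lhs not in form:
            raise ValueError(f"variable {lhs!r} does not occur in {form!r}")
        form = form.replace(lhs, rhs, 1)  # rewrite the leftmost occurrence
        prob *= rules[(lhs, rhs)]
    return form, prob


if __name__ == "__main__":
    steps_i = [("S", "aT"), ("T", "aS"), ("S", "bS"), ("S", "aT"), ("T", "")]
    print(derive(RULES_I, steps_i))
    steps_ii = [("S", "aSa"), ("S", "bSb"), ("S", "aa")]
    print(derive(RULES_II, steps_ii))
```

When a string has several derivations, its probability is the sum of these products over all of them, which is what the SCFG analogue of the forward algorithm computes.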

Secondary Structure Generators

S --> LS (.869) | L (.131)

F --> dFd (.788) | LS (.212)

L --> s (.895) | dFd (.105)
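A grammar of this kind can be used to sample random secondary structures. The sketch below assumes the three rules S --> LS | L, F --> dFd | LS, L --> s | dFd with the probabilities listed; writing an unpaired position s as "." and a pair d...d as "(" and ")" is an illustrative convention, and the helper names are hypothetical.

```python
import random

# Each variable maps to (probability, right-hand side) alternatives.
RULES = {
    "S": [(0.869, ["L", "S"]), (0.131, ["L"])],
    "F": [(0.788, ["(", "F", ")"]), (0.212, ["L", "S"])],
    "L": [(0.895, ["."]), (0.105, ["(", "F", ")"])],
}


def sample_structure(rng):
    """Expand S left to right with an explicit stack and return a dot-bracket string."""
    stack = ["S"]
    out = []
    while stack:
        symbol = stack.pop()
        if symbol in RULES:
            weights = [p for p, _ in RULES[symbol]]
            choices = [rhs for _, rhs in RULES[symbol]]
            rhs = rng.choices(choices, weights=weights)[0]
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        else:
            out.append(symbol)
    return "".join(out)


def is_balanced(structure):
    """Check that every ')' closes an earlier '('."""
    depth = 0
    for ch in structure:
        depth += (ch == "(") - (ch == ")")
        if depth < 0:
            return False
    return depth == 0


if __name__ == "__main__":
    print(sample_structure(random.Random(2003)))
```

An explicit stack is used instead of recursion so that deeply nested stems do not hit the interpreter's recursion limit.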

SCFG Analogue to HMM calculations (Durbin et al., 1998)

[Figure: a nonterminal W spanning positions i to j of a sequence 1 ... L, split into WL and WR at positions i' and j'.]

What is the probability of the data?

What is the most probable ”hidden” configuration?

What is the probability of specific ”hidden” state?

The time required for these calculations is proportional to $K^2 \cdot L^3$, where K is the number of hidden states and L the length of the sequence.
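The SCFG counterpart of the forward algorithm is the inside algorithm. The following is a rough sketch for a grammar in Chomsky normal form (rules W --> WL WR and W --> a), which is an assumption made here for simplicity; the toy grammar S --> SS (0.3) | a (0.7) is invented for illustration and is not one of the grammars on the slides.

```python
from collections import defaultdict


def inside_probability(seq, binary_rules, terminal_rules, start="S"):
    """Probability that start generates seq, summed over all derivations."""
    n = len(seq)
    if n == 0:
        return 0.0
    # inside[i][j][W] = probability that W generates seq[i..j] (inclusive).
    inside = [[defaultdict(float) for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(seq):
        for (w, a), p in terminal_rules.items():
            if a == symbol:
                inside[i][i][w] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for (w, left, right), p in binary_rules.items():
                # Sum over every split point between the left and right child.
                total = sum(
                    inside[i][k][left] * inside[k + 1][j][right] for k in range(i, j)
                )
                inside[i][j][w] += p * total
    return inside[0][n - 1][start]


# Hypothetical toy grammar: S --> S S (0.3) | a (0.7).
BINARY = {("S", "S", "S"): 0.3}
TERMINAL = {("S", "a"): 0.7}

if __name__ == "__main__":
    print(inside_probability("aaa", BINARY, TERMINAL))
```

The loops over i, j and the split point k account for the cubic dependence on the sequence length; replacing the sum by a maximum gives the most probable derivation (the CYK algorithm).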

RNA Secondary Structure

Knudsen & Hein, 03

From Knudsen & Hein (1999)

1. Accuracy as the certainty threshold is increased.

2. Accuracy as a function of the number of sequences.

RNA Secondary Structure

Knudsen & Hein, 03

Observing Evolution has 2 parts

P(x):

P(Further history of x):

[Figure: an example RNA secondary structure, with the sequence labelled x.]

RNA Structure Prediction and Alignment

Sankoff, 1985 Combined RNA secondary structure & alignment

Gorodkin 1997 Foldalign – only hairpins

2002 Dynalign

Perriquet 2002 Carnac

Can only align molecules of same type.

RNA Structure Representations

From Fontana, 2003

Moulton et al.,2002

Mountains

Circle with chords

Ordered Tree

Balanced Nested Parenthesis

Full Description
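Two of these representations are easy to convert between in code. The sketch below starts from balanced nested parentheses (dot-bracket notation) and derives the list of paired positions and a mountain profile; the convention that the mountain height at a position is the number of pairs still open after reading that position is an assumption made here, and the function names are hypothetical.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of 1-based (i, j) pairs in a dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


def mountain(structure):
    """Return the mountain heights: open pairs after each position."""
    height, heights = 0, []
    for ch in structure:
        if ch == "(":
            height += 1
        elif ch == ")":
            height -= 1
        heights.append(height)
    return heights


if __name__ == "__main__":
    print(parse_dot_bracket("((...))"))
    print(mountain("((...))"))
```

The pair list is also what a circle-with-chords drawing needs: each pair becomes one chord, and a structure without pseudoknots has no crossing chords.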

RNA Structure Evolution

Insertion-deletion process of

Doublets

Singlets

There are methods for tree alignment that could probably be extended to statistical tree alignment.

Metrics on RNA Structures

Moulton, 2000

Base Pair Metrics

Tree Metrics

Mountain Metrics
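As an illustration of two of these metrics, the sketch below compares structures written in dot-bracket notation. It assumes the base pair distance is the number of pairs present in exactly one of the two structures and the mountain distance is the summed absolute difference of the mountain heights; these are common choices and are stated here as assumptions, not as the exact definitions used in the cited paper.

```python
def pair_set(structure):
    """Set of 1-based (i, j) pairs of a balanced dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def heights(structure):
    """Mountain profile: number of open pairs after each position."""
    level, profile = 0, []
    for ch in structure:
        level += (ch == "(") - (ch == ")")
        profile.append(level)
    return profile


def base_pair_distance(a, b):
    """Number of base pairs found in exactly one of the two structures."""
    return len(pair_set(a) ^ pair_set(b))


def mountain_distance(a, b):
    """Sum of absolute height differences; both structures must have equal length."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return sum(abs(x - y) for x, y in zip(heights(a), heights(b)))


if __name__ == "__main__":
    print(base_pair_distance("((...))", "(.....)"))
    print(mountain_distance("((...))", "(.....)"))
```

Tree metrics need an ordered-tree representation and an edit distance on trees, which is longer to write down and is not sketched here.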

Population Genetics of Coupled Mutations

W. Stephan, 96 & P. Higgs, 98

Possible separation of long term and short term evolution

Creation of Linkage Disequilibrium of paired sites.

Singlet → Doublet Models

Kirby et al., 95; Tillier et al., 98; Savill et al., 01

Jukes-Cantor with bias toward base pairing:

$$R_{i,j} = \begin{cases} 1/4 & \text{1 difference, pairing gained} \\ 1/4 & \text{1 difference, pairing unchanged} \\ 1/4 & \text{1 difference, pairing lost} \\ 0 & \text{2 differences} \end{cases}$$

Contagious Dependencies: Overlapping Reading Frames & CG frequencies

Pedersen & Jensen, 01

[Figure: a row of neighbouring nucleotide positions n n n ... n.]

Doublet → Tetraplet Models

Nerman & Durbin at B. Knudsen’s exam, 02

[Figure: neighbouring base pairs with nucleotides labelled N1, N2 and N4.]

In principle a $4^4 \times 4^4$ matrix (65,536 entries!) is needed, but proper parametrisation and symmetries could reduce this substantially.

Stacking:

RNA + Protein Structure Dependent Molecular Evolution

Singlet

Straightforward, no interference from the RNA level.

Doublets

What seems to be needed is a parametrisation of how base pairing creates a departure from an independent singlet-singlet model.

Miscellaneous Topics

RNA Folding

Molecular Dynamics of RNA Structures

RNA Structure – Sequence Landscapes

RNA Homology Modelling & Threading

RNA Gene Finding

Close to Optimal Structures

Constraint Satisfaction Modelling

Literature & www-sites

Eddy, S. Non-coding RNA genes and the modern RNA world.Nat Rev Genet. 2001 Dec;2(12):919-29. Review.

Eddy, S. “Computational genomics of noncoding RNA genes” Cell. 2002 Apr 19;109(2):137-40. Review.

Fontana (2002) Modelling “evo-devo” with RNA BioEssays 24.12.1164-77

Knudsen, B. and J.J.Hein (2003) "Practical RNA Folding" (In Press, RNA)

Knudsen, B. and J.J.Hein (1999) "Using stochastic context free grammars and molecular evolution to predict RNA secondary structure" Bioinformatics 15.6.446-454

Moore (1999) Structural Motifs in RNA Ann.Rev.Biochem. 68.287-300.

Moulton et al. (2000) Metrics on RNA Secondary Structures J.Compu.Biol. 7.1/2.277-

Perriquet et al.(2003) Finding the common homologous structure shared by two homologous RNAs. Bioinformatics 19.1.108-116.

http://www.imb-jena.de/RNA.html

http://scor.lbl.gov/index.html

http://www.rnabase.org/metaanalysis/