RNA Secondary Structure Predictiontabio162/wiki.files/RNACKY1.pdf · CYK Algorithm • The CYK...

Date post: 16-Jul-2020
RNA Secondary Structure Prediction

RNA structure prediction methods

• Base-Pair Maximization
• Context-Free Grammar Parsing
• Free Energy Methods
• Covariance Models

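The Nussinov-Jacobson Algorithm

Sequence A C A G U U G C A, positions 1 to 9; q = 9

Fill table (row i, column j); each row starts at column i-1 (column 1 for row 1), and cell (1, 9) is not yet filled:

        1  2  3  4  5  6  7  8  9
        A  C  A  G  U  U  G  C  A
1  A    0  0  0  1  2  2  2  3
2  C    0  0  0  1  1  1  2  2  3
3  A       0  0  0  1  1  1  2  3
4  G          0  0  0  0  0  1  2
5  U             0  0  0  0  1  2
6  U                0  0  0  1  2
7  G                   0  0  1  1
8  C                      0  0  0
9  A                         0  0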

SCFG Version

• Nussinov algorithm can be converted to a stochastic context-free grammar:
• S → W
• W → aW | cW | gW | uW
• W → Wa | Wc | Wg | Wu
• W → aWu | cWg | uWa | gWc
• W → WW

SCFGs

• Stochastic Context Free Grammars (SCFGs) have also been used to model RNA secondary structure
• Examples
– tRNAscan-SE
– program created to find snoRNAs
• Grammars are built from a training set of sequences and then applied to candidate sequences to test whether they fit the language

SCFGs

• SCFGs allow the detection of sequences belonging to a family
– tRNAs
– group I introns
– snoRNAs
– snRNAs

SCFGs

• Any RNA structure can be reduced to an SCFG (see Durbin et al., pp. 278-279)

Transformational Grammars

• First described by linguist Noam Chomsky in the 1950s.
– (Yes, the same Noam Chomsky who has expressed various dissident political views throughout the years!)

13 June 2006

Transformational Grammars

• Very important in computer science, most notably in compiler design
• Covered in detail in compiler and automata theory classes

Transformational Grammars

• Idea: take a set of outputs (sentence, RNA structure) and determine if it can be produced using a set of rules

• Consist of a set of symbols and production rules

• The symbols can be terminal (emitting) symbols or non-terminal symbols

Grammar for Palindromes

• Consider palindromic DNA sequences
• Five possible terminal symbols: {a, c, g, t, ε} (ε represents the blank terminal symbol)

Grammar for Palindromes

• Production Rules, where S and W are non-terminal symbols:
• S → W
• W → aWa | cWc | gWg | tWt
• W → a | c | g | t | ε

Derivation of Sequences

• Using these production rules, a derivation of the palindromic sequence acttgttca follows:
• S ⇒ W ⇒ aWa ⇒ acWca ⇒ actWtca ⇒ acttWttca ⇒ acttgttca
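The derivation for acttgttca can be replayed mechanically. The following is a minimal Python sketch, assuming the palindrome grammar S → W, W → xWx, W → x | ε over {a, c, g, t}; the function name `derive_palindrome` is hypothetical and not part of the slides.

```python
def derive_palindrome(seq):
    """Return the sentential forms S, W, xWx, ... that derive seq,
    or None if seq is not a palindrome over {a, c, g, t}."""
    if any(ch not in "acgt" for ch in seq) or seq != seq[::-1]:
        return None
    steps = ["S", "W"]                      # S -> W
    for depth in range(1, len(seq) // 2 + 1):
        # one application of W -> xWx adds a letter on each side
        steps.append(seq[:depth] + "W" + seq[len(seq) - depth:])
    # last step: W -> middle letter (odd length) or W -> epsilon (even length)
    steps.append(seq)
    return steps


print(" => ".join(derive_palindrome("acttgttca")))
```

A string that is not a palindrome returns `None`, which corresponds to the string not being in the language of the grammar.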

SCFGs for RNA

• Base-paired columns are modeled by pairwise-emitting nonterminals
– aWu; uWa; gWc; cWg; ...
• Single-stranded columns are modeled by leftwise-emitting nonterminals when possible
– aW; cW; gW; uW; ...
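To illustrate how a structure maps onto these productions, here is a small Python sketch that lists the rules used for a sequence with a given fold. Several things here are assumptions rather than content from the slides: the fold is supplied in dot-bracket notation with balanced brackets, the function name `derive` is hypothetical, a terminating rule W → ε is added so the derivation can end, and unpaired bases on the right use W → Wx only when a leftwise emission is not possible.

```python
def derive(seq, struct):
    """List the productions that generate seq with dot-bracket fold struct."""
    partner, stack = {}, []
    for pos, ch in enumerate(struct):       # match brackets to find base pairs
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            opener = stack.pop()
            partner[opener], partner[pos] = pos, opener

    rules = []

    def expand(i, j):
        if i > j:                            # empty span (assumed rule)
            rules.append("W → ε")
        elif struct[i] == ".":               # leftwise emission when possible
            rules.append(f"W → {seq[i]}W")
            expand(i + 1, j)
        elif struct[j] == ".":               # otherwise emit on the right
            rules.append(f"W → W{seq[j]}")
            expand(i, j - 1)
        elif partner[i] == j:                # pairwise emission for a base pair
            rules.append(f"W → {seq[i]}W{seq[j]}")
            expand(i + 1, j - 1)
        else:                                # two adjacent stems: bifurcation
            rules.append("W → WW")
            expand(i, partner[i])
            expand(partner[i] + 1, j)

    expand(0, len(seq) - 1)
    return rules


print(derive("gaaacu", "(...)."))
```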

Parse Trees

• A context-free grammar can be aligned to a sequence using a parse tree

• Root of the tree is the non-terminal start symbol, S

• Leaves are terminal symbols

• Internal nodes are the nonterminals

• Reading the leaves from left to right gives the sequence that the productions generate

Parse Tree

[Figure: parse tree with root S, five W nonterminals as internal nodes, and terminal leaves (a, c, g, t)]

CYK (Cocke-Younger-Kasami)
Parsing Algorithm

Seyed Mohammad Hossein Moattar
Natural Language Processing
Amirkabir University of Technology
Faculty of Computer Engineering

Parsing Algorithms

• CFGs are the basis for describing the (syntactic) structure of NL sentences
• Thus parsing algorithms are the core of NL analysis systems
• Recognition vs. parsing:
– Recognition: deciding whether a string is a member of the language
– Parsing: recognition plus producing a parse tree for the string
• Is parsing more “difficult” than recognition? (time complexity)
• Ambiguity: an input may have exponentially many parses
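The growth in the number of parses can be checked on a toy case. The sketch below assumes the ambiguous grammar S → S S | a, chosen here only for illustration (it is not from the slides), and counts the parse trees of a string of n letters a with the same span-by-span dynamic programming that CYK uses. The function name `count_parses` is hypothetical.

```python
def count_parses(n):
    """Number of parse trees of 'a' * n under the grammar S -> S S | a."""
    # count[i][j] = number of parse trees for the substring from i to j (end exclusive)
    count = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        count[i][i + 1] = 1                  # S -> a
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # S -> S S: every split point k contributes left * right trees
            count[i][j] = sum(count[i][k] * count[k][j] for k in range(i + 1, j))
    return count[0][n]


for n in (2, 5, 10, 20):
    print(n, count_parses(n))
```

The counts are the Catalan numbers, which grow exponentially with n, while the table itself is still filled in polynomial time.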

CYK (Cocke-Younger-Kasami)

• One of the earliest recognition and parsing algorithms
• The standard version of CYK can only recognize languages defined by context-free grammars in Chomsky Normal Form (CNF).
• It is also possible to extend the CYK algorithm to handle some grammars which are not in CNF
– Harder to understand
• Based on a “dynamic programming” approach:
– Build solutions compositionally from sub-solutions
– Store sub-solutions and re-use them whenever necessary
• Recognition version: decide whether S ⇒* w

CYK Algorithm

• The CYK algorithm for the membership problem is as follows:
– Let the input string be a sequence of n letters a1 ... an.
– Let the grammar contain r nonterminal symbols R1 ... Rr, and let R1 be the start symbol.
– Let P[n,n,r] be an array of booleans. Initialize all elements of P to false.
– For each i = 1 to n
  • For each unit production Rj -> ai, set P[i,1,j] = true.
– For each i = 2 to n -- Length of span
  • For each j = 1 to n-i+1 -- Start of span
    – For each k = 1 to i-1 -- Partition of span
      » For each production RA -> RB RC
      » If P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true
– If P[1,n,1] is true
  • Then string is member of language
  • Else string is not member of language
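The steps above translate almost line by line into Python. This is a sketch under a few assumptions: the grammar is passed as two rule lists (a representation chosen here, not prescribed by the slides), the first entry of `symbols` plays the role of R1, and the array keeps the 1-based P[start][length][symbol] layout of the description. The function name `cyk_membership` is hypothetical.

```python
def cyk_membership(word, symbols, unit_rules, binary_rules):
    """CYK recognition for a grammar in Chomsky Normal Form.

    symbols      -- nonterminals, start symbol first (R1)
    unit_rules   -- list of (A, terminal) for productions A -> terminal
    binary_rules -- list of (A, B, C) for productions A -> B C
    """
    n, r = len(word), len(symbols)
    if n == 0:
        return False                         # the empty string is not handled here
    index = {s: k + 1 for k, s in enumerate(symbols)}
    # P[start][length][symbol], all 1-based as in the description above
    P = [[[False] * (r + 1) for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):
        for lhs, terminal in unit_rules:
            if terminal == word[i - 1]:
                P[i][1][index[lhs]] = True
    for i in range(2, n + 1):                # length of span
        for j in range(1, n - i + 2):        # start of span
            for k in range(1, i):            # partition of span
                for a, b, c in binary_rules:
                    if P[j][k][index[b]] and P[j + k][i - k][index[c]]:
                        P[j][i][index[a]] = True
    return P[1][n][1]


# CNF grammar for a^n b^n with n >= 1
symbols = ["S", "T", "X", "A", "B"]
unit_rules = [("A", "a"), ("B", "b")]
binary_rules = [("S", "A", "B"), ("S", "X", "B"),
                ("T", "A", "B"), ("T", "X", "B"), ("X", "A", "T")]
print(cyk_membership("aaabbb", symbols, unit_rules, binary_rules))
```

The three nested loops over span length, start, and partition give the cubic dependence on n, multiplied by the number of binary productions.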

CYK Pseudocode

On input x = x1x2 … xn :

for (i = 1 to n) // create middle diagonal
    for (each var. A)
        if (A → xi)
            add A to table[i-1][i]
for (d = 2 to n) // d’th diagonal
    for (i = 0 to n-d)
        for (k = i+1 to i+d-1)
            for (each var. A)
                for (each var. B in table[i][k])
                    for (each var. C in table[k][i+d])
                        if (A → BC)
                            add A to table[i][i+d]
return S ∈ table[0][n] ? ACCEPT : REJECT

CYK Algorithm

• This algorithm considers every possible consecutive subsequence of the sequence of letters and sets P[i,j,k] to true if the subsequence of length j starting at position i can be generated from Rk.
• Once it has considered subsequences of length 1, it goes on to subsequences of length 2, and so on.
• For subsequences of length 2 and greater, it considers every possible partition of the subsequence into two parts, and checks whether there is some production P -> Q R such that Q matches the first part and R matches the second part. If so, it records P as matching the whole subsequence.
• Once this process is completed, the sentence is recognized by the grammar if the subsequence containing the entire string is matched by the start symbol.

CYK Algorithm for Deciding Context Free Languages

Q: Consider the grammar G given by

S → ε | AB | XB
T → AB | XB
X → AT
A → a
B → b

1. Is x = aaabbb in L(G)?

Now look at aaabbb:

a a a b b b

1) Write variables for all length 1 substrings: A A A B B B (one for each letter).
2) Write variables for all length 2 substrings: S,T for ab.
3) Write variables for all length 3 substrings: X for aab.
4) Write variables for all length 4 substrings: S,T for aabb.
5) Write variables for all length 5 substrings: X for aaabb.
6) Write variables for all length 6 substrings: S,T for aaabbb.

S is included so aaabbb accepted!

Can also use a table for same purpose. Each row is a start position (0 to 5) and each column an end position (1 to 6) in aaabbb; "-" marks a substring with no variables. The table is filled one diagonal at a time:

1. Variables for length 1 substrings.
2. Variables for length 2 substrings.
3. Variables for length 3 substrings.
4. Variables for length 4 substrings.
5. Variables for length 5 substrings.
6. Variables for aaabbb. ACCEPTED!

Completed table:

start at \ end at   1    2    3    4     5     6
0                   A    -    -    -     X     S,T
1                        A    -    X     S,T   -
2                             A    S,T   -     -
3                                  B     -     -
4                                        B     -
5                                              B
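The table for aaabbb can be reproduced with a short Python sketch of the set-based formulation, where `table[i][j]` holds the variables that derive the substring starting at position i and ending at position j. The rule-list representation and the function name `cyk_table` are assumptions made for this example; the ε production of S is left out because it only matters for the empty string.

```python
def cyk_table(word, unit_rules, binary_rules):
    """Fill table[start][end] with the variables deriving word[start:end]."""
    n = len(word)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):                # length 1 substrings
        for lhs, terminal in unit_rules:
            if terminal == word[i - 1]:
                table[i - 1][i].add(lhs)
    for d in range(2, n + 1):                # substring length (d'th diagonal)
        for i in range(0, n - d + 1):        # start position
            for k in range(i + 1, i + d):    # split position
                for lhs, b, c in binary_rules:
                    if b in table[i][k] and c in table[k][i + d]:
                        table[i][i + d].add(lhs)
    return table


unit_rules = [("A", "a"), ("B", "b")]
binary_rules = [("S", "A", "B"), ("S", "X", "B"),
                ("T", "A", "B"), ("T", "X", "B"), ("X", "A", "T")]
table = cyk_table("aaabbb", unit_rules, binary_rules)
for start in range(6):
    cells = [",".join(sorted(table[start][end])) or "-" for end in range(start + 1, 7)]
    print(start, " ".join(cells))
print("ACCEPT" if "S" in table[0][6] else "REJECT")
```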

Parsing results

• We keep the results for every wij in a table.
• Note that we only need to fill in entries up to the diagonal – the longest substring starting at i is of length n-i+1

Constructing parse tree

• We need to construct parse trees for string w:
• Idea:
– Keep back-pointers to the table entries that we combine
– At the end, reconstruct a parse from the back-pointers
• This allows us to find all parse trees
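A minimal Python sketch of the back-pointer idea follows. As a simplification, it stores only the first back-pointer found for each entry and therefore recovers a single parse tree; keeping a list of back-pointers per entry would be needed to enumerate all of them. The function names `cyk_parse` and `leaves` and the nested-tuple tree format are assumptions made for this example.

```python
def cyk_parse(word, start, unit_rules, binary_rules):
    """Return one parse tree as nested tuples, or None if word is rejected."""
    n = len(word)
    back = {}   # (i, j, A) -> terminal, or (k, B, C) for A -> B C split at k
    for i, ch in enumerate(word):
        for lhs, terminal in unit_rules:
            if terminal == ch:
                back[(i, i + 1, lhs)] = ch
    for d in range(2, n + 1):
        for i in range(n - d + 1):
            j = i + d
            for k in range(i + 1, j):
                for lhs, b, c in binary_rules:
                    if (i, k, b) in back and (k, j, c) in back:
                        back.setdefault((i, j, lhs), (k, b, c))   # keep first pointer

    def build(i, j, symbol):
        entry = back[(i, j, symbol)]
        if isinstance(entry, str):           # leaf: A -> terminal
            return (symbol, entry)
        k, b, c = entry
        return (symbol, build(i, k, b), build(k, j, c))

    return build(0, n, start) if (0, n, start) in back else None


def leaves(tree):
    """Read the terminal symbols of a parse tree from left to right."""
    if isinstance(tree[1], str):
        return tree[1]
    return leaves(tree[1]) + leaves(tree[2])


unit_rules = [("A", "a"), ("B", "b")]
binary_rules = [("S", "A", "B"), ("S", "X", "B"),
                ("T", "A", "B"), ("T", "X", "B"), ("X", "A", "T")]
tree = cyk_parse("aabb", "S", unit_rules, binary_rules)
print(tree)
print(leaves(tree))
```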

The Nussinov-Jacobson Algorithm

[Figure: the fill table for A C A G U U G C A (positions 1 to 9) with a position q marked between columns q-1 and q, where i < q ≤ j]

• Co-terminus foldings:
• Partitionable foldings:

[Figures: example foldings drawn over the sequences A U C A U G G C A U and A C A G U U G C A (positions 1 to 9)]

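Another way to write the Nussinov-Jacobson recursion

• Initialization:

\[
\gamma(i, i-1) = 0 \quad \text{for } i = 2 \text{ to } L; \qquad
\gamma(i, i) = 0 \quad \text{for } i = 1 \text{ to } L.
\]

• Recursion:

\[
\gamma(i, j) = \max
\begin{cases}
\gamma(i+1, j); \\
\gamma(i, j-1); \\
\gamma(i+1, j-1) + \mathrm{BasePairScore}(i, j); \\
\max_{i < k < j} \left[ \gamma(i, k) + \gamma(k+1, j) \right].
\end{cases}
\]

The first two cases are two special cases of Partitionable Folding, the third case is Co-Terminus Folding, and the fourth case is Partitionable Folding.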
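The fill stage of this recursion can be sketched in Python as follows. The scoring is an assumption made for this example: BasePairScore is taken as 1 for A-U and G-C pairs and 0 otherwise, with no G-U pairs and no minimum loop length. The function name `nussinov` is hypothetical, and indices are 0-based, so the score for the whole sequence is in the top-right cell.

```python
def nussinov(seq):
    """Fill the Nussinov table; table[i][j] is the best score for seq[i..j]."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    table = [[0] * n for _ in range(n)]

    def get(i, j):
        # cells below the diagonal are the initialization value 0
        return table[i][j] if i <= j else 0

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            score = 1 if (seq[i], seq[j]) in pairs else 0
            best = max(get(i + 1, j),                  # i unpaired
                       get(i, j - 1),                  # j unpaired
                       get(i + 1, j - 1) + score)      # i and j paired
            for k in range(i + 1, j):                  # bifurcation
                best = max(best, table[i][k] + table[k + 1][j])
            table[i][j] = best
    return table


table = nussinov("ACAGUUGCA")
for row in table:
    print(" ".join(str(v) for v in row))
print("maximum number of base pairs:", table[0][-1])
```

For the example sequence ACAGUUGCA this fills in the remaining top-right cell of the table shown earlier in the slides.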

SCFG version of the Nussinov-Jacobson algorithm

• Stochastic Context-Free Grammars
• Makes use of production rules:
– W → aW | cW | gW | uW (i unpaired)
• Every production rule has an associated probability parameter.
• The maximum probability parse is equivalent to the maximum probability secondary structure.

SCFG Version of Nussinov-Jacobson Algorithm

• The algorithm can be converted to a stochastic context-free grammar:
• S → W
• W → aW | cW | gW | uW
• W → Wa | Wc | Wg | Wu
• W → aWu | cWg | uWa | gWc
• W → WW

Needed terminology

• The inside-outside (recursive dynamic programming) algorithm for SCFGs in Chomsky normal form is the natural counterpart of the forward-backward algorithm for HMMs.
• The best-path variant of the inside-outside algorithm is the Cocke-Younger-Kasami (CYK) algorithm. It finds the maximum probability alignment of the SCFG to the sequence.

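CYK for Nussinov-style RNA SCFG

• Initialization:

\[
\gamma(i, i-1) = -\infty \quad \text{for } i = 2 \text{ to } L; \qquad
\gamma(i, i) = \max
\begin{cases}
\log p(W \to x_i W); \\
\log p(W \to W x_i)
\end{cases}
\quad \text{for } i = 1 \text{ to } L.
\]

• Recursion:

\[
\gamma(i, j) = \max
\begin{cases}
\gamma(i+1, j) + \log p(W \to x_i W); \\
\gamma(i, j-1) + \log p(W \to W x_j); \\
\gamma(i+1, j-1) + \log p(W \to x_i W x_j); \\
\max_{i < k < j} \left[ \gamma(i, k) + \gamma(k+1, j) + \log p(W \to WW) \right].
\end{cases}
\]

The first two cases are two special cases of Partitionable Folding, the third case is Co-Terminus Folding, and the fourth case is Partitionable Folding.

• The log p terms are the addition to the fill stage of the Nussinov algorithm.
• The principal difference is that the SCFG description is a probabilistic model.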
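A Python sketch of this fill stage is given below. The probability values are invented for illustration only and are not trained parameters; the dictionary keys (`"left"`, `"right"`, `"pair"`, `"split"`) and the function name `cyk_scfg` are likewise assumptions of this example. Empty spans score minus infinity, and pairs that the grammar does not list also score minus infinity.

```python
import math


def cyk_scfg(seq, logp):
    """Return the best log probability of a Nussinov-style SCFG parse of seq."""
    L = len(seq)
    if L == 0:
        raise ValueError("sequence must not be empty")
    neg_inf = -math.inf
    g = [[neg_inf] * L for _ in range(L)]

    def get(i, j):
        return g[i][j] if i <= j else neg_inf      # empty span

    for i in range(L):                             # single bases
        g[i][i] = max(logp[("left", seq[i])], logp[("right", seq[i])])
    for span in range(1, L):
        for i in range(L - span):
            j = i + span
            best = max(
                get(i + 1, j) + logp[("left", seq[i])],                       # W -> x_i W
                get(i, j - 1) + logp[("right", seq[j])],                      # W -> W x_j
                get(i + 1, j - 1) + logp.get(("pair", seq[i], seq[j]), neg_inf),  # W -> x_i W x_j
            )
            for k in range(i + 1, j):                                         # W -> W W
                best = max(best, g[i][k] + g[k + 1][j] + logp[("split",)])
            g[i][j] = best
    return g[0][L - 1]


# Hypothetical parameters that sum to 1 over all productions of W
logp = {("split",): math.log(0.04)}
for x in "acgu":
    logp[("left", x)] = math.log(0.10)
    logp[("right", x)] = math.log(0.05)
for x, y in [("a", "u"), ("u", "a"), ("c", "g"), ("g", "c")]:
    logp[("pair", x, y)] = math.log(0.09)

print(cyk_scfg("aau", logp))
```

Working in log space turns the products of production probabilities into sums, so the structure of the loops is the same as in the base-pair maximization fill.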

CYK for Nussinov-style RNA SCFG (2)

• The value \(\gamma(1, L)\), the CYK score for the full sequence, is the log likelihood \(\log P(x, \hat{\pi} \mid \theta)\) of the optimal structure \(\hat{\pi}\) given the SCFG model \(\theta\)
• The traceback to find the secondary structure corresponding to the best score is performed analogously to the traceback in the Nussinov algorithm

Example of RNA Structure SCFG

• The RNA structure produced by MFOLD for the following sequence (5’ to 3’) can be constructed from the grammar:

• GCUUACGACCAUAUCACGUUGAAUGCAC

GCCAUCCCGUCCGAUCUGGCAAGUUAAG

CAACGUUGAGUCCAGUUAGUACUUGGAU

CGGAGACGGCCUGGGAAUCCUGGAUGU

UGUAAGCU

Example Construction

• S

• W

• Wu

• gWcu

• gcWgcu

• gcuWagcu

• gcuuWaagcu

• gcuuaWuaagcu

• gcuuacWguaagcu

• gcuuacgWuguaagcu

• gcuuacgaWuuguaagcu

• gcuuacgacWguuguaagcu

• gcuuacgaccWguuguaagcu

• gcuuacgaccaWguuguaagcu....

CYK for Nussinov-style RNA SCFG

• Good starting example, but it is too simple to be an accurate RNA folder
• The algorithm does not consider important structural features like preferences for certain:
– Loop lengths
– Nearest neighbours in the structure caused by stacking interactions between neighbouring base pairs in a stem.

