Date post: | 29-Dec-2015 |
Upload: | julie-hopkins |
RNA Secondary Structure
What is RNA?
Definition of RNA Secondary Structure
RNA molecule evolution
Algorithms for base pair maximisation
Chomsky’s Linguistic Hierarchy
Stochastic Context Free Grammars & Evolution
Miscellaneous topics
Base Pairing
From Przytycka
[Figure: chemical structures of the pyrimidines (cytosine, uracil) and the purines (adenine, guanine), with hydrogen-bond donor and acceptor sites marked.]
An Example: t-RNA
From Paul Higgs
Known RNAs
t-RNA (transfer-)
m-RNA (messenger-)
mi-RNA (micro-)
Sn-RNA (small nuclear)
RNAi (interfering)
Srp-RNA (Signal Recognition Particle)
5S RNA
16S RNA
23S RNA
RNA viruses: Retroviruses (HIV), Coronavirus (SARS), ...
Functions of RNAs
Information Transfer: mRNA
Codon -> Amino Acid adapter: tRNA
Enzymatic Reactions:
Other base pairing functions: ???
Structural:
Metabolic: ???
Regulatory: RNAi
Known RNA Structures
http://www.rnabase.org/metaanalysis/
http://www.sanger.ac.uk/Software/rfam
http://www.scor.lbl.gov
Figure 1: The cumulative number of publicly available RNA-containing structures determined by X-ray crystallography (red), NMR spectroscopy (purple) or all techniques combined (blue) has increased steadily since the first RNA-containing structure was released in 1978. RNA structure determination has accelerated substantially since the mid-1990s.
Figure 2: In a positive new trend, the average number of conformational map outliers per residue solved has declined consistently in recent years. Most of the improvement can be attributed to structures determined by X-ray crystallography. There has been no consistent trend for structures determined by NMR spectroscopy.
Rfam – database of RNA alignments and secondary structure models
Scor – database of experimentally solved RNA structures
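RNA SS: recursive definition
Nussinov (1978), remade from Durbin et al., 1998
[Figure: the four cases of the recursion on the interval [i,j]: i,j pair (leaving i+1 … j-1); i unpaired (leaving i+1 … j); j unpaired (leaving i … j-1); bifurcation (i … k and k+1 … j).]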
Secondary Structure: set of paired positions on the interval [i,j].
A-U and C-G can base pair. Some other pairings can occur, and triple interactions exist.
Pseudoknot – non-nested pairing: i < j < k < l with pairs i-k & j-l.
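The pseudoknot condition above can be checked directly on a set of pairs. A minimal sketch, assuming a structure is given as a list of (position, position) tuples; the function name is illustrative only.

```python
def has_pseudoknot(pairs):
    """Return True if two pairs i-k and j-l cross, i.e. i < j < k < l."""
    # Normalise each pair so the smaller position comes first, then sort.
    ordered = sorted((min(a, b), max(a, b)) for a, b in pairs)
    for idx, (i, k) in enumerate(ordered):
        for j, l in ordered[idx + 1:]:
            if i < j < k < l:
                return True
    return False


nested = [(1, 9), (2, 8), (3, 7)]   # a plain hairpin stem
crossing = [(1, 5), (3, 8)]         # 1 < 3 < 5 < 8: non-nested
print(has_pseudoknot(nested), has_pseudoknot(crossing))
```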
RNA Secondary Structure
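The number of secondary structures (Waterman, 1978):
$$T(n+1) = T(n) + \sum_{k=0}^{n-2} T(k)\,T(n-1-k), \qquad T(0) = T(1) = T(2) = 1$$
$$T(n) \sim \sqrt{\frac{15 + 7\sqrt{5}}{8\pi}}\; n^{-3/2} \left(\frac{3+\sqrt{5}}{2}\right)^{n}$$
[Figure: the two cases behind the recursion for a sequence N1 … NL: either NL is unpaired, or NL pairs with some Nk, which splits the sequence into N1 … Nk-1 and the enclosed segment Nk+1 … NL-1.]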
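A minimal sketch of the counting recursion, assuming the standard initial values T(0) = T(1) = T(2) = 1 (an enclosed loop must contain at least one unpaired position). The function name is illustrative only.

```python
def count_structures(n_max):
    """Return the list [T(0), ..., T(n_max)] of secondary-structure counts."""
    t = [1] * (n_max + 1)  # T(0) = T(1) = T(2) = 1
    for n in range(2, n_max):
        # Last position unpaired: T(n). Last position paired: sum over the
        # length k of the segment left of its partner.
        t[n + 1] = t[n] + sum(t[k] * t[n - 1 - k] for k in range(n - 1))
    return t


print(count_structures(8))
```

The counts grow exponentially, which is why the optimisation below uses dynamic programming rather than enumeration.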
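RNA: Matching Maximisation
Remade from Durbin et al., 1998
Example: GGGAAAUCC (A-U & G-C)
       j:  G  G  G  A  A  A  U  C  C
i: G       0  0  0  0  0  0  1  2  3
   G    0  0  0  0  0  0  1  2  3
   G       0  0  0  0  0  1  2  2
   A          0  0  0  0  1  1  1
   A             0  0  0  1  1  1
   A                0  0  1  1  1
   U                   0  0  0  0
   C                      0  0  0
   C                         0  0
(Each row i starts at the entry (i, i-1) = 0, followed by (i, i) = 0.)
Initialisation: $\gamma(i,i-1) = 0$ & $\gamma(i,i) = 0$
$$\gamma(i,j) = \max \begin{cases} \gamma(i+1,j) \\ \gamma(i,j-1) \\ \gamma(i+1,j-1) + \delta(i,j) \\ \max_{i<k<j}\,[\gamma(i,k) + \gamma(k+1,j)] \end{cases}$$
[Figure: drawing of a maximal structure for GGGAAAUCC, with three base pairs.]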
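The fill step can be sketched as below, assuming δ(i,j) = 1 for A-U and G-C pairs and 0 otherwise, as in the example, with 0-based indices. Traceback is omitted; the names are illustrative only.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def nussinov_matrix(seq, allowed=CANONICAL):
    """Fill gamma[i][j], the maximal number of base pairs on seq[i..j]."""
    n = len(seq)
    gamma = [[0] * n for _ in range(n)]  # gamma(i,i) = gamma(i,i-1) = 0
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            delta = 1 if (seq[i], seq[j]) in allowed else 0
            best = max(gamma[i + 1][j],               # i unpaired
                       gamma[i][j - 1],               # j unpaired
                       gamma[i + 1][j - 1] + delta)   # i,j pair
            for k in range(i + 1, j):                 # bifurcation
                best = max(best, gamma[i][k] + gamma[k + 1][j])
            gamma[i][j] = best
    return gamma


gamma = nussinov_matrix("GGGAAAUCC")
print(gamma[0][-1])  # maximal number of pairs for the whole sequence
```

The three nested loops over i, j and k make the fill cubic in the sequence length.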
RNA Secondary Structure Evolution
From Durbin et al. (1998) Biological Sequence Analysis
Inference about hidden structure
[Figure: RNA hairpin drawing with parts labelled Observable and Unobservable.]
$$P(\text{Sequence}, \text{Structure}) = P(\text{Sequence} \mid \text{Structure})\,P(\text{Structure}) = P(\text{Structure} \mid \text{Sequence})\,P(\text{Sequence})$$
Goldman, Thorne & Jones, 96
Knudsen & Hein, 99
Pedersen & Hein, 03
Goldman, Thorne & Jones: ”Structure” + ”Evolution”
1 A S D F G H J K L P
2 A S D F G H J K L P
3 D S D F G K J K L C
4 D S D F G K J K L C
HMM x x x x x x x L x x x
[Figure: tree relating sequences 1 to 4.]
Three Questions
What is the probability of the data?
What is the most probable ”hidden” configuration?
What is the probability of specific ”hidden” state?
Training: Given a set of instances, find parameters making
them probable if they were independent.
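[Figure: observations O1 … O10 plotted against the hidden states H1, H2, H3.]
$$P_{O_5}^{H_5=2} = P(O_5 \mid H_5 = 2) \sum_{H_4 = j} P_{O_4}^{H_4=j}\; p_{j,i} \qquad (\text{here } i = 2)$$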
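A minimal sketch of the recursion that answers the first question (probability of the data). The two-state parameters INIT, TRANS and EMIT are made-up toy values, not taken from the slides.

```python
# Toy parameters (assumed for illustration only).
INIT = [0.5, 0.5]                   # P(H1 = h)
TRANS = [[0.9, 0.1], [0.2, 0.8]]    # TRANS[j][i] = p_{j,i}
EMIT = [[0.8, 0.2], [0.3, 0.7]]     # EMIT[h][o] = P(O = o | H = h)


def forward(obs, init, trans, emit):
    """Return P(O1..OL), summing over all hidden configurations."""
    k = len(init)
    f = [init[h] * emit[h][obs[0]] for h in range(k)]
    for o in obs[1:]:
        # New value for state i: emission times the sum over previous states j.
        f = [emit[i][o] * sum(f[j] * trans[j][i] for j in range(k))
             for i in range(k)]
    return sum(f)


print(forward([0, 0, 1, 1, 1], INIT, TRANS, EMIT))
```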
The Basic Calculations
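What is the most probable ”hidden” configuration?
[Figure: observations O1 … O10 against hidden states H1, H2, H3, with one path through the hidden states highlighted.]
What is the probability of specific ”hidden” state?
[Figure: observations O1 … O10 against hidden states H1, H2, H3, with a single hidden state highlighted.]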
The time required for these calculations is proportional to $K^2 L$, where K is the number of hidden states and L the length of the sequence.
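A minimal sketch of the most-probable-configuration calculation (the Viterbi algorithm), again with made-up two-state toy parameters. Each of the L positions does a K by K scan over pairs of states.

```python
# Toy parameters (assumed for illustration only).
INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.2, 0.8]]   # TRANS[j][i]: probability of moving j -> i
EMIT = [[0.8, 0.2], [0.3, 0.7]]    # EMIT[h][o]: probability of o given state h


def viterbi(obs, init, trans, emit):
    """Return (most probable hidden path, its joint probability)."""
    k = len(init)
    score = [init[h] * emit[h][obs[0]] for h in range(k)]
    back = []
    for o in obs[1:]:
        prev = score
        # Best predecessor j for each state i, then the updated best score.
        ptr = [max(range(k), key=lambda j: prev[j] * trans[j][i])
               for i in range(k)]
        score = [emit[i][o] * prev[ptr[i]] * trans[ptr[i]][i]
                 for i in range(k)]
        back.append(ptr)
    last = max(range(k), key=lambda h: score[h])
    path = [last]
    for ptr in reversed(back):      # follow the pointers backwards
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


print(viterbi([0, 0, 1, 1, 1], INIT, TRANS, EMIT))
```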
Empirical Doublet Models
Partial Doublet Model
        AU      UA      GC      CG      UG      GU
AU    -1.16    .18     .5      .12     .02     .27
UA     .18   -1.16     .12     .5      .27     .02
GC     .33     .08    -.82     .13     .02     .23
CG     .08     .33     .13    -.82     .23     .02
UG     .08    1.00     .1     1.26   -2.56     .04
GU    1.00     .08    1.26     .1      .04   -2.56
Singlet/Marginalized Doublet Model
        A              C              G              U
A   -.75/-1.15      .16/.13        .32/.79        .26/.23
C    .4/.09       -1.57/-.84       .24/.16        .93/.59
G    .55/.45        .17/.13       -.96/-.7        .24/.11
U    .35/.18        .51/.70        .19/.16      -1.05/-1.03
Alignment of slowly N related molecules – L long
AUUGCAUUCCAAUUGCAUUCCA
AUUGCAUUCCAAUUGCAUUCCA
AUUGCAUUCCAAUUGCAUUCCA
AUUGCAUUCCAAUUGCAUUCCA
$r_{N_1,N_2} = \#(N_1 \to N_2,\, N_2 \to N_1) \,/\, [N_{P/U}(N_{P/U}-1)/2]$, for $N_1 \neq N_2$,
where $N_{P/U}$ is the number of paired/unpaired positions in the alignment.
$r'_{N_1,N_2} = \#N_1 \cdot r_{N_1,N_2} \,/\, \#N_2$
Doublet Evolution
From Bjarne Knudsen
Structure Dependent Evolution: RNA
[Figure: the sequence U A C A C C G U shown as aligned rows and as a hairpin, with phylogenies over the numbered positions 1 … 7.]
$$\frac{P(\text{History}_{i,j} \mid \text{Paired})}{P(\text{History}_{i,j} \mid \text{Unpaired})}$$
Grammars: Finite Set of Rules for Generating Strings
i. A starting symbol.
ii. A set of substitution rules applied to variables in the present string.
A string is finished when it contains no variables.
Classes of grammars: Regular, Context Free, Context Sensitive, General (also erasing).
Chomsky Linguistic Hierarchy
Source: Biological Sequence Analysis
W: nonterminal sign; a: any terminal sign; α and γ are strings of signs, possibly empty; β is a string that is not the null string. ε: empty string.
Regular Grammars: W --> aW’, W --> a
Context-Free Grammars: W --> β
Context-Sensitive Grammars: α1Wα2 --> α1βα2
Unrestricted Grammars: α1Wα2 --> γ
The grammars are listed in order of increasing generative power. For instance, context-free grammars can generate every language that regular grammars can, plus some that regular grammars cannot.
Simple String Generators
Non-Terminals (capital) --- Terminals (small)
i. Start with S. S --> aT | bS, T --> aS | bT | ε
One sentence – odd # of a’s: S -> aT -> aaS -> aabS -> aabaT -> aaba
ii. S --> aSa | bSb | aa | bb
One sentence (even-length palindromes): S --> aSa --> abSba --> abaaba
Stochastic Grammars
The grammars above classify every string as belonging to the language or not.
Every variable has a finite set of substitution rules. Assigning a probability to each rule assigns probabilities to the strings in the language.
If a string has a unique derivation, its probability is the product of the probabilities of the applied rules.
i. Start with S. S --> (0.3)aT | (0.7)bS, T --> (0.2)aS | (0.4)bT | (0.2)ε
S -> aT -> aaS -> aabS -> aabaT -> aaba: 0.3 * 0.2 * 0.7 * 0.3 * 0.2 = 0.00252
ii. S --> (0.3)aSa | (0.5)bSb | (0.1)aa | (0.1)bb
S -> aSa -> abSba -> abaaba: 0.3 * 0.5 * 0.1 = 0.015
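A minimal sketch of the product rule for a derivation. The rule tables copy the probabilities as they appear on the slide (they are not renormalised), and the empty right-hand side for T is an assumption read off the derivation that ends with aabaT -> aaba.

```python
# (variable, right-hand side) -> probability, as given on the slide.
GRAMMAR_I = {("S", "aT"): 0.3, ("S", "bS"): 0.7,
             ("T", "aS"): 0.2, ("T", "bT"): 0.4, ("T", ""): 0.2}
GRAMMAR_II = {("S", "aSa"): 0.3, ("S", "bSb"): 0.5,
              ("S", "aa"): 0.1, ("S", "bb"): 0.1}


def derive(rules, steps, start="S"):
    """Apply each (variable, rhs) step to the leftmost occurrence of the
    variable; return the final string and the product of rule probabilities."""
    string, prob = start, 1.0
    for var, rhs in steps:
        if var not in string:
            raise ValueError(f"variable {var!r} not present in {string!r}")
        string = string.replace(var, rhs, 1)
        prob *= rules[(var, rhs)]
    return string, prob


steps_i = [("S", "aT"), ("T", "aS"), ("S", "bS"), ("S", "aT"), ("T", "")]
steps_ii = [("S", "aSa"), ("S", "bSb"), ("S", "aa")]
print(derive(GRAMMAR_I, steps_i))
print(derive(GRAMMAR_II, steps_ii))
```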
Secondary Structure Generators
S --> LS | L       (.869, .131)
F --> dFd | LS     (.788, .212)
L --> s | dFd      (.895, .105)
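A minimal sketch of how this grammar generates structures, assuming s stands for an unpaired position (written ".") and each d … d for a base pair (written "(" and ")"). The length cap with restart is a practical guard added here, so long structures are under-represented relative to the grammar itself.

```python
import random

# Rule probabilities copied from the grammar above.
RULES = {
    "S": [(0.869, ["L", "S"]), (0.131, ["L"])],
    "F": [(0.788, ["(", "F", ")"]), (0.212, ["L", "S"])],
    "L": [(0.895, ["."]), (0.105, ["(", "F", ")"])],
}


def sample_structure(rng, max_len=2000):
    """Expand S left to right into a dot-bracket string."""
    while True:
        out, stack = [], ["S"]
        while stack and len(out) <= max_len:
            sym = stack.pop()
            if sym not in RULES:
                out.append(sym)          # terminal symbol
                continue
            r, acc = rng.random(), 0.0
            for prob, rhs in RULES[sym]:
                acc += prob
                if r < acc:
                    break
            stack.extend(reversed(rhs))  # leftmost symbol is expanded next
        if not stack:                    # finished: no variables left
            return "".join(out)


print(sample_structure(random.Random(1)))
```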
SCFG Analogue to HMM calculations (Durbin et al,1998)
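[Figure: nonterminal W spanning positions i … j of the sequence 1 … L, split into WL and WR at inner positions i’ and j’.]
The time required for these calculations is proportional to $K^2 L^3$, where K is the number of hidden states and L the length of the sequence.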
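The SCFG counterpart of the HMM probability-of-the-data calculation is the inside recursion. A sketch in the notation of Durbin et al., assuming a grammar in Chomsky normal form with rules $W_v \to W_y W_z$ (probability $t_v(y,z)$) and $W_v \to a$ (probability $e_v(a)$); these symbols are assumptions and do not appear on the slide.

```latex
\alpha(i,i,v) = e_v(x_i)
\qquad
\alpha(i,j,v) = \sum_{y}\sum_{z}\sum_{k=i}^{j-1}
    \alpha(i,k,y)\,\alpha(k+1,j,z)\,t_v(y,z)
\qquad
P(x) = \alpha(1,L,1)
```

The three position indices i, k and j are what give the $L^3$ factor in the running time.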
What is the probability of the data?
What is the most probable ”hidden” configuration?
What is the probability of specific ”hidden” state?
RNA Secondary Structure
Knudsen & Hein, 03
From Knudsen & Hein (1999)
1. Accuracy as certainty threshold is increased.
2. Accuracy as function of sequence number:
RNA Secondary Structure
Knudsen & Hein, 03
Observing Evolution has 2 parts
P(x):
P(Further history of x):
[Figure: RNA hairpin drawing with position x marked.]
RNA Structure Prediction and Alignment
Sankoff, 1985 Combined RNA secondary structure & alignment
Gorodkin 1997 Foldalign – only hairpins
2002 Dynalign
Perriquet 2002 Carnac
Can only align molecules of same type.
RNA Structure Representations
From Fontana, 2003
Moulton et al.,2002
Mountains
Circle with chords
Ordered Tree
Balanced Nested Parentheses
Full Description
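A minimal sketch of moving between two of these representations, assuming the parenthesis form uses "(", ")" and "." with 1-based positions. The mountain height here is defined as the number of open pairs after reading each position, which is one of several conventions in use.

```python
def pairs_from_parentheses(structure):
    """Balanced nested parentheses -> sorted list of (i, j) base pairs."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '('")
    return sorted(pairs)


def mountain(structure):
    """Balanced nested parentheses -> list of mountain heights."""
    heights, h = [], 0
    for ch in structure:
        h += (ch == "(") - (ch == ")")
        heights.append(h)
    return heights


example = ".(((..)))"   # a three-pair hairpin on nine positions
print(pairs_from_parentheses(example))
print(mountain(example))
```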
RNA Structure Evolution
Insertion-deletion process of
Doublets
Singlets
There are methods of tree alignment that could probably be extended to statistical tree alignment.
Metrics on RNA Structures
Moulton, 2000
Base Pair Metrics
Tree Metrics
Mountain Metrics
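A minimal sketch of a base pair metric, assuming it is taken as the number of base pairs present in exactly one of the two structures (the size of the symmetric difference of the pair sets). The slide does not spell out the definition, so treat this as one common choice rather than the definition used in the cited work.

```python
def pair_set(structure):
    """Dot-bracket string -> set of (i, j) base pairs, 1-based."""
    stack, pairs = [], set()
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def base_pair_distance(a, b):
    """Number of pairs that occur in one structure but not in the other."""
    return len(pair_set(a) ^ pair_set(b))


print(base_pair_distance("((..))", "(....)"))
print(base_pair_distance("(..)..", "..(..)"))
```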
Population Genetics of Coupled Mutations
W. Stephan, 96 & P. Higgs, 98
Possible separation of long term and short term evolution
Creation of Linkage Disequilibrium of paired sites.
Singlet -> Doublet Models
Kirby et al., 95, Tillier et al., 98, Savill et al., 01
Jukes-Cantor with bias toward base pairing:
$$R_{i,j} = \begin{cases} 1/4, & \text{1 difference, pairing gained} \\ 1/4, & \text{1 difference, pairing unchanged} \\ 1/4, & \text{1 difference, pairing lost} \\ 0, & \text{2 differences} \end{cases}$$
Contagious Dependencies: Overlapping Reading Frames & CG frequencies
Pedersen & Jensen, 01
n n n n n n n n n n n
Doublet -> Tetraplet Models
Nerman & Durbin at B. Knudsen’s exam 02
[Figure: diagram with nucleotides labelled N1, N2, N4.]
In principle a $4^4 \times 4^4$ matrix (65,536 entries!) is needed, but proper parametrisation and symmetries could reduce this substantially.
Stacking:
RNA + Protein Structure Dependent Molecular Evolution
Singlet
Straightforward, no interference from RNA level.
Doublets
What seems to be needed is a parametrisation of how base pairing creates departure from an independent singlet-singlet model.
RNA Folding
Molecular Dynamics of RNA Structures
RNA Structure – Sequence Landscapes
RNA Homology Modelling & Threading
RNA Gene Finding
Close to Optimal Structures
Constraint Satisfaction Modelling
Miscellaneous Topics
Literature & www-sites
Eddy, S. Non-coding RNA genes and the modern RNA world.Nat Rev Genet. 2001 Dec;2(12):919-29. Review.
Eddy, S. “Computational genomics of noncoding RNA genes” Cell. 2002 Apr 19;109(2):137-40. Review.
Fontana (2002) Modelling “evo-devo” with RNA. BioEssays 24.12.1164-77
Knudsen, B. and J.J.Hein (2003) “Practical RNA Folding” (In Press, RNA)
Knudsen, B. and J.J.Hein (1999) “Using stochastic context free grammars and molecular evolution to predict RNA secondary structure” (Bioinformatics 15.6.446-454)
Moore (1999) Structural Motifs in RNA Ann.Rev.Biochem. 68.287-300.
Moulton et al. (2000) Metrics on RNA Secondary Structures J.Compu.Biol. 7.1/2.277-
Perriquet et al.(2003) Finding the common homologous structure shared by two homologous RNAs. Bioinformatics 19.1.108-116.
http://www.imb-jena.de/RNA.html
http://scor.lbl.gov/index.html
http://www.rnabase.org/metaanalysis/