
17.02.2005

Fast and effective prediction of miRNA targets

Marc Rehmsmeier
CeBiTec, Bielefeld University, Germany
Junior Research Group Bioinformatics of Regulation

Small interfering RNAs versus small temporal RNAs

Hannon. Nature. 418:244-251, 2002.

miRNA/target duplexes

Grosshans and Slack. The Journal of Cell Biology, 156(1):17-21, 2002.

A direct approach

Given a miRNA and a potential target: What are the energetically most favourable binding sites?

Calculation of multiple minimum free energy (mfe) duplex structures between miRNA and target

The language of RNA duplexes

hybrid = nil ><< tt (region, region) ||| unpaired_left_top ||| closed ... h

unpaired_left_top = ult <<< tt (base, empty) ~~~ unpaired_left_top ||| unpaired_left_bot ... h

unpaired_left_bot = ulb <<< tt (empty, base) ~~~ unpaired_left_bot ||| edangle ... h

edangle = eds <<< tt (base, base) ~~~ closed ||| edt <<< tt (base, emptybase) ~~~ closed ||| edb <<< tt (emptybase, base) ~~~ closed ... h

closed = stacking_region ||| bulge_top ||| bulge_bottom ||| internal_loop ||| end_loop ... h

stacking_region = sr <<< basepair ~~~ closed

bulge_top = (bt <<< basepair ~~~ tt (uregion, empty)) `topbound` closed

bulge_bottom = (bb <<< basepair ~~~ tt (empty, uregion)) `botbound` closed

internal_loop = (il <<< basepair ~~~ tt (uregion, uregion)) `symbound` closed

end_loop = el <<< basepair ~~~ tt (region, region)


Dynamic Programming recurrences

Time/memory complexity: linear in target length
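The recurrences themselves are not reproduced here. The complexity claim can still be illustrated with a toy dynamic program, sketched in Haskell to match the grammar above.

This is not RNAhybrid's energy model. The pair energies and the mismatch and gap penalties are hypothetical values. The recurrence is a Smith-Waterman-style local alignment of the miRNA against the reversed target, not nearest-neighbour loop energies.

Each target base adds one row of m + 1 cells, so for a miRNA of fixed length m the run time grows linearly with the target length n. Because only the previous row is kept, the sketch returns the best energy but no traceback.

The code loads in GHCi. The names bestDuplexEnergy, pairEnergy, score, mismatchPenalty and gapPenalty are our own.

```haskell
import Data.List (foldl')

-- Hypothetical toy pair energies (NOT Turner nearest-neighbour parameters).
pairEnergy :: Char -> Char -> Maybe Double
pairEnergy a b = case (a, b) of
  ('G', 'C') -> Just (-3)
  ('C', 'G') -> Just (-3)
  ('A', 'U') -> Just (-2)
  ('U', 'A') -> Just (-2)
  ('G', 'U') -> Just (-1)
  ('U', 'G') -> Just (-1)
  _          -> Nothing

-- Hypothetical penalties: two facing bases that cannot pair (mismatch),
-- and a base left unpaired on one strand only (bulge-like gap).
mismatchPenalty, gapPenalty :: Double
mismatchPenalty = 1.5
gapPenalty = 2

-- Score of target base t facing miRNA base q.
score :: Char -> Char -> Double
score t q = maybe mismatchPenalty id (pairEnergy t q)

-- Lowest toy energy of a local miRNA/target duplex (both strings given 5'->3').
-- The target is read 3'->5' so that both strands advance together; each
-- target base adds one DP row of (length mirna + 1) cells, and only the
-- previous row is kept, so no traceback is possible.
bestDuplexEnergy :: String -> String -> Double
bestDuplexEnergy mirna target = snd (foldl' step (initRow, 0) (reverse target))
  where
    initRow = replicate (length mirna + 1) 0
    step (prev, best) t =
      let cur = scanl (cell t) 0 (zip3 mirna prev (tail prev))
          best' = min best (minimum cur)
      in best' `seq` (cur, best') -- forcing best' evaluates the whole row
    -- left: previous cell of this row; diag, up: cells of the previous row
    cell t left (q, diag, up) =
      minimum [0, diag + score t q, up + gapPenalty, left + gapPenalty]
```

For example, `bestDuplexEnergy "GGGG" "CCCC"` pairs all four bases and evaluates to -12.0. The talk computes multiple binding sites per target, whereas this sketch returns only the single best one.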

let-7/lin-41 binding sites

position: 688, mfe: -28.0 kcal/mol

position: 737, mfe: -29.0 kcal/mol

Requirements

For prediction of miRNA targets in large databases we need:

• A fast program

• Good statistics

Length normalisation of minimum free energies

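$$e_n = \frac{e}{\log(mn)}$$

with $e$ the minimum free energy of a duplex and $m$, $n$ the lengths of miRNA and target.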
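In code, the normalisation is a single division. This minimal sketch assumes the natural logarithm. The slide does not fix the base, and another base only rescales all values by the same constant. It also assumes that m and n are the miRNA and target lengths. The function name is our own.

```haskell
-- Length-normalised minimum free energy e_n = e / log(m * n).
-- Assumptions: natural logarithm; m = miRNA length, n = target length; m * n > 1.
normalisedEnergy :: Double -> Int -> Int -> Double
normalisedEnergy e m n = e / log (fromIntegral (m * n))
```

The same raw mfe therefore gives a normalised energy closer to zero on a long target than on a short one. This offsets the better binding sites a long target offers purely by chance.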

p-values of individual binding sites
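The slide does not spell out how single-site p-values are obtained. One common approach, assumed here, works in three steps:

• fit an extreme value (Gumbel) distribution to the length-normalised energies of duplexes with random targets
• work with x = -e_n, so that larger values mean better binding
• read each site's p-value off the upper tail of the fitted distribution

The EvdFit type and its loc/scale parameters are hypothetical placeholders for such a fit.

```haskell
import Numeric (expm1)

-- Hypothetical Gumbel (extreme value) fit to the normalised energies of
-- duplexes with random targets; loc and scale are fit parameters.
data EvdFit = EvdFit { loc :: Double, scale :: Double }

-- p-value of a site with normalised energy en (more negative is better).
-- With x = -en: P[X >= x] = 1 - exp(-exp(-(x - loc) / scale)),
-- evaluated as -expm1(-z) to stay accurate for very small p-values.
sitePValue :: EvdFit -> Double -> Double
sitePValue fit en = negate (expm1 (negate z))
  where
    x = negate en
    z = exp (negate ((x - loc fit) / scale fit))
```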

Poisson statistics of multiple binding sites

Probability of k binding sites:

$$P[N = k] = \frac{E[N]^k}{k!}\, e^{-E[N]}$$

with

$$E[N] = -\ln(1 - p)$$

For small p-values:

$$E[N] \approx p$$

The probability of at least k binding sites:

$$P[N \ge k] = 1 - \sum_{i=0}^{k-1} P[N = i]$$
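The formulas above translate directly into code. This sketch computes E[N] = -ln(1 - p) from a single-site p-value p and returns P[N >= k]. The function names are our own.

It sums the upper tail term by term rather than evaluating 1 - Σ_{i<k} P[N = i]. The two are the same quantity, but the tail sum avoids cancellation when the result is tiny. Truncating after 200 terms is adequate while E[N] stays small, as it does for any useful p < 1.

```haskell
import Numeric (log1p)

-- E[N] = -ln(1 - p) for a single-site p-value p (log1p keeps small p accurate).
expectedHits :: Double -> Double
expectedHits p = negate (log1p (negate p))

-- P[N = k] = E[N]^k / k! * exp(-E[N])
poissonPmf :: Double -> Int -> Double
poissonPmf lam k = lam ^ k / product (map fromIntegral [1 .. k]) * exp (negate lam)

-- P[N >= k], summed over the upper tail i = k, k+1, ... (200 terms) instead
-- of 1 - sum_{i<k} P[N = i]; same value, no cancellation for tiny results.
poissonAtLeast :: Double -> Int -> Double
poissonAtLeast p k
  | k <= 0 = 1
  | otherwise = sum (take 200 terms)
  where
    lam = expectedHits p
    -- P[N = i] = P[N = i - 1] * lam / i
    terms = scanl (\t i -> t * lam / fromIntegral i) (poissonPmf lam k) [k + 1 ..]
```

With k = 1 the function gives back p itself, since 1 - e^{-E[N]} = p. This is exactly how E[N] was defined.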

Comparative analysis of orthologous targets

Multi-species p-values

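Poisson p-values $p_1$, $p_2$, $p_3$, one per species

Multi-species p-value, assuming independent species:

$$P[P_1 \le p_{\max}, P_2 \le p_{\max}, P_3 \le p_{\max}] = p_{\max}^3, \qquad p_{\max} = \max\{p_1, \dots, p_3\}$$

General case: k species

$$P[P_1 \le p_{\max}, \dots, P_k \le p_{\max}] = p_{\max}^k, \qquad p_{\max} = \max\{p_1, \dots, p_k\}$$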
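Under the independence assumption, the multi-species p-value is simply the largest per-species Poisson p-value raised to the number of species. Every species must do at least as well as the worst one. A minimal sketch follows; the function name is our own.

```haskell
-- Multi-species p-value from one Poisson p-value per species, assuming the
-- species are independent: P[P_1 <= p_max, ..., P_k <= p_max] = p_max ^ k.
multiSpeciesPValue :: [Double] -> Maybe Double
multiSpeciesPValue [] = Nothing
multiSpeciesPValue ps = Just (maximum ps ^ length ps)
```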

A dependence problem

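We should see a p-value as often as it says (blue curve), but we don't (red curve): a multi-species p-value of at most x should occur in a fraction x of random cases, but orthologous sequences are not independent.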

let-7b/NME4 (human/mouse) binding sites

-GGCTCAAGCTGCCCTTACCACCCCATCCCCCACGCAGGACCAACTACCTCCGTCAGCAAGAACCCAAGCCCACATCCAAACCTGCCTGTCCCAAACCAC

GGGCTTGCACTGCCTTCTGCACTTCAGGTCT-ACCCATGACCTACTACCTCTGTCAACAAGAAGTCAAGCCCCCATGC---TTCCCATGTCCCCAAAC--

**** ***** * *** ** * ** ** **** ******** **** ****** ******* *** * * ****** ** *

TTACTTCCCTGTTCACCTCTGCCCCACCCCAGCCCAGAGGAGTTTGAGCCACCAACTTCAGTGCCTTTCTGTACCCCAAGCCAGCACAAGATTGGACCAA

-CACTCCCTACTCCCGCTCTACCCAACTCCAGCCCAGGGGAGTCTAAGCCTCAACTCTATGTGCCTTTTTGTATCCTAAGTCAATACAATATTGGACCAT

*** ** * * **** *** ** ********* ***** * **** * * * ******** **** ** *** ** **** *********

TCCTTTTTGCACCAAAGTGCCGGACAACCTTTGTGGTGGGGGGGGGTCTTCACATTATCATAACCTCTCCTCTAAAGGGGAGGCATTAAAATTCACTGTG

GTCCTTGTGTACAAAAGTGCCAGACAACCTTTG--------GGGCATTGTCA-AAGGTGACTTCACCTGCCTCAAAGGAGAGACATTAAAATTT--TATG

* ** ** ** ******** *********** *** * *** * * * * ** * ***** *** ********** * **

CCCAGCACATGGGTGGTACACTAATTATGACTTCCCCCAGCTCTGAGGTAGAAATGACGCCTTTATGCAAGTTGTAAGGAGTTGAACAGTAAAGAGGAAG

CTTAAAAT--------------------------------------------------------------------------------------------

* * *

Multi-species p-value with k = 1.1: 5.0e-05

Multi-species p-value with k = 2: 1.5e-08

k = 1.1 is the effective k
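The two values are consistent with each other. Assuming (our assumption) that both come from the same $p_{\max}$ for the human/mouse pair, a quick check:

```latex
\begin{aligned}
p_{\max}^{2} &= 1.5\times10^{-8} \;\Rightarrow\; p_{\max} = \sqrt{1.5\times10^{-8}} \approx 1.22\times10^{-4},\\
p_{\max}^{1.1} &= \exp\bigl(1.1\,\ln p_{\max}\bigr) \approx \exp\bigl(1.1\times(-9.008)\bigr) \approx \exp(-9.91) \approx 5.0\times10^{-5}.
\end{aligned}
```

With 1.1 effective species instead of 2, the multi-species p-value is roughly 3,300 times larger than the naive value.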

Effective number of orthologous targets

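$$k_{\text{eff}} = \arg\min_{k'} \sum_{(x,y)} \bigl(y - F_{k'}(x)\bigr)^2$$

where the $(x, y)$ are points of the observed (red) curve and $F_{k'}$ is the curve expected for $k'$ independent species.

$$P[P_1 \le p_{\max}, P_2 \le p_{\max}] = p_{\max}^{k_{\text{eff}}}, \qquad p_{\max} = \max\{p_1, p_2\}, \qquad 1 \le k_{\text{eff}} \le k$$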
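The fitting step can be sketched as a least-squares grid search over candidate values k'.

The model curve F_{k'}(x) = x^{k'/k} used below is our own assumption. It is what one gets if p_max behaves like k' independent species while the p-value is still computed with the nominal k. The talk does not state the form of F_{k'}.

The names modelCurve, fitKeff and multiSpeciesPValueEff are hypothetical.

```haskell
-- Hypothetical expected curve for k' effective species when p-values are
-- computed as p_max ^ k with k nominal species: if P[p_max <= t] = t ^ k',
-- then P[p_max ^ k <= x] = x ** (k' / k). An assumption, not from the talk.
modelCurve :: Double -> Double -> Double -> Double
modelCurve k k' x = x ** (k' / k)

-- Least-squares grid search: the candidate k' whose curve F_{k'} lies closest
-- to the observed points (x, y). The candidate list must be non-empty.
fitKeff :: (Double -> Double -> Double) -> [Double] -> [(Double, Double)] -> Double
fitKeff f candidates points = snd (minimum [(sse k', k') | k' <- candidates])
  where
    sse k' = sum [(y - f k' x) ^ (2 :: Int) | (x, y) <- points]

-- Multi-species p-value with the effective number of species as exponent
-- (ps must be non-empty).
multiSpeciesPValueEff :: Double -> [Double] -> Double
multiSpeciesPValueEff kEff ps = maximum ps ** kEff
```

In practice, the observed points would come from the red curve of the dependence plot, that is, from orthologous pairs of random or shuffled targets. The grid of candidates is an arbitrary choice.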


True and false positives and negatives

                         Positives   Negatives
Classified as positive   TP          FP
Classified as negative   FN          TN

$$\text{Sens} = \frac{TP}{TP + FN}$$

$$\text{Sel} = 1 - \frac{FP}{TP + FP} = \frac{TP}{TP + FP}$$

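Sensitivity and specificity

p-values control specificity: a p-value cutoff bounds the false positive rate $FP/(FP + TN) = 1 - \text{Spec}$.

$$\text{Spec} = \frac{TN}{TN + FP}$$

Sens and Sel are as defined above.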
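The three measures follow directly from the confusion-matrix counts. The record and function names are our own.

The expectedFalsePositives helper illustrates why a p-value cutoff α controls specificity. It relies on the standard assumption that p-values are uniformly distributed for true negatives, so about α of them fall below the cutoff.

```haskell
-- Confusion-matrix counts.
data Counts = Counts { tp, fp, tn, fn :: Int }

ratio :: Int -> Int -> Double
ratio a b = fromIntegral a / fromIntegral b

-- Sens = TP / (TP + FN), Sel = TP / (TP + FP), Spec = TN / (TN + FP)
sensitivity, selectivity, specificity :: Counts -> Double
sensitivity c = ratio (tp c) (tp c + fn c)
selectivity c = ratio (tp c) (tp c + fp c)
specificity c = ratio (tn c) (tn c + fp c)

-- With p-values uniform for true negatives, a cutoff alpha lets through about
-- alpha * negatives false positives, so 1 - Spec is approximately alpha.
expectedFalsePositives :: Double -> Int -> Double
expectedFalsePositives alpha negatives = alpha * fromIntegral negatives
```

Specificity is therefore set by the user's choice of cutoff. Selectivity additionally depends on how many true targets exist, which is unknown in advance.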

RNAhybrid target prediction workflow

• input: target database and miRNA registry
• individual p-values
• Poisson p-values
• multi-species p-values

bantam

(#sites per species: Dm = D. melanogaster, Dp = D. pseudoobscura, Ag = A. gambiae)

target gene   E-value       #sites Dm   #sites Dp   #sites Ag
CG13906       0.000141369   2           1           1
CG3629        0.029351532   2           2           0
CG17136       0.047489474   2           0           1
CG5123        0.048580874   2           2           0
CG13761       0.120263377   0           2           2
CG11624       0.605310610   0           3           0
CG1142        0.677123716   0           0           1
CG13333       0.714171923   2           0           0

Prediction of Drosophila miRNA targets

• 78 miRNAs

• 28,645 3' UTRs (one third each from D. melanogaster, D. pseudoobscura and A. gambiae)

Bantam hits

target              E-value   #sites Dm   #sites Dp   #sites Ag
Wrinkled (Hid)      0.049     2           2           0
Distal-less         0.029     2           2           0
nervous fingers 1   0.00014   2           1           1

miR-7 hits

target                         E-value    #sites Dm   #sites Dp   #sites Ag
CG8394                         0.000095   2           3           3
Twin of m4                     0.00014    2           2           0
E(spl) region transcript m3    0.0083     1           1           0
E(spl) region transcript m     0.094      1           2           0
CG7342                         0.21       1           1           0
CG10444                        0.27       1           1           1
Him                            0.30       1           2           0
CG11132                        0.86       1           1           0
Arginine methyltransferase 1   0.87       1           1           0

miR-2 hits

          miR-2a             miR-2b             miR-2c
target    E-value  #sites    E-value  #sites    E-value  #sites
sickle    0.054    2 2 0
reaper    0.00061  1 1 0     0.11     1 1 0     0.0095   1 1 0
grim      0.014    1 1 0     0.071    1 2 0     0.045    1 1 0

plus a number of others

RNAhybrid functionality

• length normalisation
• Poisson statistics
• web server
• seed/loop constraints
• miRNA-specific statistics
• effective k
• comparative analysis
• multiple binding sites

miRNA target selection

• rank based
• RNAhybrid: p-values and E-values (surprise, user guidance)

p-values indicate not only biochemical possibility, but also biological function.

Acknowledgements

• Peter Steffen, Robert Giegerich, Jan Krüger

• Matthias Höchsmann

• Alexander Stark, Julius Brennecke, Stephen M. Cohen

• Sven Rahmann

• Gregor Obernosterer

• Robert Heinen

• Leonie Ringrose

References

Rehmsmeier M, Steffen P, Höchsmann M and Giegerich R. Fast and effective prediction of microRNA/target duplexes. RNA, 10:1507-1517, 2004.

bibiserv.techfak.uni-bielefeld.de/rnahybrid