
RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520.

Date post: 24-Dec-2015

RNA-Seq and RNA Structure Prediction

Xiaole Shirley Liu

STAT115, STAT215, BIO298, BIST520


Outline

• RNA-seq

– Experiments

– Analysis: read mapping, expression index, isoform inference, differential expression

• RNA structure prediction

– Covariance model

– Base-pair maximization

– Free energy method


RNA-seq

Mortazavi et al., Nat Meth 2008

RNA Fragmentation Has Less 3’ Bias

Wang et al. 2009

RNA-Seq: Alternative to Microarrays

• General expression profiling

• Novel genes

• Alternative splicing

• Detect gene fusion

• Can use on any sequenced genome

• Better dynamic range

• Cleaner and more informative data

• Data analysis challenges

Mapping

• Mapping reads with Bowtie or Maq identifies transcribed known or novel exons

• Longer (e.g. 100 bp) paired-end libraries are better

Transcript Abundances

• More reads mapped to longer genes

• More reads mapped if sequencing is deep

• RPKM: reads per kilobase of transcript per million mapped reads; 1 RPKM ~ 0.3-1 transcript per cell

• Low technical noise (Poisson distribution) but high biological noise (overdispersion, negative binomial)
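The RPKM definition above corrects for the two effects named on this slide: transcript length and sequencing depth. A minimal Python sketch follows. The function name `rpkm` and the example numbers are illustrative assumptions, not values from the lecture.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    kilobases = transcript_length_bp / 1_000        # correct for transcript length
    millions = total_mapped_reads / 1_000_000       # correct for sequencing depth
    return read_count / (kilobases * millions)


# Hypothetical example: 500 reads on a 2,000 bp transcript
# in a library with 10 million mapped reads.
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```

A transcript twice as long with twice as many reads gets the same RPKM, which is the point of the length correction.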


Different Alternative Splicing


Isoform Inference

• If given known set of isoforms

• Estimate x to maximize the likelihood of observing n
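The transcript does not preserve the likelihood shown on this slide, so the following is only a sketch under a stated assumption: the read count n_j in exon j is Poisson with mean sum_i a[j][i] * x[i], where x[i] is the abundance of isoform i and a[j][i] is a known weight (zero if isoform i skips exon j). The function name, the matrix `a`, and the toy counts are all hypothetical. The loop is the standard EM-style multiplicative update for a Poisson linear model.

```python
def estimate_isoform_abundance(a, n, iterations=500):
    """Maximize the Poisson likelihood of exon counts n over abundances x.

    a[j][i]: assumed weight of isoform i in exon j (0 if the exon is skipped).
    n[j]:    observed read count in exon j.
    """
    num_exons = len(a)
    num_isoforms = len(a[0])
    x = [1.0] * num_isoforms  # positive starting point
    # Assumes every isoform covers at least one exon, so no column sum is zero.
    col_sums = [sum(a[j][i] for j in range(num_exons)) for i in range(num_isoforms)]
    for _ in range(iterations):
        # Expected count in each exon under the current abundances.
        expected = [
            sum(a[j][i] * x[i] for i in range(num_isoforms))
            for j in range(num_exons)
        ]
        # Rescale each abundance by how well its exons are currently explained.
        x = [
            x[i]
            * sum(a[j][i] * n[j] / expected[j]
                  for j in range(num_exons) if expected[j] > 0)
            / col_sums[i]
            for i in range(num_isoforms)
        ]
    return x


# Hypothetical gene: isoform 0 uses exons 0, 1, 2; isoform 1 skips exon 1.
a = [[1, 1],
     [1, 0],
     [1, 1]]
n = [5, 2, 5]
print(estimate_isoform_abundance(a, n))  # approaches x = [2, 3]
```

Here the skipped exon is what separates the two isoforms: its count pins down isoform 0, and the shared exons then determine isoform 1.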


Known Isoform Abundance Inference


De novo isoform inference


Isoform Inference

• With a known isoform set, gene-level expression can sometimes be inferred well even though individual isoform abundances have large uncertainty (e.g. when the known set is incomplete)

• De novo isoform inference is a non-identifiable problem with current RNA-seq protocols and (short) read lengths (e.g. when the numbers of exons and isoforms are large)

Gene Fusion

• Downregulation of tumor suppressors or upregulation of oncogenes

Maher et al., Nature 2009

A Few Algorithms

• Expression index and isoform inference

– Cufflinks from Steven Salzberg

– rSeq from Wing Wong

– Scripture from Aviv Regev

• Differential expression

– Cufflinks

– DESeq from Wolfgang Huber

– edgeR from Gordon Smyth

– Replicates are still preferred!

• Still need systematic evaluation

Why do we Care?

• RNA (tRNA, rRNA) structure determines function

• Many non-coding RNA genes have special structures, which lead to special functions

– ncRNA genes are covered later

Mostly RNA secondary structure: G-C and A-U pairs, plus G-U wobble pairs

Simple RNA Structures


More Complex Interactions

• Kissing hairpins

• Pseudoknots

• Hairpin-bulge contact


RNA Structure Representations


Covariance Models

• Get related RNA sequences and obtain a multiple sequence alignment

– E.g. orthologous RNA from many species, or a family of RNAs believed to have similar structure and function

– Sequences must be similar enough that they can be initially aligned

• Look at every pair of columns and check for covarying substitutions

– Sequences should be dissimilar enough for covarying substitutions to be detected
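One common way to score covarying substitutions between two alignment columns is mutual information, which the summary slide names for this method. The sketch below is a minimal illustration. The function name and the four-sequence toy alignment are assumptions made up for the example, and gaps are not handled.

```python
from collections import Counter
from math import log2


def mutual_information(column_i, column_j):
    """Mutual information (in bits) between two alignment columns."""
    total = len(column_i)
    freq_i = Counter(column_i)
    freq_j = Counter(column_j)
    freq_ij = Counter(zip(column_i, column_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / total
        p_a = freq_i[a] / total
        p_b = freq_j[b] / total
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 4 always form a complementary pair,
# while columns 1 to 3 are perfectly conserved.
alignment = ["GACUC", "CACUG", "AACUU", "UACUA"]
columns = list(zip(*alignment))
print(mutual_information(columns[0], columns[4]))  # 2.0
print(mutual_information(columns[0], columns[1]))  # 0.0
```

The conserved column scores zero even though it could be paired in the real structure, which is why the sequences need to be dissimilar enough for covariation to show up.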


Base-Pair Maximization

• Find structure with the max # of base pairs

• Efficient dynamic programming solution introduced by Nussinov (1970s)

• Compare a sequence against itself in a dynamic programming matrix

• Since structure folds upon itself, only necessary to calculate half the matrix

• Four rules for scoring the structure at a particular point


Nussinov Algorithm

• Initialization: scores for complementary matches along the main diagonal and the diagonal just below it are set to zero

Nussinov Algorithm

• Fill matrix: M[i][j] = max of the following

– M[i+1][j-1] + S(x_i, x_j)

– M[i+1][j]

– M[i][j-1]

– max over i < k < j of (M[i][k] + M[k+1][j])

Nussinov Algorithm

• Fill diagonal by diagonal (assuming no bulge penalty, which would be analogous to the gap penalty in Smith-Waterman)

Nussinov Algorithm

• Trace back from upper right corner to get the structure
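The initialization, fill, and traceback steps of the Nussinov algorithm can be put together in a short Python sketch. Two choices here are assumptions rather than something the slides state: S(x_i, x_j) is taken to be 1 for G-C, A-U, and G-U pairs and 0 otherwise, and no minimum hairpin loop length is enforced, so adjacent bases may pair. The function name `nussinov` is also illustrative.

```python
def nussinov(seq):
    """Return (max_pairs, dot_bracket) by base-pair maximization."""
    # Assumption: Watson-Crick and G-U wobble pairs score 1, all others 0.
    allowed = {("G", "C"), ("C", "G"), ("A", "U"),
               ("U", "A"), ("G", "U"), ("U", "G")}

    def s(a, b):
        return 1 if (a, b) in allowed else 0

    n = len(seq)
    # Zero-filled matrix: the main diagonal and the diagonal below it stay 0.
    m = [[0] * n for _ in range(n)]

    # Fill diagonal by diagonal, moving away from the main diagonal.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(m[i + 1][j - 1] + s(seq[i], seq[j]),  # i pairs with j
                       m[i + 1][j],                          # i unpaired
                       m[i][j - 1])                          # j unpaired
            for k in range(i + 1, j):                        # bifurcation
                best = max(best, m[i][k] + m[k + 1][j])
            m[i][j] = best

    # Trace back from the upper right corner, m[0][n-1].
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if m[i][j] == m[i + 1][j]:
            stack.append((i + 1, j))
        elif m[i][j] == m[i][j - 1]:
            stack.append((i, j - 1))
        elif s(seq[i], seq[j]) == 1 and m[i][j] == m[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if m[i][j] == m[i][k] + m[k + 1][j]:
                    stack.append((k + 1, j))
                    stack.append((i, k))
                    break

    score = m[0][n - 1] if n > 0 else 0
    return score, "".join(structure)


score, structure = nussinov("GGGAAAUCC")
print(score)      # 3
print(structure)  # one optimal structure in dot-bracket notation
```

Several structures can share the maximum score; the traceback returns whichever one its branch order reaches first. Only the upper half of the matrix is ever filled, as the slides note.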


Free Energy Method

• Mfold: Mathews, JMB 1999

• Predict the correct secondary structure by minimizing the free energy (ΔG)

• Energy: base pairing and base stacking

Energy Factors

• Consecutive base pairing: good

• Internal bulge: bad

• Terminal base pairing: not stable

• Hairpin loops, interior loops, and bulge loops are destabilizing

Energy Minimization

• Assume: the most likely structure is the most stable structure energetically

• Energy associated with any position is only influenced by local sequence and structure

• Does not consider pseudoknot formation

• Solved by dynamic programming

Energy Minimization


Vienna RNA Package

• Vienna RNA web


Summary

• RNA-seq

– Cufflinks: read mapping, expression index, isoform inference, differential expression

– Different technique, analysis, and output for different tasks

– Awaiting RMA of RNA-seq

– Third-generation sequencing might read whole transcripts

• RNA structure prediction methods

– Covariance model: mutual information

– Base-pair maximization: Nussinov

– Free energy method: Mfold, Vienna RNA

– Caution: best is the enemy of the good

