MW  11:00-12:15 in Beckman B302 Prof: Gill Bejerano TAs: Jim Notwell & Harendra Guturu

Post on 23-Feb-2016


http://cs173.stanford.edu [BejeranoWinter12/13] 1

MW 11:00-12:15 in Beckman B302
Prof: Gill Bejerano
TAs: Jim Notwell & Harendra Guturu

CS173

Lecture 6: NON protein coding genes


Announcements
• HW1 due in one week.


TTATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATACATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTCAGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTCCGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTAGCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGATAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAAAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATTGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAATTCTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGGATTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGATTTTGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAATCTTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATGAACGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATCATATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAAAAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCAGCATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACTTTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAA
TAAGAATAGGAGGGAATATCAAGCCAGACAATCTATCATTACATTTAAGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAAGAGTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATACAGCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACAACCAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATCAACACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGTTGGTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCTTCTCTTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATTAATGCTGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATAAAG


“non coding” RNAs (ncRNA)


Central Dogma of Biology:


Active forms of “non coding” RNA

reverse transcription

long non-coding RNA

microRNA

rRNA, snRNA, snoRNA


What is ncRNA?
• Non-coding RNA (ncRNA) is an RNA that functions without being translated to a protein.
• Known roles for ncRNAs:
– RNA catalyzes excision/ligation in introns.
– RNA catalyzes the maturation of tRNA.
– RNA catalyzes peptide bond formation.
– RNA is a required subunit in telomerase.
– RNA plays roles in immunity and development (RNAi).
– RNA plays a role in dosage compensation.
– RNA plays a role in carbon storage.
– RNA is a major subunit in the SRP, which is important in protein trafficking.
– RNA guides RNA modification.
– RNA can perform so many different functions that it is thought that in the beginning there was an RNA World, where RNA was both the information carrier and the active molecule.


“non coding” RNAs (ncRNA)

Small structural RNAs (ssRNA)


AAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUCA

ssRNA Folds into Secondary and 3D Structures

[Figure: secondary structure of the sequence above (helices P4, P5, P5a, P5b, P5c, P6, P6a, P6b; positions 120-260 marked) and its 3D structure]

Cate, et al. (Cech & Doudna).(1996) Science 273:1678.

Waring & Davies. (1984) Gene 28: 277.

We would like to predict them from sequence.

For example, tRNA

tRNA Activity

ssRNA structure rules
• Canonical basepairs:
– Watson-Crick basepairs:
  • G – C
  • A – U
– Wobble basepair:
  • G – U
• Stacks: continuous nested basepairs (energetically favorable).
• Non-basepaired loops:
– Hairpin loop
– Bulge
– Internal loop
– Multiloop
• Pseudo-knots
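The pairing rules above can be checked mechanically. The sketch below is only an illustration: it assumes a structure written in dot-bracket notation (a convention not used on the slides, where "(" and ")" mark a basepair and "." an unpaired base), and the function names are hypothetical. It cannot represent pseudo-knots, since matching brackets are always nested.

```python
# Canonical basepairs from the slide: G-C, A-U (Watson-Crick) and G-U (wobble).
CANONICAL = {frozenset("GC"), frozenset("AU"), frozenset("GU")}


def base_pairs(structure):
    """Return the sorted (i, j) index pairs encoded by a dot-bracket string."""
    open_positions = []
    pairs = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(j)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unbalanced ')' at position %d" % j)
            pairs.append((open_positions.pop(), j))
    if open_positions:
        raise ValueError("unbalanced '(' in structure")
    return sorted(pairs)


def noncanonical_pairs(seq, structure):
    """List the basepairs in the structure that break the canonical rules."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return [
        (i, j)
        for i, j in base_pairs(structure)
        if frozenset((seq[i], seq[j])) not in CANONICAL
    ]
```

For instance, `noncanonical_pairs("GAAAAC", "((..))")` flags the inner A–A pair, while the outer G–C pair passes.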

Ab initio RNA structure prediction: lots of Dynamic Programming

• Objective: Maximizing the number of base pairs (Nussinov et al, 1978)

simple model: δ(i, j) = 1 if allowed
fancier model: GC > AU > GU

Pseudoknots drastically increase computational complexity
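A minimal sketch of the base-pair maximization idea, under stated assumptions: it uses the simple model (every allowed pair scores 1), considers only nested (pseudoknot-free) structures, and adds a `min_loop` parameter (minimum number of unpaired bases enclosed by a pair) that is an assumption of this sketch rather than something stated on the slide. Function and variable names are illustrative.

```python
# Allowed pairs under the simple model; each scores 1.
ALLOWED = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return (max number of nested basepairs, one optimal dot-bracket structure)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = max basepairs in seq[i..j]; cells with i >= j stay 0.
    best = [[0] * n for _ in range(n)]

    def pair_ok(i, j):
        return j - i > min_loop and (seq[i], seq[j]) in ALLOWED

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            score = max(best[i + 1][j], best[i][j - 1])  # i or j unpaired
            if pair_ok(i, j):
                score = max(score, best[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # bifurcation into two substructures
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score

    # Traceback: recover one structure that achieves best[0][n-1].
    structure = ["."] * n
    todo = [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if i >= j:
            continue
        if best[i][j] == best[i + 1][j]:
            todo.append((i + 1, j))
        elif best[i][j] == best[i][j - 1]:
            todo.append((i, j - 1))
        elif pair_ok(i, j) and best[i][j] == best[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            todo.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if best[i][j] == best[i][k] + best[k + 1][j]:
                    todo.append((i, k))
                    todo.append((k + 1, j))
                    break
    return best[0][n - 1], "".join(structure)
```

The triple loop makes this cubic in sequence length. A "fancier" scoring would replace the constant 1 with pair-dependent weights.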

http://cs273a.stanford.edu [Bejerano Fall11/12] 15

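Objective: minimize secondary structure free energy at 37 °C.

[Figure: hairpin formed by CGUUUGGGUUCACAAACG, annotated with stacking energies -2.0, -2.1, -0.9, -0.9 and -1.8, terminal mismatch -1.6, and loop initiation +5.0 (kcal/mol)]

$$\Delta G_{\text{helix}} = \Delta G\begin{pmatrix}\mathrm{C\,G}\\ \mathrm{G\,C}\end{pmatrix} + \Delta G\begin{pmatrix}\mathrm{G\,U}\\ \mathrm{C\,A}\end{pmatrix} + 2\,\Delta G\begin{pmatrix}\mathrm{U\,U}\\ \mathrm{A\,A}\end{pmatrix} + \Delta G\begin{pmatrix}\mathrm{U\,G}\\ \mathrm{A\,C}\end{pmatrix}$$

$$= -2.0 - 2.1 + 2\times(-0.9) - 1.8 = -7.7\ \text{kcal/mol}$$

$$\Delta G_{\text{hairpin loop}} = \Delta G_{\text{initiation (6 nucleotides)}} + \Delta G_{\text{mismatch}}\begin{pmatrix}\mathrm{G\,G}\\ \mathrm{C\,A}\end{pmatrix} = 5.0 - 1.6 = 3.4\ \text{kcal/mol}$$

$$\Delta G_{\text{total}} = \Delta G_{\text{hairpin loop}} + \Delta G_{\text{helix}} = 3.4 - 7.7 = -4.3\ \text{kcal/mol}$$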

Mathews, Disney, Childs, Schroeder, Zuker, & Turner. 2004. PNAS 101: 7287.
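The arithmetic of this worked example can be re-added in a few lines. The sketch below is an assumption-laden illustration, not MFOLD: the lookup tables contain only the values shown on the slide, the stack keys reflect one reading of the slide's notation (top strand 5'→3' over bottom strand 3'→5'), and all names are hypothetical.

```python
# Energies in kcal/mol, copied from the slide's example only.
# Key = (two adjacent bases on the 5'->3' strand, their partners read 3'->5').
STACK_ENERGY = {
    ("CG", "GC"): -2.0,
    ("GU", "CA"): -2.1,
    ("UU", "AA"): -0.9,
    ("UG", "AC"): -1.8,
}
TERMINAL_MISMATCH = {("GG", "CA"): -1.6}
HAIRPIN_INITIATION = {6: 5.0}  # loop size (nucleotides) -> initiation energy


def helix_energy(top, bottom):
    """Sum stacking energies over adjacent basepairs of an aligned helix."""
    return sum(
        STACK_ENERGY[(top[i:i + 2], bottom[i:i + 2])]
        for i in range(len(top) - 1)
    )


def hairpin_energy(loop_size, mismatch):
    """Loop initiation plus the terminal mismatch next to the closing pair."""
    return HAIRPIN_INITIATION[loop_size] + TERMINAL_MISMATCH[mismatch]


helix = helix_energy("CGUUUG", "GCAAAC")   # five stacks
hairpin = hairpin_energy(6, ("GG", "CA"))
total = helix + hairpin
print(round(helix, 1), round(hairpin, 1), round(total, 1))
```

Rounding is applied only for display, because summing decimal fractions in floating point can leave tiny residues.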

Instead of δ(i, j), measure and sum energies:

Zuker’s algorithm MFOLD: computing loop dependent energies

Bafna 1

RNA structure

• Base-pairing defines a secondary structure. The base-pairing is usually non-crossing.

1. A CFG

S → aSu | cSg | gSc | uSa
S → a | c | g | u
S → SS

2. A derivation of “accuggacccuuagaggu”

S ⇒ aSu
⇒ acSgu
⇒ accSggu
⇒ accuSaggu
⇒ accuSSaggu
⇒ accugScSaggu
⇒ accuggSccSaggu
⇒ accuggaccSaggu
⇒ accuggacccSgaggu
⇒ accuggacccuSagaggu
⇒ accuggacccuuagaggu

3. Corresponding structure
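To see how a derivation fixes a structure, the sketch below replays the slide's derivation, always rewriting the leftmost S, and records which bases were emitted together as a pair. The dot-bracket output format and all names are assumptions made for illustration; the one-base hairpin loops are a property of this toy grammar, not a claim about real RNA.

```python
# Right-hand sides of the slide's grammar (left-hand side is always S).
GRAMMAR = {"aSu", "cSg", "gSc", "uSa", "a", "c", "g", "u", "SS"}

# The productions used, in order, in the slide's derivation.
SLIDE_DERIVATION = ["aSu", "cSg", "cSg", "uSa", "SS",
                    "gSc", "gSc", "a", "cSg", "uSa", "u"]


def derive(productions):
    """Apply productions to the leftmost S; return (sequence, dot-bracket)."""
    form = ["S"]  # sentential form: "S" or (base, structure symbol) tuples
    for rhs in productions:
        if rhs not in GRAMMAR:
            raise ValueError("not a production of the grammar: %r" % rhs)
        idx = form.index("S")  # leftmost nonterminal
        if rhs == "SS":
            replacement = ["S", "S"]            # bifurcation
        elif len(rhs) == 1:
            replacement = [(rhs, ".")]          # unpaired base
        else:
            replacement = [(rhs[0], "("), "S", (rhs[2], ")")]  # basepair
        form[idx:idx + 1] = replacement
    if "S" in form:
        raise ValueError("derivation is incomplete")
    sequence = "".join(base for base, _ in form)
    structure = "".join(symbol for _, symbol in form)
    return sequence, structure


sequence, structure = derive(SLIDE_DERIVATION)
print(sequence)
print(structure)
```

Each paired production adds one nested basepair and S → SS creates a branch, which is why such grammars describe non-crossing structures only.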

Stochastic context-free grammar

Cool algorithmics. Unfortunately…

– Random DNA (with high GC content) often folds into low-energy structures.

– We will mention powerful newer methods later on.

ssRNA transcription
• ssRNAs like tRNAs are usually encoded by short “non coding” genes that transcribe independently.
• Found in both the UCSC “known genes” track and as a subtrack of the RepeatMasker track.


“non coding” RNAs (ncRNA)

microRNAs (miRNA/miR)


MicroRNA (miR)

mRNA

~70nt ~22nt

miR match to target mRNA is quite loose.

a single miR can regulate the expression of hundreds of genes.


MicroRNA Transcription

mRNA


Computational challenges:
• Predict miRs.
• Predict miR targets.
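As a rough illustration of the target-prediction task, the sketch below scans an mRNA for sites perfectly complementary to the miR "seed" (nucleotides 2-8). This seed rule is a commonly used heuristic and an assumption here, not something the slides state; because only 7 of the ~22 nucleotides must match, it is one way to picture how loose matching lets a single miR hit many genes. The sequences are made-up toy strings, and all names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the Watson-Crick reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mir, mrna, seed_start=1, seed_end=8):
    """Return 0-based mRNA positions that pair perfectly with the miR seed.

    The default slice mir[1:8] corresponds to miR nucleotides 2-8 (assumed rule).
    """
    site = reverse_complement(mir[seed_start:seed_end])
    width = len(site)
    return [
        pos
        for pos in range(len(mrna) - width + 1)
        if mrna[pos:pos + width] == site
    ]


# Toy sequences, invented for this example only.
toy_mir = "ACGGUACCUAGGAUCCGAUUAC"
toy_mrna = "AAAGGUACCGAAA"
print(seed_sites(toy_mir, toy_mrna))
```

Real predictors add further evidence on top of such a scan; this sketch only shows the matching step.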


MicroRNA Therapeutics


Idea: bolster/inhibit miR production to broadly modulate protein production.
Hope: “right” the good guys and/or “wrong” the bad guys.
Challenge: and not vice versa.


Other Non Coding Transcripts

lncRNAs (long non coding RNAs)


Don’t seem to fold into clear structures (or only a sub-region does).

Diverse roles only now starting to be understood.

Hard to detect or predict function computationally (currently)


lncRNAs come in many flavors

X chromosome inactivation in mammals

X X X Y

X

Dosage compensation

Xist – X inactive-specific transcript

Avner and Heard, Nat. Rev. Genetics 2001 2(1):59-67


Transcripts, transcripts everywhere

Human Genome

Transcribed* (Tx)

Tx from both strands*

* True size of set unknown

Or are they?


The million dollar question

Human Genome

Transcribed* (Tx)

Tx from both strands*

* True size of set unknown

Leaky tx?

Functional?

Coding and non-coding gene production


The cell is constantly making new proteins and ncRNAs.

These perform their function for a while,

And are then degraded.

Newly made coding and non coding gene products take their place.

The picture within a cell is constantly “refreshing”.

To change its behavior a cell can change the repertoire of genes and ncRNAs it makes.

Cell differentiation


That is exactly what happens when cells differentiate during development from stem cells to their different final fates.

Human manipulation of cell fate


We have learned (in a dish) to:
1. control differentiation
2. reverse differentiation
3. hop between different states

Cell replacement therapies


We want to use this knowledge to provide a patient with healthy self cells of a needed type.


(iPS = induced pluripotent stem cells)

How does this happen?


Different cells in our body hold copies of (essentially) the same genome.

Yet they express very different repertoires of proteins and non-coding RNAs.

How do cells do it?

A: like they do everything else: using their proteins & ncRNAs…

Gene Regulation


Some proteins and non coding RNAs go “back” to bind DNA near genes, turning these genes on and off.

Gene DNA

Proteins

To be continued…

Review
Lecture 6
• Central dogma recap
– Genes, proteins and non coding RNAs
• RNA world hypothesis
• Small structural RNAs
– Sequence, structure, function
– Structure prediction
– Transcription mode
• MicroRNAs
– Functions
– Modes of transcription
• lncRNAs
– Xist
• Genome wide (and context wide) transcription
– How much?
– To what goals?
• Gene transcription and cell identity
– Cell differentiation
– Human manipulation of cell fates
• Gene regulation control


(On Mondays) ask students to stack the chairs without wheels at the back of the room at the end of class.