Lecture 13

Page 1: Lecture 13

http://cs273a.stanford.edu [Bejerano Fall10/11] 1

Page 2: Lecture 13

Lecture 13

Cis-Regulation cont’d
GREAT

Page 3: Lecture 13

Gene Regulation

• gene (how to)
• control region (when & where)

DNA

DNA binding proteins

RNA gene
Protein coding

Page 4: Lecture 13

Pol II Transcription

Key components:
• Proteins
• DNA sequence
• DNA epigenetics

Protein components:
• General transcription factors
• Activators
• Co-activators

Page 5: Lecture 13

Enhancers

Page 6: Lecture 13

Vertebrate Gene Regulation

gene (how to)
control region (when & where)

DNA

proximal: in 10^3 letters

distal: in 10^6 letters

DNA binding proteins

Page 7: Lecture 13

Gene Expression Domains: Independent

Page 8: Lecture 13

Distal Transcription Regulatory Elements

Page 9: Lecture 13

Repressors / Silencers

Page 10: Lecture 13

What are Enhancers?

What do enhancers encode?
Surely a cluster of TF binding sites
[but TFBS prediction is hard, fraught with false positives].
What else? DNA structure-related properties?

So how do we recognize enhancers?
Sequence conservation across multiple species
[weak but generic].

Verifying repressors is trickier [loss vs. gain of function].

How do you tell an enhancer from a repressor by prediction? Duh...

[Overlay labels: repressors, repressors, Repressors]

Page 11: Lecture 13

Insulators

Page 12: Lecture 13

Cis-Regulatory Components

Low level (“atoms”):
• Promoter motifs (TATA box, etc.)
• Transcription factor binding sites (TFBS)

Mid level:
• Promoter
• Enhancers
• Repressors/silencers
• Insulators/boundary elements
• Cis-regulatory modules (CRM)
• Locus control regions (LCR)

High level:
• Epigenetic domains / signatures
• Gene expression domains
• Gene regulatory networks (GRN)

Page 13: Lecture 13

Disease Implications: Genes

genome

gene

protein

Limb Malformation

Over 300 genes already implicated in limb malformations.

Page 14: Lecture 13

Disease Implications: Cis-Reg

genome

gene

NO protein made

Limb Malformation

Growing number of cases (limb, deafness, etc).

Page 15: Lecture 13

Transcription Regulation & Human Disease

[Wang et al, 2000]

Page 16: Lecture 13

Critical regulatory sequences

Lettice et al. HMG 2003 12: 1725-35

Single base changes

Knock out

Page 17: Lecture 13

Other Positional Effects

[de Kok et al, 1996]

Page 18: Lecture 13

Genomewide Association Studies point to non-coding DNA

Page 19: Lecture 13

WGA Disease

Page 20: Lecture 13

9p21 Cis effects

Follow up study:

Page 21: Lecture 13

Cis-Regulatory Evolution: E.g., Mobile Elements

[Yass is a small town in New South Wales, Australia.]

Gene

Gene

What settings make these “co-option” events happen?

Gene

Gene

Page 22: Lecture 13

Britten & Davidson Hypothesis: Repeat to Rewire!

[Britten & Davidson, 1971]

[Davidson & Erwin, 2006]

Page 23: Lecture 13

Modular: Most Likely to Evolve?

Chimp Human

Page 24: Lecture 13

Human Accelerated Regions

• Human-specific substitutions in conserved sequences

[Pollard, K. et al., Nature, 2006] [Prabhakar, S. et al., Science, 2008] [Beniaminov, A. et al., RNA, 2008]

Human Chimp

Page 25: Lecture 13

http://GREAT.stanford.edu: Generating Functional Hypotheses from Genome-Wide Measurements of Mammalian Cis-Regulation

Gill Bejerano
Dept. of Developmental Biology & Dept. of Computer Science
Stanford University

http://bejerano.stanford.edu

Page 26: Lecture 13

Human Gene Regulation

All these cells have the same Genome.

Gene

Gene

Gene

Gene

20,000 Genes encode how to make proteins.

1,000,000 genomic “switches” determine which proteins to make, and how much.

10^13 different cells in an adult human.

Hundreds of different cell types.

Page 27: Lecture 13

Most Non-Coding Elements likely work in cis…

9Mb

“IRX1 is a member of the Iroquois homeobox gene family. Members of this family appear to play multiple roles during pattern formation of vertebrate embryos.”

gene deserts

regulatory jungles

Every orange tick mark is roughly 100-1,000 bp long, evolves under purifying selection, and does not code for protein.

Page 28: Lecture 13

Many non-coding elements tested are cis-regulatory

Page 29: Lecture 13

Combinatorial Regulatory Code

Gene

2,000 different proteins can bind specific DNA sequences.

A regulatory region encodes 3-10 such protein binding sites.
When all are bound by proteins, the regulatory region turns “on”, and the nearby gene is activated to produce protein.

Proteins

DNA

DNA

Protein binding site

Page 30: Lecture 13

ChIP-Seq: first glimpses of the regulatory genome in action

Cis-regulatory peak

Peak Calling

Page 31: Lecture 13

Gene transcription start site

What is the transcription factor I just assayed doing?

Cis-regulatory peak

• Collect known literature of the form
  • Function A: Gene1, Gene2, Gene3, ...
  • Function B: Gene1, Gene2, Gene3, ...
  • Function C: ...

• Ask whether the binding sites you discovered preferentially lie near (regulate) genes from any one or more of the functions listed above.
• Form a hypothesis and perform further experiments.

Page 32: Lecture 13

Example: inferring functions of Serum Response Factor (SRF) from its ChIP-seq binding profile

Gene transcription start site

SRF binding ChIP-seq peak

• ChIP-seq identified 2,429 SRF binding peaks in human Jurkat cells [1]

• SRF is known as a “master regulator of the actin cytoskeleton”

• In the ChIP-Seq peaks, we expect to find binding sites regulating (genes involved in) actin cytoskeleton formation.

[1] Valouev A. et al., Nat. Methods, 2008

Page 33: Lecture 13

Example: inferring functions of Serum Response Factor (SRF) from its ChIP-seq binding profile

Existing, gene-based method to analyze enrichment:

• Ignore distal binding events.

• Count affected genes.

• Rank by enrichment hypergeometric p-value.

Gene transcription start site

SRF binding ChIP-seq peak
Ontology term (e.g. ‘actin cytoskeleton’)

N = 8 genes in genome
K = 3 genes annotated with the term
n = 2 genes selected by proximal peaks
k = 1 selected gene annotated with the term

P = Pr(k ≥ 1 | n = 2, K = 3, N = 8)
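The toy numbers on this slide can be checked by hand or in a few lines of Python. This is only a sketch, assuming the standard hypergeometric upper tail (drawing n genes without replacement from N, of which K carry the term); the function name `hypergeom_tail` is made up for illustration and is not part of any tool mentioned in the lecture.

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """Pr(X >= k): n genes drawn without replacement from N genes, K of which carry the term."""
    total = comb(N, n)
    upper = min(n, K)  # cannot hit more annotated genes than were drawn or exist
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Slide example: N = 8 genes, K = 3 annotated, n = 2 selected, k = 1 selected and annotated
p = hypergeom_tail(k=1, n=2, K=3, N=8)
print(f"P = {p:.4f}")  # 18/28, about 0.6429
```

With so few genes the p-value is far from significant; the point of the toy is only to show which four counts enter the gene-based test.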

Page 34: Lecture 13

We have (reduced ChIP-Seq into) a gene list!
What is the gene list enriched for?

Microarray tool

Microarray data

Microarray data

Deep sequencing data

Pro: A lot of tools out there for the analysis of gene lists.
Con: These tools are built for microarray analysis.
Does it matter?

Page 35: Lecture 13

SRF Gene-based enrichment results

• Original authors can only state: “basic cellular processes, particularly those related to gene expression” are enriched [1]

[1] Valouev A. et al., Nat. Methods, 2008

SRF

SRF

SRF acts on genes both in nucleus and cytoplasm, that are involved in transcription and various types of binding

Where’s the signal?
Top “actin” term is ranked #28 in the list.

Page 36: Lecture 13

Associating only proximal peaks loses a lot of information

Relationship of binding peaks to nearest genes for eight human (H) and mouse (M) ChIP-seq datasets

Restricting to proximal peaks often leads to complete loss of key enrichments

Page 37: Lecture 13

Bad Solution: Associating distal peaks brings in many false enrichments

Why bad? 14% of human genes are tagged ‘multicellular organismal development’. But 33% of base pairs have such a gene as the nearest gene upstream or downstream.

Term                                    Bonferroni-corrected p-value
nervous system development              5x10^-9
system development                      8x10^-9
anatomical structure development        7x10^-8
multicellular organismal development    1x10^-7
developmental process                   2x10^-6

SRF ChIP-seq set has 2,000+ binding events.
Throw a random set of 2,000 regions at the genome.

What do you get from a gene list analysis?
Regulatory jungles are often next to key developmental genes.
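The 14% versus 33% mismatch can be reproduced with a small simulation. The sketch below is a toy, not the analysis from the lecture: it assumes a genome rescaled to the interval [0, 1), 20,000 genes of which 14% carry the term, and it simply gives the annotated genes larger "nearest gene" territories so that together they cover 33% of the sequence. All variable names and the equal-width territories are assumptions made for illustration.

```python
import random
from bisect import bisect_right
from itertools import accumulate

N_GENES, N_ANNOTATED, N_REGIONS = 20_000, 2_800, 2_000  # 14% of genes carry the term
BP_FRACTION = 0.33                                      # share of base pairs nearest to those genes

# Toy genome on [0, 1): genes 0..N_ANNOTATED-1 are annotated and own wide territories.
widths = ([BP_FRACTION / N_ANNOTATED] * N_ANNOTATED
          + [(1 - BP_FRACTION) / (N_GENES - N_ANNOTATED)] * (N_GENES - N_ANNOTATED))
ends = list(accumulate(widths))  # right edge of each gene's territory

rng = random.Random(0)  # fixed seed so the run is repeatable
# Map each uniformly random position to the gene whose territory contains it.
hits = [min(bisect_right(ends, rng.random()), N_GENES - 1) for _ in range(N_REGIONS)]

region_frac = sum(g < N_ANNOTATED for g in hits) / N_REGIONS
genes_hit = set(hits)  # the "gene list" a gene-based tool would see
gene_frac = sum(g < N_ANNOTATED for g in genes_hit) / len(genes_hit)

print(f"random regions landing next to an annotated gene: {region_frac:.2f}")
print(f"annotated share of the resulting gene list:       {gene_frac:.2f}")
print(f"share a gene-based test expects by chance:        {N_ANNOTATED / N_GENES:.2f}")
```

Even though the regions are random, the resulting gene list is far richer in annotated genes than the 14% a gene-based null model expects, which is how a table of spurious "development" enrichments can arise.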

Page 38: Lecture 13

Real Solution: Do not convert to gene list.
Analyze the set of genomic regions

Gene transcription start site
Ontology term (‘actin cytoskeleton’)

P = Pr_binom(k ≥ 5 | n = 6, p = 0.33)

p = 0.33 of genome annotated with the term
n = 6 genomic regions
k = 5 genomic regions hit annotation
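The region-based test on this slide is a plain binomial upper tail, so the toy value is easy to verify. A minimal sketch, assuming each region independently hits the annotation with probability p; the function name `binom_tail` is illustrative only.

```python
from math import comb

def binom_tail(k, n, p):
    """Pr(X >= k) for X ~ Binomial(n, p): at least k of n regions hit the annotation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Slide example: 5 of 6 regions fall in the 33% of the genome annotated with the term
p_value = binom_tail(k=5, n=6, p=0.33)
print(f"P = {p_value:.4f}")  # about 0.0170
```

Note that the test never looks at gene counts: only the fraction of the genome covered by the term's regulatory domains and the number of regions enter the calculation.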

Gene regulatory domain
Genomic region (ChIP-seq peak)

Since 33% of base pairs are near a ‘multicellular organismal development’ gene, we now expect 33% of genomic regions to hit this term by chance. => Toss 2,000 random regions at genome, get NO (false) enrichments.

GREAT = Genomic Regions Enrichment of Annotations Tool

Page 39: Lecture 13

How does GREAT know how to assign distal binding peaks to genes?

Future: High-throughput assays based on chromosome conformation capture (3C) methods will elucidate complex regulation mechanisms

Currently: Flexible computational definitions allow assignment of peaks to nearest gene, nearest two genes, etc.

• Default: each gene has a “basal regulatory domain” of 5 kb upstream and 1 kb downstream of its transcription start site, which is extended in both directions to the basal domains of the nearest genes, up to 1 Mb

• Though some associations may be missed or incorrect, in general signal richness and robustness is greatly improved by associating distal peaks
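The default association rule described above can be sketched as follows. This is a simplified illustration for a single chromosome, not GREAT's actual code: the `Gene` class, the function names, the choice to measure the 1 Mb cap from the transcription start site, and the example coordinates are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    tss: int     # transcription start site, chromosome coordinate
    strand: str  # "+" or "-"

def basal_domain(gene, upstream=5_000, downstream=1_000):
    """Basal domain: 5 kb upstream and 1 kb downstream of the TSS, strand-aware."""
    if gene.strand == "+":
        return gene.tss - upstream, gene.tss + downstream
    return gene.tss - downstream, gene.tss + upstream

def regulatory_domains(genes, max_extension=1_000_000):
    """Extend each basal domain toward the neighbours' basal domains, capped at max_extension."""
    ordered = sorted(genes, key=lambda g: g.tss)
    basal = [basal_domain(g) for g in ordered]
    domains = {}
    for i, gene in enumerate(ordered):
        lo, hi = basal[i]
        left = gene.tss - max_extension    # assumption: cap measured from the TSS
        right = gene.tss + max_extension
        if i > 0:
            left = max(left, basal[i - 1][1])    # stop at the previous gene's basal domain
        if i + 1 < len(ordered):
            right = min(right, basal[i + 1][0])  # stop at the next gene's basal domain
        # Never shrink below the gene's own basal domain; clamp at the chromosome start.
        domains[gene.name] = (max(0, min(lo, left)), max(hi, right))
    return domains

def genes_for_region(position, domains):
    """All genes whose regulatory domain contains the given position."""
    return [name for name, (lo, hi) in domains.items() if lo <= position < hi]

# Hypothetical genes, coordinates invented for the example
genes = [Gene("A", 100_000, "+"), Gene("B", 500_000, "-"), Gene("C", 3_000_000, "+")]
domains = regulatory_domains(genes)
print(domains)
print(genes_for_region(300_000, domains))    # between A and B: associated with both
print(genes_for_region(1_800_000, domains))  # more than 1 Mb from B and C: no association
```

Under this rule a distal peak between two genes is assigned to both, while a peak in a very large gene desert may be assigned to none, which matches the slide's caveat that some associations will be missed or incorrect.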

Page 40: Lecture 13

GREAT infers many specific functions of SRF from its binding profile

Ontology          Term                      # Genes   Binomial P-value   Experimental support*
Gene Ontology     actin cytoskeleton        30        7x10^-9            Miano et al. 2007
Gene Ontology     actin binding             31        5x10^-5            Miano et al. 2007
Pathway Commons   TRAIL signaling           32        5x10^-7            Bertolotto et al. 2000
Pathway Commons   Class I PI3K signaling    26        2x10^-6            Poser et al. 2000
TreeFam           FOS gene family           5         1x10^-8            Chai & Tarnawski 2002
TF Targets        Targets of SRF            84        5x10^-76           Positive control
TF Targets        Targets of GABP           28        4x10^-9            ChIP-Seq support
TF Targets        Targets of YY1            44        1x10^-6            Natesan & Gilman 1995
TF Targets        Targets of EGR1           23        2x10^-4

* Known from literature: the function is known, SOME of the genes are known, and the binding sites highlighted are NOT.

Top gene-based enrichments of SRF (top actin-related term 28th in list)

Top GREAT enrichments of SRF

Similar results for GABP, NRSF, Stat3, p300 ChIP-Seq

[McLean et al., Nat Biotechnol., 2010]

Page 41: Lecture 13

GREAT data integrated

Michael Hiller

• Twenty ontologies spanning broad categories of biology
• 44,832 total ontology terms tested in each GREAT run

[Figure: terms per ontology: 2,800; 5,215; 834; 5,781; 427; 456; 150; 1,253; 288; 706; 6,700; 3,079; 911; 615; 19; 222; 9; 6,857; 8,272; 238]

Page 42: Lecture 13

GREAT implementation

• Can handle datasets of hundreds of thousands of genomic regions
• Testing a single ontology term takes ~1 ms
• Enables real-time calculation of enrichment results for all ontologies

Cory McLean

Page 43: Lecture 13

GREAT web app: input page

Dave Bristor

Pick a genome assembly

Input BED regions of interest
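For readers who have not met BED before: it is a whitespace-separated text format whose first three columns are chromosome, 0-based start, and exclusive end, optionally followed by a name. The sketch below shows how such input might be read before submission; the function name `parse_bed` and the sample coordinates are invented for illustration, and the parser only handles the first four columns.

```python
def parse_bed(text):
    """Parse BED text into (chrom, start, end, name) tuples; start is 0-based, end is exclusive."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and the optional track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if not 0 <= start <= end:
            raise ValueError(f"bad interval: {line!r}")
        name = fields[3] if len(fields) > 3 else None
        regions.append((chrom, start, end, name))
    return regions

# Made-up peaks, not real SRF binding sites
sample = "track name=example\nchr1\t100\t200\tpeak1\nchr2 5 10\n"
regions = parse_bed(sample)
print(regions)
```

Each tuple corresponds to one genomic region, the unit that the region-based enrichment test counts.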

http://great.stanford.edu

Page 44: Lecture 13

Additional ontologies, term statistics, multiple hypothesis corrections, etc.

GREAT web app: output summary

Ontology-specific enrichments

Page 45: Lecture 13

GREAT web app: term details page

Frame holding http://www.geneontology.org definition of “actin binding”

Genes annotated as “actin binding” with associated genomic regions

Genomic regions annotated with “actin binding”

Drill down to explore how a particular peak regulates Plectin and its role in actin binding

Page 46: Lecture 13

You can also submit any track straight from the UCSC Table Browser.

A simple, well documented programmatic interface allows any tool to submit directly to GREAT.
See our Help. Inquiries welcome!

Page 47: Lecture 13

GREAT web app: export data

HTML output displays all user selected rows and columns

Tab-separated values also available for additional postprocessing

Page 48: Lecture 13

External Web Stats: Catching On

last 500 entries only

Page 49: Lecture 13

Summary

• Current technologies identify cis-regulatory sequences
• GREAT accurately assesses functional enrichments of cis-regulatory sequences using a genomic region-based approach [McLean et al., Nat Biotechnol., 2010]
• Online tool available (version 1.5 coming soon, in QA): http://great.stanford.edu
• GREAT is immediately applicable to all sets with a significant cis-regulatory content:
  • Regulatory chromatin markers (e.g., H3K4me1)
  • Genome Wide Association Studies (GWAS)
  • Comparative genomics sets (e.g., ultraconserved elements)

Page 50: Lecture 13

Acknowledgments

GREAT developers: Cory McLean, Dave Bristor, Michael Hiller, Shoa Clarke, Craig Lowe, Aaron Wenger, Gill Bejerano

Other help: Fah Sathira, Marina Sirota, Bruce Schaar, Terry Capellini, Christopher Meyer, Jennifer Hardee

http://great.stanford.edu http://bejerano.stanford.edu

