Post on 11-Aug-2020

Gene regulation: ENCODE project and the Human Epigenome Atlas

Joshua W. K. Ho, PhD

Head, Bioinformatics and Systems Medicine Laboratory, Victor Chang Cardiac Research Institute

NHMRC/National Heart Foundation Career Development Fellow

Senior Lecturer (Conjoint), St Vincent’s Clinical School, UNSW

http://bioinformatics.victorchang.edu.au

j.ho@victorchang.edu.au

UNSW, 20th Apr 2017

Human genome has 23 pairs of chromosomes

•  Human somatic cells (i.e., not gametes) have a diploid genome: one haploid genome from each parent
•  Each chromosome is a DNA double helix
•  A haploid genome contains ~20,000 protein-coding genes

•  22 pairs of autosomes
•  1 pair of sex chromosomes
•  Complete* human genome sequenced by 2003

Beyond DNA sequence: chromatin organisation is important for epigenetic regulation

•  Nucleosome core = 146 bp of DNA + histone octamer
•  Linker DNA = ~10-80 bp in length
•  Nucleosome density is an important determinant of chromatin accessibility
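As a rough worked example of these numbers, the sketch below estimates how many nucleosomes would fit along a genome if it were packed uniformly with one core plus one linker per repeat. The haploid genome size of about 3.1 billion bp is an assumption that does not appear on the slide, and the function name is hypothetical.

```python
CORE_BP = 146                       # DNA wrapped around one histone octamer (from the slide)
LINKER_RANGE_BP = (10, 80)          # approximate linker length range (from the slide)
HAPLOID_GENOME_BP = 3_100_000_000   # assumption: approximate human haploid genome size


def max_nucleosomes(genome_bp: int, linker_bp: int, core_bp: int = CORE_BP) -> int:
    """Upper bound on the nucleosome count for uniformly packed DNA."""
    repeat_bp = core_bp + linker_bp  # one core plus one linker
    return genome_bp // repeat_bp


short_linker, long_linker = LINKER_RANGE_BP
dense = max_nucleosomes(HAPLOID_GENOME_BP, short_linker)
sparse = max_nucleosomes(HAPLOID_GENOME_BP, long_linker)
print(f"{sparse:,} to {dense:,} nucleosomes per haploid genome")
```

Real chromatin is not uniformly packed, so this only gives an order of magnitude: shorter linkers mean denser nucleosomes and, per the slide, less accessible chromatin.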

One nucleosome

Transcription factor (TF), RNA polymerase, histone modification, chromatin accessibility, and DNA methylation

Wang et al. (2016) Computational Biology & Bioinformatics: Gene Regulation

Histone modification is associated with genomic regulatory elements

Ho (2012) Biophysical Reviews

•  H3K4me3 = trimethylation of histone H3 at lysine 4
•  Promoters: H3K4me3 & some H3K4me1 & H3K27ac
•  Enhancers: H3K4me1 & H3K27ac and some H3K4me3
•  Repressed: H3K27me3 or H3K9me3
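The mark combinations above can be read as a simple rule table. The sketch below is a minimal illustration of that reading, not a published method: the function name, the signal threshold, and the order in which the rules are checked are all assumptions, and regions that carry both active and repressive marks are not handled specially.

```python
from typing import Mapping


def classify_element(signal: Mapping[str, float], threshold: float = 1.0) -> str:
    """Assign a coarse label to a region from its histone-mark signal.

    `signal` maps mark names to an enrichment value; `threshold` is a
    hypothetical cut-off for calling a mark present.
    """
    def present(mark: str) -> bool:
        return signal.get(mark, 0.0) >= threshold

    # Repressive marks are checked first (an arbitrary choice for this sketch).
    if present("H3K27me3") or present("H3K9me3"):
        return "repressed"
    if present("H3K27ac"):
        # Promoters: H3K4me3 dominates over H3K4me1.
        if present("H3K4me3") and signal["H3K4me3"] > signal.get("H3K4me1", 0.0):
            return "promoter"
        # Enhancers: H3K4me1 dominates, possibly with some H3K4me3.
        if present("H3K4me1"):
            return "enhancer"
    return "unclassified"


print(classify_element({"H3K4me3": 8.0, "H3K4me1": 1.5, "H3K27ac": 4.0}))
```

Real analyses work on quantitative, genome-wide signal rather than a single threshold, so this only shows how the slide's summary maps onto explicit rules.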

The ENCODE Project

https://www.encodeproject.org

The Encyclopedia of DNA Elements (ENCODE) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.

ENCODE – past and present

•  Launched in September 2003 to identify all functional elements in the human genome sequence, as a follow-up to the Human Genome Project.

•  2003-2007: The pilot and technology development phase. Test and compare existing methods to rigorously analyse a defined portion (1%) of the human genome sequence.

•  2007-now: genome-wide data production phase

•  Major finding: ~80% of the genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type (ENCODE 2012, Nature)

Who are the ENCODE team?

https://www.encodeproject.org/about/contributors/

List of current ENCODE research groups: http://www.genome.gov/26525220

Countries labelled on the map: UK, Spain, Israel, USA

ENCODE summary slides and videos

https://www.genome.gov/27561910

https://www.encodeproject.org/tutorials/

Roadmap Epigenomics

http://www.roadmapepigenomics.org/

Defining the human epigenome

GTEx – Genotype-tissue expression

http://www.gtexportal.org/

Discovering associations between genotype (e.g., SNP) and gene expression in humans

FANTOM – another international effort

http://fantom.gsc.riken.jp/

Discovery of functional elements in H. sapiens (human) and M. musculus (mouse)

Related project: mouse ENCODE

http://www.mouseencode.org/

ENCODE for M. musculus (mouse)

Related project - modENCODE

http://www.modencode.org/

ENCODE for D. melanogaster (fruit fly) and C. elegans (roundworm)

What can we learn from ENCODE?

•  Genome-wide data
–  RNA-seq, ChIP-seq, DNase-seq, …
•  Genome annotation
–  Gene annotation, enhancers, chromatin state, …
•  Experimental protocols and best practices
•  Bioinformatics analysis methods and software

https://www.encodeproject.org/

Landt et al. (2012) Genome Research

ENCODE human cell lines

•  ENCODE data were mostly generated from human cell lines. Tier 1 cell lines are more widely assayed than tier 2 cell lines, which in turn are more widely assayed than tier 3 cell lines.

•  Tier 1: GM12878 (lymphoblastoid cell line, from a female HapMap individual); H1-hESC (embryonic stem cells); K562 (leukemia cell line)

•  Tier 2: 15 cell lines, including IMR90 (fetal lung fibroblasts), HUVEC (umbilical endothelial cells), HeLa-S3 (cervical carcinoma), MCF-7 (mammary gland), etc.

•  Tier 3: 338 cell lines

https://genome.ucsc.edu/ENCODE/cellTypes.html

Experimental assays

•  DNA methylation
–  Methyl Array, RRBS
•  Chromatin accessibility
–  DNase-seq, FAIRE-seq
•  RNA binding proteins
–  RIP tiling array, RIP-seq
•  Chromatin modification
–  ChIP-seq
•  TF binding
–  ChIP-seq
•  Higher order chromatin structure
–  5C, ChIA-PET
•  Replication
–  Repli-chip, Repli-seq

https://genome.ucsc.edu/ENCODE/dataMatrix/encodeDataMatrixHuman.html

RNA-seq

RNA-seq reads in a browser

RNA-seq reads

Gene annotation

Applications:
1)  Discovery of novel transcripts or alternative promoters
2)  Look at alternative splicing
3)  Profiling of gene expression
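For the third application, expression profiling usually starts by normalising per-gene read counts for gene length and sequencing depth. The sketch below uses transcripts per million (TPM), which is one common choice and not something the slide specifies; the gene names, counts, and lengths are made up for illustration.

```python
from typing import Dict


def tpm(read_counts: Dict[str, int], gene_lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts to transcripts per million (TPM)."""
    # Step 1: reads per kilobase of gene length corrects for gene length.
    rpk = {gene: read_counts[gene] / (gene_lengths_bp[gene] / 1000)
           for gene in read_counts}
    # Step 2: rescale so that all values sum to one million.
    per_million = sum(rpk.values()) / 1_000_000
    if per_million == 0:
        return {gene: 0.0 for gene in rpk}
    return {gene: value / per_million for gene, value in rpk.items()}


# Hypothetical toy data: geneB has more reads but is four times longer.
counts = {"geneA": 100, "geneB": 200, "geneC": 0}
lengths = {"geneA": 1000, "geneB": 4000, "geneC": 2000}
expression = tpm(counts, lengths)
for gene, value in expression.items():
    print(f"{gene}\t{value:.1f}")
```

Because of the length correction, geneA ends up with the higher TPM even though geneB has twice as many reads.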

ChIP-seq maps genome-wide DNA-protein interactions

•  ChIP = Chromatin immunoprecipitation

•  Enrich for sequence fragments that are bound by a specific protein
–  Transcription factors
–  Chromatin-associated proteins
–  Histone proteins

Park (2009) Nature Rev. Genetics

Figure: ChIP-seq profiles for a transcription factor, a histone modification, and bulk nucleosomes (diagram labels: DNA, TF, nucleosome, histone with a modified tail)
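The idea of enrichment can be sketched as a per-bin ratio of ChIP reads to control (input) reads. The example below is a deliberately simplified illustration with made-up counts; the function names, the pseudocount, and the fold cut-off are assumptions, and real peak callers use statistical models rather than a fixed ratio.

```python
from typing import List, Sequence


def fold_enrichment(chip: Sequence[int], control: Sequence[int],
                    pseudocount: float = 1.0) -> List[float]:
    """Per-bin ChIP/control ratio after scaling the control to the ChIP depth."""
    if len(chip) != len(control):
        raise ValueError("ChIP and control must cover the same bins")
    chip_total, control_total = sum(chip), sum(control)
    if chip_total == 0 or control_total == 0:
        raise ValueError("both libraries need at least one read")
    scale = chip_total / control_total  # crude library-size normalisation
    # The pseudocount avoids division by zero in empty control bins.
    return [(c + pseudocount) / (k * scale + pseudocount)
            for c, k in zip(chip, control)]


def enriched_bins(chip: Sequence[int], control: Sequence[int],
                  min_fold: float = 2.0) -> List[int]:
    """Indices of bins whose fold enrichment reaches `min_fold`."""
    return [index for index, fold in enumerate(fold_enrichment(chip, control))
            if fold >= min_fold]


# Hypothetical read counts in six consecutive genomic bins.
chip_counts = [5, 4, 60, 70, 6, 5]
control_counts = [25, 25, 25, 25, 25, 25]
print(enriched_bins(chip_counts, control_counts))
```

Scaling by total read count is itself a simplification: a library dominated by strong peaks makes the background look depleted, which is one reason dedicated tools normalise more carefully.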

ChIP-seq

Ho et al. (2011) Tag-based Approaches for Next Generation Sequencing

Identifying chromatin landscape in human cell lines

Ernst et al. (2011) Nature

Chromatin state = a combination of histone modifications

Ho et al. (2014) Nature

Chromatin state can differ between organisms

Accessing ENCODE data from UCSC Genome Browser

Search for gene, or genomic interval

Ref. genome assembly

Species

Many genomic ‘tracks’ in UCSC GB

Select track

Click “Refresh”

ENCODE data and annotation on UCSC GB

Gene annotation

Chromatin state track

ENCODE track naming format: cell type and experiment, e.g., K562 H3K4me3

BLAT result

My custom track

Custom track data (BED format in this example)

You can upload custom tracks into UCSC Genome Browser
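A custom track in BED format is plain text: an optional `track` line followed by tab-separated rows of chromosome, start, end, and name. The sketch below builds such text from 1-based, inclusive coordinates; the helper name, track name, and example regions are hypothetical, and only the first four BED columns are written.

```python
from typing import Iterable, Tuple

Region = Tuple[str, int, int, str]  # (chrom, start_1based, end_1based, label)


def format_bed_track(name: str, description: str, regions: Iterable[Region]) -> str:
    """Return custom-track text with a track line and 4-column BED rows."""
    lines = [f'track name="{name}" description="{description}"']
    for chrom, start, end, label in regions:
        if start < 1 or end < start:
            raise ValueError(f"invalid region: {chrom}:{start}-{end}")
        # BED starts are 0-based and ends are exclusive, so only the start shifts.
        lines.append(f"{chrom}\t{start - 1}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


track_text = format_bed_track(
    "MyCustomTrack",
    "Example regions",
    [("chr1", 1000, 2000, "example_a"), ("chr1", 5000, 5500, "example_b")],
)
print(track_text, end="")
```

The resulting text can be saved to a file or pasted into the browser's custom track form.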

Using encodeproject.org

Download data
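Besides the web interface, searches on encodeproject.org can be expressed as a URL query string and returned as JSON. The sketch below only builds such a URL and makes no network request. The field names `assay_title` and `biosample_ontology.term_name`, and the example values, are assumptions about the portal's current search fields and should be checked against the portal before use.

```python
from urllib.parse import urlencode

ENCODE_SEARCH = "https://www.encodeproject.org/search/"


def build_search_url(assay_title: str, biosample: str, limit: int = 25) -> str:
    """Build a search URL for experiments (field names are assumptions)."""
    params = {
        "type": "Experiment",
        "assay_title": assay_title,                 # assumed field name
        "biosample_ontology.term_name": biosample,  # assumed field name
        "limit": limit,
        "format": "json",                           # ask for JSON instead of HTML
    }
    return f"{ENCODE_SEARCH}?{urlencode(params)}"


url = build_search_url("Histone ChIP-seq", "K562")
print(url)
```

The printed URL can be opened in a browser or fetched with any HTTP client; file download links are then listed in the returned records.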

Visualise data

Use RegulomeDB to identify all SNPs around NKX2-5, and predict if they are potential regulatory elements (promoters/enhancers)

Try chr5:172,644,666-172,676,755 at http://regulomedb.org/
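Interval strings such as the one above are easy to mistype, so it can help to parse them before use. The sketch below is a minimal parser that assumes browser-style coordinates (1-based, both ends inclusive); the function name and the accepted chromosome pattern are hypothetical.

```python
import re
from typing import Tuple

# Matches e.g. "chr5:172,644,666-172,676,755"; commas in numbers are optional.
_INTERVAL = re.compile(r"^(chr[\w.]+):([\d,]+)-([\d,]+)$")


def parse_interval(text: str) -> Tuple[str, int, int]:
    """Split 'chrom:start-end' into (chrom, start, end) as integers."""
    match = _INTERVAL.match(text.strip())
    if match is None:
        raise ValueError(f"not a genomic interval: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError(f"start is after end: {text!r}")
    return chrom, start, end


chrom, start, end = parse_interval("chr5:172,644,666-172,676,755")
length_bp = end - start + 1  # both ends are counted
print(chrom, start, end, length_bp)
```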

FactorBook: The ENCODE knowledgebase of transcription factors

http://www.factorbook.org/
