Post on 09-Oct-2020
11/11/2014 Jack Binysh MRC Internship
Analysis of Promoter Shifting
Using CAGE data
An insight into transcription regulation
Outline
• Background
Introduction to Promoters and Transcription Start Sites (TSS’s)
Classification of Promoters
Motivation for Project
• Project
CAGE data
Previous work
CAGEr
Results
Future Work
Promoters and TSS’s
How is regulation achieved?
• Promoter region
Contains regulatory elements (binding motifs, CG enrichment…)
Controls gene expression
• Context specific
• Dynamic
• Associated epigenetics
Histone placement
Histone ‘marks’
Classification
• Correlation between several features
Broad vs Sharp
CG islands vs TATA
Ordered vs Disordered Histones
General vs Specific function
CAGE Data
• Cap Analysis of Gene Expression
• mRNA captured, first ~20 bp sequenced from 5’ end → Tags
Full length estimated
Tags mapped to Genome
• TSS determination at bp resolution.
• Genome wide mapping of mRNA transcription
• FANTOM5: CAGE datasets for many cell types
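As a rough illustration of how mapped tags become a bp-resolution TSS profile, the sketch below counts tag 5’ ends per genomic position and scales the counts to tags per million. The tuple layout, the function names and the toy coordinates are assumptions made for this example; they are not FANTOM5 or CAGEr formats.

```python
from collections import Counter

def ctss_counts(tags):
    """Count CAGE tag 5' ends per (chrom, strand, position).

    `tags` is an iterable of (chrom, strand, start, end) tuples with
    0-based, end-exclusive coordinates (a hypothetical layout).
    """
    counts = Counter()
    for chrom, strand, start, end in tags:
        # The 5' end is the leftmost base on '+' and the rightmost on '-'.
        five_prime = start if strand == "+" else end - 1
        counts[(chrom, strand, five_prime)] += 1
    return counts

def tags_per_million(counts):
    """Scale raw counts so that they sum to one million."""
    total = sum(counts.values())
    return {site: 1e6 * n / total for site, n in counts.items()}

# Toy tags, invented for illustration only.
toy_tags = [
    ("chr1", "+", 100, 120),
    ("chr1", "+", 100, 121),
    ("chr1", "+", 103, 123),
    ("chr1", "-", 480, 500),
]
profile = ctss_counts(toy_tags)
tpm = tags_per_million(profile)
```

Plain tags-per-million scaling is used here only because it is the simplest option; other normalization schemes exist.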
A ‘TSS Profile’
Motivation for Project
• Already known that:
One Gene may have multiple types of promoter → regulated in several ways
Variants of Transcription factors may exist in different cells
Do the rules governing transcription change between cell types? Both temporally (embryonic development) and spatially (in adult tissues)?
• Focus on housekeeping genes (always expressed)
• Look for changes in TSS profile between cell types…
CAGEr
[Diagram: CAGEr workflow. Input: CAGE bam files, CTSS files, available resources, custom input. Methods: CAGEset. Output: TSS, Tag clusters (TC), normalized expression.]
Clustering in CAGEr
• Two levels of clustering
TSS profiles → Tag clusters
Tag clusters → Consensus clusters
[Figure: CTSSs in samples S1, S2 and S3 grouped into TCs, which are merged into a consensus cluster]
• Tag clusters are sample specific
• Consensus clusters are the same for all samples
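The two levels of clustering can be sketched roughly as follows. This is a simplified distance-based illustration, not CAGEr's own implementation: the function names, the 20 bp default gap and the toy positions are all assumptions, and positions are taken to lie on a single chromosome and strand.

```python
def tag_clusters(positions, max_dist=20):
    """Group CTSS positions of one sample into (start, end) tag clusters.

    A CTSS no more than `max_dist` bp from the previous one joins the
    current cluster; otherwise it opens a new cluster.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][1] <= max_dist:
            clusters[-1][1] = pos
        else:
            clusters.append([pos, pos])
    return [tuple(c) for c in clusters]

def consensus_clusters(per_sample_clusters):
    """Merge overlapping tag clusters from all samples into shared intervals."""
    intervals = sorted(iv for sample in per_sample_clusters for iv in sample)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]

# Toy CTSS positions for three samples (invented values).
s1 = tag_clusters([100, 105, 110, 400])
s2 = tag_clusters([108, 120, 398, 405])
s3 = tag_clusters([95, 101])
consensus = consensus_clusters([s1, s2, s3])
# consensus == [(95, 120), (398, 405)]
```

Each sample keeps its own tag clusters, while the consensus intervals give a common coordinate system in which samples can be compared.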
Extending CAGEr
• Large datasets
• 1 sample ~ 46 million tag sites
• FANTOM5 has hundreds of samples
• Pairwise comparisons of datasets: O(n²)
• If 1 comparison takes ~1 hour, 60 samples take ~10 weeks!
• Need to speed things up, avoid doing every comparison, etc.
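The estimate above follows from counting unordered pairs, n(n-1)/2. A minimal check, assuming the stated one hour per comparison and no parallelism (the function name is made up for this example):

```python
from math import comb

def pairwise_hours(n_samples, hours_per_comparison=1.0):
    """Total time needed to compare every unordered pair of samples once."""
    return comb(n_samples, 2) * hours_per_comparison

hours = pairwise_hours(60)
weeks = hours / (24 * 7)
print(f"{comb(60, 2)} comparisons, about {weeks:.1f} weeks")
# prints "1770 comparisons, about 10.5 weeks"
```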
Dendrogram
• 67 cell types compared
• Most show very little shifting (the ‘bulk’)
• ~8 ‘outliers’
Cardiac Myocytes
Sertoli Cells
Hepatocytes
Hair follicle papilla
CD 19
Renal Glomerular
Neurons
Aortic Endothelial
Heatmap
• Each outlier is separated from every other cell type
• The difference between two outliers is greater than the difference between one outlier and the ‘bulk’
• Suggests a different set of shifting promoters in every outlier
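One simple way to express "separated from every other cell type" numerically is to look at each sample's median distance to all the others. The sketch below is only an illustration and is not the method used in the talk, which read outliers off the dendrogram and heatmap. The distance values are invented toy numbers, and the meaning of a distance (for example a count of shifting promoters between two samples) is an assumption.

```python
from statistics import median

def flag_outliers(names, dist, threshold):
    """Return samples whose median distance to all others exceeds `threshold`."""
    flagged = []
    for i, name in enumerate(names):
        others = [dist[i][j] for j in range(len(names)) if j != i]
        if median(others) > threshold:
            flagged.append(name)
    return flagged

# Hypothetical symmetric distance matrix: four 'bulk' samples, two outliers.
names = ["bulk_a", "bulk_b", "bulk_c", "bulk_d", "outlier_x", "outlier_y"]
dist = [
    [0, 5, 6, 4, 40, 45],
    [5, 0, 4, 7, 42, 44],
    [6, 4, 0, 5, 41, 46],
    [4, 7, 5, 0, 43, 47],
    [40, 42, 41, 43, 0, 80],
    [45, 44, 46, 47, 80, 0],
]
outliers = flag_outliers(names, dist, threshold=20)
# outliers == ["outlier_x", "outlier_y"]
```

In the toy matrix the outlier-to-outlier distance (80) is larger than any outlier-to-bulk distance, mirroring the pattern described for the heatmap.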
Scatter plots
Dinucleotide Density plots
• Each cluster has two dominant TSS’s: modal and sample specific
• Look for dinucleotide enrichment in sequences
• Initiator sequence at modal CTSS visible
• No obvious motif at the TSS of the outlier
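A dinucleotide density plot of this kind can be built by aligning sequences on a chosen TSS and counting, per position, how often a given dinucleotide occurs. A minimal sketch, assuming equally long, pre-aligned sequences; the function name, the toy sequences and the choice of "CA" are illustrative assumptions.

```python
from collections import Counter

def dinucleotide_density(sequences, dinucleotide):
    """Fraction of sequences carrying `dinucleotide` at each position.

    All sequences must have the same length and be aligned on the same
    reference point (for example a TSS at a fixed index).
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned and equally long")
    hits = Counter()
    for seq in sequences:
        for i in range(length - 1):
            if seq[i:i + 2] == dinucleotide:
                hits[i] += 1
    return [hits[i] / len(sequences) for i in range(length - 1)]

# Toy aligned sequences (invented); three of four carry CA at index 2.
aligned = ["GGCATTC", "ACCAGTC", "TTCACGG", "GACGTAA"]
density = dinucleotide_density(aligned, "CA")
# density == [0.0, 0.0, 0.75, 0.0, 0.0, 0.0]
```

A sharp peak at one position, as in the toy data, is the kind of signal an initiator-like element would produce; a flat profile corresponds to "no obvious motif".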
[Figures: Cardiac Myocytes, centered on outlier TSS; Cardiac Myocytes, centered on modal TSS]
Motif discovery
• Motif discovery finds no specific motifs within 500 bp either side of the outlier TSS in any of the samples.
• All of the samples show general GC enrichment
• ~80% of clusters overlap 1 annotated CpG island, ~20% overlap none
Cardiac Myocytes
Gene Ontology
• Gene Ontology analysis
Each cluster associated with nearest annotated TSS & entrezgene ID
Keywords tagged to each entrezgene ID
Statistics on over/under representation of Keywords
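The slides do not say which statistic was used. A common choice for over-representation of a keyword is the one-sided hypergeometric test, sketched below under that assumption; the function name and the toy numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes without replacement.

    population: total number of genes considered
    annotated:  genes in the population carrying the keyword
    sample:     genes in the set of interest (e.g. shifting promoters)
    hits:       genes in the sample carrying the keyword
    """
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, sample)

# Toy numbers: 20 genes, 5 with the keyword, 4 sampled, 3 of them annotated.
p = enrichment_p_value(population=20, annotated=5, sample=4, hits=3)
```

With many keywords tested at once, a multiple-testing correction would normally be applied on top of these p-values.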
Cardiac Myocytes
Gene Ontology
Cardiac Myocytes
• Significantly over-represented Biological functions tend to be housekeeping, not cell specific
• Perhaps the shifting promoters are not involved with cell specific gene function at all?
Future Work
• Repeat analysis using different consensus clusters
Problems with thresholds within analysis
More promising recent dinucleotide maps
Future Work
• Analysis of non-shifting promoters
Looking at more general changes in shape
E.g. dot product, linear scaling
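One possible reading of the dot-product idea is a normalized dot product (cosine similarity) between two TSS profiles over the same consensus cluster. It equals 1 when one profile is a linear rescaling of the other and falls as the shapes diverge. The sketch below is an assumption about how such a comparison might look, with invented toy profiles.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Normalized dot product of two equally long, non-zero TSS profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same positions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy per-position tag counts over one cluster (invented values).
sharp = [0, 0, 90, 5, 5, 0]
scaled = [0, 0, 180, 10, 10, 0]      # same shape, twice the expression
broad = [15, 20, 15, 20, 15, 15]     # different shape

same_shape = cosine_similarity(sharp, scaled)   # close to 1
changed_shape = cosine_similarity(sharp, broad) # clearly below 1
```

Because the measure ignores overall expression level, it isolates changes in profile shape from changes in how strongly the promoter is used.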
Extra slides…
Previous Results
• Zebrafish embryonic development
• Initial RNA transcriptome inherited from mother, zygotic gene activation at Mid Blastula Transition
• Corresponds to change in TSS profile
Sharp Broad
Position of TSS’s shift
Shifting Promoters
• “Differential promoter interpretation by the maternal and zygotic transcription machinery”
Shifting Promoters
Search for genetic structure correlated with this shifting
• TATA-like enrichment always found ~30 bp upstream in Maternal
• In Zygote, boundary 50 bp downstream of TSS
• Majority of TATA-like motifs not canonical TATA boxes (W box)
Two Independent Mechanisms for Transcription Initiation
Nucleosome Location
• H3K4me3 Nucleosome locations estimated at 4 developmental stages
• Alignment with Zygotic, but not Maternal, TSS, 50 bp downstream
Same location as boundary
• Suggests Zygotic mechanism for positioning nucleosomes after MBT
Internucleosomal Phasing Patterns
• 10 bp AA/TT dinucleotide enrichment periodicity downstream of zygotic TSS, but not maternal
• Weaker GC/AT enrichment pattern matching nucleosome free and wrapped DNA
• Zygotic, not maternal, TSS associated with nucleosome positioning