1
Cis-regulatory element study in the transcriptome
Jin Chen
CSE891-001
Fall 2012
2
What is a cis-element?
Courey and Jia (2001)
A cis-regulatory element or cis-element is a region of DNA or RNA that regulates the expression of genes located on that same molecule of DNA
The Latin word "cis" means "on the same side as"
3
Cis-element properties
• Typically found in the 5’ untranscribed region of the gene (promoter region)
• Can be specific sites for binding of activators or repressors
• Position and orientation of the cis-element relative to the transcriptional start site are usually fixed
4
Cis-element properties
• Short sequences
• Recurring patterns
• Sequence-specific binding sites
5
Cis-element Representations
Sequence 1: A G T A T A
Sequence 2: A G A T T A
Sequence 3: C G A C T C
Sequence 4: A G T G T A
Sequence 5: A G T G T G
Consensus sequence: A G W N T A
Probability matrix & sequence logo:
Position   1     2     3     4     5     6
Prob(A)    0.8   0     0.4   0.2   0     0.6
Prob(C)    0.2   0     0     0.2   0     0.2
Prob(G)    0     1     0     0.4   0     0.2
Prob(T)    0     0     0.6   0.2   1     0
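The probability matrix and consensus on this slide can be recomputed from the five aligned sites. The sketch below is illustrative only: the slides do not say which rule produced the consensus, so the thresholds in `consensus` (a single base if it exceeds 50% and is at least twice as frequent as the runner-up, a two-base IUPAC code if the top two bases cover at least 75%, otherwise N) are an assumption that happens to reproduce A G W N T A. The function names are hypothetical.

```python
from collections import Counter

# The five aligned binding sites from the slide.
SITES = ["AGTATA", "AGATTA", "CGACTC", "AGTGTA", "AGTGTG"]

# IUPAC codes for two-base ambiguities, keyed by the sorted base pair.
TWO_BASE = {"AG": "R", "CT": "Y", "AC": "M", "GT": "K", "AT": "W", "CG": "S"}


def probability_matrix(sites):
    """Return {base: [frequency at each position]} for equal-length sites."""
    length = len(sites[0])
    matrix = {base: [] for base in "ACGT"}
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        for base in "ACGT":
            matrix[base].append(counts[base] / len(sites))
    return matrix


def consensus(matrix):
    """Derive an IUPAC consensus using the assumed thresholds described above."""
    out = []
    for i in range(len(matrix["A"])):
        ranked = sorted("ACGT", key=lambda b: matrix[b][i], reverse=True)
        p1, p2 = matrix[ranked[0]][i], matrix[ranked[1]][i]
        if p1 > 0.5 and p1 >= 2 * p2:
            out.append(ranked[0])          # one clearly dominant base
        elif p1 + p2 >= 0.75:
            out.append(TWO_BASE["".join(sorted(ranked[:2]))])
        else:
            out.append("N")                # no informative preference
    return "".join(out)


matrix = probability_matrix(SITES)
for base in "ACGT":
    print(f"Prob({base})", matrix[base])
print("Consensus:", consensus(matrix))
```

A different threshold choice would give a different consensus for the same matrix, which is one reason the matrix itself is the more complete representation.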
6
Cis-element Representation 1
• Consensus-based method
– Refers to a sequence that matches all examples of the binding site closely but not exactly
– Trade-off between ambiguity and sensitivity
IUPAC codes
Code  Description
A     Adenine
C     Cytosine
G     Guanine
T     Thymine
U     Uracil
R     Purine (A or G)
Y     Pyrimidine (C, T, or U)
M     C or A
K     T, U, or G
W     T, U, or A
S     C or G
B     C, T, U, or G (not A)
D     A, T, U, or G (not C)
H     A, T, U, or C (not G)
V     A, C, or G (not T, not U)
N     Any base (A, C, G, T, or U)
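One way to see how an IUPAC consensus is used for matching is to expand it into a regular expression. The sketch below assumes DNA input (U is treated as T), searches the forward strand only, and uses hypothetical helper names; it is not taken from any particular tool.

```python
import re

# IUPAC nucleotide codes expanded to the DNA bases they stand for.
IUPAC_TO_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex character-class pattern."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC_TO_BASES[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_sites(consensus, sequence):
    """Return (start, matched_text) for every match, including overlapping ones."""
    # The lookahead lets the regex engine report overlapping occurrences.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


print(consensus_to_regex("AGWNTA"))        # AG[AT][ACGT]TA
print(find_sites("AGWNTA", "CCAGTATAGG"))  # [(2, 'AGTATA')]
```

The more ambiguity codes a consensus contains, the more sequences it matches, which is the ambiguity versus sensitivity trade-off noted above.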
7
Cis-element Representation 2
• Sequence logos
– A visual representation of the probability matrix
– The total height of each column is proportional to its information content
http://www-lmmb.ncifcrf.gov/~toms/sequencelogo.html
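The column heights of a logo can be computed directly from a probability matrix. The sketch below uses the matrix from the representations slide and the standard Shannon formula for DNA (2 bits minus the column entropy); it omits the small-sample correction that logo software usually applies, and the function names are hypothetical.

```python
import math

# Probability matrix from the "Cis-element Representations" slide.
MATRIX = {
    "A": [0.8, 0.0, 0.4, 0.2, 0.0, 0.6],
    "C": [0.2, 0.0, 0.0, 0.2, 0.0, 0.2],
    "G": [0.0, 1.0, 0.0, 0.4, 0.0, 0.2],
    "T": [0.0, 0.0, 0.6, 0.2, 1.0, 0.0],
}


def column_information(matrix):
    """Information content (bits) per column: 2 - Shannon entropy."""
    info = []
    for i in range(len(matrix["A"])):
        probs = [matrix[base][i] for base in "ACGT"]
        entropy = -sum(p * math.log2(p) for p in probs if p > 0)
        info.append(2.0 - entropy)
    return info


def letter_heights(matrix):
    """Height of each letter in a column: its probability times the column information."""
    info = column_information(matrix)
    return {base: [p * info[i] for i, p in enumerate(col)]
            for base, col in matrix.items()}


for position, bits in enumerate(column_information(MATRIX), start=1):
    print(f"position {position}: {bits:.3f} bits")
```

Fully conserved positions (2 and 5 here) reach the 2-bit maximum, while position 4, which corresponds to N in the consensus, is close to zero.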
8
Cis-element matching/discovery
• Pattern Matching
– Discover patterns in sequences from co-regulated genes using JASPAR and TRANSFAC matrices
– Pscan
• Pattern Discovery
– Discover patterns in sequences from co-regulated genes without using known patterns
– MEME, hmmbuild
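Pattern matching with a known matrix amounts to sliding the matrix along a sequence and scoring each window. The sketch below is a generic log-odds scan, not the Pscan algorithm or the JASPAR/TRANSFAC file formats; the uniform background of 0.25, the pseudocount of 0.01, and the score threshold are assumptions chosen for illustration, and the matrix is the small example from the representations slide.

```python
import math

# Example probability matrix (from the representations slide).
MATRIX = {
    "A": [0.8, 0.0, 0.4, 0.2, 0.0, 0.6],
    "C": [0.2, 0.0, 0.0, 0.2, 0.0, 0.2],
    "G": [0.0, 1.0, 0.0, 0.4, 0.0, 0.2],
    "T": [0.0, 0.0, 0.6, 0.2, 1.0, 0.0],
}


def log_odds_matrix(matrix, background=0.25, pseudo=0.01):
    """Convert probabilities to log2 odds against a uniform background.

    The pseudocount keeps zero-probability bases from scoring minus infinity.
    """
    length = len(matrix["A"])
    return {
        base: [math.log2((matrix[base][i] + pseudo) / (1 + 4 * pseudo) / background)
               for i in range(length)]
        for base in "ACGT"
    }


def scan(sequence, lom, threshold):
    """Return (start, window, score) for forward-strand windows scoring >= threshold."""
    width = len(lom["A"])
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in lom for base in window):
            continue  # skip windows containing non-ACGT characters
        score = sum(lom[base][i] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


lom = log_odds_matrix(MATRIX)
print(scan("CCAGTATAGG", lom, threshold=5.0))
```

In practice the threshold is the delicate part: too low and random sequence matches, too high and weak but real sites are missed.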
9
Pattern Matching
http://www.slideshare.net/Stewbacca/dna-motif-finding-2010
10
Pattern Matching
11
Pattern Matching
12
http://159.149.109.9/pscan/
13
14
15
Cis-element evolution
• Composition
• Location
• Modules
chicken αA
mouse αA
mouse δ1
Gene control regions for eye lens crystallins
Molecular Biology of the Cell, Alberts et al., 4th ed.
16
Large Scale Analysis
• Identify 264 co-regulated gene groups in S. cerevisiae
• Putative cis-regulatory elements
– 80 known consensus binding sites
– 597 elements by motif discovery with MEME
• Score enrichment of genes containing each putative element
– 42 cis-elements in 35 unique groups
• Orthologous modules in other species
• Enrichment of orthologous modules
A. P. Gasch et al., PLoS Biol., 2004
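Scoring enrichment asks whether a gene group contains a putative element more often than expected by chance. The slide does not name the statistic, so the sketch below assumes a one-sided hypergeometric test, a common choice for this kind of question; the function name and all gene counts in the example are hypothetical, not values from the study.

```python
from math import comb


def hypergeom_enrichment_p(genome_size, genome_with_motif, group_size, group_with_motif):
    """P(X >= group_with_motif) when group_size genes are drawn without replacement
    from a genome in which genome_with_motif genes carry the element."""
    total = comb(genome_size, group_size)
    upper = min(group_size, genome_with_motif)
    tail = sum(
        comb(genome_with_motif, k) * comb(genome_size - genome_with_motif, group_size - k)
        for k in range(group_with_motif, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 6000 genes, 300 carry the element upstream,
# and 20 of the 50 genes in a co-regulated group carry it.
p = hypergeom_enrichment_p(6000, 300, 50, 20)
print(f"enrichment p-value: {p:.3g}")
```

With hundreds of groups and hundreds of candidate elements, any such p-value would also need a multiple-testing correction before being called significant.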
17
Conservation of S. cerevisiae motifs
Gene group                Consensus   Element / factor
G1 phase cell cycle       ACGCG       MCB
Amino acid biosynthesis   TGACTM      Gcn4p
Nitrogen source           GATAA       GATA factors
Proteasome                GGTGGCAAA   Rpn4p
18
Positions of binding sites
• Non-random distribution
• Similar across species
• No correlations in locations across species
19
Spacing between binding sites in Methionine Biosynthesis genes
• Small distance between Cbf1p and Met31/32p
• Conserved across species
• Independent of exact positions
20
Control of iron metabolism in Mycobacterium tuberculosis. Rodriguez, Marcela. Trends in Microbiology, 2006.
21
Poisson Method for module discovery
Look for matches to consensus sequences
Mcm1 : DCCYWWWNNRG
Ste12 : TGAAACA
Random DNA sequence:
“Pearson type III distribution”:
$pdf(x) = \frac{a\,(ax)^{k-2}\,e^{-ax}}{(k-2)!}$
Exponential distribution:
$pdf(x) = a\,e^{-ax}$
Wagner A (1999) Bioinformatics 15(10): 776-784
22
Cister & Comet
DNA sequence segment
$\mathrm{score}(\text{segment}) = \ln \frac{\mathrm{Prob}(\text{segment} \mid \text{cluster model})}{\mathrm{Prob}(\text{segment} \mid \text{random model})}$
Cluster model:
Poisson-distributed cis-elements, embedded in random DNA
Frith MC, Hansen U, Weng Z (2001) Bioinformatics 17(10): 878-889
Frith MC, Spouge JL, Hansen U, Weng Z (2002) Nucleic Acids Research