cis-regulatory element study in transcriptome

Post on 24-Feb-2016



1

cis-regulatory element study in transcriptome

Jin Chen

CSE891-001

Fall 2012

2

What is a cis-element?

Courey and Jia (2001)

A cis-regulatory element or cis-element is a region of DNA or RNA that regulates the expression of genes located on that same molecule of DNA

Latin word “cis” means "on the same side as"

3

Cis-element properties

• Typically found in the 5’ untranscribed region of the gene (promoter region)

• Can be specific sites for binding of activators or repressors

• Position and orientation of the cis-element relative to the transcriptional start site are usually fixed

4

Cis-element properties

• Short sequences

• Recurring patterns

• Sequence-specific binding sites

5

Cis-element Representations

Sequence 1:  A G T A T A
Sequence 2:  A G A T T A
Sequence 3:  C G A C T C
Sequence 4:  A G T G T A
Sequence 5:  A G T G T G

Consensus sequence:  A G W N T A

Probability matrix & sequence logo:

Position   1    2    3    4    5    6
Prob(A)    0.8  0    0.4  0.2  0    0.6
Prob(C)    0.2  0    0    0.2  0    0.2
Prob(G)    0    1    0    0.4  0    0.2
Prob(T)    0    0    0.6  0.2  1    0
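The probability matrix and consensus on this slide can be recomputed from the five aligned sequences. The sketch below is one way to do it. The slide does not say which rule produced the consensus, so the rule used here is an assumption: a single base is called when it is above 50% and more than twice as frequent as the runner-up, a two-base IUPAC code when the top two bases together exceed 75%, and N otherwise (a convention often attributed to Cavener, 1987). The function names are illustrative.

```python
from collections import Counter

# The five example binding sites from the slide.
SITES = ["AGTATA", "AGATTA", "CGACTC", "AGTGTA", "AGTGTG"]

# IUPAC codes for two-base ambiguities (DNA only).
TWO_BASE_CODES = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("AT"): "W", frozenset("CG"): "S",
}

def probability_matrix(sites):
    """Return one {base: frequency} dict per column of the aligned sites."""
    matrix = []
    for column in zip(*sites):
        counts = Counter(column)
        matrix.append({base: counts[base] / len(sites) for base in "ACGT"})
    return matrix

def consensus(matrix):
    """Derive a consensus string (ASSUMED rule, see the text above)."""
    out = []
    for col in matrix:
        ranked = sorted(col.items(), key=lambda kv: kv[1], reverse=True)
        (b1, p1), (b2, p2) = ranked[0], ranked[1]
        if p1 > 0.5 and p1 > 2 * p2:
            out.append(b1)                                   # one dominant base
        elif p1 + p2 > 0.75:
            out.append(TWO_BASE_CODES[frozenset(b1 + b2)])   # two-base code
        else:
            out.append("N")                                  # no clear preference
    return "".join(out)

matrix = probability_matrix(SITES)
print(consensus(matrix))  # AGWNTA
```

With this rule, position 3 (A 0.4, T 0.6) becomes W and position 4 (no base above 0.4) becomes N, which agrees with the consensus shown on the slide.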

6

Cis-element Representation 1

• Consensus based method

– Refers to a sequence that matches all examples of the binding site closely but not exactly

– Trade-off between ambiguity and sensitivity

IUPAC codes

code  description
A     Adenine
C     Cytosine
G     Guanine
T     Thymine
U     Uracil
R     Purine (A or G)
Y     Pyrimidine (C, T, or U)
M     C or A
K     T, U, or G
W     T, U, or A
S     C or G
B     C, T, U, or G (not A)
D     A, T, U, or G (not C)
H     A, T, U, or C (not G)
V     A, C, or G (not T, not U)
N     Any base (A, C, G, T, or U)
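A consensus written in IUPAC codes can be matched against a sequence by expanding each code into a character class. The following is a minimal sketch under some assumptions: DNA only (U is not handled), forward strand only, and the function names are illustrative.

```python
import re

# IUPAC code -> set of DNA bases it stands for (U omitted in this sketch).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus into a compiled regular expression."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC_DNA[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))

def find_sites(consensus, sequence):
    """Return 0-based start positions of all matches, overlapping ones included."""
    # A lookahead lets the regex engine report overlapping matches.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus).pattern + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]

print(consensus_to_regex("AGWNTA").pattern)          # AG[AT][ACGT]TA
print(find_sites("AGWNTA", "CCAGTATACCAGATTAGG"))    # [2, 10]
```

The trade-off named on the slide shows up directly here: every ambiguity code widens a character class, so the pattern matches more true sites but also more random sequence.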

7

Cis-element Representation 2

• Sequence logos

– A visual representation of the probability matrix

– The total height of each column is proportional to its information content

http://www-lmmb.ncifcrf.gov/~toms/sequencelogo.html
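The column heights of a sequence logo can be computed from a probability matrix. The sketch below uses the probability matrix from the "Cis-element Representations" slide and the usual definition for DNA, information content = 2 bits minus the Shannon entropy of the column. The small-sample correction that logo tools often apply is left out, which is a simplification. Function names are illustrative.

```python
import math

# Probability matrix from the "Cis-element Representations" slide.
MATRIX = {
    "A": [0.8, 0.0, 0.4, 0.2, 0.0, 0.6],
    "C": [0.2, 0.0, 0.0, 0.2, 0.0, 0.2],
    "G": [0.0, 1.0, 0.0, 0.4, 0.0, 0.2],
    "T": [0.0, 0.0, 0.6, 0.2, 1.0, 0.0],
}

def information_content(matrix):
    """Return the information content (bits) of each column."""
    n_cols = len(matrix["A"])
    bits = []
    for i in range(n_cols):
        column = [matrix[base][i] for base in "ACGT"]
        entropy = -sum(p * math.log2(p) for p in column if p > 0)
        bits.append(2.0 - entropy)  # 2 bits is the maximum for 4 bases
    return bits

def letter_heights(matrix):
    """Height of each letter = its probability times the column's information."""
    bits = information_content(matrix)
    return [{b: matrix[b][i] * bits[i] for b in "ACGT"} for i in range(len(bits))]

for position, value in enumerate(information_content(MATRIX), start=1):
    print(position, round(value, 3))
```

Positions 2 and 5, where a single base always occurs, reach the full 2 bits, while position 4 carries almost no information.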

8

Cis-element matching/discovery

• Pattern Matching

– Discover patterns in sequences from co-regulated genes using JASPAR and TRANSFAC matrices

– Pscan

• Pattern Discovery

– Discover patterns in sequences from co-regulated genes without using known patterns

– MEME, hmmbuild
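Pattern matching with a known matrix usually means sliding the matrix along a sequence and scoring each window. The sketch below illustrates that idea with log-odds scores. It is not the Pscan algorithm, and the pseudocount, the uniform background of 0.25, and the threshold are all assumptions chosen for illustration. The counts are those of the five example sites from the "Cis-element Representations" slide.

```python
import math

# Base counts per position for the five example sites.
COUNTS = {
    "A": [4, 0, 2, 1, 0, 3],
    "C": [1, 0, 0, 1, 0, 1],
    "G": [0, 5, 0, 2, 0, 1],
    "T": [0, 0, 3, 1, 5, 0],
}

def log_odds_matrix(counts, pseudocount=0.25, background=0.25):
    """Convert counts to log2(probability / background), one dict per position."""
    n_sites = sum(counts[b][0] for b in "ACGT")
    width = len(counts["A"])
    return [
        {b: math.log2((counts[b][i] + pseudocount)
                      / (n_sites + 4 * pseudocount) / background)
         for b in "ACGT"}
        for i in range(width)
    ]

def scan(sequence, pwm, threshold):
    """Return (start, score) for each window scoring at least `threshold`.

    The sequence is assumed to contain only A, C, G, and T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

pwm = log_odds_matrix(COUNTS)
for start, score in scan("CCAGTATACC", pwm, threshold=5.0):
    print(start, round(score, 2))
```

The pseudocount keeps bases that were never observed at a position from scoring minus infinity. A real scan would also score the reverse strand and pick the threshold from a background model.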

9

Pattern Matching

http://www.slideshare.net/Stewbacca/dna-motif-finding-2010

10

Pattern Matching

11

Pattern Matching

12

http://159.149.109.9/pscan/

13

14

15

Cis-element evolution

• Composition

• Location

• Modules

chicken αA

mouse αA

mouse δ1

Gene control regions for eye lens crystallins

Molecular Biology of the Cell, Alberts et al., 4th ed.

16

Large Scale Analysis

• Identify 264 co-regulated gene groups in S. cerevisiae

• Putative cis-regulatory elements

– 80 known consensus binding sites

– 597 elements by motif discovery with MEME

• Score enrichment of genes containing each putative element

– 42 cis-elements in 35 unique groups

• Orthologous modules in other species

• Enrichment of orthologous modules

A. P. Gasch et al., PLoS Biol., 2004
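The enrichment step above asks whether a gene group contains a putative element more often than expected by chance. The slide does not name the statistic, so the sketch below assumes a one-sided hypergeometric test, a common choice for this kind of question. The function name and the example numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(genome_size, genome_with_motif,
                           group_size, group_with_motif):
    """P(X >= group_with_motif) when group_size genes are drawn at random,
    without replacement, from a genome in which genome_with_motif genes
    carry the element."""
    total = comb(genome_size, group_size)
    upper = min(group_size, genome_with_motif)
    tail = sum(
        comb(genome_with_motif, k)
        * comb(genome_size - genome_with_motif, group_size - k)
        for k in range(group_with_motif, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 6000 genes, 300 with the element upstream,
# a co-regulated group of 40 genes of which 15 carry the element.
p = hypergeom_enrichment_p(6000, 300, 40, 15)
print(f"{p:.3g}")
```

When many elements are tested against many gene groups, the resulting p-values need a multiple-testing correction before any are called significant.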

17

Conservation of S. cerevisiae motifs

Gene group               Motif      Factor / element
G1 phase cell cycle      ACGCG      MCB
Amino acid biosynthesis  TGACTM     Gcn4p
Nitrogen source          GATAA      GATA factors
Proteasome               GGTGGCAAA  Rpn4p

18

Positions of binding sites

• Non-random distribution

• Similar across species

• No correlations in locations across species

19

Spacing between binding sites in Methionine Biosynthesis genes

• Small distance between Cbf1p and Met31/32p

• Conserved across species

• Independent of exact positions

20

Control of iron metabolism in Mycobacterium tuberculosis. Rodriguez, Marcela. Trends in Microbiology, 2006.

21

Poisson Method for module discovery

Look for matches to consensus sequences

Mcm1: DCCYWWWNNRG

Ste12: TGAAACA

Random DNA sequence:

“Pearson type III distribution”:

$$\mathrm{pdf}(x) = \frac{a}{(k-2)!}\,(ax)^{k-2}\,e^{-ax}$$

Exponential distribution:

$$\mathrm{pdf}(x) = a\,e^{-ax}$$

Wagner A (1999) Bioinformatics 15(10): 776-784
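The two densities on this slide can be turned into a chance probability for a cluster of sites. The sketch below assumes a reading that the slide does not spell out: a is the expected number of consensus matches per base pair in random DNA, and x is the distance from the first to the k-th site in a run of k consecutive sites. Under that reading the first density is the Erlang (gamma) density with shape k - 1, and it reduces to the exponential density for k = 2. Function names are illustrative.

```python
import math

def cluster_span_pdf(x, a, k):
    """Density of the distance x spanned by k consecutive sites (k >= 2)."""
    return a * (a * x) ** (k - 2) * math.exp(-a * x) / math.factorial(k - 2)

def cluster_span_pvalue(x, a, k):
    """P(span <= x): chance that k sites fall within distance x in random DNA.

    Uses the closed-form cumulative distribution of the density above.
    """
    ax = a * x
    return 1.0 - math.exp(-ax) * sum(ax ** n / math.factorial(n)
                                     for n in range(k - 1))

# Hypothetical example: one match expected per 1000 bp, 3 sites within 100 bp.
print(cluster_span_pvalue(100, 0.001, 3))
```

A small value means that such a tight cluster would rarely arise in random DNA, which is the signal the method looks for.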

22

Cister & Comet

DNA sequence segment

$$\mathrm{score}(\text{segment}) = \ln \frac{\mathrm{Prob}(\text{segment} \mid \text{cluster model})}{\mathrm{Prob}(\text{segment} \mid \text{random model})}$$

Cluster model:

Poisson-distributed cis-elements, embedded in random DNA

Frith MC, Hansen U, Weng Z (2001) Bioinformatics 17(10): 878-889.

Frith MC, Spouge JL, Hansen U, Weng Z (2002) Nucleic Acids Research