How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y...

Date post: 17-Jan-2016
Upload: pamela-griffith
Page 1

How do we represent the position specific preference ?

BID_MOUSE   I A R H L A Q I G D E M
BAD_MOUSE   Y G R E L R R M S D E F
BAK_MOUSE   V G R Q L A L I G D D I
BAXB_HUMAN  L S E C L K R I G D E L
BimS        I A Q E L R R I G D E F
HRK_HUMAN   T A A R L K A L G D E L
Egl-1       I G S K L A A M C D D F

Statistical representation (position 9 of the alignment, 7 sequences)

G: 5 -> 71%
S: 1 -> 14%
C: 1 -> 14%

Basic concept of motif identification 2.
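
The per-position statistics on this slide can be reproduced with a short script. This is a minimal sketch, assuming the seven aligned segments are typed in exactly as shown; the names `ALIGNMENT` and `column_frequencies` are illustrative and not part of the original slides.

```python
from collections import Counter

# Aligned segments copied from the slide (one residue per position).
ALIGNMENT = {
    "BID_MOUSE":  "IARHLAQIGDEM",
    "BAD_MOUSE":  "YGRELRRMSDEF",
    "BAK_MOUSE":  "VGRQLALIGDDI",
    "BAXB_HUMAN": "LSECLKRIGDEL",
    "BimS":       "IAQELRRIGDEF",
    "HRK_HUMAN":  "TAARLKALGDEL",
    "Egl-1":      "IGSKLAAMCDDF",
}


def column_frequencies(alignment, position):
    """Return {residue: (count, fraction)} for a 1-based alignment column."""
    column = [seq[position - 1] for seq in alignment.values()]
    counts = Counter(column)
    total = len(column)
    return {res: (n, n / total) for res, n in counts.most_common()}


# Position 9 is the column summarized on the slide.
for residue, (count, fraction) in column_frequencies(ALIGNMENT, 9).items():
    print(f"{residue}: {count} -> {fraction:.0%}")
```

Running the same function over every column gives a full position-specific frequency table for the motif.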

Page 2

Practice: identify potential transcription factor binding sites on a promoter sequence.

Using TESS: Transcription Element Search System

http://www.cbil.upenn.edu/cgi-bin/tess/tess33?RQ=WELCOME

Page 3

TESS result

Page 4

Why are there many false positives in TF binding site scans?

Contextual dependency is not considered.

Stringency of the matrices.

Page 5

Stringency of the matrices

Matrices shown: P53_01, P53_02

Consensus, 10 bp

  A    C    G    T   Consensus
 40   13   23   23   N
 20    3   70    5   G
 55    3   40    0   R
  0   93    0    5   C
 53    8    8   30   W
 15    0    3   82   T
  0    0  100    0   G
  0   50    0   50   Y
  0   68    0   30   C
 12   35    3   48   Y

Consensus, 20 bp

  A    C    G    T   Consensus
  4    0   13    0   G
  5    0   12    0   G
 15    0    2    0   A
  0   17    0    0   C
 17    0    0    0   A
  0    0    0   17   T
  0    0   17    0   G
  0   13    0    4   C
  0   17    0    0   C
  0   17    0    0   C
  0    0   17    0   G
  0    0   17    0   G
  2    0   15    0   G
  0   17    0    0   C
 17    0    0    0   A
  0    0    0   17   T
  0    0   17    0   G
  0    2    0   15   T
  0   13    0    4   C
  0    7    2    7   Y
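
One way to see why stringency matters is to estimate how often each consensus would match by chance. The sketch below is an illustration only: it treats the two consensus strings read off the matrices as IUPAC patterns (a simplification of real matrix scoring) and assumes a random sequence with equal, independent base frequencies. The function names are hypothetical.

```python
# Number of bases (out of A, C, G, T) accepted by each IUPAC code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "W": 2, "S": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def chance_match_probability(consensus):
    """Probability that one random window matches the IUPAC consensus."""
    probability = 1.0
    for code in consensus:
        probability *= IUPAC_DEGENERACY[code] / 4
    return probability


def expected_chance_hits(consensus, sequence_length, both_strands=True):
    """Expected number of chance matches in a random sequence.

    Simplification: both strands are counted independently, which
    double-counts palindromic patterns.
    """
    windows = max(sequence_length - len(consensus) + 1, 0)
    strands = 2 if both_strands else 1
    return windows * strands * chance_match_probability(consensus)


SHORT = "NGRCWTGYCY"            # 10 bp consensus from the slide
LONG = "GGACATGCCCGGGCATGTCY"   # 20 bp consensus from the slide

for name, pattern in (("10 bp", SHORT), ("20 bp", LONG)):
    hits = expected_chance_hits(pattern, sequence_length=1_000_000)
    print(f"{name}: about {hits:.3g} chance hits per Mb")
```

Under these assumptions the short, degenerate consensus is expected to match far more often by chance than the long one, which is the false-positive problem the slide points at.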

Page 6

DNA Pattern – Transcription factor binding site

• Pattern strings / matrices are extracted from known binding sequences.

• Core vs. whole binding site.

• Some short and/or ambiguous patterns will have many hits.
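
The effect of pattern length and ambiguity on the number of hits can be tried out directly. The following is a minimal sketch that converts an IUPAC pattern string to a regular expression and scans one strand; the toy sequence and the function name `find_pattern` are made up for illustration and are not how TESS itself works.

```python
import re

# IUPAC nucleotide codes expressed as regex character classes.
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def find_pattern(pattern, sequence):
    """Return 0-based start positions of all matches on the given strand."""
    regex = "".join(IUPAC_TO_REGEX[code] for code in pattern.upper())
    # A lookahead is used so that overlapping matches are also reported.
    return [m.start() for m in re.finditer(f"(?=({regex}))", sequence.upper())]


toy_sequence = "TTAGACATGCCTGG"  # hypothetical example sequence

print(find_pattern("NGRCWTGYCY", toy_sequence))  # long, specific pattern
print(find_pattern("CWTG", toy_sequence))        # short core only
print(len(find_pattern("NNNN", toy_sequence)))   # fully ambiguous pattern
```

A fully ambiguous pattern matches at every window, while the longer pattern matches only where the whole site is present.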

Page 7

Sequence logo

Pos  Info    N    A    C    G    T   Consensus
1    0.679   27   0    5    17   5   G
2    0.883   27   6    2    19   0   G
3    1.771   27   1    0    26   0   G
4    1.619   27   25   2    0    0   A
5    2       27   0    0    0    27  T
6    1.771   27   0    0    1    26  T
7    1.771   27   26   0    0    1   A
8    0.192   27   8    2    11   6   R

[Sequence logo; y-axis: information content, ticks at 1.0 and 2.0]
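
The Info column is consistent with the usual definition of information content for a DNA position, 2 bits minus the Shannon entropy of the observed base frequencies, with no small-sample correction. The sketch below recomputes it from the A, C, G, T counts in the table; the names are illustrative, and whether the original tool applied any correction is an assumption.

```python
import math


def information_content(counts):
    """Information content in bits for one DNA position.

    counts: observed numbers of A, C, G, T at that position.
    Computed as 2 - H, where H is the Shannon entropy in bits.
    """
    total = sum(counts)
    entropy = 0.0
    for n in counts:
        if n > 0:  # a base that never occurs contributes nothing
            p = n / total
            entropy -= p * math.log2(p)
    return 2.0 - entropy


# (A, C, G, T) counts for positions 1 to 8, copied from the table.
COUNTS = [
    (0, 5, 17, 5),
    (6, 2, 19, 0),
    (1, 0, 26, 0),
    (25, 2, 0, 0),
    (0, 0, 0, 27),
    (0, 0, 1, 26),
    (26, 0, 0, 1),
    (8, 2, 11, 6),
]

for position, row in enumerate(COUNTS, start=1):
    print(position, round(information_content(row), 3))
```

A fully conserved position reaches the maximum of 2 bits, and a position where all four bases are about equally frequent approaches 0, which is what sets the letter heights in the logo.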

Page 8

Comparing genomes

For understanding genome organization.

For identifying functionally conserved regions / sequences:

• 3’ and 5’ UTRs (e.g. microRNA binding sites)

• Transcription factor binding sites / regulatory modules.

Page 9

Vista Genome Browser

Practice & Observe: cross-genome comparison using the Vista browser

Page 10

Identifying conserved regulatory modules

• Regulatory module: a set of TF binding sites that controls a particular aspect of transcriptional regulation.

• Functional requirements lead to conservation at the binding site (sequence) level.

Page 11

Ways to identify conserved regulatory modules

• Based on sequence similarity: MEME, rVista, Whole genome rVista for model organisms…

• Based on binding site identity: BLISS

Page 12

Practice: Identifying conserved TF binding sites using rVista

1.) Search for your gene in Whole genome rVista.

Or

2.) Compile the corresponding genomic region from different species (can be more than two) and load them to rVista. This can also be used to identify shared regulatory modules in related genes within the same organism.

Page 13

rVista

Practice & Observe: Load genomic sequences from Human, Rat, and Opossum to rVista. Choose TF matrices (e.g. E2F, P53, ATF).

Page 14

Representation of Deep Seq data

Chrom.  Start     End       Name  Score  Strand
chr2L   10000192  10000217  U0    0      +
chr2L   10000227  10000252  U1    0      -
chr2R   10000310  10000335  U2    0      +
chr3L   10000496  10000521  U1    0      -
chr21   10000556  10000581  U2    0      +
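
The six columns above follow the BED layout, so each line can be read into a small record. This is a minimal sketch, assuming whitespace-separated fields and BED's convention of a 0-based start with an exclusive end; `BedRecord` and `parse_bed_line` are illustrative names.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    name: str
    score: int
    strand: str


def parse_bed_line(line):
    """Parse the first six whitespace-separated BED fields of one line."""
    chrom, start, end, name, score, strand = line.split()[:6]
    return BedRecord(chrom, int(start), int(end), name, int(score), strand)


# The example rows from the slide.
EXAMPLE = """\
chr2L 10000192 10000217 U0 0 +
chr2L 10000227 10000252 U1 0 -
chr2R 10000310 10000335 U2 0 +
chr3L 10000496 10000521 U1 0 -
chr21 10000556 10000581 U2 0 +
"""

records = [parse_bed_line(line) for line in EXAMPLE.splitlines()]
for rec in records:
    # With a half-open interval the feature length is simply end - start.
    print(rec.chrom, rec.name, rec.strand, rec.end - rec.start)
```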

Page 15

Representation of Deep Seq data

The importance of reference genome

• All coordinates are only meaningful for a given genome assembly.

• One assembly may have multiple releases (annotations).

Page 16

Manipulating Deep Seq data with Galaxy

Practice & Observe:

1. Load the PolII.H99.Bed file to Galaxy with the Get Data tool.

2. Sort the data by chromosome location (column c2).

3. Filter out lines with U2 using the expression c4!='U2'
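
For readers without Galaxy at hand, the sort and filter steps can be imitated in a few lines. This sketch is not Galaxy's implementation; it only mirrors the logic, assuming c2 is the start column and that the filter keeps every line for which c4!='U2' is true. The function name and the sample lines are illustrative.

```python
def sort_and_filter(lines, excluded_name="U2"):
    """Sort BED lines by column 2 (start) and keep lines whose column 4 differs."""
    rows = [line.split() for line in lines if line.strip()]
    rows.sort(key=lambda row: int(row[1]))               # c2: numeric, ascending
    return [row for row in rows if row[3] != excluded_name]  # c4 != 'U2'


# Sample rows from the deep-seq slide, deliberately out of order.
sample = [
    "chr3L 10000496 10000521 U1 0 -",
    "chr2L 10000192 10000217 U0 0 +",
    "chr21 10000556 10000581 U2 0 +",
    "chr2L 10000227 10000252 U1 0 -",
    "chr2R 10000310 10000335 U2 0 +",
]

for row in sort_and_filter(sample):
    print("\t".join(row))
```

Sorting on the start column alone follows the step as written; a genome-wide file would normally be sorted by chromosome first and start second.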

Page 17

Visualizing Deep Seq data with UCSC genome browser

Practice & Observe I:

1. Load the PolII.H99.Bed file as a custom track to the browser by copying and pasting the URL link.

2. View the ‘dense’ and then the ‘full’ presentation of the track.

Page 18

Visualizing Deep Seq data with UCSC genome browser

Practice & Observe II:

1. Save the landmark.bed file to your local computer. View the contents with Notepad.

2. Load the local file to UCSC browser.

3. Edit the color value, save, resubmit, and observe the differences.
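
The contents of landmark.bed are not shown in the slides, so the following is only a sketch of what a custom track file with a color value can look like. It assumes the color is set through the `color=R,G,B` attribute of a UCSC `track` line; the track name, description, and BED rows are hypothetical.

```python
def make_custom_track(name, color, bed_lines, description=""):
    """Build the text of a UCSC custom track: one track line plus BED rows.

    color: (red, green, blue) tuple, each value from 0 to 255.
    """
    if len(color) != 3 or not all(0 <= value <= 255 for value in color):
        raise ValueError("color must be three integers from 0 to 255")
    red, green, blue = color
    header = (
        f'track name="{name}" description="{description}" '
        f"color={red},{green},{blue}"
    )
    return "\n".join([header, *bed_lines]) + "\n"


# Hypothetical rows; replace with the lines of your own BED file.
rows = [
    "chr2L 10000192 10000217 U0 0 +",
    "chr2L 10000227 10000252 U1 0 -",
]

red_track = make_custom_track("landmark", (255, 0, 0), rows, "example track")
blue_track = make_custom_track("landmark", (0, 0, 255), rows, "example track")
print(red_track)
```

Saving the two versions and submitting each in turn shows the effect of changing only the color value.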

Page 19

Apollo Genome annotation tools

Observe: Using Apollo to organize information for studying complex genomic regions.
