ChIP-chip Data
DNA-binding proteins
• Constitutive proteins (mostly histones)
  – Organize DNA
  – Regulate access to DNA
  – Have many modifications
    • Acetylation, methylation, …
• Sporadic proteins (transcription factors)
  – Mediate docking of the transcription apparatus
  – Modify histones
  – Methylate DNA
Histones
Histones are an ancient family of proteins that serve as the scaffold for DNA
Four core histones (H2A, H2B, H3 and H4) assemble in pairs to form the octamer at the core of each nucleosome
About 147 bp of DNA wraps roughly 1.7 times around each nucleosome core
Histones and Modifications
Histone tails extend out from the nucleosome core and contact the DNA
Histone tails can be chemically modified
Nucleosomes can stay loosely spaced or pack tightly together; tight packing compacts the DNA
Transcription Factors
• General – help to set up transcription of many genes
• Specific – draw in general factors or RNA Pol II to specific genes
TATA-binding protein (TBP)
DNA Methylation
Adding a methyl group to cytosine
Cytosine methylation is passed on to daughter cells
Chromatin Immunoprecipitation (ChIP)
Tiling Array
• One probe every n base pairs over some length of chromosome
– Interrupted by repeat regions
• Promoter array: each (known) promoter tiled
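The tiling layout described above can be sketched as follows. This is a minimal illustration, assuming probes start every `step` bp, repeat regions are supplied as half-open (start, end) intervals, and any probe overlapping a repeat is dropped. The function names, the probe length and the promoter window sizes (`upstream`, `downstream`) are hypothetical, not an Affymetrix design specification.

```python
def tile_probes(region_start, region_end, step, probe_len, repeats):
    """Start positions of probes placed every `step` bp across
    [region_start, region_end), skipping probes that overlap a repeat.

    `repeats` is a list of half-open (start, end) intervals (assumed format).
    """
    probes = []
    for start in range(region_start, region_end - probe_len + 1, step):
        end = start + probe_len
        # Drop the probe if it overlaps any repeat interval.
        overlaps = any(start < r_end and r_start < end for r_start, r_end in repeats)
        if not overlaps:
            probes.append(start)
    return probes


def promoter_tiles(tss_positions, upstream, downstream, step, probe_len):
    """Promoter array: tile a fixed window around each known TSS
    (window sizes are hypothetical parameters; strand is ignored)."""
    return {tss: tile_probes(tss - upstream, tss + downstream, step, probe_len, [])
            for tss in tss_positions}


if __name__ == "__main__":
    # 25-mers every 35 bp over 0..200; a repeat occupies 60..100.
    print(tile_probes(0, 200, 35, 25, [(60, 100)]))  # prints [0, 35, 105, 140, 175]
```

The probe at 70 is skipped because it falls inside the repeat, which is the "interrupted by repeat regions" effect on a real tiling array.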
An Affymetrix tiling design
What the data look like
[Figure: raw log ratios (y-axis, about -2 to 4) against genomic position (x-axis, about 1,206,600 to 1,207,400). Caption: histone acetylation on 15 samples over one promoter (raw)]
Multiple Promoters
[Figure: log(R) - log(G) ratios (y-axis, about -4 to 2) against genomic position (x-axis, about 10,120,000 to 10,135,000)]
Normalized by Medians
[Figure: median-normalized log ratios (y-axis, about -2 to 3) against genomic position (x-axis, about 10,120,000 to 10,135,000)]
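Median normalization of a two-color array can be sketched as below. This is a minimal version that assumes each array's red (R) and green (G) probe intensities are given as lists, uses log base 2, and subtracts the array's median log ratio so that the bulk of probes (presumed unbound) centers on zero. The function names are hypothetical.

```python
import math
import statistics


def median_normalize(red, green):
    """Median-normalize one two-color array.

    Computes M = log2(R) - log2(G) for each probe, then subtracts the
    array's median M so that most probes center on zero.
    """
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    med = statistics.median(m)
    return [x - med for x in m]


def normalize_arrays(arrays):
    """Normalize each (red, green) pair separately, which puts arrays with
    different overall enrichment ratios on a common scale."""
    return [median_normalize(red, green) for red, green in arrays]


if __name__ == "__main__":
    red = [200.0, 400.0, 800.0, 3200.0]
    green = [100.0, 100.0, 100.0, 100.0]
    print(median_normalize(red, green))
```

Subtracting a single median per array only removes a global shift; it does not correct probe-specific effects such as sequence or dye bias.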
Methods and Issues
• Normalization
  – Different enrichment ratios
  – Different probe thermodynamics
  – Dye and probe bias
• Estimation
  – Categorical or continuous?
  – Individual values are noisy
  – For TF binding: where is the peak?
[Figure: log ratios (y-axis, about -2 to 3) against genomic position (x-axis, about 10,120,000 to 10,135,000)]
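One way to turn continuous, noisy per-probe values into a categorical answer is to threshold them and merge neighboring above-threshold probes into bound regions, recording each region's peak probe. The sketch below is a minimal illustration, assuming probes are sorted by position; `threshold` and `max_gap` (the largest spacing allowed between consecutive probes in one region) are hypothetical user-chosen parameters.

```python
def call_regions(positions, scores, threshold, max_gap):
    """Return (start, end, peak_position) for runs of probes whose score
    exceeds `threshold`, with consecutive probes at most `max_gap` bp apart.
    Assumes `positions` is sorted ascending."""
    regions = []
    current = None  # [start, end, peak_pos, peak_score] of the open region
    for pos, score in zip(positions, scores):
        if score > threshold:
            if current is not None and pos - current[1] <= max_gap:
                # Extend the open region and update its peak if needed.
                current[1] = pos
                if score > current[3]:
                    current[2], current[3] = pos, score
            else:
                # Gap too large (or no open region): start a new one.
                if current is not None:
                    regions.append(tuple(current[:3]))
                current = [pos, pos, pos, score]
        elif current is not None:
            # A probe below threshold closes the open region.
            regions.append(tuple(current[:3]))
            current = None
    if current is not None:
        regions.append(tuple(current[:3]))
    return regions
```

Because a single low probe closes a region, this rule is sensitive to exactly the probe-level noise noted above; in practice the scores would usually be smoothed first.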
Normalization
• Basic idea: compensate for technical variables
• Technical effects differ from probe to probe, so they can be modeled per probe
• Estimate what part of the signal can be attributed to technical factors
• Easiest variable to access: probe sequence
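Attributing part of each probe's signal to its sequence can be sketched as a least-squares regression of log intensity on simple sequence features, keeping the residual as the normalized signal. This is a deliberately simplified, hypothetical model: it uses only an intercept and A, C and G counts (T is implied when all probes have the same length). Richer sequence models, such as the one used by MAT, add further sequence terms.

```python
import numpy as np


def sequence_features(seq):
    """Intercept plus counts of A, C and G (T implied for equal-length probes)."""
    seq = seq.upper()
    return [1.0, seq.count("A"), seq.count("C"), seq.count("G")]


def sequence_normalize(seqs, log_intensity):
    """Fit log_intensity ~ sequence features by least squares.

    Returns (residuals, coefficients); the residuals are the part of the
    signal not explained by probe sequence.
    """
    X = np.array([sequence_features(s) for s in seqs], dtype=float)
    y = np.asarray(log_intensity, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef, coef
```

Fitting across all probes on the array assumes that most probes are not enriched, so the fitted sequence effect mostly reflects technical behavior rather than binding.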
MAT (Model-based Analysis of Tiling-arrays)
• One-color Affymetrix array
  – Needs a separate array for comparison
• Normalizes for probe thermodynamics and enrichment ratio
• Estimation by (robust) moving average
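A robust moving average can be sketched as a trimmed mean over a genomic window. This minimal version assumes normalized probe values sorted by position; each probe's score is the trimmed mean of all probes within `half_window` bp, so a few outlying probes cannot dominate. The window width and trim fraction are hypothetical choices here, not MAT's published settings.

```python
import bisect


def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values.
    Requires 0 <= trim < 0.5 so at least one value is kept."""
    vals = sorted(values)
    k = int(len(vals) * trim)
    kept = vals[k:len(vals) - k]
    return sum(kept) / len(kept)


def robust_moving_average(positions, values, half_window, trim=0.1):
    """Trimmed mean of all probes within +/- half_window bp of each probe.
    Assumes `positions` is a sorted list and `values` a list of equal length."""
    out = []
    for pos in positions:
        lo = bisect.bisect_left(positions, pos - half_window)
        hi = bisect.bisect_right(positions, pos + half_window)
        out.append(trimmed_mean(values[lo:hi], trim))
    return out
```

With `trim=0.25`, a single outlier among four or more probes in a window is discarded, whereas a plain mean would spread it across every neighboring score.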
Normalized Data – Rare Event
Normalized Data – Common Event
Estimation
• Try to build an intelligent moving average
• Not all neighbors will be similar
• A typical TF binds a site of about 8 bp
  – Pol II may spread over a wider region
• A typical fragment is 100-200 bp
• Features closer than about 200 bp cannot be resolved
[Figure: log ratios (y-axis, about -2 to 3) against genomic position (x-axis, about 10,120,000 to 10,135,000)]
Pol II binding on a 100 bp grid
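Because fragments are about 100-200 bp, nearby probes share much of their signal, while distant probes may not reflect the same binding event. One simple way to make the moving average respect this is to weight each neighbor by a triangular kernel that falls to zero at `half_width` bp (set near the fragment length), then take the maximum of the smoothed profile as the peak estimate. The kernel shape and parameter values are hypothetical illustrations, not the method used for the figure; the double loop is O(n²) and meant only for small examples.

```python
def kernel_smooth(positions, values, half_width):
    """Distance-weighted moving average with a triangular kernel:
    weight = 1 - |d| / half_width for |d| < half_width, else 0."""
    smoothed = []
    for p in positions:
        num = den = 0.0
        for q, v in zip(positions, values):
            d = abs(p - q)
            if d < half_width:
                w = 1.0 - d / half_width
                num += w * v
                den += w
        smoothed.append(num / den)  # den >= 1: the probe itself has d = 0
    return smoothed


def peak_position(positions, values, half_width):
    """Position and height of the maximum of the smoothed profile."""
    smoothed = kernel_smooth(positions, values, half_width)
    best = max(range(len(positions)), key=smoothed.__getitem__)
    return positions[best], smoothed[best]
```

For example, with probes every 100 bp and raw values `[0, 0, 0, 1, 2, 3, 2, 1, 0, 4, 0]`, the isolated spike at 900 is the raw maximum, but after smoothing with `half_width=200` the peak moves to the broad signal centered at 500.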
TileMap
• Ignores normalization
• ‘Shrinkage’ estimator of variance
  – Improves individual scores
• Smooths noise by moving average
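The benefit of a shrinkage variance estimator can be illustrated with a simple moderated t-statistic: each probe's variance, estimated from only a few replicates, is pulled toward the mean variance across probes before the score is computed. This sketch does not reproduce TileMap's own estimator; it uses a fixed prior weight `d0` (a hypothetical parameter), assumes two groups with at least two replicates per probe, and assumes not every probe has zero variance.

```python
import math
import statistics


def shrunken_t(treat, control, d0=4.0):
    """Per-probe two-sample t-like scores with shrunken variances.

    treat, control: lists of per-probe replicate lists (same probe order).
    d0: prior degrees of freedom; larger values pull each probe's variance
        more strongly toward the mean variance (hypothetical default).
    """
    n1, n2 = len(treat[0]), len(control[0])
    d = n1 + n2 - 2  # residual degrees of freedom per probe
    # Pooled within-probe variance for each probe.
    s2 = [((n1 - 1) * statistics.variance(t) + (n2 - 1) * statistics.variance(c)) / d
          for t, c in zip(treat, control)]
    s2_prior = statistics.mean(s2)  # common value to shrink toward
    scores = []
    for t, c, v in zip(treat, control, s2):
        v_shrunk = (d0 * s2_prior + d * v) / (d0 + d)
        diff = statistics.mean(t) - statistics.mean(c)
        scores.append(diff / math.sqrt(v_shrunk * (1 / n1 + 1 / n2)))
    return scores
```

Without shrinkage, a probe whose replicates happen to agree exactly has zero variance and an infinite score; with shrinkage it receives a large but finite score, which is how individual scores become more reliable before smoothing.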