
Topologically associating domains of chromatin: methods and tools for calling

Part 2

Svyatoslav Sidorov¹

¹The Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University

Group meeting at BI

Svyatoslav Sidorov (SPbSU) Topologically associating domains of chromatin: methods and tools for calling Group meeting at BI 1 / 75

Outline

1 Background from Part 1

2 Advanced TAD calling methods

3 Conclusion

4 Selected literature


Background from Part 1

Question: How is chromatin folded within the euchromatin and heterochromatin compartments?

(Figures: Alberts B. et al., 2004. Essential Cell Biology, 2nd ed.; Koch T. A. et al.)


Answer: Topologically Associating Domains (TADs).

A TAD is a region in which the frequency of intra-TAD interactions is higher than the frequency of inter-TAD interactions.

Nguyen H. G. and Bosco G., 2015


TADs as functional domains in mammals (Dekker J. and Heard E., 2015):

TADs are units of coordinated gene expression.

Series of adjacent TADs correspond to replication domains.

Mammalian TAD borders are conserved to a significant extent between different cell types, and even between mouse and human.

Internal interaction patterns of TADs are highly cell type-specific.

TADs have hierarchical folding and consist of sub-TADs (Cubenas-Potts C. and Corces V. G., 2015; Rao et al., 2014).


Selection criteria for published TAD-calling methods. A method should be:

applied in biological papers, not just cited in reviews;

used more than once;

used not only by its authors.


In Part 1, the following basic methods were considered:

Directionality index;

Insulation score;

Contrast index.


In Part 2, I’ll consider two advanced TAD calling methods:

Armatus (Filippova et al., 2014);

TADbit (Serra et al., 2016), though I’m not sure that TADbit meets all the conditions above (as of 23.03.2016).


Armatus


Consider a resolution parameter γ ∈ ℝ that is inversely related to the average domain size: lower γ results in sets of large domains, and vice versa.

The algorithm scheme is as follows:

1 For each γ ∈ Γ = {γ1, γ2, . . . , γt}, compute an optimal set of domains and several suboptimal sets.

2 Using all computed sets of domains, obtain a consensus set of domains.


Armatus: compute sets of domains for one γ

Let A be an n × n contact matrix.

Let $d_i = [a_i, b_i]$, $1 \le a_i < b_i \le n$, be a TAD, with $d_i \cap d_j = \varnothing$ for all $i \ne j$.

Let $q(a_i, b_i, \gamma)$ be the quality of the domain $d_i$ for a specific value of γ. This domain quality is directly related to the contact frequency within the domain.

We seek a set of TADs $D_\gamma$ that has the maximum total quality for the specific value of γ, i.e., we solve the following problem:

$$\sum_{[a_i, b_i] \in D} q(a_i, b_i, \gamma) \to \max_{D \in \mathcal{D}},$$

where $\mathcal{D}$ is the set of all possible sets of domains, and $D_\gamma$ is the set $D \in \mathcal{D}$ that attains the maximum.
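To make the objective concrete, here is a brute-force Python sketch that enumerates every set of disjoint domains [a, b] (1 ≤ a < b ≤ n) for a tiny matrix and keeps the set with the largest total quality. The function names and the `toy_q` quality values are made up for illustration; in Armatus, q would be computed from the contact matrix. The number of candidate sets grows as 2^(n−1), so exhaustive search like this is only usable as a sanity check on very small inputs.

```python
def all_domain_sets(n, start=1):
    """Return every set of pairwise disjoint domains [a, b] with start <= a < b <= n."""
    if start >= n:  # fewer than two windows left: no domain fits
        return [[]]
    # Either no domain starts at window `start` ...
    sets = all_domain_sets(n, start + 1)
    # ... or a domain [start, b] starts here and the remaining domains lie strictly after b.
    for b in range(start + 1, n + 1):
        for rest in all_domain_sets(n, b + 1):
            sets.append([(start, b)] + rest)
    return sets


def brute_force_best(n, q):
    """Return (total quality, domains) for the best set of disjoint domains."""
    return max(((sum(q(a, b) for a, b in D), D) for D in all_domain_sets(n)),
               key=lambda t: t[0])


# Hypothetical quality values for a 5-window toy chromosome.
QUALITY = {(1, 2): 1.0, (3, 5): 2.5, (1, 5): 3.0}


def toy_q(a, b):
    return QUALITY.get((a, b), -1.0)


print(len(all_domain_sets(5)))     # 16 candidate sets for n = 5
print(brute_force_best(5, toy_q))  # (3.5, [(1, 2), (3, 5)])
```

Note that windows not covered by any chosen domain are allowed; they simply contribute nothing to the sum.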


How do we define domain quality? As follows. Define the total frequency of interactions in a domain [k, l], scaled by the domain length l − k + 1; the resolution parameter γ acts here as a scaling exponent:

$$s(k, l, \gamma) = \frac{T(k, l)}{(l - k + 1)^{\gamma}}, \qquad T(k, l) = \sum_{g=k}^{l} \sum_{h=g+1}^{l} A(g, h).$$

Adapted from Crane et al., 2015.


Let $\mu_s(l - k + 1, \gamma)$ be the mean value of $s$ over all regions of length l − k + 1 for the specific value of γ.

Then we can define domain quality as the zero-centered value of its scaled total interaction frequency:

$$q(k, l, \gamma) = s(k, l, \gamma) - \mu_s(l - k + 1, \gamma).$$
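The definitions of T, s and q can be sketched in Python with NumPy. This is an illustration under our own conventions, not Armatus's code: bins are 0-based (so `Q[k, l]` corresponds to q(k + 1, l + 1, γ) on the slides), T is built incrementally by adding the contacts of each new bin l with bins k..l − 1, and μs is taken as the plain mean of s over all windows of a given length.

```python
import numpy as np


def domain_quality(A, gamma):
    """Return an n x n array Q with Q[k, l] = q(k, l, gamma) for 0-based bins k <= l (NaN below the diagonal)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # T[k, l] = sum_{g=k..l} sum_{h=g+1..l} A[g, h]; growing [k, l-1] to [k, l]
    # adds the contacts of bin l with bins k..l-1.
    T = np.zeros((n, n))
    for k in range(n):
        for l in range(k + 1, n):
            T[k, l] = T[k, l - 1] + A[k:l, l].sum()
    # s(k, l, gamma) = T(k, l) / (l - k + 1)^gamma
    S = np.zeros((n, n))
    for k in range(n):
        for l in range(k, n):
            S[k, l] = T[k, l] / (l - k + 1) ** gamma
    # q(k, l, gamma) = s(k, l, gamma) minus the mean of s over all windows of the same length
    Q = np.full((n, n), np.nan)
    for length in range(1, n + 1):
        ks = np.arange(n - length + 1)
        ls = ks + length - 1
        Q[ks, ls] = S[ks, ls] - S[ks, ls].mean()
    return Q


Q = domain_quality([[0, 2, 1], [2, 0, 3], [1, 3, 0]], gamma=1.0)
print(Q)
```

In this toy matrix, `Q[1, 2]` is 0.25: its s is 3/2, against a mean of 1.25 over the two windows of length 2. Raising γ shrinks s more strongly for long windows, which is why higher γ favours smaller domains.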


So, we need to find

$$D_\gamma = \arg\max_{D \in \mathcal{D}} \sum_{[a_i, b_i] \in D} q(a_i, b_i, \gamma).$$


The sum $\sum_{[a_i, b_i] \in D_\gamma} q(a_i, b_i, \gamma)$ is the score of the solution $D_\gamma$.


Let OPT1(l) be the score of the optimal solution for the contact sub-matrix defined by the first l windows of the chromosome.

Adapted from Crane et al., 2015.


Let OPTD(l) be the score of the optimal solution for the same l × l sub-matrix, under the constraint that the l-th window is the last window of the last domain.


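Then the score of the optimal solution for the whole contact matrix, OPT1(n), can be found with the following formulae:

$$\mathrm{OPT}_1(l) = \max\Big\{\max_{k \le l}\{\mathrm{OPT}_D(k-1)\},\ \mathrm{OPT}_D(l)\Big\},$$

$$\mathrm{OPT}_D(l) = \max_{k \le l}\{\mathrm{OPT}_1(k-1) + q'(k, l, \gamma)\},$$

$$q'(k, l, \gamma) = \begin{cases} q(k, l, \gamma), & \text{if } q(k, l, \gamma) > 0,\\ -\infty, & \text{otherwise,} \end{cases}$$

where OPT1(l) = OPTD(l) = 0 for l ∈ {0, 1}.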
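A direct Python transcription of these recurrences, assuming the qualities for one fixed γ are supplied by a callable `q(k, l)` over 1-based windows. The interface, the function name `armatus_dp` and the toy qualities are hypothetical; this is a sketch, not Armatus's actual code. It returns both arrays, and `OPT1[n]` is the score of the optimal domain set.

```python
import math


def armatus_dp(n, q):
    """Fill OPT1[0..n] and OPTD[0..n] with the recurrences above; OPT1[n] is the best score.

    q(k, l) returns the quality of domain [k, l] for a fixed gamma (1-based windows).
    """
    def q_prime(k, l):
        v = q(k, l)
        return v if v > 0 else -math.inf  # domains with non-positive quality are forbidden

    OPT1 = [0.0] * (n + 1)  # base cases: OPT1(0) = OPT1(1) = 0
    OPTD = [0.0] * (n + 1)  # base cases: OPTD(0) = OPTD(1) = 0
    for l in range(2, n + 1):
        # The last domain is [k, l]; everything before it is solved optimally by OPT1(k - 1).
        OPTD[l] = max(OPT1[k - 1] + q_prime(k, l) for k in range(1, l + 1))
        # The last domain ends at some window k - 1 < l, or exactly at l.
        OPT1[l] = max(max(OPTD[k - 1] for k in range(1, l + 1)), OPTD[l])
    return OPT1, OPTD


# Hypothetical quality values for a 5-window toy chromosome.
QUALITY = {(1, 2): 1.0, (3, 5): 2.5, (1, 5): 3.0}


def toy_q(k, l):
    return QUALITY.get((k, l), -1.0)


OPT1, OPTD = armatus_dp(5, toy_q)
print(OPT1[5])  # 3.5, from domains [1, 2] and [3, 5]
```

The running time is O(n²) evaluations of q, compared with the 2^(n−1) candidate sets an exhaustive search would visit.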


How does it work?


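Expanding the inner term OPTD(k − 1) of the first formula with the second one gives

$$\mathrm{OPT}_D(k-1) = \max_{k' \le k-1}\{\mathrm{OPT}_1(k'-1) + q'(k', k-1, \gamma)\},$$

so the recursion alternates between OPT1 and OPTD, moving leftward along the chromosome until it reaches the base cases.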


So, these formulae can be used to compute the score of the optimal solution, and then to recover the solution itself by tracing back through the OPT1 and OPTD arrays.
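The traceback can be sketched by storing, for every OPT1(l), the window j at which the last domain ends, and for every OPTD(l), the start k of the domain [k, l]. This Python sketch reuses the recurrences as written; the back-pointer arrays, the function name `armatus_traceback` and the toy qualities are our own illustration, not Armatus's implementation.

```python
import math


def armatus_traceback(n, q):
    """Run the OPT1/OPTD recurrences with back-pointers; return (score, domains).

    q(k, l) is a hypothetical callable giving the quality of domain [k, l] (1-based).
    """
    def q_prime(k, l):
        v = q(k, l)
        return v if v > 0 else -math.inf

    OPT1 = [0.0] * (n + 1)
    OPTD = [0.0] * (n + 1)
    back1 = [None] * (n + 1)  # for OPT1(l): the j <= l with OPT1(l) = OPTD(j)
    backD = [None] * (n + 1)  # for OPTD(l): the start k of the last domain [k, l]
    for l in range(2, n + 1):
        OPTD[l], backD[l] = max((OPT1[k - 1] + q_prime(k, l), k) for k in range(1, l + 1))
        # max over OPTD(k - 1) for k <= l and OPTD(l) is the max over OPTD(j), j = 0..l
        OPT1[l], back1[l] = max((OPTD[j], j) for j in range(0, l + 1))

    # Walk back from OPT1(n): OPT1 -> OPTD (where the last domain ends) -> OPT1 (before it) ...
    domains, l = [], n
    while l >= 2:
        j = back1[l]
        if j < 2:  # OPTD(0) or OPTD(1): no further domains
            break
        k = backD[j]
        domains.append((k, j))
        l = k - 1
    return OPT1[n], domains[::-1]


# Hypothetical quality values for a 5-window toy chromosome.
QUALITY = {(1, 2): 1.0, (3, 5): 2.5, (1, 5): 3.0}


def toy_q(k, l):
    return QUALITY.get((k, l), -1.0)


print(armatus_traceback(5, toy_q))  # (3.5, [(1, 2), (3, 5)])
```

Ties are broken towards the larger index by the tuple comparison; any tie-breaking rule gives an optimal solution, but different rules may return different optimal domain sets.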


However, to find several optimal and suboptimal solutions, these formulae are converted into a directed acyclic graph G such that:

OPT1(·) and OPTD(·) are vertices;

each OPT1(l) is connected to OPTD(k − 1) for every k ≤ l and to OPTD(l), and each OPTD(l) is connected to OPT1(k − 1) for every k ≤ l;

an edge from OPTD(l) to OPT1(k − 1) has the weight q′(k, l, γ);

an edge from OPT1(l) to OPTD(k − 1) or to OPTD(l) has the weight 0 (inter-domain regions are not scored).
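A minimal Python sketch of graph G, under our own reading of the construction: vertices are tagged tuples `("1", l)` for OPT1(l) and `("D", l)` for OPTD(l), edges whose weight q′ would be −∞ are simply omitted, the source is OPT1(n) and the sink is OPTD(0) (the only vertex without outgoing edges when all edges are kept). The heaviest source-to-sink path should then have weight OPT1(n), and its OPTD(l) → OPT1(k − 1) edges spell out the domains [k, l]. All function names and the toy qualities are hypothetical.

```python
import math
from functools import lru_cache


def build_graph(n, q):
    """Return G as {vertex: [(successor, weight), ...]}; vertices are ("1", l) and ("D", l)."""
    G = {}
    for l in range(n + 1):
        # OPT1(l) -> OPTD(k - 1) for k <= l, plus OPT1(l) -> OPTD(l); weight 0 (gaps are not scored)
        G[("1", l)] = [(("D", k - 1), 0.0) for k in range(1, l + 1)] + [(("D", l), 0.0)]
        # OPTD(l) -> OPT1(k - 1) with weight q'(k, l); edges with q' = -inf are left out
        G[("D", l)] = [(("1", k - 1), q(k, l)) for k in range(1, l + 1) if q(k, l) > 0]
    return G


def heaviest_path(G, source, sink):
    """Return (total weight, vertex tuple) of the heaviest source -> sink path in the DAG G."""
    @lru_cache(maxsize=None)
    def best(v):
        if v == sink:
            return 0.0, (v,)
        options = []
        for u, w in G[v]:
            tail_weight, tail_path = best(u)
            options.append((w + tail_weight, (v,) + tail_path))
        # A vertex with no route to the sink gets weight -inf.
        return max(options, key=lambda t: t[0], default=(-math.inf, (v,)))
    return best(source)


def domains_on_path(path):
    """Each OPTD(l) -> OPT1(k - 1) step on the path is a domain [k, l]."""
    return sorted((u[1] + 1, v[1]) for v, u in zip(path, path[1:])
                  if v[0] == "D" and u[0] == "1")


# Hypothetical quality values for a 5-window toy chromosome.
QUALITY = {(1, 2): 1.0, (3, 5): 2.5, (1, 5): 3.0}


def toy_q(k, l):
    return QUALITY.get((k, l), -1.0)


G = build_graph(5, toy_q)
score, path = heaviest_path(G, ("1", 5), ("D", 0))
print(score, domains_on_path(path))  # 3.5 [(1, 2), (3, 5)]
```

The memoised `best` visits each vertex once, so this single-path search does the same work as the OPT1/OPTD tables; the graph form pays off when several paths are wanted.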


An example of graph G (it has one source and one sink):

[Figure: example of graph G with vertices OPT1(0), …, OPT1(3) and OPTD(0), …, OPTD(3).]


So, to find one optimal solution, we need to find the heaviest path (the path with the largest total weight) from the source to the sink.

To find several optimal and suboptimal solutions (i.e., several heaviest paths in the graph), the authors use an algorithm from Huang L. and Chiang D., 2005.

Thus, we can find several solutions for one specific value of γ.
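The authors rely on the k-best algorithm of Huang L. and Chiang D. (2005). As a simpler stand-in, and explicitly not their lazy algorithm, the Python sketch below keeps the k heaviest partial paths at every vertex of a DAG and merges them. This is correct because any of the k heaviest paths from a vertex must continue along one of the k heaviest paths from its successor. The four-vertex graph is made up; applied to graph G, each returned path would correspond to one optimal or suboptimal domain set.

```python
import heapq
from functools import lru_cache


def k_heaviest_paths(G, source, sink, k):
    """Return up to k (weight, path) pairs, heaviest first, over all source -> sink paths of DAG G.

    G maps each vertex to a list of (successor, edge_weight) pairs.
    """
    @lru_cache(maxsize=None)
    def best(v):
        if v == sink:
            return ((0.0, (v,)),)
        candidates = [
            (w + tail_w, (v,) + tail_path)
            for u, w in G.get(v, [])
            for tail_w, tail_path in best(u)
        ]
        # Each of the k heaviest paths from v extends one of the k heaviest paths from a successor.
        return tuple(heapq.nlargest(k, candidates, key=lambda c: c[0]))
    return list(best(source))


# A made-up four-vertex DAG.
G = {
    "s": [("a", 2.0), ("b", 1.0)],
    "a": [("t", 1.0), ("b", 0.5)],
    "b": [("t", 3.0)],
}
for weight, path in k_heaviest_paths(G, "s", "t", k=2):
    print(weight, "->".join(path))
```

This prints the two heaviest paths, s->a->b->t with weight 5.5 and s->b->t with weight 4.0. The cost is roughly k times that of the single-path search; lazy k-best methods reduce this by generating lower-ranked paths only on demand.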


Armatus: find the consensus domain set

The algorithm scheme is as follows:

1 For each γ ∈ Γ = {γ1, γ2, . . . , γt}, compute an optimal set of domains and several suboptimal sets.

2 Using all computed sets of domains, obtain a consensus set of domains.


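Put the domains found for all γ ∈ Γ into one multiset: $\mathbb{D} = \bigcup_{\gamma \in \Gamma} D_\gamma$;

for each domain $[a_i, b_i] \in \mathbb{D}$, compute its persistence:

$$p(a_i, b_i, \Gamma) = \sum_{\gamma \in \Gamma} \delta_i(\gamma), \qquad \delta_i(\gamma) = \begin{cases} 1, & \text{if } [a_i, b_i] \in D_\gamma,\\ 0, & \text{otherwise;} \end{cases}$$

select the set of non-overlapping domains $D_C$ that maximizes the total persistence:

$$\sum_{[a_i, b_i] \in D_C} p(a_i, b_i, \Gamma) = \max_{D \subset \mathbb{D}} \sum_{[a_i, b_i] \in D} p(a_i, b_i, \Gamma),$$

where the maximum is taken over sets D of non-overlapping domains, and call it the consensus set of domains.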
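Computing persistence is a counting exercise. This Python sketch assumes the domains found for each γ (optimal and suboptimal sets pooled) are given as lists of (a, b) pairs keyed by γ; the input format, the γ values and the domains are made up for illustration.

```python
from collections import Counter


def persistence(domain_sets_by_gamma):
    """Return a Counter mapping (a, b) to p(a, b, Gamma).

    domain_sets_by_gamma maps each gamma to the domains found for it (a hypothetical input format).
    """
    p = Counter()
    for domains in domain_sets_by_gamma.values():
        # delta_i is 0 or 1 for each gamma, so a domain counts at most once per gamma value
        p.update(set(domains))
    return p


# Made-up domain sets for three hypothetical gamma values.
found = {
    0.2: [(1, 10), (11, 20)],
    0.5: [(1, 10), (11, 15), (16, 20)],
    0.8: [(1, 5), (6, 10), (11, 15)],
}
p = persistence(found)
print(p[(1, 10)], p[(11, 15)], p[(1, 5)])  # 2 2 1
```

Domains that survive across many resolutions get high persistence, which is what the consensus step rewards.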


Armatus: find the consensus domain set

So, we need to solve the following problem:

$$\sum_{[a_i, b_i] \in \tilde{D}} p(a_i, b_i, \Gamma) \to \max_{\tilde{D} \subseteq D},$$

where $\tilde{D}$ ranges over sets of pairwise non-overlapping domains.

This problem is equivalent to the weighted interval scheduling problem.

Let's sort all domains in D by their end coordinate and walk through the resulting list in ascending order of the end coordinate. Then the problem can be solved with the following dynamic programming recurrence:

$$OPT_2(j) = \max\{OPT_2(j - 1),\; OPT_2(c(j)) + p(a_j, b_j, \Gamma)\},$$

where $OPT_2(j)$ is the total persistence of the optimal non-overlapping set of domains chosen among the first j domains of the list, $OPT_2(0) = 0$, and c(j) is the index of the closest domain before j that doesn't overlap with j (c(j) = 0 if there is no such domain).
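A minimal Python sketch of this weighted interval scheduling step follows. It assumes domains are pairs of inclusive bin indices (so two domains overlap if they share a bin) and that persistence values are given as a dictionary mapping each domain to p(a, b, Γ); `consensus_domains` is a hypothetical name, not part of Armatus. c(j) is found by binary search over the sorted end coordinates, which keeps this step within O(m log m).

```python
import bisect
from typing import Dict, List, Tuple

Domain = Tuple[int, int]  # (a, b): first and last bin, inclusive


def consensus_domains(p: Dict[Domain, float]) -> Tuple[float, List[Domain]]:
    """Weighted interval scheduling over domains weighted by persistence.

    Returns OPT2(m) and one set of pairwise non-overlapping domains
    achieving it. Domains sharing a bin are treated as overlapping.
    """
    doms = sorted(p, key=lambda d: d[1])  # ascending end coordinate b
    ends = [b for _, b in doms]
    m = len(doms)
    # c[j] (1-based j): number of domains that end strictly before domain j
    # starts, i.e. the index of the closest preceding non-overlapping domain.
    c = [0] * (m + 1)
    for j in range(1, m + 1):
        c[j] = bisect.bisect_left(ends, doms[j - 1][0])
    opt = [0.0] * (m + 1)  # opt[0] = OPT2(0) = 0
    for j in range(1, m + 1):
        opt[j] = max(opt[j - 1], opt[c[j]] + p[doms[j - 1]])
    # Backtrack through the table to recover the chosen domains.
    chosen: List[Domain] = []
    j = m
    while j > 0:
        if opt[c[j]] + p[doms[j - 1]] >= opt[j - 1]:
            chosen.append(doms[j - 1])
            j = c[j]
        else:
            j -= 1
    return opt[m], chosen[::-1]


if __name__ == "__main__":
    toy_p = {(0, 9): 1, (10, 19): 2, (0, 4): 2, (5, 9): 2, (10, 14): 1, (15, 19): 1}
    print(consensus_domains(toy_p))
    # → (6.0, [(0, 4), (5, 9), (10, 14), (15, 19)])
```

On ties the backtracking prefers including domain j; another optimal set, such as [(0, 4), (5, 9), (10, 19)], would be equally valid.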


Armatus: computational complexity

The consensus set of domains can be found in

$$O\big(m \log m + (n^2 + m)\,|\Gamma|\big)$$

time, where $m \equiv |D|$ and

$n^2|\Gamma|$ is for computing the multiset D of all domains;

$m|\Gamma|$ is for computing the persistence of all domains from D;

$m \log m$ is for sorting D and computing c(j) (the dynamic programming pass itself then takes only O(m) time).


Armatus: tool

Armatus is implemented in C++ and is available on GitHub: https://github.com/kingsfordgroup/armatus. The current version is Armatus 2.2.

Armatus doesn't compute any score for TAD borders, but such a score can easily be calculated with one of the simple methods from Part 1 (e.g., insulation score or contrast index).
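For example, an insulation-type score could be attached to Armatus borders as sketched below. This uses one common insulation-score variant (mean contact count in a w × w square of contacts crossing bin i, log2-normalized by the mean over all scored bins). The window size `w`, the normalization, and the function name are assumptions; this is not Armatus functionality and not necessarily the exact definition used in Part 1.

```python
import math
from typing import List, Optional, Sequence


def insulation_scores(mat: Sequence[Sequence[float]], w: int) -> List[Optional[float]]:
    """Insulation score per bin of a symmetric contact matrix (one variant).

    For bin i, take the mean of mat[x][y] over x in [i - w, i) and
    y in (i, i + w], i.e. contacts that cross bin i, then log2-normalize by
    the mean over all scored bins. Bins closer than w to an end get None.
    Lower scores mean stronger insulation, i.e. a more border-like bin.
    """
    n = len(mat)
    raw: List[Optional[float]] = [None] * n
    for i in range(w, n - w):
        total = sum(mat[x][y] for x in range(i - w, i) for y in range(i + 1, i + w + 1))
        raw[i] = total / (w * w)
    scored = [s for s in raw if s is not None]
    ref = sum(scored) / len(scored) if scored else 0.0
    if ref <= 0:
        return [None] * n
    return [
        None if s is None else (math.log2(s / ref) if s > 0 else float("-inf"))
        for s in raw
    ]


if __name__ == "__main__":
    # Hypothetical toy matrix: two blocks (bins 0-3 and 4-7), intra = 10, inter = 1.
    toy = [[10 if (i < 4) == (j < 4) else 1 for j in range(8)] for i in range(8)]
    scores = insulation_scores(toy, w=2)
    # A border between bins 3 and 4 could be scored by scores[3] or scores[4].
    print([None if s is None else round(s, 2) for s in scores])
```

In this sketch a border reported by Armatus at bin b would simply receive scores[b]; a more negative value indicates a stronger border.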


Armatus: biological results

The authors used IMR90 and mESC Hi-C data to test their tool:

Filippova et al., 2014. Consensus domains are red, and domains from Dixon et al., 2012 are green.


Armatus: biological results

Compared with Dixon et al., 2012 (µ = 1.2 Mb, σ = 0.9 Mb), the authors obtained smaller multiscale topological domains (µ = 0.2 Mb, σ = 1.2 Mb) on the IMR90 and mESC Hi-C data.

They also obtained a higher frequency of intra-domain interactions than Dixon et al., 2012:

Filippova et al., 2014. The histogram is generated for IMR90.


Armatus: biological results

Finally, they demonstrated that TAD borders found with Armatus are enriched in CTCF, just like those found by Dixon et al., 2012:

Filippova et al., 2014. CTCF enrichment at TAD borders found with Armatus (left) and by Dixon et al., 2012 (right) in IMR90.


Armatus: biological results

The same CTCF enrichment at TAD borders is observed in mESC:

Filippova et al., 2014. CTCF enrichment at TAD borders found with Armatus (left) and by Dixon et al., 2012 (right) in mESC.


Armatus: biological results

Armatus was used in Ulianov et al., 2016. They found that TAD borders in Drosophila are only weakly enriched in the insulator protein dCTCF, while another insulator protein, Su(Hw), is enriched within TADs. However, Drosophila TAD borders are enriched in active chromatin histone marks (fig. adapted from Ulianov et al., 2016).

Armatus was also used in Criscione et al., 2016, where the authors identified TADs that switch chromatin compartments (A to B or B to A) during replicative cell senescence.


TADbit

TADbit is a tool for Hi-C data processing developed at the Centre for Genomic Regulation (CRG), Barcelona, Spain. A preprint describing it was posted on bioRxiv (Serra et al., 2016):


TADbit: features

TADbit is implemented in Python and is available on GitHub: https://github.com/3DGenomes/tadbit. Basically, it's a library of useful Hi-C analysis functions plus several scripts.


TADbit

TADbit is described as a full-analysis pipeline that can process raw Hi-C reads and generate TAD sets as well as 3D chromatin models. Its main features are as follows:

Raw read quality check similar to that of FastQC, but adapted to Hi-C datasets.

Contact matrix generation using the iterative read mapping procedure (Imakaev et al., 2012) with GEM-mapper (Marco-Sola et al., 2012) as the aligner.

Contact matrix normalization, visualization, and comparison (using the Spearman rank or Pearson correlation coefficient).

TAD detection and alignment.

Scoring of TAD borders.

Generation and analysis of 3D models of chromatin regions with the Integrative Modeling Platform (Russel et al., 2012). There is also a 3D-model browser called TADkit.


TADbit: TAD calling algorithm

Some notes about the TAD calling algorithm (Serra et al., 2016):

The number of interactions between loci i and j separated by ∆ nucleotides is assumed to follow a Poisson distribution with parameter $w_{ij} \exp\{\alpha + \beta\Delta\}$, where the values of α and β depend on the region [i, j], and $w_{ij}$ is the normalization factor for the cell with coordinates (i, j) in the Hi-C contact matrix.

A breakpoint detection method is applied, where an observation is a column of the Hi-C matrix. Each region of a chromosome defines a slice of the Hi-C contact matrix, and each cell in this slice belongs to one of three categories:

1. the contacts between the region and upstream loci;
2. the contacts within the region;
3. the contacts between the region and downstream loci.
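The model above can be sketched as a Poisson log-likelihood of one slice. This is a minimal illustration under stated assumptions, not TADbit's code: it follows the slide's parameterization $w_{ij}\exp\{\alpha + \beta\Delta\}$, takes the (α, β) pair for each of the three categories as given (TADbit estimates them; this sketch does not fit anything), uses 0-based bin indices, and converts bins to ∆ through a hypothetical `bin_size` parameter.

```python
import math
from typing import Dict, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def category(i: int, s: int, e: int) -> str:
    """Category of row i within the slice of columns s..e (0-based, inclusive)."""
    if i < s:
        return "upstream"    # contacts between the region and upstream loci
    if i <= e:
        return "within"      # contacts within the region
    return "downstream"      # contacts between the region and downstream loci


def slice_loglik(counts: Matrix, weights: Matrix, s: int, e: int,
                 params: Dict[str, Tuple[float, float]], bin_size: float = 1.0) -> float:
    """Poisson log-likelihood of the slice formed by columns s..e.

    Cell (i, j) is modeled as Poisson(w_ij * exp(alpha + beta * delta)) with
    delta = |i - j| * bin_size; (alpha, beta) are taken from params by category.
    Cells with zero weight (e.g. filtered bins) are skipped.
    """
    ll = 0.0
    for j in range(s, e + 1):
        for i in range(len(counts)):
            if weights[i][j] == 0:
                continue
            alpha, beta = params[category(i, s, e)]
            lam = weights[i][j] * math.exp(alpha + beta * abs(i - j) * bin_size)
            k = counts[i][j]
            ll += k * math.log(lam) - lam - math.lgamma(k + 1)
    return ll
```

Letting each category carry its own (α, β) is what allows a slice that matches a real TAD to fit better than one that straddles a border.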



TADbit: TAD calling algorithm

The scheme of the TAD calling algorithm is as follows (Serra et al., 2016):

For each slice, the log-likelihood is computed.

If a slice covers exactly one TAD, its log-likelihood is high; otherwise, it is relatively lower (fig. adapted from Hi-C Data Browser, GM06690 cell line, chr14):


TADbit: TAD calling algorithm

The scheme of the TAD calling algorithm is as follows (Serra et al., 2016):

For each slice, the log-likelihood is computed.

If a slice covers exactly one TAD, its log-likelihood is high; otherwise, it is relatively lower.

The log-likelihood of the optimal segmentation of a chromosome into k TADs can be found using dynamic programming:

$$L_k(1, e) = \max_{1 \le h < e} \left\{ L_{k-1}(1, h) + L_1(h + 1, e) \right\},$$

where $L_q(i, j)$ is the log-likelihood of the optimal segmentation of the slice (i, j) into q sub-slices, and e is the number of the last window on the chromosome.

The number of TADs k is increased in steps of 1, and for each value of k the optimal TAD segmentation is found.
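The recurrence can be sketched as a table-filling procedure. This is a minimal illustration, not TADbit's implementation: windows are 1-based to match the formula, the single-slice log-likelihood is supplied as a hypothetical callback `l1(i, j)` (for instance, a slice likelihood computed from the Poisson model), and h starts at k − 1 because $L_{k-1}(1, h) = -\infty$ when fewer than k − 1 windows are available.

```python
from typing import Callable, List, Tuple


def best_segmentations(n: int, l1: Callable[[int, int], float],
                       k_max: int) -> Tuple[List[List[float]], List[List[int]]]:
    """L[k][e]: log-likelihood of the best split of windows 1..e into k slices.

    l1(i, j) is the log-likelihood of a single slice covering windows i..j
    (1-based, inclusive). back[k][e] stores the optimal h (end of slice k-1).
    """
    neg = float("-inf")
    L = [[neg] * (n + 1) for _ in range(k_max + 1)]
    back = [[0] * (n + 1) for _ in range(k_max + 1)]
    for e in range(1, n + 1):
        L[1][e] = l1(1, e)
    for k in range(2, k_max + 1):
        for e in range(k, n + 1):
            for h in range(k - 1, e):  # the last slice is h+1..e
                cand = L[k - 1][h] + l1(h + 1, e)
                if cand > L[k][e]:
                    L[k][e], back[k][e] = cand, h
    return L, back


def borders(back: List[List[int]], k: int, n: int) -> List[int]:
    """Last window of each of the first k-1 slices in the best k-split of 1..n."""
    out, e = [], n
    for q in range(k, 1, -1):
        e = back[q][e]
        out.append(e)
    return out[::-1]
```

This performs O(k_max · n²) calls to `l1`, so caching single-slice likelihoods matters in practice.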


TADbit: TAD calling algorithm

The scheme of the TAD calling algorithm is as follows (Serra et al., 2016):

The BIC (Bayesian information criterion) is calculated for each segmentation, and k is increased while the BIC decreases. If, for the current value of k, the BIC becomes higher than it was at the previous step, the previous optimal segmentation is returned as the final one.

A score is assigned to each TAD border in the final segmentation: all TAD likelihoods are penalized, and the segmentation is regenerated. This process is repeated 10 times, and a TAD border receives a score of i if it appears in i of the 10 runs.

TAD borders with a score greater than 5 are considered 'robust' (as they are steadily reproducible across runs); conversely, TAD borders with a score less than 5 are considered 'weak'.
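The model selection and border scoring steps can be sketched as follows. This is a minimal illustration under stated assumptions, not TADbit's code: it uses the standard BIC formula, assumes a hypothetical fixed number of parameters per TAD (`params_per_tad`) and number of observations (`n_obs`), and takes the borders produced by the repeated penalized re-segmentations as input rather than modeling the penalization itself.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion (lower is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def select_k(logliks: Sequence[float], params_per_tad: int, n_obs: int) -> int:
    """Grow k while BIC decreases; stop at the first k where BIC rises.

    logliks[k - 1] is the log-likelihood of the optimal k-TAD segmentation.
    Returns the k whose segmentation should be kept.
    """
    best_k = 1
    prev = bic(logliks[0], params_per_tad, n_obs)
    for k in range(2, len(logliks) + 1):
        cur = bic(logliks[k - 1], k * params_per_tad, n_obs)
        if cur > prev:
            break  # BIC went up: keep the previous segmentation
        best_k, prev = k, cur
    return best_k


def border_scores(runs: Iterable[Iterable[int]]) -> Dict[int, int]:
    """Score of a border = number of re-segmentation runs it appears in."""
    counts: Counter = Counter()
    for run_borders in runs:
        counts.update(set(run_borders))
    return dict(counts)


def robust_borders(scores: Dict[int, int], threshold: int = 5) -> List[int]:
    """Borders with score > threshold. The slide leaves a score of exactly 5
    unclassified; here such borders are simply not reported as robust."""
    return sorted(b for b, s in scores.items() if s > threshold)
```

For example, with log-likelihoods [-100, -60, -50, -49.5], two parameters per TAD, and 100 observations, `select_k` stops at k = 3, because the small gain at k = 4 does not pay for the extra parameters.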


TADbit: biological results

In Serra et al., 2016, several types of active and repressed Drosophila chromatin were studied, with the main emphasis on 3D chromatin modelling.

In Le Dily et al., 2014, the authors studied chromatin modifications in T47D breast cancer cells after progestin treatment. TAD borders are largely maintained after the treatment. Up to 20% of the TADs found can be considered discrete regulatory units, in which the majority of genes are either transcriptionally activated or repressed in a coordinated fashion.



Conclusion

Two 'advanced' TAD-calling methods were considered, Armatus and TADbit, both implemented as tools and described (more or less fully) in a paper or a preprint.

Both methods make use of the basic properties of TADs and contact matrices.

Computationally, they are much more expensive than simple algorithms (e.g., those based on insulation score, contrast index, or directionality index without an HMM), but they may give more reliable output, since they don't use arbitrary thresholds for TAD identification and they consider sets of TAD segmentations rather than only one.


Conclusion

There are several other tools and 'advanced' methods for TAD calling:

HiCseg (Levy-Leduc et al., 2014);
TADtree (Weinreb et al., 2015);
TopDom (Shin et al., 2015);
Sexton et al., 2012;
Hou et al., 2012;
Rao et al., 2014.

In some projects, researchers are interested not in genome-wide TAD calling, but in studying genes and epigenetics in several specific TADs or sub-TADs (Flavahan et al., 2015; Andrey et al., 2013; Montefiori et al., 2016; Wijchers et al., 2016).

'Advanced' methods for TAD calling use quite different algorithms and give quite different segmentations.

They are used less frequently than simple ones.

Thus, we can expect the emergence of new algorithms and a refinement of the TAD definition itself.



TAD calling tools

Filippova D. et al. 2014. Identification of alternative topological domains in chromatin. Algorithms for Molecular Biology 9: 14.

Serra F. et al. 2016. Structural features of the fly chromatin colors revealed by automatic three-dimensional modeling. bioRxiv, DOI: 10.1101/036764.

Levy-Leduc C. et al. 2014. Two-dimensional segmentation for analyzing Hi-C data. Bioinformatics 30(17): i386–i392.

Weinreb C. and Raphael B. 2015. Identification of hierarchical chromatin domains. Bioinformatics, DOI: 10.1093/bioinformatics/btv485.

Shin H. et al. 2015. TopDom: an efficient and deterministic method for identifying topological domains in genomes. NAR, DOI: 10.1093/nar/gkv1505.


Some other TAD calling methods

Sexton T. et al. 2012. Three-dimensional folding and functional organization principles of the Drosophila genome. Cell 148: 458–472.

Hou C. et al. 2012. Gene density, transcription and insulators contribute to the partition of the Drosophila genome into physical domains. Molecular Cell 48(3): 471–484.

Rao S. S. P. et al. 2014. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159(7): 1665–1680.


Cutting-edge overviews of chromatin structure studies

Rowley M. and Corces V. 2016. The three-dimensional genome: principles and roles of long-distance interactions. Current Opinion in Cell Biology 40: 8–14.

Dekker J. and Mirny L. 2016. The 3D genome as moderator of chromosomal communication. Cell 164(6): 1110–1121.


Thank you!
