Transforming Men into Mice: Are there Fragile Regions in the Human Genome?



From Biology to Computing:

……

Problem Formulation


What are the similarity blocks and how to find them?

What is the architecture of the ancestral genome?

What is the evolutionary scenario for transforming one genome into the other?

Figure: an unknown ancestor (~80 million years ago) whose X chromosome was transformed by genome rearrangements into the mouse X chromosome and the human X chromosome.


Transforming mice into men (X chromosome)


Genome Rearrangements: Evolutionary “Earthquakes”

What is the evolutionary scenario for transforming one genome into the other?

What is the organization of the ancestral genome?

Are there any rearrangement hotspots in mammalian genomes?


Susumu Ohno: Two Hypotheses

Ohno, 1970, 1973

Whole Genome Duplication Hypothesis: Big leaps in evolution would have been impossible without whole genome duplications.

Random Breakage Hypothesis: Genomic architectures are shaped by rearrangements that occur randomly (there are no fragile regions).


Whole Genome Duplication Hypothesis Finally Confirmed After Years of Controversy

The Whole Genome Duplication hypothesis was first met with skepticism and was only recently confirmed.

Kellis, Birren & Lander, Nature, 2004: “Our analysis resolves the long-standing controversy on the ancestry of the yeast genome”

“There was a whole-genome duplication.” (Wolfe, Nature, 1997)

“There was no whole-genome duplication.” (Dujon, FEBS, 2000)

“Duplications occurred independently” (Langkjaer, JMB, 2000)

“Continuous duplications” (Dujon, Yeast, 2003)

“Multiple duplications” (Friedman, Gen. Res., 2003)

“Spontaneous duplications” (Koszul, EMBO, 2004)


Random Breakage Hypothesis Meets a Different Fate

The random breakage hypothesis was embraced by biologists and has become the de facto theory of chromosome evolution.

Nadeau & Taylor, 1984, PNAS:

First estimate of the number of synteny blocks between human and mouse

First convincing arguments in favor of the Random Breakage Model (RBM)

RBM was re-iterated in hundreds of papers


Pevzner & Tesler, PNAS 2003:

Rejected RBM and proposed the Fragile Breakage Model (FBM)

Postulated the existence of rearrangement hotspots and vast breakpoint reuse


Are the Rearrangement Hotspots Real?

The Fragile Breakage Model did not live long.

In 2004, David Sankoff presented convincing arguments against the Fragile Breakage Model (Sankoff & Trinh, 2004): “… we have shown that breakpoint re-use of the same magnitude as found in Pevzner and Tesler, 2003 may very well be artifacts in a context where NO re-use actually occurred.”


Random Breakage Theory re-re-re-visited

Ohno, 1970, Nadeau & Taylor, 1984 introduced RBM

Pevzner & Tesler, 2003 argued against RBM

Sankoff & Trinh, 2004 argued against Pevzner & Tesler, 2003 arguments against RBM


Today I will argue against Sankoff & Trinh, 2004 arguments against Pevzner & Tesler, 2003 arguments against RBM


History of Chromosome X

Rat Consortium, Nature, 2004


Human-Mouse-Rat Phylogeny


Tumor Genomes

Tumor cells often exhibit chromosomal aberrations:


Tumor Genomes

Thousands of individual rearrangements are known for different tumors.

Rearrangements may disrupt genes and alter gene regulation.

Example: a translocation between chromosomes 9 and 22 in leukemia yields the “Philadelphia” chromosome, on which the BCR gene (Chr 22) is fused to the c-abl (ABL) oncogene (Chr 9).


Breast Cancer Tumor Genome

MCF7 is a human breast cancer cell line. Cytogenetic analysis suggests a complex architecture:

What is the detailed architecture of MCF7 tumor genome?

What sequence of rearrangements produced MCF7?


Reversals (also called inversions)

Classically, blocks represent conserved genes. In the course of evolution or in a clinical context, blocks 1, …, 10 could be rearranged as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10: blocks 4 through 8 are reversed, and each of them changes orientation.

Clinical: reversals occur in many cancers.

Evolution: reversals occurred about once or twice every million years on the evolutionary path between human and mouse.

Figure: blocks in the order 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

Figure: after the reversal, the blocks are in the order 1, 2, 3, -8, -7, -6, -5, -4, 9, 10.


The inversion introduced two breakpoints (disruptions in gene order): one between 3 and -8, and one between -4 and 9.
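Here is a minimal Python sketch of this example. It assumes blocks are stored as a signed list. It also assumes breakpoints are counted after framing the permutation with 0 and n + 1, so that a pair (a, b) is a breakpoint when b - a != 1. The helper names `reverse` and `breakpoints` are illustrative and do not come from any tool in this talk.

```python
def reverse(perm, i, j):
    """Reverse the segment perm[i..j] (0-based, inclusive) and flip the signs of its blocks."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def breakpoints(perm):
    """Return consecutive pairs (a, b) of the framed permutation 0, perm, n+1 with b - a != 1."""
    framed = [0] + perm + [len(perm) + 1]
    return [(a, b) for a, b in zip(framed, framed[1:]) if b - a != 1]


identity = list(range(1, 11))
inverted = reverse(identity, 3, 7)  # reverse blocks 4..8 (list positions 3..7)
print(inverted)               # [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
print(breakpoints(inverted))  # [(3, -8), (-4, 9)]
```

Inside the reversed segment, pairs such as (-8, -7) still satisfy b - a = 1. Only the two ends of the segment therefore become breakpoints.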


Sorting by reversals

Step 0: 2 -4 -3 5 -8 -7 -6 1

Step 1: 2 3 4 5 -8 -7 -6 1

Step 2: 2 3 4 5 6 7 8 1

Step 3: 2 3 4 5 6 7 8 -1

Step 4: -8 -7 -6 -5 -4 -3 -2 -1

Step 5: 1 2 3 4 5 6 7 8


Sorting by reversals: most parsimonious scenarios

Step 0: 2 -4 -3 5 -8 -7 -6 1

Step 1: 2 3 4 5 -8 -7 -6 1

Step 2: -5 -4 -3 -2 -8 -7 -6 1

Step 3: -5 -4 -3 -2 -1 6 7 8

Step 4: 1 2 3 4 5 6 7 8

The reversal distance is the minimum number of reversals required to transform one gene order into another.

Here, the distance is 4.
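The scenario above can be checked mechanically. The sketch below uses a hypothetical helper, `find_reversal`, to confirm that each consecutive pair of steps differs by exactly one reversal. This shows the four steps form a valid 4-reversal scenario. It does not by itself prove that 4 is the minimum.

```python
def reverse(perm, i, j):
    """Reverse perm[i..j] (0-based, inclusive) and flip the signs of its blocks."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def find_reversal(p, q):
    """Return (i, j) if reversing p[i..j] turns p into q, else None.

    In a signed permutation a reversal changes every position it covers,
    so the first and last differing positions must be the segment ends."""
    diff = [k for k in range(len(p)) if p[k] != q[k]]
    if not diff:
        return None
    i, j = diff[0], diff[-1]
    return (i, j) if reverse(p, i, j) == q else None


scenario = [
    [2, -4, -3, 5, -8, -7, -6, 1],
    [2, 3, 4, 5, -8, -7, -6, 1],
    [-5, -4, -3, -2, -8, -7, -6, 1],
    [-5, -4, -3, -2, -1, 6, 7, 8],
    [1, 2, 3, 4, 5, 6, 7, 8],
]
moves = [find_reversal(p, q) for p, q in zip(scenario, scenario[1:])]
print(moves)  # [(1, 2), (0, 3), (4, 7), (0, 4)]
```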


Sorting by Reversals: Breakpoint distance


Sorting by Reversal = breakpoint elimination

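How many breakpoints can be eliminated by a single reversal? At most two, because a reversal changes only the two adjacencies at the ends of the reversed segment.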

reversal distance ≥ #breakpoints / 2 = 6/2 = 3


This bound can vastly underestimate the reversal distance because it assumes that breakpoints are never re-used.
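The count of 6 breakpoints can be reproduced with a short sketch. The slides do not say which permutation is meant. We assume it is the running example 2 -4 -3 5 -8 -7 -6 1, framed as 0 2 -4 -3 5 -8 -7 -6 1 9. The division by 2 reflects that one reversal removes at most two breakpoints.

```python
def count_breakpoints(perm):
    """Count consecutive pairs (a, b) in 0, perm, n+1 with b - a != 1."""
    framed = [0] + perm + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


perm = [2, -4, -3, 5, -8, -7, -6, 1]
b = count_breakpoints(perm)
lower_bound = (b + 1) // 2  # ceil(b / 2): each reversal removes at most 2 breakpoints
print(b, lower_bound)  # 6 3
```

For this permutation the bound is 3. The most parsimonious scenario shown earlier needs 4 reversals.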


Breakpoint graph

Reversal Distance Theorem (slightly imprecise version):

reversal distance = (number of blocks + 1) - number of cycles

0 2 -4 -3 5 -8 -7 -6 1 9
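The sketch below builds the breakpoint graph of 0 2 -4 -3 5 -8 -7 -6 1 9 and counts its cycles. It uses the common encoding in which block x becomes two vertices, 2x-1 and 2x. Black edges join neighbors in the permutation, and gray edges join neighbors in the identity. We read "number of blocks + 1" as the n + 1 black edges of the framed permutation; that reading is our interpretation. The sketch also ignores the extra hurdle terms of the full Hannenhalli-Pevzner theorem.

```python
def count_cycles(perm):
    """Count alternating black-gray cycles in the breakpoint graph of a signed permutation."""
    n = len(perm)
    # Block +x is read as vertices 2x-1, 2x; block -x as 2x, 2x-1; frame with 0 and 2n+1.
    nodes = [0]
    for x in perm:
        tail, head = 2 * abs(x) - 1, 2 * abs(x)
        nodes += [tail, head] if x > 0 else [head, tail]
    nodes.append(2 * n + 1)

    black = {}  # adjacencies in perm: (nodes[0], nodes[1]), (nodes[2], nodes[3]), ...
    for k in range(0, len(nodes), 2):
        u, v = nodes[k], nodes[k + 1]
        black[u], black[v] = v, u
    gray = {}  # adjacencies in the identity: (0, 1), (2, 3), ..., (2n, 2n+1)
    for k in range(0, 2 * n + 2, 2):
        gray[k], gray[k + 1] = k + 1, k

    seen, cycles = set(), 0
    for start in black:
        if start in seen:
            continue
        cycles += 1
        node = start
        while node not in seen:  # alternate black and gray edges until the cycle closes
            seen.add(node)
            mate = black[node]
            seen.add(mate)
            node = gray[mate]
    return cycles


perm = [2, -4, -3, 5, -8, -7, -6, 1]
c = count_cycles(perm)
print(c, len(perm) + 1 - c)  # 5 4
```

The result, 9 - 5 = 4, agrees with the reversal distance of 4 found for this permutation on the previous slides.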


Rearrangements in Multi-Chromosomal Genomes

Rearrangements in multi-chromosomal genomes are not limited to reversals. Besides reversals, there are:

translocations

fusions and fissions of chromosomes

Reversals on Circular Genomes

Reversal: P=(+a-b-c+d) → Q=(+a-b-d+c)

A reversal replaces two black edges with two other black edges.

Not a Reversal

P=(+a-b-c+d) Q=??????

This operation also replaces two black edges with two other black edges, but it is not a reversal.


Fissions


P=(+a-b-c+d) Q=(+a-b)(-c+d)

Fissions split a single chromosome into two; they also replace two black edges with two other black edges.


2-Breaks

A 2-break replaces any pair of black edges with another pair.
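Here is a minimal sketch of a 2-break on circular genomes. It uses a hypothetical encoding: each gene x has a tail x_t and a head x_h, and a genome is the set of its black edges. The same two black edges of P=(+a-b-c+d) are removed, and their four ends are reconnected in each of the two possible ways. One reconnection gives the reversal (+a-b-d+c), the other gives the fission (+a-b)(-c+d). The sketch handles circular chromosomes only.

```python
def extremities(gene):
    """'+a' -> ('a_t', 'a_h') and '-a' -> ('a_h', 'a_t'): the ends of a gene in reading order."""
    name = gene[1:]
    tail, head = name + "_t", name + "_h"
    return (tail, head) if gene[0] == "+" else (head, tail)


def black_edges(genome):
    """Black edges (adjacencies) of a genome given as a list of circular chromosomes."""
    edges = set()
    for chrom in genome:
        ends = [extremities(g) for g in chrom]
        for k in range(len(ends)):
            edges.add(frozenset((ends[k][1], ends[(k + 1) % len(ends)][0])))
    return edges


def two_break(edges, e1, e2, rewire):
    """Replace black edges e1 = (u, v) and e2 = (x, y) by (u, x), (v, y) if rewire == 0,
    or by (u, y), (v, x) if rewire == 1."""
    (u, v), (x, y) = e1, e2
    new = [(u, x), (v, y)] if rewire == 0 else [(u, y), (v, x)]
    return (edges - {frozenset(e1), frozenset(e2)}) | {frozenset(e) for e in new}


def count_chromosomes(edges):
    """Count black-red cycles; a red edge joins the two ends of the same gene."""
    partner = {}
    for e in edges:
        p, q = tuple(e)
        partner[p], partner[q] = q, p
    seen, count = set(), 0
    for start in partner:
        if start in seen:
            continue
        count += 1
        node = start
        while node not in seen:
            seen.add(node)
            mate = partner[node]  # follow the black edge
            seen.add(mate)
            node = mate[:-1] + ("h" if mate.endswith("t") else "t")  # follow the red edge
    return count


P = black_edges([["+a", "-b", "-c", "+d"]])
e1, e2 = ("b_t", "c_h"), ("a_t", "d_h")
reversal = two_break(P, e1, e2, rewire=1)
fission = two_break(P, e1, e2, rewire=0)
print(reversal == black_edges([["+a", "-b", "-d", "+c"]]), count_chromosomes(reversal))  # True 1
print(fission == black_edges([["+a", "-b"], ["-c", "+d"]]), count_chromosomes(fission))  # True 2
```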



2-Break Distance Problem

Given two genomes, find the shortest sequence of 2-Breaks transforming one genome into another.

Two Genomes as Black-Red and Green-Red Cycles

P=(+a-b-c+d)

Q=(+a+c+b-d)


Common Red Edges


Superimposing...


Breakpoint Graph

BG(P,Q)

Breakpoint Graph: Red, Black, and Green Matchings

The breakpoint graph is formed by red, black, and green edges.


Black-Red Cycles (red genome)


black and red edges form genome P


Green-Red Cycles (green genome)

green and red edges form genome Q


Black-Green Cycles (breakpoint graph)


black and green edges form black-green cycles

cycle(P,Q): the number of cycles in the breakpoint graph of genomes P and Q
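cycle(P,Q) can be computed by walking alternately along black edges (adjacencies of P) and green edges (adjacencies of Q). The sketch below uses a hypothetical tail/head encoding of gene ends (x_t, x_h) and applies it to P=(+a-b-c+d) and Q=(+a+c+b-d) from these slides.

```python
def extremities(gene):
    """'+a' -> ('a_t', 'a_h'); '-a' -> ('a_h', 'a_t')."""
    name = gene[1:]
    tail, head = name + "_t", name + "_h"
    return (tail, head) if gene[0] == "+" else (head, tail)


def adjacency_map(genome):
    """Map every gene end to its neighbour across an adjacency (circular chromosomes)."""
    adj = {}
    for chrom in genome:
        ends = [extremities(g) for g in chrom]
        for k in range(len(ends)):
            u, v = ends[k][1], ends[(k + 1) % len(ends)][0]
            adj[u], adj[v] = v, u
    return adj


def cycle_count(P, Q):
    """cycle(P, Q): number of black-green cycles in the breakpoint graph BG(P, Q)."""
    black, green = adjacency_map(P), adjacency_map(Q)
    seen, cycles = set(), 0
    for start in black:
        if start in seen:
            continue
        cycles += 1
        node = start
        while node not in seen:  # alternate black (P) and green (Q) edges
            seen.add(node)
            mate = black[node]
            seen.add(mate)
            node = green[mate]
    return cycles


P = [["+a", "-b", "-c", "+d"]]
Q = [["+a", "+c", "+b", "-d"]]
print(cycle_count(P, Q), cycle_count(Q, Q))  # 2 4
```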


Breakpoint Graph of Two Identical Genomes

The trivial breakpoint graph is the breakpoint graph of two identical genomes.

Q=(+a-b-c+d)

BG(Q,Q)

Identity Breakpoint Graph Consists of Trivial Black-Green Cycles

The identity breakpoint graph (the breakpoint graph of two identical genomes) consists of trivial cycles, each formed by one green and one black edge.


# trivial cycles = # genes

Genome Rearrangements Affect Black-Green Cycles

cycle(P,Q) = 2 cycles; cycle(Q,Q) = 4 trivial cycles

Transforming genome P into genome Q corresponds to transforming the black-green cycles in BG(P,Q) into the trivial cycles in BG(Q,Q).


Rearrangements Change Breakpoint Graphs and cycle(P,Q)

BG(P,Q) → BG(P',Q) → BG(Q,Q)

cycle(P,Q) = 2 → cycle(P',Q) = 3 → cycle(Q,Q) = 4 = #genes (trivial cycles)

Sorting by 2-Breaks

P=Q0 → Q1 → ... → Qd=Q

BG(P,Q) → BG(Q1,Q) → ... → BG(Q,Q)

cycle(P,Q) → ... → cycle(Q,Q) = #genes

The number of black-green cycles increases by #genes - cycle(P,Q).

How much can each 2-break contribute to this increase?

A 2-Break:

adds 2 new black edges and thus creates at most 2 new cycles (containing two new black edges)

removes 2 black edges and thus destroys at least 1 old cycle (containing the two old edges):

change in the number of cycles ≤ 2-1=1.

Each 2-Break Increases #Cycles by at Most 1

A 2-break increases the number of cycles by at most one, and any non-trivial cycle can be split into two cycles by a 2-break.

There Exist 2-Breaks Increasing #Cycles by 1

Any 2-Break increases the number of cycles by at most one

Any non-trivial cycle can be split into two cycles with a 2-break

Every sorting by 2-breaks must increase #cycles by #genes - cycle(P,Q)

2-Break distance between genomes P and Q: #genes - cycle(P,Q)
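The argument above is constructive. The greedy sketch below uses a hypothetical function, `sort_by_2breaks`. It repeats the tail/head encoding so the block runs on its own. At each step it picks a green edge (u, v) that is not yet a black edge and applies the 2-break that creates it. Each such step adds exactly one cycle, so the number of steps equals #genes - cycle(P,Q).

```python
def extremities(gene):
    """'+a' -> ('a_t', 'a_h'); '-a' -> ('a_h', 'a_t')."""
    name = gene[1:]
    tail, head = name + "_t", name + "_h"
    return (tail, head) if gene[0] == "+" else (head, tail)


def adjacency_map(genome):
    """Map every gene end to its neighbour across an adjacency (circular chromosomes)."""
    adj = {}
    for chrom in genome:
        ends = [extremities(g) for g in chrom]
        for k in range(len(ends)):
            u, v = ends[k][1], ends[(k + 1) % len(ends)][0]
            adj[u], adj[v] = v, u
    return adj


def cycle_count(P, Q):
    """Number of black-green cycles in BG(P, Q)."""
    black, green = adjacency_map(P), adjacency_map(Q)
    seen, cycles = set(), 0
    for start in black:
        if start not in seen:
            cycles += 1
            node = start
            while node not in seen:
                seen.add(node)
                seen.add(black[node])
                node = green[black[node]]
    return cycles


def sort_by_2breaks(P, Q):
    """Greedily turn P into Q; return the number of 2-breaks applied."""
    black, green = adjacency_map(P), adjacency_map(Q)
    steps = 0
    while True:
        missing = next(((u, v) for u, v in green.items() if black[u] != v), None)
        if missing is None:
            return steps  # every green edge is now black: P has become Q
        u, v = missing
        u2, v2 = black[u], black[v]
        # Replace black edges (u, u2), (v, v2) by (u, v), (u2, v2): the new edge (u, v)
        # closes a trivial cycle with its green twin, so #cycles grows by exactly one.
        black[u], black[v] = v, u
        black[u2], black[v2] = v2, u2
        steps += 1


P = [["+a", "-b", "-c", "+d"]]
Q = [["+a", "+c", "+b", "-d"]]
n_genes = sum(len(chrom) for chrom in P)
print(sort_by_2breaks(P, Q), n_genes - cycle_count(P, Q))  # 2 2
```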


Human-mouse breakpoint graph

Human and mouse genomes can be viewed as strings in the alphabet of 280 synteny blocks (at least 0.5 million nucleotides in length)

The breakpoint graph on these blocks has 35 cycles

2-Break distance between HUMAN and MOUSE:

#genes - cycle(HUMAN,MOUSE)=280-35=245


GRIMM Web Server: Multichromosomal rearrangements


What are the similarity blocks and how to find them?


Finding Synteny Blocks

25,839 anchors (anchors enlarged for visibility; apparent density may be an illusion)

First, separate noise from synteny blocks.


GRIMM-Synteny on X chromosome (a)

Macro/Micro-rearrangements

Then separate micro-rearrangements (inside synteny blocks) from macro-rearrangements (of whole blocks).


A single synteny block with 1114 anchors and 85 micro-rearrangements.

GRIMM-Synteny: blowup of a synteny block


GRIMM-Synteny on X chromosome (a)

From anchors to synteny blocks


Synteny Block Generation

GRIMM-Synteny(Genome, w, ∆)

w: gap size

∆: minimum synteny block size

Represent Genome in 2-D and form a graph whose vertex set is the set of genes (anchors) in 2-D

Connect two vertices by an edge if the 2-D distance between them is < w. The connected components in the resulting graph define synteny blocks

Delete small synteny blocks (length < ∆)
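The three steps above can be sketched as follows. This is a simplified illustration of the slide, not the GRIMM-Synteny implementation, and it rests on several assumptions:

- anchors are (x, y) coordinates in the two genomes;
- the "2-D distance" is the Manhattan distance;
- a block's length is its span along genome 1.

The sketch compares all pairs of anchors. That quadratic cost is fine for a toy example but not for 25,839 anchors.

```python
from itertools import combinations


def grimm_synteny_sketch(anchors, w, delta):
    """anchors: list of (x, y) positions in genome 1 and genome 2.
    Returns the anchor indices of each retained synteny block."""
    parent = list(range(len(anchors)))

    def find(i):  # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Connect two anchors whose 2-D distance is < w (Manhattan distance assumed).
    for i, j in combinations(range(len(anchors)), 2):
        (x1, y1), (x2, y2) = anchors[i], anchors[j]
        if abs(x1 - x2) + abs(y1 - y2) < w:
            parent[find(i)] = find(j)

    # Connected components are the candidate synteny blocks.
    components = {}
    for i in range(len(anchors)):
        components.setdefault(find(i), []).append(i)

    # Delete small blocks (span along genome 1 shorter than delta).
    blocks = []
    for members in components.values():
        xs = [anchors[i][0] for i in members]
        if max(xs) - min(xs) >= delta:
            blocks.append(members)
    return blocks


anchors = [(0, 0), (1, 1), (2, 2), (10, 20), (11, 19), (12, 18), (30, 5)]
print(grimm_synteny_sketch(anchors, w=3, delta=2))  # [[0, 1, 2], [3, 4, 5]]
```

In this toy input, the isolated anchor (30, 5) forms a component on its own and is deleted as a small block.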


Yet Another Synteny Block Generation

ST-Synteny(Genome, w, ∆)

w: gap size

∆: minimum synteny block size

Define each gene (anchor) in Genome as a separate block and iteratively amalgamate the resulting blocks

Amalgamate two adjacent blocks if they contain two genes that are separated by less than w genes in another genome.

Delete any short block containing fewer than ∆ elements
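Here is a sketch of ST-Synteny as described on this slide. The function name is hypothetical, and several readings of the rules are ours:

- "adjacent" means adjacent along genome 1;
- genes are given by their position in genome 2;
- "separated by less than w genes" means fewer than w genes lie strictly between them in genome 2.

```python
def st_synteny_sketch(pos2, w, delta):
    """pos2[k]: position in genome 2 of the k-th gene along genome 1.
    Returns blocks as lists of genome-1 indices."""
    blocks = [[k] for k in range(len(pos2))]  # every gene starts as its own block
    merged = True
    while merged:
        merged = False
        for b in range(len(blocks) - 1):
            left, right = blocks[b], blocks[b + 1]  # adjacent along genome 1
            # amalgamate if some pair is separated by fewer than w genes in genome 2
            if any(abs(pos2[i] - pos2[j]) - 1 < w for i in left for j in right):
                blocks[b:b + 2] = [left + right]
                merged = True
                break
    return [blk for blk in blocks if len(blk) >= delta]  # drop blocks with < delta genes


pos2 = [0, 1, 2, 10, 11, 12, 5]
print(st_synteny_sketch(pos2, w=1, delta=2))  # [[0, 1, 2], [3, 4, 5]]
```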


Two Algorithms: Which One is “Better”?

GRIMM-Synteny(Genome, w, ∆)

Represent Genome in 2-D and form a graph whose vertex set is the set of genes in 2-D

Delete small synteny blocks (length < ∆)

ST-Synteny(Genome, w, ∆)

Define each gene in Genome as a separate block and iteratively amalgamate the resulting blocks




GRIMM-Synteny on X chromosome (b)



GRIMM-Synteny on X chromosome (d)


11 synteny blocks.

176 micro-rearrangements within these blocks.


GRIMM-Synteny on X chromosome (e)



GRIMM-Synteny: Human-mouse breakpoint graph


Evidence for fragile regions (rearrangement hotspots) in mammalian evolution


GRIMM determines that the minimum number of rearrangements is 7 (the naked eye gives 6).

There are numerous 7-step scenarios. The true scenario may have more than 7 steps.

GRIMM on X chromosome


GRIMM on X chromosome: breakpoint re-uses


GRIMM on ALL chromosomes

GRIMM determines that the minimum number of rearrangements is 245 (the naked eye gives 130).

There are numerous 245-step scenarios. The true scenario may have more than 245 steps.


Are There any Rearrangement Hotspots in the Human Genome?



Theorem. Yes.

Proof:
• Every rearrangement creates up to 2 breakpoints.
• If there were no breakpoint re-use, then after k rearrangements we would have 2k breakpoints.
• Human-mouse comparison reveals 2k ≈ 260 breakpoints.
• If there were no breakpoint re-use, how many rearrangements happened on the human-mouse evolutionary path? #rearrangements = #breakpoints/2 = 260/2 = 130



Proof continues:
• If there were no breakpoint re-use: #rearrangements = #breakpoints/2 = 260/2 = 130
• The Hannenhalli-Pevzner theorem implies that there were at least 245 rearrangements on the human-mouse evolutionary path.
• Is 245 larger than 130?
• Yes, 245 >> 130.
• There was vast breakpoint re-use, an argument against the random breakage model (according to scan statistics).


Human/mouse comparison reveals that:

the breakpoint regions (regions between consecutive synteny blocks) are small, accounting for ~5% of the genome;

breakpoint re-use is very high, approx. 1.9 uses per breakpoint region on average.
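The numbers in the proof and the re-use figure above fit together if breakpoint re-use is measured as 2d/b. Here d is the rearrangement distance and b is the number of breakpoint regions. This is our assumption about how the 1.9 figure was computed.

```python
def no_reuse_rearrangements(breakpoints):
    """Without re-use, each rearrangement contributes 2 new breakpoints."""
    return breakpoints / 2


def reuse_rate(distance, breakpoints):
    """Average uses per breakpoint region: 2 breaks per rearrangement over all regions."""
    return 2 * distance / breakpoints


b, d = 260, 245  # human-mouse breakpoint regions and rearrangement distance from the slides
print(no_reuse_rearrangements(b))    # 130.0
print(round(reuse_rate(d, b), 2))    # 1.88
```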

Mouse genome paper (Nature, 2002): The analysis suggests that chromosomal breaks may have a tendency to reoccur in certain regions.

High Breakpoint re-use provides evidence against the Random Breakage Model


Random Breakage Theory re-re-visited

Sankoff and Trinh, 2004 refuted this conclusion and suggested that RBM is correct.


“If you are not criticized, you may not be doing much.” (Donald Rumsfeld)

But how can one criticize a THEOREM???



Theorem. Yes!

Proof:
• ………………………………………………………
• ………………………………………………………
• Is 245 larger than 130?
• Yes, 245 >> 130
• ………………………………………………………

Sankoff did not question the validity of the proof; he questioned the validity of the numbers. The computed number of breakpoint regions (260) and the rearrangement distance (245) are parameter-dependent and may be wrong.


Sankoff-Trinh Argument


Sankoff & Trinh designed a simulation where a series of random rearrangements created the appearance of rearrangement hotspots.

How can it be???

S&T emphasized the importance of synteny block generation and parameter choice.

S&T argued that the breakpoint re-use we observed is caused by artifacts of parameter-dependent synteny block generation and micro-rearrangements.


Walking in Sankoff-Trinh Shoes

Sankoff and Trinh used a simple synteny block generation algorithm (ST-Synteny) and claimed that it is similar to GRIMM-Synteny

ST-Synteny indeed appears to be similar to GRIMM-Synteny

We reproduced Sankoff-Trinh’s simulation and their ST-Synteny algorithm


Sankoff-Trinh Synteny Block Generation

ST-Synteny(Genome, w, ∆)

w: gap size

∆: minimum synteny block size

Define each gene (anchor) in Genome as a separate block and iteratively amalgamate the resulting blocks




GRIMM-Synteny Block Generation

GRIMM-Synteny(Genome, w, ∆)

w: gap size

∆: minimum synteny block size

Represent Genome in 2-D and form a graph whose vertex set is the set of genes in 2-D


Delete small synteny blocks (length < ∆)


Comparing Two Algorithms

GRIMM-Synteny(Genome, w, ∆)

Represent Genome in 2-D and form a graph whose vertex set is the set of genes in 2-D














The algorithms look very similar but do they produce similar results?


ST-Synteny Flaw I

Permutation: -3 2 -1 -5 4

Figure: synteny blocks found by GRIMM-Synteny and by ST-Synteny, plotted with hypothetical genome 1 against hypothetical genome 2.


ST-Synteny Flaw II

GRIMM-Synteny vs. ST-Synteny

Number of blocks: 44, 10

Total block length (Mb): 95, 140

Breakpoint regions (%): 38, 9

Breakpoint re-use: 1.97, 1.64

Changing parameters does not help ST-Synteny.


ST-Synteny Flaw III

ST-Synteny is not even symmetric, i.e., the number of synteny blocks between human and mouse may differ from the number of synteny blocks between mouse and human.


ST-Synteny Results in Much Higher Breakpoint Reuse than GRIMM-Synteny

This is an artifact of ST-Synteny rather than an argument against the Fragile Breakage Model.


Random Breakage Model re-re-re-visited

Sankoff & Trinh, 2004 emphasized the importance of accurate synteny block generation but fell victim to their own flawed ST-Synteny algorithm.

ST-Synteny was never applied to real data

Peng et al., 2006 (PLOS Computational Biology): If Sankoff & Trinh fixed their ST-Synteny algorithm, they would confirm rather than reject Pevzner-Tesler’s Fragile Breakage Model.

Sankoff, 2006 (PLOS Computational Biology): Not only did we foist a hastily conceived and incorrectly executed simulation on an overworked RECOMB conference program committee, but worse—nostra maxima culpa—we obliged a team of high-powered researchers to clean up after us!

“nostra maxima culpa” = “It’s all our fault” (Latin)

Kikuta et al., Genome Res. 2007: “... the Nadeau and Taylor hypothesis is not possible for the explanation of synteny in rat.”

All Recent Studies Support FBM


Where are the rearrangement hotspots located?

We demonstrated the existence of rearrangement hotspots but did not answer the question where they are.

We presented the preliminary answer to the question in Murphy et al., Science, 2005 (joint work with “Mammalian Genomic Architectures” consortium).

Many groups are currently trying to identify all fragile regions in mammalian genomes (Alekseyev and PP, Genome Biology, 2010)

Turnover Fragile Breakage Model

Recent studies reveal evidence for the “birth and death” of the fragile regions, implying that they move to different locations in different lineages.

This discovery resulted in the Turnover Fragile Breakage Model (TFBM), which accounts for the “birth and death” of the fragile regions and sheds light on a possible relationship between rearrangements and Matching Segmental Duplications.

TFBM points to locations of the currently fragile regions in the human genome.

Tests vs. Models

Why did biologists believe in RBM for 20 years? Because RBM implies an exponential distribution of block sizes, and this distribution is observed in real genomes.

A flaw in this logic: RBM is not the only model that complies with the “exponential distribution” test.

Why was RBM refuted? Because RBM does not comply with the “breakpoint reuse” test: RBM implies low reuse, but real genomes reveal high reuse.

FBM complies with both the “exponential distribution” and “breakpoint reuse” tests.

But is there a test that both RBM and FBM fail?

Model | Exponential distribution | Breakpoint reuse
RBM | YES | NO
FBM | YES | YES

RBM and FBM fail the Multispecies Breakpoint Reuse (MBR) test.

TFBM passes all three tests.

Model | Exponential distribution | Breakpoint reuse | MBR
RBM | YES | NO | NO
FBM | YES | YES | NO
TFBM | YES | YES | YES

Implications of TFBM

Where are the (currently) Fragile Regions in the Human genome?

Prediction Power of TFBM

Can we determine the currently active regions in the human genome H from comparison with other mammalian genomes?

RBM provides no clue.

FBM suggests considering the breakpoints between H and any other genome.

TFBM suggests considering the closest genome, such as the macaque-human ancestor QH. Breakpoints in G(QH,H) are likely to be reused in future rearrangements of H.

Validation of Predictions for the Macaque-Human Ancestor

Prediction of fragile regions on (QH,H) based on the mouse, rat, and dog genomes:

Using the mouse genome M as a proxy: accuracy 34 / 552 ≈ 6%

Using the mouse-rat-dog ancestor genome MRD: accuracy 18 / 162 ≈ 11%

Using the macaque genome Q: accuracy 10 / 68 ≈ 16% (using synteny blocks larger than 500K)

Putative Active Fragile Regions in the Human Genome

Unsolved Mystery: What Causes Fragility?

Zhao and Bourque (Genome Res., 2009) suggested that fragility is promoted by Matching Segmental Duplications (MSDs): pairs of long similar regions located within the breakpoint regions flanking a rearrangement.

TFBM is consistent with this hypothesis since the similarity between MSDs deteriorates with time, implying that MSDs are also subject to a “birth and death” process.


Reconstructing Genomic Architecture of Tumor Genomes

1) Pieces of tumor genome: clones (100-250 kb).

2) Sequence the ends of the clones (500 bp).

3) Map the end sequences to the human genome.

Each clone corresponds to a pair of end sequences (ES pair) (x,y).
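One natural first step with ES pair data, sketched here under stated assumptions, is to flag pairs whose two ends cannot come from a single 100-250 kb stretch of the human genome. Such invalid pairs hint at rearrangements in the tumor. The record type `ESPair` and the function `is_valid` are hypothetical. A real analysis would also check the orientation of the two ends and allow for mapping errors.

```python
from dataclasses import dataclass

MIN_CLONE, MAX_CLONE = 100_000, 250_000  # clone lengths from the slide (100-250 kb)


@dataclass
class ESPair:
    """An end-sequence pair (x, y) mapped to the human genome (hypothetical record type)."""
    chrom_x: str
    x: int
    chrom_y: str
    y: int


def is_valid(pair):
    """True if both ends map to the same chromosome at a clone-sized distance."""
    if pair.chrom_x != pair.chrom_y:
        return False
    return MIN_CLONE <= abs(pair.y - pair.x) <= MAX_CLONE


pairs = [
    ESPair("chr1", 1_000_000, "chr1", 1_150_000),   # consistent with the human genome
    ESPair("chr1", 1_000_000, "chr1", 9_000_000),   # ends too far apart
    ESPair("chr9", 50_000_000, "chr22", 20_000_000),  # ends on different chromosomes
]
print([is_valid(p) for p in pairs])  # [True, False, False]
```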


Tumor Genome Reconstruction Puzzle

Human genome: known. Tumor genome: unknown, derived from the human genome by an unknown sequence of rearrangements.

Locations of the ES pairs in the human genome: known (map the ES pairs to the human genome).

Reconstruct the tumor genome.

Tumor Genome Reconstruction

Figure: the tumor genome (-C -D E A B) aligned to the human genome, with ES pairs (x1,y1), (x2,y2), (x3,y3), and (x4,y4).

2D Representation of ESP Data

• Each point is an ES pair.
• Can we reconstruct the tumor genome from the positions of the ES pairs?

Figure (ESP plot): each ES pair (x,y) is a point, with the human genome along both axes.

Reconstructed Tumor Genome: -C -D E A B

Real data are noisy and incomplete!


Breast Cancer MCF7 Cell Line

Human chromosomes vs. MCF7 chromosomes: 5 inversions, 15 translocations

Raphael et al. 2003.


Complex Tumor Genomes


Sequencing Tumor Clones Confirms Complex Mosaic Structure

Volik et al., Decoding the fine-scale structure of breast cancer genome and transcriptome: Implications for Tumor Genome Project, Genome Res., 2006


Hampton et al., A sequence-level map of chromosome breakpoints yields insights into the evolution of cancer genome, Genome Res., 2008 (157 breakpoints found using next-generation sequencing)