Aligning Multiple Genome Sequences With the Threaded Blockset Aligner


Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F.A., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., Haussler, D., and Miller, W.

Genome Research 2004


Outline

• Introduction
• TBA
• MULTIZ
• How TBA was built
• Evaluation of alignment accuracy
• Accuracy of the multiple alignments
• Experimental results


Introduction

• Reference sequence idea
  – A sequence is fixed as the reference to which all other sequences are compared.

Efficient methods for multiple sequence alignment with guaranteed error bounds, Gusfield, D., Bull. Math. Biol., 1993, Vol. 55, pp. 141-54.

S1: A T G C T C
S2: A G A G C
S3: T T C T G
S4: A T T G C A T G C

S1: A T - G C - T - C
S2: A - - G A - G - C
S3: - T - T C - T - G
S4: A T T G C A T G C

S1: D(S1,S2) + D(S1,S3) + D(S1,S4) = 9
S2: D(S2,S1) + D(S2,S3) + D(S2,S4) = 12
S3: D(S3,S1) + D(S3,S2) + D(S3,S4) = 12
S4: D(S4,S1) + D(S4,S2) + D(S4,S3) = 11

S1 has the smallest total distance (9), so it is chosen as the reference, and the other sequences are aligned to it one at a time:

S1: A T G C T C
S2: A - G A G C

S1: A T G C T C
S2: A - G A G C
S3: - T T C T G
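The choice of reference above can be reproduced with a short sketch. It assumes that D is the unit-cost edit distance; the slide does not say which distance is used, but this choice yields exactly the sums 9, 12, 12 and 11. `edit_distance` and `choose_center` are hypothetical helper names.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit (Levenshtein) distance, computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def choose_center(seqs: dict[str, str]) -> tuple[str, dict[str, int]]:
    """Pick the sequence with the smallest sum of distances to all the others."""
    totals = {name: 0 for name in seqs}
    for x, y in combinations(seqs, 2):
        d = edit_distance(seqs[x], seqs[y])
        totals[x] += d
        totals[y] += d
    return min(totals, key=totals.get), totals


seqs = {"S1": "ATGCTC", "S2": "AGAGC", "S3": "TTCTG", "S4": "ATTGCATGC"}
center, totals = choose_center(seqs)
print(center, totals)  # S1 {'S1': 9, 'S2': 12, 'S3': 12, 'S4': 11}
```

Minimizing the sum of distances is what makes S1 the reference; each remaining sequence is then aligned to S1 pairwise.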


• Benefit
  – Simplicity

• Drawbacks
  – Regions conserved in a subset of the species, but absent from the reference sequence, are not identified.
  – Alignments generated with different reference sequences may be inconsistent.

• Inconsistency
  – Two positions that are aligned to each other using one reference sequence might be aligned to different positions when another reference sequence is chosen.

S1: A T - G C - T - C
S2: A - - G A - G - C
S3: - T - T C - T - G
S4: A T T G C A T G C

S1: A T G C T C
S2: A G A G C
S3: T T C T G
S4: A T T G C A T G C
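Consistency between two reference-based alignments can be checked by projecting both onto a pair of sequences and asking whether any residue is aligned to different partners. The sketch below assumes alignments are stored as dictionaries of gapped rows with `-` as the gap symbol; `aligned_pairs` and `consistent` are hypothetical names. The example compares S2 and S3 in the S4-based alignment above with the S1-based alignment from the previous slide, plus a made-up two-sequence case that is inconsistent.

```python
def aligned_pairs(row_x: str, row_y: str) -> set[tuple[int, int]]:
    """Residue-index pairs (i, j) that share a column in two gapped rows."""
    pairs, i, j = set(), 0, 0
    for cx, cy in zip(row_x, row_y):
        if cx != "-" and cy != "-":
            pairs.add((i, j))
        i += cx != "-"  # advance residue counters past non-gap characters
        j += cy != "-"
    return pairs


def consistent(aln_a: dict[str, str], aln_b: dict[str, str], x: str, y: str) -> bool:
    """False if some residue of x or y has different partners in aln_a and aln_b."""
    pa = aligned_pairs(aln_a[x], aln_a[y])
    pb = aligned_pairs(aln_b[x], aln_b[y])
    xa, xb = dict(pa), dict(pb)                            # x index -> y index
    ya, yb = {j: i for i, j in pa}, {j: i for i, j in pb}  # y index -> x index
    return (all(xb[i] == j for i, j in xa.items() if i in xb)
            and all(yb[j] == i for j, i in ya.items() if j in yb))


# S2/S3 rows of the S4-based alignment and of the S1-based alignment.
s4_ref = {"S2": "A--GA-G-C", "S3": "-T-TC-T-G"}
s1_ref = {"S2": "A-GAGC", "S3": "-TTCTG"}
print(consistent(s4_ref, s1_ref, "S2", "S3"))  # True

# Hypothetical toy case: Y's residue pairs with X[0] in one alignment, X[1] in the other.
print(consistent({"X": "AA", "Y": "A-"}, {"X": "AA", "Y": "-A"}, "X", "Y"))  # False
```

For this particular pair the two slide alignments happen to agree; the toy case shows the kind of conflict the slide warns about.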


TBA

• Threaded Blockset Aligner

• Block
  – A local alignment of the sequences

• Blockset
  – A set of blocks
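One possible in-memory form for blocks and blocksets is sketched below. The field names, 1-based coordinates and single forward strand (in line with TBA's same-orientation assumption stated later) are illustrative choices, not TBA's own data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    seq: str    # sequence name, e.g. "h" for human (hypothetical naming)
    start: int  # 1-based coordinate of the row's first residue
    text: str   # gapped alignment row; "-" marks a gap

    @property
    def end(self) -> int:
        """1-based coordinate of the row's last residue."""
        return self.start + len(self.text.replace("-", "")) - 1


@dataclass(frozen=True)
class Block:
    """A local alignment: one gapped row per participating sequence."""
    rows: tuple[Row, ...]

    def __post_init__(self) -> None:
        # Every row of an alignment must span the same number of columns.
        if len({len(r.text) for r in self.rows}) != 1:
            raise ValueError("a block needs one or more rows of equal length")


Blockset = list[Block]  # a blockset is simply a collection of blocks

block = Block((Row("h", 101, "ACG-T"), Row("m", 90, "AC-AT")))
print([(r.seq, r.start, r.end) for r in block.rows])  # [('h', 101, 104), ('m', 90, 93)]
```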


TBA

[Figure: a block and a blockset for h: human (400 bp), m: mouse (400 bp), r: rat (350 bp).]


TBA

• Thread
  – A sequence S threads a blockset if every position in S appears exactly once in some block of the blockset.

• Threaded blockset
  – A blockset that is threaded by each of the original sequences.
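The threading condition can be tested mechanically. This sketch reduces each block to the 1-based interval that every sequence covers (a hypothetical simplification) and assumes unaligned stretches appear as single-row blocks, so every position can still be covered. Only the sequence lengths come from the human/mouse/rat example above; the block coordinates are made up.

```python
from collections import Counter

# Hypothetical representation: block = {sequence name: (start, end)}, 1-based, inclusive.
IntervalBlock = dict[str, tuple[int, int]]


def threads(name: str, length: int, blockset: list[IntervalBlock]) -> bool:
    """True if every position 1..length of `name` lies in exactly one block."""
    counts = Counter()
    for block in blockset:
        if name in block:
            start, end = block[name]
            counts.update(range(start, end + 1))
    return counts == Counter(range(1, length + 1))


def is_threaded(blockset: list[IntervalBlock], lengths: dict[str, int]) -> bool:
    """A threaded blockset is threaded by every one of the original sequences."""
    return all(threads(name, n, blockset) for name, n in lengths.items())


lengths = {"h": 400, "m": 400, "r": 350}
blocks = [
    {"h": (1, 100), "m": (1, 110), "r": (1, 95)},
    {"h": (101, 200), "m": (111, 210)},
    {"h": (201, 300), "r": (96, 190)},
    {"m": (211, 300)},  # mouse-only (unaligned) stretch
    {"h": (301, 400), "m": (301, 400), "r": (191, 290)},
    {"r": (291, 350)},  # rat-only (unaligned) stretch
]
print(is_threaded(blocks, lengths))                      # True
print(threads("m", 400, blocks + [{"m": (150, 160)}]))  # False: 150-160 covered twice
```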


TBA

[Figure: a block, a blockset, and a threaded blockset for h: human (400 bp), m: mouse (400 bp), r: rat (350 bp).]


• Ref-blockset
  – A blockset in which every block has a row from a particular sequence, designated as the reference for that ref-blockset.

• Projection
  – Given a threaded blockset, generate an S-ref blockset for any sequence S.

[Figure: a threaded blockset and a ref-blockset projected from it.]
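If each block is reduced to the interval every sequence covers (a hypothetical simplification), projection onto S can be sketched as keeping the blocks that contain a row of S and ordering them along S. This construction is an assumption that satisfies the ref-blockset definition above, not code taken from TBA. Because S threads the blockset, its intervals never overlap, so the ordering is well defined.

```python
# Hypothetical representation: block = {sequence name: (start, end)}, 1-based, inclusive.
IntervalBlock = dict[str, tuple[int, int]]


def project(blockset: list[IntervalBlock], ref: str) -> list[IntervalBlock]:
    """Sketch of an S-ref blockset: blocks with a `ref` row, ordered along `ref`."""
    with_ref = [block for block in blockset if ref in block]
    return sorted(with_ref, key=lambda block: block[ref][0])


# Toy threaded blockset with made-up coordinates.
blockset = [
    {"h": (1, 100), "m": (1, 110), "r": (1, 95)},
    {"h": (101, 200), "m": (111, 210)},
    {"h": (201, 300), "r": (96, 190)},
    {"m": (211, 300)},
]
for block in project(blockset, "r"):
    print(block)
# {'h': (1, 100), 'm': (1, 110), 'r': (1, 95)}
# {'h': (201, 300), 'r': (96, 190)}
```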


TBA

• Any two ref-blocksets generated by projection from the same threaded blockset are consistent.

[Figure: an m-ref-blockset and an h-ref-blockset, both projected from the same threaded blockset.]


TBA

• Threaded Blockset Aligner
  – TBA produces a set of blocks in which each position in the given sequences to be aligned appears once and only once.
  – Any detected match among some or all of the sequences is represented among the blocks, and mutually consistent reference-sequence alignments can be extracted at will.


• Alignment between the chloroplast genomes of Arabidopsis thaliana (thale cress) and Oenothera elata (evening primrose), computed by PipMaker.

• Blocks of a threaded blockset for the chloroplast genomes of Arabidopsis thaliana (a) and Oenothera elata (p).


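Applying TBA to vertebrate HOX clusters

[Figure: TBA alignment of vertebrate HOX clusters; labels: Tilapia, Mammals, Fish.]

Hox genes: important genes that determine the body structure of vertebrates.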

Applying TBA to vertebrate HOX clusters

[Figure: TBA alignment of vertebrate HOX clusters; labels: Human, Mammals, Fish.]

• Assumption
  – The matching regions occur in the same order and orientation in all species.

• Partial order
  – If, for some sequence S, S’s segment in block A precedes S’s segment in block B, we say that block A precedes block B.

• Local alignment
  – Pairwise alignment: BLASTZ
  – Alignment of three or more sequences: MULTIZ
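The partial order can be made concrete by linking consecutive blocks along every sequence and topologically sorting the result with Python's standard `graphlib` module (Python 3.9+). The interval representation and `order_blocks` are hypothetical. A cycle means no single block order fits every species, i.e. the same-order assumption is violated, as it would be by a rearrangement.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional

# Hypothetical representation: block = {sequence name: (start, end)}, 1-based, inclusive.
IntervalBlock = dict[str, tuple[int, int]]


def order_blocks(blockset: list[IntervalBlock]) -> Optional[list[int]]:
    """Block indices in an order compatible with every sequence, or None if none exists."""
    preds: dict[int, set[int]] = {i: set() for i in range(len(blockset))}
    names = {name for block in blockset for name in block}
    for name in names:
        # Blocks containing this sequence, sorted by its coordinates.
        along = sorted((b[name][0], i) for i, b in enumerate(blockset) if name in b)
        for (_, earlier), (_, later) in zip(along, along[1:]):
            preds[later].add(earlier)  # block `earlier` precedes block `later`
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None


colinear = [{"h": (1, 100), "m": (1, 90)}, {"h": (101, 200), "m": (91, 180)}, {"m": (181, 250)}]
swapped = [{"h": (1, 100), "m": (101, 200)}, {"h": (101, 200), "m": (1, 100)}]
print(order_blocks(colinear))  # [0, 1, 2]
print(order_blocks(swapped))   # None
```

Because precedence is only defined between blocks that share a sequence, several orders can be valid; `static_order` returns one of them.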


MULTIZ

• Deals with alignments of three or more sequences.
  – MULTIZ
    • Merges two blocksets with the assistance of another, guiding blockset.
  – HUMOR
    • A specialized version of MULTIZ used in Rat Genome Sequencing Consortium 2003.


How does it work?


How does it work? (cont.)

• Proceeds in order along S (the reference sequence for G, M, and the output).
• Uses G to access the portion of N that corresponds to the current position in S.
• Collects each aligned column.
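The walk along S can be illustrated with a simplified merge of two pairwise alignments that share the reference row S. This is only a sketch of the column-collecting idea: it ignores the guiding blockset G and leaves residues that M and N insert relative to S unaligned to each other, so it is much simpler than MULTIZ itself. The function name and the (S row, other row) tuple format are assumptions.

```python
def merge_on_reference(s_m: tuple[str, str], s_n: tuple[str, str]) -> tuple[str, ...]:
    """Merge S/M and S/N pairwise alignments into one S/M/N alignment, walking along S."""
    s1, m = s_m
    s2, n = s_n
    if len(s1) != len(m) or len(s2) != len(n):
        raise ValueError("rows of a pairwise alignment must have equal length")
    if s1.replace("-", "") != s2.replace("-", ""):
        raise ValueError("both alignments must contain the same S segment")
    cols = []
    i = j = 0
    while i < len(s1) or j < len(s2):
        if i < len(s1) and s1[i] == "-":    # M residue inserted relative to S
            cols.append(("-", m[i], "-"))
            i += 1
        elif j < len(s2) and s2[j] == "-":  # N residue inserted relative to S
            cols.append(("-", "-", n[j]))
            j += 1
        else:                               # both alignments sit at the same S residue
            cols.append((s1[i], m[i], n[j]))
            i += 1
            j += 1
    return tuple("".join(col[k] for col in cols) for k in range(3))


for row in merge_on_reference(("ATG-C", "A-GTC"), ("AT-GC", "ATTGA")):
    print(row)
# AT-G-C
# A--GTC
# ATTG-A
```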


HUMOR

• Stands for Human-Mouse-Rat.
• Starts with pairwise human-ref blocksets for human-mouse and for human-rat.
• Trims columns from the ends of the blocks to make the human components identical.
• Aligns the mouse and rat intervals to each other.
• Aligns the human interval to the resulting mouse-rat block.
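The trimming step can be sketched for one human-mouse block and one human-rat block: find the human interval both blocks cover, then drop the columns outside it. The `(human_start, human_row, other_row)` format, 1-based coordinates and function names are assumptions for illustration, not HUMOR's actual interface.

```python
from typing import Optional

# Hypothetical block format: (human_start, human_row, other_row), gaps written as "-".
PairBlock = tuple[int, str, str]


def human_end(block: PairBlock) -> int:
    start, human_row, _ = block
    return start + len(human_row.replace("-", "")) - 1


def trim(block: PairBlock, lo: int, hi: int) -> tuple[str, str]:
    """Keep only the columns that fall within human positions lo..hi."""
    start, human_row, other_row = block
    kept_h, kept_o = [], []
    pos = start - 1  # human coordinate of the last human residue seen
    for h, o in zip(human_row, other_row):
        if h != "-":
            pos += 1
            inside = lo <= pos <= hi
        else:
            inside = lo <= pos < hi  # gap column between two kept human residues
        if inside:
            kept_h.append(h)
            kept_o.append(o)
    return "".join(kept_h), "".join(kept_o)


def trim_to_common_human(hm: PairBlock, hr: PairBlock) -> Optional[tuple[tuple[str, str], tuple[str, str]]]:
    lo, hi = max(hm[0], hr[0]), min(human_end(hm), human_end(hr))
    if lo > hi:
        return None  # the two blocks share no human positions
    return trim(hm, lo, hi), trim(hr, lo, hi)


print(trim_to_common_human((1, "ACG-TA", "AC-ATA"), (2, "CGTAG", "CGTTG")))
# (('CG-TA', 'C-ATA'), ('CGTA', 'CGTT'))
```

After trimming, both human rows spell the same human interval (here positions 2 to 5), so the mouse and rat segments can be aligned to each other.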


How TBA was built


Evaluation of Alignment Accuracy

• Simulate sequence evolution: start with an ancestral sequence and apply mutations along the branches of a predetermined phylogenetic tree.

• Score each aligner by the agreement between the true (simulated) alignment and the aligner's output.
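A minimal version of the scoring step is sketched below. It assumes the score is the fraction of residue pairs aligned in the true (simulated) alignment that the tested aligner also aligns; the slides do not give the exact formula, and the function names and toy alignments are hypothetical. The simulation itself is omitted.

```python
from itertools import combinations


def aligned_pairs(row_x: str, row_y: str) -> set[tuple[int, int]]:
    """Residue-index pairs (i, j) that share a column in two gapped rows."""
    pairs, i, j = set(), 0, 0
    for cx, cy in zip(row_x, row_y):
        if cx != "-" and cy != "-":
            pairs.add((i, j))
        i += cx != "-"
        j += cy != "-"
    return pairs


def agreement(truth: dict[str, str], predicted: dict[str, str]) -> float:
    """Fraction of truly aligned residue pairs, over all sequence pairs, that are recovered."""
    found = total = 0
    for x, y in combinations(sorted(truth), 2):
        true_pairs = aligned_pairs(truth[x], truth[y])
        found += len(true_pairs & aligned_pairs(predicted[x], predicted[y]))
        total += len(true_pairs)
    return found / total if total else 1.0


truth = {"X": "ACGT", "Y": "A-GT"}      # toy history: C deleted in Y
predicted = {"X": "ACGT", "Y": "AG-T"}  # aligner places the gap one column late
print(agreement(truth, predicted))      # 0.6666666666666666
```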


Accuracy of the Multiple Alignments (9 Mammals)


Accuracy of the Multiple Alignments (H,M,R)


Experimental results

• Alignment accuracy is higher for closely related sequences than for more diverged ones.

• TBA stands out uniformly on the more diverged pairs.

• For most programs, accuracy increases when fewer species are aligned. This indicates room for improvement, since more species should provide more information.


Experimental results

• MULTIZ's mouse-rat alignment accuracy suffers.

• Human-rat accuracy is also slightly worse than human-mouse accuracy, because rat is aligned to human only through mouse.

• A score of 1.0 may be impossible to achieve, because some information is lost during sequence evolution.

• A score of 1.0 is usually not necessary, because some errors are inconsequential.


Experimental results

• Running time
  – Only the four programs actually designed for aligning large regions (MULTIZ, TBA, MAVID, MLAGAN) run fast enough.
  – MAVID is extremely fast.


~ Thank you ~
