Aligning Reads Ramesh Hariharan Strand Life Sciences IISc.

Date post: 01-Jan-2016
Upload: adam-davidson
Page 1: Aligning Reads Ramesh Hariharan Strand Life Sciences IISc.

Aligning Reads

Ramesh Hariharan

Strand Life Sciences, IISc

Page 2

What is Read Alignment?

Page 3

AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC

Subject’s Genome

AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC

Reference Genome

Where do these match in the Reference?

Close but not quite the same as the Subject’s Genome
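
To make "close but not quite the same" concrete, the following minimal Python sketch lists the positions where the two sequences on this slide differ. The names SUBJECT, REFERENCE and differences are illustrative and not from the talk, and the sketch assumes the two strings line up position by position with no gaps.

```python
# Sequences copied from the slide; variable names are illustrative.
SUBJECT = "AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC"
REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"

def differences(subject, reference):
    """Return (position, subject_base, reference_base) for each differing column.

    Assumes equal-length, gap-free sequences; positions are 0-based.
    """
    return [(i, s, r)
            for i, (s, r) in enumerate(zip(subject, reference))
            if s != r]

print(differences(SUBJECT, REFERENCE))
```

On these two sequences the function reports three single-base differences.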

Page 4

What does “Match” mean?

Page 5

AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC

Reference Genome

GCTACGCA

Exact Match

CATAAAGAC

With Mismatches

CACTT_AGT

With Gaps
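
The three kinds of match on this slide can be checked with a short Python sketch. It is only an illustration under a strong assumption: the gap column is already marked with "_" in the read, as on the slide, whereas a real aligner has to discover where gaps go. The function names score_at and best_placement are hypothetical.

```python
# Reference copied from the slide; function names are illustrative.
REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"

def score_at(reference, aligned_read, pos):
    """Count (mismatches, gaps) when aligned_read is placed at 0-based pos.

    "_" in the read marks a gap column whose position is already known.
    """
    mismatches = gaps = 0
    window = reference[pos:pos + len(aligned_read)]
    for ref_base, read_base in zip(window, aligned_read):
        if read_base == "_":
            gaps += 1
        elif read_base != ref_base:
            mismatches += 1
    return mismatches, gaps

def best_placement(reference, aligned_read):
    """Return the leftmost position with the fewest mismatches."""
    positions = range(len(reference) - len(aligned_read) + 1)
    return min(positions, key=lambda p: score_at(reference, aligned_read, p)[0])

for read in ("GCTACGCA", "CATAAAGAC", "CACTT_AGT"):
    pos = best_placement(REFERENCE, read)
    print(read, pos, score_at(REFERENCE, read, pos))
```

Trying every position like this is exactly the slow scan that the later slides replace with an index.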

Page 6

Why mismatches and gaps?

Page 7

The subject genome could be different from the reference

Page 8

Reads

Reference Genome

SNP

Deletion

Mismatches and Gaps

Page 9

The reading process could be erroneous

Page 10

How many mismatches and gaps?

Page 11

Short reads (~50 bases): few mismatches and gaps

Long reads (~1000 bases): many more mismatches and gaps

Page 12

How do aligners fare?

Page 13

BWA: Very few mismatches and gaps

CoBWeb

BWA-SW: Many mismatches and gaps

BowTie: only mismatches, no gaps

No paired read handling

No handling of adaptor trimming for small RNA

Separate handling for RNASeq

BowTie2

Page 14

How does an Aligner work?

Page 15

For simplicity, assume Exact Match

Page 16

For each read, scan the entire reference genome sequence

SLOW!!!!
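
As a baseline, the scan described on this slide might look like the following Python sketch. The function name naive_exact_matches is illustrative. Each read costs time proportional to the reference length times the read length in the worst case, which is why it does not scale to millions of reads against a whole genome.

```python
def naive_exact_matches(reference, read):
    """Return every 0-based position where read occurs exactly in reference."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        # Compare the read against the window that begins at this position.
        if reference[start:start + len(read)] == read:
            hits.append(start)
    return hits

print(naive_exact_matches("CGACG", "CG"))
```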

Page 17

C G A C G

The Reference

Index the Reference

[Figure: the reference indexed as a tree; the edge labels did not survive extraction]

Page 18

How can we find Exact Matches of a read quickly with this index?

Page 19

C G A C G

The Reference

[Figure: the same tree index, shown together with the query C G C]
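
The tree on these slides cannot be recovered from the transcript, so the following is only a sketch of the general idea, assuming the index is a suffix trie: every suffix of the reference is inserted into a tree, and a read is matched by walking down from the root one base at a time. Node, build_suffix_trie and find_exact are hypothetical names. A production index would use a compressed suffix tree; this uncompressed version takes quadratic space and is for illustration only.

```python
class Node:
    """One trie node: child edges by base, plus the starts of suffixes passing through."""
    def __init__(self):
        self.children = {}
        self.starts = []

def build_suffix_trie(reference):
    """Insert every suffix of reference into a trie (quadratic space, demo only)."""
    root = Node()
    for start in range(len(reference)):
        node = root
        for base in reference[start:]:
            node = node.children.setdefault(base, Node())
            node.starts.append(start)
    return root

def find_exact(root, read):
    """Walk the read down the trie; time depends on the read length, not the reference."""
    node = root
    for base in read:
        node = node.children.get(base)
        if node is None:
            return []          # fell off the tree: no exact match
    return node.starts

index = build_suffix_trie("CGACG")
print(find_exact(index, "CG"), find_exact(index, "ACG"))
```

The lookup no longer scans the reference; the cost has moved into building and storing the index.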

Page 20

The problem: this index needs 24 GB

Page 21

Can this structure be compressed?

Page 22

C G A C $

The Reference

A C $ C G
C G A C $
C $ C G A
G A C $ C
$ C G A C

All its circular shifts, sorted lexicographically

This column (the last one) is the BWT

The Burrows-Wheeler based Index

The Index: now an array instead of a tree

Sampled to reduce memory at the expense of speed (Ferragina and Manzini)
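
The construction on this slide, and the exact-match lookup it enables, can be sketched in Python as follows. This is a minimal illustration, not the talk's implementation: it sorts "$" after the bases because that is the order the slide's sorted shifts appear in, it builds the index by sorting whole rotations (fine for a five-letter example, impractical for a genome), and it recounts occurrences on every step instead of using the sampled counts the slide mentions. The names sorted_shifts, bwt and count_exact are hypothetical.

```python
ORDER = "ACGT$"   # assumption: "$" sorts last, matching the slide's row order

def sorted_shifts(text):
    """All circular shifts of text, sorted lexicographically under ORDER."""
    rank = {ch: i for i, ch in enumerate(ORDER)}
    shifts = [text[i:] + text[:i] for i in range(len(text))]
    return sorted(shifts, key=lambda s: [rank[c] for c in s])

def bwt(text):
    """The BWT is the last column of the sorted shifts."""
    return "".join(row[-1] for row in sorted_shifts(text))

def count_exact(bwt_string, read):
    """Count exact occurrences of read by backward search over the BWT."""
    # first[c] = number of characters in the text that sort before c
    first, total = {}, 0
    for ch in ORDER:
        first[ch] = total
        total += bwt_string.count(ch)
    lo, hi = 0, len(bwt_string)            # half-open range of matching rows
    for ch in reversed(read):              # extend the match one base to the left
        lo = first[ch] + bwt_string[:lo].count(ch)
        hi = first[ch] + bwt_string[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo

print(sorted_shifts("CGAC$"))
print(bwt("CGAC$"), count_exact(bwt("CGAC$"), "CG"))
```

The repeated prefix counts are where sampling comes in: storing counts only at every few rows saves memory, and the lookup pays for it by counting forward from the nearest stored row.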

Page 23

How about Mismatches and Gaps?

Page 24

BWA, BWA-SW and BowTie force mismatches and gaps into the BW Index searching procedure
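
One way to picture "forcing" mismatches into the index search is backtracking: at each step of the backward search, try every base rather than only the one in the read, and charge one unit of a mismatch budget for each substitution. The Python sketch below shows that idea only. It is not the actual procedure of BWA, BWA-SW or BowTie, which add pruning and heuristics, and it leaves gaps out entirely. The names build_index and find_with_mismatches are hypothetical.

```python
ORDER = "ACGT$"   # assumption: "$" sorts last

def build_index(text):
    """Return (rotation start positions in sorted order, BWT, first-column offsets)."""
    rank = {ch: i for i, ch in enumerate(ORDER)}
    starts = sorted(range(len(text)),
                    key=lambda i: [rank[c] for c in text[i:] + text[:i]])
    bwt = "".join(text[i - 1] for i in starts)   # character preceding each rotation
    first, total = {}, 0
    for ch in ORDER:
        first[ch] = total
        total += bwt.count(ch)
    return starts, bwt, first

def find_with_mismatches(reference, read, max_mismatches):
    """Positions where read matches reference with at most max_mismatches substitutions."""
    starts, bwt, first = build_index(reference + "$")
    hits = set()

    def extend(i, lo, hi, budget):
        if lo >= hi:
            return                       # no rows left: dead branch
        if i < 0:
            hits.update(starts[lo:hi])   # whole read consumed
            return
        for ch in "ACGT":
            cost = 0 if ch == read[i] else 1
            if cost <= budget:
                extend(i - 1,
                       first[ch] + bwt[:lo].count(ch),
                       first[ch] + bwt[:hi].count(ch),
                       budget - cost)

    extend(len(read) - 1, 0, len(bwt), max_mismatches)
    return sorted(hits)

REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
print(find_with_mismatches(REFERENCE, "CATAAAGAC", 1))
```

The search tree grows quickly with the mismatch budget, which fits the earlier slide's note that these aligners differ in how many mismatches and gaps they handle.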

Page 25

CoBWeb uses the BW Index to find a ‘seed’ exact match and does Smith-Waterman around this seed

This 15-mer occurs at locations x1, x2…

This 15-mer occurs at locations x3, x4…

This whole 30-mer occurs at location x5
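
The seeding step can be sketched as follows. This is an illustration under stated assumptions, not CoBWeb's code: a plain dictionary of k-mers stands in for the BW Index, seeds are taken as non-overlapping k-mers of the read, and k is 4 rather than 15 so that the toy reference from the earlier slides is long enough. The names build_kmer_index and candidate_starts are hypothetical.

```python
from collections import defaultdict

REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"

def build_kmer_index(reference, k):
    """Map each k-mer to the list of positions where it occurs (stand-in for the BW Index)."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def candidate_starts(reference, read, k):
    """Look up each non-overlapping seed and vote for the read start it implies."""
    index = build_kmer_index(reference, k)
    votes = defaultdict(int)
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):
            votes[pos - offset] += 1     # where the read would begin if this seed is right
    return dict(votes)

# A 16-base read taken from the Subject's Genome (it carries one mismatch).
read = "GCTACGCATTTCCCAT"
print(candidate_starts(REFERENCE, read, 4))
```

Three of the four seeds agree on the same start position; the seed that covers the mismatch finds nothing, and the dynamic-programming step then decides how well the whole read fits at that position.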

Page 26

Dynamic Programming

• Given a location in the reference with a read anchor, how well does the read match here?

Reference

Read

Anchor 14-mer

• Smith-Waterman (optimized for large gaps)
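
For reference, a textbook Smith-Waterman score computation looks like the Python sketch below. It uses a simple linear gap penalty, so it is not the large-gap-optimized variant named on the slide, and the scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions chosen for the example. In a seed-and-extend aligner it would be run on a window of the reference around the anchor rather than on the whole genome.

```python
def smith_waterman_score(ref_window, read, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between read and ref_window (linear gap penalty)."""
    rows, cols = len(read) + 1, len(ref_window) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = read[i - 1] == ref_window[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap       # gap in the reference
            left = score[i][j - 1] + gap     # gap in the read
            score[i][j] = max(0, diag, up, left)   # 0 lets the alignment restart
            best = max(best, score[i][j])
    return best

# Window and read from the "With Mismatches" example earlier in the deck.
print(smith_waterman_score("CATAATGAC", "CATAAAGAC"))
```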

Page 27

Comparison with BWA

Read Length 50

Read Length 150

20% faster than BWA with comparable results

CoBWeb: 3 mismatches and 2 gaps

BWA: 2 mismatches + 1 gap, possibly longer than one base

Page 28

Comparison with BWA-SW

Read Length 400

8 mismatches plus 10 gaps

                     CoBWeb   BWA-SW
Reads                1m       1m
Time taken           1130s    2242s
Incorrectly Mapped   12598    9819

5650 mapped incorrectly by BWA-SW

The remainder has poor BWA mapping quality
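
The figures in this comparison can be turned into a speed ratio and error rates with a few lines of Python. The only assumption is that "1m" means one million reads; everything else is copied from the slide.

```python
reads = 1_000_000                      # assumption: "1m" means one million reads
time_s = {"CoBWeb": 1130, "BWA-SW": 2242}
incorrect = {"CoBWeb": 12598, "BWA-SW": 9819}

# How many times faster CoBWeb ran than BWA-SW on this data set.
speedup = time_s["BWA-SW"] / time_s["CoBWeb"]

# Percentage of reads each aligner mapped incorrectly.
error_pct = {name: 100 * n / reads for name, n in incorrect.items()}

print(round(speedup, 2), {name: round(p, 2) for name, p in error_pct.items()})
```

Under that assumption CoBWeb ran roughly twice as fast, with about 1.26% of reads mapped incorrectly against about 0.98% for BWA-SW.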

Page 29

Avadis NGS

Page 30

Avadis NGS: Alignment, DNA Variant Detection, RNASeq, ChIPSeq, Small RNASeq

Page 31

Thank You

