
Short read mapping (Alignment)

Date post: 19-Mar-2016
Upload: lali
Page 1: Short read mapping (Alignment)

Short read mapping (Alignment)

Page 2: Short read mapping (Alignment)

ABySS:

Kmer      n  n:100  n:N50  min  median  mean    N50    max      sum
  20  11465   4945    862  100     686  1084   1891  17871  5364700
  25   6206   2521    278  100    1026  2138   4578  36808  5390722
  30   4887   2134    181  100     998  2577   7525  62225  5500829
  35   4261   1945    159  100    1002  2831   9493  66466  5507159
  40   3693   1827    147  101    1049  3089  10663  69602  5645100
  45   3727   1797    145  100    1006  3062  10791  71989  5502746
  50   2705   1791    145  100    1053  3069  10378  82440  5496994
  55   3288   2636    148  100     398  2097   9959  71517  5528981
  60   3053   2531    150  100     459  2242  10487  77123  5674674
  64   3001   2571    164  100     499  2157   9508  47608  5546956

Newbler:

  n  n:100  n:N50  min  median   mean    N50     max      sum
298    298     37  100    6747  18245  45215  185235  5437024

Enterobacter cloacae subsp. cloacae ATCC 13047

• chromosome, Length: 5,314,581 nt
• plasmid pECL_B, Length: 84,653 nt
• plasmid pECL_A, Length: 199,562 nt

Total Length: 5598796

Page 3: Short read mapping (Alignment)

Alignment topics in GEN875

• Assembly
• Whole genome alignment
• Short read “mapping”
• BLAST
• Pair-wise using dynamic programming
• Progressive multiple alignment

Page 4: Short read mapping (Alignment)

Alignment

• Take a set of sequences. Find where they match.

• Arrange sequences in a matrix where columns contain homologous (corresponding?) characters from each sequence

Page 5: Short read mapping (Alignment)

Types of Alignments

• Global – include the entire length of all sequences in the alignment

• Local – identify and align subsets of longer sequences

• Glocal – a hybrid of global and local

Page 6: Short read mapping (Alignment)

Short Read Mapping

• Find a match between sequence reads and a reference genome

• Find the best match between sequence reads and a reference genome

• Find all the plausible matches between sequence reads and a reference genome

Page 7: Short read mapping (Alignment)

Reads may not match the reference exactly

• Sequence errors in the read – may be distinguishable using quality scores

• Sequence errors in the reference genome

• Legitimate polymorphism

Page 8: Short read mapping (Alignment)

When (and how much) does sequence and alignment accuracy matter?

• RNA-seq for expression
• RNA-seq for annotation – endpoints, splicing
• ChIP-seq
• Resequencing related genomes for SNP detection
• Resequencing related genomes for indel detection
• Resequencing to clean up existing sequences
• Sequencing to determine copy number

Page 9: Short read mapping (Alignment)

Short Read Mapping Tools

• Bowtie
• ELAND (Illumina)
• Maq
• SOAP
• RMAP
• ZOOM
• SHRiMP
• BFAST
• MOSAIK
• BWA
• SOAP2

• Speed
• Accuracy
• Exact match vs mismatches
• Gapped vs ungapped
• Greedy or exhaustive

Page 10: Short read mapping (Alignment)

Short Read Mapping Tools

• Bowtie
• ELAND (Illumina)
• Maq
• SOAP
• RMAP
• ZOOM
• SHRiMP
• BFAST
• MOSAIK
• BWA
• SOAP2

• Hash table of oligos in reference sequence

• Hash table of input reads

• Hash table – method unknown

• Burrows Wheeler Transform-based Index
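
The first indexing strategy above, a hash table of oligos in the reference sequence, can be sketched as follows. This is a minimal illustration, not the scheme of any listed tool: the function names, the toy reference, and the choice of k = 4 are all assumptions made for the example.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Map every length-k substring of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def candidate_positions(read, index, k):
    """Look up the read's first k bases; return candidate reference positions."""
    # .get() avoids creating empty entries for k-mers absent from the reference
    return index.get(read[:k], [])

# Hypothetical toy reference and read, for illustration only
reference = "ACGTACGTTAGC"
index = build_kmer_index(reference, 4)
print(candidate_positions("ACGTT", index, 4))
```

Each returned position is only a candidate: the k-mer lookup says nothing about the bases beyond the first k, so the rest of the read still has to be compared against the reference.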

Page 11: Short read mapping (Alignment)

Spaced Seeds

Example Seed: 1100

Query: GATC

Matches:
GATC
GAAC
GACC
GATT
GATA
GATG

• Length and weight of seeds

• Number of Hash tables required to find mismatches
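
A minimal sketch of how the example seed behaves, assuming the usual convention that a "1" marks a position that must match and a "0" marks a position that is ignored. The helper names and the extra non-matching candidate GTTC are hypothetical, added only for illustration.

```python
def seed_match(seed, query, candidate):
    """True if query and candidate agree at every position where the seed has '1'."""
    return all(q == c for s, q, c in zip(seed, query, candidate) if s == "1")

def seed_key(seed, sequence):
    """Hash key for a sequence: the characters at the seed's '1' positions."""
    return "".join(ch for s, ch in zip(seed, sequence) if s == "1")

seed = "1100"
query = "GATC"
# The first six candidates come from the slide; GTTC is a made-up non-match
candidates = ["GATC", "GAAC", "GACC", "GATT", "GATA", "GATG", "GTTC"]
hits = [c for c in candidates if seed_match(seed, query, c)]
print(hits)
print(seed_key(seed, query))
```

Here the weight of the seed (the number of "1" positions) is 2 and its length is 4. Keying a hash table on `seed_key` lets every sequence that shares the required positions land in the same bucket, regardless of what sits at the ignored positions.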

Page 13: Short read mapping (Alignment)

Some mapping software uses alignment refinement

• Once candidates are identified using the hash table search, conduct a more rigorous alignment of the read and reference genome

• Smith-Waterman local alignment (with or without gaps)
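
The refinement step can be sketched as a Smith-Waterman scoring pass over a candidate region. The scoring values (match +2, mismatch -1, gap -2) and the function name are assumptions chosen for the example; each tool uses its own parameters.

```python
def smith_waterman_score(read, ref, match=2, mismatch=-1, gap=-2):
    """Return (best local alignment score, end column of that alignment in ref)."""
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best, best_end = 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            # Local alignment: a cell never drops below 0
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_end = H[i][j], j
    return best, best_end

# Hypothetical read and candidate reference window
print(smith_waterman_score("ACGT", "TTACGTTT"))
```

Because the full matrix costs time proportional to read length times window length, it is practical only on the short candidate windows produced by the hash table search, not on the whole genome.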

Page 14: Short read mapping (Alignment)

Bowtie

• Burrows-Wheeler index based on the FM index (Full-text index in Minute space), extended to accommodate mismatches

• Reduces memory footprint
• Increases speed
• Amenable to multiple processors

• 14.3x Illumina coverage of the human genome mapped in 14 hrs on a 4-core desktop PC

Page 18: Short read mapping (Alignment)

gc$aaac

For each character, make a table of the number of lexicographically smaller characters in the text

$  a  c  g
0  1  4  6

For each position, make a matrix of the number of occurrences of each character in the prefix

   1  2  3  4  5  6  7
   g  c  $  a  a  a  c
$  0  0  1  1  1  1  1
a  0  0  0  1  2  3  3
c  0  1  1  1  1  1  2
g  1  1  1  1  1  1  1

LF mapping is simple addition
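
The two tables and the "simple addition" can be sketched in a few lines. The input text acaacg$ is an assumption: it is not stated on the slide, but its transform is gc$aaac and it reproduces both tables. The function names are hypothetical, and the rotation sort is for illustration only; it is far too slow for a genome.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations; text must end with '$'."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def build_tables(last_column):
    """Return C (count of smaller characters) and Occ (prefix occurrence counts)."""
    alphabet = sorted(set(last_column))
    C, total = {}, 0
    for ch in alphabet:
        C[ch] = total
        total += last_column.count(ch)
    # Occ[ch][i] = occurrences of ch in last_column[:i], so Occ[ch][0] == 0
    Occ = {ch: [0] for ch in alphabet}
    for sym in last_column:
        for ch in alphabet:
            Occ[ch].append(Occ[ch][-1] + (1 if sym == ch else 0))
    return C, Occ

def count_exact(pattern, C, Occ, n):
    """Backward search: number of exact occurrences of pattern in the text."""
    lo, hi = 0, n  # half-open range of rows in the sorted rotation matrix
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + Occ[ch][lo]  # LF mapping: one addition per boundary
        hi = C[ch] + Occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo

last = bwt("acaacg$")
C, Occ = build_tables(last)
print(last, C)
print(count_exact("ac", C, Occ, len(last)))
```

The search consumes the pattern from its last character to its first, narrowing a range of rows at each step, so the cost depends on the pattern length rather than the text length.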

Page 19: Short read mapping (Alignment)

Query Sequence:

GGTA

No exact match, so try alternative sequences with a mismatch:

GGCA
GGAA
GGTG
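
The search space for a single mismatch can be sketched by listing every one-substitution variant of the query. This is an illustration only, with a hypothetical function name; an index-based aligner such as Bowtie is understood to backtrack inside the index rather than materialize each variant.

```python
def one_mismatch_variants(query, alphabet="ACGT"):
    """Yield every sequence that differs from query by exactly one substitution."""
    for i, original in enumerate(query):
        for base in alphabet:
            if base != original:
                yield query[:i] + base + query[i + 1:]

variants = list(one_mismatch_variants("GGTA"))
print(len(variants))
print(variants)
```

A read of length L has 3L such variants, which is why allowing more mismatches quickly becomes expensive without some way to prune the search.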

Page 20: Short read mapping (Alignment)

Bowtie caveat

“If one or more exact matches exist for a read, then Bowtie is guaranteed to report one, but if the best match is an inexact one then Bowtie is not guaranteed in all cases to find the highest quality alignment.”

…unless you use the slower “best” option

Page 26: Short read mapping (Alignment)

Alignment Methods – Dynamic Programming

• Needleman-Wunsch (global) and Smith-Waterman (local) use dynamic programming

• Guaranteed to find an optimal alignment given a particular scoring function

• Computationally intensive

Page 27: Short read mapping (Alignment)

Dynamic Programming

One possible simple scoring scheme:

• S_{i,j} = 1 if the residue at position i of sequence #1 is the same as the residue at position j of sequence #2 (match score); otherwise
• S_{i,j} = 0 (mismatch score)
• w = 0 (gap penalty)

Page 28: Short read mapping (Alignment)

Dynamic Programming

Three steps:

1) Initialize

2) Fill Matrix

M_{i,j} = MAXIMUM [
    M_{i-1,j-1} + S_{i,j} (match/mismatch in the diagonal),
    M_{i,j-1} + w (gap in sequence #1),
    M_{i-1,j} + w (gap in sequence #2) ]

Page 29: Short read mapping (Alignment)

Dynamic Programming

3) Traceback

G A A T T C A G T T A
G G A - T C - G - - A

Score = 1+0+1+0+1+1+0+1+0+0+1 = 6
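
The three steps can be sketched as a small Needleman-Wunsch routine using the simple scoring scheme from the earlier slide (match 1, mismatch 0, gap 0). The function name and the tie-breaking order in the traceback are assumptions, so the alignment it prints may differ from the one shown above while still reaching the same score.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=0, gap=0):
    """Global alignment; returns (score, aligned seq1, aligned seq2)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    # 1) Initialize: first row and column hold cumulative gap penalties
    M = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        M[i][0] = M[i - 1][0] + gap
    for j in range(1, cols):
        M[0][j] = M[0][j - 1] + gap
    # 2) Fill: each cell takes the best of diagonal, left, and up
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1] + s,  # match/mismatch
                          M[i][j - 1] + gap,    # gap in sequence #1
                          M[i - 1][j] + gap)    # gap in sequence #2
    # 3) Traceback from the bottom-right corner to the origin
    a1, a2 = [], []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and seq1[i - 1] == seq2[j - 1] else mismatch
        if i > 0 and j > 0 and M[i][j] == M[i - 1][j - 1] + s:
            a1.append(seq1[i - 1]); a2.append(seq2[j - 1]); i -= 1; j -= 1
        elif i > 0 and M[i][j] == M[i - 1][j] + gap:
            a1.append(seq1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(seq2[j - 1]); j -= 1
    return M[-1][-1], "".join(reversed(a1)), "".join(reversed(a2))

score, top, bottom = needleman_wunsch("GAATTCAGTTA", "GGATCGA")
print(score)
print(top)
print(bottom)
```

With a gap penalty of 0 many alignments tie for the optimal score, which is one reason practical scoring schemes penalize gaps and mismatches.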

