Heng Li and Richard Durbin ∗ Members of this presentation: Yunji Wang Sree Devineni Zhen Gao.

Date post: 17-Dec-2015
Upload: miles-norton
Page 1:

Fast and accurate short read alignment with Burrows–Wheeler transform

Heng Li and Richard Durbin∗

Members of this presentation: Yunji Wang, Sree Devineni, Zhen Gao

Page 2:

Motivation

The first generation of hash table-based methods (e.g. MAQ):

• Are slow

• Do not support gapped alignment

Page 3:

Suffix array interval

The positions of all occurrences of a substring fall within a single interval of the suffix array (see the figure on the right).

E.g. the suffix array interval of the pattern “go” is [1, 2]. What about “og”?
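The interval lookup can be sketched in a few lines of Python. This is a minimal illustration, not BWA's implementation: it assumes the indexed text is "googol$" (the string from the next slide plus an end marker), builds the suffix array by naive sorting, and counts characters by scanning the BWT instead of using precomputed occurrence tables. The function names are made up for this sketch.

```python
def suffix_array(text):
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_interval(text, pattern):
    """Return the inclusive suffix array interval of pattern, or None.

    Uses backward search: the pattern is consumed from its last
    character to its first, narrowing the interval at every step.
    text must end with a unique '$' that sorts before all other symbols.
    """
    sa = suffix_array(text)
    # BWT: the character preceding each sorted suffix (text[-1] is '$').
    bwt = "".join(text[i - 1] for i in sa)
    # C[ch]: number of characters in text that are smaller than ch.
    C, total = {}, 0
    for ch in sorted(set(text)):
        C[ch] = total
        total += text.count(ch)

    lo, hi = 0, len(text)  # half-open interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return None
        lo = C[ch] + bwt[:lo].count(ch)
        hi = C[ch] + bwt[:hi].count(ch)
        if lo >= hi:
            return None
    return (lo, hi - 1)


print(sa_interval("googol$", "go"))  # (1, 2)
print(sa_interval("googol$", "og"))  # (4, 4)
```

Under these assumptions the sketch reproduces the interval [1, 2] for “go” and answers the slide's question: “og” occurs once, so its interval is [4, 4].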

Page 4:

Prefix trie and inexact string matching

Prefix trie of string “GOOGOL”

The dashed line shows how to find string ‘LOL’ (1 mismatch allowed)

What about “LOG”?
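Walking the prefix trie with a mismatch budget can be sketched as a bounded backtracking search over suffix array intervals. The sketch below is an illustration under stated assumptions, not the BWA algorithm itself: it allows substitutions only (no gaps), omits the lower-bound pruning that a practical aligner would need, and uses naive index construction. All names are hypothetical.

```python
def build_index(text):
    """Suffix array, BWT and cumulative character counts for text."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    C, total = {}, 0
    for ch in sorted(set(text)):
        C[ch] = total
        total += text.count(ch)
    return sa, bwt, C


def inexact_search(text, pattern, max_mismatches):
    """Start positions where pattern matches text with at most
    max_mismatches substitutions. text must end with a unique '$'."""
    sa, bwt, C = build_index(text)
    alphabet = [ch for ch in C if ch != "$"]
    hits = set()

    def recurse(i, budget, lo, hi):
        # pattern[i+1:] is already matched; [lo, hi) is its interval.
        if i < 0:
            hits.update(sa[lo:hi])
            return
        for ch in alphabet:
            new_lo = C[ch] + bwt[:lo].count(ch)
            new_hi = C[ch] + bwt[:hi].count(ch)
            if new_lo >= new_hi:
                continue  # this branch does not exist in the trie
            cost = 0 if ch == pattern[i] else 1
            if cost <= budget:
                recurse(i - 1, budget - cost, new_lo, new_hi)

    recurse(len(pattern) - 1, max_mismatches, 0, len(text))
    return sorted(hits)


print(inexact_search("googol$", "lol", 1))  # [3]
print(inexact_search("googol$", "log", 1))  # [1]
```

With one mismatch allowed, “lol” matches “gol” at position 3, and “log” matches “oog” at position 1 (0-based positions in "googol").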

Page 5:

Conclusions

The authors implemented the Burrows-Wheeler Alignment tool (BWA), which is based on the BWT. As a result, it:

• Is fast

• Reduces memory usage

• Allows gaps

Page 6:

References

Heng Li and Richard Durbin (2009) Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14), 1754–1760.

Page 7:

CS 6293: Advanced Topics: Current Bioinformatics

A probabilistic framework for aligning paired-end RNA-seq data

Members of this presentation: Yunji Wang, Sree Devineni, Zhen Gao

Page 8:

A probabilistic framework for aligning paired-end RNA-seq data

• Current Biology Method

• Align RNA-seq reads to the reference genome rather than to a transcript database.

Page 9:

Current Biology Problem

• A single read:

Consists of 35-100 consecutive nucleotides of a fragment of an mRNA transcript.

• However, the expected size of mRNA fragments is around 182 bp.

• The paired-end read (PER) protocol sequences the two ends of a size-selected mRNA fragment.

(Doubles the sequenced length compared with a single read)

Page 10:

Problem of PER fragment alignment

• Problem:

The expected distance between the two end reads within the transcript fragment is known as the mate-pair distance.

The distance between the two ends when aligned to the genome can be quite different from the mate-pair distance, because introns lying between the two ends are spliced out of the transcript.
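A small worked example makes the gap between the two distances concrete. Everything here is hypothetical: the exon coordinates, the 50 bp read length, and the reading of "mate-pair distance" as the number of transcript bases between the two end reads are assumptions chosen for illustration. Only the 182 bp fragment size comes from the slides.

```python
def transcript_to_genome(exons, offset):
    """Map a 0-based transcript offset to a genomic coordinate.

    exons: sorted list of half-open (start, end) genomic intervals.
    """
    for start, end in exons:
        length = end - start
        if offset < length:
            return start + offset
        offset -= length
    raise ValueError("offset lies beyond the end of the transcript")


def end_read_gaps(exons, fragment_start, fragment_len, read_len):
    """Return (gap within the transcript, gap on the genome) between
    the end of the first read and the start of the second read."""
    first_end = fragment_start + read_len  # exclusive
    second_start = fragment_start + fragment_len - read_len
    transcript_gap = second_start - first_end
    genomic_gap = (transcript_to_genome(exons, second_start)
                   - (transcript_to_genome(exons, first_end - 1) + 1))
    return transcript_gap, genomic_gap


# Hypothetical transcript: three 100 bp exons separated by 3900 bp introns.
exons = [(1000, 1100), (5000, 5100), (9000, 9100)]
print(end_read_gaps(exons, fragment_start=40, fragment_len=182, read_len=50))
# (82, 3982)
```

In this made-up case the two reads are 82 bases apart in the transcript but 3982 bases apart on the genome, because one 3900 bp intron lies between them.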

Page 11:

Problem of PER fragment alignment

Page 12:

Current Tools

• TopHat: reports the closest end alignment for a PER.

• SpliceMap: considers PERs with ends mapped within 400 000 bp on the genome.

Page 13:

Method-Step 1

• Mapping the individual reads

Page 14:

Method-Step 2

• Graphical model

Page 15:

Probabilistic framework

• Splice graph, G={V,E}

• Nodes: individual nucleotides

• Directed edge types:
✔ Connect adjacent nodes
✔ Skip around the spliced-out portion of the genome
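The splice graph described above can be sketched as adjacency lists over genomic positions. This is a toy illustration, assuming a single small region, made-up junction coordinates, and junction edges that always point forward (donor before acceptor). The function names are hypothetical and are not taken from the presented method.

```python
from collections import defaultdict


def build_splice_graph(region_start, region_end, junctions):
    """Nodes are genomic positions in [region_start, region_end).

    Two edge types: position -> position + 1 (adjacent nucleotides)
    and donor -> acceptor for every splice junction.
    """
    graph = defaultdict(list)
    for pos in range(region_start, region_end - 1):
        graph[pos].append(pos + 1)
    for donor, acceptor in junctions:
        graph[donor].append(acceptor)
    return graph


def path_lengths(graph, source, target):
    """All distinct path lengths (in edges) from source to target.

    Assumes every edge points to a larger position, so the nodes can
    be processed in increasing order.
    """
    reach = defaultdict(set)
    reach[source].add(0)
    for pos in range(source, target):
        for nxt in graph.get(pos, []):
            if nxt <= target:
                reach[nxt].update(d + 1 for d in reach[pos])
    return sorted(reach[target])


# Hypothetical region with one junction that skips positions 10..19.
graph = build_splice_graph(0, 30, junctions=[(9, 20)])
print(path_lengths(graph, 5, 25))  # [10, 20]
```

The two lengths correspond to the two alternative paths between the same pair of positions: one follows the junction edge, the other walks through every adjacent node.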

Page 16:

Estimation of alignments

(Maximize the likelihood of the PERs over all the putative alignments.)

Page 17:

EM continued...

Page 18:

Method-Step 3

• Expectation-maximization algorithm
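The slides do not spell out the likelihood, so the following is only a generic EM sketch under a deliberately simplified model, not the presented method. It assumes each PER has a few candidate alignments with known likelihoods, and that the only unknown parameters are the usage weights of the candidates. The names and the toy data are hypothetical.

```python
def em_alignment_weights(reads, n_iter=50):
    """Estimate candidate usage weights by expectation-maximization.

    reads: list of dicts mapping candidate name -> likelihood of that
    read under the candidate (hypothetical inputs).
    """
    candidates = sorted({c for read in reads for c in read})
    theta = {c: 1.0 / len(candidates) for c in candidates}
    for _ in range(n_iter):
        expected = dict.fromkeys(candidates, 0.0)
        # E-step: split each read across its candidate alignments.
        for read in reads:
            joint = {c: theta[c] * lik for c, lik in read.items()}
            norm = sum(joint.values())
            for c, value in joint.items():
                expected[c] += value / norm
        # M-step: new weights are the normalized expected counts.
        theta = {c: expected[c] / len(reads) for c in candidates}
    return theta


# Toy data: two reads fit only A, one fits only B, one is ambiguous.
reads = [{"A": 1.0}, {"A": 1.0}, {"A": 0.5, "B": 0.5}, {"B": 1.0}]
weights = em_alignment_weights(reads)
print(round(weights["A"], 3), round(weights["B"], 3))  # 0.667 0.333
```

The ambiguous read is shared between A and B in proportion to the current weights, which is how information from the whole read set influences each individual assignment.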

Page 19:

Discussion

• Proposed a probabilistic framework to predict the alignment of each PER fragment to a reference genome by maximizing the likelihood of all PER alignments through a splice graph model.

• Advantage: higher coverage and specificity than the alignment of the individual end reads alone.

• Capable of detecting trans-chromosome and trans-strand gene fusion events.

Page 20:

Advantages

• First, the fragment alignments significantly increase coverage of the transcriptome.

Reason: A PER carries almost twice the information of a single read.

• Second, it has higher specificity than the junctions in the individual end reads.

Reason: The EM algorithm uses information from the entire set of end read alignments.

Page 21:

Advantages

• Third, the splice graph accurately captures the alternative paths between the two end reads, and the expected mate-pair distance can effectively disambiguate them.
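One way to picture the disambiguation is to score each alternative path by how well its implied distance agrees with the expected mate-pair distance. The sketch below assumes a normal distribution for that distance. The mean, the standard deviation, and the candidate paths are invented for illustration and do not come from the presentation.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def path_posteriors(candidate_distances, mean, sd):
    """Relative support for each candidate path, assuming equal priors
    and a normal model for the mate-pair distance (an assumption)."""
    weights = {name: normal_pdf(dist, mean, sd)
               for name, dist in candidate_distances.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical distances implied by three alternative paths.
candidates = {"skip_exon": 82, "include_exon": 182, "unspliced": 3982}
posteriors = path_posteriors(candidates, mean=82.0, sd=20.0)
best = max(posteriors, key=posteriors.get)
print(best)  # skip_exon
```

Here the path whose implied distance sits closest to the expected value receives nearly all of the support, while the unspliced path is effectively ruled out.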

Page 22:

Thank you

