
Post on 24-Dec-2015


Reconstruction of Haplotype Spectra from NGS Data

Ion Mandoiu
UTC Associate Professor in Engineering Innovation
Department of Computer Science & Engineering
University of Connecticut

Haplotype Spectra Reconstruction

• Given NGS reads, reconstruct:
  – Full-length sequences
  – Sequence frequencies

• Example applications:
  – Single individual haplotyping
  – Allele-specific transcriptome reconstruction
  – Viral quasispecies reconstruction

Single Individual Haplotyping

• Somatic cells are diploid, containing two nearly identical copies of each autosomal chromosome
  – Heterozygous loci found by mapping reads to reference genome
  – Long haplotype fragments can be generated by sequencing fosmid pools [Duitama et al. 2012]

RefHap Algorithm [Duitama et al. 2012]

• Reduce the problem to Max-Cut
• Solve Max-Cut
• Build haplotypes according to the cut

Locus  1  2  3  4  5
f1     *  0  1  1  0
f2     1  1  0  *  1
f3     1  *  *  0  *
f4     *  0  0  *  1

[Figure: fragment graph on vertices f1, f2, f3, f4 with edge weights 3, 1, 1, -1, -1]

h1 00110
h2 11001
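The three RefHap steps above can be sketched on the slide's own fragment matrix. This is a minimal illustration, not the published implementation. It assumes an edge weight of mismatches minus matches over shared loci (which is consistent with the weights shown on the slide), uses a brute-force Max-Cut where a real tool would use a heuristic, and builds haplotypes by a simple majority vote. All function names are hypothetical.

```python
from itertools import combinations, product

# Fragment matrix copied from the slide; '*' marks loci a fragment does not cover.
FRAGMENTS = {
    "f1": "*0110",
    "f2": "110*1",
    "f3": "1**0*",
    "f4": "*00*1",
}


def edge_weight(a, b):
    """Assumed score: mismatches minus matches over loci covered by both."""
    weight = 0
    for x, y in zip(a, b):
        if x == "*" or y == "*":
            continue
        weight += 1 if x != y else -1
    return weight


def max_cut_brute_force(fragments):
    """Try every bipartition (exponential, so only viable for toy inputs)."""
    names = sorted(fragments)
    weights = {
        (u, v): edge_weight(fragments[u], fragments[v])
        for u, v in combinations(names, 2)
    }
    best_score, best_side = None, None
    # Pin the first fragment to side 0 so mirrored cuts are not tried twice.
    for bits in product((0, 1), repeat=len(names) - 1):
        side = dict(zip(names, (0,) + bits))
        score = sum(w for (u, v), w in weights.items() if side[u] != side[v])
        if best_score is None or score > best_score:
            best_score, best_side = score, side
    return best_score, best_side, weights


def build_haplotypes(fragments, side):
    """Majority vote per locus after flipping alleles of side-1 fragments."""
    n_loci = len(next(iter(fragments.values())))
    h1 = []
    for j in range(n_loci):
        votes = 0
        for name, frag in fragments.items():
            if frag[j] == "*":
                continue
            allele = int(frag[j]) ^ side[name]
            votes += 1 if allele == 1 else -1
        h1.append("1" if votes > 0 else "0")
    h1 = "".join(h1)
    h2 = "".join("1" if c == "0" else "0" for c in h1)  # complement of h1
    return h1, h2


score, side, weights = max_cut_brute_force(FRAGMENTS)
h1, h2 = build_haplotypes(FRAGMENTS, side)
print(score, side)
print("h1", h1)
print("h2", h2)
```

On this input the best cut separates f1 from f2, f3, and f4, and the vote yields 00110 and 11001, the same h1 and h2 shown on the slide.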

Chr. 22, 32k SNPs, 14k fragments

Haplotype Spectra Reconstruction

• Given short sequence fragments, reconstruct:
  – Full-length sequences
  – Sequence frequencies

• Example applications:
  – Single individual haplotyping
  – Allele-specific transcriptome reconstruction
  – Viral quasispecies reconstruction

Transcriptome Reconstruction Challenge: Alternative Splicing

[Griffith and Marra 07]

[Figure: a gene with exons 1-7 and four alternatively spliced transcripts]
t1: exons 1 2 3 4 5 6 7
t2: exons 1 3 4 5 6 7
t3: exons 1 2 3 4 5 7
t4: exons 1 3 4 5 7

TRIP: Transcriptome Reconstruction using Integer Programming

• Map the RNA-Seq reads to genome
• Construct Splice Graph - G(V,E)
  – V: exons
  – E: splicing events
• Generate candidate transcripts
  – Depth-first-search (DFS)
• Filter candidate transcripts
  – Fragment length distribution (FLD)
  – Integer programming
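The candidate-generation step can be sketched as a depth-first enumeration of source-to-sink paths in the splice graph. This is a minimal sketch and not TRIP's actual code. The seven-exon graph below is hypothetical (exons 2 and 6 can be skipped), and `enumerate_transcripts` is an illustrative name.

```python
def enumerate_transcripts(splice_graph, sources, sinks):
    """Return every source-to-sink path in an acyclic splice graph."""
    paths = []

    def dfs(exon, path):
        path.append(exon)
        if exon in sinks:
            paths.append(tuple(path))
        for nxt in splice_graph.get(exon, ()):
            dfs(nxt, path)
        path.pop()  # backtrack before exploring sibling branches

    for source in sources:
        dfs(source, [])
    return paths


# Hypothetical splice graph: exon -> exons reachable by one splicing event.
SPLICE_GRAPH = {
    1: [2, 3],
    2: [3],
    3: [4],
    4: [5],
    5: [6, 7],
    6: [7],
}

candidates = enumerate_transcripts(SPLICE_GRAPH, sources={1}, sinks={7})
for transcript in candidates:
    print(transcript)
```

The number of paths can grow exponentially with the number of alternative splicing events, which is one reason a filtering step is needed afterwards.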

How to filter?

• Select the smallest set of putative transcripts that yields a good statistical fit between the fragment length distribution
  – empirically determined during library preparation, and
  – implied by “mapping” read pairs

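[Figure: candidate transcripts with exons 1-3 and 1-2-3 (each exon 200 long); a read pair implies a fragment length of 500 on one transcript and 300 on the other; fragment length distribution with mean 500, std. dev. 50; transcripts t1, t2, t3]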
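The filtering idea can be illustrated with the numbers from the figure: three exons of length 200 and a fragment length distribution with mean 500 and standard deviation 50. The sketch below computes the fragment length a read pair would imply under each candidate transcript and then picks the smallest set of candidates that explains all pairs. The read-pair encoding, the 2-standard-deviation cutoff, and the brute-force subset search are all assumptions made for illustration; the search stands in for the integer program and is not TRIP's formulation.

```python
from itertools import combinations

EXON_LEN = {1: 200, 2: 200, 3: 200}  # exon lengths, as in the slide figure
FLD_MEAN, FLD_STD = 500.0, 50.0      # fragment length distribution on the slide


def implied_fragment_length(transcript, pair):
    """Fragment length a read pair would have if it came from `transcript`.

    `pair` is ((exon, offset), (exon, offset)): where the fragment starts and
    ends, with offsets counted from the exon start (hypothetical encoding).
    """
    def transcript_pos(exon, offset):
        if exon not in transcript:
            return None
        idx = transcript.index(exon)
        return sum(EXON_LEN[e] for e in transcript[:idx]) + offset

    start = transcript_pos(*pair[0])
    end = transcript_pos(*pair[1])
    if start is None or end is None or end <= start:
        return None
    return end - start


def is_consistent(transcript, pair, max_z=2.0):
    """True if the implied length lies within max_z std. devs of the mean."""
    length = implied_fragment_length(transcript, pair)
    return length is not None and abs(length - FLD_MEAN) / FLD_STD <= max_z


def smallest_explaining_set(candidates, pairs, max_z=2.0):
    """Brute-force stand-in for the integer program: the smallest subset of
    candidates such that every read pair is consistent with at least one."""
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            if all(any(is_consistent(t, p, max_z) for t in subset) for p in pairs):
                return subset
    return None


candidates = [(1, 3), (1, 2, 3)]
pair = ((1, 50), (3, 150))  # starts 50 bases into exon 1, ends 150 into exon 3
for transcript in candidates:
    print(transcript, implied_fragment_length(transcript, pair))
print(smallest_explaining_set(candidates, [pair]))
```

Here the pair implies a length of 300 on the transcript that skips exon 2 and 500 on the one that includes it, so only the longer transcript is kept.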

Allele Specific Expression


RNA Virus Replication

High mutation rate (~10^-4)

Lauring & Andino, PLoS Pathogens 2011

Shotgun vs. Amplicon Reads

• Shotgun reads: starting positions distributed ~uniformly
• Amplicon reads: have predefined start/end positions covering fixed overlapping windows

Reconstruction from Shotgun Reads: ViSpA

• Input: shotgun reads
• Read Error Correction
• Read Alignment
• Preprocessing of Aligned Reads
• Read Graph Construction
• Contig Assembly
• Frequency Estimation
• Output: quasispecies sequences w/ frequencies
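The slides do not say how the frequency estimation step works. One common approach to this kind of problem is expectation-maximization (EM) over read-to-sequence compatibilities, sketched below as an illustration only. The input encoding (`compat`, a list with one set of compatible sequence indices per read) and the function name are assumptions.

```python
def em_frequencies(compat, n_seqs, iters=200):
    """Estimate sequence frequencies from read compatibility sets via EM.

    compat[i] is the set of candidate-sequence indices read i could come from.
    """
    freqs = [1.0 / n_seqs] * n_seqs
    for _ in range(iters):
        expected = [0.0] * n_seqs
        # E-step: split each read among compatible sequences by current freqs.
        for seqs in compat:
            total = sum(freqs[s] for s in seqs)
            if total == 0:
                continue
            for s in seqs:
                expected[s] += freqs[s] / total
        # M-step: renormalize expected read counts into frequencies.
        norm = sum(expected)
        freqs = [e / norm for e in expected]
    return freqs


# Toy input: 6 reads unique to sequence 0, 2 unique to sequence 1, 2 ambiguous.
compat = [{0}] * 6 + [{1}] * 2 + [{0, 1}] * 2
freqs = em_frequencies(compat, n_seqs=2)
print(freqs)
```

In this toy case the ambiguous reads are shared in proportion to the unambiguous evidence, so the estimates settle at a 3:1 ratio.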

Reconstruction from Amplicon Reads: VirA

• Input: reference in FASTA format; error-corrected read data in SAM/BAM
• Estimate Amplicons
• Amplicon Read Graph
• Max-Bandwidth Paths
• Frequency Estimation
• Output: viral population variants with frequencies

Amplicon Read Graph

• K amplicons represented by K-layer read graph
• Vertices ⇔ distinct reads
• Edges ⇔ reads with consistent overlap
• Vertices have count function c(v)
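A K-layer read graph with vertex counts, together with a maximum-bandwidth path through it, can be sketched as follows. Here the bandwidth of a path is taken to be the smallest count c(v) along it, which is the usual bottleneck definition and an assumption about what the slides mean. The sketch also assumes that consecutive amplicons overlap by a fixed number of bases; the function names are hypothetical.

```python
from collections import Counter


def build_amplicon_read_graph(layers, overlap):
    """layers[k] lists the reads of amplicon k; returns counts and edges.

    Vertices are distinct reads per layer, c(v) is the read's multiplicity,
    and an edge joins reads in consecutive layers that agree on the overlap.
    """
    counts = [Counter(reads) for reads in layers]
    edges = {}
    for k in range(len(counts) - 1):
        for u in counts[k]:
            edges[(k, u)] = [
                v for v in counts[k + 1] if u[-overlap:] == v[:overlap]
            ]
    return counts, edges


def max_bandwidth_path(counts, edges):
    """Layer-by-layer DP maximizing the minimum vertex count on the path."""
    best = {u: (c, [u]) for u, c in counts[0].items()}
    for k in range(len(counts) - 1):
        nxt = {}
        for u, (bandwidth, path) in best.items():
            for v in edges[(k, u)]:
                cand = min(bandwidth, counts[k + 1][v])
                if v not in nxt or cand > nxt[v][0]:
                    nxt[v] = (cand, path + [v])
        best = nxt
    if not best:
        return None
    return max(best.values(), key=lambda item: item[0])


# Toy data: two amplicons overlapping by 2 bases.
layers = [
    ["ACGT"] * 5 + ["ACCT"] * 2,
    ["GTAA"] * 4 + ["CTAA"] * 2 + ["GTTA"] * 1,
]
counts, edges = build_amplicon_read_graph(layers, overlap=2)
result = max_bandwidth_path(counts, edges)
print(result)
```

Because the graph is layered, a single forward pass suffices; no general shortest-path machinery is needed.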

Read Graph Transformation

• Heuristic to reduce edges in dense graphs
• Replace bipartite cliques with star subgraphs
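The clique-to-star replacement can be illustrated for two consecutive layers. The slides do not say how cliques are detected; the sketch below assumes that all reads sharing the same overlap string form a complete bipartite clique, and it routes them through one hub vertex per overlap string. The names are hypothetical.

```python
from collections import defaultdict


def cliques_to_stars(left_reads, right_reads, overlap):
    """Replace each complete bipartite clique with a star around a hub.

    Reads are grouped by overlap string (an assumed way to find the cliques).
    Returns the star edges and the number of edges in the dense graph.
    """
    groups = defaultdict(lambda: ([], []))
    for u in left_reads:
        groups[u[-overlap:]][0].append(u)
    for v in right_reads:
        groups[v[:overlap]][1].append(v)

    star_edges = []
    dense_edge_count = 0
    for key, (lefts, rights) in groups.items():
        if not lefts or not rights:
            continue  # no clique for this overlap string
        hub = ("hub", key)
        star_edges += [(u, hub) for u in lefts]
        star_edges += [(hub, v) for v in rights]
        dense_edge_count += len(lefts) * len(rights)
    return star_edges, dense_edge_count


left = ["AAGT", "ACGT", "AGGT"]
right = ["GTAA", "GTAC", "GTAG"]
star_edges, dense_edge_count = cliques_to_stars(left, right, overlap=2)
print(dense_edge_count, len(star_edges))
```

A clique between m and n reads has m*n edges, while the star has only m+n, so the saving grows quickly for heavily covered overlaps.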

Challenges

• Scalability
  – Exploit inherent sparsity of biological instances
  – E.g., exact scaffolding algorithm using non-serial dynamic programming based on SPQR trees
• Flexibility
  – Long (noisy) reads + short
  – Heterogeneous data, e.g., RNA-Seq + TSSeq + PolyA-Seq
• Quantifying reconstruction uncertainty
  – Compute intensive, e.g., bootstrapping


Acknowledgements

Jorge Duitama
Sahar Al Seesi
Mazhar Kahn
Rachel O’Neill

Alexander Artyomenko
Adrian Caciula
Nicholas Mancuso
Serghei Mangul
Bassam Tork
Alex Zelikovsky
Irina Astrovskaya
Pavel Skums