Date post: 17-Dec-2015
Final Results Genome Assembly Team Kelley Bullard, Henry Dewhurst, Kizee Etienne, Esha Jain, VivekSagar KR, Benjamin Metcalf, Raghav Sharma, Charles Wigington, Juliette Zerick

Original Pipeline

PRE-PROCESSING
Inputs: 454 raw reads, Illumina raw reads
Pre-processing and statistical analysis (read stats): FastQC, Prinseq, NGS QC
Outputs: 454 reads, Illumina reads

REFERENCE SELECTION
Published genomes from public databases:
• V. vulnificus YJ016
• V. vulnificus CMCP6
• V. vulnificus MO6-24/O
Align Illumina reads against the reference (bwa)
Compare mapping statistics (samstats)
Output: Reference genome

DENOVO ASSEMBLY (Illumina / 454 / Hybrid)
• 454 DeNovo: Newbler, CABOG, SUTTA
• Illumina DeNovo: Allpaths LG, SOAPdenovo, Velvet, Taipan, SUTTA
• Hybrid DeNovo: Ray, MIRA
Align Illumina reads against 454 contigs (MacVector, CLC wb)
Outputs: contigs * 3, unmapped reads
Evaluation: GAGE, Hawk-eye

REFERENCE BASED ASSEMBLY (Illumina/(454?))
Assembler: AMOScmp
Outputs: contigs, unmapped reads
Reference evaluation: DNA Diff, MUMmer
Parameter optimization

CONTIG MERGING
All possible combinations of the best 3
Tools: Minimus, MAIA

GENOME FINISHING
Tools: PAGIT, Mauve
Gap filling and nucleotide identity: MUMmer, GRASS, Built-in
Evaluation: GAGE
Outputs: Scaffolds, Draft/Finished genome

LEGEND: Process; Info.; Chosen Ref.; Assemblers (454, Illumina, hybrid)
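
The CONTIG MERGING stage above tries all possible combinations of the best 3 assemblies. A minimal sketch of how those combinations could be enumerated is shown here. The assembly names are hypothetical placeholders, since the pipeline diagram does not say which three were ranked best, and the choice to include both pairs and the full triple is an assumption.

```python
from itertools import combinations

# Hypothetical placeholder names: the slides do not say which three
# assemblies were ranked best.
best_three = ["assembly_A", "assembly_B", "assembly_C"]


def merge_candidates(assemblies):
    """Return every subset of two or more assemblies, smallest subsets first."""
    candidates = []
    for size in range(2, len(assemblies) + 1):
        candidates.extend(combinations(assemblies, size))
    return candidates


if __name__ == "__main__":
    # Each printed line is one candidate input set for a contig merger.
    for combo in merge_candidates(best_three):
        print(" + ".join(combo))
```

With three inputs this gives three pairs and one triple, so the number of merge runs stays small enough to try exhaustively.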

Read Visualization – spot the differences

Pipeline / Read Processing / Assembler Results / Contig Merging / Assembler Review / Pipeline / Final Results

Comparison of 454 Reads for 08-2462 (low coverage) and 2541-90 (improved coverage)

Read Visualization - more is better!

Nav 08-2462 454 reads compared to Nav 08-2462 Illumina reads.

Read Visualization – cousins or siblings?

Nav_2541-90 and Vul_06-2432 (454 and Illumina reads) coverage comparison.

Data Quality

Effect of pre-processing data (using prinseq)
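
To make the idea of read pre-processing concrete, here is a minimal sketch of one common filter: dropping reads that are too short or whose mean base quality is too low. It is not prinseq's implementation, and the slides do not state which filters or thresholds the team used. The function names, the thresholds and the Phred+33 encoding are all assumptions for illustration.

```python
import io


def mean_phred(quality_line, offset=33):
    """Mean Phred score of one FASTQ quality string (Phred+33 assumed)."""
    if not quality_line:
        return 0.0
    return sum(ord(ch) - offset for ch in quality_line) / len(quality_line)


def filter_fastq(handle, min_mean_quality=20, min_length=50):
    """Yield (header, sequence, quality) for reads passing both thresholds.

    Assumes simple four-line FASTQ records; the thresholds are illustrative.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip("\n")
        long_enough = len(sequence) >= min_length
        if long_enough and mean_phred(quality) >= min_mean_quality:
            yield header, sequence, quality


if __name__ == "__main__":
    # Two toy reads: r1 has high qualities ('I'), r2 has low qualities ('#').
    toy = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n####\n"
    for header, sequence, _ in filter_fastq(io.StringIO(toy), min_length=4):
        print(header, sequence)
```

Running FastQC before and after a filter of this kind is what produces the paired non-preprocessed and pre-processed columns in the tables that follow.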

V. navarensis (454; non-preprocessed | pre-processed)
Columns: Metric, 2423-01, 08-2462, 2541-90, 2756-81
Rows (per-sample cell values not recovered):
• Per Base Seq. Quality
• Per Seq. Quality Score
• Per Base Seq. Content
• Per Base GC Content
• Per Seq. GC Content
• Per Base N Content
• Seq. Length Dist.
• Seq. Dup. Levels
• Overrepresented Seqs.
• Kmer Content

V. vulnificus (454; non-preprocessed | pre-processed)
Columns: Metric, 2009V_1368, 06-2432, 08-2435, 08-2439, 07-2444
Rows (per-sample cell values not recovered):
• Per Base Seq. Quality
• Per Seq. Quality Score
• Per Base Seq. Content
• Per Base GC Content
• Per Seq. GC Content
• Per Base N Content
• Seq. Length Dist.
• Seq. Dup. Levels
• Overrepresented Seqs.
• Kmer Content

V. navarensis (Illumina; non-preprocessed | pre-processed)
Columns: Metric, 2423-01, 08-2462, 2541-90, 2756-81
Rows (per-sample cell values not recovered):
• Per Base Seq. Quality
• Per Seq. Quality Score
• Per Base Seq. Content
• Per Base GC Content
• Per Seq. GC Content
• Per Base N Content
• Seq. Length Dist.
• Seq. Dup. Levels
• Overrepresented Seqs.
• Kmer Content

V. vulnificus (Illumina; non-preprocessed | pre-processed)
Columns: Metric, 2009V_1368, 06-2432, 08-2435, 08-2439, 07-2444
Rows (per-sample cell values not recovered):
• Per Base Seq. Quality
• Per Seq. Quality Score
• Per Base Seq. Content
• Per Base GC Content
• Per Seq. GC Content
• Per Base N Content
• Seq. Length Dist.
• Seq. Dup. Levels
• Overrepresented Seqs.
• Kmer Content

Assembly

Reference-guided and de-Novo

Reference guided assembly

Comparison of reference guided assembly vs de-novo assembly

ARE – Assembly Score

Reference-guided vs de-Novo assembly

[Bar chart: ARE (y-axis, 0 to 90) for AMOScmp, Newbler (ref), CABOG, Newbler (dn), SOAPdn, Velvet and Ray. Series: 454 (Vul_06-2432), 454 (Nav_2541-90), Illumina (Vul_06-2432), Illumina (Nav_2541-90)]

Summary of Reference-guided assembly
• Using V. vulnificus (CMCP6) reference strain: 84% coverage
• De-Novo assemblers overall provided a higher assembly score than reference based assembly
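
The slide does not define how the 84% coverage figure was computed. One common reading is breadth of coverage: the fraction of reference positions covered by at least one aligned contig or read. The sketch below shows that calculation under this assumption; the function name, the half-open interval convention and the toy numbers are illustrative and are not taken from the team's data.

```python
def covered_fraction(intervals, reference_length):
    """Fraction of reference positions covered by at least one interval.

    Intervals are half-open (start, end) pairs in reference coordinates.
    Overlapping or touching intervals are merged so no base counts twice.
    """
    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / reference_length


if __name__ == "__main__":
    # Toy alignment blocks on a 100-base reference.
    blocks = [(0, 50), (40, 70), (80, 94)]
    print(f"{covered_fraction(blocks, 100):.0%}")
```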

De Novo Assembly

[Chart: Newbler (denovo). ARE (y-axis, 0 to 100) against K-MER SIZE (x-axis: 40, 50, 100). Series: Nav_2541-90, Vul_06-2432]

De Novo Assembly

[Chart: CABOG. ARE (y-axis, 0 to 50) against K-MER Size (x-axis: 15, 22, 25). Series: Nav_2541-90, Vul_06-2432]

De Novo Assembly

[Chart: SOAPdenovo. ARE (y-axis, 0 to 4) against K-MER Size (x-axis: 20, 30, 40, 50, 60, 70). Series: Nav_2541-90, Vul_06-2432]

De Novo Assembly

[Chart: Velvet. ARE (y-axis, 0 to 7) against K-MER Size (x-axis: 19, 25, 31). Series: Nav_2541-90, Vul_06-2432]

De-Novo Assembler Comparison (Optimal Parameters)

[Bar chart: ARE (y-axis, 0 to 100) for CABOG, Newbler (dn), Ray, Ray (hybrid), SOAPdn and Velvet. Series: 454 (Vul_06-2432), Illumina (Vul_06-2432), 454 (Nav_2541-90), Illumina (Nav_2541-90), Hybrid (Vul_06-2432), Hybrid (Nav_2541-90)]

Final Results – V. vulnificus

[Chart: axes Assembly Score and Span Ratio; visible point labels include Velvet and CABOG]

Graph comparing assemblers on 3 criteria: Assembly Score, Span Ratio, 1/(Break Points). Higher scores on all criteria are preferable. Newbler (dn) has been removed to show variance in other tools.

Final Results – V. vulnificus

[Chart: axis label 1000/(Break Points)]

Graph comparing assemblers on 3 criteria: Assembly Score, Span Ratio, 1/(Break Points). Higher scores on all criteria are preferable.

Summary of de-Novo results
• OLC assemblers (CABOG/Newbler) differed considerably in ARE from de Bruijn based assemblers (SOAPdenovo/Velvet)
• The hybrid assembler, Ray, did not perform as well in terms of assembly score

Merging: Vul_06-2432

Assembler               AMOScmp  CABOG   Newbler   Newbler    Newbler         Ray     Ray         Ray       SOAPdn  Velvet
                                         (dn;454)  (ref;454)  (ref;Illumina)  (454)   (Illumina)  (hybrid)
AMOScmp                 -        164.00  234.69    6.35       4.69            63.51   55.13       64.51     44.38   67.22
CABOG                   164.00   -       225.12    101.30     62.66           73.23   93.88       98.11     75.98   113.08
Newbler (dn;454)        234.69   221.89  -         5.48       ND              311.98  ND          419.76    104.46  127.01
Newbler (ref;454)       6.35     99.30   5.48      -          1.44            67.72   64.99       72.79     35.07   72.34
Newbler (ref;Illumina)  4.69     62.66   ND        1.44       -               35.28   ND          ND        ND      ND
Ray (454)               63.50    72.56   311.99    67.72      35.28           -       33.81       49.94     22.92   37.68
Ray (Illumina)          55.13    93.88   ND        64.99      ND              33.81   -           ND        ND      ND
Ray (hybrid)            64.51    97.17   419.76    72.79      ND              49.94   ND          -         ND      ND
SOAPdn                  44.38    75.98   104.46    35.07      ND              22.92   ND          ND        -       ND
Velvet                  67.22    113.08  127.01    72.34      ND              37.68   ND          ND        ND      -
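
Reading the best merge off a pairwise table like the one above can be automated. The sketch below ranks assembler pairs for Vul_06-2432 using the values from the upper half of that table, read row by row. It assumes each cell is the score of the merged pair and that ND marks a combination with no score; the slide does not define either explicitly. The function and variable names are illustrative.

```python
# Assembler order follows the table columns for Vul_06-2432.
names = [
    "AMOScmp", "CABOG", "Newbler (dn;454)", "Newbler (ref;454)",
    "Newbler (ref;Illumina)", "Ray (454)", "Ray (Illumina)",
    "Ray (hybrid)", "SOAPdn", "Velvet",
]

# Upper triangle: each row lists scores against the assemblers that come
# after it in `names`. None stands for an "ND" cell.
upper = {
    "AMOScmp": [164.00, 234.69, 6.35, 4.69, 63.51, 55.13, 64.51, 44.38, 67.22],
    "CABOG": [225.12, 101.30, 62.66, 73.23, 93.88, 98.11, 75.98, 113.08],
    "Newbler (dn;454)": [5.48, None, 311.98, None, 419.76, 104.46, 127.01],
    "Newbler (ref;454)": [1.44, 67.72, 64.99, 72.79, 35.07, 72.34],
    "Newbler (ref;Illumina)": [35.28, None, None, None, None],
    "Ray (454)": [33.81, 49.94, 22.92, 37.68],
    "Ray (Illumina)": [None, None, None],
    "Ray (hybrid)": [None, None],
    "SOAPdn": [None],
}


def ranked_pairs(names, upper):
    """Return (score, assembler_1, assembler_2) tuples, best score first."""
    pairs = []
    for i, row_name in enumerate(names[:-1]):
        for col_name, score in zip(names[i + 1:], upper[row_name]):
            if score is not None:  # skip ND cells
                pairs.append((score, row_name, col_name))
    return sorted(pairs, reverse=True)


if __name__ == "__main__":
    for score, first, second in ranked_pairs(names, upper)[:3]:
        print(f"{first} + {second}: {score}")
```

On these values the top-ranked pair is Newbler (dn;454) merged with Ray (hybrid).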

Merging: Nav_2541-90

Assembler               AMOScmp  CABOG   Newbler   Newbler    Newbler         Ray     Ray         Ray       SOAPdn  Velvet
                                         (dn)      (ref;454)  (ref;Illumina)  (454)   (Illumina)  (hybrid)
AMOScmp                 -        133.95  ND        0.03       0.03            15.26   14.00       15.77     11.23   45.32
CABOG                   133.95   -       ND        107.60     114.60          82.62   92.44       92.53     80.73   123.02
Newbler (dn)            ND       ND      -         ND         ND              54.21   59.81       60.47     33.17   94.89
Newbler (ref;454)       0.03     107.60  59.94     -          0.11            11.6    11.78       11.86     10.17   39.2
Newbler (ref;Illumina)  0.03     114.60  ND        0.28       -               12.66   12.15       12.41     9.6     39.60
Ray (454)               15.26    82.62   54.21     11.60      12.66           -       59.19       76.36     13.65   63.75
Ray (Illumina)          14.01    92.44   59.81     11.78      12.15           33.79   -           24.21     11.54   39.84
Ray (hybrid)            15.77    92.53   60.47     11.86      12.41           40.33   36.79       -         14.06   ND
SOAPdn                  11.22    80.73   33.17     10.04      9.54            13.61   11.40       13.91     -       8.47
Velvet                  45.32    123.02  94.89     39.20      39.84           64.54   39.84       ND        8.31    -

Assembler Review

Assembler     Status             Algorithm
Allpaths LG   Paired-end only    DBG
AMOScmp                          BB
CABOG                            OLC
MIRA                             ZEBRA
Newbler                          OLC
Ray                              DBG
SOAPdenovo                       DBG
SUTTA         Unresolved errors  BB
Velvet                           DBG

454 / Illumina / Hybrid columns: marks not recovered.

BB = branch-and-bound; OLC = overlap-layout-consensus; DBG = de Bruijn graph; ZEBRA

MIRA worked as well as our merged contigs, but it is impractical (40 hr run time).

Final Pipeline

PRE-PROCESSING
Inputs: 454 raw reads, Illumina raw reads
Pre-processing and statistical analysis (read stats): FastQC, Prinseq
Outputs: 454 reads, Illumina reads

DENOVO ASSEMBLY (Illumina / 454 / Hybrid)
• 454 DeNovo: Newbler, CABOG
• Illumina DeNovo: Velvet
• Hybrid DeNovo: Ray, MIRA
Align Illumina reads against 454 contigs
Output: contigs

CONTIG MERGING (Minimus)
• Merge Ray-hyb/Newbler
• Merge CABOG/Velvet
• MIRA-hyb

Output: Draft genome

LEGEND: Process; Info.; Assemblers (454, Illumina, hybrid)

Splinter

Pipeline 1

Sample          NUM  AVG      N50     Assembly Size (Mb)  Assembly Score
Nav_2423-01     106  42657.2  156064  4.52                136.53
Nav_08-2462     149  25736.8  51230   3.83                19.48
Nav_2541-90     166  26172.5  130386  4.34                62.57
Nav_2756-81     107  42939.4  131591  4.59                122.31
Vul_2009v-1368  83   57787.2  401973  4.80                345.03
Vul_06-2432     57   85122.7  322525  4.85                419.76
Vul_08-2435     111  42872.9  230373  4.76                144.01
Vul_08-2439     98   50885.7  250789  4.99                210.94
Vul_07-2444     70   73255.1  492706  5.13                656.10

Pipeline 2

Sample          NUM  AVG      N50     Assembly Size (Mb)  Assembly Score
Nav_2423-01     125  35357.0  164305  4.42                111.36
Nav_08-2462     451  311.9    2253    0.14                0.09
Nav_2541-90     106  40547.5  169781  4.30                123.02
Nav_2756-81     111  41840.8  132119  4.64                124.55
Vul_2009v-1368  97   49705.8  228408  4.82                170.81
Vul_06-2432     167  28489.7  78353   4.76                32.53
Vul_08-2435     193  24903.7  204178  4.85                75.19
Vul_08-2439     114  44047.9  180889  5.02                134.64
Vul_07-2444     143  35905.1  130942  5.13                85.93
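
For readers unfamiliar with the NUM, AVG and N50 columns, the sketch below computes them from a list of contig lengths. N50 is taken here in its usual sense: the length of the contig at which the running total of contigs, sorted longest first, reaches half the assembly size. The function name and the toy lengths are illustrative, and the Assembly Score column is not reproduced because the slides do not give its formula. As a consistency check on the tables, NUM times AVG matches the reported assembly size, for example 106 × 42657.2 is about 4.52 million bases for Nav_2423-01 in the first table.

```python
def assembly_stats(contig_lengths):
    """Return contig count, mean length, N50 and total size for an assembly."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        # First contig at which the running sum reaches half the total.
        if running * 2 >= total:
            n50 = length
            break
    return {
        "NUM": len(lengths),
        "AVG": total / len(lengths),
        "N50": n50,
        "SIZE": total,
    }


if __name__ == "__main__":
    # Toy contig lengths, not taken from the team's assemblies.
    print(assembly_stats([100, 80, 60, 40, 20]))
```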

Visualization

Merged

Newbler Ray Hybrid

Demo

