Accelerating Genome Assembly with Power8 · 2/14/2016 · Your logo here Experimental Test Beds...

Revolutionizing the Datacenter

Join the Conversation #OpenPOWERSummit

Accelerating Genome Assembly with Power8

Seung-Jong Park, Ph.D.

School of EECS, CCT, Louisiana State University


Agenda

The Genome Assembly Problem

Accelerating Graph Construction with POWER8

Accelerating Graph Simplification with CAPI Flash

4/1/2016


The Genome Assembly Problem


Challenges for Genome Assemblers

NGS Technologies Outpaced Moore’s Law

Software with Extreme Scalability

HPC Platform
• More Compute Cycles
• Extreme I/O Performance
• Huge Storage Space

[Figure: Genome → NGS → Reads (TBs) → HPC → Reconstructed Genome (MBs/GBs); the pipeline is data- and compute-intensive.]


MapReduce-based Graph Construction

[Figure: Map tasks split each read into k-mer:successor pairs (e.g. TTAG:A, TTAG:N, where N marks a k-mer with no successor); Reduce tasks group the pairs by k-mer and merge the successors into adjacency entries (e.g. TAGA:G,T and TTAG:A).]
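The Map and Reduce steps in the figure can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions, not the Hadoop implementation used in the talk: k = 4 to match the figure, the function names are hypothetical, a dictionary stands in for Hadoop's shuffle, and the end-of-read marker N is dropped whenever a k-mer also has a real successor (which is how the figure's merged entries read, e.g. TTAG:A and TTAG:N reduce to TTAG:A). The sample reads are the six shown in the deck's CAPI Flash figure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

END = "N"  # successor marker for a k-mer that ends its read (as in the slides)


def map_read(read: str, k: int) -> Iterator[Tuple[str, str]]:
    """Map: emit (k-mer, next base) for every k-mer in the read."""
    for i in range(len(read) - k + 1):
        j = i + k
        yield read[i:j], (read[j] if j < len(read) else END)


def reduce_kmer(successors: Iterable[str]) -> List[str]:
    """Reduce: merge the successor bases seen for one k-mer.

    The end marker is kept only if no real successor base was observed.
    """
    bases = sorted(set(successors))
    real = [b for b in bases if b != END]
    return real if real else [END]


def build_graph(reads: Iterable[str], k: int = 4) -> Dict[str, List[str]]:
    # A dict of lists stands in for the shuffle that groups map output by key.
    groups: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        for kmer, nxt in map_read(read, k):
            groups[kmer].append(nxt)
    return {kmer: reduce_kmer(vals) for kmer, vals in sorted(groups.items())}


reads = ["TAGTCGAGGCT", "GGCTTTAGATC", "TGAGGCTTTAG",
         "TTTAGAGACAG", "GATCCGATGAG", "TAGTCGAGGCT"]
graph = build_graph(reads)
print(graph["TAGA"], graph["TTAG"], graph["ACAG"])  # ['G', 'T'] ['A'] ['N']
```

Applied to these reads, the sketch reproduces the 27 adjacency entries listed on the graph-simplification slide.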


Accelerating Graph Construction with POWER8


Experimental Test Beds

System type | IBM PKY cluster | LSU SuperMikeII
Processor | Two 10-core IBM POWER8 | Two 8-core Intel Sandy Bridge Xeon
Maximum #nodes used in experiments | 40 | 120
#Physical cores/node | 20 (8-way simultaneous multithreading) | 16 (Hyper-Threading disabled)
#vcores/node | 160 | 16
RAM/node (GB) | 256 | 32
#Disks/node | 5 | 3
#Disks/node used for shuffled data | 3 | 1
Total storage space/node used for shuffled data | 1.8 | 0.5
Network | 56 Gbps InfiniBand (non-blocking) | 40 Gbps InfiniBand (2:1 blocking)


Datasets

Genome dataset | Input size | Shuffle data size | Output size
Rice genome | 12 GB | 70 GB | 50 GB
Bumble bee genome | 90 GB | 600 GB | 95 GB
Metagenome | 3.2 TB | 20 TB | 8.6 TB
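The dataset table implies that the shuffled map output is several times larger than the input. A quick Python check of the ratios, assuming decimal units (1 TB = 1000 GB; the slide does not specify), shows shuffle data of roughly 6x the input for all three datasets (5.8x to 6.7x), which helps explain why the test-bed table lists disks and storage reserved for shuffled data.

```python
# Sizes copied from the dataset table, in GB (assumes 1 TB = 1000 GB).
datasets = {
    "Rice genome": (12, 70, 50),
    "Bumble bee genome": (90, 600, 95),
    "Metagenome": (3_200, 20_000, 8_600),
}

ratios = {}
for name, (input_gb, shuffle_gb, output_gb) in datasets.items():
    ratios[name] = shuffle_gb / input_gb
    print(f"{name}: shuffle/input = {ratios[name]:.2f}x, "
          f"output/input = {output_gb / input_gb:.2f}x")
```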


Hadoop Configurations

Hadoop parameter | IBM POWER8 | SuperMikeII
yarn.nodemanager.resource.cpu-vcores | 120 | 16
yarn.nodemanager.resource.memory-mb | 231000 | 29000
mapreduce.map/reduce.cpu.vcores | 4 | 2
mapreduce.map/reduce.memory.mb | 7000 | 3500
mapreduce.map/reduce.java.opts | 6500m | 3000m
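These per-node settings bound how many map or reduce containers YARN can run at once. The Python sketch below is a rough estimate, not a YARN API; the class and function names are hypothetical. It takes the smaller of the memory-bound and vcore-bound counts, which assumes a scheduler that accounts for CPU as well as memory (for example, one configured with the DominantResourceCalculator); the CapacityScheduler's default DefaultResourceCalculator considers memory only. It also ignores the scheduler's normalization of each request up to a multiple of its minimum allocation, which can lower the memory-bound count.

```python
from dataclasses import dataclass


@dataclass
class NodeSettings:
    name: str
    nm_vcores: int       # yarn.nodemanager.resource.cpu-vcores
    nm_memory_mb: int    # yarn.nodemanager.resource.memory-mb
    task_vcores: int     # mapreduce.{map,reduce}.cpu.vcores
    task_memory_mb: int  # mapreduce.{map,reduce}.memory.mb


def max_concurrent_tasks(s: NodeSettings, count_cpu: bool = True) -> int:
    """Rough upper bound on simultaneous task containers per node."""
    by_memory = s.nm_memory_mb // s.task_memory_mb
    if not count_cpu:  # memory-only scheduling
        return by_memory
    return min(by_memory, s.nm_vcores // s.task_vcores)


power8 = NodeSettings("IBM POWER8", 120, 231_000, 4, 7_000)
mike = NodeSettings("SuperMikeII", 16, 29_000, 2, 3_500)
for s in (power8, mike):
    print(s.name, max_concurrent_tasks(s), max_concurrent_tasks(s, count_cpu=False))
```

Under these assumptions a POWER8 node runs up to 30 tasks when vcores are counted (33 on memory alone), against 8 on a SuperMikeII node.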


Hadoop Scalability with POWER8 SMTs

Tested with the small rice genome dataset on 2 nodes

Almost linear scalability as the number of SMT threads increases


Rice Genome

Analyzing the small (12 GB) dataset

Small enough to eliminate the impact of network and disk I/O

7.5x performance improvement per server


Bumble Bee Genome

Analyzing the medium-size (90 GB) bumble bee genome

7.5x improvement in performance per server


Metagenome

Analyzing the huge (3.2 TB) metagenome dataset

Only 6.5 hours on a 40-node IBM POWER8 cluster

More than 9x improvement in performance per server
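The slides report gains in "performance per server" without giving the formula. One plausible normalization, assumed here rather than taken from the talk, is throughput per node: (1 / runtime) / nodes. Under that assumption, the gain of a new configuration over a baseline is (T_base × N_base) / (T_new × N_new), a ratio of node-hours. The Python helper below is hypothetical, and the runtimes in its example are placeholders, not measurements from the talk.

```python
def per_server_gain(t_base_h: float, n_base: int, t_new_h: float, n_new: int) -> float:
    """Baseline node-hours divided by new node-hours; > 1 means more work per server.

    Assumes performance per server = (1 / runtime) / number_of_servers.
    """
    if min(t_base_h, t_new_h) <= 0 or min(n_base, n_new) <= 0:
        raise ValueError("runtimes and node counts must be positive")
    return (t_base_h * n_base) / (t_new_h * n_new)


# Hypothetical placeholder runtimes, NOT results from the talk:
# a baseline taking 4 h on 8 nodes vs. a new system taking 2 h on 2 nodes.
print(per_server_gain(4.0, 8, 2.0, 2))  # 8.0
```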


Graph Simplification with Distributed NoSQL

[Figure: the k-mer graph stored as key-value entries (k-mer : successor bases), e.g. TAGA:G,T, TAGT:C, TCCG:A, TCGA:G, TGAG:G, TTAG:A, TTTA:G, ..., CTTT:A; simplification merges them into the sequences TAGTCGAG and GAGGCTTTAGA.]
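The sequences at the bottom of the figure correspond to walking non-branching paths of the k-mer graph. The Python sketch below uses one common way to do that (unitig compaction), assumed here rather than taken from the talk; a plain dictionary stands in for the distributed NoSQL store, and the function names are hypothetical. Run on the key-value entries from the figure, it recovers TAGTCGAG and GAGGCTTTAGA, plus the graph's two other non-branching paths. GAGG has two predecessors and TAGA two successors, which is where the paths break.

```python
from typing import Dict, List

END = "N"  # successor marker for a k-mer with no outgoing edge

# Key-value entries from the figure: k-mer -> successor bases.
GRAPH: Dict[str, List[str]] = {
    "TAGA": ["G", "T"], "TAGT": ["C"], "TCCG": ["A"], "TCGA": ["G"],
    "TGAG": ["G"], "TTAG": ["A"], "TTTA": ["G"], "ACAG": [END],
    "AGAC": ["A"], "AGAG": ["A"], "AGAT": ["C"], "AGGC": ["T"],
    "AGTC": ["G"], "ATCC": ["G"], "ATGA": ["G"], "GACA": ["G"],
    "GAGA": ["C"], "GAGG": ["C"], "GATC": ["C"], "GATG": ["A"],
    "GCTT": ["T"], "GGCT": ["T"], "GTCG": ["A"], "CCGA": ["T"],
    "CGAG": ["G"], "CGAT": ["G"], "CTTT": ["A"],
}


def successors(graph: Dict[str, List[str]], kmer: str) -> List[str]:
    """k-mers reached by shifting in each successor base (edges to unknown k-mers are dropped)."""
    return [kmer[1:] + b for b in graph[kmer] if b != END and kmer[1:] + b in graph]


def compact(graph: Dict[str, List[str]]) -> List[str]:
    """Merge maximal non-branching paths into sequences (isolated cycles are skipped)."""
    out = {node: successors(graph, node) for node in graph}
    indeg = {node: 0 for node in graph}
    pred: Dict[str, str] = {}
    for node, nbrs in out.items():
        for n in nbrs:
            indeg[n] += 1
            pred[n] = node
    contigs: List[str] = []
    for start in sorted(graph):
        # `start` lies inside a path if its single predecessor has a single successor.
        if indeg[start] == 1 and len(out[pred[start]]) == 1:
            continue
        seq, cur = start, start
        # Extend while the path neither branches out nor merges in.
        while len(out[cur]) == 1 and indeg[out[cur][0]] == 1:
            cur = out[cur][0]
            seq += cur[-1]
        contigs.append(seq)
    return contigs


print(compact(GRAPH))  # ['AGAGACAG', 'AGATCCGATGAG', 'GAGGCTTTAGA', 'TAGTCGAG']
```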


Accelerating Simplification with IBM CAPI Flash

[Figure: NoSQL I/O throughput (keys/sec) and CAPI Flash I/O throughput (bytes/sec).]

Only 20 POWER8 cores + CAPI: 500 GB graph traversal in 7.5 hours
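For scale, the headline result works out to an average of roughly 18.5 MB of graph data per second over the run. The Python arithmetic below assumes decimal units (1 GB = 10^9 bytes) and simply divides the graph size by the wall-clock time; it says nothing about the peak CAPI Flash throughput plotted in the figure, and the traversal may read some data more than once.

```python
GRAPH_BYTES = 500 * 10**9  # 500 GB graph (decimal units assumed)
RUNTIME_S = 7.5 * 3600     # 7.5 hours in seconds

avg_rate = GRAPH_BYTES / RUNTIME_S  # bytes per second, averaged over the whole run
print(f"average: {avg_rate / 1e6:.1f} MB/s")  # average: 18.5 MB/s
```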


Computational Challenges: The Next Step

Graph building is the most expensive phase in terms of time and resources.

The obvious solutions: either use a single machine with lots of memory or run on a cluster.

Idea: use CAPI-accelerated flash instead of main memory.


Graph Construction on IBM CAPI Flash

[Figure: reads (TAGTCGAGGCT, GGCTTTAGATC, TGAGGCTTTAG, TTTAGAGACAG, GATCCGATGAG) pass through Map into k-mer:successor pairs such as GGCT:N and GGCT:T; the pairs go through successive Sort stages and are merged per k-mer (e.g. TAGA:G,T) before being written via the NoSQL data engine APIs.]
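The Map, Sort, and store stages in the figure follow the familiar external-sort pattern: sort bounded runs in memory, spill them to storage, then merge. The Python sketch below illustrates that pattern under explicit assumptions: ordinary temporary files stand in for CAPI-attached flash, a plain dictionary stands in for the NoSQL data engine APIs (whose actual interface is not shown here), and `run_size` is a hypothetical knob for how many pairs are held in memory at once. Only one run is sorted in memory at a time, and the merge reads one pair per run, which is the property that lets flash substitute for main memory.

```python
import heapq
import os
import tempfile
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Tuple

END = "N"  # successor marker for a k-mer that ends its read


def kmer_pairs(read: str, k: int) -> Iterator[Tuple[str, str]]:
    """Map: (k-mer, next base) for every k-mer in the read."""
    for i in range(len(read) - k + 1):
        j = i + k
        yield read[i:j], (read[j] if j < len(read) else END)


def spill_sorted_runs(reads: Iterable[str], k: int, run_size: int, tmpdir: str) -> List[str]:
    """Sort at most run_size pairs in memory at a time; write each sorted run to a file."""
    paths: List[str] = []
    buf: List[Tuple[str, str]] = []

    def flush() -> None:
        if not buf:
            return
        buf.sort()
        path = os.path.join(tmpdir, f"run{len(paths)}.tsv")
        with open(path, "w") as f:
            f.writelines(f"{kmer}\t{nxt}\n" for kmer, nxt in buf)
        paths.append(path)
        buf.clear()

    for read in reads:
        for pair in kmer_pairs(read, k):
            buf.append(pair)
            if len(buf) >= run_size:
                flush()
    flush()
    return paths


def read_run(path: str) -> Iterator[Tuple[str, str]]:
    """Stream one sorted run back from storage."""
    with open(path) as f:
        for line in f:
            kmer, nxt = line.rstrip("\n").split("\t")
            yield kmer, nxt


def build_graph_external(reads: Iterable[str], k: int = 4, run_size: int = 8) -> Dict[str, List[str]]:
    store: Dict[str, List[str]] = {}  # stand-in for the NoSQL key-value store
    with tempfile.TemporaryDirectory() as tmpdir:
        runs = spill_sorted_runs(reads, k, run_size, tmpdir)
        merged = heapq.merge(*(read_run(p) for p in runs))  # k-way merge of sorted runs
        for kmer, group in groupby(merged, key=lambda pair: pair[0]):
            bases = sorted({nxt for _, nxt in group})
            real = [b for b in bases if b != END]
            store[kmer] = real if real else [END]  # keep N only if nothing else was seen
    return store


reads = ["TAGTCGAGGCT", "GGCTTTAGATC", "TGAGGCTTTAG",
         "TTTAGAGACAG", "GATCCGATGAG", "TAGTCGAGGCT"]
graph = build_graph_external(reads, run_size=8)
print(len(graph), graph["TAGA"])  # 27 ['G', 'T']
```

Changing `run_size` trades memory for the number of runs to merge; the resulting store is the same either way.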


Initial Results of Graph Construction

Compared the 85 GB bumblebee dataset on an 8-node Hadoop cluster against a single node with CAPI-accelerated flash.

Hadoop cluster (20 physical cores per node)
• Peak memory usage of 60 GB per datanode
• 1 HDD per datanode
• 1 hr 56 mins

CAPI-accelerated flash server (20 physical cores)
• Peak memory usage of 7 GB
• 1 HDD and 1 CAPI card
• 3 hrs 44 mins

• Peak memory usage reduced by a factor of 60.
• Execution time reduced by a factor of 3.5 per node.
