Date post: 17-Sep-2020
GenAx: A Genome Sequencing Accelerator Daichi Fujiki* Arun Subramaniyan* Tianjun Zhang* Yu Zeng Reetuparna Das David Blaauw Satish Narayanasamy * Equally contributed to the paper
Transcript
Page 1

GenAx: A Genome Sequencing Accelerator

Daichi Fujiki*, Arun Subramaniyan*, Tianjun Zhang*, Yu Zeng, Reetuparna Das, David Blaauw, Satish Narayanasamy

* Equally contributed to the paper

Page 2

Genomics is set to transform medicine

[Figure: population-based treatment vs. personalized treatment]

Page 3

Genome sequencing costs have plummeted

"Illumina says it can deliver a $100 genome - soon"

[Figure: cost per genome over time. Cost axis: $1 million, $1000, $100. Year axis: 2008, 2018, 2028. Annotation: 1000x]

Page 4

Portable sequencers are becoming commonplace

Page 5

Sequence analysis has several computational steps

Human genome: 3G bases

ATCGTGCAGTGTGCATCTACCAGTACATCGATCGTGCTAC

Sequenced reads (~billions)

Read alignment: reads are placed against the reference genome

Reads: ATCGTGCAGTTTCGTGAAG, GAAGTTTATTCGTA, CGTAAGT

Variant calling: aligned reads are compared with the reference genome

Diagnosis

Secondary analysis

(350 genomes/week) per machine

(5.6 genomes/week) per server

Page 6

Read alignment is a major bottleneck in sequence analysis

Read alignment: ~300 CPU hours

BWA-MEM [1]: seeding, followed by seed extension

[Figure: seeds taken from a read are located in the reference genome and then extended]

* Quality standard

[1] Li, Heng. "Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM." Platinum Genome dataset. Illumina HiSeq 2000 reads, Run: ERR194147 (~50x coverage)

Page 7

Seeding: a Filtration Step

[Figure: the seed AATA, taken from the read, occurs at several places in the reference genome. Reference positions marked: 0, 52, 103, 512]
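The filtration idea on this slide can be sketched as a lookup in a k-mer index. This is a minimal illustration, not the GenAx seeding hardware: the function name `build_kmer_index` and the toy reference string are assumptions made for the example.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the sorted list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


# Hypothetical toy reference; the seed AATA occurs twice in it.
reference = "GGAATACCAATAGG"
index = build_kmer_index(reference, k=4)
seed_hits = index.get("AATA", [])
print(seed_hits)
```

Only the regions around the returned positions need to be passed on to seed extension, which is what makes seeding a filter.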

Pages 8-9

Seed Extension

[Figure: the seed AATA from the read and its occurrences in the reference genome. Reference positions marked: 0, 52, 103, 512]

Read: AAATACCTAAAATTTAT

Candidate reference strings and scores:

GAATA-CTA-AATTTAT    15

G--AATA-C---TTTAT    11

AAATACCTAAAATTTAT    17

Page 10

Seed extension as approximate string matching

Levenshtein (edit) distance: the minimum number of edits (insertions, substitutions, deletions) required to make the read (or query string Q) match the reference string R exactly.

Reference (R):      C A T C G A - C G T A G A T

Read or Query (Q):  C A - C G A A C C T A T A T

Edits: del (T), ins (A), sub (G to C), sub (G to T)

Edit distance = 4
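The definition above can be checked with the textbook dynamic-programming recurrence for edit distance. This is a generic sketch, not the GenAx algorithm; the function name `levenshtein` is an assumption, and the two strings are the slide's R and Q with the gap characters removed.

```python
def levenshtein(ref, query):
    """Classic O(len(ref) * len(query)) edit distance, keeping one row at a time."""
    prev = list(range(len(query) + 1))  # distance from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        cur = [i]  # deleting i reference characters
        for j, q in enumerate(query, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != q),  # match or substitution
            ))
        prev = cur
    return prev[-1]


print(levenshtein("CATCGACGTAGAT", "CACGAACCTATAT"))
```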

Pages 11-13

Genome Sequencing: Alignment Methodology

Smith-Waterman matrix (figure © Wikimedia Commons)

Filling the full n x n matrix costs O(n^2).
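The O(n^2) cost comes from filling every cell of the matrix. A minimal sketch of the Smith-Waterman score computation follows; the linear gap penalty and the scoring values are assumptions chosen for brevity, not the scoring scheme used by GenAx.

```python
def smith_waterman_score(ref, query, match=1, mismatch=-1, gap=-1):
    """Best local alignment score; every cell of the matrix is visited once."""
    cols = len(query) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(ref) + 1):
        cur = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if ref[i - 1] == query[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```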

Page 14

Genome Sequencing: Alignment Methodology

Banded Smith-Waterman matrix: O(kn)

[Figure: n x n matrix in which only a band around the diagonal is computed]
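Banding restricts the computation to cells within k of the diagonal, which gives the O(kn) cost. The sketch below applies the idea to plain edit distance rather than to Smith-Waterman scoring, which is a simplification made for this example; the function name is hypothetical.

```python
def banded_edit_distance(ref, query, k):
    """Edit distance if it is at most k, otherwise None. Only cells with |i - j| <= k are filled."""
    n, m = len(ref), len(query)
    if abs(n - m) > k:
        return None
    inf = k + 1  # any value above k means "outside the band or too expensive"
    prev = {j: j for j in range(min(m, k) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if j == 0:
                cur[j] = i
                continue
            sub = prev.get(j - 1, inf) + (ref[i - 1] != query[j - 1])
            dele = prev.get(j, inf) + 1
            ins = cur.get(j - 1, inf) + 1
            cur[j] = min(sub, dele, ins, inf)
        prev = cur
    dist = prev.get(m, inf)
    return dist if dist <= k else None


print(banded_edit_distance("CATCGACGTAGAT", "CACGAACCTATAT", 4))
```

Each row touches at most 2k + 1 cells, so the total work is proportional to k times n instead of n squared.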

Pages 15-17

Genome Sequencing: Alignment Methodology

Banded Smith-Waterman matrix: O(kn)

Levenshtein automata: O(kn), string dependent

[Figure: Levenshtein automaton built over the characters A, G, C with ins, sub, and del transitions. Axes: string length n and acceptable edit distance k]
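A classical Levenshtein automaton can be simulated in software as a nondeterministic automaton whose states are pairs (position in the pattern, edits used). The sketch below is a generic illustration under that assumption; the function name is hypothetical. It shows why the automaton is string dependent: the match transitions are labeled with the pattern's own characters, so a new pattern needs a new automaton with O(kn) states.

```python
def levenshtein_automaton_accepts(pattern, text, k):
    """True if text is within edit distance k of pattern (NFA simulation)."""
    n = len(pattern)

    def closure(states):
        # Epsilon moves: skip a pattern character without consuming input.
        stack, seen = list(states), set(states)
        while stack:
            p, e = stack.pop()
            if p < n and e < k and (p + 1, e + 1) not in seen:
                seen.add((p + 1, e + 1))
                stack.append((p + 1, e + 1))
        return seen

    states = closure({(0, 0)})
    for ch in text:
        nxt = set()
        for p, e in states:
            if p < n and pattern[p] == ch:
                nxt.add((p + 1, e))          # match: label comes from the pattern
            if e < k:
                nxt.add((p, e + 1))          # extra input character
                if p < n:
                    nxt.add((p + 1, e + 1))  # substitution
        states = closure(nxt)
    return any(p == n for p, _ in states)


print(levenshtein_automaton_accepts("AGC", "ATC", 1))
```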

Page 18

Genome Sequencing: Alignment Methodology

Banded Smith-Waterman matrix: O(kn)

New automaton algorithm: string independent, O(k^2)

Silla: String Independent Local Levenshtein Automata

[Figure: Silla state graph. States (i,d): (0,0); (1,0) and (0,1) for K=1; (2,0), (1,1), and (0,2) for K=2. Transitions: match, ins, del]

D_{i,d} = R[c-i] XNOR Q[c-d]

Page 19

Algorithm Contribution

Silla: String Independent Local Levenshtein Automata

[Figure: Silla state graph. States (i,d): (0,0); (1,0) and (0,1) for K=1; (2,0), (1,1), and (0,2) for K=2. Transitions: match, ins, del]

D_{i,d} = R[c-i] XNOR Q[c-d]

Hardware Implementation

SillaX: Silla Accelerator for Genome Sequencing

SMEM + hash-based seeding accelerator: index table, position table, 512-entry CAMs, segmenting

Hardware Optimization

SMEM + hash-based seeding algorithm

A G T A A T G C C A T G

A G T A A T G C C A T T

Binary search

Page 21

Silla: String Independent Local Levenshtein Automaton

String Independent: Indel Silla

Reference: A G T A A T G C C A T T

Query:     A G T A A T A C C A T T

Cycle: c

[Figure: Silla state graph with axes i and d and edit budget K. States (i,d): (0,0); (1,0) and (0,1) for K=1; (2,0), (1,1), and (0,2) for K=2. Transitions: match, ins, del]

D_{i,d} = (R[c-i] == Q[c-d])

Pages 22-24

Silla: String Independent Local Levenshtein Automaton

String Independent: Indel Silla

Cycle: c

Exact match: state (0,0) stays active through its match transition while comparator D_{0,0} is true.

Reference: T A A T G C C A T T

Query:     T A A T G C C A T T

Insertion: at the mismatch (×), the ins transition activates state (1,0).

Reference: T A T G C C A T T A

Query:     T A A T G C C A T T

Deletion: at the mismatch (×), the del transition activates state (0,1).

Reference: T A A G C C A T T A

Query:     T A G C C A T T A G

Page 25

Silla: String Independent Local Levenshtein Automaton

String Independent: Indel Silla

Reference: A G T A A T G C C A T T

Query:     A G T A A T A C C A T T

Cycle: c

D_{i,d} = (R[c-i] == Q[c-d])

[Figure, first panel: state graph with comparators D_{1,0}, D_{2,0} along the i axis (i = 1, 2, 3, 4). Second panel: the same graph with comparators D_{0,1}, D_{0,2} along the d axis (d = 1, 2, 3, 4). Arrows mark where the edit distance is read out. States shown: (0,0); (1,0) and (0,1) for K=1; (2,0), (1,1), and (0,2) for K=2]
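The comparator rule D_{i,d} = (R[c-i] == Q[c-d]) can be tried out in software. The sketch below is an illustrative simulation under stated assumptions, not the hardware design: the active states are held in a set, ins and del transitions are explored nondeterministically, and the function name `indel_silla` is hypothetical. State (i,d) at cycle c has consumed c-i reference characters and c-d query characters, and i+d edits.

```python
def indel_silla(ref, query, k):
    """Smallest number of insertions plus deletions (at most k) aligning ref and query, else None."""
    active = {(0, 0)}
    cycle = 0
    best = None
    while active:
        nxt = set()
        for i, d in active:
            r, q = cycle - i, cycle - d  # characters this state's comparator looks at
            if r == len(ref) and q == len(query):
                best = i + d if best is None else min(best, i + d)
                continue
            r_ok, q_ok = r < len(ref), q < len(query)
            if r_ok and q_ok and ref[r] == query[q]:
                nxt.add((i, d))          # match: D_{i,d} is true, stay in place
            if i + d < k:
                if q_ok:
                    nxt.add((i + 1, d))  # ins: skip one query character
                if r_ok:
                    nxt.add((i, d + 1))  # del: skip one reference character
        active = nxt
        cycle += 1
    return best


print(indel_silla("AGTAATGCCATT", "AGTAATACCATT", 2))
```

Note that the states depend only on k, never on the characters of either string, which is the string-independence property named on the slide. In this indel-only model the single substitution in the slide's example costs one insertion plus one deletion.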

Page 26

Silla: String Independent Local Levenshtein Automaton

String Independent: 3D Silla

Reference: A G T A A G C C A T T A

Query:     A G T A T G C C A T T A

Cycle: c

D_{0,0|0} = false

D_{0,0|0} → D_{0,0|1}

3D Silla: O(k^3)

Key observation: 3D Silla ≅ 2-layer Silla

Pages 27-31

Silla: String Independent Local Levenshtein Automaton

String Independent: Collapsing 3D Silla

Reference: A G T A A G C C A T T A

Query:     A G T A T G C C A T T A

Cycle: c

Can we merge these nodes?

Key observation (2): D_{1,1|0} will see the same characters as D_{0,0|2} in the future.

D_{1,1|0} (t = c+1) = D_{0,0|2} (t = c)

→ Insert wait node (i,d)_w

[Figure labels: D_{0,0|0}, D_{1,1|0}, +1 cycle, = D_{0,0|1} = D_{0,0|2}]

[Figure: two-layer state graph. Layer 0 nodes: (i,d)_0, (i+1,d)_0, (i,d+1)_0, (i+1,d+1)_0. Layer 1 nodes: (i,d)_1, (i+1,d)_1, (i,d+1)_1, (i+1,d+1)_1. Wait node: (i,d)_w. Transitions: match, ins, del, sub. Comparator: D_{i,d}]

Key observation: 3D Silla ≅ 2-layer Silla, O(k^2)

Page 32

Silla: String Independent Local Levenshtein Automaton

Local

Pages 33-37

Silla: String Independent Local Levenshtein Automaton

Local: Pipelined Datapaths

Problem statement: some nodes require long wires from comparators or other nodes.

Reference: A G T A A G C C A T T A

Query:     A G T A T G C C A T T A

Cycles: c, c+1

D_{0,0}(t=c) = D_{1,1}(t=c+1) = D_{2,2}(t=c+2) = ...

[Figure: two-layer state graph (nodes (i,d)_0 through (i+1,d+1)_1, wait node (i,d)_w, transitions match, ins, del, sub) fed by comparators D_{0,0} and D_{1,1}. Across the build, the result labeled D_{i,d} (t=c) is passed along as D_{i+1,d+1} at t=c+1 and t=c+2, and connections marked × are crossed out]
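The identity D_{0,0}(t=c) = D_{1,1}(t=c+1) follows from substituting into D_{i,d} = (R[c-i] == Q[c-d]): moving to (i+1, d+1) one cycle later leaves both indices unchanged. A short check is sketched below; the helper name `comparator` is an assumption made for this example, and out-of-range indices are reported as None.

```python
def comparator(ref, query, cycle, i, d):
    """Value of D_{i,d} at the given cycle, or None if an index falls outside a string."""
    r, q = cycle - i, cycle - d
    if not (0 <= r < len(ref) and 0 <= q < len(query)):
        return None
    return ref[r] == query[q]


ref, query = "AGTAAGCCATTA", "AGTATGCCATTA"
# Every comparator value reappears one diagonal step further, one cycle later.
reusable = all(
    comparator(ref, query, c, i, d) == comparator(ref, query, c + 1, i + 1, d + 1)
    for c in range(len(ref))
    for i in range(3)
    for d in range(3)
)
print(reusable)
```

Because of this, a comparison result can be forwarded through a register to the next diagonal node instead of being recomputed or routed over a long wire.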

Page 38

Hardware Implementation

GenAx: A Genome Sequencing Accelerator

SillaX: seed extension accelerator

Silla state graph with Query and Reference inputs; In-place Traceback; Affine gap Scoring; Composability

Seeding machine

SMEM + hash-based seeding accelerator: index table, position table, 512-entry CAMs, segmenting

Page 39

Hardware Implementation

In-place Traceback

Traceback Recap

Pages 40-42

Hardware Implementation

In-place Traceback

Traceback Machine (figure labels: k=1, k=2, k=3)

No external traceback memory

- Traceback information is stored in nodes
- Greedy approach

String matching phase

- Best score is propagated
- Traceback pointer is kept in nodes
- Traceback pointer is updated with better score

Traceback phase

- Pointer trailing from the node with best score

Pages 43-46

Hardware Implementation

In-place Traceback

Broken Pointer Trail

- Previous best score is updated and breaks the path (marked × in the figure)
- Re-run the machine till the cycle
- 7.6% of reads require a re-run

Page 48

Hardware Implementation

Affine gap Scoring

Score

Match          +1
Mismatch       -4
Gap opening    -7
Gap extension  -1

Trace 1: M M M I I M I I I M M    gap penalties: -7 -1, -7 -1 -1    Edit distance = 5    Score = -11    BAD

Trace 2: M M M M I I I I I M M    gap penalties: -7 -1 -1 -1 -1     Edit distance = 5    Score = -5     GOOD
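The two trace scores can be reproduced with a few lines of code. The sketch assumes, as the penalties listed on the slide suggest, that the first character of a gap costs the gap-opening penalty and each further character of the same gap costs the gap-extension penalty; the function name `score_trace` is hypothetical.

```python
MATCH, GAP_OPEN, GAP_EXTEND = 1, -7, -1


def score_trace(trace):
    """Affine-gap score of an alignment trace made of M (match), I, and D operations."""
    score = 0
    prev = None
    for op in trace:
        if op == "M":
            score += MATCH
        elif op in ("I", "D"):
            # Continuing the same kind of gap is cheap; starting one is expensive.
            score += GAP_EXTEND if op == prev else GAP_OPEN
        else:
            raise ValueError(f"unsupported operation: {op!r}")
        prev = op
    return score


print(score_trace("MMMIIMIIIMM"), score_trace("MMMMIIIIIMM"))
```

Both traces contain five inserted characters, so their edit distance is the same, yet the trace with one long gap scores higher than the trace with two shorter gaps. This is why edit distance alone is not enough to rank alignments.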

Page 49

Hardware Implementation

Affine gap Scoring

Score: Match +1, Mismatch -4, Gap opening -7, Gap extension -1

Trace 1: M M M I I M I I I M M    Edit distance = 5    Score = -11    BAD

Trace 2: M M M M I I I I I M M    Edit distance = 5    Score = -5     GOOD

Node (i,d), with inputs from nodes (i-1,d-1), (i-1,d), and (i,d-1), keeps a current score, an ins score, and a del score:

ins score: max(cur - go, ins - ge)

del score: max(cur - go, del - ge)

current score: MAX over the candidates, including cur + (match or mismatch)

go: gap opening, ge: gap extension

Page 50

Hardware Implementation

Composability

Composable to large edit distances

[Figure: tiles, each with inputs R and Q and edit distance k, are composed into a larger unit with inputs R and Q and edit distance 2k]

Pages 52-54

Identifying super-maximal exact matches for a read

Read: AGTAATGCCATG (k-mers taken at read offsets 0, 4, 8)

Reference positions marked in the figure: 0, 10, 30, 34, 100, 104, 108, 310, 390, 394

Index table and position table lookups:

k-mer 1 (AGTA, read offset 0): hits at 10, 100, 390
k-mer 2 (ATGC, read offset 4): hits at 30, 104, 394
k-mer 3 (CATG, read offset 8): hits at 34, 108

Hits normalized to the start of the read and intersected in the CAM (entries that do not survive are marked x):

H1 = { 10, 100, 390 }
H2 = { 26, 100, 390 }
H3 = { 26, 100 }

H1 ∩ H2 ∩ H3 = { 100 }

SMEM = { 100 }
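The intersection shown on these slides can be reproduced in a few lines. This is a simplified sketch that uses Python sets in place of the CAM and only handles a read whose k-mers all match; the function name `candidate_starts` is an assumption, and the index contents are the values from the slides.

```python
def candidate_starts(read, index, k):
    """Reference positions where every non-overlapping k-mer of the read matches in order."""
    surviving = None
    for offset in range(0, len(read) - k + 1, k):
        kmer = read[offset:offset + k]
        # Normalize each hit to the position where the read itself would start.
        hits = {pos - offset for pos in index.get(kmer, [])}
        surviving = hits if surviving is None else surviving & hits
    return surviving if surviving is not None else set()


index = {
    "AGTA": [10, 100, 390],
    "ATGC": [30, 104, 394],
    "CATG": [34, 108],
}
print(candidate_starts("AGTAATGCCATG", index, k=4))
```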

Page 55

Seeding implementation: Key ideas

1. Binary search-based intersection for frequent k-mers

Intersecting m hits of k-mer-1 with n hits of k-mer-2 (n > 500)

The position list for each k-mer is sorted in the position table

O(m log n) steps

[Figure: example position lists. Values shown include 10, 104, 394, 504, 750, 950]
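Because each position list is sorted, the short list can be probed against the long one by binary search. The sketch below uses Python's standard `bisect` module; the function name and the example lists are hypothetical and chosen only to illustrate the O(m log n) behavior.

```python
from bisect import bisect_left


def intersect_sorted(small, large):
    """Intersect a short hit list with a long sorted hit list using binary search."""
    common = []
    for pos in small:                # m iterations
        j = bisect_left(large, pos)  # O(log n) probe
        if j < len(large) and large[j] == pos:
            common.append(pos)
    return common


# Hypothetical hit lists, already normalized to the same read offset.
few_hits = [104, 394]
many_hits = [10, 104, 394, 504, 750, 950]
print(intersect_sorted(few_hits, many_hits))
```

A linear merge would cost O(m + n) steps, so binary search pays off when n is much larger than m, which is the frequent k-mer case named on the slide.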

Page 56

Seeding implementation: Key ideas

2. Probing: intersect from the k-mer with the minimum number of hits

Read: AAAAAAAAAAAAAAGTAATGCCATGATGCCGTATGAATGCAAGT

# Hits: 1002511000000

[Figure: the same read shown twice with the order of the intersections (1, 2) marked. In the second version the intersections start from the k-mer with the fewest hits]
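The probing heuristic amounts to ordering the hit sets by size before intersecting. A minimal sketch follows, assuming the hits have already been normalized to a common read offset; the function name and the hit counts are hypothetical.

```python
def intersect_min_first(hit_sets):
    """Intersect hit sets starting from the smallest, so the working set stays small."""
    if not hit_sets:
        return set()
    ordered = sorted(hit_sets, key=len)
    result = set(ordered[0])
    for hits in ordered[1:]:
        if not result:
            break  # nothing can survive further intersections
        result &= hits
    return result


# Hypothetical example: a poly-A k-mer with many hits and two rarer k-mers.
frequent = set(range(0, 100000, 4))
rare_a = {100, 390}
rare_b = {26, 100}
print(intersect_min_first([frequent, rare_a, rare_b]))
```

Starting from the rarest k-mer means the very long list is only ever checked against a handful of candidates.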

Page 57

Methodology

- Input

Reference genome: GRCh38 (human genome) from the UCSC genome browser
Input reads: Illumina Platinum Genomes (50x, 787M reads, 101 bp)

- Baseline

CPU: BWA-MEM on Intel Xeon E5-2697 (2.6 GHz, 56 threads) + 128 GB DDR4
GPU: CUSHAW2-GPU on NVIDIA Titan Xp (1.58 GHz, 3840 cores) + 12 GB GDDR5X

- SillaX configuration

Synthesis: Synopsys Design Compiler, 28 nm process, 2 GHz → 5.64 mm^2, 6.6 W
Bandwidth (K): 40

- Seeding machine configuration

K-mer size: 12
Segmenting: 512 segments, 48 MB index table and 18 MB position table

Page 58

Performance (Throughput / Power)

[Figure: GenAx throughput (Kreads/sec). Bars: BWA-MEM, CUSHAW2-GPU, GenAx. Values shown: 128K, 56K, 4,058K. Annotation: 31.7x]

[Figure: average power (W), axis from 0 to 140. Bars: BWA-MEM, CUSHAW2-GPU, GenAx. Annotation: 12x]

[Figure: SillaX throughput (Khits/s, log axis from 1 to 100000), Illumina 100. Series: SillaX, CPU, GPU. Baselines: CPU: SeqAn library, GPU: SW#. Annotation: 63x over CPU, 5000x over GPU]
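The 31.7x figure can be cross-checked from the throughput values on the slide. The sketch assumes that 128K belongs to BWA-MEM, 56K to CUSHAW2-GPU, and 4,058K to GenAx, following the order of the bar labels; that pairing is an assumption, although it reproduces the quoted speedup.

```python
# Throughput in thousands of reads per second (assumed pairing of values to bars).
throughput = {"BWA-MEM": 128, "CUSHAW2-GPU": 56, "GenAx": 4058}


def speedup_over(baseline):
    """Ratio of GenAx throughput to the named baseline's throughput."""
    return throughput["GenAx"] / throughput[baseline]


print(round(speedup_over("BWA-MEM"), 1))
print(round(speedup_over("CUSHAW2-GPU"), 1))
```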

Page 59

Conclusion

Contributions

Silla: a novel automaton algorithm for approximate string matching
o O(k^2) complexity
o Naturally maps to systolic array / automaton accelerator

SillaX: a seed extension accelerator
o Affine gap + Traceback + Composable

GenAx: a read alignment accelerator
o Drop-in replacement of BWA-MEM

Results

31.7x speedup over BWA-MEM on 56-thread Xeon processor
12x power reduction
5.6x area reduction

Page 60

University of Michigan Precision Health Initiative

"Discover the genetic, lifestyle and environmental factors that influence a population's health and provides personalized solutions that allow individuals to improve their health and wellness."

Page 61

GenAx: A Genome Sequencing Accelerator

Daichi Fujiki, Arun Subramaniyan, Tianjun Zhang, Yu Zeng, Reetuparna Das, David Blaauw, Satish Narayanasamy

Thank you. Any questions?

