Page 1: Efficient Longest Common Subsequence Computation using ... · A linear space algorithm for computing maximal common subsequences. Communications of the ACM, 18(6):341–343, 1975.

Efficient Longest Common Subsequence Computation using Bulk-Synchronous Parallelism

Peter Krusche

Department of Computer Science, University of Warwick

June 2006

Page 2

Outline

1 Introduction
  Motivation
  The BSP Model

2 Problem Definition and Algorithms
  The Standard Algorithm
  The Bit-Parallel Algorithm
  The Parallel Algorithm

3 Experiments
  Experiment Setup
  Predictions
  Speedup

Page 3

Motivation

Computing the (length of the) Longest Common Subsequence is representative of a class of dynamic programming algorithms for string comparison. Hence, we want to:

- Start with a fast sequential algorithm.
- Examine the suitability of BSP as a programming model for such problems.
- Compare different BSP libraries on different systems.
- Examine performance predictability.

Page 4

The BSP Computer

- p identical processor/memory pairs (computing nodes), computation speed f
- Arbitrary interconnection network, latency ℓ, bandwidth g

[Diagram: processors P1, P2, ..., Pp, each with local memory M, connected by a network]

Page 5

BSP Programs

- Programs are SPMD
- Execution takes place in supersteps
- Communication may be delayed until the end of the superstep
- Separates communication and computation

Cost/running time formula:

T = f · W + g · H + ℓ · S
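As a toy illustration of the cost formula, the sketch below evaluates T for hypothetical parameter values (all numbers are made up; real f, g and ℓ are measured per machine):

```python
# Evaluate the BSP cost T = f*W + g*H + l*S.
# f, g, l, W, H, S below are illustrative values only.
def bsp_cost(f, W, g, H, l, S):
    """W: local work units, H: words communicated, S: supersteps."""
    return f * W + g * H + l * S

# e.g. 1e6 local operations, 1e4 words communicated, 10 supersteps:
print(bsp_cost(f=0.001, W=1e6, g=0.1, H=1e4, l=50.0, S=10))  # 2500.0
```

Note how each term isolates one machine parameter: doubling the number of supersteps only grows the latency term ℓ·S.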

Page 7

BSP Programming

‘BSP-style’ programming using a conventional communications library (MPI / Cray shmem / ...):

- Barrier synchronizations for creating the superstep structure
- Message passing or remote memory access for communication

Using a specialized library (the Oxford BSP Toolset / PUB / CGMlib / ...):

- Optimized barrier synchronization functions and message routing
- Higher level of abstraction / nicer-looking code
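To make the superstep semantics concrete, here is a toy, purely sequential simulation (the class and method names are illustrative, not the API of any real BSP library): messages sent during a superstep become visible to their destination only after the barrier.

```python
# Toy sequential model of BSP superstep semantics: a message sent
# during a superstep is delivered only at the barrier that ends it.
class ToyBSP:
    def __init__(self, p):
        self.p = p                              # number of processes
        self.inbox = [[] for _ in range(p)]     # messages visible now
        self._pending = [[] for _ in range(p)]  # sent this superstep

    def send(self, dest, msg):
        self._pending[dest].append(msg)

    def sync(self):
        # Barrier: deliver everything sent during the superstep.
        self.inbox = self._pending
        self._pending = [[] for _ in range(self.p)]

bsp = ToyBSP(p=2)
bsp.send(1, "hello")
print(bsp.inbox[1])  # [] -- not delivered yet
bsp.sync()
print(bsp.inbox[1])  # ['hello']
```

This delayed delivery is exactly what lets a BSP runtime batch and reorder communication within a superstep.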

Page 9

Previous Work

Daniel S. Hirschberg. A linear space algorithm for computing maximal common subsequences. Communications of the ACM, 18(6):341–343, 1975.

Maxime Crochemore, Costas S. Iliopoulos, Yoan J. Pinzon, and James F. Reid. A fast and practical bit-vector algorithm for the Longest Common Subsequence problem. Information Processing Letters, 80(6):279–285, 2001.

C. E. R. Alves, E. N. Cáceres, and F. Dehne. Parallel dynamic programming for solving the string editing problem on a CGM/BSP. In Proceedings of the 14th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 275–281, 2002.

Page 12

Problem Definition

Definition (Input data). Let X = x1x2 . . . xm and Y = y1y2 . . . yn be two strings over an alphabet Σ of constant size.

Definition (Subsequence). A subsequence U of X is any string that can be obtained by deleting zero or more elements from X.

Definition (Longest common subsequence). An LCS(X, Y) is any string which is a subsequence of both X and Y and has maximum possible length. The length of these sequences is denoted LLCS(X, Y).

Page 15

The Dynamic Programming Matrix

Definition (Matrix L0...m,0...n)

Li,j =
  0                        if i = 0 or j = 0,
  Li−1,j−1 + 1             if xi = yj,
  max(Li−1,j, Li,j−1)      if xi ≠ yj.

Theorem (Hirschberg, ’75). Li,j = LLCS(x1x2 . . . xi, y1y2 . . . yj). The values in this matrix can be computed in O(mn) time and space.
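The recurrence translates directly into code. The sketch below computes the LLCS while keeping only one previous row of L, in the spirit of Hirschberg's linear-space algorithm (function and variable names are illustrative):

```python
def llcs(x, y):
    """Length of the longest common subsequence of x and y,
    using the DP recurrence with O(n) space (one previous row)."""
    prev = [0] * (len(y) + 1)          # row i-1 of L; row 0 is all zeros
    for xi in x:
        curr = [0]                     # L[i][0] = 0
        for j, yj in enumerate(y, 1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)       # diagonal + 1
            else:
                curr.append(max(prev[j], curr[-1]))  # max of up, left
        prev = curr
    return prev[-1]

print(llcs("ABCBDAB", "BDCABA"))  # 4
```

Time stays O(mn), but space drops from O(mn) to O(n), which matters once the strings reach the lengths used in the experiments below.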

Page 16

Bit-Parallel Algorithm

Bit-parallel computation has the same asymptotic complexity, but processes ω entries of L in parallel (ω: machine word size).

Example (this leads to substantial speedup):

[Figure: character operations per second (1 M to 10 G, logarithmic) vs. sequence length (100 to 10 k chars), comparing the standard LLCS and the bit-parallel LLCS]

Page 17

How does it work?

ΔL(i, j) = L(i, j) − L(i − 1, j) ∈ {0, 1}

The column ΔL(·, j) is computed from ΔL(·, j − 1) using machine-word parallel operations:

ΔL(·, j) ← (ΔL(·, j − 1) + (ΔL(·, j − 1) and M(yj))) or (ΔL(·, j − 1) and (not M(yj)))

M maps characters to bit strings of length m:

M(σ ∈ Σ)i = 1 ⇔ xi = σ
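The same idea can be sketched with Python's arbitrary-precision integers standing in for machine words (a real implementation tiles the vector into ω-bit words). This uses the standard bit-vector LLCS update in the style of Crochemore et al. 2001; names are illustrative:

```python
def llcs_bitparallel(x, y):
    """Bit-vector LLCS: one bit of the vector V per character of x.
    Bit i of the match mask M[c] is set iff x[i] == c."""
    m = len(x)
    mask = (1 << m) - 1
    M = {}
    for i, c in enumerate(x):
        M[c] = M.get(c, 0) | (1 << i)
    V = mask                         # all ones: no matches consumed yet
    for c in y:
        Mc = M.get(c, 0)
        U = V & Mc                   # matches usable in this column
        V = ((V + U) | (V & ~Mc)) & mask   # carry propagation = DP update
    return m - bin(V).count("1")     # zero bits of V = LLCS

print(llcs_bitparallel("ABCBDAB", "BDCABA"))  # 4
```

Each column of the DP matrix costs O(m/ω) word operations instead of O(m) cell updates, which is where the roughly ω-fold speedup in the previous figure comes from.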

Page 20

The Parallel Algorithm

- Matrix L is partitioned into a grid of rectangular blocks of size (m/G) × (n/G) (G: grid size)
- Blocks in a wavefront can be processed in parallel
- Assumptions:
  - Strings of equal length m = n
  - The ratio α = G/p is an integer
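The dependence structure behind the wavefront: block (i, j) needs only blocks (i−1, j) and (i, j−1), so all blocks on one anti-diagonal i + j = d are independent. A small sketch of the resulting schedule (illustrative, not the thesis code):

```python
def wavefronts(G):
    """Anti-diagonals of a G x G block grid, in processing order.
    All blocks within one returned list can run in parallel."""
    return [[(i, d - i) for i in range(G) if 0 <= d - i < G]
            for d in range(2 * G - 1)]

for front in wavefronts(3):
    print(front)
# [(0, 0)], then [(0, 1), (1, 0)], then the full diagonal of 3, ...
```

There are 2G − 1 wavefronts; the middle ones contain up to G blocks, so parallelism ramps up and then down, which is what the cost model on the next slide accounts for.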

Page 23

Parallel Cost Model

When G > p, there can be multiple stages for one block-wavefront.

Running time:

T(α) = f · (pα(α + 1) − α) · (n/(αp))² + g · α(αp − 1) · (n/(αp)) + ℓ · (2αp − 1) · α
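The cost formula can be evaluated numerically to choose α. The sketch below plugs in hypothetical machine parameters (the f, g, ℓ values are illustrative, not measurements from this talk):

```python
def t_alpha(alpha, p, n, f, g, l):
    """Predicted running time T(alpha) from the cost model above."""
    b = n / (alpha * p)                                   # block side
    comp = f * (p * alpha * (alpha + 1) - alpha) * b * b  # computation
    comm = g * alpha * (alpha * p - 1) * b                # communication
    sync = l * (2 * alpha * p - 1) * alpha                # latency
    return comp + comm + sync

# Hypothetical parameters: 32 processors, strings of 50000 characters.
best = min(range(1, 6),
           key=lambda a: t_alpha(a, p=32, n=50000, f=1e-8, g=1e-6, l=1e-4))
```

Larger α gives smaller blocks (better load balance across wavefronts) but more supersteps, so the latency term eventually dominates; the optimum depends on g and ℓ.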

Page 24

Experiments: Systems Used

- aracari: IBM cluster, 2-way SMP Pentium 3 1.4 GHz nodes (Myrinet 2000)
- argus: Linux cluster, 2-way SMP Pentium 4 Xeon 2.6 GHz nodes (100 Mbit Ethernet)
- skua: SGI Altix shared-memory machine, Itanium 2 1.6 GHz processors

Page 25

Experimental values of f and f′

Simple algorithm (f):
  skua      0.008 μs/op      130 M op/s
  argus     0.016 μs/op       61 M op/s
  aracari   0.012 μs/op       86 M op/s

Bit-parallel algorithm (f′):
  skua      0.00022 μs/op    4.5 G op/s
  argus     0.00034 μs/op    2.9 G op/s
  aracari   0.00055 μs/op    1.8 G op/s

Page 26

Prediction Results

Good results (LLCS) . . . (e.g. aracari MPI, 32 processors):

[Figure: running time [s] (logarithmic) vs. string length (10000 to 60000), measured vs. predicted for α = 1 . . . 5; prediction errors: α=1: 4.44%, α=2: 3.95%, α=3: 4.52%, α=4: 4.49%, α=5: 5.35%]

Page 27

Good Predictions

. . . on all distributed-memory systems, using both the bit-parallel and the standard algorithm
. . . on the shared-memory system, only for larger problem sizes and for the standard algorithm

Page 28

Prediction Results

Not so good ones . . . (Oxtool, skua, 32 processors):

[Figure: running time [s] (logarithmic) vs. string length (1e+05 to 5e+05), measured vs. predicted for α = 1 . . . 5; prediction errors: α=1: 70.96%, α=2: 81.78%, α=3: 84.92%, α=4: 86.57%, α=5: 86.50%]

Page 29

What happened?

Cache size effects prevent prediction of computation time . . .

[Figure: sequential computation performance on skua; char ops/s (10 M to 10 G, logarithmic) vs. sequence length (1 to 1 M chars) for the bit-parallel and the standard LLCS; annotated rates: 4.4 G char/s, 64 M char/s, 130 M char/s]

Page 30

Other Problems when Predicting Performance

Setup costs are only covered by the parameter ℓ ⇒ difficult to measure ⇒ problems when the communication size is small

- PUB has a performance break-in when the communication size reaches a certain value
- A busy communication network can create ‘spikes’

Page 31

Speedup for the Bit-Parallel Version

- Speedup is lower than for the standard version
- However, overall running times for the same problem sizes are shorter
- Parallel speedup can only be expected for larger problem sizes
- Latency is problematic, as computation times are low

Page 32

Result Summary

                         Oxtool   PUB    MPI
Shared memory (skua)
  LLCS (standard)         •••     •      ••
  LLCS (bit-parallel)     ••      •••    •

Distributed memory, Ethernet (argus)
  LLCS (standard)         •••     ••     •
  LLCS (bit-parallel)     •••     ••     •

Distributed memory, Myrinet (aracari)
  LLCS (standard)         •••     ••     •
  LLCS (bit-parallel)     ••      ••◦    •

Page 35

Summary

- BSP algorithms are efficient for dynamic programming.
- Implementations benefit from a low-latency library (Oxtool/PUB).
- Very good predictability.

Page 36

Outlook

Technical improvements

- Different modeling of bandwidth allows better predictions
- Using assembly can double bit-parallel performance
- Lower latency is possible by using subgroup synchronization

Algorithmic improvements

- Extraction of an LCS is possible, using a post-processing step or another algorithm
- Implementation of the all-substrings LLCS (which has many applications)
- Design and study of subquadratic algorithms

