ZHT: A Fast, Reliable and Scalable Zero-hop Distributed Hash Table

Tonglin Li, Xiaobing Zhou, Kevin Brandstatter, Dongfang Zhao, Ke Wang, Zhao Zhang, Ioan Raicu
Illinois Institute of Technology, Chicago, U.S.A.

A supercomputer is a device for turning compute-bound problems into I/O-bound problems.

Ken Batcher

Big problem: file system scalability

▪ Parallel file systems (GPFS, PVFS, Lustre)
  ▪ Computing resources separated from storage
  ▪ Centralized metadata management
▪ Distributed file systems (GFS, HDFS)
  ▪ Special-purpose design (MapReduce, etc.)
  ▪ Centralized metadata management

The bottleneck of file systems

Metadata: concurrent file creates

[Figure: Time per operation (ms, log scale, 1 to 100,000) vs. scale (# of cores, 1 to 16,384) for File Create on GPFS with many directories and with one directory.]

Proposed work

▪ A distributed hash table (DHT) for HEC (high-end computing)
▪ A building block for high-performance distributed systems
▪ Goals:
  ▪ Performance: latency and throughput
  ▪ Scalability
  ▪ Reliability

Related work: Distributed Hash Tables

Many DHTs: Chord, Kademlia, Pastry, Cassandra, C-MPI, Memcached, Dynamo ...

Why another?

Name       Impl.  Routing time  Persistence  Dynamic membership  Append operation
Cassandra  Java   log(N)        Yes          Yes                 No
C-MPI      C      log(N)        No           No                  No
Dynamo     Java   0 to log(N)   Yes          Yes                 No
Memcached  C      0             No           No                  No
ZHT        C++    0 to 2        Yes          Yes                 Yes

Zero-hop hash mapping

[Figure: Clients 1 … n hash key j and key k directly to the owning nodes among Node 1 … Node n; value j and value k are each stored as three replicas (Replica 1, 2, 3) on different nodes.]

2-layer hashing

Architecture and terms

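▪ Name space: 2^64
▪ Terms: physical node, manager, ZHT instance, partition
▪ Number of partitions: n (fixed), n = max(k)

[Figure: A physical node runs a manager and ZHT instances; each instance holds partitions and responds to requests; the manager updates its instances and broadcasts membership changes.]

Membership table columns: UUID (ZHT), Key, IP, Port, Capacity, Workload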
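
One reading of the two-layer hashing and the architecture above: a key is first hashed to one of n fixed partitions, and the membership table then maps each partition to the instance that currently holds it. The sketch below assumes this reading. The field names follow the membership-table columns on the slide, while the `MembershipTable` class, the SHA-1 hash, and the round-robin initial assignment are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One row of the membership table (columns as listed on the slide)."""
    uuid: str
    ip: str
    port: int
    capacity: int = 0
    workload: int = 0


@dataclass
class MembershipTable:
    num_partitions: int                      # n, fixed for the life of the deployment
    instances: list[Instance]
    owner: list[int] = field(default_factory=list)  # partition id -> index into instances

    def __post_init__(self) -> None:
        if not self.owner:
            # Hypothetical initial layout: deal partitions out round-robin.
            self.owner = [p % len(self.instances) for p in range(self.num_partitions)]

    def partition_of(self, key: str) -> int:
        """Layer 1: key -> partition. Depends only on n, not on the instance count."""
        h = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")  # 64-bit
        return h % self.num_partitions

    def instance_of(self, key: str) -> Instance:
        """Layer 2: partition -> instance, looked up in the membership table."""
        return self.instances[self.owner[self.partition_of(key)]]


if __name__ == "__main__":
    table = MembershipTable(
        num_partitions=16,
        instances=[Instance(f"uuid-{i}", f"10.0.0.{i}", 50000) for i in range(4)],
    )
    inst = table.instance_of("/home/user/file.txt")
    print(table.partition_of("/home/user/file.txt"), inst.ip, inst.port)
```

Because the first layer depends only on the fixed n, adding or removing instances changes only the `owner` list, never the key-to-partition mapping.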

How many partitions per node can we support?

[Figure: Average latency (ms, 0.60 to 0.78) vs. number of partitions per instance (1 to 1,000, log scale).]

Membership management

▪ Static: Memcached, ZHT
▪ Dynamic:
  ▪ Logarithmic routing: most DHTs
  ▪ Constant routing: ZHT

Membership management

▪ Updating membership: incremental broadcasting
▪ Remapping k-v pairs
  ▪ Traditional DHTs: rehash all affected pairs
  ▪ ZHT: move whole partitions
    ▪ HEC has a fast local network!
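
The contrast between rehashing and moving whole partitions can be made concrete. In this sketch (an illustration, not ZHT's rebalancing policy), a naive `hash mod N` scheme is compared with a fixed partition table in which a joining instance takes whole partitions from the most-loaded instances. The names `naive_moves` and `add_instance` are hypothetical.

```python
import hashlib


def h64(key: str) -> int:
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


def naive_moves(keys: list[str], old_n: int, new_n: int) -> int:
    """Count keys whose owner changes when `hash mod N` is recomputed for a new N."""
    return sum(1 for k in keys if h64(k) % old_n != h64(k) % new_n)


def add_instance(owner: list[int], new_instance: int) -> list[int]:
    """Hand whole partitions to a joining instance; return the migrated partition ids.

    `owner[p]` is the instance holding partition p. Keys never change partition,
    so each migrated partition can be shipped as a single unit.
    """
    target = len(owner) // (len(set(owner)) + 1)   # fair share for the newcomer
    moved: list[int] = []
    while len(moved) < target:
        counts: dict[int, int] = {}
        for inst in owner:
            counts[inst] = counts.get(inst, 0) + 1
        # Take a partition from the currently most-loaded existing instance.
        busiest = max((i for i in counts if i != new_instance), key=counts.get)
        pid = owner.index(busiest)
        owner[pid] = new_instance
        moved.append(pid)
    return moved
```

For example, with 16 partitions over 4 instances, a fifth instance receives 3 whole partitions and every key keeps its partition, whereas recomputing `hash mod N` from 4 to 5 nodes relocates about 80% of keys one by one.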

Consistency

▪ Updating membership tables
  ▪ Planned node joins and leaves: strong consistency
  ▪ Node failures: eventual consistency
▪ Updating replicas: configurable
  ▪ Strong consistency: consistent, reliable
  ▪ Eventual consistency: fast, highly available
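
The configurable replica update can be modelled in-process. This sketch assumes a primary-secondary layout and represents replicas as dictionaries; the `ReplicaGroup` class and its `strong` flag are hypothetical, and a real deployment would send these updates over the network.

```python
from collections import deque


class ReplicaGroup:
    """A primary (index 0) plus secondary replicas, each modelled as a dict.

    strong=True:  every replica is written before the write is acknowledged.
    strong=False: only the primary is written synchronously; secondaries are
                  updated later from a queue (eventual consistency).
    """

    def __init__(self, num_replicas: int = 3, strong: bool = True) -> None:
        self.replicas = [dict() for _ in range(num_replicas)]
        self.strong = strong
        self.pending = deque()  # (replica index, key, value) not yet applied

    def put(self, key, value) -> str:
        self.replicas[0][key] = value
        for i in range(1, len(self.replicas)):
            if self.strong:
                self.replicas[i][key] = value
            else:
                self.pending.append((i, key, value))
        return "ack"

    def propagate(self) -> None:
        """Background step: drain queued updates to the secondaries."""
        while self.pending:
            i, key, value = self.pending.popleft()
            self.replicas[i][key] = value

    def get(self, key, replica: int = 0):
        return self.replicas[replica].get(key)
```

In eventual mode a read from a secondary can return stale data until `propagate()` runs, which is the consistency versus speed and availability trade-off listed on the slide.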

Persistence: NoVoHT

▪ NoVoHT: a persistent in-memory hash map
▪ Append operation
▪ Live migration

[Figure: Latency (microseconds, 0 to 20) vs. scale (number of key/value pairs: 1 million, 10 million, 100 million) for NoVoHT, NoVoHT (no persistence), KyotoCabinet, BerkeleyDB, and unordered_map.]
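
A common way to make an in-memory hash map persistent is to append every mutation to a log and replay the log on restart. The sketch below assumes that design purely for illustration; it is not NoVoHT's actual file format, and the `PersistentHashMap` class and its JSON-lines log are hypothetical. It also shows an append operation that extends a stored value instead of replacing it.

```python
import json
import os
from typing import Optional


class PersistentHashMap:
    """In-memory dict whose mutations are also appended to a log file (assumed design).

    Reopening the same path replays the log to rebuild the map.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.data: dict = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        self._apply(json.loads(line))
        self._log = open(path, "a", encoding="utf-8")

    def _apply(self, rec: dict) -> None:
        op, key = rec["op"], rec["key"]
        if op == "put":
            self.data[key] = rec["value"]
        elif op == "append":
            # Append extends the stored value instead of overwriting it.
            self.data[key] = self.data.get(key, "") + rec["value"]
        elif op == "remove":
            self.data.pop(key, None)

    def _log_and_apply(self, rec: dict) -> None:
        self._log.write(json.dumps(rec) + "\n")
        self._log.flush()  # hands data to the OS; add os.fsync() for crash durability
        self._apply(rec)

    def put(self, key: str, value: str) -> None:
        self._log_and_apply({"op": "put", "key": key, "value": value})

    def append(self, key: str, value: str) -> None:
        self._log_and_apply({"op": "append", "key": key, "value": value})

    def remove(self, key: str) -> None:
        self._log_and_apply({"op": "remove", "key": key})

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def close(self) -> None:
        self._log.close()
```

Reads never touch the disk, which is why such a map can stay close to a plain in-memory table; the cost of persistence is one sequential log write per mutation.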

Failure handling

▪ Insert and append: send the request to the next replica and mark this record as the primary copy
▪ Lookup: get the value from the next available replica
▪ Remove: mark the record on all replicas
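
The three failure-handling rules can be sketched with replicas held in a list, primary first. This is a minimal in-memory model under stated assumptions: nodes are simulated objects with an `alive` flag, removal is a tombstone flag, and `Node`, `write`, `lookup`, and `remove` are hypothetical names rather than ZHT's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    alive: bool = True
    records: dict = field(default_factory=dict)  # key -> {"value", "primary", "removed"}


def write(replicas: list[Node], key: str, value: str, append: bool = False) -> Node:
    """Insert (or append) on the live replicas; the first live one holds the primary copy.

    If the original primary is down, the request goes to the next replica,
    which marks its record as the primary copy.
    """
    live = [n for n in replicas if n.alive]
    if not live:
        raise RuntimeError("all replicas are down")
    for i, node in enumerate(live):
        old = node.records.get(key)
        new_value = value
        if append and old is not None and not old["removed"]:
            new_value = old["value"] + value
        node.records[key] = {"value": new_value, "primary": i == 0, "removed": False}
    return live[0]


def lookup(replicas: list[Node], key: str):
    """Read from the first available replica that has the key."""
    for node in replicas:
        if node.alive and key in node.records:
            rec = node.records[key]
            return None if rec["removed"] else rec["value"]
    return None


def remove(replicas: list[Node], key: str) -> None:
    """Mark (tombstone) the record on every reachable replica."""
    for node in replicas:
        if node.alive and key in node.records:
            node.records[key]["removed"] = True
```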

Evaluation: test beds

▪ IBM Blue Gene/P supercomputer: up to 8,192 nodes, 32,768 ZHT instances deployed
▪ Commodity cluster: up to 64 nodes
▪ Amazon EC2: m1.medium and cc2.8xlarge instances; 96 VMs, 768 ZHT instances deployed

Latency on BG/P

[Figure: Latency (ms, 0 to 2.5) vs. number of nodes for TCP without connection caching, TCP with connection caching, UDP, and Memcached.]
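
The gap between the two TCP curves comes from reusing connections instead of opening one per request. A minimal sketch of a client-side connection cache follows; the LRU eviction policy, the `ConnectionCache` class, and the default size are assumptions. The `connect` argument can be `socket.create_connection`, which accepts a `(host, port)` tuple, or any factory returning an object with `close()`.

```python
from collections import OrderedDict
from typing import Any, Callable, Tuple

Address = Tuple[str, int]


class ConnectionCache:
    """LRU cache of open connections keyed by (host, port).

    A hit reuses an established connection, so the request skips the TCP
    handshake; a miss opens a new one and may evict the least recently used.
    """

    def __init__(self, connect: Callable[[Address], Any], max_size: int = 64) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.connect = connect        # e.g. socket.create_connection
        self.max_size = max_size
        self.conns: "OrderedDict[Address, Any]" = OrderedDict()
        self.opened = 0               # number of connections actually established

    def get(self, host: str, port: int) -> Any:
        addr = (host, port)
        if addr in self.conns:
            self.conns.move_to_end(addr)   # mark as most recently used
            return self.conns[addr]
        conn = self.connect(addr)
        self.opened += 1
        self.conns[addr] = conn
        if len(self.conns) > self.max_size:
            _, oldest = self.conns.popitem(last=False)
            oldest.close()
        return conn

    def close_all(self) -> None:
        while self.conns:
            _, conn = self.conns.popitem()
            conn.close()
```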

Latency distribution

Latency percentiles by scale (µs):

Scale   75%    90%    95%    99%
64      713    853    961    1259
256     755    933    1097   1848
1024    820    1053   1289   3105

Throughput on BG/P

[Figure: Throughput (ops/s, log scale, 1,000 to 10,000,000) vs. scale (# of nodes, 1 to 8,192) for TCP without connection caching, ZHT with TCP connection caching, UDP non-blocking, and Memcached.]

Aggregated throughput on BG/P

[Figure: Throughput (ops/s, 0 to 18,000,000) vs. number of nodes (1 to 8,192) with 1, 2, 4, and 8 instances per node.]

Latency on commodity cluster

[Figure: Latency (ms, 0 to 3) vs. scale (# of nodes, 1 to 64) for ZHT, Cassandra, and Memcached.]

ZHT on cloud: latency

[Figure: Average latency (microseconds, 0 to 14,000) vs. node number (1 to 96) for ZHT on m1.medium instances (1/node), ZHT on cc2.8xlarge instances (8/node), and DynamoDB.]

ZHT on cloud: latency distribution

DynamoDB (8 clients/instance), latency in µs:

Scale  75%    90%    95%    99%    Avg    Throughput
8      11942  13794  20491  35358  12169  83.39
32     10081  11324  12448  34173  9515   3363.11
128    10735  12128  16091  37009  11104  11527
512    9942   13664  30960  38077  28488  ERROR

ZHT on cc2.8xlarge instances (8 server-client pairs/instance), latency in µs:

Scale  75%    90%    95%    99%    Avg    Throughput
8      186    199    214    260    172    46421
32     509    603    681    1114   426    75080
128    588    717    844    2071   542    236065
512    574    708    865    3568   608    841040

[Figure: Latency distribution for DynamoDB read, DynamoDB write, and ZHT (4 to 64 nodes), with the 0.9 and 0.99 levels marked.]

ZHT on cloud: throughput

[Figure: Aggregated throughput (ops/second, log scale, 10 to 10,000,000) and hourly cost (US dollars, 0 to 25) vs. node number (1 to 96); cost series: ZHT on m1, ZHT on cc2, and DynamoDB (10k ops/s provisioned).]

Amortized cost

[Figure: Hourly cost in US dollars for 1K ops/s of throughput (log scale, 0.01 to 10) vs. node number (2 to 96) for ZHT on m1.medium instances (1/node).]
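
The y-axis of the amortized-cost plot reads as total hourly cost divided by aggregate throughput expressed in units of 1,000 ops/s. The sketch below encodes that reading; the function names and the example prices are hypothetical, not measured values or real EC2 prices.

```python
def hourly_cost(nodes: int, price_per_node_hour: float) -> float:
    """Total hourly cost of a deployment in US dollars."""
    return nodes * price_per_node_hour


def cost_per_kops(hourly_cost_usd: float, throughput_ops_per_s: float) -> float:
    """Hourly cost of 1,000 ops/s of throughput: hourly cost / (throughput / 1000)."""
    if throughput_ops_per_s <= 0:
        raise ValueError("throughput must be positive")
    return hourly_cost_usd / (throughput_ops_per_s / 1000.0)


if __name__ == "__main__":
    # Hypothetical inputs, not measured values or real EC2 prices.
    total = hourly_cost(nodes=10, price_per_node_hour=0.50)
    print(total, cost_per_kops(total, throughput_ops_per_s=100_000))
```

For instance, 10 nodes at a hypothetical $0.50 per node-hour delivering 100,000 ops/s cost $5.00 per hour, or $0.05 per hour for each 1K ops/s. Because aggregate throughput grows with the node count while cost grows linearly, this ratio stays roughly flat when scaling is near linear.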

Applications

▪ FusionFS: a distributed file system; metadata stored in ZHT
▪ IStore: an information dispersal storage system; metadata stored in ZHT
▪ MATRIX: a distributed many-task computing execution framework; ZHT is used to submit tasks and monitor task execution status
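
One way a file system can keep its metadata in a key-value store with an append operation: each file's attributes are stored under its path, and each directory's entry list grows by appending names, so concurrent creates in one directory never need a client-side read-modify-write. This is an assumed design for illustration, not FusionFS's actual schema; `KVStore`, `create_file`, and `list_dir` are hypothetical, with a lock standing in for the server serializing each operation.

```python
import json
import threading


class KVStore:
    """In-process stand-in for a ZHT-like store with insert/lookup/append (hypothetical API)."""

    def __init__(self) -> None:
        self._data: dict = {}
        self._lock = threading.Lock()

    def insert(self, key: str, value: str) -> None:
        with self._lock:
            self._data[key] = value

    def lookup(self, key: str):
        with self._lock:
            return self._data.get(key)

    def append(self, key: str, value: str) -> None:
        # Atomic on the server side: clients never read, modify, and write back.
        with self._lock:
            self._data[key] = self._data.get(key, "") + value


def create_file(kv: KVStore, path: str, size: int = 0) -> None:
    parent, _, name = path.rpartition("/")
    kv.insert(path, json.dumps({"type": "file", "size": size}))  # file metadata
    kv.append(parent or "/", name + "\n")                        # directory entry


def list_dir(kv: KVStore, directory: str) -> list:
    entries = kv.lookup(directory) or ""
    return [e for e in entries.split("\n") if e]
```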

FusionFS result: Concurrent File Creates

[Figure: Time per operation (ms, log scale, 1 to 1,000) vs. number of nodes (1 to 512) for FusionFS and GPFS.]

IStore results

[Figure: Throughput (chunks/sec, 0 to 600) vs. scale (# of nodes: 8, 16, 32); legend: 1GB, 100MB, 10MB, 1MB, 100KB, 10KB.]

MATRIX results

[Figure: Throughput (tasks/sec, 0 to 6,000) vs. number of processors (1 to 10,000, log scale) for Falkon (Linux Cluster, C), Falkon (SiCortex), Falkon (BG/P), Falkon (Linux Cluster, Java), and MATRIX (BG/P).]

Future work

▪ Larger scale
▪ Active failure detection and informing
▪ Spanning-tree communication
▪ Network topology-aware routing
▪ Fully synchronized replicas and membership: Paxos protocol
▪ Support for more protocols (UDT, MPI…)
▪ Many optimizations

Conclusion

ZHT: a distributed key-value store that is
▪ Lightweight
▪ High-performance
▪ Scalable
▪ Dynamic
▪ Fault tolerant
▪ Versatile: works on clusters, clouds, and supercomputers

Questions?

Tonglin Li
http://datasys.cs.iit.edu/projects/ZHT/

