Page 1: In-memory  Caching in HDFS Lower  latency, same great taste

1

In-memory Caching in HDFS
Lower latency, same great taste
Andrew Wang | [email protected]
Colin McCabe | [email protected]

Page 2: In-memory  Caching in HDFS Lower  latency, same great taste

Alice

Hadoop cluster

Query

Result set

Page 3: In-memory  Caching in HDFS Lower  latency, same great taste

Alice

Fresh data

Page 4: In-memory  Caching in HDFS Lower  latency, same great taste

Fresh data

Page 5: In-memory  Caching in HDFS Lower  latency, same great taste

Alice

Rollup

Page 6: In-memory  Caching in HDFS Lower  latency, same great taste

6

Problems

• Data hotspots
  • Everyone wants to query some fresh data
  • Shared disks are unable to handle high load

• Mixed workloads
  • Data analyst making small point queries
  • Rollup job scanning all the data
  • Point query latency suffers because of I/O contention

• Same theme: disk I/O contention!

Page 7: In-memory  Caching in HDFS Lower  latency, same great taste

7

How do we solve I/O issues?

• Cache important datasets in memory!
  • Much higher throughput than disk
  • Fast random/concurrent access

• Interesting working sets often fit in cluster memory
  • Traces from Facebook’s Hive cluster

• Increasingly affordable to buy a lot of memory
  • Moore’s law
  • 1TB server is ~$40k on HP’s website

Page 8: In-memory  Caching in HDFS Lower  latency, same great taste

Alice

Page cache

Page 9: In-memory  Caching in HDFS Lower  latency, same great taste

Alice

Repeated query

?

Page 10: In-memory  Caching in HDFS Lower  latency, same great taste

Alice

Rollup

Page 11: In-memory  Caching in HDFS Lower  latency, same great taste

Alice

Extra copies

Page 12: In-memory  Caching in HDFS Lower  latency, same great taste

Alice

Checksum verification

Extra copies

Page 13: In-memory  Caching in HDFS Lower  latency, same great taste

13

Design considerations

1. Explicitly pin hot datasets in memory
2. Place tasks for memory locality
3. Zero overhead reads of cached data

Page 14: In-memory  Caching in HDFS Lower  latency, same great taste

14

Outline

• Implementation
  • NameNode and DataNode modifications
  • Zero-copy read API

• Evaluation
  • Microbenchmarks
  • MapReduce
  • Impala

• Future work

Page 15: In-memory  Caching in HDFS Lower  latency, same great taste

15

Outline

• Implementation
  • NameNode and DataNode modifications
  • Zero-copy read API

• Evaluation
  • Microbenchmarks
  • MapReduce
  • Impala

• Future work

Page 16: In-memory  Caching in HDFS Lower  latency, same great taste

16

Architecture

The NameNode schedules which DataNodes cache each block of a file.

[Diagram: NameNode directing three DataNodes]

Page 17: In-memory  Caching in HDFS Lower  latency, same great taste

17

Architecture

DataNodes periodically send cache reports describing which replicas they have cached.

[Diagram: three DataNodes sending cache reports to the NameNode]

Page 18: In-memory  Caching in HDFS Lower  latency, same great taste

18

Cache Locations API

• Clients can ask the NameNode where a file is cached via getFileBlockLocations

[Diagram: DFSClient asking the NameNode for cached block locations across three DataNodes]

Page 19: In-memory  Caching in HDFS Lower  latency, same great taste

19

Cache Directives

• A cache directive describes a file or directory that should be cached
  • Path
  • Cache replication factor

• Stored permanently on the NameNode

• Also have cache pools for access control and quotas, but we won’t be covering that here
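The directive and report flow described on the last few slides can be sketched as a toy model in Python. Everything here (class names, method names, random placement) is invented for illustration; the real NameNode-side scheduling logic is far more involved.

```python
import random

class ToyNameNode:
    """Illustrative only: tracks cache directives, schedules caching on
    DataNodes, and ingests their periodic cache reports."""

    def __init__(self, datanodes):
        self.datanodes = datanodes     # DataNode names
        self.directives = {}           # path -> cache replication factor
        self.assignments = {}          # path -> DataNodes told to cache it
        self.cached_locations = {}     # path -> DataNodes that report it cached

    def add_cache_directive(self, path, replication=1):
        # A cache directive: path + cache replication factor.
        self.directives[path] = replication
        # "Schedule" caching by picking DataNodes (real placement is smarter).
        self.assignments[path] = random.sample(self.datanodes, replication)

    def handle_cache_report(self, datanode, cached_paths):
        # Periodic cache report: which replicas this DataNode has cached.
        for path in cached_paths:
            self.cached_locations.setdefault(path, set()).add(datanode)

    def get_cached_hosts(self, path):
        # Analogue of the cached-host info clients get via getFileBlockLocations.
        return sorted(self.cached_locations.get(path, set()))
```

Note that cached locations come from the reports, not the assignments: the NameNode only advertises a replica as cached once a DataNode confirms it.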

Page 20: In-memory  Caching in HDFS Lower  latency, same great taste

20

mlock

• The DataNode pins each cached block into the page cache using mlock.

• Because we’re using the page cache, the blocks don’t take up any space on the Java heap.

[Diagram: DFSClient reading a block that the DataNode has pinned in the page cache with mlock]
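A rough Python analogue of the pinning step: mmap the block file and mlock it via ctypes. This is only a sketch of the mechanism (the real DataNode does this natively); mlock can fail under RLIMIT_MEMLOCK, so the sketch records whether pinning succeeded rather than assuming it.

```python
import ctypes, ctypes.util, mmap, tempfile

def pin_block(path):
    """mmap a block file and try to mlock it into the page cache.
    Returns (mapping, pinned). The data lives in the page cache, not
    in our heap -- the same property the DataNode relies on."""
    f = open(path, "r+b")
    mm = mmap.mmap(f.fileno(), 0)
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
        addr = ctypes.addressof(ctypes.c_char.from_buffer(mm))
        # mlock returns 0 on success; it may fail without enough
        # RLIMIT_MEMLOCK or privileges.
        pinned = libc.mlock(ctypes.c_void_p(addr), ctypes.c_size_t(len(mm))) == 0
    except OSError:
        pinned = False   # no usable libc/mlock: leave the pages unpinned
    return mm, pinned

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"block data" * 1024)
mm, pinned = pin_block(tmp.name)
```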

Page 21: In-memory  Caching in HDFS Lower  latency, same great taste

21

Zero-copy read API

• Clients can use the zero-copy read API to map the cached replica into their own address space

• The zero-copy API avoids the overhead of the read() and pread() system calls

• However, we don’t verify checksums when using the zero-copy API
  • The zero-copy API can only be used on cached data, or when the application computes its own checksums
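The mechanism underneath zero-copy reads can be shown with plain mmap: a normal read() materializes a new user-space copy of the data, while an mmap-backed memoryview exposes the page-cache pages directly. This illustrates the idea only; it is not the HDFS client API.

```python
import mmap, tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x01" * 4096)

# Copying read: read() copies the data into a new bytes object.
with open(tmp.name, "rb") as f:
    copied = f.read()

# "Zero-copy" read: the file's page-cache pages are mapped into our
# address space, and memoryview slices add no further copies.
f = open(tmp.name, "rb")
mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
view = memoryview(mm)
total = sum(view[:16])   # consume 16 bytes straight out of the mapping
```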

Page 22: In-memory  Caching in HDFS Lower  latency, same great taste

22

Skipping Checksums

• We would like to skip checksum verification when reading cached data
  • DataNode already checksums when caching the block

• Requirements
  • Client needs to know that the replica is cached
  • DataNode needs to notify the client if the replica is uncached

Page 23: In-memory  Caching in HDFS Lower  latency, same great taste

23

Skipping Checksums

• The DataNode and DFSClient use shared memory segments to communicate which blocks are cached.

[Diagram: the DataNode mlocks the block into the page cache; the DFSClient reads it and checks a shared memory segment]
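A minimal sketch of that handshake using Python's multiprocessing.shared_memory: the "DataNode" side flips a per-slot flag, and the "client" verifies a CRC only when the flag says the replica is no longer cached. The segment layout and names are invented for illustration; the real protocol is richer.

```python
from multiprocessing import shared_memory
import zlib

CACHED = 1

# "DataNode" side: a small shared segment with one status byte per slot.
seg = shared_memory.SharedMemory(create=True, size=16)
seg.buf[0] = CACHED   # the block was checksummed when it was cached

def client_read(block, expected_crc, slot=0):
    """'DFSClient' side: skip checksum verification only while the
    shared segment marks the replica as cached."""
    if seg.buf[slot] == CACHED:
        return block                       # verified once, at cache time
    if zlib.crc32(block) != expected_crc:  # replica uncached: verify again
        raise IOError("checksum mismatch")
    return block
```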

Page 24: In-memory  Caching in HDFS Lower  latency, same great taste

24

Outline

• Implementation
  • NameNode and DataNode modifications
  • Zero-copy read API

• Evaluation
  • Single-Node Microbenchmarks
  • MapReduce
  • Impala

• Future work

Page 25: In-memory  Caching in HDFS Lower  latency, same great taste

25

Test Cluster

• 5 nodes
  • 1 NameNode
  • 4 DataNodes

• 48GB of RAM per node
  • Configured 38GB of HDFS cache per DN

• 11x SATA hard disks
• 2x4 core 2.13 GHz Westmere Xeon processors
• 10 Gbit/s full-bisection bandwidth network

Page 26: In-memory  Caching in HDFS Lower  latency, same great taste

26

Single-Node Microbenchmarks

• How much faster are cached and zero-copy reads?
• Introducing vecsum (vector sum)
  • Computes sums of a file of doubles
  • Highly optimized: uses SSE intrinsics
  • libhdfs program
  • Can toggle between various read methods
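For reference, vecsum's core computation is simple; a plain-Python stand-in (minus the SSE intrinsics and the libhdfs read paths) just sums a file of doubles:

```python
import array, tempfile

def vecsum(path):
    """Sum a binary file of native doubles -- vecsum's semantics,
    without the optimized read methods it benchmarks."""
    a = array.array("d")
    with open(path, "rb") as f:
        a.frombytes(f.read())
    return sum(a)

# Write a tiny file of doubles to sum.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(array.array("d", [1.0, 2.5, 4.0]).tobytes())
```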

Page 27: In-memory  Caching in HDFS Lower  latency, same great taste

27

Throughput

[Chart: read throughput (GB/s): TCP 0.8, TCP no csums 0.9, SCR 1.9, SCR no csums 2.4, ZCR 5.9]

Page 28: In-memory  Caching in HDFS Lower  latency, same great taste

28

ZCR 1GB vs 20GB

[Chart: ZCR throughput (GB/s): 1GB file 5.9, 20GB file 2.7]

Page 29: In-memory  Caching in HDFS Lower  latency, same great taste

29

Throughput

• Skipping checksums matters more when going faster
• ZCR gets close to bus bandwidth
  • ~6GB/s
• Need to reuse client-side mmaps for maximum perf
  • page_fault function is 1.16% of cycles in 1G
  • 17.55% in 20G

Page 30: In-memory  Caching in HDFS Lower  latency, same great taste

30

Client CPU cycles

[Chart: CPU cycles (billions): TCP 57.6, TCP no csums 51.8, SCR 27.1, SCR no csums 23.4, ZCR 12.7]

Page 31: In-memory  Caching in HDFS Lower  latency, same great taste

31

Why is ZCR more CPU-efficient?

Page 32: In-memory  Caching in HDFS Lower  latency, same great taste

32

Why is ZCR more CPU-efficient?

Page 33: In-memory  Caching in HDFS Lower  latency, same great taste

33

Remote Cached vs. Local Uncached

• Zero-copy is only possible for local cached data
• Is it better to read from remote cache, or local disk?

Page 34: In-memory  Caching in HDFS Lower  latency, same great taste

34

Remote Cached vs. Local Uncached

[Chart: throughput (MB/s): TCP 841, iperf 1092, SCR 125, dd 137]

Page 35: In-memory  Caching in HDFS Lower  latency, same great taste

35

Microbenchmark Conclusions

• Short-circuit reads need less CPU than TCP reads
• ZCR is even more efficient, because it avoids a copy
• ZCR goes much faster when re-reading the same data, because it can avoid mmap page faults
• Network and disk may be bottleneck for remote or uncached reads

Page 36: In-memory  Caching in HDFS Lower  latency, same great taste

36

Outline

• Implementation
  • NameNode and DataNode modifications
  • Zero-copy read API

• Evaluation
  • Microbenchmarks
  • MapReduce
  • Impala

• Future work

Page 37: In-memory  Caching in HDFS Lower  latency, same great taste

37

MapReduce

• Started with example MR jobs
  • Wordcount
  • Grep

• Same 4 DN cluster
  • 38GB HDFS cache per DN
  • 11 disks per DN

• 17GB of Wikipedia text
  • Small enough to fit into cache at 3x replication

• Ran each job 10 times, took the average
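A quick arithmetic check of the "small enough to fit into cache" claim, using the cluster numbers above:

```python
# Cluster numbers taken from this slide and the test-cluster slide.
dataset_gb = 17        # Wikipedia text
replication = 3        # HDFS replication factor
datanodes = 4
cache_per_dn_gb = 38   # HDFS cache configured per DataNode

replicated_gb = dataset_gb * replication           # total size of all replicas
aggregate_cache_gb = datanodes * cache_per_dn_gb   # total HDFS cache
fits = replicated_gb <= aggregate_cache_gb
```

51GB of replicas against 152GB of aggregate cache, so every replica can be cached.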

Page 38: In-memory  Caching in HDFS Lower  latency, same great taste

38

wordcount and grep

[Chart: job runtime (s): wordcount 280, wordcount cached 55, grep 275, grep cached 52]

Page 39: In-memory  Caching in HDFS Lower  latency, same great taste

39

wordcount and grep

[Chart repeated: wordcount 280, wordcount cached 55, grep 275, grep cached 52 — almost no speedup!]

Page 40: In-memory  Caching in HDFS Lower  latency, same great taste

40

wordcount and grep

[Chart repeated, annotated ~60MB/s and ~330MB/s — not I/O bound]

Page 41: In-memory  Caching in HDFS Lower  latency, same great taste

41

wordcount and grep

• End-to-end latency barely changes
  • These MR jobs are simply not I/O bound!

• Best map phase throughput was about 330MB/s
  • 44 disks can theoretically do 4400MB/s

• Further reasoning
  • Long JVM startup and initialization time
  • Many copies in TextInputFormat, doesn’t use zero-copy
  • Caching input data doesn’t help the reduce step

Page 42: In-memory  Caching in HDFS Lower  latency, same great taste

42

Introducing bytecount

• Trivial version of wordcount
  • Counts # of occurrences of byte values
  • Heavily CPU optimized

• Each mapper processes an entire block via ZCR
  • No additional copies
  • No record slop across block boundaries
  • Fast inner loop

• Very unrealistic job, but serves as a best case
• Also tried 2GB block size to amortize startup costs
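The inner loop of such a job is easy to sketch. This Python version counts byte-value occurrences over an mmap-ed local file; it only mirrors the semantics, not the ZCR plumbing or the MapReduce harness:

```python
import mmap, tempfile

def bytecount(path):
    """Return a 256-entry list of byte-value occurrence counts,
    reading straight out of a memory mapping (no extra copies)."""
    counts = [0] * 256
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        for i in range(len(mm)):   # the "fast inner loop" (in spirit)
            counts[mm[i]] += 1
        mm.close()
    return counts
```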

Page 43: In-memory  Caching in HDFS Lower  latency, same great taste

43

bytecount

[Chart: job runtime (s): grep 55, grep cached 45, bytecount 58, bytecount cached 52, bytecount-2G 39, bytecount-2G cached 35]

Page 44: In-memory  Caching in HDFS Lower  latency, same great taste

44

bytecount

[Chart repeated — 1.3x faster]

Page 45: In-memory  Caching in HDFS Lower  latency, same great taste

45

bytecount

[Chart repeated — still only ~500MB/s]

Page 46: In-memory  Caching in HDFS Lower  latency, same great taste

46

MapReduce Conclusions

• Many MR jobs will see marginal improvement
  • Startup costs
  • CPU inefficiencies
  • Shuffle and reduce steps

• Even bytecount sees only modest gains
  • 1.3x faster than disk
  • 500MB/s with caching and ZCR
  • Nowhere close to GB/s possible with memory

• Needs more work to take full advantage of caching!

Page 47: In-memory  Caching in HDFS Lower  latency, same great taste

47

Outline

• Implementation
  • NameNode and DataNode modifications
  • Zero-copy read API

• Evaluation
  • Microbenchmarks
  • MapReduce
  • Impala

• Future work

Page 48: In-memory  Caching in HDFS Lower  latency, same great taste

48

Impala Benchmarks

• Open-source OLAP database developed by Cloudera
• Tested with Impala 1.3 (CDH 5.0)
• Same 4 DN cluster as MR section
  • 38GB of 48GB per DN configured as HDFS cache
  • 152GB aggregate HDFS cache
  • 11 disks per DN

Page 49: In-memory  Caching in HDFS Lower  latency, same great taste

49

Impala Benchmarks

• 1TB TPC-DS store_sales table, text format
• count(*) on different numbers of partitions
  • Has to scan all the data, no skipping

• Queries
  • 51GB small query (34% cache capacity)
  • 148GB big query (98% cache capacity)
  • Small query with concurrent workload

• Tested “cold” and “hot”
  • echo 3 > /proc/sys/vm/drop_caches
  • Lets us compare HDFS caching against page cache

Page 50: In-memory  Caching in HDFS Lower  latency, same great taste

50

Small Query

[Chart: average response time (s): uncached cold 19.8, cached cold 5.8, uncached hot 4.0, cached hot 3.0]

Page 51: In-memory  Caching in HDFS Lower  latency, same great taste

51

Small Query

[Chart repeated, annotated 2550 MB/s and 17 GB/s — I/O bound!]

Page 52: In-memory  Caching in HDFS Lower  latency, same great taste

52

Small Query

[Chart repeated — 3.4x faster, disk vs. memory]

Page 53: In-memory  Caching in HDFS Lower  latency, same great taste

53

Small Query

[Chart repeated — 1.3x after warmup, still wins on CPU efficiency]

Page 54: In-memory  Caching in HDFS Lower  latency, same great taste

54

Big Query

[Chart: average response time (s): uncached cold 48.2, cached cold 11.5, uncached hot 40.9, cached hot 9.4]

Page 55: In-memory  Caching in HDFS Lower  latency, same great taste

55

Big Query

[Chart repeated — 4.2x faster, disk vs. mem]

Page 56: In-memory  Caching in HDFS Lower  latency, same great taste

56

Big Query

[Chart repeated — 4.3x faster hot; the table doesn’t fit in page cache, and we cannot schedule for page cache locality]

Page 57: In-memory  Caching in HDFS Lower  latency, same great taste

57

Small Query with Concurrent Workload

[Chart: average response time (s) for uncached, cached, and cached (not concurrent)]

Page 58: In-memory  Caching in HDFS Lower  latency, same great taste

58

Small Query with Concurrent Workload

[Chart repeated — 7x faster when small query working set is cached]

Page 59: In-memory  Caching in HDFS Lower  latency, same great taste

59

Small Query with Concurrent Workload

[Chart repeated — 2x slower than isolated, due to CPU contention]

Page 60: In-memory  Caching in HDFS Lower  latency, same great taste

60

Impala Conclusions

• HDFS cache is faster than disk or page cache
  • ZCR is more efficient than SCR from page cache
  • Better when working set is approx. cluster memory

• Can schedule tasks for cache locality
  • Significantly better for concurrent workloads
  • 7x faster when contending with a single background query

• Impala performance will only improve
  • Many CPU improvements on the roadmap

Page 61: In-memory  Caching in HDFS Lower  latency, same great taste

61

Outline

• Implementation
  • NameNode and DataNode modifications
  • Zero-copy read API

• Evaluation
  • Microbenchmarks
  • MapReduce
  • Impala

• Future work

Page 62: In-memory  Caching in HDFS Lower  latency, same great taste

62

Future Work

• Automatic cache replacement
  • LRU, LFU, ?

• Sub-block caching
  • Potentially important for automatic cache replacement

• Compression, encryption, serialization
  • Lose many benefits of zero-copy API

• Write-side caching
  • Enables Spark-like RDDs for all HDFS applications
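As a concrete example of what automatic cache replacement could look like, here is a tiny LRU policy over cached blocks, sketched with an OrderedDict. This is purely illustrative; HDFS does not ship such a policy.

```python
from collections import OrderedDict

class LruBlockCache:
    """Toy LRU replacement policy for cached blocks."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()        # block id -> data, oldest first

    def get(self, block_id):
        if block_id not in self.blocks:
            return None
        self.blocks.move_to_end(block_id)  # mark most recently used
        return self.blocks[block_id]

    def put(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
```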

Page 63: In-memory  Caching in HDFS Lower  latency, same great taste

63

Conclusion

• I/O contention is a problem for concurrent workloads
• HDFS can now explicitly pin working sets into RAM
• Applications can place their tasks for cache locality
• Use zero-copy API to efficiently read cached data
• Substantial performance improvements
  • 6GB/s for single thread microbenchmark
  • 7x faster for concurrent Impala workload

Page 64: In-memory  Caching in HDFS Lower  latency, same great taste
Page 65: In-memory  Caching in HDFS Lower  latency, same great taste
Page 66: In-memory  Caching in HDFS Lower  latency, same great taste
Page 67: In-memory  Caching in HDFS Lower  latency, same great taste

67

bytecount

[Chart: job runtime (s): grep 55, grep cached 45, bytecount 58, bytecount cached 52, bytecount-2G 39, bytecount-2G cached 35 — annotated: less disk parallelism]

