
Debunking the Myths of HDFS Erasure Coding Performance

Date post: 16-Apr-2017
Upload: dataworks-summithadoop-summit
Transcript
Page 1: Debunking the Myths of HDFS Erasure Coding Performance

Debunking the Myths of HDFS Erasure Coding Performance

Page 2

Replication is Expensive

- HDFS inherits 3-way replication from the Google File System
  - Simple, scalable and robust
- 200% storage overhead
- Secondary replicas are rarely accessed

Page 3

Erasure Coding Saves Storage

Simplified example: storing 2 bits (1 0)
- Replication: store 1 0 twice (2 extra bits)
- XOR coding: store 1 0 plus the parity 1 ⊕ 0 = 1 (1 extra bit)

Same data durability: either scheme can lose any 1 bit.
- Half the storage overhead
- Slower recovery
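The 2-bit XOR example can be sketched in a few lines of Python. This is an illustration only; the function names are made up for this example and are not part of HDFS. The parity bit is the XOR of the data bits, and whichever single bit is lost equals the XOR of all the surviving bits.

```python
def xor_encode(bits):
    """Return the data bits followed by one XOR parity bit."""
    parity = 0
    for b in bits:
        parity ^= b
    return list(bits) + [parity]

def xor_recover(stored, lost_index):
    """Rebuild the single lost position by XOR-ing every surviving position."""
    value = 0
    for i, b in enumerate(stored):
        if i != lost_index:
            value ^= b
    return value

stored = xor_encode([1, 0])   # 2 data bits + 1 parity bit
for lost in range(len(stored)):
    assert xor_recover(stored, lost) == stored[lost]
```

Recovery works because the XOR of all stored positions, parity included, is always 0, so any one missing position is determined by the rest.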

Page 4

Erasure Coding Saves Storage

- Facebook
  - f4 stores 65PB of BLOBs in EC
- Windows Azure Storage (WAS)
  - A PB of new data every 1~2 days
  - All “sealed” data stored in EC
- Google File System
  - Large portion of data stored in EC

Page 5

Roadmap

- Background of EC
  - Redundancy Theory
  - EC in Distributed Storage Systems
- HDFS-EC architecture
- Hardware-accelerated Codec Framework
- Performance Evaluation

Page 6

Durability and Efficiency

- Data Durability = how many simultaneous failures can be tolerated?
- Storage Efficiency = what fraction of storage holds useful data?

3-way Replication: Data Durability = 2, Storage Efficiency = 1/3 (33%)

[Figure: useful data vs. redundant data]

Page 7

Durability and Efficiency

XOR: Data Durability = 1, Storage Efficiency = 2/3 (67%)

[Figure: useful data vs. redundant data]

X   Y   X⊕Y
0   0   0
0   1   1
1   0   1
1   1   0

Recovering a lost Y from X = 0 and X⊕Y = 1: Y = X ⊕ (X⊕Y) = 0 ⊕ 1 = 1

Page 8

Durability and Efficiency

Reed-Solomon (RS): Data Durability = 2, Storage Efficiency = 4/6 (67%)

Very flexible!
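A toy Reed-Solomon-style code makes the RS(4,2) numbers concrete: 4 data symbols, 2 parity symbols, and any 2 of the 6 can be lost. This Python sketch assumes a systematic, evaluation-based construction over the prime field GF(257) so that the arithmetic is ordinary modular arithmetic; production codecs such as ISA-L work over GF(2^8) instead, and all function names here are hypothetical. Data symbols must be integers in 0..256.

```python
P = 257  # prime modulus; a stand-in for the GF(2^8) used by real codecs

def _lagrange_eval(points, x):
    """Evaluate the unique polynomial through `points` [(xi, yi)] at x, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def rs_encode(data, m):
    """Data symbols sit at x = 0..k-1; the m parity symbols are the same
    polynomial evaluated at x = k..k+m-1 (systematic encoding)."""
    k = len(data)
    points = list(enumerate(data))
    return list(data) + [_lagrange_eval(points, x) for x in range(k, k + m)]

def rs_decode(survivors, k):
    """Rebuild the k data symbols from any k surviving (position, value) pairs."""
    points = survivors[:k]
    return [_lagrange_eval(points, x) for x in range(k)]

data = [10, 20, 30, 40]
coded = rs_encode(data, 2)                   # 6 symbols, storage efficiency 4/6
survivors = [(i, v) for i, v in enumerate(coded) if i not in (1, 4)]
assert rs_decode(survivors, 4) == data       # 2 lost symbols, data intact
```

Parity symbols can take the value 256, which does not fit in a byte; that is one practical reason real implementations use GF(2^8). Changing `m` trades durability for efficiency, which is the flexibility the slide refers to.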

Page 9

Durability and Efficiency

Scheme                   Data Durability   Storage Efficiency
Single Replica           0                 100%
3-way Replication        2                 33%
XOR with 6 data cells    1                 86%
RS (6,3)                 3                 67%
RS (10,4)                4                 71%
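The table follows from two formulas: n-way replication tolerates n - 1 failures with efficiency 1/n, and a (k, m) maximum-distance-separable code such as RS tolerates any m failures with efficiency k/(k+m), XOR being the m = 1 case. A small Python sketch (helper names are illustrative) reproduces every row:

```python
from fractions import Fraction

def replication(copies):
    """n-way replication survives copies - 1 failures; 1 of n copies is useful."""
    return copies - 1, Fraction(1, copies)

def erasure_code(k, m):
    """A (k, m) MDS code such as RS survives any m failures;
    k of every k + m cells hold useful data."""
    return m, Fraction(k, k + m)

schemes = {
    "Single Replica": replication(1),
    "3-way Replication": replication(3),
    "XOR with 6 data cells": erasure_code(6, 1),
    "RS (6,3)": erasure_code(6, 3),
    "RS (10,4)": erasure_code(10, 4),
}
for name, (durability, efficiency) in schemes.items():
    print(f"{name:<22} {durability}  {float(efficiency):.0%}")
```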

Page 10

EC in Distributed Storage

Block Layout: Contiguous Layout

[Figure: the file is split into 128MB ranges (0~128M, 128~256M, … 640~768M) stored as block 0 on DataNode 0, block 1 on DataNode 1, … block 5 on DataNode 5, with parity on DataNode 6]

- Data Locality 👍🏻
- Small Files 👎🏻

Page 11

EC in Distributed Storage

Block Layout: Striped Layout

[Figure: the file is split into 1MB cells placed round-robin: 0~1M in block 0 on DataNode 0, 1~2M in block 1 on DataNode 1, … 5~6M in block 5 on DataNode 5, then 6~7M back in block 0; parity on DataNode 6]

- Data Locality 👎🏻
- Small Files 👍🏻
- Parallel I/O 👍🏻
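The two layouts differ only in how a logical file offset maps to a block. A minimal Python sketch, assuming the 128MB blocks, 1MB cells and 6 data blocks shown in the figures (function names are hypothetical), makes the locality trade-off visible:

```python
MiB = 1024 * 1024

def contiguous_location(offset, block_size=128 * MiB):
    """Contiguous layout: the file fills block 0, then block 1, and so on.
    Returns (block index, offset inside that block)."""
    return divmod(offset, block_size)

def striped_location(offset, data_blocks=6, cell_size=1 * MiB):
    """Striped layout: 1MB cells are dealt round-robin to the data blocks.
    Returns (internal block index, offset inside that internal block)."""
    cell, within_cell = divmod(offset, cell_size)
    stripe, block = divmod(cell, data_blocks)
    return block, stripe * cell_size + within_cell
```

With these mappings, the first 3MB of a file lives entirely in block 0 under the contiguous layout, so one DataNode can serve it locally, while under the striped layout it is spread across blocks 0, 1 and 2, which enables parallel I/O but gives up locality. The byte at 6.5MB lands back in block 0 at offset 1.5MB, matching the 6~7M cell in the figure.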

Page 12

EC in Distributed Storage

[Figure: spectrum of designs on two axes, Replication vs. Erasure Coding and Striping vs. Contiguous, positioning Ceph, Quantcast File System, HDFS, Facebook f4 and Windows Azure]


Page 14

Choosing Block Layout

Assuming (6,3) coding:
- Small files: < 1 block
- Medium: 1~6 blocks
- Large: > 6 blocks (1 group)

Cluster     File count (small / medium / large)   Space usage (small / medium / large)   Observation
Cluster A   96.29% / 1.86% / 1.85%                26.06% / 9.33% / 64.61%                Top 2% of files occupy ~65% of space
Cluster B   86.59% / 11.38% / 2.03%               23.89% / 36.03% / 40.08%               Top 2% of files occupy ~40% of space
Cluster C   99.64% / 0.36% / 0.00%                76.05% / 20.75% / 3.20%                Dominated by small files
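The small-file trade-off can be estimated with a rough model. This Python sketch assumes each file is encoded on its own with (6,3) coding, 128MB blocks and 1MB cells, and that each parity piece is as large as the largest data piece it protects; the names and the model are illustrative, not HDFS code.

```python
MiB = 1024 * 1024
BLOCK = 128 * MiB   # block size, as in the layout figures
CELL = 1 * MiB      # striping cell size, as in the layout figures
K, M = 6, 3         # (6,3) coding as on the slide

def size_class(size):
    """The slide's buckets: small < 1 block, medium 1~6 blocks, large > 6 blocks."""
    if size < BLOCK:
        return "small"
    blocks = (size + BLOCK - 1) // BLOCK     # integer ceiling
    return "medium" if blocks <= K else "large"

def parity_bytes(size, unit):
    """Parity for one file cut into `unit`-sized pieces, where every run of K
    pieces gets M parity pieces as large as the run's first (largest) piece.
    unit=BLOCK models the contiguous layout, unit=CELL the striped layout."""
    parity = 0
    for start in range(0, size, K * unit):
        run = min(K * unit, size - start)
        parity += M * min(unit, run)
    return parity

for size in (512 * 1024, 10 * MiB, 2048 * MiB):
    print(size_class(size),
          f"contiguous {parity_bytes(size, BLOCK) / size:.0%}",
          f"striped {parity_bytes(size, CELL) / size:.0%}")
```

In this model a 10MB file pays 300% parity overhead in the contiguous layout but 60% when striped, while a 2GB file pays about 56% versus 50%. A file smaller than one cell pays 300% in both layouts.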

Page 15

Choosing Block Layout

[Figure: Current HDFS]

Page 16

Generalizing Block NameNode

- Mapping Logical and Storage Blocks
- Too Many Storage Blocks?
- Hierarchical Naming Protocol:
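One way to realize a hierarchical naming scheme is sketched below in Python: the NameNode allocates one logical block-group ID with the low bits left free, and each storage block's ID is the group ID plus its index, so the NameNode can track a single entry per group instead of one per storage block. The 4-bit index field, the allocator class and all names are assumptions for illustration; the slide does not spell out HDFS's exact ID format.

```python
INDEX_BITS = 4                        # assumption: at most 16 storage blocks per group
INDEX_MASK = (1 << INDEX_BITS) - 1

class BlockGroupIdAllocator:
    """Hands out logical block-group IDs whose low INDEX_BITS bits are zero."""

    def __init__(self):
        self._next = 1 << INDEX_BITS

    def allocate(self):
        group_id = self._next
        self._next += 1 << INDEX_BITS   # reserve the next 16 IDs for this group
        return group_id

def storage_block_id(group_id, index):
    """ID of the index-th storage block (data or parity) of a block group."""
    if not 0 <= index <= INDEX_MASK:
        raise ValueError("index does not fit in the reserved bits")
    return group_id | index

def group_of(block_id):
    """Logical group ID recovered by clearing the index bits."""
    return block_id & ~INDEX_MASK

def index_of(block_id):
    """Position of the storage block inside its group."""
    return block_id & INDEX_MASK
```

With RS(6,3), one allocated group ID implicitly names 9 storage blocks (indices 0 to 8), so a DataNode reporting any of them can be mapped back to its group with `group_of()`.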

Page 17

Client Parallel Writing

[Figure: the client feeds a queue for each streamer (streamer … streamer), with a Coordinator shared by all streamers]
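The writing pipeline can be mimicked with Python threads and queues. In this hypothetical sketch each streamer thread owns one internal block and drains its own queue; a single XOR parity cell per stripe stands in for RS(6,3) parity, the 4-byte default cell stands in for the 1MB cells in the figures, and the slide's Coordinator is reduced to waiting for every streamer to finish.

```python
import queue
import threading

def striped_write(data, k=6, cell=4):
    """Cut `data` into stripes of k cells, add one XOR parity cell per stripe,
    and hand each cell to the streamer thread that owns its internal block."""
    queues = [queue.Queue() for _ in range(k + 1)]        # k data + 1 parity streamer
    blocks = [bytearray() for _ in range(k + 1)]          # stand-ins for DataNode blocks

    def streamer(q, block):
        while (cell_bytes := q.get()) is not None:        # None marks end of block
            block.extend(cell_bytes)

    threads = [threading.Thread(target=streamer, args=(q, b))
               for q, b in zip(queues, blocks)]
    for t in threads:
        t.start()

    for start in range(0, len(data), k * cell):
        stripe = data[start:start + k * cell]
        cells = [stripe[i * cell:(i + 1) * cell] for i in range(k)]
        parity = bytearray(cell)
        for c in cells:
            for j, byte in enumerate(c):
                parity[j] ^= byte
        for q, c in zip(queues, cells + [bytes(parity)]):
            q.put(c)

    for q in queues:
        q.put(None)
    for t in threads:                                     # wait for every streamer
        t.join()
    return [bytes(b) for b in blocks]
```

Because each streamer consumes only its own queue, cells reach each internal block in stripe order even though the streamers run concurrently.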

Page 18

Client Parallel Reading

[Figure: parallel reads across the DataNodes of a block group, including parity]
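Reading works the other way around. This Python sketch fetches all cells of one stripe in parallel and, when a single data cell is unavailable, rebuilds it from the survivors. It assumes equal-size cells and single XOR parity as a stand-in for RS decoding, and it fetches parity eagerly for simplicity; a client could instead read parity only after a failure. The fetcher callables are hypothetical stand-ins for DataNode reads.

```python
from concurrent.futures import ThreadPoolExecutor

def read_stripe(fetchers, k):
    """fetchers[0..k-1] return data cells, fetchers[k] the parity cell.
    Each returns bytes, or None if its DataNode is unavailable."""
    with ThreadPoolExecutor(max_workers=len(fetchers)) as pool:
        cells = list(pool.map(lambda fetch: fetch(), fetchers))
    data, parity = cells[:k], cells[k]
    missing = [i for i, c in enumerate(data) if c is None]
    if not missing:
        return b"".join(data)
    if len(missing) > 1 or parity is None:
        raise IOError("too many failures for single-parity recovery")
    rebuilt = bytearray(parity)            # parity XOR surviving cells = lost cell
    for c in data:
        if c is not None:
            for j, byte in enumerate(c):
                rebuilt[j] ^= byte
    data[missing[0]] = bytes(rebuilt)
    return b"".join(data)
```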

Page 19

Reconstruction on DataNode

- Important to avoid delay on the critical path
  - Especially if original data is lost
- Integrated with Replication Monitor
  - Under-protected EC blocks scheduled together with under-replicated blocks
  - New priority algorithms
- New ErasureCodingWorker component on DataNode
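The idea of scheduling under-protected EC blocks alongside under-replicated blocks can be illustrated with one priority queue keyed on how many more failures each block can survive. The priority rule and names below are a hypothetical Python sketch, not the algorithm HDFS uses.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class LowRedundancyBlock:
    remaining_redundancy: int                 # further failures the block can survive
    name: str = field(compare=False)

def remaining_redundancy(live_units, min_units):
    """Live replicas or cells minus the minimum needed to read the data.
    Replicated block: min_units = 1. RS(k, m) block group: min_units = k."""
    return live_units - min_units

def schedule(blocks):
    """Yield block names, least-protected first; blocks are (name, live, needed)."""
    heap = [LowRedundancyBlock(remaining_redundancy(live, need), name)
            for name, live, need in blocks]
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap).name
```

Keying both kinds of block on the same measure lets a single queue serve replicated and erasure-coded data, which mirrors the slide's point about integrating with the Replication Monitor.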

Page 20

Data Checksum Support

- Supports getFileChecksum for EC striped-mode files
  - Checksums are comparable between striped files with the same content
  - Checksums of a contiguous file and a striped file cannot be compared
  - Can reconstruct on the fly if a block is found missing while computing the checksum
- Planning to introduce a new version of getFileChecksum
  - To achieve comparable checksums between contiguous and striped files
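Why checksums of a contiguous file and a striped file are not comparable becomes clearer with a small Python sketch. Hashing each block and then hashing the per-block digests (loosely in the spirit of an MD5-of-MD5s file checksum) depends on how bytes are grouped into blocks, whereas a single CRC over the bytes in file order does not. Both functions are illustrations, not HDFS's getFileChecksum, and the sizes are scaled down (8 KiB blocks, 1 KiB cells, 4 internal blocks).

```python
import hashlib
import zlib

def per_block_digest(blocks):
    """Layout-dependent: MD5 of the concatenated per-block MD5 digests."""
    outer = hashlib.md5()
    for block in blocks:
        outer.update(hashlib.md5(block).digest())
    return outer.hexdigest()

def stream_crc(pieces_in_file_order):
    """Layout-independent: one CRC32 over the file's bytes in logical order."""
    crc = 0
    for piece in pieces_in_file_order:
        crc = zlib.crc32(piece, crc)
    return crc

# The same 16 KiB of content stored two ways.
data = bytes(i % 251 for i in range(16 * 1024))
contiguous_blocks = [data[:8192], data[8192:]]                 # two 8 KiB blocks
cells = [data[i:i + 1024] for i in range(0, len(data), 1024)]  # 1 KiB cells
striped_blocks = [b"".join(cells[i::4]) for i in range(4)]     # 4 internal blocks

print(per_block_digest(contiguous_blocks) == per_block_digest(striped_blocks))
print(stream_crc(contiguous_blocks) == stream_crc(cells))
```

The per-block digests differ between the two layouts even though the content is identical, while the streaming CRC agrees as long as the pieces are fed in file order.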


Page 22

Acceleration with Intel ISA-L

- 1 legacy coder
  - From Facebook’s HDFS-RAID project
- 2 new coders
  - Pure Java: code improvement over HDFS-RAID
  - Native coder with Intel’s Intelligent Storage Acceleration Library (ISA-L)

Page 23

Why is ISA-L Fast?

- Pre-computed and reused
- Parallel operation
- Direct ByteBuffer
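"Pre-computed and reused" can be illustrated with Galois-field multiplication, the core operation of RS coding. This Python sketch builds log/exp tables once for GF(2^8) and then multiplies with two lookups and an addition instead of a bit-by-bit loop. The reducing polynomial 0x11D is an assumption (a common choice), and ISA-L's real kernels are vectorized and organize their tables differently.

```python
POLY = 0x11D  # assumption: a common primitive polynomial for GF(2^8)

def gf_mul_slow(a, b):
    """Carry-less shift-and-add multiplication in GF(2^8), reducing by POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result

# Pre-compute exponent/log tables once; 2 generates all 255 nonzero elements.
EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x = gf_mul_slow(x, 2)
for i in range(255, 512):
    EXP[i] = EXP[i - 255]          # doubled table avoids a modulo on lookup

def gf_mul_table(a, b):
    """Table-driven multiply, reusing the tables for every call."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]
```

The tables cost a one-time setup, after which every multiplication in every stripe reuses them.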

Page 24

Microbenchmark: Codec Calculation

Page 25

Microbenchmark: Codec Calculation

Page 26

Microbenchmark: HDFS I/O

Page 27

Microbenchmark: HDFS I/O

Page 28

Microbenchmark: HDFS I/O

Page 29

DFSIO / MapReduce

Page 30

Hive-on-MR: locality sensitive

Page 31

Hive-on-Spark: locality sensitive

Page 32

Conclusion

- Erasure coding roughly halves the raw storage needed for the same data (1.5x with RS (6,3) vs. 3x with 3-way replication)!
- HDFS-EC phase I implements erasure coding in the striped block layout
- Upstream effort (HDFS-7285):
  - Design finalized Nov. 2014
  - Development started Jan. 2015
  - 218 commits, ~25k LoC changed
  - Broad collaboration: Cloudera, Intel, Hortonworks, Huawei, Yahoo, LinkedIn
- Phase II will support the contiguous block layout for better locality

Page 33

Acknowledgements

- Cloudera: Andrew Wang, Aaron T. Myers, Colin McCabe, Todd Lipcon, Silvius Rus
- Intel: Kai Zheng, Rakesh R, Yi Liu, Weihua Jiang, Rui Li
- Hortonworks: Jing Zhao, Tsz Wo Nicholas Sze
- Huawei: Vinayakumar B, Walter Su, Xinwei Qin
- Yahoo (Japan): Gao Rui, Kai Sasaki, Takuya Fukudome, Hui Zheng

Page 34

Questions?

Zhe Zhang | @oldcap | http://zhe-thoughts.github.io/

Uma Gangumalla | @UmaMaheswaraG

http://blog.cloudera.com/blog/2016/02/progress-report-bringing-erasure-coding-to-apache-hadoop/

Page 35

Come See Us at Intel, Booth 305

“Amazing Analytics from Silicon to Software”

- Intel powers analytics solutions that are optimized for performance and security from silicon to software
- Intel unleashes the potential of Big Data to enable advancement in healthcare/life sciences, retail, manufacturing, telecom and financial services
- Intel accelerates advanced analytics and machine learning solutions

Twitter #HS16SJ

Page 36

LinkedIn Hadoop

- Dali: LinkedIn’s Logical Data Access Layer for Hadoop
  Meetup Thu 6/30, 6~9PM @LinkedIn, 2nd floor, Unite room, 2025 Stierlin Ct, Mountain View
- Dr. Elephant: performance monitoring and tuning
  SFHUG in Aug

Page 37

Backup

