Post on 16-Apr-2017
The Pros and Cons of Erasure Coding & Replication vs. RAID in Next-Gen Storage Platforms
Abhijith Shenoy – Engineer, Hedvig Inc. @hedviginc
The need for new architectures
• Business executives: business innovation
• Developers: time-to-market
• IT infrastructure / DevOps: flexible infrastructure
Modern apps need...
• Scale
• Flexibility
• Self-service
• Automation
To achieve this, the world is moving to a software-defined, distributed systems approach
Software-defined storage
Commodity servers + software = software-defined storage
Common software-defined storage elements
• Proxy/client: provides storage access to the application compute environment via common protocols
• Storage software: forms an elastic storage cluster with commodity servers or cloud infrastructure
Modern software-defined storage architectures
Hyperconverged Hyperscale
Spanning multiple DCs and clouds
Hyperconverged Hyperscale
Data protection with SDS: RAID, Erasure Coding & Replication
Protecting stored data: RAID
• Redundant Array of Independent Disks
– Divides or replicates data across multiple drives to deliver performance and fault tolerance
– Commonly used: RAID 0, RAID 1, RAID 5, RAID 10
• Pros
– Trusted protection solution in the traditional array world
– Known performance delivery
• Cons
– High-capacity drive (8TB+) rebuilds can take days or even weeks
– RAID controllers add complexity for requisite performance
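A rough back-of-the-envelope check of the rebuild-time concern above, as a minimal sketch. The sustained rebuild rates are illustrative assumptions, not figures from the talk; actual rates depend on the controller, the RAID level, and how much production I/O the array serves during the rebuild. The function name is hypothetical.

```python
def rebuild_hours(drive_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to rewrite one full drive at a sustained rebuild rate.

    Uses decimal units (1 TB = 1,000,000 MB), as drive vendors do.
    """
    if rebuild_mb_per_s <= 0:
        raise ValueError("rebuild rate must be positive")
    seconds = drive_tb * 1_000_000 / rebuild_mb_per_s
    return seconds / 3600


# Assumed rates: an idle array versus one throttled by production traffic.
for rate in (100, 25, 5):
    hours = rebuild_hours(8, rate)
    print(f"8 TB at {rate} MB/s: {hours:.0f} h ({hours / 24:.1f} days)")
```

Under these assumed rates, an 8 TB rebuild ranges from under a day to more than two weeks.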
Protecting stored data: Erasure coding
• A parity-based protection technique
– Data broken into fragments and encoded
– Stored across different locations with a configurable number of redundant pieces
• Pros
– Consumes less storage than replication – good for cheap/deep
– Allows for the failure of two or more elements of a storage system
• Cons
– Parity calculation is CPU-intensive
– Increased latency can slow production writes and rebuilds
How erasure coding works
• Split a file into n chunks and code into m parity blocks
[Figure: file A is split into chunks X1 and X2, then encoded into fragments A1, A2, A3, A4]
How erasure coding works
• Tolerate m erasures (failures)
A1 = X1
A2 = X2
A3 = X1 + X2
A4 = X1 + 2·X2
How erasure coding works
• In a distributed system, chunks are spread across nodes
• In this example, 2 nodes can fail and data can still be rebuilt
Node 1: A1 = X1
Node 2: A2 = X2
Node 3: A3 = X1 + X2
Node 4: A4 = X1 + 2·X2
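The four-node example above can be sketched in a few lines of Python. This is a toy illustration, assuming the chunks X1 and X2 are plain integers and using ordinary integer arithmetic; production erasure codes (for example Reed-Solomon) operate on bytes in a finite field instead. The names `CODING_ROWS`, `encode`, and `decode` are hypothetical and not from the talk.

```python
from fractions import Fraction
from itertools import combinations

# Each stored fragment is c1*X1 + c2*X2, with coefficients as on the slide.
CODING_ROWS = {
    "A1": (1, 0),  # A1 = X1
    "A2": (0, 1),  # A2 = X2
    "A3": (1, 1),  # A3 = X1 + X2
    "A4": (1, 2),  # A4 = X1 + 2*X2
}


def encode(x1: int, x2: int) -> dict:
    """Turn two data chunks into four fragments, one per node."""
    return {name: c1 * x1 + c2 * x2 for name, (c1, c2) in CODING_ROWS.items()}


def decode(survivors: dict) -> tuple:
    """Recover (X1, X2) from any two surviving fragments (Cramer's rule)."""
    if len(survivors) < 2:
        raise ValueError("need at least two fragments to rebuild")
    (name_a, val_a), (name_b, val_b) = list(survivors.items())[:2]
    a1, a2 = CODING_ROWS[name_a]
    b1, b2 = CODING_ROWS[name_b]
    det = a1 * b2 - a2 * b1  # non-zero for every pair of rows above
    x1 = Fraction(val_a * b2 - a2 * val_b, det)
    x2 = Fraction(a1 * val_b - b1 * val_a, det)
    return int(x1), int(x2)


fragments = encode(7, 5)
# Fail every possible pair of nodes and rebuild from the two that remain.
for failed in combinations(fragments, 2):
    remaining = {k: v for k, v in fragments.items() if k not in failed}
    assert decode(remaining) == (7, 5)
```

Any two of the four fragments are enough because every pair of coefficient rows is linearly independent, which is what lets the example survive two node failures.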
Erasure coding use case: Archival storage
• Goal
– Need long-term storage of PBs of files
– Minimizing storage costs critical to business profitability
• Solution
– Software-defined storage + erasure coding
• Results
– Store and protect archival data in 1.5x disk space
– Performance adequate for workload
– Rebuilds slower than desired, but capacity savings outweigh latency
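The 1.5x figure above is consistent with, for example, a layout of 4 data chunks plus 2 parity blocks. That particular layout is an assumption for illustration, since the talk does not state which one was used. A minimal sketch of the overhead arithmetic, with hypothetical function names:

```python
def erasure_coding_overhead(n_data: int, m_parity: int) -> float:
    """Raw space consumed per unit of user data with n data + m parity blocks."""
    if n_data <= 0 or m_parity < 0:
        raise ValueError("need n_data > 0 and m_parity >= 0")
    return (n_data + m_parity) / n_data


def replication_overhead(copies: int) -> float:
    """Raw space consumed per unit of user data with full copies."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return float(copies)


print(erasure_coding_overhead(4, 2))  # assumed 4+2 layout -> 1.5
print(erasure_coding_overhead(2, 2))  # 2 data + 2 parity  -> 2.0
print(replication_overhead(3))        # 3 full copies      -> 3.0
```

Both the assumed 4+2 layout and 3-way replication survive two failures, but the erasure-coded layout does so with half the raw capacity.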
Protecting stored data: Replication
• The creation of data copies across different locations of the storage system
– Typically 2 or 3 copies, configurable based on accepted risk level
– If a drive fails, data is recreated on another drive from replica(s)
• Pros
– Less CPU intensive = faster write performance
– Simple restores = faster rebuild performance
• Cons
– Requires 2x or more the original storage space
How replication works with software-defined storage
• Data broken into chunks and n copies made across server nodes in a cluster
[Figure: App host; Node 1, Node 2, Node 3; Data Center 1, Data Center 2, Data Center 3]
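A minimal sketch of the idea above: split the data into fixed-size chunks and place n copies of each chunk on distinct nodes. The round-robin placement and the names `split_chunks` and `place_replicas` are assumptions for illustration, not a description of how any particular product places data.

```python
def split_chunks(data: bytes, chunk_size: int) -> list:
    """Break data into fixed-size chunks; the last chunk may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(num_chunks: int, nodes: list, copies: int) -> dict:
    """Map each chunk index to `copies` distinct nodes, rotating the start node."""
    if not 1 <= copies <= len(nodes):
        raise ValueError("copies must be between 1 and the number of nodes")
    return {
        idx: [nodes[(idx + k) % len(nodes)] for k in range(copies)]
        for idx in range(num_chunks)
    }


nodes = ["node1", "node2", "node3"]
chunks = split_chunks(b"abcdefghij", chunk_size=4)
placement = place_replicas(len(chunks), nodes, copies=2)
for idx, owners in placement.items():
    print(idx, chunks[idx], owners)
```

Rotating the starting node spreads chunks evenly, so when one node fails its chunks can be re-copied from several surviving nodes rather than from a single one.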
Offsetting replication overhead
• Compression
– ~2:1 reduction
• Deduplication
– ~5:1 or higher reduction
• Low disk cost
– HDD and flash economics declining
– Overhead of replication more tolerable
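As a rough illustration of how the data reduction above offsets replication, the sketch below estimates raw capacity per unit of logical data. It assumes the compression and deduplication ratios simply multiply, which is a simplification; real ratios depend heavily on the data. The function name is hypothetical.

```python
def raw_per_logical(copies: int,
                    compression_ratio: float = 1.0,
                    dedup_ratio: float = 1.0) -> float:
    """Raw capacity needed per unit of logical data after data reduction."""
    if copies < 1 or compression_ratio <= 0 or dedup_ratio <= 0:
        raise ValueError("copies must be >= 1 and ratios must be positive")
    # Assumption: the two reduction ratios compound multiplicatively.
    return copies / (compression_ratio * dedup_ratio)


print(raw_per_logical(3))                 # 3 copies, no reduction
print(raw_per_logical(3, 2.0))            # plus ~2:1 compression
print(raw_per_logical(3, 2.0, 5.0))       # plus ~5:1 deduplication
```

Under that assumption, three copies with both reductions applied take less raw space than a single unreduced copy.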
Doesn’t have to be one-size-fits-all
• Modern solutions provide per-volume choice
• Choose protection type based on workload
2-6 copies
512 bytes – 64k
Agnostic | Rack-aware | Datacenter-aware
Block (iSCSI) | NFS
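One way to picture per-volume choice is as a small policy record validated against the ranges shown above. This is a hypothetical sketch: the class, the field names, and the reading of 512 bytes to 64k as a block size range are assumptions, not an actual product API.

```python
from dataclasses import dataclass

AWARENESS = ("agnostic", "rack-aware", "datacenter-aware")
PROTOCOLS = ("block-iscsi", "nfs")


@dataclass(frozen=True)
class VolumePolicy:
    copies: int       # 2-6 copies
    block_size: int   # bytes; assumed to range from 512 to 64 KiB
    awareness: str    # where replicas may be placed
    protocol: str     # how the volume is presented to the app host

    def __post_init__(self):
        if not 2 <= self.copies <= 6:
            raise ValueError("copies must be between 2 and 6")
        if not 512 <= self.block_size <= 64 * 1024:
            raise ValueError("block_size must be between 512 and 65536 bytes")
        if self.awareness not in AWARENESS:
            raise ValueError(f"awareness must be one of {AWARENESS}")
        if self.protocol not in PROTOCOLS:
            raise ValueError(f"protocol must be one of {PROTOCOLS}")


# Two volumes on the same cluster with different protection settings.
archive = VolumePolicy(copies=2, block_size=64 * 1024,
                       awareness="agnostic", protocol="nfs")
database = VolumePolicy(copies=3, block_size=4096,
                        awareness="datacenter-aware", protocol="block-iscsi")
```

Keeping the settings on the volume rather than on the cluster is what lets a latency-sensitive workload and an archive share the same hardware.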
Replication use case: Primary data storage
• Company: Large financial organization
• Situation
– Hosting 500TB of data across four datacenters in two countries
– Want maximum availability and recoverability
• Solution
– Deployed software-defined storage with 4-way replication
• Results
– Achieve high-performance, high-availability, and quick rebuilds
[Figure: Data Center A, Data Center B, Data Center C, Data Center D, with "Active" labels]
Summary
• Protection technologies are evolving along with architectures
• RAID has reached its limits with large-capacity drives
• Erasure coding is a good option for latency-tolerant, large-capacity stores
• Replication provides protection in environments with demanding performance and availability requirements
• Software-defined storage offers choice and flexibility to deploy each protection technology where it makes sense
Thank You!