
Papers We Love Too, June 2015: Haystack


Facebook Haystack

Finding a needle in Haystack: Facebook's photo storage. An Analysis of Facebook Photo Caching

PWL SF - June 30, 2015

Sargun Dhillon (@Sargun)

Agenda
• The Haystack problem
• Design & Architecture
• Takeaways

What is Haystack?

Storage System

Needle Storage & Serving For Facebook

Workload

• Write Once
• Read Often
• Delete Rarely

Why Haystack as a Paper?

Why Not?

Really BIG dataset

20+ Petabytes

120 million new photos a day

Simple

Clever optimizations

Where did Haystack come from?

History

Network Attached Storage (NAS) mounted over NFS

CDNs for low-latency

Pareto Distribution

[Chart: theoretical CDF of image accesses, following a Pareto distribution]

Fetches are expensive
• Multiple seeks: directory metadata, inode, file contents
• File metadata is 10s of kilobytes
• Long tail is uncachable

Decision to Build
• Existing systems unable to be adapted: Hadoop, MySQL, traditional NAS appliances
• Don’t need to solve for the kitchen sink
• Log data
• Development work

Haystack Design

Design Constraints

High Throughput & Low Latency

Cost-effective*

*CDNs are Expensive

Simple

Separation of concerns

• Haystack Store
• Haystack Cache
• Haystack Directory

Read Path

Write Path

Store

Concerns: Read, Write, Delete Needles

How much cache?

There is no right ratio

Just enough memory for metadata

Store little metadata

Volume Layer
Volume layer above the filesystem

Volumes are append-only

Arranged Into Logical Volumes

Append-only Data File

Indexing

10 bytes of metadata per photo

2-bit overhead
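As rough, illustrative arithmetic (the per-needle figure is from the slide above; the photo count is an assumption): at ~10 bytes of in-memory metadata per needle, an index covering 1 billion needles is only about 10 GB of RAM, which is why a store machine can keep its whole index in memory and serve a read with at most one disk operation.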

Read by <Key, Alt Key, Cookie>

Checks cookie for security

Modifications are appends

Deletions change offset to 0

Compaction for reclamation
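As a concrete illustration of the points above, here is a minimal Go sketch of a store machine's in-memory volume index; all names, types, and field widths are hypothetical, not Haystack's actual layout:

package haystacksketch

import "errors"

// needleMeta is the small in-memory record kept per photo (illustrative layout).
type needleMeta struct {
	cookie uint32 // random value embedded in URLs; stops attackers from guessing valid requests
	offset uint64 // byte offset of the needle in the append-only volume file; 0 means deleted
	size   uint32 // size of the needle's data in bytes
}

// needleKey identifies a photo within a volume.
type needleKey struct {
	key    uint64 // photo key
	altKey uint32 // alternate key, e.g. which thumbnail size
}

// volumeIndex maps <key, alt key> to a needle's location in the volume file.
type volumeIndex struct {
	needles map[needleKey]needleMeta
}

var errNotFound = errors.New("needle missing, deleted, or cookie mismatch")

// Read looks up a needle and checks the cookie before returning its location.
func (v *volumeIndex) Read(key uint64, altKey, cookie uint32) (needleMeta, error) {
	m, ok := v.needles[needleKey{key, altKey}]
	if !ok || m.offset == 0 || m.cookie != cookie {
		return needleMeta{}, errNotFound
	}
	return m, nil
}

// Write records a needle that was just appended to the volume file; modifying a
// photo simply appends a new needle and points the index at the newer copy.
func (v *volumeIndex) Write(key uint64, altKey uint32, m needleMeta) {
	v.needles[needleKey{key, altKey}] = m
}

// Delete marks a needle deleted by zeroing its offset; the disk space is only
// reclaimed later, when compaction rewrites the volume without deleted needles.
func (v *volumeIndex) Delete(key uint64, altKey uint32) {
	k := needleKey{key, altKey}
	if m, ok := v.needles[k]; ok {
		m.offset = 0
		v.needles[k] = m
	}
}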

Batch Upload

Similar to Bitcask & CDB

Uses XFS

Volumes preallocated

Fault Tolerance

Pitchfork: generates artificial load

Checksum verified on compaction

Directory marks volumes offline

Recovery: Rsync*

* With QoS

Restore: Multiple Replicas

The Hardware

OCP: Open Compute Project

Open Vault: KNOX

12x3TB SATA in RAID6

RAID Controller with NVRAM

Only Writes Cached

Good at reads xor writes, not both at once

Latencies Table (benchmarked with Haystress)

                     Read Throughput   Avg. Read      Write Throughput   Avg. Write
                     (images/s)        Latency (ms)   (images/s)         Latency (ms)
Only Reads           770.6             33.2           -                  -
Only Writes          -                 -              6099.4             4.9
Multiwrite (x16)     -                 -              10843.8            43.9
Reads and Writes     718.1             41.6           232.0              11.9

“Known Unknowns” and “Unknown Unknowns”

Haystack Store
• Responsibilities: read needles, write needles
• Append-only
• O(1) read cost*

*Usually

Cache

Concerns: Caching

Organized as DHT

Not just an LRU

Two caching rules: cache a photo only if
• The request isn’t from a CDN
• The request is to a write-enabled store

Haystack Cache

• Simple cache
• Optimizations, given access patterns
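A tiny sketch of those two admission rules as a predicate (the function and parameter names are assumptions, not Facebook's code); a photo is cached only when both conditions hold:

package cachesketch

// shouldCache applies the two admission rules: a request that already missed in a
// CDN is unlikely to hit in this cache, so skip it; and cache only photos served
// from a write-enabled store, since recently written photos are the ones most
// likely to be read again soon.
func shouldCache(requestFromCDN, storeIsWriteEnabled bool) bool {
	return !requestFromCDN && storeIsWriteEnabled
}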

Directory

The Rug

Concerns: Mapping, Load Balancing, CDN Management, Directing

Maps logical volumes to physical machines

Mapping based on business rules

Load balances reads

Directs writes to the relevant logical volume

Directs reads away from CDN

Directory
• Manages capacity
• Manages volume mapping
• Manages image mapping
• Manages CDN
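A minimal Go sketch of those Directory responsibilities; the structures, names, and the random load balancing below are assumptions for illustration only:

package directorysketch

import "math/rand"

// directory holds the mappings the Haystack Directory is responsible for (illustrative).
type directory struct {
	// logical volume id -> the physical store machines holding its replicas
	replicas map[int][]string
	// logical volumes that still have capacity and accept writes
	writeEnabled []int
}

// pickReadReplica load-balances a read across the replicas of a logical volume.
func (d *directory) pickReadReplica(logicalVolume int) string {
	machines := d.replicas[logicalVolume]
	return machines[rand.Intn(len(machines))]
}

// pickWriteVolume balances write load across the write-enabled logical volumes;
// a volume leaves this list once it fills up and becomes read-only.
func (d *directory) pickWriteVolume() int {
	return d.writeEnabled[rand.Intn(len(d.writeEnabled))]
}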

Tying it together

Writes

Write-path
• Involves: Store, Directory, Smart client (sketched below)
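Under those assumptions, the write path might look roughly like this (hypothetical interfaces; the real system also handles failures and replication details not shown here):

package writesketch

// store stands in for a Haystack store machine (hypothetical interface).
type store interface {
	AppendNeedle(logicalVolume int, key uint64, altKey, cookie uint32, data []byte) error
}

// directory stands in for the Haystack Directory (hypothetical interface).
type directory interface {
	// PickWriteVolume returns a write-enabled logical volume and its physical replicas.
	PickWriteVolume() (logicalVolume int, replicas []store)
}

// uploadPhoto sketches the write path: the client asks the Directory for a
// write-enabled logical volume, then appends the needle to every physical replica.
func uploadPhoto(d directory, key uint64, altKey, cookie uint32, data []byte) error {
	lv, replicas := d.PickWriteVolume()
	for _, s := range replicas {
		if err := s.AppendNeedle(lv, key, altKey, cookie, data); err != nil {
			return err
		}
	}
	return nil
}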

Reads

Detour: URLs

Directory uses URLs for directing

http://⟨CDN⟩/⟨Cache⟩/⟨Machine id⟩/⟨Logical volume, Photo⟩

URL Makeup
• CDN
• Cache Node
• Machine ID
• Logical Volume ID
• Photo ID & Alt ID
• Cookie

Each tier strips its own component from the URL, left to right
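A small sketch of building such a URL (the exact path layout and how the cookie is encoded are assumptions; the slide only gives the component order):

package urlsketch

import "fmt"

// buildPhotoURL assembles the layered URL; each tier (CDN, then Cache, then the
// store machine) strips its own component off the front before forwarding the request.
// Example: buildPhotoURL("cdn.example.com", "cache7", 42, 9, 123456789, 0xBEEF)
func buildPhotoURL(cdn, cache string, machineID, logicalVolume int, photoKey uint64, cookie uint32) string {
	return fmt.Sprintf("http://%s/%s/%d/%d/%d-%d",
		cdn, cache, machineID, logicalVolume, photoKey, cookie)
}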

Read-path
• Involves: Directory, Cache, CDN

Insights

Narrow Scope
“That simplicity let us build and deploy a working system in a few months instead of a few years.”

Sometimes you’re solving the wrong problem

Smart Clients & Ecosystem Control

Simple Optimizations

Open Source Implementation: WeedFS

Thanks & Qs

