EVCache & Moneta (GoSF)


GoSF July 20, 2016

Ephemeral Volatile Cache

Clustered memcached optimized for AWS and tuned for Netflix use cases.

What is EVCache?
● Distributed Memcached
● Tunable Replication
● Highly Resilient
● Topology Aware
● Data Chunking
● Additional Functionality

Ephemeral Volatile Cache
[Diagram: a home page request served from the cache]

Why Optimize for AWS
● Instances disappear
● Zones disappear
● Regions can "disappear" (Chaos Kong)
● These do happen (and we test all the time)
● Network can be lossy
  ○ Throttling
  ○ Dropped packets
● Customer requests move between regions

EVCache Use @ Netflix
● 70+ distinct EVCache clusters
● Used by 500+ microservices
● Data replicated over 3 AWS regions
● Over 1.5 million replications per second
● 65+ billion objects
● Tens of millions of ops/second (trillions per day)
● 170+ terabytes of data stored
● Clusters from 3 to hundreds of instances
● 11000+ memcached instances of varying size

Architecture
[Diagram: a client application (with the client library and EVCache client) talking to server instances, each running memcached plus Prana (sidecar) and monitoring & other processes, with Eureka for service discovery]

Architecture
● Complete bipartite graph between clients and servers
● Sets fan out, gets prefer closer servers
● Multiple full copies of data
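A minimal Go sketch of that topology (illustrative, with hypothetical replica and client types, not the real EVCache client): sets fan out to a full copy in every zone, and gets try the local zone before falling back.

```go
package main

import (
	"errors"
	"fmt"
)

// replica is a hypothetical handle to one full copy of the data
// (in practice, a whole server group in one availability zone).
type replica struct {
	zone string
	data map[string]string
}

type client struct {
	localZone string
	replicas  []*replica // complete bipartite graph: every client sees every replica
}

// Set fans out to every zone so each replica holds a full copy.
func (c *client) Set(key, value string) {
	for _, r := range c.replicas {
		r.data[key] = value
	}
}

// Get prefers the replica in the client's own zone and falls back
// to the other zones only on a miss.
func (c *client) Get(key string) (string, error) {
	for _, r := range c.replicas {
		if r.zone == c.localZone {
			if v, ok := r.data[key]; ok {
				return v, nil
			}
		}
	}
	for _, r := range c.replicas {
		if v, ok := r.data[key]; ok {
			return v, nil
		}
	}
	return "", errors.New("miss in all zones")
}

func main() {
	zones := []string{"us-west-2a", "us-west-2b", "us-west-2c"}
	var replicas []*replica
	for _, z := range zones {
		replicas = append(replicas, &replica{zone: z, data: map[string]string{}})
	}
	c := &client{localZone: "us-west-2a", replicas: replicas}
	c.Set("movie:123", "metadata")
	v, _ := c.Get("movie:123")
	fmt.Println(v)
}
```

The local-zone preference on reads is what keeps latency low; the fan-out on writes is what gives every zone a full copy to read from.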

[Diagram: Reading, each client reads from the replica in its own zone (us-west-2a, us-west-2b, us-west-2c)]

[Diagram: Writing, each client writes to the replicas in all three zones]

Use Case: Lookaside Cache
[Diagram: data flow between an application (microservice), its service client library, EVCache client, and Ribbon client, the EVCache servers, and the backing service]
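The lookaside pattern itself is simple. A minimal Go sketch (fetchFromService and the map-backed cache are stand-ins, not EVCache or Ribbon APIs):

```go
package main

import "fmt"

// cache stands in for the EVCache client.
type cache map[string]string

// fetchFromService stands in for the call to the backing
// microservice (made via the Ribbon client in the diagram).
func fetchFromService(key string) string {
	return "value-for-" + key
}

// lookasideGet reads the cache first; on a miss it fetches from the
// service of record and populates the cache for the next reader.
func lookasideGet(c cache, key string) string {
	if v, ok := c[key]; ok {
		return v // cache hit
	}
	v := fetchFromService(key) // cache miss
	c[key] = v                 // write back so the next read is a hit
	return v
}

func main() {
	c := cache{}
	fmt.Println(lookasideGet(c, "profile:42")) // miss, fills cache
	fmt.Println(lookasideGet(c, "profile:42")) // hit
}
```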

Use Case: Primary Store
[Diagram: offline/nearline services precompute recommendations and write them into EVCache; online client applications read them through the client library]

Data Flow

Use Case: Transient Data Store
[Diagram: many online client applications, each with the client library and EVCache client, reading and writing shared short-lived data]

Additional Features
● Global data replication
● Secondary indexing (debugging)
● Cache warming (faster deployments)
● Consistency checking

All powered by metadata flowing through Kafka

Cross-Region Replication
[Diagram: (1) the app in Region A mutates data; (2) metadata is sent to Kafka; (3) the Repl Relay polls the message; (4) gets the data for a set; (5) sends the message over HTTPS to the Repl Proxy in Region B; (6) the proxy mutates the Region B cache; (7) the app in Region B reads the data]
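A compact Go sketch of steps 1 through 7, with a buffered channel standing in for the Kafka topic and a plain function call standing in for the HTTPS hop (all names hypothetical):

```go
package main

import "fmt"

// replMetadata is a hypothetical shape for the mutation metadata that
// flows through Kafka: the key and the kind of mutation, not the value.
type replMetadata struct {
	Key string
	Op  string // "set" or "delete"
}

// relay mirrors steps 3-5: poll metadata, fetch the value from the
// local cache for sets (the value does not travel through Kafka),
// and forward the full mutation to the remote region's proxy.
func relay(msgs <-chan replMetadata, local map[string]string, sendToProxy func(replMetadata, string)) {
	for m := range msgs {
		var value string
		if m.Op == "set" {
			value = local[m.Key] // step 4: get data for set
		}
		sendToProxy(m, value) // step 5: send msg to the remote Repl Proxy
	}
}

func main() {
	local := map[string]string{}
	remote := map[string]string{}
	msgs := make(chan replMetadata, 16) // stands in for the Kafka topic

	// Step 6: the remote proxy applies the mutation to its region's cache.
	proxy := func(m replMetadata, value string) {
		switch m.Op {
		case "set":
			remote[m.Key] = value
		case "delete":
			delete(remote, m.Key)
		}
	}

	// Steps 1-2: the app mutates locally and emits metadata.
	local["user:1"] = "row"
	msgs <- replMetadata{Key: "user:1", Op: "set"}
	close(msgs)

	relay(msgs, local, proxy)
	fmt.Println(remote["user:1"]) // step 7: readable in the other region
}
```

Shipping only metadata through Kafka keeps the topic small; the relay pulls the actual value from the local cache at send time.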

Cache Warming (Deployments)
[Diagram: a Cache Warmer, driven by metadata from Kafka, copies data into the newly deployed cluster while the online client application continues serving]

Minimal Code Example

Create an EVCache object:

    EVCache evCache = new EVCache.Builder()
        .setAppName("EVCACHE_TEST")
        .setCachePrefix("pre")
        .setDefaultTTL(900)
        .build();

Write data:

    evCache.set("key", "value");

Read data:

    evCache.get("key");

Delete data:

    evCache.delete("key");

Failure Resilience in Client
● Operation fast failure
● Tunable read retries
● Read/write queues
● Set with tunable latch
● Async replication through Kafka
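The tunable latch lets a caller wait for only as many replica acks as it needs before moving on. A hedged Go sketch with simulated writes, not the EVCache client's actual latch API:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// setWithLatch issues a set to every replica asynchronously and
// returns once `quorum` of them have acked, without waiting for
// the stragglers. The replica writes are simulated with sleeps.
func setWithLatch(replicas, quorum int) bool {
	acks := make(chan bool, replicas)
	for i := 0; i < replicas; i++ {
		go func() {
			time.Sleep(time.Duration(rand.Intn(20)) * time.Millisecond) // simulated write
			acks <- true
		}()
	}
	for got := 0; got < quorum; got++ {
		if ok := <-acks; !ok {
			return false
		}
	}
	return true // quorum reached; remaining writes finish in the background
}

func main() {
	// Wait for 2 of 3 zone replicas before declaring the set successful.
	fmt.Println(setWithLatch(3, 2))
}
```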

Moneta
Next-gen EVCache Server

I.e. why I'm talking about this at a Go meetup

Moneta
Moneta: The Goddess of Memory

Juno Moneta: The Protectress of Funds for Juno

● Evolution of the EVCache server
● EVCache on SSD
● Cost optimization
● Ongoing lower EVCache cost per stream
● Takes advantage of global request patterns

Old Server
● Stock memcached and Prana (Netflix sidecar)
● Solid, worked for years
● All data stored in RAM in memcached
● Became more expensive with expansion / N+1 architecture

[Diagram: old server instance, memcached alongside Prana and metrics & other processes]

Optimization
● Global data means many copies
● Access patterns are heavily region-oriented
● In one region:
  ○ Hot data is used often
  ○ Cold data is almost never touched
● Keep hot data in RAM, cold data on SSD
● Size RAM for working set, SSD for overall dataset
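A toy Go sketch of the resulting two-level read path (illustrative names only; in the real server, L1 is memcached in RAM and L2 is Mnemonic on SSD): hits come from RAM, misses fall through to SSD and are promoted.

```go
package main

import "fmt"

// tieredStore is an illustrative two-level store: l1 stands in for
// the RAM working set, l2 for the full dataset on SSD.
type tieredStore struct {
	l1, l2 map[string]string
}

// Get serves hot data from RAM and falls back to SSD for cold data,
// promoting the record into L1 so the next read is a RAM hit.
func (s *tieredStore) Get(key string) (string, bool) {
	if v, ok := s.l1[key]; ok {
		return v, true // hot: served from RAM
	}
	if v, ok := s.l2[key]; ok {
		s.l1[key] = v // promote cold data into the working set
		return v, true
	}
	return "", false
}

// Set writes through to both levels; a real implementation would
// also evict from l1 to keep it sized to the working set.
func (s *tieredStore) Set(key, value string) {
	s.l1[key] = value
	s.l2[key] = value
}

func main() {
	s := &tieredStore{l1: map[string]string{}, l2: map[string]string{"cold": "x"}}
	fmt.Println(s.Get("cold")) // SSD hit, promoted to RAM
	fmt.Println(s.Get("cold")) // now a RAM hit
}
```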

New Server
● Adds Rend and Mnemonic
● Still looks like memcached
● Unlocks cost-efficient storage & server-side intelligence

[Diagram: new server instance, Rend fronts external traffic and talks internally to Memcached (RAM) and Mnemonic (SSD), alongside Prana and metrics & other processes]

https://github.com/netflix/rend

go get github.com/netflix/rend

Rend

Rend
● High-performance memcached proxy & server
● Written in Go
  ○ Powerful concurrency primitives
  ○ Productive and fast
● Manages the L1/L2 relationship
● Server-side data chunking
● Tens of thousands of connections
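For flavor, here is a toy Go server speaking a get/set subset of the memcached text protocol with one goroutine per connection. This is the concurrency shape that makes tens of thousands of connections cheap in Go, not Rend's actual code:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"strconv"
	"strings"
	"sync"
)

var (
	mu    sync.RWMutex
	store = map[string][]byte{}
)

func handle(conn net.Conn) {
	defer conn.Close()
	r := bufio.NewReader(conn)
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			return
		}
		parts := strings.Fields(line)
		if len(parts) == 0 {
			continue
		}
		switch parts[0] {
		case "set": // set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
			if len(parts) < 5 {
				fmt.Fprint(conn, "ERROR\r\n")
				continue
			}
			n, _ := strconv.Atoi(parts[4])
			data := make([]byte, n+2) // value plus trailing \r\n
			if _, err := io.ReadFull(r, data); err != nil {
				return
			}
			mu.Lock()
			store[parts[1]] = data[:n]
			mu.Unlock()
			fmt.Fprint(conn, "STORED\r\n")
		case "get": // get <key>\r\n
			mu.RLock()
			v, ok := store[parts[1]]
			mu.RUnlock()
			if ok {
				fmt.Fprintf(conn, "VALUE %s 0 %d\r\n%s\r\n", parts[1], len(v), v)
			}
			fmt.Fprint(conn, "END\r\n")
		default:
			fmt.Fprint(conn, "ERROR\r\n")
		}
	}
}

func main() {
	ln, err := net.Listen("tcp", ":11211")
	if err != nil {
		panic(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		go handle(conn) // one cheap goroutine per client connection
	}
}
```

A goroutine per connection keeps the handler logic sequential and readable while the Go runtime multiplexes all the connections onto a few OS threads.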

Rend
● Modular to allow future changes / expansion of scope
  ○ Set of libraries and a default main()
● Manages connections, request orchestration, and backing stores
● Low-overhead metrics library
● Multiple orchestrators
● Parallel locking for data integrity

[Diagram: Rend internals: connection management feeding a server loop, with protocol handling, request orchestration, and backend handlers, instrumented with metrics throughout]

Moneta in Production
● Serving some of our most important personalization data
● Rend runs with two ports:
  ○ One for regular users (read heavy or active management)
  ○ Another for "batch" uses: replication and precompute
● Maintains working set in RAM
● Optimized for precomputes
  ○ Smartly replaces data in L1

[Diagram: production Moneta server: Rend exposes standard and batch ports externally and talks internally to Memcached and Mnemonic, alongside Prana and metrics & other processes]

Mnemonic
Open source Soon™

Mnemonic
● Manages data storage to SSD
● Reuses Rend server libraries
  ○ Handles the memcached protocol
● Core logic maps memcached operations onto RocksDB

Mnemonic Stack
[Diagram: top to bottom: Rend Server Core Lib (Go) → Mnemonic Op Handler (Go) → Mnemonic Core (C++) → RocksDB]

Why RocksDB for Moneta
● Fast at medium to high write load
  ○ Goal: 99th percentile read latency of ~20-25ms
● LSM tree design minimizes random writes to SSD
  ○ Data writes are buffered in memtables
● SST: Static Sorted Table

[Diagram: records buffered in memtables, then flushed to SST files]
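A toy Go model of that write path (not RocksDB's implementation): writes accumulate in an in-memory memtable and are flushed as sorted, immutable runs, so the SSD only ever sees large sequential writes instead of per-record random ones.

```go
package main

import (
	"fmt"
	"sort"
)

// lsm is a toy LSM write path: writes land in a memtable and are
// flushed in sorted batches, one immutable SST per flush.
type lsm struct {
	memtable map[string]string
	limit    int
	ssts     [][]string // each flushed SST is an immutable sorted run of keys
}

func (l *lsm) Put(key, value string) {
	l.memtable[key] = value
	if len(l.memtable) >= l.limit {
		l.flush()
	}
}

func (l *lsm) flush() {
	keys := make([]string, 0, len(l.memtable))
	for k := range l.memtable {
		keys = append(keys, k)
	}
	sort.Strings(keys) // SSTs are sorted: "Static Sorted Table"
	l.ssts = append(l.ssts, keys)
	l.memtable = map[string]string{}
}

func main() {
	l := &lsm{memtable: map[string]string{}, limit: 3}
	for _, k := range []string{"b", "a", "c", "e", "d", "f"} {
		l.Put(k, "v")
	}
	fmt.Println(l.ssts) // [[a b c] [d e f]]
}
```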

How We Use RocksDB
● FIFO "compaction"
  ○ More suitable for our precompute use cases
  ○ Level compaction generated too much traffic to SSD
● Bloom filters and indices kept in memory
● Records sharded across many RocksDBs per instance
  ○ Reduces the number of SST files checked, decreasing latency

[Diagram: the Mnemonic Core Lib shards keys (e.g. ABC, XYZ) across many RocksDB instances]
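Shard selection can be as simple as hashing the key. A Go sketch (the shard count and FNV hash are illustrative choices, not Mnemonic's actual scheme):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor picks which of the per-instance RocksDB shards owns a key.
// Spreading keys across many small databases keeps each one's SST
// file count low, which is what bounds read latency.
func shardFor(key string, numShards uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % numShards
}

func main() {
	const shards = 8 // illustrative; the real shard count is a tuning choice
	for _, k := range []string{"ABC", "XYZ"} {
		fmt.Printf("key %q -> rocksdb shard %d\n", k, shardFor(k, shards))
	}
}
```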

FIFO Limitation
● FIFO compaction not suitable for all use cases
  ○ Very frequently updated records may prematurely push out other valid records
● Future: custom compaction or level compaction

[Diagram: over time, repeated updates to records A and B (A1-A3, B1-B3) fill new SST files, and FIFO compaction drops the oldest SST even though it still holds valid records C through H]

Moneta Performance Benchmark
● 1.7ms 99th percentile read latency
  ○ Server-side latency
  ○ Not using batch port
● Load: 1K writes/sec, 3K reads/sec
  ○ Reads have 10% misses
● Instance type: i2.xlarge

Open Source
https://github.com/netflix/EVCache

https://github.com/netflix/rend

Thank You

@sgmansfield
smansfield@netflix.com