
RedisConf17- Using Redis at scale @ Twitter

Transcript
Page 1: RedisConf17- Using Redis at scale @ Twitter

Nighthawk

Distributed caching with Redis @

Twitter

Rashmi Ramesh @rashmi_ur

Page 2: RedisConf17- Using Redis at scale @ Twitter

Agenda

What is Nighthawk?

How does it work?

Scaling out

High availability

Current challenges

Page 3: RedisConf17- Using Redis at scale @ Twitter

Nighthawk - cache-as-a-service

Runs Redis at its core

> 10M QPS

Largest cluster runs ~3K Redis nodes

> 10TB of data

Page 4: RedisConf17- Using Redis at scale @ Twitter

Who uses Nighthawk?

Some of our biggest customers:

Analytics services - Ads, Video

Ad serving

Ad Exchange

Direct Messaging

Mobile app conversion tracking

Page 5: RedisConf17- Using Redis at scale @ Twitter

Design Goals

Scalable: scale vertically and horizontally

Elastic: add / remove instances without violating SLA

High throughput and low latencies

High availability in the event of machine failures

Topology agnostic client

Page 6: RedisConf17- Using Redis at scale @ Twitter

Nighthawk Architecture

[Diagram: Client -> Proxy/Routing layer -> Backend 0 ... Backend N, each backend running Redis instances Redis 0 ... Redis N; a Cluster manager maintains the Topology.]

Page 7: RedisConf17- Using Redis at scale @ Twitter

Cache backend

[Diagram: each cache backend is a Mesos container running Redis nodes 1..N, an NM, and a topology watcher/announcer. The Proxy/Router maps replicas to Redis nodes (Replica 1 -> Redis1, Replica 2 -> Redis2, Replica 3 -> Redis3), and the topology records each node as Redis1(dc, host, port1, capacity), Redis2(dc, host, port2, capacity), Redis3(dc, host, port3, capacity).]
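The proxy resolves a client key in two hops: key -> partition, then (partition, replica) -> Redis node, using the (dc, host, port, capacity) records announced into the topology. A minimal sketch of that lookup; the hash function, table contents, and all names are illustrative assumptions, not Nighthawk's actual code:

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class RedisNode:
    # Topology record, as on the slide: (dc, host, port, capacity)
    dc: str
    host: str
    port: int
    capacity: int

NUM_PARTITIONS = 100  # illustrative; the scaling slides use 100 partitions

def partition_of(key: str) -> int:
    """Hash a key to a partition id (hash choice is an assumption)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_PARTITIONS

# Hypothetical routing table kept by the proxy: (partition, replica) -> node.
ROUTING: dict[tuple[int, int], RedisNode] = {
    (5, 1): RedisNode("dc1", "backend0.local", 7001, 4096),
    (5, 2): RedisNode("dc1", "backend7.local", 7002, 4096),
}

def route(key: str, replica: int) -> RedisNode:
    """Resolve a key to the Redis node holding the requested replica."""
    return ROUTING[(partition_of(key), replica)]
```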

Page 8: RedisConf17- Using Redis at scale @ Twitter

Cluster manager

Manages topology membership and changes

- (Re)balances replicas

- Reacts to topology changes, e.g. a dead node

- Replicated cache - ensures the 2 replicas of a partition land on separate failure domains
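The "separate failure domains" rule amounts to a placement check: when picking the RF = 2 backends for a partition, skip any candidate that shares a host or rack with a replica already chosen. A rough sketch under that reading; the Backend model and function are assumptions, not the real cluster manager:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Backend:
    host: str
    rack: str  # failure domain

def place_replicas(partition: str, backends: list[Backend], rf: int = 2) -> list[Backend]:
    """Pick `rf` backends for a partition without reusing a host or rack."""
    chosen: list[Backend] = []
    for candidate in backends:
        if any(candidate.host == c.host or candidate.rack == c.rack for c in chosen):
            continue  # shares a failure domain with an already-placed replica
        chosen.append(candidate)
        if len(chosen) == rf:
            return chosen
    raise RuntimeError(f"not enough failure domains for partition {partition}")
```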

Page 9: RedisConf17- Using Redis at scale @ Twitter

Redis databases for partitions

Partition -> Redis DB

Granular key remapping

Logical data isolation

Enumeration - Redis DB SCAN

Deletion - FLUSHDB

Enables replica rehydration

[Diagram: keys K1-K4 grouped into Partition X and Partition Y, which map to Redis DBs 1 and 2.]
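Giving each partition its own Redis database is what makes per-partition enumeration (a SCAN confined to one DB) and deletion (FLUSHDB) cheap, which in turn is what lets a replica be rehydrated partition by partition. A minimal sketch assuming the redis-py client; the partition-to-DB numbering and hosts are illustrative:

```python
import redis

def partition_db(partition_id: int) -> int:
    # Illustrative mapping: one logical Redis DB per partition on a node.
    return partition_id

def enumerate_partition(host: str, port: int, partition_id: int):
    """Yield every key of one partition by scanning only its Redis DB."""
    r = redis.Redis(host=host, port=port, db=partition_db(partition_id))
    yield from r.scan_iter(count=1000)

def drop_partition(host: str, port: int, partition_id: int) -> None:
    """Delete a whole partition in one call without touching its neighbours."""
    r = redis.Redis(host=host, port=port, db=partition_db(partition_id))
    r.flushdb()
```

Note that stock Redis ships with 16 logical databases, so a node hosting more partitions than that would need the `databases` setting raised in redis.conf.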

Page 10: RedisConf17- Using Redis at scale @ Twitter

Scaling

Page 11: RedisConf17- Using Redis at scale @ Twitter

Scaling out with Client/Proxy managed partitioning

Key count: 1.5M keys

[Diagram: Client routes keys across 3 cache nodes, 500K keys each.]

Page 12: RedisConf17- Using Redis at scale @ Twitter

Scaling out with Client/Proxy managed partitioning

Key count: 1.5M keys

Remapped keys: 600K

[Diagram: Client now routes across 5 cache nodes, 300K keys each, backed by persistent storage.]
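With client/proxy managed partitioning, growing from 3 to 5 nodes changes which node each key hashes to, so a large slice of the 1.5M keys (600K on the slide) lands elsewhere and must be refilled from persistent storage. A toy illustration of that effect using rendezvous hashing - an assumption, since the slides do not name the hashing scheme:

```python
import hashlib

def owner(key: str, nodes: list[str]) -> str:
    # Rendezvous (highest-random-weight) hashing: purely illustrative.
    return max(nodes, key=lambda n: hashlib.md5(f"{n}:{key}".encode()).hexdigest())

old_nodes = [f"cache-{i}" for i in range(3)]
new_nodes = [f"cache-{i}" for i in range(5)]

keys = [f"key:{i}" for i in range(150_000)]  # scaled-down stand-in for 1.5M keys
moved = sum(1 for k in keys if owner(k, old_nodes) != owner(k, new_nodes))
print(f"{moved}/{len(keys)} keys remapped")  # roughly 2/5, i.e. ~600K of 1.5M
```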

Page 13: RedisConf17- Using Redis at scale @ Twitter

Scaling out with Cluster manager

Key count: 1.5M keys

Partition count: 100

Keys/Partition: 15K

[Diagram: Client -> Proxy -> 3 cache nodes, 500K keys each; topology and cluster manager; persistent storage.]

Page 14: RedisConf17- Using Redis at scale @ Twitter

Scaling out with Cluster manager

Key count: 1.5M keys

Partition count: 100

Keys/Partition: 15K

[Diagram: a new node joins and a 15K-key partition moves to it; node key counts: 500K, 485K, 500K, 15K.]

Page 15: RedisConf17- Using Redis at scale @ Twitter

Scaling out with Cluster manager

Key count: 1.5M keys

Partition count: 100

Keys/Partition: 15K

[Diagram: 15K-key partitions continue to migrate; node key counts: 500K, 485K, 485K, 15K, 15K.]

Page 16: RedisConf17- Using Redis at scale @ Twitter

Scaling out with Cluster manager - Post balancing

Key count: 1.5M keys

Partition count: 100

Post balancing...

[Diagram: post-balancing key distribution across nodes: 250K, 250K, 250K, 250K, 500K; persistent storage.]
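The cluster-manager path moves data at partition granularity instead: a 15K-key partition is rehydrated on the new node, the topology is updated, and the proxy starts routing that partition there while the client stays oblivious. A rough sketch of that control loop, with illustrative names and structures:

```python
KEYS_PER_PARTITION = 15_000  # 1.5M keys / 100 partitions, as on the slides

def move_partition(topology: dict[int, str], partition: int, dst: str) -> None:
    """Cluster-manager-style move of one partition (~15K keys): rehydrate the
    partition's Redis DB on the destination, then publish the updated topology
    so the proxy reroutes; clients never change."""
    src = topology[partition]
    # (1) copy the partition's data to dst (from src or persistent storage)
    # (2) flip the mapping and announce the new topology
    topology[partition] = dst
    print(f"partition {partition}: {src} -> {dst} ({KEYS_PER_PARTITION} keys)")

# Illustrative: 100 partitions over 3 backends; a 4th backend joins and
# partitions are drained onto it one at a time until load is even again.
topology = {p: f"backend-{p % 3}" for p in range(100)}
for p in range(0, 100, 4):          # move every 4th partition to the new node
    move_partition(topology, p, "backend-3")
```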

Page 17: RedisConf17- Using Redis at scale @ Twitter

Advantages over Client managed partitioning

- Thin client - simple and oblivious to topology

- Clients, proxy layer and backends scale independently

- Pluggable custom load balancing logic through cluster manager

- No cluster downtime during scaling out/up/back

Page 18: RedisConf17- Using Redis at scale @ Twitter

High Availability

Page 19: RedisConf17- Using Redis at scale @ Twitter

High Availability with Replication

Synchronous, best effort

RF = 2, Intra DC

Supports idempotent operations only - get, put, remove, count, scan

Copies of a partition are never on the same host or rack

Passive warming for failed/restarted replicas
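"Synchronous, best effort" with RF = 2 means the write path fans out to both replicas of a partition and tolerates one of them failing; because the supported operations are idempotent, a replica that missed a write can simply be overwritten or rehydrated later. A rough sketch of that fan-out assuming the redis-py client; hosts, ports and the DB choice are illustrative:

```python
import redis

def replicated_put(replicas: list[redis.Redis], key: str, value: str) -> int:
    """Write to every replica synchronously, best effort: a failing replica is
    skipped instead of failing the request. Returns how many replicas acked."""
    acks = 0
    for r in replicas:
        try:
            r.set(key, value)
            acks += 1
        except redis.RedisError:
            pass  # best effort: the failed/warming replica is rehydrated later
    return acks

# RF = 2: two replicas of the partition, in separate failure domains in the DC.
pool_a = redis.Redis(host="backend-a.local", port=7001, db=5)
pool_b = redis.Redis(host="backend-b.local", port=7002, db=5)
replicated_put([pool_a, pool_b], "k1", "v1")
```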

Page 20: RedisConf17- Using Redis at scale @ Twitter

High Availability with Replication

[Diagram: Client -> Proxy/Routing layer, with Topology/Cluster manager. Pool A: Backend 0 holding partitions 2, 5, 9 (SERVING). Pool B: Backend N holding partitions 12, 5, 10 (FAILED), replaced by Backend N* (WARMING). GetKey and SetKey requests for a key in partition 5 are routed to partition 5's replicas in each pool.]

Page 21: RedisConf17- Using Redis at scale @ Twitter

Current challenges

Page 22: RedisConf17- Using Redis at scale @ Twitter

Remember this?

The most retweeted

Tweet of 2014!

Page 23: RedisConf17- Using Redis at scale @ Twitter

Hot key symptom

Significantly high QPS to a single cache server

Page 24: RedisConf17- Using Redis at scale @ Twitter

Hot Key Mitigation

Server side diagnostics:

Sampling a small % of requests and logging

Post processing the logs to identify high frequency keys

Client side solution:

Client side hot key detection and caching

Better to have:

Redis tracks the hot keys

Protocol support to send feedback to client if a key is hot
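The server-side diagnostic boils down to sampling a small percentage of requests, logging the sampled keys, and post-processing the logs for high-frequency keys; the client-side mitigation then caches whatever it detects as hot. A minimal sketch of the sampling and counting half, with illustrative thresholds:

```python
import random
from collections import Counter

SAMPLE_RATE = 0.01           # log roughly 1% of requests (illustrative)
sampled_keys: list[str] = []

def record_request(key: str) -> None:
    """Server side: sample a small % of requests for offline analysis."""
    if random.random() < SAMPLE_RATE:
        sampled_keys.append(key)

def hot_keys(threshold: int = 1000) -> list[str]:
    """Post-processing: any key seen this often in the sample counts as hot."""
    counts = Counter(sampled_keys)
    return [k for k, c in counts.items() if c >= threshold]
```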

Page 25: RedisConf17- Using Redis at scale @ Twitter

Active warming of replicas

[Diagram: Client -> Proxy/Routing layer, with Topology/Cluster manager. Pool A: Backend A holding partitions 2, 5, 9 (SERVING). Pool B: Backend B* holding partitions 12, 5, 10 (WARMING). A Bootstrapper feeds writes into the warming replica.]
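Active warming, the direction this slide points at, would have the bootstrapper copy a partition from a serving replica into the warming one while live writes keep fanning out to both; the per-partition Redis DBs from earlier make that a SCAN of a single DB. A sketch assuming the redis-py client; hosts, ports and the partition-to-DB mapping are illustrative:

```python
import redis

def warm_replica(serving: redis.Redis, warming: redis.Redis) -> None:
    """Bootstrapper-style rehydration: stream one partition's keys from the
    SERVING replica's per-partition Redis DB into the WARMING replica.
    Ongoing writes are idempotent and fan out to both, so overlap is safe."""
    for key in serving.scan_iter(count=1000):
        payload = serving.dump(key)              # serialized value, any type
        if payload is not None:
            warming.restore(key, 0, payload, replace=True)

# Both connections select the Redis DB that holds partition 5 (illustrative).
serving = redis.Redis(host="backend-a.local", port=7001, db=5)
warming = redis.Redis(host="backend-b-new.local", port=7002, db=5)
warm_replica(serving, warming)
```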

Page 26: RedisConf17- Using Redis at scale @ Twitter

Questions?

