Timelines at scale

Post on 07-Dec-2014

description

Raffi Krikorian explains the architecture used by Twitter to deal with 300K queries per second - tweets, social graph mutations, and direct messages

transcript

timelines at scale

@raffi, QCon SF 2012

            Pull                              Push
Targeted    twitter.com, home_timeline API    User / Site Streams, Mobile Push (SMS, etc.)
Queried     Search API                        Track / Follow Streams

the challenge

⇢ > 150M worldwide active users

⇢ > 300K QPS for timelines

⇢ naïve timeline “materialization” can be slow
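
To illustrate why naïve materialization can be slow, here is a minimal sketch of a timeline assembled entirely at read time. The `following` and `user_tweets` stores and the function name are hypothetical stand-ins, not Twitter's implementation; the point is only that each read has to consult every followed account.

```python
import heapq
from itertools import islice

# Hypothetical in-memory stand-ins for the social graph and the tweet store.
following = {"raffi": ["nelson", "danadanger"]}
user_tweets = {
    # (tweet_id, text) pairs, newest first; a larger id is assumed to be newer.
    "nelson": [(105, "n2"), (101, "n1")],
    "danadanger": [(104, "d2"), (102, "d1")],
}

def naive_home_timeline(user, limit=3):
    """Build the timeline at read time: one lookup per followed account."""
    per_user = (user_tweets.get(f, []) for f in following.get(user, []))
    # Merge the already-sorted per-user lists, newest first.
    merged = heapq.merge(*per_user, key=lambda t: t[0], reverse=True)
    return list(islice(merged, limit))

timeline = naive_home_timeline("raffi")
```

The cost of this read grows with the number of accounts followed, which is the motivation for precomputing timelines on write.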

[Architecture diagram. Components shown: Timeline Service, Ingester, Search Cache (Earlybird), Blender, Push Compute (HTTP Push, Mobile Push), Batch Compute (Hadoop), Write API, Fanout, Timeline Cache (Redis)]

[Architecture diagram repeated, now also showing the Social Graph Service]

insert

⇢ keyed off “recipient”

⇢ pipelined 4k “destinations” at a time

⇢ replicated
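
The batching described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `timelines` dict stands in for the cache, the helper names are hypothetical, and the inner loop marks where a real system would send one pipelined request per batch. Replication is not modeled.

```python
from itertools import islice

BATCH_SIZE = 4000  # "4k destinations at a time", per the slide

def batches(ids, size=BATCH_SIZE):
    """Yield consecutive lists of at most `size` recipient ids."""
    it = iter(ids)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def fanout(tweet_id, follower_ids, timelines):
    """Insert tweet_id into each recipient's timeline, one batch at a time.

    `timelines` is a plain dict keyed by recipient (a stand-in for the cache).
    Returns the number of batches issued.
    """
    sent = 0
    for chunk in batches(follower_ids):
        for recipient in chunk:  # a real system would pipeline this batch
            timelines.setdefault(recipient, []).append(tweet_id)
        sent += 1
    return sent

timelines = {}
num_batches = fanout(42, range(10_000), timelines)
```

With 10,000 hypothetical followers, the sketch issues three batches (4,000, 4,000, and 2,000 recipients).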

using redis

⇢ native list structure

Tweet ID (8 bytes) | Bits (4 bytes) | User ID (8 bytes)
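
A 20-byte entry of this shape can be sketched with Python's `struct` module. The field order, the big-endian byte order, and the sample values are assumptions for illustration; the slide gives only the field names and sizes.

```python
import struct

# Assumed layout: 8-byte tweet id, 4 bytes of flag bits, 8-byte user id.
# ">" selects big-endian with no padding, so the entry is exactly 20 bytes.
ENTRY = struct.Struct(">QIQ")

def pack_entry(tweet_id, bits, user_id):
    """Serialize one timeline entry to bytes."""
    return ENTRY.pack(tweet_id, bits, user_id)

def unpack_entry(raw):
    """Return (tweet_id, bits, user_id) from a packed entry."""
    return ENTRY.unpack(raw)

# Illustrative values only.
raw = pack_entry(tweet_id=1234567890123, bits=0b1, user_id=42)
```

Fixed-width entries keep each cached timeline compact and make entries trivial to slice out of a list value.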

using redis

⇢ native list structure

⇢ RPUSHX to only add to cached timelines
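
Redis's RPUSHX appends to a list only when the key already exists, and returns 0 otherwise. The sketch below imitates that behavior with an in-memory class so it runs without a Redis server. The class and the key names are hypothetical.

```python
class ListCache:
    """Tiny in-memory stand-in for a store with Redis-like list commands."""

    def __init__(self):
        self._lists = {}

    def rpush(self, key, value):
        """Append, creating the list if needed. Returns the new length."""
        self._lists.setdefault(key, []).append(value)
        return len(self._lists[key])

    def rpushx(self, key, value):
        """Append only if the list already exists; otherwise return 0."""
        if key not in self._lists:
            return 0
        self._lists[key].append(value)
        return len(self._lists[key])

    def exists(self, key):
        return key in self._lists

cache = ListCache()
cache.rpush("timeline:active_user", 100)           # this timeline is cached
hit = cache.rpushx("timeline:active_user", 101)    # appended
miss = cache.rpushx("timeline:dormant_user", 101)  # skipped, nothing created
```

The effect is that fanout spends no memory on timelines that are not currently cached.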

[Figure: a timeline drawn as a list of repeated “Tweet ID | Bits | User ID” entries, followed by entries holding only a Tweet ID]

[Architecture diagram detail. Components shown: Timeline Service, Write API, Fanout, Timeline Cache (Redis), TweetyPie, Gizmoduck]

[Architecture diagram, redrawn. Components shown: Push Compute (HTTP Push, Mobile Push), Batch Compute (Hadoop), Search Index (Earlybird), Blender, Timeline Service, Ingester, Write API, Fanout, Timeline Cache (Redis)]

blender

⇢ queries one replica of all indexes

⇢ merges & ranks results
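
The scatter-gather pattern described above can be sketched as follows. The index layout, the scores, and the function name are hypothetical; the slide states only that one replica of every index is queried and that the results are merged and ranked.

```python
import heapq
import random

# Hypothetical layout: each index partition has identical replicas.
# Hits are (score, tweet_id) pairs.
shard_a = {"hello": [(0.9, 11), (0.4, 12)]}
shard_b = {"hello": [(0.7, 21)], "world": [(0.8, 31)]}
index = [[shard_a, shard_a], [shard_b, shard_b]]  # partitions -> replicas

def blend(term, limit=2):
    """Query one replica of every partition, then merge and rank by score."""
    hits = []
    for replicas in index:
        replica = random.choice(replicas)  # any one replica will do
        hits.extend(replica.get(term, []))
    return heapq.nlargest(limit, hits)

top = blend("hello")
```

Because every partition must be consulted, the read cost grows with the number of partitions, while a write touches only one of them.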

[Diagram: Write API and Read API paths into replicated Redis and replicated Earlybird, with an API Cache]

             write    read
Redis        O(n)     O(1)
Earlybird    O(1)     O(n)

the challenge (part #2)

⇢ fanout can be really slow!

⇢ ...especially for high follower counts

@barackobama     23 million followers
@ladygaga        31 million followers
@katyperry       28 million followers
@justinbieber    28 million followers
@raffi           0.019 million followers

there are over 400 million tweets a day

≈ 4600 tweets a second

≈ 0.2 ms a tweet
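
The arithmetic behind these figures, worked in Python. The only input is the 400 million tweets per day from the slide; the slide rounds the results to 4600 tweets a second and 0.2 ms a tweet.

```python
TWEETS_PER_DAY = 400_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY
ms_per_tweet = 1000 / tweets_per_second  # time budget per tweet, in ms

print(f"{tweets_per_second:.0f} tweets/s, {ms_per_tweet:.3f} ms per tweet")
```

At roughly a fifth of a millisecond per tweet, fanning a single tweet out to tens of millions of followers cannot keep pace if it is done serially.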

[Architecture diagram detail. Components shown: Write API, Ingester, Fanout, Search Index (Earlybird), Timeline Cache (Redis)]

search index ⇢[‘hello’,‘world’]

fanout index ⇢[@danadanger, ...]

User Intent               Query Expansion
“Hello, world”            “Hello” AND “world”
@raffi’s home timeline    home_timeline:raffi
@raffi’s home timeline    user_timeline:nelson OR user_timeline:danadanger
@raffi’s home timeline    home_timeline:raffi OR user_timeline:taylorswift13
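
The last expansion combines a precomputed home timeline with the user timeline of a high-follower account at read time. A minimal sketch of that merge is below. The stores, the `high_follower_followees` mapping, and the tweet ids are hypothetical, and ids are assumed to increase over time.

```python
import heapq
from itertools import islice

# Hypothetical stores; each list is newest first.
home_timeline = {"raffi": [108, 105, 101]}     # filled by fanout
user_timeline = {"taylorswift13": [107, 103]}  # not fanned out
high_follower_followees = {"raffi": ["taylorswift13"]}

def expand_query(user):
    """Return the expanded query string, mirroring the slide."""
    terms = [f"home_timeline:{user}"]
    terms += [f"user_timeline:{u}" for u in high_follower_followees.get(user, [])]
    return " OR ".join(terms)

def read_timeline(user, limit=4):
    """Merge the precomputed timeline with user timelines at read time."""
    sources = [home_timeline.get(user, [])]
    sources += [user_timeline.get(u, [])
                for u in high_follower_followees.get(user, [])]
    return list(islice(heapq.merge(*sources, reverse=True), limit))

query = expand_query("raffi")
merged = read_timeline("raffi")
```

The trade-off is a slightly more expensive read in exchange for skipping the most expensive fanout writes.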

[Architecture diagram, shown three times with the Synchronous Path, the Asynchronous Path, and the Query Path labeled. Components shown: Batch Compute (Hadoop), Push Compute (HTTP Push, Mobile Push), Search Index (Earlybird), Blender, Timeline Service, Ingester, Write API, Fanout, Timeline Cache (Redis)]

timeline query statistics

⇢ >150m active users worldwide

⇢ >300k qps poll-based timelines @ 1ms p50 / 4ms p99

⇢ >30k qps search-based timelines

tweet input

⇢ ~400m tweets per day

⇢ ~5K/sec daily average

⇢ ~7K/sec daily peak

⇢ >12K/sec during large events

timeline delivery statistics

⇢ 30b deliveries / day (~21m / min)

⇢ 3.5 seconds @ p50 to deliver to 1m

⇢ ~300k deliveries / sec
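
As a quick consistency check on the delivery figures, the per-minute and per-second rates follow from the daily total. The average deliveries per tweet is a derived number not stated on the slides; it assumes the 30 billion deliveries and ~400 million tweets refer to the same day.

```python
DELIVERIES_PER_DAY = 30_000_000_000
TWEETS_PER_DAY = 400_000_000

per_minute = DELIVERIES_PER_DAY / (24 * 60)       # slide: ~21m / min
per_second = DELIVERIES_PER_DAY / (24 * 60 * 60)  # slide: ~300k / sec

# Derived, not from the slides: average timeline inserts per tweet.
avg_deliveries_per_tweet = DELIVERIES_PER_DAY / TWEETS_PER_DAY
```

The per-second rate comes out somewhat above the slide's rounded ~300k figure.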

thanks!