Real-world queuing strategies - Nosql Search Roadshow

“Hold this for a moment” Real-world queuing strategies

David Dawson Director of Technology

Marcus Kern VP, Technology

Powerful mobile technology that puts ideas in motion – an mCMS and a mobile campaign platform, available as either a self-service or a managed service.

VELTI PLATFORMS

The complete mobile engagement solution. We help brands progress along their mobile roadmap, from fast growth pilots to optimisation of current assets and revenue growth.

MOBILE MARKETING

Cultivate relationships that build excitement through fun and interesting experiences they want to participate in. From on-pack promos to premium competitions.

LARGE SCALE CAMPAIGNS

Rewards-based performance marketing, aimed at increasing customer lifetime value, revenue growth and acquisition of insightful consumer analytics. We provide both the programme and loyalty fulfilment.

LOYALTY MCRM

We build your mCRM engine, which turns opt-in customers into a mobile database and pushes it through the measurement tool so we can show you what you spend and what you gain.

The complete mobile advertising solution. Our own ad network & exchange, equipped with dynamic “real time” analytics of all your mobile activity using our Visualise tool, all under one roof.

VELTI MEDIA

Our instantly available predictive analysis and personalisation tool provides a single view of your brand from all your dispersed data points and overlays sales data in real time so you can manage your mobile campaigns “in action”.

BRAND BLOTTER

WHAT DO WE DO ?

3 | © 2013 Velti @ NoSQL Search Roadshow . Nr Copenhagen

Velti Technologies •  Erlang •  RIAK & leveldb •  Redis •  Ubuntu

•  Ruby on Rails •  Java •  Node.js •  MongoDB •  MySQL

Two parts to this story •  Queuing Strategies

•  Optimizing hardware

Background •  Our apps have relied on messaging + queuing for > 8 years •  Multiple iterations, also combining NoSQL to help scale •  Started with MySQL

–  Operations friendly –  But struggled to scale as our throughput increased

•  What about existing solutions ( e.g. RabbitMQ )? –  Features missing until recently –  More confidence from an iterative approach to our existing design

Building a Robust Queue

[Diagram: Q -> Workers -> Q; Sender (producers) -> Q -> Receiver (consumers)]

•  Reliable + Replicated •  Scheduled jobs + Retries •  High performance ( >10,000 tx/sec ) •  Multiple producers and consumers ( > 100 ) •  Easy to debug + Operations friendly

Test Harness

[Diagram: the harness (configuration, connection pool, producer threads, consumer threads, reporting thread) drives pluggable implementations: MySQL, MySQL + lock, MySQL + Redis]

•  Built using JRuby –  Fast ( HotSpot ) –  Threads without the GIL

•  Pluggable design –  Multiple implementations

•  Configurable variables –  Batch size –  Number of producers and consumers –  Number of iterations

•  Reporting
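The harness described above can be sketched in a few dozen lines. This is a minimal sketch, not the authors' harness. `InMemoryQueue`, `HarnessConfig` and `run_harness` are hypothetical names. Any object with `push` and `pop_batch` can be plugged in, so it stands in for the MySQL, MySQL + lock and MySQL + Redis variants. Reporting is a summary at the end rather than a live reporting thread.

```ruby
# Minimal sketch of a pluggable queue test harness (hypothetical names).
# Any implementation exposing push(payload) and pop_batch(n) can be plugged
# in; InMemoryQueue stands in for the MySQL / MySQL + lock / MySQL + Redis
# variants. Under JRuby these threads run in parallel (no GIL).
class InMemoryQueue
  def initialize
    @items = []
    @lock = Mutex.new
  end

  def push(payload)
    @lock.synchronize { @items << payload }
  end

  # Up to batch_size items; an empty array when the queue is empty.
  def pop_batch(batch_size)
    @lock.synchronize { @items.shift(batch_size) }
  end
end

HarnessConfig = Struct.new(:producers, :consumers, :iterations, :batch_size)

def run_harness(queue, config)
  total = config.producers * config.iterations
  pushed = 0
  popped = 0
  stats = Mutex.new
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  producers = Array.new(config.producers) do |p|
    Thread.new do
      config.iterations.times do |i|
        queue.push(%({"producer":#{p},"seq":#{i}}))
        stats.synchronize { pushed += 1 }
      end
    end
  end

  consumers = Array.new(config.consumers) do
    Thread.new do
      loop do
        batch = queue.pop_batch(config.batch_size)
        finished = stats.synchronize { (popped += batch.size) >= total }
        break if finished
        Thread.pass if batch.empty? # nothing ready yet; let producers run
      end
    end
  end

  (producers + consumers).each(&:join)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  { pushed: pushed, popped: popped, seconds: elapsed,
    per_second: popped / [elapsed, 1e-9].max }
end
```

For example, `run_harness(InMemoryQueue.new, HarnessConfig.new(50, 50, 10_000, 10))` mirrors the 50 producer / 50 consumer configuration used in the talk.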

Implementation #1 •  MySQL only ( Percona Server v5.5 ) •  InnoDB ( XtraDB ) •  Replication •  1 x table ( ‘queue’ )

–  Id ( primary key, auto_inc, int ) –  Worker_id ( int ) –  Process_at ( datetime ) –  Payload ( varchar ) –  Index ( worker_id, process_at )

•  Dedicated hardware –  Harness: HP DL365 ( 12 cores ) –  Mysql: HP DL365 ( 12 cores )

Implementation #1

Multiple write operations (producers):

Insert into queue ( worker_id, process_at, payload ) values ( 0, '2012-01-01 01:01:00', '{ json }' )

Batched update / read operations (consumer with worker_id 123):

Update queue set worker_id=123 where worker_id=0 and process_at <= now() limit 10
Select * from queue where worker_id=123
Update queue set worker_id=-1 where id=2
Update queue set worker_id=-1 where id=3

Table state (id, worker_id) after each step. Every row has process_at = 2012-01-01 01:01:00 and payload = { json }. worker_id 0 = pending, 123 = claimed by worker 123, -1 = finished.

Step                 | row 1 | row 2 | row 3
Start                | -1    |       |
Insert row 2         | -1    | 0     |
Insert row 3         | -1    | 0     | 0
Batch claim by 123   | -1    | 123   | 123
Finish id=2          | -1    | -1    | 123
Finish id=3          | -1    | -1    | -1
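The worker_id state machine above can be modelled in memory, which makes it easy to see what each SQL statement does. This is a sketch under stated assumptions. `TableQueue` and its methods are hypothetical, and each method corresponds to one statement from the slide. No MySQL client is involved.

```ruby
require 'time'

# In-memory model of the single-table queue from Implementation #1, so the
# state machine can be run without MySQL. worker_id encodes the state:
# 0 = pending, a positive id = claimed by that worker, -1 = finished.
# TableQueue and its method names are hypothetical.
Row = Struct.new(:id, :worker_id, :process_at, :payload)

class TableQueue
  PENDING = 0
  FINISHED = -1

  def initialize
    @rows = []
    @next_id = 1 # mimics auto_increment
  end

  # Insert into queue ( worker_id, process_at, payload ) values ( 0, ?, ? )
  def insert(process_at, payload)
    @rows << Row.new(@next_id, PENDING, process_at, payload)
    @next_id += 1
    @rows.last.id
  end

  # Update queue set worker_id=? where worker_id=0 and process_at <= now() limit n
  # Rows are taken in insertion order here; MySQL's UPDATE ... LIMIT without
  # ORDER BY makes no such promise.
  def claim(worker_id, now, limit)
    due = @rows.select { |r| r.worker_id == PENDING && r.process_at <= now }
    due.first(limit).each { |r| r.worker_id = worker_id }.size
  end

  # Select * from queue where worker_id=?
  def claimed_by(worker_id)
    @rows.select { |r| r.worker_id == worker_id }
  end

  # Update queue set worker_id=-1 where id=?
  def finish(id)
    row = @rows.find { |r| r.id == id }
    row.worker_id = FINISHED if row
  end

  # [id, worker_id] pairs, matching the columns tracked in the table above.
  def states
    @rows.map { |r| [r.id, r.worker_id] }
  end
end
```

Replaying the slide means inserting three rows, finishing row 1, claiming with worker 123, then finishing rows 2 and 3. That sequence ends with every row at -1.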

Implementation #1

[Chart: transactions per second (y-axis 0 to 3,000) against time in mm:ss (00:00 to 05:00). Mode 1, producers: 1, consumers: 1; series: pop on, pop off. Total popped on: 500,000; total popped off: 500,000.]

Implementation #1

[Chart: transactions per second (y-axis 0 to 18,000) against time in mm:ss (00:00 to 05:00). Mode 1, producers: 10, consumers: 10; series: pop on, pop off. Total popped on: 500,000; total popped off: 500,000.]

Implementation #1

[Chart: transactions per second (y-axis 0 to 25,000) against time in mm:ss (00:00 to 05:00). Mode 1, producers: 30, consumers: 30; series: pop on, pop off. Total popped on: 500,000; total popped off: 136,587.]

Implementation #1

[Chart: transactions per second (y-axis 0 to 25,000) against time in mm:ss (00:00 to 05:00). Mode 1, producers: 50, consumers: 50; series: pop on, pop off. Total popped on: 304,321; total popped off: 35,579.]

Implementation #1

[Chart: Mode 1 queue 'pop-on' with varying producer levels. Transactions per second (y-axis 0 to 25,000) against time in mm:ss (00:00 to 05:00); series: 1 producer, 10 producers, 30 producers, 50 producers.]

Implementation #2 •  Same MySQL setup as implementation #1 •  But we wrap a lock around the point of most contention ( the batch update ) –  Select get_lock( str, timeout ) –  Select release_lock( str )

Implementation #2 ( mysql + Lock )

Multiple write operations (producers):

Insert into queue ( worker_id, process_at, payload ) values ( 0, '2012-01-01 01:01:00', '{ json }' )

Batched update / read operations (consumer with worker_id 123), with the batch update wrapped in a lock:

Select get_lock( 'queue', -1 )
Update queue set worker_id=123 where worker_id=0 and process_at <= now() limit 10
Select release_lock( 'queue' )
Select * from queue where worker_id=123
Update queue set worker_id=-1 where id=2
Update queue set worker_id=-1 where id=3

The table passes through the same states as in Implementation #1: rows 2 and 3 go 0 -> 123 -> -1.
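The locking idea can be sketched in-process. Only the contended batch claim is serialised behind one named lock, while reads and per-row updates run unlocked. `NamedLocks` is a hypothetical stand-in for MySQL's `GET_LOCK` / `RELEASE_LOCK`, not a binding to them. The rows are plain hashes rather than a real table, and the 5-second timeout is an arbitrary assumption.

```ruby
# Sketch of Implementation #2: only the contended batch claim is serialised,
# behind one named lock, in the spirit of MySQL's GET_LOCK / RELEASE_LOCK.
# NamedLocks is a hypothetical in-process stand-in for those functions, and
# the 5-second timeout is an arbitrary choice for the example.
class NamedLocks
  def initialize
    @guard = Mutex.new
    @locks = {}
  end

  # Like get_lock(name, timeout): true once acquired, false on timeout.
  def get_lock(name, timeout)
    mutex = @guard.synchronize { @locks[name] ||= Mutex.new }
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    until mutex.try_lock
      return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      sleep 0.001
    end
    true
  end

  # Like release_lock(name); must be called by the thread holding the lock.
  def release_lock(name)
    @guard.synchronize { @locks.fetch(name) }.unlock
  end
end

LOCKS = NamedLocks.new
PENDING = 0

# rows: array of { id:, worker_id: } hashes standing in for the queue table.
# Returns the ids claimed for worker_id (empty when nothing is pending).
def claim_batch(rows, worker_id, limit, timeout: 5)
  raise 'timed out waiting for queue lock' unless LOCKS.get_lock('queue', timeout)
  begin
    batch = rows.select { |r| r[:worker_id] == PENDING }.first(limit)
    batch.each { |r| r[:worker_id] = worker_id }
    batch.map { |r| r[:id] }
  ensure
    LOCKS.release_lock('queue')
  end
end
```

With several worker threads each calling `claim_batch` until it returns an empty array, every row ends up claimed exactly once. The lock turns the contended claim into a short critical section instead of competing row locks.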

Implementation #2

[Chart: transactions per second (y-axis 0 to 30,000) against time in mm:ss (00:00 to 05:00). Mode 2, producers: 50, consumers: 50; series: pop on, pop off. Total popped on: 500,000; total popped off: 500,000.]

Implementation #2

[Chart: Mode 2 queue 'pop-on' with varying producer levels. Transactions per second (y-axis 0 to 25,000) against time in mm:ss (00:00 to 05:00); series: 1 producer, 10 producers, 30 producers, 50 producers.]

Implementation #3 •  Same Mysql setup as implementation #1 •  1 x table ( ‘queue’ )

–  Id ( primary key, auto_inc, int ) –  Status( enum ) –  Process_at ( datetime ) –  Payload ( varchar )

•  1 x Redis instance using the following data structures –  Sorted set ( range queries, scheduled jobs ) –  List as a queue ( fast push / pop semantics )

•  Dedicated hardware –  Harness: HP DL365 ( 12 cores ) –  Mysql + Redis: HP DL365 ( 12 cores )

Implementation #3

Multiple write operations (producers):

Insert into queue ( status, process_at, payload ) values ( 'pending', '2012-01-01 01:01:00', '{ json }' )
RedisQueue.push( 2, '2012-01-01 01:01:00' )
RedisQueue.push( 3, '2012-01-01 01:01:00' )

Batched update / read operations (consumers):

RedisQueue.pop( '2012-01-01 01:01:00', 10 )   -> ids 2, 3
Update queue set status='working' where id in ( 2,3 )
Update queue set status='finished' where id = 2
Update queue set status='finished' where id = 3

Table state (id, status) after each step. Every row has process_at = 2012-01-01 01:01:00 and payload = { json }.

Step                 | row 1    | row 2    | row 3
Start                | finished |          |
Insert row 2         | finished | pending  |
Insert row 3         | finished | pending  | pending
Pop + mark working   | finished | working  | working
Finish id=2          | finished | finished | working
Finish id=3          | finished | finished | finished

Implementation #3

RedisQueue.push (producer):

queue_name = "queue#{scheduled_time}"
redis.rpush( queue_name, id_of_mysql_insert )
redis.zadd( 'q_set', scheduled_time, queue_name )

RedisQueue.pop (consumer):

queue_name = redis.zrangebyscore( 'q_set', 0, current_time, :limit => [0, 1] ).first
item = redis.lpop( queue_name ) if queue_name
redis.zrem( 'q_set', queue_name ) if queue_name && item.nil?

•  Redis sorted sets: O(log N) complexity –  zadd / zrangebyscore / zrem –  Used to store the name of each queue and when it should be processed

•  Redis lists (queues): O(1) complexity –  rpush / lpop –  Used to store the items that need to be processed

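[Diagram: RedisQueue.push appends ids to per-time queues, which are registered in the sorted set and ordered from now to the future; RedisQueue.pop takes items from the earliest queue that is due now.]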
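The push/pop scheme above can be exercised without a Redis server by emulating the five commands it uses. `FakeRedis` is hypothetical: its method names mirror redis-rb (`zadd`, `zrangebyscore` with `limit:`, `zrem`, `rpush`, `lpop`), but it is an in-memory stand-in, and a real client returns list items as strings. `RedisQueue` follows the slide. Using integer epoch seconds as `scheduled_time` (the score) is an assumption.

```ruby
# Minimal in-memory stand-in for the Redis commands used on the slide, so
# the sketch runs without a server. FakeRedis is hypothetical; its method
# names mirror redis-rb.
class FakeRedis
  def initialize
    @zsets = Hash.new { |h, k| h[k] = {} } # key => { member => score }
    @lists = Hash.new { |h, k| h[k] = [] }
  end

  def zadd(key, score, member)
    @zsets[key][member] = score
  end

  # Members with min <= score <= max, ordered by score, then member.
  def zrangebyscore(key, min, max, limit: nil)
    members = @zsets[key].select { |_, s| s >= min && s <= max }
                         .sort_by { |m, s| [s, m] }
                         .map(&:first)
    limit ? members.drop(limit[0]).first(limit[1]) : members
  end

  def zrem(key, member)
    @zsets[key].delete(member)
  end

  def rpush(key, value)
    @lists[key].push(value).size
  end

  def lpop(key)
    @lists[key].shift
  end
end

# One list per scheduled time, plus a sorted set ('q_set') that maps each
# list name to its scheduled time (integer epoch seconds, an assumption).
class RedisQueue
  def initialize(redis)
    @redis = redis
  end

  def push(id, scheduled_time)
    queue_name = "queue#{scheduled_time}"
    @redis.rpush(queue_name, id)
    @redis.zadd('q_set', scheduled_time, queue_name)
  end

  # Pop up to `count` ids whose scheduled time is <= now.
  def pop(now, count)
    ids = []
    while ids.size < count
      queue_name = @redis.zrangebyscore('q_set', 0, now, limit: [0, 1]).first
      break if queue_name.nil?
      item = @redis.lpop(queue_name)
      if item.nil?
        @redis.zrem('q_set', queue_name) # list drained: stop tracking it
      else
        ids << item
      end
    end
    ids
  end
end
```

As on the slide, the "list is empty" check and the `zrem` are separate commands. Against a shared Redis, a push that lands between them could leave a list no longer referenced by `q_set`, which is one reason recovery scripts appear among this implementation's costs.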

Implementation #3

[Chart: transactions per second (y-axis 0 to 16,000) against time in mm:ss (00:00 to 05:00). Mode 3, producers: 50, consumers: 50; series: pop on, pop off. Total popped on: 500,000; total popped off: 500,000.]

Implementation #3

[Chart: transactions per second (y-axis 0 to 30,000) against time in mm:ss (00:00 to 05:00). Mode 3, producers: 100, consumers: 50; series: pop on, pop off. Total popped on: 1,000,000; total popped off: 1,000,000.]

Comparison

[Charts: three panels of transactions per second (y-axis 0 to 30,000) against time in mm:ss (00:00 to 05:00), each with series pop on and pop off: Mode 1, producers: 50, consumers: 50; Mode 2, producers: 50, consumers: 50; Mode 3, producers: 50, consumers: 50.]

Summary of Results

Implementation #1

Pros: •  Simplest option •  1 moving part •  Easy to diagnose •  Tried and tested

Cons: •  Prone to deadlocking •  Contention •  Slowest solution

Implementation #2

Pros: •  Fewer deadlocks •  Easy to diagnose •  Removed contention •  Big speed boost

Cons: •  Still deadlocks ( rare ) •  Yet to be proven in production

Summary of Results

Implementation #3

Pros: •  Fastest •  No contention •  Predictable •  Tried and tested •  Dynamic queues

Cons: •  Most complicated •  Recovery scripts •  Multiple moving parts •  Transactions are hard

Future Considerations

•  Currently limited by the speed of MySQL •  Try a distributed key-value store –  Recovery? –  Eventual consistency?

Two parts to this story •  Queuing Strategies

•  Optimizing hardware

Hardware optimisation

Photograph and Logo © 2010 Time Out Group Ltd.

•  Observed ‘time outs’ between App ⇔ RIAK DB

•  Developed sophisticated balancing mechanisms to code around them, but they still occurred

•  Especially under load

Nature of the problem •  Delayed responses of up to 60 seconds! •  Our live environment contains:

–  2 x 9 App & RIAK Nodes –  HP DL385 G6 –  2 x AMD Opteron 2431 (6 cores)

•  We built a dedicated test environment to get to the bottom of this:

–  3 x App & RIAK Nodes –  2 x Intel Xeon (8 cores)

Looking for contention…

Contention options

•  CPU: less than 60% utilisation

•  Disk IO: ? ⇒ Got SSDs (10x), independent OEM ⇒ RIAK on SSD, logs/OS on HDD

•  Network IO: ? ⇒ RIAK is I/O hungry ⇒ Use second NICs / a dedicated RIAK VLAN

Memory contention / NUMA •  Looking at the 60% again

–  Non-Uniform Memory Access (NUMA) is a computer memory design used in Multiprocessing, where the memory access time depends on the memory location relative to a processor. - Wikipedia

•  In the 1960s CPUs became faster than memory •  Race for larger cache memory & cache algorithms •  Multiple processors accessing the same memory leads to contention and a significant performance impact •  Dedicate memory to processors/cores/threads •  BUT most memory data is required by more than one process => cache-coherent access (ccNUMA) •  Linux thread allocation is challenged •  Cache coherence attracts significant overheads, especially for processes in quick succession!

Gain control! - NUMACTL •  Processor affinity – Bind a particular process type to a specific processor •  Instruct memory usage to use different memory banks •  For example: numactl --cpunodebind=1 --interleave=all erl •  Get it here: apt-get install numactl

•  => No timeouts •  => 20%+ speed increase when running App & RIAK •  => Full use of existing hardware

How about load testing? •  Our interactive voting platform required load testing •  Requiring tens of thousands of connections / second •  Mixture of HTTP / HTTPS •  Session-based requests

–  Login a user –  Get a list of candidates –  Get the balance –  Vote for a candidate if credit available

Load testing - lessons learned

[Diagram: load generators -> WAN -> FW -> LAN -> LB -> servers]

•  Firewall: ASA5520 limited at 3-4k new connections per second ⇒ replaced with ASA5585 (spec 50k/s, tested 20k/s)

•  Load balancer: HAProxy on 2 x DL120 ⇒ number of Linux processes 1 -> 4 ⇒ added connection throttle of 4k / server

•  Servers: 6 x DL360 G6 ⇒ Apache cipher reduction ⇒ keep-alive (K/A) consumed all threads -> reduced & disabled ⇒ ulimit per process 1k -> 65k

•  Load generators: nn x AWS ⇒ Tsung SSL session ID bug

Load testing Tools •  ab ( apache bench )

–  Easy to use –  Lots of documentation –  Hard to distribute ( although we did find “bees with machine guns”: https://github.com/newsapps/beeswithmachineguns )

–  We experienced inconsistent results with our setup –  Struggled to create the complex sessions we required

•  httperf –  Easy to use –  Lots of documentation –  Hard to distribute ( no master / slave setup )

Load testing Tools •  Write our own

–  Will do exactly what we want –  Time

•  Tsung –  Very configurable –  Scalable –  Easier to distribute –  Already used in the department –  Steep learning curve –  Setting up a large cluster requires effort

Tsung •  What is Tsung?

–  Open-source multi-protocol distributed load testing tool –  Written in Erlang –  Can support multiple protocols: HTTP / SOAP / XMPP / etc.

–  Support for sessions –  Master slave setup for distributed load testing

–  Very configurable –  Scalable –  Easier to distribute –  Already used in the department –  Steep learning curve –  Setting up a large cluster requires effort

Distributed Tsung •  Although Tsung provided us with almost everything we needed •  We still had to set up lots of instances manually •  This was time-consuming / error-prone •  We needed a tool to alleviate and automate this •  So we built…

Ion Storm •  Tool to set up a Tsung cluster on multiple EC2 instances •  With coordinated start/stop functionality •  Written in Ruby, using the RightScale gem: rightaws.rubyforge.org •  Uploads the results to S3 after each run

Performance •  From a cluster of 20 machines we achieved

–  20K HTTPS / sec –  50K HTTP / sec –  12K session-based requests ( mixture of API calls ) / sec

•  Be warned though –  Can be expensive to run through EC2 –  Limited to 20 EC2 instances unless you speak to Amazon nicely –  Have a look at spot instances

Open Sourced! •  Designed and built by two Velti engineers

–  Ben Murphy

–  David Townsend

•  Try it out:

[email protected]:mitadmin/ionstorm.git

Two parts to this story •  Queuing Strategies

•  Optimizing hardware

David Dawson +44 7900 005 759 [email protected]

Marcus Kern +44 7932 661 527 [email protected]

If you’d like to work with or for Velti please contact the Velti :

Questions?

Thank You

Building a wallet •  Fast

–  Over 1,000 credits / sec –  Over 10,000 debits / sec ( votes )

•  Scalable –  Double hardware == Double performance

•  Robust / Recoverable –  Transactions can not be lost –  Wallet balances recoverable in the event of multi-server failure

•  Auditable –  Complete transaction history

Building a wallet - attempt #1 •  Use RIAK Only

–  Keep things simple –  Fewer moving parts

•  A wallet per user containing:

–  Previous Balance –  Transactions with unique IDs –  Rolling Balance –  Credits ( Facebook / iTunes ) –  Debits ( votes )

Example: Key = dave@mig, Previous Balance = 2

1-abcd-1234 (+5) = 7   ( purchase of credits )
1-abcd-1235 (+2) = 9   ( purchase of credits )
1-abcd-1236 (-1) = 8   ( a vote )
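The wallet document above can be sketched as a small value object that computes the rolling balance from the previous balance and the ordered transactions. `Wallet` and `Txn` are hypothetical names. Serialising the document to RIAK is left out.

```ruby
# Sketch of the attempt #1 wallet document: one value per user holding the
# previous (compacted) balance plus every transaction since, each with a
# unique transaction ID. Wallet and Txn are hypothetical names.
Txn = Struct.new(:id, :amount)

class Wallet
  attr_reader :key, :previous_balance, :txns

  def initialize(key, previous_balance, txns = [])
    @key = key
    @previous_balance = previous_balance
    @txns = txns
  end

  # Append a credit (positive amount) or a debit (negative amount).
  def apply(txn)
    @txns << txn
    self
  end

  # Rolling balance after each transaction, e.g. [["1-abcd-1234", 7], ...].
  def rolling
    running = previous_balance
    txns.map { |t| [t.id, running += t.amount] }
  end

  def balance
    previous_balance + txns.sum(&:amount)
  end
end
```

Replaying the example gives rolling balances 7, 9 and 8 for dave@mig.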

Building a wallet - attempt #1 •  RIAK = Eventual Consistency

–  In the event of siblings –  Deterministic due to unique transaction IDs –  Merge the documents and store

Sibling A: Key = dave@mig, Previous Balance = 2
1-abcd-1234 (+5) = 7 1-abcd-1235 (+2) = 9

Sibling B: Key = dave@mig, Previous Balance = 2
1-abcd-1234 (+5) = 7 1-abcd-1236 (-1) = 6

Merged: Key = dave@mig, Previous Balance = 2
1-abcd-1234 (+5) = 7 1-abcd-1235 (+2) = 9 1-abcd-1236 (-1) = 8
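Because every transaction carries a unique ID, sibling resolution reduces to a set union. The sketch below shows that merge. Ordering the merged list by transaction ID is an assumption made here so every node produces an identical document, and the balance does not depend on order. It also simply rejects siblings whose previous balances differ, for example when one side was already compacted. That is a simplification.

```ruby
# Sketch of deterministic sibling resolution for the attempt #1 wallet:
# take the union of all siblings' transactions, de-duplicated by ID.
# Sorting by ID is an assumption so that every node produces the same
# document; WalletDoc, Txn and merge_siblings are hypothetical names.
Txn = Struct.new(:id, :amount)
WalletDoc = Struct.new(:previous_balance, :txns)

def merge_siblings(siblings)
  bases = siblings.map(&:previous_balance).uniq
  raise ArgumentError, 'siblings disagree on previous balance' unless bases.size == 1

  by_id = {}
  siblings.each { |doc| doc.txns.each { |t| by_id[t.id] ||= t } }
  WalletDoc.new(bases.first, by_id.values.sort_by(&:id))
end

def balance(doc)
  doc.previous_balance + doc.txns.sum(&:amount)
end
```

Merging the two siblings in either order yields the same three-transaction document with a balance of 8.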

Building a wallet - attempt #1

•  Compacting the wallet –  Periodically –  In the event it grows too large

Before: Key = dave@mig, Previous Balance = 2
1-abcd-1234 (+5) = 7 1-abcd-1235 (+2) = 9 1-abcd-1236 (-1) = 8 … 1-abcd-9999 (+1) = 78

⇒ Compactor ⇒

After: Key = dave@mig, Previous Balance = 78
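The compactor can be sketched as folding the transaction list into the previous balance. `MAX_TXNS` is a hypothetical size threshold, not a value from the talk. The sketch also shows one consequence: once transactions are folded away their IDs are gone, so a late sibling that still carries them can no longer be merged by ID.

```ruby
# Sketch of the wallet compactor: fold the transaction list into the
# previous balance, on a schedule or once the document grows past a size
# limit. MAX_TXNS and the method names are hypothetical.
Txn = Struct.new(:id, :amount)
WalletDoc = Struct.new(:previous_balance, :txns)

MAX_TXNS = 1000

def needs_compaction?(doc, max_txns = MAX_TXNS)
  doc.txns.size > max_txns
end

# Returns a new document whose previous balance is the current balance.
def compact(doc)
  WalletDoc.new(doc.previous_balance + doc.txns.sum(&:amount), [])
end
```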

Building a wallet - attempt #1

•  Our experiences –  Open to abuse –  As the wallet grows, performance decreases –  Risk of sibling explosion –  Users can go overdrawn

Building a wallet - attempt #2

•  Introduce REDIS –  REDIS stores the balance –  RIAK stores individual transactions

REDIS: Key: dave@mig Value: 78 ⇒ Credit (2) ⇒ Value: 80 ⇒ Debit (1) ⇒ Value: 79

RIAK: Key = dave@mig:1-abcd-1234 Value: +1 / Key = dave@mig:1-abcd-1235 Value: +2 / Key = dave@mig:1-abcd-1236 Value: -1
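The split of responsibilities can be sketched with two in-memory hashes: one for the fast balance store (Redis in the talk) and one for the per-transaction log (RIAK in the talk, keyed "user:txn_id"). Several choices here are assumptions for illustration, not the authors' design:

- The check-then-log-then-update ordering.
- Ignoring a replayed transaction ID.
- The single in-process mutex.

A distributed version would need an atomic check-and-update on the Redis side.

```ruby
# Sketch of the attempt #2 wallet. @balances stands in for Redis (one
# integer per user) and @log for RIAK (one entry per "user:txn_id").
# Wallet2, the ordering and the duplicate-ID handling are assumptions.
class Wallet2
  InsufficientCredit = Class.new(StandardError)

  attr_reader :balances, :log

  def initialize
    @balances = Hash.new(0)
    @log = {}
    @lock = Mutex.new
  end

  def credit(user, txn_id, amount)
    apply(user, txn_id, amount)
  end

  # Refuses a debit that would take the balance below zero.
  def debit(user, txn_id, amount)
    apply(user, txn_id, -amount)
  end

  private

  def apply(user, txn_id, delta)
    @lock.synchronize do
      key = "#{user}:#{txn_id}"
      return @balances[user] if @log.key?(key) # replayed transaction: no-op
      raise InsufficientCredit if @balances[user] + delta < 0
      @log[key] = delta          # record the transaction first
      @balances[user] += delta   # then update the cached balance
    end
  end
end
```

Starting from a balance of 78, a credit of 2 and a debit of 1 leave 79, and a debit that would overdraw raises `Wallet2::InsufficientCredit`.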

Building a wallet - attempt #2

•  Keeping it all in sync –  Periodically compare REDIS and RIAK

•  Disaster Recovery

–  Rebuild all balances in REDIS –  Using transactions from RIAK
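Both bullets above reduce to summing the transaction log per user. This sketch rebuilds balances from "user:txn_id" => amount entries and lists the users whose cached balance has drifted. The key layout follows the previous slide. The function names are hypothetical, and it assumes transaction IDs contain no ":".

```ruby
# Sketch of attempt #2 sync checking and disaster recovery: rebuild every
# balance from the per-transaction entries, then compare with the cached
# balances. Function names are hypothetical; transaction IDs are assumed
# to contain no ':' (the user is everything before the last ':').
def rebuild_balances(log)
  log.each_with_object(Hash.new(0)) do |(key, amount), balances|
    user = key.rpartition(':').first
    balances[user] += amount
  end
end

# Users whose cached balance disagrees with the balance rebuilt from the log.
def out_of_sync(cached, log)
  rebuilt = rebuild_balances(log)
  (cached.keys | rebuilt.keys).reject { |user| cached.fetch(user, 0) == rebuilt[user] }
end
```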

Building a wallet - attempt #2

•  Our experiences –  It works –  Fast 10,000 votes / sec ( 6 x HP DL385 ) –  Used wallet recovery ( Data Center Power Fail )

•  The future –  Possible use of levelDB backend for RIAK –  Faster wallet recovery

Battle Stories #2

•  Building a wallet •  Optimizing your hardware stack

•  Building a robust queue – final version

Building a Queue •  Fast

–  > 1000 msg /sec

•  Scalable –  Double the machines, double the capacity

•  Recoverable –  In the event of a failure, all messages can be recovered

Design •  Queues stored in memory ( volatile )

–  Hand-rolled our own using ETS ( Erlang ) –  We needed to add complex behaviour such as scheduling –  Overflow protection by paging to disk

•  Copy of the data and state stored in a shared data store –  RIAK ticked all the boxes –  Scalable –  Robust –  Fast

Previously •  We explored RIAK to store and recover the queues using:

–  Indexes ( LevelDB ) •  Latencies too unpredictable •  Performance was less than half that of bitcask

–  Key filtering ( bitcask ) •  Write overhead too expensive, as we had to update the key, not the value ( delete and insert ) •  Real-world performance under load was not great

–  MapReduce across all keys ( bitcask ) •  Great for small data sets •  Forget it once your data set gets into the tens of millions

New Approach •  With a little help from the Basho guys we came up with a new approach

•  Predictable keys + Snapshots ( bitcask ) –  Simple –  Smallish impact on performance –  It worked –  And it scales

Our Architecture

[Diagram: Client_Node, Router Node and Operator Node are each a Q Erlang node with its own queue, each co-located with a Riak node (Riak Node 1, 2, 3).]

•  Each node has its own queue •  Each node lives on its own physical machine •  RIAK runs as a cluster on all of the nodes

Basic SMS Gateway topology

Predictable Key •  Key: “node : date : counter “

–  node: the name of the originating node for the request, e.g. “client_node” –  date: e.g. “2012-01-01” –  counter: number of messages since the beginning of the day, e.g. “3000”

•  Value: <message : current_node >

–  message: the original request, e.g. “send sms” –  current_node: the node where the message is currently located, e.g. “router_node”
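The predictable key can be sketched as a per-node counter that restarts each day. `KeyGenerator` is hypothetical, and the clock is injectable only so the example is deterministic. The counter lives in memory here. How the real system resumes it after a restart is not covered on the slide.

```ruby
require 'date'

# Sketch of the predictable key "node:date:counter": a per-node counter
# that restarts at 1 each day. KeyGenerator is a hypothetical name; the
# injectable clock is only there to make the example deterministic.
class KeyGenerator
  def initialize(node, clock: -> { Date.today })
    @node = node
    @clock = clock
    @date = nil
    @counter = 0
    @lock = Mutex.new
  end

  # Returns e.g. "client_node:2012-01-01:3000".
  def next_key
    @lock.synchronize do
      today = @clock.call
      if today != @date
        @date = today
        @counter = 0
      end
      @counter += 1
      format('%s:%s:%d', @node, @date.iso8601, @counter)
    end
  end
end
```

Because keys are dense and predictable, a recovery process can enumerate them without any index, scan or MapReduce. That is the property the recovery walk relies on.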

Snapshot •  Every 1000 messages •  Take a snapshot of the counter

–  Key: “ client_node : 2012-01-01 : snapshot ” –  Value: 5000

•  This is then used to help determine an upper limit for the recovery –  Which will be discussed in more detail in a couple of slides
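The snapshot step can be sketched as writing the counter under "node:date:snapshot" on every interval boundary. `SnapshottingCounter` is hypothetical, and `store` is any hash-like object standing in for RIAK. The snapshot can lag the live counter by up to one interval, which is why recovery needs some headroom above it.

```ruby
# Sketch of the counter snapshot: every `interval` messages (1000 in the
# talk), store the current counter under "node:date:snapshot". `store` is
# any hash-like object standing in for RIAK; the class name is hypothetical.
class SnapshottingCounter
  def initialize(node, date, store, interval: 1000)
    @node = node
    @date = date
    @store = store
    @interval = interval
    @counter = 0
  end

  def snapshot_key
    "#{@node}:#{@date}:snapshot"
  end

  # Advance the counter, writing a snapshot on each interval boundary.
  def increment
    @counter += 1
    @store[snapshot_key] = @counter if (@counter % @interval).zero?
    @counter
  end
end
```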

Queue – incoming node

[Diagram: a request with <message> arrives at the Receiver; the Predictable Key Generator generates <key> = “node : date : counter”; the message is persisted to the RIAK cluster as <key> <message : current_node> and pushed onto the local memory queue; the Sender pops <key> <message : current_node> and forwards the request.]

Queue – intermediate node

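[Diagram: the Receiver gets a request that already has a <key>; it persists <key> <message : current_node> to the RIAK cluster in place of <key> <message : previous_node> and pushes onto the local memory queue; the Sender pops <key> <message : current_node> and forwards the request.]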

Queue – outgoing node

[Diagram: as for the intermediate node, the Receiver persists <key> <message : current_node> in place of <key> <message : previous_node> and pushes onto the local memory queue; once the Sender has popped and sent the request, the key is deleted from the RIAK cluster.]

Recovery •  Take all originating nodes, e.g. “client_node” •  Using the current date, e.g. “2012-01-01” •  Use the snapshot to get the last count recorded, e.g. “3000”

–  Key: “ client_node : 2012-01-01 : snapshot ” –  Value: 3000

•  Rebuild by walking the keys –  from counter value 1 –  to the snapshot count + ( 2 x snapshot interval ): 3000 + 2 x 1000 = 5000 –  across all originating nodes and for dates less than 5 days old
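The recovery walk can be sketched as probing every candidate key for each originating node and each of the last five days. The upper bound is the snapshot plus twice the snapshot interval. `store` is a hash standing in for RIAK GETs, the method names are hypothetical, and a missing snapshot is treated as 0 (an assumption).

```ruby
require 'date'

# Sketch of the recovery walk: for each originating node and each of the
# last `days` dates, read the snapshot, then probe keys
# "node:date:1" .. "node:date:(snapshot + 2 * interval)" and collect any
# message still present. `store` stands in for RIAK GETs; the names and
# the "missing snapshot => 0" rule are assumptions.
def recovery_keys(node, date, snapshot, interval)
  upper = snapshot + 2 * interval
  (1..upper).map { |n| "#{node}:#{date.iso8601}:#{n}" }
end

def recover(store, nodes, today, interval: 1000, days: 5)
  found = []
  nodes.each do |node|
    (0...days).each do |back|
      date = today - back
      snapshot = store.fetch("#{node}:#{date.iso8601}:snapshot", 0)
      recovery_keys(node, date, snapshot, interval).each do |key|
        value = store[key]
        found << [key, value] if value
      end
    end
  end
  found
end
```

With a snapshot of 3000 and an interval of 1000, the walk for one node and day probes keys 1 to 5000, matching the example above.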

Testing •  Benchmarking with 3 x HP365’s ( AMD )

–  Production has 18 x HP360’s

•  Sustained 2000 req/sec ( 8 x RIAK ops per request ) –  Linear scaling in testing

•  Recovered 5 million messages in < 1 hour after crashing a node –  Whilst processing 500 req/sec sustained

Production •  Currently live and used for our SMS Gateway •  No noticeable drop in performance under peak loads •  Planned for use in our other products •  Hopefully our final solution

