Pushing Python: Building a High Throughput, Low Latency System

Kevin Ballard SFpython.org 2014-03-12

kevin@tellapart.com

Introductions

Taba
• Distributed event aggregation service

import taba
...
taba.RecordValue('winning_bid_price', wincpm)
...

$ taba-cli aggregate winning_bid_price
{"name": "winning_bid_price", "10m": {"count": 14709, "total": 5836.4}, "percentiles": [0.07 0.16 0.32 0.84 1.33 8.03]}

Taba +10,000,000 events/sec

+50,000 metrics

+1,000 clients

+100 processors

GET THE DATA MODEL RIGHT

Lesson #1

Data Model
• Event: ('bid_cpm', 'Counter', time(), 0.233)
• State:
• Aggregate: {"10m": 43.9, "1h": 592.22}
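The Event, State, Aggregate split on this slide can be sketched roughly as below. This is only an illustration of the idea: the names Event, CounterState, fold, aggregate and WINDOWS are hypothetical and are not taken from Taba's code, and the windowing rule (sum of values per trailing window) is an assumption based on the "10m" and "1h" keys shown above.

```python
import time
from collections import namedtuple

# Hypothetical names: none of these come from Taba itself.
Event = namedtuple("Event", ["name", "type", "timestamp", "value"])

WINDOWS = {"10m": 600, "1h": 3600}  # window label -> length in seconds


class CounterState:
    """Intermediate state: the (timestamp, value) samples still in range."""

    def __init__(self):
        self.samples = []

    def fold(self, event):
        # Folding an event into the state is cheap and order-independent.
        self.samples.append((event.timestamp, event.value))

    def aggregate(self, now):
        # Drop samples older than the longest window, then sum per window.
        horizon = now - max(WINDOWS.values())
        self.samples = [(t, v) for t, v in self.samples if t >= horizon]
        return {
            label: round(sum(v for t, v in self.samples if t >= now - length), 6)
            for label, length in WINDOWS.items()
        }


now = time.time()
state = CounterState()
state.fold(Event("bid_cpm", "Counter", now - 1800, 0.5))   # 30 minutes ago
state.fold(Event("bid_cpm", "Counter", now - 60, 0.233))   # 1 minute ago
result = state.aggregate(now)
print(result)
```

Keeping the State separate from the Aggregate means events can be folded in as they arrive and the user-facing projection is only computed on request.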


STATE IS HARD

Lesson #2

Centralizing State

GENERATORS + GREENLETS = AWESOME

Lesson #3

Asynchronous Iterator

• JIT processing
• Automatically switches through I/O
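A minimal sketch of the generator half of this idea, using only the standard library. Each stage pulls one item at a time from the stage before it, so nothing is parsed until it is needed. The stage names and the 'name value' line format are made up for illustration. The greenlet half is not shown: the assumption is that, under something like gevent with cooperative sockets, a blocking read inside the first stage would switch to another greenlet instead of stalling the process.

```python
import io


def read_lines(stream):
    """Yield one line at a time; nothing is read until a consumer asks."""
    for line in stream:
        yield line.rstrip("\n")


def parse_events(lines):
    """Lazily turn 'name value' lines into (name, float) pairs."""
    for line in lines:
        name, value = line.split()
        yield name, float(value)


def running_totals(events):
    """Consume events one by one, keeping only the totals in memory."""
    totals = {}
    for name, value in events:
        totals[name] = totals.get(name, 0.0) + value
    return totals


# io.StringIO stands in for a socket or HTTP response body.
stream = io.StringIO("bid_cpm 0.25\nbid_cpm 0.5\nwin_cpm 1.0\n")
totals = running_totals(parse_events(read_lines(stream)))
print(totals)
```

Because the pipeline is pull-driven, memory use stays proportional to one record rather than to the whole response.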

CPYTHON SUFFERS FROM MEMORY FRAGMENTATION

Lesson #4

Fragmentation
• Fragmentation is when a process’s heap is inefficiently used.
• The GC may report a low memory footprint, but the OS reports a much larger RSS.
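One rough way to see the gap between the two views is to put Python's own accounting next to the figure the OS reports. The sketch below assumes a Unix system (the resource module is not available on Windows) and uses ru_maxrss, which is the peak RSS rather than the current one, so it can only show that the OS-side number does not come back down. memory_report is a hypothetical helper name.

```python
import resource
import sys
import tracemalloc


def memory_report():
    """Return Python-visible and OS-visible memory figures for this process."""
    traced_now, _traced_peak = tracemalloc.get_traced_memory()
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "python_traced_bytes": traced_now,
        "python_allocated_blocks": sys.getallocatedblocks(),
        # Peak RSS: kilobytes on Linux, bytes on macOS.
        "os_peak_rss": usage.ru_maxrss,
    }


tracemalloc.start()

# Allocate many small objects, then free most of them in a scattered pattern.
data = [str(i) * 10 for i in range(200_000)]
before = memory_report()
data = data[::100]  # keep 1 in 100; the survivors can keep whole arenas alive
after = memory_report()

print(before)
print(after)
```

The Python-side byte count drops sharply after the slice, while the OS-side figure does not, which is the mismatch the slide describes.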


Hybrid Memory Management
• Use Cython to allocate page-sized blocks of pointers into the incoming chunk
• Hand off the whole thing to the CPython memory manager
• Whole thing gets deallocated at once
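The Cython implementation is not shown in the deck. As a loose pure-Python analogue of the same idea (not the Cython approach itself), the sketch below keeps an incoming chunk as a single buffer and stores compact offsets into it instead of creating one small object per record, so the chunk and its index are released together. ChunkIndex and the newline-separated record format are hypothetical.

```python
from array import array


class ChunkIndex:
    """One incoming chunk plus a compact offset table, freed together."""

    def __init__(self, chunk, sep=b"\n"):
        self._view = memoryview(chunk)   # no copy of the payload
        self._starts = array("Q")        # record start offsets
        self._ends = array("Q")          # record end offsets
        pos = 0
        while pos < len(chunk):
            end = chunk.find(sep, pos)
            if end == -1:
                end = len(chunk)
            self._starts.append(pos)
            self._ends.append(end)
            pos = end + len(sep)

    def __len__(self):
        return len(self._starts)

    def __getitem__(self, i):
        # A zero-copy slice into the chunk; call bytes() to materialize it.
        return self._view[self._starts[i]:self._ends[i]]


chunk = b"bid_cpm 0.25\nbid_cpm 0.5\nwin_cpm 1.0\n"
index = ChunkIndex(chunk)
records = [bytes(index[i]) for i in range(len(index))]
print(records)
```

Dropping the last reference to the index releases the buffer and both offset arrays in one step, rather than leaving thousands of small objects scattered across the heap.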

Ratcheting
• Ratcheting is a pathological case of fragmentation, caused by the fact that the heap must be contiguous*.
• It’s a limitation of CPython that it cannot compact memory (mostly due to extensions).

Ratcheting
• Avoid persistent objects
• Sockets are common offenders
• Anything that has to be persistent should be created at application startup, before processing data
• Avoid letting the heap grow in the first place

fin.

github.com/tellapart/taba

kevin@tellapart.com | @misterkgb

We’re Hiring! tellapart.com/careers

Date post:	25-May-2015