Date post: | 29-Nov-2014 |
Category: |
Technology |
Upload: | aerospike-inc |
© 2013 Aerospike, Inc. All rights reserved. Confidential. | <Title of Presentation> | 1
Aerospike (aer·o·spike) [air-oh-spahyk] noun: 1. tip of a rocket that enhances speed and stability
YOU SNOOZE YOU LOSE OR
HOW TO WIN IN AD TECH?
THE ONLY FLASH-OPTIMIZED DATABASE
BRIAN BULKOWSKI FOUNDER, CTO, PRODUCT
STACK EXCHANGE MEETUP
APRIL 10, 2014
Aerospike: the gold standard for high throughput, low latency, high reliability transactions
Performance
• Over ten trillion transactions per month
• 99% of transactions faster than 2 ms
• 150K TPS per server
Scalability
• Billions of Internet users
• Clustered Software
• Automatic Data Rebalancing
Reliability
• 50 customers; zero service down-time
• Immediate Consistency
• Rapid Failover; Data Center Replication
Price/Performance
• Makes impossible projects affordable
• Flash-optimized
• 1/10 the servers required
Aerospike Proven in Production
■ AppNexus - #2 RTB after Google
  ■ 45 Billion auctions per day
  ■ 2M QPS
  ■ 3 12 server clusters
  ■ 4.8T Flash per server
  ■ 120K read TPS, 60K write TPS
■ Chango – #2 Search after Google
  ■ Sees more Searches than Yahoo! + Bing
  ■ Data on 300 Million users
■ TradeDesk – first Ad Exchange
  ■ Facebook Exchange partner
  ■ FBX serves 25% of Ads on the Internet
■ Snapdeal
  ■ 2 servers replace 10 MongoDB servers
  ■ 10GB data
  ■ “changed our company”
“Aerospike has operated without interruptions and easily scaled to meet our performance demands.” – Mike Nolet, CTO, AppNexus
TYPICAL REAL-TIME DATABASE DEPLOYMENT
[Diagram: MILLIONS OF CONSUMERS, BILLIONS OF DEVICES; APP SERVERS; AEROSPIKE CLUSTER; RDBMS; DATA WAREHOUSE. Flow labels: TRANSACTIONS, WRITE CONTEXT, SEGMENTS]
■ WRITE REAL-TIME CONTEXT, READ RECENT CONTENT
■ PROFILE STORE: Cookies, email, deviceID, IP address, location, segments, clicks, likes, tweets, search terms...
■ REAL-TIME ANALYTICS: Best sellers, top scores, trending tweets
■ BATCH ANALYTICS: Discover patterns, segment data: location patterns, audience affinity
KEY CHALLENGES
1. Handle extremely high rates of read/write transactions over persistent data
2. Avoid hot spots to maintain tight latency SLAs
3. Provide immediate consistency with replication
4. Ensure long running tasks do not slow down transactions
5. Scale linearly as data sizes and workloads increase
6. Add capacity with no service interruption
SYSTEM ARCHITECTURE FOR 100% UPTIME
SHARED-NOTHING SYSTEM: 100% DATA AVAILABILITY
■ Every node in a cluster is identical and handles both transactions and long running tasks
■ Data is replicated synchronously with immediate consistency within the cluster
■ Data is replicated asynchronously across data centers
[Diagram label: OHIO Data Center]
ROBUST DHT TO ELIMINATE HOT SPOTS
How Data Is Distributed (Replication Factor 2)
■ Every key is hashed into a 20 byte (fixed length) string using the RIPEMD160 hash function
■ This hash + additional data (fixed 64 bytes) are stored in RAM in the index
■ Some bits from this hash value are used to compute the partition id
■ There are 4096 partitions
■ Partition id maps to node id based on cluster membership
Example key: cookie-abcdefg-12345678
Example hash: 182023kh15hh3kahdjsh

Partition ID   Master node   Replica node
…              1             4
1820           2             3
1821           3             2
4096           4             1
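The key-to-partition-to-node path described on this slide can be sketched roughly as follows. This is a minimal illustration, not Aerospike's implementation. The slide only says "some bits" of the hash are used, so taking the low 12 bits of the first two digest bytes is an assumption, and the `owners` ranking is a hypothetical placeholder for the real mapping from cluster membership. Because RIPEMD-160 is not available in every Python build, the sketch falls back to SHA-1 (also 20 bytes) purely as a stand-in.

```python
import hashlib

N_PARTITIONS = 4096  # partition count stated on the slide


def key_digest(key: str) -> bytes:
    """Return a 20-byte digest of the key (RIPEMD-160 where the build supports it)."""
    data = key.encode("utf-8")
    try:
        return hashlib.new("ripemd160", data).digest()
    except ValueError:
        # Stand-in only: some OpenSSL builds disable RIPEMD-160.
        return hashlib.sha1(data).digest()


def partition_id(digest: bytes) -> int:
    """Assumption: derive the partition from the first two digest bytes."""
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


def owners(pid: int, nodes, replication_factor: int = 2):
    """Hypothetical mapping: rank nodes per partition; first is master, next is replica."""
    def rank(node):
        return hashlib.sha1(f"{node}:{pid}".encode("utf-8")).digest()
    return sorted(nodes, key=rank)[:replication_factor]


digest = key_digest("cookie-abcdefg-12345678")
pid = partition_id(digest)
master, replica = owners(pid, nodes=[1, 2, 3, 4])
```

Because the partition is derived only from the key's hash, keys spread evenly over the 4096 partitions regardless of how the keys themselves are distributed, which is what keeps any single node from becoming a hot spot.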
REAL-TIME PRIORITIZATION TO MEET SLA
Writing with Immediate Consistency
1. Write sent to row master
2. Latch against simultaneous writes
3. Apply write to master memory and replica memory synchronously
4. Queue operations to disk
5. Signal completed transaction (optional storage commit wait)
6. Master applies conflict resolution policy (rollback/rollforward)
Adding a Node
1. Cluster discovers new node via gossip protocol
2. Paxos vote determines new data organization
3. Partition migrations scheduled
4. When a partition migration starts, write journal starts on destination
5. Partition moves atomically
6. Journal is applied and source data deleted
[Diagram: master and replica nodes; transactions continue]
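A toy, single-process sketch of write steps 1 to 5 may help make the ordering concrete. The class and method names here are hypothetical, the replica is an in-memory object rather than a remote node, and step 6 (conflict resolution) is left out entirely.

```python
import threading
from collections import deque


class ReplicaNode:
    def __init__(self):
        self.memory = {}

    def apply(self, key, value):
        self.memory[key] = value


class MasterNode:
    def __init__(self, replica):
        self.memory = {}
        self.replica = replica
        self.latches = {}                  # per-key latch
        self.latches_guard = threading.Lock()
        self.disk_queue = deque()          # writes waiting to reach storage

    def write(self, key, value):
        # Step 1: the caller has already routed the write to this master.
        with self.latches_guard:
            latch = self.latches.setdefault(key, threading.Lock())
        with latch:                                  # step 2: latch the row
            self.memory[key] = value                 # step 3: master memory
            self.replica.apply(key, value)           # step 3: replica, synchronously
            self.disk_queue.append((key, value))     # step 4: queue for disk
        return "ok"                                  # step 5: signal completion

    def flush(self, storage):
        """Drain queued writes to a storage mapping (stand-in for the device)."""
        while self.disk_queue:
            key, value = self.disk_queue.popleft()
            storage[key] = value


replica = ReplicaNode()
master = MasterNode(replica)
master.write("user:1", {"segment": "sports"})
```

The point of the ordering is that the acknowledgement in step 5 is sent only after both in-memory copies agree, while the slower device write is taken off the critical path by the queue.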
INTELLIGENT CLIENT TO MAKE APPS SIMPLER
■ Implements Aerospike API
  ■ Optimistic row locking
  ■ Optimized binary protocol
■ Cluster tracking
  ■ Learns about cluster changes, partition map
  ■ Gossip protocol
■ Transaction semantics
  ■ Global transaction ID
  ■ Retransmit and timeout
■ Linear scale
  ■ No extra hop
  ■ No load balancers
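As a rough sketch of what "no extra hop" and "retransmit and timeout" could look like on the client side: the client keeps its own copy of the partition map, sends each request straight to the owning node, and retries on timeout. Everything here is hypothetical, including the `SmartClient` name, the `send(node, key)` transport callable, and the SHA-1 stand-in hash. It is not the Aerospike client API.

```python
import hashlib


class SmartClient:
    def __init__(self, partition_map, n_partitions=4096, max_attempts=3):
        self.partition_map = dict(partition_map)  # partition id -> node
        self.n_partitions = n_partitions
        self.max_attempts = max_attempts

    def partition_of(self, key: str) -> int:
        digest = hashlib.sha1(key.encode("utf-8")).digest()  # stand-in hash
        return int.from_bytes(digest[:2], "little") % self.n_partitions

    def on_cluster_change(self, new_map):
        """Cluster tracking: replace the local map when membership changes."""
        self.partition_map = dict(new_map)

    def request(self, key, send):
        """Route directly to the owning node; retransmit if the call times out."""
        last_error = None
        for _ in range(self.max_attempts):
            # Re-read the map on every attempt so a cluster change is picked up.
            node = self.partition_map[self.partition_of(key)]
            try:
                return send(node, key)
            except TimeoutError as exc:
                last_error = exc
        raise last_error


client = SmartClient({pid: "node-a" for pid in range(4096)})
reply = client.request("cookie-abcdefg-12345678", lambda node, key: (node, key))
```

Since the client resolves the owning node itself, no proxy or load balancer sits between the application and the data, which is why adding nodes adds capacity without adding a shared bottleneck.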
AEROSPIKE FLASH OPTIMIZED IN-MEMORY DATABASE
FLASH OPTIMIZED HIGH PERFORMANCE
[Diagram: storage stacks compared. OTHER DATABASE: OS FILE SYSTEM, PAGE CACHE, BLOCK INTERFACE, SSD / HDD. AEROSPIKE HYBRID MEMORY SYSTEM™: BLOCK INTERFACE or OPEN NVM, SSD.]
“Ask me and I’ll tell you the answer.”
“Ask me. I’ll look up the answer and then tell it to you.”
• Direct device access
• Large Block Writes
• Indexes in DRAM
• Highly Parallelized
• Log-structured FS “copy-on-write”
• Fast restart with shared memory
THE BOTTOM LINE
Actual customer analysis. Customer requires 500K TPS, 10 TB of storage, with 2x replication factor.

                                          OTHER DATABASES          AEROSPIKE
Servers required                          186                      ONLY 14
Storage type                              DRAM & NoSQL             SSD & DRAM
Storage per server                        180 GB (196 GB server)   2.4 TB (4 x 700 GB)
TPS per server                            500,000                  500,000
Cost per server                           $8,000                   $11,000
Server costs                              $1,488,000               $154,000
Power per server                          0.9 kW                   1.1 kW
Power (2 years, $0.12 per kWh ave. US)    $352,000                 $32,400
Maintenance (2 years, $3,600 per server)  $670,000                 $50,400
Total                                     $2,510,000               $236,800
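The totals on this slide can be reproduced from the per-server figures. The short calculation below assumes the servers draw their rated power continuously for two 365-day years and that the $3,600 maintenance figure covers the whole two-year period. The function name is illustrative.

```python
HOURS_IN_TWO_YEARS = 2 * 365 * 24  # 17,520 hours


def two_year_cost(servers, cost_per_server, kw_per_server,
                  price_per_kwh=0.12, maintenance_per_server=3600):
    """Break down hardware, power, and maintenance cost over two years."""
    hardware = servers * cost_per_server
    power = servers * kw_per_server * HOURS_IN_TWO_YEARS * price_per_kwh
    maintenance = servers * maintenance_per_server
    return {
        "hardware": hardware,
        "power": power,
        "maintenance": maintenance,
        "total": hardware + power + maintenance,
    }


other = two_year_cost(servers=186, cost_per_server=8_000, kw_per_server=0.9)
aerospike = two_year_cost(servers=14, cost_per_server=11_000, kw_per_server=1.1)

for name, cost in (("other databases", other), ("Aerospike", aerospike)):
    print(name, {k: round(v) for k, v in cost.items()})
```

Under these assumptions the results agree with the slide after rounding: about $352,000 and $32,400 for power, and about $2.51 million versus $236,800 in total.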
Only up in 2013, 2014
Everyone wants that “Facebook architecture”
Facebook and Apple bought at least $200M in FusionIO cards in 2012
Measure your drives!
Aerospike Certification Tool (ACT)
http://github.com/aerospike/act
Transactional database workload
■ Reads: 1.5KB (can’t batch / cache reads, random)
■ Writes: 128K blocks (log based layout) (plus defragmentation)
Turn up the load until latency is over required SLA
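The "turn up the load until latency is over the SLA" procedure is a simple ramp, sketched below. This is not ACT and does not touch a real device. `measure_latency_ms` is a hypothetical callable standing in for a real measurement run, and `simulated_drive_latency_ms` is an invented curve used only so the example runs.

```python
def max_sustainable_load(measure_latency_ms, sla_ms, start_tps, step_tps, max_tps):
    """Raise the load step by step; return the highest load that stayed within the SLA."""
    best = None
    tps = start_tps
    while tps <= max_tps:
        if measure_latency_ms(tps) > sla_ms:
            break          # SLA exceeded: the previous step is the answer
        best = tps
        tps += step_tps
    return best


def simulated_drive_latency_ms(tps, saturation_tps=60_000):
    """Invented latency curve that grows sharply as load nears saturation."""
    utilization = min(tps, saturation_tps - 1) / saturation_tps
    return 0.5 / (1.0 - utilization)


limit = max_sustainable_load(simulated_drive_latency_ms, sla_ms=2.5,
                             start_tps=5_000, step_tps=5_000, max_tps=100_000)
print("highest load within SLA:", limit)
```

With a real drive, each step would be a sustained run long enough to include defragmentation activity, since short bursts tend to hide the latency that appears once the device is under steady write pressure.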
Replication that Works
➤ Superstorm Sandy 2012
  § NYC down for 17 hours
  § Back up and synced in 1 hour via Aerospike Cross-Data Center Replication (XDR)
“Aerospike allows us to handle business continuity and reliability across 4 data centers seamlessly. And we can now expand our deployment to new data centers in less than a week.” - Elad Efraim, CTO
HOT ANALYTICS BY ROW
NOSQL EXTENSIBILITY
➤ Namespaces (policy containers)
  § Determine storage - DRAM or Flash
  § Determine replication factor
  § Contain records and sets
➤ Sets (tables) of records
  § Arbitrary grouping
➤ Records (rows)
  § Max 128k, contain key and bins
  § Bin with same name can contain values of different types
    • String, integer, bytes (raw, blob, etc)
    • list (an ordered collection of values)
    • map (a collection of keys and values)
  § Bins can be added anytime
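The namespace / set / record / bin hierarchy can be pictured with a small in-memory model. The classes below are illustrative only and are not the Aerospike client API. They show the two schema-free properties listed on this slide: the same bin name holding different types in different records, and bins being added later.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    key: str
    bins: dict = field(default_factory=dict)  # bin name -> value of any supported type


@dataclass
class Namespace:
    name: str
    storage: str               # "DRAM" or "Flash"
    replication_factor: int
    sets: dict = field(default_factory=dict)  # set name -> {key -> Record}

    def put(self, set_name, key, bins):
        """Create the record if needed and merge in the given bins."""
        records = self.sets.setdefault(set_name, {})
        record = records.setdefault(key, Record(key))
        record.bins.update(bins)
        return record


ns = Namespace(name="profiles", storage="Flash", replication_factor=2)
ns.put("users", "u1", {"value": "abc"})           # string bin
ns.put("users", "u2", {"value": [1, 2, 3]})       # same bin name, list value
ns.put("users", "u1", {"segments": {"sports": 1}})  # bin added later (map)
```

Storage type and replication factor live on the namespace rather than on individual records, which matches the slide's description of namespaces as policy containers.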
Real-time Analytics on Operational Data
DISTRIBUTED QUERIES
1. “Scatter” requests to all nodes
2. Indexes in DRAM for fast map of secondary → primary keys
3. Indexes co-located with data to guarantee ACID, manage migrations
4. Records read in parallel from all SSDs using lock free concurrency control
5. Aggregate results on each node
6. “Gather” results from all nodes on client
STREAM AGGREGATIONS
1. Push Code/Security Policies/Rules to Data with UDFs
2. Pipe Query results through UDFs to Filter, Transform, Aggregate... Map, Reduce
REAL-TIME ANALYTICS on OPERATIONAL DATA (No ETL)
➤ In Database, within the same Cluster
➤ On the same Data, on XDR Replicated Clusters
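The scatter, per-node aggregate, and gather steps can be sketched in a few lines. This is a simplified, sequential model with hypothetical names: real nodes would run in parallel, and the map and reduce callables stand in for UDFs. The `city` and `spend` bins are invented example data. The reduce function must be associative, because it is used both on each node and again on the client to combine partial results.

```python
from collections import defaultdict


class Node:
    def __init__(self, records, indexed_bin):
        self.records = dict(records)        # primary key -> bins
        self.index = defaultdict(list)      # secondary value -> primary keys (co-located)
        for pk, bins in self.records.items():
            self.index[bins[indexed_bin]].append(pk)

    def query(self, value, map_fn, reduce_fn, initial):
        """Look up matches in the local index and aggregate them on this node."""
        acc = initial
        for pk in self.index.get(value, []):
            acc = reduce_fn(acc, map_fn(self.records[pk]))
        return acc


def scatter_gather(nodes, value, map_fn, reduce_fn, initial):
    partials = [n.query(value, map_fn, reduce_fn, initial) for n in nodes]  # scatter
    result = initial
    for partial in partials:                                                # gather
        result = reduce_fn(result, partial)
    return result


nodes = [
    Node({"u1": {"city": "NYC", "spend": 10}, "u2": {"city": "SF", "spend": 7}}, "city"),
    Node({"u3": {"city": "NYC", "spend": 5}}, "city"),
]
total_nyc_spend = scatter_gather(nodes, "NYC",
                                 map_fn=lambda bins: bins["spend"],
                                 reduce_fn=lambda a, b: a + b,
                                 initial=0)
```

Only one partial result per node crosses the network, which is the benefit of pushing the aggregation to where the data lives.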
LESSONS LEARNED
NATIVE FLASH → PERFORMANCE
■ Low Latency at High Throughput
[Charts: average latency (ms) vs. throughput (ops/sec) for Aerospike, Cassandra, and MongoDB. Panels: Balanced Workload Read Latency (Full view); Balanced Workload Update Latency (Full view); Read-Heavy Workload Read Latency (Full view); Read-Heavy Workload Update Latency (Full view)]
LESSONS
1. Keep architecture simple
  ■ No hot spots (e.g., robust DHT)
  ■ Scales up easily (e.g., easy to size)
  ■ Avoids points of failure (e.g., single node type)
2. Avoid manual operation – automate, automate!
  ■ Self-managed cluster responds to node failures
  ■ Data rebalancing requires no intervention
  ■ Real-time prioritization allows unattended system operation
3. Keep system asynchronous
  ■ Shared nothing – nodes are autonomous
  ■ Async writes across data centers
  ■ Independent tuning parameters for different classes of tasks
LESSONS (cont’d)
4. Monitor the health of the system extensively
  ■ Growth in load sneaks up on you over weeks
  ■ Early detection means better service
  ■ Most failures can be predicted (e.g., capacity, load, …)
5. Size clusters properly
  ■ Have enough capacity ALWAYS!
  ■ Upgrade SSDs every couple years
  ■ Reduce cluster sizes to make operations simple
6. Have geographically distributed data centers
  ■ Size the distributed data centers properly
  ■ Use active-active configurations if possible
  ■ Size bandwidth requirements accurately
LESSONS (cont’d)
7. Have plan for unforeseen situations
  ■ Devise scenarios and practice during normal work time
  ■ Ensure you can do rolling upgrades during high load time
  ■ Make sure that your nodes can restart fast (< 1 minute)
8. Constantly test and monitor app end-to-end
  ■ Application level metrics are more important than DB metrics
  ■ Most issues in a service are due to a combination of application, network, database, storage, etc.
9. Separate online and offline workloads
  ■ Reserve real-time edge database for transactions and hot analytics queries (where newest data is important)
  ■ Avoid ad-hoc queries on on-line system
  ■ Perform deep analysis in offline system (Hadoop)
10. Use the Right Data Management System for the job
  ■ Fast NoSQL DB for real-time transactions and hot analytics on rapidly changing data
  ■ Hadoop or other comparable systems for exhaustive analytics on mostly read-only data
AEROSPIKE REAL-TIME BIG DATA PLATFORM
Rapid Development, Complete Customizability
➤ Support for popular languages and tools
  § ASQL and Aerospike Client in Java, C#, Ruby, Python..
➤ Complex data types
  § Nested documents (map, list, string, integer)
  § Large (Stack, Set, List) Objects
➤ Queries
  § Single record
  § Batch multi-record lookups
  § Equality and range
  § Aggregations and MapReduce
➤ User Defined Functions (UDFs)
  § In-DB processing
➤ Aggregation Framework
  § UDF Pipeline
  § MapReduce ++
➤ Time Series Queries
  § Just 2 IOPs for most r/w independent of object size
HOW TO GET AEROSPIKE?
Free Community Edition
➤ For developers looking for speed and stability who want to scale transparently as they grow
➤ No transaction limits
➤ No time limit
➤ No production limit
➤ Data per cluster limit
➤ Community support
Enterprise Edition
➤ For mission critical apps needing to scale right from the start
  § Unlimited number of nodes, clusters, data centers
  § Cross data center replication
  § Premium 24x7 support
  § Priced by TBs of unique data (not replicas)
QUESTIONS?
www.aerospike.com