Date post: | 15-Jan-2015 |
Category: |
Technology |
Upload: | aerospike-inc |
Big Data Learnings from a Vendor’s Perspective
Srini V. Srinivasan
Toronto Big Data Week, April 23, 2013
Database Landscape
STRUCTURED DATA
➤ TRANSACTIONS (OLTP): Response time: Seconds; Gigabytes of data; Balanced Reads/Writes
➤ ANALYTICS (OLAP): Response time: Seconds; Terabytes of data; Read Intensive
UNSTRUCTURED DATA
➤ BIG DATA ANALYTICS: Response time: Hours, Weeks; TB to PB; Read Intensive
➤ REAL-TIME BIG DATA: Real-time Transactions; Response time: < 10 ms; 1-20 TB; Balanced Reads/Writes; 24x7x365 Availability
© 2013 Aerospike. All rights reserved. Confidential Pg. 2
Requirements for Internet Enterprises
1. Know who the interaction is with
   • Monitor 200+ million US consumers, 5+ billion mobile devices and sensors
2. Determine intent based on current context
   • Page views, search terms, game state, last purchase, friends list, ads served, location
3. Respond now, use big data for more accurate decisions
   • Display the most relevant ad
   • Recommend the best product
   • Deliver the richest gaming experience
   • Eliminate fraud…
4. Service can NEVER go down!
Challenges
1. Handle extremely high rates of persistent read/write transactions
2. Avoid hot spots to maintain tight latency SLAs
3. Provide immediate consistency with replication
4. Allow long-running tasks with transactions
5. Scale linearly as data sizes increase
6. Add capacity with no service interruption
Native Flash Performance
➤ Low Latency at High Throughput
“Only Aerospike was able to function in synchronous mode with a replication factor of two.. it is a significant advantage that Aerospike is able to function reliably on a smaller amount of hardware while still maintaining true consistency.”
Shared-Nothing Architecture
OHIO Data Center
➤ Every node in a cluster is identical and handles both transactions and long-running tasks
➤ Data is replicated synchronously with immediate consistency within the cluster
➤ Data is replicated asynchronously across data centers
Distributed Hash Table
How Data Is Distributed (Replication Factor 2)
➤ Every key is hashed into a 20-byte (fixed-length) string using the RIPEMD160 hash function
➤ This hash + additional data (fixed 64 bytes) are stored in RAM in the index
➤ Some bits from this hash value are used to compute the partition id
➤ There are 4096 partitions
➤ Partition id maps to node id based on cluster membership
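The key-to-partition step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides say "some bits" of the hash select one of 4096 partitions, so the choice of the low 12 bits of the first two digest bytes is an assumption, and the function names are hypothetical. Some Python builds do not ship RIPEMD-160, so the sketch falls back to SHA-1 (also 20 bytes) purely as a stand-in.

```python
import hashlib

N_PARTITIONS = 4096  # number of partitions stated on the slide


def key_digest(key: str) -> bytes:
    """Return a fixed-length 20-byte digest of the key."""
    data = key.encode("utf-8")
    try:
        return hashlib.new("ripemd160", data).digest()
    except ValueError:
        # RIPEMD-160 is unavailable in some OpenSSL builds.
        # SHA-1 is used here only as a 20-byte stand-in for the demo.
        return hashlib.sha1(data).digest()


def partition_id(digest: bytes) -> int:
    """Pick 12 bits of the digest (assumed: low bits of the first two bytes)."""
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


digest = key_digest("cookie-abcdefg-12345678")
pid = partition_id(digest)
print(len(digest), pid)
```

Because 4096 is a power of two, masking with `N_PARTITIONS - 1` keeps exactly 12 bits, so every key lands in the range 0 to 4095 regardless of which digest bytes are chosen.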
Key: cookie-abcdefg-12345678
Hash: 182023kh15hh3kahdjsh
PartitionID | Master node | Replica node
…           | 1           | 4
1820        | 2           | 3
1821        | 3           | 2
4096        | 4           | 1
Organizing the cluster
➤ Automatic multicast gossip protocol for node discovery
➤ Paxos consensus algorithm determines nodes in cluster
➤ Ordered list of nodes determines data location
➤ Data partitions balanced for minimal data motion
➤ Vote initiated and terminated in 100 milliseconds
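One way an ordered list of nodes can determine data location with little data motion is sketched below. The slides do not give the actual ordering rule, so the per-partition ordering by a hash of (partition, node), in the style of rendezvous hashing, is an assumption, as are the node names and function names.

```python
import hashlib

N_PARTITIONS = 4096
REPLICATION_FACTOR = 2


def succession(partition, nodes):
    """Order the nodes for one partition; every node computes the same order."""
    def score(node):
        return hashlib.sha256(f"{partition}:{node}".encode()).digest()
    return sorted(nodes, key=score)


def partition_map(nodes):
    """Map each partition to [master, replica] from the agreed node list."""
    return {p: succession(p, nodes)[:REPLICATION_FACTOR]
            for p in range(N_PARTITIONS)}


before = partition_map(["n1", "n2", "n3", "n4"])
after = partition_map(["n1", "n2", "n3", "n4", "n5"])

# Count partitions whose master changed when n5 joined.
moved = [p for p in before if before[p][0] != after[p][0]]
print(f"{len(moved)} of {N_PARTITIONS} masters moved")
```

In this scheme the relative order of the existing nodes never changes when a node joins, so a master only moves when the new node ranks first for that partition. That is the "minimal data motion" property in miniature.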
How it Works
Writing with Immediate Consistency
1. Write sent to row master
2. Latch against simultaneous writes
3. Apply write to master memory and replica memory synchronously
4. Queue operations to disk
5. Signal completed transaction (optional storage commit wait)
6. Master applies conflict resolution policy (rollback / rollforward)
Adding a Node
1. Cluster discovers new node via gossip protocol
2. Paxos vote determines new data organization
3. Partition migrations scheduled
4. When a partition migration starts, write journal starts on destination
5. Partition moves atomically
6. Journal is applied and source data deleted
(Transactions continue while the node is added)
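The write steps under "Writing with Immediate Consistency" can be sketched as a toy in-process model. Everything here is hypothetical: the `Node` and `Partition` classes are illustrative names, the replica is updated by a direct method call rather than over the network, and conflict resolution (step 6) is left out.

```python
import threading
from collections import deque


class Node:
    def __init__(self, name):
        self.name = name
        self.memory = {}           # in-memory copy of the records
        self.disk_queue = deque()  # operations waiting to be flushed

    def apply(self, key, value):
        self.memory[key] = value               # step 3: apply in memory
        self.disk_queue.append((key, value))   # step 4: queue for disk


class Partition:
    def __init__(self, master, replica):
        self.master = master
        self.replica = replica
        self._latches = {}
        self._guard = threading.Lock()

    def _latch(self, key):
        # One lock per record, created on first use.
        with self._guard:
            return self._latches.setdefault(key, threading.Lock())

    def write(self, key, value):
        # Step 1: the caller has already routed the write to the master.
        with self._latch(key):                 # step 2: block concurrent writers
            self.master.apply(key, value)
            self.replica.apply(key, value)     # synchronous: done before the ack
        return "ok"                            # step 5: signal completion


partition = Partition(Node("master"), Node("replica"))
print(partition.write("cookie-abcdefg-12345678", {"segment": 7}))
```

The acknowledgement is returned only after both in-memory copies are updated, which is what makes a read from either copy see the write immediately. The disk queue is drained later, matching the "optional storage commit wait" in step 5.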
Intelligent Client Shields Applications from the Complexity of the Cluster
➤ Implements Aerospike API
➤ Optimistic row locking
➤ Optimized binary protocol
➤ Cluster tracking
   • Learns about cluster changes, partition map
   • Gossip protocol
➤ Transaction semantics
   • Global transaction ID
   • Retransmit and timeout
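Optimistic row locking is commonly built on a per-record version counter: a writer reads the record, computes a new value, and the write succeeds only if the version is unchanged. The sketch below assumes that approach; the `Store` class, the `generation` field and the retry helper are hypothetical and are not the Aerospike client API.

```python
class GenerationMismatch(Exception):
    """Raised when the record changed between the read and the write."""


class Store:
    def __init__(self):
        self._records = {}  # key -> (generation, value)

    def read(self, key):
        # A missing record is reported as generation 0 with no value.
        return self._records.get(key, (0, None))

    def write(self, key, value, expected_generation):
        generation, _ = self.read(key)
        if generation != expected_generation:
            raise GenerationMismatch(key)
        self._records[key] = (generation + 1, value)
        return generation + 1


def increment(store, key, retries=3):
    """Read-modify-write without holding a lock; retry on conflict."""
    for _ in range(retries):
        generation, value = store.read(key)
        try:
            return store.write(key, (value or 0) + 1, generation)
        except GenerationMismatch:
            continue  # someone else wrote first: re-read and try again
    raise RuntimeError("too many conflicting writers")


store = Store()
increment(store, "ads_served")
increment(store, "ads_served")
print(store.read("ads_served"))
```

No lock is held between the read and the write, so readers are never blocked. The cost is that a writer that loses the race must retry.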
Cross Data Center Replication (XDR)
➤ Asynchronous replication for long link delays and outages
➤ Namespace is configured to replicate to a destination cluster – master / slave, including star and ring
➤ Replication process
   • Transaction journal on partition master and replica
   • XDR process writes batches to destination
   • Transmission state shared with source replica
   • Retransmission in case of network fault
   • When data arrives back at originating cluster, transaction ID matching prevents subsequent application and forwarding
➤ In master / master replication, conflict resolution via multiple versions, or timestamp
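The transaction-ID matching that stops a write from circulating forever in a ring can be sketched as below. This is a simplified model under stated assumptions: the `Cluster` class is hypothetical, shipping is a direct synchronous call instead of asynchronous batches, and the set of seen IDs is never trimmed.

```python
class Cluster:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.seen = set()        # transaction IDs already applied here
        self.destination = None  # next cluster in the ring
        self.applied = 0

    def local_write(self, txn_id, key, value):
        """A write that originates at this cluster."""
        self._apply(txn_id, key, value)
        self._ship(txn_id, key, value)

    def receive(self, txn_id, key, value):
        """A write shipped from another cluster."""
        if txn_id in self.seen:
            return  # came back around the ring: do not apply or forward
        self._apply(txn_id, key, value)
        self._ship(txn_id, key, value)

    def _apply(self, txn_id, key, value):
        self.seen.add(txn_id)
        self.data[key] = value
        self.applied += 1

    def _ship(self, txn_id, key, value):
        if self.destination is not None:
            self.destination.receive(txn_id, key, value)


# Ring topology: a -> b -> c -> a
a, b, c = Cluster("a"), Cluster("b"), Cluster("c")
a.destination, b.destination, c.destination = b, c, a

a.local_write("a:1", "user42", "segment-7")
print(a.applied, b.applied, c.applied)
```

Each cluster applies the write exactly once. When the write arrives back at `a`, its ID is already in `seen`, so it is dropped instead of being applied and forwarded again.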
Multi-core Optimization
Russ’s 10 Ingredient Recipe for Making 1 Million TPS on $5K Hardware
➤ Right Architecture
   • Shared nothing
   • In-memory (or multiple SSDs)
   • Tight code loop
   • Lock free isolation
➤ OS, Programming Language, Libraries
   • Modern Linux kernel
   • C language
   • Use epoll
➤ Tweaks
   • Pin threads to processor cores
   • IRQ affinity settings for NIC
   • CPU socket isolation via pairing of CPU to NIC
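As a small illustration of the "pin threads to processor cores" tweak, the sketch below uses Python's `os.sched_setaffinity`, which is only available on Linux. The slides describe a C server, so this is an analogy rather than the server's code, and the helper name is hypothetical.

```python
import os


def pin_current_process(core):
    """Restrict the caller to one CPU core.

    Returns the resulting affinity set, or None where the platform
    does not expose the scheduler affinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {core})  # pid 0 means "the caller"
    return os.sched_getaffinity(0)


if hasattr(os, "sched_getaffinity"):
    allowed = os.sched_getaffinity(0)  # cores we are permitted to run on
    print(pin_current_process(min(allowed)))
    os.sched_setaffinity(0, allowed)   # restore the original affinity
else:
    print("CPU affinity is not supported on this platform")
```

Pinning keeps a worker on one core so its caches stay warm and the scheduler does not migrate it. The NIC IRQ and socket-pairing tweaks are system configuration and are not shown here.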
Flash-optimized Storage Layer
➤ Direct device access
   • Direct attach performance
   • Data written in flash optimal large block patterns
   • All indexes in RAM for low wear
   • Constant background defragmentation
   • Log structured file system, “copy on write”
   • Clean restart through shared memory
➤ Random distribution using hash does not require RAID hardware
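A toy model of the log-structured, copy-on-write layout with an in-RAM index is sketched below. It is a minimal sketch, not the product's storage engine: the `LogStore` class is hypothetical, "blocks" are Python lists rather than device writes, and the block size and defragmentation threshold are arbitrary demo values.

```python
class LogStore:
    BLOCK_SIZE = 4  # records per large block (tiny, for demonstration)

    def __init__(self):
        self.blocks = []   # blocks already written out; None once reclaimed
        self.buffer = []   # the block currently being filled
        self.index = {}    # in-RAM index: key -> (block number, slot)

    def put(self, key, value):
        # Copy on write: never update in place, always append a new version.
        self.index[key] = (len(self.blocks), len(self.buffer))
        self.buffer.append((key, value))
        if len(self.buffer) == self.BLOCK_SIZE:
            self.blocks.append(self.buffer)  # one large sequential write
            self.buffer = []

    def get(self, key):
        block_no, slot = self.index[key]
        block = self.buffer if block_no == len(self.blocks) else self.blocks[block_no]
        return block[slot][1]

    def defragment(self, min_live=0.5):
        """Rewrite live records out of mostly-stale blocks, then free them."""
        for block_no in range(len(self.blocks)):
            block = self.blocks[block_no]
            if block is None:
                continue
            live = [(k, v) for slot, (k, v) in enumerate(block)
                    if self.index.get(k) == (block_no, slot)]
            if len(live) / len(block) < min_live:
                for k, v in live:
                    self.put(k, v)          # move survivors to the log head
                self.blocks[block_no] = None  # whole block is now reusable


store = LogStore()
for version in range(4):
    store.put("a", version)  # fills block 0 with four versions of "a"
store.put("b", 1)
store.defragment()
print(store.get("a"), store.get("b"), store.blocks)
```

Only the index is updated on every write, and it lives in RAM, so the flash device sees large sequential block writes plus occasional whole-block reclamation rather than small in-place updates.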
…
SSD performance varies widely
• Aerospike has a certified hardware list
• Free SSD certification tool, CIO, is also available
Native Flash 17x better TCO
“…data-in-DRAM implementations like SAP HANA..should be bypassed… ..current leading data-in-flash database for transactional analytic apps is Aerospike.” - David Floyer, CTO, Wikibon
http://wikibon.org/wiki/v/Data_in_DRAM_is_a_Flash_in_the_Pan
Case studies
Proven in Production
➤ AppNexus - #2 RTB after Google
   • 27 Billion auctions per day
   • 600+ QPS Aerospike servers in 6 clusters in 3 data centers
➤ Chango – #2 Search after Google
   • Sees more Searches than Yahoo! + bing
   • Data on 300 Million users
➤ TradeDesk – first Ad Exchange
   • Facebook Exchange partner
   • FBX serves 25% of Ads on the Internet
   • 1200% growth in 2012
“Aerospike has operated without interruptions and easily scaled to meet our performance demands.” – Mike Nolet, CTO, AppNexus
Proven in Production
➤ eXelate – Data on 500 Million users
   • Online data plus Nielsen, Mastercard, Autobytel, Bizo data..
   • Data on 400 million users
   • 20 Billion Transactions per month
   • 4x2 TB data per cluster
   • 4 clusters across 4 data centers
“Scale. Real-time performance. Real-time replication at 4 datacenters. Aerospike delivered.” - Elad Efraim, eXelate CTO
➤ BlueKai – Serves half the Fortune 30
   • #1 Data Exchange
   • 2 Trillion Transactions per month
Mission
➤ Build the Modern Real-time Data Platform
1. Scaling the Internet of Everything
2. Pushing the limits of modern hardware
3. No data loss and No downtime
AEROSPIKE REAL-TIME DATA PLATFORM
• Publish & Subscribe
• ASQL & NoSQL
• Powerful Aggregations (MapReduce++)
• Secondary Index Queries
• Transactions
• User Defined Functions (UDF)
• Security, Encryption, Compression
• Distribution - Shared Nothing, ACID, Scale-out, Multiple datacenters
• Data Types – Int, Str, Blob, List, Map, Large Stack, Large Set, Large List
• Storage – DRAM, SSD, HDD
Aerospike Real-time Big Data Platform
Purpose Built
➤ In-DRAM + Flash
   • Indexes always in-DRAM, data in DRAM + Flash
➤ ACID + Tunable Consistency
   • Never loses data – synchronous replication, cross data center synchronization
➤ Vertical + Horizontal Scaling
   • Multi-core, multi-processor
   • Shared nothing, elastic, transparent sharding, clients know exactly where the data is
Proven in Production
➤ Predictable High Performance
   • 99.9% < 2-3ms at 500k TPS
➤ ACID, Zero Downtime in 3 Years
   • Tunable consistency
   • No SPoF, self-managing clusters
   • Cross Data Center Replication
➤ 17x better TCO
   • Fewer servers, less power, easier maintenance
Aerospike Real-time Big Data Platform
Rapid Development
➤ Support for popular languages and tools
   • ASQL and Aerospike Client in Java, C#, Ruby, Python..
➤ Complex data types
   • Nested documents (map, list, string, integer)
   • Large (Stack, Set, List) Objects
➤ Queries
   • Single record
   • Batch multi-record lookups
   • Equality and range
   • Aggregations and MapReduce
Complete Customizability
➤ User Defined Functions (UDFs)
   • In-DB processing
➤ Aggregation Framework
   • UDF Pipeline
   • MapReduce ++
➤ Time Series Queries
   • Just 2 IOPs for most r/w independent of object size
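The idea of an aggregation built as a pipeline of user-defined stages (filter, map, reduce) can be sketched in plain Python. This only illustrates the shape of such a pipeline: the stage helpers, the sample records and the field names are all hypothetical, and nothing here is the Aerospike aggregation API.

```python
from functools import reduce

# Hypothetical sample records, as a query might stream them.
records = [
    {"user": "u1", "campaign": "a", "clicks": 3},
    {"user": "u2", "campaign": "b", "clicks": 0},
    {"user": "u3", "campaign": "a", "clicks": 2},
    {"user": "u4", "campaign": "b", "clicks": 5},
]


def stream_filter(predicate):
    return lambda stream: (r for r in stream if predicate(r))


def stream_map(fn):
    return lambda stream: (fn(r) for r in stream)


def stream_reduce(fn, initial):
    return lambda stream: reduce(fn, stream, initial)


def pipeline(stream, *stages):
    """Feed the output of each stage into the next one."""
    for stage in stages:
        stream = stage(stream)
    return stream


def add_clicks(totals, pair):
    campaign, clicks = pair
    totals[campaign] = totals.get(campaign, 0) + clicks
    return totals


clicks_by_campaign = pipeline(
    records,
    stream_filter(lambda r: r["clicks"] > 0),
    stream_map(lambda r: (r["campaign"], r["clicks"])),
    stream_reduce(add_clicks, {}),
)
print(clicks_by_campaign)
```

The filter and map stages are lazy generators, so records flow through one at a time and only the reduce stage holds state. That is the property that lets this kind of processing run close to the data.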
How to get Aerospike?
Free Community Edition
➤ For developers looking for speed and stability who want to scale transparently as they grow
   • All features for 2 nodes, 100GB, 1 cluster, 1 datacenter
   • Community support
Enterprise Edition
➤ For mission critical apps needing to scale right from the start
   • Unlimited number of nodes, clusters, data centers
   • Cross data center replication
   • Premium 24x7 support
   • Priced by TBs of unique data (not replicas)
Questions