
Big Data Learnings from a Vendor's Perspective

Date post: 15-Jan-2015
Upload: aerospike-inc
Description:
Best practices in building and operating 24x7 Internet scale services.
Big Data Learnings from a Vendor's Perspective. Srini V. Srinivasan. Toronto Big Data Week, April 23, 2013.
Transcript
Page 1: Big Data Learnings from a Vendor's Perspective

Big Data Learnings from a Vendor's Perspective

Srini V. Srinivasan

Toronto Big Data Week, April 23, 2013

Page 2: Big Data Learnings from a Vendor's Perspective

Database Landscape

STRUCTURED DATA

TRANSACTIONS (OLTP)
  • Response time: Seconds
  • Gigabytes of data
  • Balanced Reads/Writes

ANALYTICS (OLAP)
  • Response time: Seconds
  • Terabytes of data
  • Read Intensive

UNSTRUCTURED DATA

BIG DATA ANALYTICS
  • Response time: Hours, Weeks
  • TB to PB
  • Read Intensive

REAL-TIME BIG DATA
  • Real-time Transactions
  • Response time: < 10 ms
  • 1-20 TB
  • Balanced Reads/Writes
  • 24x7x365 Availability

© 2013 Aerospike. All rights reserved. Confidential Pg. 2

Page 3: Big Data Learnings from a Vendor's Perspective

Requirements for Internet Enterprises

1. Know who the interaction is with
  • Monitor 200+ Million US Consumers, 5+ Billion mobile devices and sensors

2. Determine intent based on current context
  • Page views, search terms, game state, last purchase, friends list, ads served, location

3. Respond now, use big data for more accurate decisions
  • Display the most relevant Ad
  • Recommend the best product
  • Deliver the richest gaming experience
  • Eliminate fraud…

4. Service can NEVER go down!


Page 4: Big Data Learnings from a Vendor's Perspective

Challenges

1. Handle extremely high rates of persistent read/write transactions

2. Avoid hot spots to maintain tight latency SLAs

3. Provide immediate consistency with replication

4. Allow long running tasks with transactions

5. Scale linearly as data sizes increase

6. Add capacity with no service interruption

Page 5: Big Data Learnings from a Vendor's Perspective

Native Flash Performance

➤ Low Latency at High Throughput


Page 6: Big Data Learnings from a Vendor's Perspective


“Only Aerospike was able to function in synchronous mode with a replication factor of two.. it is a significant advantage that Aerospike is able to function reliably on a smaller amount of hardware while still maintaining true consistency.”

Page 7: Big Data Learnings from a Vendor's Perspective

Shared-Nothing Architecture


OHIO Data Center

➤ Every node in a cluster is identical and handles both transactions and long running tasks

➤ Data is replicated synchronously with immediate consistency within the cluster

➤ Data is replicated asynchronously across data centers

Page 8: Big Data Learnings from a Vendor's Perspective

Distributed Hash Table

How Data Is Distributed (Replication Factor 2)

➤ Every key is hashed into a 20 byte (fixed length) string using the RIPEMD160 hash function

➤ This hash + additional data (fixed 64 bytes) are stored in RAM in the index

➤ Some bits from this hash value are used to compute the partition id

➤ There are 4096 partitions

➤ Partition id maps to node id based on cluster membership


Diagram: key cookie-abcdefg-12345678 → hash 182023kh15hh3kahdjsh

Partition ID    Master node    Replica node
…               1              4
1820            2              3
1821            3              2
4096            4              1
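
The key-to-partition steps above can be sketched in a few lines of Python. This is a minimal illustration, not Aerospike's code. The slide only says that "some bits" of the hash select the partition, so the choice of the low 12 bits of the first two bytes is an assumption, as are the function names. Some OpenSSL builds disable RIPEMD-160, so the sketch falls back to SHA-1 purely as a 20-byte stand-in.

```python
import hashlib

N_PARTITIONS = 4096  # partition count stated on the slide


def key_digest(key: bytes) -> bytes:
    """Return a fixed-length 20-byte digest of the key."""
    try:
        # RIPEMD-160 is the hash named on the slide.
        return hashlib.new("ripemd160", key).digest()
    except ValueError:
        # Stand-in only: SHA-1 also yields 20 bytes but is NOT what the slide describes.
        return hashlib.sha1(key).digest()


def partition_id(digest: bytes) -> int:
    """Derive a partition id from some bits of the digest.

    Assumption: the low 12 bits of the first two bytes (2**12 == 4096).
    """
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


digest = key_digest(b"cookie-abcdefg-12345678")
pid = partition_id(digest)
print(len(digest), pid)
```

Because the digest has a fixed length, the index entry size does not depend on the length of the application key.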

Page 9: Big Data Learnings from a Vendor's Perspective

Organizing the cluster

➤ Automatic multicast gossip protocol for node discovery

➤ Paxos consensus algorithm determines nodes in cluster

➤ Ordered list of nodes determines data location

➤ Data partitions balanced for minimal data motion

➤ Vote initiated and terminated in 100 milliseconds
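
One way to see how an agreed node list can determine data location with little data motion is the sketch below. It uses rendezvous-style hashing, which is a swapped-in technique chosen for illustration. The slides do not describe the actual ordering algorithm, and the names `succession` and `partition_map` are hypothetical.

```python
import hashlib

N_PARTITIONS = 4096       # from the slides
REPLICATION_FACTOR = 2    # from the slides


def succession(partition: int, nodes: list) -> list:
    """Order the nodes for one partition; every member computes the same order."""
    def score(node: str) -> bytes:
        return hashlib.sha256(f"{node}:{partition}".encode()).digest()
    return sorted(nodes, key=score)


def partition_map(nodes: list) -> dict:
    """Map each partition to [master, replica] using only the agreed node list."""
    return {p: succession(p, nodes)[:REPLICATION_FACTOR] for p in range(N_PARTITIONS)}


before = partition_map(["A", "B", "C", "D"])
after = partition_map(["A", "B", "C", "D", "E"])
moved = sum(1 for p in before if before[p][0] != after[p][0])
print(f"masters moved after adding node E: {moved} of {N_PARTITIONS}")
```

In this scheme a partition changes master only when the new node ranks first for it, so existing nodes never trade partitions among themselves.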


Page 10: Big Data Learnings from a Vendor's Perspective

How it Works

Writing with Immediate Consistency

1. Write sent to row master

2. Latch against simultaneous writes

3. Apply write to master memory and replica memory synchronously

4. Queue operations to disk

5. Signal completed transaction (optional storage commit wait)

6. Master applies conflict resolution policy (rollback/ rollforward)

Diagram labels: master, replica

Adding a Node

1. Cluster discovers new node via gossip protocol

2. Paxos vote determines new data organization

3. Partition migrations scheduled

4. When a partition migration starts, write journal starts on destination

5. Partition moves atomically

6. Journal is applied and source data deleted

Diagram label: transactions continue
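
Steps 2 to 5 of the write path can be sketched as follows. This is a single-process illustration under stated assumptions: the class names `Node` and `RecordMaster` are hypothetical, the "replica" is a local object rather than a remote node, and step 6 (conflict resolution) is left out.

```python
import threading
from collections import deque


class Node:
    """One node's in-memory records plus a queue of pending disk writes."""

    def __init__(self, name):
        self.name = name
        self.memory = {}
        self.disk_queue = deque()

    def apply(self, key, value):
        self.memory[key] = value               # step 3: update memory
        self.disk_queue.append((key, value))   # step 4: queue the operation to disk


class RecordMaster:
    """Master-side write path for the records it owns (replication factor 2)."""

    def __init__(self, master, replica):
        self.master, self.replica = master, replica
        self._latches = {}
        self._latches_guard = threading.Lock()

    def _latch(self, key):
        # One latch per record, created on first use.
        with self._latches_guard:
            return self._latches.setdefault(key, threading.Lock())

    def write(self, key, value):
        with self._latch(key):                 # step 2: block simultaneous writes
            self.master.apply(key, value)      # step 3: master memory
            self.replica.apply(key, value)     # step 3: replica memory, synchronously
        return True                            # step 5: signal completion


master, replica = Node("node-2"), Node("node-3")
owner = RecordMaster(master, replica)
acked = owner.write("cookie-abcdefg-12345678", b"profile-v1")
print(acked, master.memory == replica.memory)
```

The acknowledgement is returned only after both in-memory copies are updated, which is what lets a read on either copy see the write immediately.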

Page 11: Big Data Learnings from a Vendor's Perspective

Intelligent Client Shields Applications from the Complexity of the Cluster

➤ Implements Aerospike API

➤ Optimistic row locking

➤ Optimized binary protocol

➤ Cluster tracking
  • Learns about cluster changes, partition map
  • Gossip protocol

➤ Transaction semantics
  • Global transaction ID
  • Retransmit and timeout
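
A rough sketch of the client behaviour listed above: look up the partition's master locally, send the request there, and retransmit with the same transaction ID after a timeout. Everything named here (`SmartClient`, `NodeTimeout`, the `send` callback, the toy `partition_of` function) is hypothetical and is not the Aerospike client API.

```python
import uuid


class NodeTimeout(Exception):
    """Raised by the hypothetical transport when a node does not answer in time."""


class SmartClient:
    """Routes each request straight to the partition's master and retries on timeout."""

    def __init__(self, partition_map, partition_of, send, max_attempts=3):
        self.partition_map = dict(partition_map)  # partition id -> master node name
        self.partition_of = partition_of          # key -> partition id
        self.send = send                          # (node, txn_id, key, value) -> reply
        self.max_attempts = max_attempts

    def update_partition_map(self, new_map):
        """Called when the client learns about a cluster change."""
        self.partition_map = dict(new_map)

    def put(self, key, value):
        txn_id = uuid.uuid4().hex  # one id per transaction, reused on every retransmit
        last_error = None
        for _ in range(self.max_attempts):
            # Re-read the map on each attempt because it may have changed.
            node = self.partition_map[self.partition_of(key)]
            try:
                return self.send(node, txn_id, key, value)
            except NodeTimeout as exc:
                last_error = exc
        raise NodeTimeout(f"no reply after {self.max_attempts} attempts") from last_error


# Demo with a fake transport whose first call times out.
calls = []


def flaky_send(node, txn_id, key, value):
    calls.append((node, txn_id))
    if len(calls) == 1:
        raise NodeTimeout(node)
    return "ok"


client = SmartClient({0: "node-1", 1: "node-2"}, lambda key: len(key) % 2, flaky_send)
reply = client.put("cookie-abcdefg-12345678", b"profile")
print(reply, calls)
```

Reusing the transaction ID across retransmissions gives the server a way to recognise a repeated request instead of applying it twice.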


Page 12: Big Data Learnings from a Vendor's Perspective

Cross Data Center Replication (XDR)

➤ Asynchronous replication for long link delays and outages

➤ Namespace is configured to replicate to a destination cluster – master / slave, including star and ring

➤ Replication process
  • Transaction journal on partition master and replica
  • XDR process writes batches to destination
  • Transmission state shared with source replica
  • Retransmission in case of network fault
  • When data arrives back at originating cluster, transaction ID matching prevents subsequent application and forwarding

➤ In master / master replication, conflict resolution via multiple versions, or timestamp
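
The loop-prevention and timestamp rules above can be illustrated with a toy ring of three clusters. This is a sketch under assumptions: shipping is shown as an immediate function call rather than asynchronous batches, only the timestamp policy is shown (not multiple versions), and the `Cluster` class and its methods are hypothetical.

```python
class Cluster:
    """Toy cluster that applies writes and forwards them to its destinations."""

    def __init__(self, name):
        self.name = name
        self.records = {}       # key -> ((timestamp, txn_id), value)
        self.seen = set()       # transaction ids already handled here
        self.destinations = []  # clusters this one ships to

    def write(self, txn_id, key, value, timestamp):
        """A local client write, applied and then shipped onward."""
        return self.receive(txn_id, key, value, timestamp)

    def receive(self, txn_id, key, value, timestamp):
        if txn_id in self.seen:
            return False  # came back around: neither apply nor forward again
        self.seen.add(txn_id)
        version = (timestamp, txn_id)  # txn_id breaks timestamp ties deterministically
        current = self.records.get(key)
        if current is None or version > current[0]:
            self.records[key] = (version, value)  # newer timestamp wins
        for dest in self.destinations:
            dest.receive(txn_id, key, value, timestamp)
        return True

    def get(self, key):
        return self.records[key][1]


a, b, c = Cluster("A"), Cluster("B"), Cluster("C")
a.destinations, b.destinations, c.destinations = [b], [c], [a]  # ring topology

a.write("t1", "user:1", b"new", timestamp=10)
b.write("t2", "user:1", b"stale", timestamp=5)  # older write arriving at another cluster
values = [cluster.get("user:1") for cluster in (a, b, c)]
print(values)
```

Without the `seen` check, the first write would circulate around the ring indefinitely.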


Page 13: Big Data Learnings from a Vendor's Perspective

Multi-core Optimization

Right Architecture
  • Shared nothing
  • In-memory (or multiple SSDs)
  • Tight code loop
  • Lock free isolation

OS, Programming Language, Libraries
  • Modern Linux kernel
  • C language
  • Use epoll

Tweaks
  • Pin threads to processor cores
  • IRQ affinity settings for NIC
  • CPU Socket Isolation via pairing of CPU to NIC

Russ’s 10 Ingredient Recipe for Making 1 Million TPS on $5K Hardware
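
Two of the ingredients, thread pinning and an epoll-driven loop, can be tried out from Python even though the recipe itself calls for C. The sketch below is only an illustration of the idea. It assumes Linux for the pinning call (`os.sched_setaffinity`), and `selectors.DefaultSelector` uses epoll where the platform provides it. The helper names are hypothetical.

```python
import os
import selectors
import socket
import threading


def pin_current_thread(cpu):
    """Pin the calling thread to one core; returns False where unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # not Linux
    if cpu not in os.sched_getaffinity(0):
        return False  # core not available to this process
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling thread
    return True


def worker(cpu, sock, results):
    """Event loop for one connection, running on a single pinned core."""
    results["pinned"] = pin_current_thread(cpu)
    selector = selectors.DefaultSelector()  # epoll-backed on Linux
    selector.register(sock, selectors.EVENT_READ)
    for key, _events in selector.select(timeout=5):
        data = key.fileobj.recv(1024)
        key.fileobj.sendall(data.upper())
    selector.close()


client_end, server_end = socket.socketpair()
client_end.settimeout(5)
results = {}
cpu = min(os.sched_getaffinity(0)) if hasattr(os, "sched_getaffinity") else 0
thread = threading.Thread(target=worker, args=(cpu, server_end, results))
thread.start()
client_end.sendall(b"ping")
reply = client_end.recv(1024)
thread.join()
client_end.close()
server_end.close()
print(reply, results)
```

IRQ affinity and CPU-to-NIC pairing are operating system settings and are not shown here.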


Page 14: Big Data Learnings from a Vendor's Perspective

Flash-optimized Storage Layer

➤ Direct device access
  • Direct attach performance
  • Data written in flash optimal large block patterns
  • All indexes in RAM for low wear
  • Constant background defragmentation
  • Log structured file system, “copy on write”
  • Clean restart through shared memory

➤ Random distribution using hash does not require RAID hardware
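
The combination of large block writes, an index held in RAM, copy on write and background defragmentation can be sketched with a toy log-structured store. This is an illustration under assumptions, not the Aerospike storage engine: blocks are counted in records rather than bytes, the "device" is a Python dict, and the `LogStore` class and its 0.5 defrag threshold are hypothetical.

```python
class LogStore:
    """Toy log-structured store: write-once blocks plus an in-RAM index."""

    def __init__(self, block_size=4):
        self.block_size = block_size  # records per block (real devices use bytes)
        self.blocks = {}              # block id -> list of (key, value), written once
        self.buffer = []              # block currently being filled in memory
        self.buffer_id = 0
        self.index = {}               # key -> (block id, slot); lives in RAM only

    def put(self, key, value):
        # Copy on write: never update in place, always append a new version.
        self.index[key] = (self.buffer_id, len(self.buffer))
        self.buffer.append((key, value))
        if len(self.buffer) == self.block_size:
            self._flush()

    def _flush(self):
        # One large sequential write per full block.
        self.blocks[self.buffer_id] = self.buffer
        self.buffer, self.buffer_id = [], self.buffer_id + 1

    def get(self, key):
        block_id, slot = self.index[key]
        block = self.buffer if block_id == self.buffer_id else self.blocks[block_id]
        return block[slot][1]

    def _live(self, block_id):
        # A record is live only if the index still points at this exact slot.
        return [(k, v) for slot, (k, v) in enumerate(self.blocks[block_id])
                if self.index.get(k) == (block_id, slot)]

    def defrag(self, threshold=0.5):
        """Rewrite blocks whose live fraction is below threshold, then free them."""
        freed = 0
        for block_id in list(self.blocks):
            live = self._live(block_id)
            if len(live) / self.block_size < threshold:
                for k, v in live:
                    self.put(k, v)
                del self.blocks[block_id]
                freed += 1
        return freed


store = LogStore(block_size=4)
for i in range(4):
    store.put(f"k{i}", b"v1")  # fills and flushes block 0
for i in range(3):
    store.put(f"k{i}", b"v2")  # newer versions land in the next block
freed = store.defrag()
print(freed, sorted(store.blocks))
```

Because overwrites only touch the index in RAM and append to the current block, the device sees large sequential writes instead of small in-place updates.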


SSD performance varies widely
  • Aerospike has a certified hardware list
  • Free SSD certification tool, CIO, is also available

Page 15: Big Data Learnings from a Vendor's Perspective

Native Flash 17x better TCO

“…data-in-DRAM implementations like SAP HANA..should be bypassed… ..current leading data-in-flash database for transactional analytic apps is Aerospike.” - David Floyer, CTO, Wikibon



http://wikibon.org/wiki/v/Data_in_DRAM_is_a_Flash_in_the_Pan

Page 16: Big Data Learnings from a Vendor's Perspective

Case studies

Page 17: Big Data Learnings from a Vendor's Perspective

Proven in Production

➤ AppNexus - #2 RTB after Google
  • 27 Billion auctions per day
  • 600+ QPS Aerospike servers in 6 clusters in 3 data centers

➤ Chango – #2 Search after Google
  • Sees more Searches than Yahoo! + Bing
  • Data on 300 Million users

➤ TradeDesk – first Ad Exchange
  • Facebook Exchange partner
  • FBX serves 25% of Ads on the Internet
  • 1200% growth in 2012

“Aerospike has operated without interruptions and easily scaled to meet our performance demands.” – Mike Nolet, CTO, AppNexus


Page 18: Big Data Learnings from a Vendor's Perspective

Proven in Production

➤ eXelate – Data on 500 Million users
  • Online data plus Nielsen, Mastercard, Autobytel, Bizo data..
  • Data on 400 million users
  • 20 Billion Transactions per month
  • 4x2 TB data per cluster
  • 4 clusters across 4 data centers

“Scale. Real-time performance. Real-time replication at 4 datacenters. Aerospike delivered.” - Elad Efraim, eXelate CTO

➤ BlueKai – Serves half the Fortune 30
  • #1 Data Exchange
  • 2 Trillion Transactions per month


Page 19: Big Data Learnings from a Vendor's Perspective

Mission

➤ Build the Modern Real-time Data Platform

1. Scaling the Internet of Everything
2. Pushing the limits of modern hardware
3. No data loss and No downtime

AEROSPIKE REAL-TIME DATA PLATFORM

• Publish & Subscribe

• ASQL & NoSQL

• Powerful Aggregations (MapReduce++)

• Secondary Index Queries

• Transactions

• User Defined Functions (UDF)

• Security, Encryption, Compression

• Distribution - Shared Nothing, ACID, Scale-out, Multiple datacenters

• Data Types – Int, Str, Blob, List, Map, Large Stack, Large Set, Large List

• Storage – DRAM, SSD, HDD


Page 21: Big Data Learnings from a Vendor's Perspective

Aerospike Real-time Big Data Platform

Purpose Built

➤ In-DRAM + Flash
  • Indexes always in-DRAM, data in DRAM + Flash

➤ ACID + Tunable Consistency
  • Never loses data – synchronous replication, cross data center synchronization

➤ Vertical + Horizontal Scaling
  • Multi-core, multi-processor
  • Shared nothing, elastic, transparent sharding, clients know exactly where the data is

Proven in Production

➤ Predictable High Performance
  • 99.9% < 2-3ms at 500k TPS

➤ ACID, Zero Downtime in 3 Years
  • Tunable consistency
  • No SPoF, self-managing clusters
  • Cross Data Center Replication

➤ 17x better TCO
  • Fewer servers, less power, easier maintenance


Page 22: Big Data Learnings from a Vendor's Perspective

Aerospike Real-time Big Data Platform

Rapid Development

➤ Support for popular languages and tools
  • ASQL and Aerospike Client in Java, C#, Ruby, Python..

➤ Complex data types
  • Nested documents (map, list, string, integer)
  • Large (Stack, Set, List) Objects

➤ Queries
  • Single record
  • Batch multi-record lookups
  • Equality and range
  • Aggregations and MapReduce

Complete Customizability

➤ User Defined Functions (UDFs)
  • In-DB processing

➤ Aggregation Framework
  • UDF Pipeline
  • MapReduce ++

➤ Time Series Queries
  • Just 2 IOPs for most r/w independent of object size
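
The idea of a UDF pipeline that filters, maps and reduces records can be sketched generically. The slides do not show the UDF language or API, so the `aggregate` function below is a hypothetical stand-in and the three records are made-up sample data.

```python
from functools import reduce


def aggregate(records, filter_fn, map_fn, reduce_fn, initial):
    """Run a filter -> map -> reduce pipeline over a stream of records."""
    mapped = (map_fn(record) for record in records if filter_fn(record))
    return reduce(reduce_fn, mapped, initial)


# Made-up sample records for illustration only.
records = [
    {"user": "u1", "country": "CA", "spend": 12},
    {"user": "u2", "country": "US", "spend": 30},
    {"user": "u3", "country": "CA", "spend": 8},
]

total_ca = aggregate(
    records,
    filter_fn=lambda r: r["country"] == "CA",  # keep matching records
    map_fn=lambda r: r["spend"],               # project the field of interest
    reduce_fn=lambda a, b: a + b,              # combine partial results
    initial=0,
)
print(total_ca)
```

Running such a pipeline inside the database means only the reduced result, not every matching record, has to travel back to the client.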


Page 23: Big Data Learnings from a Vendor's Perspective

How to get Aerospike?

Free Community Edition

➤ For developers looking for speed, stability, and transparent scaling as they grow
  • All features for 2 nodes, 100GB, 1 cluster, 1 datacenter
  • Community support

Enterprise Edition

➤ For mission critical apps needing to scale right from the start
  • Unlimited number of nodes, clusters, data centers
  • Cross data center replication
  • Premium 24x7 support
  • Priced by TBs of unique data (not replicas)

Page 24: Big Data Learnings from a Vendor's Perspective

Questions


