Designing for Massive Scalability at BackType #bigdatacamp

Date post: 15-Jan-2015
Upload: michael-montano

Michael Montano / @michaelmontano

Wednesday, November 17, 2010

Desired properties of a back-end

• Robust and fault-tolerant to both machine and human error.

• Low latency reads and updates.

• Scalable to increases in data or traffic.

• Extensible to support new features or related services.

• Generalizes to diverse types of data and requests.

• Allows ad hoc queries.

• Minimal maintenance.

• Debuggable: can trace how any value in the system came to be.

Layered Architecture

Speed layer

Batch layer

Serving layer

The three layers work in tandem to satisfy our desired properties.

Batch Layer

view = fn(complete dataset)
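
As a minimal sketch of this idea, a batch view can be written as a pure function that is recomputed from the complete dataset on every run. The record shape `(url, tweet_id)` and the name `tweet_count_view` are assumptions made for illustration; they are not taken from the talk.

```python
from collections import Counter


def tweet_count_view(complete_dataset):
    """Recompute the whole view from scratch: view = fn(complete dataset)."""
    # No incremental state is kept; the result depends only on the input.
    return dict(Counter(url for url, _tweet_id in complete_dataset))


# Hypothetical master dataset of (url, tweet_id) records.
dataset = [
    ("example.com/a", 1),
    ("example.com/b", 2),
    ("example.com/a", 3),
]

view = tweet_count_view(dataset)
```

Because the function keeps no state between runs, a bad record can be fixed by correcting the dataset and recomputing the view.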

Batch Layer Views

• Arbitrary

• High latency

• No random access

Serving Layer

• Provide random access to batch-computed views

• Update in batch, no random writes

• High latency updates

ElephantDB

• Our implementation of serving layer

• Pre-shard key/value data via MapReduce

• ElephantDB ring pulls shards from HDFS on startup

• Read-only access to data
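
The pre-sharding and read-only lookup described above can be sketched roughly as follows. This is not ElephantDB's actual API: the names `shard_for`, `build_shards`, and `ReadOnlyRing` are hypothetical, in-memory dictionaries stand in for shard files on HDFS, and a plain loop stands in for the MapReduce job.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index with a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    digest is used to keep the batch job and the servers in agreement.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def build_shards(view, num_shards):
    """Batch step: partition a key/value view into num_shards dictionaries."""
    shards = [{} for _ in range(num_shards)]
    for key, value in view.items():
        shards[shard_for(key, num_shards)][key] = value
    return shards


class ReadOnlyRing:
    """Serving step: holds prebuilt shards and exposes lookups only."""

    def __init__(self, shards):
        self._shards = shards

    def get(self, key, default=None):
        shard = self._shards[shard_for(key, len(self._shards))]
        return shard.get(key, default)


shards = build_shards({"example.com/a": 2, "example.com/b": 1}, num_shards=4)
ring = ReadOnlyRing(shards)
```

The ring has no write method: in this sketch, new data arrives only by building a fresh set of shards and loading them in place of the old ones.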

ElephantDB Flow

[Diagram: Batch Layer -> Shards on HDFS (0, 1, 2, 3) -> ElephantDB]

Batch and Serving Layers

[Diagram: in the batch layer, the complete dataset (HDFS) produces the tweet count, influencer scores, and site affinity views. Each view is stored as ElephantDB shards, which feed the ElephantDB ring in the serving layer.]

Batch and Serving Layers

• Robust and fault-tolerant to both machine and human error.

• Low latency reads and updates.

• Scalable to increases in data or traffic.

• Extensible to support new features or related services.

• Generalizes to diverse types of data and requests.

• Allows ad hoc queries.

• Minimal maintenance.

• Debuggable: can trace how any value in the system came to be.

Speed Layer

• Compensate for high latency of updates to serving layer

Key point: The speed layer only needs to compensate for data not yet absorbed into the serving layer.

That means hours of data instead of years of data.

Application-level Queries

[Diagram: the application queries both the serving layer and the speed layer, then merges the two results.]
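
For a simple counting view, the merge step might look like the sketch below. The function name `query_count` and the two dictionaries are hypothetical stand-ins for the serving-layer and speed-layer stores, and merging by addition is an assumption that holds for counts but not for every kind of view.

```python
def query_count(key, serving_view, speed_view):
    """Merge the batch-computed count with the count for data not yet absorbed."""
    batch_part = serving_view.get(key, 0)   # covers everything up to the last batch run
    recent_part = speed_view.get(key, 0)    # covers only data since that run
    return batch_part + recent_part


# Hypothetical contents of each layer.
serving_view = {"example.com/a": 1200}
speed_view = {"example.com/a": 7, "example.com/c": 3}

total_a = query_count("example.com/a", serving_view, speed_view)
total_c = query_count("example.com/c", serving_view, speed_view)
```

A key that exists only in the speed layer, such as `example.com/c` here, is still answered, which is how recent data becomes visible before the next batch run.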

Speed Layer

• Speed layer is transient

• Serving layer eventually corrects speed layer

• Can make tradeoffs aggressively for performance

• Can even trade off accuracy

Example: Unique visitors to a domain

• Batch/Serving layers

  • Compute exact count

• Speed layer

  • Keep set of visitors in a Bloom filter

  • Incrementally update count and Bloom filter
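
The speed-layer half of this example could be sketched as below. This is an illustrative implementation, not BackType's code: the class names, the filter size, and the number of hash functions are all assumptions, and the Bloom filter is a small hand-rolled one built on `hashlib`.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class UniqueVisitorCounter:
    """Approximate unique-visitor count for one domain, updated per visit."""

    def __init__(self):
        self.seen = BloomFilter()
        self.count = 0

    def record_visit(self, visitor_id):
        # Count a visitor only if the filter has (probably) not seen them.
        if visitor_id not in self.seen:
            self.seen.add(visitor_id)
            self.count += 1


counter = UniqueVisitorCounter()
for visitor in ["alice", "bob", "alice", "carol", "bob"]:
    counter.record_visit(visitor)
```

A false positive makes the counter skip a genuinely new visitor, so this approach can undercount but never overcount. That is the accuracy being traded for a small, fixed memory footprint, and the exact batch count replaces the estimate once the data is absorbed.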
