Designing for Massive Scalability at BackType #bigdatacamp

Post on 15-Jan-2015

Designing for Massive Scalability at BackType

Michael Montano / @michaelmontano

Wednesday, November 17, 2010

Desired properties of a back-end

• Robust and fault-tolerant to both machine and human error.

• Low latency reads and updates.

• Scalable to increases in data or traffic.

• Extensible to support new features or related services.

• Generalizes to diverse types of data and requests.

• Allows ad hoc queries.

• Minimal maintenance.

• Debuggable: can trace how any value in the system came to be.

Layered Architecture

Speed layer

Batch layer

Serving layer

The three layers work in tandem to satisfy our desired properties

Batch Layer

view = fn(complete dataset)
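A minimal sketch of what "view = fn(complete dataset)" could look like, using a tweet count per URL as the view. The record layout and the name `tweet_count_view` are assumptions for illustration, not BackType's actual code; the point is only that the view is recomputed from the full dataset rather than updated in place.

```python
from collections import Counter

def tweet_count_view(complete_dataset):
    """Recompute the whole view from scratch: url -> number of tweets."""
    return Counter(record["url"] for record in complete_dataset)

# Hypothetical master dataset: one immutable record per observed tweet.
dataset = [
    {"tweet_id": 1, "url": "example.com/a"},
    {"tweet_id": 2, "url": "example.com/b"},
    {"tweet_id": 3, "url": "example.com/a"},
]

# Each batch run rebuilds the view from all the data, so a bad run can be
# fixed by correcting the function and recomputing.
view = tweet_count_view(dataset)
print(view["example.com/a"])
```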

Batch Layer Views

• Arbitrary

• High latency

• No random access

Serving Layer

• Provide random access to batch-computed views

• Update in batch, no random writes

• High latency updates

ElephantDB

• Our implementation of serving layer

• Pre-shard key/value data via MapReduce

• ElephantDB ring pulls shards from HDFS on startup

• Read-only access to data
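As a rough illustration of pre-sharding key/value data, the sketch below assigns each key to a shard by hashing it, then serves read-only lookups from the resulting shards. The hash-modulo scheme, the shard count, and the function names are assumptions for illustration; the slides do not say how ElephantDB assigns keys to shards.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard index with a hash that is stable across processes."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def build_shards(view, num_shards=NUM_SHARDS):
    """Batch step: split a computed view into per-shard key/value maps."""
    shards = [dict() for _ in range(num_shards)]
    for key, value in view.items():
        shards[shard_for(key, num_shards)][key] = value
    return shards

def lookup(shards, key):
    """Serving step: read-only random access; no writes after the build."""
    return shards[shard_for(key, len(shards))].get(key)

shards = build_shards({"a.com": 3, "b.com": 5})
print(lookup(shards, "a.com"))
```

A stable hash from `hashlib` is used instead of Python's built-in `hash()`, which is randomized per process for strings, so the job that writes shards and the server that reads them agree on where a key lives.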

ElephantDB Flow

[Diagram: Batch Layer, Shards on HDFS (0, 1, 2, 3), ElephantDB]

Batch and Serving Layers

[Diagram: in the Batch Layer, the complete dataset (HDFS) feeds three views (tweet count, influencer scores, site affinity); each view is stored as ElephantDB shards, which back the ElephantDB ring in the Serving Layer]

Batch and Serving Layers

• Robust and fault-tolerant to both machine and human error.

• Low latency reads and updates.

• Scalable to increases in data or traffic.

• Extensible to support new features or related services.

• Generalizes to diverse types of data and requests.

• Allows ad hoc queries.

• Minimal maintenance.

• Debuggable: can trace how any value in the system came to be.

Speed Layer

• Compensate for high latency of updates to serving layer

Key point: Only needs to compensate for data not yet absorbed in serving layer

Hours of data instead of years of data

Application-level Queries

[Diagram: the application queries both the Serving Layer and the Speed Layer, then merges the two results]
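The query-and-merge step can be sketched as follows for a simple count. The two dictionaries stand in for the serving layer and the speed layer, and `query_tweet_count` is a hypothetical name; this assumes the speed layer holds only data not yet absorbed by the serving layer, so the two counts can be added without double counting.

```python
def query_tweet_count(url, serving_view, speed_view):
    """Merge the batch-computed count with the recent, not-yet-absorbed count."""
    batch_count = serving_view.get(url, 0)   # years of data, hours stale
    recent_count = speed_view.get(url, 0)    # only the last few hours
    return batch_count + recent_count

# Hypothetical stand-ins for the two layers.
serving_view = {"example.com/a": 1200}
speed_view = {"example.com/a": 7, "example.com/new": 2}

print(query_tweet_count("example.com/a", serving_view, speed_view))
print(query_tweet_count("example.com/new", serving_view, speed_view))
```

Addition is the right merge only for additive views such as counts; other views would need their own merge function.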

Speed Layer

• Speed layer is transient

• Serving layer eventually corrects speed layer

• Can make tradeoffs aggressively for performance

• Can even trade off accuracy

Example: Unique visitors to a domain

• Batch/Serving layers

    • Compute exact count

• Speed layer

    • Keep set of visitors in a Bloom filter

    • Incrementally update count and Bloom filter
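The speed-layer side of this example might look like the sketch below: a small Bloom filter remembers which visitors have been seen, and the count is incremented only for visitors the filter does not recognize. The class names, filter size, and hash construction are assumptions for illustration, not BackType's implementation. Because a Bloom filter can report false positives, the count may slightly undercount, which is the accuracy tradeoff the speed layer is allowed to make.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array with k hash positions per item (illustrative)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

class UniqueVisitorCounter:
    """Approximate unique-visitor count for one domain."""

    def __init__(self):
        self.seen = BloomFilter()
        self.count = 0

    def record_visit(self, visitor_id):
        # Count a visitor only the first time the filter has not seen them.
        if visitor_id not in self.seen:
            self.seen.add(visitor_id)
            self.count += 1

counter = UniqueVisitorCounter()
for visitor in ["alice", "bob", "alice", "carol", "bob"]:
    counter.record_visit(visitor)
print(counter.count)
```

The state stays small because it covers only hours of data; the exact count from the batch and serving layers later replaces the approximation.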
