Designing for Massive Scalability at BackType
Michael Montano / @michaelmontano
Wednesday, November 17, 2010
Desired properties of a back-end
• Robust and fault-tolerant to both machine and human error.
• Low latency reads and updates.
• Scalable to increases in data or traffic.
• Extensible to support new features or related services.
• Generalizes to diverse types of data and requests.
• Allows ad hoc queries.
• Minimal maintenance.
• Debuggable: can trace how any value in the system came to be.
Layered Architecture
• Speed layer
• Batch layer
• Serving layer
The three layers work in tandem to satisfy our desired properties.
Batch Layer
view = fn(complete dataset)
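A minimal sketch of this idea, assuming a hypothetical in-memory dataset of tweet records; the record fields and the `tweet_count_view` function are illustrative names, not BackType's code. In production the dataset would live on HDFS and the function would run as a batch job, but the shape is the same: the view is always recomputed from all of the data.

```python
from collections import Counter

# Hypothetical master dataset: an append-only list of raw records.
# The record fields ("url", "user") are assumptions for illustration.
complete_dataset = [
    {"url": "example.com/a", "user": "alice"},
    {"url": "example.com/a", "user": "bob"},
    {"url": "example.com/b", "user": "alice"},
]

def tweet_count_view(dataset):
    """Recompute the whole view from scratch: url -> number of tweets."""
    return dict(Counter(record["url"] for record in dataset))

# view = fn(complete dataset)
view = tweet_count_view(complete_dataset)
print(view)  # {'example.com/a': 2, 'example.com/b': 1}
```

Because the view is a pure function of the dataset, a bad record or a buggy view function can be fixed and the view simply recomputed.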
Batch Layer Views
• Arbitrary
• High latency
• No random access
Serving Layer
• Provide random access to batch-computed views
• Update in batch, no random writes
• High latency updates
ElephantDB
• Our implementation of serving layer
• Pre-shard key/value data via MapReduce
• ElephantDB ring pulls shards from HDFS on startup
• Read-only access to data
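The pre-sharding step can be sketched as follows. This is not ElephantDB's API or its actual sharding scheme: the hash choice, the shard count of four, and the names `shard_for`, `pre_shard`, and `lookup` are all assumptions, and a plain loop stands in for the MapReduce job.

```python
import hashlib

NUM_SHARDS = 4  # assumption: an arbitrary shard count for illustration

def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard with a stable hash, so writers and readers agree."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def pre_shard(view, num_shards=NUM_SHARDS):
    """Batch side: split a computed key/value view into per-shard dicts."""
    shards = [{} for _ in range(num_shards)]
    for key, value in view.items():
        shards[shard_for(key, num_shards)][key] = value
    return shards

def lookup(shards, key):
    """Serving side: read-only random access; route to the shard owning the key."""
    return shards[shard_for(key, len(shards))].get(key)

shards = pre_shard({"example.com/a": 2, "example.com/b": 1, "example.com/c": 7})
print(lookup(shards, "example.com/b"))  # 1
```

Since the shards are never modified after the batch job writes them, the serving side needs no write path, locking, or compaction.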
ElephantDB Flow
[Figure: Batch Layer → Shards on HDFS (0, 1, 2, 3) → ElephantDB nodes]
Batch and Serving Layers
[Figure: Batch Layer: the complete dataset (HDFS) produces a tweet count view, an influencer scores view, and a site affinity view. Serving Layer: each view is stored as ElephantDB shards and served by the ElephantDB ring.]
Batch and Serving Layers
• Robust and fault-tolerant to both machine and human error.
• Low latency reads and updates.
• Scalable to increases in data or traffic.
• Extensible to support new features or related services.
• Generalizes to diverse types of data and requests.
• Allows ad hoc queries.
• Minimal maintenance.
• Debuggable: can trace how any value in the system came to be.
Speed Layer
• Compensate for high latency of updates to serving layer
Speed Layer
Key point: Only needs to compensate for data not yet absorbed in serving layer
Hours of data instead of years of data
Application-level Queries
[Figure: an application-level query is sent to both the Serving Layer and the Speed Layer, and the two results are merged.]
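A rough sketch of such a merged query for a count-style view, where merging is simple addition. The two dicts standing in for the layers and the function names are hypothetical; other kinds of view would need a different merge function.

```python
def query_serving_layer(batch_view, key):
    """Read the batch-computed value (everything absorbed by the last batch run)."""
    return batch_view.get(key, 0)

def query_speed_layer(realtime_view, key):
    """Read the value for data that arrived after the last batch run."""
    return realtime_view.get(key, 0)

def merged_count(batch_view, realtime_view, key):
    """Application-level query: ask both layers, then merge the results."""
    return query_serving_layer(batch_view, key) + query_speed_layer(realtime_view, key)

batch_view = {"example.com": 100}   # years of data, hours stale
realtime_view = {"example.com": 3}  # only the last few hours
print(merged_count(batch_view, realtime_view, "example.com"))  # 103
```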
Speed Layer
• Speed layer is transient
• Serving layer eventually corrects speed layer
• Can make tradeoffs aggressively for performance
• Can even trade off accuracy
Example: Unique visitors to a domain
• Batch/Serving layers
  • Compute exact count
• Speed layer
  • Keep set of visitors in a Bloom filter
  • Incrementally update count and Bloom filter
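The speed-layer half of this example might look like the sketch below, with one counter per domain. The Bloom filter here is a small hand-rolled one with assumed sizing (65,536 bits, 4 hashes), and `UniqueVisitorCounter` is a hypothetical name; it is not BackType's implementation.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array with k hash positions per item (sizes are assumptions)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive k positions by salting one hash function with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        """Set the item's bits; return True if it was (probably) already present."""
        already_present = True
        for pos in self._positions(item):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                already_present = False
                self.bits[byte] |= mask
        return already_present

class UniqueVisitorCounter:
    """Approximate unique-visitor count for one domain."""

    def __init__(self):
        self.seen = BloomFilter()
        self.count = 0

    def record_visit(self, visitor_id):
        # Increment only when the filter has not seen this visitor before.
        if not self.seen.add(visitor_id):
            self.count += 1
        return self.count

counter = UniqueVisitorCounter()
for visitor in ["alice", "bob", "alice", "carol", "bob"]:
    counter.record_visit(visitor)
print(counter.count)  # 3
```

A false positive in the filter makes a new visitor look like a repeat, so the count can only run slightly low. That is the accuracy being traded for constant memory, and the exact count from the batch layer replaces it once the data is absorbed.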