BIG CONTAINERS, BIG ORCHESTRATION, BIG DATA
William Benton, Red Hat, Inc., @willb
COMMONS GATHERING | Seattle | November 7 | #OCGathering2016
BACKGROUND
WHAT OUR CLUSTER LOOKED LIKE IN 2014
[Diagram: a Mesos cluster with six Spark executors and a networked POSIX FS]
Analytics is no longer a separate workload. Analytics is an essential component of modern data-driven applications.
OUR GOALS
git
FORECAST
Spark and microservices
Architectures for analytics and applications
Scheduling and storage
Future work (and how to get involved)
SPARK AND MICROSERVICES
Apache Spark is a fast and general framework for distributed data processing.
Resilient Distributed Datasets are partitioned, lazy, and immutable homogeneous collections.
RESILIENT DISTRIBUTED DATASETS
[Diagram: the integers 1 through 16, laid out as a partitioned collection]
Input: 1 2 3
FILTER (λ x: x % 2 != 0) → MAP (λ x: x * 3) → FLATMAP (λ x: [x, x+1])
Output via COLLECT or SAVE AS TEXT FILE: 3 4 9 10
CACHE
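The filter, map, and flatMap pipeline on these slides can be approximated in plain Python as a rough sketch. This is not Spark code: it assumes Python's built-in lazy iterators as a stand-in for RDD transformations, and the names `is_odd`, `triple`, `pair`, and `calls` are made up for illustration. It shows that building the pipeline does no work until a collect-style step forces evaluation, and that the source collection is never modified.

```python
from itertools import chain

calls = []  # records each time a stage actually runs (illustration only)

def is_odd(x):
    calls.append(("filter", x))
    return x % 2 != 0

def triple(x):
    calls.append(("map", x))
    return x * 3

def pair(x):
    calls.append(("flatmap", x))
    return [x, x + 1]

source = [1, 2, 3]

# Describing the pipeline is lazy: filter, map, and chain.from_iterable
# return iterators and do not touch the data yet.
pipeline = chain.from_iterable(map(pair, map(triple, filter(is_odd, source))))
work_before_collect = len(calls)

# A collect-style action forces evaluation of the whole pipeline.
result = list(pipeline)

print(work_before_collect)  # 0
print(result)               # [3, 4, 9, 10]
print(source)               # [1, 2, 3]
```

In Spark itself, the corresponding RDD operations are distributed across partitions; this sketch only mirrors the lazy, immutable part of the definition.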
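[Diagram: a driver and a cluster manager coordinate executor 1 through executor n; each executor holds a partition (1 2 3 through 10 11 12), applies λ x: x * 2 to produce 2 4 6 through 20 22 24, and has its own cache]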
[Diagram: Spark core, with the Graph, SQL, ML, and Streaming libraries on top; cluster managers: ad hoc, Mesos, YARN, and (added in the second build) k8s]
A microservice architecture employs lightweight, modular, and typically stateless components with well-defined interfaces and contracts.
BENEFITS OF MICROSERVICE ARCHITECTURES
MICROSERVICES AND SPARK
[Diagram: a master sends λ x: x * 2 to four executors holding partitions 1 2 3, 4 5 6, 7 8 9, and 10 11 12; the combined result is 2 4 6 8 10 12 14 16 18 20 22 24]
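The picture of a master handing the same function to several executors can be sketched in plain Python. This is only an analogy under stated assumptions: a thread pool stands in for the executors, and `partition`, `run_on_executor`, and `distributed_map` are hypothetical helper names, not Spark APIs.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, n):
    """Split data into at most n contiguous partitions of near-equal size."""
    size = (len(data) + n - 1) // n
    return [data[i:i + size] for i in range(0, len(data), size)]

def run_on_executor(func, part):
    """What one executor does: apply the function to its own partition."""
    return [func(x) for x in part]

def distributed_map(func, data, num_executors):
    """The master's role: partition, dispatch, and reassemble in order."""
    parts = partition(data, num_executors)
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        # pool.map preserves the order of the partitions
        results = list(pool.map(lambda p: run_on_executor(func, p), parts))
    return parts, [y for part in results for y in part]

parts, doubled = distributed_map(lambda x: x * 2, list(range(1, 13)), 4)
print(parts)    # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(doubled)  # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
```

Each worker here is stateless and sees only its partition and the function, which is the property that makes executors resemble microservices.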
ARCHITECTURES FOR ANALYTICS AND APPLICATIONS
APPLICATION RESPONSIBILITIES
[Diagram labels: transform, aggregate, train models, archive; events, databases, file and object storage; management, web and mobile, reporting, developer UI]
LEGACY ARCHITECTURES
CONVENTIONAL DATA WAREHOUSE
[Diagram labels: events, transform, UI, business logic, transaction processing (RDBMS); added in the second build: analytic processing (RDBMS), analysis, interactive query, reporting]
HADOOP-STYLE “DATA LAKE”
[Diagram: events arrive at five HDFS nodes; a second build adds compute alongside each HDFS node]
MODERN ARCHITECTURES
THE LAMBDA ARCHITECTURE
[Diagram labels: events; batch layer (precise): transform, analysis, DFS; speed layer (imprecise): transform, analysis; serving layer: federate, UI]
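As a minimal sketch of the three layers, assuming a toy event-counting workload: the batch layer recomputes a view from all archived events, the speed layer keeps a running view of events that arrived since the last batch run, and the serving layer federates the two. The names `batch_view`, `SpeedLayer`, and `serve` are hypothetical, and for simplicity the speed layer here counts exactly, whereas the slide labels it as imprecise.

```python
from collections import Counter

def batch_view(archived_events):
    """Batch layer: recompute counts from the full archive."""
    return Counter(archived_events)

class SpeedLayer:
    """Speed layer: incrementally updated view of recent events only."""
    def __init__(self):
        self.view = Counter()

    def observe(self, event):
        self.view[event] += 1

def serve(batch, speed, key):
    """Serving layer: federate the batch and speed views for one query."""
    return batch[key] + speed[key]

archived = ["login", "click", "click"]    # events already in the DFS
recent = ["click", "login", "purchase"]   # events since the last batch run

batch = batch_view(archived)
speed = SpeedLayer()
for event in recent:
    speed.observe(event)

clicks = serve(batch, speed.view, "click")
purchases = serve(batch, speed.view, "purchase")
print(clicks, purchases)  # 3 1
```

The cost this makes visible is that the same analysis logic lives in two places, once for the batch path and once for the speed path.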
THE KAPPA ARCHITECTURE
[Diagram: events → queue for “raw data” topic → transform → queue for “preprocessed data” topic → analysis → queue for “analysis results” topic → reporting and end-user UI]
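The chain of topics and stages can be sketched with in-memory, append-only logs. This is an illustrative assumption rather than any particular message broker's API: `Topic`, `run_stage`, `normalize`, and `make_counter` are hypothetical names, and each stage tracks its own read offset so the raw log stays intact and can be replayed.

```python
class Topic:
    """Append-only log standing in for a queue topic."""
    def __init__(self):
        self.log = []

    def publish(self, msg):
        self.log.append(msg)

    def read_from(self, offset):
        return self.log[offset:]

def run_stage(source, sink, func, offset=0):
    """Consume source from offset, publish func(msg) to sink, return new offset."""
    msgs = source.read_from(offset)
    for msg in msgs:
        sink.publish(func(msg))
    return offset + len(msgs)

def normalize(event):
    """Transform stage: clean up a raw event."""
    return event.strip().lower()

def make_counter():
    """Analysis stage: running count per event type."""
    counts = {}
    def count(event):
        counts[event] = counts.get(event, 0) + 1
        return (event, counts[event])
    return count

raw, preprocessed, results = Topic(), Topic(), Topic()
for event in [" Click ", "LOGIN", "click "]:
    raw.publish(event)

transform_offset = run_stage(raw, preprocessed, normalize)
analysis_offset = run_stage(preprocessed, results, make_counter())

print(preprocessed.log)  # ['click', 'login', 'click']
print(results.log)       # [('click', 1), ('login', 1), ('click', 2)]
```

Because the raw topic is never consumed destructively, a changed transform could be rerun from offset 0 into a fresh topic, which is how this style avoids a separate batch path.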
DATA FEDERATION IN THE COMPUTE LAYER
[Diagram labels: transform, aggregate, train models, archive; events, databases, file and object storage; management, web and mobile, reporting, developer UI]
PRACTICALITIES AND POTENTIAL PITFALLS
SIDEBAR: THE MONOLITHIC SPARK ANTIPATTERN
[Diagram: app 1 through app 4 share one resource manager, six Spark executors, and a shared FS, all under a single cluster scheduler]
ONE CLUSTER PER APPLICATION
[Diagram: on OpenShift, app 1 through app 6 are each paired with their own cluster, with data in object stores and databases]
[Diagram: app 1 through app 6 and their clusters on OpenShift, with a POSIX FS and six HDFS nodes as storage]
[Diagram: app 1 through app 6 and their clusters on OpenShift, with an object store as storage]
✓ interoperability
✓ fine-grained AC
✓ many implementations
✗ consistency model
✗ performance
“For the workloads from Facebook and Bing, we see that 96% and 89% of the active jobs respectively can have their data entirely fit in memory, given an allowance of 32GB memory per server for caching”
—“PACMan: Coordinated Memory Caching for Parallel Jobs.” G. Ananthanarayanan et al., in Proceedings of NSDI ’12.
“Recent studies have shown that reading data from local disks is only about 8% faster than reading it from remote disks over the network … [and] this 8% number is decreasing.”
—Tom Phelan, “The Elephant in the Big Data Room: Data Locality is Irrelevant for Hadoop” (goo.gl/MnCKuM)
“Three out of ten hours of job runtime were spent moving files from the staging directory to the final directory in HDFS…We were essentially compressing, serializing, and replicating three copies for a single read.”
—“Apache Spark @Scale: a 60+ TB production use case,” Facebook Engineering blog post
[Diagram: a driver with executor 1 through executor n, each with its own cache]
COLOCATED COMPUTE AND STORAGE: YAGNI
Disk locality is just another kind of caching, but memory is much faster than disk and working set sizes typically fit in cluster memory after ETL.
The I/O-heavy behavior of frameworks designed for colocated compute and storage performs worse than iterative processing in memory.
Colocating compute and storage prevents independent scale-out of compute and turns “cattle” into “pets.”
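To make the caching argument concrete, here is a small sketch that counts storage reads for an iterative job with and without an in-memory working set. The `Storage` class and both functions are hypothetical stand-ins for illustration, not part of Spark or any storage system, and the read counter is only a proxy for I/O cost.

```python
class Storage:
    """Stand-in for remote storage that counts how often it is read."""
    def __init__(self, records):
        self._records = list(records)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._records)

def iterate_without_cache(storage, steps):
    """Re-read the data from storage on every iteration."""
    total = 0
    for _ in range(steps):
        total += sum(storage.load())
    return total

def iterate_with_cache(storage, steps):
    """Read once, then iterate over the in-memory working set."""
    working_set = storage.load()
    total = 0
    for _ in range(steps):
        total += sum(working_set)
    return total

uncached = Storage([1, 2, 3])
cached = Storage([1, 2, 3])
print(iterate_without_cache(uncached, 10), uncached.reads)  # 60 10
print(iterate_with_cache(cached, 10), cached.reads)         # 60 1
```

Both runs compute the same answer; only the number of trips to storage differs, which is why where the data lives on disk matters less once the working set fits in memory.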
…BUT IF YOU DO
[Diagram: app 1, app 2, and app 3 and their clusters on OpenShift, first with a single storage service and then with three storage instances]
PLAYING ALONG AT HOME
TRY IT OUT YOURSELF
Enabling Spark on OpenShift: https://github.com/radanalyticsio
Video demo: https://vimeo.com/189710503
Meet the teams at lunch!