Streaming Data Pipelines
Pierre Matri, Philip Carns, Robert Latham, Shane Snyder, and Robert Ross (Argonne National Laboratory)
Gabriel Antoniu and Alexandru Costan (INRIA)
Sam Gutierrez, Bob Robey, Brad Settlemyer, and Galen Shipman (Los Alamos National Laboratory)
Jerome Soumagne and Neil Fortner (The HDF Group)
George Amvrosiadis, Chuck Cranor, Greg Ganger, Ankush Jain, and Qing Zheng (Carnegie Mellon University)
Previously: Týr blob storage system
[Figure: separate stacks. BDA application on a framework (Hadoop M/R, Spark, Flink) over KV, Logs, and DFS storage; HPC application on a framework (MPI) over DFS storage.]
[Figure: converged stack. The same BDA and HPC applications and frameworks over KV, Logs, and a Unified DFS.]
Týr: Converging Storage Layer
Pure-HPC use-cases
Buffering between source & processing: in-situ visualization, computational steering
Log-formatted data storage: time series, streaming data (sensor events)
Checkpointing, recovery: similar to cloud use-cases
Convergence use-case
Cross-platform application portability: how to ensure portability when some basic structures are not available?
Cross-platform research: many cloud algorithms use distributed logs. Leverage those on HPC? Ex: failure detection
Distributed logging on HPC?
Use-Case: LCLS-II
The Linac Coherent Light Source @ Stanford
World’s first hard X-ray free-electron laser
LCLS-II is an upgrade of the current LCLS
Data Pipeline Requirements
Scalability
Needs to scale to hundreds of terabytes per second
= 170 million events per second
Simplicity
Building blocks should be available for simple use-cases
Variability
Event generation rate is highly variable, depending on sensor data
Reproducibility
Results should be reproducible
= storage
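The two scalability figures above can be related with simple arithmetic. The sketch below takes the slide's numbers at face value, assumes decimal terabytes (10^12 bytes), and tries two hypothetical aggregate rates in the "hundreds of terabytes per second" range; the function name and the sample rates are illustrative and do not come from an LCLS-II specification.

```python
EVENTS_PER_SECOND = 170_000_000  # event rate quoted on the slide
BYTES_PER_TERABYTE = 10**12      # assumption: decimal terabytes


def implied_event_size_bytes(terabytes_per_second: float) -> float:
    """Mean bytes per event implied by an aggregate data rate."""
    return terabytes_per_second * BYTES_PER_TERABYTE / EVENTS_PER_SECOND


for rate in (200, 500):  # hypothetical sample rates, in TB/s
    size_mb = implied_event_size_bytes(rate) / 1e6
    print(f"{rate} TB/s -> about {size_mb:.2f} MB per event")
# 200 TB/s -> about 1.18 MB per event
# 500 TB/s -> about 2.94 MB per event
```

Under these assumptions, each event averages on the order of a megabyte.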
Data pipeline model
A pipeline is composed of a sequence of steps, performing actions on the events
Events can be augmented with tags (e.g., topics)
Each step is a process / microservice
A step exposes an API over RPC, with a single endpoint
Using Thallium for RPC [Mercury + Argobots]
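The model above can be sketched in a few lines. This is a minimal in-process illustration, not the actual implementation: in the system described, each step is a separate process reached over RPC, whereas here plain Python callables stand in for steps. The names `Event`, `Step`, `run_pipeline`, `tag_topic`, and `drop_empty` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Event:
    payload: bytes
    tags: Dict[str, str] = field(default_factory=dict)  # e.g. {"topic": "sensor"}


# Assumption: a step takes a list of events and returns the events it emits.
Step = Callable[[List[Event]], List[Event]]


def run_pipeline(steps: Iterable[Step], events: List[Event]) -> List[Event]:
    """Applies each step in sequence to the output of the previous one."""
    for step in steps:
        events = step(events)
    return events


def tag_topic(events: List[Event]) -> List[Event]:
    for event in events:
        event.tags["topic"] = "sensor"
    return events


def drop_empty(events: List[Event]) -> List[Event]:
    return [event for event in events if event.payload]


out = run_pipeline([tag_topic, drop_empty], [Event(b"abc"), Event(b"")])
print(out)  # one event remains, tagged with topic "sensor"
```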
High-level Steps
MapStep(Event) -> Event                          transforms an event
FilterStep(Event) -> bool                        drops events not matching predicate
TagStep(key, Event) -> value                     sets a tag on an event
TimedBatchStep(msecs, (set<Event>) -> Event)     time-based event aggregation
CountBatchStep(count, (set<Event>) -> Event)     count-based event aggregation
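One possible reading of the five high-level steps is sketched below. It assumes each step wraps a user-supplied function and exposes a single hypothetical `push` method returning the events it emits; batches are lists rather than sets, the timed window is only checked when an event arrives, and events still buffered at the end are not flushed. None of these choices are confirmed by the slides.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Event:
    payload: Any
    tags: Dict[str, Any] = field(default_factory=dict)


class MapStep:
    """Transforms an event."""
    def __init__(self, fn):
        self.fn = fn

    def push(self, event):
        return [self.fn(event)]


class FilterStep:
    """Drops events not matching the predicate."""
    def __init__(self, predicate):
        self.predicate = predicate

    def push(self, event):
        return [event] if self.predicate(event) else []


class TagStep:
    """Sets tags[key] to the value computed from the event."""
    def __init__(self, key, fn):
        self.key, self.fn = key, fn

    def push(self, event):
        event.tags[self.key] = self.fn(event)
        return [event]


class CountBatchStep:
    """Aggregates every `count` events into one event."""
    def __init__(self, count, aggregate):
        self.count, self.aggregate, self.buffer = count, aggregate, []

    def push(self, event):
        self.buffer.append(event)
        if len(self.buffer) < self.count:
            return []
        batch, self.buffer = self.buffer, []
        return [self.aggregate(batch)]


class TimedBatchStep:
    """Aggregates the events buffered once `msecs` have elapsed."""
    def __init__(self, msecs, aggregate, clock=time.monotonic):
        self.window, self.aggregate, self.clock = msecs / 1000.0, aggregate, clock
        self.buffer, self.started = [], None

    def push(self, event):
        now = self.clock()
        if self.started is None:
            self.started = now
        self.buffer.append(event)
        if now - self.started < self.window:
            return []
        batch, self.buffer, self.started = self.buffer, [], None
        return [self.aggregate(batch)]


def run(steps, events):
    """Feeds events through the steps in order, one event at a time."""
    out = []
    for event in events:
        pending = [event]
        for step in steps:
            pending = [o for e in pending for o in step.push(e)]
        out.extend(pending)
    return out


steps = [
    FilterStep(lambda e: e.payload >= 0),
    MapStep(lambda e: Event(e.payload * 2, e.tags)),
    TagStep("parity", lambda e: "even" if e.payload % 2 == 0 else "odd"),
    CountBatchStep(2, lambda batch: Event(sum(e.payload for e in batch))),
]
result = run(steps, [Event(1), Event(-5), Event(2), Event(3)])
print([e.payload for e in result])  # [6]
```

The negative event is filtered out, the remaining payloads are doubled to 2, 4, and 6, and the first two are summed into a single event; the third stays in the batch buffer.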
Decoupling with storage
Successive steps should allow decoupling
E.g., event bursts, buffering, persistence, offline processing, …
[Figure: a MetaStep sits between Ingress and Egress, backed by storage (Blobs, FS, …)]
High-level Storage Steps
MemoryStorageMetaStep()
BlobStorageMetaStep(host, blob_key)
FSStorageMetaStep(path)
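To illustrate how a storage step decouples its neighbours, here is a minimal sketch of the memory and file-system variants. The `ingress` / `egress` method names, the `max_events` parameter, and the length-prefixed record format are assumptions made for this example; the blob variant is omitted because it would need a Týr client, whose API is not shown in the slides.

```python
import os
import tempfile
from collections import deque


class MemoryStorageMetaStep:
    """Buffers events in memory between an upstream and a downstream step."""
    def __init__(self):
        self._queue = deque()

    def ingress(self, event: bytes) -> None:
        self._queue.append(event)

    def egress(self, max_events: int):
        n = min(max_events, len(self._queue))
        return [self._queue.popleft() for _ in range(n)]


class FSStorageMetaStep:
    """Appends events to a log file; egress reads them back in order.

    Hypothetical record format: 4-byte big-endian length, then the payload.
    """
    def __init__(self, path: str):
        self._path = path
        self._read_offset = 0
        open(path, "ab").close()  # create the log file if it is missing

    def ingress(self, event: bytes) -> None:
        with open(self._path, "ab") as log:
            log.write(len(event).to_bytes(4, "big") + event)

    def egress(self, max_events: int):
        events = []
        with open(self._path, "rb") as log:
            log.seek(self._read_offset)
            while len(events) < max_events:
                header = log.read(4)
                if len(header) < 4:
                    break  # no complete record left
                payload = log.read(int.from_bytes(header, "big"))
                events.append(payload)
                self._read_offset += 4 + len(payload)
        return events


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "events.log")
    for step in (MemoryStorageMetaStep(), FSStorageMetaStep(log_path)):
        for i in range(5):          # burst from the upstream step
            step.ingress(f"event-{i}".encode())
        first = step.egress(2)      # downstream drains at its own pace
        rest = step.egress(10)
        print(type(step).__name__, first, len(rest))
```

Both variants absorb the burst of five events and hand them out in order, two first and then the remaining three. The file-backed one also leaves a persistent log that could be replayed offline.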
Preliminary storage evaluation
Experiments on the Theta supercomputer, with:
Up to 100,000 event generators (1 per core)
Simple pipeline, composed of a single storage step
8,192 parallel pipelines, with round-robin routing
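The round-robin routing in this setup can be sketched as follows. The slides do not say whether routing is applied per event or per generator; this example assumes per-event routing, and `route_round_robin` is a hypothetical helper rather than part of the evaluated system.

```python
from collections import Counter
from itertools import cycle


def route_round_robin(events, num_pipelines):
    """Yields (pipeline_index, event), cycling through the pipelines."""
    targets = cycle(range(num_pipelines))
    for event in events:
        yield next(targets), event


NUM_PIPELINES = 8192       # pipeline count from the evaluation setup
events = range(100_000)    # stand-in events, e.g. one per generator

load = Counter(idx for idx, _ in route_round_robin(events, NUM_PIPELINES))
print(min(load.values()), max(load.values()))  # 12 13
```

With 100,000 events spread over 8,192 pipelines, every pipeline receives 12 or 13 events, so the load differs by at most one event.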
This year: deployment
Data reduction pipeline composed of potentially hundreds of steps, which must be deployed alongside the application
= challenge in HPC
How do we describe and deploy hundreds or thousands of microservices on an HPC platform?