© 2015 Mesosphere, Inc. All Rights Reserved.
SIMPLIFYING STREAMING ANALYTICS
Cassandra Summit 2015
Brenden Matthews @brndnmtthws
AGENDA
• Introduction
• Streaming analytics:
  • What is it?
  • Why do it?
  • When do I need it?
  • How? - Demo!
  • What are the limitations?
ABOUT ME - BRENDEN MATTHEWS
• ASF member, Mesos committer
• Have contributed to a number of related OSS projects, including Spark, Storm, Kafka, Presto, and a number of Mesos schedulers
• SA @Mesosphere, formerly on the DI team @Airbnb
STREAMING ANALYTICS: WHAT IS IT?
• Perform joins, aggregations, mutations on data as it happens
• Components typically include:
  • Producer
  • Message broker
  • [Extract] Consumer
  • [Transform] Processing engine
  • [Load] Storage
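As a rough illustration of how these components fit together, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory `deque` stands in for a real message broker, the event shape (`sensor`, `temp`) is invented, and the per-sensor count is only one example of an aggregation.

```python
from collections import defaultdict, deque

# In-memory stand-in for a message broker (hypothetical; a real
# deployment would use a system such as Kafka).
broker = deque()


def produce(events):
    """Producer: append raw events to the broker."""
    for event in events:
        broker.append(event)


def consume():
    """[Extract] Consumer: drain events from the broker one at a time."""
    while broker:
        yield broker.popleft()


def process(events):
    """[Transform] Processing engine: running event count per sensor."""
    counts = defaultdict(int)
    for event in events:
        counts[event["sensor"]] += 1
        yield event["sensor"], counts[event["sensor"]]


def load(results, storage):
    """[Load] Storage: upsert the latest aggregate for each key."""
    for key, value in results:
        storage[key] = value
    return storage


def run_pipeline(events):
    """Wire the stages together: produce, then extract, transform, load."""
    produce(events)
    return load(process(consume()), {})


result = run_pipeline([
    {"sensor": "a", "temp": 20.5},
    {"sensor": "b", "temp": 19.0},
    {"sensor": "a", "temp": 21.0},
])
```

Each stage is a generator, so events flow through one at a time rather than being collected into a batch first.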
STREAMING ANALYTICS: WHY DO IT?
• Data is constantly being generated
  • HTTP traffic, clickstream, IoT, metrics
• Most data is correlated (requires joins)
  • Data can be pre-denormalized (i.e., flattened)
• Immutability
• Build “real time” services: what’s happening right now?
• Compute once
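To make the "pre-denormalized" point concrete, the sketch below joins each click event against reference data as it arrives and emits a flattened, read-only record, so the join is computed once rather than at every query. The `USERS` table, the field names, and the helper functions are hypothetical and are not taken from the talk or the demo repository.

```python
from types import MappingProxyType

# Hypothetical reference data keyed by user ID.
USERS = {
    1: {"name": "alice", "country": "CA"},
    2: {"name": "bob", "country": "US"},
}


def denormalize(click, users):
    """Join one click event with its user record and flatten the result."""
    user = users.get(click["user_id"], {})
    flat = {
        "user_id": click["user_id"],
        "url": click["url"],
        "user_name": user.get("name"),        # None if the user is unknown
        "user_country": user.get("country"),  # None if the user is unknown
    }
    # Return a read-only view so downstream code cannot mutate the record.
    return MappingProxyType(flat)


def enrich_stream(clicks, users=USERS):
    """Apply the join to each event as it arrives (compute once)."""
    for click in clicks:
        yield denormalize(click, users)


rows = list(enrich_stream([
    {"user_id": 1, "url": "/home"},
    {"user_id": 3, "url": "/search"},
]))
```

Consumers of `rows` can read the user's country directly from each record without a lookup at query time.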
STREAMING ANALYTICS: WHEN DO I NEED IT?
• Messaging platform
• Compliance
• Fraud detection
• Firehose consumption
• Recommendation engine
STREAMING ANALYTICS: HOW?
Pipeline: Producer -> Broker -> Consumer/ML -> Storage
STREAMING ANALYTICS: HOW?
Demo time!
github.com/mesosphere/iot-demo
STREAMING ANALYTICS: WHAT ARE THE LIMITATIONS?
• Not a replacement for all batch workloads
• Backfilling is tricky
  • Unless you retain a log of all data mutations, backfilling may be impossible
• Maintaining a completely immutable system may explode storage costs
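The backfilling point can be sketched as follows: if every mutation is kept in an append-only log, a metric defined later can still be computed over past data by replaying the log from the start. The `MutationLog` class and both aggregation functions are hypothetical names used only for illustration. Note that the log keeps every entry forever, which is also where the storage cost comes from.

```python
class MutationLog:
    """Append-only log of (key, delta) mutations (illustrative only)."""

    def __init__(self):
        self._entries = []

    def append(self, key, delta):
        """Record a mutation and return its offset in the log."""
        self._entries.append((key, delta))
        return len(self._entries) - 1

    def replay(self, from_offset=0):
        """Iterate over retained mutations, starting at from_offset."""
        return iter(self._entries[from_offset:])


def total_per_key(mutations):
    """Original metric: sum of deltas per key."""
    totals = {}
    for key, delta in mutations:
        totals[key] = totals.get(key, 0) + delta
    return totals


def max_delta_per_key(mutations):
    """Metric added later: largest single delta seen per key."""
    maxima = {}
    for key, delta in mutations:
        if key not in maxima or delta > maxima[key]:
            maxima[key] = delta
    return maxima


log = MutationLog()
log.append("a", 5)
log.append("b", 2)
log.append("a", -1)

totals = total_per_key(log.replay())
# Backfill the new metric by replaying the full retained log.
maxima = max_delta_per_key(log.replay())
```

Had only the running totals been stored, the second metric could not be recovered for events that had already passed.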