Spec + onyx

Date post: 21-Feb-2017
Upload: simon-belak
Transcript
Page 1: Spec + onyx

Spec + Onyx: an experience report

@sbelak

Page 2: Spec + onyx

Onyx: a masterless, cloud scale, fault tolerant, high performance distributed computation system

… written entirely in Clojure

Page 3: Spec + onyx

Onyx at [company logo]

• In production for almost a year

• ETL

• online machine learning

• offline (batch) machine learning

• ad-hoc analysis

Page 4: Spec + onyx

Self-service infrastructure for data scientists

Page 5: Spec + onyx

1. Onyx at a glance

2. How Onyx rewired my brain

3. Building on top of spec

Page 6: Spec + onyx

Onyx at a glance

Page 7: Spec + onyx

Job = workflow + flow conditions + catalogue

Workflow:

[[:input :processing-1]
 [:input :processing-2]
 [:processing-1 :output-1]
 [:processing-2 :output-2]]

Flow conditions:

[{:flow/from :input-stream
  :flow/to [:process-adults]
  :flow/predicate :my.ns/adult?
  :flow/doc "Emits segment if an adult."}]

Catalogue:

[{:onyx/name :add-5
  :onyx/fn :my/adder
  :onyx/type :function
  :my/n 5
  :onyx/params [:my/n]
  :onyx/batch-size batch-size}

 {:onyx/name :in
  :onyx/plugin :onyx.plugin.core-async/input
  :onyx/type :input
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Reads segments from a core.async channel"}

 {:onyx/name :out
  :onyx/plugin :onyx.plugin.core-async/output
  :onyx/type :output
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Writes segments to a core.async channel"}]

Page 8: Spec + onyx

Catalogue:

[{:onyx/name :add-5
  :onyx/fn :my/adder
  :onyx/type :function
  :my/n 5
  :onyx/params [:my/n]
  :onyx/batch-size batch-size}

 {:onyx/name :in
  :onyx/plugin :onyx.plugin.core-async/input
  :onyx/type :input
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Reads segments from a core.async channel"}

 {:onyx/name :out
  :onyx/plugin :onyx.plugin.core-async/output
  :onyx/type :output
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Writes segments to a core.async channel"}]

Vanilla Clojure function:

(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

Plugins (I/O): seq, async, Kafka, Datomic, SQL, …

Slide callouts: parameter, self-documenting
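A minimal sketch of how the `:add-5` catalogue entry relates to the function on this slide. It assumes that Onyx looks up each key listed in `:onyx/params` in the catalogue entry and passes the values ahead of the segment; here that is imitated by hand with `partial`, so no Onyx runtime is involved. The name `add-5` and the sample segments are made up for illustration.

```clojure
;; The task function from the slide: a plain function from segment to segment.
(defn adder
  "Adds n to the :x key of a segment (an ordinary Clojure map)."
  [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

;; Assumption: with :my/n 5 and :onyx/params [:my/n] in the catalogue entry,
;; the task behaves like the function below. We bind the parameter by hand.
(def add-5 (partial adder 5))

;; Hypothetical segments; other keys pass through untouched.
(def results (mapv add-5 [{:x 1} {:x 10 :user "a"}]))
;; results => [{:x 6} {:x 15, :user "a"}]
```

Because the parameter lives in the catalogue rather than in the function, the same `adder` can back several tasks with different values of `:my/n`.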

Page 9: Spec + onyx

Computation entirely described with data

data is code!
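Since a workflow is just a vector of edges, it can be inspected with ordinary Clojure before anything is submitted. The sketch below uses the workflow from the earlier slide; the helper names `successors` and `leaves` are illustrative and not part of Onyx.

```clojure
(require '[clojure.set :as set])

;; The workflow from the "Job =" slide: a vector of [from to] edges.
(def workflow
  [[:input :processing-1]
   [:input :processing-2]
   [:processing-1 :output-1]
   [:processing-2 :output-2]])

(defn successors
  "Tasks that receive segments directly from `task`."
  [workflow task]
  (set (for [[from to] workflow :when (= from task)] to)))

(defn leaves
  "Tasks with no outgoing edge, i.e. where data leaves the graph."
  [workflow]
  (set/difference (set (map second workflow))
                  (set (map first workflow))))

(successors workflow :input) ;; => #{:processing-1 :processing-2}
(leaves workflow)            ;; => #{:output-1 :output-2}
```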

Page 10: Spec + onyx

Everything can be run locally!

Page 11: Spec + onyx

Testing without mocking
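One way to read this slide: because task functions are vanilla Clojure functions over maps, they can be unit tested directly with `clojure.test`, with no cluster and no stand-in for the runtime. The following is a sketch under that reading, reusing `adder` from the catalogue slide; the test name and sample segments are made up.

```clojure
(require '[clojure.test :refer [deftest is run-tests]])

;; Same function as on the catalogue slide.
(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

(deftest adder-test
  ;; Segments are plain maps, so test input is literal data.
  (is (= {:x 6} (adder 5 {:x 1})))
  ;; Keys other than :x are carried through unchanged.
  (is (= {:x 0 :id 7} (adder -3 {:x 3 :id 7}))))

;; Runs the tests in the current namespace and returns a summary map
;; with :pass, :fail and :error counts.
(def summary (run-tests))
```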

Page 12: Spec + onyx

How Onyx rewired my brain

Page 13: Spec + onyx

It’s not about scaling, but clean architecture

Page 14: Spec + onyx

My go-to architecture

[Diagram labels: DB, Events, Kafka, Onyx (three jobs)]

Persist all messages to S3 (time travel!)

Page 15: Spec + onyx

Decomplect everything

Page 16: Spec + onyx

Computation graphs

Page 17: Spec + onyx

Building on top of spec

Page 18: Spec + onyx

Queryable data descriptions

• s/registry, s/form

• Build a graph (Datomic)

Interact with your type system!

code is data!
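A small sketch of what "queryable data descriptions" can look like with `s/registry` and `s/form`. It uses the `clojure.spec.alpha` namespace shipped with current Clojure releases (at the time of the talk the namespace was `clojure.spec`). The `:event/...` specs and the helpers `specs-in-ns` and `required-keys` are hypothetical examples, not taken from the talk.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical specs, for illustration only.
(s/def :event/user-id pos-int?)
(s/def :event/amount number?)
(s/def :event/purchase (s/keys :req [:event/user-id :event/amount]))

(defn specs-in-ns
  "Names of all registered keyword specs whose namespace is `ns-name`."
  [ns-name]
  (->> (keys (s/registry))
       (filter #(and (keyword? %) (= ns-name (namespace %))))
       set))

(defn required-keys
  "Reads the :req vector back out of an s/keys spec via its form,
   which is plain data: (clojure.spec.alpha/keys :req [...])."
  [spec-name]
  (let [[_ & {:keys [req]}] (s/form spec-name)]
    req))

(specs-in-ns "event")
;; => #{:event/user-id :event/amount :event/purchase}
(required-keys :event/purchase)
;; => [:event/user-id :event/amount]
```

The pairs returned by `required-keys` are exactly the edges one would load into a graph store to query how specs relate to each other.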

Page 19: Spec + onyx

Case study: autogenerating materialised views

[Diagram labels: Events, External data, Kafka, Onyx (three jobs), Materialised views]

Automatic view generation

• Event & attribute ontology

  • Manual (via spec)

  • Inferred

• Statistical analysis (seasonality detection, outlier removal, …)

Page 20: Spec + onyx

Automatic view generation

1. Walk spec registry

2. Apply rules

   1. Define new view (spec)

   2. Trigger Onyx job that creates the view
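The steps above can be sketched roughly as follows. This is not the author's implementation: the `:order/...` specs, the single rule ("sum every numeric required attribute") and the `:view/...` keys are all invented for illustration, and the final step of submitting an Onyx job is left out, since the view is only described as data here.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical event specs.
(s/def :order/id pos-int?)
(s/def :order/total number?)
(s/def :order/placed (s/keys :req [:order/id :order/total]))

(defn keys-specs
  "Step 1: walk the registry and keep the s/keys specs in namespace `ns-name`."
  [ns-name]
  (for [[k spec] (s/registry)
        :when (and (keyword? k) (= ns-name (namespace k)))
        :let [form (s/form spec)]
        :when (and (seq? form) (= 'clojure.spec.alpha/keys (first form)))]
    [k form]))

(defn numeric-attrs
  "Required attributes of an s/keys form whose own spec is number?."
  [form]
  (let [[_ & {:keys [req]}] form]
    (filter #(= 'clojure.core/number? (s/form %)) req)))

(defn propose-views
  "Step 2: apply the (made-up) rule and describe each resulting view as data."
  [ns-name]
  (for [[k form] (keys-specs ns-name)
        attr (numeric-attrs form)]
    {:view/source k :view/attribute attr :view/aggregate :sum}))

(propose-views "order")
;; => ({:view/source :order/placed, :view/attribute :order/total, :view/aggregate :sum})
```

Each proposed map is the point where a real system would register a spec for the new view and hand a job description to Onyx.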

Page 21: Spec + onyx

Code is data or data is code?

Page 22: Spec + onyx

Takeaways

Page 23: Spec + onyx

Onyx is production ready

Page 24: Spec + onyx

Everything should be live and interactive

Page 25: Spec + onyx

Computation graphs are a great way to structure data processing code

Page 26: Spec + onyx

Queryable data and computation descriptions supercharge interactive development and are a great building block for automation

