Flink and NiFi, Two Stars in the Apache Big Data Constellation

Posted on 15-Apr-2017

Matthew Ring, Chicago Apache Flink Meetup, Jan. 19, 2016

About me:

● Matthew Ring is currently a Senior Software Engineer at HP Enterprise.

● Matt has been a professional Java developer in multiple industries, including finance, healthcare and education, since 1999.

● Prior to that, he was an electrical engineer in defense communications.

● He is currently working on a new Investigative Analytics product for HP Enterprise.

● He has presented talks at JavaOne and Bank of America's developer conferences.

● His GitHub is https://github.com/mring33621

What is NiFi?

Origin:

NSA -> Onyara -> Apache NiFi -> Hortonworks DataFlow

Summary:

Visual Dataflow Programming for Big Data/Fast Data Ingestion!

(Or, yet another package where you drop stuff on the screen and connect it with arrows)

What is NiFi?

IMHO, good for:

● Ingestion
● Format Conversion
● Light (simple) Processing
● Delivery to other systems

Screenshot (from: https://www.silvercloudcomputing.com/nifi.html)

What is Flink?

I’m pretty sure you’ve already heard about it...

Together?

● Similar, but different...
● Friends in common:
  ○ Sockets
  ○ Kafka
  ○ HDFS
  ○ Flume
  ○ RabbitMQ
  ○ NATS Messaging
  ○ Elasticsearch
  ○ Solr

● There is also the option of direct NiFi <-> Flink connections!

Together?

● NiFi is visual
● NiFi keeps a paper trail RE: the data running through it
● Supports monitoring/metrics reporting
  ○ Ambari
  ○ Ganglia
  ○ Riemann

● Oh, and you can modify flows while they are LIVE!

● NiFi has more friends to bring to the party:
  ○ JSON/Avro/Parquet/Kite
  ○ HTTP/S, UDP, S/FTP
  ○ Text matching/parsing with regex
  ○ Tagging (meta data)
  ○ Scripting
  ○ AWS S3, SQS, SNS, Azure events
  ○ Tailing/Syslog
  ○ HL7
  ○ MongoDB
  ○ HBase
  ○ SQL
  ○ JMS
  ○ Images
  ○ ...AND MORE!

Paper Trail!

NiFi records:

● Content
● Metadata
● Provenance (touches)

Sooooo what?

● Allows replay of individual items!
● Queryable through UI or REST interface
● Assists in post hoc data forensics (compliance? legal discovery?)
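The "paper trail" idea can be pictured as an append-only log of touches, where each touch keeps a snapshot of an item's content and metadata. The plain-Java toy model below is only a sketch of that idea, NOT NiFi's actual API: `FlowItem`, `ProvenanceEvent`, `ProvenanceLog` and the component names are hypothetical. It shows why such a log makes per-item lineage queries and replay of individual items possible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model of a NiFi-style paper trail. All names here are hypothetical;
// this is NOT NiFi's API, just the idea of content + metadata + provenance.
public class ProvenanceSketch {

    // A unit of data flowing through the system: content plus metadata attributes.
    static final class FlowItem {
        final String id;
        final String content;
        final Map<String, String> attributes;

        FlowItem(String id, String content, Map<String, String> attributes) {
            this.id = id;
            this.content = content;
            this.attributes = Map.copyOf(attributes); // immutable snapshot
        }
    }

    // One "touch": which component did what, plus the item as it looked afterwards.
    static final class ProvenanceEvent {
        final int eventId;
        final String component;
        final String eventType;
        final FlowItem snapshot;

        ProvenanceEvent(int eventId, String component, String eventType, FlowItem snapshot) {
            this.eventId = eventId;
            this.component = component;
            this.eventType = eventType;
            this.snapshot = snapshot;
        }
    }

    // Append-only log of touches; event ids are simply list positions.
    static final class ProvenanceLog {
        private final List<ProvenanceEvent> events = new ArrayList<>();

        int record(String component, String eventType, FlowItem item) {
            int id = events.size();
            events.add(new ProvenanceEvent(id, component, eventType, item));
            return id;
        }

        // Lineage query: every recorded touch of one item, oldest first.
        List<ProvenanceEvent> lineage(String itemId) {
            List<ProvenanceEvent> out = new ArrayList<>();
            for (ProvenanceEvent e : events) {
                if (e.snapshot.id.equals(itemId)) {
                    out.add(e);
                }
            }
            return out;
        }

        // Replay: return the exact content and metadata captured at an event,
        // so it could be re-sent to a downstream component.
        FlowItem replay(int eventId) {
            return events.get(eventId).snapshot;
        }
    }

    public static void main(String[] args) {
        ProvenanceLog log = new ProvenanceLog();
        FlowItem raw = new FlowItem("order-42", "BUY,ACME,100", Map.of("source", "socket"));
        int received = log.record("ingest", "RECEIVE", raw);

        FlowItem json = new FlowItem("order-42",
                "{\"side\":\"BUY\",\"symbol\":\"ACME\",\"qty\":100}",
                Map.of("source", "socket", "mime.type", "application/json"));
        log.record("csv-to-json", "CONTENT_MODIFIED", json);

        for (ProvenanceEvent e : log.lineage("order-42")) {
            System.out.println(e.eventId + " " + e.component + " " + e.eventType + " -> " + e.snapshot.content);
        }
        System.out.println("replayed: " + log.replay(received).content);
    }
}
```

Because every touch stores a full snapshot, replaying an item is just a lookup by event id. A real provenance repository has to manage storage and retention far more carefully than this in-memory list does.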

Downsides?

● Weak deployment paradigm
  ○ Can import/export flow templates
  ○ But various processor config values will need to be updated by hand when moving from env to env
● Weak clustering story
  ○ Non-elastic
  ○ SPOF (single point of failure) master node
● Weak querying capability from UI
● Most processors are micro-batching (event-time stream processing is still experimental)
● Sometimes tedious: you have to think in terms of several little, built-in pieces to get a simple job done

NOW IS TIME FOR QUIZ!

...err, how ‘bout a demo?

Demo Notes

Custom Java code provides:

● synthetic intraday ticks
● trader state management
● glue logic
● websocket backend for dashboard UI

Custom HTML/JS code provides:

● live dashboard UI
● smoothie.js charts
● knockout.js binding/templating

NiFi:

● observes orders
  ○ can deny orders based on ‘compliance rules’
● observes executions
  ○ routes ‘suspicious’ executions to file system for future scrutiny
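The slides do not say which compliance rules the demo used, so the plain-Java sketch below is hypothetical throughout. The `Order` type, the rule names, the 1,000,000 notional limit, the restricted symbol "XYZ" and the 2% price tolerance are all made up. It only illustrates the routing shape: an order goes to "approved" or "denied" depending on whether any rule fires, and a flagged execution is written to its own file for later review. In an actual NiFi flow this might be built from processors such as RouteOnAttribute and PutFile rather than custom code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical compliance routing in plain Java; NOT NiFi's API.
// The rules, limits and symbols below are made up for illustration.
public class ComplianceRouting {

    static final class Order {
        final String trader;
        final String symbol;
        final int quantity;
        final double price;

        Order(String trader, String symbol, int quantity, double price) {
            this.trader = trader;
            this.symbol = symbol;
            this.quantity = quantity;
            this.price = price;
        }

        double notional() {
            return quantity * price;
        }
    }

    static final Set<String> RESTRICTED = Set.of("XYZ");

    // Each named rule returns true when an order VIOLATES it (insertion order kept).
    static final Map<String, Predicate<Order>> RULES = new LinkedHashMap<>();
    static {
        RULES.put("max-notional", o -> o.notional() > 1_000_000.0);
        RULES.put("restricted-symbol", o -> RESTRICTED.contains(o.symbol));
    }

    // Route like a NiFi relationship: "denied" if any rule fires, otherwise "approved".
    static String route(Order o) {
        for (Predicate<Order> violates : RULES.values()) {
            if (violates.test(o)) {
                return "denied";
            }
        }
        return "approved";
    }

    // An execution is flagged when its price strays too far from a reference price.
    static boolean isSuspicious(double execPrice, double refPrice, double tolerance) {
        return Math.abs(execPrice - refPrice) / refPrice > tolerance;
    }

    // Write a flagged execution to its own file under dir for later scrutiny.
    static Path archive(Path dir, String executionId, String details) throws IOException {
        Files.createDirectories(dir);
        Path file = dir.resolve(executionId + ".txt");
        Files.writeString(file, details + System.lineSeparator(), StandardCharsets.UTF_8);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Order ok = new Order("alice", "ACME", 100, 50.0);
        Order big = new Order("bob", "ACME", 100_000, 20.0);
        System.out.println(route(ok) + " " + route(big));

        if (isSuspicious(103.0, 100.0, 0.02)) {
            Path dir = Files.createTempDirectory("suspicious-executions");
            System.out.println("archived to " + archive(dir, "exec-1", "ACME 100 @ 103.0 vs ref 100.0"));
        }
    }
}
```

Keeping each rule as a named predicate means new rules can be added without touching the routing logic. That loosely mirrors how a NiFi flow can be edited while it runs, though here a code change is still required.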

Flink Streaming provides:

● trade recommendation engine
● execution engine
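The slides do not describe the recommendation logic, so the following is a hypothetical stand-in, not the demo's engine. It is a per-symbol sliding-window moving-average rule written in plain Java, with a made-up window size and threshold, and it is fed by synthetic random-walk intraday ticks. In Flink the same shape would typically be expressed as a DataStream keyed by symbol (`keyBy`) with per-key state or a count window. The `HashMap` of per-symbol windows below only imitates that keyed state in a single JVM.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Plain-Java stand-in for a per-symbol sliding-window recommender.
// NOT Flink's API and NOT the demo's actual rule; window size and threshold are made up.
public class RecommendationSketch {

    // Per-symbol state: the last `size` prices and their running sum.
    private static final class Window {
        final Deque<Double> prices = new ArrayDeque<>();
        double sum;
    }

    private final int size;
    private final double threshold;
    private final Map<String, Window> state = new HashMap<>(); // keyed by symbol, like keyBy

    RecommendationSketch(int size, double threshold) {
        this.size = size;
        this.threshold = threshold;
    }

    // Consume one tick; return "BUY"/"SELL" when the price breaks away from the
    // window's moving average by more than `threshold`, otherwise "HOLD".
    String onTick(String symbol, double price) {
        Window w = state.computeIfAbsent(symbol, k -> new Window());
        w.prices.addLast(price);
        w.sum += price;
        if (w.prices.size() > size) {
            w.sum -= w.prices.removeFirst(); // slide: evict the oldest price
        }
        if (w.prices.size() < size) {
            return "HOLD"; // not enough history yet
        }
        double average = w.sum / size;
        if (price > average * (1 + threshold)) {
            return "BUY";
        }
        if (price < average * (1 - threshold)) {
            return "SELL";
        }
        return "HOLD";
    }

    public static void main(String[] args) {
        RecommendationSketch engine = new RecommendationSketch(20, 0.005);
        Random rnd = new Random(42); // fixed seed for repeatable synthetic ticks
        Map<String, Double> last = new HashMap<>(Map.of("ACME", 100.0, "GLOBEX", 50.0));
        for (int i = 0; i < 200; i++) {
            for (Map.Entry<String, Double> e : last.entrySet()) {
                // Synthetic intraday tick: small Gaussian random-walk step.
                double next = e.getValue() * (1 + 0.002 * rnd.nextGaussian());
                e.setValue(next);
                String signal = engine.onTick(e.getKey(), next);
                if (!signal.equals("HOLD")) {
                    System.out.printf("%d %s %.2f %s%n", i, e.getKey(), next, signal);
                }
            }
        }
    }
}
```

Keeping a running sum makes each tick O(1) regardless of window size. A real Flink job would also handle checkpointed state and out-of-order events, which this sketch ignores.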

Demo: Screenshot of NiFi Flow

Demo: Screenshot of Live Web Dashboard

Questions?

Thank you!