Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Date post: 23-Feb-2017
Upload: aldrin-piri
Dataflow with Apache NiFi Aldrin Piri - @aldrinpiri Apache NiFi Meetup Hadoop Summit – San Jose 27 June 2016
Transcript
Page 1: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Dataflow with Apache NiFi
Aldrin Piri - @aldrinpiri
Apache NiFi Meetup
Hadoop Summit – San Jose
27 June 2016

Page 2: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Slides available at the conclusion of the talk: http://slideshare.net/aldrinpiri/

Page 3: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Key: 'Apache NiFi' Value: 'PMC Member'
Key: 'Work' Value: 'Sr. Member of Technical Staff @ Hortonworks'
Key: 'Working with NiFi Since' Value: '2010'

Page 4: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Agenda
• What is dataflow and what are the challenges?
• Apache NiFi
• Architecture
• Live Demo
• Community

Page 6: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Let’s Connect A to B

Producers, a.k.a. Things: Anything AND Everything

Internet!

Consumers
• User
• Storage
• System
• …More Things

Page 7: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Moving data effectively is hard

Standards: http://xkcd.com/927/

Page 8: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Why is moving data effectively hard?

• Standards
• Formats
• “Exactly Once” Delivery
• Protocols
• Veracity of Information
• Validity of Information
• Ensuring Security
• Overcoming Security
• Compliance
• Schemas
• Consumers Change
• Credential Management
• “That [person|team|group]”
• Network
• “Exactly Once” Delivery

Page 9: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Let’s consider the needs of a courier service

Diagram: sites in the courier service and the systems at each
• Physical Store: Gateway Server, Mobile Devices, Registers
• Distribution Center: Server Cluster
• Core Data Center at HQ: Server Cluster
• On Delivery Routes: Trucks, Deliverers

Image credits:
Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/
Deliverer: Rigo Peter, https://thenounproject.com/rigo/
Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/
Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/

Page 10: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Great! I am collecting all this data! Let’s use it!
Finding our needles in the haystack

Diagram: the courier sites with analytics systems added
• Physical Store: Gateway Server, Mobile Devices, Registers
• Distribution Center: Server Cluster, Kafka, Storm / Spark / Flink / Apex
• Core Data Center at HQ: Server Cluster, Kafka, Storm / Spark / Flink / Apex, Others
• On Delivery Routes: Trucks, Deliverers

Page 11: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Why is moving data effectively hard when scoped internally?

• Standards
• Formats
• “Exactly Once” Delivery
• Protocols
• Veracity of Information
• Validity of Information
• Ensuring Security
• Overcoming Security
• Compliance
• Schemas
• Consumers Change
• Credential Management
• “That [person|team|group]”
• Network
• “Exactly Once” Delivery

Page 12: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Oh, that courier service is global

Page 13: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Why is moving data effectively hard when scoped globally?

• Standards
• Formats
• “Exactly Once” Delivery
• Protocols
• Veracity of Information
• Validity of Information
• Ensuring Security
• Overcoming Security
• Compliance
• Schemas
• Consumers Change
• Credential Management
• “That [person|team|group]”
• Network
• “Exactly Once” Delivery

Page 14: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

The Unassuming Line: A Case Study
We’ve seen a few lines show up in the wild thus far

• Internet!
• Inter- & intra-connections in our global courier enterprise

Spotlight: Arthur Lacôte, https://thenounproject.com/turo/

Page 15: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Dataflow Line Anatomy 101
Let’s dissect what this line typically represents

Fig 1. Lineus Worldwidewebus. Common Name: Internet!

Diagram labels: Data, Script or Application, Disparate Transport Mechanisms, Script or Application, Data

Page 16: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Dataflow Line Anatomy 201
Sometimes that transport is just more lines

Fig 1. Lineus Worldwidewebus. Common Name: Internet!

Diagram labels: Data, Script or Application, Line Inception, Script or Application, Data

Page 17: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Dataflow Line Anatomy 301
But those lines could also have components…

Fig 1. Lineus Worldwidewebus. Common Name: Internet!
Fig 2. Good Recursion Joke: NoSuchJokeException (footage not found)

Page 19: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Apache NiFi
Key Features

• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Supports push and pull models
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
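
As a rough illustration of two of the features listed here, data buffering with backpressure and prioritized queuing, the following is a minimal Python sketch and not NiFi code. The `Connection` class, its `backpressure_threshold` parameter, and the `offer`/`poll` method names are all hypothetical; in this toy model, backpressure simply means the queue refuses new items once the threshold is reached.

```python
import heapq
import itertools


class Connection:
    """Toy queue between two components (hypothetical, not the NiFi API)."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        # Monotonic counter keeps first-in, first-out order within a priority.
        self._order = itertools.count()

    def is_full(self):
        """True once the buffered item count reaches the threshold."""
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, item, priority):
        """Enqueue unless backpressure is engaged; return True if accepted."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Return the item with the lowest priority number, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(backpressure_threshold=3)
accepted = [
    conn.offer(name, priority)
    for name, priority in [("bulk-1", 5), ("alert", 1), ("bulk-2", 5), ("bulk-3", 5)]
]
drained = [conn.poll() for _ in range(3)]
```

In this run the fourth item is refused because the buffer is full, and the "alert" item is drained first even though it arrived second.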

Page 20: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Apache NiFi Subproject: MiNiFi

Let me get the key parts of NiFi close to where data begins and provide bidirectional communication

NiFi lives in the data center. Give it an enterprise server or a cluster of them. MiNiFi lives as close as possible to where data is born and is a guest on that device or system.

Page 21: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Let’s revisit our courier service from the perspective of NiFi

Diagram: the courier sites (Physical Store, Distribution Center, Core Data Center at HQ, On Delivery Routes) with their existing systems (Gateway Server, Mobile Devices, Registers, Server Clusters, Kafka, Storm / Spark / Flink / Apex, Others, Trucks, Deliverers), overlaid with Client Libraries, MiNiFi, and NiFi instances.

Page 22: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Apache NiFi Managed Dataflow

Diagram columns: SOURCES, REGIONAL INFRASTRUCTURE, CORE INFRASTRUCTURE

Page 23: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

NiFi is based on Flow Based Programming (FBP)

FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
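
To make the FBP vocabulary concrete, here is a small Python sketch of how the terms relate to each other. It is an illustration under assumptions, not NiFi's Java API: the class names mirror the NiFi terms, but `UppercaseProcessor`, `on_trigger`, `has_room`, and `run_until_idle` are hypothetical names chosen for this example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class FlowFile:
    """Information packet: attributes plus content."""
    attributes: dict
    content: bytes = b""


class Connection:
    """Bounded buffer linking two processors."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def has_room(self):
        return len(self.queue) < self.capacity


class UppercaseProcessor:
    """Black box: moves one FlowFile per trigger, transforming its content."""

    def __init__(self, inbound, outbound):
        self.inbound = inbound
        self.outbound = outbound

    def on_trigger(self):
        # Do nothing if there is no input or the downstream buffer is full.
        if not self.inbound.queue or not self.outbound.has_room():
            return False
        flow_file = self.inbound.queue.popleft()
        self.outbound.queue.append(
            FlowFile(dict(flow_file.attributes), flow_file.content.upper())
        )
        return True


class FlowController:
    """Scheduler: knows the processors and decides when each one runs."""

    def __init__(self, processors):
        self.processors = processors

    def run_until_idle(self):
        # Keep triggering until no processor reports having done any work.
        while any(p.on_trigger() for p in self.processors):
            pass


source = Connection(capacity=10)
sink = Connection(capacity=10)
source.queue.append(FlowFile({"filename": "a.txt"}, b"hello"))
FlowController([UppercaseProcessor(source, sink)]).run_until_idle()
result = sink.queue[0]
```

A process group would correspond to wrapping several processors and their connections behind a single pair of input and output connections.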

Page 24: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

FlowFiles & Data Agnosticism

NiFi is data agnostic! But NiFi was designed with the understanding that users can care about specifics, and it provides tooling to interact with specific formats, protocols, etc.

ISO 8601 - http://xkcd.com/1179/

Robustness principle: “Be conservative in what you do, be liberal in what you accept from others”

Page 25: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

FlowFiles are like HTTP data

HTTP Data

Header:
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html

Content:
Hello world!

FlowFile

Header:
Standard FlowFile Attributes
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'lineageStartDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
FlowFile Attribute Map Content
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'

Content:
Binary Content *
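
The header-plus-content analogy can be sketched in a few lines of Python. This is an illustrative model only: the `FlowFile` dataclass, the `from_http_response` helper, and the `http.status.line` attribute name are assumptions made for this example and are not part of NiFi.

```python
from dataclasses import dataclass


@dataclass
class FlowFile:
    """Key/value attributes (the 'header') plus opaque bytes (the 'content')."""
    attributes: dict
    content: bytes


def from_http_response(raw: bytes) -> FlowFile:
    """Split a raw HTTP response into an attribute map and a content payload."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Hypothetical attribute name for the status line.
    attributes = {"http.status.line": lines[0]}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        attributes[name.strip().lower()] = value.strip()
    return FlowFile(attributes, body)


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 13\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"Hello world!\n"
)
flow_file = from_http_response(raw)
```

The content is never interpreted here, which is the sense in which the model stays data agnostic: only the attributes are inspected.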

Page 27: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Architecture

Diagram: a single NiFi instance
• OS/Host
• JVM: Web Server, Flow Controller, Processor 1 … Extension N
• Local Storage: FlowFile Repository, Content Repository, Provenance Repository

Diagram: a NiFi cluster
• Master: NiFi Cluster Manager (NCM), with OS/Host, JVM, Web Server, and NiFi Cluster Manager – Request Replicator
• Slaves: NiFi Nodes, each with the single-instance layout above

Page 28: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

NiFi Architecture – Repositories – Pass by reference

Excerpt of demo flow… What’s happening inside the repositories…

BEFORE
• FlowFile repository: F1 → C1
• Content repository: C1
• Provenance repository: P1 F1 – Create

AFTER
• FlowFile repository: F1 → C1; F2 → C1
• Content repository: C1
• Provenance repository: P1 F1 – Create; P2 F1 – Route; P3 F2 – Clone (F1)
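
The pass-by-reference behaviour on this slide can be modelled with three plain Python containers. This is a toy sketch using the slide's labels (F1, F2, C1, P1 to P3); the dictionaries and the `route` and `clone` helpers are hypothetical stand-ins, not NiFi's repository implementations.

```python
# Content claim id -> bytes. Only one copy of the payload is ever stored.
content_repo = {"C1": b"payload bytes"}
# FlowFile id -> content claim id (a reference, not a copy).
flowfile_repo = {"F1": "C1"}
# Append-only event log: (event id, FlowFile id, event type).
provenance = [("P1", "F1", "Create")]


def record(flowfile_id, event_type):
    """Append a provenance event with the next sequential id."""
    provenance.append((f"P{len(provenance) + 1}", flowfile_id, event_type))


def route(flowfile_id):
    """Routing changes where the FlowFile goes, not what it points at."""
    record(flowfile_id, "Route")


def clone(parent_id, child_id):
    """The clone shares the parent's content claim instead of copying bytes."""
    flowfile_repo[child_id] = flowfile_repo[parent_id]
    record(child_id, f"Clone ({parent_id})")


route("F1")
clone("F1", "F2")
```

After both operations the content repository still holds a single entry, while the FlowFile and provenance repositories have each grown.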

Page 29: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

NiFi Architecture – Repositories – Copy on Write

Excerpt of demo flow… What’s happening inside the repositories…

BEFORE
• FlowFile repository: F1 → C1
• Content repository: C1
• Provenance repository: P1 F1 - CREATE

AFTER
• FlowFile repository: F1 → C1; F1.1 → C2
• Content repository: C1 (plaintext); C2 (encrypted)
• Provenance repository: P1 F1 - CREATE; P2 F1.1 - MODIFY
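
Copy on write can be sketched the same way: a modification writes a new content claim and leaves the original untouched. The sketch below uses the slide's labels (F1, F1.1, C1, C2); the `modify` helper is hypothetical, and the XOR transform is only a stand-in for encryption, not a real cipher.

```python
# Content claim id -> bytes.
content_repo = {"C1": b"plaintext"}
# FlowFile id -> content claim id.
flowfile_repo = {"F1": "C1"}
# Append-only event log: (event id, FlowFile id, event type).
provenance = [("P1", "F1", "CREATE")]


def modify(flowfile_id, new_id, transform):
    """Write transformed content to a new claim; never overwrite the old one."""
    old_claim = flowfile_repo[flowfile_id]
    new_claim = f"C{len(content_repo) + 1}"
    content_repo[new_claim] = transform(content_repo[old_claim])
    # This sketch keeps the F1 entry alongside F1.1, mirroring the slide.
    flowfile_repo[new_id] = new_claim
    provenance.append((f"P{len(provenance) + 1}", new_id, "MODIFY"))


def fake_encrypt(data: bytes) -> bytes:
    """Stand-in for encryption (single-byte XOR); illustrative only."""
    return bytes(b ^ 0x2A for b in data)


modify("F1", "F1.1", fake_encrypt)
```

Because C1 is preserved, the earlier state of the data remains available to anything that still references it, such as a provenance event.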

Page 31: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Learn, Share at Birds of a Feather: Streaming, DataFlow & Cybersecurity

Thursday June 30, 6:30 pm, Ballroom C

Page 32: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Thank You

Page 34: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Brief history of the Apache NiFi Community

Timeline
• 2006: Code developed at NSA (matured at NSA 2006-2014)
• November 2014: Code available open source, ASL v2
• July 2015: Achieved TLP status in just 7 months
• Today:
  - Contributors from Government and several commercial industries
  - Releases on a 6-8 week schedule
  - Apache NiFi 1.0.0 release on the horizon
  - Zero-Master Clustering

Page 35: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Extension / Integration Points

NiFi Term | Description
Flow File Processor | Push/Pull behavior. Custom UI
Reporting Task | Used to push data from NiFi to some external service (metrics, provenance, etc.)
Controller Service | Used to enable reusable components / shared services throughout the flow
REST API | Allows clients to connect to pull information, change behavior, etc.

Page 38: Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose

Learn more and join us!

Apache NiFi site: http://nifi.apache.org

Subproject MiNiFi site: http://nifi.apache.org/minifi/

Subscribe to and collaborate [email protected]@nifi.apache.org

Submit Ideas or Issues: https://issues.apache.org/jira/browse/NIFI

Follow us on Twitter: @apachenifi

