Joe Witt presentation on Apache NiFi

Date post: 04-Aug-2015
Upload: markkerzner
Transcript
Page 1: Joe Witt presentation on Apache NiFi

Apache NiFi

Better Analytics Demand Better Dataflow

Presented by: Joe Witt, Apache NiFi PPMC Member

Page 2

@apachenifi

Page 3

History

•  Developed at NSA for over eight years

•  Donated to the Apache Software Foundation Nov 2014

•  Undergoing incubation

•  Three ASF releases to date

•  Closing in on 0.2.0 release

Page 4

The problem space: Enterprise Dataflow

Automate the flow of data from any source

…to systems which extract meaning and insight

…and to those that store and make it available for users

Page 5

Use Cases for NiFi

•  Remote sensor delivery

•  Inter-site / global distribution

•  Intra-site distribution

•  ‘Big Data’ ingest

•  Data Processing (enrichment, filtering, sanitization)  

Page 6

The challenges we faced

•  Transport / Messaging was not enough

•  Needed to understand the big picture

•  Needed the ability to make *immediate* changes

•  Must maintain chain of custody for data

•  Rigorous security and compliance requirements

Page 7

Why were transport and messaging not enough?

•  Data access exceeded resources to transport

•  Decoupling systems is about more than the connectivity

•  Message sizes ranged from B to GB

•  Not all data is created equal

•  Needed precise security controls
   •  SSL and topic-level authorization were insufficient

Page 8

Apache NiFi Foundational Concepts

1.  The basic building blocks

2.  Real-time Command and Control

3.  The Power of Provenance

Page 9

Flow File

HEADER: UUID, Name, Size, Entry Time

Attributes Map: [[Key | Value]]

CONTENT

•  Types: Events, Objects, Files, Messages, Media

•  Formats: JSON, Avro, Text, MP4, Proprietary

•  Sizes: Bytes to GBs
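
As a rough illustration of the Flow File layout on this slide, the Python sketch below models the header fields, the attributes map, and the content. The class name, field names, and the sample attribute key are assumptions made for the example; this is not NiFi's actual class.

```python
import time
from dataclasses import dataclass, field
from typing import Dict
from uuid import uuid4


@dataclass(frozen=True)
class FlowFile:
    """Illustrative stand-in for the slide's structure, not NiFi's API."""
    name: str                                  # header: Name
    content: bytes = b""                       # CONTENT (bytes to GBs in practice)
    attributes: Dict[str, str] = field(default_factory=dict)  # Key | Value map
    uuid: str = field(default_factory=lambda: str(uuid4()))   # header: UUID
    entry_time: float = field(default_factory=time.time)      # header: Entry Time

    @property
    def size(self) -> int:
        """Header: Size, derived from the content length in bytes."""
        return len(self.content)


# Hypothetical usage: a small JSON event with one attribute.
flow_file = FlowFile(
    name="sensor-1.json",
    content=b'{"t": 21.5}',
    attributes={"mime.type": "application/json"},
)
```

Keeping the header and attributes separate from the content is what lets routing decisions be made without reading the payload.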

Page 10

Flow File Processor

•  Routing: Context, Content

•  Transformation: Enrich, Obfuscate, Filter, Convert, Analyze, Split, Aggregate

•  Mediation: Push / Pull

•  …
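
To make the processor roles above more concrete, here is a minimal Python sketch of context-based routing and two transformations (enrich and split). It represents a flow file as a plain (attributes, content) tuple; the function names and relationship labels are hypothetical and do not correspond to NiFi's processor API.

```python
from typing import Dict, List, Tuple

# Simplified flow file for this sketch: (attributes, content).
FlowFile = Tuple[Dict[str, str], bytes]


def route_on_attribute(flow_file: FlowFile, key: str, routes: Dict[str, str]) -> str:
    """Routing on context: pick a relationship from an attribute value."""
    attributes, _ = flow_file
    return routes.get(attributes.get(key, ""), "unmatched")


def enrich(flow_file: FlowFile, extra: Dict[str, str]) -> FlowFile:
    """Transformation: add attributes without touching the content."""
    attributes, content = flow_file
    return ({**attributes, **extra}, content)


def split_lines(flow_file: FlowFile) -> List[FlowFile]:
    """Transformation: split one flow file into one per non-empty line."""
    attributes, content = flow_file
    return [(dict(attributes), line) for line in content.splitlines() if line]


incoming: FlowFile = ({"source": "site-a"}, b"alpha\nbeta\n")
relationship = route_on_attribute(incoming, "source", {"site-a": "priority"})
enriched = enrich(incoming, {"region": "east"})
parts = split_lines(enriched)
```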

Page 11

Connections

•  Queuing

•  Back Pressure

•  Expiration

•  Prioritize

•  Swapping
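
The connection behaviors listed above can be sketched as a bounded priority queue. The Python example below covers queuing, back pressure, expiration, and prioritization; swapping is left out. The class, its parameters, and the injectable clock are assumptions for illustration, not NiFi's implementation.

```python
import heapq
import itertools
import time
from typing import Any, Callable, List, Optional, Tuple


class Connection:
    """Bounded, prioritized queue between two processors (illustrative only)."""

    def __init__(self, max_queued: int, expiration_s: float,
                 priority: Callable[[Any], int] = lambda item: 0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_queued = max_queued      # back-pressure threshold
        self.expiration_s = expiration_s  # maximum age of a queued item
        self.priority = priority          # lower value is served first
        self.clock = clock
        self._heap: List[Tuple[int, int, float, Any]] = []
        self._seq = itertools.count()     # tie-breaker keeps FIFO order per priority

    def offer(self, item: Any) -> bool:
        """Queue an item; return False when back pressure applies."""
        if len(self._heap) >= self.max_queued:
            return False
        entry = (self.priority(item), next(self._seq), self.clock(), item)
        heapq.heappush(self._heap, entry)
        return True

    def poll(self) -> Optional[Any]:
        """Return the highest-priority unexpired item, dropping expired ones."""
        while self._heap:
            _, _, enqueued_at, item = heapq.heappop(self._heap)
            if self.clock() - enqueued_at <= self.expiration_s:
                return item
        return None
```

A source that sees `offer` return False would slow down or stop, which is the essence of back pressure.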

Page 12

Flow Controller

Page 13

NiFi Architecture

Page 14

NiFi Clustering Model

Page 15

2. Real-time command and control

Tighten the feedback loop
•  Changes have consequences (good or bad)
•  And you see them as they occur

Continuous Improvement
•  Compare real-time vs. historical statistics
•  View data provenance
•  View content at any stage

Intuitive user experience
•  Visual programming
•  Logical flow graph

Page 16

3. The Power of Provenance – Chain of custody for data

Latency Optimization
•  Intra process
•  Inter process
•  End-to-end

Compliance
•  Prove handling
•  Assess impact

Understanding
•  Step through time
•  View content
•  View context
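
One way to picture chain of custody is an append-only log of events per flow file. The Python sketch below records events, reconstructs a lineage, and derives an end-to-end latency from it. The class names, event labels, and component names are hypothetical and chosen only for this example; NiFi's provenance repository is considerably richer.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    flow_file_id: str
    event_type: str   # illustrative labels such as "RECEIVE", "ROUTE", "SEND"
    component: str    # the processor or system that handled the data
    timestamp: float


class ProvenanceLog:
    """Append-only event log; a hypothetical stand-in, not NiFi's repository."""

    def __init__(self) -> None:
        self._events: List[ProvenanceEvent] = []

    def record(self, flow_file_id: str, event_type: str, component: str,
               timestamp: Optional[float] = None) -> None:
        ts = time.time() if timestamp is None else timestamp
        self._events.append(ProvenanceEvent(flow_file_id, event_type, component, ts))

    def lineage(self, flow_file_id: str) -> List[ProvenanceEvent]:
        """Chain of custody: every event for one flow file, oldest first."""
        events = [e for e in self._events if e.flow_file_id == flow_file_id]
        return sorted(events, key=lambda e: e.timestamp)

    def end_to_end_latency(self, flow_file_id: str) -> float:
        """Seconds between the first and last recorded event."""
        events = self.lineage(flow_file_id)
        return events[-1].timestamp - events[0].timestamp if events else 0.0


log = ProvenanceLog()
log.record("ff-1", "RECEIVE", "ingest", timestamp=100.0)
log.record("ff-1", "ROUTE", "router", timestamp=101.0)
log.record("ff-1", "SEND", "delivery", timestamp=102.5)
```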

Page 17

Demo

Page 18

How fast is it and why?

•  Flow File Repo – Write Ahead Log
•  Content Repo
•  Add more partitions
•  Input/Output Streams
•  Copy on Write
•  Pass by Reference
•  Allow tradeoffs of latency vs throughput
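
Two of the ideas on this slide, pass by reference and copy on write, can be shown with a toy content store. In the Python sketch below, flow files hold only a claim id pointing at immutable content, so cloning copies no bytes and modification writes a new claim instead of changing the old one. All names are assumptions for the example; this is not how NiFi's content repository is actually implemented.

```python
from typing import Callable, Dict, Tuple

# Simplified flow file for this sketch: (attributes, content claim id).
FlowFile = Tuple[Dict[str, str], int]


class ContentRepository:
    """Stores immutable content claims in memory (illustrative only)."""

    def __init__(self) -> None:
        self._claims: Dict[int, bytes] = {}
        self._next_id = 0

    def write(self, data: bytes) -> int:
        claim_id = self._next_id
        self._next_id += 1
        self._claims[claim_id] = bytes(data)
        return claim_id

    def read(self, claim_id: int) -> bytes:
        return self._claims[claim_id]


def clone(flow_file: FlowFile) -> FlowFile:
    """Pass by reference: the clone shares the same content claim."""
    attributes, claim_id = flow_file
    return (dict(attributes), claim_id)


def modify(repo: ContentRepository, flow_file: FlowFile,
           transform: Callable[[bytes], bytes]) -> FlowFile:
    """Copy on write: new content goes to a new claim; the old one is untouched."""
    attributes, claim_id = flow_file
    new_claim = repo.write(transform(repo.read(claim_id)))
    return (dict(attributes), new_claim)


repo = ContentRepository()
original: FlowFile = ({"name": "a.txt"}, repo.write(b"hello"))
copy = clone(original)
upper = modify(repo, copy, bytes.upper)
```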

Page 19

How does it deal with security?

-  User to System and System to System

-  Authentication (2-Way SSL, more coming…)

-  Authorization (pluggable)

-  Authorize a specific piece of data to a specific system

-  Data provenance
   -  Prove you have done the right thing
   -  Recover when you have not

Page 20

How can I monitor this at runtime?

•  Web UI

•  Push API: Reporting Tasks (Ganglia, Graphite, etc…)

•  Pull API: REST API

Page 21

What are the points of extension?

•  Flow File Processors
•  Advanced UI
•  Flow File Prioritizer
•  Reporting Tasks
•  Controller Services
•  Build Clients against our REST API

Page 22

Status and direction for NiFi

Existing Strengths
•  Efficient use of each node
   -  100s of MB/s per node
   -  100Ks transactions/s per node
•  Simple / Effective scaling model
•  Runtime Command and Control
•  Data Provenance

Roadmap Highlights
•  Distributed durability of data
   -  Maybe Kafka backed queues
•  High Availability Cluster Manager
•  Live / Rolling Upgrades
•  Provenance Query Language / Reporting
•  A complete user experience enabled by provenance

Page 23

Learn more about Apache NiFi

Apache NiFi (incubating) site: http://nifi.incubator.apache.org

Subscribe to and collaborate at: [email protected] [email protected]

Submit Ideas or Issues: https://issues.apache.org/jira/browse/NIFI

@apachenifi

