
[253] Apache NiFi

Date post: 24-Jan-2018
Upload: naver-d2
Beyond Messaging Enterprise Dataflow powered by Apache NiFi

© Hortonworks Inc. 2011 – 2015. All Rights Reserved

Aldrin Piri

DEVIEW 2015 2015.09.15

About me

Member of Technical Staff

Project Management Committee and Committer

@aldrinpiri

Simplistic View of Enterprise Data Flow

[Diagram: "The Data Flow Thing" linking three stages: Acquire Data, Store Data, and Process and Analyze Data]

Where do we find data flow?

• Remote sensor delivery (Internet of Things - IoT)
• Intra-site / Inter-site / global distribution (Enterprise)
• Ingest for driving analytics (Big Data)
• Data Processing (Simple Event Processing)

Basics of Connecting Systems

For every connection, these must agree:

1. Protocol
2. Format
3. Schema
4. Priority
5. Size of event
6. Frequency of event
7. Authorization access
8. Relevance

[Diagram: producer P1 connected to consumer C1]
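To make the eight-point agreement concrete, here is a minimal Python sketch. It assumes each side of a connection can be described by one value per property; the `Endpoint` class, its field names, and the sample values are illustrative assumptions, not something taken from the talk or from NiFi.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical description of one side of a connection."""
    protocol: str
    format: str
    schema: str
    priority: str
    event_size: str
    event_frequency: str
    authorization: str
    relevance: str


def mismatches(producer: Endpoint, consumer: Endpoint) -> list:
    """Return the names of the properties on which the two sides disagree."""
    return [
        f.name
        for f in fields(Endpoint)
        if getattr(producer, f.name) != getattr(consumer, f.name)
    ]


# Illustrative values only.
p1 = Endpoint("HTTP", "JSON", "v1", "fifo", "small", "streaming", "role-a", "all")
c1 = replace(p1, protocol="SFTP", format="Avro")

print(mismatches(p1, c1))  # the properties something in between has to reconcile
```

Every name returned by `mismatches` is a gap that some component between P1 and C1 must bridge.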

Challenges of dataflow in the enterprise

• Messaging addresses only a small subset of the problem space
• Needed to understand the big picture
• Needed the ability to make immediate changes
• Must maintain chain of custody for data
• Rigorous security and compliance requirements

Messaging Systems as Dataflow

Great options including:
• Kafka
• ActiveMQ
• Tibco

Let us consider the perfect messaging system for this talk:
• It has zero latency
• It has perfect data durability
• It supports unlimited consumers and producers

Dataflow as a messaging problem

“But my system needs…”
• A different format and/or schema
• To use a different protocol
• The highest priority information first
• Large objects (event batches) / Small objects (streams)
• Authorization to the data level
• Only a subset of the data on a topic
• Data that is enriched/sanitized before it arrives

Using Messaging

Only a subset agree using messaging:

1. Protocol
2. Format
3. Schema
4. Priority
5. Size of event
6. Frequency of event
7. Authorization access
8. Relevance

[Diagram: producer P1 connected through Messaging to consumers C1 through CN]

More issues to consider:
• How do you know what the data flow looks like?
• How is it managed?
• How is it working – today, yesterday?

How these issues are typically solved

• Add new systems to handle the protocol differences
• Add new systems to convert the data
• Add new systems to reorder the data
• Add new systems to filter the unauthorized data
• Add new topics to represent ‘stages of the flow’

Which leads to latency, complexity, and limited retention.

Ultimately, the operations teams who handle data at flow boundaries become responsible for managing.

Real-time Data Flow

It’s not just how quickly you move data – it’s about how quickly you can change behavior and seize new opportunities

Introducing Apache NiFi

• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Recovery/recording
- A rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
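Two of the items above, backpressure and prioritized queuing, can be illustrated with a toy Python sketch. This is not NiFi's implementation or API: the class name `PrioritizedConnection`, the `backpressure_threshold` parameter, and the convention that a lower number means higher priority are all assumptions made for illustration.

```python
import heapq
import itertools


class PrioritizedConnection:
    """Toy bounded queue: refuses new items when full, hands out by priority."""

    def __init__(self, backpressure_threshold: int):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        # Tie-breaker so items with equal priority leave in arrival order.
        self._arrival = itertools.count()

    def is_full(self) -> bool:
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, item, priority: int) -> bool:
        """Queue an item; return False (backpressure) if the threshold is reached."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._arrival), item))
        return True

    def poll(self):
        """Remove and return the highest-priority item, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = PrioritizedConnection(backpressure_threshold=2)
accepted_routine = queue.offer("routine reading", priority=5)
accepted_urgent = queue.offer("urgent alert", priority=1)
accepted_late = queue.offer("late arrival", priority=0)  # refused: queue is full
first_out = queue.poll()                                 # urgent leaves before routine
```

The point of the sketch is that the producer learns the queue is full and has to slow down, instead of the consumer being overrun, while the consumer still receives the most important item first.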

A Brief History

2006
NiagaraFiles (NiFi) was first conceived by Joe Witt at the National Security Agency (NSA).

November 2014
NiFi is donated to the Apache Software Foundation (ASF) through NSA’s Technology Transfer Program and enters ASF’s incubator.

July 2015
NiFi reaches ASF top-level project status.

Flow Based Programming (FBP)

FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
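The mapping above can be sketched in a few lines of Python. This is only a toy model of the flow-based programming idea, not NiFi's Java API: the class names borrow the NiFi terms from the table, but `on_trigger`, `run_until_idle`, and the two sample processors are hypothetical names chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Information packet: content plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Buffer between processors, modelled as a FIFO queue (left unbounded for brevity)."""

    def __init__(self):
        self.queue = deque()


class Processor:
    """Black box: takes one FlowFile from its inbound connection and emits a result."""

    def __init__(self, name, work, inbound, outbound):
        self.name = name
        self.work = work
        self.inbound = inbound
        self.outbound = outbound

    def on_trigger(self) -> bool:
        """Process one queued FlowFile; return False when there was nothing to do."""
        if not self.inbound.queue:
            return False
        result = self.work(self.inbound.queue.popleft())
        if result is not None:
            self.outbound.queue.append(result)
        return True


def run_until_idle(processors) -> None:
    """Stand-in for the scheduler: trigger every processor until none has work."""
    while any([p.on_trigger() for p in processors]):
        pass


source, middle, sink = Connection(), Connection(), Connection()
upper = Processor(
    "upper",
    lambda ff: FlowFile(ff.content.upper(), dict(ff.attributes)),
    source,
    middle,
)
tag = Processor(
    "tag",
    lambda ff: FlowFile(ff.content, {**ff.attributes, "routed": "true"}),
    middle,
    sink,
)

source.queue.append(FlowFile(b"hello", {"filename": "a.txt"}))
run_until_idle([upper, tag])
delivered = sink.queue[0]
```

Because each processor only sees its own inbound and outbound connections, the two processors and the connection between them could be wrapped up and reused as a single unit, which is the idea behind the Process Group row.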

Architecture

Single node (OS/Host running one JVM):
• Web Server
• Flow Controller, hosting Processor 1 through Extension N
• FlowFile Repository, Content Repository, and Provenance Repository on Local Storage

Cluster:
• Master: NiFi Cluster Manager (NCM), an OS/Host running a JVM with a Web Server and the Request Replicator
• Slaves: NiFi Nodes, each with the same layout as the single node

Live Demonstration

Feature Proposals – Status

Status | Feature
FUTURE | Better integration with Apache Kafka
FUTURE | Clustering redesign
IN PROGRESS | Configuration management of flows
STARTED | Extension and template registry
RELEASE COMING SOON | First-class Avro support 1
STARTED | Interactive queue management
STARTED | Multi-tenant data flow
FUTURE | Pluggable authentication
FUTURE | Reference-able process groups
FUTURE | Variable registry
FUTURE | ‘Wormhole’ connections

https://cwiki.apache.org/confluence/display/NIFI/NiFi+Feature+Proposals

Learn more and join us!

Apache NiFi site

http://nifi.apache.org

Subscribe to and collaborate at

[email protected]

[email protected]

Submit Ideas or Issues

https://issues.apache.org/jira/browse/NIFI

Follow us on Twitter

@apachenifi

Thank you!

