Apache NiFi Toronto Meetup

Date post: 18-Feb-2017
Upload: hortonworks
Transcript
Page 1: Apache NiFi Toronto Meetup

Introducing Hortonworks DataFlow

© Hortonworks Inc. 2011 – 2015. All Rights Reserved

Page 2: Apache NiFi Toronto Meetup

Simplistic View of Enterprise Data Flow

The Data Flow Thing

Acquire Data

Process and Analyze Data

Store Data

Page 3: Apache NiFi Toronto Meetup

Realistic View of Enterprise Data Flow

Page 4: Apache NiFi Toronto Meetup

Enterprise DataFlow Challenges

GATHER

DELIVER

PRIORITIZE

Track from the edge through the datacenter

• Variability in Data Protocols, Formats and Schemas
• Data Size and Speed
• Security at Data Plane
• Traceability (Data Lineage)
• Prioritization of Resources
• Multi-Directional Flow
• Recoverability and Replay
• Transparency of DataFlow
• Scaling Down
• Enrichment/Transformation
• Unreliable Comms

Page 5: Apache NiFi Toronto Meetup

Typical Answer to Challenges

Add Systems…
• Add new systems to handle the protocol differences
• Add new systems to convert the data
• Add new systems to reorder the data
• Add new systems to filter the unauthorized data
• Add new systems to slow down or speed up data
• Add new topics to represent ‘stages of the flow’

And Complexity…

Page 6: Apache NiFi Toronto Meetup

Hortonworks DataFlow

Visual User Interface
HTML5, drag and drop, for agile execution

Provenance Metadata
For governance and compliance

Secure End-to-End Data Routing
With encryption and compression

Powered by Apache NiFi

Page 7: Apache NiFi Toronto Meetup

Manage Flow of Data in Real Time

Operators
• Transparency
• Immediate feedback
• Agility

Data Scientists
• Flexibility
• Autonomy

Page 8: Apache NiFi Toronto Meetup

Track Flow of Data from Beginning to End

IT and Cloud Operators
• Understand Traceability, Lineage
• Enable Recovery and Replay

Compliance Regulations
• Provide an Audit Trail
• Remediation Capabilities

[Diagram: LINEAGE, traced from BEGIN to END]
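
The traceability idea on this slide, following a piece of data from beginning to end, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NiFi's provenance API: the `ProvenanceEvent` schema, the `ProvenanceLog` class, and the event labels are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    """One recorded step in the life of a data item (hypothetical schema)."""
    data_id: str                     # item this event is about
    event_type: str                  # illustrative labels such as "RECEIVE" or "SEND"
    component: str                   # step in the flow that handled the item
    parent_id: Optional[str] = None  # item this one was derived from, if any


class ProvenanceLog:
    """Append-only event log that can reconstruct an item's lineage."""

    def __init__(self) -> None:
        self._events: List[ProvenanceEvent] = []

    def record(self, data_id: str, event_type: str, component: str,
               parent_id: Optional[str] = None) -> None:
        self._events.append(ProvenanceEvent(data_id, event_type, component, parent_id))

    def lineage(self, data_id: str) -> List[ProvenanceEvent]:
        """Return events ordered from the original source to `data_id`."""
        chain: List[ProvenanceEvent] = []
        seen = set()
        current: Optional[str] = data_id
        while current is not None and current not in seen:
            seen.add(current)  # guards against accidental cycles
            events = [e for e in self._events if e.data_id == current]
            chain = events + chain  # ancestors go in front
            parents = [e.parent_id for e in events if e.parent_id is not None]
            current = parents[0] if parents else None
        return chain


# Usage: record three steps, then walk the chain back to the source.
log = ProvenanceLog()
log.record("raw-1", "RECEIVE", "edge-sensor")
log.record("clean-1", "MODIFY", "normalize", parent_id="raw-1")
log.record("clean-1", "SEND", "data-lake")
for event in log.lineage("clean-1"):
    print(event.component, event.event_type)
```

Because the log is append-only, the same records that answer "where did this come from?" can also serve as an audit trail, which is the connection the slide draws between operators and compliance.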

Page 9: Apache NiFi Toronto Meetup

Secure Data at the Edge

Beyond Simple Encryption
• Enterprise authorization services: entitlements can change often
• People and systems with different roles require different access levels

Understanding and Classifying Data
• Tagged/classified data traced
• Understand who/what/when/where data is leveraged

Page 10: Apache NiFi Toronto Meetup

Common Apache NiFi Use Cases

Compliance
Gain full transparency into provenance and flow of data

Digital Security
Acquire and prioritize data into data lake for analysis

IoT Optimization
Secure, Prioritize, Enrich and Trace data at the edge

Fraud Detection
Move sales transaction data in real time to analyze on demand

Big Data Ingest
Easily and efficiently ingest data into Hadoop

Value Resources
Gain visibility into how data sources are used to determine value

Page 11: Apache NiFi Toronto Meetup

Architecture

[Diagram: single NiFi node. OS/Host > JVM containing a Web Server and a Flow Controller (Processor 1 ... Extension N), with a FlowFile Repository, Content Repository and Provenance Repository on Local Storage.]

[Diagram: NiFi cluster. Master: NiFi Cluster Manager (NCM), an OS/Host > JVM running a Web Server and the NiFi Cluster Manager (Request Replicator). Slaves: NiFi Nodes, each with the single-node layout.]
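
As a rough illustration of how the three repositories in the architecture diagram divide the work, here is a toy in-memory model in Python. It is a sketch, not NiFi's implementation: the `FlowFile` and `ToyNode` classes, their method names, and the use of a SHA-256 digest as the content key are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FlowFile:
    """Lightweight record: attributes plus a pointer to the content."""
    attributes: Dict[str, str]
    content_claim: str  # key into the content store, not the bytes themselves


class ToyNode:
    """Hypothetical in-memory stand-in for the repositories on one node."""

    def __init__(self) -> None:
        self.flowfile_repo: List[FlowFile] = []   # attributes and state of data in the flow
        self.content_repo: Dict[str, bytes] = {}  # the payload bytes
        self.provenance_repo: List[str] = []      # history of what happened

    def ingest(self, payload: bytes, **attributes: str) -> FlowFile:
        # Assumption for this sketch: content is keyed by its SHA-256 digest.
        claim = hashlib.sha256(payload).hexdigest()
        self.content_repo[claim] = payload
        flowfile = FlowFile(dict(attributes), claim)
        self.flowfile_repo.append(flowfile)
        self.provenance_repo.append(f"RECEIVE {claim[:8]}")
        return flowfile

    def update_attribute(self, flowfile: FlowFile, key: str, value: str) -> None:
        """Changing an attribute records an event but copies no content bytes."""
        flowfile.attributes[key] = value
        self.provenance_repo.append(f"ATTRIBUTES_MODIFIED {flowfile.content_claim[:8]}")

    def read(self, flowfile: FlowFile) -> bytes:
        return self.content_repo[flowfile.content_claim]


# Usage: ingest one payload, tag it, and read the content back.
node = ToyNode()
ff = node.ingest(b"sensor reading", filename="reading.txt")
node.update_attribute(ff, "priority", "high")
print(node.read(ff), ff.attributes, node.provenance_repo)
```

Keeping attributes separate from content means routing and tagging decisions can be made on small records while the payload stays where it was written.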

Page 12: Apache NiFi Toronto Meetup

Security

Administration
Central management and consistent security
• NiFi Cluster Manager

Authentication
Authenticate users and systems
• 2-Way SSL support out of the box; additional types coming

Authorization
Provision access to data
• Pluggable authorization designed to fit any Identity and Access Management (IAM) scheme
• File-based authority provider out of the box
• Multi-role

Audit
Maintain a record of data access
• Detailed logging of all user actions
• Detailed logging of key system behaviors
• Data Provenance enables unparalleled tracking from the edge through the Lake

Data Protection
Protect data at rest and in motion
• Support a variety of SSL/encrypted protocols
• Tag and utilize tags on data for fine-grained access controls
• Encrypt/decrypt content using pre-shared key mechanisms

Administrator: Configure system threads, user accounts, and flow audit history
Data Flow Manager: Manipulate the dataflow
Read Only: View the dataflow only
NiFi: Configure system threads, user accounts, and flow audit history
Proxy: Manipulate the dataflow
Provenance: Query the provenance repository and download content

Page 13: Apache NiFi Toronto Meetup

Apache NiFi User Quotes

“The NiFi user interface and ease of extension have made it extremely easy to get up and running and even customize. It is great that it also easily integrates with other parts of the Apache Big Data world like Spark, Kafka and Hadoop.”

Craig Connell, Leverege, Chief Technology Officer

“NiFi's well designed, mature API has made our integration process remarkably straightforward. With it, we're able to track the origin, transformation, and persistence of data throughout our analytic processes.”

Mike Bishop, Prescient Edge, Chief Systems Architect

“NiFi addresses dataflow challenges we have right now and provides upside for where we're heading. That it is designed for the global enterprise, is also a big win for us.”

Alexandar Ryabov, Wargaming.net, Senior Director of Data Engineering

Page 14: Apache NiFi Toronto Meetup

Thank You

Page 15: Apache NiFi Toronto Meetup

Hortonworks DataFlow Use Cases
Administer Flows, Enhance Security and Manage Equipment

Page 16: Apache NiFi Toronto Meetup

Data Flow Management

Data Ingestion

Data as a Service Provenance

Data Regulatory Compliance

DATA FLOW MANAGEMENT

Page 17: Apache NiFi Toronto Meetup

DATA FLOW MANAGEMENT

• DATA INGESTION

Data Ingestion, with bi-directional intelligence and provenance metadata

PROBLEM
Most ingest tools are unidirectional: data streams in the same way no matter what
They don’t preserve detail on in-flow data transformations

SOLUTION
HDF manages bi-directional, point-to-point data flows that are easily configured
Data reaches its destination with its provenance data intact

IMPACT
Users can update data flow logic to always receive the data they need
Provenance data improves confidence in your insights

“The NiFi user interface and ease of extension have made it extremely easy to get up and running and even customize.”
Craig Connell, CTO, Leverege

Page 18: Apache NiFi Toronto Meetup

DATA FLOW MANAGEMENT

• DATA AS A SERVICE PROVENANCE

Providers of data as a service assign value to data using NiFi’s provenance metadata

PROBLEM
A new genre of companies provides data as a service
They have limited ability to prioritize which data is most valuable

SOLUTION
NiFi’s data provenance capabilities help DaaS companies understand, in much more detail, how their data is consumed

IMPACT
They can understand which information resources are valuable and which are not
This helps them invest in capturing the most valuable data sources

Page 19: Apache NiFi Toronto Meetup

DATA FLOW MANAGEMENT

• DATA REGULATORY COMPLIANCE

Firms Comply with Financial Regulations by Showing Complete Chain of Custody

PROBLEM
Financial firms such as retail banks, capital markets firms and insurance companies are required to show chain of custody for certain transactions

SOLUTION
Apache NiFi’s data provenance capabilities show a complete chain of custody, for compliance with rules such as Basel capital requirements

IMPACT
Firms can go back to a point in time and show regulators exactly what happened to a key piece of data in a transaction

Page 20: Apache NiFi Toronto Meetup

Enhance Security

Asset and People Security

Secure Data Ingestion

Fraud and Theft Protection

ENHANCE SECURITY

Page 21: Apache NiFi Toronto Meetup

ENHANCE SECURITY

• ASSET AND PEOPLE SECURITY

Prescient Edge Helps Its Customers Protect the Physical Safety of Their Personnel

PROBLEM
Globally distributed firms and government agencies have personnel in risky areas
Prescient Edge provides analytics to protect employees

SOLUTION
The company uses Apache NiFi to feed real-time, unstructured data from dozens of sources to Prescient Edge analytics systems, which determine emergent threats

IMPACT
Prescient Edge is able to provide its clients with detailed, up-to-the-minute threat and risk information, allowing those clients to respond quickly to safeguard their teams and assets

“With [Apache NiFi], we're able to track the origin, transformation, and persistence of data throughout our analytic processes.”
Mike Bishop, Chief Systems Architect, Prescient Edge

Page 22: Apache NiFi Toronto Meetup

ENHANCE SECURITY

• SECURE DATA INGESTION

A major US financial firm uses HDF to prioritize data ingest and speed time to protection

PROBLEM
Digital security depends on the ability to detect threats quickly
Protection algorithms evaluate metadata with equal priority, slowing time to protection

SOLUTION
Apache NiFi helps to more effectively acquire, evaluate and prioritize security logs upstream, before they reach the analytics engine

IMPACT
By prioritizing which data to send to its analytics engine, the company sees faster time to protection for its cyber assets

Page 23: Apache NiFi Toronto Meetup

ENHANCE SECURITY

• FRAUD AND THEFT PROTECTION

A huge US retailer uses Apache NiFi to reduce theft and shrinkage by hundreds of millions annually

PROBLEM
Thieves shoplift merchandise in the morning and then return the stolen goods later the same day for credit to their card

SOLUTION
Apache NiFi pushes a real-time stream of inventory and transactional data into Hadoop more quickly, reducing the time to detect this fraudulent pattern

IMPACT
The company expects to reduce shrinkage by hundreds of millions of dollars annually

Page 24: Apache NiFi Toronto Meetup

Manage Equipment

Equipment Repair

Remote Security Protection

MANAGE EQUIPMENT

Page 25: Apache NiFi Toronto Meetup

MANAGE EQUIPMENT

• EQUIPMENT REPAIR

Global oil company uses Apache NiFi to prioritize which sensor data to send ashore from offshore rigs

PROBLEM
Offshore oil rigs have physical constraints on their hardware footprints and associated bandwidth
Far more sensor data is generated than can be transmitted to shore

SOLUTION
Apache NiFi uses rules-based prioritization to determine which sensor data is most important and thus needs to be transmitted back first, for immediate analysis

IMPACT
The ability to distinguish important readings from standard readings helps the company isolate important signals and take action to improve efficiency and safety
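
Rules-based prioritization under a bandwidth limit, as described on this slide, can be sketched as a small priority queue. The thresholds, the field names (`pressure_psi`, `temperature_c`) and the function names below are hypothetical. They are not taken from the oil company's flow or from NiFi itself, and serve only to show the shape of the technique.

```python
import heapq
import itertools
from typing import Any, Dict, Iterable, List

Reading = Dict[str, Any]

# Hypothetical rules: a lower number is more urgent, and the first match wins.
RULES = [
    (lambda r: r["pressure_psi"] > 5000, 0),  # assumed alarm threshold
    (lambda r: r["temperature_c"] > 150, 1),  # assumed warning threshold
]
DEFAULT_PRIORITY = 9  # routine readings


def priority_of(reading: Reading) -> int:
    """Return the priority assigned by the first matching rule."""
    for matches, priority in RULES:
        if matches(reading):
            return priority
    return DEFAULT_PRIORITY


def select_for_uplink(readings: Iterable[Reading], budget: int) -> List[Reading]:
    """Pick at most `budget` readings, most urgent first; ties keep arrival order."""
    arrival = itertools.count()
    # The arrival counter breaks ties, so the reading dicts are never compared.
    heap = [(priority_of(r), next(arrival), r) for r in readings]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(min(budget, len(heap)))]


# Usage: four readings arrive, but the link can carry only two of them.
readings = [
    {"sensor": "a", "pressure_psi": 3000, "temperature_c": 90},
    {"sensor": "b", "pressure_psi": 5200, "temperature_c": 95},
    {"sensor": "c", "pressure_psi": 3100, "temperature_c": 160},
    {"sensor": "d", "pressure_psi": 2900, "temperature_c": 80},
]
for reading in select_for_uplink(readings, budget=2):
    print(reading["sensor"])
```

Readings that do not fit in the budget are simply not selected here; a real flow would more likely keep them queued for a later transmission window.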

Page 26: Apache NiFi Toronto Meetup

MANAGE EQUIPMENT

• REMOTE SECURITY PROTECTION

Firm with a high security profile enriches on-site video data to detect intrusions

PROBLEM
Digital security cameras present a “needle in a haystack” problem
Individuals monitoring video feeds can be lulled by hundreds of hours where nothing happens

SOLUTION
Hortonworks DataFlow can identify a “trigger moment”, such as when a human face appears in a video, enrich that “trigger moment” with additional data and prioritize sending it back for immediate analysis

IMPACT
Analytics systems and analysts are able to more quickly sift through the “noise” to identify known human threats in a particular area

Page 28: Apache NiFi Toronto Meetup

Thank You
