
Hortonworks Data in Motion Webinar Series - Part 1

Date post: 09-Jan-2017
Upload: hortonworks
Transcript
Page 1: Hortonworks Data in Motion Webinar Series - Part 1

Harnessing Data-in-Motion with Hortonworks DataFlow

Introduction to HDF 2.0

Haimo Liu, Product Manager

Aldrin Piri, Technical Staff

Page 2: Hortonworks Data in Motion Webinar Series - Part 1

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Agenda

HDF 2.0: Flow Management
– NiFi basics
– NiFi use cases
– NiFi demos

HDF 2.0: Streaming Analytics

Page 3: Hortonworks Data in Motion Webinar Series - Part 1

Simplistic View of Enterprise Data Flow

Data Flow

Acquire Data

Process and Analyze Data

Store Data

Page 4: Hortonworks Data in Motion Webinar Series - Part 1

Realistic View of Enterprise Data Flow

Interacting with different business partners and customers

Page 5: Hortonworks Data in Motion Webinar Series - Part 1

Apache NiFi: Designed for 8 challenges of global enterprise dataflow

• Visual Command and Control: for agile and immediate creation, configuration, and control of dataflows
• Data Lineage (Provenance): ensures trust of your data
• Data Prioritization: because not all data is of equal importance
• Data Buffering/Back-Pressure: since not all senders, receivers, and connections work perfectly all the time
• Control Latency vs Throughput: adapt to different situations with different requirements
• Secure Control Plane/Data Plane: security of data and data access
• Scale-out Clustering: scalability
• Extensibility: ecosystem flexibility and growth

Page 6: Hortonworks Data in Motion Webinar Series - Part 1

Use Cases for Apache NiFi

What is Apache NiFi used for?
• Reliable and secure transfer of data between systems
• Delivery of data from sources to analytic platforms
• Enrichment and preparation of data:
– Conversion between formats
– Extraction/Parsing
– Routing decisions

What is Apache NiFi NOT used for?
• Distributed Computation
• Complex Event Processing
• Joins / Complex Rolling Window Operations

Page 7: Hortonworks Data in Motion Webinar Series - Part 1

Terminology

FlowFile
• Unit of data moving through the system
• Content + Attributes (key/value pairs)

Processor
• Performs the work, can access FlowFiles

Connection
• Links between processors
• Queues that can be dynamically prioritized
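
The three terms on this slide can be sketched in a few lines of plain Python. This is only an illustration of the concepts, not NiFi's actual classes or API: the names `FlowFile`, `Connection`, `add_size_attribute`, and `set_prioritizer` are hypothetical, and the priority rule (content size) is an assumption chosen to keep the example short.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class FlowFile:
    """Unit of data: opaque content plus key/value attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


def add_size_attribute(flowfile: FlowFile) -> FlowFile:
    """Hypothetical processor: reads the content and records its size."""
    attributes = dict(flowfile.attributes, fileSize=str(len(flowfile.content)))
    return FlowFile(flowfile.content, attributes)


class Connection:
    """Queue between two processors; lower priority values leave first."""

    def __init__(self, prioritizer: Callable[[FlowFile], int]) -> None:
        self._prioritizer = prioritizer
        self._arrival = itertools.count()  # tie-breaker: arrival order
        self._heap: List[Tuple[int, int, FlowFile]] = []

    def put(self, flowfile: FlowFile) -> None:
        entry = (self._prioritizer(flowfile), next(self._arrival), flowfile)
        heapq.heappush(self._heap, entry)

    def get(self) -> FlowFile:
        return heapq.heappop(self._heap)[2]

    def set_prioritizer(self, prioritizer: Callable[[FlowFile], int]) -> None:
        """Re-rank everything already queued under a new rule."""
        self._prioritizer = prioritizer
        self._heap = [(prioritizer(ff), n, ff) for _, n, ff in self._heap]
        heapq.heapify(self._heap)


connection = Connection(prioritizer=lambda ff: len(ff.content))  # smallest first
connection.put(add_size_attribute(FlowFile(b"a much larger payload", {"filename": "big"})))
connection.put(add_size_attribute(FlowFile(b"tiny", {"filename": "small"})))
connection.set_prioritizer(lambda ff: -len(ff.content))  # switch to largest first
print(connection.get().attributes["filename"])
```

Swapping the prioritizer while items are queued is the sketch's stand-in for "queues that can be dynamically prioritized".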

Page 8: Hortonworks Data in Motion Webinar Series - Part 1

Analogy: FlowFiles are like HTTP Data

HTTP Data

Header
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
Content-Type: text/html

Content
Hello world XXXXXXXXXXXXXXXXXXXXXXXXXXXX

FlowFile

Header
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'

Content
0101010101110101010101010101 (Binary)
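
To make the analogy concrete, the sketch below splits a raw HTTP response into a header dictionary and a body, which is the same shape as a FlowFile's attributes and content. The function name `split_http_response` and the `status-line` key are assumptions made for this illustration; the sample response reuses the headers shown on the slide.

```python
from typing import Dict, Tuple


def split_http_response(raw: bytes) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body); headers play the role of FlowFile attributes."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {"status-line": lines[0]}  # hypothetical key for the first line
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers, body


raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Sun, 10 Oct 2010 23:26:07 GMT\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"Hello world"
)
attributes, content = split_http_response(raw_response)
print(attributes)
print(content)
```

As with a FlowFile, routing decisions can be made from the small key/value part without reading the (possibly large, possibly binary) content.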

Page 9: Hortonworks Data in Motion Webinar Series - Part 1

Create, Run, View, Start, Stop, Change, and Fix Dataflows in Real-Time

1. Drag and drop processors to build a flow
2. Start, stop, and configure components in real time
3. View errors and corresponding error messages
4. View statistics and health of the data flow
5. Create templates of common processors & connections

Page 10: Hortonworks Data in Motion Webinar Series - Part 1

Apache NiFi Demo: Tail Logs, Route on Content, Buffer in Kafka, Deliver to HDFS
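
The "route on content" step of the demo can be approximated outside NiFi with a small Python function. This is a stand-in for the idea only, not a NiFi processor configuration: the function name `route_on_content`, the route names, and the sample log lines are all hypothetical.

```python
import re
from typing import Dict, Iterable, List, Pattern


def route_on_content(lines: Iterable[str],
                     routes: Dict[str, Pattern[str]]) -> Dict[str, List[str]]:
    """Send each line to the first route whose pattern matches its content."""
    routed: Dict[str, List[str]] = {name: [] for name in routes}
    routed["unmatched"] = []
    for line in lines:
        for name, pattern in routes.items():
            if pattern.search(line):
                routed[name].append(line)
                break
        else:  # no pattern matched this line
            routed["unmatched"].append(line)
    return routed


log_lines = [
    "2016-06-17 17:15:04 ERROR disk full",
    "2016-06-17 17:15:05 INFO heartbeat ok",
    "2016-06-17 17:15:06 WARN slow response",
]
routes = {"errors": re.compile(r"\bERROR\b"), "warnings": re.compile(r"\bWARN\b")}
for route, matched in route_on_content(log_lines, routes).items():
    print(route, matched)
```

In the demo each route would continue to its own downstream step (buffering in Kafka, delivery to HDFS); here the routes are just lists.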

Page 11: Hortonworks Data in Motion Webinar Series - Part 1

What is Data Provenance and Why is it Important?

[Diagram labels: BEGIN, LINEAGE, END]

IT and Cloud Operators
• Understand traceability, lineage
• Enable recovery and replay

Compliance Regulations
• Provide an audit trail
• Remediation capabilities
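
One way to picture lineage is as an append-only event log that can be walked from any FlowFile back to where its data first entered the flow. The sketch below is a simplified model under that assumption, not NiFi's provenance repository: `ProvenanceEvent`, `ProvenanceLog`, the event-type labels, and the component names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    flowfile_id: str
    event_type: str                  # illustrative labels: RECEIVE, FORK, SEND
    component: str                   # which step produced the event
    parent_id: Optional[str] = None  # set when a FlowFile is derived from another


class ProvenanceLog:
    """Append-only audit trail of everything that happened to each FlowFile."""

    def __init__(self) -> None:
        self._events: List[ProvenanceEvent] = []

    def record(self, event: ProvenanceEvent) -> None:
        self._events.append(event)

    def lineage(self, flowfile_id: str) -> List[ProvenanceEvent]:
        """Events from BEGIN to END for a FlowFile, following parent links."""
        chain: List[ProvenanceEvent] = []
        seen = set()
        current: Optional[str] = flowfile_id
        while current is not None and current not in seen:
            seen.add(current)
            events = [e for e in self._events if e.flowfile_id == current]
            chain = events + chain  # ancestors go in front
            parents = [e.parent_id for e in events if e.parent_id is not None]
            current = parents[0] if parents else None
        return chain


log = ProvenanceLog()
log.record(ProvenanceEvent("a", "RECEIVE", "tail-logs"))
log.record(ProvenanceEvent("b", "FORK", "split-lines", parent_id="a"))
log.record(ProvenanceEvent("b", "SEND", "deliver"))
for event in log.lineage("b"):
    print(event.event_type, event.component)
```

Because the log is never rewritten, the same records serve both purposes on the slide: operators can trace a delivered item back to its source, and auditors get a complete trail.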

Page 12: Hortonworks Data in Motion Webinar Series - Part 1

Provenance Enables Easy Access and Traceability of Changes

Page 13: Hortonworks Data in Motion Webinar Series - Part 1

Need Fine-Grained Security and Compliance?

Security
• Secured authentication
• Enterprise authorization services – entitlements change often
• Encrypted content, encrypted communications
• People and systems with different roles require different access levels
• Tagged/classified data

Page 14: Hortonworks Data in Motion Webinar Series - Part 1

Repositories – Pass by Reference

Page 15: Hortonworks Data in Motion Webinar Series - Part 1

Repositories – Copy on Write
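
The two repository slides carry only their titles here, so the following is a generic sketch of what pass by reference and copy on write usually mean, under the assumption that content is stored once and FlowFile records hold a pointer to it. `ContentRepository`, `FlowFileRecord`, and `transform` are hypothetical names, not NiFi classes.

```python
import itertools
from dataclasses import dataclass, replace
from typing import Callable, Dict


class ContentRepository:
    """Stores content once; callers hold only a small claim id."""

    def __init__(self) -> None:
        self._claims: Dict[int, bytes] = {}
        self._ids = itertools.count(1)

    def write(self, data: bytes) -> int:
        claim = next(self._ids)
        self._claims[claim] = data
        return claim

    def read(self, claim: int) -> bytes:
        return self._claims[claim]


@dataclass(frozen=True)
class FlowFileRecord:
    claim: int  # reference to content, not the content itself


def transform(repo: ContentRepository, record: FlowFileRecord,
              fn: Callable[[bytes], bytes]) -> FlowFileRecord:
    """Copy on write: changed content goes to a new claim, the old one is untouched."""
    new_claim = repo.write(fn(repo.read(record.claim)))
    return replace(record, claim=new_claim)


repo = ContentRepository()
original = FlowFileRecord(claim=repo.write(b"hello"))
routed = original                                # pass by reference: no bytes copied
shouted = transform(repo, original, bytes.upper)
print(repo.read(routed.claim), repo.read(shouted.claim))
```

Keeping the original claim intact is also what makes replaying an earlier version of the data possible.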

Page 16: Hortonworks Data in Motion Webinar Series - Part 1

Agenda

HDF 2.0 Flow Management

HDF 2.0 Platform Evolution
– Product offering
– Example use case

Page 17: Hortonworks Data in Motion Webinar Series - Part 1

Hortonworks DataFlow Manages Data in Motion

[Diagram labels: Sources, Regional Infrastructure, Core Infrastructure]

• Constrained, High-latency, Localized context
• Hybrid (cloud / on-premises), Low-latency, Global context

Page 18: Hortonworks Data in Motion Webinar Series - Part 1

DataFlow Management and Stream Processing

[Diagram labels: Sources, Regional Infrastructure, Core Infrastructure]

• Constrained, High-latency, Localized context
• Hybrid (cloud / on-premises), Low-latency, Global context

Page 19: Hortonworks Data in Motion Webinar Series - Part 1

Edge Intelligence with Apache MiNiFi

Key Features
• Guaranteed delivery
• Data buffering
– Backpressure
– Pressure release
• Prioritized queuing
• Flow specific QoS
– Latency vs. throughput
– Loss tolerance
• Data provenance
• Recovery / recording a rolling log of fine-grained history
• Designed for extension

Different from Apache NiFi
• Design and Deploy
• Warm re-deploys
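
Backpressure and pressure release, listed under data buffering, can be illustrated with a bounded queue. This is a minimal sketch of the two behaviors, assuming a simple count-based threshold; the class `BoundedConnection` and its method names are hypothetical and do not reflect MiNiFi configuration.

```python
from collections import deque
from typing import Any, Deque, List, Optional


class BoundedConnection:
    """Queue that refuses new items once a threshold is reached."""

    def __init__(self, backpressure_threshold: int) -> None:
        self.threshold = backpressure_threshold
        self._queue: Deque[Any] = deque()

    def offer(self, item: Any) -> bool:
        """Backpressure: return False so the upstream step pauses instead of losing data."""
        if len(self._queue) >= self.threshold:
            return False
        self._queue.append(item)
        return True

    def release_pressure(self, keep: int) -> List[Any]:
        """Pressure release: drop the oldest items until only `keep` remain."""
        dropped = []
        while len(self._queue) > keep:
            dropped.append(self._queue.popleft())
        return dropped

    def poll(self) -> Optional[Any]:
        return self._queue.popleft() if self._queue else None


connection = BoundedConnection(backpressure_threshold=2)
for reading in ["r1", "r2", "r3"]:
    print(reading, "accepted" if connection.offer(reading) else "held back")
print("dropped:", connection.release_pressure(keep=1))
```

Which of the two applies is a per-flow choice: backpressure protects data at the cost of latency, while pressure release accepts some loss to keep fresh data moving, which matches the "loss tolerance" item above.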

Page 20: Hortonworks Data in Motion Webinar Series - Part 1

NiFi vs. MiNiFi Java Agent

MiNiFi
• NiFi Framework
• Components

NiFi
• NiFi Framework
• User Interface
• Components

Page 21: Hortonworks Data in Motion Webinar Series - Part 1

Real-Time Insights Require DataFlow Mgmt and Stream Processing

Example: Company X provides alerting services when a user's resting heart rate is higher than a threshold

[Diagram labels]
• Acquire Data: Company X Cloud Instance 1
• Acquire Data: Company X Cloud Instance 2
• Acquire Data: Company X Cloud Instance 3
• Acquire Data Across Cloud Instances
• Parse, Filter, Validate, Enrich and Route
• Core Data Center
• Analytics/Pattern Match
• Data Store
• Alerts
• Dashboards/Visualization

Legend: Flow Management, Stream Processing

Page 22: Hortonworks Data in Motion Webinar Series - Part 1

Data in Motion Needs Dataflow Management and Stream Processing

Flow Management (NiFi, MiNiFi)
• Acquire data from the cloud instances of various wearable devices.
• Move data from customer cloud instances to the on-premises instance.
• Perform intelligent routing and filtering of data. The routing and filtering rules will often be changed at run time.
• Deliver the data to various downstream systems. New downstream apps will keep appearing, and the data should be fed to each one when it comes online.
• Parse the device data into a standardized format that downstream systems can understand.
• Enrich the data with contextual information, including patient/customer info (age, sex, etc.).

Stream Processing (Storm, Kafka)
• Recognize the pattern when the resting heart rate exceeds a certain threshold (the insight), and then create an alert/notification.
• Run an outlier detection model on the streaming heart rate data as it comes in. If the score is above a certain threshold, alert on the heart rate.
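
The two alerting rules in this use case, a fixed threshold and an outlier score, can be sketched in plain Python. The slides do not say which outlier model Company X would use, so the rolling z-score below is an assumption chosen for brevity, and `threshold_alerts`, `RollingOutlierScorer`, and the sample readings are hypothetical. A production version would run inside the stream processor rather than over a list.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, List


def threshold_alerts(readings: Iterable[float], threshold_bpm: float) -> List[float]:
    """Pattern rule: keep every resting heart rate above the threshold."""
    return [bpm for bpm in readings if bpm > threshold_bpm]


class RollingOutlierScorer:
    """Scores each reading by its distance from the recent window (z-score)."""

    def __init__(self, window: int) -> None:
        self._window: Deque[float] = deque(maxlen=window)

    def score(self, value: float) -> float:
        history = list(self._window)
        self._window.append(value)
        if len(history) < 2:
            return 0.0  # not enough history to judge
        spread = pstdev(history)
        if spread == 0:
            return 0.0  # flat history: this sketch declines to score
        return abs(value - mean(history)) / spread


readings = [60, 62, 61, 63, 120]
scorer = RollingOutlierScorer(window=5)
for bpm in readings:
    if scorer.score(bpm) > 3.0:  # score threshold is an arbitrary example value
        print("outlier alert:", bpm)
print("threshold alerts:", threshold_alerts(readings, threshold_bpm=100))
```

The threshold rule needs no state, while the outlier score depends on recent history per user, which is the kind of windowed state that belongs on the stream processing side of the split above.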

Page 23: Hortonworks Data in Motion Webinar Series - Part 1

Use Cases for Data in Motion

When Only DataFlow Management Is Required

Use Cases for Data-in-Motion Using DataFlow Mgmt
• Data Ingestion
• Edge Intelligence
• First Mile Problem
• Physical Data Movement
• Simple event processing such as Route, Filter, Enrich, Transform, etc.

When Both DataFlow Management and Stream Processing Are Required

Use Cases for Data-in-Motion Using DataFlow Mgmt and Stream Processing
• Flow Management to deliver data for Stream Processing
• PLUS: Complex pattern matching on unbounded streams of data.

Page 24: Hortonworks Data in Motion Webinar Series - Part 1

HDF 2.0: Data-in-Motion Platform

[Diagram: DATA IN MOTION to DATA AT REST]
• IoT Data Sources, with MiNiFi agents at the sources
• Flow management: NiFi
• Flow management + Stream Processing: NiFi, Kafka, Storm
• AWS, Azure, Google Cloud, Hadoop, Others…
• Enterprise Services: Ambari, Ranger, Other services

Page 25: Hortonworks Data in Motion Webinar Series - Part 1

New Stream Processing Features HDF 2.0

Feature areas: Developer Productivity, Enterprise Readiness, Operational Simplicity

• New Storm Connectors
• Storm-Kafka Spout using new client APIs
• Storm Distributed Log Search
• Storm Dynamic Worker Profiling
• Kafka Grafana Integration
• Storm Grafana Integration

• Improved Nimbus HA
• Storm Automatic Back Pressure
• Storm Distributed cache
• Storm Windowing and State Management
• Storm Performance improvements
• Improved Kafka SASL

• Storm Topology Event inspector
• Storm Resource Aware Scheduling
• Storm Dynamic Log Levels
• Pacemaker Storm Daemon
• Kafka Rack Awareness

Page 26: Hortonworks Data in Motion Webinar Series - Part 1

For More Info: https://community.hortonworks.com/

Hortonworks Community Connection: Data Ingestion and Streaming

