Date post: | 18-Feb-2017 |
Category: | Technology |
Upload: | hortonworks |
Introducing Hortonworks DataFlow
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Simplistic View of Enterprise Data Flow
The Data Flow Thing
Process and Analyze Data
Acquire Data
Store Data
Realistic View of Enterprise Data Flow
Enterprise DataFlow Challenges
GATHER
DELIVER
PRIORITIZE
Track from the edge through the datacenter
• Variability in Data Protocols, Formats and Schemas
• Data Size and Speed
• Security at Data Plane
• Traceability (Data Lineage)
• Prioritization of Resources
• Multi-Directional Flow
• Recoverability and Replay
• Transparency of DataFlow
• Scaling Down
• Enrichment/Transformation
• Unreliable Comms
Typical Answer to Challenges
Add Systems…
• Add new systems to handle the protocol differences
• Add new systems to convert the data
• Add new systems to reorder the data
• Add new systems to filter the unauthorized data
• Add new systems to slow down or speed up data
• Add new topics to represent ‘stages of the flow’
And Complexity…
Hortonworks DataFlow
Visual User Interface: HTML5, drag and drop, for agile execution
Provenance Metadata: for governance and compliance
Secure End-to-End Data Routing: with encryption and compression
Powered by Apache NiFi
Manage Flow of Data in Real Time
Operators
• Transparency
• Immediate feedback
• Agility
Data Scientists
• Flexibility
• Autonomy
Track Flow of Data from Beginning to End
IT and Cloud Operators
• Understand Traceability, Lineage
• Enable Recovery and Replay
Compliance Regulations
• Provide an Audit Trail
• Remediation Capabilities
[Diagram: LINEAGE traced from BEGIN to END]
Secure Data at the Edge
Beyond Simple Encryption
• Enterprise authorization services: entitlements can change often
• People and systems with different roles require different access levels
Understanding and Classifying Data
• Tagged/classified data traced
• Understand who/what/when/where data is leveraged
Common Apache NiFi Use Cases
Compliance: Gain full transparency into provenance and flow of data
Digital Security: Acquire and prioritize data into data lake for analysis
IoT Optimization: Secure, Prioritize, Enrich and Trace data at the edge
Fraud Detection: Move sales transaction data in real time to analyze on demand
Big Data Ingest: Easily and efficiently ingest data into Hadoop
Value Resources: Gain visibility into how data sources are used to determine value
Architecture
Single NiFi node:
• OS/Host
• JVM
• Web Server
• Flow Controller, with Processor 1 … Extension N
• FlowFile Repository, Content Repository, Provenance Repository
• Local Storage
NiFi cluster:
• Master, NiFi Cluster Manager (NCM): OS/Host, JVM, Web Server, NiFi Cluster Manager (Request Replicator)
• Slaves, NiFi Nodes: each node has the single-node components listed above
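The single-node layout can be pictured as a small program: a flow controller passes each unit of data through a chain of processors, while three separate stores hold its attributes, its content, and its history. The sketch below is a toy illustration in Python, not NiFi code. `FlowController`, the `Processor` function type, the dictionary-based repositories, and the event names `RECEIVE` and `MODIFY` are all hypothetical names chosen for illustration, and everything is kept in memory rather than on local storage.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A processor is modeled as a function: (attributes, content) -> (attributes, content).
Processor = Callable[[Dict[str, str], bytes], Tuple[Dict[str, str], bytes]]


@dataclass
class FlowController:
    """Toy stand-in for the Flow Controller box in the diagram (hypothetical)."""
    processors: List[Tuple[str, Processor]]
    flowfile_repo: Dict[str, Dict[str, str]] = field(default_factory=dict)      # attributes per item
    content_repo: Dict[str, bytes] = field(default_factory=dict)                # payload per item
    provenance_repo: List[Tuple[str, str, str]] = field(default_factory=list)   # (id, event, component)

    def ingest(self, content: bytes) -> str:
        """Store a new item, run it through every processor, and log each step."""
        flow_id = str(uuid.uuid4())
        self.flowfile_repo[flow_id] = {}
        self.content_repo[flow_id] = content
        self.provenance_repo.append((flow_id, "RECEIVE", "ingest"))
        for name, process in self.processors:
            attrs, data = process(self.flowfile_repo[flow_id], self.content_repo[flow_id])
            self.flowfile_repo[flow_id] = attrs
            self.content_repo[flow_id] = data
            self.provenance_repo.append((flow_id, "MODIFY", name))
        return flow_id


def tag_size(attrs: Dict[str, str], data: bytes) -> Tuple[Dict[str, str], bytes]:
    # Adds an attribute, leaves the content untouched.
    return {**attrs, "size": str(len(data))}, data


def uppercase(attrs: Dict[str, str], data: bytes) -> Tuple[Dict[str, str], bytes]:
    # Rewrites the content, leaves the attributes untouched.
    return attrs, data.upper()


controller = FlowController(processors=[("tag_size", tag_size), ("uppercase", uppercase)])
fid = controller.ingest(b"sensor reading")
print(controller.flowfile_repo[fid], controller.content_repo[fid])
print(controller.provenance_repo)
```

Keeping the history in its own append-only store is what lets the later slides talk about lineage and replay independently of the data itself.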
Security
Administration: Central management and consistent security
• NiFi Cluster Manager
Authentication: Authenticate users and systems
• 2-Way SSL support out of the box; additional types coming
Authorization: Provision access to data
• Pluggable authorization designed to fit any Identity and Access Management (IAM) scheme
• File-based authority provider out of the box
• Multi-role
Audit: Maintain a record of data access
• Detailed logging of all user actions
• Detailed logging of key system behaviors
• Data Provenance enables unparalleled tracking from the edge through the Lake
Data Protection: Protect data at rest and in motion
• Support a variety of SSL/encrypted protocols
• Tag and utilize tags on data for fine-grained access controls
• Encrypt/decrypt content using pre-shared key mechanisms
Administrator: Configure system threads, user accounts, and flow audit history
Data Flow Manager: Manipulate the dataflow
Read Only: View the dataflow only
NiFi: Configure system threads, user accounts, and flow audit history
Proxy: Manipulate the dataflow
Provenance: Query the provenance repository and download content
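The role table above maps naturally onto a lookup from role to permitted actions. What follows is a minimal sketch assuming a simple in-memory mapping. The action names (`configure_system`, `modify_flow`, `view_flow`, `query_provenance`) and the `is_allowed` helper are hypothetical and do not represent NiFi's authority-provider API. Only four roles are modeled (Administrator, Data Flow Manager, Read Only, Provenance), and each is assumed to grant exactly the action its description names.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical role-to-action mapping derived from the slide's role descriptions.
ROLE_ACTIONS: Dict[str, FrozenSet[str]] = {
    "Administrator": frozenset({"configure_system"}),
    "Data Flow Manager": frozenset({"modify_flow"}),
    "Read Only": frozenset({"view_flow"}),
    "Provenance": frozenset({"query_provenance"}),
}


def is_allowed(user_roles: Iterable[str], action: str) -> bool:
    """Multi-role check: permitted if any of the user's roles grants the action."""
    return any(action in ROLE_ACTIONS.get(role, frozenset()) for role in user_roles)


analyst = ["Read Only", "Provenance"]
print(is_allowed(analyst, "query_provenance"))  # True
print(is_allowed(analyst, "modify_flow"))       # False
```

Unknown roles resolve to an empty set, so an unrecognized role name denies by default rather than raising an error.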
Apache NiFi User Quotes
“The NiFi user interface and ease of extension have made it extremely easy to get up and running and even customize. It is great that it also easily integrates with other parts of the Apache Big Data world like Spark, Kafka and Hadoop.”
Craig Connell, Leverege, Chief Technology Officer
“NiFi's well designed, mature API has made our integration process remarkably straightforward. With it, we're able to track the origin, transformation, and persistence of data throughout our analytic processes.”
Mike Bishop, Prescient Edge, Chief Systems Architect
“NiFi addresses dataflow challenges we have right now and provides upside for where we're heading. That it is designed for the global enterprise, is also a big win for us.”
Alexandar Ryabov, Wargaming.net, Senior Director of Data Engineering
Thank You
Hortonworks DataFlow Use Cases
Administer Flows, Enhance Security and Manage Equipment
Data Flow Management
Data Ingestion
Data as a Service Provenance
Data Regulatory Compliance
DATA FLOW MANAGEMENT
DATA FLOW MANAGEMENT
• DATA INGESTION
Data Ingestion, with bi-directional intelligence and provenance metadata
PROBLEM
• Most ingest tools are unidirectional: data streams in the same way no matter what
• They don’t preserve detail on in-flow data transformations
SOLUTION
• HDF manages bi-directional, point-to-point data flows that are easily configured
• Data reaches its destination with its provenance data intact
IMPACT
• Users can update data flow logic to always receive the data they need
• Provenance data improves confidence in your insights
“The NiFi user interface and ease of extension have made it extremely easy to get up and running and even customize.”
Craig Connell, CTO, Leverege
DATA FLOW MANAGEMENT
• DATA AS A SERVICE PROVENANCE
Providers of data as a service assign value to data using NiFi’s provenance metadata
PROBLEM
• A new genre of companies provides data as a service
• They have limited ability to prioritize which data is most valuable
SOLUTION
• NiFi’s data provenance capabilities help DaaS companies understand in much more detail how their data is consumed
IMPACT
• They can understand which information resources are valuable and which are not
• This helps them invest in capturing the most valuable data sources
DATA FLOW MANAGEMENT
• DATA REGULATORY COMPLIANCE
Firms Comply with Financial Regulations by Showing Complete Chain of Custody
PROBLEM
• Financial firms such as retail banks, capital markets firms and insurance companies are required to show chain of custody for certain transactions
SOLUTION
• Apache NiFi’s data provenance capabilities show a complete chain of custody, for compliance with rules such as Basel capital requirements
IMPACT
• Firms can go back to a point in time and show regulators exactly what happened to a key piece of data in a transaction
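Going back to a point in time amounts to filtering a record's provenance events by timestamp and reading them in order. The sketch below assumes a flat list of events. `ProvenanceEvent`, `chain_of_custody`, the action labels, and the sample transactions are all hypothetical and made up for illustration; they are not NiFi's provenance API or real data.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class ProvenanceEvent:
    record_id: str       # which piece of data the event is about
    timestamp: datetime  # when the event happened
    action: str          # illustrative label, e.g. "RECEIVED"
    component: str       # which step of the flow handled the data


def chain_of_custody(events: Iterable[ProvenanceEvent], record_id: str,
                     as_of: datetime) -> List[ProvenanceEvent]:
    """Return the record's events up to and including `as_of`, oldest first."""
    relevant = [e for e in events if e.record_id == record_id and e.timestamp <= as_of]
    return sorted(relevant, key=lambda e: e.timestamp)


# Made-up sample events, deliberately out of order.
events = [
    ProvenanceEvent("txn-1", datetime(2015, 3, 2, 9, 5), "ROUTED", "risk-check"),
    ProvenanceEvent("txn-1", datetime(2015, 3, 2, 9, 0), "RECEIVED", "branch-feed"),
    ProvenanceEvent("txn-2", datetime(2015, 3, 2, 9, 1), "RECEIVED", "branch-feed"),
    ProvenanceEvent("txn-1", datetime(2015, 3, 2, 9, 30), "SENT", "ledger"),
]

trail = chain_of_custody(events, "txn-1", as_of=datetime(2015, 3, 2, 9, 10))
for event in trail:
    print(event.timestamp.isoformat(), event.action, event.component)
```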
Enhance Security
Asset and People Security
Secure Data Ingestion
Fraud and Theft Protection
ENHANCE SECURITY
ENHANCE SECURITY
• ASSET AND PEOPLE SECURITY
Prescient Edge Helps Its Customers Protect the Physical Safety of Their Personnel
PROBLEM
• Globally distributed firms and government agencies have personnel in risky areas
• Prescient Edge provides analytics to protect employees
SOLUTION
• The company uses Apache NiFi to feed real-time, unstructured data from dozens of sources to Prescient Edge analytics systems, to determine emergent threats
IMPACT
• Prescient Edge is able to provide its clients with detailed, up-to-the-minute threat and risk information, allowing them to respond quickly to safeguard their teams and assets
“With [Apache NiFi], we're able to track the origin, transformation, and persistence of data throughout our analytic processes.”
Mike Bishop, Chief Systems Architect, Prescient Edge
ENHANCE SECURITY
• SECURE DATA INGESTION
A major US financial firm uses HDF to prioritize data ingest and speed time to protection
PROBLEM
• Digital security depends on the ability to detect threats quickly
• Protection algorithms evaluate metadata with equal priority, slowing time to protection
SOLUTION
• Apache NiFi helps to more effectively acquire, evaluate and prioritize security logs upstream, before they reach the analytics engine
IMPACT
• By prioritizing which data to send to its analytics engine, the company sees faster time to protection for its cyber assets
ENHANCE SECURITY
• FRAUD AND THEFT PROTECTION
A huge US retailer uses Apache NiFi to reduce theft and shrinkage by hundreds of millions annually
PROBLEM
• Thieves shoplift merchandise in the morning and then return the stolen goods later the same day for credit to their card
SOLUTION
• Apache NiFi pushes a real-time stream of inventory and transactional data into Hadoop more quickly, reducing the time to detect this fraudulent pattern
IMPACT
• The company expects to reduce shrinkage by hundreds of millions of dollars annually
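The slide does not say how the fraudulent pattern is detected once the data reaches Hadoop. As one hedged illustration of the kind of check a faster feed makes possible, the sketch below flags a return that is not backed by an earlier sale of the same item in the same day's stream. The `Transaction` shape, the matching rule, and the sample data are assumptions for illustration only; a real system would also have to account for purchases made on earlier days.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Transaction:
    minute_of_day: int  # time of the event, in minutes since midnight
    kind: str           # "sale" or "return"
    sku: str            # item identifier


def suspicious_returns(transactions: Iterable[Transaction]) -> List[Transaction]:
    """Flag returns with no earlier, not-yet-returned sale of the same SKU."""
    sold: Counter = Counter()
    flagged: List[Transaction] = []
    for t in sorted(transactions, key=lambda t: t.minute_of_day):
        if t.kind == "sale":
            sold[t.sku] += 1
        elif t.kind == "return":
            if sold[t.sku] > 0:
                sold[t.sku] -= 1      # matched against an earlier sale
            else:
                flagged.append(t)     # nothing was sold that could be returned
    return flagged


# Made-up sample day: the drill is returned without ever having been sold.
day = [
    Transaction(600, "sale", "tv"),
    Transaction(700, "return", "tv"),
    Transaction(900, "return", "drill"),
]
print(suspicious_returns(day))
```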
Manage Equipment
Equipment Repair
Remote Security Protection
MANAGE EQUIPMENT
MANAGE EQUIPMENT
• EQUIPMENT REPAIR
Global oil company uses Apache NiFi to prioritize which sensor data to send ashore from offshore rigs
PROBLEM
• Offshore oil rigs have physical constraints on their hardware footprints and associated bandwidth
• Far more sensor data is generated than can be transmitted to shore
SOLUTION
• Apache NiFi uses rules-based prioritization to determine which sensor data is most important and thus needs to be transmitted back first, for immediate analysis
IMPACT
• Ability to distinguish important readings from standard readings helps the company isolate important signals and take action to improve efficiency and safety
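Rules-based prioritization under a bandwidth limit can be sketched as a priority queue: each reading gets a rank from a set of rules, and only the highest-ranked readings that fit the budget are sent first. The rules, thresholds, sensor names, and the `select_for_uplink` helper below are invented for illustration and are not taken from the deployment described or from NiFi's prioritizer API.

```python
import heapq
from typing import Dict, List


def priority(reading: Dict) -> int:
    """Lower number means sent sooner. The rules here are made-up examples."""
    if reading["sensor"] == "pressure" and reading["value"] > 900:
        return 0  # possible safety issue: send first
    if reading["sensor"] == "vibration" and reading["value"] > 5:
        return 1  # possible equipment wear: send next
    return 2      # routine reading: send if bandwidth remains


def select_for_uplink(readings: List[Dict], budget: int) -> List[Dict]:
    """Pick at most `budget` readings, most important first, ties in arrival order."""
    # The index breaks ties so that the dictionaries themselves are never compared.
    heap = [(priority(r), i, r) for i, r in enumerate(readings)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(min(budget, len(heap)))]


readings = [
    {"sensor": "temperature", "value": 20},
    {"sensor": "pressure", "value": 950},
    {"sensor": "vibration", "value": 7},
    {"sensor": "pressure", "value": 500},
]
print(select_for_uplink(readings, budget=2))
```

Readings left in the queue are not lost in this model; they simply wait for the next transmission window.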
MANAGE EQUIPMENT
• REMOTE SECURITY PROTECTION
Firm with a high security profile enriches on-site video data to detect intrusions
PROBLEM
• Digital security cameras present a “needle in a haystack” problem
• Individuals monitoring video feeds can be lulled by 100s of hours where nothing happens
SOLUTION
• Hortonworks DataFlow can identify a “trigger moment”, such as when a human face appears in a video, enrich that “trigger moment” with additional data and prioritize sending it back for immediate analysis
IMPACT
• Analytics systems and analysts are able to more quickly sift through the “noise” to identify known human threats in a particular area