
Hortonworks HDP

Date post: 10-Feb-2016
Upload: mliss
Description:
HDInsight vs. Hortonworks HDP. HDInsight: seamlessly scale in cloud; backed by Azure Storage Vault (ASV)/Azure Blob Storage; lack of community support; untested to scale of traditional Hadoop setup in production setting. Hortonworks HDP: on-premise or VM; based on HDFS. - PowerPoint PPT Presentation
Transcript
Page 1: Hortonworks  HDP
Page 2: Hortonworks  HDP
Page 3: Hortonworks  HDP
Page 4: Hortonworks  HDP
Page 5: Hortonworks  HDP
Page 6: Hortonworks  HDP

Hortonworks HDP

On-premise or VM

Based on HDFS

HDInsight

Seamlessly scale in cloud

Backed by Azure Storage Vault (ASV)/Azure Blob Storage

Page 7: Hortonworks  HDP

HDInsight

Lack of community support

Untested at the scale of a traditional Hadoop setup in a production setting

Lack of a clear migration path to an alternative Hadoop setup

Reliance on MS to bake in required Hadoop tools

Hortonworks HDP

Huge community support

Can be set up on a multitude of Linux and Windows VMs

Migration to alternate platforms is a known quantity

Support for new tools such as MRv2 or YARN quickly available

Page 8: Hortonworks  HDP

[Diagram: Azure SQL, Cassandra, Sqoop, MapReduce, Hadoop/HDFS, Hive, Data Warehouse and Reporting Tools, with ODBC connections]

Page 9: Hortonworks  HDP

Problem… Hadoop is great for batch processing of millions of records. What about real-time processing?

Page 10: Hortonworks  HDP

Azure Queue

Data Warehouse

Trustev API

Azure Worker Roles

Page 11: Hortonworks  HDP

Message routing…

Can be complex, brittle and hard to scale

Page 12: Hortonworks  HDP

[Diagram: three Azure Queues]

Page 13: Hortonworks  HDP

Message routing…

Routing must be re-configured when scaling out
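
To see why scaling out forces a routing change, here is a minimal sketch. The slides do not say how messages are routed between queues; this example assumes a simple scheme where a message key is hashed modulo the number of queues. The `route` and `moved_keys` functions and the key names are hypothetical.

```python
import zlib


def route(key: str, queue_count: int) -> int:
    """Pick a queue index for a message key (hypothetical modulo routing)."""
    # crc32 is used instead of hash() so results are stable across runs.
    return zlib.crc32(key.encode("utf-8")) % queue_count


def moved_keys(keys, old_count: int, new_count: int):
    """Return the keys whose target queue changes when the queue count changes."""
    return [k for k in keys if route(k, old_count) != route(k, new_count)]


keys = [f"transaction-{i}" for i in range(1000)]
moved = moved_keys(keys, old_count=3, new_count=4)
print(f"{len(moved)} of {len(keys)} keys change queue when going from 3 to 4 queues")
```

Under this assumed scheme, adding a fourth queue changes the destination of many existing keys, so every producer and consumer has to pick up the new routing at the same time.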

Page 14: Hortonworks  HDP

[Diagram: four Azure Queues]

Page 15: Hortonworks  HDP

And… the definitions of fraud detection algorithms, weightings and rules get trapped in a release cycle. Fraud moves too fast!

Page 16: Hortonworks  HDP

Enter… Apache Storm. Doing for real-time data what Hadoop did for batch processing.

Page 17: Hortonworks  HDP

[Diagram: Azure Queue, Storm Cluster, Data Warehouse, Trustev API, Shared Algorithms, ML Generated Algorithms]

Page 18: Hortonworks  HDP

Tuples

Ordered list of elements. Named list of values of any type.

Streams

Unbounded sequence of tuples. Can come from multiple sources, like the Twitter API or bolts.

Spout

Source of a stream. Can talk with queues, logs, API calls, event data.

Bolts

Process tuples, create new streams. Apply functions, transforms, filter, aggregate, join and access DBs and APIs etc.
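
The four concepts above can be illustrated with a small in-process sketch. This is not the Storm API: it models a stream as a Python generator, a tuple as a dict of named values, and spouts and bolts as plain functions. All function names, field names and the threshold are hypothetical, and the input is a finite list where a real stream would be unbounded.

```python
from typing import Dict, Iterable, Iterator

# A tuple is modelled as a dict: a named list of values of any type.
StormTuple = Dict[str, object]


def transaction_spout(events: Iterable[dict]) -> Iterator[StormTuple]:
    """Spout: turns a source (here an in-memory list) into a stream of tuples."""
    for event in events:
        yield {"merchant": event["merchant"], "amount": event["amount"]}


def high_value_filter(stream: Iterable[StormTuple], threshold: float) -> Iterator[StormTuple]:
    """Bolt: filters the stream, keeping tuples at or above the threshold."""
    for tup in stream:
        if tup["amount"] >= threshold:
            yield tup


def flag_bolt(stream: Iterable[StormTuple]) -> Iterator[StormTuple]:
    """Bolt: transforms each tuple, emitting a new stream with an added field."""
    for tup in stream:
        yield {**tup, "flagged": True}


events = [
    {"merchant": "A", "amount": 20.0},
    {"merchant": "A", "amount": 900.0},
]
result = list(flag_bolt(high_value_filter(transaction_spout(events), 500.0)))
print(result)
```

Each bolt consumes one stream and produces another, which is what lets bolts be chained or fed by several sources.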

Page 19: Hortonworks  HDP

Storm topologies

A topology is a directed graph of spouts and bolts. Using the correct tools, topologies can be created by fraud analysts and conversion analysts and, most importantly, created and published automatically using machine learning.
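
One way to picture a topology that analysts or an ML system could publish is to describe the graph as plain data and keep the runner generic. The sketch below is a toy runner, not Storm itself; `run_topology`, the node names and the fraud rules are all hypothetical, and it assumes the graph has no cycles.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Bolt = Callable[[dict], List[dict]]  # a bolt maps one tuple to zero or more tuples


def run_topology(spout: Iterable[dict],
                 bolts: Dict[str, Bolt],
                 edges: Dict[str, List[str]]) -> Dict[str, List[dict]]:
    """Push every tuple from the spout through the graph described by `edges`.

    `edges` maps a node name to the names of the bolts that consume its output;
    the special name "spout" is the entry point. Returns what each bolt emitted.
    """
    emitted: Dict[str, List[dict]] = defaultdict(list)

    def deliver(node: str, tup: dict) -> None:
        for target in edges.get(node, []):
            for out in bolts[target](tup):
                emitted[target].append(out)
                deliver(target, out)

    for tup in spout:
        deliver("spout", tup)
    return dict(emitted)


# Hypothetical rules. Because the topology is plain data, changing a rule
# means changing these dictionaries rather than the runner.
bolts = {
    "large_amount": lambda t: [t] if t["amount"] > 1000 else [],
    "foreign_card": lambda t: [t] if t["card_country"] != t["ip_country"] else [],
    "alert": lambda t: [{"order": t["order"], "action": "review"}],
}
edges = {
    "spout": ["large_amount"],
    "large_amount": ["foreign_card"],
    "foreign_card": ["alert"],
}
orders = [
    {"order": 1, "amount": 50, "card_country": "IE", "ip_country": "IE"},
    {"order": 2, "amount": 5000, "card_country": "IE", "ip_country": "US"},
    {"order": 3, "amount": 2000, "card_country": "IE", "ip_country": "IE"},
]
result = run_topology(orders, bolts, edges)
print(result["alert"])
```

Keeping the graph separate from the runner is the design choice that matters here: a new rule set can be published without touching or redeploying the code that executes it.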

Page 20: Hortonworks  HDP

Data Warehouse

Merchant A has a fraud problem that needs solving quickly. Merchant A can use our Shared Algorithm topology to immediately block common fraud problems.

Page 21: Hortonworks  HDP

Data Warehouse

Merchant A has been on our system for an extended period of time, and our system knows better what their fraud problem actually looks like. Our ML systems create a new topology to better deal with Merchant A’s fraud problem.

Page 22: Hortonworks  HDP

Hadoop + Storm

Hadoop: Batch processing system that can churn through huge volumes of data

Storm: Real-time complex event processing system that can process data streams

Page 23: Hortonworks  HDP

Speed Layer

Only New Data

Compensates for the high latency of ‘Serving Layer’ updates

‘Batch Layer’ overrides ‘Speed Layer’

Serving Layer

Loads and exposes the batch views for querying

Random access to batch views

Batch Layer

Immutable, constantly growing datasets

Batch views computed from this raw dataset

This gives us our Lambda Architecture.

Real Time Big Data = Storm Process + Hadoop Process

Use the historical data produced by Hadoop to make your real-time results faster and more accurate
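
The interplay of the three layers can be sketched with a toy counter. This is a simplified illustration, not the presenters' implementation: the event shape, the `cutoff` parameter and the function names are hypothetical, and the speed view is recomputed from the list here for brevity, where a real speed layer would update incrementally as events arrive.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[int, str]  # (timestamp, merchant); both fields are hypothetical


def batch_view(master: List[Event], cutoff: int) -> Counter:
    """Batch layer: recompute counts from the raw dataset up to `cutoff`."""
    return Counter(m for ts, m in master if ts <= cutoff)


def speed_view(master: List[Event], cutoff: int) -> Counter:
    """Speed layer: counts only the new data that the batch view has not seen."""
    return Counter(m for ts, m in master if ts > cutoff)


def query(merchant: str, batch: Counter, speed: Counter) -> int:
    """Serving-side query: batch result plus the real-time increment."""
    return batch[merchant] + speed[merchant]


# Immutable, append-only raw dataset.
master = [(1, "A"), (2, "B"), (3, "A"), (4, "A")]

# The last batch run covered timestamps up to 2; the speed layer fills the gap.
total_before = query("A", batch_view(master, 2), speed_view(master, 2))

# After the next batch run (cutoff 4) the batch view overrides the speed view,
# which no longer holds anything for the covered period.
total_after = query("A", batch_view(master, 4), speed_view(master, 4))
print(total_before, total_after)
```

The query result is the same before and after the batch run; what changes is which layer supplies it.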

Page 24: Hortonworks  HDP

You can build this out in hours!

A simple combination of Azure Queues, SQL Azure, and Azure VMs running Cassandra, Hadoop and Storm

Page 25: Hortonworks  HDP
