Hortonworks HDP

Posted on 10-Feb-2016


Hortonworks HDP: on-premise or VM; based on HDFS

HDInsight: seamlessly scales in the cloud; backed by Azure Storage Vault (ASV)/Azure Blob Storage

HDInsight vs. Hortonworks HDP

HDInsight:

Lack of community support

Untested at the scale of traditional Hadoop setups in production settings

Lack of a clear migration path to alternative Hadoop setups

Reliance on MS to bake in required Hadoop tools

Hortonworks HDP:

Huge community support

Can be set up on a multitude of Linux and Windows VMs

Migration to alternate platforms is a known quantity

Support for new tools such as MRv2/YARN is quickly available

[Diagram: batch architecture with Hadoop/HDFS, a Hive data warehouse, reporting tools, Azure SQL, Cassandra, Sqoop and MapReduce, plus ODBC connections]

Problem… Hadoop is great for batch processing of millions of records. What about real-time processing?

[Diagram: Trustev API, Azure Queue, Azure Worker Roles and Data Warehouse]

Message routing…

Can be complex, brittle and hard to scale

[Diagram: message routing across three Azure Queues]

Message routing…

Routing must be re-configured when scaling out

[Diagram: message routing across four Azure Queues after scaling out]
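The deck doesn't say how messages were routed between queues. As a minimal sketch, assuming a hypothetical static rule that hashes a message key modulo the number of queues, the following shows why adding a queue forces a reconfiguration: most keys land on a different queue. The names `route` and `moved_fraction` and the key format are illustrative only.

```python
import zlib
from typing import List


def route(key: str, num_queues: int) -> int:
    """Hypothetical static routing rule: hash the key and take it modulo the queue count."""
    # zlib.crc32 is used instead of hash() because hash() of str is randomized per process.
    return zlib.crc32(key.encode("utf-8")) % num_queues


def moved_fraction(keys: List[str], old_n: int, new_n: int) -> float:
    """Fraction of keys that are routed to a different queue after resizing."""
    moved = sum(1 for k in keys if route(k, old_n) != route(k, new_n))
    return moved / len(keys)


keys = [f"transaction-{i}" for i in range(10_000)]
print(f"{moved_fraction(keys, 3, 4):.0%} of keys move when scaling from 3 to 4 queues")
```

With a roughly uniform hash, a key keeps its queue only when its hash modulo 3 equals its hash modulo 4, which holds for 3 of every 12 residues, so about three quarters of keys move.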

And… definitions of fraud detection algorithms, weightings and rules get trapped in a release cycle. Fraud moves too fast!
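One way to keep rules and weightings out of the release cycle is to treat them as data loaded at runtime rather than code. The sketch below assumes a hypothetical JSON rule format (`field`, `op`, `value`, `weight`) and a simple additive score; neither comes from the deck.

```python
import json
from typing import Any, Dict, List

# Hypothetical rule definitions. In practice these would be loaded at runtime
# from a database or config store, so analysts can change them without a release.
RULES_JSON = """
[
  {"field": "order_total", "op": "gt", "value": 1000, "weight": 0.4},
  {"field": "country_mismatch", "op": "eq", "value": true, "weight": 0.5},
  {"field": "account_age_days", "op": "lt", "value": 2, "weight": 0.3}
]
"""

# Supported comparison operators, keyed by the name used in the rule definitions.
OPS = {
    "gt": lambda a, b: a > b,
    "lt": lambda a, b: a < b,
    "eq": lambda a, b: a == b,
}


def score(event: Dict[str, Any], rules: List[Dict[str, Any]]) -> float:
    """Sum the weights of every rule the event matches; missing fields never match."""
    total = 0.0
    for rule in rules:
        value = event.get(rule["field"])
        if value is not None and OPS[rule["op"]](value, rule["value"]):
            total += rule["weight"]
    return total


rules = json.loads(RULES_JSON)
event = {"order_total": 1500, "country_mismatch": True, "account_age_days": 30}
print(round(score(event, rules), 2))
```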

Enter… Apache Storm. Doing for real-time data what Hadoop did for batch processing.

[Diagram: Trustev API, Azure Queue, Storm Cluster and Data Warehouse, with Shared Algorithms and ML Generated Algorithms]

Tuples: ordered lists of elements; a named list of values of any type

Streams: unbounded sequences of tuples; can come from multiple sources, such as the Twitter API or other bolts

Spouts: sources of streams; can talk to queues, logs, API calls and event data

Bolts: process tuples and create new streams; apply functions, transforms, filters, aggregations and joins, and access databases and APIs
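To make the four concepts concrete, here is a plain-Python analogy using generators. It does not use Storm's actual (JVM-based) API; the spout and bolt functions and the message fields are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator

# A tuple: a named list of values of any type (modelled here as a dict).
StormTuple = Dict[str, Any]


def queue_spout(messages: Iterable[Dict[str, Any]]) -> Iterator[StormTuple]:
    """Spout: the source of a stream. This one replays in-memory messages;
    a real spout might read from a queue, a log or an API."""
    for msg in messages:
        yield {"merchant": msg["merchant"], "amount": msg["amount"]}


def filter_bolt(stream: Iterator[StormTuple], min_amount: float) -> Iterator[StormTuple]:
    """Bolt: filters tuples, emitting a new stream of those at or above min_amount."""
    for t in stream:
        if t["amount"] >= min_amount:
            yield t


def enrich_bolt(stream: Iterator[StormTuple]) -> Iterator[StormTuple]:
    """Bolt: transforms each tuple by adding a derived field."""
    for t in stream:
        yield {**t, "high_value": t["amount"] > 500}


messages = [
    {"merchant": "A", "amount": 20},
    {"merchant": "A", "amount": 900},
    {"merchant": "B", "amount": 150},
]
# Streams are chained: spout -> filter bolt -> enrich bolt.
for t in enrich_bolt(filter_bolt(queue_spout(messages), min_amount=100)):
    print(t)
```

Because generators are lazy, the same chain works unchanged if the spout yields an unbounded stream.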

Storm topologies

A topology is a directed graph of spouts and bolts. With the right tools, topologies can be created by fraud analysts and conversion analysts and, most importantly, created and published automatically using machine learning.
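A topology can be sketched as a small graph of named components in which each bolt subscribes to the output of an upstream component. The `Topology` class below is a hypothetical toy, not Storm's `TopologyBuilder`; requiring the upstream component to exist before a bolt subscribes to it keeps the graph acyclic.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple

StormTuple = Dict[str, Any]
# A bolt takes one tuple and emits zero or more tuples.
Bolt = Callable[[StormTuple], List[StormTuple]]


class Topology:
    """Toy directed graph of bolts fed by a single spout named "spout"."""

    def __init__(self) -> None:
        self.bolts: Dict[str, Bolt] = {}
        self.downstream: Dict[str, List[str]] = {"spout": []}

    def add_bolt(self, name: str, bolt: Bolt, subscribes_to: str) -> None:
        """Register a bolt and subscribe it to an existing component's stream.
        Only already-registered components can be subscribed to, so no cycles can form."""
        if name in self.downstream:
            raise ValueError(f"duplicate component: {name}")
        if subscribes_to not in self.downstream:
            raise ValueError(f"unknown upstream component: {subscribes_to}")
        self.bolts[name] = bolt
        self.downstream[name] = []
        self.downstream[subscribes_to].append(name)

    def run(self, spout_tuples: List[StormTuple]) -> Dict[str, List[StormTuple]]:
        """Push each spout tuple through the graph and collect what every bolt emits."""
        emitted: Dict[str, List[StormTuple]] = {name: [] for name in self.bolts}
        pending: Deque[Tuple[str, StormTuple]] = deque(("spout", t) for t in spout_tuples)
        while pending:
            source, tup = pending.popleft()
            for name in self.downstream[source]:
                for out in self.bolts[name](tup):
                    emitted[name].append(out)
                    pending.append((name, out))
        return emitted


topo = Topology()
topo.add_bolt("large_orders", lambda t: [t] if t["amount"] > 500 else [], subscribes_to="spout")
topo.add_bolt("alerts", lambda t: [{"alert": f"review order {t['id']}"}], subscribes_to="large_orders")
result = topo.run([{"id": 1, "amount": 50}, {"id": 2, "amount": 900}])
print(result["alerts"])
```

Since the wiring is plain data (component names and subscriptions), it is the kind of artifact an analyst tool or an ML pipeline could generate and publish.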


Merchant A has a fraud problem that needs solving quickly. Merchant A can use our Shared Algorithm topology to immediately block common fraud problems.


Merchant A has been on our system for an extended period of time, and our system now has a better picture of what their fraud problem actually looks like. Our ML systems create a new topology to deal better with Merchant A's fraud problem.
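The two Merchant A scenarios amount to choosing a topology per merchant: the shared one by default, replaced by a merchant-specific one once it has been generated. In this minimal sketch each topology is reduced to a hypothetical scoring function; the registry, rules and scores are illustrative, not Trustev's implementation.

```python
from typing import Callable, Dict

# For brevity a "topology" is reduced to a function that scores an event (hypothetical).
Scorer = Callable[[dict], float]


def shared_topology(event: dict) -> float:
    """Shared algorithm applied to every merchant (illustrative rule only)."""
    return 0.8 if event.get("country_mismatch") else 0.1


class TopologyRegistry:
    """Maps merchants to their own topology, falling back to the shared one."""

    def __init__(self, shared: Scorer) -> None:
        self.shared = shared
        self.per_merchant: Dict[str, Scorer] = {}

    def publish(self, merchant: str, scorer: Scorer) -> None:
        """Publish or replace a merchant-specific topology, e.g. one produced by an ML pipeline."""
        self.per_merchant[merchant] = scorer

    def score(self, merchant: str, event: dict) -> float:
        """Score with the merchant's own topology if one exists, else the shared one."""
        return self.per_merchant.get(merchant, self.shared)(event)


registry = TopologyRegistry(shared_topology)
event = {"country_mismatch": True, "returning_customer": True}
print(registry.score("A", event))  # new merchant: uses the shared topology

# Later, a hypothetical ML-generated topology for Merchant A is published.
registry.publish("A", lambda e: 0.2 if e.get("returning_customer") else 0.6)
print(registry.score("A", event), registry.score("B", event))
```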

Hadoop + Storm

Hadoop: batch processing system that can churn through huge volumes of data

Storm: real-time complex event processing system that can process data streams

Speed Layer

Only New Data

Compensates for the high latency of 'Serving Layer' updates

‘Batch Layer’ overrides ‘Speed Layer’

Serving Layer

Loads and exposes the batch views for querying

Random access to batch views

Batch Layer

Immutable, constantly growing datasets

Batch views computed from this raw dataset

This gives us our Lambda Architecture.
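The three layers can be sketched with a toy that counts transactions per merchant: an append-only master dataset (batch layer), a view recomputed from all of it (serving layer), and an incremental view over only the data that arrived since the last batch run (speed layer). It assumes batch runs happen synchronously, which real deployments do not; the class and method names are hypothetical.

```python
from collections import Counter
from typing import List


class LambdaCounter:
    """Toy Lambda Architecture that counts transactions per merchant."""

    def __init__(self) -> None:
        self.master: List[str] = []            # batch layer: immutable, append-only raw data
        self.batch_view: Counter = Counter()   # serving layer: precomputed batch view
        self.speed_view: Counter = Counter()   # speed layer: only data newer than the batch view

    def ingest(self, merchant: str) -> None:
        """Append to the master dataset and update the speed view with low latency."""
        self.master.append(merchant)
        self.speed_view[merchant] += 1

    def run_batch(self) -> None:
        """Recompute the batch view from scratch over the whole master dataset."""
        self.batch_view = Counter(self.master)
        # The batch view now covers all data, so it overrides the speed layer's results.
        self.speed_view = Counter()

    def query(self, merchant: str) -> int:
        """Merge the batch view with the speed view to answer a query."""
        return self.batch_view[merchant] + self.speed_view[merchant]


counter = LambdaCounter()
for m in ["A", "A", "B"]:
    counter.ingest(m)
print(counter.query("A"))  # answered entirely by the speed layer
counter.run_batch()
counter.ingest("A")
print(counter.query("A"))  # batch view plus one new event from the speed layer
```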

Real Time Big Data = Storm Process + Hadoop Process

Use the historical data produced by Hadoop to make your real-time results faster and more accurate
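As an illustration of combining the two, assume a hypothetical batch job produces each merchant's average historical order amount and the real-time path flags orders well above that baseline. The factor of 3 and the fallback limit for merchants with no history are invented for the example.

```python
from statistics import mean
from typing import Dict, List


def batch_baselines(history: Dict[str, List[float]]) -> Dict[str, float]:
    """Batch step (Hadoop's role): average historical order amount per merchant."""
    return {merchant: mean(amounts) for merchant, amounts in history.items() if amounts}


def realtime_flag(merchant: str, amount: float, baselines: Dict[str, float],
                  factor: float = 3.0) -> bool:
    """Real-time step (Storm's role): flag orders far above the merchant's historical average.
    Merchants with no history fall back to a fixed, hypothetical limit."""
    baseline = baselines.get(merchant)
    if baseline is None:
        return amount > 1000.0
    return amount > factor * baseline


baselines = batch_baselines({"A": [40.0, 60.0, 50.0]})
print(realtime_flag("A", 400.0, baselines), realtime_flag("A", 120.0, baselines))
```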

You can build this out in hours!

A simple combination of Azure Queues, SQL Azure and Azure VMs running Cassandra, Hadoop and Storm