Transcript
Page 1: Megabytes Gigabytes Terabytes Petabytes Purchase detail Purchase record Payment record Purchase detail Purchase record Payment record ERP CRM.
Page 2

Under the covers of HDP for Windows

Rohit Bakhshi

DBI-B387

Page 3

Speaker

Rohit Bakhshi, Product Manager, Hortonworks

Page 4

Agenda

Modern Data Architecture

Hadoop for Windows

Hortonworks Data Platform under the covers

Q&A

Page 5

Modern Data Architecture

Page 6

What Makes Up Big Data?

Megabytes

Gigabytes

Terabytes

Petabytes

Purchase detail

Purchase record

Payment record

ERP

CRM

WEB

BIG DATA

Offer details

Support Contacts

Customer Touches

Segmentation

Web logs

Offer history

A/B testing

Dynamic Pricing

Affiliate Networks

Search Marketing

Behavioral Targeting

Dynamic Funnels

User Generated Content

Mobile Web

SMS/MMS

Sentiment

External Demographics

HD Video, Audio, Images

Speech to Text

Product/Service Logs

Social Interactions & Feeds

Business Data Feeds

User Click Stream

Sensors / RFID / Devices

Spatial & GPS Coordinates

Increasing Data Variety and Complexity

Transactions + Interactions + Observations = BIG DATA

Page 7

A data architecture under pressure from new data

APPLICATIONS

DATA SYSTEM

REPOSITORIES

SOURCES

Existing Sources

(CRM, ERP, Clickstream, Logs)

RDBMS EDW MPP

Business Analytics

Custom Applications

Packaged Applications

Source: IDC

2.8 ZB in 2012

85% from New Data Types

15x Machine Data by 2020

40 ZB by 2020

OLTP, ERP, CRM Systems

Unstructured documents, emails

Clickstream

Server logs

Sentiment, Web Data

Sensor, Machine Data

Geolocation

Page 8

Hadoop within an emerging Modern Data Architecture

OPERATIONS TOOLS

Provision, Manage & Monitor

DEV & DATA TOOLS

Build & Test

DATA SYSTEM

REPOSITORIES

SOURCES

RDBMS EDW MPP

OLTP, ERP, CRM Systems

Documents, Emails

Web Logs, Click Streams

Social Networks

Machine Generated

Sensor Data

Geolocation Data

Governance & Integration

Security

Operations

Data Access

Data Management

APPLICATIONS

Business Analytics

Custom Applications

Packaged Applications

OLTP, ERP, CRM Systems

Unstructured documents, emails

Clickstream

Server logs

Sentiment, Web Data

Sensor, Machine Data

Geolocation

Page 9

Hadoop: typically used for new analytic applications…

SCALE

SCOPE

New Analytic Apps

New types of data

LOB-driven

Page 10

… and incrementally delivers a ‘Data Lake’

SCALE

SCOPE

A Modern Data Architecture/Data Lake

New Analytic Apps

New types of data

LOB-driven

RDBMS

MPP

EDW

Governance & Integration

Security

Operations

Data Access

Data Management

Data Lake: An architectural shift in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale

Page 11

Hadoop for Windows

Page 12

HDP for Windows

HDP 2.1: Hortonworks Data Platform

GOVERNANCE & INTEGRATION

Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, WebHDFS

DATA ACCESS

Batch: Map Reduce

Script: Pig

SQL: Hive/Tez, HCatalog

NoSQL: HBase

Stream: Storm

Search: Solr

Others: In-Memory Analytics, ISV engines

YARN: Data Operating System

DATA MANAGEMENT

HDFS (Hadoop Distributed File System), nodes 1 … N

SECURITY

Authentication, Authorization, Accounting, Data Protection

Storage: HDFS

Resources: YARN

Access: Hive, …

Pipeline: Falcon

Cluster: Knox

OPERATIONS

Provision, Manage & Monitor: Ambari (SCOM), Zookeeper

Scheduling: Oozie

Deployment Choice: Linux, Windows, On-Premise, Cloud

Hortonworks Data Platform (HDP)

The Only Completely Open Distribution for Apache Hadoop

Fundamentally Versatile and Comprehensive enterprise capabilities

Wholly Integrated for deep ecosystem interoperability

Page 13

HDP: Enterprise Data Platform

HDP certifies most recent & stable community innovation

Hortonworks Data Platform

Releases: HDP 1.3 (May 2013), HDP 2.0 (October 2013), HDP 2.1 (April 2014)

Components: Solr, Hadoop & YARN, Pig, Tez, Hive & HCatalog, HBase, Sqoop, Oozie, Zookeeper, Mahout, Ambari, Storm, Flume, Knox, Phoenix, Falcon

Categories: Security, Operations, Data Access, Data Management, Governance & Integration

[Chart: version of each component certified in each release. Version labels in extraction order, without their component mapping: 2.2.0, 1.1.2, 0.11.0, 0.11.0, 0.12.0, 0.12.0, 2.4.0, 0.12.1, 0.13.0, 0.94.6, 0.96.1, 0.98.0, 0.9.1, 0.7.0, 0.8.0, 0.9.0, 4.7.2, 1.4.3, 1.4.4, 1.3.1, 1.4.0, 1.2.5, 1.4.4, 1.5.1, 3.3.2, 4.0.0, 3.4.5, 0.4.0, 0.4.0, 4.0.0, 0.5.0]

Page 14

Seamless Interoperability

Integrations with Microsoft tools for native big data analysis

SOURCES

APPLICATIONS

OPERATIONAL TOOLS

DEV & DATA TOOLS

INFRASTRUCTURE

DATA SYSTEM

HDInsight

Azure

New! Power BI

Page 15

Right Tool for the Right Usage

SCALE (storage & processing): Traditional Database, EDW, MPP Analytics, NoSQL, Hadoop Platform

Attribute | Traditional Database | Hadoop Platform
schema | Required on write | Required on read
speed | Reads are fast | Writes are fast
governance | Standards and structured | Loosely structured
processing | Limited, no data processing | Processing coupled with data
data types | Structured | Multi and unstructured
best fit use | Interactive OLAP Analytics, Complex ACID Transactions, Operational Data Store | Data Discovery, Processing unstructured data, Massive Storage/Processing
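
The schema row of this comparison contrasts checking structure when data is stored with applying structure when data is queried. A minimal Python sketch of that distinction follows. The schema, the records and the helper names `write_with_schema` and `read_with_schema` are invented for illustration and are not part of any Hadoop or database API.

```python
import json

# Hypothetical schema used only for this illustration.
SCHEMA = {"id": int, "state": str, "price": float}


def write_with_schema(table, record):
    """Schema on write: reject a record that does not match before storing it."""
    for field, ftype in SCHEMA.items():
        if not isinstance(record.get(field), ftype):
            raise ValueError(f"bad or missing field: {field}")
    table.append(record)


def read_with_schema(raw_lines):
    """Schema on read: raw text is stored untouched; structure is applied per query."""
    rows = []
    for line in raw_lines:
        try:
            doc = json.loads(line)
            rows.append({f: t(doc[f]) for f, t in SCHEMA.items()})
        except (ValueError, KeyError, TypeError):
            continue  # malformed lines are skipped at read time, not at load time
    return rows


# Appending to the raw store always succeeds; the parsing cost is paid on read.
raw_store = [
    '{"id": 1, "state": "WA", "price": "9.5"}',
    'not json at all',
    '{"id": "2", "state": "CA", "price": 3}',
]
parsed = read_with_schema(raw_store)

strict_table = []
write_with_schema(strict_table, {"id": 1, "state": "WA", "price": 9.5})
```

In this sketch the raw store accepts anything, which is why writes are cheap, while the strict table pays a validation cost up front so that later reads can trust the structure.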

Page 16

Maximize Hadoop Deployment Choice

Hortonworks Data Platform (HDP) for Windows: 100% Apache open source Hadoop software for Windows Server

Microsoft Azure HDInsight: Hadoop-based managed service in the cloud via Microsoft Azure

Microsoft Analytics Platform System (APS): Scale-out appliance with data warehousing and Hadoop in one box

All offerings co-engineered by Hortonworks and Microsoft

Enjoy seamless interoperability across on-premises and cloud

Page 17

HDP under the covers

Page 18

Data Operating System of Hadoop

Single Cluster, Shared Data Set, Multiple Workloads

Support a range of access patterns

Shared operational services

HDP 2.1: Core Platform

DATA ACCESS

Batch: Map Reduce

Script: Pig

SQL: Hive/Tez, HCatalog

NoSQL: HBase, Accumulo

Stream: Storm

Search: Solr

Others: In-Memory Analytics, ISV engines

YARN: Data Operating System

DATA MANAGEMENT

HDFS (Hadoop Distributed File System), nodes 1 … N

Page 19

YARN: Next Generation Hadoop

1st Gen of Hadoop: Single Use System (Batch Apps)

HDFS (redundant, reliable storage)

MapReduce (cluster resource management & data processing)

Classic Hadoop Apps

2nd Gen of Hadoop: Multi Use Data Platform (Batch, Interactive, Online, Streaming, …)

Redundant, Reliable Storage (HDFS)

Efficient Cluster Resource Management & Shared Services (YARN)

Flexible Data Processing: Hive, Pig, others…

Batch: MapReduce

Batch & Interactive: Tez

Online Data Processing: HBase, Accumulo

Stream Processing: Storm

others

Page 20

YARN: Data Operating System

ResourceManager

Scheduler

NodeManager (12 instances)

Batch: map1.1, map1.2, reduce1.1

Interactive SQL: vertex1.1.1, vertex1.1.2, vertex1.2.1, vertex1.2.2

Real-Time: nimbus0, nimbus1, nimbus2
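
The diagram shows a single ResourceManager scheduler placing work from batch, interactive SQL and real-time applications onto one shared pool of NodeManagers. The toy Python sketch below illustrates that idea only. The memory figures are invented, a simple first-fit policy stands in for YARN's actual schedulers, and none of the class names correspond to the YARN API.

```python
class NodeManager:
    """A worker node with a fixed amount of free memory (illustrative only)."""

    def __init__(self, name, free_mb):
        self.name = name
        self.free_mb = free_mb
        self.containers = []


class ResourceManager:
    """Toy scheduler: first-fit placement of container requests on nodes."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app, task, mem_mb):
        for node in self.nodes:
            if node.free_mb >= mem_mb:
                node.free_mb -= mem_mb
                node.containers.append((app, task))
                return node.name
        return None  # no capacity left; a real scheduler would queue the request


nodes = [NodeManager(f"nm{i}", free_mb=4096) for i in range(3)]
rm = ResourceManager(nodes)

# Three workload types from the slide share one cluster.
requests = [
    ("batch", "map1.1", 2048),
    ("batch", "map1.2", 2048),
    ("interactive-sql", "vertex1.1.1", 1024),
    ("real-time", "nimbus0", 4096),
    ("batch", "reduce1.1", 3072),
]
placements = {task: rm.allocate(app, task, mem) for app, task, mem in requests}
```

The point of the sketch is that the scheduler knows nothing about what each application does; it only hands out capacity, which is what lets different engines coexist on the same nodes.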

Page 21

HDP 2.1 SQL Access: Stinger Initiative

Next generation SQL based interactive query in Hadoop

Speed: Interactive Hive query response

Scale: queries that scale from TB to PB

SQL: broadest range of SQL semantics for analytic applications

Business Analytics, Custom Apps

SQL: Apache Hive

Apache MapReduce, Apache Tez

Apache YARN

HDFS (Hadoop Distributed File System), nodes 1 … N

Apache Hive Contribution… an Open Community at its finest

1,672 Jira Tickets Closed

145 Developers

44 Companies

~390,000 Lines Of Code Added… (2x)

13 Months

Page 22

Apache Tez (“Speed”)

Replaces MapReduce as primitive for Hive, Pig, etc

Task with pluggable Input, Processor and Output

Tez Task - <Input, Processor, Output>

Task: Input, Processor, Output
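
The <Input, Processor, Output> triple can be pictured as three pluggable pieces wired into one task. The Python sketch below only illustrates that shape. The `Task` class, `count_by_state` and the sample rows are invented for the example and are not the Tez Java API.

```python
from typing import Callable, Iterable, List


class Task:
    """A unit of work assembled from a pluggable input, processor and output."""

    def __init__(self,
                 read: Callable[[], Iterable],
                 process: Callable[[Iterable], Iterable],
                 write: Callable[[Iterable], None]):
        self.read = read
        self.process = process
        self.write = write

    def run(self):
        # Data flows input -> processor -> output.
        self.write(self.process(self.read()))


def count_by_state(rows):
    """Example processor: count rows per state, returned in sorted order."""
    counts = {}
    for state, _price in rows:
        counts[state] = counts.get(state, 0) + 1
    return sorted(counts.items())


sink: List = []
task = Task(
    read=lambda: [("WA", 9.5), ("CA", 3.0), ("WA", 1.5)],  # in-memory input
    process=count_by_state,
    write=sink.extend,                                       # in-memory output
)
task.run()
```

Because the processor never knows where its rows come from or go to, the same logic could read from a file, from another task's output, or from memory by swapping only the `read` and `write` pieces.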

Page 23

Hive with Tez as execution engine

[Diagram: the query below planned two ways. Hive – MR runs it as a chain of MapReduce jobs (M = map, R = reduce) with HDFS between jobs. Hive – Tez runs it as a single graph of M and R stages.]

Plan stages: SELECT a.state, c.itemId; SELECT b.id; JOIN (a, c); JOIN (a, b); GROUP BY a.state; COUNT(*), AVG(c.price)

SELECT a.state, COUNT(*), AVG(c.price)

FROM a

JOIN b ON (a.id = b.id)

JOIN c ON (a.itemId = c.itemId)

GROUP BY a.state

Tez avoids unneeded writes to HDFS
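
To see what the example query computes, independent of the execution engine, the sketch below runs the same joins and aggregation with Python's built-in `sqlite3` module. SQLite stands in for Hive purely for illustration. The table contents are made up, `AVG` is the aggregate function name used here, and an `ORDER BY` is added only to make the result order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, itemId INTEGER, state TEXT);
    CREATE TABLE b (id INTEGER);
    CREATE TABLE c (itemId INTEGER, price REAL);
    INSERT INTO a VALUES (1, 10, 'WA'), (2, 11, 'WA'), (3, 10, 'CA'), (4, 12, 'CA');
    INSERT INTO b VALUES (1), (2), (3);
    INSERT INTO c VALUES (10, 4.0), (11, 6.0), (12, 8.0);
""")

# Same shape as the slide's query: two joins feeding one grouped aggregation.
rows = conn.execute("""
    SELECT a.state, COUNT(*), AVG(c.price)
    FROM a
    JOIN b ON (a.id = b.id)
    JOIN c ON (a.itemId = c.itemId)
    GROUP BY a.state
    ORDER BY a.state
""").fetchall()
conn.close()
```

Row 4 of `a` has no match in `b`, so it drops out at the first join. Each join and the final grouping is a separate stage of work, and those stage boundaries are where a chain of MapReduce jobs would persist intermediate results.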

Page 24

Hive: Enhanced SQL Semantics

Hive SQL Datatypes: INT; TINYINT/SMALLINT/BIGINT; BOOLEAN; FLOAT; DOUBLE; STRING; TIMESTAMP; BINARY; DECIMAL; ARRAY, MAP, STRUCT, UNION; DATE; VARCHAR; CHAR

Hive SQL Semantics: SELECT, INSERT; GROUP BY, ORDER BY, SORT BY; JOIN on explicit join key; Inner, outer, cross and semi joins; Sub-queries in FROM clause; ROLLUP and CUBE; UNION; Windowing Functions (OVER, RANK, etc); Custom Java UDFs; Standard Aggregation (SUM, AVG, etc.); Advanced UDFs (ngram, Xpath, URL); Sub-queries for IN/NOT IN, HAVING; Expanded JOIN Syntax; INTERSECT / EXCEPT

Legend: Hive 0.11; Hive 0.12 (HDP 2.0); Hive 0.13 (HDP 2.1)

SQL Compliance

Hive provides a wide array of SQL datatypes and semantics so your existing tools integrate more seamlessly with Hadoop

Page 25

HDP 2.1: Data Governance & Integration

Apache Falcon

Simplified Data Governance for Enterprise Hadoop

Provides key governance framework for:

Acquisition & processing of data sets

Replication & Retention of datasets

Redirect datasets to non-Hadoop extensions

Provides audit trail & lineage

Page 26

Apache Falcon: Replication

Disaster Recovery and Backup between environments

Publishing data between environments for Discovery

Site to Site

Site to Cloud

Page 27

Apache Falcon: Retention

Define sophisticated retention policies

Simplify data retention for audit, compliance, or for data re-processing

Staged Data: Retain 5 Years

Cleansed Data: Retain 3 Years

Conformed Data: Retain 3 Years

Presented Data: Retain Last Copy Only
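
The per-stage retention rules on this slide can be expressed as a small policy table plus an eviction check. The Python sketch below is one way to picture it. The `expired` helper, the stage keys and the sample dates are assumptions made for the example; it does not use Falcon's own policy definitions.

```python
from datetime import date, timedelta

# Retention periods from the slide; None means "keep only the latest copy".
RETENTION_YEARS = {"staged": 5, "cleansed": 3, "conformed": 3, "presented": None}


def expired(stage, instances, today):
    """Return the dataset instance dates that the policy would evict."""
    years = RETENTION_YEARS[stage]
    if years is None:
        return sorted(instances)[:-1]  # everything except the newest copy
    # Approximation: a year is treated as 365 days, ignoring leap days.
    cutoff = today - timedelta(days=365 * years)
    return sorted(d for d in instances if d < cutoff)


today = date(2014, 6, 1)
instances = [date(2008, 1, 1), date(2010, 6, 1), date(2012, 1, 1), date(2014, 5, 1)]

staged_out = expired("staged", instances, today)
cleansed_out = expired("cleansed", instances, today)
presented_out = expired("presented", instances, today)
```

Keeping the rules in one table means the same eviction routine serves every stage, and changing a retention period is a data change rather than a code change.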

Page 28

HDP 2.1: Search

Apache Solr

Open source enterprise search for Hadoop

Simple, powerful UI for advanced search applications

High performance indexing & sub-second search times over billions of documents

Page 29

Search Architecture

Raw Files: HTML, PDF, Word, XML, Logs

HDFS (Hadoop Distributed File System)

MapReduce Indexing Job

Indexed Documents

Solr (three instances)

Lucene

Search Web App

Query

Response
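
The flow on this slide, raw files indexed in a batch job and then queried by a search app, rests on an inverted index, the core data structure behind Lucene. The Python sketch below is a deliberately tiny illustration with invented documents. It shows the idea only and does not use Solr or Lucene.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


# Stand-ins for raw files of different kinds.
docs = {
    "log-1": "ERROR disk full on node 7",
    "log-2": "WARN disk latency high on node 3",
    "page-1": "Node maintenance guide",
}
index = build_index(docs)
result = search(index, "disk node")
```

Building the index is the expensive, batch-friendly step, while a query is only a few set lookups, which is why indexing and serving can be split across different systems.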

Page 30

HDP 2.1: Stream Processing

Apache Storm

Real-time event processing for sensor and business activity monitoring

Unlocks new business cases for Hadoop

Scale: Ingest millions of events per second. Fast query on petabytes of data

http://storm.incubator.apache.org/

Page 31

HDP 2.1: Perimeter Security

Apache Knox

A common place to perform authentication across Hadoop and all related projects

Integrated with LDAP and AD

Secure interfaces for: WebHDFS, WebHCat, Oozie, Hive & HBase

Broad community effort, incubated with Microsoft, with a broad set of developers involved

Page 32

Apache Knox: Perimeter Security

Enterprise Identity Provider

LDAP/AD

Identity Providers

Knox Gateway (GW)

DMZ

A stateless reverse proxy instance deployed in DMZ

Firewall

HDP Cluster 1

Masters: JT, NN, WebHCat, Oozie, YARN, HBase, Hive

DN, TT

HDP Hadoop Cluster 2

Masters: JT, NN, WebHCat, Oozie, YARN, HBase, Hive

DN, TT

Requests streamed through GW to Hadoop services after auth.

URLs rewritten to refer to gateway

Firewall

REST Client

JDBC Client

Browser
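
The note that URLs are rewritten to refer to the gateway can be illustrated with a short Python sketch. The host names, the port and the `/gateway/<cluster>` path layout below are assumptions chosen for the example, and `to_gateway_url` is a hypothetical helper rather than a Knox API.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical gateway address exposed in the DMZ.
GATEWAY = "https://knox.example.com:8443"


def to_gateway_url(internal_url, cluster):
    """Rewrite an internal service URL so the client only ever sees the gateway."""
    parts = urlsplit(internal_url)
    gw = urlsplit(GATEWAY)
    # Assumed layout: /gateway/<cluster>/<original service path>
    path = f"/gateway/{cluster}{parts.path}"
    return urlunsplit((gw.scheme, gw.netloc, path, parts.query, ""))


# An internal WebHDFS-style URL that should not leak outside the firewall.
internal = "http://namenode.internal:50070/webhdfs/v1/tmp/data.csv?op=OPEN"
external = to_gateway_url(internal, cluster="cluster1")
```

Rewriting keeps internal host names and ports out of responses, so clients outside the firewall never need a network route to the cluster nodes themselves.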

Page 33

Operating Enterprise Hadoop

Ambari: Deploy, Manage, Monitor

AMBARI WEB

PROVISION

MANAGE

MONITOR

REST APIs

AMBARI SERVER: PROVISION | MANAGE | MONITOR

compute & storage … compute & storage

Page 34

Ambari SCOM

Enables Microsoft System Center Operations Manager (SCOM) to monitor Hadoop

Ambari SCOM Management Pack gives insight into the performance and health of Hadoop

Ambari SCOM leverages the Ambari framework to aggregate and expose Hadoop metrics

Ambari SCOM Mgmt Pack

HADOOP: Storage & Process at Scale

Ambari SCOM Server

Ambari SCOM Server aggregates + exposes Hadoop metrics

Ambari SCOM monitors health + alerts in case of problems

Page 35

HDP - Reference Architecture

Page 36

For More Information

Web

hortonworks.com/products/hdp-windows/

hortonworks.com/labs/microsoft/

microsoft.com/bigdata

Training

hortonworks.com/hadoop-training/hadoop-on-windows/

Online documentation

docs.hortonworks.com

Forums

hortonworks.com/community/forums/

Page 37

Questions?

Page 38

Track resources

Download Microsoft SQL Server 2014 http://www.trySQLSever.com

Try out Power BI for Office 365! http://www.powerbi.com

Sign up for Microsoft HDInsight today! http://microsoft.com/bigdata

Page 39

Resources

Learning

Microsoft Certification & Training Resources

www.microsoft.com/learning

msdn

Resources for Developers

http://microsoft.com/msdn

TechNet

Resources for IT Professionals

http://microsoft.com/technet

Sessions on Demand

http://channel9.msdn.com/Events/TechEd

Page 40

Complete an evaluation and enter to win!

Page 41

Evaluate this session

Scan this QR code to evaluate this session.

Page 42

© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

