Day 1 Module 1 - Introduction to Big Data

Post on 12-Oct-2018


Microsoft Big Data Essentials
Module 1 - Introduction to Big Data

Saptak Sen, Microsoft
Bill Ramos, Advaiya

Agenda

• Why Big Data?

• Big Data Lambda Architecture

• Getting started with Windows Azure HDInsight Service

The Business Imperative

• Human Fault Tolerance
• Minimize CapEx
• Low Learning Curve
• Hyper Scale on Demand

CAP Theorem

• Consistency (C)
• Availability (A)
• Partition Tolerance (P)

Big Data Lambda Architecture

• Batch layer
  • Stores master dataset
  • Compute arbitrary views
• Speed layer
  • Fast, incremental algorithms
  • Batch layer eventually overrides speed layer
• Serving layer
  • Random access to batch views
  • Updated by batch layer

[Diagram: Serving Layer, Speed Layer, Batch Layer]

The Batch Layer

• Stores master dataset (in append mode)

• Unrestrained computation

• Horizontally scalable

• High latency

[Diagram: Incoming data streams → Master dataset → Batch views]
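A minimal sketch of the batch layer idea, assuming page-view events as the data: the master dataset only ever grows, and the batch view is recomputed from scratch over all of it. The names `MasterDataset` and `compute_batch_view` are hypothetical and are not part of HDInsight or Hadoop; in practice this work would run as a MapReduce, Hive, or Pig job.

```python
from collections import Counter


class MasterDataset:
    """Append-only store: events can be added but never updated or deleted."""

    def __init__(self):
        self._events = []

    def append(self, timestamp, page):
        # Store an immutable tuple so a recorded event cannot change later.
        self._events.append((timestamp, page))

    def __iter__(self):
        return iter(self._events)


def compute_batch_view(dataset):
    """Recompute the view from scratch over the entire master dataset."""
    view = Counter()
    for _timestamp, page in dataset:
        view[page] += 1
    return view


master = MasterDataset()
for ts, page in [(1, "home"), (2, "about"), (3, "home")]:
    master.append(ts, page)

batch_view = compute_batch_view(master)
print(dict(batch_view))
```

Recomputing over everything is what makes the layer high latency, and it is also why a faulty view can always be rebuilt from the raw data.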

The Speed Layer

• Stream processing of data

• Stores a limited window of data

• Dynamic computation

[Diagram: Incoming data streams → Process stream → Increment views → Real-time views (real-time increments)]
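The bullets above can be sketched as a small incremental counter that keeps only a limited window of recent events. This is an illustration only: `RealTimeView` is a hypothetical name, and the sketch assumes events arrive in timestamp order. It does not show how StreamInsight or Reactive Extensions work.

```python
from collections import Counter, deque


class RealTimeView:
    """Incrementally maintained counts over a sliding time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()   # (timestamp, page), oldest first
        self.counts = Counter()

    def process(self, timestamp, page):
        # Increment the view for the new event instead of recomputing it.
        self._events.append((timestamp, page))
        self.counts[page] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _old_ts, old_page = self._events.popleft()
            self.counts[old_page] -= 1
            if self.counts[old_page] == 0:
                del self.counts[old_page]


view = RealTimeView(window_seconds=60)
view.process(0, "home")
view.process(30, "home")
view.process(70, "about")   # the event at t=0 is now outside the window
print(dict(view.counts))
```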

The Serving Layer

• Queries the batch and real-time views

• Merges the results

[Diagram: Real-time views and Batch views → Querying and merging → Output]
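A minimal sketch of the query-and-merge step, assuming both views hold per-key counts and that the real-time view covers only events the batch view has not absorbed yet, so adding them does not double count. `merge_views` is a hypothetical helper, not an API of any product named in this deck.

```python
from collections import Counter


def merge_views(batch_view, realtime_view):
    """Answer a query by combining the batch view with the real-time view."""
    merged = Counter(batch_view)
    # Counter.update adds counts for keys present in both views.
    merged.update(realtime_view)
    return dict(merged)


batch_view = {"home": 10, "about": 3}
realtime_view = {"home": 2, "contact": 1}
print(merge_views(batch_view, realtime_view))
```

When the next batch run completes, its view already includes the recent events, and the matching real-time counts can be discarded.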

Microsoft Lambda Architecture Support

• Batch Layer: Windows Azure HDInsight, Azure Blob storage, MapReduce, Hive, Pig, Oozie, SSIS
• Speed Layer: Federations in Windows Azure SQL Database, Azure tables, Memcached/MongoDB, SQL Server database engine, SQL Server VM (Columnstore indexes, Analysis Services, StreamInsight)
• Serving Layer: Azure Storage Explorer, Microsoft Excel, Power Query, PowerPivot, Power View, Power Map, Reporting Services, LINQ to Hive, Analysis Services

Yahoo!

• Apache Hadoop
• SQL Server Analysis Services (SSAS)
• Microsoft Excel and PowerPivot
• Other BI Tools and Custom Applications

[Diagram labels, organized by Serving Layer, Speed Layer, and Batch Layer: Hadoop Data; Third Party Database; SQL Server Connector (Hadoop Hive ODBC); Staging Database; SQL Server Analysis Services (SSAS Cube); Microsoft Excel & PowerPivot for Excel; Custom Applications]

Ferranti Computer Systems

• Windows Azure HDInsight
• Microsoft Dynamics AX
• SQL Server Analysis Services
• SQL Server Reporting Services
• SQL Server (In-Memory OLTP)
• Reactive Extensions (Rx)

[Diagram labels, organized by Serving Layer, Speed Layer, and Batch Layer: Data Feed from Smart Meters; Reactive Extensions (Rx); SQL Server Database (In-Memory OLTP); Windows Azure HDInsight; Windows Azure Storage; SQL Server Analysis Services; SQL Server Reporting Services; Microsoft Dynamics AX]

Demo 1: Setting up the Windows Azure storage account

[Diagram: Azure Storage Explorer, Windows Azure Blob storage]

Blob Storage Concepts

• Store large amounts of unstructured text or binary data with the fastest read performance
• Highly scalable, durable, and available file system
• Blobs can be exposed publicly over HTTP
• Securely lock down permissions to blobs

[Diagram: Account (Contoso) → Container (Images, Video) → Blob (PIC01.JPG, PIC02.JPG, VID1.AVI) → Pages/Blocks]

http://<account>.blob.core.windows.net/<container>/<blobname>
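The account, container, and blob hierarchy maps directly onto the URL pattern on this slide. A short sketch that fills in the pattern using only the Python standard library follows; `blob_url` is a hypothetical helper, and the lowercase account and container names are chosen because Azure requires lowercase for both.

```python
from urllib.parse import quote


def blob_url(account, container, blob_name):
    """Fill in http://<account>.blob.core.windows.net/<container>/<blobname>."""
    # Percent-encode the blob name; "/" is kept so virtual folders survive.
    return (
        f"http://{account}.blob.core.windows.net/"
        f"{container}/{quote(blob_name)}"
    )


print(blob_url("contoso", "images", "PIC01.JPG"))
print(blob_url("contoso", "video", "VID1.AVI"))
```

A URL built this way only resolves for anonymous readers if the container or blob has been given public access; otherwise the request must be authorized.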

Getting started with HDInsight Service

Demo 2: Setting up the Windows Azure HDInsight cluster

[Diagram: HDInsight Console, Windows Azure HDInsight, Windows Azure Blob storage]

HDInsight Console: https://<ClusterName>.azurehdinsight.net/

Demo 3: Loading data into Windows Azure storage for use with HDInsight

[Diagram: CSV files from local disk, Windows Azure Blob storage, Windows Azure HDInsight, HDInsight Console]

Easy Access to Data, Big & Small

• Simplify access to public & corporate data
• Easily preview, shape, & format your data
• Combine and refine data across multiple sources
• Gain insight across relational, unstructured, & semi-structured data
• Common management of structured & unstructured data
• Query across relational DB & Hadoop with single T-SQL Query

Power Query, Windows Azure Marketplace, Windows Azure HDInsight Service, Parallel Data Warehouse with Polybase

Questions?