Day 1 Module 1 - Introduction to Big Data

Date post: 12-Oct-2018
Microsoft Big Data Essentials
Module 1 - Introduction to Big Data
Saptak Sen, Microsoft
Bill Ramos, Advaiya
Transcript
Page 1: Day 1 Module 1 - Introduction to Big Data (download.microsoft.com/download/1/E/3/1E3EF370-9B… · PPT file)

Microsoft Big Data Essentials
Module 1 - Introduction to Big Data

Saptak Sen, Microsoft
Bill Ramos, Advaiya

Page 2

Agenda

• Why Big Data?
• Big Data Lambda Architecture
• Getting started with Windows Azure HDInsight Service

Page 3

The Business Imperative

• Human Fault Tolerance
• Minimize CapEx
• Low Learning Curve
• Hyper Scale on Demand

Page 4

CAP Theorem

• Consistency (C)
• Availability (A)
• Partition Tolerance (P)

Page 5

Big Data Lambda Architecture

Page 6

Big Data Lambda Architecture

• Batch layer
  • Stores master dataset
  • Computes arbitrary views
• Speed layer
  • Fast, incremental algorithms
  • Batch layer eventually overrides speed layer
• Serving layer
  • Random access to batch views
  • Updated by batch layer

[Diagram: Serving Layer, Speed Layer, Batch Layer]

Page 7

The Batch Layer

• Stores master dataset (in append mode)
• Unrestrained computation
• Horizontally scalable
• High latency

[Diagram: Incoming data streams → Master dataset → Batch views]
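
The batch layer described on this slide can be illustrated with a short Python sketch. It is not part of the original deck: the event shape `(user, page)`, the `MasterDataset` class, and the `page_view_counts` view are hypothetical names chosen for the example. The sketch shows an append-only master dataset and a batch view that is recomputed from the full history on every run.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape, assumed for illustration only: (user, page).
Event = Tuple[str, str]


class MasterDataset:
    """Append-only store: events are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def scan(self) -> List[Event]:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._events)


def page_view_counts(events: Iterable[Event]) -> Dict[str, int]:
    """Batch view: recomputed over the whole dataset on every run."""
    return dict(Counter(page for _user, page in events))


master = MasterDataset()
for event in [("ann", "/home"), ("bob", "/home"), ("ann", "/pricing")]:
    master.append(event)

batch_view = page_view_counts(master.scan())
print(batch_view)
```

Recomputing from scratch is what makes the computation unrestrained and tolerant of human error (a buggy view can simply be rebuilt), at the cost of the high latency the slide mentions.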

Page 8

The Speed Layer

• Stream processing of data
• Stores a limited window of data
• Dynamic computation

[Diagram: Incoming data streams → Process stream → Increment views → Real-time views (real-time increments)]
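
As a rough illustration of the speed layer, the Python sketch below keeps a limited window of events and updates a real-time view incrementally as each event arrives. The `SpeedLayer` class, the `(timestamp, page)` event shape, and the 60-second window are assumptions made for the example, not details from the deck. It also assumes events arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Dict, Tuple

# Hypothetical event shape, assumed for illustration: (timestamp_seconds, page).
Event = Tuple[int, str]


class SpeedLayer:
    """Holds only recent events and maintains per-page counts incrementally."""

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds
        self._window: Deque[Event] = deque()
        self.realtime_view: Dict[str, int] = {}

    def process(self, event: Event) -> None:
        timestamp, page = event
        # Incremental update: adjust one counter instead of recomputing the view.
        self._window.append(event)
        self.realtime_view[page] = self.realtime_view.get(page, 0) + 1
        self._expire(now=timestamp)

    def _expire(self, now: int) -> None:
        # Drop events that have fallen out of the limited window
        # (assumes events were appended in timestamp order).
        while self._window and self._window[0][0] <= now - self.window_seconds:
            _old_ts, old_page = self._window.popleft()
            self.realtime_view[old_page] -= 1
            if self.realtime_view[old_page] == 0:
                del self.realtime_view[old_page]


speed = SpeedLayer(window_seconds=60)
for event in [(0, "/home"), (30, "/home"), (70, "/pricing")]:
    speed.process(event)

print(speed.realtime_view)
```

In a full Lambda Architecture the expiry would typically be tied to how far the batch layer has caught up rather than to a fixed duration; a fixed window is used here only to keep the sketch small.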

Page 9

The Serving Layer

• Queries the batch and real-time views
• Merges the results

[Diagram: Batch views and Real-time views → Querying and merging → Output]
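
The query-and-merge step can be sketched as follows. This is a minimal Python illustration with hypothetical view contents and a hypothetical `merged_count` function. It assumes the two views cover disjoint time ranges (the batch view up to the last batch run, the real-time view since then), so that adding the counts is a valid merge.

```python
from typing import Dict


def merged_count(key: str,
                 batch_view: Dict[str, int],
                 realtime_view: Dict[str, int]) -> int:
    """Answer a query by combining the batch view with the real-time view."""
    # Assumption: the views do not overlap in time, so the counts can be summed.
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


# Hypothetical precomputed views, for illustration only.
batch_view = {"/home": 120, "/pricing": 45}
realtime_view = {"/home": 3, "/signup": 1}

for page in ("/home", "/pricing", "/signup"):
    print(page, merged_count(page, batch_view, realtime_view))
```

Other aggregates need a different merge function (for example, a maximum or a set union), but the pattern of reading both views and combining them at query time stays the same.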

Page 10

Microsoft Lambda Architecture Support

Batch Layer
• Windows Azure HDInsight
• Azure Blob storage
• MapReduce, Hive, Pig, Oozie, SSIS

Speed Layer
• Federations in Windows Azure SQL Database
• Azure tables
• Memcached/MongoDB
• SQL Server database engine
• SQL Server VM:
  • Columnstore indexes
  • Analysis Services
  • StreamInsight

Serving Layer
• Azure Storage Explorer
• Microsoft Excel
• Power Query
• PowerPivot
• Power View
• Power Map
• Reporting Services
• LINQ to Hive
• Analysis Services

Page 11

Yahoo!

Serving Layer | Speed Layer | Batch Layer

• Apache Hadoop
• SQL Server Analysis Services (SSAS)
• Microsoft Excel and PowerPivot
• Other BI Tools and Custom Applications

[Diagram: Hadoop Data, Third Party Database, Staging Database, SQL Server Connector (Hadoop Hive ODBC), SQL Server Analysis Services (SSAS Cube), Microsoft Excel & PowerPivot for Excel, + Custom Applications]

Page 12

Ferranti Computer Systems

Serving Layer | Speed Layer | Batch Layer

• Windows Azure HDInsight
• Microsoft Dynamics AX
• SQL Server Analysis Services
• SQL Server Reporting Services
• SQL Server (In-Memory OLTP)

[Diagram: Data Feed from Smart Meters, Reactive Extensions (Rx), SQL Server Database (In-Memory OLTP), Windows Azure HDInsight, SQL Server Analysis Services, SQL Server Reporting Services, Microsoft Dynamics AX]

Page 13

Windows Azure Storage

Page 14

Demo 1: Setting up the Windows Azure storage account

Serving Layer | Speed Layer | Batch Layer

• Azure Blob storage
• Azure Storage Explorer

[Diagram: Windows Azure Blob storage, Azure Storage Explorer]

Page 15

Blob Storage Concepts

• Store large amounts of unstructured text or binary data with the fastest read performance
• Highly scalable, durable, and available file system
• Blobs can be exposed publicly over HTTP
• Securely lock down permissions to blobs

[Diagram: Account (Contoso) → Container (Images, Video) → Blob (PIC01.JPG, PIC02.JPG, VID1.AVI) → Pages/Blocks (Block/Page)]

http://<account>.blob.core.windows.net/<container>/<blobname>
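
The address pattern `http://<account>.blob.core.windows.net/<container>/<blobname>` can be filled in with a small helper. The Python sketch below is only an illustration: `blob_url` is a hypothetical function, not part of any Azure SDK, and the example values are taken from the names on the slide's diagram. The account and container are written in lowercase because Azure storage account and container names are lowercase.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build a blob address following the pattern shown on the slide."""
    # quote() percent-encodes characters that are not URL-safe; "/" is kept
    # so blob names with path-like segments stay readable.
    return (
        f"http://{account}.blob.core.windows.net/"
        f"{quote(container)}/{quote(blob_name, safe='/')}"
    )


print(blob_url("contoso", "images", "PIC01.JPG"))
print(blob_url("contoso", "video", "VID1.AVI"))
```

A URL built this way is only reachable anonymously when the container or blob has been made public; otherwise the request must be authorized, in line with the slide's point about locking down permissions.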

Page 16

Getting started with HDInsight Service

Page 17

Demo 2: Setting up the Windows Azure HDInsight cluster

Serving Layer | Speed Layer | Batch Layer

• Windows Azure HDInsight
• Azure Blob storage

[Diagram: Windows Azure HDInsight, Windows Azure Blob storage, HDInsight Console]

https://<ClusterName>.azurehdinsight.net/

Page 18

Demo 3: Loading data into Windows Azure storage for use with HDInsight

Serving Layer | Speed Layer | Batch Layer

• Windows Azure HDInsight
• Azure Blob storage

[Diagram: CSV files from local disk, Windows Azure Blob storage, Windows Azure HDInsight, HDInsight Console]

https://<ClusterName>.azurehdinsight.net/

Page 19

Easy Access to Data, Big & Small

Page 20

Easy Access to Data, Big & Small

• Simplify access to public & corporate data
• Easily preview, shape, & format your data
• Combine and refine data across multiple sources
• Gain insight across relational, unstructured, & semi-structured data
• Common management of structured & unstructured data
• Query across relational DB & Hadoop with a single T-SQL query

Products: Power Query, Windows Azure Marketplace, Windows Azure HDInsight Service, Parallel Data Warehouse with Polybase

Page 22

Questions?
