Thomas Kejser
Senior Program Manager
Microsoft Corp.

Introducing Parallel Data Warehouse (The project formerly known as Madison)

Agenda

The typical problem with data warehouses

MPP vs SMP

SQL Server Parallel Data Warehouse

Hardware architecture

Query Processing

Data Loading

My email: [email protected]

Introducing Parallel Data Warehouse

The Typical Problem with Data Warehouses

Microsoft DW Solutions

SSRS SSAS SSIS

Microsoft & Partner Services

Symmetric Multi-Processing vs. Massively Parallel Processing

SMP

HW advancements increasing ability to scale up

But scaling limited by design

High-end SMP very expensive

Extremely high concurrency for simple workloads

With less than 1-2 TB of data, SMP will almost always be better

At larger sizes, it depends

OLTP, Transactional, Data Warehousing

MPP

HW advancements increasing ability to scale out

Scaling to 1 PB+

Scale-out is relatively low cost

Relatively high concurrency for complex workloads

> 2 TB up to 1 PB for DW workloads

Data Warehousing (esp. VLDB, complex workloads)

PDW: No Assembly Required

Software

Servers

Storage arrays

Network switches

Cables

Licenses

Power distribution units

Racks

Comes fully assembled

Software is installed at the factory

Fully configured

Basic Building Blocks

Compute Nodes

Handle the CPU cycles required to answer queries

Storage Nodes

Store data using fiber-attached disks

Scaled to support CPU with enough throughput

Other nodes

More about those later

Anatomy of a Compute Node

Pre-configured For Each SQL Server Instance On Each Compute Node.

Drives Configured As RAID1 To Avoid Appliance Failover for a Single Drive Failure

IBM Compute Nodes Will Have 1 LUN (1 RAID1 Pair)

Dell Compute Nodes Will Have 2 LUNs (2 RAID1 Pairs)

HP Compute Nodes Will Have 3 LUNs (3 RAID1 Pairs)

TempDB: Sort-work Area For Data Loading Into Clustered Index Tables

Work Area for PDW Temporary Work Files

Spill Area For Hash Joins Not Fitting Into Memory

Anatomy of a Storage Node

Pre-configured

4 RAID10 Pairs for Primary User Data

1 RAID10 Pair for Database Logs

2 LUNs Are Spread Across Each RAID Pair

User Databases are Separate Physical SQL Server Databases

Staging Database (Optional) Used for Loading & to Minimize Fragmentation

More Node Types

Backup node:

Stores backup files from the appliance

Can be logged into by authorized Windows users

Can be augmented with 3rd party H/W and S/W

Landing Zone:

Used as a holding place for data to be loaded

Can be logged into by authorized Windows users

Can be augmented with 3rd party H/W and S/W

Management node:

Runs the Windows domain controller (Active Directory)

Used for deploying patches to all nodes in the appliance

Holds images in case a node needs reimaging

Putting It All Together - PDW

Control Node

Failover Protection:

• Redundant Control Node

• Redundant Compute Node

• Cluster Failover

• Redundant Array of Inexpensive Databases

Spare Node

Software Architecture

[Diagram: client tools (Query Tool, Admin Console in Internet Explorer, MS BI (AS, RS), other 3rd party tools, DWSQL) connect over OLEDB, ODBC, ADO.Net, or JDBC. The Control Node hosts the MPP Engine, the Data Movement Service, IIS, and SQL Server (DW Authentication, DW Configuration, DW Schema, TempDB). Each Compute Node hosts SQL Server with the user data and a Data Movement Service. A Landing Zone Node is also shown.]


Create Database

CREATE DATABASE database_name
WITH ( AUTOGROW = ON
     , REPLICATED_SIZE = 1024
     , DISTRIBUTED_SIZE = 16384
     , LOG_SIZE = 300
     )

Date Dim: D_DATE_SK, D_DATE_ID, D_DATE, D_MONTH, …

Item: I_ITEM_SK, I_ITEM_ID, I_REC_START_DATE, I_ITEM_DESC, …

Store Sales: Ss_sold_date_sk, Ss_item_sk, Ss_customer_sk, Ss_cdemo_sk, Ss_store_sk, Ss_promo_sk, Ss_quantity, …

Promotion: P_PROMO_SK, P_PROMO_ID, P_START_DATE_SK, P_END_DATE_SK, …

Store: S_STORE_SK, S_STORE_ID, S_REC_START_DATE, S_REC_END_DATE, S_STORE_NAME, …

Customer: C_CUSTOMER_SK, C_CUSTOMER_ID, C_CURRENT_ADDR, …

Customer Demographics: CD_DEMO_SK, CD_GENDER, CD_MARITAL_STATUS, CD_EDUCATION, …

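Distribution and Replication

Database Distributed & Replicated Tables

Data Distribution with Replication

[Diagram: six nodes. Each node holds a full copy of the replicated tables C (Customer), I (Item), D (Date Dim), CD (Customer Demographics), S (Store), and P (Promotion), plus its own slice of the distributed table SS (Store Sales).]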
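
The placement described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not PDW's actual algorithm: the CRC32-modulo hash, the function names node_for and place_tables, and the sample rows are all assumptions made for the example. Only the idea comes from the slide: the fact table is split across nodes by a distribution key, and every node keeps a full copy of each dimension table.

```python
import zlib

NODE_COUNT = 6  # the slide's diagram shows six nodes


def node_for(key, node_count=NODE_COUNT):
    """Map a distribution-key value to a node with a stable hash.

    CRC32 modulo the node count is a stand-in; the real PDW hash is not
    described in the slides.
    """
    return zlib.crc32(str(key).encode("utf-8")) % node_count


def place_tables(fact_rows, dimension_rows, key, node_count=NODE_COUNT):
    """Give every node a slice of the fact table and a full dimension copy."""
    nodes = [{"fact": [], "dimension": list(dimension_rows)}
             for _ in range(node_count)]
    for row in fact_rows:
        nodes[node_for(row[key], node_count)]["fact"].append(row)
    return nodes


# Hypothetical sample data: 1000 sales rows over 50 items.
store_sales = [{"ss_item_sk": i % 50, "ss_quantity": 1} for i in range(1000)]
item = [{"i_item_sk": i} for i in range(50)]
nodes = place_tables(store_sales, item, key="ss_item_sk")
```

Because each node has the whole Item table, a join between its Store Sales slice and Item needs no data from any other node.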

Table Creation

CREATE TABLE table_name
    [ ( { <column_definition> } [ ,...n ] ) ]
    [ AS SELECT select_criteria ]
    [ WITH ( <table_option> ) ] [;]

<column_definition> ::= column_name <data_type> [ NULL | NOT NULL ]

<data_type> ::= type_name [ ( precision [ , scale ] ) ]

<table_option> ::= {
    [ CLUSTER_ON ( column_name [ ,...n ] ) ]
  , [ DISTRIBUTE_ON ( column_name ) ] | [ REPLICATE ]
  , [ PARTITION_ON column_name ( RANGE { LEFT | RIGHT } FOR VALUES ( [ boundary_value [,...n] ] ) ) ]
}

Type class      Types supported
Integers        tinyint, smallint, int, bigint
Floating point  float, real
Character       char, varchar, nchar, nvarchar
Date & time     date, time, datetime, datetime2, datetimeoffset, timestamp, smalldatetime
Fixed point     decimal, money, smallmoney
Binary          binary, varbinary (8192)
Other           uniqueidentifier (?)

Create Table – Behind the Scenes

Create Table store_sales with
    distribute_on (ss_item_sk)
    partition_on (ss_sold_date_sk)
    cluster_on (ss_sold_date_sk)

[Diagram: 8 filegroups (one per core), 1 table per filegroup. Each table has 12 partitions (on ss_sold_date_sk), each partition holds N 8K pages, and each page holds rows.]
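
To make the layout concrete, here is a rough Python sketch of how one row could be located inside a compute node. The figures 8 and 12 come from the slide; everything else is an assumption for illustration: the CRC32-modulo hash, the function name locate, the made-up boundary values, and the choice of RANGE RIGHT semantics.

```python
import bisect
import zlib

DISTRIBUTIONS_PER_NODE = 8  # "8 filegroups (one per core)" on the slide

# Hypothetical boundary values for ss_sold_date_sk.
# 11 boundaries split the key range into the slide's 12 partitions.
BOUNDARIES = [100 * m for m in range(1, 12)]


def locate(ss_item_sk, ss_sold_date_sk):
    """Return (distribution, partition) for a row on one compute node."""
    # distribute_on (ss_item_sk): hash the key to pick one of the 8 tables.
    distribution = zlib.crc32(str(ss_item_sk).encode("utf-8")) % DISTRIBUTIONS_PER_NODE
    # partition_on (ss_sold_date_sk): with RANGE RIGHT semantics a boundary
    # value belongs to the partition on its right.
    partition = bisect.bisect_right(BOUNDARIES, ss_sold_date_sk)
    return distribution, partition
```

For example, locate(42, 250) places the row in partition 2 of whichever distribution the hash of 42 selects. Within that partition, cluster_on (ss_sold_date_sk) then determines the row order on the 8K pages.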

Physical File Layout (Per Compute Node)

MPP Query Processing

Control Node

Query Rewritten Into Steps That Run Efficiently On Compute Nodes

ODBC/JDBC

SQL92 with Analytical Extensions

Distribution-incompatible Joins Resolved Using High Speed Dynamic Re-distribution

Select location, year, sum(b.sales_amt)
from customer a, sales b
where b.sales > 500 and a.custid = b.custid
group by 2, 1
order by 1, 2

MPP Execution Plans

The MPP engine creates parallel execution plans from client SQL

The plans can include the following types of operations:

SQL operations: used to pass SQL directly to SQL Server on 1 or more nodes.

DMS operations: used to move data among the nodes in an appliance for further processing.

Temp table operations: used to stage data for further processing.

Return operations: push data back to the client.

Simple plans may include just one type of operation.

Complex plans may include all of these operations.

Plans are executed serially, one step at a time.
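
The structure of such a plan can be modelled with a small Python sketch. The four operation types and the one-step-at-a-time rule are from the slide; the class names, the run_plan function, and the placeholder actions are hypothetical and exist only to show the control flow.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class OpType(Enum):
    SQL = "SQL"           # pass SQL directly to SQL Server on 1 or more nodes
    DMS = "DMS"           # move data among the nodes
    TEMP_TABLE = "TEMP"   # stage data for further processing
    RETURN = "RETURN"     # push data back to the client


@dataclass
class Step:
    op: OpType
    description: str
    action: Callable[[], None]  # placeholder for the real work


def run_plan(steps: List[Step]) -> List[str]:
    """Execute steps strictly one at a time and return a log of what ran."""
    log = []
    for number, step in enumerate(steps, start=1):
        step.action()  # the next step starts only after this call returns
        log.append(f"Step {number}: {step.op.value} - {step.description}")
    return log


plan = [
    Step(OpType.TEMP_TABLE, "create temp table on control node", lambda: None),
    Step(OpType.SQL, "aggregate on each compute node", lambda: None),
    Step(OpType.RETURN, "return final result to client", lambda: None),
]
log = run_plan(plan)
```

Serial here refers to the steps of a plan. A single SQL step can still run on many compute nodes at once, which is where the parallelism comes from.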

Example Schema

[Diagram: the Date Dim, Item, Store Sales, Promotion, Store, Customer, and Customer Demographics tables.]

Data Distribution with Replication

Sales table distributed on customer

... and partitioned by time

Distribution Compatible Query

SELECT CustomerId, SUM(Amount) AS TotalSales,

SUM(Quantity) AS TotalUnitsSold

FROM Sales s

JOIN Item i ON s.ItemId = i.ItemId

WHERE SaleDate BETWEEN '2009-08-01' AND '2009-08-31' AND Description LIKE '%gadgets%'

GROUP BY CustomerId

ORDER BY CustomerId;

MPP Query Plan

Step 1 – On each compute node:

SELECT s.[customerid], sum(s.[amount]) AS totalsales, sum(s.[quantity]) AS totalunitssold

FROM [tpch_3].[dbo].[h_sales_34] s JOIN [tpch_3].[dbo].item_37 i ON (s.[itemid] = i.[itemid])

WHERE (s.[saledate] BETWEEN '2009-08-01' AND '2009-08-31' and i.[description] like '%gadgets%')

GROUP BY s.[customerid]

ORDER BY s.[customerid];
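
Why one step is enough can be shown with a small simulation. The sketch below uses Python's sqlite3 module with one in-memory database per "compute node"; the node count, the modulo distribution on CustomerId, the helper make_node, and the sample rows are all assumptions for illustration. Sales is distributed on the customer and Item is replicated, so each customer's rows sit on exactly one node and the per-node results only need to be merged in order.

```python
import heapq
import sqlite3

NODE_COUNT = 3  # hypothetical appliance size


def make_node(sales_rows, item_rows):
    """One in-memory 'compute node': a Sales slice plus a full Item copy."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Sales (CustomerId INT, ItemId INT, "
               "Amount REAL, Quantity INT, SaleDate TEXT)")
    db.execute("CREATE TABLE Item (ItemId INT, Description TEXT)")
    db.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?)", sales_rows)
    db.executemany("INSERT INTO Item VALUES (?, ?)", item_rows)
    return db


NODE_QUERY = """
SELECT s.CustomerId, SUM(s.Amount) AS TotalSales, SUM(s.Quantity) AS TotalUnitsSold
FROM Sales s JOIN Item i ON s.ItemId = i.ItemId
WHERE s.SaleDate BETWEEN '2009-08-01' AND '2009-08-31'
  AND i.Description LIKE '%gadgets%'
GROUP BY s.CustomerId
ORDER BY s.CustomerId
"""

items = [(1, "blue gadgets"), (2, "widgets")]
sales = [
    (10, 1, 5.0, 1, "2009-08-03"),
    (11, 1, 7.5, 2, "2009-08-10"),
    (10, 1, 2.5, 1, "2009-08-20"),
    (12, 2, 9.0, 3, "2009-08-05"),  # filtered out: not a gadget
    (11, 1, 4.0, 1, "2009-09-01"),  # filtered out: outside the date range
]

# Distribute Sales on CustomerId (modulo stands in for the real hash).
slices = [[] for _ in range(NODE_COUNT)]
for row in sales:
    slices[row[0] % NODE_COUNT].append(row)
nodes = [make_node(rows, items) for rows in slices]

# The same query runs on every node; no customer appears on two nodes,
# so the sorted per-node results are simply merged.
per_node = [db.execute(NODE_QUERY).fetchall() for db in nodes]
result = list(heapq.merge(*per_node))
```

The merged result is the same as running the query against a single database holding all rows, without any intermediate table or second aggregation.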

Query 1 Processing Flow

[Diagram: the Query Tool sends the query to the Control Node, where the MPP Engine runs these stages: Parse SQL, Validate & Authorize, Build MPP Plan, Execute Plan, Return Data to Client. The Control Node also hosts SQL Server (DW Authentication, DW Configuration, DW Schema, TempDB) and the Data Movement Service. Compute Nodes 1 to N each run SQL Server over their user data.]

Reshuffling the data

SELECT SaleDate, SUM(Amount) AS TotalSales,

SUM(Quantity) AS TotalUnitsSold

FROM Sales s JOIN Item i ON s.ItemId = i.ItemId

WHERE SaleDate BETWEEN '2009-08-01' AND '2009-08-31' AND Description LIKE '%gadgets%'

GROUP BY SaleDate

ORDER BY SaleDate;

MPP Query Plan

Step 1 – Create temp table on control node

CREATE TABLE [tempdb].[dbo].Q_[TEMP_ID_6760]

( saledate DATE, totalsales DECIMAL(38, 2), totalunitssold INTEGER )

WITH (DATA_COMPRESSION = PAGE);

Step 2 – Run on each compute node

SELECT s.[saledate], sum(s.[amount]) AS totalsales, sum(s.[quantity]) AS totalunitssold

FROM [tpch_3].[dbo].[h_sales_34] s JOIN [tpch_3].[dbo].item_37 i ON (s.[itemid] = i.[itemid])

WHERE (s.[saledate] BETWEEN '2009-08-01' AND '2009-08-31' and i.[description] like '%gadgets%')

GROUP BY s.[saledate]

MPP Query Plan continued

Step 3:

SELECT [saledate], sum([totalsales]) AS totalsales, sum([totalunitssold]) AS totalunitssold

FROM [tempdb].[dbo].Q_[TEMP_ID_6760]

GROUP BY [saledate]

ORDER BY [saledate]

Step 4:

DROP TABLE [tempdb].[dbo].Q_[TEMP_ID_6760];
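
The four steps amount to a two-phase aggregation, which the following simulation walks through. It again uses Python's sqlite3 module with one in-memory database per node plus one for the control node. The node count, the modulo distribution, the table name Q_TEMP, the helper make_node, and the sample data are assumptions for illustration. Because Sales is distributed on the customer, one SaleDate can appear on several nodes, so the partial sums must be combined on the control node.

```python
import sqlite3

NODE_COUNT = 3  # hypothetical appliance size


def make_node(sales_rows, item_rows):
    """One in-memory 'compute node': a Sales slice plus a full Item copy."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Sales (CustomerId INT, ItemId INT, "
               "Amount REAL, Quantity INT, SaleDate TEXT)")
    db.execute("CREATE TABLE Item (ItemId INT, Description TEXT)")
    db.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?)", sales_rows)
    db.executemany("INSERT INTO Item VALUES (?, ?)", item_rows)
    return db


items = [(1, "blue gadgets"), (2, "widgets")]
sales = [
    (10, 1, 5.0, 1, "2009-08-03"),
    (11, 1, 7.5, 2, "2009-08-03"),  # same date, different customer and node
    (12, 1, 2.5, 1, "2009-08-20"),
    (12, 2, 9.0, 3, "2009-08-03"),  # filtered out: not a gadget
]
slices = [[] for _ in range(NODE_COUNT)]
for row in sales:
    slices[row[0] % NODE_COUNT].append(row)  # modulo stands in for the hash
nodes = [make_node(rows, items) for rows in slices]
control = sqlite3.connect(":memory:")

# Step 1: create the temp table on the control node.
control.execute("CREATE TABLE Q_TEMP (saledate TEXT, totalsales REAL, "
                "totalunitssold INT)")

# Step 2: each compute node computes partial aggregates for its slice.
PARTIAL_QUERY = """
SELECT s.SaleDate, SUM(s.Amount), SUM(s.Quantity)
FROM Sales s JOIN Item i ON s.ItemId = i.ItemId
WHERE s.SaleDate BETWEEN '2009-08-01' AND '2009-08-31'
  AND i.Description LIKE '%gadgets%'
GROUP BY s.SaleDate
"""
partial_row_count = 0
for db in nodes:
    partial = db.execute(PARTIAL_QUERY).fetchall()
    control.executemany("INSERT INTO Q_TEMP VALUES (?, ?, ?)", partial)
    partial_row_count += len(partial)

# Step 3: combine the partial aggregates into the final answer.
result = control.execute(
    "SELECT saledate, SUM(totalsales), SUM(totalunitssold) "
    "FROM Q_TEMP GROUP BY saledate ORDER BY saledate").fetchall()

# Step 4: drop the temp table.
control.execute("DROP TABLE Q_TEMP")
```

Summing partial sums is valid because SUM is decomposable. An aggregate such as AVG would need the partial step to carry both a sum and a count.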

Reshuffling – Query Processing Flow

[Diagram: the Query Tool sends the query to the Control Node, where the MPP Engine runs these stages: Parse SQL, Validate & Authorize, Build MPP Plan, Execute Plan, Return Data to Client. The Control Node also hosts SQL Server (DW Authentication, DW Configuration, DW Schema, TempDB) and the Data Movement Service. Each Compute Node runs SQL Server over its user data.]

Data Loading

[Diagram: text files on the Landing Zone Node are loaded into the appliance; the Control Node and Spare Node are also shown.]

Tables Are Hash Distributed Or Replicated

Data Loader Process

Load File -> Bulk Insert -> Partitioned Staging Table (Heap) -> Insert-Select -> Partitioned Final Table (CIDX)

Bulk Insert: sort each BATCH in memory or TempDB

Insert-Select: sort each partition in memory or TempDB

Bulk Insert Phase

Trace Flags: None
BATCHSIZE: Calculated
TABLOCK: ON
TempDB: Entire BATCHSIZE for sort
TempDB Log: Minimal
StageDB Log: Minimal
ROLLBACK: Commits per BATCHSIZE; rollback to last BATCH only

Insert-Select Phase

Trace Flags: 610 per NUMA session
MAXDOP: 1 per NUMA session
TABLOCK: OFF
TempDB: Entire PARTITION for sort
TempDB Log: Minimal
UserDB Log: Twice data file size
ROLLBACK: Commits full TRANSACTION; rollback full TRANSACTION
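
The two phases differ mainly in what gets sorted and what gets committed together. The Python sketch below models only that difference. The function names, the in-memory lists standing in for tables, and the sample rows are hypothetical; nothing here reflects the actual loader implementation.

```python
from collections import defaultdict


def bulk_insert(rows, batch_size, partition_of, sort_key):
    """Phase 1: load batches into a partitioned staging heap.

    Each batch is sorted and committed on its own, so a failure would
    roll back only the batch in flight.
    """
    staging = defaultdict(list)
    committed_batches = 0
    for start in range(0, len(rows), batch_size):
        batch = sorted(rows[start:start + batch_size], key=sort_key)
        for row in batch:
            staging[partition_of(row)].append(row)
        committed_batches += 1
    return staging, committed_batches


def insert_select(staging, sort_key):
    """Phase 2: sort each whole partition and build the final table.

    The result is produced all at once, mirroring the single full
    transaction on the slide.
    """
    return {p: sorted(rows, key=sort_key) for p, rows in staging.items()}


# Hypothetical rows: (date key, payload); partitions cover 5 date keys each.
rows = [(d, "row-%d" % d) for d in [5, 1, 9, 3, 8, 2, 7]]
staging, committed_batches = bulk_insert(
    rows, batch_size=3,
    partition_of=lambda r: r[0] // 5,
    sort_key=lambda r: r[0])
final = insert_select(staging, sort_key=lambda r: r[0])
```

After phase 1 a staging partition is only sorted within each batch, which is why phase 2 needs sort space for an entire partition while phase 1 needs it for one batch.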

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
