
Richard Tkachuk Senior Program Manager Microsoft DAT301.

Date post: 23-Dec-2015

A First Look at Large-Scale Data Warehousing in Microsoft SQL Server Code Name "Madison"

Richard Tkachuk
Senior Program Manager
Microsoft
DAT301


Agenda

Concepts and Principles
Madison functional overview
Early adoption


Symmetric Multiprocessing

Single DB instance
“Shared Everything” architecture
Servers/CPUs share:
memory
disks

Can lead to resource contention as you scale

SMP


Massively Parallel Processing

Servers/CPUs have their own dedicated resources
“Shared Nothing” architecture
“Secret sauce” is parallelizing operations
Lightning-fast queries, data loads and updates
Linear scalability

Problem needs to be partitionable

MPP
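
The requirement that the problem be partitionable can be illustrated with a minimal sketch, assuming rows are assigned to nodes by hashing a distribution key. The node count, the `node_for_key` helper and the sample rows are hypothetical and not taken from the talk. Each node aggregates only the rows it owns, and the partial results are then combined.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical number of shared-nothing nodes


def node_for_key(key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a distribution key to a node with a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, key_field):
    """Split rows into per-node lists; each node owns its rows exclusively."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for_key(row[key_field])].append(row)
    return nodes


def parallel_sum(nodes, value_field):
    """Each node sums its local rows; the partial sums are then combined."""
    partials = [sum(r[value_field] for r in rows) for rows in nodes.values()]
    return sum(partials)


# Made-up sample data for illustration.
rows = [{"customer": f"c{i}", "amount": i} for i in range(100)]
nodes = distribute(rows, "customer")
total = parallel_sum(nodes, "amount")
```

A sum partitions cleanly because partial sums can be added in any order. Work that needs rows from several nodes at once is where data movement between nodes comes in.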


SMP vs MPP

SMP

HW advancements increasing ability to scale up
Scaling is limited
High-end SMP very expensive
Extremely high concurrency for some workloads
With less than 1-2 TB of data, SMP will almost always be better
Full SQL Server functionality
HA must be architected in

MPP

HW advancements increasing ability to scale up and scale out
Scaling to 1 PB+
Scale-out is relatively low cost
Relatively high concurrency for complex workloads
> 2 TB up to 1 PB
Limited SQL Server functionality
HA is built in


Sequential I/O

Sequential I/O

Ideal for data warehousing
Scalable, predictable performance
Large reads & writes
Requires 1/3 or fewer drives for same performance

Random I/O

Ideal for OLTP
Not as predictable & scalable for data warehousing
Small reads and writes
Requires large number of drives

Best practices focus on preserving the sequential order of data


About DATAllegro…

Industry Standard Networking

Proprietary Appliance Management and MPP Database

Industry Standard Storage

Open Source Database and OS

Technology Partners

Industry Standard Servers


Integration Plans

Provide scale out through MPP on SQL Server and Windows
Offer ‘Appliance like’ user experience to Data Warehouse customers
Lower TCO to high end Data Warehousing
Offer integrated BI platform to small and very large Enterprises

OPEN SOURCE DATABASE & OS

Microsoft BI

Reference Hardware Platforms

Industry Standard Servers

Industry Standard Networking

Industry Standard Storage


Balanced Across All Components

A Holistic Approach

[Figure: SQL Server 2008 Potential Performance Bottlenecks. Data path from disks and LUNs through the storage controller (cache) and FC switch to the FC HBAs in the server (cache, Windows, SQL Server, CPU cores). Rates labeled along the path: Disk Feed Rate, LUN Read Rate, SP Port Rate, Switch Port Rate, HBA Port Rate, SQL Server Read Ahead Rate, CPU Feed Rate.]


Sequential I/O

Physical table structures, file layouts and SQL Server settings to maximize sequential I/O
Enough disks to feed available CPU cores
Carefully designed storage infrastructure to maximize and sustain sequential I/O

No bottlenecks
Where possible, separate I/O paths and disks for data, TempDB and logs
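
The rule "enough disks to feed available CPU cores" reduces to simple arithmetic once two rates are known. The sketch below is only an illustration: the per-core consumption rate and the per-disk sequential read rate are placeholder assumptions, not figures from the talk, and should be replaced with measured values.

```python
import math


def disks_needed(cores: int, core_mb_s: float, disk_mb_s: float) -> int:
    """Smallest number of disks whose combined sequential read rate
    meets or exceeds what the CPU cores can consume."""
    required_mb_s = cores * core_mb_s
    return math.ceil(required_mb_s / disk_mb_s)


# Hypothetical figures, for illustration only (not from the talk):
CORES = 8
CORE_MB_S = 200.0  # assumed scan rate one core can consume
DISK_MB_S = 100.0  # assumed sustained sequential read rate per disk

n_disks = disks_needed(CORES, CORE_MB_S, DISK_MB_S)
```

The same comparison applies to every hop between disk and CPU (LUN, storage processor port, switch port, HBA port): the slowest stage sets the rate for the whole path.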


Accelerate scalable Data Warehouse deployments at lower TCO

Pre-configured, pre-tested HW reference architectures (4-32 TB)

SI Solution Templates

Fast Track DW

Appliance-like time to value
Flexibility through choice of HW platforms
Low TCO through commodity hardware and value pricing
Reduced risk through pre-tested and pre-tuned configurations
Provides a clear upgrade path to “Madison” via Hub/Spoke


MPP Additional Considerations

Principles & approach of SMP carry forward
Deeper level of complexity:

High availability
Parallelization
Inter-node data movement


Modular building blocks
Balanced CPU and storage

Both SMP and MPP are based on building blocks that scale by the CPU core
Adds network, storage processing and disk bandwidth for each core
Based on maximizing & sustaining true sequential I/O while minimizing disks

Generally changes balance of systems so more can be spent on CPU and SW than on storage to give better overall performance for a given budget
Building blocks can be adjusted for multiple MPP configurations: high performance, archive and extreme performance


The Future of SQL Server Data Warehousing Project "Madison"

Build on Proven Scale for SQL Server Data Warehousing
Predictable Scale out through MPP
Customers with over 400 TB data warehouses

Accelerate plan to support largest Data Warehouses
Provide Massive Scale with Low TCO
Integrated with Microsoft BI


SQL Server MPP: 10,000-foot view

Appliance-like model

Hardware and software in unison and in balance

No bottlenecks

Achieve max performance per component
For each HW component and each SW module:

Define max performance
Identify optimum workload type

Adjust surrounding HW/SW to achieve optimum

Packages engineering talent

Lots of knowledge, many hours of tuning, trying, testing

Hardware Software


Commodity Hardware

Lower cost
Frequent performance improvements
Easier upgrade and maintenance
Higher customer comfort
Better compatibility


Madison MPP Data Warehouse Architecture

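[Figure: appliance architecture. Client Drivers and the ETL Load Interface connect over the Corporate Network to the Control Node (Active/Passive), Landing Zone, Configuration & Monitoring node and Backup node; the Backup node also connects to the Corporate Backup Solution. A Private Network links these to the Compute Nodes (each running SQL) and a Spare Node, managed by Microsoft Cluster Server, with the distributed DB held on Industry Standard SAN Storage.]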


Ultra Shared Nothing

An extension of traditional shared nothing design

Push shared nothing architecture into SMP node
IO and CPU affinity within SMP nodes

Eliminate contention per user query
Use full resources for each user query

Multiple physical instances of tables
Distribute large tables
Replicate small tables
Distribute AND Replicate medium tables

Re-Distribute rows “on-the-fly” when necessary
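
The difference between distributed, replicated and re-distributed tables can be sketched as follows. This is a toy model under stated assumptions, not Madison's implementation: the node count, the hash function, and the `sales`, `stores` and `customers` tables are all made up for illustration.

```python
import zlib

NUM_NODES = 3  # hypothetical appliance size


def node_of(key) -> int:
    """Stable hash of a key value onto a node number."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_NODES


def distribute(rows, key):
    """Place each row on exactly one node, chosen by hashing `key`."""
    nodes = [[] for _ in range(NUM_NODES)]
    for row in rows:
        nodes[node_of(row[key])].append(row)
    return nodes


def replicate(rows):
    """Keep a full copy of a small table on every node."""
    return [list(rows) for _ in range(NUM_NODES)]


def local_join(left_nodes, right_nodes, key):
    """Join node by node; no row crosses a node boundary."""
    out = []
    for left, right in zip(left_nodes, right_nodes):
        index = {}
        for rrow in right:
            index.setdefault(rrow[key], []).append(rrow)
        for lrow in left:
            for rrow in index.get(lrow[key], []):
                out.append({**lrow, **rrow})
    return out


sales = [{"sale_id": i, "store_id": i % 5, "cust_id": i % 7} for i in range(50)]
stores = [{"store_id": s, "region": f"r{s % 2}"} for s in range(5)]
customers = [{"cust_id": c, "name": f"n{c}"} for c in range(7)]

sales_by_store = distribute(sales, "store_id")

# Small table: replicated, so the join needs no data movement.
joined_small = local_join(sales_by_store, replicate(stores), "store_id")

# Two distributed tables keyed differently: re-distribute one side
# on the join key first, then join locally.
flat_sales = [row for node in sales_by_store for row in node]
sales_by_cust = distribute(flat_sales, "cust_id")
joined_redistributed = local_join(
    sales_by_cust, distribute(customers, "cust_id"), "cust_id"
)
```

Replication trades storage for the ability to join without moving rows. Re-distribution pays a data movement cost at query time instead, which is why it suits tables too large to copy to every node.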


Control Node & Client Drivers

Client connections always go through the control node

Clustered to a passive node
Processes SQL requests
Prepares execution plan
Orchestrates distributed execution
Local SQL Server to do final query plan processing / result aggregation

Will use same set of drivers used by DATAllegro

Provided by DataDirect
ODBC, OLE DB, JDBC and ADO.NET client drivers
Wire protocol (SequeLink)

Available drivers for 32 and 64 bits
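
The control node's role (fan a step out to the compute nodes, then do the final aggregation locally) can be sketched as a scatter-gather. This is an illustrative model only: the thread pool stands in for the network, and `NODE_DATA`, `node_partial` and `control_node_avg` are hypothetical names, not part of any Madison interface.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node data: each list stands in for one compute node's rows.
NODE_DATA = [
    [10.0, 20.0, 30.0],
    [40.0],
    [50.0, 60.0],
]


def node_partial(values):
    """Work done on a compute node: a partial (sum, count) over local rows."""
    return sum(values), len(values)


def control_node_avg(node_data):
    """Work done on the control node: fan the step out, then combine.

    An average cannot be combined by averaging per-node averages, so the
    plan asks each node for SUM and COUNT and divides centrally.
    """
    with ThreadPoolExecutor(max_workers=len(node_data)) as pool:
        partials = list(pool.map(node_partial, node_data))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


avg = control_node_avg(NODE_DATA)
```

Averaging the three per-node averages here would give a different (wrong) answer because the nodes hold different row counts, which is why the final step needs both partials.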


Compute Nodes

A SQL Server 2008 instance
DB engine nodes autonomous on local data
SQL as primary interface
Each MPP node is a highly tuned SMP node with standard interfaces


Landing Zone

Provides high capacity storage for data files from ETL processes
Integration Services available on the landing zone
Connected to internal network
Available as sandbox for other applications and scripts that run on internal network

Source Landing Zone Files

Data Loader

Compute Nodes


Backup Node

Builds on SQL Server native backup/restore facility

Use VDI interface to plug into backup pipeline
Database-level backup

Coordinated backup across the nodes
Quiesce write activity to synchronize
Can only restore to another appliance with exactly the same number of distributions


Configuration and Monitoring

Madison services instrumented

Logs and Performance Counters

Capture and forward SNMP alerts from devices within the appliance
Small subset of DMVs to union underlying node DMVs
Leverage HPC for monitoring

Challenge: Is it an appliance or a collection of nodes?


High Availability

Multiple levels of redundancy:

• Leveraging MSCS for node availability
• Cluster aware services:
• SQL Server, Madison, DMS
• Leveraging MSCS for SQL Services, DMS
• 1 spare node for every 6* compute nodes

6x1
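
The spare-node ratio can be turned into a small sizing and failover sketch. Rounding up for a partial group of compute nodes is an assumption made here, as are the helper names; the slide itself only quotes the 6-to-1 ratio (with an asterisk).

```python
import math

COMPUTE_PER_SPARE = 6  # ratio quoted on the slide (marked with an asterisk there)


def spares_for(compute_nodes: int, per_spare: int = COMPUTE_PER_SPARE) -> int:
    """Spare nodes needed if every group of up to `per_spare` compute
    nodes gets one spare (rounding up for a partial group is an assumption)."""
    if compute_nodes <= 0:
        return 0
    return math.ceil(compute_nodes / per_spare)


def fail_over(active, spares, failed):
    """Replace a failed node with the first available spare, if any.

    Returns (new_active, remaining_spares, succeeded).
    """
    if failed not in active or not spares:
        return active, spares, False
    replacement = spares[0]
    new_active = [replacement if n == failed else n for n in active]
    return new_active, spares[1:], True
```

For example, `spares_for(12)` gives 2 under these assumptions, and `fail_over(["c1", "c2"], ["s1"], "c2")` swaps `s1` in for `c2`.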


Security and Encryption

Retain DA v3 design
Authentication and authorization done by Madison server
Users and Roles as first class principals
Nested role capabilities
Connection to SQL back-ends through high privilege account
SQL nodes reside on private network

No support for integrated auth
Leverages TDE to expose DB-level encryption

Supports key rotation


The Logical Data Model

Multiple databases per appliance
Each user database maps to one SQL Server db per node

Tables
Replicated, Distributed, Replicated + Distributed
Leverage SQL Server compression
Supports partitioning
Supports secondary indexes

Views


Data Types

Most scalar data types supported by SQL Server 2008 are supported by Madison
Main exceptions:

Character and binary strings limited to 8K (i.e. no BLOB support)
XML
sql_variant
System and CLR UDTs

Latin1_General with binary comparison only

SQL Server data type               DAv3   Madison
bigint                             Yes    Yes
binary                             -      -
bit                                -      Yes
char / nchar                       Yes    Yes
date, time                         -      Yes
datetime (was date in DA)          Yes    Yes
datetime2                          -      Yes
datetimeoffset                     -      Yes
decimal                            Yes    Yes
float                              Yes    Yes
geometry / geography               -      -
hierarchyid                        -      -
int (was integer in DA)            Yes    Yes
money                              -      Yes
real                               -      Yes
smalldatetime                      -      Yes
smallint                           Yes    Yes
smallmoney                         -      Yes
sql_variant                        -      -
text / ntext / image               -      -
timestamp                          -      -
tinyint                            Yes    Yes
varchar / nvarchar / varbinary     Yes    Yes
v*(max)                            -      -
uniqueidentifier                   -      -
xml                                -      -


Supported SQL Syntax

Aligned with ANSI SQL 92
Basic INSERT, UPDATE, DELETE, SELECT

CREATE TABLE AS SELECT

Limited analytical function support
Teradata extensions

Quantile, Sample,…
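
For readers unfamiliar with CREATE TABLE AS SELECT, the sketch below shows the statement's general shape. SQLite (through Python's standard `sqlite3` module) is used purely as a stand-in engine, and the table and column names are invented; Madison's own CTAS syntax and options may differ.

```python
import sqlite3

# SQLite is only a stand-in here; table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("east", 10), ("east", 15), ("west", 7)],
)

# CREATE TABLE AS SELECT: define and populate a table from a query in one step.
conn.execute(
    "CREATE TABLE sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)

summary = dict(conn.execute("SELECT region, total FROM sales_by_region"))
conn.close()
```

The new table takes its columns from the query's select list, so no separate column definition is written.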


Manageability

Web-based main administrative user interface
Based on DATAllegro manageability UI
Monitoring system health and activity

Leveraging HPC Pack 2008
Systems management
Monitoring
Cluster health


Query Tools

GUI tool:
Nexus (CoffingDW)
Table & view object explorer
Interactive query execution

Command line tool:
Replacement for DA-SQL
Flavor of SqlCmd


Tools Walkthrough
Demo


MS BI Integration

Integration Services
Madison enabled as a source

Data movement, lookup operations, etc.
Will add a new SSIS destination

Ensure integrated high performance loads

Reporting Services
Fully supported, including parameterized queries
Will customize experience for Report Builder and Report Designer

Analysis Services
Will get connectivity through OLE DB provider
Will enable both MOLAP and ROLAP storage


Madison - Hub & Spoke

Each business unit has own Data Marts
More responsive to business needs
Fits budget realities

Hub provides centralized data governance platform

[Figure: hub-and-spoke layout. A Madison hub connected to a Madison spoke, two SQL Server DM spokes and a SQL Server AS spoke, labeled with the business units HR, Finance, Sales and Manufacturing.]

Node-to-node data movement
Parallel over InfiniBand or 10 Gig networks
~500 GB per min with minimal overhead


Benefits of Hub-And-Spoke

All systems connect via a dedicated high speed network
Parallel database copy: speeds of up to 500 GB per min
Simplification of data mart ETL / ELT processes with publishing model
Separation of management and user workloads
Integration of SMP SS08 and MPP systems
Ability to independently expand any system
Ability to add additional spokes without impacting other users
Deployment of development and test environments that leverage parallel connectivity
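
To put the quoted copy rate in perspective, a back-of-the-envelope estimate follows. The 10 TB data mart size is a hypothetical example, and the calculation assumes the peak rate of 500 GB per minute is sustained for the whole copy, which the slide does not claim.

```python
def copy_minutes(size_gb: float, rate_gb_per_min: float = 500.0) -> float:
    """Minutes to copy `size_gb` at a sustained rate.

    The default rate is the "up to 500 GB per min" figure from the slide.
    """
    if rate_gb_per_min <= 0:
        raise ValueError("rate must be positive")
    return size_gb / rate_gb_per_min


# Hypothetical 10 TB data mart, taking 1 TB = 1000 GB.
minutes = copy_minutes(10 * 1000)
```

At a lower sustained rate the time scales inversely, so halving the rate doubles the copy window.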


Early Adoption

MTP – Madison Technology Preview
Our flavor of CTP
Assess product and field/partners readiness
Provide roadmap for competitive situations
Location: MTCs, Partners, other MS facilities, …
Working with partners to secure hardware
2-3 week engagements

TAP – Technology Adoption Program
Closer to traditional TAP
Assess production readiness
Longer engagement
Go-live requirements
Customer secures hardware


High Level Release Definitions

“Madison” (aka v1)
Focus on time to market
Compatibility with DATAllegro v3
MS BI integration
H1 2010

V2+
Closer functional alignment with SQL Server
Better integration with SQL and MS ecosystem, tools and technologies

Will start running MTPs in the summer


Recap

Data Warehousing Reference Architectures available today!
SQL Server Fast Track

SQL Server “Madison”
Built for advanced, large scale data warehouses
Shared-nothing MPP architecture

Early evaluation programs starting soon

All feedback welcome: [email protected]

Thank you!


question & answer


SQL Server Community Resources

Become a FREE PASS Member: www.sqlpass.org/RegisterforSQLPASS.aspx
Learn more about the PASS organization: www.sqlpass.org/

Additional Community Resources
SQL Server Community Center: www.microsoft.com/sqlserver/2008/en/us/community-center.aspx
TechNet Community for IT Professionals: http://technet.microsoft.com/en-us/sqlserver/bb671048.aspx
Developer Center: http://msdn.microsoft.com/en-us/sqlserver/bb671064.aspx
SQL Server 2008 Learning Portal: http://www.microsoft.com/learning/sql/2008/default.mspx

• Connect: Local Chapters, Special Interest Groups, Online Community
• Share: PASSPort Social Networking, Community Connection Event
• Learn: PASS Summit Annual Conference, Technical Articles, Webcasts

The Professional Association for SQL Server (PASS) is an independent, not-for-profit association, dedicated to supporting, educating, and promoting the Microsoft SQL Server community.


SQL Server Word of the Day

Data Compression

Monday, May 11

*Game cards may be picked up at the SQL Server booths in the TLC


Resources

www.microsoft.com/teched Sessions On-Demand & Community

http://microsoft.com/technet Resources for IT Professionals

http://microsoft.com/msdn Resources for Developers

www.microsoft.com/learning Microsoft Certification & Training Resources


Complete an evaluation on CommNet and enter to win!


© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

