Posted: 23-Dec-2015 | Uploaded by: sheila-whitehead
A First Look at Large-Scale Data Warehousing in Microsoft SQL Server Code Name "Madison"
Richard Tkachuk
Senior Program Manager, Microsoft
DAT301
Agenda
• Concepts and principles
• Madison functional overview
• Early adoption
Symmetric Multiprocessing (SMP)
• Single DB instance
• “Shared everything” architecture: servers/CPUs share memory and disks
• Can lead to resource contention as you scale
Massively Parallel Processing (MPP)
• Servers/CPUs have their own dedicated resources
• “Shared nothing” architecture
• The “secret sauce” is parallelizing operations
• Lightning-fast queries, data loads and updates
• Linear scalability
• The problem needs to be partitionable
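The “partitionable” requirement is easiest to see in a small sketch. The Python below is a hypothetical illustration, not Madison code: it hash-partitions rows across simulated nodes, lets each node aggregate only its own slice, and then merges the partial results. Threads stand in for nodes (in CPython they will not speed up CPU-bound work), and the names `partition`, `local_sum` and `parallel_group_sum` are made up for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_nodes, key):
    """Hash-partition rows so each simulated node owns a disjoint slice."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key]) % num_nodes].append(row)
    return parts


def local_sum(rows, key, value):
    """Aggregate only the rows this node owns; no shared state is touched."""
    totals = Counter()
    for row in rows:
        totals[row[key]] += row[value]
    return totals


def parallel_group_sum(rows, num_nodes, key, value):
    """Partition on the grouping key, aggregate per node, then merge the partials."""
    parts = partition(rows, num_nodes, key)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(lambda part: local_sum(part, key, value), parts))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values instead of replacing them
    return dict(merged)


sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
]
print(parallel_group_sum(sales, num_nodes=4, key="region", value="amount"))
```

Because rows are partitioned on the grouping key, each group lives on exactly one node and the merge step is effectively a union; partitioning on another column would still give the right totals, but only because the merge adds partials together.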
SMP vs MPP
SMP
• Hardware advancements are increasing the ability to scale up
• Scaling is limited; high-end SMP is very expensive
• Extremely high concurrency for some workloads
• With less than 1-2 TB of data, SMP will almost always be better
• Full SQL Server functionality
• High availability (HA) must be architected in
MPP
• Hardware advancements are increasing the ability to scale up and scale out
• Scales to 1 PB+; scaling out is relatively low cost
• Relatively high concurrency for complex workloads
• Suited to more than 2 TB, up to 1 PB
• Limited SQL Server functionality
• HA is built in
Sequential I/O
Sequential I/O:
• Ideal for data warehousing
• Scalable, predictable performance
• Large reads and writes
• Requires 1/3 or fewer drives for the same performance
Random I/O:
• Ideal for OLTP
• Less predictable and scalable for data warehousing
• Small reads and writes
• Requires a large number of drives
Best practices focus on preserving the sequential order of data.
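To make the “1/3 or fewer drives” claim concrete, here is a back-of-the-envelope sketch. The per-drive rates and the target scan rate are hypothetical values chosen only to reflect a roughly 3x gap between sequential and random throughput; real sizing should use measured figures.

```python
import math

# Hypothetical per-drive throughput figures (MB/s), picked only to illustrate
# the roughly 3x sequential/random gap described above; real drives vary widely.
SEQUENTIAL_MB_S = 90
RANDOM_MB_S = 30


def drives_needed(target_mb_s, per_drive_mb_s):
    """Smallest whole number of drives whose combined rate meets the target."""
    return math.ceil(target_mb_s / per_drive_mb_s)


target = 1800  # hypothetical scan rate the CPUs can consume, in MB/s
print("sequential:", drives_needed(target, SEQUENTIAL_MB_S))  # 20
print("random:", drives_needed(target, RANDOM_MB_S))          # 60
```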
About DATAllegro…
Industry Standard Networking
Proprietary Appliance Management and MPP Database
Industry Standard Storage
Open Source Database and OS
Technology Partners
Industry Standard Servers
Integration plans:
• Provide scale out through MPP on SQL Server and Windows
• Offer an ‘appliance-like’ user experience to data warehouse customers
• Lower the TCO of high-end data warehousing
• Offer an integrated BI platform to small and very large enterprises
Open Source Database & OS
Microsoft BI
Reference Hardware Platforms
Industry Standard Servers
Industry Standard Networking
Industry Standard Storage
Balanced Across All Components
A Holistic Approach
SQL Server 2008 Potential Performance Bottlenecks
[Diagram: the I/O path from CPU cores (Windows, SQL Server, server cache) through dual-port FC HBAs, an FC switch and a cached storage controller to LUNs and disks. Rates labeled along the path: CPU feed rate, SQL Server read-ahead rate, HBA port rate, switch port rate, SP port rate, LUN read rate, disk feed rate.]
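The point of the diagram is that a serial I/O path runs no faster than its slowest stage. A minimal sketch, assuming made-up sustained rates for each stage named on the slide, finds that stage and shows how much capacity elsewhere goes unused:

```python
# Hypothetical sustained rates (MB/s) for each stage of the I/O path named on
# the slide; the numbers are invented for illustration.
io_path = {
    "sql_read_ahead": 2000,
    "cpu_feed": 1600,
    "hba_port": 700,
    "switch_port": 800,
    "sp_port": 1000,
    "lun_read": 1200,
    "disk_feed": 1400,
}


def find_bottleneck(rates_mb_s):
    """A serial pipeline can sustain no more than its slowest stage."""
    stage = min(rates_mb_s, key=rates_mb_s.get)
    return stage, rates_mb_s[stage]


def headroom(rates_mb_s):
    """Capacity each stage has beyond the bottleneck: paid for but unused."""
    _, cap = find_bottleneck(rates_mb_s)
    return {stage: rate - cap for stage, rate in rates_mb_s.items()}


print(find_bottleneck(io_path))  # ('hba_port', 700)
print(headroom(io_path))
```

In this made-up example, adding disks or LUN bandwidth would not raise throughput until the HBA ports are widened, which is why the slide stresses balance across every component.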
Sequential I/O
• Physical table structures, file layouts and SQL Server settings that maximize sequential I/O
• Enough disks to feed the available CPU cores
• Carefully designed storage infrastructure to maximize and sustain sequential I/O
No bottlenecks
• Where possible, separate I/O paths and disks for data, TempDB and logs
Fast Track DW
Accelerate scalable data warehouse deployments at lower TCO
• Pre-configured, pre-tested HW reference architectures (4-32 TB)
• SI solution templates
• Appliance-like time to value
• Flexibility through choice of HW platforms
• Low TCO through commodity hardware and value pricing
• Reduced risk through pre-tested and pre-tuned configurations
• Provides a clear upgrade path to “Madison” via hub and spoke
MPP Additional Considerations
• The principles and approach of SMP carry forward
• Deeper level of complexity: high availability, parallelization, inter-node data movement
Modular building blocks, balanced CPU and storage:
• Both SMP and MPP are based on building blocks that scale by the CPU core
• Each core adds network, storage processing and disk bandwidth
• Based on maximizing and sustaining true sequential I/O while minimizing disks
• Generally changes the balance of systems so that more can be spent on CPU and software than on storage, giving better overall performance for a given budget
• Building blocks can be adjusted for multiple MPP configurations: high performance, archive and extreme performance
The Future of SQL Server Data Warehousing Project "Madison"
• Build on proven scale for SQL Server data warehousing
• Predictable scale out through MPP
• Customers with over 400 TB data warehouses
• Accelerate the plan to support the largest data warehouses
• Provide massive scale with low TCO
• Integrated with Microsoft BI
SQL Server MPP: 10,000-foot view
Appliance-like model
• Hardware and software in unison and in balance: no bottlenecks
• Achieve maximum performance per component. For each HW component and each SW module:
  • Define maximum performance
  • Identify the optimum workload type
  • Adjust surrounding HW/SW to achieve the optimum
• Packages engineering talent: lots of knowledge, many hours of tuning, trying and testing
Commodity hardware
• Lower cost
• Frequent performance improvements
• Easier upgrade and maintenance
• Higher customer comfort
• Better compatibility
Madison MPP Data Warehouse Architecture
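[Diagram: Madison MPP architecture. Inside the appliance: an active/passive control node, landing zone, configuration & monitoring, backup, compute nodes (each running SQL Server, together holding the distributed DB), a spare node and industry standard SAN storage, connected by a private network and clustered with Microsoft Cluster Server. On the corporate network: client drivers, the ETL load interface and the corporate backup solution.]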
Ultra Shared Nothing
An extension of traditional shared nothing design
• Push the shared nothing architecture into the SMP node: I/O and CPU affinity within SMP nodes
• Eliminate contention per user query; use full resources for each user query
• Multiple physical instances of tables: distribute large tables, replicate small tables, distribute AND replicate medium tables
• Re-distribute rows “on the fly” when necessary
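The table placement options above can be modeled in a few lines. This is a hypothetical sketch, not Madison's engine: Python's built-in `hash` modulo the node count stands in for the real distribution function, and `distribute`, `replicate`, `redistribute` and `local_join` are illustrative names. It shows that a distributed table joins locally with a replicated table, while joining two distributed tables on a column other than the distribution key only works after rows are redistributed.

```python
def node_for(value, num_nodes):
    """Hypothetical placement rule: built-in hash modulo node count."""
    return hash(value) % num_nodes


def distribute(rows, key, num_nodes):
    """Spread a (large) table across nodes by its distribution key."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[node_for(row[key], num_nodes)].append(row)
    return parts


def replicate(rows, num_nodes):
    """Give every node a full copy of a (small) table."""
    return [list(rows) for _ in range(num_nodes)]


def redistribute(parts, key, num_nodes):
    """Move rows between nodes so they are hashed on a different key."""
    return distribute([row for part in parts for row in part], key, num_nodes)


def local_join(left_parts, right_parts, left_key, right_key):
    """Each node joins only the rows it holds; no rows cross node boundaries."""
    out = []
    for left, right in zip(left_parts, right_parts):
        index = {}
        for r in right:
            index.setdefault(r[right_key], []).append(r)
        for row in left:
            for match in index.get(row[left_key], []):
                out.append({**row, **match})
    return out


N = 4
orders = [{"order_id": i, "product_id": i % 3, "region_id": i % 2} for i in range(10)]
products = [{"product_id": p, "name": f"p{p}"} for p in range(3)]
regions = [{"region_id": r, "region": ("east", "west")[r]} for r in range(2)]

orders_d = distribute(orders, "order_id", N)        # large table: distributed
products_d = distribute(products, "product_id", N)
regions_r = replicate(regions, N)                   # small table: replicated

with_region = local_join(orders_d, regions_r, "region_id", "region_id")
orders_by_product = redistribute(orders_d, "product_id", N)  # on-the-fly move
with_product = local_join(orders_by_product, products_d, "product_id", "product_id")
print(len(with_region), len(with_product))  # 10 10
```

Joining `orders_d` to `products_d` directly, without the redistribution step, silently drops most matches because matching rows sit on different nodes.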
Control Node & Client Drivers
Client connections always go through the control node, which:
• Is clustered to a passive node
• Processes SQL requests
• Prepares the execution plan
• Orchestrates distributed execution
• Uses a local SQL Server to do final query plan processing and result aggregation
Client drivers: will use the same set of drivers used by DATAllegro
• Provided by DataDirect
• ODBC, OLE-DB, JDBC and ADO.NET client drivers
• Wire protocol (SequeLink)
• Available for 32-bit and 64-bit
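The control node's final aggregation step has a subtlety worth spelling out: compute nodes should return partial aggregates that can be combined, not finished answers. A minimal sketch, assuming AVG is computed from per-node SUM and COUNT (the function names and node data are hypothetical):

```python
def node_partial(values):
    """What each compute node would return: a partial SUM and COUNT, not an average."""
    return sum(values), len(values)


def combine_avg(partials):
    """Final aggregation on the control node: AVG = total SUM / total COUNT."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


node_data = [[10, 20], [30], [40, 50, 60]]  # rows held by three hypothetical nodes
partials = [node_partial(v) for v in node_data]
print(combine_avg(partials))                # 35.0

# Averaging the per-node averages is wrong because nodes hold different row counts:
naive = sum(s / c for s, c in partials) / len(partials)
print(round(naive, 2))                      # 31.67
```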
Compute Nodes
• A SQL Server 2008 instance
• DB engine nodes are autonomous on local data
• SQL as the primary interface
• Each MPP node is a highly tuned SMP node with standard interfaces
Landing Zone
• Provides high-capacity storage for data files from ETL processes
• Integration Services available on the landing zone
• Connected to the internal network
• Available as a sandbox for other applications and scripts that run on the internal network
[Diagram: source / landing zone files → data loader → compute nodes]
Backup Node
• Builds on SQL Server's native backup/restore facility
• Uses the VDI interface to plug into the backup pipeline
• Database-level backup
• Coordinated backup across the nodes; write activity is quiesced to synchronize
• Can only restore to another appliance with exactly the same number of distributions
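One plausible reason for the same-number-of-distributions restriction is that row placement depends on the distribution count. Assuming a hash-modulo placement rule (an assumption for illustration, not a documented Madison detail), the sketch below measures how many rows would belong to a different distribution if the count changed:

```python
def distribution_of(key, num_distributions):
    """Assumed placement rule: hash of the distribution key modulo the count."""
    return hash(key) % num_distributions


def moved_fraction(keys, old_n, new_n):
    """Share of rows whose home distribution changes when the count changes."""
    keys = list(keys)
    moved = sum(
        1 for k in keys if distribution_of(k, old_n) != distribution_of(k, new_n)
    )
    return moved / len(keys)


keys = range(1000)                  # integer keys; CPython hashes small ints to themselves
print(moved_fraction(keys, 8, 8))   # 0.0
print(moved_fraction(keys, 8, 10))  # 0.8
```

Under this assumed rule, restoring 8 distributions onto 10 would leave about 80% of these rows on the wrong node, so per-node backups could not simply be restored one-to-one.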
Configuration and Monitoring
• Madison services instrumented: logs and performance counters
• Capture and forward SNMP alerts from devices within the appliance
• Small subset of DMVs that union the underlying node DMVs
• Leverage HPC for monitoring
Challenge: is it an appliance or a collection of nodes?
High Availability
Multiple levels of redundancy:
• Leveraging MSCS for node availability
• Cluster-aware services: SQL Server, Madison, DMS
• Leveraging MSCS for SQL services and DMS
• 1 spare node for every 6* compute nodes
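The 1-spare-per-6 ratio translates into a simple sizing rule. The sketch below assumes a partial group of compute nodes still gets a whole spare, and models failover as swapping a failed compute node for a spare; both functions are hypothetical and are not how MSCS implements failover.

```python
import math


def spare_nodes(compute_nodes, compute_per_spare=6):
    """Spares needed at 1 per 6 compute nodes, rounding partial groups up (assumption)."""
    return math.ceil(compute_nodes / compute_per_spare)


def fail_over(active, spares, failed):
    """Hypothetical model: replace a failed compute node with the first free spare."""
    if failed not in active or not spares:
        raise ValueError("no failover possible")
    remaining = list(spares)
    replacement = remaining.pop(0)
    return [replacement if node == failed else node for node in active], remaining


for n in (6, 10, 24):
    print(n, "compute nodes ->", spare_nodes(n), "spare(s)")
print(fail_over(["c1", "c2"], ["s1"], "c2"))  # (['c1', 's1'], [])
```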
Security and Encryption
• Retain the DA v3 design
• Authentication and authorization done by the Madison server
• Users and roles as first-class principals
• Nested role capabilities
• Connection to SQL back-ends through a high-privilege account
• SQL nodes reside on a private network
• No support for integrated authentication
• Leverages TDE to expose DB-level encryption
• Supports key rotation
The Logical Data Model
• Multiple databases per appliance; each user database maps to one SQL Server DB per node
Tables:
• Replicated, distributed, or replicated + distributed
• Leverage SQL Server compression
• Support partitioning
• Support secondary indexes
Views
Data types:
• Most scalar data types supported by SQL Server 2008 are supported by Madison
• Main exceptions: character and binary strings limited to 8K (i.e. no BLOB support), XML, sql_variant, system and CLR UDTs
• Latin1_General collation with binary comparison only
SQL Server Data Types: DAv3 vs Madison (✓ = supported)
SQL Server data type | DAv3 | Madison
bigint | ✓ | ✓
binary | |
bit | | ✓
char / nchar | ✓ | ✓
date, time | | ✓
datetime (was date in DA) | ✓ | ✓
datetime2 | | ✓
datetimeoffset | | ✓
decimal | ✓ | ✓
float | ✓ | ✓
geometry / geography | |
hierarchyid | |
int (was integer in DA) | ✓ | ✓
money | | ✓
real | | ✓
smalldatetime | | ✓
smallint | ✓ | ✓
smallmoney | | ✓
sql_variant | |
text / ntext / image | |
timestamp | |
tinyint | ✓ | ✓
varchar / nvarchar / varbinary | ✓ | ✓
varchar(max) / nvarchar(max) / varbinary(max) | |
uniqueidentifier | |
xml | |
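The main data type exceptions can be checked mechanically before migrating a schema. A hypothetical sketch: `check_column` covers only the exceptions named in the data types list (strings over 8K or `(max)`, XML, sql_variant, BLOB types, system CLR types), reads “8K” as SQL Server's 8,000-byte non-max limit (an assumption), and cannot recognize user-defined CLR types by name.

```python
from typing import Optional, Union

# Exceptions named on the slide: XML, sql_variant, BLOB-style types, and system
# CLR types (geometry, geography and hierarchyid are SQL Server's built-in CLR types).
# User-defined CLR types cannot be detected by name and are not checked here.
UNSUPPORTED = {"xml", "sql_variant", "text", "ntext", "image",
               "geometry", "geography", "hierarchyid"}

# Bytes per declared character for length-limited string types.
BYTES_PER_UNIT = {"char": 1, "varchar": 1, "binary": 1, "varbinary": 1,
                  "nchar": 2, "nvarchar": 2}

MAX_BYTES = 8000  # assumption: "8K" read as SQL Server's 8,000-byte non-max limit


def check_column(type_name: str, length: Union[int, str, None] = None) -> Optional[str]:
    """Return why a column would be rejected, or None if it passes these checks."""
    t = type_name.lower()
    if t in UNSUPPORTED:
        return f"{t} is not supported"
    if t in BYTES_PER_UNIT:
        if isinstance(length, str) and length.lower() == "max":
            return f"{t}(max) is not supported (no BLOB support)"
        if isinstance(length, int) and length * BYTES_PER_UNIT[t] > MAX_BYTES:
            return f"{t}({length}) exceeds the 8K limit"
    return None


for col in [("int", None), ("varchar", 8000), ("nvarchar", 4001), ("varbinary", "max"), ("XML", None)]:
    print(col, "->", check_column(*col) or "ok")
```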
Supported SQL Syntax
• Aligned with ANSI SQL-92
• Basic INSERT, UPDATE, DELETE, SELECT
• CREATE TABLE AS SELECT
• Limited analytical function support
• Teradata extensions: Quantile, Sample, …
Manageability
• Web-based main administrative user interface, based on the DATAllegro manageability UI
• Monitoring of system health and activity
• Leveraging HPC Pack 2008: systems management, monitoring, cluster health
Query Tools
GUI tool: Nexus (CoffingDW)
• Table and view object explorer
• Interactive query execution
Command-line tool:
• Replacement for DA-SQL
• A flavor of SqlCmd
Tools Walk-through (Demo)
MS BI Integration
Integration Services:
• Madison enabled as a source (data movement, lookup operations, etc.)
• Will add a new SSIS destination to ensure integrated high-performance loads
Reporting Services:
• Fully supported, including parameterized queries
• Will customize the experience for Report Builder and Report Designer
Analysis Services:
• Will get connectivity through the OLE-DB provider
• Will enable both MOLAP and ROLAP storage
Madison - Hub & Spoke
• Each business unit has its own data marts
• More responsive to business needs
• Fits budget realities
• The hub provides a centralized data governance platform
[Diagram: a Madison hub connected to a Madison spoke, SQL Server data mart (DM) spokes and a SQL Server Analysis Services (AS) spoke, serving HR, Finance, Sales and Manufacturing]
Node-to-node data movement
• Parallel over InfiniBand or 10 Gig networks
• ~500 GB per minute with minimal overhead
Benefits of Hub-and-Spoke
• All systems connect via a dedicated high-speed network
• Parallel database copy: speeds of up to 500 GB per minute
• Simplification of data mart ETL/ELT processes with a publishing model
• Separation of management and user workloads
• Integration of SMP SQL Server 2008 and MPP systems
• Ability to independently expand any system
• Ability to add additional spokes without impacting other users
• Deployment of development and test environments that leverage parallel connectivity
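The ~500 GB per minute figure is an aggregate, and a parallel copy finishes only when its busiest node does. Assuming that rate is shared evenly across nodes (an assumption, as is the 8-node, 10,000 GB example), this sketch shows how data skew stretches copy time:

```python
def parallel_copy_minutes(slice_sizes_gb, aggregate_gb_per_min=500.0):
    """Copy time when each node streams its own slice in parallel.

    Assumes the aggregate rate is split evenly across nodes, so the node
    holding the largest slice determines when the copy finishes.
    """
    per_node_rate = aggregate_gb_per_min / len(slice_sizes_gb)
    return max(slice_sizes_gb) / per_node_rate


even = [1250.0] * 8                   # 10,000 GB spread evenly over 8 nodes
skewed = [2500.0] + [7500.0 / 7] * 7  # same 10,000 GB, one node holds a quarter
print(parallel_copy_minutes(even))    # 20.0
print(parallel_copy_minutes(skewed))  # 40.0
```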
Early Adoption
MTP: Madison Technology Preview
• Our flavor of a CTP
• Assess product and field/partner readiness
• Provide a roadmap for competitive situations
• Location: MTCs, partners, other MS facilities, …
• Working with partners to secure hardware
• 2-3 week engagements
TAP: Technology Adoption Program
• Closer to a traditional TAP
• Assess production readiness
• Longer engagement
• Go-live requirements
• Customer secures hardware
High Level Release Definitions
“Madison” (aka v1): H1 2010
• Focus on time to market
• Compatibility with DATAllegro v3
• MS BI integration
V2+
• Closer functional alignment with SQL Server
• Better integration with the SQL and MS ecosystem, tools and technologies
Will start running MTPs in the summer
Recap
SQL Server Fast Track
• Data warehousing reference architectures available today!
SQL Server “Madison”
• Built for advanced, large-scale data warehouses
• Shared-nothing MPP architecture
• Early evaluation programs starting soon
All feedback welcome: [email protected]
Thank you!
Question & Answer
SQL Server Community Resources
• Become a FREE PASS Member: www.sqlpass.org/RegisterforSQLPASS.aspx
• Learn more about the PASS organization: www.sqlpass.org/
Additional Community Resources
• SQL Server Community Center: www.microsoft.com/sqlserver/2008/en/us/community-center.aspx
• TechNet Community for IT Professionals: http://technet.microsoft.com/en-us/sqlserver/bb671048.aspx
• Developer Center: http://msdn.microsoft.com/en-us/sqlserver/bb671064.aspx
• SQL Server 2008 Learning Portal: http://www.microsoft.com/learning/sql/2008/default.mspx
• Connect: Local Chapters, Special Interest Groups, Online Community• Share: PASSPort Social Networking, Community Connection Event• Learn: PASS Summit Annual Conference, Technical Articles, Webcasts
The Professional Association for SQL Server (PASS) is an independent, not-for-profit association, dedicated to supporting, educating, and promoting the Microsoft SQL Server community.
SQL Server Word of the Day
Data Compression
Monday, May 11
*Game cards may be picked up at the SQL Server booths in the TLC
www.microsoft.com/teched Sessions On-Demand & Community
http://microsoft.com/technet Resources for IT Professionals
http://microsoft.com/msdn Resources for Developers
www.microsoft.com/learning Microsoft Certification & Training Resources
Resources
Complete an evaluation on CommNet and enter to win!
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS,
IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.