
marko.hotti@Microsoft… · Gartner disclaims all warranties, expressed or implied, with respect...

Date post: 18-Jul-2020

* Disclaimer: Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

GARTNER MAGIC QUADRANT DW & BI

• Business Intelligence and Analytics Platforms
• Data Warehouse Database Management Systems


The Traditional Data Warehouse



Breaking Points of The Traditional Data Warehouse


Introducing The Modern Data Warehouse

[Figure labels: Data Sources; Business Intelligence]

Microsoft Hadoop Vision: Insights to all users by activating new types of data

Limitations: Performance and Scale Today

• Diminishing performance
• Diminishing scale as requirements grow
• Non-optimal performance for many DW queries

[Figure labels: Existing Tables (Partitions); Rowstore; Scale UP]

SQL Server 2012 Parallel Data Warehouse (PDW): Insights on any data of any size

• Next-generation performance at scale
• Built for big data

• Manageable costs
• Appliance simplicity: HW + SW
• Query performance
• Scale-out MPP versus scale-up SMP
• “Big Data” integration
• Updateable xVelocity columnstore

What is Parallel Data Warehouse?

• Shared-nothing parallel database system
» Massively parallel processing (MPP)
» A “Control” server that accepts user queries, generates a plan, and distributes operations in parallel to compute nodes
» Multiple “Compute” servers running SQL Server
» A “Management” server for administering the system
» A “Data Movement Service” that facilitates parallel SQL operations

• Delivered as an appliance
» Balanced and pre-configured software and industry-standard hardware from Dell or HP
» Single call support
» Fastest time to market
» Scales from 2 to 56 nodes

[Figure: HP example]
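The control/compute split described above follows a scatter-gather pattern: the control server sends the same partial operation to every compute node, then combines the partial results. The Python sketch below simulates that in a single process, assuming a SUM ... GROUP BY query and using threads as stand-ins for compute nodes. The function names `compute_node_partial` and `control_node_query` are hypothetical; this illustrates the pattern, not PDW's actual planner or Data Movement Service.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def compute_node_partial(rows):
    """Local step on one (hypothetical) compute node: SUM(amount) GROUP BY key."""
    partial = Counter()
    for key, amount in rows:
        partial[key] += amount
    return partial

def control_node_query(node_slices):
    """Scatter the partial aggregate to all nodes in parallel, then gather and merge."""
    with ThreadPoolExecutor(max_workers=len(node_slices)) as pool:
        partials = list(pool.map(compute_node_partial, node_slices))
    result = Counter()
    for partial in partials:
        result.update(partial)  # final aggregation step on the control node
    return dict(result)

# Each inner list is the slice of rows stored on one compute node.
nodes = [
    [("north", 10), ("south", 5)],
    [("north", 7)],
    [("south", 3), ("east", 1)],
]
print(control_node_query(nodes))  # {'north': 17, 'south': 8, 'east': 1}
```

Only the small partial results travel to the control step, which is why the pattern scales out: adding nodes shrinks each node's slice without increasing the merge work much.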

Key Design Elements

• Modular design
• High density
• Leverage the latest Microsoft software features
» Windows Server 2012 Storage Spaces
» Windows Server 2012 Hyper-V
» SQL Server 2012 xVelocity ColumnStore

[Figure: HP example]

Ultra Shared Nothing Architecture: Distribution

Larger fact table is hash distributed across all compute nodes.

Star schema used in the example:
• Time Dim (TD): Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
• Store Dim (SD): Store Dim ID, Store Name, Store Mgr, Store Size
• Product Dim (PD): Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
• Mktg Campaign Dim (MD): Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
• Sales Facts (SF): Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold

[Figure: every compute node holds TD, SD, PD, MD and a share of SF; the fact-table distributions are numbered 01-08, 09-16, 17-24, 25-32 and 33-n across the nodes.]
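The figure shows fact rows spread over numbered distributions, eight per compute node (01-08, 09-16, ...). The sketch below shows the idea of hash distribution under stated assumptions: MD5 stands in for PDW's internal hash function, the eight-distributions-per-node layout is read from the figure, and `distribution_for`, `node_for` and `distribute` are hypothetical helpers.

```python
import hashlib

DISTRIBUTIONS_PER_NODE = 8  # assumption read from the figure: 01-08, 09-16, ...

def distribution_for(value, num_distributions):
    """Map a distribution-column value to a 1-based distribution number (stable hash)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_distributions + 1

def node_for(distribution):
    """1-based compute node that owns a distribution, eight distributions per node."""
    return (distribution - 1) // DISTRIBUTIONS_PER_NODE + 1

def distribute(rows, key, num_nodes):
    """Place each fact row on the node that owns its hash distribution."""
    num_distributions = num_nodes * DISTRIBUTIONS_PER_NODE
    placement = {node: [] for node in range(1, num_nodes + 1)}
    for row in rows:
        dist = distribution_for(row[key], num_distributions)
        placement[node_for(dist)].append(row)
    return placement

# Hypothetical Sales Facts rows, distributed on Prod Dim ID across 4 nodes.
sales_facts = [
    {"prod_dim_id": p, "store_dim_id": s, "qty_sold": 1}
    for p in range(20) for s in range(5)
]
placement = distribute(sales_facts, "prod_dim_id", num_nodes=4)
print({node: len(rows) for node, rows in placement.items()})
```

Because the hash is deterministic, all rows that share a distribution-column value land on the same node, so joins and aggregations on that column can run locally without moving data.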


In-Memory Columnstore in PDW V2 & SQL Server 2014

• xVelocity in-memory columnstore in PDW: the columnstore index serves as the primary data store in a scale-out MPP data warehouse (PDW V2 appliance)
• Updateable clustered columnstore index (CCI)
• Support for bulk load and insert/update/delete
• Extended data types: decimal/numeric for all precision and scale
• Query processing enhancements for more batch-mode processing (for example, outer/semi/anti-semi joins, UNION ALL, scalar aggregation)

Customer benefits

• Outstanding query performance from the in-memory columnstore index
» 600 GB per hour for a single 12-core server
• Significant hardware cost savings due to high compression
» 4–15x compression ratio
• Improved productivity through the updateable index
• Ships in the PDW V2 appliance and SQL Server 2014
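Columnstores compress well because values in one column are similar and, when sorted, repeat in long runs. The slide does not describe xVelocity's actual encodings, so the sketch below uses plain run-length encoding (RLE), one common columnstore technique, as a stand-in. Counting "cells" is a deliberate simplification of real storage size, and the sample rows are made up.

```python
from itertools import groupby

def rle_encode(column):
    """Run-length encode one column as (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]

# Hypothetical rows (year, store, qty); stored column by column,
# sorted low-cardinality columns form long runs.
rows = [("2013", "Store A", 5)] * 4 + [("2013", "Store B", 2)] * 3 + [("2014", "Store B", 2)] * 3
columns = list(zip(*rows))
encoded = [rle_encode(col) for col in columns]

raw_cells = len(rows) * len(columns)
encoded_cells = sum(len(runs) * 2 for runs in encoded)  # each run stores a value and a length
print(raw_cells, encoded_cells)  # 30 12
```

Here 30 cells shrink to 12 (2.5x); the gain grows with longer runs, which is why the achievable ratio depends heavily on data ordering and cardinality.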

Introducing PolyBase: Fundamental breakthrough in data processing

Single query; structured and unstructured
• Query and join Hadoop tables with relational tables
• Use standard SQL language
• SELECT, FROM, WHERE

• Existing SQL skill set
• No IT intervention
• Save time and costs
• Analyze all data types

[Figure labels: SQL; SQL Server 2012 PDW, powered by PolyBase; Database; HDFS (Hadoop)]
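To make "query and join Hadoop tables with relational tables" concrete, the sketch below simulates such a join in Python: delimited lines stand in for a file in HDFS, a list of dicts stands in for a relational table, and a simple hash join combines them. The `Customers` table, its `ip`/`name` columns, and both functions are hypothetical; PolyBase's real execution happens inside the PDW engine.

```python
def parse_hdfs_lines(lines, field_terminator="|"):
    """Turn delimited text (as it might sit in an HDFS file) into rows."""
    for line in lines:
        url, event_date, user_ip = line.rstrip("\n").split(field_terminator)
        yield {"url": url, "event_date": event_date, "user_IP": user_ip}

def join_clicks_with_customers(click_lines, customers):
    """Sketch of: SELECT c.name, k.url FROM ClickStream k JOIN Customers c ON k.user_IP = c.ip"""
    by_ip = {c["ip"]: c for c in customers}  # build side: the relational table
    return [
        (by_ip[click["user_IP"]]["name"], click["url"])
        for click in parse_hdfs_lines(click_lines)  # probe side: the Hadoop data
        if click["user_IP"] in by_ip
    ]

click_lines = ["/home|2013-05-01|10.0.0.1\n", "/cart|2013-05-01|10.0.0.2\n", "/home|2013-05-02|10.0.0.9\n"]
customers = [{"ip": "10.0.0.1", "name": "Alice"}, {"ip": "10.0.0.2", "name": "Bob"}]
print(join_clicks_with_customers(click_lines, customers))  # [('Alice', '/home'), ('Bob', '/cart')]
```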

External Tables
» An external table is PDW’s representation of data residing in HDFS
» The “table” (metadata) lives in the context of a SQL Server database
» The actual table data resides in HDFS

CREATE EXTERNAL TABLE table_name ({<column_definition>} [,...n ])
{WITH (LOCATION = '<URI>' [, FORMAT_OPTIONS = (<VALUES>)])}
[;]

» LOCATION: required; indicates the location of the Hadoop cluster
» FORMAT_OPTIONS: optional format options associated with parsing data from HDFS (e.g., field delimiters and reject-related thresholds)
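The format options above govern how raw HDFS text becomes typed rows. The sketch below mimics that parsing step in Python under explicit assumptions: a `|` field terminator, one converter per column definition, and a hypothetical `reject_value` threshold on the number of rows that fail to parse. It is not PDW's actual option set or reject semantics.

```python
class RejectThresholdExceeded(Exception):
    """Raised when more rows fail to parse than the (hypothetical) reject threshold allows."""

def read_external_table(lines, column_types, field_terminator="|", reject_value=0):
    """Parse delimited HDFS text against an external table's column definitions.

    Rows with the wrong field count or an unconvertible value are rejected;
    parsing fails once the number of rejects exceeds reject_value.
    """
    rows, rejected = [], 0
    for line in lines:
        fields = line.rstrip("\n").split(field_terminator)
        try:
            if len(fields) != len(column_types):
                raise ValueError("wrong number of fields")
            rows.append(tuple(convert(f) for convert, f in zip(column_types, fields)))
        except ValueError:
            rejected += 1
            if rejected > reject_value:
                raise RejectThresholdExceeded(f"{rejected} rejected rows")
    return rows, rejected

lines = ["/home|2013-05-01|3\n", "/cart|2013-05-01|oops\n", "/home|2013-05-02|7\n"]
rows, rejected = read_external_table(lines, [str, str, int], reject_value=1)
print(rows, rejected)
```

With `reject_value=1` the malformed second line is skipped; with `reject_value=0` the same input raises `RejectThresholdExceeded`, which is the trade-off a reject threshold controls: tolerate a few dirty records or fail fast.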

Native Query Across Hadoop and PDW: Parallel Data Import from HDFS into PDW

• Persistently storing data from HDFS in PDW tables
• Fully parallelized via CREATE TABLE AS SELECT (CTAS), with external tables as the source and PDW tables (either distributed or replicated) as the destination

CREATE TABLE ClickStream_PDW WITH (DISTRIBUTION = HASH(url))
AS SELECT url, event_date, user_IP FROM ClickStream;

• Retrieval of data in HDFS “on-the-fly”

[Figure: the enhanced PDW query engine writes CTAS results from an external table. DMS Reader 1 to DMS Reader N use the HDFS bridge for parallel HDFS reads and parallel importing. Sources: sensor & RFID, web apps, social apps and mobile apps (unstructured data, in Hadoop); traditional DW applications (structured data, in PDW).]
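The parallel import above combines two steps: several readers pull separate parts of the HDFS data at once, and every row is then routed to its destination distribution by the CTAS distribution column (`HASH(url)` in the example). The sketch below simulates both steps; threads stand in for DMS readers, in-memory lists of lines stand in for HDFS splits, and MD5 stands in for PDW's hash function. All function names are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def read_split(split_lines):
    """One reader: parse its share of the HDFS file (url|event_date|user_IP)."""
    return [tuple(line.rstrip("\n").split("|")) for line in split_lines]

def hash_distribution(url, num_distributions):
    """Stable stand-in hash of the distribution column, as in DISTRIBUTION = HASH(url)."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_distributions

def parallel_ctas(splits, num_distributions):
    """Read all splits concurrently, then route every row to its hash distribution."""
    destination = [[] for _ in range(num_distributions)]
    with ThreadPoolExecutor(max_workers=len(splits)) as pool:
        for rows in pool.map(read_split, splits):
            for row in rows:
                destination[hash_distribution(row[0], num_distributions)].append(row)
    return destination

splits = [
    ["/home|2013-05-01|10.0.0.1\n", "/cart|2013-05-01|10.0.0.2\n"],
    ["/home|2013-05-02|10.0.0.9\n"],
]
table = parallel_ctas(splits, num_distributions=4)
print([len(d) for d in table])
```

Both `/home` rows end up in the same distribution, which matches the purpose of hash-distributing `ClickStream_PDW` on `url`.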

Native Query Across Hadoop and PDW: Parallel Data Export from PDW into HDFS

• Fully parallelized via CREATE EXTERNAL TABLE AS SELECT (CETAS), with external tables as the destination and PDW tables as the source
• ‘Round-trip of data’ possible: first import data from HDFS, join it with relational data, and then export the results back to HDFS

CREATE EXTERNAL TABLE ClickStream (url, event_date, user_IP)
WITH (LOCATION = 'hdfs://MyHadoop:5000/users/outputDir',
      FORMAT_OPTIONS (FIELD_TERMINATOR = '|'))
AS SELECT url, event_date, user_IP FROM ClickStream_PDW;

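[Figure: the enhanced PDW query engine exports CETAS results to an external table. DMS Writer 1 to DMS Writer N read from PDW in parallel and use the HDFS bridge for parallel HDFS writes to the HDFS data nodes. Sources: sensor & RFID, web apps, social apps and mobile apps (unstructured data); traditional DW applications (structured data, in PDW).]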
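For the export direction, each writer can emit its own slice of the source table as a separate delimited file in the target directory, so no single process has to funnel all rows. The sketch below simulates that with threads and a local temporary directory standing in for `hdfs://MyHadoop:5000/users/outputDir`; the `part-NNNNN.txt` file naming and both function names are hypothetical, not PDW's actual output layout.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def write_distribution(output_dir, index, rows, field_terminator="|"):
    """One writer: export one distribution's rows as its own delimited file."""
    path = os.path.join(output_dir, f"part-{index:05d}.txt")
    with open(path, "w", encoding="utf-8") as out:
        for row in rows:
            out.write(field_terminator.join(row) + "\n")
    return path

def parallel_cetas(distributions, output_dir):
    """Write every distribution concurrently, mirroring DMS Writer 1 to N."""
    os.makedirs(output_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=len(distributions)) as pool:
        futures = [
            pool.submit(write_distribution, output_dir, i, rows)
            for i, rows in enumerate(distributions)
        ]
        return sorted(f.result() for f in futures)

# Hypothetical ClickStream_PDW contents, already split into two distributions.
clickstream_pdw = [
    [("/home", "2013-05-01", "10.0.0.1"), ("/home", "2013-05-02", "10.0.0.9")],
    [("/cart", "2013-05-01", "10.0.0.2")],
]
output_dir = os.path.join(tempfile.mkdtemp(), "outputDir")
paths = parallel_cetas(clickstream_pdw, output_dir)
print(paths)
```

Writing one file per distribution keeps the writers independent; a reader such as an external table pointed at the directory can then treat all part files together as one dataset.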


PDW V2.0 Management Dashboard


Microsoft Business Intelligence Platform
