A Lap Around SQL Server 2019 Big Data Cluster Niels Berglund [email protected] https://nielsberglund.com @nielsberglund
Page 1: Overview SQL Server 2019 Big Data Cluster-CT · 2021. 2. 21. · Microsoft PowerPoint - Overview SQL Server 2019 Big Data Cluster-CT.pptx Author: niels Created Date: 9/15/2019 8:13:57

A Lap Around SQL Server 2019 Big Data Cluster

Niels Berglund
[email protected]
https://nielsberglund.com
@nielsberglund


Thank Sponsors


Niels Obligatory Shameless Self Promo

• Software Architect - Derivco.
• Author - "First Look at SQL Server 2005 for Developers".
• Microsoft Data Platform MVP.
• Researcher / Instructor - DevelopMentor.
• Speaker - TechEd, DevWeek, SQL Pass, etc.
• Longtime user of SQL Server.
• Working closely with MS around SQL Server.

https://nielsberglund.com


Data Landscape

• We generate more and more data.
  • 2016 - 16.1 ZB
  • 2025 - 163 ZB
• The data is stored "all over the place".
• How do we manage all this data?


SQL Server - Intelligence Over All Your Data

• Manage all data
• Integrate all data
• Analyze all data


SQL Server 2019 Big Data Cluster

• Apache Spark, Hadoop HDFS "in the box".
• Extend SQL Server to store data in the terabyte range.
• Store any kind of data.
• Linux containers on Kubernetes.


SQL Server 2019 Architecture on Kubernetes


Deploying a BDC Cluster

• We are not in Kansas any more.

• Deployment via Python scripts.

• Scripts for different environments.

• Deployment from Azure Data Studio deploy notebook.

• Requires Azure Data Studio Insiders build.

• Deploy to an existing K8s cluster, or create a new one.

• During deployment, set the number of nodes, etc.


Managing a BDC Cluster

• Command line tools:
  • kubectl
  • az - Azure command line interface for managing Azure services.
  • azdata - Python command line tool for installing and managing BDC.


# login
az login
# set context
az aks get-credentials --name <aks_cluster_name> --resource-group <azure_resource_group_name>
# get all pods
kubectl get pods --all-namespaces
# browse Kubernetes dashboard
az aks browse --resource-group <azure_resource_group_name> --name <aks_cluster_name>
# retrieve endpoints
azdata bdc endpoint list
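The management commands on this slide can also be driven from a script. What follows is a minimal Python sketch, not taken from the deck, that assembles the same command lines and only executes them when asked to. The function names bdc_admin_commands and run_all, and the cluster and resource group values, are assumptions made for illustration.

```python
import shlex
import subprocess


def bdc_admin_commands(cluster_name, resource_group):
    """Return the slide's admin commands as argument lists (no shell quoting needed)."""
    return [
        ["az", "login"],
        ["az", "aks", "get-credentials",
         "--name", cluster_name, "--resource-group", resource_group],
        ["kubectl", "get", "pods", "--all-namespaces"],
        ["azdata", "bdc", "endpoint", "list"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical names; replace with your own cluster and resource group.
    run_all(bdc_admin_commands("my-aks-cluster", "my-resource-group"))
```

The az aks browse command is left out of the sketch because it keeps running until interrupted, which does not suit a sequential script. The default dry_run=True only prints the commands, so the sketch can be tried without the CLI tools installed.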


Data Virtualization - PolyBase

PolyBase external tables
• Database-scoped object
• Uses ODBC drivers
• Supports read-only operations only; will be expanded in the future
• Queries can be scaled out, and push-down is supported
• No separate configuration needed for Always On Availability Groups

Linked servers
• Instance-scoped object
• Uses OLE DB providers
• Supports both read/write and pass-through statements
• Queries are single-threaded, and push-down is supported
• Separate configuration needed for each instance in an Always On Availability Group
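To make the "database-scoped object" point concrete, here is a minimal Python sketch, not from the deck, that renders the two T-SQL statements usually involved in defining a PolyBase external table over another SQL Server. The helper name external_table_script and every object name, host, and column in the example are hypothetical, and the database scoped credential is assumed to exist already.

```python
def external_table_script(source_name, location, credential,
                          table, columns, remote_object):
    """Render T-SQL for an external data source and one external table on it.

    Identifiers are inserted as-is, so pass trusted values only.
    """
    column_list = ",\n    ".join(
        f"{name} {sql_type}" for name, sql_type in columns
    )
    return (
        f"CREATE EXTERNAL DATA SOURCE {source_name}\n"
        f"WITH (LOCATION = '{location}', CREDENTIAL = {credential});\n"
        f"\n"
        f"CREATE EXTERNAL TABLE {table} (\n"
        f"    {column_list}\n"
        f")\n"
        f"WITH (LOCATION = '{remote_object}', DATA_SOURCE = {source_name});\n"
    )


# Hypothetical example: expose Sales.dbo.Orders from another SQL Server.
script = external_table_script(
    source_name="SalesSource",
    location="sqlserver://sales-host:1433",
    credential="SalesCredential",
    table="dbo.RemoteOrders",
    columns=[("OrderId", "INT"), ("Amount", "DECIMAL(10, 2)")],
    remote_object="Sales.dbo.Orders",
)
print(script)
```

Both statements run inside a single database, which is what distinguishes this from a linked server defined once per instance. Once created, the external table can be queried with ordinary T-SQL.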


SQL Server 2019 - Data Integration Hub


[Diagram: T-SQL, analytics, and apps query SQL Server, which reaches ODBC, NoSQL, relational database, and big data sources through PolyBase external tables.]


Scale Out - Query Compute

• Query data in relational and non-relational data stores with new PolyBase connectors

• Create a scale-out data pool cache of combined data

• Expose the datasets as a shared data source, without writing code to move and integrate data


[Diagram: SQL Server with a scale-out data pool (Shard 1, Shard 2, ... Shard n), fed through PolyBase connectors from HDFS, Cosmos DB, and SQL Server.]
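The idea of spreading combined data over shards can be illustrated with a small Python sketch. The deck does not say how rows are assigned to shards, so the round-robin rule used here is an assumption for illustration, and round_robin_shards is a hypothetical helper rather than anything SQL Server exposes.

```python
def round_robin_shards(rows, shard_count):
    """Distribute rows over shard_count shards in round-robin order."""
    shards = [[] for _ in range(shard_count)]
    for index, row in enumerate(rows):
        # Row 0 goes to shard 0, row 1 to shard 1, and so on, wrapping around.
        shards[index % shard_count].append(row)
    return shards


rows = [{"id": i} for i in range(10)]
shards = round_robin_shards(rows, 3)
for number, shard in enumerate(shards, start=1):
    print(f"Shard {number}: {[row['id'] for row in shard]}")
```

With ten rows and three shards, the shard sizes differ by at most one row, which is the property that lets a query be answered by all shards working in parallel on similar amounts of data.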


Scale Out - Storage

• SQL Server can now read directly from HDFS files.

• Elastically scale compute and storage using HDFS-based storage pools with SQL Server and Spark built in

• Mount and manage remote stores through HDFS

• Mount various on-prem and cloud data stores

• Accelerate computation by caching data locally


[Diagram: SQL Server master instance and Spark on top of a storage pool; each storage pool node combines SQL Server, Spark, and an HDFS data node, and other HDFS stores and remote cloud stores can be mounted.]


Analyze ALL Data

• Use Azure Data Studio Notebooks to run Spark jobs over structured and unstructured data.

• Spark SQL can access data in SQL Server.

• Queries can be pushed down to other data sources like Oracle Database and MongoDB.

• Let the Spark job return the data to the notebook.
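One generic way for a Spark job to reach SQL Server is Spark's JDBC data source. The deck does not say which connector the notebooks use, so this is a sketch under that assumption: it only builds the option dictionary, and the helper name sqlserver_jdbc_options plus the host, database, table, and login values are hypothetical.

```python
def sqlserver_jdbc_options(host, port, database, table, user, password):
    """Build the option dictionary for Spark's generic JDBC data source."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # Driver class shipped with the Microsoft JDBC Driver for SQL Server.
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


# Hypothetical connection details; never hard-code real passwords.
options = sqlserver_jdbc_options(
    host="sql-master-host",
    port=1433,
    database="Sales",
    table="dbo.Orders",
    user="spark_reader",
    password="<password>",
)
print(options["url"])
```

Inside a notebook with an active Spark session, these options would typically be passed as spark.read.format("jdbc").options(**options).load(), which returns a DataFrame that the job can hand back to the notebook.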


[Diagram: Azure Data Studio connected to the SQL Server master instance, external data sources, and Spark running on the storage pool.]


Integrate Structured and Unstructured Data


[Diagram: data pipeline from sources to consumers.]
• Sources: business/custom apps (structured), sensors and IoT (unstructured)
• Ingest: Spark streaming, SQL Server Integration Services
• Store: HDFS, SQL Server data pools
• Prep & train: Spark, Spark ML, SQL Server ML Services, SQL Server master instance
• Model & serve: SQL Server master instance, REST API containers for models
• Consumers: predictive apps, BI tools


Java Language Extension


SQL Server Big Data


Data virtualization
• Combine data from many sources without moving or replicating it
• Scale out compute and caching to boost performance

Managed SQL Server, Spark and data lake

Complete AI platform
• Easily feed integrated data from many sources to your model training
• Ingest and prep data and then train, store, and operationalize your models all in one system

[Diagram labels: T-SQL, Analytics, Apps; SQL Server; Open database connectivity; NoSQL; Relational databases; HDFS; SQL Server External Tables; Compute pools and data pools; Spark; Scalable, shared storage (HDFS); External data sources; Admin portal and management services; Integrated AD-based security; SQL Server ML Services; Spark & Spark ML; REST API containers for models.]


Summary

• Data volumes increase by the second.
• The data is of all types and shapes.
• We need a way to easily manage, integrate and handle the data.
• SQL Server 2019 Big Data Cluster runs on Kubernetes.
• Kubernetes:
  • Nodes, Pods, Clusters, Namespaces, Volumes.
• SQL Server BDC:
  • Control plane, Master instance, Compute pool, Data pool, Storage pool, App pool.
• PolyBase works against more storage types.
• Apache Spark and HDFS are part of SQL Server 2019 BDC.


Thank Sponsors


Thank You! Questions?

Niels Berglund
[email protected]
https://nielsberglund.com
@nielsberglund

