
Azure data lake sql konf 2016

Date posted: 08-Jan-2017
Uploaded by: kenneth-michael-nielsen
Transcript
Page 1: Azure data lake   sql konf 2016

Azure Data Lake
Kenneth M. Nielsen

About me
Kenneth M. Nielsen
Worked with SQL Server since 1999
Data Solution Architect at [email protected]
@doktorkermit
Linkedin.com/in/KennethMNielsen
www.funkylab.com

Agenda
• Azure Data Lake Store
• Azure Data Lake Analytics
• Azure Data Lake Analytics – Using Visual Studio
• Azure Data Lake Analytics – Using PowerShell
• Q & A


Data Lake Store

Azure Data Lake Store

A hyperscale repository for big data analytics workloads

• No limits to SCALE
• Store ANY DATA in its native format
• HADOOP FILE SYSTEM (HDFS) for the cloud
• ENTERPRISE READY access control, encryption at rest
• Optimized for analytic workload PERFORMANCE

Azure Data Lake Store
Any Data
• Unstructured
• Semi-structured
• Structured


Azure Data Lake Store

Azure Data Lake Store
HDFS for the cloud
New file system built from the ground up, based on the Hadoop file system
• Integrates with HDInsight, Hortonworks and Cloudera
• Supports file and folder objects and operations
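Because the store exposes a Hadoop-compatible file system, Hadoop tools typically address it through an `adl://` URI whose host is the account endpoint. The Python sketch below builds and splits such URIs. It assumes the Data Lake Store (Gen1) endpoint suffix `azuredatalakestore.net`, and the helpers `to_adl_uri` and `from_adl_uri` are hypothetical names, not part of any Azure SDK.

```python
from urllib.parse import urlparse

# Assumed endpoint suffix for Data Lake Store (Gen1) accounts.
ADLS_SUFFIX = "azuredatalakestore.net"


def to_adl_uri(account: str, path: str) -> str:
    """Build an adl:// URI for a path inside a Data Lake Store account (hypothetical helper)."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. '/input/orders.txt'")
    return f"adl://{account}.{ADLS_SUFFIX}{path}"


def from_adl_uri(uri: str) -> tuple[str, str]:
    """Split an adl:// URI back into (account, path) (hypothetical helper)."""
    parts = urlparse(uri)
    suffix = "." + ADLS_SUFFIX
    if parts.scheme != "adl" or not parts.netloc.endswith(suffix):
        raise ValueError(f"not a Data Lake Store URI: {uri}")
    account = parts.netloc[: -len(suffix)]
    return account, parts.path or "/"


uri = to_adl_uri("sqlkonferenz", "/input/orders.txt")
print(uri)                # adl://sqlkonferenz.azuredatalakestore.net/input/orders.txt
print(from_adl_uri(uri))  # ('sqlkonferenz', '/input/orders.txt')
```

The account name `sqlkonferenz` is the one used in the PowerShell examples later in the deck.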

Azure Data Lake Store
Unlimited storage
• File sizes can range from gigabytes to petabytes
• No limits to scale

Azure Data Lake Store
Security
• Integrates with Azure Active Directory
• Audit logs for all operations*
• Server-side encryption*
• ACLs on files and folders*
• Enterprise-ready security

* When in GA (general availability)
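Data Lake Store's file and folder ACLs follow the POSIX model: an owner, an owning group, optional named users and groups, a mask, and "other". The following Python sketch shows a simplified POSIX-style access check for illustration only. It is not the service's implementation, and `PosixAcl` and `check_access` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class PosixAcl:
    """Simplified POSIX-style ACL on one file or folder (hypothetical model)."""
    owner: str
    owning_group: str
    owner_perms: str  # e.g. "rwx"
    group_perms: str  # owning group, e.g. "r-x"
    other_perms: str  # everyone else, e.g. "---"
    named_users: dict[str, str] = field(default_factory=dict)
    named_groups: dict[str, str] = field(default_factory=dict)
    mask: str = "rwx"  # upper bound for named users and all group entries


def _grants(perms: str, wanted: str) -> bool:
    # True if every requested letter ("r", "w", "x") is present in perms.
    return all(p in perms for p in wanted)


def _apply_mask(perms: str, mask: str) -> str:
    # Keep only the permission letters that the mask also allows.
    return "".join(p for p in perms if p != "-" and p in mask)


def check_access(acl: PosixAcl, user: str, groups: set[str], wanted: str) -> bool:
    """Evaluate entries in POSIX order: owner, named user, groups, other."""
    if user == acl.owner:
        return _grants(acl.owner_perms, wanted)  # the mask never limits the owner
    if user in acl.named_users:
        return _grants(_apply_mask(acl.named_users[user], acl.mask), wanted)
    group_entries = []
    if acl.owning_group in groups:
        group_entries.append(acl.group_perms)
    group_entries += [p for g, p in acl.named_groups.items() if g in groups]
    if group_entries:
        # Any matching group entry may grant access; if none does, access is denied.
        return any(_grants(_apply_mask(p, acl.mask), wanted) for p in group_entries)
    return _grants(acl.other_perms, wanted)


acl = PosixAcl(owner="kenneth", owning_group="analysts", owner_perms="rwx",
               group_perms="r-x", other_perms="---",
               named_users={"etl": "rw-"}, mask="r-x")
print(check_access(acl, "etl", set(), "w"))  # False: the mask removes "w"
```

The user and group names are made up. The point is the evaluation order and the way the mask caps named-user and group entries.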


Data Lake Analytics

Azure Data Lake Analytics

An elastic analytics service built on Apache YARN that processes all data, at any size

• No limits to SCALE
• Includes U-SQL, a language that unifies the benefits of SQL with the expressive power of C#
• Optimized to work with ADL STORE
• FEDERATED QUERY across Azure data sources
• ENTERPRISE READY role-based access control & auditing
• Pay PER JOB & scale PER JOB

U-SQL

A new language for Big Data

• Familiar syntax to millions of SQL & .NET developers

• Unifies declarative nature of SQL with the imperative power of C#

• Unifies structured, semi-structured and unstructured data

• Distributed query support over all data

Language Overview

U-SQL Fundamentals
• All the familiar SQL clauses: SELECT | FROM | WHERE | GROUP BY | JOIN | OVER
• Operate on unstructured and structured data
• Relational metadata objects

.NET integration and extensibility
• U-SQL expressions are full C# expressions
• Reuse .NET code in your own assemblies
• Use C# to define your own: Types | Functions | Joins | Aggregators | I/O (Extractors, Outputters)
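User-defined aggregators in U-SQL are C# classes that derive from `IAggregate` and implement `Init`, `Accumulate` and `Terminate`. To keep the new examples in one language, here is a Python analogue of that lifecycle rather than U-SQL itself: the runtime calls `init` once per group, `accumulate` once per row, and `terminate` to produce the group's result. The `Aggregator`, `Average` and `aggregate_by` names are hypothetical.

```python
from abc import ABC, abstractmethod


class Aggregator(ABC):
    """Python analogue of the Init / Accumulate / Terminate lifecycle (hypothetical)."""

    @abstractmethod
    def init(self) -> None: ...

    @abstractmethod
    def accumulate(self, value) -> None: ...

    @abstractmethod
    def terminate(self): ...


class Average(Aggregator):
    def init(self) -> None:
        self.total = 0.0
        self.count = 0

    def accumulate(self, value: float) -> None:
        self.total += value
        self.count += 1

    def terminate(self) -> float:
        return self.total / self.count if self.count else 0.0


def aggregate_by(rows, key, value, make_agg):
    """Group rows by key and run a fresh aggregator per group (hypothetical driver)."""
    groups = {}
    for row in rows:
        k = key(row)
        if k not in groups:
            agg = make_agg()
            agg.init()
            groups[k] = agg
        groups[k].accumulate(value(row))
    return {k: agg.terminate() for k, agg in groups.items()}


orders = [("alice", 10.0), ("bob", 4.0), ("alice", 20.0)]
print(aggregate_by(orders, key=lambda r: r[0], value=lambda r: r[1], make_agg=Average))
# {'alice': 15.0, 'bob': 4.0}
```

Creating a fresh aggregator per group mirrors how a GROUP BY hands each group its own aggregator state.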

U-SQL Capabilities
• Interactive: IN PROGRESS
• Batch: AVAILABLE NOW
• Streaming: FUTURE
• Machine Learning: FUTURE

U-SQL Distributed Query

[Figure: U-SQL distributed query, with READ and WRITE arrows to Azure Storage Blobs, Azure Data Lake Store, Azure SQL Database, Azure SQL Data Warehouse and Azure SQL DB in Azure VM]

Read the input, write it directly to output (just a simple copy):

// @orders is a rowset; EXTRACT applies the schema on read
@orders =
    EXTRACT OrderId int,
            Customer string,
            Date DateTime,
            Amount float
    FROM "/input/orders.txt"      // from a file in a Data Lake
    USING Extractors.Tsv();       // easy delimited text handling

// Write out the rowset
OUTPUT @orders
TO "/output/orders_copy.txt"
USING Outputters.Tsv();
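Schema on read means the file stays plain text in the store, and column names and types are applied only when `EXTRACT` reads it. The Python simulation below mirrors the copy script, assuming tab-separated rows and ISO 8601 dates. The real `Extractors.Tsv()` and `Outputters.Tsv()` have their own parsing and formatting rules, and `extract_tsv` and `output_tsv` are hypothetical names.

```python
import csv
import io
from datetime import datetime

# Schema applied at read time, mirroring the EXTRACT column list.
ORDERS_SCHEMA = [
    ("OrderId", int),
    ("Customer", str),
    ("Date", datetime.fromisoformat),  # assumption: ISO 8601 dates in the file
    ("Amount", float),
]


def extract_tsv(text_stream, schema):
    """Yield one dict per line, converting each field with its schema type."""
    for line_no, fields in enumerate(csv.reader(text_stream, delimiter="\t"), start=1):
        if len(fields) != len(schema):
            raise ValueError(f"line {line_no}: expected {len(schema)} columns, got {len(fields)}")
        yield {name: convert(raw) for (name, convert), raw in zip(schema, fields)}


def output_tsv(rows, schema, text_stream):
    """Write rows back out as tab-separated text in schema column order."""
    writer = csv.writer(text_stream, delimiter="\t", lineterminator="\n")
    for row in rows:
        values = [row[name] for name, _ in schema]
        writer.writerow([v.isoformat() if isinstance(v, datetime) else v for v in values])


source = io.StringIO("1\tContoso\t2016-02-24\t12.5\n2\tFabrikam\t2016-02-25\t99.0\n")
orders = list(extract_tsv(source, ORDERS_SCHEMA))  # schema applied here, on read
sink = io.StringIO()
output_tsv(orders, ORDERS_SCHEMA, sink)            # the "simple copy"
print(repr(sink.getvalue().splitlines()[0]))       # '1\tContoso\t2016-02-24T00:00:00\t12.5'
```

A malformed line fails only when it is read, which is the trade-off schema on read makes in exchange for storing data in its native format.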

Azure Data Lake Pattern

[Figure: Azure Data Lake pattern. Components: Tweets, Azure Storage, ADL Storage, ADL Analytics, Visual Studio, Power BI Desktop ("Get Data From CSV"), AML Experiment, Data Science VM and other Azure Services; step label "Upload Dataset". Roles: Data Analyst, Data Scientist, Data Engineer. Note on Azure Storage: "Where CAQS Files are stored, but would load into ADLS directly if ingesting from scratch."]

Execution with Requested Parallelism

• Requested Parallelism = 1 (reserve enough to run 1 vertex at a time)
• Requested Parallelism = 4 (reserve enough to run 4 vertices at a time)
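A back-of-the-envelope reading of these two settings: if a stage has V vertices of roughly equal duration t, reserving parallelism N lets at most N of them run at once. The stage then needs about ceil(V / N) rounds. The Python sketch below assumes uniform vertex times and no scheduling overhead, neither of which holds for real jobs. It reuses the 252 pieces of work from the stage details as input. The 10-second vertex time and the function name are hypothetical.

```python
import math


def estimated_stage_time(vertices: int, parallelism: int, vertex_seconds: float) -> float:
    """Rough wall-clock estimate: vertices run in rounds of `parallelism` at a time.

    Assumes every vertex takes the same time and ignores scheduling overhead.
    """
    if vertices < 0 or parallelism < 1:
        raise ValueError("need vertices >= 0 and parallelism >= 1")
    rounds = math.ceil(vertices / parallelism)
    return rounds * vertex_seconds


for n in (1, 4):
    print(n, estimated_stage_time(252, n, vertex_seconds=10.0))
# 1 2520.0
# 4 630.0
```

Once parallelism exceeds the number of vertices in a stage, extra reserved capacity no longer shortens that stage in this model.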

Stage Details
• 252 pieces of work
• AVG vertex execution time
• 4.3 billion rows
• Data read & written

ADLAUs (Azure Data Lake Analytics Units)

Parallelism N = N ADLAUs

1 ADLAU ≈ a VM with 2 cores and 6 GB of memory
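If you take the rule of thumb above literally (parallelism N reserves N ADLAUs, each roughly 2 cores and 6 GB of memory), you can estimate the reserved capacity and the AU-hours a job consumes as shown below. This is an approximation based only on the figures on this slide. The function names are hypothetical, and no pricing is implied.

```python
from dataclasses import dataclass

CORES_PER_ADLAU = 2      # slide: 1 ADLAU ~= 2 cores
MEMORY_GB_PER_ADLAU = 6  # slide: 1 ADLAU ~= 6 GB of memory


@dataclass(frozen=True)
class Reservation:
    adlaus: int
    cores: int
    memory_gb: int


def reserve(parallelism: int) -> Reservation:
    """Parallelism N = N ADLAUs; scale the per-unit approximation."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return Reservation(parallelism,
                       parallelism * CORES_PER_ADLAU,
                       parallelism * MEMORY_GB_PER_ADLAU)


def au_hours(parallelism: int, job_minutes: float) -> float:
    """ADLAUs reserved multiplied by job duration in hours."""
    return parallelism * job_minutes / 60.0


print(reserve(4))         # Reservation(adlaus=4, cores=8, memory_gb=24)
print(au_hours(4, 30.0))  # 2.0
```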

Data Lake Analytics – Visual Studio


Azure Data Lake – Visual Studio

Available project types

Azure Data Lake – Visual Studio

Fully integrated with Solution Explorer

Azure Data Lake – Visual Studio
• Monitor and manage jobs
• Browse and manage storage
• Browse U-SQL catalog


Creating U-SQL


Creating U-SQL

IntelliSense Supported

Creating U-SQL

Code-behind enhances your code


Demonstration: Using Visual Studio

Installing Azure PowerShell
• PowerShell Gallery
  • Recommended approach
  • PowerShell 5.0 supports PowerShell Gallery
  • Windows 10 ships with PowerShell 5.0
• Web Platform Installer (WebPI)

Installing from the PowerShell Gallery
• Launch Windows PowerShell ISE as Administrator
• Install-Module AzureRM
• Install-AzureRM

Finding the ADL cmdlets
• Option 1
  • Get-Command -Module AzureRM.DataLakeStore
  • Get-Command -Module AzureRM.DataLakeAnalytics
• Option 2
  • Get-Command *DataLake*

Logging in to Azure
Launch Windows PowerShell ISE

$subname = "BDHadoopTeamPMTestDemo"
Login-AzureRmAccount -SubscriptionName $subname

ADLS: Listing files in a store

$adls = "sqlkonferenz"
Get-AzureRmDataLakeStoreChildItem -Account $adls -Path /

ADLS: Upload and download

$adls = "sqlkonferenz"

# Upload a local file into the store
Import-AzureRmDataLakeStoreItem -Account $adls -Path d:\somefile.txt -Destination /somefile.txt

# Download a file from the store to local disk
Export-AzureRmDataLakeStoreItem -Account $adls -Path /somefile.txt -Destination d:\somefile_copy.txt

ADLA: List and submit jobs

$adla = "sqlkonferenz"

# List jobs
Get-AzureRmDataLakeAnalyticsJob -Account $adla

# Submit a job from inline U-SQL text (-Name must come before the comment)
Submit-AzureRmDataLakeAnalyticsJob -Account $adla -Name myjob -Script "…"  # U-SQL text

# Submit a job from a U-SQL script file
Submit-AzureRmDataLakeAnalyticsJob -Account $adla -ScriptPath D:\test.script -Name myjob

ADL Store (ADLS) feature set

Account Management
• Create new account
• List accounts
• Update account properties
• Delete account

Transferring Data
• Upload into store from local disk
• Download from store to local disk

Files and Folders
• List contents of folder
• Create
• Move
• Delete
• Does file exist

Security
• Get ACLs
• Update ACLs
• Get Owner
• Set Owner

File Content
• Set file content
• Append file content
• Get file content
• Merge files

ADL Analytics (ADLA) feature set

Account Management
• Create new account
• List accounts
• Update account properties
• Delete account

Data Sources
• Add a data source
• List data sources
• Update data source
• Delete data source

Compute
• List jobs
• Submit job
• Cancel job

Catalog Items
• List items in U-SQL catalog
• Update item

Catalog Secrets
• Create catalog secret
• List catalog secrets
• Delete catalog secrets


Demonstration: Using ADL PowerShell


Questions
