Azure data lake sql konf 2016

Post on 08-Jan-2017


Azure Data Lake

Kenneth M. Nielsen

About me

• Kenneth M. Nielsen
• Worked with SQL Server since 1999
• Data Solution Architect at Microsoft
• Kenneth.Nielsen@microsoft.com
• @doktorkermit
• Linkedin.com/in/KennethMNielsen
• www.funkylab.com

Agenda

• Azure Data Lake Store
• Azure Data Lake Analytics
• Azure Data Lake Analytics – Using Visual Studio
• Azure Data Lake Analytics – Using PowerShell
• Q & A

Data Lake Store

Azure Data Lake Store

A hyper-scale repository for big data analytics workloads

• No limits to SCALE
• Store ANY DATA in its native format
• HADOOP FILE SYSTEM (HDFS) for the cloud
• ENTERPRISE READY access control, encryption at rest
• Optimized for analytic workload PERFORMANCE

Azure Data Lake Store

Any Data

• Unstructured
• Semi-structured
• Structured

Azure Data Lake Store

HDFS for the cloud

New file system built from the ground up, based on the Hadoop file system

• Integrates with HDInsight, Hortonworks and Cloudera
• Supports file and folder objects and operations

Azure Data Lake Store

Unlimited storage

• File sizes can range from gigabytes to petabytes
• No limits to scale

Azure Data Lake Store

Security

• Integrates with Azure Active Directory
• Audit logs for all operations*
• Server-side encryption*
• ACLs on files and folders*
• Enterprise-ready security when in GA

Data Lake Analytics

Azure Data Lake Analytics

An elastic analytics service built on Apache YARN that processes all data, at any size

• No limits to SCALE
• Includes U-SQL, a language that unifies the benefits of SQL with the expressive power of C#
• Optimized to work with ADL STORE
• FEDERATED QUERY across Azure data sources
• ENTERPRISE READY role-based access control & auditing
• Pay PER JOB & scale PER JOB

U-SQL

A new language for Big Data

• Familiar syntax to millions of SQL & .NET developers

• Unifies declarative nature of SQL with the imperative power of C#

• Unifies structured, semi-structured and unstructured data

• Distributed query support over all data

Language Overview

U-SQL Fundamentals

• All the familiar SQL clauses

SELECT | FROM | WHERE | GROUP BY | JOIN | OVER

• Operate on unstructured and structured data

• Relational metadata objects

.NET integration and extensibility

• U-SQL expressions are full C# expressions
• Reuse .NET code in your own assemblies
• Use C# to define your own: Types | Functions | Joins | Aggregators | I/O (Extractors, Outputters)

U-SQL Capabilities

[Figure: four capability tiles (Interactive, Batch, Streaming, Machine Learning) with status labels, in slide order: IN PROGRESS, AVAILABLE NOW, FUTURE, FUTURE]

U-SQL Distributed Query

[Figure: U-SQL distributed query across five data sources, with five READ and five WRITE labels]

• Azure Storage Blobs
• Azure Data Lake Store
• Azure SQL Database
• Azure SQL Data Warehouse
• Azure SQL DB in Azure VM

// Read the input, write it directly to output (just a simple copy)
@orders =                              // Rowset
    EXTRACT OrderId int,               // Apply schema on read
            Customer string,
            Date DateTime,
            Amount float
    FROM "/input/orders.txt"           // From a file in a Data Lake
    USING Extractors.Tsv();            // Easy delimited text handling

OUTPUT @orders                         // Write out
    TO "/output/orders_copy.txt"
    USING Outputters.Tsv();
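To make "schema on read" concrete outside of Azure, the EXTRACT and OUTPUT statements above can be imitated in a few lines of Python. This is only an illustrative sketch, not how U-SQL is implemented: the `SCHEMA` list, the helper names `extract_tsv` and `output_tsv`, and the sample rows are all assumptions made for the example. The point it shows is that the file itself carries no types; the column names and types are supplied by the reader at read time.

```python
import csv
import io
from datetime import datetime

# Column names and converters mirroring the EXTRACT clause
# (OrderId int, Customer string, Date DateTime, Amount float).
# The list itself is a hypothetical stand-in for the U-SQL schema.
SCHEMA = [
    ("OrderId", int),
    ("Customer", str),
    ("Date", datetime.fromisoformat),
    ("Amount", float),
]


def extract_tsv(stream, schema):
    """Yield one dict per line, applying the schema while reading."""
    for fields in csv.reader(stream, delimiter="\t"):
        yield {name: convert(value)
               for (name, convert), value in zip(schema, fields)}


def _to_text(value):
    """Render a typed value back to the text form used in the file."""
    return value.isoformat() if isinstance(value, datetime) else str(value)


def output_tsv(rows, stream, schema):
    """Write rows as tab-separated text, columns in schema order."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([_to_text(row[name]) for name, _ in schema])


# Made-up sample rows standing in for /input/orders.txt.
raw = ("1\tContoso\t2016-02-01T00:00:00\t19.5\n"
       "2\tFabrikam\t2016-02-02T00:00:00\t7.0\n")

orders = list(extract_tsv(io.StringIO(raw), SCHEMA))  # typed rowset
copy = io.StringIO()
output_tsv(orders, copy, SCHEMA)                      # simple copy
```

Changing only `SCHEMA` changes how the same bytes are interpreted, which is the property the slide attributes to EXTRACT.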

Azure Data Lake Pattern

[Figure: Azure Data Lake pattern diagram. Labels: Tweets; Azure Storage (where CAQS files are stored, but would load into ADLS directly if ingesting from scratch); Upload Dataset; ADL Storage; ADL Analytics; Visual Studio; AML Experiment; Data Science VM; Power BI Desktop (Get Data From CSV); Azure Services. Roles: Data Analyst, Data Scientist, Data Engineer.]

Execution with Requested Parallelism

• Requested Parallelism = 1 (reserve enough to do 1 vertex at a time)
• Requested Parallelism = 4 (reserve enough to do 4 vertices at a time)

Stage Details

• 252 pieces of work
• AVG vertex execution time
• 4.3 billion rows
• Data read & written

ADLAUs: Azure Data Lake Analytics Units

• Parallelism N = N ADLAUs
• 1 ADLAU ~= a VM with 2 cores and 6 GB of memory
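As a rough way to reason about "Parallelism N = N ADLAUs", the sketch below estimates how long a stage takes if its vertices run in waves of at most N at a time. It is a back-of-the-envelope model, not the service's scheduler: the function name `estimate_stage_seconds`, the assumption that every vertex takes the same time, and the 10-second average are all made up for illustration. Only the 252 pieces of work comes from the slide.

```python
import math


def estimate_stage_seconds(vertices, parallelism, avg_vertex_seconds):
    """Rough stage duration if vertices run in equal-length waves.

    Assumes every vertex takes avg_vertex_seconds and that at most
    `parallelism` vertices (one per ADLAU) run at the same time.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    waves = math.ceil(vertices / parallelism)
    return waves * avg_vertex_seconds


# 252 pieces of work, as on the Stage Details slide; the 10-second
# average vertex time is a hypothetical number.
for n in (1, 4):
    print(n, estimate_stage_seconds(252, n, 10.0))
```

Under these assumptions, parallelism 1 needs 252 waves and parallelism 4 needs 63. Real jobs have stage dependencies and uneven vertex times, so the speed-up is usually less clean than this model suggests.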

Data Lake Analytics

Visual Studio

Azure Data Lake – Visual Studio

• Available project types
• Fully integrates with Solution Explorer
• Monitor and manage jobs
• Browse and manage storage
• Browse U-SQL catalog

Creating U-SQL

• IntelliSense supported
• Code-behind enhances your code

Demonstration: Using Visual Studio

Installing Azure PowerShell

• PowerShell Gallery
  • Recommended approach
  • PowerShell 5.0 supports PowerShell Gallery
  • Windows 10 ships with PowerShell 5.0
• Web Platform Installer (WebPI)

Installing from the PowerShell Gallery

• Launch Windows PowerShell ISE as Administrator
• Install-Module AzureRM
• Install-AzureRM

Finding the ADL cmdlets

• Option 1
  • Get-Command -Module AzureRM.DataLakeStore
  • Get-Command -Module AzureRM.DataLakeAnalytics
• Option 2
  • Get-Command *DataLake*

Logging in to Azure

Launch Windows PowerShell ISE

$subname = "BDHadoopTeamPMTestDemo"
Login-AzureRmAccount -SubscriptionName $subname

ADLS: Listing files in a store

$adls = "sqlkonferenz"
Get-AzureRmDataLakeStoreChildItem -Account $adls -Path /

ADLS: Upload and download

$adls = "sqlkonferenz"

# Upload a local file into the store
Import-AzureRmDataLakeStoreItem -Account $adls -Path d:\somefile.txt -Destination /somefile.txt

# Download a file from the store to local disk
Export-AzureRmDataLakeStoreItem -Account $adls -Path /somefile.txt -Destination d:\somefile_copy.txt

ADLA: List and submit jobs

$adla = "sqlkonferenz"

# List jobs in the account
Get-AzureRmDataLakeAnalyticsJob -Account $adla

# Submit a job from inline U-SQL text
Submit-AzureRmDataLakeAnalyticsJob -Account $adla -Name myjob -Script "…"

# Submit a job from a script file
Submit-AzureRmDataLakeAnalyticsJob -Account $adla -Name myjob -ScriptPath D:\test.script

ADL Store (ADLS) feature set

Account Management
• Create new account
• List accounts
• Update account properties
• Delete account

Transferring Data
• Upload into store from local disk
• Download from store to local disk

Files and Folders
• List contents of folder
• Create
• Move
• Delete
• Does file exist

Security
• Get ACLs
• Update ACLs
• Get Owner
• Set Owner

File Content
• Set file content
• Append file content
• Get file content
• Merge files

ADL Analytics (ADLA) feature set

Account Management
• Create new account
• List accounts
• Update account properties
• Delete account

Data Sources
• Add a data source
• List data sources
• Update data source
• Delete data source

Compute
• List jobs
• Submit job
• Cancel job

Catalog Items
• List items in U-SQL catalog
• Update item

Catalog Secrets
• Create catalog secret
• List catalog secrets
• Delete catalog secrets

Demonstration: Using ADL PowerShell

Questions