Post on 08-Jan-2017
Azure Data Lake
Kenneth M. Nielsen
About me
Kenneth M. Nielsen
Worked with SQL Server since 1999
Data Solution Architect at Microsoft
Kenneth.Nielsen@microsoft.com
@doktorkermit
Linkedin.com/in/KennethMNielsen
www.funkylab.com
Agenda
• Azure Data Lake Store
• Azure Data Lake Analytics
• Azure Data Lake Analytics – Using Visual Studio
• Azure Data Lake Analytics – Using PowerShell
• Q & A
Data Lake Store
Azure Data Lake Store
A hyper scale repository for big data analytics workloads
No limits to SCALE
Store ANY DATA in its native format
HADOOP FILE SYSTEM (HDFS) for the cloud
ENTERPRISE READY access control, encryption at rest
Optimized for analytic workload PERFORMANCE
Azure Data Lake Store
Any Data
• Unstructured
• Semi-structured
• Structured
Azure Data Lake Store
Azure Data Lake Store
HDFS for the cloud
New file system built from the ground up, based on the Hadoop file system
• Integrates with HDInsight, Hortonworks and Cloudera
• Supports file and folder objects and operations
Azure Data Lake Store
Unlimited storage
• File sizes can range from gigabytes to petabytes
• No limits to scale
Azure Data Lake Store
Security
• Integrates with Azure Active Directory
• Audit logs for all operations*
• Server-side encryption*
• ACL on files and folders*
• Enterprise-ready security when in GA
Data Lake Analytics
Azure Data Lake Analytics
An elastic analytics service built on Apache YARN that processes all data, at any size
• No limits to SCALE
• Includes U-SQL, a language that unifies the benefits of SQL with the expressive power of C#
• Optimized to work with ADL STORE
• FEDERATED QUERY across Azure data sources
• ENTERPRISE READY role-based access control & auditing
• Pay PER JOB & scale PER JOB
U-SQL
A new language for Big Data
• Familiar syntax to millions of SQL & .NET developers
• Unifies declarative nature of SQL with the imperative power of C#
• Unifies structured, semi-structured and unstructured data
• Distributed query support over all data
Language Overview
U-SQL Fundamentals
• All the familiar SQL clauses
SELECT | FROM | WHERE | GROUP BY | JOIN | OVER
• Operate on unstructured and structured data
• Relational metadata objects
.NET integration and extensibility
• U-SQL expressions are full C# expressions
• Reuse .NET code in your own assemblies
• Use C# to define your own: Types | Functions | Joins | Aggregators | I/O (Extractors, Outputters)
U-SQL Capabilities
• Interactive: IN PROGRESS
• Batch: AVAILABLE NOW
• Streaming: FUTURE
• Machine Learning: FUTURE
U-SQL Distributed Query
• Azure Storage Blobs
• Azure Data Lake Store
• Azure SQL Database
• Azure SQL Data Warehouse
• Azure SQL DB in Azure VM
[Diagram labels: READ ×5, WRITE ×5]
@orders =
    EXTRACT OrderId int,
            Customer string,
            Date DateTime,
            Amount float
    FROM "/input/orders.txt"
    USING Extractors.Tsv();

OUTPUT @orders
    TO "/output/orders_copy.txt"
    USING Outputters.Tsv();
Apply Schema on read
From a file in a Data Lake
Easy delimited text handling
Write out
Read the input, write it directly to output (just a simple copy)
Rowset
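As a rough illustration of the annotations above, the following Python sketch mimics what the script does: it applies a schema while reading tab-separated text and writes the rows back out unchanged. It is not U-SQL and does not touch Azure Data Lake. The names `extract_tsv` and `output_tsv` and the sample rows are made up for the example.

```python
import csv
import io
from datetime import datetime

# Column names and converters mirror the EXTRACT clause on the slide.
SCHEMA = [
    ("OrderId", int),
    ("Customer", str),
    ("Date", datetime.fromisoformat),
    ("Amount", float),
]


def extract_tsv(stream, schema=SCHEMA):
    """Yield one typed dict per line; the schema is applied while reading."""
    for fields in csv.reader(stream, delimiter="\t"):
        yield {name: convert(value)
               for (name, convert), value in zip(schema, fields)}


def _format(value):
    """Render a typed value back to text for the output file."""
    return value.isoformat() if isinstance(value, datetime) else str(value)


def output_tsv(rows, stream, schema=SCHEMA):
    """Write the rowset back out as tab-separated text."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([_format(row[name]) for name, _ in schema])


# Hypothetical sample input standing in for /input/orders.txt.
sample = (
    "1\tContoso\t2017-01-08T00:00:00\t19.5\n"
    "2\tFabrikam\t2017-01-09T00:00:00\t7.25\n"
)
rows = list(extract_tsv(io.StringIO(sample)))   # the rowset
copy = io.StringIO()
output_tsv(rows, copy)                          # the simple copy
```

The intermediate `rows` list plays the role of the `@orders` rowset: typed on the way in, serialized again on the way out.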
Azure Data Lake Pattern
[Diagram: Tweets; Azure Storage (where CAQS files are stored, but would load into ADLS directly if ingesting from scratch); Upload Dataset; ADL Storage; ADL Analytics; Visual Studio; AML Experiment; Data Science VM; Power BI Desktop (Get Data From CSV); Azure Services. Roles: Data Analyst, Data Scientist, Data Engineer]
Execution with Requested Parallelism
Requested Parallelism = 1 (reserve enough to do 1 vertex at a time)
Requested Parallelism = 4 (reserve enough to do 4 vertices at a time)
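One way to read the two settings above: when a stage has more vertices than the requested parallelism, the vertices have to run in successive batches. The Python sketch below computes a lower bound on the number of batches, assuming every vertex takes about the same time and none depends on another. `min_batches` is a hypothetical helper, and the real scheduler is more involved. The figure 252 is the number of pieces of work shown on the Stage Details slide.

```python
import math


def min_batches(vertex_count: int, requested_parallelism: int) -> int:
    """Lower bound on sequential batches needed to run all vertices."""
    if requested_parallelism < 1:
        raise ValueError("requested parallelism must be at least 1")
    return math.ceil(vertex_count / requested_parallelism)


# 252 pieces of work, run 1 at a time and 4 at a time.
for parallelism in (1, 4):
    print(parallelism, min_batches(252, parallelism))
```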
Stage Details
• 252 pieces of work
• AVG vertex execution time
• 4.3 billion rows
• Data read & written
ADLAUs: Azure Data Lake Analytics Units
Parallelism N = N ADLAUs
1 ADLAU ~= A VM with 2 cores and 6 GB of memory
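The two rules above can be combined into a back-of-the-envelope estimate of what a job reserves. This Python sketch takes the slide's approximation at face value (1 ADLAU is roughly a VM with 2 cores and 6 GB of memory). `reserved_resources` and `Reservation` are hypothetical names, not part of any Azure SDK.

```python
from typing import NamedTuple

CORES_PER_ADLAU = 2       # slide's approximation
MEMORY_GB_PER_ADLAU = 6   # slide's approximation


class Reservation(NamedTuple):
    adlaus: int
    cores: int
    memory_gb: int


def reserved_resources(parallelism: int) -> Reservation:
    """Parallelism N reserves N ADLAUs; cores and memory scale linearly."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return Reservation(
        adlaus=parallelism,
        cores=parallelism * CORES_PER_ADLAU,
        memory_gb=parallelism * MEMORY_GB_PER_ADLAU,
    )


print(reserved_resources(4))
```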
Data Lake Analytics
Visual Studio
Azure Data Lake – Visual Studio
Available project types
Azure Data Lake – Visual Studio
Fully integrates with Solution Explorer
Azure Data Lake – Visual Studio
• Monitor and manage jobs
• Browse and manage storage
• Browse U-SQL catalog
Creating U-SQL
Creating U-SQL
IntelliSense Supported
Creating U-SQL
Code-behind enhances your code
Demonstration: Using Visual Studio
Installing Azure PowerShell
• PowerShell Gallery
  • Recommended approach
  • PowerShell 5.0 supports PowerShell Gallery
  • Windows 10 ships with PowerShell 5.0
• Web Platform Installer (WebPI)
Installing from the PowerShell Gallery
• Launch Windows PowerShell ISE as Administrator
• Install-Module AzureRM
• Install-AzureRM
Finding the ADL cmdlets
• Option 1
  • Get-Command -Module AzureRM.DataLakeStore
  • Get-Command -Module AzureRM.DataLakeAnalytics
• Option 2
  • Get-Command *DataLake*
Logging in to Azure
Launch Windows PowerShell ISE
$subname = "BDHadoopTeamPMTestDemo"
Login-AzureRmAccount -SubscriptionName $subname
ADLS: Listing files in a store
• $adls = "sqlkonferenz"
• Get-AzureRmDataLakeStoreChildItem -Account $adls -Path /
ADLS: Upload and download
• $adls = "sqlkonferenz"
• Import-AzureRmDataLakeStoreItem -Account $adls -Path d:\somefile.txt -Destination /somefile.txt
• Export-AzureRmDataLakeStoreItem -Account $adls -Path /somefile.txt -Destination d:\somefile_copy.txt
ADLA: List and submit jobs
• $adla = "sqlkonferenz"
• Get-AzureRmDataLakeAnalyticsJob -Account $adla
• Submit-AzureRmDataLakeAnalyticsJob -Account $adla -Script "…" -Name myjob   # -Script takes the U-SQL text
• Submit-AzureRmDataLakeAnalyticsJob -Account $adla -ScriptPath D:\test.script -Name myjob
ADL Store (ADLS) feature set
Account Management
• Create new account
• List accounts
• Update account properties
• Delete account
Transferring Data
• Upload into store from local disk
• Download from store to local disk
Files and Folders
• List contents of folder
• Create
• Move
• Delete
• Does file exist
Security
• Get ACLs
• Update ACLs
• Get Owner
• Set Owner
File Content
• Set file content
• Append file content
• Get file content
• Merge files
ADL Analytics (ADLA) feature set
Account Management
• Create new account
• List accounts
• Update account properties
• Delete account
Data Sources
• Add a data source
• List data sources
• Update data source
• Delete data source
Compute
• List jobs
• Submit job
• Cancel job
Catalog Items
• List items in U-SQL catalog
• Update item
Catalog Secrets
• Create catalog secret
• List catalog secrets
• Delete catalog secrets
Demonstration: Using ADL PowerShell
Questions